Understanding Plotkin's Call-By-Name,
Call-By-Value and the λ-Calculus

posted by Darryl on 6 September 2016

This post is an attempt to communicate what I've come to understand about Plotkin's paper Call-By-Name, Call-By-Value and the λ-Calculus. In this paper, Plotkin describes the a form of the λ Calculus, together with an abstract machine called the SECD machine, and then describes, partially, how to get from an SECD machine to a continuation passing style implementation in a host λ Calculus. Plotkin addresses the issue that your host language might be lazy (call-by-name) while the language you want to implement is eager (call-by-value), or vice versa, and a naive implementation won't suffice, hence the CPS translation.

It took me a little bit to fully understand what was going on with the paper, and it took even longer to understand how to arrive at the translations that Plotkin has. Or to be more accurate, there is an algorithm that can be built relatively quickly from Plotkin's examples, but formulating the reasoning behind why that's the right algorithm isn't clear. So what I'll try to do in this post is show the process that I arrived at. I'll also use a simplification: rather than the SECD machine, I'll use a CEK machine. In fact, just a CK machine, to simplify even further.

The λ Calculus

First, we'll need to define the version of the LC we'll be using in this post. We'll have a type theory in which we have function and pair types, but this generalizes naturally.

Γ ⊢ (M,N) : A × B ; ×I
  Γ ⊢ M : A
  Γ ⊢ N : B
Γ ⊢ fst P : A ; ×E₁
  Γ ⊢ P : A × B
Γ ⊢ snd P : B ; ×E₂
  Γ ⊢ P : A × B
Γ ⊢ λx.M : A → B ; →I
  Γ, x : A ⊢ M : B
Γ ⊢ M N : B ; →E
  Γ ⊢ M : A → B
  Γ ⊢ N : A

This LC has the usual β reductions:

fst (M,N) ⇝ M
snd (M,N) ⇝ N
(λx.M) N ⇝ [N/x]M

The CK Machine

The CK Machine (C for Control, K for K(c)ontinuation or K(c)ontext) is a state machine with two states, which we'll fall forward and backward, corresponding to a recursion phase and a return/reduce phase. Each state has two parts, an evaluation context or call stack (depending on how you wish to think about it), and either an expression to compute (in the forward direction) or a return value (in the return/reduce phase). We define these as follows:

Terms: M,N ::= (M,N) | fst M | snd M | λx.M | M N
Values: U,V ::= (U,V) | λx.M
Contexts: K ::= • | (K,N) | (V,K) | fst K | snd K | K N | V K
States:  S ::= K ▻ M | K ◅ V

It helps to think about contexts as being typed as well, so we will define a judgment K from A to R to mean that K is a context for an A, inside of a computation for an R, as follows:

• from A to A ;
(K,N) from A to R
  K from A × B to R
  ⊢ N : B
(M,K) from B to R
  ⊢ M : A
  K from A × B to R
fst K from A × B to R
  K from A to R
snd K from A × B to R
  K from B to R
K N from A → B to R
  K from B to R
  ⊢ N : A
M K from A to R
  ⊢ M : A → B
  K from B to R

Additionally, a state computers an R precisely when the context is an R-computing context for an A, and the term/value is of type A:

K ▻ M state R
  K from A to R
  ⊢ M : A
K ◅ M state R
  K from A to R
  ⊢ M : A

Readers familiar with type theory for classical logic will recognize this as being very similar to the typing rules for refutations, as classical logic is a more general form of type theory for CK machines.

Eager computation with the CK machine proceeds by iterated stepping defined as follows:

K ▻ (M,N)  ↦  (K,N) ▻ M
(K,N) ◅ M'  ↦  (M',K) ▻ N
(M',K) ◅ N'  ↦  K ◅ (M',N')

K ▻ fst P  ↦  fst K ▻ P
fst K ◅ (M',N')  ↦  K ◅ M'

K ▻ snd P  ↦  snd K ▻ P
snd K ◅ (M',N')  ↦  K ◅ N'

K ▻ λx.M  ↦  K ◅ λx.M

K ▻ M N  ↦  K N ▻ M
K N ◅ M'  ↦  M' K ▻ N
(λx.M') K ◅ N'  ↦  K ▻ [N'/x]M'

Additionally, lazy computation with the CK machine proceeds by iterated stepping defined as follows:

K ▻ (M,N)  ↦  K ◅ (M,N)

K ▻ fst P  ↦  fst K ▻ P
fst K ◅ (M,N)  ↦  K ▻ M

K ▻ snd P  ↦  snd K ▻ P
snd K ◅ (M',N')  ↦  K ◅ N'

K ▻ λx.M  ↦  K ◅ λx.M

K ▻ M N  ↦  K N ▻ M
K N ◅ (λx.M')  ↦  K ▻ [N/x]M'

That is to say, lazy computation doesn't compute under pairs, and doesn't compute on arguments, during the recursive phase

We start a CK machine running on a program M with the initial state • ▻ M, and it's done when it returns to the empty stack, i.e. when it's in the state • ◅ M' for some value M'.

Interpreting CK Machines

Taking the above definitions to be a kind of syntactic representation, we can think of an evaluator defined in our host language, which converts the syntactic representations into their equivalent form in the host language. If, for instance, our host language is Haskell, we can define a function that evaluates terms by recursively evaluating and applying. Doing this will make the implemented language identical to the host language (in this case Haskell) in terms of being eager or lazy.

Alternatively, we can implement the state machine via it's transitions, and we would instead get an eager or lazy, despite Haskell being lazy, simply in virtue of what the specified next states are, rather than relying on Haskell's own evaluation semantics to evaluate pairs and functions.

However, the state machine is defined on the syntax, we're not translating into the host language's own terms. The implemented pairs don't become Haskell pairs, nor implemented functions Haskell functions. They're just some elements of a term type. This is distinct from the evaluation function we could define, which would genuinely turn implemented pairs into Haskell pairs, etc.

But, using the state machine as a basis, we can define a translation into the host language that has the desired eager or lazy evaluation. We do this by drawing on Conal Elliot's denotational design approach, viewing the host language programs as denotations, and demanding that when we translate, we use a homomorphism. That is to say, we'll define a collection of functions that preserve the structure of states, contexts, terms, and values, when translating into the host language.

First we'll define translations of terms. Given some term M : A, we want to give tm M : Tm A, where Tm A is some translation of the type.

tm x = variableTm x
tm (M,N) = pairTm (tm M) (tm N)
tm (fst P) = firstTm (tm P)
tm (snd P) = secondTm (tm P
tm (λx.M) = lambdaTm (λx. tm M)
tm (M N) = applyTm (tm M) (tm N)

This translation relies on six currently undefined functions that will translate each respective sort of thing. We will have to chose how to define this type level translation later as well. We leave it unspecified for now, but we will state the typing relations we expect to hold amongst them:

If  x : A  then  variableTm x : Tm A
If  M : Tm A  and  N : Tm B  then  pairTm M N : Tm (A × B)
If  P : Tm (A × B)  then  fstTm P : Tm A
If  P : Tm (A × B)  then  sndTm P : Tm B
If  F : Val A → Tm B  then  lambdaTm F : Tm (A → B)
If  M : Tm (A → B)  and  N : Tm A  then  applyTm M N : Tm B

Next, we define a translation on values. If M' : A, then val M' : Val A, defined as

val (M',N') = pairVal (val M') (val N')
val (λx.M) = lambdaVal (λx. tm M)

Again we have to define the interpretation of types, and the auxiliary functions, which we'll come back to, but the typing relations for them are

If  M' : Val A  and  N' : Val B  then  pairVal M' N' : Val (A × B)
If  F : Val A → Tm B  then  lambdaVal F : Val (A → B)

Next, translating contexts. Given a context K from A to R we expect to translate that into some ctx K : Ctx A R, for some choice of types for the context's translation. We define it similar to the definitions for terms:

ctx • = done
ctx (K,N) = pairLCtx (ctx K) (tm N)
ctx (M',K) = pairRCtx (val M') (ctx K)
ctx (fst K) = fstCtx (ctx K)
ctx (snd K) = sndCtx (ctx K)
ctx (K N) = appLCtx (ctx K) (tm N)
ctx (M' K) = appRCtx (val M') (ctx K)

The typing relations for contexts are obvious given the relations for terms and values:

done : Ctx A A
If  K : Ctx (A × B) R  and  N : Tm B  then  pairLCtx K N : Ctx A R
If  M' : Val A  and  K : Ctx (A × B) R  then  pairRCtx M' K : Ctx B R
If  K : Ctx A R  then  fstCtx K : Ctx (A × B) R
If  K : Ctx B R  then  sndCtx K : Ctx (A × B) R
If  K : Ctx B R  and  N : Tm A  then  appLCtx K N : Ctx (A → B) R
If  M' : Val (A → B)  and  K : Ctx B R  then  appRCtx M' K : Ctx A R

And finally we'll translate machine states, so given a state S state R, we'll have st S : Val R:

st (K ▻ M) = forward (ctx K) (tm M)
st (K ◅ M') = backward (ctx K) (val M')

Which obviously should obey the typing relations

If  K : Ctx A R  and  M : Tm A  then  forward K M : Val R
If  K : Ctx A R  and  M' : Val A  then  backward K M' : Val R

Again, we haven't yet defined any of these auxiliary functions/values, but we know what the typing relations are that have to hold and so that gives us some guidance.

Translations Preserve Execution

The main guiding principle for how we will proceed now is that we want the translations of states to preserve the behavior of the states. That is to say, if S ↦* S' then st S ⇝* st S'. Or to put it another way, if a state S steps some number of times to S', then the translation st S computes in the host language to st S'. This provides a substantial constraint on the possible definitions of the auxiliary functions and type translations now.

In particular, we have two major choices. The first is to translate things into some concrete data type, except for states, which would perform some pattern matching computation on the concrete data. For example, we could use the trivial translation turning terms, values, and contexts into themselves, and then executing the state machine.

The other choice is to translate things into some use of function types, and use function reduction instead pattern matching as the computation in the host language. We will use the second choice.

Having decided then that computation of forward K M and backward K M' is done by function reduction, we have slightly less than three choices: either the contexts are functions that take terms as arguments, or terms are functions that take contexts as arguments. We don't have a choice about whether or not contexts take values as arguments, however. Since the entire thing must return a value, we cannot chose values to be functions from contexts to values. Instead, it must be that contexts are functions from values to values. This induces more constraints: either terms and values are interpreted the same (if terms are arguments to contexts) or terms are functions from contexts (i.e. functions from values to values) to values. Using Haskell type notation:

Either  Tm A = Val A  and  Ctx A R = Val A -> Val R
Or  Tm A = (Val A -> Val R) -> Val R  and  Ctx A R = Val A -> Val R

The first choice can be ruled out as follows: if the only function types involved are contexts, then all behavior is determined by the context. But the state machine's forward direction is determined by the nature of the term not the context. Therefore, contexts cannot be the only function types involved. So we must pick the second option.

We now can state the definitions of forward and backward:

forward K M = M K
backward K M' = K M'

That is to say, in the forward direction, the translation of the term (M) takes the translation of the context (K) as argument, while in the backward direction, the translation of the context (K) takes the translation of the value (M') as argument.

Now we can pick a representation for values that's straight forwardly in terms of host language programs:

Val (A × B) = Val A × Val B
Val (A → B) = Val A → Tm B

This together with the typing relation we expect to hold gives us obvious definitions for the value auxiliary functions:

pairVal M N = (M,N)
lambdaVal F = F

We can now look at the statements that define how the CK machine steps, and look at what host language computations that induces. In the case of the recursive steps for pairs, we have

K ▻ (M,N) ↦ (K,N) ▻ M

therefore

pairTm (tm M) (tm N) (ctx K) ⇝* tm M (pairLCtx (ctx K) (tm N))

If we now ask how to define pairTm so that this computation holds, there's one obvious way: simply define it as a function that directly reduces to this:

pairTm M N = λk. M (pairLCtx K N)

We have to still define pairLCtx before we have a full definition for pairTm, but we're closer. Let's now look at the next machine step for pairs:

(K,N) ◅ M' ↦ (M',K) ▻ N

therefore

pairLCtx (ctx K) (tm N) (val M') ⇝* tm N (pairRCtx (val M') (ctx K))

We can again use this to directly define pairLCtx as an obvious function:

pairLCtx K N = λm'. N (pairRCtx m' K)

The next step for pairs gives us

(M',K) ◅ N' ↦ K ◅ (M',N')

therefore

pairRCtx (val M') (ctx K) (val N') ⇝* K (M',N')

We can now define pairRCtx:

pairRCtx M' K = λn'. K (M',n')

Unfolding these definitions, we find

pairTm M N = λk. M (λm'. N (λn'. k (m',n')))

Which is precisely the definition of a CPS transformed pair constructor.

We also have the recursive and computation steps for fst:

K ▻ fst P ↦ fst K ▻ P

therefore

fstTm (tm P) (ctx K) ⇝* tm P (fstCtx (ctx K))

Which gives the definition

fstTm P = λk. P (fstCtx K)

And the computation step

fst K ◅ (M',N') ↦ K ◅ M'

therefore

fstCtx (ctx K) (val M', val N') ⇝* K M'

Which gives us the definition

fstCtx K = λp'. K (fst p')

Together, this tells us

fstTm P = λk. P (λp'. K (fst p'))

Similarly for snd we have the recursive step

K ▻ snd P ↦ snd P ▻ P

therefore

sndTm (tm P) (ctx K) ⇝* tm P (sndCtx (ctx K))

Giving the definition

sndTm P = λk. P (sndCtx k)

And the computation step

snd K ◅ (M',N') ↦ K ◅ N'

therefore

sndCtx (ctx K) (val M', val N') ⇝* K N'

Which gives us the definition

sndCtx K = λp'. K (snd p')

And together these two give us

sndTm P = λk. P (λp'. K (snd p'))

We now have the full, eager CPS transformed interpretation of the pairs of the STLC, derived directly from the CK machine steps, under the constraint that host language computation follows the machine steps. Let's do the same for functions.

K ▻ λx.M ↦ K ◅ λx.M

therefore

lambdaTm (λx. tm M) (ctx K) ⇝* ctx K (λx. tm M)

Which gives us the definition

lambdaTm F = λk. k F

And now the first application step:

K ▻ M N ↦ K N ▻ M

therefore

appTm (tm M) (tm N) (ctx K) ⇝* tm M (appLCtx (ctx K) (tm N))

Which gives the definition

appTm M N = λk. M (appLCtx K N)

The second application step:

K N ◅ λx.M' ↦ (λx.M') K ▻ N

therefore

appLCtx (ctx K) (tm N) M' ⇝* tm N (appCtxR M' (ctx K))

Giving the definition

appLCtx K N = λf'. N (appCtxR f' K)

And finally the computation step

(λx.M') K ◅ N' ↦ K ▻ [N'/x]M'

therefore

appCtxR (val M') (ctx K) (val N') ⇝* val M' (val N') (ctx K)

Giving the definition

appCtxR M' K = λn'. M' n' K

Together with the previous steps, we have

appTm M N = λk. M (λm'. N (λn'. m' n' k))

We now have, again, the standard CPS translation for eager evaluation, this time for functions, derived straight from the CK machine. The same can be done for lazy evaluation by simply using the lazy CK machine instead of the eager one, and performing the same translation process.

If you have comments or questions, get it touch. I'm @psygnisfive on Twitter, augur on freenode (in #languagengine and #haskell).