So you want to write
a type checker...

posted by Darryl on 23 Dec 2014

So you want to write a type checker, but you don't know how. Well in this post, I'm going to show you how to implement one using JavaScript, for the simply typed lambda calculus with pairs and functions. JS is not the most ideal language for this, unfortunately, because it lacks a good type system of its own, and makes a bunch of stuff really ugly and noisy, but in a way, that's ideal. It'll give you a sense of what you'd have to do in other languages with similarly impoverished type systems. The way I'm going to implement it should be relatively easy to translate into idiomatic code in various.

You can see the full code for this type checker on GitHub. There is also a Haskell version of the same type checker, for comparison. Question info at the bottom of this post as usual.

I'm assuming you know a little bit about type theory, such as having looked at the Pfenning videos that I posted in this little guide. If not, I highly recommend you brush up, otherwise some of this might be confusing.

Our first task is to figure out just what the type theory is that we want to implement. To keep things simple, we'll implement only four types (and add some more later). We'll have pair types and function types, which will be the interesting cases, and we'll also have three types Foo, Bar, and Baz, which won't do anything except give us two base types to work with. We won't be able to do much programming with just these, but we'll still be able to see how a type checker works.

Before we really get into defining the type system and implementing it in JS, we should first figure out how we're going to represent structured data in JS. To remain fairly language-generic, what we'll do is use JS objects to structure things according to their functionality within a piece of data. So for instance, just as you might implement a linked list in JS like so:

var nil = { tag: "nil" };

function cons(x,xs) {
    return { tag: "cons", head: x, tail: xs };
}

we can use this same technique for other kinds of data, such as the representation of types, or the representation of terms.

So now let's define the type theory. First we'll define a judgment A type which will serve as an internal way of representing when something is or isn't a well-formed type:

A type
======

-------- Foo Formation      -------- Bar Formation
Foo type                    Bar type


            -------- Baz Formation
            Baz type


A type    B type                  A type    B type
---------------- * Formation      ---------------- -> Formation
    A*B type                        A -> B type

We can implement this in JS relatively easily. The representation of the types will be like the example link list implementation, while the judgment A type will be a function judgment_type, which returns true if its argument is a type according to these rules, and false otherwise. We'll call the pair type helper function "prod" instead of "pair" just because we want to use "pair" later for the pair constructor.

var Foo = { tag: "Foo" };
var Bar = { tag: "Bar" };
var Baz = { tag: "Baz" };

function prod(a,b) {
    return { tag: "*", left: a, right: b };
}

function arr(a,b) {
    return { tag: "->", arg: a, ret: b };
}

function judgment_type(a) {
  if ("Foo" == a.tag || "Bar" == a.tag || "Baz" == a.tag) {
      
      return true;
      
  } else if ("*" == a.tag) {
      
      return judgment_type(a.left) && judgment_type(a.right);
      
  } else if ("->" == a.tag) {
      
      return judgment_type(a.arg) && judgment_type(a.ret);
      
  } else {
      
      return false;
      
  }
}

A more robust implementation might include checks that all of the right properties exist on the objects (i.e. check tag exists, and for each choice of its value, check that the corresponding properties exist), but for simplicity I leave these checks out. Anything created with the helper values/functions will work with this, but if you construct a tagged value manually, be careful!

Next, we should define what it means to be a context of typed variables, using a new judgment G ctx:

G ctx
=====

------ empty context
<> ctx

G ctx    A type    x is not in G
-------------------------------- new var context
          G, x : A ctx

We'll leave it undefined in the proof system what it means to say that a variable is not in a context, but of course it just means that there's no other type declaration for it in the context. We'll still need to implement it tho.

We'll use a snoc linked-list design for encoding contexts, but other options are possible:

var empty = { tag: "<>" }

function snoc(g,x,a) {
    return { tag: ",:", rest: g, name: x, type: a };
}

function not_in(n, g) {
    if ("<>" == g.tag) {
        
        return true;
        
    } else {
        
        if (n == g.name) {
            
            return false;
            
        } else {
            
            return not_in(n, g.rest);
            
        }
        
    }
}

We're not actually going to use the G ctx judgment, tho if we did want to use it (for instance, to verify that we never mess anything up by reusing a variable name) we would implement it like so:

function judgment_ctx(g) {
    if ("<>" == g.tag) {
        
        return true;
        
    } else if (",:" == g.tag) {
        
        return judgment_ctx(g.rest) &&
               judgment_type(g.type) &&
               not_in(g.name, g.rest);
        
    } else {
        
        return false;
        
    }
}

Lastly, we need to define the rules for introducing and eliminating these connectives (i.e. the type checking rules). Since Foo and Bar are just there to give us some base types, we don't need rules for them, as there are no actual values, but for pairs and functions we can use standard definitions, using a hypothetical typing judgment. Let's start with pairs:

G !- M : A
==========

G !- M : A    G !- N : B
------------------------ * Intro
    G !- (M,N) : A*B

G !- P : A*B    G, x : A, y : B !- M : C
----------------------------------------- * Elim
G !- split P as (x :: A, y :: B) in M : C

This uses the pair splitting definition of pairs (which is basically just a pattern match on pairs) rather than the projection definition (with fst and snd) simply because its easier to type check these. If we used the projection definition, we would need to do type inference as well as checking. The notation for the eliminator uses parentheses and double colons to be mnemonic of what's going on, but we could just as well have written split(P,x,A,y,B,M).

function pair(m,n) {
  return { tag: "(,)", first: m, second: n };
}

function split(p, x, a, y, b, m) {
    return { tag: "split",
             pair: p,
             name_x: x, type_a: a,
             name_y: y, type_b: b,
             body: m };
}

// judgment_check will be modified shortly
function judgment_check(g, m, a) {
    
    if ("(,)" == m.tag && "*" == a.tag) {
        
        return judgment_check(g, m.first, a.left) &&
               judgment_check(g, m.second, a.right);
        
    } else if ("split" == m.tag) {
        
        return judgment_check(g, m.pair, prod(m.type_a, m.type_b)) &&
               judgment_check(snoc(snoc(g, m.name_x, m.type_a),
                                   m.name_y, m.type_b),
                              m.body,
                              a);
    
    } else {
        
        return false;
        
    }
    
}

And now the rules for function types:

G, x : A !- M : B
------------------ -> Intro
G !- \x.M : A -> B

G !- M : A -> B    G !- N : A
----------------------------- -> Elim
   G !- app(M, N :: A) : B

This eliminator marks the type of the argument, again because its easier to type check that way. If you look at the rules for pair and function elims, you'll notice that without these annotations, some information is present in the premises but not in the conclusion, which would make it impossible to simply type check, and would necessitate inference.

Now the implementation (which changes judgment_check):

function lam(x,m) {
    return { tag: "lam", name: x, body: m };
}

function app(m,n,a) {
    return { tag: "app", fun: m, arg: n, type_arg: a };
}

// the new version
function judgment_check(g, m, a) {
    
    ... else if ("lam" == m.tag && "->" == a.tag) {
        
        return judgment_check(snoc(g, m.name, a.arg), m.body, a.ret);
        
    } else if ("app" == m.tag) {
        
        return judgment_check(g, m.fun, arr(m.type_arg, a)) &&
               judgment_check(g, m.arg, m.type_arg);
        
    } ...
    
}

The only thing that we have to do is implement the rule for variables:

x has type A in G
----------------- variable
   G !- x : A

However, to do this we need to implement a way to check that a variable has a specific type in the context, and therefore we need to have a way of testing equality on types. Using JS's built in equality won't work because we need to traverse structure. It would be nice if it were true that prod(Foo,Bar) == prod(Foo,Bar), but it simply isn't, because each use of prod creates a new JS object. So we'll define some auxiliary functions first:

function type_equality(a,b) {
    
    if (("Foo" == a.tag && "Foo" == b.tag) ||
        ("Bar" == a.tag && "Bar" == b.tag) ||
        ("Baz" == a.tag && "Baz" == b.tag)) {
        
        return true;
        
    } else if ("*" == a.tag && "*" == b.tag) {
        
        return type_equality(a.left, b.left) &&
               type_equality(a.right, b.right);
        
    } else if ("->" == a.tag && "->" == b.tag) {
        
        return type_equality(a.arg, b.arg) &&
               type_equality(a.ret, b.ret);
        
    } else {
        
        return false;
        
    }
    
}

function var_has_type(n,a,g) {
    if ("<>" == g.tag) {
        
        return false;
        
    } else if (",:" == g.tag) {
        
        if (n == g.name) {
            
            return type_equality(a, g.type);
            
        } else {
            
            return var_has_type(n, a, g.rest);
            
        }
        
    }
}

And now we can add the variable rule to the checking function:

function v(n) {
    return { tag: "variable", name: n };
}

function judgment_check(g, m, a) {
    
    ... else if ("variable" == m.tag) {
        
        return var_has_type(g, m.name, a);
        
    } ...
    
}

And that's it! If we now test it on some standard examples, we get the results we want:

/*
 *  the identity function for foo
 *  !- \x. x : Foo -> Foo
 */
judgment_check(empty,
               lam("x",v("x")),
               arr(Foo,Foo))

/*
 * the fst function
 * !- \p. split p as (x :: Foo, y :: Bar) in x : Foo*Bar -> Foo
 */
judgment_check(empty,
               lam("p",split(v("p"),"x",Foo,"y",Bar,v("x"))),
               arr(prod(Foo,Bar),Foo))

/*
 * the const function
 * !- \x. \y. x : Foo -> Bar -> Foo
 */
judgment_check(empty,
               lam("x", lam("y", v("x"))),
               arr(Foo, arr(Bar, Foo)))

/*
 * the apply function
 * !- \f. \x. f x : (Foo -> Bar) -> Foo -> Bar
 */
judgment_check(empty,
               lam("f", lam("x", app(v("f"), v("x"), Foo))),
               arr(arr(Foo,Bar),arr(Foo,Bar)))

/*
 * the continuize function or reverse apply function
 * !- \x. \f. f x : Foo -> (Foo -> Bar) -> Bar
 */
judgment_check(empty,
               lam("x", lam("f", app(v("f"), v("x"), Foo))),
               arr(Foo, arr(arr(Foo,Bar),Bar)))

/*
 * currying
 * !- \f. \x. \y. f (x,y) : (Foo*Bar -> Baz) -> Foo -> Bar -> Baz
 */
judgment_check(empty,
               lam("f", lam("x", lam("y",
                 app(v("f"), pair(v("x"), v("y")), prod(Foo,Bar))))),
               arr(arr(prod(Foo,Bar),Baz), arr(Foo, arr(Bar, Baz))))

/*
 * uncurrying
 * !- \f. \p. split p as (x :: Foo, y :: Bar) in f x y
 *  : (Foo -> Bar -> Baz) -> Foo*Bar -> Baz
 */
judgment_check(empty,
               lam("f", lam("p", split(v("p"), "x", Foo, "y", Bar,
                 app(app(v("f"), v("x"), Foo), v("y"), Bar)))),
               arr(arr(Foo, arr(Bar, Baz)), arr(prod(Foo,Bar), Baz)))

Again, you can see the full code for this type checker on GitHub, and there is a Haskell version of the same type checker, for comparison.

If you have comments or questions, get it touch. I'm @psygnisfive on Twitter, augur on freenode (in #languagengine, ##typetheory, and #haskell). Here's the HN thread if you prefer that mode.