Statically Typed Lambda Calculus as an Embedded Language
In this post we will take a look at what it takes to create an embedded language: a language contained and used from within a different language. The host language of choice will be haskell, which will provide us with powerful typechecking utilities that can decide whether our embedded program is admissible at compile time.
A good example of when an embedded language is actually interesting, is the Accelerate project. Which is itself an embedded language that lets you describe parallel computations that can then be compiled to either CPU or even GPU code. The Accelerate language is, just like haskell or any other functional language, an embellished form of lambda calculus. As you will come to discover, pure lambda may be powerful enough, but is certainly not very pragmatic or intuitive at times. However, it forms a great basis from which you can later expand to create your own language.
In this post we create multiple iterations of an embedded language (click to view source):
- Simplest (no types no environment)
- Harder (types but no environment, worst option)
- Complete (types with a typed environment, great as a basis of another language)
- Scoped (untyped with a scope environment, the best of lambda calculus)
The code in this post was heavily inspired by the CHAD: Combinatory Homomorphic Automatic Differentiation project. Which is an embedded language for automatic differentiation. One of the authors of that project and an acquittance at Utrecht University, Tom Smeding, whom I regularly trouble with my questions, also deserves some credit for answering those questions.
Lambda Calculus
If you randomly stumbled upon this post, chances are that you already know what lambda calculus is. If not, let me give a quick overview.
Lambda calculus is a programming language (of sorts) that is proven to be expressively equivalent to a Turing machine, a theoretical machine that forms the basis for the modern day computer. By extension, this means that like a Turing machine, any computation can be expressed as lambda calculus. What makes lambda calculus so interesting however, is its sheer simplicity. There are only 3 distinct concepts: variables, abstractions (i.e. functions), and function application. Refer to the following examples:
x : variable named x
λx.x : function that takes an argument and simply returns it (identity function)
f x : the variable x applied to the variable f (which is supposedly a function)
# Function application (reduce by taking the body of the abstraction, and substitute)
(λx.x) y === y (id y === y)
# more examples:
(λx.λy.y x) a b === b a (flip ($) a b === b a)
With that short refresh out of the way, I would like to refer you to this most amazing presentation that tackles the subject of lambda calculus way better than I ever could. If you still feel confused, feel especially targeted as it is beginner friendly:
De Bruijn indices
The first problem we encounter is with the shadowing of variables. For example, the following expression is perfectly valid lambda calculus:
λx.λx.x
(λx.λx.x) y === λx.x
λ.λ.0
Simplest: No types
We first define the datatype as well as a custom Show instance:
data Term = Var Int | Lambda Term | App Term Term
instance Show Term where
show (Var n) = "x" <> show n
show (Lambda p) = "λ." <> show p
show (App f arg) = "(" <> show f <> ") " <> show arg
test :: Term
test = Lambda (Lambda (Var 0)) `App` Var 1 `App` Var 2
ghci> test
((λ.λ.x0) x1) x2
-- subst variable with id n, with term sub, in a term
subst' :: Int -> Term -> Term -> Term
subst' n sub (Var n') = if n == n' then sub else Var n'
subst' n sub (Lambda p) = Lambda $ subst' (n+1) sub p
subst' n sub (App f arg) = App (subst' n sub f) (subst' n sub arg)
-- Commonly used when wishing to erase a lambda
subst = subst' 0
reduce :: Term -> Term
reduce (Var n) = Var n
reduce (Lambda p) = Lambda (reduce p)
reduce (App f arg) = let rf = reduce f in case rf of
-- After substitution, more reduction opportunities may arise
Lambda p -> reduce $ subst arg p
-- The function argument could not be reduced
_ -> App rf (reduce arg)
reduce test
x2
eval :: Term -> a
eval = error "Not even Tom Cruise can do this..."
Harder: Typed expressions
Using the power of Generic Algebraic Datatypes (GADTs), we can inductively build up a return type as we construct a term:
{-# LANGUAGE GADTs #-}
module Harder where
data Term a where
Var :: Int -> Term a
Lambda :: Term b -> Term (a -> b)
App :: Term (a -> b) -> Term a -> Term b
instance Show (Term a) where
show (Var n) = "x" <> show n
show (Lambda p) = "λ." <> show p
show (App f arg) = "(" <> show f <> ") " <> show arg
Perhaps now we can implement the eval function that maps our embedded language back to the host language. Our evaluation strategy is as follows: every time we encounter a lambda, we return a lambda function and evaluate the body with that argument in the evaluation environment (a list). When we encounter a variable, we can simply index the list at its De Bruijn index to get the value.
But we have a problem... The environment can contain terms of all sorts of types, but we cannot have a polymorphic list in haskell (not without more advanced features). Sadly, we can at best botch it with some unsafeCoerce sprinkles; we might just as well have used C.
import Unsafe.Coerce
eval :: Term a -> a
eval = eval' []
where eval' :: [t] -> Term a -> a
eval' env (Var n) = unsafeCoerce $ env !! n
eval' env (Lambda p) = \arg -> eval' (unsafeCoerce arg:env) p
eval' env (App f arg) = eval f (eval arg)
-- But we can evaluate if we give the compiler some type hints
ghci> eval (Lambda (Lambda (Var 0))) 10 20 :: Int
20
subst' :: Int -> Term a -> Term b -> Term b
-- There is no evidence that a ~ b (cause it isn't always true...)
subst' n sub (Var n') = if n == n' then unsafeCoerce sub else Var n'
subst' n sub (Lambda p) = Lambda $ subst' (n+1) sub p
subst' n sub (App f arg) = App (subst' n sub f) (subst' n sub arg)
Complete: Type Environments
The ultimate crux is the environment from which variables get their values. If we encounter a variable with index 1, we must be sure that that binding site exists, as well as as that the type of that variable matches with the value in the environment. The way to go about this is to keep track of a type level environment in the Term datatype itself. This describes on the type level what kind of environment it needs to resolve all its variables. If that environment is empty, the expression is closed and can be seen as a program that stands on its own.
-- We get all the imports and language extensions out of the way
{-# LANGUAGE GADTs #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE LambdaCase #-}
module Complete where
import qualified Data.Map as M
import Data.Map (Map)
import Data.Maybe
import Data.Proxy
import Data.GADT.Compare
import Data.Type.Equality ((:~:)(Refl))
data Term env a where
Var :: Idx env a -> Term env a
Lambda :: Term (a ': env) b -> Term env (a -> b)
App :: Term env (a -> b) -> Term env a -> Term env b
instance Show (Term env a) where
show (Var x) = show x
show (Lambda p) = "λ" <> "." <> "(" <> show p <> ")"
show (App f arg@(Var _)) = show f <> " " <> show arg
show (App f arg) = show f <> " " <> "(" <> show arg <> ")"
-- Our programs should be of this type, implying that they are closed terms
type Exp = Term '[]
-- | Typed De Bruijn index
data Idx env t where
Z :: Idx (t ': env) t
S :: Idx env t -> Idx (s ': env) t
-- Z = 0
-- S Z = 1
-- S (S Z) = 2
-- etc
idxToInt :: Idx env t -> Int
idxToInt Z = 0
idxToInt (S i) = 1 + idxToInt i
instance Show (Idx env t) where
show = ("x"<>) . show . idxToInt
data Env env where
VZ :: Env '[]
VS :: t -> Env env -> Env (t ': env)
envLookup :: Env env -> Idx env t -> t
envLookup (VS x _) Z = x
-- we decrement the index and the list in lockstep to count down
-- to the target value
envLookup (VS _ env) (S i) = envLookup env i
-- evaluation starts with an empty environment,
-- collecting variables each time we encounter an abstraction
eval :: Exp a -> a
eval = eval' VZ
where eval' :: Env env -> Term env a -> a
eval' env (Var i) = envLookup env i
eval' env (Lambda e) = \arg -> eval' (VS arg env) e
eval' env (App f e) = eval' env f (eval' env e)
ghci> eval (Lambda (Lambda (Var Z))) "ignore me" 420
420
ghci> Lambda (Var (S Z)) :: Exp (a -> b)
<interactive>:4:1: error:
• Couldn't match type: b1 : env0
with: '[]
Expected: Exp (a1 -> b1)
Actual: Term (b1 : env0) (a1 -> b1)
• In the expression: Lambda (Var (S Z)) :: Exp (a -> b)
In an equation for ‘it’: it = Lambda (Var (S Z)) :: Exp (a -> b)
Complete: Substitutions and beta reduction
Us awaits only one last feature: beta reduction. Before we can start on that, we first need a notion of equality for two indices that live in the same environment. Not only should we know that they are the same number, we should also learn that they must then refer to the same type. If we can observe that fact, we no longer need any coercions. The perfect candidate for this is the Refl constructor from Data.Equality. It is roughly defined as:
data IsEq a b where
Refl :: IsEq a a
instance GEq (Idx env) where
geq Z Z = Just Refl
geq (S i) (S i') = geq i i'
geq _ _ = Nothing
trySubst :: Idx env a -> Term env' a -> Term env t -> Term env' t
trySubst idx sub (Var i)
| Just Refl <- geq idx i = sub
| otherwise = Var i -- error comes from here
--------------------------
• Couldn't match type 'env' with 'env''
Expected: Term env' t
Actual: Term env t
-- Proof that env is at least as strong as env', unlocks a function
-- that performs the weakening
newtype env :> env' = Weaken { (>:>) :: forall t'. Idx env t' -> Idx env' t' }
-- A custom operator that allows us to chain two weakening operations
infixr 9 .>
(.>) :: env2 :> env3 -> env1 :> env2 -> env1 :> env3
Weaken f .> Weaken g = Weaken (f . g)
-- A basic lemma, any environment is at least as strong as itself
wId :: env :> env
wId = Weaken id
-- We can always make the weaker environment weaker
wSucc :: env :> env' -> env :> (t : env')
wSucc = (Weaken S .>)
-- We can always make the stronger environment stronger
wRaise :: (t' : env) :> env' -> env :> env'
wRaise = (.> Weaken S)
-- We can weaken two environments in simultaneously
wSink :: env :> env' -> (t ': env) :> (t ': env')
wSink w = Weaken (\case Z -> Z ; S i -> S (w >:> i))
-- Lift the weakening operation to terms
-- i.e. just propagate to the Var cases
sinkTerm :: env :> env' -> Term env t -> Term env' t
sinkTerm w (Var i) = Var (w >:> i)
sinkTerm w (Lambda e) = Lambda (sinkTerm (wSink w) e)
sinkTerm w (App f arg) = App (sinkTerm w f) (sinkTerm w arg)
-- An often used simplification
sinkTerm1 :: Term env t -> Term (a ': env) t
sinkTerm1 = sinkTerm (wSucc wId)
subst :: env :> env' -> Term env' u -> Term (u ': env) t -> Term env' t
subst w = subst' weakening Z
where weakening = Weaken $ \case
-- This case is actually unreachable
Z -> error "variable should have been replaced"
S i -> w >:> i
subst' :: env :> env' -> Idx env a -> Term env' a -> Term env t -> Term env' t
subst' w n sub (Var n')
| Just Refl <- geq n n' = sub
| otherwise = Var (w >:> n')
subst' w n sub (Lambda e) = Lambda (subst' (wSink w) (S n) (sinkTerm1 sub) e)
subst' w n sub (App f arg) = App (subst' w n sub f) (subst' w n sub arg)
reduce :: Term env a -> Term env a
reduce (Var n) = Var n
reduce (App f arg) = let rf = reduce f in case rf of
Lambda f' -> reduce $ subst wId arg f'
_ -> App rf (reduce arg)
reduce (Lambda e) = Lambda $ reduce e
ghci> discardfirst
λ.(λ.(x0))
ghci> App (App discardfirst discardfirst) discardfirst
λ.(λ.(x0)) (λ.(λ.(x0))) (λ.(λ.(x0)))
ghci> reduce $ App (App discardfirst discardfirst) discardfirst
λ.(λ.(x0))
-- ^ IT WORKS!
Complete: Summary, Limitations, Future
Well done if you have made it this far. Likely you are either very excited, or very frustrated at this point. If the first is the case, I wish to end on a slightly sour note by pointing out that our strongly typed lambda calculus is not as powerful as vanilla lambda calculus. For example, take the famous Mockingbird combinator, that applies a given function to itself:
--| λf.f f
mockingbird = Lambda $ App (Var Z) (Var Z)
• Couldn't match type a with a -> b
Expected: Term ((a -> b) : env) a
Actual: Term ((a -> b) : env) (a -> b)
We could never define this function because the types would never be possible. Namely, being on the left side of the application, variable 0 must be a function, say some a -> b. However, because it is its own argument, we find that it must also be of type a. We could then say that instead we type it as (a -> b) -> b. But then we are stuck for the same reason, with no end to the cycle...
The same problem arouses with the y-combinator, which introduces a form of recursion in lambda calculus. All in all, it seems that our typed version is much less expressive, and that is indeed true. I would argue that just typed lambda calculus also does not make that much sense. Instead it would be more sensible to use the type environment only as as 'count the number of binding sites' environment. That way we can still ensure that variables are not out of scope, but without the typing conflicts.
But what we can be looking at, is the start of a more extensive functional embedded language. If we instead introduce more Term inhabitants, such as a constant integer value, along with an Add operation, lists, etc. We could still design a language that is some ways restricted so that we may compile it to certain platforms, or allow for an otherwise hard or impossible analysis. And that is what I believe Accelerate exactly to be.
Scoped
For the true fans of lambda calculus, I dedicate this chapter. Lambda calculus is beautiful because it doesn't need types as such. If you think about it, it is impossible for a pure lambda calculus expression to be type incorrect. The only place of concern for beta reduction is the application, what if the function argument is not a Lambda expression? First option is that is a variable, which references a binding site and may in the future be substituted with a different term. Second and final option would be that it is itself an application, which we can beta reduce again. Therefore the only thing that really matters is that variables are in scope!
We could easily change the list environment of the Complete module, and replace it just with natural numbers that count how many variables are scope. If a Term references no variables with a higher number, then we know the outer term to be closed. Anything that is of type Exp is therefore a valid lambda calculus expression. Furthermore, we are now free to apply functions to themselves, as well as create any other amazing combinator that is named after a bird :)
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE StandaloneDeriving #-}
module Scoped where
import Data.GADT.Compare
import Data.Type.Equality ((:~:)(Refl))
data Nat where
NZ :: Nat
NS :: Nat -> Nat
data Idx n where
Z :: Idx (NS n)
S :: Idx n -> Idx (NS n)
deriving instance Eq (Idx n)
idxToInt :: Idx env -> Int
idxToInt Z = 0
idxToInt (S n) = 1 + idxToInt n
instance Show (Idx env) where
show x = "x" <> show (idxToInt x)
data Term env where
Var :: Idx n -> Term n
Lambda :: Term (NS n) -> Term n
App :: Term n -> Term n -> Term n
instance Show (Term n) where
show (Var x) = show x
show (Lambda p) = "λ" <> "." <> "(" <> show p <> ")"
show (App f arg@(Var _)) = show f <> " " <> show arg
show (App f arg) = show f <> " " <> "(" <> show arg <> ")"
type Exp = Term NZ
data Env env where
VZ :: Env '[]
VS :: t -> Env env -> Env (t ': env)
newtype n :> n' = Weaken { (>:>) :: Idx n -> Idx n' }
infixr 9 .>
(.>) :: n2 :> n3 -> n1 :> n2 -> n1 :> n3
Weaken f .> Weaken g = Weaken (f . g)
wId :: n :> n
wId = Weaken id
wSucc :: n :> n' -> n :> NS n'
wSucc = (Weaken S .>)
wRaise :: NS n :> n' -> n :> n'
wRaise = (.> Weaken S)
wSink :: n :> n' -> NS n :> NS n'
wSink w = Weaken (\case Z -> Z ; S i -> S (w >:> i))
sinkTerm :: n :> n' -> Term n -> Term n'
sinkTerm w (Var i) = Var (w >:> i)
sinkTerm w (Lambda p) = Lambda (sinkTerm (wSink w) p)
sinkTerm w (App f arg) = App (sinkTerm w f) (sinkTerm w arg)
sinkTerm1 :: Term n -> Term (NS n)
sinkTerm1 = sinkTerm (wSucc wId)
subst :: n :> n' -> Term n' -> Term (NS n) -> Term n'
subst w = subst' weakning Z
where weakning = Weaken $ \case
Z -> error "variable should have been replaced"
S i -> w >:> i
subst' :: n :> n' -> Idx n -> Term n' -> Term n -> Term n'
subst' w n sub (Var n') = if n == n' then sub else Var (w >:> n')
subst' w n sub (Lambda p) = Lambda (subst' (wSink w) (S n) (sinkTerm1 sub) p)
subst' w n sub (App f arg) = App (subst' w n sub f) (subst' w n sub arg)
reduce :: Term n -> Term n
reduce (Var n) = Var n
reduce (App f arg) = let rf = reduce f in case rf of
Lambda f' -> reduce $ subst wId arg f'
_ -> App rf (reduce arg)
reduce (Lambda p) = Lambda $ reduce p
-- | boolean logic
--
true :: Exp
true = Lambda $ Lambda $ Var (S Z)
false :: Exp
false = Lambda $ Lambda $ Var Z
ifte = Lambda $ Lambda $ Lambda $ Var (S (S Z)) `App` Var (S Z) `App` Var Z
-- | naturals
incr = Lambda $ Lambda $ Lambda $ Var (S Z) `App` (Var (S (S Z)) `App` Var (S Z) `App` Var Z)
zero = Lambda $ Lambda $ Var Z
one = reduce $ App incr zero
two = reduce $ App incr two
intToLambda :: Int -> Exp
intToLambda 0 = zero
intToLambda n = reduce $ incr `App` intToLambda (n-1)
-- | cool combinators
mockingbird :: Exp
mockingbird = Lambda $ App (Var Z) (Var Z)
ycombinator :: Exp
ycombinator = Lambda $ App
(Lambda (App (Var (S Z)) (App (Var Z) (Var Z))))
(Lambda (App (Var (S Z)) (App (Var Z) (Var Z))))