Paul Bone

Making lambdas recursive in Plasma

While implementing closures in Plasma I hit this problem and think it might be an interesting example to demonstrate where theory and practice meet in programming language implementation.

Lambda expressions are anonymous (lambda expression means anonymous function). and because things need to be named so we can refer to them (in most languages) a lambda expression cannot refer to itself and be recursive. So why would I want to make them recursive? We’ll get to that, but first some background.

Statements in Plasma work like let expressions might in typical functional languages. The first statement here binds x and subsequent statements may now refer to it.

x = a + b
print!(int_to_string(x) ++ "\n")

x can’t be used before it is defined, including the right-hand-side of an binding (=) in the same statement. So this is illegal.

x = x + 1

Meanwhile functions are recursive:

func foo(x) -> Int {
   return foo(x + 1)
}

is legal, but not a good idea.

Next we want to support nested functions and lambda expressions. Nested functions look like:

func main() uses IO -> Int {
    greeting = "Hello "
    func hi(name : String) uses IO {
        print!(greeting ++ name ++ "\n")
    }

    hi!("Paul")
    return 0
}

And lambda expressions might look like:

func main() uses IO -> Int {
    greeting = "Hello "
    hi = func(name : String) uses IO {
        print!(greeting ++ name ++ "\n")
    }

    hi!("Paul")
    return 0
}

In fact, both are equivalent, and therefore, to keep the compiler and other tools simpler one is usually implemented in terms of the other. Often as a "syntax sugar" (but not always). In the case of functions in Plasma, the lambda expression is more general, so we’ll implement the nested function definitions in terms of lambda expressions. (In Plasma this is not a syntax sugar since the desugaring of nested functions happens at a later compilation stage, after parsing.)

Everything is fine until we attempt recursion.

func length(l : List(x)) -> Int {
   match (l) {
      case [] -> {
         return 0
      }
      case [x|xs] -> {
         return length(xs) + 1
      }
   }
}

If this is nested inside another function and implemented in terms of lambda, then it becomes:

length = func(l : List(x)) -> Int {
   match (l) {
      case [] -> {
         return 0
      }
      case [x|xs] -> {
         return length(xs) + 1      // UNDEFINED
      }
   }
}

And if we think about our x = x + 1 example above, we can’t access length before it has been bound. This is not only restrictive and surprising, but it’s also against one principal I have for Plasma, that is that the same thing (syntax-wise) should behave the same regardless of context. If this function was at the top level then it would not be transformed this way and would be able to be recursive. (A related principal is that things that look the same should be the same.)

Naturally if we wrote a lambda it should not be able to access itself. Maybe the above case looks okay, however:

greetings = map(func(name : String) -> String {
      return "G'day " ++ name
   }, names);

It makes no-sense for the lambda, which is very definitely an expression in map's first argument, to access the greetings variable. Because things that look the same (lambdas) should be the same, we need the same rule for the above length lambda as we do here.

That brings us to the decision that we have to make the lambda syntax, and nested function syntax behave differently even though they map to the same internal thing. This is where we may add some exceptions to the rules of our language’s internal consistency, in order to provide external consistency from the user’s perspective.

In my opinion, the solution isn’t the interesting part. What you just read, how we decide how to make a language consistent is the interesting part. Read on for how we usually solve this within a compiler.

When a function is defined as a statement, such as:

func length(l : List(x)) -> Int {
   ...
}

and we transform it to a statement containing an expression which is a lambda. The compiler can attach to the lambda the name it was defined with and the knowledge that it must be able to access its own name from within. Compilers attach a lot of data to the nodes that represent the program, this is just one other thing.

length = func(l : List(x)) Int
   #pragma lambda name "length"
   {
      ...
   }

Now when we process this lambda we know to make the name "length" refer to itself and to allow it to be resolved from within. And when performing later analyses we have to check such annotations.

What Plasma actually does is slightly more complex because it fits with the structure of the compiler better (and is not implemented yet). At the time when these statements are transformed to lambdas, a name for the function that tracks the lambda is generated (and used for other reasons) We can search the body of such functions to look for the name "length" and replace it with the internal name.

length = func lambda_23_length_2(l : List(x)) -> Int {
   match (l) {
      case [] -> {
         return 0
      }
      case [x|xs] -> {
         return lambda_23_length_2(xs) + 1
      }
   }
}

If you’re wondering where the numbers come from, they’re the line and column numbers from the source file, they’ll always be unique and this symbol won’t be exported.

Update 2019-02-13

The simple solution presented here won’t work for mutually recursive functions, which should work because they do work at the top level so they should work in nested levels. I think the solution here could be made to work, subject to other bindings and causality.