55 – Sebastian Ullrich

Recorded 2024-07-15. Published 2024-08-16.

In this episode, Niki and Andres talk with Sebastian, one of the main developers of Lean, currently working at the Lean Focused Research Organization. Today we talk about the addictive nature of theorem provers, what the sweet spot is between dependent types and simple programming, and how Lean is both a theorem prover and an efficient general purpose programming language.

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Niki Vazou (0:00:15): Welcome to the Haskell Interlude. I am Niki, the co-host is Andres, and our guest today is Sebastian, one of the main developers of Lean, currently working at the Lean Focused Research Organization. Today, we talk about the addictive nature of theorem provers, what the sweet spot is between dependent types and simple programming, and how Lean is both a theorem prover and an efficient general purpose programming language.

Hello, everyone. Our guest today is Sebastian, who works with Lean. So, thank you for joining us.

Sebastian Ullrich (0:00:49): Hi, thanks for having me.

NV (0:00:51): Do you want to tell us how you started to work with functional programming and theorem provers?

SU (0:00:56): Yes. So, let’s see. Functional programming I’ve certainly known quite a bit longer than what even theorem provers are. I think I started learning about functional programming in high school. I was interested in F# I think at first, moving there from C#, moving from imperative to functional, and I did also learn Haskell in that time.

NV (0:01:22): At high school?

SU (0:01:23): Yes.

Andres Loh (0:01:24): So, that was all self-study then, or did you actually learn that in school?

SU (0:01:29): No, not really. That was all self-study. But what was the big Haskell book at that time? It was not Learn You a Haskell. It was one before that.

AL (0:01:40): I’m not sure. I mean, I think different people would answer quite differently with what the big Haskell books were. So, I have difficulty guessing.

SU (0:01:51): Yeah, but there were still great resources, and I enjoyed learning Haskell. Then, of course, I started computer science at university. And I can’t really say I used any one programming language as a primary language during my studies. I mean, of course, you get exposed to many further languages in university, right? And for a stint, I maybe focused more on C++ for competitive programming. So, completely back in the other direction, but that might become important later. And then I got involved in theorem proving for my bachelor’s thesis, because a group at the university was using Isabelle for formalization.

AL (0:02:36): So, what university was this?

SU (0:02:38): Oh yes, I should probably add that. That was in Karlsruhe, Germany, at the Karlsruhe Institute of Technology. The name is certainly not reminiscent of any other universities. So, they were using Isabelle for formalization and verification, also in the realm of compiler engineering, which was another topic I was very interested in. So, for my bachelor’s thesis, I did a small formalization of static single assignment form construction in Isabelle. That was basically a program verification project. And from there, I kept being interested in theorem proving. Going from imperative to functional to theorem proving somehow seemed like getting even more pure and more abstract, right? And for me, that was just a natural evolution.

And so, yeah, I kept being interested in that. I definitely wanted to do something with theorem provers for my master’s thesis. So, while I was looking around for what to do, I saw on Hacker News, of all places, a link to the Lean theorem prover website. The link was just to the homepage, basically a blank homepage; there was not even that much direct content. But then, on a sub-page listing the people involved, I saw a friend I knew from university, Jakob von Raumer. So, I asked him, “Hey, what are you doing with Lean?” And he told me that he was currently writing a master’s thesis at CMU in Lean, using an exchange program between the two universities that I didn’t even know about.

NV (0:04:19): That was the beginning of Lean, no? Lean is not that old.

SU (0:04:24): Right. That must’ve been in 2015, the year Lean 2, the first official release, was published. The project was basically started in 2013 by Leonardo de Moura. 2014 saw the release of Lean 0.1, the first prototype. So, both Jakob and I were very close to the beginning of Lean and got involved there. The next logical step, of course, was to apply there as well, to write my master’s thesis in Lean. And thanks to Jeremy Avigad at CMU, that became possible. And that’s how I got started, at least using Lean as an end user.

AL (0:05:04): I guess you must have liked all these things. I mean, you learned about functional programming already when you were in school, you then picked a bachelor’s thesis that was around theorem proving. Was there something specific why you picked that route as opposed to the many other options that there are in computer science, at least, and also perhaps somewhat related to that? I mean, was there a specific reason why this Lean announcement on Hacker News sounded particularly interesting to you? Was there something that like, just that it was a new project or something specific about it that set it apart from other things?

SU (0:05:41): So, I’m not sure I have a great answer for that. In the end, it’s about taste, right? 

AL (0:05:47): Fair enough. 

SU (0:05:48): Why are some programmers more interested in functional than imperative? But I think one thing that can be said is that anyone who really got started with theorem proving has felt, at some point, this incredible, addictive quality of the game that you’re playing with the theorem prover, this back and forth of doing a proof step and getting feedback from the prover: okay, this was wrong, or okay, here’s the next thing you have to solve, or here are the next three things you have to solve. And I still feel that addiction. It’s just an incredibly interesting game to me still, even if nowadays I don’t get to do that much actual proving, as opposed to implementation.

AL (0:06:30): Yeah, I think there is certainly some truth to that. I mean, when I started studying mathematics and then learned about functional programming a while later, I always thought the major difference was that feeling when the type checker says your program is correct. If your TA says your assignment sheet in mathematics is correct, it’s not quite the same feeling. That has a much longer delay, and you don’t have this instant gratification. So, is the lesson from that that we should gamify theorem proving and programming even more? Is that something to actually draw some conclusions from, or are we underestimating the power that this has?

SU (0:07:16): Yeah, I guess there’s a certain aspect to that, but in the end, it also makes sense to focus on what do we actually want to achieve with those tools. So, yeah, I would say gamification, that quality is a nice aspect, but maybe not something to focus on specifically achieving that. It’s just a nice side effect, I would say, of theorem proving.

AL (0:07:38): So, yeah. But the other half of my question was, why specifically did this Lean announcement stick out to you?

SU (0:07:46): So, yeah, that’s a good question. As I said, I learned Isabelle as my first theorem prover. I should also say that actually, Joachim Breitner, the other co-host, taught me that, together with another PhD student. And it’s a very nice system, and I was very productive in it. But I just wanted to see something new for my master’s thesis. And now I actually remember I got introduced to Agda a little before that, in the context of a small seminar on homotopy type theory. What I got from that seminar is that I did not really get the homotopy type theory part, because I was still lacking the necessary mathematics. But the dependent type theory part behind it, I got very interested in that. It just seemed like a very natural, very small foundation to express things on top of. So that’s why I thought, “Oh, there’s an even newer theorem prover also using dependent type theory. That actually sounds great.”

AL (0:08:55): Right. So, I don’t know that much about the history of Lean. The first Lean version I ever looked at was Lean 3. But from what you’re saying, I conclude that even Lean 0.1 and Lean 2, they already had the same general flavor of being based on a dependent type theory.

SU (0:09:15): Yes. Yes. Actually, since Lean 2, the type theory has not really changed, only in minor details. Really, the one thing that was added in Lean 2 after Lean 0.1 was support for inductive types. So, if you’ve worked in a dependent type theory before without inductive types, which are a generalization of algebraic data types, it’s really no fun. And of course, programming is no fun without them either. So, that really was the one thing still missing in the type theory.
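To illustrate the generalization being described (a standard textbook example, not from the episode): Lean’s inductive types subsume ordinary algebraic data types, and, being dependent, they also allow families indexed by values, such as length-indexed vectors.

```lean
-- An ordinary algebraic data type, as in Haskell:
inductive Shape where
  | circle (radius : Float)
  | rect   (w h : Float)

-- A dependent inductive family: vectors indexed by their length.
inductive Vec (α : Type) : Nat → Type where
  | nil  : Vec α 0
  | cons {n : Nat} : α → Vec α n → Vec α (n + 1)
```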

NV (0:09:52): So, you need the inductive types to express properties that are not expressible without them, right, to do the constructive reasoning? What do you mean it’s not fun?

SU (0:10:06): Here I may just be showing my lack of deep type theory knowledge, but you can still express it in the base calculus of constructions with encodings. So, you can encode everything using just basic lambda calculus, depending on the kind of lambda calculus, but it’s so much more convenient if you have actual inductive types in the theory and in the system.

NV (0:10:31): And so do you want to tell us, when you say the theory of Lean, you mean some kind of intermediate language that relies on some calculus or –

SU (0:10:43): Yes. Right. So, the basic makeup of Lean and of similar provers is that, of course, you have a surface language that you use to actually interact with the system that you in the end program in. And then you have a type theory below that, a kernel language, that actually expresses what is valid and what is not in this type theory. And of course, that should match what you expect the statements that you expect to be valid or not outside that as well. And so a big part of the system is just translating that surface language to this core language and then checking the result using the core type checker. Of course, the core language is much smaller than the surface language, so it’s also much easier to write a dedicated type checker for that. In fact, easy enough that people have written independent type checkers for the Lean core language in other languages, in Rust, also in Lean itself, so that you don’t have to trust the implementation that’s in Lean itself.

AL (0:11:47): So, you went to CMU for this exchange program and either started writing or actually wrote your master’s thesis as a part of that. So, what was the master’s thesis in the end actually about?

SU (0:12:04): That was a bit funny. We started together with Jeremy. I actually wanted to work on, I think it was geometric topology. So, I did take a bit of math alongside my CS studies. But in the end, I mean, yeah, I’m not a mathematician, so that might’ve been a bit of an interesting topic. And so when I actually arrived in Pittsburgh, we decided to slightly shift the direction to verification of Rust programs in Lean. So, yeah, a very different topic.

AL (0:12:40): Okay. But even the math topic was supposed to be a formalization in Lean.

SU (0:12:47): Yes, exactly. 

AL (0:12:47): Right.

NV (0:12:48): What do you mean verification of Rust programs in Lean? Super cool.

SU (0:12:55): Thanks. So, the basic idea was that – well, Rust is this, I mean, imperative language, but that also has a very strong restrictive type system in some ways, right? And in fact, so strong that the mutability of Rust is quite constrained. And so I thought that should make it possible to actually translate it to a functional program. In fact, a pure functional Lean program that you could then verify in Lean. So, instead of expressing the entire semantics of Rust in Lean, like doing a deep embedding, I thought that there should be a much, much easier way to do that, to simply translate Rust to Lean and then verify the Lean program.

NV (0:13:42): By explicitly carrying a heap around, or how are you going to –

SU (0:13:47): No heap at all, because mutability is so constrained. So, for example, if you pass a mutable reference to a Rust function, then in Lean, in a pure functional language, we can simply take the input value and return the new value of that mutable reference as part of the result. It gets a bit more interesting when you return mutable references, because then you basically return a lens into another value. So, there you get a bit deeper into functional programming. But for basic programs, this worked out quite well in the end.
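As a rough sketch of that translation (hypothetical names and types, a simplified illustration rather than the actual thesis code):

```lean
-- Rust: fn bump(x: &mut u32) { *x += 1 }
-- A `&mut` parameter becomes an extra input and an extra output:
def bump (x : UInt32) : UInt32 := x + 1

structure Point where
  x : UInt32
  y : UInt32

-- Rust: fn x_mut(p: &mut Point) -> &mut u32
-- Returning a `&mut` becomes returning the current value together
-- with a setter, i.e. a lens into the surrounding value:
def xMut (p : Point) : UInt32 × (UInt32 → Point) :=
  (p.x, fun v => { p with x := v })
```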

NV (0:14:29): Ah, so this is done.

SU (0:14:30): Yeah. Well, to the extent that it’s possible in a master’s thesis. I never extended it into an actual publication. But there was actually, one or two years ago, a new project called Aeneas that basically took up that idea and evolved it into a working tool for verifying Rust code in, I believe, Coq, Lean, and other provers as well.

AL (0:14:58): Okay. But somehow you got from all this to being one of the, if not the, lead developers of Lean. So, how did that transformation happen?

SU (0:15:14): Yes. Yes. I should say I’m still number two, at least, for example, in the GitHub stats. Leonardo de Moura is such a productive person, I don’t think I will ever be able to overtake him. So, during that visit to CMU, I got interested in contributing to Lean as well, since, I mean, it’s an open source program. And the thing was, of course, that mostly mathematicians had been using Lean at that point. And most mathematicians do not know C++, which was the implementation language of Lean at the time. And I did know a bit of C++ from my competitive programming days. So, I was one of the first external people to actually contribute to Lean. And then I basically never stopped contributing.

AL (0:16:05): Okay. So, you were uniquely placed with your high degree of C++ skills and your interest in theorem proving.

SU (0:16:13): Yeah. And I also have to thank Jeremy Avigad here again because, during my master’s thesis, he not only hosted me in Pittsburgh, but then facilitated the stay at Leo’s place at Redmond for a few days. And so that’s where we first—Leo and I—met in person and really got started collaborating on Lean.

AL (0:16:37): Okay. So, was there a specific thing initially that you wanted to add to Lean or do with Lean that wouldn’t have happened otherwise that really made you want to contribute yourself?

SU (0:16:53): In the beginning, it definitely was more minor things. The first non-trivial patch that I wrote, I believe, was on better spacing in pretty printing, because, well, in theorem provers, pretty printing is quite important: when you do a logical step in the core language, you have to reflect it back to the user in the surface language so that they can understand what happened after this step and what is left to be proved. So yeah, that was a very specific change. Really contributing something that I’d love to see in Lean, that only started with my PhD, which I began in Karlsruhe after returning from Pittsburgh, and which only intensified the collaboration with Leo. During an internship at Microsoft Research, we basically drew up the design of Lean 4, the current version of Lean. One central part of that became the macro system that, in the end, I designed. So, that probably is the single largest standalone contribution I’ve made to Lean to this day.

AL (0:18:12): Yeah. I mean, that seems to be one of the major ingredients of Lean 4, at least, right? And I think it is rather impressive, but I mean, I would actually like to hear you describe it with your own words. What is it that makes that macro system special, or how does it actually work? And as far as you know, Haskell programmers will have some familiarity with things like Template Haskell or something like that. Perhaps you can somehow contrast it to that.

SU (0:18:43): So, to begin with, like all features of Lean, this grew out of limitations of the previous system, because in theorem proving languages, you can, of course, add infix notation, like you can in Haskell. But usually, they go even further and give you arbitrary notation, because when you, for example, want to write down some semantics or a type system in the theorem prover, you have all these funny symbols that you use in papers. And of course, you want to write them down in the prover just as much, right?

AL (0:19:20): So, this is really – I mean, you would say like the original motivation here is making the syntax, the surface syntax, more flexible, not so much even avoiding boilerplate code or generating stuff for you, but really like the syntax flexibility. 

SU (0:19:37): Yes. Yes, exactly. The notational flexibility, and really finding a robust and good foundation for that instead of further adding ad hoc extensions to it. And so that really made us look around at what’s going on in the literature, what’s going on in other programming languages. And of course, Racket is well known for basically consisting of a tower of languages, a tower of syntactic abstractions that build on top of each other and that you can combine in many different ways. That immediately spoke to us as something we wanted to achieve as well. So, basically, that was a very general theme all across this Lean version: we can give our users more flexibility without us having to make the change in Lean itself.

NV (0:20:30): And that was very important because mainly the users were mathematicians that wanted to use their own math symbols for every theorem they were proving.

SU (0:20:40): Yes, exactly. You have so many notations in math, and I mean, each paper probably introduces a new one, right? So, it’s impossible to just hard-code all of them in the system itself. But computer scientists, too, when they write something formal, when they write proofs, get very creative with notations. The nice thing about a general macro system, and about making the syntax almost arbitrarily extensible, is that you can really think further and say, “Well, now, can’t we use the same system to embed, say, domain-specific languages into Lean?” And there already were domain-specific languages in Lean. For example, the entire tactic mode that you usually use to actually prove things in Lean is basically a separate language from the outer language. With the macro system, it now became possible to add your own embedded languages as well. And yeah, we are seeing people experimenting with that. I think that’s still at an early stage. But one very important domain-specific language we now have is our documentation tool called Verso, which David Thrane Christiansen has written. That is an embedded language in Lean, which means that if you use it to document Lean code, to write a book about Lean, and so on, you can actually embed Lean code in there, and it will work in your editor, because you’re still in a Lean file, because the whole document is embedded in Lean. And then, of course, the tool allows you to also export all that code to HTML, say, and keep all the semantic information; what you can access in your editor, you can now access in the browser version as well.
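For a flavor of what that notational extensibility looks like in Lean 4 (a made-up operator, purely for illustration):

```lean
-- Declare a new left-associative infix operator at precedence 65:
infixl:65 " ⊕ " => Nat.add

#eval 2 ⊕ 3 ⊕ 4  -- 9
```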

AL (0:22:31): Right. Going back, you mentioned the tactics language. The tactics language is now also just implemented in terms of this macro system, right? So, it’s no longer – I guess it used to be like a hard coded part of the syntax, even though it was domain-specific, but then afterwards it was reinvented.

SU (0:22:51): Right. To be a bit more specific: syntactically, it is now just one more syntactic category, and you are able to add your own categories as well for your own languages. The other part is giving that new syntax semantics, and you can do that with macros. So, if you have a simple tactic that, I don’t know, is just calling three other tactics, then you can write that down as a macro in Lean 4. If you want to do something more complicated, then at some point it gets too complex for a macro. At that point, you can go down one more level of abstraction to what we call an elaborator, which really is a program. Macros, I should say, are programs that take surface language and produce surface language. An elaborator is a program that takes surface language and directly produces core language. And so in the end, it can produce any proof that’s expressible in the core language.
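A minimal sketch of such a tactic macro (the tactic name is hypothetical; the combinators it expands to are standard Lean 4 tactics):

```lean
-- `solve_trivial` expands into existing tactics, tried in turn:
macro "solve_trivial" : tactic =>
  `(tactic| first | rfl | simp | assumption)

example : 2 + 2 = 4 := by solve_trivial
```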

AL (0:23:53): Right. So, as soon as you have such powerful and flexible systems, then how much of the language is actually still essential? Did you basically remove and reimplement in terms of this macro system even more things, just except for the tactics language? Is anything left, or is there a price to pay at some point that you need certain things to be efficient enough so that even though you possibly could, you are still deciding to implement them directly?

SU (0:24:26): No. Basically, everything is built on these extensible systems. Starting with the syntax: we have built-in syntactic categories and built-in parsers for these categories, but there’s at least no big conceptual difference between them being built in and being added by users. And for the macros and elaborators as well, you could copy any of them out of Lean into your own project, change them in whatever way you want, and then override the built-in one with that. So, it is all built on top of this general extensible framework.

AL (0:25:06): Okay. So, one thing that you mentioned originally, like your first contribution to Lean, was something with pretty printing. One thing that is often difficult about these kinds of very generic, very extensible systems is that things at some point start to feel second class: the printing leaks, or the, I don’t know, error messages are suboptimal. I don’t have the feeling that Lean actually suffers from that, but there must be something truly magic going on in order to make it not suffer from that. So, perhaps you can say a little bit more about this. There seems to be some sort of bi-directional stuff going on, in that the moment you implement new syntax, it’s also used by the printer. How does it all work?

SU (0:25:54): Yes. So, for simple notations, we try to automatically synthesize a pretty printer as well, by basically reversing exactly that program from surface to surface, but that definitely does not always work. If it doesn’t, then you can add your own pretty printer to Lean as well. That is a bit of work; it usually is a bit harder than writing the parser, the macro, in the first place. I would say the more polish you want there, the more work you certainly have to put in as the creator of that macro or of that embedded language.

AL (0:26:31): And in terms of, say, parse errors or even type errors, is there anything that you can also use notational features in order to improve the presentation of these or –

SU (0:26:45): We do not have a good way right now to, I would say, adjust errors (as I think Haskell has some ways to do, right?), to take some errors of the system and make them more understandable in the current use case.

AL (0:27:03): I mean, I don’t think so. There is a specific way of generating user-defined type errors, but it is also a little bit fragile. I think the most impressive work I have seen in that direction, though I’ve not really studied a lot of the literature there, is from my background at Utrecht University. For a while, there was this Helium compiler, which was a Haskell dialect specifically targeted at teaching functional programming and Haskell. And there you had an extension where, for domain-specific languages, you could influence the order in which constraints would be solved, and if a particular situation happened, you could generate a custom type error message. One of the standard examples of this was a parser combinator library that would actually tell you something in terms of parser combinators, like you forgot to apply a semantic function or something like that, rather than your generic type error message. But I was just wondering whether there is anything in that direction in Lean as well, because you’re doing so much with macros.

SU (0:28:12): Yeah, that sounds nice. No, that is certainly one part where you can hit the limitations of macros. If you’re just not satisfied with the error messages created by the system, then usually it’s better to actually go in, do the check yourself, and then create your own error message. But you can’t do that in a macro anymore, because macros are not type-sensitive. So, if the error comes from some type mismatch, macros are basically fired too early for that. Instead, you would do that in your own elaborator program. It’s a bit more low-level, but yeah, certainly if you really want to create your custom error message, it’s possible.

NV (0:28:54): So, do you have experience from people complaining about error messages? When you are teaching or when you have people learning Lean for the first time, do you explain this thing that you have to go and edit and make your own error message?

SU (0:29:12): Yes. Less, I would say, for custom extensions of notation; even for the messages created by the system itself for basic use, we can still see so many ways to improve them, just to give more context and so on, instead of just “here’s the exact thing that failed.” What was tried before? Take type class resolution: right now, we say, yeah, we failed to infer that type class. But where exactly did it fail? What was tried, and what probably should have happened? We have the debug output to give specific traces on that, but it would be really good to have something in between, something that is immediately useful to users without first digging through the system. Just to say, in general, in error messages there’s still a lot we can improve, and we’re quite aware of that. It is part of our roadmap as well.

NV (0:30:08): But I think this is the most difficult problem when you’re trying to persuade somebody to use even Haskell, or anything with an advanced type checker. If there were a way to explain the error messages more interactively, that would be fantastic.

SU (0:30:29): Yeah, absolutely. So, since you mentioned interactive, one thing we do have at least is that for any term that’s embedded in an error message, you can actually hover over it and explore its structure, explore what’s going on here: where is that, what’s the inferred type class, what’s the implicit argument here, what is that notation actually short for? Just like you can do in general for code. So, that definitely already helps a bit with making error messages more accessible.

NV (0:31:02): And this interactivity happens in what editor?

SU (0:31:05): I believe it’s currently implemented in VS Code. That’s our main editor. And there is an extension for Neovim by someone else that I think implements some of that interactivity, at least. But yes.

AL (0:31:20): So, this is not plain LSP, but something more.

SU (0:31:26): Correct. It is an extension. We do have a few extensions by now, because LSP, of course, was not made for theorem provers, and with their interactivity, they need more features than it provides. So, yes, that’s related to a general system we have, where extensions in the prover can just send HTML to the editor and use that to visualize structure. You could also use that to visualize string diagrams if you do category theory proofs; someone has written a visualization for that. And so that’s the way we’re trying to go to make the system more interactive, even outside writing proofs.

AL (0:32:10): That’s all cool. So, another thing that seems to have changed rather drastically between Lean 3 and Lean 4, if I’m not mistaken, is that Lean 3 was still primarily marketed at mathematicians to prove things, whereas Lean 4 is also marketed at software developers to write software. So, Lean 4 feels much more like a general purpose programming language plus a theorem prover, whereas Lean 3 was more just the one. Is that also a deliberate shift in focus that you advocated for, or did it arise out of something else?

SU (0:32:48): Yes. I would say it was also part of the natural evolution of Lean. In Lean 2, the first official release, it was actually the case that you could add your own tactics using Lua, which makes sense as an extension mechanism, but it turns out mathematicians don’t know how to write Lua either. So, it didn’t really help them. And so in Lean 3, we decided, okay, let’s make Lean a domain-specific language for writing tactics. And that worked out well for, say, relatively simple tactics. But users got even more ambitious and wanted to write even more complex tactics. At some point, it became clear: okay, they really want to use Lean as a proper programming language, and they want to write more than tactics in it. And as you said, in Lean 4, Lean became a general purpose programming language, which in particular allowed us to rewrite Lean, basically the entire system, in itself.

NV (0:33:50): I have this intuition that if you have a theorem prover, you need to have everything terminating and normalizing, everything well structured. But Lean is this general purpose programming language. So I assume you need to use features that don’t translate to your core calculus. Is that correct?

SU (0:34:14): Yes. That’s basically right. So, yes, Lean is a total functional programming language, for purposes of consistency. But in Lean 4, we do have a keyword named partial that you can apply to a function, and then you do not have to prove termination anymore. And what happens in the system, in the logic, is that you then have to show that the type of that function is actually inhabited. So, basically, there could be some other implementation that returns some value of that type and does not infinitely recurse. And as soon as you show that, the function just becomes totally opaque to the rest of the system. So, the rest of the system cannot determine anymore whether there might have been some infinite recursion in there. If you write a recursive function using partial that returns a natural number, then of course, we know natural numbers exist; there’s a zero. So, that function could have returned zero. The logic just will never look into it. And so from that point on, it’s consistent, even if you don’t prove termination of that specific function.
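A minimal sketch of `partial` (using the Collatz step count, whose termination is famously unproven; the function name is made up):

```lean
-- `partial` waives the termination check; Lean only requires the
-- return type (Nat) to be inhabited, and the definition then
-- becomes opaque to the logic.
partial def collatzSteps (n : Nat) : Nat :=
  if n ≤ 1 then 0
  else if n % 2 == 0 then 1 + collatzSteps (n / 2)
  else 1 + collatzSteps (3 * n + 1)

#eval collatzSteps 6  -- 8 steps: 6, 3, 10, 5, 16, 8, 4, 2, 1
```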

NV (0:35:27): So, you said Lean is written in Lean?

SU (0:35:30): Yes. The vast majority of it is now written in Lean. There are some C++ remnants, for example, the kernel, the type checker. Since we don’t change it all that much, it has not made sense yet to rewrite it in Lean. Perhaps that will happen at some later point. And as I mentioned, people have actually written a separate type checker in Lean, using Lean as the programming language. So, it’s definitely possible; it’s almost as fast as the C++ implementation. But yeah, basically all the other parts have been rewritten in Lean, applying all the learnings on the way to that point as well. So, in the end, even if Lean as a programming language is not quite as fast as C++, the whole system got quite a bit faster going from Lean 3 to Lean 4, because of all the additional optimizations we now knew we should add.

NV (0:36:23): And what about the quality of the software? Do you actually use Lean to prove properties about the code?

SU (0:36:33): Yeah. So, we do not prove properties about the implementation of Lean. That external type checker that I mentioned, the Lean-written one, I think there is a goal of actually verifying that in Lean, and I think that is ongoing. But yeah, for Lean itself, writing a theorem prover is already hard enough. We don’t also want to think about how to prove it correct at the same time. And of course, in the end, the beauty of a system like Lean is that you only have to trust the type checker, because that in the end is what everything gets translated to. For the rest of the system, knowing it’s correct would be nice to have.

AL (0:37:13): So, that’s interesting. Would you also go as far as saying that in the Lean-written parts of Lean itself, you’re mostly using a simply typed fragment of functional programming, or are you making a significant use of dependent types in some places?

SU (0:37:32): It is relatively simply typed. I would say it’s not very different from your usual Haskell program or Haskell-implemented compiler. So yeah, we do make much use of monads, monad transformers, type classes on top of that. There are some specific parts where it was just very natural and easy to use dependent types, but I would not say it was a surprising application, and you probably could do something like that in Haskell as well. So, for example, we now have a strongly typed syntax type, at least in the sense that if you have a syntax tree, you know which grammar rule produced it. So, the syntax tree type is parameterized by the name of the grammar rule. So, yeah, that is a dependent type, but probably in Haskell, you could find other ways to encode that if you wanted to.
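This is Lean's `TSyntax` type, which is indexed by the name of the grammar rule that produced the tree. A small sketch; `duplicate` is a hypothetical helper, not part of Lean's API:

```lean
open Lean

-- `TSyntax k` is a syntax tree known to come from grammar rule `k`;
-- for instance, Lean defines `abbrev Term := TSyntax `term`.
-- The quotation below produces a `TSyntax `term`, and splicing `e`
-- back in is well-typed precisely because `e` is known to be a term.
def duplicate (e : TSyntax `term) : MacroM (TSyntax `term) :=
  `($e + $e)
```

Passing, say, a tactic where a term is expected then becomes a type error rather than a runtime mismatch in the elaborator.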

AL (0:38:30): Yeah. But you would say it’s like modest use of advanced features. 

SU (0:38:35): Yes. Yes. 

AL (0:38:36): Would you say this is a pragmatic choice, simply because there is not that much experience, or is that really the sweet spot? Because I mean, for example, if you listen to people like Conor McBride or something like that, you would start to believe that designing any program as not fully dependently typed up front is wrong, and we’re all doing it wrong, and that would be where the future is. But then again, I’ve now heard from multiple people, I think also from people implementing Agda, people working on Idris, and now you’re working on Lean: in these implementations, there’s not actually all that much dependently typed programming going on. So, it seems like for just writing compilers or implementations of dependently typed programming languages, one does not really need dependent types.

SU (0:39:28): Yeah, it’s a very good question. To some degree, because we were writing and evolving the language while we were implementing the system and then bootstrapping everything, we did have this constraint of also keeping it simple to some degree. So, who knows if we would have done something differently if we did that in the current language? But yeah, generally, I would agree, especially for this large code base that, in the end, you don’t want to verify anyway. I’m not sure we could have made much better use of dependent types in there.

AL (0:40:07): Okay. All right. I mean, we can talk again in five to ten years and see where we are at.

SU (0:40:16): Sounds good. 

AL (0:40:17): Yeah. I have a couple of more things that are loosely connected that I still want to ask about. So, one thing is, when I look at Lean as a general purpose programming language, perhaps this is because of my familiarity with Haskell, but I find lots of things that seem familiar to me, in particular, the way that IO is being designed and some parts of the syntactic sugar around it and so on and so forth. Would you say Haskell was a specific influence for this? Or would you say you just looked at various different other languages and picked features that you liked? Or how did you design these parts?

SU (0:40:54): Yeah. Certainly, for many of the functional parts, especially since you mentioned the monads and so on, that was directly inspired by Haskell. No question. The syntax of Lean, I would actually describe as more ML-inspired, perhaps. Now we even went further in that direction by extending our do notation with imperative concepts that are, of course, nevertheless compiled down into purely functional programs. So yes, it’s absolutely true that Lean was inspired by many different systems. The type theory is very close to Coq’s. The style of interactive proofs is very similar to Isabelle’s. The macro system, of course, is similar to that of Racket. Yes, many parts of programming are similar to Haskell, to ML, to OCaml, as with any other system. Many inspirations from many different sources.
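A small sketch of the imperative-looking do notation mentioned here; `firstEven?` is a hypothetical example function. Early `return` and the `for` loop are elaborated away into purely functional code:

```lean
-- `Id.run do` lets a pure function use imperative-style control flow;
-- the `do` elaborator compiles it back down to functional combinators.
def firstEven? (xs : List Nat) : Option Nat := Id.run do
  for x in xs do
    if x % 2 == 0 then
      return some x  -- early return from the whole block
  return none

#eval firstEven? [3, 5, 8, 9]  -- some 8
```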

AL (0:41:56): In terms of proving things, there have been some impressive achievements in Lean already. In terms of actually using it as a software development language, apart from Lean itself, are there any larger projects that you think everyone should be aware of?

SU (0:42:14): So, certainly, Lean itself is, I assume, the biggest program currently written in Lean. That will probably not change all that soon. Right now, I would say we’re actually more focused on program verification than strictly just programming in Lean. We have seen some first projects being done in Lean that are not about mathematics, because before that, it was all mathematics being done in Lean. So, Amazon Web Services, for example, have now started: they have a specification language called Cedar, and they have written a verified type checker for it in Lean. They gave a great talk at Lean Together this year. One really interesting facet of that is that they also ran the program they proved correct against the Rust implementation they use in production, and apparently the efficiency was very comparable, very competitive. I think we were very pleasantly surprised by that.

NV (0:43:13): But you said that before. How is it possible to be so efficient? You have garbage collectors and stuff.

SU (0:43:20): Yes, we do garbage collection, but more specifically, we do reference counting. So, that actually was a decision from a development standpoint, because coming up with our own tracing garbage collector, of course, that’s a big task. That’s not something we wanted to tackle while we were busy with building all the other systems of Lean. So, we really started thinking about what we could do to make the reference counting we already had in the C++ code more efficient. We also wrote a paper on that: identifying what objects are not shared between threads, so we can use non-atomic increments and decrements for them; how we can change our intermediate language in the compiler to avoid these reference counting instructions; and yes, in the end, also how we can make use of reference counting to avoid allocating objects by actually reusing existing allocations, so really getting close to what imperative languages do with actual mutable objects.
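A sketch of the reuse being described; `zeroOut` is a hypothetical example function. Lean arrays are copied only when shared, so when the reference count is 1 the update happens in place:

```lean
-- Purely functional code, but if `a` is unshared at runtime,
-- `set!` mutates the array in place instead of copying it,
-- so the loop behaves like an imperative one.
def zeroOut (a : Array Nat) : Array Nat := Id.run do
  let mut a := a
  for i in [0:a.size] do
    a := a.set! i 0
  return a
```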

NV (0:44:26): And the theorem proving features, are they helping in all this and in making run times faster?

SU (0:44:34): It’s possible, I would say, in some different ways. For example, of course, if you prove that an index access, an array access, in Lean is in bounds, then that actually creates faster code at runtime. That’s, I think, the primary example I have right now. But yeah, otherwise as well, if you prove something is not possible, then you do not have to generate code for that. Maybe you can avoid a few branches, and so on.
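A sketch of in-bounds indexing in recent Lean versions; `sumArray` is a hypothetical example. The `for h : i in …` form gives a membership hypothesis from which `i < a.size` is discharged, so `a[i]` needs no runtime bounds check:

```lean
-- `a[i]` with a proof in scope compiles to an unchecked access,
-- whereas `a[i]!` would keep a runtime bounds check and could panic.
def sumArray (a : Array Nat) : Nat := Id.run do
  let mut s := 0
  for h : i in [0:a.size] do
    s := s + a[i]  -- `h` lets Lean prove `i < a.size` here
  return s
```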

NV (0:45:03): I mean, it would be good to have a study that says this, that if you prove your theorems, they become faster, because I think this is a great motivation to persuade people to use these fancy type systems.

SU (0:45:16): Yeah, it would be great to use a theorem prover to basically compile down to the fastest program possible. But yeah, I would not say we’ve tried there yet.

AL (0:45:26): Perhaps we should go back to the historical arc once more because, I mean, we talked very much about your bachelor’s, your master’s. We mentioned your PhD, which you completed, right? And then now you’re working for this Lean FRO, right? What is it you’re actually trying to do right now, like in the next few years? Are you working on Lean 5 or –

SU (0:45:55): Yeah. That is a frequent question. I can completely understand where it’s coming from. I always lead with saying that the time we took to actually implement Lean 4, which was mostly Leo and me, mostly during the time of my PhD, which I completed last year, was actually longer than all of the previous versions of Lean. So, that really escalated, and we are not at all thinking about how to restart the system yet again. No one is thinking about that. And also, what we wanted to achieve with Lean 4 was to open up the system by re-implementing it in Lean. And now it’s completely open. Everyone can use all the compiler APIs from the system, and you have all these different extension points. So, it would not even be clear to me what should be part of a Lean 5. So, at least for that, I think I can say that we’re not planning any major, complete change of the language yet again anytime soon.

AL (0:46:55): So, a stable point has been reached. I mean, that is totally normal, I guess. If you look at the evolution of Haskell, for example, right? I mean, you had this initial phase where there were lots of releases of the language. Haskell 1.0, 1.1, 1.2, 1.3, 1.4, which became Haskell 98. And then after Haskell 98, not all that much has been happening anymore. Of course, things are still changing, but in terms of the standard, it’s not been as drastic anymore. So, you would say Lean 4 is the culmination of the original design, and further changes will most likely happen, of course, but are more likely to be incremental on top of Lean 4 while preserving some amount of compatibility.

SU (0:47:42): Yes, exactly. 

AL (0:47:43): But are there nevertheless any sort of big goals that you’re currently trying to achieve in your work?

SU (0:47:52): Yes. So, for the last half year, I was mostly focused on some internal changes, like making proof compilation, proof execution, more incremental in the editor, and now I’m looking at parallelism. But one language addition that we want to tackle, let’s say, in the medium-term future is to give Lean a better module system, because right now it does not really have one. And by module system, I mean at the file level. Like in Haskell, you have more control over what you actually export from a file and what you don’t, which, of course, is important for modularity, for robustness of code, right? But for us, it’s also very important just for the performance of the system. Because now you have Mathlib, the mathematical library of Lean, that’s 1.5 million lines of code. And of course, with all the proofs in there, that takes a while to compile. Also for programmers, if you change something that should not affect other files, if the system actually is able to detect that, that would really help us with just scaling everything up and making compilation faster in Lean.

AL (0:49:06): Right. I’m just trying to understand what the actual problem here is because, obviously, you have clear scoping rules and namespaces and so on and so forth. But is the problem that often you still need stuff, even though it is not in scope, because you need to run a proof term or something like that? And therefore you need an independent notion of what you actually need to export, independent from whether the name is visible.

SU (0:49:34): Yes. So, it is actually the case that right now in Lean, we do have these scopes, these namespaces, but in the end, each name is visible, which is very important for the type checker because it has to be able to access everything. So, not actually making a name visible, that is part of what we want to change. We want to make it completely additive, and we don’t want to change the current language, but it will be a fundamental change in how the system works. Yeah.

AL (0:50:04): Okay. And you’re saying at the moment you also have a very pessimistic recompilation checking in the sense that basically everything needs to always be recompiled.

SU (0:50:16): Exactly. 

AL (0:50:17): Okay. Sounds like a good improvement, when it happens.

SU (0:50:23): Yeah. We’re looking forward to that.

AL (0:50:25): Any other major things that you’re working on?

SU (0:50:28): Yeah. So I should, I guess, mention that we now have this Lean Focused Research Organization. So, we have our own non-profit focused on the development of Lean. We now have 13 people working there from all over the world. So, yeah, just with that number, there are so many more projects that we can now concentrate on than when it was just Leo and me. There’s no comparison. So, we have David working on documentation, which we would not have had time to really concentrate on before. We are working, of course, on further automation. We are working on making the system in general more performant, as I mentioned. Yeah, we’re of course working on usability in general, on the editor extension, on the language server. There are still many more things that have not been implemented there. And yes, in the end, the Lean Focused Research Organization is focused on improving scalability, usability, and automation of Lean for the purpose of formalization of mathematics, but also other purposes like program verification.

AL (0:51:37): That sounds a little bit rehearsed. But nevertheless, it’s a nice piece of text. So, is there anything that you would say, like of all the nice things that Lean is doing that Haskell should take note of and learn from specifically, or do you think Haskell is hopeless?

SU (0:51:56): No, I don’t think so at all. It’s a good question. I mean, Lean is a pure functional language, but it’s also a strict language. So, not all the learnings might even be applicable. And I assume Haskell does not want to be a theorem prover. It totally makes sense that these two projects have, in the end, very different goals. Also about extensibility, of course, there is always a price to pay here. So, I can very much see Haskell being very comfortable with the syntactic extensibility that it currently has and not thinking, ‘Oh, we should definitely get a macro system as well.’

AL (0:52:35): Well, I’m not sure about the solution, but I certainly see a problem, right? I mean, Haskell, when I started learning Haskell many, many years ago, I actually thought that compared to ML, it had a cleaner, smaller syntax. Perhaps that was a misimpression even back then. But by now, it’s certainly not the case, right? There are all these extensions that have been bolted onto Haskell. And in those parts, I would say, it’s not beautiful anymore because all the nice syntactic constructs have long been taken, and there is simply no space to insert anything new in a nice way. So, having some revision or some path to clean up would actually be something I would like to see. I just think that all these things have a very high price in terms of either not being backwards compatible or otherwise breaking lots of things or being very difficult to get any kind of agreement on. So, I don’t think it is very realistic, but I certainly feel like it would be nice to revise the Haskell syntax somehow or clean it up a bit. And perhaps a macro system could help, because you could at least argue that everybody can do whatever they want. And there is a lot of flexibility there.

NV (0:53:51): Which is both good and bad, no?

SU (0:53:54): Yeah, it definitely has its own benefits and costs. Absolutely.

NV (0:53:59): Good. Thank you very much for being here, Sebastian. It was very interesting. 

AL (0:54:04): Yeah, thank you.

SU (0:54:05): Thank you for your great questions.

Narrator (00:54:08): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the generous support of our sponsors, especially the Monad-level sponsors: GitHub, Input Output, Juspay, and Meta.
