26 – Simon Marlow

Recorded 2023-03-08. Published 2023-05-26.

In this episode Simon Marlow talks with Andres Löh and Matthias Pall. Simon is a long-time GHC contributor, currently working at Meta. He talks about compiling functional languages via C and the Evil Mangler, the importance of using parallelism and its impact on garbage collection, and about using Haskell in the real world via Sigma, Haxl, and Glean.

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Andres Löh (0:00:19): This is the Haskell Interlude. I am Andres Löh. Our guest today is Simon Marlow, a long-time GHC contributor, and currently working at Meta. We will talk about compiling functional languages via C and the Evil Mangler, the importance of parallelism and its impacts on garbage collection, and about using Haskell in the real world via Sigma, Haxl, and Glean.

Hi, Simon. Welcome to the Haskell Interlude.

Simon Marlow (0:00:47): Hi there.

AL (0:00:49): Welcome. So, let’s just jump into the first question as usual by now, like how did you first get into contact with Haskell?

SM (0:00:57): Well, so I was at Bristol University in the late ’80s, and I did a course on functional programming, which used the language Miranda at the time. And so Miranda was my first exposure to functional programming. First of all, I encountered it in the first year, and it seemed like a very strange way to write programs. It didn’t really click for me at all, and I wasn’t really sure what you would use it for. At the time, I think I was more interested in writing programs in C, and functional programming just seemed a bit strange. So I didn’t encounter it again until the last year of my undergrad degree, when we did a longer course on functional programming. It went into a lot more depth, and in particular included some material on how you actually implement it as well. That, for me, connected it to the actual machine – it made the connection between this abstract programming language and how it actually executes. And that to me was really fascinating. I think at that point it clicked and became a programming language you could use, because I could understand how it worked.

AL (0:02:01): So when you learned how to implement it, basically.

SM (0:02:04): Yes, that’s correct. 

AL (0:02:05): That was sort of what made it more–

SM (0:02:07): Yeah. 

AL (0:02:07): Okay. When you learned how to implement it, you looked at the actual Miranda implementation, or did you look at sort of like general ideas of how to implement it? 

SM (0:02:16): Well, so this course, I think, used the STG machine as its implementation technique. I mean, there were various techniques floating around at the time – the G-machine and SKI combinators were quite common, and the STG machine was very new. The course was given by Ian Holyer, who had written a book that included a chapter on implementation using the STG machine.

AL (0:02:41): So Miranda itself, if I remember correctly, was actually implemented using SKI combinators or some form of SKI combinators, right?

SM (0:02:49): Yeah, that’s right. So this implementation was not about Miranda in particular. It was about functional languages in general. And there was also a bit of material in this course about Haskell, which was very new at the time. I think this was– so this would’ve been 1990, 1991, and Haskell was just being born at the time. So there was a little bit of material in the course about the differences between Haskell and Miranda, and–

AL (0:03:17): And that was at Bristol, you said, or?

SM (0:03:19): Yeah, Bristol.

AL (0:03:20): Okay. So they immediately picked up on Haskell, even though it was not one of the places where it was actually born. Or was there someone there who was very close to the whole research effort or even on the committee?

SM (0:03:37): Right. So I guess it was a community of researchers across the UK and, well, throughout the world doing functional programming and collaborating on language design and research around, yeah, functional languages.

Matthias Pall (0:03:50): And then how did you move on from Miranda to Haskell, or when did the focus shift?

SM (0:03:54): Yeah. So this course kind of kindled my interest in functional programming, and I was really excited about what was going on with Haskell and implementing it. So I decided to go on from there to do a PhD, and I looked around for places where I could do functional programming-related things. And of course, Glasgow came up because at the time, Glasgow was a sort of hotbed of functional programming, and there were people like Simon Peyton Jones and Phil Wadler, and John Hughes. And many great people in the world of functional programming were at Glasgow. So I contacted people at Glasgow and asked whether I could do a PhD, and that came to pass. So I met Simon Peyton Jones and told him that I’d been reading his book and he seemed to be quite pleased about that.

AL (0:04:38): That was The Implementation of Functional Programming Languages, that book, right?

SM (0:04:41): Yeah. So I devoured this book. I found it in the library at University of Bristol. I devoured this book from cover to cover, and even implemented most of it. I sort of started writing my own little compiler in C on my PC at home using this book. So it was fascinating, really. This book had all of the various phases of the compiler from the beginning, from parsing through type checking and the backend implementation. So you could take this book and implement a complete functional language.

AL (0:05:12): Yeah, it just– I mean, I also have extremely good memories of this book, but at the same time, when I first came into contact with Haskell, which was around ’97, I think, 1997, it was already very difficult to get hold of. It was very strange that this book, which was such a beautiful book about implementing functional programming, somehow seems to have had a very low print run, or to have gone out of print very early in its life. And it was also not the time when everybody had PDFs floating around the internet anyway.

SM (0:05:51): So at some point, I think Simon had to get the book scanned so that you could distribute PDFs of it later on because it had gone out of print. Yeah, it was incredible. I still have a copy of it on my shelf.

AL (0:06:03): No, that’s very good. So you basically then applied to Glasgow or you–

SM (0:06:07): Yes. So I applied to Glasgow, and Phil Wadler ended up being my supervisor there. So I did a PhD under Phil and worked on deforestation. There were several of us doing PhDs at the time – you probably know Andy Gill; there were also Andre Santos and David King. A few of us ended up in a room together at Glasgow, all doing PhDs in functional programming-related things. So it was a lot of fun. And to be amongst this group of eminent researchers, doing really groundbreaking stuff with functional programming, was quite amazing.

AL (0:06:41): Okay. And what was Simon PJ’s role at the time? Was he also a professor there, or?

SM (0:06:45): So he became a professor. I’m not sure if he was at the time when I joined. But yeah, he was a professor and he had various projects that included working on the compiler and developing the language. And there were a couple of research assistants working on those. So Will Partain was one of the research assistants, and he did a lot of the implementation of the compiler. And then there were all these PhD students doing various things related to the compiler. So my PhD didn’t involve working on the compiler directly, but I quite often found excuses to work on it. And I found little side projects to add various analyses. For example, I think I worked on a thing called Update Analysis, which was an analysis to detect when a thunk needed to be updated after it was evaluated, because quite often a thunk is never used again after you’ve evaluated it, so you don’t need to update it. So we did an analysis, and that was one thing I implemented.

AL (0:07:44): So what was the state of GHC at the time when you joined Glasgow? I mean, obviously it was working in some form already, but it probably also was vastly different from the GHC we know now.

SM (0:07:57): Yes. Well, so the compiler used to compile via C; it was using C as its intermediate language. “C as a portable assembler” was the way that we used to describe it, because architectures were a lot more diverse in those days. You typically wanted your compiler to target various different architectures, so 68000s and MIPS. And we did a lot of our work on Sun SPARCstations. PCs came along a lot later – it was quite a while before the compiler was actually ported to x86 chips. And then DEC Alpha was another one that came along. And so portability was quite important, and using C as a portable assembler was a path that many compilers took at the time. But it led to tremendous amounts of complexity. You had to have swathes of CPP macros and goodness knows what else to compile it. And then you actually had to make it so that you could run the output of the C compiler in a way that wasn’t going to blow up the stack, for example. That’s one problem, because we rely very much on tail calls in Haskell.

So compiling via C without blowing up the C stack is quite an art. There are various techniques for doing that. One of them was called the mini-interpreter, where you call a function and it returns the address of the next function to call, and then there’s a little loop in the middle that continuously jumps to the next one. So that’s one way to do it. It’s not the most efficient way, of course, because the cost of that little loop is quite high.
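A toy sketch of that mini-interpreter idea, written in Haskell purely for illustration (GHC’s real version operated on the generated C, so this only shows the shape of the technique):

```haskell
-- A step of the "mini-interpreter": instead of tail-calling the next
-- function, each step returns it, and a little driver loop keeps
-- invoking whatever comes back, so the calls themselves consume no stack.
data Step a = Done a | More (Step a)

run :: Step a -> a
run (Done x) = x
run (More s) = run s  -- the little loop in the middle

-- Hypothetical example: a countdown expressed as a chain of steps.
countdown :: Int -> Step Int
countdown 0 = Done 0
countdown n = More (countdown (n - 1))
```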

So yeah, the compiler compiled via C at the time, and it was bootstrapped onto a new platform by taking all the intermediate C files over to the new platform and compiling them with the C compiler – and then you have a compiler, and you’ve sort of bootstrapped it. So we had to do that quite a lot.

AL (0:09:44): And when did that change?

SM (0:09:45): Well, so the native code generator came along, but definitely not for a few years. I believe it was Andy Gill who wrote the first prototype of the native code generator, just as a sort of side project – he was playing around with ideas for how to write native code generators and ended up building something that sort of worked. And then we pushed it through and made it work, and it was much faster, of course, than compiling via C, because you’re not generating all the C code and then compiling it with a separate C compiler.

AL (0:10:17): You mean faster in terms of, like the actual compilation time was faster?

SM (0:10:20): Faster in terms of compilation time, yes. It generated more or less the same code. Because even though you might think that by relying on the C compiler you get to use all these wonderful optimizations that the C compiler writers have implemented, in practice the code we generate doesn’t really give the compiler very much scope to optimize anything at all, because it’s all just shuffling stuff between the stack and the heap, back and forth. You never get to see an actual loop or anything interesting that you can really optimize. So we built the native code generator, and it was much faster at compiling, and it obviously sidestepped a ton of complexity in the C backend.

There was this thing called the Evil Mangler, which some people might have heard of, which took the output of the C compiler – the assembly generated by the C compiler – and mangled it to do various things. One thing it wanted to do was to remove the function preambles – the prologue and epilogue, I think they were called, the little sequences of instructions at the beginning and the end of every function – which weren’t necessary for Haskell. And it wanted to do various other transformations that involved mapping over all of the assembly and changing things. And this was written in a Perl script. So this was a Perl script that ran over the assembly code after the C compiler had finished, and spat out some new assembly code that we then assembled into the binary.

AL (0:11:48): Sounds like a debugging nightmare, but if it works, it works, I guess. 

SM (0:11:54): So I can’t remember how long this Perl script was, but it was thousands of lines for sure. It was impossible to debug it. Perl is not the nicest language.

AL (0:12:04): But if it was impossible to debug it, does it mean it was just never incorrect in the first place, or did you have mysterious bugs that you never really found for a long time?

SM (0:12:17): It was incorrect all the time, but it was impossible to do any kind of refactoring on it because it was just so difficult to develop in Perl for a start. I mean, that’s a very difficult language. 

AL (0:12:28): Why was it done in Perl?

SM (0:12:32): That’s a very good question. So I mentioned Will Partain, who was one of the research assistants on the project, and Will had a particular liking for CPP macros and Perl scripts and things of that nature. He wrote vast swathes of this stuff – layers and layers of very complicated CPP macros and Perl scripts to do this, that, and the other. And we had this framework for doing literate programming, which was kind of popular at the time. Literate programming was the idea that your code should sit between delimiters in the source file, and everything else should be comment by default. So every source file is comments, with code in bracketed sections. And the whole compiler was written in this style, between \begin{code} and \end{code}. Much of the build system and the runtime system and the rest of the source tree was written in literate style too. And there was this literate programming system – a set of Perl scripts – which ran over the source and stripped out the comments before the code was actually compiled.
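A source file in that literate style looks roughly like this: everything is prose by default, and only the delimited region is compiled.

```
The compiler sources were full of prose like this paragraph, which the
literate tooling treats as a comment.

\begin{code}
module Example where

answer :: Int
answer = 42
\end{code}

After the closing delimiter, the file is prose again.
```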

AL (0:13:42): So because you actually typeset the–

SM (0:13:45): We did. We used to typeset the compiler and print it out, before it got too big for that to be reasonable. At some point, we decided this was not working – we hadn’t really been maintaining the comments as well as the code – and we sort of reversed it and went back to code being the default. So, we were talking about Perl scripts: there was a lot of Perl in the compiler at that time, because Will was a big fan of Perl. And I guess it was the tool that most people used for just getting stuff done when you needed to write glue and scripts and things. Perl was the go-to choice at the time.

AL (0:14:23): But basically the Evil Mangler became less important with the creation of the native code generator, I guess. But the via-C route probably existed for a long time. I mean, I–

SM (0:14:37): That’s right. So the nice thing about the native code generator was that it didn’t need the Evil Mangler, because you could generate exactly the assembly code that you wanted – you had full control over what was being generated. So that was one reason. It was faster because not only did you not have to run the C compiler, but you didn’t have to run the Evil Mangler either. And we could make various optimizations that weren’t really possible with the other pipeline. But obviously, the downside is that you have to write a separate native code generator for every backend that you want to implement. So somebody had to go and write a SPARC code generator. I think we had SPARC and Alpha and x86 – GHC had been ported to x86 by that time.

AL (0:15:20): So did C-- exist at that time, or was that introduced later as an intermediate language?

SM (0:15:26): That came along much later actually.

AL (0:15:27): Okay. So at that point, you went from STG directly to native code.

SM (0:15:33): So the compiler went from Core to STG, and from STG directly to– I think there was a data type in the compiler called Abstract C. Abstract C was a sort of data type that represented C, and we would pretty-print that as C code. And I think the native code generator probably went from Abstract C to native code. So Abstract C was the thing that got replaced by C-- at some point later on.

AL (0:15:57): Okay. So in a way, there was always one extra point in between. 

SM (0:16:01): That’s right.

AL (0:16:03): Well, perhaps going back to your PhD – at some point, I guess, you decided that you wanted to continue working on GHC, or continue to do Haskell stuff. Was that before you finished your PhD, or while you were doing it? And why did you, for example, not decide to continue in the direction of deforestation?

SM (0:16:28): Yes. So deforestation was interesting, because there were two competing approaches to deforestation at the time. There was the original deforestation, which was a transformation over essentially the Core intermediate code – a direct transformation over Core that eliminated the intermediate data structures. And that was the approach I was following, building on Phil’s original work on first-order deforestation to extend it to higher-order deforestation. And I did implement that in the compiler. There was a pass in GHC that did higher-order deforestation, and it really could eliminate intermediate structures from some fairly complex programs – there are some examples in my thesis.

But the competing approach was foldr/build deforestation, which in some ways is much simpler. You just rewrite all of your functions on lists in terms of foldr and the build combinator, and then there’s a single rewrite rule that replaces compositions of foldr and build directly, to eliminate the intermediate data structures. This turned out to be, in practice, much easier to implement. It had a different set of challenges – namely, once you’ve written things in terms of foldr and build, you often want to rewrite them back again if you don’t end up doing the transformation, because otherwise you’ve made a pessimization instead of an optimization. So that’s one challenge that arises. But if you can solve those challenges, generally this is a simpler approach, because you don’t have to unfold everything and do lots of transformations. The other approach involved unfolding the definitions of all the functions together and then performing the transformation over their composition.
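For concreteness, here is a minimal sketch of the foldr/build machinery, using the same names and rewrite rule that GHC’s base library uses:

```haskell
{-# LANGUAGE RankNTypes #-}
module FoldrBuild where

-- build abstracts a list over its two constructors (:) and []
build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
build g = g (:) []
{-# INLINE [1] build #-}  -- delay inlining so the rule can fire, as in GHC.Base

-- the single rewrite rule: a list produced by build and consumed by
-- foldr is never actually constructed
{-# RULES "foldr/build"
      forall k z (g :: forall b. (a -> b -> b) -> b -> b).
      foldr k z (build g) = g k z #-}

-- map in foldr/build form; foldr (+) 0 (mapFB (+1) xs) now fuses
-- into a single pass with no intermediate list
mapFB :: (a -> b) -> [a] -> [b]
mapFB f xs = build (\c n -> foldr (c . f) n xs)
```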

AL (0:18:11): But you kind of have to prepare for it everywhere, right? I mean, you have to write your code in a particular way – or these days it’s done via rewrite rules. But nevertheless, you have to somehow build this rewrite to foldr/build into your libraries. And you also have to do it separately for other data structures, if you want it for other data structures. Whereas I could imagine, if you do it on the Core level– was your work tied to lists as well, or could you directly apply it to trees or other data structures?

SM (0:18:43): Yes, it actually worked for any data structure you can define, really. I think there were some limitations – I forget offhand – but certainly it worked on trees and other data structures. So that’s, I suppose, one restriction of foldr/build: you have to do something different if you want it to work on different data structures. But much of the low-hanging fruit is in lists, and we had many benchmarks doing lots of list processing that would benefit quite a lot from foldr/build. So I suppose one problem with general deforestation is the code growth you get when you unfold these large definitions and then transform them. If you don’t end up realizing any benefit from the transformation, you are left with a lot of code that you didn’t have before. And code size bloat was one of the big problems that I think we never really solved in the context of deforestation.

AL (0:19:38): So the stuff that was based on your thesis and, as you said, was in GHC for a while – is it no longer in GHC these days, or is it still somewhere?

SM (0:19:47): No, it was in GHC for a while, but I think it was probably never practical enough to really let loose on general programs. There were some fairly large examples that we used to demonstrate it and to test it, but foldr/build ended up winning the practicality race there, because you could target lists very specifically and solve the problem of not pessimizing when the optimization didn’t apply.

MP (0:20:13): Right. So when did the focus on more concurrent and parallel Haskell start? Was it there from the start, or– because it feels like it started happening a bit later, right? How did that come to be?

SM (0:20:26): So Concurrent Haskell was also a PhD thesis – Sigbjorn Finne, I think, worked on Concurrent Haskell in his PhD, and there was an implementation of that in GHC. But it didn’t really become widely used, I think, until we did a rewrite of the runtime system. This was around the late ’90s. I was a research assistant – after I finished my PhD, I did a brief foray into type checking Erlang, and then I came back to work on GHC and we rewrote the runtime system. There was an effort at the time to write a unified runtime system between GHC and Hugs: we wanted Hugs to be the interpreter for GHC. This was before GHCi, actually. So we wanted to have an interpreter for GHC, and it made sense to sort of unify with Hugs as the interpreter using a combined backend, so Hugs would be able to work with compiled code.

So we designed a new runtime system and I was doing most of the implementation on that. Well, myself and Alastair Reid, who was working on the Hugs side of things. And since we were rewriting the thing, we redesigned everything. We threw away lots of old ideas that hadn’t really panned out in practice and ended up being quite complicated. So that was quite nice. And we could redesign everything. 

And it made sense at the time to incorporate concurrency as a sort of native property. Before, it was sort of bolted onto the old runtime system, but we decided to make it a first-class aspect of the implementation. And one reason we did that was because we discovered how to do it quite cheaply. We were redesigning the runtime system, including the storage manager, and we designed a storage manager that was based around blocks of memory. So when you allocated the heap, you allocated a series of blocks, and each block was– well, you could tune the size, but we ended up with something around 4k, I think. And there is this sort of natural break when the program is executing and allocating memory and it comes to the end of one of these blocks. You have a natural break where you can do things that would be too expensive to do all the time, but that are quite useful to be able to do on a continuous basis. And one of those things was checking to see whether you should switch to another thread in a concurrent runtime system. So we could put that check at the end of allocating a block of memory, and it had a negligible effect on performance. And it had this nice observable effect for the user, where it looks like your threads are interleaving nicely. So that was a really good compromise, and it meant that we could incorporate concurrency for minimal cost and as a native feature of the runtime system. So we decided to do that.

MP (0:23:04): Right. Because I remember it always seemed a bit magical when I got into Haskell, how you could just do a parallel map and it looks like it just works. But then you moved on to– yeah, you went on to Haxl – well, that’s a lot later, I guess – making the parallelism very easy to do without having to know exactly how to write it.

SM (0:23:26): Right. Yes. So parallelism, separate from concurrency, had been around for a long time in the context of Haskell. I think even in the very early days of Haskell, there was experimentation going on in parallel Haskell and in building parallel hardware to execute functional programs. The very basis of parallelism in Haskell is the par and seq combinators: par is where you evaluate two arguments in parallel, and seq is where you force evaluation of the left-hand side before the right-hand side. And with these two, you can express lots of things, including parallel map.
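For example, a parallel map can be sketched with these combinators (here using par and pseq from the parallel package; its Control.Parallel.Strategies module provides a polished parMap):

```haskell
import Control.Parallel (par, pseq)

-- Spark each element's evaluation in parallel with the rest of the
-- list: `par` says y may be evaluated in parallel; `pseq` makes sure
-- we get on with the tail before returning the cons cell.
parMap' :: (a -> b) -> [a] -> [b]
parMap' _ []     = []
parMap' f (x:xs) =
  let y  = f x
      ys = parMap' f xs
  in  y `par` (ys `pseq` (y : ys))
```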

So these had been around for a while, and parallel implementations of Haskell had also been around for a while. But of course, we wanted to have parallelism as well as concurrency in our runtime system. So we also implemented these combinators, and it wasn’t until a little while later that we actually started exploiting parallel hardware, because multiprocessors weren’t very common until, I guess, the late ’90s, when they started to become more common – and multicore didn’t come around for a while after that. But we decided, I think around the late ’90s or early 2000s, to do a multiprocessor implementation of the runtime system. And that was probably quite early – most languages were not doing this sort of thing. OCaml has only just got a parallel runtime. And I think we were quite early in tackling that.

And there were many challenges, of course, because you want to have parallel garbage collection. The single heap is always the problem. Erlang had this nice solution to the single heap: just have multiple heaps and no direct sharing – if you want sharing, you have to do it via another mechanism. But we wanted to take the approach of having a single heap so that sharing was cheap, which meant garbage collection was going to be complicated. So we had to figure out how to do garbage collection along with parallelism. At first, we just didn’t do any parallel garbage collection at all: every garbage collection would stop the world – stop all the parallel threads – and then collect the heap. Then we did parallel garbage collection. And eventually, we started experimenting with garbage collection running in parallel with mutation – the execution of the program – which we still don’t do to this day, actually, but there were experiments done on that.

AL (0:25:44): Yeah, I mean, it’s a difficult problem to solve, given the way that a garbage collector works. What is the core idea of how to make it work in parallel?

SM (0:25:53): Making a garbage collector work in parallel by itself is not terribly difficult. There are lots of papers about how to do it. For copying garbage collection, you can just have a work queue, for example.

AL (0:26:05): Sure. But the way that the Haskell– the GHC garbage collector works.

SM (0:26:10): So the parallel GC works by– well, we use this block idea, that the whole heap is divided into blocks of memory. So the work queue is a list of blocks – blocks of objects that need to be traversed by the garbage collector. And the blocks give you a good unit of granularity, because if your unit of granularity is a single object, that’s quite expensive: every object has to go through the work queue. We normally use lock-free queues for this to reduce the overhead, but even so, you want your granularity to be not too small, because the overhead of the work queue is quite high. So using blocks gives you slightly better granularity, at the expense of some parallelism, because you don’t get quite as good sharing or balancing of the work across CPUs when you use a larger granularity.

So parallelism is never perfect. And in fact, parallelism is never going to be perfect in a garbage collector anyway, because the shape of the heap might prevent you from doing any kind of parallel work. The worst case, of course, is if you just have a single linked list. If the heap consists of a single linked list, then you will never discover any parallelism in the garbage collector – you just have one thread working its way down the list. So there’s always a limit to the amount of parallelism you can have. You also have multiple CPUs executing Haskell code with separate allocation areas – that’s one other aspect of this: each CPU is allocating into its own allocation area. That gives you a natural place to introduce parallelism in the garbage collector, because you can have parallel threads processing the live objects in each of these separate allocation areas. So you do get quite good parallelism when you’re doing young-generation garbage collection. But old-generation garbage collection is collecting all of the data in the long-lived old generation, and might run into these problems of long linked lists that don’t have much parallelism.

AL (0:28:12): To put things back into historical context: we’ve now been talking a lot about your work on parallel Haskell, but I guess most of that work was actually done during your time at Microsoft Research, right?

SM (0:28:25): That’s right.

AL (0:28:26): Did you go to Microsoft Research directly after your PhD or was there something in between?

SM (0:28:32): So after my PhD, like I said, I did a brief foray into type checking Erlang, which was lots of fun. And then I went back to working on GHC, where we were rewriting the runtime system. And then Simon Peyton Jones decided to move to Cambridge and go and work for Microsoft Research in Cambridge. And I went along at the time. I was a research assistant on his project and I also applied to Microsoft Research in Cambridge, and they were fortunately happy to give me a postdoc position to go and continue working on research in the compiler and related things. So yeah, we more or less moved to Cambridge and continued working on GHC, which was very nice, and started doing things like parallelism and a whole bunch of different things. 

AL (0:29:17): And you also wrote your book during that time, right? Your Parallel and Concurrent Programming in Haskell?

SM (0:29:23): Yes. The book came out of various– I was giving lectures at summer schools and things like that on concurrency and parallelism, and eventually I’d accumulated quite a lot of material for these lectures and I thought, well, how can I get more value out of this? It seemed like there was almost a book here. Of course, there was probably about a tenth of a book at the time, but it seemed like a lot. So I approached O’Reilly about writing a book, and they were quite receptive to the idea – it wasn’t very difficult to convince O’Reilly that this was a good idea. I showed them, of course, what we already had, and there weren’t any other competing books in this space at the time. So yeah, we went ahead. They assigned me an editor and I got onto the very well-designed book-writing pipeline at O’Reilly, where they set up a schedule and set you up with all the tools, give you an editor, and decide when you’re going to publish it and how long it’s going to be and all that sort of thing, and I started turning what I had into a book. It mostly went fairly smoothly. I think it took a bit longer than anticipated, but I managed to include all the material that I wanted to. And the editor was fortunately quite patient with me – I managed to get my own way most of the time.

AL (0:30:40): Yeah, I guess book writing is one of those things that always takes longer than you expect, and there’s always more work than you expect, isn’t there? Still, you managed to finish it, and it’s a nice book. But you didn’t like it so much that afterwards you said, “Let’s write more and more books.”

SM (0:30:59): Well, yes. I didn’t have any time to write any more books after that. I left Microsoft to go work at Facebook around the time I was finishing the book. In fact, I was writing the very last chapters of the book on the bus on the way into Facebook in the morning, in my first weeks at Facebook.

AL (0:31:22): Yeah, that was a big change, I guess – moving from a job where you could mostly focus on Haskell and research and keeping the compiler alive, to doing other things. I mean, was it even part of the deal initially? Was it clear from the beginning that you could be using Haskell at Facebook, or was that just something that turned out to be true later?

SM (0:31:43): Well, it’s very interesting. So I talked to some people at Facebook when I decided to leave Microsoft. I talked to various people, and Facebook was one of the companies. And it turned out they were already doing various things with Haskell. So I talked to one team at Facebook who were using Haskell at the time, and they were doing some very exciting things. And it was a very interesting company to work for, because they were very open to the idea of open sourcing things that they were working on – mainly backend systems and so on. And I very much liked the company culture, and there were people I knew who worked there, like Bryan O’Sullivan, for example – he was well-known in the Haskell community and had recently moved to Facebook as well. So I ended up moving to Facebook, and I was going to join this team that I’d been talking to who were using Haskell. But unfortunately, in the first week after I joined, the project this team was working on got canceled, and I was left without a team to join and without a project to work on. So I was left a bit high and dry. But it was still very exciting, and I got to go and talk to lots of teams working on interesting projects around the company and try to find out where I might fit in.

I ended up finding a really good opportunity. We had this system called Sigma, which was doing a lot of anti-abuse filtering on Facebook and other platforms, mainly to eliminate spam. But over time, it grew into a general-purpose anti-abuse system fighting not just spam, but all kinds of bad stuff that we wanted to detect and get rid of. It was a rule engine using a combination of handwritten rules and machine learning to detect various kinds of abuse, and the people writing the rules were using a language that the system exposed in the form of a DSL. It was a purely functional programming language called FXL, with a completely custom implementation – a very special-purpose language designed for this one purpose, with a very particular set of features that were needed for this use case.

So I met some people on this team and asked them whether they would be interested in using a real language instead of this custom language. And interestingly, I had arrived at the perfect time, because they were just feeling the pain of having a custom language. You suddenly need lots of tools – profilers and debuggers – and the language needs to be extended, and it had lots of dark corners that nobody had really fixed properly, and lots of legacy stuff. Users were writing tremendous amounts of code, but they were struggling with the lack of tooling and the strangeness of this language, and they wanted better abstractions to be able to write more complex stuff.

So I came along and said, “Should we use Haskell? It’s also a functional programming language, it has lots of tooling, and it’s probably going to be a lot faster than your interpreter.” And yeah, they were very receptive to this idea. But we didn’t just say, “Go do it.” We first decided what we would need to prototype and demonstrate in order to be able to make the switch, and there were some very particular features that needed to work properly in Haskell.

And the one thing that this language FXL was good at was doing parallel data fetching. So you could write these nice, purely functional expressions, but in the background, they were fetching data about activity on Facebook – about users, about posts, about friends, all of the activity relating to the content that we wanted to assess as abuse or not. So we had to fetch data about all the activity. And of course, you’re doing lots of data fetching, and it has to be done in parallel to make it efficient. This language was very good at doing that, and we had to figure out some way to do the same thing in Haskell. So that’s how the Haxl library came about. I tried various ideas – some of them involved unsafePerformIO – but unfortunately, those didn’t end up panning out in practice. But this idea of using applicative did work out quite nicely, even though arguably it’s an abuse of applicative in some way. But that’s one debate that we can have over a beer in the pub at some point. It fitted very nicely, and we designed this library, and all of the ideas that were currently being implemented in FXL we could recast into Haxl, and it had the right kind of performance.
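A small sketch of what this looks like on the user side (the data-source operation friendsOf is hypothetical; the GenHaxl type is from recent versions of the open-source haxl package):

```haskell
import Data.List (intersect)
import Haxl.Core (GenHaxl)

type UserId = Int

-- Hypothetical data-source operation; a real one would be built with
-- dataFetch over a DataSource instance.
friendsOf :: UserId -> GenHaxl u w [UserId]
friendsOf = error "sketch only"

-- The two fetches are combined with <*>, so neither depends on the
-- other, and Haxl is free to batch them and run them concurrently.
commonFriends :: UserId -> UserId -> GenHaxl u w [UserId]
commonFriends a b = intersect <$> friendsOf a <*> friendsOf b
```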

MP (0:36:26): So how did applicative do grow out of the Haxl project?

SM (0:36:32): Yeah, that’s a good question. So the Haxl library gives you a way to parallelize things by using applicative. And it works very nicely if you’re using something like mapM, because you get parallelism for free if you mapM a function over a list. But if you’re writing something like a sequence of statements in a do block, you wouldn’t automatically get any parallelism between those, even though in many cases there are no dependencies between the statements, so you could parallelize them. And we didn’t really want to have to teach our users how to do this, for two reasons: it’s complicated, and it ends up looking a bit ugly if you manually write the applicative expression to get the parallelism.

So we wanted this to happen automatically in some way, and that’s how applicative do came about. But it wasn’t obvious how to do it, because many do blocks don’t completely turn into applicative – you want to extract the available parallelism by analyzing the dependencies and doing some transformation, and there isn’t one single way to do that. So it ends up being something that’s a bit like an optimization rather than a bit of extra syntax – it’s not really syntactic sugar; it’s more like an optimization. Nevertheless, it fitted into the front end of the compiler. That’s the natural place to put it, because it’s alongside where we normally translate do syntax.

So it ended up being this slightly strange transformation, which I think, in hindsight, could perhaps have been a compiler plugin or something like that, because it has a sort of weird, not fully specified behavior.
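A small example of what the extension does; this works for any Applicative, which is exactly what Haxl exploits:

```haskell
{-# LANGUAGE ApplicativeDo #-}
module ApDo where

-- With ApplicativeDo, a do block whose statements don't depend on
-- each other desugars to <$> and <*> instead of >>=, so a functor
-- like Haxl can run fa and fb in parallel.
pair :: Applicative f => f a -> f b -> f (a, b)
pair fa fb = do
  x <- fa  -- x is not used by the next statement ...
  y <- fb  -- ... so this becomes (,) <$> fa <*> fb
  pure (x, y)
```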

MP (0:38:18): Right. I mean, and later, you had these selective applicative functors. And then people were like, “No, we’re not going to do selective applicative do. That helps too much.”

SM (0:38:28): Yeah. So this is Andrey Mokhov’s idea, and it’s very clever. The idea is that sometimes you have applicatives where you want to explore multiple branches and then give up on one if another succeeds first – which we actually do in Haxl sometimes. There are cases where you want to try multiple branches, and if you get a result from one branch, you give up on the other branch that you were exploring. And this turns out to be really useful in some of the cases that we needed to implement in Sigma, where you have multiple different tests, and if one of the tests returns a result, then you don’t have to do the other test. And you can express these things in terms of selective applicative functors – Andrey’s idea was to nicely abstract that into a general property. And we do actually use this in Haxl: we implement our parallel combinators in terms of selective applicative functors. So that worked out quite nicely.
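The abstraction referred to here is the Selective class from Andrey Mokhov’s selective package; a sketch of the “give up on the other branch” pattern might look like this (anyAbuse and its two tests are hypothetical names):

```haskell
import Control.Selective (Selective, ifS)

-- Combine two checks: if the cheap one already says True, the
-- expensive one's effects are not needed. An implementation such as
-- Haxl can still start both speculatively and abandon the loser.
anyAbuse :: Selective f => f Bool -> f Bool -> f Bool
anyAbuse cheap expensive = ifS cheap (pure True) expensive
```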

MP (0:39:27): And do you end up using a compiler plugin, or how do you optimize it automatically without your users having to do anything?

SM (0:39:34): Using applicative do, or the selective applicative functors you mean?

MP (0:39:38): Yeah, no, I was asking: did your users have to learn new things to use the selective applicative functors, or does it just work within the current framework?

SM (0:39:47): No. So we only expose some very simple combinators – parallel AND and parallel OR. Giving them the full power of selective applicative functors is probably not the best thing to do. So the users of Sigma fall into two categories. We’ve got a large category of people who just want to go in and write a rule to do something, and they’re probably writing Haskell less than 10% of their time – they would really like to learn the bare minimum to get something done. And then there are people who spend their whole time writing stuff in Sigma, and they build complicated abstraction frameworks to do interesting things. Those sorts of people might be open to the idea of using selective applicative functors, but the vast majority of people using Sigma would really like to use a very simple language and not have a very steep learning curve to climb.

AL (0:40:42): Is there actual manual performance tuning going on on the Sigma side – do people spend time optimizing their rules? And are there tools that have been developed specifically for Sigma to help with that, or is it all just Haskell?

SM (0:41:00): There’s definitely a lot of optimization that has to happen, because many of these rules have to run synchronously when the user is performing some action on Facebook – they’re posting a message, or writing a post, or clicking a button, or whatever. The action has to run synchronously and return a result, which might deny the user the action for some reason. So in those cases, we have a matter of milliseconds to make a decision. Well, we run all the rules in parallel, for one thing – that’s where Haxl is useful, because it runs everything together in parallel and combines all the data fetches and does memoization, and so on. And then there are a number of rules that run offline as well. They also have performance constraints, because we have a total amount of resources that we can bring to bear on running Sigma rules, and anything that’s far too expensive will just consume too many machines and reduce the resources we have available to run other rules.

MP (0:41:58): So how do you teach people? Like, they get hired and they have to get up and running with a functional language quite quickly, I guess. Do you have a bootcamp, or how do you teach them the tricks?

SM (0:42:12): There’s a lot of documentation, but generally, the way people learn things at Facebook is by doing. So we make small tasks that serve as learning tools for people who are trying to climb the learning curve. And it’s usually the job of the more senior people, who’ve already climbed the learning curve, to make these tasks. So we normally help people up the learning curve by giving them actual things to do and supporting them while they’re doing–

AL (0:42:43): So these are always new and real tasks. They’re not just specific teaching tasks that nobody cares about, but they’re–

SM (0:42:52): Yeah, exactly. Yeah.

AL (0:42:54): Okay. So you continuously have to come up with new small tasks that are suitable for new learners.

SM (0:43:01): That’s right. In some ways, that’s a burden on the more senior people: we have to continuously come up with tasks and support new team members in doing them, and that invariably takes longer than just doing it yourself. So there’s always a temptation not to make a– we call them bootcamp tasks, because they’re assigned to people who are in bootcamp. Bootcamp is the period when you’ve just joined the company and you’re learning things. So there’s always a temptation not to make bootcamp tasks but just to do it yourself. But we have to get people up the learning curve somehow, and that’s how we do it.

AL (0:43:39): So more recently, you’ve no longer been working on Sigma, but mostly on another thing called Glean. Is that because Sigma was successful enough for you to just leave it behind and focus on other things, or because Glean was specifically something you were very interested in?

SM (0:43:58): Yes. Well, there came a point with Sigma where we’d gone through the whole cycle, from imagining the project, prototyping it and developing it, building a team around it, and so on, to deploying it in production. And it got to the point where it was successful. We had deployed it, and we were solving various problems of scaling – most systems have a continuous stream of scaling issues to solve. So we were solving those problems, but it had kind of got to the point where most of the changes we were making were incremental, and the team had become self-sufficient – it could survive without me. So I felt at that point it was time to look around and see whether there were any other challenges that I might go and try.

And so we had this system called Glean. Roman Leshchinskiy – who is also a Haskell person; you might know him as the author of the vector library – had joined Facebook a few months earlier and started working on Glean, which is designed to do code indexing at scale. He had designed this quite elaborate solution involving a sort of datalog-like storage engine, with some innovations, particularly designed to solve the kinds of problems that we had with indexing our large code bases.

AL (0:45:21): So the primary motivation was indexing your own code bases.

SM (0:45:25): Yeah, that’s right. But we didn’t want it to be just a Facebook-only solution. We wanted it to be open source and generally applicable, and not just for code indexing. Although it happens to be very good at code indexing – we designed it for that – it’s not specific to code indexing. It has nothing built in that makes it useful only for code indexing; it’s a general-purpose storage engine for immutable data with a datalog-like query engine. So after Sigma, I went along to Roman’s team and asked whether they’d be interested in having me join and help with Glean, and yeah, they were–

AL (0:46:03): Who would say no?

SM (0:46:06): And, well, there were only Roman and Chris on the team at the time – Chris Kuklewicz, whom you may know as the author of the regex libraries in Haskell. So also a lot of Haskell–

MP (0:46:18): Were there a lot of Haskell people on the team – or it was just Haskell people, I guess?

SM (0:46:22): It was just Haskell people on the team. Yes.

AL (0:46:25): So it was also written in Haskell from the beginning without any real discussion, I guess. 

SM (0:46:32): Well, there was some discussion, for sure. If you want to use Haskell, you have to justify your choice somehow. So Roman had decided to write it – or at least the higher layers of Glean – in Haskell. The lower layers of Glean are written in C++, for reasons of interfacing with RocksDB, for one, and having very tight control over storage management for in-memory data, because it’s dealing with a lot of data. And also because we wanted to index C++, and for that we needed to talk to the Clang library, which is a C++ library.

So how did Roman get away with writing it in Haskell? Well, he made the case that to use Glean, you wouldn’t have to learn Haskell. Glean is used via an RPC layer, which is in Thrift, so you can use it from any language. And at the lower layers, there is a C-only interface to the internals of Glean, which is used by the C++ indexer, for example. So there are multiple different ways that you can interface with Glean without having to know Haskell. Haskell is purely the language that implements all of the higher-level functionality – manipulating databases, manipulating schemas, the query engine, all of the stuff that lives inside the server. So it meant that we got to use Haskell for our own productivity and reliability and all the other good reasons to use Haskell, but we don’t force it on anybody else, which is just fine.

AL (0:48:04): And what’s the state of Glean now? Is it done, or are there big goals left? Is it in active use at Meta?

SM (0:48:12): It’s definitely in active use, yes. We’re indexing all of our code – well, the vast majority of our code – using Glean. It’s used in a number of different applications that developers use day-to-day in their workflows. One reason they use code indexing is to be able to navigate from symbols to their definitions. That’s something that you do very commonly when you’re writing code: you want to click through, in your code browser or in your IDE, to a definition. And that’s one of the primary features we provide in the data indexed by Glean. So: online code browsers, IDE navigation, and also documentation tools. In the Haskell world we have Haddock, but we’re trying to provide Haddock-like functionality for all the other languages as well, by pulling out the doc comments in your C++, rendering types and declarations, and so forth, in our code browser. We want Glean to be able to collect and provide all of the information you would need to render a page of documentation for any language. So that’s one of the things we’re working on. And code search, of course: you’d like to be able to take a symbol name and find all the declarations with that name. So Glean provides search facilities as well.

AL (0:49:31): So in the sense that Haxl and perhaps applicative do came out of Sigma, are there any general-purpose Haskell libraries or tools or extensions that have come, or will come, out of Glean?

SM (0:49:50): Yeah. Well, that’s a good question.

AL (0:49:51): Or is Haskell just good enough for everything that Glean is trying to do so that nothing needs to be added?

SM (0:49:58): Nothing like Haxl, really, I would say. Although we are actually using Haxl in Glean – Haxl is tremendously useful. Clients of Glean who want to do things like code analysis, for example, want to traverse lots of data about the code they’re analyzing, and often want to do lots of queries in parallel. It’s a natural use case for Haxl: you just write all your code in a natural monadic style with applicative do, and Haxl does the business of parallelizing all those queries. So it works very nicely for clients of Glean as well. But we haven’t extracted any sort of reusable libraries from Glean, other than small utilities. There are quite a few small utility libraries, but we haven’t done the work of extracting those.

AL (0:50:48): So, perhaps slowly moving towards the end: if you look at the next five to ten years, are you planning to continue working on Glean, or are there any other big things that you would still like to do?

SM (0:51:05): For the medium term, certainly, I’m planning on still working on Glean. There are quite a few interesting challenges that we’re still working on. One of them is that we’d like to be able to index code incrementally. This was always one of the core ideas with Glean: that you would be able to not just build a single monolithic database of all the data about your code, but incrementally index the changes and provide that.

AL (0:51:35): And by incrementally, do you mean while a person is editing, or at the granularity of Git commits? In what sense incremental?

SM (0:51:46): In the limit, yes, you would like to be able to index interactively as the user is typing. But we’re not there yet. We’ve got to the point now where we’re able to index the difference between two commits in the repository, and to build a database that consists of the single monolithic base database plus a delta on top. The delta hides some of the data in the base database – it hides the files that you’ve changed and anything that’s affected by the files that have changed – and then we stack the new data on top of that. And then you can view the stack of these two databases as a single database, from the point of view of the user. It’s indistinguishable from the database you would’ve built if you’d done a full index of the repository at the new revision. So we’re able to do that, and it’s much faster than doing the full index. Obviously, it produces much less data, and it’s non-destructive: you still have access to the base revision as well as the new revision. So you can index many revisions and have access to all of them simultaneously.
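A toy model of that stacking idea (just the lookup rule described above; not Glean’s real API):

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- The delta adds newly indexed facts and hides base entries that the
-- new commit invalidated; everything else falls through to the base.
data Stacked k v = Stacked
  { delta  :: Map.Map k v  -- newly indexed data
  , hidden :: Set.Set k    -- base entries hidden by the change
  , base   :: Map.Map k v  -- the monolithic base database
  }

lookupStacked :: Ord k => k -> Stacked k v -> Maybe v
lookupStacked k s
  | Just v <- Map.lookup k (delta s) = Just v
  | k `Set.member` hidden s          = Nothing
  | otherwise                        = Map.lookup k (base s)
```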

MP (0:52:53): Right. And can it show diffs – what changed, documentation-wise, between the previous revision and the new one?

SM (0:53:01): So showing differences between the databases is probably much lower level than what you would want, because the database is storing facts in the Glean schema, which is very low-level – it’s essentially equivalent to the data types that you would write for the AST of your language. That’s what’s being stored in the database. So you probably don’t want to see the diffs between the ASTs; you would like to see some higher-level view of what the changes are. To show the diffs, I think you would have to design some queries that extracted the data you want. You could certainly do that – Glean has quite an expressive query language designed around it. It’s rather Prolog-ish, but it’s intended to be more deterministic, like datalog. It doesn’t have backtracking. It’s datalog with a more Prolog-like syntax.

AL (0:53:57): Okay. All right. I think let’s come to an end here. Thank you very much, Simon, for taking the time to talk to us.

SM (0:54:04): You’re very welcome.

Narrator (0:54:08): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the support of our sponsors, especially the Monad-level sponsors: Digital Asset, GitHub, Input Output, Juspay, and Meta.
