Ben Gamari

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Wouter Swierstra (0:00:08): Welcome to the next episode of the Haskell Interlude. I’m Wouter Swierstr, and joining us today is Ben Gamari. Ben is a consultant at Well-Typed, known for his work on GHC. In this episode, he’ll tell us a little bit about his switch from Python to Haskell, but not because he was missing the static typing, how programming his thermostat led him to a career in compiler development, and what it’s like to be a GHC force multiplier.

Okay, so welcome, Ben. You work at a Haskell consultancy called Well-Typed, where you’ve done a lot of work on GHC. So I’m very happy to have you here, that we have an opportunity to talk about that. But before we do so, I was curious, how did you actually get into Haskell?

Ben Gamari (0:00:54): Oh, that is quite a question actually. I came maybe in a strange way. I think a lot of Haskellers have somewhat theoretical backgrounds and really love the types and the ideas in a very high-level way. But in my case, I actually came from maybe the other direction. I started programming when I stumbled upon the TRS-80 in my grandparents’ basement. It’s just this little microcomputer thing. It has a composite video port on the back, you plug it into your TV along with a tape drive. It’s a very 1980s technology and it was like maybe 8, version 9 when that happened. And I found a book that accompanied it and made me really start to appreciate, well, these machines, having a machine there that you can program, and it’s just a simple enough machine that you could reasonably understand what the thing is doing. It has a basic interpreter on it, but in the back of the book, there was a list of I/O ports and you could really get down into the nitty-gritties of what the machine was comprised of.

I started on that and I took a very, I think, sort of common pathway. I moved on from that to Visual Basic to C to C++ and some Java, C#, et cetera, and then eventually made it to Python where I sort of– by the time I was an undergraduate, I was an undergraduate in Physics and started– we did experimental research in my undergraduate insofar as you do research as an undergraduate, and I was doing a lot of data analysis and Python is the typical tool for this in the physical sciences. Up until that point, I really had a love for languages and further learning. I loved algorithmic, thinking about how to express things to a computer. And Python, it seemed like it was a great language. It allowed me to do the things I wanted to do, but on the other hand, it had some distinct limitations.

And concretely, later in my undergraduate, I started working with a QCD group and we were doing very numerically heavy simulations and targeting GPUs even, which was a relatively new technology at the time. And it really required, again, that you understand what the machine is doing. And we, our group, and we, humanity, were at that point still trying to figure out how to write efficient general versus GPU programs. And so we went through a lot of iterations and trying to explore differences in data layout and memory access patterns and all this. And every time we did that in our C++ code base, it required that we rewrite much of the program. That made me start to wonder, “Well, there’s got to be a better way.” This seems way too low-level, but also too fragile. I mean, you have no real way of expressing the idea that you’re trying to convey orthogonally to the implementation of that idea. And so that’s something that I started to poke around again and rekindled my love of, well, languages and how do you express ideas to computers.

I stumbled across Real World Haskell at that time. I read that for a few hours in the quad and promptly forgot about it. But a few years later when I was in grad school, I came across the same exact situation. I was trying to– at that point, in an experimental group, we were doing single-molecule fluorescence experiments and we were trying to figure out how to use probabilistic methods to analyze this sort of data, again, using Python because that was the tool. Everybody in the lab knew it. But you ran into the same problem that whenever you end up looping over a million points in your Python program, performance is going to suffer. And so I started to think back, “Oh, wasn’t there this Haskell thing that really at least allowed you to– it was compiled language and allowed you to express the ideas more clearly.” Because you write performance code in Python using arrays and NumPy and Cython and all these tools that people came up with. But really, I was dissatisfied with the code that resulted and it just seemed way too fragile. So I started to look at Haskell again in grad school and that was a turning point for me, I think. I really fell in love with the language at that point, having bounced it off at the first time. And I think that was like 2012 or 2011 maybe.

WS (0:04:54): Okay. So that’s about 10 years ago.

BG (0:04:56): Yeah, I guess so. Incredibly enough, yeah. I know, time flies.

WS (0:05:01): It sounds like you’ve had quite a bit of exposure to other languages. And as much as I appreciate Python, I always– I mean, I’ve had to program professionally in Python a little bit, and I’m probably like the world’s worst Python programmer because the part of my brain, which keeps track of which variables are in scope and what their types are, is atrophy to the point where I’m relying on the compiler to help me out here. And then as soon as I had to write any Python code, even just like some simple script or test framework or something, I would completely mess that up and forget to test branches in the if analysis and like, “Oh, it seems to work and this should be good enough.” But that’s not what drove you to Haskell then. It wasn’t so much the types or the static safety, but it’s the–

BG (0:05:45): No, it was really being able to write a clear program that would actually perform well. That was what I really found so compelling about it. And the composability that you could get out was nice, but I can actually write what I mean in this language and I don’t have to suffer for it.

Joachim Breitner (0:06:02): Were you able to just do this on your own, or did you have to convince your group that they should all know, read and write this strange language?

BG (0:06:09): Well, I mean, as we all know as Haskellers, it’s always a challenge—getting the rest of the world to buy in. So thankfully, within the physical sciences, you often find– in a group, there’s like one or two people which really have the computing knowledge and develop the tooling that the rest of the group use. And in my case, to some extent, that was me. At least, I developed a very deep set of tools. That was the part of my work there that I really enjoyed. That’s when I realized I think that I was probably more of an engineer than I was a scientist. I mean, I love working on hardware, the experimental apparatus. We spent a lot of time developing the actual measurement equipment and trying to make like a– essentially, I developed an end-to-end experiment, which was great. The building part, I loved. Yeah, for the most part, people used the compiled code, the Haskell programs that I wrote, not knowing that they were in Haskell essentially, or they used Python for some analyses, and yeah. So it was not an easy sell, but I also didn’t really try to sell it. That was one of the great things about that setting, is, well, you do your own thing and as long as you’re doing the research that is useful for you with whatever tools, then that’s fine.

WS (0:07:20): But this is in 2012 then. And then at some point, this is you in grad school, but how did you then transition to becoming a professional Haskell programmer?

BG (0:07:30): Kind of being at the right place at the right time, I guess. So during my time in graduate school, I found myself being increasingly drawn to Haskell as an object of fascination and maybe a side project that I could work on. And from the very moment I saw the language and the high-level power that it gave you, I wondered how is it possible that this thing produces the code. Knowing what the machine does and knowing what the high-level declarative specification that you’re writing, what happens in between those two things? There’s got to be some magic sauce in there, and I really want to know it. And so I started to dive probably in 2012 or so into the compiler a bit. I started looking at the linker of all things because I really wanted to– in keeping with this low-level background I guess, I really wanted to work on this project of having a little– I took an embedded computer called BeagleBone and use it as a thermostat in my apartment and have a network of these things that I wanted to write the control loop in Haskell because obviously, you don’t want your heater to fail. And so, yeah, I started looking at the linker simply because I really wanted my program to actually run on this arm-based BeagleBone. And I started to then work my way forward in the compiler and started to look at, oh yeah, the core, I see. And these are the optimizations that allow this code to run so well and this is why it doesn’t run so well, and then start to think about how that can be improved.

And so around 2015 or so, I thought I’d mostly done with graduate school, one way or the other, wanting to move on, do something new, and I really heard about this open Well-Typed. Austin Seipp pinged me at the time and said, “We’re looking for somebody.” And I, at the time, was thinking of moving to Germany anyways with my partner Laura, and said, “Okay, well, I guess I’ll put it in an application. Why not, right?” And that somehow just happened to pull the right people. And now they hired me and thankfully they were looking for somebody to work on GHC, which was really– that was what I really wanted to do, was work on GHC.

WS (0:09:45): So it’s actually programming your thermostat, which led to–

BG (0:09:51): Yeah. So very indirect, right?

WS (0:09:55): Okay.

JB (0:09:56): That’s a very well-shaved yak then.

BG (0:10:00): Well, I mean, unfortunately, the thermostat project never really got to the– like so many projects that one starts, it’s an ongoing project. In fact, I have an ancestor of that project in the wall behind the screen over there. It no longer runs Haskell, but–

WS (0:10:19): Oh. So you’ve done a lot of different things for GHC. So maybe we can talk about the garbage collector that you implemented a little bit. So how did that go?

BG (0:10:28): Oh boy. Yeah. Well, how did it go? So it’s still going actually.

WS (0:10:31): Yeah. Okay.

BG (0:10:33): In multiple ways. So, yeah. Essentially, this whole project started with a client of ours saying, “We want to deploy Haskell more broadly on our server side applications.” But ultimately, it’s a server. And consequently, you need to– it’s possible to work around slow code. It’s very hard to work around code that is just very highly variable in latency. And so this is what– they wanted ultimately to have some approach and it was fairly open-ended. We proposed a concurrent collector, but there was other avenues you can take as well to reduce that latency variance. And well, it was a stretch project for me. I had always been fascinated by storage management, but really had never worked on a garbage collector. I worked in various parts of GHC’s storage manager, but never still lost that I did not know at that time. And so I had great support within Well-Typed to say, “Let’s just put in a proposal. If you think you can do it, then let’s write up a plan and see what the client says.” And we did. And they apparently liked the proposal.

We started that work in– I think it was probably 2017 at that point. That was a slow start. It took a while because at the time, it was me and– I’m not sure if Omer had started yet. There were essentially two of us working on GHC under the maintenance contract. It was a challenge to try to keep GHC moving and also get deep time to think about this collector, did a lot of reading, trying to get up to speed on how do you design a concurrent collector. It’s a non-trivial thing. Essentially, you’re trying to traverse this graph, the heap graph, the heap reference graph, while it is being mutated in an efficient way, but also in a safe way. You have to make sure that you don’t stumble across an object that’s being mutated, and if you do mutate something, you have to make sure that the old object is still reachable in some sense that you don’t collect. So it took probably three months of just reading. And that was primarily during writing the proposal, trying to come up with an idea and see what is the state of the art here and how do we fit that more importantly into what GHC already has, because we had a fairly, not small budget, but it was not infinite either. So we needed to keep things fairly low risk and be able to be confident that we can actually implement this thing correctly.

So we merged in, I think, 2019 it was, and that was in many ways the beginning because we did a lot of testing on it. But you can never– in a concurrent algorithm like this, it is very, very difficult to test with the sufficient depth that you’ve actually caught all of the bugs. And if this project has taught me anything, it is that systems can be very horribly wrong and yet still somehow limp along, suggesting that everything is just fine. Thankfully, we caught most of those bugs before release, but we recently just– boy, this last month I caught a bug which was essentially preventing weak objects from getting ever finalized in some cases. There’s these difficult cases. GHC has a very simple heap layout, but a lot of cases. So there’s a lot of interactions that you need to deal with. And making sure that you have considered all those interactions was really the challenge there.

And so I was disappointed frankly. I looked at the tools for verification and less thorough, not necessarily proof-based verification, but model checking in these sorts of tools and wasn’t really able to find a tool that would allow me to both be able to develop an expansive enough model, in the case of model checking that it would actually catch the issues that I want to catch, which are, in many cases, the unknowns. That’s the interactions you didn’t think of that are so difficult. So that’s the issue with model checking, is you have to know what you’re looking for to develop the model, know what’s relevant. And the problem then on the more proof-based side of things is, well, you could easily spend as much time proving correctness as you did actually right in the collector.

WS (0:14:57): If not more, right?

BG (0:14:58): Yeah, that’s right. Easily. So yeah, that’s something that I’ve been really wanting to think more about. How do you actually test these sorts of things? And maybe the idea isn’t necessarily you saying more at compile time, especially because in C, which is in GHC, we have a runtime system written in C which is where most of the garbage collector resides. And then a lot of code written in C–, which is our intermediate representation, which will hold prim ops and whatnot are implemented. And then you have this third component, which is the code generated by the compiler, the code generator. And so there isn’t really one representation. You’re not even talking about one language when you’re talking about proving the correctness of something like this or establishing the correctness of a system like a garbage collector or runtime. You need to think about these three and how they interact in this often very difficult-to-characterize way. These interactions are not trivial and there’s a lot of them to think about.

WS (0:15:57): It must be a huge challenge, right? Because it’s– like you say, debugging these concurrent programs is very, very hard. And then I mean, GHC is a big test suite and you can run lots of code from Hackage or something, but once you release it into the wild, you know that there will be people who run code that you never thought of. And all of a sudden, it triggers a corner case that you didn’t have in your test suite or that forks thread in a certain order. And that must be– that’s a really tricky part, I guess.

BG (0:16:26): Yeah. So I mean, there were, in the course of this project, literally weeks spent in the debugger on single bugs. This is the time scale and you just have to put essentially everything aside and just completely immerse yourself in it to really get the feel. Because when you’re debugging a GC bug in particular, all you know is something went wrong sometime. I mean, it may have been like five minutes ago that the actual issue happened or three GC cycles ago, but you don’t actually see it manifest until you happen to look at the object that was affected or, yeah. So it’s a really very tricky set of problems.

JB (0:17:05): Did you improve your test tooling while doing this?

BG (0:17:08): Yeah. So we have a battery of essentially micro tests that exercise individual heap closure types and making sure that you can expect that weak objects are going to get finalized now promptly, which is not something we had previously. That was the bug that I– but yeah. So we have a set of tests like that. One of the things that I did not do, that in hindsight I think I would’ve put more effort into, was really come up with a means of being able to reproduce arbitrary interleavings of GC and mutator work such that you can actually write tests that are not dependent upon like the scheduler.

WS (0:17:51): So you can replay a particular series of mutation GC–

BG (0:17:56): That’s right.

WS (0:17:57): That trigger the bugs that you can reliably try to retrigger the same bug after a half dozen commits later.

BG (0:18:05): Yeah, exactly. And that’s actually something that recently we’ve thought a bit about again, because right now– actually, last year and into last summer, I had a student, and she’s actually still working on this even now, together with Steve Blackburn, who’s sort of a GC person, who’s responsible for MMTk, which is a very venerable– Memory management toolkit is the name of the library. And it used to be a Java library, and now they’ve poured it to Rust. And our goal has been to see what would it look like if we poured a GHC to use MMTk. So we’re relearning all of the things I learned together with Junming, the student, about how the failure modes go in GC work. And so we’ve been thinking about, okay, maybe we could actually do a quick textile test so if you had the ability to produce into leavings and allocate, adjust constructor and then enter this stunk and then do a GC, it’s like model checking, but in a somehow more Haskelly sense. So you have a bit more control over the test that you’re generating.

WS (0:19:09): There’s a lot of quick check work in that direction as well, right? They try to make some kind of API manifest and then generate a bunch of random calls to the API, which have a certain structure in order to observe the behavior of some bigger system. And then even if you don’t know all the functions hidden behind the API, you can still generate a whole bunch of traces and still maybe observe unexpected behavior or a desirable behavior or something like that.

BG (0:19:37): Yeah, precisely. I mean, we know very concretely what the invariants that we expect to hold are. So asserting those on arbitrary traces already goes a good way towards establishing correctness. But in the GHC runtime, we already have a sanity checker which can do this. The problem is we just don’t have a lot of control over how the state of the system evolves prior to hitting the sanity checker. What concretely happens; again, what interleavings of mutator, mutation work, and runtime work are encountered. So it’s still lots of work to be done here, and I think it’s an interesting set of open problems.

WS (0:20:14): Yeah, for sure.

JB (0:20:15): Can you describe the idea behind the concurrent garbage collector in a way that you can transmit through radio?

BG (0:20:22): Yeah, I can try. So really, most concurrent collectors, it’s a mark-sweep collector, and mark-sweep, we call it the non-moving collector because essentially, let’s step back. What are the constraints of the collector? Well, we wanted to make sure that we didn’t have to touch the mutator code. We wanted to make sure we didn’t have to recompile the program. And the mutator code that GHC generates has its allocation logics that are baked into it. So we couldn’t really change out the allocator. So the idea is we just take, and we use the bump pointer nursery, which is how the allocator that is baked into the generated code is by a bump pointer nursery. We’re just going to allocate a block of memory and just allocate, first, one object. And then directly after that, you’ll place the next object. And directly after that, the next object until you run out of memory. Until you run out of memory in your block, and you’ll do a GC. And so we’re going to use that same nursery. No change there, but we’re going to, instead of moving old data, once data has lasted for a few GC cycles, we’re going to move it off into the non-moving section of the heap, which is going to be collected via a different collection strategy. So whereas in a typical– in GHC’s old collector, and then in the copying collector, we use a copying sort of two-space or semi-space collector where we’re going to walk the entire reachable heap and copy all, everything that we find live into a new section of memory, into a fresh block of memory, which then is where that date is going to live until the next time we do a garbage collection. So every time you do a garbage collection, you’re going to copy essentially everything that you–

JB (0:22:02): That’s the latency that your client–

BG (0:22:04): And that is the reason why it’s high latency, exactly, is because you need to– during a pause, the mutator has to be not running when you do this copying for a reason. It’s very difficult to do a copying collector to minimize latency in a copying collector because everything is potentially moving. So if I have an object A, which is going to refer to an object B, and if I move B, then I have to make sure that before that change is visible, that I update the reference in A. Otherwise, the mutator, if it looks at A, it’s going to see an old reference. It’s now A should be pointed over to its new position in so-called to-space, but it’s still pointing at its old location. So we took what is a fairly common approach here and just instead went with a non-moving generation for the old generation, which is the only generation that is unbounded in size and therefore unbounded in latency. And we collect that non-moving generation with a mark-sweep collector. So instead of copying, we’re just going to leave the data where it is, but keep track of whether it’s live or not in a bitmap. And we have an allocator, which is based on a paper, a very clever allocator designed by Bueno, who allows us to allocate into these non-moving segments we call them, and be able to reclaim data and reallocate into those blocks after we reclaim them.

And the nice thing about a mark-sweep collector is that the marking effort can be done concurrently with mutation. You don’t have the problems of a copying collector or you need to make sure that the GC is atomic, and you have to make sure that the mutator can’t see any intermediate states during collection. But in the case of a concurrent mark-sweep collector like ours, you can safely just carry on marking. You’re only updating the metadata, not the data of the heap. And that allows you to be able to safely do the bulk of the work concurrently. And then it’s only just a small, short, hopefully, synchronizations are necessary for where the mutator needs to be interrupted.

JB (0:24:05): And is that something that I would want now in all my Haskell programs or is that really for special applications?

BG (0:24:11): Yeah, no. I mean, there’s definitely a latency throughput tradeoff here. Copying collectors are actually remarkably efficient for a number of reasons. You have the fact that they tend to improve data locality because if you think about what a collector is going to do, it’s going to– when it encounters object A and again, A refers to B, it’s going to copy object A. And then soon thereafter, it’s going to encounter object B just by the order in which we traverse the heap. And we’re going to copy B temporarily at a similar time to A. And therefore spatially, it’s going to end up in a similar region of heap, close by region of heap, which is good for cash utilization. And moreover, copying is very cheap and the allocation is even cheaper. Since it’s a bump pointer allocator, the only allocation state– allocator state rather is just where you are in the block that you’re currently allocated into. So allocation is just a single add instruction essentially. It doesn’t get much cheaper than that. Whereas in the non-moving collector, you’ve got to search where is the next block that’s free. And then update various metadata to record the fact that you’ve allocated into that block. And it’s a more involved process. So you pay a little bit in the cost. Maybe between, in my experience, 1 and 10% in throughput depending upon the program by using the non-moving collector. But in exchange, you get one or two orders of magnitude improvement in latency.

If you do care about latency, it can be a really great tradeoff. But that usually is only a concern if you have– there’s a few criteria that you need to meet in order to actually get benefits from this collector. You need a very large heap. Over a hundred megabytes really, ideally gigabytes. You need to care about latency. And there’s certain patterns of mutation that we still don’t deal with particularly well overheads if you do a lot of array mutation, like you have a boxed array in particular. Unboxed arrays are not a concern here by arrays and such, but a boxed array suffering a lot of mutation tends to result in more work that the collector needs to do because of an invariant that it needs to maintain internally. Something that would be interesting to look at resolving, but hasn’t really come up in any relevant applications thus far. So we’ve not really bothered.

WS (0:26:30): Okay. So besides all of this work on the garbage collector, you play a very important part in the management of the GC developer community, going through issues and welcoming newcomers. So how did you go from the very low-level stuff into suddenly this more social role of managing a massive open source project like GHC.

BG (0:26:51): I guess by need. I mean, for a long time, the work that Well-Typed did on GHC was funded by Microsoft Research under contract from Simon Peyton Jones, who really instilled from the moment I joined, essentially, that your job is to be a force multiplier. You are only one person, but there’s a whole entire community of potential contributors out there and we need to make sure we leverage– we’re able to get those contributions upstream and foster that community effectively. And it’s a tradeoff because ultimately, there are bugs that you need to handle to get the release out, but then there’s also the pile of review work that you need to do alongside it to make sure that community members feel like they can engage.

So that was a pretty hard thing to juggle over for a while. Thankfully now we have more people, and I think that’s really helped a lot. In the past few years, we’ve gained a few more team members. So now we have essentially four people working pretty close to full-time on GHC on various contracts. And that makes it much easier to balance because ultimately, there’s only so many hours in a day. But sometimes there are developments in the community that you have to respond to. And that was what– the GHC proposal process was one example of this. There was community members who rightly pointed out that GHC development and language evolution seemed like it was closed. It seemed like there’s a small number of people who have the ear of– the powers that be. And if you’re one of those people, great. You can get your contributions into the compiler. And if not, then well, tough luck. And so we really felt like, yeah, well, that can’t stand because that’s really exactly the opposite of what we’re trying to achieve here. So that was something that we thought, okay, well how can we make this more open? And thankfully, Rust has really served as a great source of inspiration in this regard. They have really served as a great template for how you foster community and we’ve been trying to emulate that. It’s tricky, but–

WS (0:28:53): Is it a rule you enjoy?

BG (0:28:55): Yeah. I think it’s great to know that there are people who care enough about the project that you work on to actually try to engage. I won’t lie. I mean, sometimes you look at the queue of tickets and merge requests, and you say, “Oh, is it ever going to end?” But that being said, then you think about, well, but each of these people is somebody who ultimately wants to help. And that’s really the important thing. And it’s just something that few projects can say at the scale that a project like GHC can, that they have a really engaged set of users.

WS (0:29:32): So another thing that I think that you’ve spent quite some time on is also making what you might say GHC commercially relevant. These were your words that you suggested.

BG (0:29:44): Yep.

WS (0:29:45): Tackling things that maybe if you’re exploring some fancy new type system feature are academically very exciting, but there’s a lot of stuff which also needs to be done in order to make the language usable and commercially viable or make it ready for being more widely adopted, I guess. So is there anything there that you’re particularly proud of?

BG (0:30:05): Oh boy. Yeah. I mean, there’s a lot. Ultimately, that’s a good fraction of my work there, is ultimately– because what it comes down to is we have a lot of people in our community who are very enthusiastic about the surface language and the type system and there’s no shortage of contribution there, but the less glamorous part of, well, okay how do you get a backtrace of your program when it crashes? How do we ensure that we run correctly and efficiently on Air 64 when Apple is deploying that architecture more and more? How do we get releases out and be certain that programs that have been written against previous releases will continue to run against the new release? All of these infrastructural things or work that either focuses on the more back end of the compiler or the bits that are really almost more DevOpsy things, but need to get done.

We’ve done a lot of work recently in improving our testing story for our backend. That’s something that I really– this test prim ops package that essentially uses a quick check style approach to testing the correctness of CMM backend. So you can actually run in cross-compiled and non-cost-compiled forms against a reference interpreter and get a reasonable assurance that at least some fragment of the CMM implementation of that target is correct. In GHC, we tend not to use too many fancy types and whatnot. There’s a few got-its to be found here and there, but some type families, thanks to Tables– Trees that Grow rather. But yeah, that was a– because it was a greenfield project, I could design it as I saw fit without the constraints of working with an existing code base. And that was a nice change.

WS (0:31:47): So you’ve also done quite a lot of work on exception provenance recently where, like you say, sometimes your program crashes, you get head): empty list or you get some funny exception and you want to figure out where this comes from. Just getting your hands on the stack trace is, I mean, not so easy. And then even if you manage, it’s not always so informative. So why is this so hard in GHC?

BG (0:32:10): That’s a really good question actually. Yeah. So there’s actually an entire thesis on this topic, essentially. Peter Wortman around 20– what was it? 2013 I think he probably graduated, wrote his– I think his master’s thesis (but it might have been PhD, I don’t remember) on essentially introducing instrumentation within GHC, metadata within GHC’s intermediate representations to be able to track the provenance within the source program of core expressions, and then CMM beyond that and being able to track, well, line 51 in the original source program lowered to these instructions in the actual object code that we emit. It was already a pretty good start and he had a really great treatment there of how do you reason about causality in a lazy language like Haskell because it’s actually quite tricky. So one way of getting a backtrace of sorts is of course to use the cost center stack. And that gives you a very precise stack, but it requires that the compiler and the runtime system track every point in evaluation, sort of which cost centers the programs currently enclose the current point of execution, which comes at a very high cost in runtime, which is why people often, especially when it comes to debugging CPU usage, sometimes the cost center profiler is not the most representative.

JB (0:33:29): So that’s the mode you get when you run this -prof, correct?

BG (0:33:32): That’s right. Yeah, exactly. And -prof-auto, et cetera. Yeah.

JB (0:33:35): You have to recompile everything to get it running. And if you have libraries installed and you don’t have that profile versions, then it’s tricky.

BG (0:33:45): Yeah. So that is a rather annoying– the fact that there are deployment challenges in using the cost center profiler is one thing, but then there’s also, in addition to that, the challenges in the overheads involved. So the challenge there is that how do you track the state of execution in a way that is implicit, that you don’t need to, unlike the cost center profiler, represent the execution state explicitly in the state of the SDG machine, which is how the cost center profiler does this. And it’s tricky for a number of reasons. For one, for a long time, I tried with pretty good success to make GHC emit DWARF information, which DWARF is the standard that is used by most unices for representing debug information. But it’s quite tricky for a variety of reasons. It’s not particularly well suited to the code. It’s heavily inlined code that GHC produces where we don’t necessarily– it’s hard to say if you have a data.vector.map of some function, you have some instruction that is due to that map, well, which of the source spans of that original program is that instruction actually due to? Is it due to the F? Is it due to the vector.map? Is it due to the enclosing context? DWARF requires you choose one and that’s it. Peter in his thesis works around the– explored various ideas on how to use, how to work around that limitation essentially. So that’s one limitation of DWARF. Another is that it tends to be quite slow and unwinding the stack. So to actually use that information to back out, the state of the execution is quite painful in many cases. And it’s very platform-dependent. And that really, I think, is what killed it in my mind. It’s a very useful debugging tool. It’s not so great for getting reliable, easy-to-interpret stacks out of your program state.

So we spent a fair amount of time there. And I think it is very useful as a debugging tool as I said, but ultimately the IPE, the Info Table Provenance approach that Matt Pickering implemented is really the best option for us going forward because it’s relatively simple and yet reliable and platform agnostic. And so really, it just took us a while to come to that realization. And that is the reason it took so long. If we had known that, oh, well, DWARF is very helpful for native programs but not so great for Haskell programs out of the gate, then we probably would’ve tried this a long time ago. I don’t save ourselves a lot of time. Yeah, because the IPE mechanism is quite simple, and yet it gives you, in many cases, modular limitations in our ability to track provenance through the simplifier almost as clear backtraces as you would get from the cost center profiler.

WS (0:36:33): What’s the idea behind this Info Table Provenance approach?

BG (0:36:37): Essentially, it exploits a feature of GHC’s evaluation of a heap model, where every closure on the heap, including stack frames, have a so-called info table, which is used by the garbage collector to decode the structure of the corresponding heap closure, so it knows which things are pointers and which are not. And we just use that pointer as a key, have a sort of off the side, a table of lookup structure, mapping info table pointers to their source locations, the source provenance. And then we can very relatively easily just build a backtrace. For instance, we just walk the stack, we look up the info table of every stack frame that we find on the stack and there is your backtrace. So that ends up being a very relatively easy implementation-wise. Very simple, much simpler than DWARF approach to address quite a few questions regarding the runtime state, including profiling and backtraces, et cetera.

JB (0:37:36): This will give you the evaluation trace, right? Like which things are causing which other thing to relate, not necessarily where has this other thing being defined. So it’s not–

BG (0:37:44): That’s right.

JB (0:37:45): It’s not the stack trace you would naively expect if you’re not used to laziness yet by looking at the source code. Is that a good thing or a bad thing?

BG (0:37:52): Yeah, it really depends. Sometimes I think there’s going to be cases where you just need the cost center profiler, especially when it comes to things like tail calls. IPE information actually cannot give you– essentially, the stack unwinding that you get from an IPE profile is going to tell you where you’re going to go next, where execution is going to flow to, not who called you. So if you have a tail call, in most languages, those are the same thing. If F calls G, then once G returns, you’re going to be back in F. So the stack is going to essentially look like, first, there’s going to be F on the stack because that is where you came from and that is where also you happen to be going to always in most languages. That’s not true in Haskell, though. We have tail calls where F will call G, but the return frame that’s going to specify where control flows to actually goes to H. All right, this other function. And so from the IPE stack trace, you would actually see that, “Oh, well, it looks like H called me, but no, no, that’s not true. That’s actually just where I’m going next.” And you’ve lost all chance to–

JB (0:38:59): It sounds like a bad time-travel movie.

BG (0:39:01): Yeah, that’s right. And reconstructing that “Oh, F is how I got here,” is all but impossible in some cases. Now, in most cases though, we find that’s not actually all that problematic. With a bit of domain knowledge in the program, you can build a mental map of, “Oh, well, if I’m going to H, then I must have come from that.” I mean, it’s just something, a conclusion that you can draw knowing a bit about the structure of your program. That’s not always true, though.

JB (0:39:28): So if you find one of these unexpected return places on the stack, you probably have to look at functions that also call this thing, because that’s the thing that put the H on the stack?

BG (0:39:40): Yeah, that’s right. That’s right. And thankfully now, we also have really great tools in the cost center profile, which really complement that need or address that need quite well. We have this as of– I think it was 9.2, I added a profiler mode where you can tell the profiler to only add cost centers at call sites of a given function. So if I care about all call sites of map, I can just say anytime there’s a call site of map and close that in a cost center. And those can be the only cost centers in your program, which for the most part will greatly alleviate at least the overhead that the cost center profiler results in. And so that really makes it possible to use these two methods in conjunction. You have the IPE profiling which can give you, at the very least, a high-level view of, “Oh, well, I’m somehow ending up in G. I don’t know who calls G, but then I can rely on the cost center profiler to answer that question of who calls G.” I think in many cases, these tools complement one another quite well.

To answer your original question though, after you can actually collect a call stack, then you need to have some way of actually conveying that to the user. And that becomes more of a question of library design. And that is something that, of course, is actually where the proposal is currently not stuck, but where it is currently is we’re trying to get the proposal through the GHC steering committee and be able to move forward with implementation. There’s an implementation that’s actually largely done, but it just didn’t quite make the cut for 9.6 because of the proposal.

WS (0:41:16): So this covers a lot of things that you’ve been up to recently, but if you look ahead, what are you most excited about on the horizon, the kind of things coming up?

BG (0:41:27): Yeah, there are a few things that are as exciting as web backends): the JavaScript and WebAsm especially. It seems like everybody is moving in this direction of being able to target these web languages, web representations. And I think we’ve seen even just with GHCJS what people can do, which GHCJS is great, but it has so much friction in using it that it’s remarkable. I’ve used it a bit. And it’s great if you use Nix, but ultimately, you need to use Nix because it gets quite hard to manage otherwise. Having a first-class backend for WebAsm that you can just say download a GHC binary distribution, ideally, the same binary distribution that you use to compile your X86 code, and just say, “Oh, –target=WebAsm,” and out pops a WebAsm module. That is something that I’m very much looking forward to. And there’s a lot of work that needs to be done there.

GHC currently, for largely historical reasons, still makes a lot of assumptions about its platform, its target platform. It’s essentially a compiler whose target platform is defined when you compile the compiler. It’s not runtime retargetable. So recently, Sylvain Henry and John Ericson and I, and a few others within Well-Typed have been looking at lifting this. Sylvain and John have been doing a lot of work in refactoring GHC to make the– what information do each of the components of the compiler actually need to do what they need to do. And then it makes it easier to reason about, okay, well what are the dependencies that we need to break in order to make sure that we can safely rechange the target at runtime? It makes it easier to reason about what parts are actually dependent upon the target. And then I’ve been working recently on doing a bunch of the infrastructural work in making the toolchain dependencies satisfiable at runtime so you can, for instance, add a native toolchain at runtime or add a WebAsm tool at runtime so that you don’t have to– essentially, it makes it possible to download a binary distribution and then use that to target any number of platforms after you’ve installed it. Whereas currently, the toolchain is found when you install the bindist and there is no changing. So that is the work that I think– hopefully for 9.8, I’m hoping we can get that in. And that will make the whole workflow regarding cross compilers in general and WebAsm in particular much, much smoother.

There’s still some issues that we’ll need to work out. Template Haskell, in particular, poses a real problem when it comes to cross-compilation. In WebAsm, it’s not as bad because you can run WebAsm code using an interpreter on the same platform as the compiler itself runs on. But ultimately, if you want to cross-compile two ARM or something like this, then you need some way to evaluate Template Haskell splices. And that requires either machine emulation or a change in Template Haskell semantics to allow us to run the splices on the host instead of the target.

JB (0:44:38): I was always wondering, why is that not simply the case?

BG (0:44:41): Yeah, I think largely, because nobody thought of it. It’s always been possible to cross-compile GHC, but it was never really a first class. It’s always been a bit awkward. And I think Template Haskell in particular, I’m not sure that there is a good reason why things are the way they are other than the fact that it tends to be a bit tricky. You have to place restrictions on your splices. For instance, if you imagine you have a splice that mentions the int type. And you’re on a 64-bit. Compiler is running on a 64-bit platform, but you’re targeting a 32-bit platform. What is int in that case? It’s not actually all that clear. So if you lift an int, because we have then the lift type class, which allows you to move values between these two spaces, then all of a sudden you have the possibility that, oh, you need to convert a 64-bit int to a 32-bit int or vice versa. And that’s something that we will need to address if that’s the mechanism by which we’re going to make Template Haskell in a cross-compiled setting feasible. Essentially, up until now, people have been using emulation and it works well enough in many cases.

WS (0:45:47): On that note, thanks very much for joining us, Ben.

BG (0:45:49): Thank you for having me. It’s been great.

JB (0:45:51): Yeah, that was fun. Thanks a lot.

Narrator (0:45:56): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the support of our sponsors, especially the Monad-level sponsors: Digital Asset, GitHub, Input Output, Juspay, and Meta.

23 – Ben Gamari

Related links

Transcript