60 – Tom Ellis

Recorded 2024-09-18. Published 2024-12-20.

Tom Ellis works at Groq, using Haskell to compile AI models to specialized hardware. In this episode, we talk about stability of both GHC and Haskell libraries, effects, and strictness, and the premise of functional programming: make invalid states and invalid laziness unrepresentable!

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Niki Vazou (0:00:15): This is the Haskell Interlude. I am Niki. My co-host is Matti.

Matthías Páll Gissurarson (0:00:19): Hi.

NV (0:00:20): Our guest today is Tom Ellis, who works at Groq, using Haskell to compile AI models. Today, we talk about stability of both GHC and Haskell libraries, effects and strictness, and the premise of functional programming, which is to make invalid states and invalid laziness unrepresentable.

Hi Tom, do you want to tell us a little bit about your background and how you got to know Haskell?

Tom Ellis (0:00:54): Well, I’ve always been interested in computers and computer programming. I started at a young age with the BBC Model B, which was a computer that was brought out in the UK when I was a youngster. It had BBC BASIC, a simple programming language, and I was fascinated by programming this computer. In fact, even before BBC BASIC, there was Logo, which was a program which allowed you to draw pictures by telling a pointer to move around the screen, giving it instructions, move forwards, turn right, move backwards. And that was really fascinating to me.

So, little by little, I picked up more and more programming. I moved on to QuickBASIC that was available on PC computers as I got a bit older and enjoyed programming games and other things that let me explore what programming could do. And then later in my teenage years, I thought, ‘I’d better learn C because that was a serious language.’ And C really blew my mind because there were so many dangerous things there that you could do that would make things go completely wrong. And you wouldn’t know really why they’d gone wrong or when they’d gone wrong. Just you discover sometime later in your program that something was terribly wrong. You had to manually allocate memory, remember to free it. And so I thought, ‘This was very high-tech and advanced.’ But at the same time, I thought, ‘This doesn’t seem like the right way of programming. There’s a lot of bookkeeping to do, a lot to keep in your head, not the fun bits.’

So then I discovered Python, and that was the opposite because it took care of everything for you and just let you get on and write what you wanted. So, that was a lot of fun. Very enjoyable. Kept that up all through my university days. And I studied mostly mathematics, but also computer science. And it was only when I started working that I really discovered Haskell, despite having known a bit about ML, which is a functional language, one of the early functional languages from my computer science studies. I didn’t really know about Haskell until I started working professionally programming Python. And I thought, ‘Actually, I’m finding a lot of difficulties programming Python as well. There’s also a lot to keep in mind. There’s lots of things that can go wrong. And Python isn’t really helping me with them either.’

Then I discovered the Haskell Reddit, and every other post on the Haskell Reddit was basically explaining a way that I could solve problems I was having day to day in my job with Python, make my life easier. Stop bugs creeping in, make it easier to write what I wanted to write and not what I didn’t want to write to check my program before it actually ran in production and tell me if something was wrong with it early when I could fix it rather than late when it had caused a bug and annoyed other people. So, that’s when I started getting into Haskell. And then I just got hooked. And ever since, I’ve been completely hooked on Haskell.

NV (0:03:37): Your university was in mathematics, not programming?

TE (0:03:40): That’s right. I focused mostly on mathematics, although I did a bit of computer science. But I kept up an interest in computer programming, and it was one of my hobbies, so I was always very interested in it. So, I’ve learned a bit about computer science from my university and a bit of computer science from the work I’ve done with Haskell because, to get into the guts of Haskell, it can be very useful to pick up computer science knowledge and techniques and so on.

NV (0:04:05): What were you programming in Python when you started with the Haskell Reddit?

TE (0:04:09): This was my first job when I was working for a hedge fund. So, I started working in finance, which is quite a common thing for people who’ve studied maths to do, at least in the UK. There were lots of interesting technological challenges to solve there on the boundaries of programming and mathematics. 

I was a huge enthusiast of Python. I really loved it. It was my first language, which really clicked with me. And I thought, ‘This just makes me love to program.’ But then at the sharp end, actually putting it into production, I felt the error rate of this just seemed too high. This is not what we really want for production software in an industrial setting. And yeah, that’s when Haskell became more and more appealing to me.

MPG (0:04:49): Right. But it’s quite a big jump to go from C to Python and then completely into Haskell. Did you stop somewhere on the way? I mean, you went a little bit into ML.

TE (0:04:59): Yeah. So, I’d had some introductory lectures to ML at university. So, I knew about functional programming from those days. In fact, I always look back and remember that one of the programming tasks we had to do as homework was to write something that was essentially a loop. And so I thought, ‘I know how to do this. I already program in Python. I know BASIC, I know for loops.’ I thought, ‘I’ll do the homework before we’ve even been taught it.’ I was really confident and just thought, ‘I’ll arrogantly solve the homework before we’ve even been taught it.’ So, I searched online, “How do you do a for loop in ML?” And the answer was, there is no for loop in ML. You have to do recursion. And I thought, ‘What is this language? This seems absurd.’ I thought, ‘I’ll just go back to Python. That’s much easier, much more straightforward.’ But at least I’d had that exposure to ML, and I’d already had functional ideas presented to me, which was really helpful when it came time to adopt Haskell.

MPG (0:05:55): But then you have the Bluefin library, right? You still kept running into these things breaking production, I guess.

TE (0:06:02): That’s right. And in a sense, my effect library, Bluefin, that I’ve been developing fairly recently is a way of writing Haskell in a direct style that is more familiar to me and to people who have come from imperative backgrounds. And I think it’s a more direct style for turning ideas into code, but at the same time, without losing any of the guarantees and safety that we really like from Haskell. So, I see it as a way of getting the best of both worlds.

NV (0:06:35): And how do you do that? Do you want to explain, for a listener who doesn’t know exactly what an effect is, maybe?

TE (0:06:42): So, I think I should start from the very beginning, all the way back in time. There was a time when Haskell didn’t really have any effects in the way that we know them today. You could communicate with the system that you were running on only through reading and writing lists. If you wanted to print something to the screen, you had to emit an element of a list to say, “Print this thing to the screen.” This was very awkward to work with. Skipping over a lot of details, then there was the introduction of the monad abstraction as a way of allowing users to express programs as a sequence of statements, essentially, which was very useful for allowing Haskell to conveniently do things that interact with the system that the Haskell program is running on, the operating system. So, printing to the screen and reading from the terminal, reading and writing files and drawing on the screen and connecting to the network, and everything we need to do in real programs.

NV (0:07:40): The IO monad.

TE (0:07:41): The IO monad, exactly. This was satisfactory for doing stuff, but it’s not very satisfactory for the promise of functional programming, which I really interpret as making invalid states unrepresentable. And that means you want to have some control or some knowledge over what different components of your system are doing. You don’t necessarily want to write a subroutine in your program that’s supposed to, for example, do a calculation based on some data that’s been produced, say, calculate some statistics, calculate a mean, calculate a range. You don’t want a function like that to be able to write to your hard disk or write to the network. 

If any subcomponent of your program can do anything, literally anything that your computer can do, that’s dangerous from a security perspective. It’s confusing from the perspective of somebody who’s writing or reading the code. It’s really nice to be able to build programs out of smaller subcomponents and have some degree of control or knowledge over what those subcomponents can do. 

So, for this reason, people in the Haskell world introduced the notion of monad transformers. This was a way of getting access to things like mutable states, like exceptions. Useful things that are very convenient for structuring programs, but don’t go all the way to allowing your components to do any possible operation that a computer can do. You have control over what the range of possible things they could do is. And that works well for structuring programs, but there are a few problems with monad transformers and MTL, which is a library based on them.
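The transformer/MTL style described here can be sketched with the real `mtl` package; the `tick` function below is an invented example, but the signatures are the standard MTL idiom:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, StateT, get, put, runStateT)
import Control.Monad.Except (MonadError, Except, throwError, runExcept)

-- Invented example: this signature says the component may read/write an
-- Int state and throw a String error -- and nothing else. It cannot
-- touch files, the network, or anything else the computer can do.
tick :: (MonadState Int m, MonadError String m) => m Int
tick = do
  n <- get
  if n > 100
    then throwError "counter overflow"
    else do put (n + 1); pure n

-- Committing to a concrete stack only happens at the edge of the program:
runTick :: Either String (Int, Int)
runTick = runExcept (runStateT tick 0)
```

Here `runTick` evaluates to `Right (0, 1)`: the old counter value, paired with the updated state.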

Skipping ahead over many details, of course, because there are lots of details of this, modeling effects using these transformers or MTL style whilst also preserving some other properties that you want your program to have, such as resource safety. That means if you’ve opened, for example, some file handles and you’re using them to read in some data or to write some data to, well, if your program exits from a certain component, a certain flow of control, and has forgotten to close those file handles, well, that’s really bad because there can only be so many file handles open in your program. Your program may later crash with an out-of-resource error. And that doesn’t play too well with the monad transformers style of things.

So, the pendulum swung back in the other direction, and Michael Snoyman became a proponent of what he called the ReaderT IO pattern. That is a way of using the IO monad, which can do everything, but also threading around some components that you need for your program in what’s called a ReaderT monad that allows you to store an environment of essentially local variables and pass them from call to call and do useful things with them. So, you might put a database handle in there if you’re writing some component that interacts with a database. You might put that in the environment of the ReaderT, and that works quite well. But as we mentioned before, if you’ve got an IO monad at the base of that ReaderT, well, then your component can still do anything that the computer can do. So, you, again, don’t have this nice property of functional programming of making invalid states unrepresentable.
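A minimal sketch of the ReaderT IO pattern (the `Env` type and `greet` function are invented for illustration; a real application would carry database handles, loggers, and so on in `Env`):

```haskell
import Control.Monad.Reader (ReaderT, runReaderT, asks)
import Control.Monad.IO.Class (liftIO)

-- The environment threaded to every component.
data Env = Env { envGreeting :: String }

type App a = ReaderT Env IO a

greet :: String -> App String
greet name = do
  g <- asks envGreeting        -- read from the shared environment
  let msg = g ++ ", " ++ name
  liftIO (putStrLn msg)        -- IO at the base: anything is still possible
  pure msg
```

Running it is just `runReaderT (greet "Tom") (Env "Hello")`. Note the caveat from the discussion: because IO sits at the base, nothing stops `greet` from also opening network connections.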

So, the final swing of the pendulum till now is Andrzej Rybczak’s idea when he developed the effectful library, which was released in the last two or three years, I think. That takes the ReaderT IO pattern, but it uses constraints in the Haskell type system to allow you to see what operations a particular component is allowed to perform. So, this has the benefits of the transformer style. You can limit what certain components can do because the type system gives you a limitation on what the components can do. But at the same time, you’re running in IO under the hood. So, the effectful library is essentially ReaderT IO, but wrapped in a system constructed from Haskell’s type system that makes sure that you can limit the effects.
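In effectful, that type-level limitation shows up as a constraint on the effect list. A rough sketch, based on my understanding of effectful's published API (the `bump` function is an invented example):

```haskell
{-# LANGUAGE DataKinds, FlexibleContexts, TypeOperators #-}
import Effectful (Eff, (:>), runPureEff)
import Effectful.State.Static.Local (State, runState, get, put)

-- The constraint `State Int :> es` is the type-level permission:
-- this function may touch an Int state, and nothing else, even though
-- under the hood the effect is implemented with efficient IO machinery.
bump :: State Int :> es => Eff es Int
bump = do
  n <- get
  put (n + 1)
  pure n

-- runPureEff (runState 0 bump) should evaluate to (0, 1)
```

Treat the module and function names as approximate; the point is that the permission is an implicit type-class constraint, which is exactly where Bluefin later differs.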

NV (0:11:31): And this is still a monad?

TE (0:11:33): This is still a monad.

NV (0:11:34): Because you said it has constraints, right?

TE (0:11:37): That’s right. It’s a monad, and a lot of the effect system monads are called Eff, E-F-F, just standing for effect. There are quite a few attempts at different effect systems, and that’s one of the names that they’ve adopted. 

Essentially, effectful, and Andrzej Rybczak’s idea, as far as I’m concerned, solves the whole problem of effects in Haskell, in the sense that you have both desired properties. You run in IO and you can essentially do anything you want. There’s no ambiguity that leads to difficulties with resource safety. You won’t leak resources. You won’t miss state updates because of subtle effects that monad transformers bring in. Everything is under the hood, being done in the IO monad, but you have these constraints at the type level that mean you get the other nice desired property, which is you make your invalid states unrepresentable. If I’m going to say this function can only mutate state, then you know it’s not going to do any printing to the terminal or open any network connections. And that’s a really nice property to have when you’re structuring large programs.

So, that’s the end of the story, the swinging pendulum. Except there’s one more thing which I’ve introduced, which is Bluefin, which is heavily inspired by effectful. It is very similar to effectful, but instead of passing these constraints that tell you what you can do at the type level, it passes them at the value level. So, you pass in value-level handles—sometimes these are also known as capabilities—that allow you to perform effects. So, if you want to have a mutable state, you pass in a state capability, and that allows you to mutate the state. If you want to be able to throw an exception, you pass in a capability, which is a value that allows you to throw an exception. And if you want to be able to do arbitrary IO, for example, you can pass in the capability to allow you to do arbitrary IO. So, that has the same benefits. If you don’t pass that in, you can’t do it. If I don’t pass IO in, I can’t do arbitrary IO. So, that has the same benefits as effectful.
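A sketch of the same state effect in Bluefin's value-level style, based on my reading of its documentation (names are approximate; `bump` is an invented example):

```haskell
{-# LANGUAGE TypeOperators #-}
import Bluefin.Eff (Eff, runPureEff, (:&), (:>))
import Bluefin.State (State, evalState, get, put)

-- The State handle is an ordinary value: a function that isn't handed
-- one simply cannot touch that state. The capability is explicit in
-- the argument list rather than implicit in a constraint.
bump :: e :> es => State Int e -> Eff es Int
bump st = do
  n <- get st
  put st (n + 1)
  pure n

-- runPureEff (evalState 0 bump)  -- the handle is created and scoped here
```

The shape is deliberately close to the effectful sketch above it in spirit: same safety, but the permission travels as a value you pass by hand.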

But from my point of view, the value-level handles are more ergonomic. I personally quite like them being manual. You have these values you have to pass around just like any other values in your program. Whereas in effectful, and in fact in essentially every effect library that’s previously been developed, they’re implicit and they’re at the type level.

NV (0:13:52): But the effects do not show in your type signatures.

TE (0:13:55): So, the effects show in the type signatures because you’re passing them in as arguments.

NV (0:13:59): Explicit arguments, not like dictionaries.

TE (0:14:02): Yeah, that’s right. So, the effectful style, and in fact essentially every effect system to date, will pass them implicitly as type class constraints, but with Bluefin, you pass them in manually.

NV (0:14:17): So then your functions are pure.

TE (0:14:19): The functions are pure in the same sense that IO is pure. Yes. So, it’s still a referentially transparent function, but of course, when you arrange for the Bluefin Eff monad to be run in IO, it’ll just do the effects that were mentioned in your program.

MPG (0:14:36): Right. But you still need to run effect in IO if you want to use IO, I guess.

TE (0:14:41): Exactly. That’s right.

MPG (0:14:43): That’s very cool.

NV (0:14:44): My understanding was that the huge benefit of effects instead of monad transformers is the compositionality. And I assume if you pass values around, like in Bluefin, then it’s like easily composable, no?

TE (0:14:58): So, yes, I think I’d say that’s right. We also want what we implement to be composable. And that’s another great promise of functional programming that you can freely compose components, build programs up of smaller pieces by composing them and they all fit together and match nicely. And that only really works if you have this ability to make invalid states unrepresentable, and you also have this nice compositionality property where you can build two components separately with their own individual behaviors. And when you plug them together, they also keep their individual behaviors and they interact in a known way.

So, I’d say all effect systems to date have had that. If you look at concrete monad transformers, then they don’t really have that because you’re committing yourself to an order of stacking the monads. But the MTL library was designed to circumvent that by also passing the particular monads that you are using in your stack of effects as constraints. And effect systems like Polysemy, well, they don’t use monad transformers, but they take a similar approach where you pass the effects that you’re doing as constraints at the type level. And in fact, so has every effect system until Bluefin. And I would probably make one exception to that, which is Haskell’s ST monad, which allows you to do mutable state by passing around STRefs. That’s very similar to how Bluefin does it. So, essentially, Bluefin was also inspired by the ST monad in that sense.
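The ST comparison can be made concrete with a base-only example: the `STRef` is a value-level handle passed around explicitly, and `runST` guarantees the mutation can't leak out of the pure interface:

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Mutable state behind a pure function: the STRef handle is a value,
-- much like a Bluefin capability, and nothing without the handle can
-- touch the accumulator.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
```

So `sumST [1, 2, 3]` is `6`, computed with in-place mutation, yet the function's type is entirely pure.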

MPG (0:16:29): But what’s the story with performance? Because I’ve heard a lot of stories from production where these monad transformer stacks get very slow because you keep lifting up and down the stack. But I guess it’s a lot easier in Bluefin, then, to get it to run fast.

TE (0:16:44): Yeah, that’s absolutely right. And that’s another key element of this move to the ReaderT IO pattern and then to effectful. So, if you compose your effects out of monad transformers, that works all very well from the point of view of composing fine-grained effects into a bigger set of effects, but you’re implementing them as algebraic data types in Haskell and not necessarily in the most efficient way for the overall effect that you’re trying to build. So, if you want something that can potentially throw three different types of exceptions, you’re going to have to nest an ExceptT transformer three levels deep. And then every time you run two statements, one after the other, well, what’s that doing? It’s got to convert the first one to three levels of nested Eithers and check whether any of them was Left. If it is, well, it’s got to abort. That means the exception was thrown. That’s how ExceptT models throwing an exception. And if none of them was Left, then it extracts the value and then it continues with the next statement. So yes, that is a time-consuming thing to do.
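The nesting described here can be written out with the standard `transformers` package (the three error types are invented placeholders):

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT)

-- Invented placeholder error types for illustration.
data ParseErr = ParseErr
data IoErr    = IoErr
data AppErr   = AppErr

-- Three possible exceptions means three nested transformers ...
type M a = ExceptT ParseErr (ExceptT IoErr (ExceptT AppErr IO)) a

-- ... and peeling them off exposes three nested Eithers, which every
-- (>>=) in the stack has to scrutinise at runtime:
run :: M a -> IO (Either AppErr (Either IoErr (Either ParseErr a)))
run = runExceptT . runExceptT . runExceptT
```

Compare that to a single IO-level exception, which is one pointer check in the common case: that is the performance gap the ReaderT IO pattern and effectful close.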

So, one of the big benefits of moving to the ReaderT IO pattern was you just do state using IO refs and you just do exceptions using IO exceptions. And these are very nice, well-performing constructs. And so effectful decided, well, we can use those, but we can get the benefit of having an effect system, which allows us to see what operations each component can do. And Bluefin follows that path.

MPG (0:18:17): And again, on performance, how do you get the feel for the large-scale performance?

TE (0:18:22): So, I haven’t done much benchmarking or performance measurement with Bluefin. I’m basically very confident that it’s a very high-performance library and it will match the ReaderT IO pattern or effectful, which I know to be high performance because they have been benchmarked. The reason why I’m confident that it will match them is because it’s essentially doing the same thing under the hood. So, I haven’t spent my time benchmarking and trying to demonstrate that it’s going to be high performance because it’s essentially doing the same thing as other high-performance systems. It’s just wrapping it up in a slightly different skin.

NV (0:18:55): So, are you using Bluefin on your daily job?

TE (0:18:58): Yes. So, my daily job is at Groq, which is a startup that’s developing novel compute hardware for AI. We develop what’s called an LPU, a language processing unit for running large language models very fast. So, if you’ve used things like ChatGPT, then you will be familiar with what Groq runs, and we’ve got the world’s fastest inference engine for LLMs. Part of the pipeline for compiling the models to our chip is written in Haskell. And some of that is using Bluefin as well. Yeah, that’s right.

NV (0:19:33): And the code is closed, right?

TE (0:19:36): Yes. So, the code that we’re developing at Groq is for our — well, we have a wide variety of different systems that interact with the thing that I’m developing as part of the compiler pipeline. This is all in-house code for compiling models, predominantly open source models, but also any models that customers want to bring for us to compile and run. And yeah, that’s all in-house code and closed source. We use Haskell in the compiler pipeline for some stages.

NV (0:20:05): The compiler from what to what?

TE (0:20:08): So, machine learning models, AI models are very often written in Python, particularly PyTorch. TensorFlow was an older Python library for writing AI models, but PyTorch has a huge amount of the market now. So, one way of getting models into our system, into our compiler is by writing them in PyTorch. That’s a Python library that allows you to express various operations that are popular in AI, different matrix multiplication operations, mathematical operations. That’s the building blocks of AI, neural networks. 

There’s another standard file format called ONNX. And that is a standard format that allows you to express these machine learning AI operations at a fairly high level. So, that’s another way of getting machine learning models into our compiler pipeline. And it goes through many stages. The bulk of our compiler is actually in C++ and MLIR. Lots of transformations, all sorts of scheduling work to schedule data flowing around the chip. And at the end, it gets to the assembler stage, which is the part that’s written in Haskell. And the input code is written out as binary code that actually runs on the chip, arranged to run in the most efficient way possible.

But we also have a lot of other software projects. It’s a very complicated system. So, we need to arrange for racks of these chips to be set up, to be deployed. We have a big CI system that tests all our code and makes sure that the changes we make don’t contain bugs, or at least they pass the tests. And we have a whole effort of designing new chips. Some of the chip design has been done in Haskell as well. 

MPG (0:21:42): Do you know why they chose Haskell initially? Of course, all of us were like, “Oh yeah, of course Haskell.” But do you know what their external reasoning was?

TE (0:21:49): So, our CEO and founder, Jonathan Ross, is a big Haskell fan. And he actually developed the first TPU. That’s Google’s Tensor Processing Unit, which came out—I’m not sure exactly when—maybe a bit over 10 years ago. He developed that in Bluespec, which is a Haskell-like language for designing hardware, and in fact, it is itself written in Haskell. He’s a big functional programming and Haskell fan, and he felt that designing hardware in Haskell was a very good fit. And many other people have thought this as well. And there is a niche, a Haskell-like functional programming niche in hardware design. So, some of the early designs of our chip were designed using Haskell and Haskell-based tooling. And there are quite a few people in the company who are fans of that kind of thing. But we also combine that with traditional methods of hardware design. So, we use SystemVerilog, which is a very traditional language for designing hardware.

NV (0:22:47): But there are not many Haskell libraries for AI. Is that correct?

TE (0:22:53): Yeah, that’s right. So, when we think of AI, there’s a variety of different areas. And probably if listeners are familiar with AI and AI-based libraries or libraries for implementing AI and machine learning models, they’ll be thinking of the model side, the software side. That’s the side of things where you think about training models. Well, first designing models, implementing them, and then training. That’s the kind of thing that PyTorch is for, that TensorFlow was initially very popular for, and there are other tools for different languages like this, such as JAX. By far the most popular is PyTorch, and Python is by far the most popular language for doing this in. And in the Haskell world, we don’t have anything popular like that really. There are a number of different frameworks, different libraries in Haskell for doing a similar thing, but they’ve never really taken off. And people who are doing this in practice will almost always be using Python.

But the other side of things is what I’m doing. That’s the hardware side of things. And that’s much less common because that requires a huge effort. You can’t do that unless you have huge amounts of funding. Designing new microchips is very capital-intensive. So, it’s less common to see people tackling that. And yeah, the hardware side of things, well, that’s very commonly in SystemVerilog, Verilog. It’s in these traditional hardware design languages.

MPG (0:24:12): You wrote an experience report from Groq’s perspective on when you upgraded from GHC 8.10 to 9.6, right?

TE (0:24:21): That’s right. So, until recently, a few weeks ago, we were using GHC 8.10. And then we decided, “Well, it’s time we upgraded to a more modern version of GHC.” We chose 9.6. And I’ve written up an experience report of how that went in terms of the changes that we had to make to our code base in order for it to work under 9.6. 

In the past, when there have been upgrades to GHC, a lot of the changes that came along with them, partly changes to the Haskell language or the defaults that GHC presents as the Haskell language, but also changes to the base library, the Template Haskell library, or other libraries that come along with GHC, have forced people to make changes to their code in order to accommodate them.

The more of that you have to do, the more painful it is to do these upgrades. And a common refrain amongst industrial Haskell users is that there’s just been too much change. They feel exhausted keeping up with everything. And because I’m actually trying to do what I can to make Haskell more amenable for industrial development and more attractive to industrial users, this is one of the things that I really wanted to tackle. 

So, I’ve been keeping tabs on stability in the Haskell world. That is in a sense of, what do you have to change to do the upgrades that you want to do? And so, this article that I wrote, the experience report is part of that. And I’m pleased to announce there wasn’t that much change from GHC itself. It wasn’t that much that we were required to change because of changes to GHC itself. In fact, I’m not sure we were forced to change anything if you take it literally, but there were a few warnings that came up that we wanted to suppress. Most of the changes came from libraries, in fact. So, I think we still do have a little bit of a problem in the Haskell ecosystem that library developers and maintainers frequently change the APIs of their libraries. And this forces people to change the code that they’re writing to use these libraries when they need to upgrade.

NV (0:26:25): But how can you enforce stability of the libraries?

TE (0:26:29): So, I wouldn’t say it’s a question of enforcing. I would say it’s a question of describing and advocating. So, I want to make a clear statement of what is currently required in the Haskell ecosystem to do upgrades and then try to explain the downsides that this has and hopefully persuade some people that more efforts put into maintaining the stability of APIs, avoiding breaking APIs, is actually very beneficial for the Haskell community. That’s the hope. And there are a variety of techniques that you can use to achieve that. People, when they change their APIs, they do it for a good reason. They’re not doing it gratuitously. They want to remove old functions, say, that are considered deprecated because they don’t necessarily do the right thing or do it in the right way. They want to make the API more coherent. So, there are a lot of good reasons for changing things, but it has costs as well. So, I think having a conversation in the community about the costs and benefits and how we find a good point of balance between those is a really useful thing for the community.

MPG (0:27:33): Right.

NV (0:27:34): And this move comes from the Haskell Foundation, right?

TE (0:27:37): So, the Haskell Foundation has a working group called the Stability Working Group. That means it’s an officially organized group of people under the Haskell Foundation banner, but it’s open to anybody. It’s not closed in any sense. Nobody has to have a specific affiliation with the Haskell Foundation to participate. It’s open to everyone. And they’re a group of people who are particularly interested in stability and the issues involved there. And I’ve been involved in that for a bit. I’m not directly involved in that anymore, but they’ve done a lot of work towards advocating for stability, tracking stability in GHC specifically. They’ve done a lot of advocacy work for making sure that when the GHC steering committee, who decide which changes are going to go into GHC, look at a proposal, then they take into account stability in that proposal. The same is true of the Core Libraries Committee, which is the committee that is responsible for maintaining the base library, and I’m on the Core Libraries Committee. So, that’s something that I’ve been advocating for in that committee. And we have some other people in the committee who are also very strong on that. And bit by bit, these ideas of stability are percolating in the Haskell ecosystem.

NV (0:28:50): And if somebody wants to join these committees, what should they do?

TE (0:28:53): So, the Stability Working Group, I believe, is open to anyone at any time. So, there is a GitHub page under the Haskell Foundation GitHub called the Stability Working Group, and that will give you contact details of the people you need to contact to find out when the meetings are and to be invited to participate in that. The Core Libraries committee and the GHC steering committee are formed by elections every year. So, I was elected about three years ago to the Core Libraries committee. And in a few months, I’ll leave, and there’ll be new elections. And anyone who’s interested in participating in that can put themselves forward for the election. And you can find information about that in the Core Libraries Committee GitHub repo. I think it’s under the Haskell GitHub organization. And the same for the GHC steering committee. They have an election process. I’m not entirely sure where their call for nominations goes.

MPG (0:29:45): It’s usually on the Haskell Discourse. That’s where things will be announced when they call for new elections.

TE (0:29:50): Yeah. I imagine a lot of these things will be announced on Haskell Discourse, the Haskell-Cafe mailing lists, probably Reddit also. A variety of different community forums.

MPG (0:29:59): Right. But there was some discussion, I think, recently that GHC is usually tied to a version of the base library. 

TE (0:30:06): That’s right.

MPG (0:30:07): There was some discussion that, essentially, there’s no reason why GHC 9.8 shouldn’t work with an older base. That’s one of the big pain points in upgrading, because you upgrade base and then all your packages now need to be upgraded because they depend on a later version of base, even though the compiler itself didn’t really change.

TE (0:30:26): That’s right. Currently, it’s not possible to upgrade your version of GHC without also upgrading your version of base at the same time, and vice versa. This forced upgrading in lockstep introduces a lot of fragility into the system, because it means if you really want a new GHC, but there’s something in base that you don’t want, maybe a change you don’t particularly want to have to deal with yet, then you simply can’t. You can’t do that.

The technical reason that base and GHC are tied together is Template Haskell. If you’re using some Template Haskell to generate code that’s going to be compiled into your program, the version of base that you generate the code with in the running GHC process must be the same version that you’re currently compiling your own program against. So, there is this technical reason to tie them together, but there are potential ways that we could untie them, and that would be fantastic. We would be able to upgrade to a new version of base if we wanted to, if it has something new in it that we want, and not be forced to upgrade GHC or vice versa. So, that would be really nice.
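A small sketch of the mechanism (a hypothetical example of my own, not from the episode): the expression inside a splice is evaluated inside the running GHC process, against the base that GHC itself links, while the code the splice produces is compiled against the base your project depends on.

```haskell
{-# LANGUAGE TemplateHaskell #-}

-- The expression inside $(...) runs at compile time, inside the
-- GHC process itself, so the 'length' it calls comes from the base
-- linked into that GHC. The literal it produces is then compiled
-- against the base the project builds with, so the two versions
-- must agree - hence the lockstep upgrade.
import Language.Haskell.TH (integerL, litE)

compileTimeLength :: Int
compileTimeLength = $(litE (integerL (toInteger (length "base"))))

main :: IO ()
main = print compileTimeLength  -- prints 4
```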

MPG (0:31:31): Right.

NV (0:31:32): Yes, because this dependency exists even if you are not using Template Haskell. 

TE (0:31:36): Yes, absolutely. So, the restriction exists even for people who are not using any Template Haskell in their own code or even any Template Haskell in any of their dependencies. So, it really has a big knock-on effect. At the moment, we just have to cope with it, just have to accept it. But it would be nice if we could make it possible to upgrade these two components separately and remove this inflexibility that causes a lot of fragility.

NV (0:32:00): And are you involved with stability because you had to go through the upgrades, or because you see that the lack of guaranteed stability is one of the reasons why Haskell is not so widely adopted by industry?

TE (0:32:15): For me, it’s more the latter. Yes, the personal, or professional, difficulty I’ve had with upgrading GHC and Haskell versions, I don’t really mind. I just do it. But I think it must be very off-putting for people who are less familiar with Haskell, because they don’t necessarily know how to interpret all the build failures that they’re getting. They don’t necessarily know how they should be fixing them. And so, I’m concerned that people who have adopted Haskell but are not really Haskell experts deeply familiar with it will be put off. And similarly, people looking from the outside who could adopt Haskell will be discouraged from using it.

I think that’s bad for our community because, firstly, I think we should want to welcome new people in and grow the community. I think we would benefit from that in many ways. We would get more contributions and have a more active community. But I think it’s also very disappointing for people who do want to use Haskell and can’t have the benefit of it because they see a lot of sharp edges that they risk or fear getting cut on. I think that’s very sad because we’ve got something great. Haskell is a wonderful language and a wonderful ecosystem and a wonderful community. And I think it’s very sad if people can’t benefit. So, I’d really like to remove roadblocks that are stopping people really getting to grips with Haskell and to retain people who do start off using Haskell. And one of the big ways of doing that, I think, is focusing on stability. So, that’s my main personal motivation for that.

MPG (0:33:40): Right. But you also have been very active in collecting feedback from industry users or potential industry users.

TE (0:33:47): Yeah. So, I’m always keen to try to understand what people’s experience of Haskell is. And particularly with regard to stability, if people mention breaking changes that have caused their builds to fail, and either they don’t know how to deal with them or it caused them a lot of work, I’ve been trying to ask them to write up their experiences, to share their experiences so we can make it clear to everybody exactly how this is panning out in practice. Because when everybody has their own perception and experience of a thing, well, that may well be a very correct experience, but it doesn’t necessarily communicate very well to other people just to say, for example, “Oh, Haskell has a stability problem,” or “Haskell has a breaking changes problem.” I think it communicates much better if you can write a document saying, “These are all of the things that we struggled with.” And then people can read that. And piece by piece, they can understand all of the challenges. 

So, that’s why I wrote up my experience report on upgrading GHC, and I hope other people will do the same. Eventually, I hope people will write that it’s very straightforward to upgrade GHC, because we found a nice path: minimizing breaking changes, not making them when they’re not necessary, and making life a lot easier for people using Haskell in that way.

NV (0:35:05): But why does Haskell have so many breaking changes when other languages don’t?

TE (0:35:09): I’m not sure, and I’m not certain either that other languages don’t, because I’m not involved in other language communities these days. But I suspect if you look at languages like Java and C#, which are very much focused on enterprise use, then it comes from the top down. The developers of those languages don’t want to make breaking changes to their languages because otherwise, their enterprise customers will be very unhappy. And that probably percolates down to library developers as well. And in the Haskell world, we’ve come from an academic history. And I think that’s great. I think that’s a fantastic history of Haskell that has led to many of the innovations that we know and love. But when you have a lot of your infrastructure and libraries developed by academics, they don’t necessarily have the perspective necessary, or even the time and energy, to make what they’re producing consumable in an ideal way for an industrial user. So, that could explain one reason why Haskell has this particular issue, but I don’t think that means it should be more difficult to solve. I think it’s a question of education and advocacy and making the case. And then I think we can cope as well with this as any other language.

NV (0:36:17): The Haskell Foundation could try to enforce that GHC stays stable, but since most of the library developers are volunteers, I really don’t see how this can be enforced.

TE (0:36:27): That’s right. Almost all of the Haskell ecosystem is developed by volunteers, ultimately. And the Haskell Foundation is a fairly new entrant on the scene. I’m in fact the vice chair of the Haskell Foundation, and one of my roles in the Haskell Foundation that I’ve taken on is this stability work. And in the HF, in the Haskell Foundation, we don’t really see that we’re an authority on Haskell. We can’t boss people around and tell them, “You must now do this. We’ve decided a certain thing is important, so you must do it.” We don’t have that authority. We’re not governing and running things. We’re trying to help volunteers and help existing community members and community projects do the best they can.

So, one role I see for the Haskell Foundation is to listen to people and to see what they need, particularly industrial users who have been historically a small group, but hopefully now a growing group, and advocate for their needs. But it’s not about forcing anyone to do anything, and it’s not about forcing existing groups of stakeholders, say, academic users, to just accept whatever industrial users want either. So, it’s not about forcing anyone or about telling anyone what to do. It’s really about sharing information, sharing stories, allowing different people to understand each other, and coming up with plans for improving everything for everybody.

NV (0:37:45): And so, other than stability, is there something that you would like to see in Haskell in the next year?

TE (0:37:52): So, another thing that I’ve been very interested in is space leaks. This is another thing I’m interested in working on from the point of view of making Haskell more suitable for industrial use. Because Haskell is a lazy language, it has a reputation for being vulnerable to space leaks. A space leak is when you use more memory than you expected. You could, for example, want to sum all of the numbers in a list, and due to laziness, under certain circumstances and certain implementations, that may end up not adding the numbers when you thought it was going to, but only at a later stage. In fact, it might build up what’s called a thunk, a very big structure in memory representing the operation of summing the whole list, without actually doing it until later. And some people from outside the Haskell community, or even inside, have the impression that these space leaks, which are very bad from the point of view of real-world usage where memory constraints are a real thing, make Haskell less suitable as an industrial language.

So, I’ve wanted to dispel that notion and produce tools to make the default style of programming in Haskell one that is space leak-free. Following this notion of make invalid states unrepresentable, which I think is one of the big benefits of functional programming, I call that “make invalid laziness unrepresentable.” So, I’ve developed a library called Strict Wrapper, which introduces essentially strict versions of base data types. They can’t contain these thunks, they can’t contain space leaks, and the library provides easy ways to work with them and interoperate with the existing ecosystem, because so much of the existing ecosystem revolves around the base data types, which are lazy, can contain thunks, and could theoretically contain space leaks.

So, in my view, that is a one-stop shop. That’s the single thing that you need to know in order to fix space leaks in your Haskell program. And from that point of view, I’m very keen to recommend it and keen to see it used because it requires minimal effort, minimal overhead in terms of what you have to think and keep in your mind as you’re programming and just does the job in the same way we know and love Haskell to do by tracking things in the type system and making sure you’re getting things correct by construction, essentially. That’s an idea and a library that I’m keen to promote and to dispel this notion that Haskell is vulnerable to space leaks. And in that way, I hope to make Haskell more attractive to industrial users.

NV (0:40:26): So, the library is Strict Wrapper, and does it redefine the basic structures? Or is it like, do I use the base list or do I use a list that you define in your library?

TE (0:40:37): So, Strict Wrapper specifically introduces strict versions of tuples, of maybe, and of either. I think that’s all it does for now, and —

NV (0:40:48): Not lists?

TE (0:40:50): So, this is a little bit of a more complicated situation. I’ll come back to that in a second because that’s also related to some Bluefin ideas. But in terms of these basic container types, small container types, tuples, Either, Maybe, Strict Wrapper introduces strict versions of those that also interoperate easily and efficiently with the same base data types, so that you can combine code that doesn’t use the strictness assumption provided by Strict Wrapper with code that does. So, it’s supposed to be a lightweight wrapper that lets you design your data types to be space leak-free.

When it comes to lists, that’s a little bit more tricky, because actually strict lists, as in strict cons lists, linked lists where you cons one element onto another, are not necessarily that useful for anything. In limited cases, they may be suitable because they’re simple. But in general, if you really want to materialize your whole list, you probably also want to do operations like indexing into it, or adding or removing elements at the front or the back. We have other good data types for that, for example, the sequence data type, Data.Sequence. And if you want constant-time lookup, you could use Haskell arrays or vectors. So, in practice, choosing those data types rather than lists can be the correct solution. And I haven’t found so much in practice that space leaks arise from the use of lazy lists in Haskell. It’s more due to accumulating into smaller data types, repeatedly doing operations on smaller data types and building up a big thunk. And that’s where the Strict Wrapper approach is of benefit.
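For instance, with Data.Sequence from the containers package (a small sketch of my own; Data.Vector from the vector package would give constant-time indexing instead):

```haskell
import Data.Sequence (Seq, (<|), (|>))
import qualified Data.Sequence as Seq

-- A fully materialized collection with cheap access at both ends
-- and O(log n) indexing, unlike a cons list.
s :: Seq Int
s = (0 <| Seq.fromList [1, 2, 3]) |> 4   -- fromList [0,1,2,3,4]

main :: IO ()
main = do
  print (Seq.length s)   -- 5
  print (Seq.index s 2)  -- 2
```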

NV (0:42:30): That’s very interesting. 

MPG (0:42:31): So, one of the difficult topics in Haskell: a lot of what we do, we represent in the types, right? But laziness is specifically not in the types, right? But here you’ve introduced a library that lets you reflect laziness in the types.

TE (0:42:44): That’s right. So, essentially, because Haskell is a lazy-by-default language, meaning that function arguments are not evaluated when you call a function, you don’t have a distinction between laziness and strictness in the types. In fact, if you did, that would be a problem, because it would mean you couldn’t have the nice benefits of using Haskell as a language for writing domain-specific languages, and of being able to write your own control flow operators like if, and so on and so forth.

But on the other hand, we do want the type system to guide us. We do want to make invalid states unrepresentable. We want to make invalid laziness unrepresentable. So, Strict Wrapper approaches that by providing a type family called Strict, which is a thing you can apply to a Maybe or a tuple. So, you could have a Strict (Maybe Int). That would be the same as a Maybe Int in the sense that it can either be a Just of an Int or a Nothing. Those are the two constructors of Maybe: Just or Nothing.

But the difference with the strict version is that if the Just constructor has been evaluated, you know that the Int it contains has also been evaluated. It’s not a thunk. So, you know that you can’t be building up a big thunk in that Just payload. In that sense, we are back in the world of using the type system to guide us to write correct code. If you design your data types so that they use strict Maybe Ints, then you know that you’re not hiding a thunk in that Int. It’s actually an evaluated Int, at least when the Just itself is evaluated. And so, you’re not leaking a large amount of memory, or using more memory than you anticipated. So, it’s bringing back the ability to see in the type system that you’re doing something correctly.
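The idea can be sketched by hand (StrictMaybe below is my own illustration; the actual strict-wrapper library expresses this through its Strict type family rather than a new data type):

```haskell
import Data.List (foldl')

-- The '!' makes the field strict: evaluating StrictJust to WHNF
-- also forces the payload, so no thunk can hide inside it.
data StrictMaybe a = StrictNothing | StrictJust !a
  deriving Show

-- foldl' forces the accumulator only to WHNF at each step. With a
-- lazy Just, WHNF would stop at the constructor and (s + n) could
-- grow into a list-sized thunk; the strict field forces each sum.
sumMaybe :: [Int] -> StrictMaybe Int
sumMaybe = foldl' step StrictNothing
  where
    step StrictNothing  n = StrictJust n
    step (StrictJust s) n = StrictJust (s + n)

main :: IO ()
main = print (sumMaybe [1 .. 1000000])  -- StrictJust 500000500000
```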

MPG (0:44:28): Right. So, just one last question for the listeners. We’ve been talking a lot about thunks, but what exactly does a thunk contain? And when can it be a large thunk?

TE (0:44:37): The implementation of laziness in GHC means that when you define a variable in terms of some computation, so you might say let x = y + z, it’s not actually doing y + z at the moment it makes x. It’s storing the computation, storing a little program, if you like, that says, “When you want my value, do y + z.” That’s essentially what lazy evaluation is. You defer the computation until the time it’s needed.

Now that works well in many cases and it’s straightforward, but in some cases it doesn’t quite go right. Unfortunately, those cases crop up a lot in long-running programs, because long-running programs explore a lot of the state space. For example, if you add a list of numbers together to get the total, the sum, and you do that by accumulating a running total, then as you take each element off the list and add it into your running total, unless you’re careful, you’re not actually doing the addition. You’re just writing down, “When you ask me for my result, do the addition.”
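The running-total example looks like this in code (a standard sketch; foldl' from Data.List is the usual fix):

```haskell
import Data.List (foldl')

-- Lazy accumulation: foldl builds the thunk ((((0 + 1) + 2) + 3) ...)
-- roughly as large as the input list before any addition happens.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict accumulation: foldl' forces the running total at each step,
-- so the fold runs in constant space.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = print (strictSum [1 .. 1000000])  -- 500000500000
```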

MPG (0:44:39): Right.

TE (0:44:39): So, you are actually building up essentially a little program or a little data structure that says, “When asked for the result, add up all these numbers.” So, you end up building a thunk that’s as big as the original input list. 

There’s another situation where this can be tricky, and it is related to lazy lists; it’s the one case where lazy lists can be difficult with laziness. And the same is true of streaming libraries, like Conduit and Pipes, which essentially allow yielding elements like a lazy list would, processing one at a time, but doing other effects as well, like IO and so on. The reason is that all of these constructions are lazy data structures that, once evaluated, stay around. So, a common pattern is to iterate over a lazy list, say 1..100 (that produces the lazy list of elements 1, 2, 3, up to 100), to do something 100 times with an increasing index, starting at 1 and going up to 100. And that doesn’t actually ever make a list 100 elements long. It just goes through the lazy list one element at a time, peeling them off. Except if you somehow reuse that list, if you keep a reference to it, then GHC will say, “Well, you wanted to make a list, and I’ve made a list. It’s got a hundred elements.” Doing that for a hundred elements might be fine, but if it’s a million elements, you might start to have a problem. And if it’s a billion, it’s an even worse problem.
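The difference between consuming a list once and retaining it can be sketched like this (my own illustration; Bluefin’s streaming API is not shown here):

```haskell
-- Consumed once and never referenced again, so GHC can
-- garbage-collect the cells as the traversal peels them off.
streamed :: Int
streamed = sum [1 .. 1000000]

-- Named and used twice: the whole million-element list has to
-- stay alive in memory between the two traversals.
retained :: (Int, Int)
retained = let xs = [1 .. 1000000 :: Int] in (sum xs, length xs)

main :: IO ()
main = do
  print streamed
  print retained
```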

So, this is one case where lazy lists can cause difficulties with space leaks, but it’s not because of treating lazy lists as lists. It’s because of treating lazy lists as ephemeral streams that produce things but never exist in their entirety. So, now my recommendation is just to avoid those kinds of things entirely, because Bluefin offers a streaming operation that allows you to create these ephemeral things to yield elements, say yield the numbers one up to a hundred, but it’s not a data structure. So, it will remain ephemeral. It’s never going to be materialized into some big data structure of a hundred or a billion elements. It’s just some code that happened to yield these elements. 

So, my solution to that is not anything Strict Wrapper-related. It’s not about choosing a data type. It’s more about choosing the correct representation for an iterator-style problem. And I think that’s basically what’s provided by a Bluefin-style stream.

NV (0:47:51): So, we should use streams and arrays. 

TE (0:47:54): Yeah. If you want ephemeral things, I would suggest using streams and yielding elements. And if you want something that stays around permanently, then forgo laziness entirely and just use an array, or, if you want a more flexible data structure, Data.Sequence is a very powerful choice.

NV (0:48:10): Do you have any other final suggestion for the listeners?

TE (0:48:13): Just enjoy Haskell. It’s a great language and a great community. For people using Haskell, I hope they continue to enjoy it. For people who are not using Haskell and want to get their feet wet, I hope they do. And if anyone wants to reach out to me with any questions or thoughts, or wants to ask for my help, they’re welcome to do that as well.

NV (0:48:32): Thank you very much, Tom.

MPG (0:48:33): Thank you. 

TE (0:48:34): Thanks.

Narrator (0:48:36): The Haskell Interlude podcast is a project of the Haskell Foundation. It’s made possible by the generous support of our sponsors, especially the Monad-level sponsors: GitHub, Input Output, Juspay, and Meta.

SPONSORS
Monads
IOHK Juspay Meta
Applicatives
CarbonCloud Digital Asset ExFreight Mercury Obsidian Systems Platonic Systems Standard Chartered Tweag Well-Typed
Functors
Artificial Channable FlipStone Freckle Google HERP MLabs TripShot
To learn more about the Haskell Foundation
Haskell Foundation, Inc.
2093 Philadelphia Pike #8119
Claymont, DE 19703
USA