34 – Lindsey Kuper

Recorded 2023-06-20. Published 2023-10-02.

In this episode Niki Vazou and Wouter Swierstra chat with Lindsey Kuper, Assistant Professor at the University of California, Santa Cruz. They discuss what to do when your data center gets hit by a tornado, life in academia versus life in industry, and what choreographic programming is.

Transcript

This transcript may contain mistakes.

Wouter Swierstra (0:00:18): Welcome to the next episode of the Haskell Interlude. We’re joined today by Lindsey Kuper, who will tell us a little bit about what to do when your data center gets hit by a tornado, life in academia versus life in industry, and what is choreographic programming.

Niki Vazou (0:00:37): Hi Lindsey, welcome to the Haskell Interlude. So why don’t you start by telling us a little bit about your background and how did you get into Haskell?

Lindsey Kuper (0:00:46): Sure, I’d be happy to. So I’ve been a functional programmer for a long time, but I haven’t been a Haskeller for quite as long. My first programming language, my first real programming language, was actually PLT Scheme back before it was called Racket, which I used in undergrad. And so I loved functional programming. And then I graduated, and I was working in industry for a while between undergrad and grad school. And I wasn’t getting much of an opportunity to use functional programming. But then I moved in with some Haskellers. I was living in Portland at the time, and I had some roommates who were Haskell programmers, and it made me remember that I loved functional programming. My roommates actually did the ICFP programming contest one year, and they did really well. So I was inspired by that, and it made me remember how much I liked doing functional programming when I’d been an undergrad. And it was a direct influence on me deciding to go to grad school. 

So I went to grad school at Indiana University, which was at the time very much a stronghold of Scheme. So I, again, didn’t do much typed functional programming in the beginning part of grad school; I wrote a lot of Scheme. And I started to study type systems as I started to get into the latter years of grad school. But I still wasn’t doing a whole lot of programming. I was doing a lot of type system stuff on paper. And it was only close to the end of my PhD, working with my advisor Ryan Newton, that I began to really write Haskell. And so 2012, 2013, 2014, I was working with my advisor on the project that would eventually become my PhD, and he was a Haskeller. So Haskell was the natural choice for us to use, and I really enjoyed it. And I’ve been a Haskeller off and on ever since, and I’m now using Haskell quite a bit in my work.

NV (0:02:37): So what was your PhD originally?

LK (0:02:41): I wasn’t sure what I wanted to do when I arrived at grad school and tried a few different things. I worked with miniKanren, the logic programming language, for a while. And for a little while, I was studying logical relations and parametricity and some very theoretical stuff. And so I ended up not really settling on a topic until I was quite far along and started working with Ryan. So it took me a while to settle on something.

NV (0:03:12): And have you like – logical relations are something that I keep seeing in the PL community, and I really wish I knew all about them.

LK (0:03:21): Oh. So I spent some time near the beginning of my PhD studying logical relations with Amal Ahmed, and it was really educational. But I haven’t really applied logical relations to anything since then. And every now and then, I get a question about did anything ever come of the work that we did. And other people ended up continuing on with it later. So it didn’t end up being something that I stuck with.

NV (0:03:50): Yes, because Amal is still doing this kind of thing. 

LK (0:03:53): Oh, yes. Yeah. So logical relations are an amazing tool, and she uses logical relations as a big hammer to hit all kinds of problems and do lots of damage to those problems. And yeah, I was really fortunate to have spent that time working with her because even though I didn’t use that particular proof technique directly, I gained a lot from those years.

NV (0:04:17): And you can use this technique to prove things about the work you’re doing now.

LK (0:04:23): Yeah. I mean, if nothing else, it’s given me an idea of when one would make use of this certain class of high-powered proof techniques. So that’s been helpful.

NV (0:04:34): Okay. And so you settled on Haskell eventually in your PhD, and then?

LK (0:04:40): Yeah. So I began using Haskell seriously in my PhD in the later part of my time when I was working with Ryan Newton on LVars. So this kind of made a lot of sense as an implementation choice because the work we were doing was kind of building on and extending work that Ryan had done earlier with collaborators on deterministic parallelism implemented as a monad. This was the monad-par work, which Ryan and his collaborators published in the early 2010s. I don’t remember the exact year.

WS (0:05:20): So what is deterministic parallelism?

LK (0:05:23): Yeah, great question. Deterministic parallelism. Well, let’s talk about what it means for something to be deterministic first. When we say that a program is deterministic, we mean that given the same inputs, it always produces the same outputs. But there’s a lot of things that we could mean by that, right? It depends on what you mean by ‘same.’ In some sense, we could argue that very few programs are deterministic, right? When I run a program that adds 1 + 1 on my laptop, does the processor always produce the same amount of heat when that program runs? No, not necessarily. Right? 

So the outputs of a program—what you decide to count as an output of a program—matters with regard to what you want to call determinism. You could pick a narrow definition of determinism, which is, what does the program evaluate to? Or you could pick a definition of determinism that says something about what side effects are produced by running the program, and then try to capture that in your notion of determinism as well. We can pick a narrow definition, which is what the program evaluates to, but it turns out that that’s pretty – even just that narrow definition of determinism is pretty hard to accomplish when you’re talking about programs that involve multiple threads of control. 

And so deterministic parallelism is this kind of term for being able to – so I say parallelism, but perhaps I should really say concurrency. But deterministic parallelism is kind of the stock phrase that gets used, though. So that’s why I use it. This is the problem of trying to ensure that programs run deterministically, even in the presence of arbitrary thread scheduling. You might have threads that are scheduled onto parallel hardware, and different tasks could be running in arbitrary order. 

So there’s some tricks that you can play to make sure that a program is deterministic, even in the presence of this non-deterministic scheduling, or that it always produces the same result. And typically, what that means is that you really have to constrain a lot of what the program can do. And so this is something that people in the programming languages community have been doing for decades. 

What I did during my PhD was kind of look at one classic approach to deterministic parallelism and then generalize it a little bit. That was what the work on LVars was. We took this kind of classic work on what are called IVars, which are variables that can only be written to once, and if you try to read from an IVar, your attempt to read will block until it’s been written to. So it’s a sort of synchronization variable where if I try to read from an IVar and it hasn’t been written to yet, my attempt to read will block until somebody else comes – some other thread perhaps comes along and writes to it, and then I can read from it. That’s an IVar.
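
As a concrete illustration of the blocking read just described, here is a minimal sketch using the monad-par library (which comes up next in the conversation); the splitting-and-summing example itself is invented for illustration.

```haskell
import Control.Monad.Par (IVar, Par, fork, get, new, put, runPar)

-- Each IVar is written exactly once; 'get' blocks until that write happens.
sumInParallel :: [Int] -> Int
sumInParallel xs = runPar $ do
  left  <- new                          -- two empty IVars
  right <- new
  let (ls, rs) = splitAt (length xs `div` 2) xs
  fork (put left  (sum ls))             -- one task fills each IVar
  fork (put right (sum rs))
  l <- get left                         -- blocks until 'left' has been written
  r <- get right                        -- blocks until 'right' has been written
  return (l + r)
```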

The monad-par library is an IVar library for Haskell. So it’s a library that’s available. It’s also actually discussed in Simon Marlow’s book on Parallel and Concurrent Programming in Haskell. And what we did for my PhD was generalize the notion of IVars to what we call LVars. And so an LVar is a generalization of IVar where you can write to it multiple times, but those writes have to be increasing in some sense. So, for instance, you could have a counter where you can only count up. I could increment the counter, but I could never decrement. And then we’ve made writes more expressive now. We can do multiple writes. So if we want to keep determinism, we have to do something to kind of simultaneously decrease the expressivity of reads. So now, instead of being able to arbitrarily read a value out of that counter, instead, I can only ask a question like, has the counter reached, let’s say, the value 5 yet? And so that’s a question which we know, either it will eventually become true given certain inputs or it won’t. But the program will deterministically produce an answer given the same inputs, regardless of scheduling. 

And so that’s what LVars are. They generalize IVars a little bit to let you do these multiple increasing, or what we call inflationary, writes. And then reads have to be what we call threshold reads, which means you can only ask this limited set of questions. And you can put those together to get – what it turns out is a data structure that’s useful for deterministic parallel programming.
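
A hand-rolled sketch of the counter example above, written with plain STM rather than the actual LVars library, just to make the two operations concrete: writes may only increase the state, and the only read is a blocking “has it reached the threshold yet?” question.

```haskell
import Control.Concurrent.STM (TVar, atomically, check, modifyTVar', newTVarIO, readTVar)

newtype Counter = Counter (TVar Int)

newCounter :: IO Counter
newCounter = Counter <$> newTVarIO 0

-- Inflationary write: the state only ever grows.
increment :: Counter -> IO ()
increment (Counter v) = atomically (modifyTVar' v (+ 1))

-- Threshold read: block until the counter has reached the given threshold.
-- It never reveals the exact value, only that the threshold was reached,
-- which is what keeps the answer deterministic regardless of scheduling.
reachedThreshold :: Counter -> Int -> IO ()
reachedThreshold (Counter v) threshold = atomically $ do
  n <- readTVar v
  check (n >= threshold)
```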

WS (0:09:52): So this monotonicity sounds a bit like CRDT. What is it? Like conflict-free replicated data structures, data types.

LK (0:10:00): Yes. Yes, yes. So yeah, that’s exactly right. So CRDT is – just at the time that I was doing my PhD, CRDTs were beginning to come on the scene, and it was, in fact, people from the distributed programming community reaching out to me and telling me, “By the way, the work that you’re doing is a lot like what we are doing with CRDTs.” That was what got me into distributed systems in the first place, in fact. So in the latter part of my PhD, I began hearing from people who were working with CRDTs, and they were telling me, “Hey, what you’re doing is a lot like what we’re doing. You should pay attention.” 

And so I began to learn about distributed systems. And I had no experience in distributed systems at the time, so I had a lot to learn. The community met me more than halfway. I ended up giving talks at a distributed systems industry conference, which was a lot of fun. And in the end, I ended up putting the word ‘distributed’ in the title of my dissertation. But I would say that at the time, I was still pretty clueless about distributed systems. And so it wasn’t until really the last five years or so that I feel that I’ve really started to become serious about studying distributed systems. And so that’s been the focus of what I’ve been doing for the last five years or so. And I’ve really jumped in – and as Niki knows, I’ve worked together with her on some distributed systems verification work with Liquid Haskell, and then more recently some other projects that – so I’m continuing to use Haskell, but I’ve jumped into this distributed systems focus.

WS (0:11:39): So it seems like a strange combination, though, right? Because usually, when you think of systems programming, you usually think of very kind of low-level bare metal writing device drivers and sending bits over the network at blazingly fast speeds, right? And then all of a sudden, you have Haskell, which is much more this abstract lambda calculus, which is lazy. It has all these weird properties that seem totally kind of unsuitable and foreign, right? So, it’s an interesting approach.

LK (0:12:08): Oh, I think it depends on what you mean by systems. When I speak of ‘distributed systems,’ I mean systems that involve multiple computers that are communicating over some kind of a network where those computers can fail independently, subject to arbitrary failures and subject to – usually, we’re talking about asynchronous networks where there could be an arbitrary delay between me sending you a message and you receiving it. And so in that kind of a setting, the kinds of concerns that you’re talking about for systems programming, where you need to have very fine-grained control over things like memory layout and the time things take, those kinds of problems get dwarfed by the problems of dealing with network latency, dealing with machine failure, and these sorts of global-scale issues, right?

The concern there is not so much having this sort of fine-grained control over things like what memory gets allocated or having very careful control over every cycle, right? Because the concern is more – like if you’re going to be making network calls that take 200 milliseconds, then the details of whether something else is taking three nanoseconds or six nanoseconds doesn’t matter quite as much, right?

WS (0:13:33): Right, right. So could you say, like, people always claim that Haskell’s such a great language to write correct programs in because you can do equational reasoning and you have QuickCheck and Liquid Haskell and all kinds of tools to help you kind of write your programs in a way that is high assurance or correct? And what you’re saying is, because of these abstractions and things like monads and MVars, IVars, LVars, and because we have this higher level of reasoning and composing programs, and because it’s really hard to get distributed programs right, Haskell’s a more natural fit than you might think at first.

LK (0:14:13): So I would say distributed programming is just hard no matter what, and I don’t think Haskell is any kind of silver bullet here. However, I think that if you’re using a programming language that lets you rule out certain classes of bugs, right, which Haskell does, then that perhaps frees you up to have the bandwidth to think about a different kind of bug, which maybe the language doesn’t rule out for you. But if you want to be concerned with dealing with correctness of a distributed system, then it’s nice to not have to worry about the kinds of bugs that Haskell does let you avoid. 

So I think the point that you’re raising is interesting. When I was working on my PhD, one of the things that we needed in order to implement LVars—and this is also true for CRDTs—is this: with an LVar, your data structure is getting writes, and they’re coming in from different threads. For CRDTs, it would be similar except you’re receiving updates from different processes, perhaps over a network. And you want to ensure that updates are inflationary. That is, the contents of your data structure will only ever grow over time. And moreover, the different states that that data structure can take on, they have to be elements of a certain algebraic structure that’s called a join-semilattice, right? So there wasn’t really any way to express in the Haskell type system that that was true. And I really wished that there was some way that I could say, using Haskell types, that these updates are inflationary or that the states this data structure can take on are elements of a lattice, and so on. And so I was frustrated that I couldn’t do that. Liquid Haskell was just beginning to come on the scene at the time when I was doing my PhD, and it’s a really promising technology to be able to express those kinds of properties. But as I was doing my PhD, Liquid Haskell was kind of just getting off the ground. And so this was like 2013, 2014. It was an exciting idea, but it wasn’t really ready for us to use yet.

And then I stepped away from distributed programming for a while in the time after my PhD and while I was working at Intel. And so it was really exciting when I came back to thinking about this stuff after I started my faculty position in 2018 and began getting more serious about distributed systems, especially with an eye toward verification. I returned to looking at Liquid Haskell, realized that it had made a lot of progress in the intervening years, and realized that it actually was a viable tool for being able to express and prove the kinds of properties that I had always wanted to be able to state and prove using Haskell types.

WS (0:17:16): So what kind of properties are you interested in there that you would want to prove? Because these distributed systems, they must be very complex, right?

LK (0:17:23): Right. Yeah. So for example, for CRDTs, the goal, right, is that you – so you have a bunch of replicas, you have a bunch of different copies of some piece of data. And first of all, why do we care about having replicas? Well, it’s because – so I’m here in California, Niki is in Spain. Wouter, I don’t know where you are, but I assume you’re in Europe.

WS (0:17:46): In Utrecht.

LK (0:17:47): Yeah. So you’re in Northern Europe, right? So we’re in different places around the world. Maybe we have a colleague who’s in Tokyo or something. And so if we all want to access the same data, it would be ideal if we could all have instantaneous, lossless access to the same piece of data, but it’s not practical. So instead, that data – let’s say it’s a photo album. Instead, that data is inevitably going to be replicated across multiple machines with different copies in different places that are close to each of us, right? So then we have lower latency access to the data. So the person in Tokyo can access the copy that’s close to them in Tokyo, and I’m in California, and I can access the copy that’s close to me, and so forth. And it also provides fault tolerance to have this replication. So if one data center gets hit by a tornado, then the data isn’t lost. But now we have this problem, right, of keeping these replicas in sync with one another. And the more replicas there are, the bigger a problem it is. So replication is both a blessing and a curse.

Okay. So how do we do that, and how do we make sure that they stay in sync? CRDTs are a mechanism for doing that. And the trade-off that we make is we say, okay, we’re going to constrain the interface to this replicated data where it can only be updated in particular ways. And in exchange for that, we’re going to guarantee the property that as long as any replica has received the same set of updates in some order, right, in any order, then the replicas will agree in state. 

And so it turns out that there are a lot of data structures that you can kind of encode with this approach, and it can be a pretty flexible mechanism that’s good enough for accomplishing a lot of things that we might want to accomplish. But how do we get there? You asked what kind of properties we want to prove. Well, for example, if I need it to be the case that the data structures will agree in state if they’ve received the same set of updates in any order, I need those updates to commute with one another. And so for all of the operations that the interface to this data structure allows me to do, I need to make sure those operations commute. And that’s the sort of thing that one can prove with, for example, Liquid Haskell.
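
To make the commutativity requirement concrete, here is a small sketch for a grow-only set, with the property written as a QuickCheck test; in the verification work discussed here the analogous statement is proved with Liquid Haskell rather than tested, so this is purely illustrative.

```haskell
import qualified Data.Set as Set
import Test.QuickCheck (quickCheck)

-- The only update the interface allows: inserting an element.
apply :: Int -> Set.Set Int -> Set.Set Int
apply = Set.insert

-- Updates commute: applying any two of them in either order gives the same
-- state, so replicas that have received the same set of updates agree.
prop_updatesCommute :: Int -> Int -> [Int] -> Bool
prop_updatesCommute x y initial =
  apply x (apply y s) == apply y (apply x s)
  where
    s = Set.fromList initial

main :: IO ()
main = quickCheck prop_updatesCommute
```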

NV (0:20:12): So CRDT does not ensure that all your replicas are the same, right? It’s a relaxed assumption. So is this like what, for example, Netflix is using?

LK (0:20:24): Ah, yeah. Okay. So we were talking about determinism a minute ago, right? And determinism says that the output of a program is going to be the same as long as the input is the same. So on every run, right, if you can compare different runs and you can see that given the same inputs on two different runs, you’ll get the same output. So actually, it’s what we call a hyperproperty, which means that it’s about comparing different runs. 

The property that we’re asking for with CRDTs is not that. So you could actually get a bit different behavior on different runs. And then what Niki is talking about is also important to keep in mind, which is that at different points during a given run, replicas may differ, right? So if Wouter and I are both adding photos to our shared photo album, and Niki is adding photos to this photo album as well, then maybe we each have a replica that we are making updates to, and we can see our updates on our own replica, but there might be some delay before our updates get to each other. And furthermore, the delay could differ, right? So at any given time, maybe I see my updates and Wouter’s updates. Niki might see her own updates and my updates, but not Wouter’s and so forth. 

And so the only guarantee we have with CRDTs is that if two replicas have indeed received the same set of updates, at that point, their states agree. But if they haven’t received the same set of updates, then all bets are off. And so if you need something stronger, right, if you need a property like, let’s say, maybe every replica, if you really need anybody who’s observing replicas to see – for example, you could have a property like every replica’s contents are – I’m having a hard time thinking of how to say it without introducing a lot more vocabulary. Okay, I’ll put it this way. You can imagine that there is some linearization of all the updates, and you could have some property like, well, if you don’t see all the updates, you at least see some prefix of all of those updates, right? So that might be some property that you might want a system to ensure. And CRDTs do not give you that. If the linearization says that there have been updates A, B, and C, with a CRDT, you might see A and C, or you might only see B and C or something like that. And so for some settings, that is indeed not good enough, right? If you need, for example, everybody to agree on a sequence of transactions and you need them all to agree in order, then perhaps CRDTs are not the appropriate technology.

WS (0:23:06): Okay, that makes sense. So you’ve mentioned quite a few times that you spent both time in academia and in industry. So you worked at Intel for a while, and before that, in Portland. Was that also Intel, or were you doing something else?

LK (0:23:21): No. Yeah. Although Portland is home to a lot of Intel people, I was not working at Intel then. So between undergrad and grad school, I worked for a couple of – well, at first, I worked for a startup for a while, which failed. And then, after kind of burning out on startup life, I worked for the opposite of a startup, which was a huge, old-fashioned publishing company, and was doing some programming plumbing for them. And that’s what I did between undergrad and grad school. 

WS (0:23:57): Right. And what did you do at Intel after your grad school?

LK (0:24:02): Yeah, so I was a research scientist at Intel for almost four years between finishing my PhD and starting my faculty position in 2018. And I worked in a group called the Parallel Computing Lab, and I got to work on a couple of cool projects, open source projects, actually, which was not necessarily the norm at Intel. So that was great. I worked for a little while on River Trail, which was a deterministic parallel JavaScript implementation, which was really cool. And then I worked with the Julia programming language for quite a while. Our team was collaborating with the Julia team. And at the time, Julia didn’t have any native parallelism support.

NV (0:24:47): What is the Julia programming language?

LK (0:24:49): So the best way to describe Julia is, it’s a kind of a modern alternative to scientific computing languages like Matlab, and it’s a really cool language effort. It originally came out of MIT, but it’s now a thriving open-source project with a growing ecosystem, and it’s really cool. It’s got some cool features and modern tooling, and just some nice stuff. And I was fortunate to get to hang out a bit in the Julia community during the years I was working at Intel, and I really enjoyed my time as part of that community. In Julia, you can do – so it’s a language where you can do array-style programming. So you can do aggregate operations on arrays. So if you’ve ever worked with a language like Matlab, it’s kind of a similar style. But in my opinion, Julia is a much nicer language. So kind of a more coherent design and also modern tooling and a nice package manager, and all of that good stuff. 

During the time that I spent working with Julia, Julia didn’t have any native parallelism support, as in multithreading. At Intel, our approach to this was to compile Julia to OpenMP-annotated C++ and then compile that with a C++ compiler. We were working with the Julia team to develop this package that we called ParallelAccelerator that was basically – it was a Julia package, so implemented in Julia, but it was a compiler that would compile parts of your Julia programs that you annotated to fast parallel machine code with the help of this external C++ compiler. So we were able to make some Julia code run way faster as a result.

WS (0:26:45): How would you compare your time at Intel and – I mean, if you’re a research scientist there, it must be a very different kind of research that you do when you’re in academia. Can you say something about that, perhaps? 

LK (0:26:57): Sure, yeah. So when I was at Intel, I was one of maybe 40 people who were part of this large lab called the Parallel Computing Lab, and kind of making parallel computing better, faster, easier, was kind of the mandate of the lab. And this was a great fit for what I had done. My PhD was on deterministic parallelism and so on, but at the time, I was interested in kind of moving more toward distributed systems. And I was also interested in verification, right? So I was kind of moving more toward these other areas. And so over time, it became less of a fit for what I wanted my research interests to be. Whereas in academia, I have total control over what I want to research, right? If I decide to change direction, it’s very easy to do so. Well, okay, I wouldn’t say it was easy, but nobody’s telling me what direction I need to focus on. 

Another thing is, at Intel, obviously, it’s a silicon company, right? So I was trying to be a software person at Intel, not only trying to be a software person, but trying to be a programming languages person. So there is a niche for software people and even programming languages people at Intel. And I felt like my managers did a great job of carving out that niche for me. But I always did feel like it was a niche, right, that I was kind of like the one weirdo who was interested in PL and surrounded by a bunch of people who were computer architecture experts. 

And over time, what I started doing when I was at Intel was working a lot with external people. And after a while, my job turned into trying to raise money for external people. So trying to advocate within Intel for us to fund external research that I liked. And this was – I definitely learned a lot by doing this and got really experienced with the sort of communication that I needed to do to do that. But over time, I realized I’m spending a lot of time kind of cheerleading for other people’s projects that I like and advocating for other people’s research that I like instead of doing my own research. And so eventually, I decided that I wanted to return to academia, and that’s what I ended up doing.

WS (0:29:24): And do you enjoy the teaching as well, or is that a necessary evil?

LK (0:29:29): Ha, I love teaching. Teaching, for me, it’s one of the great parts of the job. I’ve tried to do a lot of things to make teaching more fun. Obviously, during the pandemic, I had to do a lot of teaching online. That’s less fun for me than being in front of a classroom. And so I tried to find a way to make it more fun, which was teaching my class on Twitch. So during the pandemic, I actually taught my undergrad distributed systems class live on Twitch twice, and recorded all of those lectures. And so they’re now on YouTube. And I have to say that it’s been a really rewarding experience. I don’t think that my lectures are particularly amazing, and they’re definitely not anything like professional YouTuber quality, right? There’s lots of long pauses, there’s lots of ums and uhs, there’s no cool music or visuals or anything like that. But the response that I’ve gotten to them has been absolutely incredible. And people have said that they’ve learned a ton about distributed systems from watching them and so on. It was really a rewarding experience and worth the effort to get my streaming setup going.

One thing that it did for me was – teaching online can be kind of a depressing experience if people don’t engage, right? But it helped it be more of a performance for me because I was speaking not only to my own class, but to the whole world. And I like to perform. And so the adrenaline from that, I think, ended up making it a better experience and made it more fun in a way that kind of made up for the downsides of needing to teach remotely.

NV (0:31:20): So are you also teaching functional programming?

LK (0:31:23): I do, yeah. So in the last two years, I have been teaching our undergrad programming languages course, which is actually inspired by the course that Ranjit teaches at UC San Diego, Niki. And thanks to my colleague Owen Arden for designing the course. The design kind of came through him and was originally inspired by Ranjit.

NV (0:31:46): Are you also starting with lambda calculus? 

LK (0:31:49): We are starting with lambda calculus, yes.

NV (0:31:52): You know, the truth is that Ranjit started with lambda calculus so that he scares people and only the good ones stay in the class.

LK (0:32:01): I don’t know if that’s been my experience. I mean, I think certainly some students struggle with the lambda calculus assignment, but I think a lot of students approach it as kind of a fun puzzle to solve. I think – so I’ll say a little bit about the design of the course. So the course is taught in Haskell, and we spend the first part of the course – well, there’s first a phase of lambda calculus, and we use this great tool that was developed at UC San Diego, which is this Elsa programming language, which is kind of like a toy pedagogical language designed for building intuitions about lambda calculus. So, you can use Elsa to write down a lambda calculus expression and then explore different ways that that expression can be evaluated, and Elsa will tell you whether you’re wrong or right. So that’s kind of how the course starts out.

Then we spend some time just kind of getting our legs under us in Haskell. And many students in the class, certainly most of them, haven’t been exposed to Haskell before, and many of them haven’t been exposed to functional programming before. So we spend some time kind of just getting used to Haskell. But then, as quickly as possible, we switch over to doing what the actual content of the course is, which is talking about programming languages. So once we develop that foundation in Haskell, we then use it to implement interpreters. And then the course kind of concludes with this big project where you implement Hindley-Milner type inference in Haskell. And so we try to get all that done in 10 weeks.

I would say many students are kind of laboring under the assumption that the class is a Haskell class, which it’s really not. It’s a programming languages class. But nevertheless, Haskell is the substrate that we’re using to teach the class. And so, why do we do that? Why teach students a whole new language? So I think there’s a couple of reasons. And one, of course, is that Haskell is a very suitable language for implementing programs that operate on programs like we’re trying to do, right? So it’s a suitable language for implementing interpreters and type inferencers because the tools that it provides make a lot of sense in that setting. But then another reason to do it is that it’s useful for students to be exposed to a sophisticated modern language design, and that’s what Haskell is. And so, I think using Haskell is kind of good for both of those reasons. 

I still think that there’s more that could be done in the course to kind of make it more clear that Haskell itself is not the point, right? And if I had more time – quarters are rough, right? You only get 10 weeks to get through a lot of stuff. So if I had more time, I think I would be able to kind of go further in illustrating how these ideas that we’re discussing play out in a variety of languages, not just Haskell.

NV (0:35:12): And your distributed class is in Haskell too?

LK (0:35:16): In my distributed systems class, we don’t mandate any particular language. So the way that that course works is we give students a project and we give them a specification that they need to implement. So they’re implementing a system which clients interact with via an HTTP API, but they could use any language they want to do their implementation. We don’t provide them any skeleton code or anything. So I would say that most of the students in the class use Python, but we’ve had a wide variety of languages over the years, including some fairly exotic choices. And we have had some students use Haskell for at least some of the programming assignments. I have done some of the programming assignments myself in Haskell, just for fun. But yeah, for the most part, there’s not that much Haskell in that class. It’s not that it wouldn’t be a suitable choice, but we don’t make any rules about the choice of language. The consequences of that choice are that when we test people’s code, we can only test it against this API, right? And so that kind of means that we can’t have the sort of observability into the behavior of their code that might be useful, right? 

And so I’ve noticed that some other distributed systems courses provide a specific framework and a specific language and ask people to write code against this specific API in this language or even using a certain framework. And when you do that, you can have a way more sophisticated and deeper kind of testing than what we have. And so I am certain that there’s a lot of students who write code in my distributed systems class that is incredibly buggy, but still passes the tests, right? That’s a trade-off that we make when we decide that students should have the freedom and also the responsibility to start from a blank slate. Is it a good trade-off? I’m not sure. And that’s something that we need to continue to think about.

WS (0:37:30): So that’s very interesting. But how do you test these distributed systems?

LK (0:37:34): So the way that we test that code is we – and students get all of this. We teach them how to do this as well. We simulate having an actual distributed system using a bunch of Docker containers, and then there’s different nodes running in different containers and communication between those containers via network. And then we can do things like turn one of the nodes off, right? And then see what happens, how other nodes respond or react to that. So we can simulate failure in that way. And so that’s a good start, right? But it can be actually very tricky to try to exhibit some of the bugs that would show up in practice. And what I tell students is, like if they have written something that is buggy because it’s sensitive to timing in some way, the bug might never actually be exercised or might not ever show up in their testing. And so I tell students, “If you do see a bug, you –” often, students, their code may have worked great for a week and then they suddenly see a bug and they freak out because they didn’t change anything, right? So I always tell students, “You should be thrilled that you have a bug, right, because – you should be thrilled that this bug that was latent, that was there the whole time, is finally being exposed so you can finally do something about it.” Of course, it’s hard for students to see it that way. A better testing approach would be something that would always find the bugs if they are there, right? And of course, that’s wishful thinking, but maybe we could do better than we’re doing right now. 

WS (0:39:21): So a lot of your recent work has been on something called choreographic programming. So what is this choreographic programming?

LK (0:39:27): So choreographic programming is a cool topic that I’ve been getting into in the last year or so. The idea of choreographic programming is that programming distributed systems is hard in part because you have a bunch of nodes that are communicating with each other, and each of these nodes traditionally has to be programmed individually. And so if you look at the design of distributed protocols, they have to be designed in such a way that if one node sends a message, another node has to be expecting to receive it and so forth. And so if you don’t carefully co-design the behavior of these different nodes, then you can easily run into problems where one node is behaving in some way where another node isn’t expecting that behavior. And in particular, if you have a mismatched send and receive, then you can have bad behavior or even deadlocks.

So choreographic programming is a way to try to avoid this. And the idea is you write a single program where, instead of having explicit sends and receives, you write a program that describes the behavior of the entire system, and you have one language construct that does both send and receive. So you’ll have a language construct which might be called something like ‘communicate’ or ‘comm,’ and it will say, “Alice communicates with Bob.” So instead of having – instead of programming Alice to say, “Send a message to Bob,” and programming Bob to say, “Receive a message from Alice,” you have a single program that has this single language construct that encompasses both send and receive. And then there’s a compilation step, which is called endpoint projection, which then compiles that single unified program, which is called a choreography, into distinct programs that run on each of the nodes.
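
Here is a schematic sketch of that idea, using made-up types rather than any real choreography language: a single ‘communicate’ step in the choreography is projected into a send on one node and a receive on the other.

```haskell
type Loc = String

-- One step of a choreography: sender, receiver, and a message label.
data ChoreoStep = Communicate Loc Loc String

-- One step of the program a single node runs after projection.
data LocalStep = Send Loc String | Recv Loc
  deriving (Eq, Show)

-- Endpoint projection: extract the program that one particular node runs.
project :: Loc -> [ChoreoStep] -> [LocalStep]
project me steps =
  [ step
  | Communicate from to msg <- steps
  , Just step <- [ if me == from then Just (Send to msg)
                   else if me == to then Just (Recv from)
                   else Nothing ]
  ]

choreography :: [ChoreoStep]
choreography =
  [ Communicate "alice" "bob" "hello"
  , Communicate "bob" "alice" "ack"
  ]

-- project "alice" choreography == [Send "bob" "hello", Recv "bob"]
-- project "bob"   choreography == [Recv "alice", Send "alice" "ack"]
```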

NV (0:41:24): And this is a programming technique that is new?

LK (0:41:28): It’s relatively new. So the name ‘choreographic programming’ was coined about 10 years ago. And Fabrizio Montesi’s dissertation, which was called Choreographic Programming, I think was the originator of that phrase. But the idea of choreographies has been around longer. It emerged from the session types community. And so there’s a lot of similarity between work on session types and work on choreographic programming. And the work on multiparty session types in particular also has this notion of projection, right? So if you have a type, which is a multiparty session type, that describes at a global level the communication patterns in a program, you can then project that type to individual types that describe the communication patterns of different nodes. And so there’s also a notion of projection in the session types literature. The difference with choreographic programming is the thing that you’re projecting is not just a type, but actually a program. And so the work was inspired by and grew out of the session types community, but it’s not quite the same thing. 

WS (0:42:39): So you mentioned session types, and session types, they use the type system to record a series of read and write operations to a single channel or multiple channels, but that sounds quite similar to choreographic programming. Can you say something about the differences?

LK (0:42:54): So I think the key difference is that when you do projection on a multiparty session type, you get local types, right, for individual nodes. And when you do projection on a choreography, a choreography is not just a type, it’s a program. And then what you get is individual programs. You could give types to choreographies using session types, and people have, but not always. So some work on choreographies does use session types. Ours doesn’t, actually. The work that I’ve done on this most recently with my students has been a framework for choreographic programming in Haskell. And so what we found was that choreographic programming is really cool, and it’s this emerging technology, but to do choreographic programming, there are very few practical choices. Some choreographic programming languages exist, but they’re research languages. If you want to do choreographic programming in a mainstream language without having to use a distinct compiler, then there really weren’t a lot of options out there.

So we think that our framework, which we call HasChor, is the first implementation of choreographic programming as a library. And not only that: because it’s implemented as a library in Haskell, we’re able to piggyback on a lot of the wonderful things that Haskell gives you. So for example, Haskell’s great for higher-order programming, and higher-order choreographies are kind of a very recent cutting-edge feature of choreographic languages. And we get higher-order choreographies kind of for free because we’re just able to piggyback on higher-order programming in Haskell.

Likewise, another thing that is relatively new in choreographic languages is what’s called role polymorphism or location polymorphism, where you can write a choreography that’s polymorphic over location. So, for example, instead of just writing a choreography that’s about interaction between Alice and Bob, I can write a choreography that’s about Alice interacting with an arbitrary other person, which I take as an argument. That’s really useful for being able to implement things like a service, for example. Something where you would expect clients to connect and you don’t know in advance what the identity of those clients are, right? For that, we need some sort of location polymorphism. And that’s something that we also get from our choice of embedding in Haskell. And so HasChor lets you do location polymorphic and higher-order choreographic programming in Haskell. And it’s all just implemented as a library, so you can use your usual Haskell tooling to do it.
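
In the schematic style used above (again, made-up types, not the HasChor API), location polymorphism is just ordinary Haskell abstraction: a choreography that takes the client’s location as an argument.

```haskell
type Loc = String

data ChoreoStep = Communicate Loc Loc String   -- sender, receiver, message label

-- A request/response "service" choreography that works for any client location.
service :: Loc -> [ChoreoStep]
service client =
  [ Communicate client "server" "request"
  , Communicate "server" client "response"
  ]
```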

NV (0:45:40): And what is the magic? Are you using like Template Haskell or something?

LK (0:45:45): There’s no Template Haskell. So what we have is a monad called Choreo, and choreographies are computations in this monad. And there’s kind of three things that you can do in a choreography. Well, it’s two things. You can either do local computations or you can communicate with somebody else. So there’s kind of like three language constructs that HasChor has, which are the different operations that you can do inside a Choreo computation. And those are – so you can communicate with somebody else. That would be like Alice communicates with Bob. There are local computations.

And then one other thing that I didn’t mention is conditionals. So conditionals are kind of interesting in a choreography because, for example, let’s say that I am doing some computation and there’s some conditional, and then it has two branches. Well, if I want to express this in a choreography where I’m interacting with other people, it could be tricky. How do I explain? If I need to make a choice and I want to behave differently in the two branches of my conditional, and in particular if the communication patterns differ in those two branches. So let’s say that I’m checking some condition, and then if it’s true, then I want to send a message to Wouter, and if it’s false, I want to send a message to Niki. Because the communication patterns are different in those two branches, for me to be able to compile that choreography in a way that – or project that choreography in a way that will make sense when it’s run on each of our three nodes, then you both need to be informed about the outcome of my local choice, right? Because if I’m going to be sending a message to Niki, then she needs to know. If I’m going to be sending a message to Wouter, then he needs to know.

And so we have to have this special notion of conditional, which is like choreographic conditional choice. That’s the third language construct in HasChor. So we have regular communication, we have this choreographic conditional, and then we have local computations. And so within that framework, you can do really anything you want inside a choreographic computation. A choreo computation is parameterized by another monad. It would typically be something like IO. And so you can do arbitrary like user input or whatever you want, and then express the interaction that you want to express via the choreographic constructs that we provide. 
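
Continuing the schematic sketch (made-up types, not the HasChor API), a choreographic conditional carries both branches, and the node making the choice must let the other participants know which branch was taken, since the two branches can have different communication patterns.

```haskell
type Loc = String

data ChoreoStep
  = Communicate Loc Loc String            -- sender, receiver, message label
  | CondAt Loc [ChoreoStep] [ChoreoStep]  -- deciding node, then-branch, else-branch

example :: [ChoreoStep]
example =
  [ CondAt "lindsey"
      [ Communicate "lindsey" "wouter" "hello" ]   -- if the local condition holds, talk to Wouter
      [ Communicate "lindsey" "niki"   "hello" ]   -- otherwise, talk to Niki
  ]
```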

NV (0:48:13): That is cool. And the library is available in Hackage?

LK (0:48:17): It is, yes. Well, okay. So it’s not on Hackage yet, I think, but it’s on GitHub. And hopefully, it’ll be on Hackage soon, and we’ll be presenting a paper about it at ICFP this fall.

NV (0:48:28): Cool.

LK (0:48:29): One other cool thing to mention, I think, about HasChor – and this was my student Gan’s great idea. So the implementation of HasChor uses freer monads. And so the Choreo monad is implemented using this freer monad idea, which is basically a way to abstract away the details of how you’re actually doing communication under the hood. And so it works out really nicely because if I’m implementing a choreography, I have this communication notion, and eventually that has to compile into some sort of real notion of send and receive. However, what send and receive might actually mean could depend on the setting, right? It could be making network calls, it could be using TCP, it could be using HTTP, it could be using carrier pigeons, or I could be using it for something locally, right? I could be using a choreography to coordinate threads on a single machine. And so maybe send and receive aren’t making network calls. But that is all implemented via a backend—a separate backend that a HasChor programmer can plug in. So HasChor has this notion of pluggable backends, and we have a provided backend that uses HTTP that programmers can use, but they can also supply their own backend if they want to. And the implementation of HasChor makes this really straightforward and simple.

And so I would say it’s programmer-friendly in the sense that we don’t impose any particular choice of backend. And in principle, you could use HasChor to program any kind of system where you have entities that communicate via message passing. As long as you can think of your program as doing sends and receives, if you’re willing to implement a suitable message passing backend that supplies the implementation of send and receive, then you can program that system with choreographies using HasChor.
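
Here is a toy, self-contained sketch of that freer-monad-with-pluggable-backend idea, not HasChor’s actual implementation: the program is built from abstract send and receive operations, and an interpreter chosen later decides what those operations really do (the console backend below is purely for illustration).

```haskell
{-# LANGUAGE GADTs #-}

-- Abstract message-passing operations over named locations.
data Comm a where
  Send :: String -> String -> Comm ()   -- recipient, message
  Recv :: String -> Comm String         -- sender

-- A minimal freer monad: a computation is either a result, or one operation
-- together with a continuation expecting that operation's result.
data Freer f a where
  Pure :: a -> Freer f a
  Bind :: f x -> (x -> Freer f a) -> Freer f a

instance Functor (Freer f) where
  fmap g (Pure a)    = Pure (g a)
  fmap g (Bind op k) = Bind op (fmap g . k)

instance Applicative (Freer f) where
  pure = Pure
  mf <*> mx = mf >>= \f -> fmap f mx

instance Monad (Freer f) where
  Pure a    >>= k = k a
  Bind op c >>= k = Bind op (\x -> c x >>= k)

send :: String -> String -> Freer Comm ()
send to msg = Bind (Send to msg) Pure

recv :: String -> Freer Comm String
recv from = Bind (Recv from) Pure

-- One pluggable backend: interpret the operations as console I/O, just to show
-- the shape. A real backend could use HTTP, TCP, or in-process channels instead.
runConsole :: Freer Comm a -> IO a
runConsole (Pure a)    = pure a
runConsole (Bind op k) = case op of
  Send to msg -> do
    putStrLn ("send to " ++ to ++ ": " ++ msg)
    runConsole (k ())
  Recv from -> do
    putStrLn ("enter the message from " ++ from ++ ":")
    line <- getLine
    runConsole (k line)
```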

NV (0:50:31): That’s very cool. So we’re approaching the end of the recording. So do you have any final wishes you want from many – from the Haskell or from the distributed community?

LK (0:50:46): Oh boy. So if the question is, if a genie could grant me three wishes to somehow improve Haskell or the Haskell community, I don’t know exactly what I should wish for because I think I haven’t yet plumbed the depths of Haskell and I don’t know if anyone has, right? There’s probably a lot that Haskell has to offer that I don’t know about. But maybe one issue is that Haskell is a big language, right? There’s kind of a lot that’s stuffed in there into Haskell and the runtime system. And an observation that my student Patrick Redmond and I made recently – and we have a paper under submission about this, so I hope I’m not de-anonymizing us. But we observed that it was possible to use GHC’s asynchronous exceptions to encode something that is very much like an actor programming framework. And so piggybacking on the existing asynchronous exceptions to send and receive messages between threads. And we found it somewhat shocking that this was possible to do. And it made us wonder, okay, does this mean that asynchronous exceptions are too powerful or that the runtime system has kind of too many features packed into it?

And so I don’t know the answer to that question, right? And I wouldn’t presume to try to make claims about the design of Haskell when I don’t know all the details of the decisions that went into that design, right? But it was nevertheless something that surprised us. And so I think Haskell is – and I’ve noticed also, working with students who are new to Haskell, Haskell can be a challenging language to learn; the behavior of laziness is often very surprising, right? Laziness surprises me sometimes, but I don’t necessarily have any concrete suggestions. Maybe this is just the way it is.

NV (0:52:48): So you want some less surprising laziness.

LK (0:52:52): Well, isn’t that what everybody wishes for, right? We all want performance predictability, and we want efficiency and we want safety, right? We want all those things at once. Another observation that a student of mine made was that Haskell has, at every step of the way, made choices that were kind of maximizing laziness, right? And it would be interesting to explore a point in the language design space that’s somewhere in between what Haskell does and a fully strict language. And that’s an interesting question. What are the points in that spectrum that are underexplored? But this isn’t necessarily advocating for changes to Haskell so much as advocating for more language design research.

NV (0:53:44): Well, okay, thank you.

LK (0:53:48): Thanks to both of you.

Narrator (0:53:52): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the generous support of our sponsors, especially the Monad-level sponsors: GitHub, Input Output, Juspay, and Meta.
