57 – Gabriele Keller

Recorded 2024-09-18. Published 2024-11-03.

Gabriele Keller, professor at Utrecht University, is interviewed by Andres and Joachim. We follow her journey through the world as well as programming languages, learn why Haskell is the best environment for embedding languages, how the desire to implement parallel programming sparked the development of type families in Haskell, and that teaching functional programming works better with graphics.

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Andres Löh (0:00:15): This is the Haskell Interlude. I’m Andres Löh. My co-host is Joachim Breitner.

Joachim Breitner (0:00:20): Hello.

AL (0:00:21): Our guest today is Gabriele Keller, professor at Utrecht University. We talk about her journey through the world as well as programming languages, why Haskell is the best environment for embedding languages, how the desire to implement parallel programming sparked the development of type families in Haskell, and that teaching functional programming works better with graphics.

Okay, our guest today is Gabriele. Why don’t we start by you telling us a little bit about how you started programming and how you eventually ended up with Haskell?

Gabriele Keller (0:00:55): I started programming basically because when I was in my teens, my brother bought a Commodore 64. And I loved playing games on it. At some point, I got bored playing games. Also, we didn’t have a huge range of different games. And at that point, you could buy these Commodore magazines, which had listings basically, which you could –

AL (0:01:21): Type in for hours.

GK (0:01:22): Yeah.

AL (0:01:24): And then after five hours of typing, get some kind of weird error and start looking line by line.

GK (0:01:32): You would get a syntax error. And I think it was maybe in sixth grade or so. I didn’t know what syntax was. And I also didn’t really know the meaning of error anyway. So basically, I was just copying the program without understanding anything of what it actually meant. I really enjoyed the fact that you could create your own tiny little game. And then from that, I basically tried to reverse engineer and change little things. So, I did a little bit of Commodore BASIC programming. I did enjoy that a lot.

And then later at school, we had computer science, not as a regular subject, but as after-school clubs. And I learned some Pascal there. And when it was basically time to decide what to study, I always liked mathematics, but I wasn’t really sure what you actually do as a mathematician. So, in Karlsruhe, where I started studying, you could basically either study math with computer science as an elective, or computer science with math as an elective; you had exactly the same program. So, I started with computer science and math as an elective. And then I think I realized that the part of mathematics which I really liked was the puzzle solving. And I found that in computer science, I had more of that than in the actual math lectures. And there the language we used was Modula-2. And then my first experience with an actual declarative language was when we had, I think in the second or third year, Prolog.

AL (0:03:19): Oh, okay. Yeah. That was my first declarative language as well.

GK (0:03:23): I really remember the first exercise was that we had to define a tree data structure. And by then, I knew Modula-2, I knew a little assembly, and I had no idea how to do that in Prolog. And then I was really frustrated. I asked a friend, and he showed me basically this one line, and I thought, yeah, well, that’s basically what I need to do, but that’s not an implementation, because I was thinking about what you had to do, what has to happen in memory, to define a tree. And I think that was a real wake-up call, that it’s possible to program without describing in detail what has to happen at the machine level. I mean, that was really well before Haskell.

Then we moved to Berlin, so I basically continued my studies in Berlin. And I worked as a teaching assistant there for a first-year course, and that was in functional programming, but in Miranda, at least the first few years. And then when Gofer basically became a thing, we switched to Haskell. So, I mainly did some programming through the assignments there, but I didn’t do a lot of Haskell programming for applications of my own.

AL (0:04:55): Yeah, perhaps we should remind our audience with slightly less experience a little bit about what Miranda and Gofer actually are, because I could imagine that many Haskellers these days don’t actually know. So, I mean, do you want to say yourself or –

GK (0:05:14): Yeah. So, I’m not completely familiar with the history of Miranda, but it was basically, I think, a predecessor of Haskell and already had a lot of Haskell features. Not all the type system features, of course, that Haskell has now, but the Haskell core.

AL (0:05:33): Yeah, it looks quite like Haskell, right? Even like syntactically already. And it’s a lazy, strongly typed functional language.

JB (0:05:43): How did it do IO? Did it already have the IO monad, or is it something that –

GK (0:05:48): No, no, no.

AL (0:05:49): I think it had this interact, but I’m not entirely sure. Like, I mean, where you basically can define a main program as a lazy string-to-string function, but I’m not entirely sure myself.

GK (0:06:00): Yeah. I think so as well. The continuation-passing style, I think that was early Haskell, not Miranda. But the course was also more about data structures. There was very little IO. Yeah. And it’s quite a while ago.

AL (0:06:19): Did you like – I mean, did you like this, and did the students like this at the time?

GK (0:06:25): So, I liked it a lot. For the students, it was basically that Stefanie, who used to be a professor in Karlsruhe, came to Berlin, the whole group came to Berlin, and they taught this first-year course in functional programming. And I think the beginning of the course was too challenging. You had a group of students who really loved it, and then many students who struggled with it.

JB (0:06:54): Is that the story of every functional programming course anywhere?

GK (0:06:58): Probably yes, but also, I think it probably was a bit too sudden. I think, or at least hope, we are now better at introducing it and starting slowly. The nice thing was that you had two groups of students who really liked it. Some students who had a lot of programming experience basically had a similar impression to the one I had with the Prolog program: people were really amazed at what you could express with very little effort, or not necessarily effort, but a few lines of code. And then one thing I found was that for people who had no programming experience at all, this sometimes actually worked quite well, because they didn’t struggle that much with the switch.

AL (0:07:54): And perhaps also – I mean, certainly at the time there was not all that much you could do, right? I mean, on the one hand, it’s impressive how much you can express, but at the same time, if you have a lot of programming experience in other languages and have written, let’s say, real-world stuff, then the lack of IO and the lack of all sorts of libraries, certainly at that time, was a really significant limitation when it came to getting real things done. So, I guess that may have been more of a disappointment to people who already had a lot of programming experience than to people who were just starting out and learning things.

GK (0:08:36): Yeah, that’s certainly true. But also, I guess in the first year, people mainly focus on passing the exams and doing the assignments. We attracted a number of students who then wanted to do research in it. Some enjoyed it, but it was certainly not something that people generally liked. And then I finished my studies. First, I worked as a teaching assistant, and then I had a part-time job as a programmer for a company that did point-of-sale systems, and there I did C programming. At that time, most PhD positions, almost all PhD positions in Germany, were these five-year positions as a lecturer. And due to the switch from Karlsruhe to Berlin and me working part-time most of the time, I had already spent a lot of time at the university. So, I wasn’t really interested in doing a PhD. But then the professor with whom I had done my master’s thesis got these PhD scholarships, through what’s called a Graduiertenkolleg, where you could do the PhD in basically three years.

One thing I had come across was this NESL language by Guy Blelloch, which is a small functional language, but the main aim is basically that it’s a parallel language where you can express nested data-parallel programming. And I had been playing around with it. The nice thing about this language, and one thing I really loved because I had done some parallel programming with MPI, was that all the parallelism is basically expressed via maps and folds and so on. And I thought that was a very elegant way. And then I downloaded it and installed it. We had a Cray T3E at that time in Berlin, but running NESL was really slow because it was designed for vector machines.

So, to come back to my PhD, I thought it would be really interesting to have something like the programming model of NESL, but implemented in C, because I thought that was the main performance bottleneck. And when I got offered this PhD position, I proposed that as the PhD topic. But then I realized fairly early that the performance bottleneck was not whether the host language was a functional language or C, but the whole problem of how you schedule these operations on a distributed-memory machine.

AL (0:11:27): How did NESL itself work? I mean, was it compiled to something or –

GK (0:11:33): It was basically interpreted. Well, it was compiled to C, and then you had this C library, which had a parallel map, a parallel fold, and so on. And the whole trick about NESL, because it’s a nested data-parallel language, is that it does what’s called flattening. So, you can have maps of maps of maps of folds, and it manages to map all of these to regular folds and regular maps and segmented ones, that is, one-level nested maps and folds. This is basically what you can do with a fixed set of parallel operations. Blelloch did that initially for vector machines, and on vector machines, this runs very efficiently. But on general-purpose architectures, you can imagine that if you do one fold after a map and so on, it gets incredibly slow, because you have all these intermediate data structures.

What you basically need, the same optimization you also need in sequential functional languages, is fusion: you fuse different traversals of data structures together. But in contrast to regular sequential functional languages, you have to do this in a way that preserves the inherent parallelism.
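To make the flattening idea concrete, here is a minimal Haskell sketch of a segmented representation (the names and types are illustrative, not NESL’s or DPH’s actual API): a nested array becomes one flat data array plus segment lengths, and a nested “map sum” becomes a single segmented fold over the flat data.

```haskell
-- Illustrative sketch only: a nested [[Int]] flattened into a
-- segment descriptor plus one flat data array, as in NESL-style
-- flattening. Names like Segmented/segSum are invented here.
data Segmented a = Segmented
  { segLens :: [Int]  -- length of each inner array
  , segData :: [a]    -- all elements, concatenated
  } deriving Show

flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- "map sum" over the nested structure becomes one pass over the
-- flat data; on a vector machine this is a fixed segmented
-- primitive rather than the list code modelled here.
segSum :: Num a => Segmented a -> [a]
segSum (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (l : ls) ys = let (seg, rest) = splitAt l ys
                     in sum seg : go ls rest

main :: IO ()
main = print (segSum (flatten [[1, 2, 3], [4], [5, 6]]))  -- [6,4,11]
```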

In the end, I wrote this compiler for a small nested data-parallel functional language, which did a lot of fusion. But the reason the language I was working on was functional was more or less just that in a parallel context, a functional language is a lot better: you don’t have to worry about side effects. And I had given up on trying to define a pure subset of C to use instead. But all my work was really more in the parallel computing community and not in the functional community. And the main topic was really to have complex user-defined data structures and also manage to flatten them and map them to an efficient machine representation.

So, in NESL, you just have primitive types and tuples. And I wanted to have arrays of lists, which then get automatically mapped into a sequence of packed arrays. Then I finished my PhD there, and the timing was extremely unfortunate, because I worked on this parallel computing at basically the time when the interest in parallel computing really waned, because off-the-shelf commodity hardware got fast, so much faster than these specialized machines like the Crays and so on. They were incredibly expensive, but a few years later, regular PCs came already fairly close to their performance. So, no one was really interested in parallel computing when I finished my thesis. And as a consequence, all these conferences where I published were a bit depressing, because I remember being at a conference where I think the average attendance was less than the number of speakers for a session.

And then my partner, who was already in the Haskell community and published at ICFP, said, “Well, all these things you’re doing are actually also interesting for functional programming, right? Because it is a functional programming language you’re working on. Why don’t you try to publish this at ICFP, about the transformation of data structures and how you can flatten them and map them to dense arrays?” And that is effectively how I came back to Haskell, or how I did actual work on Haskell, because then we wrote this paper on the flattening of complex data structures and published it at ICFP. And ICFP was a completely different experience to the parallel conferences I’d been to. Really, I love the community. I found it a really welcoming community. People were really interested. And from then on, we basically worked on implementing the things I had done for my thesis, in the context of this small, minimal functional language, in Haskell.

AL (0:16:30): Okay. Well, that’s quite a journey. But I mean, before that, had you still been working on top of C, or had you basically been trying to publish your own small languages that were specialized for parallel programming, I guess?

GK (0:16:50): Yeah. So, I basically published with this specialized language, and it was really just like Miranda or Haskell, but a strict language. But the publications I had were not on the language level. It was basically just what the runtime system looked like, what the mapping to parallel data structures really was, how the communication worked. These languages, if you actually look at them, like High Performance Fortran, are functional as well. But I didn’t market it as a functional programming language and really talked more about the backend and the implementation.

AL (0:17:33): Right. And I guess when you tied the work back to Haskell, that is what then became – what was it called? DPH?

GK (0:17:39): Data Parallel Haskell, exactly. Because then we thought it would be so cool to have this nested data parallelism in Haskell. And the nice thing about nested data parallelism really is that it’s almost as powerful as control parallelism, because you can map parallel programs over already parallel data structures, and there’s no restriction there. So, we started with this Data Parallel Haskell project, and the idea was to implement this as part of an extension to GHC. It used the ideas behind what I had done for this small language, which I never even named: all the flattening of data structures, all the things I had done on fusion, because that’s really essential. If you try to get any performance out of that, you have to fuse very, very aggressively, and fuse beyond what’s done in the sequential case.

AL (0:18:46): But did that also add new problems, or did it just become a good vehicle for your work?

GK (0:18:58): Yeah. I mean, it added a lot of problems, because my language was nice and restricted. For the purpose of my thesis, I could ignore all the features which were a bit too complex. So, trying to integrate that into a general-purpose programming language was really difficult.

The other thing we struggled with a lot was that we wanted to do this mapping and the fusion using the GHC rewrite rules. And fusion is a very, very fragile transformation anyway, right? Because you have to be very careful with the inlining. It interacts with all kinds of optimizations, with inlining; things can break it. And then basically, whenever we ran our benchmark suite, everything ran okay or performed okay; then a new compiler version came out, they changed something, and everything broke. In the end, we never managed to get that to perform reliably. And the reason was not just getting it to work reliably with GHC; generally, this nested data parallelism using flattening for any kind of program is incredibly difficult, and it’s very easy to write programs which have horrible performance. So at some point, since we had all this infrastructure that worked quite well for some classes of parallel programs, we basically decided to have more restricted languages: to give up on implementing this general model of nested data parallelism efficiently for any kind of problem, and instead look into some subclasses of problems which we could handle effectively.

So, several things came out of that. We had this Repa language. And the idea there was to start from a model where you don’t actually try to manifest all the intermediate data structures, but keep them as calculations, and only do the computation when you force the evaluation.
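The core of that model can be sketched in a few lines of Haskell (a toy version of the idea, not Repa’s actual API): an array is represented by its extent and an index function, so maps and zips merely compose functions, and only an explicit force materialises elements.

```haskell
-- Toy "delayed array" sketch; Repa's real API differs.
data Delayed a = Delayed
  { extent :: Int       -- number of elements
  , index  :: Int -> a  -- how to compute each element
  }

-- Mapping composes the index function: fused by construction,
-- no intermediate array is built.
dmap :: (a -> b) -> Delayed a -> Delayed b
dmap f (Delayed n ix) = Delayed n (f . ix)

dzip :: (a -> b -> c) -> Delayed a -> Delayed b -> Delayed c
dzip f (Delayed n ix) (Delayed m iy) =
  Delayed (min n m) (\i -> f (ix i) (iy i))

-- Forcing computes every element once; in Repa this is the point
-- where the (parallel) work actually happens.
force :: Delayed a -> [a]
force (Delayed n ix) = map ix [0 .. n - 1]

main :: IO ()
main = print (force (dmap (* 2) (dzip (+) arr arr)))  -- [0,4,8,12,16]
  where arr = Delayed 5 id  -- the array [0,1,2,3,4]
```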

And the other thing that came out of that was the Accelerate language, which is a language embedded in Haskell. It is restricted in the sense that it does not allow nested data parallelism; it just allows regular nesting. And the other thing is, because it’s an embedded language, we wrote our own compiler. So, it was all embedded in Haskell, but we didn’t rely on GHC, and it was a lot easier to achieve the performance there. But basically, we used all that we learned from Data Parallel Haskell for this implementation.

And this whole mapping from user data structures, the user’s view of data structures, to low-level implementation data structures, which I had done in my compiler more or less by hand, we then tried to do with translation rules, first in Data Parallel Haskell. That basically triggered the interest in associated types and type families, because that’s really the mapping: you just try to do it in a nice way on the type level.

JB (0:22:47): So, the whole idea of type level computation in Haskell came from you needing to implement Accelerate?

GK (0:22:55): More or less, yes. 

JB (0:22:56): That’s – I didn’t know that. 

GK (0:22:59): So, we wanted to have something like the traits stuff in C++. You have the STL (Standard Template Library) and the self-optimizing libraries in C++, where basically with exactly these type-to-type functions, you manage to map some user-level types to low-level types. We wanted to have something similar in Haskell. And associated types and type families were basically what came out of that.
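As a rough illustration of what such a type-to-type function looks like with associated type families (the class and names here are invented for this sketch, not the actual DPH or Accelerate classes):

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Invented sketch: an associated type family maps a user-facing
-- type to the packed representation a library would actually store.
class Packable a where
  type Repr a              -- the low-level representation type
  pack   :: a -> Repr a
  unpack :: Repr a -> a

instance Packable Int where
  type Repr Int = Int      -- primitive types represent themselves
  pack   = id
  unpack = id

-- A pair is represented by the pair of representations, so nested
-- user-level types get flattened structurally.
instance (Packable a, Packable b) => Packable (a, b) where
  type Repr (a, b) = (Repr a, Repr b)
  pack   (x, y) = (pack x, pack y)
  unpack (x, y) = (unpack x, unpack y)

main :: IO ()
main = print (pack ((1, 2) :: (Int, Int)))  -- (1,2)
```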

AL (0:23:33): That’s interesting, really, but also understandable that all these embedded languages in particular, Repa and Accelerate and then DPH itself, have sort of strong requirements for the type system. But I also wanted to ask how this all tied in with your personal career, because suddenly all these things were happening in Australia, right? I mean, before, you were talking about Karlsruhe and Berlin. So, how did that happen? I guess you made Haskell big in Australia, in a way. Could you say a little bit more about that?

GK (0:24:18): So, it basically came via Japan to Australia. I had just started my PhD when my partner got a job offer at a university in Japan, in Tsukuba, to work as a lecturer there. And because my PhD position was just a scholarship, I could basically just pick up my stuff, move there, and continue my work in Japan.

And then after I finished my PhD, I submitted my thesis, and I had to wait six weeks or two months for the defense to happen. And I was just looking at a mailing list. And Barry Jay, who you might know, he’s done this Bondi language and a lot of work on the pattern calculus, was looking for someone to work on a project in Sydney for a couple of months. And I thought, okay, that’s perfect. I really needed a change of scenery and a distraction from waiting for my defense. So I went to visit Barry Jay and worked on his project for two months. And while I was there, it was, I think, ’98 or 1999, basically just at the height of one of the IT bubbles, and they were hiring a lot of lecturers at the university there, and I decided to apply.

And I started working at UTS, the University of Technology Sydney. And then after a few years, I switched employer and moved to the University of New South Wales. And yeah, Manuel had in the meantime started working there, and they were already teaching Haskell as a first-year programming language at UNSW. It was basically fairly easy, or we were really lucky, to attract a lot of really good students via this first-year course. And then we taught other courses, like software construction. And yeah, I think that was basically the start of the programming language group at UNSW.

AL (0:26:49): So basically you’re saying that the course already existed. I mean, you did not have to make a case for teaching Haskell or teaching functional programming. That was just a nice coincidence or –

GK (0:27:07): Well, yeah. So, the course already existed, and they had been using Haskell for a while. But I had started at UTS, my previous employer, so I had been using functional programming there as well. They had a very strange course for the first year. It was one-third Prolog, one-third functional programming, and one-third just logic, predicate logic. I think this course was the result of a compromise from when the bachelor program was restructured. Some people wanted to keep some logic in, some people wanted to keep some functional programming in. So there was this course, which was very hard to teach because it was basically just a mix of things, a compromise from some committee, right? And there I started to write down some lecture notes where I tried to bring this together, this idea of declarative programming and then functional programming.

And then, when I moved to UNSW and started teaching their first-year functional programming course, I worked more on the lecture notes. And we later published that as a book on, you know, first-year programming in Haskell, because, also from my experience in Berlin and at UTS, I thought that there are real advantages to teaching functional programming as a first-year language. But I didn’t want to emphasize the fact that it’s functional programming, because for first-year students, I think that’s not really important. It’s important to have fun with it, to be able to do some nice interactive things. So, we started teaching IO from the beginning, to have nice graphics, little games, things like that. That was the underlying theme of the course, I think: to get people into programming in a playful way and also make it easy.

I think in computer science education, you really have this problem that most universities basically say, “Well, you don’t need any experience in programming. We’ll teach you from scratch.” But then you have a very diverse class: some people have a lot of programming experience, others don’t. So, you have to try and keep the ones who know a lot entertained and interested and make it challenging for them, without losing people who have no background knowledge and might even still struggle to use an editor effectively. And we thought this kind of using lots of graphics and starting there in a playful way is a nice way to draw people in. So, that was how it went. But the Haskell course already existed there; I think how it was taught and the content changed a lot, though.

AL (0:30:38): It’s interesting that you’re saying that, because it’s a common argument that you hear about these different levels of incoming knowledge, that functional programming serves to somehow level the playing field or something like that, because it’s something that far fewer people have advanced experience with. I’ve never been completely convinced that that is really a good approach, but you’re making, I think, a different argument, which I like much better, which is basically some sort of case for effectively using DSLs with graphics and stuff like that that make it easy, even for people with little previous experience, to achieve quite a lot without being exposed to all the nitty-gritty details and being confronted with everything in advance, right? And I think that’s actually one side of the story which I quite like.

JB (0:31:39): So, what did that look like? Did they get some program that was complete with IO and graphics and a little comment saying, “Only edit this one line to do your little exercise and ignore the rest, but look at it if you want,” or did you hide the complexity of IO? Did they have to actually learn to write graphics before they did the exercises on data structures?

GK (0:32:02): No. So we didn’t hide the complexity of IO. Basically, the first program was a Hello World in IO. I think we didn’t mention that it’s a monad or anything. We just basically told them how to plug IO operations together. And I forget the name of the library. We used a graphics library.

AL (0:32:29): Was it gloss already at the time?

GK (0:32:31): Gloss, gloss. Yeah, it was gloss. We basically wrote a layer over gloss, which made it a bit easier to compose little graphics things. So, drawing trees, recursive tree diagrams, and things like that, visualizing trees and lists. So, we didn’t try to hide the complexity of IO, and I don’t think that it’s actually that difficult. I mean, you basically plug IO operations together.

JB (0:33:07): Right. Like for text IO, I understand you can do that, but if you hear graphics and interactivity, then I guess you are glossing over some technical details by using gloss. But that’s quite reasonable, because doing that from scratch in a beginner course would be –

AL (0:33:23): Yeah, gloss is a massive abstraction, but in a very good way, right? 

JB (0:33:27): Absolutely. 

GK (0:33:28): Yeah. And they basically had lists of graphical objects, and we did the displaying of them. And we had these, not competitions, but we put the prettiest visualizations of trees on the website, which I liked because it was not compulsory to do that, and in some sense, a lot of people who might not be that interested in making the program as fast as possible and so on got really drawn into making pretty trees and nice things and complicated graphics. So yeah, I mean, we did not do a proper scientific study of whether this is really easier for people who haven’t done any programming. But I think one thing is, if people enjoy things, they tend to learn better. And being able to create nice graphics is usually something people quite like. I mean, this is coming from my own experience. That was what interested me in programming in the end: writing nice games, being able to do something which you can actually see on the screen.
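For readers who don’t know gloss: an exercise in this style might look something like the following. This particular program is only an illustration in the spirit of what’s described, not taken from the course material; `display`, `InWindow`, and the `Picture` constructors are gloss’s real API.

```haskell
import Graphics.Gloss

-- A small recursive tree picture: each level draws a trunk and two
-- scaled, rotated copies of itself.
tree :: Int -> Picture
tree 0 = Blank
tree n = Pictures
  [ Line [(0, 0), (0, 80)]              -- the trunk segment
  , Translate 0 80 $ Scale 0.7 0.7 $    -- two smaller subtrees on top
      Pictures [ Rotate 30    (tree (n - 1))
               , Rotate (-30) (tree (n - 1)) ]
  ]

main :: IO ()
main = display (InWindow "Tree" (400, 400) (10, 10)) white
               (Translate 0 (-150) (tree 6))
```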

AL (0:34:49): It goes back to the Commodore 64 listings. I mean, you had quite big classes, right? There were many students, if I remember correctly.

GK (0:35:02): Yeah. It was pretty rough, because I think the first classes were around 900 or 1,000 students. So, the only lecture hall that actually was big enough was not a lecture hall but the big aula. Especially when you’re not an experienced teacher, it’s pretty rough. And just the logistics of it: we had, I don’t know, I think around 20 teaching assistants. Organizing all of that is, yeah, exhausting, especially in the beginning.

AL (0:35:43): Yeah. I mean, you have to find 20 teaching assistants who have the necessary knowledge of Haskell already so that they can actually assist, right?

GK (0:35:53): Yeah. I think that was always relatively easy, because you also have a large pool to draw from, right, if you have these large classes. And it was usually organized very well, I think, at UNSW: the teaching assistants did get training, and there were people who knew how to teach it.

AL (0:36:21): So, did the teaching also inspire your research again? 

GK (0:36:27): Not the functional programming teaching, but the other courses I taught, which were basically operating systems. So, on one hand, I taught nice, abstract verification and functional programming, and then I taught operating systems. And that inspired work later on, when I worked with the people in the operating systems group more on the verification side. The operating systems group and our programming language group were initially located in the same open-plan office, and just by working with the teaching assistants there, we polluted their minds with functional programming ideas.

Also, for me, I think I was always very interested – so, I like functional programming, but the reason I like parallel programming and all these things is that I am also really interested in all the low-level stuff and getting things to run fast and doing these optimizations. So, I think that was, for both sides, a very good situation to exchange ideas. That was the group that later worked on seL4, this verified microkernel: Gernot Heiser’s group. They did a lot of research in operating systems.

And then at some point, when NICTA, a new research organization, was set up, they had this project trying to verify a microkernel, an operating systems microkernel. And they had this challenge, right? If you want to verify an operating system kernel, you have to work with formal methods people, and you have to work with systems people. There’s a pretty big gap between them, right? Usually fairly disjoint communities. And they first started with a high-level specification as a kind of interface between the systems and the formal methods people. The first attempt was to do that in C++, and that didn’t seem to work. And then they tried to do the same thing in Haskell. So, they wrote this kind of specification in Haskell. I was not actively involved in that project; we were just talking a lot about it over lunch. A lot of the PhD students who were teaching in both the operating systems and the functional programming courses were involved there.

I later got involved in that project after they had done the full verification of the seL4 kernel. One of the follow-up projects was that they wanted to have a higher-level specification language for other systems components, for file systems, for example. And their idea was then to automatically generate already verified, efficient C code, because they thought the approach they had applied to formalizing and verifying the seL4 kernel just would not scale to all the other systems components. I mean, that was a huge effort to verify the kernel.

And then I got involved and started leading that project. Initially, I talked a lot with the seL4 people about what they actually wanted, what they expected from a language like that. Again, it was basically hard because the language they used was very different. And I remember, at the start, one person I talked to said, “Well, we had this for the seL4 kernel. We modeled the state as a big monad. Now we want to have lots of little monads.” I didn’t really understand what they were talking about until I realized that, basically, for them, a monad was a synonym for state in general. And what they wanted to be able to do, instead of threading the whole state all the time through all the functions, was basically to check out small parts of the state, thread those through, and be able to reason about that.
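In plain Haskell, that wish might be rendered roughly like this (an invented sketch of the idea, not the seL4 or Cogent code): a computation is written against only the piece of state it needs, and an adapter checks that piece out of the big state and writes it back.

```haskell
import Control.Monad.State

-- Invented example state: a "big" system state with two fields.
data Sys = Sys { counter :: Int, logMsgs :: [String] } deriving Show

-- Run a computation that sees only the counter ("a little monad"),
-- checking it out of the big state and writing it back afterwards.
onCounter :: State Int a -> State Sys a
onCounter m = do
  s <- get
  let (a, c') = runState m (counter s)
  put s { counter = c' }
  pure a

tick :: State Int ()
tick = modify (+ 1)

main :: IO ()
main = print (execState (onCounter tick) (Sys 0 []))
```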

There was one project which had already looked at writing a file system; it was called Bilby, by Sidney Amani. That was basically built for verification. And I thought what they were actually doing, the way they structured it, reminded me a lot of linear types, because they had these annotations on types that were basically threaded through. So, that was when the idea for this language, which was later called Cogent, was born: to have a functional systems language where you model the state via linear types.

And then this project started from that proposal and became a fairly big project. One of the teaching assistants in the functional programming classes, Liam O’Connor, started his PhD, and he was basically one of the main people. Lots of people were involved in that project, but this language was the core of his PhD thesis. It was not what they initially envisioned, a high-level specification language from which you synthesize the systems code; the language was already fairly low level. But what we did was that the compiler emitted the C code and an Isabelle proof script showing that the generated C code is basically a semantic refinement of the Cogent program.

AL (0:42:46): That’s pretty impressive. Okay. So, in recent years, you somehow ended up back in Europe.

GK (0:42:54): Yeah, yeah.

AL (0:42:45): I mean, you’re now at Utrecht University, right? So, has that changed the focus of what you’re working on yet again, or are you basically doing the same things like verification, functional programming, Haskell?

GK (0:43:13): So, for a while after we moved, I was still working fairly closely with the people who were involved in the Cogent project, and I went to Australia, not regularly, but once or twice a year. But then COVID made working together fairly complicated. And also, there was a lot of restructuring. So, I’m currently not working with the people in Australia anymore. We have a related project, in the sense of a similar idea of having this combined code and proof generation, currently together with Wouter Swierstra and a PhD student, Jacco Krijnen, and Manuel is also involved in that. And there, the idea is for Haskell, or basically Plutus, which is basically a subset of Haskell with its own compiler, again to emit a proof along with the compilation, but this time in Coq, that establishes the correctness of the compilation. So the idea is the same, but it’s a lot more incremental with Coq. With Cogent, it was really an all-or-nothing thing: either we were able to generate this Isabelle proof or not. And now, with this project working on Plutus verification, it’s really more about incrementally verifying individual passes of the compiler. But basically, the idea is a bit the same.

And then we’re also still working on the Accelerate language. We did work there basically on how you can allow users to have user-defined data types in an embedded language in a way that gives you access to things like pattern matching and so on, and how you can compile that, and also on lots of backend things for Accelerate. So, in some sense, I’m still following the same themes.

JB (0:45:21): Given that your work on Accelerate started the type families feature, and I guess that kind of then evolved into the whole thing about dependent Haskell: is there something about later developments in the type system that really excites you for what you’re doing with Data Parallel Haskell and similar things? Are you looking forward to fully dependent Haskell, or is that actually not what’s on your mind in that way?

AL (0:45:47): Or the linear types? I don’t know.

JB (0:45:49): Oh, the linear. Good point. Linear types as well.

AL (0:45:51): Yeah, yeah. 

GK (0:45:52): So, one thing which we found really, really useful, which is more of a surface-level thing, for what I just mentioned about embedding user-defined data types, is stuff like pattern synonyms. That might not be a very theoretically sophisticated type-level thing, but I think for user convenience, it’s really great.
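A tiny standalone example of the technique (with invented names; Accelerate’s own pattern synonyms are more elaborate): the library’s internal view of a value can be a plain pair, while a bidirectional pattern synonym gives users constructor-like syntax for building and matching it.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Internally, a "complex number" is just a pair of Doubles.
type Complex = (Double, Double)

-- The synonym works both as a constructor and in pattern matches.
pattern Complex :: Double -> Double -> Complex
pattern Complex re im = (re, im)

magnitude :: Complex -> Double
magnitude (Complex re im) = sqrt (re * re + im * im)

main :: IO ()
main = print (magnitude (Complex 3 4))  -- 5.0
```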

With linear types, we thought about having them in Accelerate, in some sense, a little bit to trace when we can recycle data structures, which is really important. We thought about that to some extent, but I think it is actually easier for us because we have the whole program: in Accelerate, you write your program, and then that gets compiled to several kernels, which then run on the GPU, but it is, in some sense, whole-program compilation. So, just looking at the program, we have quite a lot of information, and I think that is probably easier than using something like linear types to track it in the type system. Because as much as I like linear types, and we use them in Cogent a lot, they also make it hard sometimes for the user to express what they want, right? So, it’s nicer in some sense if you can extract this information from the program via program analysis; I think it’s more user-friendly. But apart from that, I really like the linear types stuff in Haskell. I just don’t think it’s necessarily the best way to go for Accelerate.

AL (0:47:42): Are there any wider lessons that you would want to convey from all your experience with approaching this? Perhaps generally, the theme seems to have been to try to make certain areas of Haskell reliably performant, which I quite like. I think functional programming can be performant, but it is also good not to have to rely on potentially brittle, fragile optimizations like fusion, but instead, I don’t know, to design special-purpose DSLs. Is that something which you think works, identifying subclasses of problems and then making a DSL? Is that an approach that should be copied in other areas? Or do we have everything we need, in your opinion, or are there big open goals?

GK (0:48:35): I think for these big problems, using domain-specific languages to target special sub-problems which you know how to handle well is a very good approach. And I really think Haskell is the best language I know of to embed a language in, because in Accelerate, apart from linear types, we use all kinds of type-level features extensively. It’s useful both for presenting a nice view to the user and for having type families map things to some less convenient internal representation.

I think for us, it was really valuable to have a typed AST, because lots of people work on that compiler, and in the beginning, we didn’t have a fully typed AST. The tools Haskell gives you to keep track of the type of your computation, the free variables, all these things on the type level, are, in my view, especially important for embedded languages, especially deeply embedded languages, where you basically compile a program at the runtime of the Haskell program. You really want to make sure that you don’t have any type errors in your embedded language, because they become runtime errors in the host program. And that’s obviously something you really want to avoid.
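The standard miniature version of such a typed AST uses a GADT whose index tracks the type of each embedded expression (this sketch is far simpler than Accelerate’s actual AST): an ill-typed term like `Add (IntLit 1) (BoolLit True)` is rejected by GHC when the host program is compiled, instead of failing at its runtime.

```haskell
{-# LANGUAGE GADTs #-}

-- Minimal well-typed AST: the type index says what each
-- expression evaluates to.
data Exp a where
  IntLit  :: Int  -> Exp Int
  BoolLit :: Bool -> Exp Bool
  Add     :: Exp Int  -> Exp Int -> Exp Int
  If      :: Exp Bool -> Exp a   -> Exp a -> Exp a

-- The evaluator needs no runtime type checks; the GADT
-- guarantees each case is type correct.
eval :: Exp a -> a
eval (IntLit n)  = n
eval (BoolLit b) = b
eval (Add x y)   = eval x + eval y
eval (If c t e)  = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (BoolLit True) (Add (IntLit 1) (IntLit 2)) (IntLit 0)))
```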

AL (0:50:21): So yeah, basically you’re saying this is something, at least in this area, to actually strive for, right? I mean, type-level programming in Haskell is always a little bit controversial. Some people are saying, “This is just overly complicated, just stick to the simple stuff.” But you’re saying that in this area, it’s really worth the trade-off: do well-typed, well-scoped ASTs, because you gain so much.

GK (0:50:48): Yeah. For us, it’s definitely worthwhile. And don’t get me wrong, you pay a price for it. It is really complicated. And we see this with people who want to contribute: we have lots of students doing their thesis on Accelerate, and it’s a very steep learning curve. Also, implementing transformations on the AST is much more complicated if you have to convince the compiler that it’s type correct as well. So, it’s not a free lunch, right? You have to make up your mind about what you really want to track in the type system and what not. But I think for us, it is definitely worth the price. And maybe even more so if you have lots of people working on the compiler, and if you also want to take into the compiler the things which students implement: yes, it’s much harder for them to get started, the old story with types, but once it’s implemented, the chances that it actually works are a lot higher.

AL (0:51:56): Probably also easier to sort of trust these contributions in the first place.

GK (0:52:01): Exactly, exactly. And I mean, that’s the thing: we have a very small group working on the compiler, so we really rely on other people and students to contribute to it. But yeah, it’s not without pain. Definitely not. Yeah, no, but I’ve been looking at this: I would like to have something like Accelerate in other languages, and some languages offer some parts of what Haskell does. But I think Haskell, in some sense, is the most comprehensive one if you look at everything we need. Of course, it’s also a chicken-and-egg thing, right? We did these things because they are available in Haskell. But yeah, I think that’s probably the biggest thing for me about the language.

AL (0:52:54): Okay. Thank you very much for taking the time to talk to us, Gabriele. 

JB (0:52:58): Yes. Thanks a lot.

GK (0:53:00): Well, thank you for the invitation.

Narrator (0:53:04): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the generous support of our sponsors, especially the Monad-level sponsors: GitHub, Input Output, Juspay, and Meta.
