69 – Jurriaan Hage

Recorded 2025-03-11. Published 2025-08-25.

Today’s guest is Jurriaan Hage. Jurriaan is a professor at Heriot-Watt University in Edinburgh who’s worked with and on Haskell for many years. He’s known for the Helium Haskell compiler, specifically designed for teaching, and he has plenty of other projects related to Haskell, including improvements to the type system, the generation of better error messages, and the detection of plagiarism.

Transcript

This transcript may contain mistakes.

Mike Sperber (0:00:15): Welcome to the next episode of the Haskell Interlude. I’m Mike Sperber, and today’s guest is Jurriaan Hage, whom I will be interviewing with Wouter Swierstra. Jurriaan is a professor at Heriot-Watt University in Edinburgh who has worked with and on Haskell for many years. He’s known for the Helium Haskell compiler, which is specifically designed for teaching, and he has plenty of other projects related to Haskell, including improvements to the type system, the generation of better error messages, and the detection of plagiarism.

Wouter Swierstra (0:00:46): Okay. So welcome, Jurriaan. The first question we usually ask our guests is, how did you get to Haskell?

Jurriaan Hage (0:00:52): Well, I started, I mean, as normal. I started my studies in ’88, and then you learn all the imperative languages like Pascal and C, C++, and so forth. But there was a course in functional programming, and I did that in ’91, I think. And that was Scheme. And I actually programmed a lot in Scheme for about nine years, including part of my PhD.

But then, when my PhD was over, I needed to find a job, and I wanted to stay in academia, and I applied in both Eindhoven and Utrecht. And Utrecht is pretty much a hotbed of Haskell programming. And I was actually hired. I was offered a job there and hired to work on static analysis. And this was in a group where many of the people that worked there were very proficient in Haskell. So Johan Jeuring, Doaitse Swierstra at the time, and a few more people. So yeah, it was clear that, at some point, I would need to learn Haskell because we used it a lot in our teaching. We had a course on functional programming. We had a course on the implementation of functional programming languages. And for that I needed to learn Java, because I was also teaching on other courses, but I also had to learn Haskell.

And so, yeah, because of course I knew Scheme quite well – I’d written lots of code in Scheme, for my PhD and for other reasons – functional programming was relatively easy. And actually, I mean, I sat down, and I implemented something that made very much use of laziness. And I love the way that I could use laziness to implement something in a fashion that would not really be possible in any other language that I knew.

WS (0:02:24): So what kind of things did you do in Scheme?

JH (0:02:26): In Scheme? So I did my master’s on developing a reversible programming language, and I implemented all of that in Scheme. Actually, when I did my functional programming course, we had to basically implement all of Simon Peyton Jones’s book. That was basically the exam. So I did that in Scheme. So that was the first program I wrote, implementing all of that in Scheme. And later on, I did a PhD where I did a lot of work with graphs and groups and group theory, and that turned out to be much easier to implement in Scheme, of course, until the point that I ran into serious performance issues. And then we implemented everything in C, and it ran about 600 times faster when I did that. And because I was doing combinatorics on graphs, that was really necessary. I needed that speed.

But the nice thing, what I really loved about that, is that when I re-implemented everything in C, I ran everything in C and I got answers out, but I could still verify results in Scheme. And that implementation was completely different, right? So in Scheme, everything was with maps and folds and everything, while everything in C, of course, was hand-carved, manipulating matrix-like structures. And that kind of gave me a strong confirmation that my results were correct.

WS (0:03:36): So you could check the results that your C program produced?

JH (0:03:41): Yeah.

WS (0:03:41): And then verify it. 

JH (0:03:42): Yeah. Because finding it was the hard work. I had to look for certain kinds of graphs. In C, you can do that very quickly, and it could do that in place, while Scheme had a hard time doing that, because the numbers of graphs were huge – something like two to the power of 50 graphs that I needed to check – and you cannot do that in Scheme. At least, you could do that, but then I would spend a lot of time circumventing the restrictions. I wouldn’t be able to use the idiom of, let’s say, a functional program where you just construct a list and you walk over that list and you do an AND, or you do an OR. That would, performance-wise, not work at all. I looked at all graphs up to n vertices. With Scheme, I got up to five vertices, and with C, I got up to 12 vertices. And that’s a lot of graphs. So that’s how I got into that.

I also did a bit of Clean programming because when I was a PhD student, we had to go to a school for Epoch, the research organization. They organized a week on functional programming and software technology. So we were taught by Erik Meijer. We were taught by Henk Barendregt. We were taught by the Clean people. So I did a bit of Clean programming there. But, yeah, I mean, since 2000 or so, if I do any functional programming, it’s Haskell.

WS (0:04:52): And how was the transition from Scheme to Haskell? Was that a big step? 

JH (0:04:56): Not necessarily. I mean, in those days, for example, becoming productive in Haskell didn’t need you to actually learn many libraries because there weren’t that many libraries. Of course, I focused very much on using all the idiomatic stuff. I didn’t need very complicated things. I never actually built very big programs, except if you discount my contribution, say, to Helium, which I also did not program a lot of. I’m just the maintainer. So no, I don’t think I – there was not a big difference. I mean, there are some things that are a bit more inconvenient, I think, in Scheme. I mean, the syntax to some extent is inconvenient, but if you have a good Emacs mode, you can survive that. It’s not problematic. 

What I never liked about Scheme is that you have to curry explicitly. So that’s something that I did like in Haskell. A lot of things, like operator sections and stuff like that, that’s so convenient. In a similar vein, list comprehensions – I love those as well. And I think, yeah, I mean, having types around is, for me, useful. But I mean, it never really hurt me. I think types first become particularly important when you’re taught how to program, but also when you want to build code that is maintainable over a long period of time, and doing it with lots of people at the same time – that’s when types really become important in my mind. If I just write code for my own PhD, then, yeah. And the stuff I did in Scheme was also mostly straightforward.

WS (0:06:10): But static typing wasn’t a barrier to learning Haskell? 

JH (0:06:14): Oh, no. It was more that the error messages at the time were a problem for our students, right? And that was what motivated the development of Helium. And not my motivation, by the way, right? I did not instigate Helium at all. That was Arjan van IJzendoorn and Rijk Jan van Haaften – a teacher who taught functional programming at the time in Utrecht, and one of his students. And yeah, they were kind of fed up with the bad error messages, particularly when they involved type classes. So they said, “We’re going to reimplement Haskell, but then just leave out the type classes, just to give students an easier way of getting into the language without being hampered by these error messages.”

WS (0:06:47): That’s kind of like Elm.

JH (0:06:48): Well, yeah, to some extent. But I think Elm does it for a different reason. Actually, we did a study on Elm quite late with a master’s student, and we actually implemented type classes in Elm, and we could also give good error messages for them.

WS (0:07:00): Okay.

JH (0:07:00): But then the Elm people still didn’t want to have it. So I told them, “Look, we did this.” We implemented this even in the Elm compiler, but they wouldn’t have it. So I don’t think this is because of error diagnosis. It must be another reason why they don’t want type classes. 

But yeah, so this was the main motivation for Arjan to work on that. And of course, it was very pretty. So basically, Haskell 98 without the type classes, more or less, that’s what the compiler was. There’s still a lot of work to implement, right? Because it’s not just that you want to have some kind of compiler that sort of works. You’re giving this to students when they are working on their programming assignments, right? Yeah. So it really has to be a usable tool, production-strength tool, actually. So that’s what they started to implement. 

And then Bastiaan Heeren, my first PhD student, who I’d gotten involved with when I moved to Utrecht, he was working on type error diagnosis. So they asked him, “Why don’t you –” and so to some extent, this kind of just, I think, organically happened, is that Bastiaan got involved to implement the type system, the polymorphic type system, while they focused on the parsing and the lexing and stuff like that, and I guess code generation a little bit. And then he did most of the typing. And then this was also for him, then a testbed where he could implement his innovations on type error diagnosis. 

So this is really serendipitous, right? These are two different strands of activities that came together. And then on top of that, we had Daan Leijen in the group, who was doing his PhD with Erik Meijer. Erik Meijer had left – well, he was almost done. And he basically ripped, as I see it, the garbage collector out of OCaml, and he implemented this lazy virtual machine in C++ based on the OCaml code. And that was actually a very fast virtual machine for running our Haskell programs in. And so we could kind of give that to the students to work with. And we did that for a number of years. We had, I guess, 200 students, maybe a bit less at that time. And from 2002 or ’03 up to 2005 or 2006, we used it every year for our functional programming course.

WS (0:08:56): So what was different about Helium’s error diagnosis as compared to GHC or Hugs?

JH (0:09:02): Well, I think that the big problem with GHC – I mean, with Haskell maybe in general – is that it has a class system that’s open world, which means that if you write a piece of code and there’s no instance for it, you cannot really say that there might not ever be an instance for it. So you cannot often take advantage of that situation. It would be bad to say this is not correct, because you could make it correct at some point.

So while, for example, if you say, “I’m comparing two functions,” you know that there will never be an instance for comparing functions. And this is actually what GHC does have now with their TypeError class, right? You could say, “Well, this is never going to happen.” It’s a bit of a hack, but I think it works well enough in their situation. You could say, “Well, there’s never going to be an instance Eq for A arrow B,” and that will map to some kind of custom type error. And then the compiler can take advantage of that by not saying, “Oh, please define an instance for Eq A arrow B,” which is in principle always possible, but instead giving an error message saying, “Well, we don’t compare functions. Sorry about that.” All of that goes away when you don’t have type classes, right? So, yeah, life is easier. Of course, you also lose a lot of what is beautiful about Haskell, which is, in my mind, a very elegant approach to ad hoc polymorphism, to overloading.
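
As an illustration, here is a minimal sketch of how GHC’s custom type error mechanism can encode exactly this (the message wording is invented, and this is GHC’s TypeError, not Helium’s mechanism):

```haskell
{-# LANGUAGE DataKinds, UndecidableInstances #-}

import GHC.TypeLits (ErrorMessage (..), TypeError)

-- An Eq instance for functions whose only effect is to replace
-- "No instance for (Eq (a -> b))" with a domain-specific message.
instance TypeError ('Text "We don't compare functions. Sorry about that.")
      => Eq (a -> b) where
  (==) = error "unreachable: rejected at compile time"

-- Now an expression like `id == id` is rejected at compile time
-- with the custom message instead of a missing-instance error.
```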

One of our papers, actually the paper at PADL, I think in 2005 or so, actually tries to add directives to Haskell – or rather to an implementation – where you can do that at a high level. You can say, “Well, there will never be an instance for this,” or, at this point, you know all the instances that there are ever going to be, right? And then the type error mechanism can take advantage of that by giving more tailored error messages when that happens. I think that was the main thing that was in the way of Arjan when he was teaching.

The other things were – I mean, polymorphism is not easy, of course, parametric polymorphism, but actually in that course, we spent time on what unification is and how it works. They actually had to hand-unify types at the exam, right? So that was a thing you could teach them. But the problem is that when students start to program in Haskell, they will basically immediately be using type classes, while maybe the lecture on type classes could be six, seven lectures away.

And there are other solutions to this, right? I still remember seeing the lecture notes of Manuel Chakravarty. He just started his lecture notes by explaining type errors: if you see this, then this is actually what it means. And that’s also a way of fixing it. But what was maybe interesting at the time is that a lot of students complained. So they used this – the fact that the compiler did not react very clearly to their programs – as a bit of an excuse to say, “Well, this technology sucks. We don’t want to work with this technology.”

This has changed over the years. Not just because I think the technology got better, but also because it has become much more acceptable to teach functional programming, because they do see it. And you can argue that it’s actually everywhere nowadays. It has entered other programming languages because they have also seen the advantage of higher-order programming, polymorphic programming. And to my mind, that’s the most important thing, not so much Haskell itself, although I like Haskell as a package of particular concepts. But for me, it’s much more important that I have access to these concepts: to higher-order programming, to parametric polymorphism.

WS (0:12:09): Algebraic data types. 

JH (0:12:10): Yeah, algebraic data types, indeed.

MS (0:12:12): It’s interesting that you talk about introducing the meaning of type error messages early in the course. I remember John Hughes saying that he teaches students to not read the type error messages at all. Instead, just look at the location. 

JH (0:12:27): Yeah, yeah, yeah. That’s another possibility. But the thing is, with type classes, I don’t even think that – well, I mean, it will point probably to the place where it finds an unsatisfied instance constraint. So yeah. I mean, you can do that as well. But of course, if the students are looking for an excuse not to like the assignments, I mean, they’re going to pick on that, right?

MS (0:12:47): No, and it was definitely an obstacle, right? I mean, things are a little better now, but there were certainly versions of Haskell where you would just get “unresolved overloading.”

WS (0:12:54): No, and some of the newer books, like Rebecca Skinner’s book, I think, really go out of their way to show: this is an error message that you get, and this is how you fix it, right? Which is very useful but also susceptible to change, because maybe GHC has a new version and then the error –

JH (0:13:13): True. In fact, okay, I have an anecdote about that. So after quite a few years in Utrecht, I was finally allowed to take over the functional programming course. So I think this was when Doaitse retired. He had been teaching it for six, seven years. Before that, it was Arjan, and Arjan left. And it was interesting because I basically took the lecture notes we had. I mean, there had been a long-running development of those lecture notes, and this was like 200 pages about Haskell, in English and in Dutch. And there was a huge section in there as well, written maybe originally by Arjan, but maybe also by Doaitse, on: okay, if you write this particular program and run it through Helium, this is what you get; you run it through GHC, this is what you get. And it was interesting to see the divergence between the error messages at that time. But this was done probably in 2005.

So when I got these lecture notes, I sat down and said, “Okay, what does GHC say now?” So I ran all the programs through GHC, and it gave the exact error messages that Helium did. Ah, right. So GHC had developed over the years to actually give better error messages – I think the Helium error messages were clearer, but also they had adopted those. Not by adopting the technology that drives Helium, which is very different from what drove GHC at the time, but it was interesting that I could see that, for Haskell 98 programs at least, they had now become quite similar. So I guess the main selling point of Helium at the time was still domain-specific type error diagnosis, which is something that GHC doesn’t have. And also, of course, GHC has the advantage that it has access to all these advanced features like type classes, type families, and GADTs in particular.

WS (0:14:47): So I think that’s one of the features of Helium, which hasn’t been as widely adopted, which I’d love to see, is this domain-specific kind of custom type errors. Can you say a little bit about how that works?

JH (0:14:58): So, this was – and this was my first paper, I think, with – no, this was my second paper with Bastiaan. So I still vaguely remember the following. I think I’d said at some point to Bastiaan, “I think what we want – for example, let’s take the Parsec combinators.” Basically, what you’re doing with these combinators is implementing something in a domain-specific language that is tailored to the domain of EBNF grammars. And you can build new combinators out of old combinators because of the higher-orderness and because of the polymorphism. It’s beautiful, but essentially what you’re dealing with is an embedded domain-specific language in Haskell, which is, I think, great. But the problem is these are all encoded as Haskell functions and Haskell data types. So when you make a mistake, the error message that you get is in terms of the underlying encoding in Haskell, right? And if you’re a good Haskell programmer, you can still work with these things fairly well, because you also understand the underlying technology. You shouldn’t actually need to look at that level, but you can get along with it. But if an EDSL is also kind of meant to have non-expert programmers develop programs in that domain without actually knowing about the host language that it’s embedded in, then you’re in real trouble, right?

So that’s kind of the main line that turned into my research theme: I want to make complicated technology more usable. So I had this idea then, and I was discussing this with Bastiaan, thinking, well, what we maybe could do is tailor specific type rules for specific function calls. So normally, you would have a type rule for all applications in Haskell. That’s built into the compiler. But what if I could change the typing of all applications where the function that I apply is a specific function, one of the Parsec combinator functions, and then associate, with all of the constraints there, a particular error message that can be returned? Then these error messages can talk about Parsec combinators. They can talk about parsers, right? A lot of work.

My main problem was that I couldn’t see how you could do that while still retaining the type soundness of your implementation. But then Bastiaan said, “But I know how to do that.” So we got to work. And that turned out to be our 2003 “Scripting the Type Inference Process” paper at ICFP. And this was all built into Helium. We could actually use it and play with it, and it actually worked really well.
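
To give the flavor, a specialized type rule for an application of a parser combinator might look something like this (illustrative pseudo-syntax only, invented here; not the actual rule format of the paper or of Helium):

```
p <*> q :: Parser s b
  p :: Parser s (a -> b) : "the left operand of <*> must be a parser yielding a function"
  q :: Parser s a        : "the right operand of <*> must yield what the left operand expects"
```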

The main problem with the current version of Helium is that we made so many changes that that support is more or less gone. Also, because at some point in Helium, I wanted to move to having GADTs and type families and make sure that type error diagnosis there is also understandable. And for that, we had to implement OutsideIn. You cannot do GADTs otherwise. I mean, you could design something completely different, but why would we? But you simply cannot get away with plain Hindley-Milner if you want to do GADTs, because of the local type inferencing.
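
A textbook illustration of why GADTs need local type inference (not code from Helium):

```haskell
{-# LANGUAGE GADTs #-}

data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool

-- Each branch brings a local equality into scope: a ~ Int in the
-- first, a ~ Bool in the second. Plain Hindley-Milner unification
-- would try to make both hold globally and fail; OutsideIn instead
-- solves the constraints under local implications.
eval :: Expr a -> a
eval (IntE n)  = n
eval (BoolE b) = b
```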

So we built that in, but that also means that all of the other support is gone. And it’s not difficult to do that, but it would take quite a bit of engineering. And I don’t have time for that. And PhD students actually also don’t have time for that, right? Because they need to write papers.

WS (0:17:57): So could you imagine retrofitting these ideas back onto GHC? 

JH (0:18:01): Ah, yeah. Interesting. Good question. I think it has come up sometimes in discussions with Simon. I think he, at some point, wanted it. But you have to remember that it would involve a major revamping, possibly, of GHC. And I saw that also with the impredicativity work that we did. So the first paper took us about three years, maybe even four, with Alejandro and also Dimitrios Vytiniotis and Simon Peyton Jones, and it became a PLDI paper.

What I wanted – so, GHC has basically two phases of typing. It has a bidirectional type system to deal with higher-rank types, and then it has OutsideIn. And I was thinking that for error diagnosis, this could be problematic. And with Alejandro – I had hired him as a PhD student to work on advanced features of the Haskell type language, because Bastiaan had looked at Haskell 98 with type classes, and I wanted to look at type families and GADTs. So we needed some way to deal with GADTs in the first instance.

What I thought would be useful is to integrate the bidirectional type system into the OutsideIn framework, so you could look at everything at the same time, because that’s what worked for us in Helium: not looking at one type constraint at a time, but looking at sets of type constraints at a time, because it allows you to better diagnose what is wrong if something is wrong.

WS (0:19:22): Right. So the example here is you collect type constraints about your functions and applications, and then at 17 places you think f has type Int to Bool, and there’s one call which assumes that f is String to Bool, right? And then who’s to blame here? Well, it’s probably the one.

JH (0:19:38): It’s probably the one. And you can only do that if you look at constraints as a set; otherwise, there are always going to be cases where you will point to the wrong place – not at the exceptional case but actually at the non-exceptional case. But if you look at all of these constraints in one set – we did that by turning them into type graphs and then having heuristics run over these type graphs. That’s how we technically did it in the compiler. Then each of these heuristics can say, “Well, I think this is a problem.” And another heuristic says, “I think this is the problem.” And there’s some kind of trust factor there, because more specific type errors, I think, are better than non-specific ones, and then we elect a type error to report.
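
Concretely, the scenario looks like this (a deliberately ill-typed toy example, so it will not compile):

```haskell
-- No signature: the type of f must be inferred, here as Int -> Bool.
f n = n > (0 :: Int)

callers = [ f 1, f 2, f 3, f 4   -- many call sites agree: f takes an Int
          , f "five" ]           -- one call site disagrees
-- A solver that processes constraints one at a time may blame
-- whichever constraint it happens to see last; a heuristic voting
-- over the whole constraint set can single out f "five".
```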

Now, back to impredicativity. What we wanted to do with Alejandro was to integrate the bidirectional type system into OutsideIn. And what was fortunate was that when we did that, it turned out that we could also do impredicativity in a way. And that was something that GHC at the time did not really support well. It did support it, but it had turned out to be unsound. So the feature was still there, but you could not actually use it well.

So we talked to Simon and Dimitrios. They were, of course, very interested. I mean, Dimitrios did a PhD in this area, impredicative types. And the technical notion that Alejandro developed, that allowed us to come up with a solution, that’s still in GHC now. Yeah, we still want it. I still want to bring everything into the OutsideIn framework, for the sake of error diagnosis. But the problem is that if you do that, you have to basically redesign the entire OutsideIn framework. And what is even more complicated is that when you do that, all the other type features that GHC has could possibly interact with that, right? So when you make a change to OutsideIn, you also have to take into account: what does that do to all my type extensions? Because Haskell is not a single language. Haskell is dozens of languages. We have impredicativity on or off, higher rank on or off, all these variants, higher kinds, whatever, right? Overloaded this, overloaded that. So ascertaining that all of that works as it should is a huge amount of work.

So the first paper was, I think, from a theoretical point of view, fantastic, and it worked well. And we could show that we could actually deal with a large class of impredicative programs that are type-correct, and we could actually ascertain them as being type-correct. So we started doing higher rank, and we got impredicativity more or less for free. And the hardest part about the research was not so much, I guess, the implementation, but actually carving out a design space where we could say, “Well, first of all, we can do this. It’s sound. But also, is it explainable to somebody who is going to program with all this stuff, right?” So there were a lot of soft constraints, as it were, and that made our life quite hard, because sometimes you could say, “Oh, I want this program in as well,” but then you would also get a number of other programs in that should not be in, right? It’s really carving out that space.

But that is also why there was a follow-up paper at ICFP a year later, because that is a paper that actually still has the original structure of GHC: a bidirectional type system followed by OutsideIn. So there, there were no changes to OutsideIn. The only thing we did was change the bidirectional type system based on the ideas that we had from the previous paper. And that allowed us to be impredicative with more or less the same – a slightly different – bidirectional type system than before. And that, of course, from the perspective of Simon and Dimitrios, was much to be preferred, because it involved much fewer changes to GHC, and you could leave the entire OutsideIn framework, which is complicated, unaffected, and it would also be a lot easier in the sense of interactions with other features.

My motivation had never been to build impredicativity or something like that. We wanted to work on improving type error diagnosis for these more advanced type systems. That was the target. And we actually never got around to that, because then Alejandro got his PhD – he was done – and I never had a follow-up student, because I moved to Edinburgh.

WS (0:23:35): So another line of work that you’ve also looked into has been kind of plagiarism detection. Can you say something about that?

JH (0:23:42): Yeah, yeah, yeah. So when I was – so like I said earlier on, when I moved to Utrecht, I had to study Java, and I had to study Haskell. I was doing the TA-ing and running the labs for a concurrent programming course, and I found out accidentally that some students had copied code from each other. So I was looking around. Are there any tools that can help me find plagiarism automatically? I was not aware of MOSS at the time. I think MOSS probably was around, but it was not – people weren’t using these kinds of tools. So I just wrote a plagiarism detector in Perl because I had this idea. I wanted to do this based on regular expressions. So I wrote a – I still have that program. I still use that program sometimes, about 440 lines of Perl using all kinds of regular expressions to compare Java programs, and now also C# programs. 

I applied that on my own students. I applied it in other classes for other teachers, and I found quite a bit of plagiarism. Published a paper about it with my wife. Then I had a student come to me, and he said, “I want to do a dissertation with you.” And I said, “Well, I have this idea. We could try and adapt what I did in the other paper to the Haskell setting. So let’s implement a plagiarism detector for Haskell in a slightly different way, because regular expressions don’t work really well in Haskell’s setting, because actually the syntax is not noisy enough.” What helps you when working with regular expressions is that there’s a lot of signposting in the syntax of languages like Java and C# that is gone in languages like Python and Haskell, which actually makes the programs in these languages nicer to look at, I think. But it’s a bit in the way of my work.

WS (0:25:15): So lots of curly braces, semicolons. Yeah.

JH (0:25:18): Exactly. Curly braces, semicolons, all that stuff, right? And that’s all gone in Haskell.

MS (0:25:23): But before you go to Haskell, could you explain a little bit? So I know your approach with Java and C# programs involved regular expressions, but could you explain the general idea?

JH (0:25:31): Yeah. So what do you do? Let’s say I start with a Java program, and then I have regular expressions that, for example, first throw away all the comments, all the literal strings, everything like that. So all the comments are gone – end-of-line comments, multi-line comments – and the strings. What I actually also do is convert the entire program to lowercase. And what I then do is I take that program that is in lowercase – of course, it’s not a Java program anymore, because even the types are in lowercase; I don’t care – and I map that, piece by piece, into an uppercase program, where, for example, I replace all the identifiers with the single identifier capital X. So that’s a single X. So I know that in that position there was an identifier. And I do the same with numbers. So all the numbers become N. All the floating points become F.

Essentially, what I’m getting is the token stream, as it were, from lexing the program, but without actually lexing the program, without actually parsing. That is, of course, a challenge in the sense that Java programs are nested, right? And for doing plagiarism detection well, I actually need to know where the classes are, where the methods are, and so forth. So there’s a bit of magic – let’s say, a while loop around my regular expressions – to make sure I can deal with that effectively.

But essentially, what I do is take a program and map it to a stream of token types. So all the operators are still there, but all the identifiers become X, and things like keywords stay keywords. And that is something I can diff with what somebody else did. So then I map everything to a line of its own as much as I can, and then I just take the textual diff, the Unix diff, of these two files. And the more they have in common, the higher the similarity, and the more likely I think it is that they copied. And this works out quite well.
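
A toy version of that normalization, sketched in Haskell rather than his Perl (simplified: it does not strip comments and strings, handle nesting, or reorder methods):

```haskell
import Data.Char (isAlpha, isDigit, toLower)

-- Lowercase the program, then collapse identifiers to X and numbers
-- to N, leaving operators and punctuation alone. Two copies that
-- differ only in their identifier names normalize to the same text,
-- which a plain Unix diff can then compare.
normalize :: String -> String
normalize = go . map toLower
  where
    go [] = []
    go s@(c : cs)
      | isAlpha c = 'X' : go (dropWhile isIdentChar s)
      | isDigit c = 'N' : go (dropWhile isDigit s)
      | otherwise = c   : go cs
    isIdentChar c = isAlpha c || isDigit c || c == '_'

-- normalize "int total = count + 1;" == normalize "int totaal = aantal + 1;"
--   both yield "X X = X + N;"
```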

WS (0:27:26): And can you defend against this as a student, right? You’d have to really change the lexical structure of your program, right?

JH (0:27:32): Exactly. So the idea is that when you do these regular expressions, you throw away the things that are easy to change, like renaming identifiers. I often got two programs that are exactly the same, except these identifiers were in English and these identifiers were in Dutch. Because in the Netherlands, it’s okay to write either form, right? They would translate the comments. They would delete the comments. They would turn all the identifiers into I, J, K, L, N, M because they said, “As long as we have something working, we get some points. We don’t care.” So these are things you have to fight against. 

What they also did in Java is they would reorder their methods, which, of course, you can easily do because the semantics is the same. And that’s why my plagiarism detection tool actually needed to know where the methods were: because I would sort them. And I would also find cases where students had actually done a lot of work trying to hide their plagiarism, where my tool would beat MOSS. MOSS is also pretty good, but my tool would tend to find them. But this is the Java version.

With Haskell, I wanted to have a different approach, because I had actually applied a version of my tool to Haskell, I think, and it didn’t work at all. There were too many false positives. And so I thought, “Well, then we do something completely different.” But of course, I had access to a parser for Haskell. So what we actually did there, we did it a bit deeper in a sense, and we applied lots of heuristics and everything, and then we made a comparison.

What was for me the most interesting thing is that I had all the software that students had built. At the university, of course, we’d certainly been teaching Haskell since I was there, like from 2000. I think I did this study around 2013 or so. So I had 10 to 12 years of Haskell programs submitted by students. So what I did was run my tool over all of them, right? I organized them per assignment, because it doesn’t make sense, of course, to compare a program for one assignment with a program for another assignment. But, I would have to say, we maybe used a particular assignment for six years, and we’d have 100 students per year. And then I would compare all of these to each other. And I found not a huge amount of plagiarism, but I did find 75 or so clear cases where people had, say, gotten the program in 2003 from somebody or found it online, changed it a little bit, sometimes tried to hide the plagiarism, and then submitted it two or three years later. And we hadn’t caught those, of course. They also had to pass the exam, because I actually checked afterwards what happened to these students, and I could actually find that information about them. And every time I did that, I found that the plagiarizer had left the studies at some point – had never passed the course and had left the university. That was reassuring to some extent. But yeah, with all this data, it did allow me to do quite an extensive comparison.

WS (0:30:18): Yeah. So nowadays, the worry isn’t so much about plagiarism but about people using AI and Copilot or some large language model to help in their programming.

JH (0:30:28): Yeah. 

WS (0:30:29): Is that something you’ve thought about?

JH (0:30:31): I have not, well, thought about it. I have no experience in that, right? So that’s clear also because I mean, currently I’ve not taught any courses for three years basically, because I’m the head of the department out here and that’s all I do. But I do have some thoughts. 

So one of the things that a lot of people say is, “Well, why don’t you use your tool to look on the internet, to see whether you can find comparable programs?” And that was before GenAI, right? I mean, GenAI just makes the internet a bit more active: you can prompt it with questions saying, “I want something like this.” Actually, I do have a student now who’s investigating: can GenAI tools at this point already fool existing plagiarism detectors? He’s just planning out his research now. But he’s doing this on Python, not on Haskell, because at this university, not many people actually know how to program in Haskell. We have one course that does Haskell.

But to get back to the other thing: one reason why I never looked on the internet is that, typically, if there is something on the internet that can be found by students, more than one student who’s looking for it will find it. And then I will find them, right? Because they will be very similar to each other. So I’ve always hoped that that’s enough. Maybe with GenAI, the situation is a bit different, because you can prompt it, right? If you can prompt it to push it into a particular part of the state space of all programs that solve your problem, then that will be a challenge, basically, I imagine.

So I think maybe in the end, I mean, the solution to GenAI should maybe be that you actually will converse with the student about their solution. And that’s the only fail-safe, only foolproof way I think of dealing with this, and in a much more general setting than a Haskell program, right? This is even when they write the dissertation, whatever, right?

WS (0:32:16): So one observation I’ve heard about kind of catching people who use GenAI methods is that they generally don’t understand what they’re doing.

JH (0:32:23): Yeah, yeah, yeah, exactly.

WS (0:32:25): So they get some code out, which is valid Python, which maybe solves their problem, but then it uses features that were kind of not covered at all in the course, which are highly non-standard.

JH (0:32:37): Right. 

WS (0:32:38): And then they just trip up obviously in that way, right? But maybe that’s wishful thinking. I don’t know.

JH (0:32:44): Well, it’s still difficult to convict people. I mean, the thing is – note that the plagiarism detection tools don’t discover plagiarism. None of them do. They discover similarity. And then you as a teacher still have to decide: is this similarity plagiarism or not? And that is based on your understanding, for example, of what they should know and not know. This is based on your understanding of what I have taught them in class, right? If somebody implements a quicksort in the exact same way as somebody else does, and you say, “Yeah, but this is the idiomatic implementation of quicksort,” or “This is the implementation I showed on the slides,” it’s not a problem if they use that, right? So even similarities are not enough. And if they have a similarity in things that you think are actually not relevant enough, well, then maybe you think, “Well, this is not worth pursuing.” As long as they do the stuff that I really care about themselves, I’m okay with that.

It’s always going to be a judgment call. And even big similarities don’t mean that you can easily convict people. So what I always do when I design assignments is I always try to have at least one assignment in every course where there’s a lot of room for creativity on the part of the students. Why is that? Because if different students work on this assignment, then there are parts that are more or less dictated – this is how it should be, and this is how it should work, fairly precisely specified – but all the other parts, they are free to invent themselves. And the chance that somebody would then invent the exact same thing is very, very small, right? It’s like giving the students enough rope to hang themselves by. That’s how I tend to think of it. And it actually does work.

One interesting thing is that – and maybe this brings us to when I was teaching functional programming. So when I took over the course from Doaitse, they used to have this structure where you have two big assignments. And the problem was, we found that students were disengaging, because they had done courses where it was okay to kind of not do anything about a course until a week before the deadline. With programming language courses, that doesn’t work, because everything builds on everything, right? And particularly, the pace was quite high with Haskell, because they’d seen Java in the previous year. They had a lot of experience in Java. For them, this was actually maybe the first seriously hard course they were doing. And some students disengaged for, let’s say, two weeks – they were gone in weeks two and three, thinking, “Well, I’ll get back in in week four,” and that would be too late.

So my idea was – and I was doing this together with Ruud Koot, who was my PhD student at the time – we wanted to have a situation where we started with a series of small assignments, one per week, more or less, or one per week and a half, so that they really had to get to work right away, but only on mastering the things that we taught. So there would be one assignment about higher-order programming. There would be one assignment about, let’s say, dealing with map, filter, fold. There would be one assignment about type classes, and so forth. So it’s small nuggets, small but important parts – algebraic data types would be one. So I think we had four of them, and these would take, in total, maybe five or six weeks of the total of 10.

But after that, we just said, “Here you have gloss. Here you have diagrams. Now build a game in Haskell, an interactive game in Haskell, a graphical interactive game in Haskell.” What was fantastic is that they could actually do that quite well. If they’d done the first assignments, they could do that quite well. However, this game, of course, was very creative. So we didn’t tell them what game to develop. We gave them some ideas, but they would all kind of choose their own graphics, their own whatever. So if I was ever suspicious of anybody not doing the work himself, it was quite easy, just running this through MOSS, because MOSS can actually handle Haskell. Running that through MOSS would clearly point out plagiarism quite easily. 

Now the first assignments, however, were very precisely specified, and there was a reason for that. The first class I ever taught, we had 295 students, right? Let’s say you have the first delivery date for the first assignment in week two, and then the next assignment has to be delivered one week later. Students will react very badly if you don’t give them the feedback about the first assignment before they hand in the second assignment. They will say they will be disadvantaged, and they’re right. 

So what we decided to do – and Ruud actually took most of the heavy lifting, essentially all the work there – was to design assignments that could be automatically and completely marked for functionality. So Ruud designed small, compact, concise assignments focusing on a small piece of the Haskell language, and then wrote QuickCheck properties, very precise QuickCheck properties, basically for us to be able to say, “Well, if you pass the properties, then your program is 100% correct.” Once in a while, we did run into a program that was indeed correct and did not pass the properties, and then we had to extend the properties a little bit. But we used that. We gave the students access to those tests, right? So they didn’t know what the tests and the properties were, but we had a system where they could submit the program that they were working on at any time, and it would say, “Okay, you’ve passed this, you’ve passed this, you’ve passed this, but not this and not this.” So when they submitted it, they would know, “Okay, I passed all the tests.” And then we would still look at style, we would still look at naming, and so forth, because we think that’s very important. But once you know that the program is actually correct, you don’t need to start it up and run and play with it. Checking style and things like that is actually something you can do in five minutes. If you get a bit of a hang of it, you take no more than five minutes per student to actually mark that. So that actually was what made that work.
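
A minimal sketch of what such marking properties can look like (the assignment and all names here are invented, not the actual course material):

```haskell
import Data.List (insert, sort)
import Test.QuickCheck (quickCheck)

-- Stand-in for a student's submission; the marker only fixes the name.
mySort :: [Int] -> [Int]
mySort = foldr insert []

-- The hidden reference properties: the output is ordered and is a
-- permutation of the input. Passing both earns full functionality marks.
prop_ordered :: [Int] -> Bool
prop_ordered xs = let ys = mySort xs in and (zipWith (<=) ys (drop 1 ys))

prop_permutation :: [Int] -> Bool
prop_permutation xs = sort (mySort xs) == sort xs

main :: IO ()
main = quickCheck prop_ordered >> quickCheck prop_permutation
```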

And again, I mean, QuickCheck is something that is available in Haskell. Could you have done that in Java? Well, certainly not at the time – like 2013 or ’14-ish, something around that time. And it worked perfectly for us. And that was also noticeable in the sense that we had way fewer students who would give up. Why? Because they had to deliver something right away, right? They couldn’t sit back and say, “Oh, the first two weeks, let’s wait, see what happens, see if this is a hard course or not,” and then it would be too late. So I think that worked really well.

WS (0:38:38): So, looking ahead a little bit, I mean, you don’t have so much time for Haskell nowadays, but –

JH (0:38:43): No.

WS (0:38:43): Do you have any ideas about what you’d like to do if you had more time for research or getting back into Haskell?

JH (0:38:47): Okay. So yeah, there’s two things. Well, actually, two things I want to work on and one thing I am doing if I have the time. So I decided last year when things seemed to be working out a bit at the department that I’m the head of, I decided to write a book. And the focus of that book is to implement static analysis for functional languages, and all of that is implemented just in Haskell, right?

But what I want to do is compare different ways of implementing these things: with folds, with direct traversals, with an attribute grammar system – all kinds of ways of implementing Hindley-Milner, of implementing type and effect systems, actually annotated type systems; nothing to do with effect handlers or anything. So that’s the thing I set myself to do as a hobby. My family is back in the Netherlands, so when I’m here in Edinburgh at the weekend, I might have nothing to do, so it’s just to have something to do in the weekends. So I started doing that, and I’ve gone some way. But yeah, the main thing I need to do is build, let’s say, a few dozen implementations of different analyses with different levels of precision. So that’s the only time that I actually do Haskell nowadays.

So that is what I do if I have the time, but it’s been, well, nine months now since I last actually worked on it. No – in November, on the plane to Malaysia, I actually tried to work on it, but not very often, I’m afraid. If I would have time – I do have a PhD student, but he actually works in Isabelle/HOL. He’s doing something for Haskell, though. We just started on formalizing System FC in Isabelle/HOL and proving it sound, including row types. So that is something Haskell-related that we are working on, and I expect that to be finished this year.

WS (0:40:28): And why Isabelle/HOL?

JH (0:40:30): Well, that’s a technology that he loves to use, and that’s the one that he has experience with. I don’t care very much. And actually, last year he had a POPL paper, basically laying the groundwork to be able to do this effectively. I think it was also a distinguished paper at POPL this year, this POPL. So that was very good for him.

But I mean, for me, the most interesting thing is System FC, and then also System FC with row types, because I like row types. But that’s a side thing. That is just happening. But if I look for myself at what I think are the challenges: what I was working on with Alejandro, and also with some later master’s students, was type error diagnosis for the advanced features of Haskell. So we’ve looked at type families, we’ve looked at GADTs. GADTs, I think we’ve covered quite well. Type families, I think we also have the technology right, but there are some features missing from Helium to make it interesting enough. So we could only work with very simple type families. There’s something that needs to be done to make that publishable, I think. And then I would also like to look at other extensions – to look at impredicativity from the type error perspective, and so forth. So that is one major thing that I’d like to do.

But there’s another thing I’m interested in, and that is that some time ago, I had this project with Wishnu Prasetya. This was an EU project where we had to do something about testing. And we had Arie Middelkoop – he was a postdoc working for us, a scientific programmer. And what he did was implement the technology to instrument Shockwave Flash files. And that was all implemented in Haskell. It was really a nice study, and it was a huge implementation. But the problem was that they’d given us an example Shockwave Flash program to play with, and that was Habbo Hotel. And that’s four million bytes of bytecode, right? That’s peanuts to some extent, but it’s a big application in Shockwave Flash, right? And actually, it was used by like 20 million people monthly worldwide, right? So this was a big player at that time. It’s like a Second Life for schoolchildren, from Finland.

The problem was that when we applied our transformation – we took this Shockwave Flash file of 4 million bytes and applied the identity transformation using our Haskell implementation – it took 23 gigabytes of internal memory. And that was pretty much unacceptable. That’s not workable, not in those days on the machines that we had, and I wanted to work on my laptop. So I spent some time, like a few weeks, getting to understand the whole runtime system of GHC and making sure that I could add the right strictness annotations in places so that it actually ran within four gigabytes, and I succeeded in doing that. And yeah, if I would have to do that again, I can imagine it wouldn’t be that difficult, because once you know how the profiling and so forth work, you can do it. But it’s still cumbersome that you kind of have to guess, okay, this is where I have to do something, particularly because we were using the attribute grammar system. And the problem was in the generated code of the attribute grammar. So I fortunately found something that I could tweak to make sure that the generated code behaved better. But yeah, I was maybe very lucky. It could be very tricky.

WS (0:43:41): Yeah. But the attribute grammar system is inherently lazy, right? It needs the laziness to schedule stuff.

JH (0:43:47): Yes. But Arie Middelkoop had just implemented, or maybe, yeah, I think Arie or Jeroen Bransen had just implemented new features to make the running of attribute grammars much more efficient. And there were strictness flags. But it was interesting that I had, say, 10 modules. For nine of them, I had to turn on the strictness, and for one, the number 10, I actually had to keep the strictness off, because it would actually slow things down, right? 

So this kind of taught me a problem I see with Haskell, and I’ve also seen this elsewhere, for example at companies that decided to use OCaml because I think they thought using Haskell was too risky in terms of resource consumption, right? You don’t control your resources as much as you can in a strict language.

So one of the things I found is that, yeah, if you really want to understand what GHC does to your code, the only thing you can do is look at the .hi files, to some extent, right? That would be the best way to do it. But it’s not a very productive way. What I would prefer is that my compiler gives me feedback: this function has two arguments, the first one of which is considered to be strict. So what I would like is to be able to open that up at a suitably high level of abstraction – and that is the challenge, right? What properties does my program have in terms of sharing, uniqueness, maybe exception analysis –

WS (0:45:07): Strictness annotations.

JH (0:45:09): Strictness annotations, right? What can it figure out? Because you don’t know, right? You don’t know, because it’s all hidden in the compiler. You don’t know if it actually sees it, because it could be that you wrote the program in the wrong way, such that it cannot discover the strictness, maybe because of a lack of precision in the compiler. You don’t know.

So the first thing I would do is open that up, making it visible to the programmer in some way. And I think you can use types for that, right? Types, I think, are a perfect method for explaining: this is a function from Int to Int to Int, and it’s strict in its first argument. You could easily have a compiler report that to the programmer when they ask for it. And then I’m saying, “Well, why would you stick to just strictness? You should do that for many other analyses as well.” And then, what you could do – I mean, with strictness annotations, where you say you have to be strict, you’re actually changing the semantics, and you hope you’re not screwing things up when you do that, right? And you might.
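
For a sense of what that could look like: the function below is strict in x (always evaluated) but lazy in y (only evaluated in one branch). The annotated signature in the comment is the kind of surface-level report being asked for – its syntax is invented here; the GHC flag is real:

```haskell
f :: Int -> Int -> Int
f x y = if x > 0 then x else y

-- Hypothetical compiler feedback, in the spirit described above:
--
--   f :: !Int -> Int -> Int    -- strict in its first argument only
--
-- Today the nearest equivalent is Core-level output, for example
--   ghc -O2 -ddump-stranal Main.hs
-- whose demand signatures (their syntax varies between GHC versions)
-- attach the same information to each binding.
```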

So I think there’s this whole interplay there of opening up the compiler and its optimizations, and reporting, in some high-level way: okay, this is what I understand about how the computation is going on – that you know that certain values are not shared, right? So the compiler can take advantage of that. All of these things – I think there’s a lot of room for that. And that will help us, because the problem of a high-level language like Haskell is: you can program high-level. That’s fantastic, and I love that. But when you need to get down to the nitty-gritty detail, you pay extra, right? When it turns out that your implementation in Haskell is fast enough, okay, everybody’s fine. It’s fantastic because your code is compact, high-level, high-abstraction-level. But when you actually need the performance, and you only need it in a few places, that’s when you really start to suffer. And then you have to resort to profiling, and it would be nicer for me if I could then say, “Look, please compile it. Give me some detailed information about this function, because I care about this function very much.” And then compare the types and see what it finds. And maybe you can help it along a little bit.

And this is what I would call optimization assistance as a field. And it builds on type error diagnosis. Why? Because in the end, analyses in GHC and also in Helium are implemented as variants of an annotated type system. So many of the ideas of type systems, like polymorphism and all that, are also part of that world. So anything you’ve learned about how to explain types translates quite easily to explaining annotated types. We actually did that in a security paper that we published at POPL in, I think, 2007 or ’08. So we already did that. So that kind of shows that it is workable. That actually works.

WS (0:47:48): So what you’re thinking here is that you have static analysis like a strictness analysis or a uniqueness analysis on linear types in Haskell, right?

JH (0:47:57): Yeah.

WS (0:47:57): Where you have certain information about the behavior of your program, especially related to its performance characteristics rather than its functional behavior. It’s kind of a partial specification in the type signature.

JH (0:48:09): Possibly, but not even just that. I mean, if you have dimension types – I would love to have dimension types in Haskell; I once had a student start on that. Then you can say something is not just a float, but a float that measures meters. And if you add meters, you can only add a meter to a meter. You cannot add a meter to a kilometer. So that’s a refinement, like a refinement type system. Again, all of these are implemented more or less the same way as these optimizing analyses. The technology for that is not really different. But all of these would profit if you were able to do that.
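
A small phantom-type sketch of the idea (illustrative only; real units-of-measure systems also track multiplication, scaling, and derived units):

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

import GHC.TypeLits (Symbol)

-- A Double tagged with a unit that exists only at the type level.
newtype Quantity (unit :: Symbol) = Quantity Double
  deriving Show

metres :: Double -> Quantity "m"
metres = Quantity

kilometres :: Double -> Quantity "km"
kilometres = Quantity

-- Addition is only allowed between matching units.
add :: Quantity u -> Quantity u -> Quantity u
add (Quantity a) (Quantity b) = Quantity (a + b)

-- add (metres 3) (metres 4)       -- fine
-- add (metres 3) (kilometres 4)   -- rejected: "m" does not match "km"
```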

Now, of course, there is one major challenge. In the case of dimension types, and also things like pattern match analysis, the analysis actually takes place in the user space, if you know what I mean, right? These are based on things that the programmer actually wrote. That’s relatively easy. The static analyses are not easy, but you’re still close to the original problem.

What might be problematic, particularly in GHC, is that before you actually do the analysis, the program has been transformed in many ways. A lot of the higher orderness may be gone because they have applied rules to kind of flatten the structures a little bit. They may have introduced helper functions for having different calling conventions. So there’s a lot of work, a lot of engineering that would need to be done there to make sure that you map it back to the source language. That is going to be a major challenge, I think, in that case. Or you have to do the static analysis on the source language. I’m not sure whether that’s a good idea or not. It’s really important that you kind of find the right place and that there is a good place to actually do these kinds of things. Maybe we can learn something from the Scala people there. I once read this paper where, the Scala, they have this whole sequence of compilation steps, and they have different analyses running at different levels, right? And that makes sense. So you have to be really careful about that.

But I think that would be something that at least would give me, as a programmer, a user of Haskell, the confidence that if I ever need performance, then I don’t have to look at .hi files. I mean, profiling, I think, is also fine, but still – I mean, I’ve only done one or two of these profiling exercises. What I found is that there’s a lot of misunderstanding with people: they think, “Well, if I make this strict, then everything will be faster.” But if they make it strict, it actually becomes slower; if they had made it deeply strict, then it actually would have become faster, right? So yeah, it’s not easy, right? And note that this kind of stuff would not be something that I think our students would need to know to be able to use Haskell. But people that work at companies that use Haskell in production, I think, could use it very well. So that is what I would like to work on.
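
The shallow-versus-deep pitfall he mentions, in miniature (a standard example, not from the Shockwave Flash project):

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- foldl' forces the accumulator only to weak head normal form. For a
-- pair that is just the (,) constructor, so the sums inside still
-- build up as thunks and the program can leak space:
meanLeaky :: [Double] -> Double
meanLeaky xs = s / n
  where
    (s, n) = foldl' step (0, 0) xs
    step (acc, len) x = (acc + x, len + 1)   -- "strict", yet leaky

-- Forcing the components as well (strict all the way down) fixes it:
meanStrict :: [Double] -> Double
meanStrict xs = s / n
  where
    (s, n) = foldl' step (0, 0) xs
    step (!acc, !len) x = (acc + x, len + 1)
```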

WS (0:50:43): High-performance Haskell is tricky. 

JH (0:50:44): Yeah, indeed. There’s a huge amount of extra effort that you have to put into a piece of software to make it production-strength, right? That it works on all platforms, that it actually works, consistently –

WS (0:50:58): Gives good type error messages. 

JH (0:50:59): Well, not just that, but it doesn’t crash. The first time we ran with Helium, one of the things – so that was actually my only contribution to the first version of Helium, because I’d done this for my object-oriented concurrent programming course. When you ran Helium as a student and there was a type error, then it could send the type error, the program, to the server that I made – in Java, actually. But what it also did: when there was an internal error, so if the compiler noticed that something was wrong with the compiler itself, it would also send such a message.

So I still remember the first time Arjan ran his course with this compiler: we had one or two of these internal errors, but because we got the message, we could have a fix deployed the next day on all the systems, right? So that was really beautiful. This was like 2002 or ’03 that we got that to work.

The socket library of Haskell actually didn’t work, at least not as a socket library should. But I liked it very much, because, first of all, I got interesting type-incorrect programs we could do experiments on, but what also was fantastic for people like Arjan is that they would immediately know: okay, there’s an internal error here, boom. Then you also had the program that gave the internal error. You could just rerun it: “Oh yeah, this is the problem.” We could fix it and then deploy the next version. So that was really nice.

WS (0:52:24): Okay. Thanks, Jurriaan. That was great to have you on. 

JH (0:52:28): It was great being here. Thanks for inviting me.

Narrator (0:52:30): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the generous support of our sponsors, especially the Gold-level sponsors: Input Output, Juspay, and Mercury.
