40 – Mike Sperber

Recorded 2023-10-28. Published 2023-12-31.

In this episode, Andres and Matti talk to Mike Sperber, CEO of Active Group in Germany. They discuss how to successfully develop an application based on deep learning in Haskell, contrast learning by example with the German bureaucratic approach, and highlight the virtues of having fewer changes in the language.

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Andres Löh (0:00:15): Welcome to the Haskell Interlude. I’m Andres Löh, and I’m here with Matthias Pall Gissurarson. 

Matthias Pall Gissurarson (0:00:21): Hi. 

AL (0:00:22): In today’s episode, we talk to Mike Sperber, the CEO of Active Group in Germany. We discuss how to successfully develop an application based on deep learning in Haskell, contrast learning by example with the German bureaucratic approach, and highlight the virtues of having fewer changes in the language. 

Okay. So, today’s guest is Mike Sperber. Welcome, Mike. 

Mike Sperber (0:00:43): Hi. Thanks for having me.

AL (0:00:44): Thank you for being here. So, usually, our first question is—and that’ll be the same for you—how you first got into contact with Haskell or functional programming more generally.

MS (0:00:57): So, it’s a longwinded story, but my first contact with functional programming was that a friend of mine, Sebastian Egner, who I wrote a book with on computer graphics in the ’90s, recommended the Purple Book, the Wizard Book (Structure and Interpretation of Computer Programs) to me, and said that I should look at Scheme, which I then did. And so, that was my first contact with what we would now consider functional programming, I think.

AL (0:01:20): So, I know about this book perhaps, but I mean, there are very many different paths to Haskell these days. And I think there was a time when most people had an academic background because Haskell was mostly used in academia. And I have the feeling that Structure and Interpretation of Computer Programs is a very famous academic book, but perhaps you can briefly actually explain what that book is about. 

MS (0:01:43): So, the book explains many programming techniques with wonderful examples using Scheme. And so, in particular, it focuses a lot on abstraction and also on building things from the ground up. So, Scheme is just a tiny language, and the Scheme that’s used in the book really does not have a lot of the facilities that we have in modern languages. It does not even have records or anything like that. And so, it really builds from very primitive parts, a lot of things, and then goes out into object-oriented programming. It goes out into machine-level programming, demonstrating that all in that common, unified framework. And so, that’s a very – so it’s still one of my top 10 books in computer science that I would recommend to anybody.

It has a slightly sad story that we might get into later in that it’s a great book to read as a grown computer scientist, but it was originally written as the intro textbook for MIT’s Computer Science degree. And as an intro textbook, it just is a pretty miserable failure. So, it’s a great book to read. If you’ve already done some programming, then it’s wonderful. But don’t try to learn programming from it or teach from it, I’ve certainly tried. It’s such a compelling book you really want to. You feel like, “Oh, it feels so nice to me. I should really teach somebody else using it too.” And don’t do that.

AL (0:03:13): Okay. But it was the path into functional programming for you?

MS (0:03:17): It was the first contact, and I think that was what motivated me. I was then studying Computer Science at the University of Hannover, where one of the research assistants showed Miranda and Standard ML to me. And then for romantic reasons, I told him that I was going to study in Tübingen next. And he recommended I talk to Peter Thiemann, who was pretty prominent in the Haskell community. And so then I studied under Peter Thiemann, got my degree there, and then got to be a research assistant as well. And I mean, we were all over the place, so we weren’t particularly tied to doing Haskell stuff, but we were certainly doing a lot of stuff with Haskell and also wrote a bunch of papers doing things like partial evaluation with Haskell. We did some experiments with the type system and a couple of other things that I probably don’t remember in detail.

And then I got – I think for a while, once I left the university, definitely I got out of touch, especially, I mean, I was really heavily invested in the Scheme community. I was the project editor for the sixth edition of the standard of the Scheme programming language. And it was only, as far as my long life is concerned, fairly recently that I got back in touch with Haskell for two purposes. So, I’m now seeing –

AL (0:04:36): Can I interrupt just briefly? I mean, so –

MS (0:04:41): Yeah.

AL (0:04:41): So, in that time in between where you said you were mostly invested in the Scheme community, would you say that your interest was mostly academic during that time? Were you trying to, I don’t know, do research in the languages, or were you actually even then trying to make them practical?

MS (0:05:00): No, I was definitely – I mean, most of my work with the Scheme community was when I left the university and was a freelance consultant. And so, there were two parts to that. One is I’ve been the main – I mean, I haven’t done much lately, but I was the maintainer of the Scheme 48 implementation of Scheme that I took over from Richard Kelsey, and I was actually using that on commercial projects that I was doing as a freelance consultant. I actually got hired for one or two projects because of that. So, I remember writing a compiler for a language called TTCN-3 for a client in Scheme and also doing pricing financial derivatives at a large German bank, also using Scheme for doing the parallelization. It felt like a really – Scheme 48 in particular was just a great tool for doing this kind of stuff. And as a maintainer of Scheme 48, I added some of the facilities that were missing to make it practical. Some of which had already been done in my university day. So, that was one part of it. 

And with Scheme, I mean, one of the problems with the Scheme community is that it’s heavily fractured. So, Scheme is this – I mean, one of the things that’s great and terrible about Scheme is that it’s such a tiny language. It’s really easy to write an implementation. And so, everybody and their brother has written a Scheme implementation. And so, as soon as you’ve done that, you have an interest in the way that the language goes. And of course, people have different ideas. And historically, the Scheme community has focused a lot on getting small things right. And so, everybody’s convinced that the things in the Scheme language need to be the right thing – and historically, in the old days, the Scheme standards process meant that really everything could only be decided by unanimous consent. And that works up to a certain degree, but once you start to make things practical, there’s probably multiple paths to doing things right, and people still heavily disagreed on them.

And so, there was this problem. There’s actually quite a few wonderful Scheme implementations out there, but they’re slightly incompatible or not slightly. I mean, they’re incompatible because the standard only specifies very little. So, I felt it would be a service to the community to work on a standard that would go in the direction of enabling practical programming. So, I worked on the sixth edition. The fifth edition of the language was not geared towards practical programming. It was mainly geared towards standardizing just enough so you could use it in academic papers. And the sixth edition was the first version that deviated from that and tried to make it practical. It eventually failed, but we certainly tried.

AL (0:07:38): So, in many ways, it seems like Scheme is a little bit dual to Haskell then, right? Because I mean, in Scheme, you have then many implementations each with their own bunch of extensions, whereas in Haskell, you really basically have one implementation, but then still with like hundreds of extensions essentially, right?

MS (0:07:54): Yeah. And all the others died. Yeah.

AL (0:07:56): Unfortunately, perhaps. But yes. Okay. But you were already going to talk about the Active Group company, right? 

MS (0:08:05): Yeah, that’s right. So, Active Group is a software consultancy in southern Germany, and we do mostly project work. We’re also doing actually an increasing amount of educational work, but mostly project work using exclusively functional languages. And of course, one of those is Haskell. And so, we’re using Haskell in a few industrial projects. So, that’s half of my interest in Haskell right now. And I’ve also been interested in what I call software architecture, or not what I call software architecture, but I’ve been interested in software architecture in large-scale programming. And in particular, we teach an industrial class on functional software architecture that, among other things, uses Haskell as its main vehicle for examples and exercises. And so, I’m really interested in how I can use it there. I’m really interested in using Haskell as an educational vehicle for architectural stuff.

AL (0:09:08): Yeah. So, let’s talk about all this a little bit more, but perhaps to start with, this decision to use functional programming exclusively and Haskell as one of those, where exactly did that come from for Active Group?

MS (0:09:22): Well, it came pretty much when I joined. So, Active Group existed before I joined. So, it’s not something that I started; it’s something that existed before I started. And so, I joined in 2010 because of a particular client that Active Group had just acquired, a company that’s unfortunately now defunct called Starview in the US. And Starview, maybe of small historical interest, had been working on designing a new functional language that actually shared quite a bit with Haskell. It had a type system quite like Haskell’s. Among other things, they had type classes, which is not something that has been adopted by many programming languages. And they were looking for somebody who could help them with design and implementation and then building that into a software system. And so, there were – I actually looked at a job ad that said, “We’re looking for somebody who actually knows about functional language design here in the region.” So not far from where I live. So, it’s the first job advert that seemed vaguely interesting to me in a long time, or maybe ever. And so, that is how I came to first join the company, to work for this particular client. And then the company was at a crossroads as far as the business model is concerned. And then we made a very conscious decision that we were going to try to do that with other clients as well. And so, we’ve effectively been doing that since about 2010.

AL (0:10:47): Was it then this first project that, as you say, was somewhat inspired by Haskell or close to Haskell that also made you look more at Haskell again, or –

MS (0:10:56): Well, I certainly looked at Haskell a lot in the day, especially because of the intricacies of the type system. So, I mean, while a lot of the research, the basic questions on the type systems are well documented in research papers, a lot of the pragmatics and a lot of what actually landed in GHC turned out not to be documented, even at the time. Like very basic stuff, like multi-parameter type classes, how they worked. So, there’s a research paper that shows several design options, and GHC uses none of them, uses a different one, for example, and it was undocumented at the time. And so, we looked at that definitely and looked at other options of designing the type systems. I think Andres and I, we were briefly in touch because we were trying to think about existential types and things like that. So, we were trying to do this language, so-called Star, which still exists, by the way. But I was mainly looking at Haskell as a subject of study back at the time, not as something that I was using. 

I think the next time around that may be interesting was that I took over a project. I have to be a little bit careful with names here. So, another consultancy here in the region approached us and said that they had started a Haskell project and everybody who had worked on it had quit. And now the client wanted some changes. So, we took that over. And one of the issues with that particular project was that – I mean, the changes actually weren’t so big, but the project was such that, as is popular, I think, with Haskell, the original author had started a Haskell file, turned on all the GHC extensions existing at the time, and then proceeded to use them. And now one or two years had passed, and GHC would no longer compile the sources. They knew what changes they were going to make. They probably even had people who could do the changes, but they did not have – they literally did not have a single person there who could make it compile. And so, since my brain was still full of my knowledge of the GHC, of the Haskell type system, and all the practicalities from the Star project, I was the right person to take that over and fix it all – I actually had to slightly redesign the way the types were used in the system to then make it work again and then make those changes that the client needed. So, that was the first actual industrial project that I worked on using Haskell in my time with Active Group. And we’ve since been using it in a few others.

So, one that I was just working on this morning was we’re working with Siemens, making a product for industrial anomaly detection. So, it does – so you take sensor input from a production process, and you look for things that are slightly off. And that might be an indication that the production process might fail in the near future or that there are problems with the quality coming out. So, we use deep learning to find those anomalies, and we have an entire application that allows you to train those networks and then use them and find anomalies using it. And that’s written in Haskell.

AL (0:14:07): So, one question that I have that interests me from the perspective of also being in a software consultancy, but one that is more or less exclusively using Haskell, is for a new client, for a new project, how do you actually decide which language you’re using? I mean, is that something that just comes from the client, or are you saying if it has this and this and this aspect, then Haskell, if it has these other aspects, then something else?

MS (0:14:33): Actually, a lot of the time, the client comes to us and tells us something about their platform requirements. And so, I think in most projects, at least, the field is narrowed greatly by that. So, a lot of clients that we talk to or that we work with know how to deploy and run and administer, for example, a Java platform-based application. And that pretty much narrows the field to Scala and Clojure. We also have a few clients that are just Microsoft shops, and so they use the .NET platform, and F# is right there. The decision has been made. And I think that’s more than half the projects. Things get decided that way. And then on the Java platform, the decision is then made mainly by programmer preference most of the time.

I mean, with the Siemens project, we really needed – so this is deployed on Siemens hardware that really runs on industrial machines that are very solid machines. They’re heavy. You don’t want to drop them on your foot. On the other hand, they’re very weak computationally. And so, we really didn’t want to have the overhead of a managed platform and wanted to have the performance options that Haskell gives us. Also, eventually, the GPU stuff, but that’s another story. So, there, we chose Haskell mainly for performance reasons, and sometimes there’s also a political component. That was certainly the case here.

MS (0:16:00): So, Siemens is based in Bavar – so the relevant people were based in Bavaria. And since it’s a machine learning project, somebody eventually comes out of the woodwork and says, “Why don’t you do it in Python like everybody else?” And of course, there are good reasons for not doing it in Python that you can’t really argue at that level. And they said, “Well, how are we going to find Haskell programmers?” And the neat thing in Bavaria, I think every single university in Bavaria offers a Haskell class. Very few offer a Python class. So, not that that really matters practically, but that cinched the decision for using Haskell in that case.

AL (0:16:40): So, there is at least an easier argument to make that you should be able to find Haskell developers. Okay. But I mean, given that you say that you have had projects in Scala and have had projects in F# and all these different languages, I mean, do you want to say anything about what you actually perceive to be the relative strengths and weaknesses? I found it curious that you said you chose Haskell for performance reasons. I don’t think many people would usually jump to that conclusion. I mean, perhaps, you can go into the details there a little bit more.

MS (0:17:17): Okay. Interesting. 

AL (0:17:18): No, I mean, I don’t want to – I’m –

MS (0:17:21): So, GHC is a highly optimizing compiler. Of course, the default evaluation mode of Haskell code is not always super high-performing. But we felt that with GHC, there’s always this option to focus on performance-sensitive parts and then really tweak those. So, that was one half of it. And there’s certainly – I mean, people who know how to tweak Haskell can really get really good performance out of it. And the other factor in this particular project is, since we wanted to do deep learning, we were eventually going to look at executing that on a GPU and then – I mean, there’s a great framework for Haskell, Accelerate, that enables GPU programming. And then there were a couple of things in the middle. So, we wanted – I mean, we personally wanted to play with Conal Elliott’s ConCat framework with this, which we then proceeded to do, and then hook all these things up together.

So, I think there, if we wanted – since we were building like neural network execution machinery from the ground up, if you take that as a given, I feel Haskell is probably pretty much the only choice among the functional languages in which to do this. So, of course, you can get good – I mean, you can get great performance out of OCaml as well, but it does not have the GPU stuff that we wanted. Otherwise, that might have been a viable choice as well. Also, though, of course, the compiling with categories, but we could probably also have done without it. Let’s put it like that. So, I mean, we’re enjoying that a great deal, but maybe a little bit too much. 
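
To give a flavor of the Accelerate part mentioned above: the library lets you write ordinary-looking Haskell array code and have a backend execute it, including on a GPU. A minimal sketch (not taken from the Siemens project) is the dot-product example that is often used to introduce Accelerate:

```haskell
import qualified Data.Array.Accelerate             as A
import           Data.Array.Accelerate.Interpreter (run)  -- reference backend; GPU backends are separate packages

-- Element-wise multiply, then reduce: the whole pipeline is an 'Acc' program
-- that a backend (interpreter, CPU, GPU) can execute.
dotp :: A.Acc (A.Vector Float) -> A.Acc (A.Vector Float) -> A.Acc (A.Scalar Float)
dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)

main :: IO ()
main = do
  let xs = A.fromList (A.Z A.:. 4) [1, 2, 3, 4] :: A.Vector Float
      ys = A.fromList (A.Z A.:. 4) [5, 6, 7, 8] :: A.Vector Float
  print (run (dotp (A.use xs) (A.use ys)))
```

The same `Acc` program can be run by the interpreter shown here or, with a suitable GPU backend, compiled for the graphics card, which is the part that mattered for the project described above.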

MPG (0:18:57): Yeah. I’m interested in this. So, what was the experience of building a neural network from the ground up in Haskell? So, you have the performance, you have everything you need, but did you have to use fancy multi-parameter type classes, or what did you really use?

MS (0:19:17): So I mean, the way this – so as I said, we’re using a framework called ConCat for compiling with categories. And the way that that works is you just write straightforward Haskell code, essentially, that implements your neural network. And that’s a very pleasant experience because you can really – I mean, you can visually see – looking at the Haskell code, you can visually see the structure of the neural network. And this is also something that’s apparent to people who are not that much of a Haskell expert. I’ve certainly talked about that at conferences to Python people, and they go, “Oh yeah, we recognize this. Here’s an autoencoder. We see the different layers,” and so on. So, writing it is a very pleasant experience.

Then, of course, you need a – you have two challenges. You want to compile that to efficient code. And if you have functional code that deals with matrices, that usually runs very slowly because the matrices get – you multiply two matrices, another one gets allocated, and so on. So, that’s very inefficient. And of course, you can’t run it directly on the GPU. And the other aspect is that you need to compute derivatives in order to do the gradient computation in the neural network. And so, for that, what happened, what we used then, we –

So, to get back to your question, the Haskell code is very straightforward. It doesn’t – the main thing that it uses in terms of fancy type features is it uses matrix types that have dimensions at the type level so that you can catch a very common error in numerical code, which is that your dimensions don’t match up. Conal’s framework for writing that code is even abstracted over that. So, actually, the code is fairly straightforward Haskell code.
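
As a rough illustration of what dimension-indexed matrix types look like in general (a minimal sketch with made-up names, not the project’s or ConCat’s actual code):

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}

import Data.List (transpose)
import GHC.TypeLits (Nat)

-- The dimensions are phantom type parameters; the runtime representation is
-- just nested lists here, purely for illustration.
newtype Matrix (m :: Nat) (n :: Nat) = Matrix [[Double]]

-- Multiplication is only well-typed when the inner dimensions agree:
-- an (m x n) matrix times an (n x p) matrix gives an (m x p) matrix.
matMul :: Matrix m n -> Matrix n p -> Matrix m p
matMul (Matrix a) (Matrix b) =
  Matrix [ [ sum (zipWith (*) row col) | col <- transpose b ] | row <- a ]
```

Trying to multiply a `Matrix 3 4` by a `Matrix 5 2` is then a compile-time type error, which is exactly the “dimensions don’t match up” mistake being described.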

Now, the ConCat framework then works this way. So, as I said, we wanted to compute derivatives, so you need to transform your Haskell code into another piece of code that will not compute your original function but the derivative. And the ConCat framework does a very generic transformation of your code in order to enable that. And since categories are very general, it’s also exactly the thing that enables compiling to GPU code and hooking up the Accelerate framework. And that then gives you very efficient numerical code. So, that does some heavy lifting with the type system. So, it operates on the types. So, in that way, it’s quite involved. I mean, if you really want to get into the esoterics of the ConCat framework, we’d like to use quantified constraints that would make a lot of things easier. But we have not been able to make them work yet because of some restrictions—some esoteric restrictions in the way GHC inferred types there.

MPG (0:21:59): Right. Yeah, because I thought machine learning was very stuck in Python, right?

MS (0:22:06): Absolutely. And it’s much to its detriment. In many – I mean, of course, coming from functional languages, you’re looking at it with a certain eye. But I think the user experience hacking together your neural network from Python is absolutely terrible. I’m not going to say it’s great in Has – I mean, writing the original code is great. Then making it go through this plugin is a very challenging process. It’s something that we greatly underestimated, not just in terms of how much effort it would take to make this research prototype into something that we could use in an industrial project, but also how much knowledge and experience it requires working with that. But I mean, that was our choice. Not everybody needs to make that same choice. Certainly not a general statement on Haskell or GHC.

MPG (0:22:56): And that’s how you can get the performance, right? You make the compiler do all the heavy transformations before, and then it runs super fast on the – yeah, that’s very cool. Very cool.

MS (0:23:05): Yes. And you reuse – I mean, your typical Python neural network framework is going to include an entire Python front end. And this is something that we could avoid with this approach. We are really reusing everything that we possibly can from GHC in order to make this work on a practical level.

AL (0:23:24): Yeah. So, another thing that you mentioned about Active Group is that you’re doing quite a bit of teaching, and perhaps we should talk a little bit more about that then.

MS (0:23:37): Yeah. So, this goes back way before that time. So, I started teaching programming in the ’80s – I think I taught the first official Computer Science class in high school, in 1987. Both my parents were high school teachers. Something I feel that’s – it’s dear to my heart, is teaching other people. Sometimes – of course, a lot of people have suffered over the years.

So, teaching programming, I mean, when you do that for a long time and you really look at the results, you just see it’s hard. It’s really difficult to teach something. I mean, moving on from there, from high school, I designed the intro course at the University of Tübingen, the Intro to Programming and Computer Science course. It’s still being taught there. And doing that over the years, we tried to make this work for as many students as we possibly could. Your goals might vary, but our goal was to have a baseline amount of material that we really wanted everybody, or at least most people in a class, to get.

And if you are trying to do that and you really take a hard look at how well your teaching is working, it tends to get a little bit depressing. I mean, especially if you’re teaching func – of course, I mean, I mentioned the Purple Book in the beginning, which is this compelling book. And in the late ’90s, we started teaching with essentially the material from the Purple Book and just found out that it just doesn’t work in the sense that there’d be three groups of students. So, students that would just outright fail. And then there were a few students that we could actively teach—a very small number of students, actually. And then there were students that taught themselves. And so that’s, I mean, great, but – and more students failed and didn’t get the material than we liked at the time. And so, we felt – I mean, this insight, I guess it’s no surprise to anybody, but to us, the visceral insight was: just because something feels compelling to you as a teacher, no matter how compelling it feels, it might not be the right way to teach it, and it might not be the right way to impart the knowledge or the –

AL (0:25:51): So, you’re blaming the book here.

MS (0:25:54): I’m blaming the book. Yeah.

AL (0:25:55): Yeah. So, the hypothesis is that there’s not always going to be these three groups that you just talked about. So, what would you say is it about the book that made you think that it is to blame?

MS (0:26:09): So, I should still say that it’s a great book, and it also contains a lot of great things that are useful in teaching, but the combination of things just doesn’t work. And the main reason for that is that it’s teaching by example. That is not this book’s particular fault; almost all books on teaching anything work that way. But it’s particularly conspicuous with programming intro books. And essentially, the way that they go is they give you two or three examples on a particular technique or on a feature of your programming language, and then they go, “Well, now you go forth and solve this new problem in the same way.”

AL (0:26:47): In the same way, without ever actually spelling out –

MS (0:26:50): Without spelling out what the way is. It’s just given you two or three examples. And then it’s left to the reader to infer from that the techniques that were being used. And so, I learned – I mean, I fundamentally learned what we know about teaching from Matthias Felleisen’s group that made the same discovery a few years earlier in the ’90s. They also started to teach with the Purple Book and found out that various things about it didn’t work. We’ll get to the programming language later. I guess that is also a factor in this, but – and the most important factor to me is the didactics, in that it’s really important to spell out explicit techniques that people can follow, coming from a problem statement, going to a working program. And Matthias’s group laid all the groundwork, creating something called the design recipes, which is a collection of very systematic techniques that do this. And we picked that up in our teaching and found out that that was the main tool that would make the teaching significantly more effective.

It met a lot of – I mean, it took – I mean, I’d previously read the book that they published, How to Design Programs. That is also a great book. I’d read that previously, but it had not clicked with me because it’s such a pedantic book. If you read it, you feel – I mean, Matthias is American now but used to be German. So, it takes this German bureaucratic approach to programming. It’s really impossible to describe it without having looked at it. And it just – you feel the drudgery looking at it. It just feels, “Ah, I don’t want to do this.” So, it’s the opposite of the Purple Book, which is like, “Oh, here’s these great examples. Isn’t this wonderful?” It just doesn’t do that. It’s very pedantic. But we found out that it’s much more effective, enabling students to then go, “Okay, I have this technique. I have a couple of steps that I can follow. I know exactly my problem. This part of the problem has this particular shape; therefore, my program has to have this particular shape.” So, that is the key part. And what Matthias’s group did, and what we then followed up on, was designing the machinery, which is the programming language and also the tooling being used, to match that.

I mean, one of the depressing insights from this, as far as functional programming is concerned, the Purple Book I mentioned uses Scheme, which is this tiny language and has very regular syntax coming from the Lisp era. And so, it’s one of those languages. I mean, you know it once in your life. Even if you don’t touch it for 10 years, you come back, it’s still there in a tiny corner in your mind because it takes up very little space, not so with Haskell. And so, you think that’s great. I don’t have to spend a whole lot of time teaching the syntax or whatever the standard functions in the programming language and so on. But still, fundamentally, the language is designed for researchers. It’s not designed for teaching. It comes from a research perspective. And the tooling is also designed to appeal to researchers, maybe people building software. And the needs of those people are just different from the needs of learners. 

And so, one of the things that Matthias’s group did initially was to create what is now the Racket programming system. It started out really as an educational tool with an IDE that works for beginners and with – I guess in the beginning you would say adapted versions of Scheme that were changed from Scheme iteratively based on the feedback that they got from the classroom. We did something similar. So, Racket now ships with those same teaching languages that were designed by Matthias’s PLT group. And over time, back in the university days and following that, we also have our own set of languages that are similar, but not quite the same as Matthias’s. So, that was the academic part of that.

And then once I started working professionally with functional languages, we did get the occasional request to teach somebody, to give them an introduction to F# or Haskell or some other functional language, and I felt: great, wonderful. I really know how to do this. I’ve taught maybe a thousand students back in my university days. We’re just going to use the pedagogy that we learned then. And then people would request whatever Haskell class, and I could just use the pedagogy, but use it with Haskell, especially as I assumed that professional programmers, I mean, they are aware. They’ve used a professional IDE before, so they know how complicated it can be. They’re not daunted by that, and they’ve surely seen terrible error messages from their favorite language compiler, which is one of the ways the teaching languages differ from professional languages. And we did that for a while. It was okay, but it wasn’t great.

And so then I figured – I mean, one of the things about education really that we learned the hard way is there’s only so much you can pre – you can only go so far predicting how you’re – I mean, I mentioned things are compelling to you, they don’t have to be compelling to your students. So, even knowing that and having taught for 20, 25, 30 years as I have, I find it very difficult to predict how a particular change or educational aspect of my teaching is going to actually work, so I need to try it out. So, I figured we might as well try out teaching our professional clients that order two or three-day courses on something like that. We might as well just try out what we know to work, which is to do really the intro course with everything from the university with everything—with the teaching languages, with the Racket system. So, with the IDE, that’s designed for learners—do all of that and do a high-speed version of an intro programming class for professional programmers. And then if somebody’s ordered an F# class, an OCaml class, and Haskell class, then segue into that, and then say – because really what the intro curriculum that we’ve designed really does well, and the languages also enable this, is to teach the fundamental concepts of functional programming or of programming and of program construction, and they teach them as separate concepts that are sometime – and these separate concepts, especially in Haskell, they get lumped into combination mechanisms. And you can then use the vocabulary that you’ve built up there to very quickly say, “Well, Haskell is this language. And here, the concepts that you already know are represented this way in Haskell.” And we found that to be extraordinarily effective, so much that in our architecture classes, we do the extreme version of that in that we do a one-day introduction to functional programming, which is like the super turbo, super-fast version of the intro class. And then we do about a day of Haskell, and then we’re off and do stuff.

AL (0:33:59): And this intro is still using Scheme or Racket? 

MS (0:34:03): Well, we’ve avoided calling it Scheme because that gets you into a discussion with this slightly fractured community, which is also one of the reasons that Racket, which was originally called PLT Scheme, renamed itself. So, we just call them the DeinProgramm teaching languages. And while the synta – I mean, looking at it superficially, I mean, you can see the Scheme legacy, but there are many, many pragmatic differences. Among others, I mean, one of the most conspicuous ones is that you write something that we call signatures, which very much looks like a type signature, and it isn’t, but it looks like one. And so, that’s also, of course, one of the things that then enables easily switching to a strongly typed language.

AL (0:34:46): Yeah. So, this is actually the main point that surprises me about this whole argument. I mean, I can see that in terms of techniques, at least if you consider functional languages, they are similar. So, if you’re teaching problem-solving or program development strategies in Scheme or Racket, I can see that they’re transferrable to Haskell. But when it comes to thinking about problems in the first place – and actually, just the other day, because you posted a video, like a German video of you talking about software architecture, in that video, the first thing that you were doing was extracting all the nouns out of a text and then turning them into data types. So, that was at the very beginning in this design process. And I think, for me, it’s also always like, if I teach Haskell, data types are first. I mean, how do you do a full day of Scheme without talking about types – or maybe you probably do, but –

MS (0:35:56): No, no. We do that all the time. I should mention that. But no, the design recipes approach and ours are very much type-driven. Very much so. Going back to something called Jackson Structured Programming back in the old days, the idea is that the types and the structure of the data then drive the entire design process from then on. And so, the data analysis is the fundamental part of this. And we give names to those things as well. I mean, they’re not technically types in the strict sense, but the whole thing – and I guess that’s also one of the differences with the Purple Book – is in how much it is driven by the types.
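
As a small illustration of that noun-driven data analysis (a made-up example with invented names, just to show the shape of the step): from a statement like “a member of the library borrows a book; a loan records the member, the book, and the due date,” each noun becomes a data definition, and that data then drives the rest of the design.

```haskell
import Data.Time.Calendar (Day)

-- Each noun from the (made-up) problem statement becomes a data definition.
newtype MemberId = MemberId Int
newtype Title    = Title String

data Member = Member { memberId :: MemberId, memberName :: String }
data Book   = Book   { bookTitle :: Title, bookAuthor :: String }
data Loan   = Loan   { loanMember :: Member, loanBook :: Book, dueDate :: Day }
```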

Now, the – I mean, one of the issues with teaching about type-driven design in Haskell, I mean, there’s a couple of things. But one of the things is just that this concept of an algebraic data type lumps together several concepts that are separate in our teaching languages. So, you have a sum type, what we call mixed data, whatever you call it. Discriminated union. And then also, in each individual case, you have compound data. So, it mixes two things into one. And this is – my experience has been, if you try to teach that from day one, especially to people coming from Java, maybe I’m not doing it right, but people have trouble with the fact that you always have to tag the alternatives, that you can’t just take two separate types and then combine them, right?

AL (0:37:25): Yes.

MS (0:37:26): And they often make this mistake of writing two type definitions and say that, well, this new type is going to be the first type (vertical bar) the second type, especially also with the Pascal tradition of if you have like a type with a single constructor –

AL (0:37:41): Punning between the constructor name and the type name.

MS (0:37:44): Yeah. So, there’s a lot of little things that are easy to under – right? And so, in the teaching languages, the concept of a sum type and the concept of compound data are separate, and they get – I mean, you can combine them, but we can teach them separately. And therefore, I mean, that is certainly something that I found consistently useful in didactics, is that you try to teach only one thing at a time. And that, just given the way that Haskell is designed, is exceedingly difficult to do. I mean, I would say it’s probably impossible to do, especially because you immediately – I mean, very quickly just dealing with Haskell, interacting with the REPL, you very quickly run into type classes that you maybe don’t want to teach in the beginning. But on the other hand, if you have the conceptual framework in your mind of what the concepts are, you can just say, “Well, in Haskell, this concept that you already know is expressed like this.” And then people generally take to it very quickly.
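
As an illustration of the mistake being described (a minimal sketch with made-up names): newcomers often try to define the alternatives as separate types and then “union” them, whereas in Haskell the alternatives of an algebraic data type have to be tagged constructors inside one definition.

```haskell
-- What newcomers coming from other languages often try first:
--
--   data Circle    = Circle    { radius :: Double }
--   data Rectangle = Rectangle { width :: Double, height :: Double }
--   data Shape     = Circle | Rectangle
--
-- The last line does NOT mean "a Shape is a Circle or a Rectangle"; it declares
-- two brand-new nullary constructors (and clashes with the names above).
--
-- The sum type and the compound data have to be expressed together, with
-- tagged constructors that each carry their own components:

data Shape
  = Circle Double              -- radius
  | Rectangle Double Double    -- width, height

area :: Shape -> Double
area (Circle r)      = pi * r * r
area (Rectangle w h) = w * h
```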

AL (0:38:38): So you’re – yeah. I mean, I think it’s an interesting approach definitely to say like, I mean, rather than one – most people will probably say teaching one language in two days is already extremely challenging. But you’re saying, teaching two or, if you take all the teaching languages apart, even five languages in two days, that’s the way to solve this problem.

MS (0:39:02): And it’s challenging. So, I mean – and consistently, people at the end of this say, “Well, this is the most exhausting class I’ve ever taken.” But on the other hand, it seems to work for all kinds of people. 

AL (0:39:17): No, I think also, in general, I’ve been surprised how much you can teach in very little amount of time. I think it is one of the curses of, in a way, commercial teaching that the requirements often are that people allocate far too little time for it. Ideally, they would say, “Let’s learn this language for a week or even two weeks,” but they say, “Two days is the most we can offer,” and then you just have to work with that. But conversely, I mean, I think it often works surprisingly well. I mean, obviously, the people, they need some time to digest everything they’ve heard in those two days. But it is still doable. No, that’s interesting. So, do you want to say more about the didactics or the specific approach, or do you think you’ve already covered that?

MS (0:40:12): Well, I mean, if you’re interested in that, you can either, I mean, read How to Design Programs or read the book that we just actually finished after a bunch of years called Schreibe Dein Programm. So, if you Google my name, you’ll find it; it’s online for free, but it’s written in German. So, that’s sort of – I mean, this is a class on programming. And so, what I’m currently interested in, I mentioned in that. Well, the next level up from that is – so How to Design Programs. Well, it says ‘programs,’ but really it’s about designing functions or small groups of functions, small applications. But of course, as you scale up, you want to develop larger applications, and of course, if we want people to adopt functional programming, it’s not enough to teach them just the basic programming. We also need to teach them how to assemble and structure larger programs.

And unfortunately, there’s very little literature, comprehensive literature on this. Very, very little. I mean, there’s a great number of Haskell intro books. There is a small number of Haskell architecture books, and there’s a very small number of OCaml architecture books. But generally, the total number of books on how to do large-scale programming with functional programming can, I mean, definitely be counted on two hands. And it seems that a lot of the things – one of the things I always teach in the architecture class, I usually show a paper by Simon Peyton Jones and Jean-Marc Eber on how to do financial derivatives using functional programming, which is a classic – it’s just a – so it’s a combinator model. And so, it’s, I mean, a highly recommended classic paper. And on the first page, it says any red-blooded functional programmer should be foaming at the mouth and be yelling, do a combinator library. And so, this is something – the paper, I think, was from 2000 or 2001 – that was folklore in the functional programming community. Everybody was supposed to know you’re doing a – you should do a combinator library, which is true, but where is even the paper that says how to design a combinator library? It does not exist, as far as I know.

So, there’s lots of examples. There’s lots of research papers on individual combinator libraries, and of course, the library from this particular paper turned into a commercial product, but the design process is not documented. It’s also slightly beyond what we do in How to Design Programs. We have a chapter in our book. I’m not sure how far that carries. So, I don’t have enough experimental data on that. So, just to name one example.
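
To make “combinator library” concrete for readers who haven’t seen one: a tiny sketch, loosely in the spirit of the contracts paper mentioned above (the names and constructors here are simplified illustrations, not the paper’s actual interface).

```haskell
newtype Date = Date String        -- e.g. Date "2030-01-01"; kept deliberately simple
  deriving (Show)

data Currency = EUR | USD
  deriving (Show)

-- A small algebra of contracts: each constructor is a building block...
data Contract
  = Zero                          -- no rights, no obligations
  | One Currency                  -- receive one unit of a currency now
  | Give Contract                 -- swap the roles of the two parties
  | And Contract Contract         -- both contracts at once
  | Scale Double Contract         -- multiply all payments by a factor
  | At Date Contract              -- acquire the underlying contract on a date
  deriving (Show)

-- ...and combinators are ordinary functions that build bigger contracts
-- from smaller ones.
zcb :: Date -> Double -> Currency -> Contract
zcb date amount cur = At date (Scale amount (One cur))   -- a zero-coupon bond

swap :: Contract -> Contract -> Contract
swap receive pay = receive `And` Give pay                -- receive one leg, give the other
```

The open question raised above is not what such a library looks like, but how one systematically arrives at a design like this from a problem statement.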

AL (0:42:55): Yeah, yeah. But it’s also a difficult problem. I mean, it’s not as if everybody could just sit down and write down how to design a combinator library.

MS (0:43:06): No, no. But I mean, I’m sure there’s a bunch of people that know – I mean, that in their heart, they know, but they haven’t – I mean, as with many things, programming – I mean, previous to How to Design Programs, program – I mean, a lot of people I think knew the basic programming techniques but could not put a name on them. And so, beyond the How to Design Programs stuff, I feel there’s a lot of things that we know about programming, about software architecture, and how to structure large-scale architecture that we have not documented yet, right?

AL (0:43:37): Yes. Yeah.

MS (0:43:38): How do you find – I mean, very simple problem. People come up to me, and I’m always slightly stumped when they come up to me and say, “Well, in object-oriented programming, we know where to put the methods. We put the methods in whatever class is associated with the object, blah, blah. But how do you do that in Haskell? How do you decide which function to put in what module? How do you do that?” And I’m like, “Hmm.” So, I can talk about this now, but of course, they don’t just want to hear about this from me in a five-minute conversation after a talk. They want to hear from me, “Well, look it up in this book.”

AL (0:44:16): Yeah. Yeah. Exactly. 

MS (0:44:17): And so, I feel there’s a lot of things we can offer. I mean, even this basic data modeling stuff, so you find the nouns. This is curiously one of the things that the object-oriented community has unlearned how to do—data modeling. They don’t know how to do this anymore. They knew in the ’90s, but they’ve forgotten. And this was part of that conversation that you cited earlier. They’ve forgotten, and maybe they never knew all of it. There’s a lot that we know that would be tremendously useful to the software architecture community, but we haven’t communicated it. I mean, we haven’t documented it and we also don’t talk to these people, and they don’t talk to us. And the more I think about this and the more I try to talk to these folks, the larger I realize the gulf is. We’ve spent 25, 30, 40 years developing our own methodologies and our own way of doing software completely, I mean, basically without any communication with the object-oriented software architecture community. And of course, they’ve also developed stuff. Just because it’s object-oriented programming and we’re functional people, that doesn’t mean that the techniques that they’ve developed couldn’t be useful to us. And particularly in the area of domain-driven design, there’s a lot of things I think that would carry over, but that still require significant effort to integrate with what we do.

So, there’s just a tremendous amount of work, I feel, that needs to be done in order to really have practical software development out there in the large benefit from the insights and functional programming. And maybe, on a couple of things, we might even benefit from the other side as well. But we just have not had those conversations. And the same way that – I recently did this workshop at ICFP on functional software architecture and just asked, has anybody heard of something called domain-driven design, right? Which is something that everybody in the larger software architecture community knows about, and they have conferences that are bigger than ours, which I should note. And maybe half the people in the room said, “I’ve heard this term,” and half the people didn’t. And conversely, if you ask the object-oriented folks, have you ever – do you know what a combinator library is, they go, “No, never heard of it,” even though there’s a chapter on the – whatever. But it’s literally not a thing there. And I recently had the – so I have the great pleasure of being able to talk to some of these people over time and trying to figure out what it is that we can – how we can benefit from each other. 

So, I’ve done a series of talks with Henning Schwentner, who’s one of the big proponents here in Germany of domain-driven design and various other design techniques. And we did this experiment of just having a small problem, like designing a shopping cart domain for an e-commerce site. And just – we said, Henning and I, “Henning is going to do it. I’m going to do it, and we’re going to compare the activities that we perform in doing this.” And I felt that we would be able – initially before we started out, I thought we would be able to relate our activities somehow to each other. I was sure Henning would do some sort of data modeling. I was sure he would do some kind of whatever. I make all kinds of assumptions of what I thought object-oriented programmers would do in order to go from a problem statement to a working program. And what he does has absolutely nothing to do with what I do. Zero overlap. So, that’s how big this – I mean, this is how large this gulf is. I mean, there’s nothing. 

And so, that’s my current mission: to try to enable that communication. So, I’m plugged into the – so in Germany, there’s an organization called the iSAQB that is really about architecture education and has a large portfolio of curricula. And with them, we’ve established – actually, a couple of years ago, established a curriculum on functional software architecture that we’ve been teaching many times since then. And slowly, that is leaking into a larger conversation that we’re having about how to really do software architecture.

AL (0:48:43): Yeah. So, one thing I noticed is that you’re talking a lot about reaching out to other communities, which is, of course, very important. But I mean, an aspect of the fact that we don’t have these things written down, that often also comes back in community discussions. Even among people who are already convinced that Haskell is the right language, there is not a lot of common ground necessarily. Different people have very different ideas. And in that sense, the Haskell community is perhaps also very fragmented. And it goes as far as that there are disagreements as to like, should you use all the language extensions available or should you restrict yourself deliberately to something which is close to Haskell 2010 or something like that? I mean, do you have an opinion on all this? I mean, is there something for the functional programmers or for the Haskell programmers that we should be doing or that you think are like things that we are in general doing wrong, apart from not talking to others, which I already take as a point? Like more immediate advice for like, if you write software in Haskell, don’t do this or – 

MS (0:49:57): Yeah. So, I think I have two answers to this. So, first of all, of course, I mean, the way that Haskell has developed over the years is that things got added to the type system, essentially. And I feel having the strong formal foundation of System F and the ML-style type system, I mean, shows us the value of having something firmly grounded in mathematics, in a very convenient formalism, even for practical applications. But we kind of – I think in the conversations that we’ve had over the years, I think people are a little bit too much in love with the type formalisms and tend to not always look at the requirements that we have when we try to write programs. That doesn’t necessarily mean that the developments have been bad, but it feels that – just looking at Haskell, it feels that nobody ever sat down and said, “Well, here’s the things that we want to enable in programming,” specifically in large-scale programming, of course, which is what I’m interested in, “and we’re going to try to go from requirements to particular things that we need in the type system.” A lot of things feel more like we have this small three-line program and it doesn’t type check, so we’re just going to change the compiler instead of just rewriting the program. And over the years, it just feels very sort of, “Oh, I looked in the type system and I saw a little hole there or something that’s missing,” as opposed to a more overall systematic approach.

So, I would like for the Haskell people to maybe, at some point, take a step back and say, “Well, what are the things in our type system that enable actual programming?” I mean, it sounds banal, but specifically, which are the things that enable large-scale programming? Because a lot of things that are convenient in the small scale – I mean, let’s take our algebraic data types, for example. Maybe the simplest example right there. They’re super convenient in the small. But if you have algebraic data types and they get exported from your module, and they get spread all over the place, they create very significant coupling. One of the ways in which this happens, for example, is that all the cases have to be defined in one place, in one file. Very simple, trivial thing. And in large-scale software architecture, you don’t always do that. You don’t always have that. You want to modularize and say, “I have this one data type, and different parts of the system contribute to this data type.” And you can’t do this. Now, there’s a solution to this—data types à la carte—but it came out in 2008, something like that. So, Haskell had been around for a long time, and it took somebody as smart as Wouter Swierstra to figure this out. Certainly, figuring this out is beyond the capability of an ordinary programmer. Certainly beyond my capability, at least at the time.
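
For readers who haven’t seen it, a compressed sketch of the data types à la carte idea (the actual paper adds injection machinery on top of this so the Inl/Inr plumbing disappears):

```haskell
{-# LANGUAGE TypeOperators #-}

-- The expression type is left open: it is parameterized by the signature f
-- that describes which constructors exist.
newtype Expr f = In (f (Expr f))

-- One module can contribute literals...
data Val e = Val Int

-- ...and another module, independently, can contribute addition.
data Add e = Add e e

-- The coproduct combines independently defined signatures.
data (f :+: g) e = Inl (f e) | Inr (g e)

-- An expression over both signatures: 1 + 2.
example :: Expr (Val :+: Add)
example = In (Inr (Add (In (Inl (Val 1))) (In (Inl (Val 2)))))
```

Different parts of a system can thus contribute their own constructors (and their own parts of an evaluator) to what behaves like one data type, which is exactly the modularity that a closed algebraic data type rules out.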

AL (0:52:49): Yeah. I mean, I may be getting the years wrong now, but I mean, to be slightly fair, I think Phil Wadler phrased the Expression Problem. And I think even that appeared relatively late. I mean, the terminology of the Expression Problem, which then resulted in people actively looking for solutions. So, I think if you look at the number of years between recognizing the problem as a problem and finding a relatively adequate solution, then it’s not actually that bad, in my opinion. 

MS (0:53:23): So, in my experience, this actual problem occurs frequently enough. I mean, and also data types à la carte is not a very efficient solution, as you know. So, this deserves more direct treatment in the language. And OCaml has something like open unions. I don’t know how realistic it would be to integrate something like that into Haskell. But it demonstrates my larger point – so now we could look at language design. Well, of course. Okay. So, algebraic data types create a certain amount of coupling if you use pattern matching across modules. It’s always fun to look at the GHC sources, which have these funny selector/getter functions that just do the pattern matching there, so that you don’t have to use pattern matching in other places, in order to reduce that coupling.

AL (0:54:11): Yeah. GHC has its own Trees that Grow now. I mean, for a couple of years now, which is like –

MS (0:54:16): Yeah. I mean, that would be another example where types actively – I mean, Trees that Grow are great if you remember to put them in, but if you haven’t, you can’t do it after the fact unless you change the source code. And so it is with a lot of things in Haskell. And so, we could look particularly at algebraic data types. Of course, there are extensions in the language that then allow you to loosen this coupling, the view patterns and the pattern synonyms and things like that. And this could be – but I feel that historically, I think the view patterns are something that was actually done explicitly to reduce coupling. But I think this could be – so maybe this is an example of a successful initiative, but I think this is just a way to think about the requirements that could be applied more broadly: to really take a hard look at all those fancy features that we have in the Haskell type system. Which ones enable things like modularity and decoupling, and where do we still have blind spots? Where does Haskell force us to create a lot of coupling, and how could we make solving this more convenient? So, this is a larger issue, I think, that would yield some very useful results.
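
A small sketch of the decoupling that pattern synonyms can buy (made-up module and names, just to illustrate the idea): clients can keep pattern matching on Empty and Push even though the module is free to change its internal representation later.

```haskell
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
module Stack (Stack, pattern Empty, pattern Push, pop) where

-- Internal representation: a list plus a cached size. Clients never see this.
data Stack a = Stack Int [a]

pattern Empty :: Stack a
pattern Empty <- Stack _ []
  where Empty = Stack 0 []

pattern Push :: a -> Stack a -> Stack a
pattern Push x s <- (pop -> Just (x, s))
  where Push x (Stack n xs) = Stack (n + 1) (x : xs)

pop :: Stack a -> Maybe (a, Stack a)
pop (Stack _ [])       = Nothing
pop (Stack n (x : xs)) = Just (x, Stack (n - 1) xs)

-- A client can write, e.g.:  size Empty = 0; size (Push _ s) = 1 + size s
-- without ever depending on the constructor layout of Stack.
```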

The other issue really is about language stability, and I really have a couple of bones to pick with the Haskell community there because, as anybody who maintains Haskell code over longer periods of time knows, a new version of the compiler or something comes out and your code breaks, and you’re going to have to make some changes. I mean, the most recent example that bugged me was to remove the not-equals method from the Eq class, which – and then somebody ran this change over all of Hackage and said, “Well, there’s just a few packages affected.” Unfortunately, these packages tend to be at the bottom of your transitive closure of dependencies. And because it seems – I’m going to just dunk a little bit on the Haskell ecosystem and culture there. It seems this attitude carries over into the way that a lot of packages on Hackage are developed as well, in that there are often breaking changes in minor versions, in parts of the system where it’s really difficult to see why you needed to put a breaking change in there. Of course, to make things better, but there’s a trade-off. And I’m sure –

And people often only look and say, well, the change that you’re going to have to make to adapt to this, whatever it is, FTP or this change to Eq or the change in the hierarchy so that Monad now depends on Applicative, things like that – looking at the changes in isolation, they’re all for the better, but they break old code. And even if your code doesn’t use it directly, but depends on some package that has been unmaintained for a while, which in turn depends on being from before that particular change, it just creates a lot of hassle. And I really wish that in the Haskell community, they would more strongly focus on stability. I mean, the language – I mean, I realize a lot of the development is driven by research, and it’s a great research vehicle, of course. On the other hand, Haskell has been around for 40 years now, 43 years or something like that. And it seems if we still haven’t figured out how to do these things, I mean, maybe we should do that somewhere else.
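
As a hypothetical illustration of the kind of breakage being described (not taken from any particular package): an instance that explicitly defines (/=) is perfectly legal while (/=) is a method of Eq, and would stop compiling if the method were ever removed from the class.

```haskell
data Colour = Red | Green | Blue

instance Eq Colour where
  Red   == Red   = True
  Green == Green = True
  Blue  == Blue  = True
  _     == _     = False
  -- Legal today, because (/=) is a class method with a default definition;
  -- if (/=) were removed from Eq, the line below would become a compile error
  -- in every package that wrote something like it.
  x /= y = not (x == y)
```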

AL (0:57:51): I mean, we’ve had all these breaking changes that caused lots of waves, but at the same time, we still have lots of things that basically there is consensus about, that are vastly suboptimal, that have never been fixed and probably will never be fixed properly. I mean, we have lots of occurrences of String in the Prelude. We have lots of partial functions in the Prelude. We have a Num hierarchy that nobody really likes, and so on and so forth. I mean, it is quite difficult. And then at the same time, some people are saying, “Well, one of Haskell’s strengths,” which I actually truly believe in, “is that the type system allows you to refactor fearlessly so that you can make vast, sweeping changes and basically get away with it.” But somehow there is an ingredient missing somewhere that allows you to make it practical because you don’t want to force these changes on arbitrary developers at essentially arbitrary points in time. You want to make it a conscious choice by the individual developer that they want to do some changes. And I think that is something that is not well supported somehow, but I also don’t really know how.

MS (0:59:14): Well, other communities have been more successful doing this; in particular, I mean, the Java community is one good example. And while there’s a lot of things to dislike about Java, they’ve integrated a lot of features from functional programming that don’t make it a functional language, but they’re useful to Java programmers. And they think very, very, very – I mean, it seems like they just think much, much longer about a change to the language and really come from a place where they say, “Well, whatever we are going to do, it must not break code.” And somehow, they managed to do this in a way that doesn’t look too awkward. Their lambda syntax, for one, is not worse than Haskell’s. Probably better, actually. Things like that. And they thought for years and years and years, and everybody was screaming, “We’ve got to have lambdas,” and they thought for a long time about how to really put this in and make it work in the syntax so it will not break code, how to integrate it with the type system so it doesn’t propagate too many changes through the standard library, and things like that. They just thought very hard. And it seems that with a lot of things, it should just be a matter of waiting a little bit longer before you put the next change in, whatever.

The deep subsumption thing, I don’t even understand in detail what it is, but I’m like, you didn’t – what went wrong is you didn’t think hard enough at both ends. Just putting it in originally and then taking it out again, and then putting it back in again. Somebody should have taken a break there, it seems, and reflected on what the consequences would be for users.

AL (1:00:55): Yeah. At the same time, of course, it is a volunteer-driven, open source community, and it is small. Compared to languages like Java, it is really, really small. So, often it is – even with the best intentions, if you’re trying to make a change and predict what kind of impact it has, it’s actually pretty difficult because people will only really start using it the moment you make a release. And I mean, you’ve been asking 10 times before, is there any feedback on this proposal? People are saying, “Yeah, yeah, it’s fine.” And then the moment you release it, people are saying, “How could this ever go in?”

MS (1:01:34): I’m not saying this is easy. And I highly appreciate – I mean, there’s a lot of great things in Haskell at any point in time. It’s just the changes. 

AL (1:01:42): No, I know. Yeah. I also agree. I mean, I agree with you. I agree that it is a difficult problem. I just think it is actually really difficult. I mean, also the way that Haskell is set up in various different ways, it is certainly recognized as a problem. I mean, I think there are many more advocates for stability than there used to be. I think that’s one observable thing. The other thing is that there is now also actual work on it – the Haskell Foundation has a stability working group that is trying to set out guidelines for this sort of stuff. And I think it is also recognized as a long-term problem for compiler development to solve, that GHC should make changes that go in the direction of, at least in the long term, making these kinds of problems less pronounced. But yeah, it is not something that can be really solved overnight, in my opinion.

MPG (1:02:40): Yeah. Ironically, the last thing we usually ask people is, what changes would you make to Haskell? So, you would say change the amount of changes, I guess.

AL (1:02:53): Yeah. Change the amount of changes, it’s a good final word. 

MS (1:02:56): Moratorium on changes for a while. 

AL (1:03:00): Okay. Yeah. Thank you very much. 

MS (1:03:02): Thank you. 

AL (1:03:02): That was fun.

MPG (1:03:03): Yeah. Thank you.

AL (1:03:04): And, yeah. Have a great day. 

MS (1:03:06): Yeah. Thank you so much for having me.

Narrator (1:03:08): The Haskell Interlude Podcast is a project of the Haskell Foundation, and it is made possible by the generous support of our sponsors, especially the Monad-level sponsors: GitHub, Input Output, Juspay, and Meta.
