62 – Conal Elliott

Recorded 2024-11-13. Published 2025-02-17.

In this episode Wouter Swierstra and Niki Vazou talk with Conal Elliott. Conal discusses doing things just for the poetry, how most programs miss their purpose, and the simplest way to ask a question. Conal is currently working on a book about his ideas and actively looking for partners.

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

Wouter Swierstra (0:00:15): Welcome to the next episode of the Haskell Interlude. I’m Wouter Swierstra, and I’m joined today by Niki Vazou.

Niki Vazou (0:00:21): Hello.

WS (0:00:22): Today’s guest is Conal Elliott, who will tell us a bit about doing things just for the poetry, how most programs miss their purpose, and what is the simplest way to ask a question.

NV (0:00:34): Hi, Conal. Can you tell us how you got into functional programming?

Conal Elliott (0:00:38): Oh golly. I think I was an undergraduate in Santa Barbara in California, and I was just wandering around the library, and I found The Little Lisper on the bookshelf. And I checked it out and fell in love. That was my introduction to functional programming. So my first few years I worked mainly in LISP.

NV (0:00:56): So you didn’t have any – it was math? Your major was math?

CE (0:01:00): That’s right. Yeah. Undergraduate was math. Yeah, exactly.

NV (0:01:02): And you didn’t have any classes on functional programming?

CE (0:01:07): No, definitely not functional programming. There weren’t many classes on programming at all. I started college in 1978, when computer science-related classes were just barely starting to appear. Some were in the math department, some in the college of engineering, but there was no computer science department, for instance, when I started there.

WS (0:01:21): And then you did a PhD in Computer Science, but how did that come about?

CE (0:01:27): Like why did I switch from math to computer science? 

WS (0:01:29): Yeah. 

CE (0:01:31): It was mainly like a dissatisfaction. I enjoyed math, but I think I felt sort of sad and lonely about not being able to share what I was working on and thinking about with many people. And computer science has this more sort of outward aspect to it. Like we make machines do stuff. I really love geometry and used to build polyhedra by hand with the scoring and folding and construction paper and all that stuff. That was something I enjoyed doing very much.

And so I got drawn into computer graphics as an intersection of math and computer science. And that was really satisfying because I could make beautiful things and share them with my parents without them having to understand anything really abstract. So I think that that’s what got me going.

And so when I went to grad school, my intention was to do computer graphics. That’s what I was planning on doing at Carnegie Mellon. As fate would have it, when I arrived, the computer graphics people left. So it looked like there was some really interesting work going on, but it turned out it was going on until I arrived, and then it wasn’t going on anymore. I found functional programming and, in particular, program derivation and program transformation: understanding how efficient programs and algorithms can emerge, in a systematic way, from inefficient programs that have a simpler, easier-to-understand structure.

WS (0:02:42): But your thesis was on higher-order abstract syntax, wasn’t it?

CE (0:02:46): Unification, yeah, including higher-order abstract syntax. In grad school, I looked around and realized, oops, there’s no computer graphics here. And I looked around, and I think first I was interested in AI. I started in 1983. AI then was quite different from the modern resurgence of AI. It was good old-fashioned, logic-based AI. It was just too squishy and ill-defined for me. But there was some neat work going on. Will Scherlis and Dana Scott were heading a project on program derivation. So I think maybe there was some overlap there. And since I was interested in AI, it seemed like there would be a good opportunity to do some kind of search for efficient algorithms within a systematic framework of correctness-preserving transformations.

So I went into this project on reasoning about programs. It was the Ergo project at Carnegie Mellon. And then while I was there, Frank Pfenning joined our group. He had graduated doing logic in the math department. He became my advisor. Will Scherlis had been my advisor, but he went off to DARPA. And then I worked with John Reynolds, who was just wonderful, amazing. And Dana Scott also gave me some guidance. But Frank and I were a really good fit. So I worked with Frank.

And so, yeah, doing this reasoning about programs. And then we were exploring this, I guess what we called, higher-order abstract syntax. And that was combined with Lambda Prolog, which is a higher-order logic programming language. Higher-order logic and higher-order programming. And just as at the heart of first-order logic programming is first-order unification, at the heart of higher-order logic programming is higher-order unification. That’s why I got interested in higher-order unification.

And so then I studied Gérard Huet’s work. His dissertation was on higher-order unification for the simply typed lambda calculus. And I had been studying his algorithm with this lens of program derivation: how might this algorithm emerge from a simple, clear specification? And I was able to get some good insights, and then dependent types were popping up. In particular, Edinburgh LF (the Logical Framework) was just happening. And so it was a natural thing for me to look at: can I extend Gérard Huet’s higher-order unification algorithm (it’s just a semi-decision procedure) to dependent types and also to some polymorphism? And because I had been looking at Gérard’s work in this systematic way, I was in a good position to think about, if the specification changes, what needs to happen to the algorithm. And so that was a lucky combination that ended up working out as a dissertation.

WS (0:05:05): And these higher-order unification algorithms, they’re very tricky, right? I mean, I’ve tried to understand them, but they’re so subtle. 

CE (0:05:13): Well, I’m guessing that you’re looking at a program and not at a derivation of a program. 

WS (0:05:18): Maybe that’s the problem. Yeah.

CE (0:05:20): Yeah. So programs are tricky because we want them to be efficient. And the insights that you would need in order to understand them deeply aren’t in the program. They’re in the logical structure that got erased, of which the program is like a lower-dimensional slice. 

WS (0:05:36): Yeah, a shadow of the original slice.

CE (0:05:38): Yes, exactly. A shadow, a projection, a slice. Yeah. And that is still how I think about correct computation. Programming languages only let you address this very superficial slice of what programming really is about, which is higher dimensional.

NV (0:05:51): So it is a slice of the program derivation, you’re saying?

CE (0:05:55): Yeah. Or another way to think of it is it’s the surface. If you have a three-dimensional object that has all kinds of internal structure, like, for instance, a human or other animal’s body, right? We have our physiology, anatomy, or musculature—all that kind of stuff that’s on the inside. And you can understand the outside by understanding the inside. And that was actually Michelangelo’s insight. That’s why Michelangelo defied the Holy Roman Empire and started dissecting human cadavers because he understood you have to understand the inside to portray the outside well.

NV (0:06:24): But then why aren’t the types of the programs the outside?

CE (0:06:28): Ah. Yeah. Well, maybe there’s a third level there. Maybe the types are one outside, and the programs are another. Maybe the programs are the inside of the types, but the programs are the outside of the logical structure, the derivation, which is what? The simple—hopefully—specification and the logical construction. And I think if you really want to understand, at least for me, when I try to understand algorithms, I want to understand families of algorithms, and the familial structure that relates them is not in the program. It’s in the derivation of the program.

WS (0:07:02): It’s at another level, basically. The program is the end product, but it’s the specification or the calculation which tells the full story. Is that how you put it? 

CE (0:07:12): Yeah. Maybe not quite end product. I just want to emphasize that it’s not just the thing you end up with. It’s the surface of the thing you end up with. It’s the shadow. It’s the projection. So in other words, it’s actually harmful if that’s all you end up with. If that’s all you end up with and then you want to go and do something similar to spec changes, whatever, you’ve lost.

WS (0:07:30): Yes. I see your point. So what you’re emphasizing here is what a lot of program calculation people say: look at things like the refinement calculus, where you’re allowed to mix code and specifications, and there are rules for doing the calculation. And then they say code is the bit that you’re left with without any specification, i.e., the bits left that we know how to execute.

CE (0:07:52): Yeah. And proof-free bits.

WS (0:07:53): Yeah. The whole story is not just the end product. It’s the derivation which tells the whole story.

CE (0:07:58): Yeah. And maybe another way to say that is the whole product is not just the executable. 

WS (0:08:02): Right.

CE (0:08:03): I think maybe we jumped ahead. You were asking –

NV (0:08:06): Yeah. I’m trying to think about it in a way that is more familiar. I’m trying to understand. So you do the derivation of the program, and you take the outside, and you can analyze it better. Is it this way, or do you go the other way?

CE (0:08:20): I think to understand computation, you need more than a program. To understand a program, you need more than the program. So the idea of studying a program is really like missing the heart because the understanding isn’t in the program. I mean, a shallow, in that sense lower-dimensional, understanding is in the program. But if I want to understand why the program is what it is, I want to understand its correctness. I want to understand its motivation. I want to understand all of the choice points that led through a very high-dimensional conceptual space to that one program because that same pathway leads to other programs that maybe share a specification and make different assumptions or trade-offs. For instance, am I running this on a machine that pretends to be sequential, like a modern processor? Or am I running on a machine that is not pretending to be sequential, that is allowing the natural parallelism that nature gives us? And so depending on which one of those I’m doing, I’m going to come up with probably a quite different algorithm, a different computation. But those computations have the same starting point and maybe even have quite a lot in common before they diverge. And if you imagine all the places you can diverge and all the reasons for diverging, you get this high-dimensional tree. All of which share a root, which is the specification, and then a shared context, which is the logic. Like, if I’m using Martin-Löf type theory, a nice constructive logic, they will all be living in that space.

WS (0:09:32): I know you’ve done quite a bit of work with Agda and dependent types recently. I understand how you say, “Well, if you just have the program, you always want to know what the specification is.” Yet in this kind of language, you typically have part of the specification in the type. Even when I write about Agda code that I wrote, or I try to present it or explain it to students, what they’re seeing is the end product of this dialogue that I had with the type checker, where I said, “Oh, I tried this, but I got stuck.” So I needed to generalize my induction hypothesis or do something smart. And then all they see is this polished, sometimes very hard-to-understand piece of work.

CE (0:10:08): Yeah. I think I would like to ask you to come back to that question. I’m going to be a difficult interviewee.

WS (0:10:15): No worries.

CE (0:10:16): But you asked me this question. I’m going to go back to your original question because I actually liked the sequence you had in mind. Yeah.

WS (0:10:21): Okay. 

CE (0:10:21): So we’ll come back to it. And if we don’t, please grab me and shake me. 

WS (0:10:28): Go ahead.

CE (0:10:28): Yeah. So at Carnegie Mellon, I got to be in this group that started out, I think, called the Ergo group and then was called Reasoning About Programs or something like that. Bob Harper also joined our group late in the game for me. So yeah, we were building tools for deriving correct, efficient computations. And at first, we did it in the Ergo system. This thing we were building, including the Lambda Prolog implementation, was in Common Lisp, and that was quite a lot of fun. I found myself in Common Lisp wanting types. And the wonderful thing about Lisp is every project can be a language-hacking project. And that just sort of appealed to me. I sort of hacked a type system into the language so that I could reflect the types I had in my head. And then ML came out, Edinburgh ML, and then Cambridge ML, I think, with Larry Paulson. I think he made the move from Edinburgh to Cambridge, something like that. And so we started playing. We got a very early ML implementation. This was before Standard ML of New Jersey or OCaml. And I just loved it. I mean, it was a much better match for me. It was what I was trying to make Lisp into. And so then I changed gears and worked in ML, and that was really wonderful.

And then I graduated, and I still had an itch to do computer graphics like I had wanted to do while I was in grad school, but I got a little bit sidetracked into type theory and such. So then I got a job at Sun Microsystems. This was in 1990, I think it was. And they were forming a graphics technology group, like a hybrid research and product group: a research group inside a product organization, I guess. And so we were tasked, with a pretty blank slate, to come up with the next-generation programming interface, a programming library for interactive 3D graphics. And in those days, not even the computer graphics research group, the advanced technology group, had graphics accelerators. They were just barely available, and they were super expensive. Eventually, I think maybe one or two people in our group had one. I certainly did not.

So I started looking into interactive computer graphics APIs and reflecting on them and their programming models. And it was there that I realized there’s an API style that shows up over and over. It shows up a lot in computer graphics. It’s called retained-mode graphics, or sometimes it’s called display lists, or sometimes it’s called something trees. I forget now. Scene graphs, that’s another word for it. These are all similar ideas. Neural networks are another example of this, and often video editing software will have this form. And what is the form? The form is: there’s a directed graph, and your program builds this directed graph. So, for example, neural networks. It’s called a network because it’s a graph. So your program builds these graphs. And then there’s something that interprets the graphs. But I had just been in grad school at CMU. And it was really clear to me, this whole paradigm is like a poor man’s functional programming. It’s somebody who realizes, “I need some compositionality.” Of course, you need compositionality, right? We all need that when we’re trying to do something open-ended. It’s very easy to go from this realization that you need compositionality to the assumption that what you need is a graph. I’ve got these operators, and they feed information to each other, so this must be a graph. So people over and over start by realizing they need compositionality and a rich vocabulary, and they go to, “Oh, I need a graph API.” And what I noticed was, no, you don’t. You need a functional programming language. A graph API is just a really clumsy way to do functional programming.
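To make the contrast concrete, here is a minimal sketch (hypothetical types, not any particular API). A graph-style API amounts to an embedded expression language plus an interpreter, which a typed functional language already gives you directly:

```haskell
-- A retained-mode / graph-style API in miniature: the client program builds
-- an explicit node graph in an ad hoc little language...
data Node
  = Num Double
  | Add Node Node
  | Mul Node Node

-- ...and something else walks and interprets the graph.
eval :: Node -> Double
eval (Num x)   = x
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- Versus saying the same thing directly in the host language, where naming,
-- sharing, types, and polymorphism come for free:
answer :: Double
answer = let x = 2 + 3 in x * x
```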

And then people invent their own type system, which is usually horrible and very clumsy and very limited. And they’ll take the topic they’re interested in, like 3D graphics, and make that first class in a harmful way, more first class than what we usually mean. What we call first class is: it’s a value, and it has a type. But you can go further and say, “No, no, it’s a language feature.” 3D is this language feature. And I noticed early on that that’s a really harmful thing because as soon as you make some topic of your domain into a language feature, then that feature supports every other type really badly, if at all. That’s what I was seeing over and over, and certainly seeing in graphics APIs. So I realized, no, no, no, we don’t need some kind of graph API that has this special magic for 3D graphics stuff, 3D geometry. We really need just a nice typed functional programming language. It has to be polymorphic. Why? Because there’s 3D, there’s colors, there’s points, there’s vectors, there’s lights, there’s cameras, all kinds of stuff. And because we were doing interactive graphics, which means there’s going to be some animation, coming from your programs but also from yourself and the way you interact, the time-varying things needed to be rich and polymorphic and that kind of stuff.

WS (0:14:35): And that’s FRP. 

CE (0:14:36): Yeah. Those are the insights that led to functional reactive programming. I started that work at Sun and then continued it.
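For readers who haven’t seen it, the denotational core of classic FRP, as Conal has presented it elsewhere, is roughly the following sketch; `Time = Double` is a stand-in for conceptually continuous time, and the combinator names are illustrative:

```haskell
-- A time-varying value *means* a function of continuous time.
type Time       = Double
type Behavior a = Time -> a

-- The identity behavior: the current time itself.
time :: Behavior Time
time = \t -> t

-- Combinators are polymorphic in the type of value that varies:
lift2 :: (a -> b -> c) -> Behavior a -> Behavior b -> Behavior c
lift2 f ba bb = \t -> f (ba t) (bb t)
```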

WS (0:14:41): So you mentioned how functional programming is really somehow modeling graphs, but that’s not obvious to me.

CE (0:14:47): No, no, no, no, no. I was thinking the other way around. Graphs are very clumsily modeling functional programming.

WS (0:14:53): Right. Okay. Because usually, when I think of functional languages, I think of inductive structures and doing folds over trees. And then how is that a graph?

CE (0:15:00): No, no. No, no. No. Don’t think of the data structures in the language. Think of the syntax of the language. And the syntax of a language is a directed graph. You know what I’m saying? You’d say it’s a tree, not a DAG. But it really is a DAG because we have let. We have things that are like let. We have the ability to name something and reuse it.

WS (0:15:17): Okay. So then if you have a let binding and have a let x and then you see several occurrences of x, then that’s the graph part.

CE (0:15:23): Exactly, that’s the graph. But I want to program it in a nice, civilized, well-designed, orthogonal kind of functional language with a good type system, not a thing I made up for my specific domain of application.
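A tiny illustration of that point: the let-bound name below is defined once and referenced three times, so the expression, drawn with one node per definition, is a DAG even though it is written as a tree:

```haskell
-- 'y' is one node with three incoming references: sharing makes the
-- syntax tree into a directed acyclic graph.
poly :: Double -> Double
poly x = let y = x * x + 1
         in y * y + y
```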

NV (0:15:37): And did you push your group to use that?

CE (0:15:40): Yeah, and it was not an easy sell at all. My boss kept telling me, “Conal, you need to think more about standard practice. You need to think more about what people already understand and make small tweaks.” And that’s never been my nature. So every performance review, I would get those kinds of remarks: “You need to learn to compromise and try harder to accommodate current practice.” Eventually, I told my boss, “Just don’t even bother saying that to me anymore. I’m not going to try to do it.” Because I could see, from my exposure to typed functional programming and denotational semantics, what it’s like to really understand what’s going on in a general, elegant, succinctly expressive way.

So I got a lot of pushback at first, but I had a strong intuition that I was onto something. I was also only like, what, 29 years old. And it was almost my first job. And it was difficult for me to take my boss’s reprimands and not be too demoralized. But eventually, the rest of the group could see the power, the generality, and simplicity of what I was talking about, in contrast to what they had seen before. Eventually, they started just giving me more leeway, and we did. We did quite a lot of that. Then that group basically moved almost intact to Microsoft Research when Sun closed down the whole department, the whole graphics multimedia department. It was at Microsoft Research that I discovered Haskell. I had discovered Lisp in the library when I was a math student. And then from Lisp, I had the good luck to find ML. And then I was still basically working in ML at Microsoft Research. And somehow I heard about Haskell and tried it out, and I realized this was a much better match for me than ML.

WS (0:17:13): So do you remember how you first ran into Haskell or Gofer or Miranda or something?

CE (0:17:18): Not exactly. Well, of course, I was interested in functional programming, so I must have been tuned roughly into the right channels to hear about it. 

WS (0:17:26): Sure.

CE (0:17:27): But I do remember that I heard of it, and in particular the Hugs implementation, Mark Jones’s Hugs implementation, which was lovely. I downloaded it on my Windows machine at Microsoft Research, and I ran it, and it opened up this DOS box and gave me some prompt, and I immediately deleted it. This can’t possibly be what I’m looking for, because it was just this very primitive-looking thing. And then I looked around, and I realized, “Oh, I guess that was the right thing.” So I downloaded it again and had a little more patience with it and learned it. So that would have been probably ’94. 1994, ’95.

I had come to Microsoft Research to continue the work I had started at Sun Microsystems. I had a paper at SIGGRAPH in whatever year that was. ’94, I guess. And I met folks from Microsoft Research, and they were doing a cool 3D graphics hardware design project. And they hired me to design the programming interface. And there were some things I had not yet explored at Sun Microsystems that I was itching to, and those were the finishing pieces of functional reactive programming.

WS (0:18:21): Oh, nice.

NV (0:18:22): So why did you say that Haskell was a better fit for you than OCaml? Is it the syntax?

CE (0:18:28): It’s a lot of things. And it wasn’t OCaml. It was just ML, whatever ML. By that time, Standard ML of New Jersey was probably what I was using. So what was it? Definitely, the syntax was more relaxed and elegant. I mean, it still is. I still choke on ML. Once you see something better, it’s hard to go back. And of course, just as when I worked in Lisp I wanted types, when I worked in ML, I did not want side effects. I did not want rampant effects in an undisciplined sense. And why? Because I was always focused on: what does it mean? And when I say, what does it mean, I mean denotation. What is it? Simple mathematical meaning, not an operational kind of semantics, which is complex machinery that doesn’t really tell you what it’s about in an elegant, clear way, nor how to implement it well. So I was always focused on this simple, precise denotation. And that meant, of course, I didn’t want any effects in the types. So as long as I worked in ML, I had to pretend that ML was a pure functional language in order to get the mental clarity that I was after. Just like when I was in Lisp, I had to pretend that it was a typed language. So Haskell let me say what I was trying to say in a more direct way and, of course, supported it much better. And of course, non-strictness is absolutely essential for modularity, particularly in the absence of rampant side effects. That was just an easy argument, a very clear argument to me. I didn’t want side effects. I did want non-strictness.

WS (0:19:48): I mean, that’s what John Hughes says in Why Functional Programming Matters, right?

CE (0:19:51): Exactly.

WS (0:19:51): It’s the modularity argument for laziness.

CE (0:19:53): Yeah, exactly. That paper is called Why Functional Programming Matters, but it was really about why non-strictness matters in programming. And as I remember, it was probably Simon PJ saying that non-strictness was really the first thing, and the purity of Haskell, that choice, was a consequence of non-strictness. Non-strictness isn’t very practical without purity. But for me, the purity was really more at the heart of it.
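The canonical example from that paper bears repeating here (a lightly adapted sketch of Hughes’s Newton-Raphson example): a producer generates an infinite stream of approximations, and a separate consumer decides when to stop, a modular split that strictness would not allow:

```haskell
-- Producer: an infinite stream of square-root approximations by Newton's
-- method. No termination decision is baked in here.
sqrtApprox :: Double -> [Double]
sqrtApprox x = iterate (\g -> (g + x / g) / 2) 1

-- Consumer: a separate decision about when the stream has converged.
within :: Double -> [Double] -> Double
within eps (a : b : rest)
  | abs (a - b) <= eps = b
  | otherwise          = within eps (b : rest)
within _ xs = last xs   -- fallback for finite inputs

-- e.g. within 1e-12 (sqrtApprox 2) ~= 1.4142135623730951
```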

WS (0:20:16): That’s the thing which matters more, right? I could imagine working in a strict yet pure language. 

CE (0:20:21): Yes. 

WS (0:20:22): But –

CE (0:20:22): Yeah, kind of. 

WS (0:20:23): Yeah. Kind of, right? Yeah. But you have to be careful. You write things slightly differently.

CE (0:20:28): Yeah, absolutely. I mean, John’s paper really, I think, explains well why it doesn’t work out to do pure functional programming with a strict language. With a strictifying language is a better way to put it.

So that got me to Haskell. And so I worked with Hugs. I did the first few experiments and prototypes of functional reactive programming. I think I was calling it reactive behavior modeling; Reactive Behavior Modeling in Haskell, I think, was another name for it. So that’s what got me into Haskell, and then Haskell was my main language for more than 25 years. I did quite a lot. It was a good run. It was a lot of fun. And of course, it introduced me to a community of wonderful people like yourselves.

I did a lot in interactive graphics, 3D, 2D, that kind of stuff. And then graphics hardware, affordable graphics hardware. Mainstream graphics processors for video games basically came out in the mid-nineties, maybe the early second half of the nineties, and I love playing with imagery and that kind of thing. And I got very curious about what if I wanted to do 2D graphics, but much richer than the 3D graphics I was doing. So not 3D geometry, but 2D imagery, and make it run really fast. I had done optimizing-compiler kinds of stuff, generating C code. I could take a very elegant, denotationally—in other words, simply and precisely—defined graphics or imagery interface and generate quite efficient code from it. At first it generated C code, but when graphics processors came out, I generated graphics processor code. The first graphics processors were very weak, so I could do like 128 instructions and that kind of thing. Really not very much. But it was enough to validate the idea. And there I was impressed. The speed went up tremendously: the not-very-optimized graphics processor code, even on that very first generation of graphics processors, was tremendously faster than any C code I could run, including the optimized C code I generated. And the programming interface was identical. So it wasn’t like you were programming a graphics processor; you were just writing very clear descriptions of imagery. And that imagery was infinite and continuous. Just as my animation and interaction stuff was infinite and continuous in time, the imagery was infinite and continuous in space. And that infiniteness and continuity are essential to modularity. It’s really the same argument that John Hughes was making in his paper. In lazy functional programming, we think about infinitely large data structures, but infinitely fine data structures are just another perspective. It’s really the same information, which is: you can probe arbitrarily deep.

WS (0:22:46): Okay. And then you sample a bitmap at some point to actually visualize this, but for the sake of compositionality, at least, you don’t want to say a priori, this is the granularity, because you don’t know.

CE (0:22:57): Yeah, because that breaks. Yeah, because you don’t know. If you want spatial modularity, that is, if you want to be able to describe things and then transform them in space, then you need that flexibility, which is incredibly important and valuable. That ability is what makes geometric graphics work. But it means you have to not commit to the resolution when you construct something, because you don’t know what resolution to use until it’s displayed on the screen. And that is after it’s gone through a lot of transformations. And similarly in time: you want to be able to describe things in a temporally modular way. You have to be able to transform them in time after you describe them, which means you mustn’t fix their granularity. So to get this modularity, they need to be infinitely fine-grained. And similarly with extent: if you want to be able to describe something and then shrink it or move it or whatever, you must not commit to its extent when you create it. Only when you render it. In other words, when you clip it and sample it. So it was really, I think, the same argument, or maybe a slightly subtler version or a variant of John Hughes’s argument: infinitely fine-grained time and space. And of course, that fits very well in the spirit of non-strict, so-called lazy, functional programming. That got me into graphics hardware and that kind of thing.
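In code, the idea Conal is describing here (as in his later Pan work on functional images) looks roughly like this; the concrete types are a simplified sketch:

```haskell
-- An image is simply a function over infinite, continuous 2D space.
type Point   = (Double, Double)
type Image a = Point -> a

-- Spatial transformation is just pre-composition with the inverse map...
translate :: Point -> Image a -> Image a
translate (dx, dy) im = \(x, y) -> im (x - dx, y - dy)

scale :: Double -> Image a -> Image a
scale k im = \(x, y) -> im (x / k, y / k)

-- ...and an example image; resolution enters only at the very end,
-- when a renderer samples and clips it.
checker :: Image Bool
checker = \(x, y) -> even (floor x + floor y)
```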

Then I took some time off. I did a midlife sabbatical in my forties and did a bunch of wonderful stuff: shamanic training, ballroom dance, and jazz singing. And I coached relationship workshops with my current partner at that time. And then at the end of my forties, beginning of my fifties, I went back to work, and I got into massively parallel hardware design work for a hardware company. And this again was a perfect application for functional programming.

The fellow who founded the company, a brilliant guy named Steve Teig, designed this hardware architecture that was massively parallel, meaning not 16 cores but maybe half a million or a million tiny, tiny reprogrammable gates. And the cool thing about it technically was that the architecture, the hardware—they call it the fabric of this device—is like an FPGA (field-programmable gate array), except it reprograms itself 2 billion times a second, unlike normal FPGAs, and there’s a huge performance gain you can get from that very, very fast, low-overhead dynamic reprogrammability. Basically, it can cure dark silicon and a bunch of other things. And he had the, I think, brilliant insight, given his background, that no von Neumann style of programming language was going to be a good fit for the architecture. And then he looked around. I think he found John Backus’s Turing Award paper, Can Programming Be Liberated from the von Neumann Style? And he realized, “Ah, this is it. Functional programming is it.” And he looked around, and he found Haskell. “This is it.” And then he found GHC and already had the idea that we’re going to get into GHC and change the backend. This was all before he met me. So I met this fellow, and in 30 seconds, I was like, “I’m going back to work. I found somebody.” And he understood that we cannot make significant progress on hardware alone or on software alone. We can only make progress on both of them simultaneously because they each hold the other back.

WS (0:25:43): Was this asynchronous or –? 

CE (0:25:45): This one is synchronous. There are, like, clock regions. So in a way, a little bit asynchronous, but not purely asynchronous, which is a fascinating area.

WS (0:25:53): Yeah.

CE (0:25:54): Yeah. So then my job was to figure out how to compile Haskell to hardware. And not, like, yucky imperative Haskell, but beautiful, what Landin called genuinely functional, or what I call denotative, Haskell. In other words, Haskell without IO and its ilk. So it’d be free of the sequential bias that all imperative programming has, including in Haskell. So I did that, and it was actually quite successful. They had running hardware, and my compiler was working. And that was where I discovered what I wrote up as compiling to categories, trying to understand hardware, because I didn’t really understand how it worked very well. Talking to a bunch of people, I got a little bit of inspiration that unfolded into compiling to categories. And then that idea unified a lot of what I had done before and pointed in new directions.

And then from there, I went to another job. This was at Target. It was a weird time at Target where there happened to be a group forming that was going to do wonderful, lovely Haskell-based work with an eye toward mathematical clarity. And so I got in on that action, and there I continued the compiling to categories idea. I realized that computer hardware can be described elegantly, effectively, insightfully in the language of category theory: basically, circuits can be expressed in the language of categories, and then timing analysis can be computed in a very natural way as a functor, in category-theoretic notions. Not advanced category theory, very simple category theory. That was when I realized, ah, now I know how to do the thing I was hired for and compile Haskell to hardware.
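A minimal sketch of that idea (hypothetical names, nothing like the real compiling-to-categories machinery): the same categorical vocabulary gets two interpretations, one for behavior and one for timing, where sequential composition simply adds worst-case delays:

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- Interpretation 1: behavior, i.e., ordinary functions.
newtype Behav a b = Behav (a -> b)

instance Category Behav where
  id                = Behav (\a -> a)
  Behav g . Behav f = Behav (\a -> g (f a))

-- Interpretation 2: timing analysis. An arrow carries only its worst-case
-- delay, and composing circuits in sequence adds their delays.
newtype Delay a b = Delay Int

instance Category Delay where
  id                = Delay 0
  Delay g . Delay f = Delay (g + f)
```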

But I also realized this is a much bigger answer than the question I started with. In other words, this answer answers a lot of questions that I haven’t even explored yet. And so over the next several years, I explored what else you can do with this insight of non-standard but systematic, principled interpretations of typed functional programming, to do things that we cannot do in our standard functional programming implementations. In particular, there are lots and lots of interesting but undecidable things that you’d like to do with pure functions. And so rather than make a really clumsy API, like all these graph APIs, like TensorFlow and all the other things—instead of making a graph API, which changes the conceptual model and really messes it up, what if we use exactly the same conceptual model? The problem isn’t that we don’t have a language that expresses what we want. The problem is we’re using implementations that don’t allow us to do certain kinds of analysis. In other words, we have implementations that make certain kinds of really useful questions, like differentiation, derivatives, uncomputable. And that uncomputability is not a consequence of the language; it’s a consequence of the implementation of the language. So the wrong solution is to change the language, or even worse, to make a second-layer language of graphs and then give it all kinds of dodgy semantics because you’re not thinking in a principled way. That was the realization, and then I applied that idea to efficient reverse-mode, higher-order automatic differentiation.

NV (0:28:36): Wait, what is the correct solution?

CE (0:28:38): The correct solution that I recommend is that we say, yes, our languages have semantics, by which I mean denotation. They have mathematical meanings. That’s what’s essential. Nothing operational. We have languages, and they have certain meanings. That’s a language. Now we have implementations, either concrete ones, like a compiler or interpreter, that’s actually some code, or abstract implementations that people call operational semantics. They call it semantics, but I think of it as a not-quite-so-concrete implementation. Those definitions, the operational ones, lead us away from the ability to do really powerful things, like derivatives and interval analysis. Most interesting questions about functions are non-computable, but you can go a little bit further: instead of saying, “I’m just talking about functions,” say, “I’m talking about maybe a few other categories. Maybe they’re mostly functions. Maybe they are functions that have a certain property, like differentiability. Oh, just a little bit.” If you put in just a little bit more, and you look at what’s the simplest, most natural way to define the categorical operations, then what you get is not only a computable solution to these interesting problems, but a simple and beautiful and general one, and one that parallelizes easily.

So, for instance, the dominant – I was going to say algorithm, but that’s a much too generous word – the dominant computation that’s used in neural networks is backpropagation. And it’s probably wrong because it’s very complex and imperative, and it’s very inefficient because it’s imperative, and nowadays performance comes from parallelism. And if you have a stateful way to think about a problem, then parallelism, and hence performance, is going to be really hard to get. What I was advocating is to just relax a little bit and say, that’s not necessarily the category of functions that we’re talking about in this language. Oh yeah, and I learned in grad school, I learned this at Carnegie Mellon, that the simply typed lambda calculus is not just a language. Its models are not just computable functions. There are a lot of models, and they’re exactly the Cartesian closed categories. I never had any idea that that was a practically useful theoretical result until I was standing there one day at work and I had this aha: oh, circuits are categories. I mean, circuits form a category in a natural way, not an unnatural way, an elegant way. And then, oh, the lambda calculus—in other words, Haskell—its models are not just computable functions. They’re all kinds of things. And so putting those two insights together, I think, opened up quite a lot of work. So, that carried me through another few years.
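A drastically simplified sketch of that move, in the direction of Conal’s paper The Simple Essence of Automatic Differentiation (the real construction uses proper linear maps and covers reverse mode; here the derivative is just a function, and only the chain-rule structure is shown):

```haskell
import Prelude hiding (id, (.))
import Control.Category

-- A differentiable function: a value together with its derivative.
newtype D a b = D { runD :: a -> (b, a -> b) }

-- Composition *is* the chain rule.
instance Category D where
  id = D (\a -> (a, \da -> da))
  D g . D f = D (\a ->
    let (b, f') = f a
        (c, g') = g b
    in (c, \da -> g' (f' da)))

-- A hypothetical primitive:
sqr :: D Double Double
sqr = D (\x -> (x * x, \dx -> 2 * x * dx))

-- runD (sqr . sqr) 3 yields (81, dx -> 108 * dx), i.e. d(x^4)/dx at x = 3.
```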

And then I started mentoring a group of people doing hardware design because I had really thought a lot about particularly parallel hardware-friendly algorithms. Not just algorithms, but infinite families of algorithms. When I say algorithm, I mean a correct computation. I think the word ‘algorithm’ has gotten watered down, and people just mean computation. But I mean, in the original sense, an algorithm is a correct computation. In other words, you have to have a specification, and you have to have a computation. You have to have a proof that the computation is faithful to the specification. That’s what I mean. So I had been thinking about infinite families of parallel computations, which means I’m thinking about the proofs of correctness. Of course, the logical structure, that is what defines the family.

And so I started mentoring people who were doing hardware design for machine learning kinds of hardware. And I knew that matrices are actually not the right way to think about linear algebra. It’s the way almost everybody does, but it’s a harmful way to think about it. It breaks modularity, it pushes you towards inefficient algorithms, and a bunch of other things. Linear algebra can be described in a much more flexible and more type-safe way than by using numerically indexed matrices. So I wanted to bring this insight to these people I was mentoring. And that’s when I came face to face with the limitation of Haskell. And the limitation was, I wanted to help them understand. I was coaching them in correct, efficient computation, but Haskell could only talk about computation. It can’t talk about correctness. They would keep forgetting what correctness meant. Correctness means there’s a homomorphism, a natural homomorphism, that relates the specification and implementation. That’s it. But they would keep forgetting. And I realized that the language itself and their way of interacting with the compiler were not reinforcing what I was teaching them. And that’s when I switched to Agda.

WS (0:32:25): If I remember correctly, this is related to your idea of denotational design, where you have some type class which captures a structure.

CE (0:32:34): Exactly. 

WS (0:32:34): Something like a monoid, right?

CE (0:32:36): Yeah, exactly. So denotational design – thank you for bringing it up. Denotational design was a reflection on what I had been doing for a long time. And after maybe 20-ish years of doing denotational design, I became aware of what it was I was doing in terms of homomorphisms. And at first, I just did it for the poetry. I was writing a paper, and I thought of it as an implementation paper about functional reactive programming. And I thought I may as well remind people what it is, which to me is denotations. No operational idea at all. And most operational ideas about FRP are actually bad, harmful ideas. Not only unnecessary, but harmful. So I wanted to renew this. By that time, I had been exposed to enough patterns from my Haskell experience that I realized I could write these definitions in a pretty poetic way. I noticed it worked for one example, and I said, “Oh, that’s just pretty.” And I like to write pretty code. I like to write code that’s poetic and satisfying and pleasant to read and pleasant to say. And so I noticed a little rhyme in there, a little duality. I thought, that’s pretty. And then I started doing more, and I noticed, “Oh, wow. There’s another one here.” And then it worked for everything I was doing. And then I realized, “Oh, this isn’t just a pretty thing. This is a really profound principle. This is what correctness is. This is what computation is. This is what algorithms are.” Purposeful, meaningful computation is essentially an analogy—in other words, homomorphism—between something that has a clear, simple mathematical definition and something that is amenable to efficient computation. The correctness of that computation is the homomorphism. That’s the heart of it. I couldn’t teach that in Haskell. And I found that whenever I tried to teach it, it would just slip through people’s fingers. They were like, “Yeah, I get that. That’s nice.” And they’d go away, and they’d come back, and I saw they didn’t understand. So I realized I need a language. I need to give them a language so that when they lose the thread of what correctness is and what the goal is, the compiler will remind them. So the goal itself can be expressed in the language and checked automatically. And that’s why I had to leave Haskell: the inability to do that. And then the mentoring became tremendously more effective for my mentees and also a lot more fun for me.

NV (0:34:38): But they accepted it. You said, like, this was people who wanted to do hardware, no?

CE (0:34:43): Yeah.

NV (0:34:44): Like, I mean, it’s very difficult for these practical people to learn and accept Agda, no?

CE (0:34:52): Yeah. It is. It’s rare. And there’s a group of folks that I knew and then people they knew who had a startup for a while. They were open to it because we had a previous relationship, and then they brought other people into it. So it was a lucky situation. And then of course, in my hardware company job I had before that, like I said, the founder, he got it in his bones why functional programming is the right thing.

WS (0:35:14): To make it a little bit more concrete, right?

CE (0:35:16): Yep. Please.

WS (0:35:17): So the idea behind this denotational design is that you want to preserve structure. So suppose you have some structure; let’s make it very simple, like a monoid class. And then the idea is that whenever you write a program, you want to make sure that that structure is preserved. So it’s not just mapping the operator to another operator in some other domain and the unit value to some other value. You really want to know that you’re not just mapping to any value, but you’re mapping to the unit value of the target monoid. And that’s the kind of thing which is hard to enforce in Haskell?
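A concrete instance of what Wouter describes, using only Prelude vocabulary: `length` is a monoid homomorphism from lists under `(++)` to integers under `(+)`, and both preservation conditions are checkable properties:

```haskell
-- 'length' preserves the monoid operation and the unit:
--   length (xs ++ ys) == length xs + length ys
--   length []         == 0
homOp :: [a] -> [a] -> Bool
homOp xs ys = length (xs ++ ys) == length xs + length ys

homUnit :: Bool
homUnit = length ([] :: [()]) == 0
```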

CE (0:35:48): That is what I believed. Yeah. So what you just described, this homomorphism nature, the preservation of particular algebraic structures, that is what I believed the essence of it was, until the last few years, since I switched to Agda. And I started writing a book about exactly what this methodology is, in a completely precise, non-hand-wavy way. And that process was really helpful to me. So what I’ve been doing the last few years is taking these ideas, the spirit and the intuition and my inclinations from back early in my career when I did functional reactive programming and things like that, and some of the papers I published, work I had taken as far as I could with clarity in Haskell, and making it all more clear. But since I switched to Agda, it’s allowed me to be so much more aware, in a much more precise and much higher-quality way than I was before. So it’s not just that I can teach other people well. It’s really that I understand my own denotational design much better. And I realize it’s really not about homomorphism. It’s really about something more fundamental. Homomorphism is convenient. What it really is about is a precise analogy between two, let’s say, computations, where one computation is optimized more toward mathematical simplicity—in other words, human understanding, precise human understanding—and the other one is optimized more toward physics, toward physical manifestation, toward efficient implementation. That’s what it’s really about.

So in other words, these two computations have a certain relationship to each other. Well, what is the relationship between the two computations? Well, if you draw those two computations as horizontal arrows, one above the other—and I’m going to say the top one is the implementation; the lower one is the specification, the semantics, you might say—then there are two other arrows that are completely crucial but invisible, or implicit. I think when we reflect on them, you’ll see, “Oh, of course, I use them all the time,” but until you write them down, it’s hard to see. And those are the data interpretations. In what sense does the runtime data—the data I’m actually going to be computing on—in what sense does it denote the mathematical counterparts? So, for instance, I might be doing binary arithmetic. Why would I do binary arithmetic? It’s entirely for performance. If I wanted clarity, I’d do unary arithmetic. I’d just use Peano numbers. That’s much clearer. A correct arithmetic algorithm or arithmetic API is going to have all the simple specifications and all the clever implementations, but it also has to say in what sense the binary representations, the bit sequences, denote natural numbers. And there are actually a few different ways to do it. For instance, you could do what’s called little-endian versus big-endian. You can do lower-order bit first or higher-order bit first, or all kinds of things in all kinds of number systems. Exactly in what way do your bit vectors represent natural numbers, let’s say Peano numbers? That’s as important as the other two computations.
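A small sketch of such a data interpretation (hypothetical names, with Integer standing in for the Peano naturals): the `meaning` arrow says in what sense little-endian bit vectors denote numbers, and the correctness of an operation like increment is exactly a commutative square relating it to its specification:

```haskell
-- Little-endian bit vectors as the runtime representation.
type Bits = [Bool]   -- least significant bit first

-- The data interpretation: in what sense do bit vectors denote numbers?
meaning :: Bits -> Integer
meaning []       = 0
meaning (b : bs) = (if b then 1 else 0) + 2 * meaning bs

-- An implementation shaped by the representation (carry propagation)...
inc :: Bits -> Bits
inc []           = [True]
inc (False : bs) = True  : bs
inc (True  : bs) = False : inc bs

-- ...whose correctness is a commutative square:
--   meaning (inc bs) == 1 + meaning bs
```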

So I realized that the essence is really these commutative squares, these very simple commutative squares that have implementation, specification, and data interpretations. And then for convenience, it’s nice to have homomorphisms. In other words, it’s nice to have a common vocabulary between the specification and implementation. Different interpretations of that vocabulary, but a common vocabulary. That homomorphism helps you in a systematic sort of way build these commutative diagrams. And the wonderful thing about these commutative diagrams is they hold the full story of what you’re trying to accomplish (that’s the specification), how you’re trying to do it (that’s the implementation), and in what sense the runtime representations denote the ideal. Those commutative squares themselves form a category. They actually form a two-dimensionally compositional structure. That understanding led me to compositional correctness, which I think is hugely important. The usual idea of what’s often called program verification or hardware verification is that you take some big, complex artifact, and then you verify it. But what does that mean? It means, first of all, there was no specification most of the time. Oops. So I have to invent some story about what I was trying to do. And then I have to invent a rationale that says I accomplished it. But the truth was, I made the implementation with a purpose in mind. So the purpose really comes first. In other words, the specification, the motivation, really comes first.
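One way to render such a square in Haskell (a hypothetical sketch; in Agda the commutativity condition below would be a checked field rather than a comment):

```haskell
-- A correctness square: an implementation arrow on top, a specification
-- arrow on the bottom, and interpretation arrows on the sides. The
-- (here unchecked) proof obligation is:  interpB . impl == spec . interpA
data Square a b sa sb = Square
  { impl    :: a  -> b    -- runtime computation
  , spec    :: sa -> sb   -- mathematical meaning
  , interpA :: a  -> sa   -- what input representations denote
  , interpB :: b  -> sb   -- what output representations denote
  }

-- Squares paste side by side: the composite's obligation follows from the
-- two component obligations. This is compositional correctness in miniature.
compose :: Square b c sb sc -> Square a b sa sb -> Square a c sa sc
compose g f = Square (impl g . impl f) (spec g . spec f) (interpA f) (interpB g)
```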

WS (0:39:32): Yeah. But it’s often left implicit. 

CE (0:39:35): It’s usually left implicit. And the implementation language might not even be able to express it, and almost certainly cannot express the correctness, the consistency of the two. And the details, the specific way in which it’s correct, are the data interpretations. So we need to bring those all into one package, and that package has to itself be compositional. I think that’s absolutely necessary. Compositional, simple correctness is necessary for the scalability of correct computation. And how do we do that? Well, that’s what I’ve been exploring for the last few years. I’ve been in a deep dive into what really is a methodology for scalable, reliable, correct computation in hardware and software. Why hardware? Because you can get like a 10,000-factor improvement if you design your own hardware versus using an off-the-shelf CPU. I mean, like one to 10,000, because CPUs are actually incredibly inefficient because of the architecture, and multi-core ones have the extra inefficiency of bus contention, all this kind of stuff.

So for the last few years, I’ve been in this deep dive into what exactly this methodology is. And it’s a further refinement, just as denotational design was a clarification. And then I realized, no, I can really clarify denotational design more, and thanks to Agda, I had the tools that allowed me to see my own thinking process much more clearly, and I saw some defects in it. So it wasn’t just that it helped me say what I was trying to say; it actually helped me say something better. It helped me not just to say it better, but to say an idea that was better.

And so now I’m in this phase where I’ve developed a lot of, I think, very useful insights, very powerful insights in hardware and software design with simple specifications. That’s crucial. That’s what I think almost everybody in our area doing formal stuff is missing, and all of operational semantics, which is really dominating the PLT community now, is missing the boat without simple specifications – and the word is elegant. Some people get uncomfortable with the word elegant, but to me, elegance is completely essential. And by elegance, I mean simple, precise, adequate. That’s all I mean. So perspicuous is another good word for it. Perspicuous specifications, perspicuous developments. It’s simple, it’s precise, and it’s adequate. It gets something done in a surprisingly simple way.

And that’s not just important for aesthetic reasons, even if maybe that’s really what drew me into it. I think it’s essential for scaling, because if you make things that have complex specifications, then you have a complex theorem: this implementation satisfies this very complex property. It correctly implements this instruction set of a processor. That’s an extremely complex specification. Everything that’s done in operational semantics has a complex specification. Look, here’s this linguistic structure that I just made up. Look, here’s this transition system, this sequential transition system. Here are all these questions about its termination, its confluence, its type preservation, all this complicated junk that has nothing to do with the computational question we’re interested in. It has to do with coping with a clumsy way to try to do correct computation. Whenever we have a complex specification, we have a complex theorem. If we succeed, at the very best, we have a complex theorem. Every complex theorem is hard to apply. The cost of coming up with the theorem has to do with the complexity of the proof, but the cost of using it, consuming it, has to do with the complexity of the theorem. And hopefully, we’re going to have more users than producers. That’s what we’re trying to do. We’re trying to make reusable stuff. Which means, with a complex theorem, we’ve optimized the wrong thing. We’ve made every user have a difficult time.

So my commitment is giving people simple theorems about really useful things, and that’s what I think is necessary for compositionality, because when I have a simple theorem, then I can build other theorems on it, as long as the theorems are always simple. And that’s why it has to be denotational and not operational. But denotational isn’t enough, because you can make a mess with denotation. You could just translate anything operational into something that’s technically denotational, but you’ve just documented all the complexity. You haven’t made it go away. You’ve just documented it. So that’s why: simple specifications, starting with the specification first, and then simplicity always.

WS (0:43:08): So how do you – I find this a lot when I try to do verification. There’s often more than one way to write the spec, right?

CE (0:43:14): Yes. 

WS (0:43:15): And sometimes you feel around a little bit, and then at some point you have the spec, and you say, “Oh yeah, this one feels right.” But I find it really hard to quantify and say, why is it better to write it this way than that way? Do you have any insight there? 

CE (0:43:29): Yeah, I do. I think I do. That’s been the main question for me for my whole career. What’s the simplest question? What’s the simplest way to express the question for which a correct answer is useful? And the simplest, most insightful way to express the question. Forget the answer—the question itself, because the question is what I’m exporting.

NV (0:43:48): Yeah.

CE (0:43:48): So that’s what I’ve always felt around for. And for me, it’s been a strongly intuitive process. And you can look at how many characters you write down, but there are easy ways to cheat that metric. It’s not the right thing. So, a few years ago, I stumbled across a talk given by Murray Gell-Mann. Murray Gell-Mann was a colleague of Richard Feynman, and he’s a Nobel laureate theoretical physicist. He discovered quarks, or predicted quarks, or something. And he gave a talk about elegance in physics – I can’t remember the title right now, but it was about why beauty is a useful notion in understanding the physical universe. And he gave a definition, which I really, really like. It resonates with me deeply. He’s talking about a theory of physics. When he says theory, I say specification. A theory of physics is a specification of a computation. That’s a translation I do. So he said, “We consider a theory beautiful or elegant if it can be expressed concisely in terms of mathematics that we have already learned for some other reasons.” Now, it’s not a formal definition, but I think it really gets to the heart. It’s the concision. The concision is crucial. Now, concision you can measure in characters or something like that. Well, yeah, but what about when maybe you’ve done a whole bunch of work and shoved it in a library for this one use just so you can cheat the metric? But no, he says mathematics that we’ve already learned for some other reasons. That’s the crucial thing.

So, for instance, if I’m going to use linear algebra, I don’t have to invent linear algebra. It’s already been invented, and it’s already paid its way. There was a lot of work to do linear algebra well, but we’ve already had a tremendous number of applications. And because of that, we have linear algebra in our education system. Maybe we do it clumsily, using matrices and that kind of thing when other approaches are better, but at least we really understand the theory of linear algebra. So that doesn’t count in the complexity. But anything that I invent that’s motivated by the thing I’m trying to talk about, that does count toward its complexity. Maybe it counts toward the complexity of what I’m doing, but then we realize, well, something else could use it too, and something else could use it too, and I factor it out into a library, right? There’s an honesty, a scientific integrity thing here, which is, I’ve got to not be trying to game the system, right?

WS (0:45:57): Sure. 

CE (0:45:58): Yeah. I’m really trying to make scientific progress here. That’s not a precise answer to your question, Wouter, how do you really measure, but it’s the closest I’ve seen, and it resonates with me. 

WS (0:46:07): It’s a litmus test, right?

CE (0:46:09): Yes. 

WS (0:46:09): It gives you a way to at least compare two things and say, “Well, okay, I understand that definition, but in order to understand that, I have to develop a whole bunch of ad hoc functionality for this particular problem.” Whereas if it’s something more general, which is like, “Oh, this is a Prelude function that everyone knows,” then all of a sudden, the cost is not there, right?

CE (0:46:30): Yeah, exactly. So I may do a bunch of stuff in my code, and then it’s like, “Wait a minute, there’s a monoid here.” If I can just say there’s a monoid here, right? Or there’s a category, or just a group, or a vector space, whatever. “Oh, okay. Now I can eliminate a lot of vocabulary.” And my goal is always to eliminate all the vocabulary, to drive my API to zero. That’s always my goal. Drive my proof to zero. That’s always my goal. Most of the time I don’t reach that goal. I just get as close as I can to it. So I’m always on the lookout: this little thing that I invented, it reminds me, it smells like this other thing that I learned in math. And then, okay, I’m going to try and make it be that. And then, of course, ideally, I would rather not implement it myself. And then in the process, with Agda or Coq or whatever proof system, I would also rather not build a bunch of proofs. I want somebody else, mathematicians, to have made the investment in designing the math. And then I want the people who are willing to work with proof assistants to do the work of doing the proofs. And the very fact that it’s in a library—if it’s in a library—is because people have found it useful for a variety of things and they’ve learned to use it. I want to leverage as much as I can of what people have already learned for some other reasons, and the result should be concise, concisely expressed in those terms.

NV (0:47:39): And so you said right now you’re putting all these ideas in the book.

CE (0:47:43): Ah. So yeah, I started writing a book a couple of years ago, and that process itself was super helpful for me. It was really in that process that I realized I was onto the categorical description, but I didn’t want to have category theory in my book. The categories were my own understanding, but I still wanted to present a simpler, more familiar story, in terms of just homomorphisms and functions and that kind of thing. So I started writing my book in that style. But in the process, I realized this story is not something that I can express entirely and precisely. It still has to have some hand waving, because it’s not necessarily a unary homomorphism – it could be a binary homomorphism, all kinds of other stuff. And then there are times when you have to generalize some function, add an accumulator parameter or whatever. For hardware arithmetic, it may be a carry-in and a carry-out kind of thing. These things are really there for performance reasons. Those optimizations, which I know are really important, really break the simple homomorphism story. I would have to invent an abstraction, and a homomorphism for that abstraction, entirely for performance-motivated reasons. And that violates my principles. And then I realized, well, I’ve been thinking about it in this categorical way. And the categorical story unifies all of the arities of homomorphisms. It unifies all of these kinds of funny ad hoc things that are there for accumulator parameters, all that kind of stuff. It actually is a complete story, and it’s a very simple story. And so I realized I really have to tell the story in terms of category theory, and then do it in a friendly and intuitively appealing way. I don’t want to hit people with a bunch of category theory at the very beginning and then hope the survivors are interested in what I have to say. So I still want to present it in a very intuitive, simple sort of way, but I know it has to be in terms of categories.
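For a sense of what the simple homomorphism story looks like before those complications, here is a hedged Haskell sketch (the names are invented for illustration): the specification of any binary adder is that the interpretation function turns it into numeric addition – one commutative square.

```haskell
type Bits = [Bool]   -- little-endian bit vectors

-- The interpretation: what a bit vector means as a number.
toN :: Bits -> Integer
toN = foldr (\b n -> (if b then 1 else 0) + 2 * n) 0

-- "toN is a (binary) homomorphism from add to (+)": a candidate
-- adder is correct iff this square commutes. (For a width-extending
-- adder; a fixed-width adder satisfies it only modulo 2^n.)
adderSpec :: (Bits -> Bits -> Bits) -> Bits -> Bits -> Bool
adderSpec add xs ys = toN (add xs ys) == toN xs + toN ys
```

A carry-in/carry-out version of `add` no longer fits this shape directly, which is exactly the breakage described above; the categorical formulation is what absorbs such variations.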

Then I realized I had to restart this book. And so, for the last several months, I have been in a deep dive about what exactly this structure is, playing with things like functors and natural transformations to find the simplest, most general story. And I also realized that we don’t implement algorithms; we implement families of algorithms. And not just families of algorithms that satisfy the same specification – that’s what I was doing before, a bunch of different adders or a bunch of different parallel scans or something like that. I mean more like what we call APIs. We’ve got a bunch of computations: I want to do addition, and I want to do multiplication, and I want to do exponentiation – all kinds of stuff. That’s what we implement: clusters of functionality. So how do we talk about these clusters of functionality? What is the specification for all of them? What’s the implementation for all of them? What’s the interpretation that relates these? So it’s not really single commutative squares; it’s really these families of commutative squares. And it turns out that, from the beginning of category theory, there’s been a very nice way to talk about it. So that’s what I’m working on fleshing out now.
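One way to render “a family of commutative squares” in Haskell (again only a sketch; the `Arith` class is invented here) is an interface with several operations and a single interpretation that must commute with every one of them at once:

```haskell
-- A cluster of functionality: several operations over one carrier.
class Arith a where
  zero :: a
  add  :: a -> a -> a
  mul  :: a -> a -> a

-- The meaning side of every square.
instance Arith Integer where
  zero = 0
  add  = (+)
  mul  = (*)

-- For a representation r with interpretation meaning :: r -> Integer,
-- correctness of the whole cluster is one statement:
--   meaning zero      == zero
--   meaning (add x y) == add (meaning x) (meaning y)
--   meaning (mul x y) == mul (meaning x) (meaning y)
-- i.e., meaning is an Arith-homomorphism: one square per operation.
```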

Also, I am looking – not very actively and not at all competently – for partners, because I’ve made a tremendous amount of progress on these insights. I think I really know now what I want to do. I want to show it off in hardware design – very efficient hardware design. It’s not that I care more about hardware; it’s just that it’s more impressive, because hardware design is a very esoteric skill. It’s a rare kind of thing. And the professionals themselves – what they’re good at is some theory, but mostly coping mechanisms for fundamentally bad choices that the industry made. The same is true with software. Hardware has the advantage that it hurts a lot more when it’s wrong than software does. It costs you like $10 million to fix a bug, $100 million, whatever. Or, if you’re Intel, hundreds of millions of dollars with that floating-point bug. So in hardware, the pain is much closer to the surface. That’s a good thing. And also, the hardware people know that they do not know how to design correct hardware. So they have to rely on testing and non-scalable methods like model checking and graph isomorphism and SMT – all these kinds of non-scalable things. I mean, it’s impressive that they scale as well as they do, but they’re fundamentally non-scalable. And of course, you can’t prove correct an infinite family of hardware designs with them. You couldn’t translate that to SMT, for instance, or whatever you use, or any concrete method like graph isomorphism. I want to generate infinite families and prove them all correct with a small amount of effort. And that’s something you can do in a modern proof assistant, a dependently typed language, that you really can’t do with these other things.

WS (0:51:43): So you want to describe an n-bit-wide adder and then –

CE (0:51:46): For instance.

WS (0:51:47): – prove that. Yeah, just to make it very concrete. To prove it, but inductively, for every n.

CE (0:51:52): Yep. 

WS (0:51:52): This will be an adder. 

CE (0:51:53): Exactly. 

WS (0:51:53): And instead of having to rerun your SMT solver or SAT solver or whatever for every n, be it 32 or 64, right?

CE (0:52:00): For every n. And as the n’s get bigger and bigger, the solver is going to get slower and slower, because it’s NP-hard, right?

WS (0:52:05): Exactly. Yeah.

CE (0:52:06): Yeah. And then, beyond n-bit adders, there’s a tremendous amount of flexibility in adders. What do I do sequentially? What do I want to do in parallel? Am I willing to do n log n work instead of n work? Am I willing to do some redundant work, like in a carry look-ahead adder, to lower the latency tremendously? There are all these families of algorithms, which means there are all these families of proofs. And they all share a specification, but they have some other parameters besides the specification that describe all these choices. And that’s what I’ve been fiddling with: how to describe them. And of course, there’s not just one operation. There are all the operations that you want to do on a particular interpretation of a particular representation, like n-bit vectors.
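As a small illustration of “one specification, a family of algorithms,” here is a hedged Haskell sketch using scans rather than adders (carry propagation in a carry look-ahead adder is itself a parallel prefix, i.e. scan, problem). Both implementations are invented for illustration; they make different sequential-versus-parallel, work-versus-latency choices but share one spec.

```haskell
-- Inclusive scan, two ways.

-- Sequential: n combines, no parallelism.
scanSeq :: Semigroup m => [m] -> [m]
scanSeq = scanl1 (<>)

-- Divide and conquer: O(n log n) combines, but the two halves
-- can proceed in parallel - the extra-work-for-latency choice.
scanPar :: Semigroup m => [m] -> [m]
scanPar []  = []
scanPar [x] = [x]
scanPar xs  = ls' ++ map (last ls' <>) rs'
  where
    (ls, rs) = splitAt (length xs `div` 2) xs
    ls'      = scanPar ls
    rs'      = scanPar rs

-- The shared specification both must satisfy (sequential scan
-- taken as the reference semantics):
scanSpec :: (Semigroup m, Eq m) => ([m] -> [m]) -> [m] -> Bool
scanSpec scan xs = scan xs == scanl1 (<>) xs
```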

It’s a really big insight for me, and I’m so grateful to have gotten this realization, and I want to share it. It’s about when we write code in what I’m now calling a programming-only language – what we call a programming language, I’m now calling a programming-only language – meaning it can only talk about the surface, only talk about the implementation, so it misses the heart of the matter and the reusability. We think that the purpose of our code is to take some data and transform it into other data. That’s, of course, a functional perspective, but even when you’re doing imperative programming, you’re still doing that, just in a sequential way: you start with some data, you end up with some data.

But I don’t think that’s really what computation is about. I think that’s what the surface layer is about; that’s what the executable is about. But the whole process of why I am generating this computation, and why I am happy with it – or should I be happy with it, and if so, why – that is not about transforming data into other data. Purposeful computation, meaningful computation, is about transforming meaningful data into other meaningful data, and doing it meaningfully. And what I mean by that is that when I write code to do, say, binary addition, there’s a reason for that code. It’s not just that I have a vector of bits and another vector of bits. It’s that I have a particular interpretation of those vectors of bits – a particular way in which I’m interpreting a vector of bits as a natural number. So my job in constructing a computation is not to transform data to data; it’s to transform, in a way, an interpretation. There’s a type of data, and there’s a type of its meaning, and then there’s a function from the former to the latter, from the representations to the meanings. That’s the domain. That’s what I’m transforming. What I’m taking in is meaningful data – the data type interpreted in a particular way. And that’s what my computation is generating. That’s the whole thing. That’s the higher-dimensional thing that has the purposefulness in it, and has the ability to then have the logical proof in it as well.

When I’m just reflecting, I look back, and when I’m programming, of course, I have an interpretation in mind. Everybody has an interpretation in mind. But what often happens is, maybe I’m working with you, and we have different interpretations in mind, and our programming language cannot capture either one of them. And so it cannot notice the difference. You write your code, I write my code. They’re both correct. We put them together – it’s not correct. Why not? Because we didn’t say what we meant. We said, “I’m transforming bit vectors to bit vectors.”

WS (0:54:54): Yeah. So it’s the mapping from representation to meaning, which is missing in a way, right? Because we only focus on the one arrow and not the whole diagram. 

CE (0:55:03): Exactly. Yeah. So our programs miss their purpose, and the languages cannot capture that purpose, and therefore they cannot support us in fulfilling it. Especially those of us who like typed languages appreciate that the language is helping us, supporting us in fulfilling our intentions. But in this very fundamental way, our programming-only languages don’t and can’t, while dependently typed ones can. That doesn’t mean you get it automatically whenever you’re using a dependently typed language. No, no. You have to use a discipline in which you really capture it. And that’s what I’ve been exploring.
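A concrete version of the mismatch described above, as a hedged Haskell sketch (endianness is an invented example, not one from the episode): two interpretations of the same representation, each piece of code correct on its own terms, wrong when composed.

```haskell
type Bits = [Bool]

-- Your meaning and my meaning for the *same* data type.
toLE, toBE :: Bits -> Integer
toLE = foldr (\b n -> (if b then 1 else 0) + 2 * n) 0  -- little-endian
toBE = foldl (\n b -> 2 * n + (if b then 1 else 0)) 0  -- big-endian

-- My encoder, correct with respect to toLE: toLE (encodeLE n) == n.
encodeLE :: Integer -> Bits
encodeLE 0 = []
encodeLE n = odd n : encodeLE (n `div` 2)

-- Composed with your toBE-based code, it silently disagrees:
--   toLE (encodeLE 6) == 6, but toBE (encodeLE 6) == 3.
-- A programming-only language sees only Bits -> Bits and
-- Bits -> Integer, and cannot notice that we meant different things.
```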

So anyway, now I’m looking for partners, and for how to take these insights and scale them up. I’ve had no income for the last few years; I’ve just devoted myself to this, and it’s been a real luxury. It’s given me time – you know, learning Agda’s not easy, even for me, with 25 years in Haskell. It’s not easy. Now that I’ve done it, I realize, wow, it’s worth way more than the effort I put into it. It surprised me.

My dissertation was on dependent types, but I didn’t really use them in a practical way, and I didn’t get it until a few years ago – didn’t really get it in my bones. So now I’m looking for how to move it forward, which maybe means starting a company to make compelling demonstrations – say, efficient families of correct hardware designs. Infinite families that work the first time, every time, because they’re proved, and that we understand because the specifications are simple, perspicuous, elegant. It’s that joint effort: compelling demonstrations, and then pedagogy. And I think not many people will be interested in the pedagogy without the compelling demonstrations, so those two really go together. That’s what I’m looking for partners on. I’ve gotten to a point, I think, where it’s more difficult to carry on by myself. Anyway, that’s from the start of my career to the –

WS (0:56:39): To the very end. Yeah.

CE (0:56:41): To finish. Oh, that’s terrible. To my current state. Hopefully, it’s not the finish. 

NV (0:56:51): It’s very interesting. So if one of our listeners wants to be your partner, what do they do?

CE (0:56:58): Yeah. Email me. Reach out at conal.net. You’ve got to spell my first name right – C-O-N-A-L. It helps. You don’t have to spell my last name right – most people don’t. It’s E-L-L-I-O-T-T, two T’s. Anyway, yeah, absolutely. So I’ve been mentoring people these last few years and making some good progress there. But most of them have to have day jobs, and that really slows them down, that kind of thing.

NV (0:57:23): Thank you very much. 

WS (0:57:24): Thanks, Conal. Yeah, that was great. 

CE (0:57:25): It was a pleasure. Yeah. Well, it’s just I like both of you so much. 

WS (0:57:31): Yeah, it’s great to see you again.

CE (0:57:33): It’s a double pleasure for me to get to hang out with you both. Yeah. Thanks for the opportunity and just the time together.

Narrator (0:57:39): The Haskell Interlude podcast is a project of the Haskell Foundation. It’s made possible by the generous support of our sponsors, especially the Monad level sponsors: GitHub, Input Output, Juspay, and Meta.
