Nadia Polikarpova

Transcript

This transcript may contain mistakes. Did you find any? Feel free to fix them!

AS: Welcome to the Haskell Interlude. Today we have as a guest Nadia, and with me is Niki. So – hello!

NV: Hello!

NP: Hi everyone!

AS: So Nadia. You are mostly in academia, but why don’t you introduce a bit, what kind of things you do, and how do you relate to the Haskell world a bit.

NP: Sure, yeah. First of all, thanks for inviting me. I’m really excited to be here and talk to the Haskell community in this way. My name is Nadia, I’m an assistant professor at the University of California San Diego in the US. My work is mostly in programming languages and in particular in the areas of program verification, program synthesis, and I’m also interested in type systems. And my connection to Haskell is – well, first, it’s my language of choice for implementing my research projects. I also teach Haskell to undergrads at UCSD. And well, yeah, it’s just my favorite language. laughs That’s what it is.

AS: Great to hear! So actually, you mentioned one of the things that I’m personally quite interested and excited about: all this area of program synthesis. So I started a long time ago with this idea that Haskell is good because it allows me to not make mistakes, but if I understand correctly, you sort of flip a bit this idea, and use types to create programs. So, things that write programs for you? What is synthesis exactly?

NP: Yeah. So program synthesis is this idea that the current way of writing programs is old and we should on to something else. So basically what’s the next step in how we should be writing programs. And in program synthesis, the idea is we’ll let the programmer to express their intent in terms of some high level declarative specification, and then the synthesizer translates that intent into an executable program, usually by searching a large space of possible programs, to find one that satisfies this intent. And you might ask, how should we express this intent? Aren’t programs already the best way to express what behavior you want? But in the program synthesis community we have actually tried a lot of different ways of writing the specification, so it could be formal specifications, they could be I/O examples. As a recent natural language. But specifically the kind of synthesis that I work a lot on is when this intent is expressed as types. I’m very excited about this because – well, because I’m a Haskell programmer, but also because types are known to be… so first of all, programmers are used to them, and programmers already use types to describe to each other and to themselves what a program is supposed to do in some sense. So types are also precise, they’re concise, so these are the things that I think make them an exciting way to express intent in program synthesis.

NV: So can you give an example that you express intent and you get synthesized output?

NP: Yeah. So actually one example that we probably all know and love is the Hoogle system, for API search. This is not quite program synthesis, but this is a similar idea. So for example if you forgot what is the name for the function that appends two lists together in Haskell, you would fire up the Hoogle website and you can put in a type, you can say “I want a function that takes a list of as and another list of as and produces a list of as”, and then Hoogle will helpfully suggest, as its first suggestion in fact, that this is an operator (++). So this is an example of expressing your intent using types. But obviously Hoogle is just doing the search among functions that someone already has implemented as a library function. But not all the code I want to write, unfortunately, is already implemented as a readily-packaged library function. So for example, one way in which synthesis can help is, what if you had something like Hoogle where you could put in a type, but instead of just searching among individual library functions that match this type, it would search in the space of compositions of different library functions. So that it can generate a new program for you, instead of just reusing an existing one. And in fact, we have worked on something like this.

AS: I was going to ask because we’ve made our homework and we’ve read about something called Hoogle+! Which already the name seems quite appealing. Can you tell us – so, Hoogle+ is like, the realization of what you’ve just told us?

: Yeah, yeah, exactly. So Hoogle+ is this project that my group has been working on. It basically marries this Hoogle idea of searching by type with program synthesis. So it has a very similar interface to Hoogle but instead of just returning a list of individual library functions, it will return a list of compositions of library functions that match a given type. Some interesting things start happening when you start doing that, because the search space becomes so large that first of all you get a lot of junk solutions. So for example, think about – pretty much to any type query that you pose, you can get a solution that says something like “head of an empty list”. Because that has any type that you possibly could want! And so

NV: … similar to undefined?

: Or undefined – undefined is something that you can specifically exclude from the search because you’re like “well, this is obviously a useful function”, but you cannot exclude all kinds of…

: But then you have also error and all these undefined-like…

: Right, right. You can have a blacklist of things like undefined and error, right. Or for example const is also not very useful, or id as well – in these kind of searches. But it’s very hard to simply syntactically exclude all of these uninteresting solutions. So one of the challenges that Hoogle+ has to address is, how do we throw out the solutions that are complete garbage like that.

NV: So is this something that is online, and I can try it now?

NP: Yes, so we have a version that is available on the web. The URL is https://hoogleplus.goto.ucsd.edu/. It only works on HTTP though, not HTTPS. Sorry about that. We’ll fix that at some point.

NV: We’ll give the link!

NP: laughs So, the current version – since this is actually a very difficult problem, it turned out – the current version only works with a small set of library functions. It’s about 300 functions from Prelude and very popular libraries like Data.List . Of course our dream eventually is to have it work on all of Hackage, or a huge set of libraries like Hoogle does. We are not there yet, but this is work in progress, we are hoping to get there one day, and if anyone in the community is interested in contributing and has ideas, we’d be definitely excited to chat about it.

AS: Is this more about the, you have to think more about the algorithm, or it’s more computing power. Let’s say you know we have people in the community working on Meta, which I guess has these huge servers. Would that be enough, or would you still need to think hard about the problem?

NP: I think it’s a little bit of both. Computing power definitely helps, for example when we do give demos of Hoogle+ at conferences, we don’t use our default server that we usually run it on, but we spin up an Amazon AWS really beefy machine, and it definitely gets faster. But I think that we cannot scale it – so currently it works with like 300 components, right, so we cannot scale it to like tens of thousands of components with just compute power. So I think there has to be – well, one possibility is we might have to sacrifice completeness. The algorithm that we have running right now, it was published at POPL 2020 if anyone is interested, uses some fancy combination of abstract refinement and things like that. So this algorithm guarantees that you will always find a solution if one exists of the correct type. But we might have to give this up at some point if we actually wanted to scale to tens of thousands of components. There might be other algorithmic insights that we haven’t had yet that might be needed here. Also recently there’s a lot of advances in program synthesis that use some kind of machine learning, language models, things like that. So it’s possible that we might combine our algorithms with something like that to guide the search towards– let’s say we can ask a model, a language model, something like GitHub Copilot, what are the likely components that we might want to use in this problem. And then somehow extract a smaller set of components that our algorithm could work with.

NV: But, if it mostly works with existing libraries, you could cache a lot of the results, right? I would expect–

NP: So, we could…

NV: The problem would be when you combine it with your user-defined code that it has no idea about, and you say “use my code with existing libraries”.

NP: Right, so this is a really good point, this is something that Hoogle+ does not do enough of. Meaning doing some precomputation that would allow it to be faster with every user query. However, the reason we are not doing this is basically– so the main difficulty in doing this search actually comes from polymorphic types. Because essentially once you have polymorphic components in your library, there are infinitely many possible types – concrete types, that you can have. And that’s why, intuitively) speaking, we cannot instantiate this space of all possible types we’re interested in a priori and search in it. So instead what we do use is, given the user query – and by query I mean the type that the user wants – to implement, given that type we can start exploring this infinite set of possible types that might be relevant, intermediate types that might be relevant. But it’s really hard to instantiate that a priori, I don’t know if that makes sense.

AS: It actually – I’ve never thought about. We usually see polymorphism as this super beautiful, great tool that we use in Haskell, but I could see when you want to synthesize a program this is actually at odds with your quest. If you know you can only take an Int, well that’s easy. Otherwise, you have so many things–

NP: Well, it’s actually interesting because polymorphism is sort of both a curse and a blessing for synthesis. So I already talked about a little bit how it can be a curse, because then the space of possible types becomes infinite. But it can also be a blessing, because polymorphic queries, polymorphic requests from the user, are actually much much stronger than the non-polymorphic ones. So for example – well, you probably know that already, like if you have type for something like map there’s pretty much no other function, reasonable function that has this type, right. Similarly, for replicate, if I say I have an a and I have an Int and I want a list of as, there’s nothing else you can possibly put into that list apart from the a that you were given. So this actually helps synthesis in many ways.

NV: Yeah, because if you ask for an Int

: It can be anything?

NP: Exactly. So parametricity is really helping synthesis in that way.

NV: But I guess the user doesn’t really care about completeness, right?

NP: Right. So the user doesn’t care about completeness as long as, you know in most cases it’s useful. So I think – that’s why I think there should be space in this algorithm also for some heuristics, for like “given the user query, what is the smaller set of modules and components that we should be searching”. We just haven’t really done this work yet.

AS: I was actually going to ask something in that direction, because from an academic point of view we are mostly interested in these kinds of questions, like completeness and so on. But something like this work has very – it has a usefulness, it matters if it’s useful, and I don’t know if there is actually a way, or if you’ve thought of a way of measuring it, if we can even talk about measuring how “useful” this algorithm is to a user, or you have ideas on that front.

NP: Yeah, actually we did. So in our followup paper – in the first paper we published the algorithm – and then in the followup paper, we actually did a user study where we had 30 participants with different levels of experience in Haskell, and we gave each participant 4 little tasks to solve. And then each participant first solved 2 tasks with just Hoogle and a repl. And the other 2 tasks with our tool, with Hoogle+. And we basically measured their success rate, their time to completion, things like that. One of our most important findings was that the participants who were equipped with Hoogle+ managed to solve 50% more tasks. So from that, we can make a conclusion that at least in this artificial lab setting, we observed it being useful. And interestingly, we also observed an interesting difference between beginners and experts in the way they were using the tool. So for experts, it was really natural to provide specifications as types. So once we tell them what the task is they were like “oh of course, it should be this type”, and they always go for the most generic types. So always polymorphism, if possible. And we even asked them why, how did you know that this should be the type. And they have the intuition that yes, this is the most – this is the strongest specification that I can give. The more polymorphic it is, the stronger is the spec. Which is really cool, that Haskell programmers actually – you know, it’s a common knowledge in the community. But beginners were not like that at all. Beginners actually struggled a lot with coming up with these types, that their task should be implementing. Which I think is very interesting, because I also recently heard Justin Lubin from Berkeley talk about – he did a user study about how functional programmers write code. And so he observed a whole bunch of programmers doing programming in various statically-typed functional languages. And he observed that most of them – and those were not beginners – that the first step for everyone is always “come up with a type”. And that’s – I feel like that’s how most of us program if we have a task in mind. But beginners actually have trouble with that. And that was interesting to observe.

NP: So actually after observing this, we made this kind of tweak to the tool. So currently in Hoogle+, if you don’t know what type to implement, you can actually instead provide some examples, and then Hoogle+ will suggest different types to you that you might have meant. As an example of that, let’s say you want to implement a function that removes a JSON duplicate from a list. This is one of the running examples in our paper. If you think about it, the type that you actually need here is not just [a] -> [a], you need a typeclass constraint, right. So you need a to be Eq because you want to be comparing those elements, if you want to remove equal elements. But that’s not very intuitive, especially for beginners. So this type is not very intuitive to come up with. However, if they put in some examples, for example if they put in a list of integers, and then they– how do you remove JSON duplicates from that, and then you put a string and how do you remove JSON duplicates from that. Then from these two examples of list of integers and a string, Hoogle+ will suggest “did you mean [a] -> [a], did you mean Eq a with the same type”, or did you mean let’s say Ord a with the same type”. And then hopefully, the user will remember at least that these things like typeclasses exist, and will be prompted to pick the right one. And in our study we observed that among beginners, the majority of the queries to Hoogle+ were made with examples alone. Without the type. So for a majority of the queries, it was easier for them to come up with examples than the type.

AS: I would actually love for GHC to suggest sometimes that I can make a more polymorphic type, especially…

NV: More polymorphic? I mean, I go to the other side! I think GHC always puts so many constraints, that I wanted to remove some. It’s– yeah, actually this feature that you have, I think it could be used even independently of this synthesis. The “given an example, find me a type”. Are you using existing GHC technology, or is it something that you developed for synthesis?

NP: We developed our own algorithm because we didn’t find anything that does exactly that. We definitely use GHC to get the concrete types of each one of the examples. So for example like I said, if you have one example that goes [a] -> [a] and another example that goes String -> String, then of course you can ask GHC for the concrete types of those. And that’s what we do. But then, how do you go from this to what the user likely meant as the spec? We didn’t find an existing algorithm that does that. One thing that you do, and that’s what we start with, is you do anti-unification to find the least-general generalization. Of these two –

AS: Maybe– maybe you need to explain that a bit…? Anti-unification sounds like some kind of bomb or something, I mean laughs

NV: ??

AS: The world was anti-unified and now we’re here…

NV: ??

AS: I don’t know! Anyway, what is anti-unification?

NP: It’s actually a really old thing. It’s kind of the opposite of unification in the sense that you can do it with types, you can do it with just first-order terms, but let’s say with types, if you have two types that have common structure but in some places they differ, then anti-unification gives you a type that has all this common structure preserved, but those different parts are replaced with type variables. So in the absence of typeclasses, the anti-unification of [Int] -> [Int] and [Char] -> [Char] would be [a] -> [a].

AS: Right. OK.

NP: Because the Int and Char would be replaced with a type variable. However, if you add– so if you didn’t have typeclasses, this would actually be an easy problem, because you say “well maybe you meant [a] -> [a], maybe you meant” – so, the user might have also meant – anti-unification like I said gives you the least general type that subsumes both of those examples. But the user could have meant also something more general, in principle. So you could also suggest maybe it was a -> a. Or you could say maybe it was just a, which is– doesn’t make sense, but it is a more general type, right. But once you add typeclasses in the mix, there will be many more types that are possibly more general than the least general generalization of those two. Because it could be– because both Int and String are for example Ord, so you might’ve wanted an Ord constraint, you might’ve wanted an Eq constraint. Any kind of typeclass that String and Int share could have been in there. And so what we observed: if you do this naïvely, meaning you just take the anti-unification and you take everything that’s more general than that, then you get thousands of types. And you cannot possibly suggest them all to the user. So we had to add some heuristics and some rules that basically filter out types that are uninteresting. So for example, something like a -> b cannot be an interesting type, because there’s no way to make a b from an a, and things like that. So we have rules like that for filtering out uninteresting types and then most of the time we just end up with a few reasonable suggestions for the user. But I don’t know of any algorithm that did this before us, so if you know of anything let me know.

AS: I’ve… I literally didn’t know what anti-unification was so laughs

NP: It’s actually really cool! If you never heard about it, you should look it up because I think it’s useful in just so many areas. And I end up using it in like, every other project that I work on somehow.

AS: Is this one of these hidden things which were invented in the 60s…

NP: Yeah, yeah!

AS: …and we forgot, and then we should just remember? OK, yeah!

NV: And now we have the processor power to implement them.

NP: Yeah! laughs

NV: So, you assume that you know the typeclass– for example…

NP: Right, we assume that there is some small set of typeclasses that we’re interested in, in those libraries, and we know which types instantiate them. I agree that you could have defined– if we looked in a large set of libraries we would have too many possible typeclasses that all these things instantiate. But we just consider a small set of typeclasses.

NV: So, at the end, Haskell is good for synthesis, or bad?

NP: laughs This is a a really good question! I think more good than bad. One thing that’s good is that Haskell programs tend to be very concise, which is always good for synthesis. For example, Hoogle+ only searches in the space of function compositions. It doesn’t do any control flow or anything like this. But it’s amazing how much you can express in Haskell just by composing library functions. So like, the thing that I just said with removing JSON duplicates? I bet in most languages you have to roll out a loop or some recursion to do that. So in Haskell, you can do group and then map head to do this.

AS: You not only give the answer, you give the pointfree answer!

NP: Yeah!

AS: So no colleague can go into your code review and say “do it pointfree”! You really get the real one.

NP: Yeah, so this is cool for synthesis, right. Because you can sort of search in a smaller space in a sense and find the answer. The polymorphic types in the end are also very cool, because you basically can express very strong specifications using just types. I think my earlier comment about beginners aside, I think this is actually really cool, because for many programs, more traditional kinds of specifications like input-output examples can actually be very hard to write. If you’re talking about functions on strings, or lists of integers, sure you can also write examples and that’s not hard. But once you start thinking about higher-order functions, thinking about functions that manipulate some more complex data structures, then writing specifications in terms of examples becomes really hard, because you have to either think of some function you want to pass in as an argument, or type in a huge data structure. No one wants to do that. Whereas types as specification, they kind of scale to this much more complex inputs and outputs, right. Yeah. So I think these two things are really cool, the conciseness and the polymorphism are really cool about Haskell. I think the purity is also cool because dealing with side effects in synthesis is a whole other problem that we don’t have to deal with. Yeah. So I think more good than bad, for sure.

AS: I’ve also seen that– you have work in Haskell, but you’ve also– you move to other things, I’m particularly interested on all the work you’ve done in refinement types. Niki might know one or two things about refinement types…

NV: Little bit!

AS: So I– first of all, I’m just a user, so maybe any of you could give us a better explanation of what refinement types are and how they are different from the usual types we would write in Haskell.

NP: Maybe Niki can start?

NV: No! laughs I think we will want to hear from you though!

NP: laughs Alright. Should I start from the synthesis perspective, or in general?

AS: Whatever you see fit.

NP: Well, since I’ve been talking about synthesis this whole time, so maybe I’ll start from the synthesis perspective. So, remember how I said that if you use just regular Haskell types as the specification, you get a lot of garbage results from your synthesizer because your specification is usually not strong enough to really narrow down the search to just one particular program that you want. One way to address that is to make your specification stronger; basically to make your types more powerful, so that they can describe more interesting properties of your programs. Refinement types is one way to do that, one way to address that. The kind of refinement types that I’m talking about are– so think of them as Haskell types but they’re annotated with simple logical predicates, that can further restrict the kind of values that you can put into the type. So for example, you have a Haskell type Int. So you can have a refinement type that says “all Ints that are greater than or equal to zero”. So you add this little predicate that says “the value should be greater than or equal to zero”, and this is how you describe a type of natural numbers.

NP: You might think well, what’s the big deal, I can describe natural numbers – but it turns out that when you combine this very simple predicate with other language mechanisms, for example polymorphism, you can get very interesting effects. For example, you can write – very easily, say, write a refinement type that says “list of natural numbers”. You basically just say “list of” and then that natural number type that I just talked about. And so all of a sudden, you have expressed a universal property – you’re basically saying “for all elements in this list, the element has to be greater than or equal to zero”, but you only used a very simple logical predicate. You did not have to use any quantifiers, which is what you normally have to do when you write pre- and post-conditions in other program logics. So that’s kind of the key thing to remember, is that because we’re pushing these little predicates inside of the types, and because we have cool features in our type system like polymorphism, we get more expressive power while still keeping those logic predicates very simple. And why is it important to keep them simple? Well then we can use these powerful tools called SMT solvers to actually do all the type checking for us automatically. We don’t have to manually write proofs like in some other systems, if we want to check our program against those types.

AS: Interesting. So something– maybe that’s a more controversial question but, Haskell seems to be moving into this dependent types territory, and refinement types, I don’t know, is this something different, something additional? Why– for example, I was wondering, I had a look at this you’ve made, and– you were not using dependent types, you were using refinement types, and this– it popped into my mind, why not use what is already in Haskell? Is it just useability, or is there another reason to do that?

NP: Yeah, this is a really good question. I mean, I get this all the time, people asking, first of all what’s the difference between refinement types and dependent types. And I always find this so difficult to answer because there’s so many different versions of both of these terms that different people think they mean different things. Um–

AS: Maybe we can narrow it to what GHC means for dependent types? Which is I guess what most of the audience would be familiar with, somehow? I mean, the one…

NP: Yeah.

NV: This is an open question, right?

AS: Yeah, that’s right, yeah. laughs

NP: OK. So maybe I wouldn’t be able to answer this question very precisely, but I think generally, dependent types just means that you can use values in types. And then in that sense, refinement types are a kind of dependent types. Because you’re still using values to refine your types. So what’s important in the particular flavor of refinement types that I’m talking about and that I’m using in synthesis, is that we want to keep the type checking decidable. Because – this is especially important in synthesis, because if you are just doing verification, if you’re just writing the program yourself, and then you’re writing the type yourself, then you might be willing to add more annotations here and there, and do a proof yourself. If you really care about checking this program against its type. But when you’re doing program synthesis, you’re just writing the type, and then the synthesizer will go through millions of programs, trying to find one that satisfies the type. It cannot ask the user every time it’s stuck, right? Trying to typecheck a candidate against a specification, it cannot ask the user “hey, can you help me do this type checking for this random program that I just came up with”, right? Definitely the programmer cannot do this a million times. That’s why it’s so important for synthesis to keep the type system decidable. And I think this kind of refinement types that I’m talking about – also sometimes we refer to as liquid types – so there, this cost-benefit ratio is really great. Basically, you get a lot of expressivity, and you get it fully automatically. Which– OK, I’m maybe not as familiar with dependent types in Haskell, so maybe I’ll say something that’s wrong, but I feel like they are not at the same level of expressivity-to-automation ratio. Because at least I’ve seen people would have to do some– if you want to do some primitive arithmetic where you don’t have the affordances of an SMT solver, you have to do some manual manipulations of terms and types to be able to–

AS: You mean like not having to convince the compiler that n + 0 is 0, right– n + 0 is n, yeah.

NV: I think, while hearing Nadia, specifically for Haskell there is another good explanation between what’s different with dependent and refinement types, at the level that they are now. And it would be that, adding more dependent types in Haskell basically increases the expressivity of the existing type system. While refinement types say “here’s your type system, I’m going to reject more programs”. Because it’s not increasing the expressivity, It’s enforcing more checks. I don’t know if this is more technical, or simplified, or complicates things…

NP: What’s an example of increasing expressivity that is not just rejecting more programs? So, you mean accepting more programs?

NV: I mean type families– or like, the Nat example, right? You need to lift things from expressions inside your types to express things like Nat or vectors. So you need your types to be able to refer to things that were not referring before, so you do singletons, and you do type families, and you do type-level computations, and you are saying OK, I’m going to generate a system that is more expressive.

NP: But ultimately, the goal is still to reject more programs, right? Because you could–

NV: They’re just going to be different kinds of programs.

NP: Well the program is going to be different, yeah. So in refinement types definitely, the changes are only to the type system, and the language stays the same. Yeah, I think that’s important.

NV: Yeah, refinement types say I’m not going to change the expressivity of the language or my expressions.

NP: The terminals stay the same.

NV: Yeah. Given that, I’m not – I cannot change anything, I’m going to reject programs that I don’t like.

AS: OK. I was now wondering, given that – it seems that you’ve done things with Haskell, and then with refinement types you were able to express more things. Have you also looked at other kinds of type systems, where you can maybe use your synthesis? I don’t know, my typical example is Rust-like things, or linear types where you can also do this kind of thing. Have you also looked at these kinds of things?

NP: Oh, yeah. This is a really good question. Indeed, we have looked at a whole spectrum of how to do synthesis from types. I would say, at one end of the spectrum is something like Hoogle+, where types are simple – or at least, for Haskellers they are simple laughs – and then we can add different kinds of adornments to our types to make them stronger in different ways. So one thing I mentioned is refinement types, and this was a tool called TODO SyncWith that I developed during my postdoc, where we can add more functional specifications to the types using refinements. So for example, we can say something like “I want to sort a list”, right. So you can say I have a list– that my input is a list, and my output is a sorted list with the same set of elements, right. And then SyncWith can generate an insertion sort for you actually using that. So, that’s one thing, just adding on functional properties. But, for example, what if you want some kind of guarantees about how efficient your program is? And so we also have some work on– where on top of refinement types, we also add resource annotations to our types. Where essentially you can say “this function has to be linear in the input list”, so there is a simple way to say that. And we also integrate an automated resource analysis into our synthesizer that, as the synthesizer is searching for programs, it’s not just doing the type checking against the refinements, it’s also doing the resource analysis, and always checking “is the current program– does it have enough resources to still execute”. And this is how you can get programs with stronger guarantees on their efficiency.

NP: And actually right now, my student Tristan Knoth, who worked on this resource-guided synthesis business, he is actually working on a generalization of all of those different tools. So he’s building this type-driven synthesis framework, where the idea is we don’t want to reimplement the synthesizer every time we want to add something to our type system: why don’t we have a framework where you can just plug in your type checker, for your new type system? And then you’ll get the synthesizer for free, because we have the search engine already implemented, and it’s sort of generic with respect to the types. You still have to write your type checker in a little bit of a peculiar way, so it kind of has to do things in a certain way. Basically, it has to adhere to our API.

AS: We– we already write type systems in Greek most of the time, so laughs

NP: laughs Yeah. We are actually very excited about this because – it’s work in progress right now, but we’re hoping that if we have support for different type systems, so for example you can have refinement types, you can have– for example, input-output examples can also be cast as a type system, so you basically can refine your types with concrete examples. We also want to add things like linear types in there. And I think the exciting thing is then, if you have it all in one framework you can combine different specifications. Because actually, different properties are easy to express in some type systems and not in other– or rather, different programs. One example of that is, if you want to synthesise something like concat that concatenates a list of lists, Synquid (that refinement type tool) can do it, but it’s really a big pain to actually write a specification with refinement types that is strong enough. Because essentially, if you just give it the type [[a]] -> [a] (!) it would just say empty list! So you somehow have to convince it to put all the elements that are in your input into the output, and expressing that in Synquid is kind of hard, you basically have to define a custom logic-level function, that computes the total length of the input, and then you can say “the length of the output should be the total length of the input” – you can do that, but writing that function is already almost more complicated than writing the concat itself. Whereas if we also had something like linear types in that same framework, we could say “oh yeah, just use this Haskell type – and by the way, it should be understood linearly” and then you’re not allowed to lose any of the input elements, and then it would be basically forced to generate concat, without any extra annotations. So I think this combination of different type systems is actually a very interesting avenue for synthesis. For that reason.

AS: It actually sounds even nice for general– I mean, I’ve always had the feeling that every time we have a new type system we write a new programming language, and it’s very hard to combine– like, you cannot get Haskell typeclasses with Rust, borrowed types and so on because they are just two languages. So that sounds really cool.

NP: I don’t think we’ll solve that problem, because laughs in this current framework, the term language is still kind of shared between all the type systems. Just simple, functional language. But it would be cool in the future if we could somehow make it generic in the term language. We just don’t yet know how to do it. Again, if someone from the community wants to contribute, we’d be really happy to chat about it!

AS: So– changing maybe completely the topic, you mentioned that you also teach in San Diego, right?

NP: Yep.

AS: I was wondering, you have a lot of experience on doing these verification and synthesis tools– do you tell your students about this kind of thing, or how do you approach this topic as a teaching perspective, where you’re also trying to push– tell your students you shouldn’t be using this kind of thing, use– verified programs, I don’t know. What’s your perspective on that matter?

NP: Oh yeah, this is a really good question. I think this is very different if you’re talking about graduate classes or undergraduate classes. At UCSD I teach a graduate class on program synthesis which is actually a research class mostly. I will be teaching this in the winter, so it’s coming up, another iteration of it. It’s a class where we mostly read research papers, and students have to write reviews on these papers, and then they do a class project which is very open so they can do whatever research project on program synthesis. So of course this is the class where I talk a lot about synthesis and verification, because you basically cannot do much interesting synthesis without verification. In an undergrad class it’s very different, because I think coming into this from the perspective of someone with a PhD who hasn’t really talked to anyone who doesn’t have a PhD in a very long time, it’s really hard to– at first, when I just started teaching, it was really hard to adjust to talking to people who don’t know anything. It’s so exciting, so refreshing! I remember one of the questions that I got in my very first time I was teaching this programming languages class – which is basically just a functional programming class at this point – I remember I was telling them about lambda calculus, right. And I was telling them “everything is a function” and then I was giving them the syntax of what the lambda calculus is, and I was showing them a quiz asking which terms are syntactically incorrect. And there was one term that was, instead of \x -> something it was like lambda, and then instead of x there was an expression in parentheses. So there was another function inside. Basically another lambda instead of the argument. So that was the answer, that was the incorrect one. And then someone was like, “Why is this incorrect?”. And I was like, after a lambda you have to have a variable, not a function. And he was like, “but you said everything is a function!”. And then, I don’t really know what to say to that! laughs

NV: Maybe they were thinking de Bruijn…?

NP: Yeah, maybe that was the solution. Anyway, it’s very – I think it’s really exciting to teach these concepts of functional programming and Haskell. What we do is we teach a couple of weeks of lambda calculus first and then Haskell for the rest of the class. And it’s not just a class about Haskell, it’s also – because it’s a programming languages class, they also get to implement an interpreter, and learn a little bit about environments and closures and parsing as well, at the same time as learning about Haskell. And all of these things that we take for granted, like higher-order functions and typeclasses and monads, I think once you start explaining them to people who have never seen them before, you have to go down to the level where you have to come up with really simple examples of how this helps you structure your code, and how this magically turns this really ugly code into this really beautiful code. For me, every time it’s so inspiring becauase I’m like, oh my god, I forgot how nice these things are! And now I have to remind myself every time.

NV: What is the background of your students? I believe it is easier to teach people Haskell and functional programming when they don’t know how to program.

NP: Ah, yeah. That’s not our case. Most of these students are like 3rd or 4th years. At UCSD they start with Python and Java, um. Yeah. And they usually know some C I think. At least most of them do. But yeah, I think all of them know Python and Java, and some of them know C and C++.

NV: And do you try to compare different programming paradigms, or…?

NP: So, the class is not explicitly structured around comparing the paradigms. There were some versions of this class that were like that, but we kind of transitioned more into this “let’s implement a language in a functional language” style of PL class. I do comparisons when I think it’s useful for their understanding. So for example when explaining typeclasses I say that this is similar to interfaces in Java, but it’s also different in these ways. I mean, it kind of blows people’s minds, so laughs Especially starting with lambda calculus, it’s like, immediately throw them in the water, and it’s nothing like they’ve ever seen before.

AS: I don’t know if you’ve had the same, but I always had the feeling when people write an interpreter, they suddenly figure out that what they thought was black magic is no longer black magic, and it’s like a cool thing for most people. If you start programming, the compiler is– you don’t know how it works! And it takes years to understand all the parts. And suddenly– I don’t know, at least for me it was great to say “oh, I can actually write an interpreter” which– OK, is not GCC, but it takes code and spits assembly code, so I really like these kind of courses like you are describing where students have to implement their own languages, and not only use them, but learn how they are in the inside, so to say.

NP: Yeah, I think– I also like this style of class and I feel like it gives you a better understanding of the anatomy of languages. But I do think it’s confusing for– and of course some students just get it, right, and they’re perfect – but it’s still confusing for a lot of folks, because they have just learned this new language and this new paradigm, so they’ve just learned this functional language. And now they have to implement their own functional language in a functional language, and they get totally confused between the– so I have– every year– we have this assignment where they have to implement an interpreter, and then they can test it out and how it evaluates various programs, we show them how to test it. They kind of get a little repl for free, where they can test their stuff. And every time there are students that are like “oh, when I tested manually it works perfectly, but then when I’m running the test suite it fails”, and then it turns out that what they do is they just input the program into GHCi, so they’re basically just testing whether GHCi is good at executing Haskell programs. Because the syntax of our language is a subset of Haskell, so programs in our language are Haskell programs, so they’re just like literally, input the program into GHCi -> “oh it works!”. (!) Well of course it works, GHCi is pretty good at executing Haskell!

AS: I was actually hoping you would say something which they have to execute their own interpreter in their own interpreter, getting some kind of metacircular thing as the last exercise or something like this!

NP: That would be cruel. laughs

AS: You said cruel, or cool?

NP: Well, I said cruel, but probably both! laughs

AS: Good, so maybe a bit also more on the teaching. Again, going back to this. We’ve been talking again a lot of verification, and different tools, and refinement types and that kind of things, but I still have the feeling that we are talking about it, but it’s something which is not that well known. I mean, I think it still blows people’s minds that you can have a refinement type for natural numbers, right? People don’t see that so often. And it feels strange, because it’s not new. We cannot say it’s new, I mean refinement types have been– I don’t know, but certainly a decade? A couple of decades? People have been using this, the idea or things like this. I wonder whether you have some ideas how we could also get through to these other people maybe more in the industry, and convince them that using this tool, the synthesis you are building and other tools, are actually useful for them.

NP: Yeah, I think this is a really good question, a question about adoption of programming language research. I think this is just so random, and there’s so many factors that influence it, because we see some really good ideas from the 70s or whatever just having been adopted recently. I think one good example is Rust, because ownership type systems have been around since forever, and I think the general sentiment was always “this is too impractical, this is too restrictive, and no one will ever use that”, right? Because no one can program like this. And then all of a sudden something happened, and everyone is programming like this! Why?! I don’t actually have an answer to this, but I think it’s a combination of just the right time, just the right tooling, so someone built a really good toolchain for this, and just the right things happened and people met, and the right community just formed around it. And so, just having seen all these examples, I don’t get particularly hung up if the technology that I’m working on is not getting implemented right now. If someone tells me it’s not gonna be used tomorrow or next week, I’m like, well, as long as I feel like it’s elegant and it’s nice and it’s potentially useful in the future I’m happy.

AS: In 50 years, somebody will say “oh, it’s this old thing from the 20s!” or something like this! laughs

NP: Yeah! laughs I’m happy to work on these things nevertheless. I mean of course I agree with you that it would be great if more people were using refinement types right now. But I believe that all good ideas will eventually find their way into the hearts and minds of developers, so we just– I mean, not that we shouldn’t be doing anything about it, so if you ideas definitely we should be– and I think we are doing something about it right now! But, I’m not too worried about it, let’s say.

AS: Good! Well, I have no more questions left. I think it’s a good moment, going from the specifics of all the work you’ve been doing to this more general thing about adoption and teaching of all these tools. So Nadia, thanks again for being here with us, for almost an hour and explaining all the nice work you’re doing.

NP: Thanks again for inviting me! It was super fun.

NV: Thank you!

10 – Nadia Polikarpova

Related links

Transcript