Transcript of: Clojure for Lispers, by Rich Hickey Presented Sep 29, 2008 at the Boston Lisp meeting. Available on http://clojure.blip.tv Currently at these links: http://blip.tv/clojure/clojure-for-lisp-programmers-part-1-1319721 http://blip.tv/clojure/clojure-for-lisp-programmers-part-2-1319826 Note: I've got most of Rich's words transcribed correctly. The audience members, I'm not so sure about. They are much quieter in the recording, so sometimes I've guessed or left part or all of their questions as tbd (to be determined). Feel free to send any corrections to andy_fingerhut at alum dot wustl dot edu (Andy Fingerhut). You can jump to slide numbers by searching for "slide 37", for example. I've included book, article, and web citations for some things. (cover slide) Hi. I'm here to talk about Clojure, which I've done several times, and never -- oh, yeah, once, at the European Common Lisp workshop -- for an audience of Lispers, but usually not. So it's always nice to talk to a crowd of people who already know Lisp, because then I can talk about more advanced things and more interesting things. I don't have to explain S-expressions and things like that. But I want to make sure that that presumption is correct. Does everybody here know some dialect of Lisp? Yes? Anyone not? OK. Somewhat. I'm not going to cater to you, so you're just going to have to hang on. And amongst the people who know Lisp, how many are of the Common Lisp persuasion? And Scheme? OK, I won't make any Scheme jokes then. (laughter) Does this work. Let's see. Oh yeah, look at that. (tries out slide control and slides advance) (slide 2) So I have a lot to talk about. So it's nice that it's a Lisp group, because I can also presume that you are all very smart and I can talk _fast_. So first I just want to set this up. What was I trying to do in building Clojure? Well, I wanted to have a Lisp. I fell in love with Lisp a long time ago and decided this is what I want to be able to do. In particular, I want Clojure to be a functional programming language, and there's a lot of reasons for that, one of which has to do with the robustness of programs, and the other of which has to do with concurrency, which we'll talk a lot about. I want it to be a language that supports concurrency, and what I do, I haven't written a non-concurrent program in almost 20 years. I do broadcast automation software and things like that, and there's always more than one thread or process. Clojure also is designed to run on the JVM. Being on the JVM is not an implementation detail. It's an objective. Interoperating with Java is an objective, and an important part of Clojure. It's not just something under the hood. Which brings me to my next question. How many people know Java? Oh, good! OK, I will make fun of Java. (laughter) But I hope, after this talk is done, that you have an appreciation of the difference between the JVM and Java. It's OK to hate Java and like the JVM. That's where I'm at. Also, Clojure is open source because I think that's mandatory for adoption for a language. (slide 3) So why Lisp? This is not an audience in which I have to justify why Lisp. We all like Lisp. It's dynamic. In particular, for me it was great because it has got a small core. We have to implement these 7 primitives, and just macros and everything else takes you from there, which means I have a shot at getting it correct, unlike languages that have all that syntax, and therefore very complicated compilers and parsers and everything else. We all like Lisp syntax. 
This is still an advantage of Lisp: code as data, and the resulting macro system and syntactic abstraction. Also I saw in Clojure an opportunity to help along the never ending battle of having people not balk at the parentheses before they see the rest of the value. (slide 4) So one of the things I want to do, and one of the questions I expect to face in this room is, you know, we have these two great Lisps, which are standardized to some degree. Why yet another Lisp? And I think this is an important and valid question, which I am going to answer tonight. I think any new dialect of Lisp should be asked this question, and should have a good answer. So some of the reasons why I couldn't use Common Lisp or Scheme for what I wanted to do was they are hard to change. I don't know how you change Common Lisp. Scheme just recently changed a little bit. Because I want a language that is functional, I want the core data structures to be immutable. That's not something you can retrofit on either of those two Lisps. There are no concurrency specs for those languages as they exist. That is not to say that implementations don't provide concurrency semantics for the stuff that is already in the language, or facilities for dealing with threads, but it is not in the language. It is not something you can presume. It is not something that ports. And there are already good Lisps on the JVM. Kawa, Armed Bear, SISC, they're all good. So if you have as your objective -- I have all this code and I want to move it on the JVM for some reason, that is a solved problem, and there is no reason for me to take it on. The other thing has to do with languages and platforms, and I will talk more about that, but basically Lisps, like any language of their day are kind of their own platforms, and that's also true of languages like Python and Ruby. (slide 5) So why the JVM? Well, it's coming around that, just like you would say, I want this language and I want it to talk to this operating system, you know, I'll port it to these operating systems as targets, virtual machines are targets. In fact, they are the new targets. They are the new platforms. Just like you would say, "Oh, Lisp is a platform" "Unix is a platform". For businesses, these VMs are platforms, because they abstract away platforms, or what we used to call platforms, the hardware, the instruction set, the CPUs, no one wants to deal with that. No one wants to deal with the OS level stuff any more. So we've now got another layer of abstraction, which are these virtual machines. The two big ones are .NET and Java. And they provide a lot of facilities, more than just how do you access the file system, including type systems. What is interesting, and surprising a little bit, is that the type systems in these VMs are dynamic. They have dynamic enforcement, dynamic creation of types. They're very dynamic, even though the languages that typically target them, Java and C#, are not particularly dynamic. The big story, of course, is this huge set of libraries you get to tap into. All this code that other people wrote. They also provide a bunch of high level services like garbage collection and memory management, resource management, meaning that garbage collection is no longer a language problem. Garbage collection is a platform problem, which is good. And also, I'll talk more in detail about this later, but they offer byte code, and in particular in the JVM, just in time compilation that is extremely sophisticated. 
You know, run time analysis of your running program to find best paths, optimize short call paths. These things will take apart your code, make a fast path, compile a special fast path. All kinds of things that you absolutely cannot do with a static compiler. The compilation technology on the JVM is state of the art. (slide 6) So old school was, well, we didn't have very great tools. We were all writing our languages, or starting, bootstrapping with C, and every language sort of solved the same set of problems defining its own platform: its own garbage collector, its own byte code, its own type systems, and facilities for libraries. Now we can look to these VMs to provide that stuff for us, and in particular, they can provide it in a way that is somewhat independent of language. I wouldn't pretend that the JVM or the .NET run time, the CLR, are really independent of language, especially the CLR. They talk so much about it. But really any language can target this platform as long as it looks like C#. But it ends up that if you said, well, if I want to have an abstract machine, I could have like a little stack machine that did almost nothing, or I could have a stack machine that had this sort of a primitive object system. Well, that is what these virtual machines have. They have the primitive object systems that are supported by C# and Java. And a big thing for me is Clojure is designed to let me continue to work for a living. And I know a lot of people in this room, or some people in this room, are extremely lucky and can write Common Lisp for money. Great. But most people can't. And platforms are dictated by clients, just like they say, I have this Sun box and your program has to run on it, they also now say I have this JVM infrastructure, and your software has to run on it, because my guys know how to install it, secure it, scale it, and you have to target it. If you don't, your language -- it's just like your language didn't support OS X. Your programs just couldn't run there. And they have real investments, and they're not stupid. This is a real requirement in a lot of commercial software. (slide 7) Again, as I said before, I'd like to hopefully get you to distinguish between Java and the JVM. Unfortunately, Sun called them both Java for the longest period of time. It was like Java the ecosystem, and Java the language, which made it difficult to split them apart. But it's really always been the case. There are hundreds of languages that target the JVM. And now that the JVM is open source, you're seeing even more adoption and more targeting, more libraries, more infrastructure. Any language always had to have an interop layer with something. Typically that was C. It ends up that that is pretty insufficient, because lots of the great new software, lots of the great new infrastructure, is not being written in C. The old stuff was, but a lot of the new stuff isn't. So some ability to call or consume Java is critical. It's very difficult to do with a bridge. I've written two bridges for Common Lisp. You may know of Jfli or Foil. I wrote them. I tried very hard to make that work, but it's not a satisfying experience. http://jfli.sourceforge.net http://foil.sourceforge.net So we look at Clojure as a language, and the JVM as a platform. (slide 8) Why functional programming? Well, I think there are a couple of benefits before you even get to concurrency. They are certainly easier to reason about, and easier to test. Now everybody will say Lisp invented functional programming. 
We've been doing functional programming from the beginning. It's certainly possible to adopt a functional programming style in Lisps that already exist. And that is both true and not true. It's true in that yes, if you want to use lists only, and you promise each other never to modify them, you can take a functional approach. But otherwise, you can't. And I'll talk more in detail about why that is the case. When you get to concurrency, however, you need to take a much more serious view of what it means to be functional, and what it means to be immutable, because it is going to be the case that we simply cannot go into this multicore environment and keep scribbling on the same memory from different threads. There aren't too many dynamic functional languages. Erlang is sort of a dynamic functional language, and Clojure is one, but most of the functional languages, like Haskell and ML, couple this functional programming, and by that I mean programming without side effects, with these ornate and very complex type systems, because that is where the hot research is. But they don't need to go together, although people will dispute that. In particular, it is my opinion, having tried to get people to do it for a pretty long time, that functional by convention is not good enough, both because people mess it up, and because it's not subject to the optimization that is available when things really really are immutable. Something that is genuinely immutable, and provably immutable by the optimizer, is a different thing from something where two people have said in a conversation at lunch, "I promise not to change that." (slide 9) All right. Why focus on concurrency? Well, we've long since, I mean it has been a while now, we are not going to have our single-threaded programs have their performance improve by time passing and Moore's law kicking in. That is over. We've hit this plateau. It is not going up. If you have a program right now today, and it is only capable of running on one CPU, it will never ever be faster. OK. That wasn't always the case. It used to be you wrote a program and said, "Man, by the time we finish this, it will be fast enough." And that's not the case. So you really have to get a grip on this, but trying to do this with threads, with the conventional mechanisms, and Java has locks, and everybody has locks, and a lot of the languages, and a lot of the enhancements to Lisp that support multi-threading really have the same kind of trusty lock-based mechanisms for dealing with concurrency. They don't work. They're very very hard to use. They don't scale. And even if you happen to get it right in one moment in time, in maintenance it breaks. Programming in a functional way helps. Of course anything that is really immutable, you don't have to worry about. Multiple threads can access it. There's no sweat. Nothing bad can happen. But it ends up that most real programs that do real work, that people will pay for, are not functions. You know, the things that a lot of the functional programming guys write are really functions. A compiler is sort of a big function, right? It takes the source code as an input, it calculates output, spits it on the disk. A theorem prover is a sort of a function. A broadcast automation system is not a function. I work on the election system that runs on election night. I can tell you that's not a function. It's all we can do to make it function, but it's certainly not a function. So these kinds of real programs are more like models of processes.
You know you go to businesses, they have this process. Most things have to run indefinitely. So you need something that is going to appear to change over time, that multiple parts of the program can access. And typically we call that state, and it's a reality. So Clojure couples this immutable approach to programming with some mechanisms for doing safe state. And in particular, I think it's critical that the language provides both support and enforcement for that. That it's not a convention, the way locks are a convention. (slide 10) Clojure is not object oriented. And there are many reasons, most of which have to do with the fact that I think we have to dig ourselves out of this rut we've gotten into with object oriented programming. We just haven't done anything novel with it in decades. But in particular, for this problem space, it is a disaster. Object oriented programming, as normally implemented, is inherently imperative. It's about creating this object, and banging on it from multiple threads. There is encapsulation, but it doesn't ... that is orthogonal to the problem of concurrency. So this has got to change. Also, I think that the traditional style of object oriented programming, where you have methods inside a class, has been proven not as flexible as Common Lisp's generic functions, which show we should bring this polymorphism out of this box and apply it to a ... multiple arguments, for instance. And I would go even further and say, "Why do we base polymorphism on types alone?" In the real world, we don't categorize everything by a fundamental at-birth type. Nobody is born a taxi driver, or a painter, or tall. But these type systems work that way. You shall be tall, and we'll make every decision about you based upon you being tall. Whereas most systems and most reality supports some sort of polymorphism based upon value, or current state, or relationships to other things. So we need to sort of pull this stuff apart, and traditional object orientation has just put us in this rut where we can't even see the difference between these things. (slide 11) So this is the question I want to answer, in addition to others tonight: Why is Clojure different? The first thing that is different about Clojure is it has more first class data structures. So lists, if you interpreted Lisp as LISt Processing, there's definitely a primary data structure, which is the list, and everybody who uses Lisp, especially Common Lisp, knows that that ain't enough. It's not a data structure that scales well. It's not suitable for a lot of application domains. And of course Common Lisp has other data structures. It has vectors, and it has hash tables, but they're not first class. There's no reader support. They don't print well. They're just not as nice as lists. So Clojure enhances Lisp by making print-read representations for more data structures, and I'll show you the details of that. Another big difference with Clojure is that the common and core algorithms are defined in terms of abstractions. As we'll see, a lot of the library in Lisps is based around a concrete data representation. You know, the two cell cons. That is really bad. Of course, no one is blaming McCarthy for doing that 50 years ago. It was brilliant. But today, we can do better, and we have to do better. Clojure is different again because it embraces a host. It is symbiotic with the platform, which means it is going to get a lot of leverage of that platform, and work other people have done.
We're thread aware, and as you'll see, I am not constrained by backwards compatibility. I am not going to compile your old code. That's not the objective of Clojure. And that frees me up to, for instance, reuse words which, if they are to mean what they mean in Common Lisp and Scheme forever, you'll never have a Lisp with different semantics. You just can't own the words. Sorry. (slide 12) So I'm going to break this down. We're sort of branching like a tree. We'll talk about Clojure as a Lisp. We'll look at the data structures, I'll talk about abstraction and building algorithms on top of abstractions instead of concrete data structures. We'll dig into how the immutability works in Clojure, and then I'll look at some grab bag of Lisp features. You'll see how Clojure differs from Common Lisp or Scheme, and then we'll dig down into the concurrency mechanisms in Clojure, and finally look at the JVM, both how you interoperate with it, and how it is as a host platform. (slide 13) So Clojure meets all of the objectives of being a Lisp. It's dynamic, code as data, which is a primary characteristic of Lisps, and it's the same in Clojure. It's the same compilation model as Common Lisp. I'm sort of a Common Lisp guy. I'm not a Scheme guy. So I like the reader, and the properties of the reader being well defined. It has a small core. You have a REPL. It's lifted the notion of lists to this abstraction called sequences, and I'll talk a lot more about that. And it has macros, and the other goodies you expect from a Lisp. If you used Clojure, you would not be at all confused about it being a Lisp. (slide 14) As a Lisp, these are the details. It's a Lisp-1. It's lexically scoped. However, it has Common Lisp style full function macros. You can do whatever you want in there. And it has dynamic variables, which behave a lot like dynamic variables in Common Lisp, except they have threading semantics. It's a case-sensitive language. That's really got to go if you want to interoperate with XML, and sort of anything. You have to be case sensitive. It has namespaces instead of packages, and I might have some time to talk about that in detail, but in particular, it is important to note that Clojure's reader is side effect free, which is a big benefit. Clojure is 100% compiled. There is no interpreter. It loads code, it compiles it right away to JVM byte code. From there, there's a lot more compilation that goes on, which I didn't have to write. As I said before, there's some very sophisticated optimizing compilers inside the JITs, inside Java VMs, in particular, HotSpot. Clojure does not have tail call optimization. Not because I'm against it. I'm all for it. It's something I had to trade off in being on the JVM. However, I was just at the JVM languages summit, and tail call optimization, every language designer who came up there said "tail call optimization, tail call optimization". And they've heard it, and hopefully it will come in a future VM, hopefully soon. A lot of the names are different. As I said before, the names can't mean the same thing forever and ever and ever. There's only a few good names, so these are the primitives in Clojure. fn is like lambda. if. def is define. let, loop, and recur I'll talk about a little bit later. do is a block operation. new, dot are Java things. Java interop things. try and throw have to do with exceptions. There's set!, quote, and var, which I might have time to talk about, but this should be familiar in terms of their operation, if not the names.
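Note: purely as an illustration of the special forms just listed (this example is mine, not from the slides), here is roughly what a few of them look like at a Clojure REPL:

  (def greet (fn [name] (str "Hello, " name)))  ; def + fn, roughly define + lambda
  (greet "Lisp")                                ; => "Hello, Lisp"
  (let [x 1, y 2]                               ; let binds immutable locals
    (if (< x y)                                 ; if is the conditional
      (do (println "x is smaller") (+ x y))     ; do sequences expressions for effect, returns the last
      (- x y)))                                 ; prints "x is smaller", then returns 3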
(slide 15) So the atomic data types are the things you would expect. Clojure has good math with integers that don't truncate or wrap. There are doubles, there are BigDecimal literals, which are important for people who deal with money. There are ratios. Strings are in double quotes. Characters are preceded by a backslash. There are symbols. The thing about symbols is they are not places with values stuck on them. Symbols and Vars are different in Clojure. They are two separate things. So you can consider symbols in Clojure to be more like simple names. There's no interning of the symbol with associated values. Keywords start with a colon. The colon doesn't participate in the package system like it would in Common Lisp, so there's no implication there other than the typical thing you would expect with keywords, which is that they evaluate to themselves. There are Boolean literals, true and false. They unify with Java Boolean. And there is nil, which unifies with Java null. So Java null and nil are the same thing. I have a whole slide on nil, so let's save the arguments and whatever for that. And there are regex literals, but they really aren't any more atomic literals than these. (slide 16) So I said earlier Clojure has more first class data structures. Everybody knows you couldn't write a performant program with lists. It's just going to be a pig. Yes, you can use a-lists and pretend you have an associative thing, but it's not good enough. The other ones are second class. They don't have read/print support. In particular, though, they are not very Lispy. A Lisp hash table is not Lispy at all. You can't recur on it. You can't build it up incrementally. You can't unroll the stack and just pop off values and use it like you do a list. You don't use hash tables like you do lists in Common Lisp. And the same thing with vectors. And one of the reasons why is because they are destructive, right? You add something to a hash table, and you trashed it, and you can't undo that trashing, which means that you do nice things with lists. You make lists of a-lists as you go down, if you're writing a little evaluator or compiler, you'll have a list of a-lists, and you know you can just unroll that, and pop it right back off, right? Because it's all nondestructive. You can't do that same job with hash tables. In Clojure, you can. Because Clojure's data structures are structurally recursive. You could say, "Pfff! Lists and a-lists for an environment? That is really not so great. But I'd much rather have a stack of modifications to a nondestructive true performant associative data structure." And that's what Clojure gives you. In particular, compared to the languages du jour, Python, Ruby, whatever, the lack of a first class associative data structure is really a bad blight on Lisp. People look at that, and they're like, I can't work with this. Because it is something you really need. So Clojure has it. (slide 17) So what have we got? We have lists. They are singly linked. They are exactly like what you know. I've heard rumors that Clojure doesn't have cons cells. Clojure does have cons cells. What Clojure doesn't have is a giant beautiful library that is married to cons cells. That's what it doesn't have. It does have cons cells. Clojure has vectors. It has maps, both hashed and sorted. By maps, I mean associative data structures. Hash tables. A hash map is a hash table. It has sets. Again, first class sets, both hashed and sorted. They all have read/print support.
They are all structurally recursive. In addition, they all support a uniform add operation, which is conj. Another S-J pun. But it ends up being an extremely powerful thing. You can write algorithms that manipulate lists, vectors, maps, or sets uniformly. Add things to them, look for things, map functions across them, and you could just substitute a map for a set, for a list, for a vector, and not change the rest of your code at all. So if you have this presumption that, ooh, we have these data types. More data types means more complexity for my macros, and I have all this gook, because now I have this heterogeneity to my data structure set, that's not true. Because you have homogeneity in your algorithm set, and I'll show you more about that later. And conj is one of the keys to that. (slide 18) This is what they look like. Lists look like what you're familiar with. Lists are parenthesized. They are heterogeneous. You can stick anything you want in there. They grow at the front. Every time I say grow, change, add, remove, there's quotes around it, OK, because none of these things actually change. So just like a list in any of the Lisps, adding something to the list doesn't change the list. It's a new list, right? That logic applies to all the data structures in Clojure. Vectors support fast indexed access, like you'd expect. They also grow at the end automatically, so you can conj onto a vector and it is going to grow at the tail. Maps are in curly braces. This is just key, value, key, value. Commas are white space. They make people coming from other languages feel more comfortable, so they can use them and we don't care if you want to use them. You don't have to. So commas are just ignored. And again, it's key, value, key, value, and the keys and the values can be anything. Sets are in curly braces preceded by a hash sign, and that's just going to be a set of unique things, which will support fast look up and detection of the things in there, and they have, as I said before, both of these, sets and maps, have hashed versions and sorted versions with different performance characteristics and sorting characteristics. And then, all of this stuff nests. (slide 19) This is just some code that uses some data structure literals so you can see what is going on. You put the literals in your code and you get versions of that. Of course, as in all Lisps, lists by default get evaluated, so if you want a list you have to say 'list' or quote it. But the nice thing is we have this uniform interface. We can add to any collection some thing, and it will do the right thing for whatever kind of collection it is. So we can add key/value pairs to maps, you can add to the beginning of lists and the end of vectors. Making these changes doesn't really change anything. You'll see we print, we get the changed versions and the unchanged versions. If at any point somebody has questions, just Audience: What does that program do? This program just shows you ... It does nothing. It creates a vector, a map, and a list. It adds something to each of those, and it is returning a vector of the changed versions and the original versions. Audience: Oh, the brackets are It's a literal. A literal vector. Right. So it is a vector of the answers and the originals. And we get a vector of the answers, changed vector, changed map, changed list, and the originals: unchanged original vector, map, and list. Audience: Are commas white space everywhere, or just in maps? Everywhere! Have at it! Stick them anywhere you want! 
(laughter) Just whatever makes you feel good. Audience: Is the square bracket (tbd?) No, there are named functions for all of those things. There are functions for all of them. So you can map 'vector' and get vectors. Audience: ? Yes, the bracket is part of the let syntax. You've seen some Schemes start to advocate using square brackets for lists that are not calls, right? And then, like PLT or whatever, they treat them as lists. That same convention of using square brackets when things are not calls or operators happens in Clojure. So these things, these are not calls, so people don't get confused when they see parens everywhere, they're like, "Is this a call? Is this data? I can't get my head around it." So that convention was a good one, except in Clojure, I don't want to waste square brackets on another kind of list, so I have real vectors. Audience: ? That's a vector. When this is read, this is a list with a symbol, a vector with a symbol, a vector, symbol, map, symbol, list, vector, etc. etc. etc. This is real code as data. You read this in, it's fine. Audience: But it could have been ... you couldn't have had a ? This is a hash table. Audience: No, as the first argument of the let. Oh, that's the destructuring stuff. I'll show you that later. Audience: some question about comments Ignored. Ignored. It's the old ignore-to-the-end-of-the-line comment. So nothing is produced by the reader for that. It's a cool reader, though. For instance, it captures all the line numbers, and associates them with the data structures as metadata. I'll talk about metadata later. OK? (slide 20) So, this is a big thing I want to talk about, and I want to kind of beat up on the existing Lisps. Not because they made the choice, but because it's a choice that is getting moldy. Which is, you have so many great algorithms defined in terms of concrete data types. That is really a mistake today. It limits how general your code can be. It limits the extensibility of your language. How many people said "Oh, I wish this was a generic function" about some library function? So yeah, you do. Because the generic functions are the abstraction mechanism in Common Lisp, and they weren't used a lot in the standard library, because of the way the history was. Again, I'm not blaming the people who wrote Lisp, Common Lisp and Scheme, for doing what they did. It's just today, starting from scratch, that would be not a good idea, because there are so many efficient mechanisms for implementing abstractions that are fast. That are super-optimized. It ends up, on the JVM, that mechanism is interfaces and virtual function calls. So under the hood, implementing this stuff, implementing these abstractions in terms of that mechanism, means it is super fast. Because some people say, well, we can't make that thing in Common Lisp a generic function, because we just can't take the hit. No hit. These optimizers can make them all go away and inline virtual functions pretty much everywhere. (slide 21) So this happened a lot in Clojure, but I want to take as an example the case of the cons cell. Getting rid of the cons cell. Getting it out of the library code. So the cons cell is just brimming with details. It's so sad to see the Lisp books with pairs of boxes, with arrows between the boxes. What a horrible way to start, right? Because there is a great abstraction that's in there, that is inside conses. We don't care that it has two slots. We don't care about CAR and CDR at all.
We don't care about the contents, and in particular, we don't want to know that at all. I don't want to promise that you'll give me back something that can change. That is terrible. So if we were to lift the interaction pattern you have with these things, there's only two things, right? Get me the first thing, and get me the rest. Two functions. Two functions that have nothing to do with pairs, with cons cells, with boxes, or arrows. Right? This is an abstraction in there, which is, there's this sequence potentially. Now what if there is no sequence? What do we have? Well, the Common Lispers will say you have nil. And I do, too. OK. But nil means nothing in Clojure. It means you have nothing. It does not mean some magic name that also means the empty list. Because there's empty vectors, and there's empty all kinds of things. Nil means you don't have something. So either you have it, a sequence, or you have nothing, nil. If you have a sequence, it's going to support these two functions. first is going to do what? Return the first thing in the sequence. That's good. Not surprising. rest is going to do what? It's going to return another sequence. Again, no change, this is not an iterator. This is the abstraction inside cons cells. rest says "get me the rest", which is going to be either what? A sequence or nil. That's it. That is the abstraction that is inside there. Audience: No improper lists? No. Audience: ? Well at this point, I've stopped talking about lists, right? Audience: Right. Fair enough. I have nothing to do with this. In fact, the next thing I'm going to say is there is no reason you can't build an exemplar of this abstraction on top of vectors, right? I can make something that implements first and rest on vectors. I can implement something that implements first and rest on maps. I can implement something that provides first and rest on anything, and that is what happens. All the data structures in Clojure support a function called seq, which gets you something that implements this interface on that data structure. Now you've got this separation of concerns. Sequencing across a data structure, and the data structure, are separate. So we've lifted that away. That is a big deal. Audience: What does seq do if you give it a vector? It returns something that implements this interface on top of the vector. Audience: A cursor, right? You could consider it a cursor, if you want, as long as you don't associate that with anything like iterators in these broken languages. It is not a stateful thing. It is literally this. What is the difference between rest and move-next? Audience: That just gives you ? rest gives you _another thing_. Right? move-next is a cursor. It is destructive on the iterator. Iterators are _bad_, right? Sequences and this abstraction is good. Because it means, for instance, with an iterator you can't look at three things, and then hand the head again to somebody else. That head, well that's toast. You moved it. You can't even get at it any more. That is not true with these. So when you say to a vector, give me a seq on you, you get a new thing. It is not the vector. Of course, some things can implement first and rest themselves. What can? Actual lists! What a beautiful thing. So when you implement this thing on cons cells, they are non-allocating. Exactly beautiful. The same exact stuff you used to have. It is just that we have separated this from that. But that is still the most efficient implementation of this. Audience: Does seq on a list return the list? 
Yes it does. (slide 22) So this is a famous quote, and it is a good idea. We have to go further now. We should be implementing our functions or algorithms on top of abstractions, not data structures. Pull out the abstraction. Seq is implemented for everything in Clojure. In addition, I implemented seq on almost everything in Java. Java collections. Any Java Iterable. Java Strings. Java regex matches. Lines from a file. etc. etc. etc. There is a seq implementation for all of those, which means that all of the algorithms in Clojure work on everything. Audience: Do you ever have trouble with the object overhead? No! No. Certainly. Let's just take an example. Vector. Audience: ? No, it isn't. Audience: ? No. I disagree. I disagree strongly. Audience: ? No, no, no. No. We have to stop doing this. (laughter) We have to stop doing this. I'll tell you. I have an anecdote, which I'll use a couple of times, because it was extremely cool. I was just at the JVM languages summit, where there was a guy there who works for Azul systems. This is a company that makes these mega-boxes that run Java. They're dedicated to running Java. They have hundreds of cores. I gave him this little Clojure program I wrote in an afternoon. It's less than a hundred lines. It implements an ant colony optimization of the Traveling Salesman Problem. I only had a 4 core machine. I did what I had to, and I used some of the stuff I'm going to show you later, the STM and the agent system. But he said, "Oh, I want to try it on my thing." So he tried it on his thing. And boom! He pulled up my program and it ran on 600 cores. Boom! Just like that. He had these cool tools for watching the cores with LED graphs popping up and down. It was so exciting. And that program was churning. You know how much it was churning? It was churning 20 Gigs of garbage a second. (laughter) 20 gigs of garbage a second. Do you know how much of the overall CPU energy that used? 7%. Guess what? Ephemeral garbage on the JVM is cheap, and that is a design principle underlying Clojure. Because you can't implement seq on a vector without allocating memory. Right? What is a seq on a vector going to be? It's going to be a thing that has an index in it, right? And when you ask for the next, for the rest, what is it going to do? Make a new one of those with an index that is 1 greater than that. That's going to be an allocation. So there are many data structures you can't sequence over without an allocation per step. That's a beautiful thing. You have to see the way these JVMs optimize that stuff away. You cannot be concerned about ephemeral consing any more. It's a real design misfeature to prioritize that. The other key thing which we'll see in Clojure is first and rest. They are so lightweight concepts that we don't actually have to have a sequence, do we? How much of a sequence do we need to implement first and rest? Audience: First ? One thing, and some way to get at the other things. Almost nothing. Right? We don't have to have a realized sequence, or a realized data structure at all, do we? So it means that we can implement first and rest lazily. That ends up being an extremely powerful part of Clojure, leveraging that. Audience: So you implement seq over Java collections. Yes. Audience: ? You lose some of the immutability here, it seems, when you start doing ? over collections. No. You do not lose the immutability guarantees of this sequence. The sequence as it walks across the collection caches everything it finds. 
So if you've made a pass, you will never get different answers. If you've made that sequence on top of some crappy mutable Java thing, and you've mutated that thing from multiple threads, you're going to get the normal Java error from that, which is concurrent modification exception. But no, these seqs, they are immutable. They're golden. You walk through. You get the same stuff. You can share it. You can share it between threads. You can point at different parts of it. No. Really. (slide 23) OK, so I've just shown you one, right? The sequence. We sort of broke that down. But there is a similar abstraction behind all the associative things, the associative maps, and the indexed vectors, and the sets. When I showed you those pictures of those literals, well there's a bunch of implementations of these abstractions. One of the neat things about having a data structure that returns a changed version of itself, is you can have one type for when it's a tiny hash, and then as you add stuff to it, one of those adds could return a whole different type that is optimized for being slightly larger. And when it gets bigger than that, one of those adds can give you yet another different type for being huge. All that stuff happens under the hood in Clojure. So the key thing is that we are working with abstractions and not any promises of the concrete type. Audience: ? Yes. Audience: ? Yes, numbers are always boxed all the time, but I'm going to ... except when they're not. (laughs) Which I have the slides for, so I'll talk about that. All the data structures have corresponding abstractions. Callability itself is an abstraction. There's no reason to say only lambdas can be called, and the Lisp-2 thing of the thing in the first position is going to be hard wired to this type of thing. Callability is an abstraction, and it is implemented by several of the data structures. Maps are functions of their keys. Well guess what, it is because a map really is a function of a key, isn't it? That's what it means to be associative. It's a function. So maps are functions. Vectors are functions, and sets are functions of their keys, because that is really what it should be. There are many implementations, as I said, and the cool thing is, this is an infrastructure. I've given you the interfaces for these things. You can extend this yourself. You don't have to call me up and say, "Oh, I wish you had this kind of cool data structure. Could you please add it?" You can do that. You can do that from Java, or from Clojure. Anybody can do it. And all the algorithms you have, and all the algorithms that I've already written, and all of the algorithms all of the other people have already written, are going to work with your new data structure without change. That's the power of abstraction. You get a large library, and by building F functions and I implementations you get F times I power. Audience: ? Yes. Audience: How does that affect inlining? How does that affect inlining? Audience: ? do we really have to worry about this stuff? Is it all ? Right. If you were concerned about it affecting inlining, it would be because you were working with the biggest ones, and so that optimization is going to kick in after a number of times. For instance, the JIT in Java will typically watch your function and say, "You know, you are calling this a lot. I'm going to go set off a thread on the side and try to figure out how to do that fast." By the time you're doing that, you're already on the big ones, typically. 
You're either already on the big ones, you're already on the big size, or you will always be on the small size. In other words, you're not doing something that's really growing it. But yes, that kind of polymorphism does impact inlining, and that's an ongoing challenge for the JIT guys. All I can say is, they're doing it better than anyone ever has so far. These compilers are amazing. (slide 24) All right. So let's talk a little bit about laziness, because it's another key component of the way Clojure works. The basis of it is very simple. first and rest are not produced until requested. How much of a list do we need to have? It ends up: none at all. We just need a recipe for finding the first thing, and a recipe for finding the rest of the things. That is really all we need to say we have a sequence. So there's an easy way to define your own lazy sequence producing functions. It's called lazy-cons. It takes two expressions. It is a macro. It takes two expressions. One will produce the first value when requested. The other will produce the rest, OK? Everybody is now going to raise their hands and say, "Oh, we've had lazy streams. We know about lazy streams. We have it. Whatever." But all the lazy streams libraries, do they interoperate with the cons cells? No! There's lazy this, and lazy that, and lazy that, and lazy whatever, and lazy whatever, and you can't cons. That is not the same. If you have first and rest, you are a seq. You can cons concrete things onto lazy things, and then more concrete things, and then you can concatenate them all. They are all the same from an interface perspective. That is not the same as a lazy streams library. Although you can understand lazy conses by understanding the way lazy streams work, and libraries like the ones for Scheme. You can use them like iterators or generators of other languages, but they are much better because they are not mutable. And they interoperate, I already said. So this is what take looks like. take is kind of more Haskell-ish lingo: take n things from some collection. If it is still a positive number and there's stuff in the collection... OK, this is classic Lisp punning, there is a cons or there isn't. Yes. Audience: You're going to tell me later how to write a take that actually returns a value of an arbitrary type? Because here you're writing a take, and it returns a sequence. take returns a sequence. Audience: It returns a seq. It returns anything that implements seq. seq is an abstraction, so it can return any arbitrary type that implements seq, not one particular type. seq is an interface, so that doesn't dictate the concrete type. That only says that whatever the concrete type is, it has to support this interface. Audience: Yeah, but what I get back is going to be a seq. An implementation of ISeq, yes. So that's pretty cool. So we just lazy-cons, calling first on that, and then take on the rest. OK? Yes. Audience: So if I wanted to take the first 100 things out of a vector and have that put that some place I could have random access, I would want to do some ? Well, no, you could take that, and then dump it into a vector if you wanted to. Or you can use sub-vector, which is a constant time way to get a window on another vector. If you really wanted another vector with constant time look up. Audience: So there is still a difference in that sense. Yes, the sequence library is still a library of sequences. It has the properties of sequences, which is linear access. But there's so many things you do with that. 
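Note: a reconstruction of the take just described (mine, from memory of the slide, using the lazy-cons macro Clojure provided at the time of this talk; later versions of Clojure replaced it with lazy-seq):

  (defn take [n coll]
    (when (and (pos? n) (seq coll))    ; the classic pun: there is a seq, or there is nil
      (lazy-cons (first coll)          ; an expression producing the first value, on demand
                 (take (dec n) (rest coll)))))  ; and one producing the rest, on demand

In current Clojure the same idea is spelled (defn take [n coll] (lazy-seq (when (and (pos? n) (seq coll)) (cons (first coll) (take (dec n) (rest coll)))))).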
There's mapping, filtering, and all those great Lisp functions, now apply to everything. That's the point. It doesn't make sequences now the best data structure for all usages. Sometimes you're going to take that thing, and because you know you're going to need to hammer on it from a look up perspective, dump it into something where that look up is going to be fast, because it is not going to be fast on a seq. But that is very easy to do. Any other question on this? (slide 25) So as I said before, Clojure is primarily a functional language. There was a great talk by Erik Meijer where he said we were all wrong about functional languages, and the only possible functional language is Haskell. Because it has to be lazy, and it has to have this type system, and the type system has to let you declare effects, etc. etc. etc. The rest of us sort of banded together and said, well, we still want to call our things functional, but we'll put this qualifier on it all of the time. So it is impure. It certainly is. I don't believe in disallowing dangerous things, because sometimes dangerous things are useful. The idea behind Clojure is it gives you tools to do the right thing, and if you can do that most of the time, your program will be a better program. If you have to do something ugly somewhere, I'm not going to stop you, because I know you have to do that, either for performance or whatever. So what does that mean for Clojure? For Clojure it means the core data structures are immutable, the core library functions do not produce side effects. let-bound locals are immutable, so no evil twiddling inside your function, because if you could, you'd have all kinds of other issues, because you could close over those twiddlable locals, etc. etc. Also, it creates this awful style where in the whole world you're doing this nice thing with functions, and applying functions, and inside your functions you have this mess where you have loops that trash local variables and you just end up making spaghetti -- small bowls of spaghetti. (laughter) So we don't do that. Because of the lack of tail calls, and the desire to have a sort of functional way of programming, I needed a special construct for looping, which I'll show you. (slide 26) So, this is what it sort of looks like to use. Drop 2 things from the vector, you get a sequence of that. Infinite sequences. Why not? cycle returns an infinite sequence looping through whatever was passed, over and over and over again. As long as you don't try to print it at the REPL, because I haven't implemented *print-length*, you'll be fine. Note that *print-length* has been added since this talk: http://clojure.org/api#toc22 That is really a beautiful thing. It will change the way you write software, to have essentially all of these functions be lazy. They all return a lazy sequence. All of the algorithms in Clojure, that can. I mean, some things can't be lazy. But anything that can be lazy is. So if you pass some huge mungo thing, partition 3 is going to return just the window on getting the first one. It ends up that makes it really easy in Clojure to manipulate and work on data sets that don't fit in memory. Also, you've been paying a huge price by building your algorithms in terms of functions that return fully realized lists, and then of course n-reversing them or reversing them. It's awful. From a memory contention standpoint, it's a big deal. Remember I talked about that ephemeral memory being cheap? You know what else is really good?
Ephemeral memory always lives in the generation that can get tossed for almost nothing. Right? Garbage acquisition, or memory acquisition in that generation is pointer bumping, and the collects are extremely fast. But when you start allocating real lists, or real data structures, to accommodate interim values, interim results, pieces of your algorithm are creating these huge things. Guess what? They're going to move into the next generation, and the garbage collection cost will be much higher than if you'd used a lazy approach. So this is really great. interleave is lazy, partition, these are all lazy. And there are just tons of functions in Clojure. Audience: Is the first line a typo, because you drop a vector and get a list back? No, you always get a sequence back from these. These are the sequence functions. They take anything, but they'll return a sequence, because they're returning a lazy sequence. If I had to return a vector, I would have to realize it, and all the problems I just talked about would happen. If you want a vector, it is really easy to say ... you can just say into an empty vector, drop whatever, and now you'll have it poured in. In fact, into works with anything. You could say into a set, empty set, or a set that has some stuff in it already. So it's really beautiful. You can say into something, take blah, and that will get dumped into the thing, whatever its concrete type is. So this is really neat, and it works on strings, and it works on whatever. range is also a lazy producer of infinite lists, infinite sequences. Audience: ? Excuse me? Audience: into is just a representation ... into takes any type of collection, which is going to support conj, and then some sequence, and then it's going to take things from the sequence and conj them into the collection, so it will do the right thing for the type. Audience: And then it returns the collection? It returns the collection. That's right. So for instance, you could write a completely generic map, right? Because people say, "Oh, I wish I had map vector". Well if you said into an empty vector, map this thing, you would get that. So then what's different between different calls to that? Just the thing, right? Just the, I'm dumping into a vector, and I started with a vector, or I'm dumping into a set, and I had a set, or I'm dumping into a map, and I had a map. Well, there's a function called empty, so if you had some collection called C, you could say into empty C, map this function on C. And whatever C is, you'll get a copy of it, and that code is 100% generic. You don't care. It will work on anything. That's cool. (slide 27) OK. So this is some of what maps and sets look like. This is def. It just creates a global thing called m, and we give it a map. Maps are functions of their keys, so we can look something up by just calling the map itself. Also, keywords are functions. Keywords are functions of associative things. They look themselves up in the thing, because keywords are so often used as names, as keys. So it ends up supporting a really nice looking style. And if you ever wished you had a better way to do attributes, this makes that nice looking. You can get to keys, you can get to values, you can do all kinds of stuff. The basic operation for adding to an associative thing is assoc, which is short for associate. Again, you can't own this word. So it takes some associative thing, and then an arbitrary number of keys, values, keys, values. And it gives you a new map with those entries made to it.
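Note: the slide example is along these lines (my sketch, not the slide verbatim):

  (def m {:a 1 :b 2})
  (m :a)          ; => 1, the map is a function of its keys
  (:b m)          ; => 2, the keyword looks itself up in the map
  (assoc m :c 3)  ; => {:a 1, :b 2, :c 3}, a new map
  m               ; => {:a 1, :b 2}, the original is untouched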
If we look at m again, it will be the same thing that it was there. In fact, this function, this call here, presumes m is the same way as it was, because it didn't change here. So merge-with is a very cool thing. It will take one or more maps, and it will say, "Pour all the stuff from these successive maps into the baseline map. OK, well what if there's already a key there? Well, use this function to arbitrate the value combination. So if there's already a key, take the value you have at this key, and add it. You can merge with conj and build data structures this way. It's fun. There's a whole library for set logic that works on real sets. Union, intersection, and all that stuff, and go a higher level still. What's a relation? A relation is a set of maps. And guess what? If you build relations, there's a whole library of stuff for that, like relational algebra stuff, like joins. Audience: When you get that union there, what determined the order of the new union? Because these are hash sets, the order is random. Because it's hashed. If they were sorted, you'd get a sorted thing. All of the literals are the hashed versions. The literal map is a hash map. The literal set is a hash set. But you can get the sorted version with function calls. Audience: In that line, couldn't you have done the association with conj, right? I could have written it with conj, but then you'd have to treat each guy as a pair. The values in a collection that is associative are pairs, logically. So like if you seq on m, you're going to get a set of key value little vectors. Audience: You said that a hash map is arbitrary in the order ... In the print, right. Audience: If I union two things twice, do I get the same thing? You get whatever you started with. Are you going to get ... you're going to get whatever the left one is. Audience: So if I union these guys, if I union again Yeah. Audience: Suppose I union this guy and then union them again, am I guaranteed that the results will be the same? You are guaranteed the results are going to be _equal_. I'm going to talk about equal. Audience: When you say that the reader versions are hashed, you don't mean that they're implemented literally with hashing, and that they're unordered? I mean that the concrete thing you get is the hash map or the hash set. I have to pick one. I'm reading, and I see a literal map, right? I have to produce a real data structure. The real data structure I'm going to produce is going to be in the hashed family, not the sorted family. Audience: Right. The point is that it has unsorted semantics. I'm not going to promise what they are, but usually hashed things don't have any sorting, right? Because maybe I'll come up with a new hashed/sorted hybrid. I don't know. Audience: But you're not promising that it will actually really be a hash table underneath (that's small enough to just...?) Sometimes the fastest way to implement a hash is to do a skip list thing, right? That's proven, when it gets small enough. But it's still in the hash family in that, when that thing grows, it will become hashed. There's ... oh, I should make a slide of that. There's this huge beautiful hierarchy of abstractions under Clojure. Somebody made a picture of it once. It's scary. So, for instance, there's an abstraction called Sorted, and one called Associative, which aren't even in the concrete tree of things, and Indexed, and Reversible, and stuff like that. The abstraction set is good. (slide 28) So, just to get a feeling... Again, I can't... 
I don't expect you to totally be able to grok Clojure code, but just to get a feel for the weight of it, this is some Python. Peter Norvig wrote this Python code. It's a simple spell corrector. He uses a little simple Bayesian thingy. But Python is sort of the champ in terms of syntactic concision and length? Because it has no grouping. It uses white space. (slide 29) And this is the Clojure version of the same thing. Roughly the same. And we all know how to ignore this. (laughter) As do our editors. In fact, we know the superiority of being able to move these pieces around, vs. something that is whitespace based. So it's kind of neat. You know, def and blah, and you see again where we are using lists for, what we would have used lists for, data, we're going to use vectors instead, and the syntax. There's list comprehensions, or sequence comprehensions. A bunch of other cool stuff. But it lets you write code that is the same size and weight as Python. (slide 30) So, saying that everything is immutable, the first thing is "Oh, my god! That is going to be slow! You're going to be copying those things left and right." But of course that need not be the case. We know, for instance, that when you add something to a list, we don't copy the whole list. We don't need to, because it has structural sharing. And similarly, there are structurally shared versions of these data structures. So by persistent data structure here, we're not talking about the database persistence, right? We're talking about the concept of having something that's immutable be subject to change, having the old version and the new version both be available after the change, with the same performance characteristics as before. So this isn't a trick one. There are some persistent algorithms that produce changed versions whose performance degrades over time. As you add more new things, all the older versions start slipping out from the performance profile. To me, that's not really persistence at all. And it's not usable. In particular, the way those things are implemented is usually with some real sharing, which means that they're a disaster from a concurrency standpoint. Really, to do this correctly, you have to have immutable data structures, which Clojure does. The new versions are not full copies. So there's structural sharing. This really is the Lisp way. This is the kind of map, you know, or hash table, or vector, you should have in Lisp, because you want to do the same kinds of programming with these other data structures that you do with lists. And so often, you have this beautiful thing you did with lists, then you try to scale it up, and ugh, lists are really not the right thing. Then what do you do? Completely change your program! Right? Because using the other data structures is nothing like using lists. They don't have the same properties, they mutate, etc. etc. That's not true in Clojure. So you can use real maps for things like environments, which is just so beautiful, and very very fast. Oh, I don't have the slides for that. So I will say that Clojure's data structures are the fastest persistent data structures I know about. They're different. They're based around Bagwell's ideal hash trees. They have a depth of log32 N (log base 32 of N), so they're very shallow. And they're fast. They're fast enough to use this for commercial development. Compared to the Java data structures, they are ... for insertion times, like 4 times worse. For look up times, they can be as fast or better, to twice as slow.
Compared to the Java data structures, they are ... for insertion times, like 4 times worse. For lookup times, they can be as fast or better, to twice as slow. So that's way in the ballpark considering the benefits. And when you start comparing them to having to lock those other things in concurrent scenarios, there's no contest. So this is a small cost we're going to pay, and we're just going to be able to sleep from then on.
http://en.wikipedia.org/wiki/VList
http://lampwww.epfl.ch/papers/idealhashtrees.pdf
(slide 31) All right. So as I said before, we have two things impacting Clojure from a looping perspective. One is, I don't want any mutable locals. The other is, the JVM won't let me do tail calls. You've seen languages... SISC Scheme does tail calls in the JVM, right. It does it with this whole other infrastructure. Their calling scheme is nothing like Java's, right? You have to pass additional things, or trampoline, or whatever. Clojure does none of that. Clojure has pedal-to-the-metal calling conventions that match Java's, so I don't have tail recursion, because you can't do that unless the JVM does that. So we have a special construct called recur. It does constant-space recursive looping. It's going to rebind and jump to the nearest loop, which is another special op, or function frame. So you say I want to zipmap this thing, I say loop. It is exactly like let. loop is like let. You let all these things, and then if we need to keep going, we recur. recur is going to go and hit this loop, rebinding map, k, and v, to this, that, and that. Audience: So does the compiler analyze and make sure it's a tail call? recur can only occur in the tail position, and I will flag you on that, right. And it's a goto. It's fast, under the hood. But the semantics are the semantics of rebinding. Audience: What's going to happen to all this if the JVM actually gets tail calls? Well, this recur happens to target this, but you can have recurs that target the function arguments themselves, if there was no enclosing loop, so those recurs could be like a call to the same named function. But the biggest reason for true tail calls is that you want to build some sort of network of recursive calls, not self calls. And basically what will happen is, you can start doing that. Right now if you do it, you're subject to stack overflow, and when they change this, you won't be. It ends up that in Clojure, as for the cost of the lazy sequences and this loop/recur: people complain a lot before they start using Clojure, and they complain hardly at all after they know how to do it the Clojure way. That's not to say I don't want tail calls. I absolutely do, and I'm looking forward to the JVM guys doing it. But that's what I have. So that's a little zipmap, and you can create a set of keys and values, and they will get zipped together and you get a map out. These two things will do the same thing. You have apply; you can do that. You can interleave, and then dump that in with the into I was talking about before. Audience: I assume that if you wrote the code using a self invocation, you wouldn't intentionally take it and turn it into something like this? I don't, and I don't on purpose, because I get a lot of people coming to Clojure from Scheme, and I really feel like I don't want to tease you, and I don't want you to be confused about when you will get a real tail call, because I think that is an unworkable position to be in. Can I do this? Will I get a tail call? So I just say no, you will not. The only time you will is if you do this. When we have tail calls, you can do the other. And that's really sort of clean, because otherwise people would be very frustrated, and understandably. So I do not optimize self calls, even though I easily could.
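(Transcriber's note: a sketch of how a zipmap might be written with loop/recur, close to what the slide describes; my-zipmap is a made-up name:
(defn my-zipmap [ks vs]
  (loop [m {}
         ks (seq ks)
         vs (seq vs)]
    (if (and ks vs)
      (recur (assoc m (first ks) (first vs)) (next ks) (next vs))
      m)))
(my-zipmap [:a :b :c] [1 2 3]) -> {:c 3, :b 2, :a 1}
The apply/interleave version mentioned above does the same job: (apply hash-map (interleave [:a :b :c] [1 2 3])). Hash map print order may vary.)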
(slide 32) OK, equality. The equals sign is equality in Clojure. There is also identical?, which is reference equality. We really don't do that. It's there. I know sometimes you need to do it, but it's not something you're going to do often in Clojure. Audience: ? Equal! Equal! Have you read Henry Baker's great paper on equality and object identity? Anybody?
http://home.pipeline.com/~hbaker1/ObjectIdentity.html
Baker, H. G. 1993. Equal rights for functional objects or, the more things change, the more they are the same. SIGPLAN OOPS Mess. 4, 4 (Oct. 1993), 2-27. DOI= http://doi.acm.org/10.1145/165593.165596
Audience: A while ago. It's a great paper. It's still great, and it's still correct. In it he says the only things you can really compare for equality are immutable things, because if you compare two things for equality that are mutable, and ever say true, and they're ever not the same thing, you are wrong. Or you will become wrong at some point in the future. So he had this new operator he called egal. He went through all this equal, and eq, and eql, and agh! I need a new name. So he came up with egal and defined these semantics for egal. Those are the semantics of Clojure. If you're comparing two reference types, they have to be the same thing in order for them to be equal. Again, here I mean Clojure's stuff. There is still Java stuff. Java stuff has plenty of broken equals, and I can't fix them. So if you call equal on some Java things, it's going to map to equals, and it will be broken sometimes, because they're broken. Audience: That maps to Java dot equals object? Yes, Object dot equals. I can't fix that. Use Clojure's data structures. They're much better. Audience: For cyclic structures, do you do the thing about considering two things that are observationally equivalent, even if they have distinct ... You know what I'm referring to here? Well, good luck trying to make a cyclic data structure out of all these immutable parts. Audience: So, you can't express ... Audience: tbd Which is not really a data structure. It's a logical sequence. I don't have any special tests for equality determination there, because I don't think ... Audience: I'm just recently out of Scheme, so I was curious if you followed tbd No, I haven't yet. Don't do that. Audience: But if you call equals on a lazy generator that generates an infinite sequence, then what happens? It's the same as trying to print it at the REPL. You're going to consume it all. That's right. I mean, it may terminate early. (laughter) No, not abnormally, but it may terminate early because it will find the things are not equal. Audience: Yeah. If one is finite, it's finite. Audience: Yeah. Audience: What if you do into, a cycle? Same thing. I mean, a cycle is infinite. What do you expect? Take from cycle and build what you want. You don't get... There are no other infinite data structures. There are infinite sequences. There are not infinite vectors or infinite maps. OK, so that's how it works. I haven't even talked about references, but Clojure does have some reference types. They are the things that have the concurrency semantics.
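(Transcriber's note: equality vs. identity at the REPL:
(= [1 2 3] [1 2 3]) -> true           ; same value
(identical? [1 2 3] [1 2 3]) -> false ; two distinct objects
(= [1 2 3] '(1 2 3)) -> true          ; sequential collections compare by value
Values that look the same are equal; identical? asks the much narrower reference question.)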
(slide 33) All right, the big religious debate. nil, false, end of stream, or end of sequence, or end of list, and the empty list. OK. Let's just do it. Clojure has nil. nil means nothing. You don't have anything. It's not the name of some magic thing. It means nothing. Common Lisp has nil and the empty list, and those notions are unified. It might have made sense, because you really only had one primary aggregate data structure. I don't, right? Why should nil be the empty list? Why shouldn't it be the empty vector, or the empty whatever? There's no good reason. Scheme punts on this; they don't like that nil punning or end of sequence stuff. And Java has null, and it means nothing in Java. It could be any type. Clojure has real true and false. Common Lisp has nil or not, right? Scheme has true and false, but they're broken, right? And Java has true and false. Clojure's conditional logic is like Common Lisp's. I like that. I like nil punning. I like saying nothing is conditional false. But I have two cases, because I have to deal with false. I have to deal with false. false is in Java land. It's going to come. I'm going to encounter it. I tried very hard. There's no way to automatically or automagically do a nil/false translation in both directions. It's not possible. If you think it is, write it up and send it to me. But I think it's not. So I have these two things, or everything else, which is like Common Lisp's. It's also like Scheme's. Of course, this seems to be inherently wrong. If you're going to do this, then do it, right? Audience: What's your objection to false? I don't have an objection to false. I have an objection to, if you're going to say true and false, and this is logical false, but no, it's not logical false. Audience: I know, I meant earlier you said you were forced to use false because Java used false. Well, it would be nice if I didn't have to do both of these. It would be nice if I could do nil and everything else. I would be happy with true and nil. I'm fine with Common Lisp's use of T and nil. I would have been done. I tried to make that work. OK: is there a special singleton empty list? And the singleton is the key thing. There are empty list values in Clojure. In fact, they are not nothing. Because an empty list is not nothing. It's an empty list. And it needs to be distinguishable from an empty vector, because you want to say into, into it. If you only had nil, you have no power to have more than one special empty thing, so no. Clojure, no. We know Common Lisp and Scheme have the empty list. Here again ... yes? Audience: Plus then you can have metadata on an empty thing. You can have metadata on an empty thing. You can tell the difference between an empty thing and nothing, because they're not the same. An empty bucket is not "no bucket". Audience: I'm not getting the distinction. Common Lisp has a specific empty list. Clojure doesn't have an empty list? It doesn't have a singleton empty list. You can have ten different ones whose identity differs. Audience: Oh, OK. And it's not represented by nil. That's also sort of the key piece here. I'm not actually trying to establish the rightness or wrongness of anything except this. (laughter) I'm just showing you how they're different, and the decisions I made, which generally, as you'll see through this, correspond to Common Lisp's choices.
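(Transcriber's note: a quick sketch of the conditional logic and the empty-list distinction:
(if nil :yes :no) -> :no
(if false :yes :no) -> :no
(if '() :yes :no) -> :yes
(if 0 :yes :no) -> :yes
(seq []) -> nil
Only nil and false are logically false; an empty collection is a real, distinguishable thing, and seq on it gives you nothing.)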
Audience: If all of those empty lists are immutable ... Yes. Audience: ... then why would you not want to put them all into the same representation? I do. (laughter - probably while he points at the '() representation) Why would I do otherwise? Audience: I thought you just said that each of those empty lists had a different unique identity? It could. It could, but I don't want you to start doing identity stuff with that. Use equals, OK? equals is good. End of sequence. OK, what happens when we are using a sequence -- it used to be a list -- we're walking through and there's no more. Well, I say nil, as Common Lisp did, because Common Lisp meant this when it said no more (probably referring to nil/'() equivalence in Common Lisp). I mean there's no more. There's nothing. So that's the same. And of course, combined with this (probably the nil or false conditional in Clojure), it means you can do nice elegant little loop things, because that idiom of Common Lisp is good. I have host issues. I have to deal with the host, so nil maps to null, and true and false map to big-T big-F True and False in Java land. And the library does not use concrete types. Are we OK? All right. That was easy. (slide 34) So this is just some Clojure-y -- how does Clojure do some things that other languages do -- other Lisps do. Clojure can have arity-overloaded functions. OK, Clojure doesn't have optional arguments. Instead it has real arity overloading. That means that a single function object can hold inside it multiple function bodies overloaded by arity. That is really fast in Java, and it's quite nice. It means you can support all kinds of things, and you don't really have any conditionals to do this work inside. So it's kind of a neat thing. So this is an example, drop-last; there are two overloads. You can pass it a thing, or you can pass n and a thing. It does support variable arity with ampersand, and the argument after that is bound to the rest of the things. That maps to a sequence, which means you can theoretically pass an infinite set of things as arguments to a function, as long as you don't try to print them or do any of those things you shouldn't do. Audience: So if you wanted to have the Common Lisp optional arguments, you could just have something extract ... Really, you can build optional arguments or keyword parameters on this, so they are not primitive, in my opinion, and they don't belong in lambda or fn. Make macros. (slide 35) I'm going to talk later about references, in the context of the concurrency primitives, and these are the mutable things in the language, but they're actually here because of the mapping to Vars from Common Lisp. So you can have Vars. Unlike Common Lisp, where there is this sort of symbiosis between a symbol, and the value cell, and the Var, that's separated in Clojure. There are Vars. Vars are in namespaces. Those are the things that really have the data associated with them. Symbols are lighter weight names. Vars are named by symbols. And they have dynamic scope. They obey a stack discipline that is exactly like Common Lisp Vars, except that you can define a Var at the root with def, and it could be unbound. set! is restricted to apply only to Vars that have been bound thread-locally. OK, so you can't set!, you can't whack the roots with set!. Otherwise, they have the semantics of special Vars. Audience: I don't understand. What do you mean by shared root binding? When you have a dynamic Var, a defvar in Common Lisp, there's a root value. You can set it in Common Lisp. You can setf it. You can't in Clojure. You could bind it locally, right? And the special operator for that is called binding. It's not let. So there's a special way to bind these things. When it's bound, then you can set it, and not otherwise.
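(Transcriber's note: a sketch of the binding/set! rules; in current Clojure the Var must be marked ^:dynamic, and *depth* is a made-up name:
(def ^:dynamic *depth* 0)
(set! *depth* 1) -> IllegalStateException   ; no thread-local binding to set
(binding [*depth* 1]
  (set! *depth* 2)   ; fine: set! hits the thread-local binding
  *depth*) -> 2
*depth* -> 0
The root binding is untouched; only a thread-local binding established by binding can be whacked with set!.)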
It can be unbound. There's really nothing special to them. Otherwise they are extremely similar to Vars, you know, defvars, except they have real thread-local semantics. It is guaranteed a binding in a thread will be distinct and invisible to any other thread. And you can use that in concurrency programming. That's a hard promise. Audience: When you bind a new (upper function?) that you set! and you fork off into a new thread ... You get a whole new set. No binding propagation to threads. Well, there have been long arguments about this. I decided no. Audience: Do you effectively copy that? No, you get a new binding set. If you want to propagate bindings to a fn, and you're launching another thread, do it, because I'm not going to do it for you. Audience: It takes the global ... It's a new thread. It's got root bindings. Set up bindings that you want. OK? So it's very neat. The one neat thing, though, is because Clojure is a Lisp-1, and all Clojure functions are defined with def, they're all stored in dynamic Vars, which means you can dynamically bind functions in nested contexts. That's really cool. If you've been into ContextL or aspect oriented programming, you can do that with Clojure out of the box because of this behavior. Because functions are stored in Vars, you can rebind them in context. If you want to, in a particular context, add logging to a function, you can do that. Change its behavior, swap out the algorithm, because you know the data set is going to be unusual in this call context, totally doable. It's very powerful stuff. And it sort of fell out of it. It's easy. Audience: ??? How much do you have to actually do at runtime to make that happen? The thing about the JIT optimizations is, if you're calling something 3 times, you do not care, right? If you're calling it enough times, in a particular context, the JIT is gonna kick in and make a custom version. Audience: In practice, do they do that? I doubt it. For dynamic rebindings, I doubt it. Would I put it past them? No. What was neat about the JVM languages summit was, for some of these engineers, because they have a lot of the same guys that worked on JRockit and Azul, this was the first time they were sort of hearing the needs of these dynamic languages. The kind of stuff that would matter to languages like this. They all were highly informed by the presentations. They were, "Ooh, we don't do that simple escape analysis, and if we did, all of these boxed numbers could disappear." And various other things could happen. Audience: ??? Do you sometimes have to locally re-bind something? Thread locals. It's thread locals with a fast path that tests an atomic integer, which is spun up when things are bound, so if no one has rebound it, there's a fast test that doesn't involve thread locals, to say, pfff, can't be thread-locally bound, fast. It's more complicated than that, but that's the short answer. Audience: So if I have a let form and I (??? bind the first) it won't shadow the first function because you don't use binding ... No, it will shadow. It's still lexical. It's still a lexical system. In other words, I have to get all the way down and say, this thing really means that Var at the root. If you've lexically shadowed it, then that shadowing happens. Otherwise, it's a dynamic system. It's not really lexical then. Audience: Then why are you even doing it? Because I don't want to muck up let. let is so confusing when let does dynamic binding and regular binding, in my opinion. I want people to know when they're doing this. I'm going to talk about it. There's only one let. It's let star. It's sequential let.
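(Transcriber's note: a sketch of rebinding a function Var in context, as described above; the names are made up, and current Clojure requires ^:dynamic:
(defn ^:dynamic compute [x] (* x x))
(defn logged [f]
  (fn [& args]
    (println "calling with" args)
    (apply f args)))
(compute 3) -> 9
(binding [compute (logged compute)]
  (compute 3))   ; prints "calling with (3)", then returns 9
(compute 3) -> 9
Within the binding, every caller in this thread sees the logging version; outside it, nobody does.)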
Audience: In talking about dynamic variables, if the JVM ever does get a true tail call (??? ...) It's not going to interact through dynamic bindings. Audience: ? It's still in tail position, right? It's still got to test to see which one to call. It's still tail. So that still can be optimized through. There's nothing about it that's inherently non-tail. If you make a decision and use this, and that's the tail call, then that's still a tail call. It's not like you're taking a result back, and then doing something with it, and now that's not a tail call any more. That's the only place where you run into trouble. So I consider it optimizable, but it probably isn't yet. (slide 36) So, I say Clojure is a Lisp-1, and it has Common Lisp style defmacro. Typically, you would say, "Errrr! That isn't going to work." That's why Scheme has this hygienic system, because you're going to have these clashes. But it ends up you have a lot of these clashes because of the way the reader and packages work in Common Lisp, not because of any inherent problem. So if you separate out Vars from symbols, then symbols can be really simple. In fact, they can be read by the reader, and not impact the system. In other words, there is no interning happening during reading. In fact, there is interning, but only of the names, so that symbols are still fast-comparable; it's not interning of values, or mucking with your Var space. There is no value cell or anything like that. They have constant time equality. So they're usable as fast keys, and for the things you typically use symbols for when you are doing symbolic programming. But the reader has no side effects. It returns this. Symbols can have a namespace part, and even a namespace-qualified symbol is still just a name. It doesn't say there is storage associated with this. It does not say or imply there is a Var there, or anything else. It is just a name with a namespace. So namespaces really are a way to just qualify names. They're not about packages or anything else. You just say my namespace slash the thing, and you get a namespace-qualified symbol. It's just a name. Then, the trick in Clojure is, there's a process called resolution, which resolves names. So now you're in a namespace. You've read in a bunch of these unadorned, or possibly namespace-qualified, symbols. Now you need to say, "What does foo mean?" OK? Now we're going to go look it up in the namespace and say, "In this namespace, foo means this Var." But we didn't mess with those namespaces just by reading some source code, or anything else. It also means that macros are manipulating this name world, not this Var world, so they can easily manipulate names, and emit stuff into a space in which the names are qualified where the macro is defined, and the qualified version will be what the macro emits. I'll show you that. (slide 37) So I call that syntax quote. So we have regular quote. quote foo is foo:
'foo -> foo
It's a symbol. So we're imagining a fresh REPL. Nothing is defined. Backquote foo says, well, we're in the namespace user. That name resolves to user/foo:
`foo -> user/foo
There is no user/foo yet. It's just saying this is the way the name is qualified, which means when you use backquote in macros, it's going to look up those names in your namespace and emit, again, names that resolve to what their name means in your namespace. But still just plain names.
So we can see if these two things are equal, and they're not, OK?
(= 'foo `foo) -> false
If they don't look equal, they ain't. We're not in that namespace world. If we try to use it, we can't.
foo -> Exception: Unable to resolve symbol: foo
We didn't make any Vars called foo. So let's define something called foo.
(def foo 5) -> #'user/foo
No problem. It gives us this. This (the #') means something different in Clojure. And now we can evaluate foo.
foo -> 5
We get 5. We can say ... we can refer to foo, qualified or unqualified. We're talking about the same Var. So it's the same value.
(= foo user/foo) -> true
Audience: Why doesn't the first foo evaluate to 5? This (foo and user/foo) is 5 and 5. 5 and 5 are equal, still. Even in Clojure. user/foo is just a fully qualified name. We're in the user namespace, so this (foo) will resolve to user/foo, because we did this (I think that refers to the (def foo 5) above). And we did this, and we didn't get an error. In fact, if there was already a foo in this namespace, this (the (def foo 5), I think) would have failed. Clojure is stricter about names existing before you use them. So this works. We can resolve foo, right?
(resolve 'foo) -> #'user/foo
In this namespace, what does foo name? This (the #'user/foo) is a Var. I said sharp-quote is different. This is the Var, the actual Var that is in the namespace, that has the value in it. So we name them like this. And the Var is called user/foo. A Var _always_ has a namespace-qualified name. Audience: Vars aren't objects? Everything is an object, but Vars are the places where stuff gets stored, not symbols. Audience: What does the current namespace mean? You're in a namespace. I'm presuming we just started the REPL, so we're in the user namespace. Audience: How do we know what namespace you're in? There's all kinds of functions for that: in-namespace, namespace, by-namespace, you can change namespaces. It's a lot like Common Lisp that way. Audience: It's global state? You can setq it? It's thread-local state, because it's bound. So it's thread local, and it will unwind when you do a compilation unit. In a compilation unit, you can say I'm in this namespace, and the current namespace gets pushed, it gets bound, a new binding gets pushed and popped. So it's not global. There's no trashing from other threads. Audience: So you don't have to define a namespace before a symbol can be qualified in that namespace? No. You can say anything you want. These are plain names. Until you get to resolution, it's fine, which is really the way it should be, right? Who cares? It's just this name. In fact, you may want to use it because it names something in some XML file. They're names. Symbols as names are more useful than symbols that are bound to be Vars. The only cost, really, is that the reader can't read pointers. The Common Lisp reader can effectively read pointers. That's a cool thing. It has a huge cost. Audience: Can you elaborate on that? ??? What can read pointers? If I read, right, I read a symbol. It's interned. It's storage. If I read that in another file, I get the same storage. Just by reading, I have two pointers to the same storage. Which is cool, and there's some Common Lisp code that leverages that. You can't do that in Clojure. It is a cost. Audience: The videographer begs to pause for about 30 seconds so he can switch tapes. OK (Part 2) So in practice, this combination of things. So look, we resolved foo, this was a local guy. We got the local one. Now list, we never defined in the user namespace.
In fact, all the Clojure functions are auto-imported into the user namespace, which means if we resolve list, we see it says clojure/list. It's the system one, which is neat. Which means, if we write a macro that says backquote list foo x (referring to expression `(list foo ~x) on slide), and expand it, look what we get. Data structures with symbols, but the symbols have qualifiers. No matter what namespace I resolve this in, clojure/list resolves to clojure/list. So this system actually solves a lot of the hygiene issues from being a Lisp-1. Separating symbols out. I don't want to spend any more time on it, but I will say I think this goes a long way towards that ... oh, I also have auto-gensyms. That is not to say you will never call gensym. You will, but you'll have to be writing a more sophisticated macro, in which case you know what you're doing already. (slide 38) Clojure has pervasive destructuring. Destructuring is pretty cool, I think. Pattern matching is all the rage and I'm not exactly a fan. I like the destructuring part of pattern matching, but I don't like the switch statement part of pattern matching. So Clojure has destructuring, but it has it everywhere. But like the other parts of Clojure, destructuring is based around abstractions. It's what I called abstract structural binding. So in let and loop and fn binding lists, really anywhere you're doing bindings, or anything that's built on these things, which is basically everywhere you're doing bindings, you can use data structures where symbols would go, and you're going to get destructuring. So you would say let a 1, or let a 42, right? Now, instead of saying a, you can put a data structure there, and there are two interpretations. If you have a vector as your binding form, it will destructure anything that's sequential, so that's anything that implements the Sequential interface. You can destructure lists, and strings, and arrays, and anything that supports nth, and it will pull that apart. Of course it will be a vector with names in it, and those names will be bound to the parts of the thing, and I'll show you this in a second. You can also have a map as a binding form, and that will destructure anything that is associative, which includes maps and vectors. Vectors are associative. You could do a map-style binding form and get the 17th thing out of a vector, and call it fred. (slide 39) So unfortunately, I think these examples are a little bit too verbose, but this first one, we might be able to understand. So let, there's not a simple name here, right? The first thing, it's a vector, so we're going to bind a to the first thing, b to the second thing, c to the third thing, d to the rest, and we're going to call the whole thing e. And then we're binding that to a literal vector of numbers. So a, b, c, d, e. That's really cool. This nests, so this is a vector with two vectors, where we're naming the parts inside the vector, and then we're passing a vector of two vectors, and we're naming the parts of it, so that all does the right thing. This one is particularly wordy, and too large, but the idea here is name, key, name, key, name, key, the same :as thing, defaults, etc. etc. We bind it. We pick the pieces out. We can get defaults in. We can name the whole structure. a, b, c, I didn't have b, right? So we got the default, and m is the argument. Audience: What does the :or do? :or says if you don't get the key, use this as the value.
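(Transcriber's note: a few small destructuring examples, reconstructed rather than copied from slide 39:
(let [[a b & more :as all] [1 2 3 4 5]]
  [a b more all]) -> [1 2 (3 4 5) [1 2 3 4 5]]
(let [{a :a, b :b, :or {b 42}, :as m} {:a 1}]
  [a b m]) -> [1 42 {:a 1}]
(let [{:keys [a b]} {:a 1, :b 2}]
  [a b]) -> [1 2]
The last form uses the :keys sugar that comes up just below.)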
Audience: You said that destructuring works for function arguments? Yes, you can put these things in function arguments. Audience: That means you can have optional arguments? You can just :or them? A certain flavor of them. Right? If you're using :or, someone has to be passing you something that is Associative, which the argument list isn't by default, but if they pass you a map, they could get defaults that way. You can build all kinds of cool things with this. They nest arbitrarily, so if you do a vector destructure, or a map destructure, and then for one of the arguments, you destructure that key as a vector, you can pull all the pieces out, and if you stare at this long enough, you can figure out what it says. But the idea is that they nest. They nest arbitrarily, without limit, and it's very powerful. Audience: There's also the keys ... Yeah, it's not even on here, but there's other sugar that lets you avoid saying {a :a b :b c :c}. If you want the same names, you can just say :keys a b c, and it will pull them out. Similarly you can pull out strings that way or symbols that way, so there's lots of power-user features inside this. But the nice thing is, it's every place.
http://clojure.org/special_forms#let
(slide 40) So polymorphism. I said Clojure is not object oriented, but Clojure is not anti-polymorphism. I'm very pro-polymorphism. I have two fundamental ways to do polymorphism. One is the Java way. You have interfaces; all the abstractions in Clojure are defined in terms of interfaces. The nice thing about doing it that way is it's a bridge to Java. If you want to allow extension via Java, Java can do that. Clojure can also do it. Multimethods are the other way. That's kind of Clojure-specific, because Java doesn't have this kind of power. But what I wanted to do was avoid marrying dispatch to types. I'm sort of tired of types. I'm tired of having these rigid type systems that really incur a lot of accidental complexity for you. Every time you make a class, you have all this complexity. You don't necessarily realize it any more. And affixing a valuable feature like polymorphism to a type system means you usually only get one taxonomy for your arguments. So what Clojure's multimethods are is a complete generalization of dispatch. What is generic dispatch? It's ... there are these arguments. I don't want to hard-wire what happens when you call this function with these arguments. I want to do something specific to the arguments, right? In Java and C#, we're going to do something specific based around the type of the first argument. OK. In Common Lisp we have more flexibility. You can use the types or the values of any of the arguments, but it's still a hard-wired heuristic, right? Unless you're getting into CLOS underpinnings, you're still saying it's either types or values, and there's an order dependency, and all this other stuff. But the reality of it is, it should be an arbitrary function of the arguments that determines what your method is. And that's what it is in Clojure. The dispatch function is an arbitrary function of the arguments. You want to have a multimethod, give me a dispatch function. I don't care what it is. That dispatch function will return different values for different argument sets. I don't care what they are. You can associate actual methods with those values. I don't care what they are either. Have at it. You want to build single dispatch, fine. You want to build CLOS-style dispatch, fine. You want to do something that compares the values of arguments, go for it.
You want to do something with the first and the third and ignore the middle guys, I don't really care. So how does it work? You're going to give me a dispatch method, a dispatch function, when you define the multimethod. When I'm called, I'm going to call it on the arguments to get a value. Then I'm going to look up and see, have you associated a method with this value? If you have, I'll call it. If you haven't, and you've provided a default, I'll call that. If you haven't defined a default, you'll get an error. (slide 41) So this is what it looks like in my silly bunny example. The key thing to remember here is, we're going to define a multimethod called encounter, and the dispatch function is: take two arguments, pull out their species, and return a vector of the two species. The trick you have to remember here is that keywords are functions of Associative things. So those are two function calls (referring to the (:Species x) and (:Species y) expressions). The species of the first thing and the species of the second thing. So we're faking types. Although it's not the type of the first thing and the type of the second thing, because you only get one type. It's an attribute of the first thing and an attribute of the second thing, which means you can have other methods, multimethods with different taxonomies. You can look at metadata, which I haven't talked about. You could do whatever you want. So in this case, we're going to pull out the species. So we encounter. If the dispatch value is bunny lion, then we run away. If it's lion and bunny, then we eat it. Lion lion, they fight; bunny bunny, they mate. (laughter) Then we define a bunch of literals. b1 has a species of bunny, and I don't care what else. I don't care what type it is. I don't care how many other attributes it has, or anything. I don't care. I shouldn't care, and I shouldn't dictate to you that it be this particular type in order to interact with my API. That has to stop, right? Let people just glom stuff on so they can talk to you, and not have to _be_ something so they can interact with you. Audience: something about map interface .. the species, but stored in a different way. ... When you call :Species on it. Absolutely. I don't care, right? The way keywords act as functions is they look themselves up in the thing, using the map interface. Does it have to be a map? No. You could have species 1 and species 2, and you could pass 2 vectors. That would work. You know, whatever you need to do. So then we define a bunch of data, and then the encounter things do what you want. I haven't had time to update this slide, but I've enhanced this system substantially by making it such that the values that are produced by the dispatch methods are compared via an isa? relationship, which will work for equality, and two things that are equal will be isa?. But it also allows you to define relationships between really any names as hierarchies. It's called ad hoc hierarchies. So you don't need to have hierarchies only in types. You can just have hierarchies. You can say colon square isa? colon rectangle. You just tell me that. And then you can do this dispatch and say, if it's rectangles, do this. And if you pass squares, it will work. Not types, right? Why does hierarchy have to be in types? Why stick it there? It's a dungeon.
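(Transcriber's note: a reconstruction of the encounter example as described; the extra attributes on the animals are made up:
(defmulti encounter (fn [x y] [(:Species x) (:Species y)]))
(defmethod encounter [:Bunny :Lion] [x y] :run-away)
(defmethod encounter [:Lion :Bunny] [x y] :eat)
(defmethod encounter [:Lion :Lion] [x y] :fight)
(defmethod encounter [:Bunny :Bunny] [x y] :mate)
(def b1 {:Species :Bunny, :color :white})
(def l1 {:Species :Lion, :color :tawny})
(encounter b1 l1) -> :run-away
(encounter l1 l1) -> :fight
And the ad hoc hierarchy he mentions:
(derive ::square ::rectangle)
(isa? ::square ::rectangle) -> true
Dispatch values are compared with isa?, so a method defined for ::rectangle will also catch ::square.)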
So hierarchy is about ... Excuse me? Audience: So it's an ad hoc hierarchy per method? Ad hoc hierarchies. No, it has nothing to do with methods. It's just ... you can leverage them here because the multimethod system uses isa? to determine what happens.
http://clojure.org/multimethods
Audience: (Some question relating to performance of this mechanism.) It's a hash table look up. So all that hierarchy stuff is looked up once and cached, and then it will track any changes to the hierarchy system. Audience: tbd Well, yeah, the overhead is a hash table look up. Audience: There's an extra function call. An extra function call, and a hash table look up. Yes. It's not bad. Audience: So if you're using the isa? test to ??? Don't you have ambiguity based on the ... Yes. Yes. You could. Let's just talk a little bit about that. I'm sorry I don't have it on the slide. So isa? works between any two names. You define what the relationships are. Then you say, what if I have a vector of these and those? Vectors do a per-value isa?, OK? Which still begs the question: there could be ambiguity. Yes, there could be ambiguity. For instance, you could use Java interfaces to do dispatch, and something could implement two different Java interfaces, so which one do you use? If I get an ambiguity, I will tell you when you first call it, and it will fail, but you can then tell me: for this method, prefer this to that. And that will resolve the ambiguity. Audience: So you actually test for applicability of every method that you defined, and then ... No, that only happens once. Once I get it resolved once, I cache that, so it's just a hash table look up by value. I see the same dispatch value, which is a result of the dispatch method, and it's a hash table look up. If I fail to find it, now I have to do the work. If I do the work and find an ambiguity, I'll tell you. If you want to resolve that ambiguity, you can, by saying, for this method, prefer this to that. Prefer this interface to that one, for instance. Or some argument set. It ends up being a pretty good compromise. Audience: So is this similar to the predicate dispatch thing? What it's missing from predicate dispatch, which I really haven't been able to figure out how to do fast, is logical implication. Right? A real predicate dispatch system should know that being less than 10 implies being less than 100. But this is not a predicate dispatch system. But it's pretty powerful, and very general. (slide 42) Clojure supports metadata. This is the ability to attach data to any value that does not constitute part of the value's value. It's _about_ the value. It's not a component of the value. So it's orthogonal to the logical value of the data. You could have a vector, and you can say, "I got this vector from Fred", or "I found it on the Internet", or "I last read it 3 weeks ago", or anything else you want to say about it. That's not part of the value, right? If it's a vector of 1, 2, 3, it should still be equal to 1, 2, 3. It's just, you found it on the Internet instead of pulling it out of the database. So it does not impact the equality semantics. You don't see it when you do any operation on the value, but it allows you to adorn values, and it's a very important thing that's extremely hard to do in most systems, which is track trustworthiness, or age, or validity, or your source, or anything else. So you have the ability to attach metadata to all the Clojure collections, and to symbols. Audience: But how is it that the namespace ... It has nothing to do with it. It's attached to values. Audience: Yeah, so for the things that are in the namespace (something about not attached to the Var) ...
Vars have a separate notion of metadata. This is really the real deal. Audience: But if I wanted to have metadata attached to one of these tbd You can attach metadata to Vars, but this example is about real value operations. Vars are not values. Vars are references. So they have separate semantics for metadata, but they do have metadata. But I don't want to talk about that right now, because it's too confusing. Audience: But can't I just do metadata in Common Lisp by saying we have an object, and then there's this hash table that maps the objects to, I don't know Everybody has to know about that hash table. It doesn't flow through your call chain. Nobody can go and inspect it. They have to play. With metadata, you don't have to play. And a real hash table where everybody had their metadata is a concurrency disaster. It belongs on the object, so it should flow. It should be copyable. And if you do that, it has to be associated with identity. These things are associated with values, right? I can have 1, 2, 3, and then say I found it on the Internet. I could have 1, 2, 3, and say I got it out of the database. What am I going to put in my global hash table? Errrr. I'm in trouble. I'd have to put the identity of these things, but I don't want an identity system. I want a value system. I can't have a hash table by value, because they're the same value. And I can talk more about it later, but it wouldn't work. Audience: Is this implemented using Java tbd ? No. No, annotations are properties of ... Not really. Audience: tbd Not everything. It's attached to the class stuff, usually. Not to every instance. Audience: It's attached to code. Yes. Audience: tbd It avoids a lot of added overhead, tbd In real systems, I need this all the time. As soon as you have a big enough system, you find you need this kind of thing all the time. I have this huge calculation stack, right? I source it with some data. I mean, I work on this election system. So I source it with some data from counties, right? But the data looks exactly the same as data that comes from precincts. Now I'm in the middle of a calculation. Do I have to say precinct map of this, county map of this? Putting it on types is wrong. It's just where I got it from. So I can flow that through. So essentially any of the Clojure data structures support attachment of a map of metadata, and you can use it for whatever you want. Clojure actually uses metadata in the reader. You can adorn your code with metadata, and you can talk to the compiler that way. In fact, that is how you give hints to the compiler. Because there is literal support for metadata. So let's look at this. We can define v. It's the vector 1, 2, 3. Then I can say trusted-v is v, same value, with this metadata associated to it. So one thing we know about with-meta is what? It's returning a new value. Right? Because this can't actually be the same v. Then we can look. This caret is the same as saying meta of the thing. The metadata of trusted-v is trusted. I mean, the source of trusted-v is trusted. The source of this thing (v), we don't have. It's great. Audience: The caret gets you the metadata of ... Caret is reader syntax for meta of blah. Audience: tbd Most of the things, like assoc or whatever, will build only on the first guys. They won't automatically incorporate. But all the operations are metadata preserving. OK, I can't really properly combine metadata. I don't know what it means. So I can't do that for you. But it's really useful.
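(Transcriber's note: the trusted-v example roughly as described on slide 42; the exact metadata map is a guess:
(def v [1 2 3])
(def trusted-v (with-meta v {:source :trusted}))
(meta trusted-v) -> {:source :trusted}
(= v trusted-v) -> true
(meta (conj trusted-v 4)) -> {:source :trusted}
Metadata doesn't affect equality, and operations like conj preserve it on the new value.)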
Audience: tbd of a sequence will take the first thing tbd Probably. I mean, when you're adorning with metadata, you don't want to have little pieces of metadata on sub-pieces of all kinds of things. It's not that meaningful. You're usually attaching it to things whose contents are going to be persistent, or you're going to be responsible for adding or removing, and those additions and removals will produce new values with the same metadata, so they are metadata preserving. That's sort of as far as I'll go, and anything beyond that you can do. (slide 43) OK, let's get into some of the meat. This is an important part of the talk, because everybody in every language is facing this. Your programs, are they ready to run on 600 cores at one time? Are they going to use them profitably? I don't think so, but that's coming. What they're basically saying is, these cores are not getting any faster. We're just going to give you more of them. Making use of them? Your problem, software developers. Hardware guys are _punting_! Punting! They've left us out here. So I want to be clear about what I'm talking about in this part of the talk when I say concurrency, because there's a bunch of different things. We're talking about interleaved execution, and now of course, there were always threads, and there were ways to do green threads and pretend, but the reality is we're getting _real_ interleaved execution now. Multiple cores means really two things are happening at the same time. We want to avoid seeing inconsistent data, or creating inconsistent data. And as the complexity goes up, as your universe of work involves more things, it gets more and more difficult to keep consistency, especially if you're using locks. There are also whole other sets of concurrency things that have to do with parallelism, right? I want to do this one logical operation, and in the background, this vector, you know, matrix multiply, just put it on all of these cores. I'm not talking about that right now. Java has a really nice library coming up in Java 7 for doing that kind of parallel math, and Clojure has a very beautiful wrapper for it. So here I'm talking about task-level parallelism. Most typical applications are going to try to leverage these cores by setting up a few threads to watch the web interactions, and another couple to deal with the database, and maybe you have some streams coming in from this data source, and maybe you have a couple of calculation engines that you're going to use. That kind of task-level parallelism is what we're talking about here. (slide 44) So, this is my message. I don't care what language you're using, except if you're using something like Haskell, or something already very purely functional. If you're using a system where you're using objects or shared data structures that are mutable, it is not going to work. It is not going to continue to work. You absolutely have to find some different way to do it. I think object oriented programs, when they get to a certain size, are just spaghetti. I've never seen one that grew indefinitely and kept being maintainable. So it's just a new spaghetti code. It's all object oriented, but it's still a disaster. It's very hard to understand a program, even an object oriented program, once you have all these relationships. Once you create a graph of interconnected changing things.
But one thing is for sure, whether or not you disagree with that: from a concurrency perspective, it's a disaster, and unfortunately, it's the default architecture of all of these languages, including Common Lisp. You create objects, you get these mutable things. And it ends up that, even if you know this, and you say, well, I'll take a different approach in my applications, doing the right thing is really hard, because it's not idiomatic. It's almost anti-idiomatic. Try to do immutable data structures in Java, for instance. Although I did. That's what I'm giving you in the library that underlies Clojure. But you look at the Java, you're like "Wow! That is nasty!" (slide 45) So what happens now? I mean, even if you're using a Lisp, one of the more cutting edge Common Lisps that has threads and everything else, chances are good they have these same old concurrency constructs that people in Java and C# are already saying don't work. So copying those is not really moving forward. What do you have? Applications have direct references to mutable things, right? And what could you possibly do in that context? It's essentially, each program is written to be imperative, like, "I own the world." Each part of the program. I own the world. When you don't own the world, you either have to stop the whole world, which doesn't work, or somehow somebody has to preserve your illusion that you see the whole world. And doing that with locks is incredibly difficult, in particular because accomplishing that is completely a matter of convention. You have to have meetings. You say this data structure is going to be looked up from multiple threads. If you're going to touch it, make sure you lock first, and then you have all this complexity. Well, what if I'm touching data structures A, B, and C? Well, we have to grab the locks in the right order, or you're going to deadlock. What if I'm only using C and B? Well, you still have to grab all three. Or maybe we'll have this global lock and our concurrency is going to go down. It doesn't work. If you're just starting to do this, just skip to the end. It doesn't work. (laughter) It doesn't work. I can save you years of grief, headache, and nightmares. Because it is a nightmare. I have been doing this in C++ and in C# and Java for 20 years, and it just is awful. You cannot ever believe that it's working. So, what do we do? Instead of that, we're going to say we're going to have indirect references to immutable persistent data structures. So this ref notion, it's kind of like Standard ML's ref notion. If you're going to have something mutable, you're going to have to call it out. This could change! All right? And then, Clojure has concurrency semantics for those references, so those references are like cells. All they're going to do is point to some bigger immutable aggregate. And if you're going to want to change those cells, because they can change, you're going to have enforced concurrency semantics. And your programs in Clojure do not do any locking. (slide 46) So this is what it looks like in the old school. A bunch of parts of the program are pointing to the same block of memory. The problem, fundamentally, is this: object oriented programming done the traditional way unifies identity and value. You want identities in your program. Like, tomorrow is going to mean different things as your program runs and runs and runs, right? So you want that identity, but the value of tomorrow, or today's date, or the time, is something that is going to change.
Unfortunately, when you say the identity of this thing is the address where we're keeping its value, game over. You can't do the right thing. Anything can change at any time, and the consistency problem goes to the users. (slide 47) So what do we do instead? We have programs refer to the identity, and the identity is that cell. That reference. And that in turn points to some immutable aggregate thing. It could be a Clojure vector or map or any of the Clojure stuff. So now we've separated identity and value. If you want to obtain the value, you need an explicit dereference. At that point, you can have a pointer to this. A hundred people could have a pointer to this. Is that OK? Sure. This can't change. It's immutable. Have at it! You want to look at it from a hundred threads, 600 cores, that's going to be great. That's going to perform well, because there is no locking required to look at the value. So we're going to explicitly dereference, and the values can never change. You could never see an inconsistent value, because they don't change. Somebody's going to make a consistent value, stick it in there. You pull it out. It's consistent. Objects don't change in your hands. (slide 48) So what does it mean to do editing? Well, we've already seen with all of these data structures, we make new values by calling functions on the old values, and getting a new value. So while everybody is referring to foo, and while foo is referring to this immutable thing, anybody who wants to could be saying, "I've got a new, better, improved foo!" and make one. It shares some structure, as we saw. That's the way these data structures work. It can share structure with the one that is already in there. That's no problem. They're all immutable. Nobody who is reading this is impeded by the creation of this new value. It goes on in parallel, and it's not impeded by the fact that anybody is reading. So nobody is in anybody else's way. (slide 49) Then when we want to edit, we want to change this reference atomically to refer to the new thing. And that is always coordinated in Clojure. There are actually multiple semantics. I haven't said how this happens, when it happens, right? You can have different semantics for that, and Clojure has three. Anybody who is consuming the value is unaffected. If I was in the middle of doing a calculation on this, I'm still fine, right? (slide 50) So the only things in Clojure that can change are these reference types. There's three of them. Refs, that's an unfortunate name in this context, which are about shared, synchronized, coordinated change. What's the hardest problem in a system with multiple threads? It's: my unit of work touches three data structures. In other words, this composite unit of work. It's very hard to pull off. So Refs do that hardest job. Change three things, and either have all of them happen, or none of them happen. It uses a transaction system. I'll show you that in a second. Agents are more of a lightweight autonomous change model, where you say, you know what, just eventually do this, and I don't care when. I just want to return right away, and I want you to take care of this job for me. Those things are asynchronous. They're also independent. There's no way to move something from one collection to another with agents. And Vars you already saw. Vars have concurrency semantics when you set them, because they're all thread independent. We saw we couldn't set it unless it was per-thread bound.
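(Transcriber's note: a minimal agent sketch; counter is a made-up name:
(def counter (agent 0))
(send counter inc)   ; returns immediately; inc runs later on a pool thread
(await counter)      ; block until the queued action has run, just for the demo
@counter -> 1
send is asynchronous and independent, exactly as described above.)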
(slide 51) So Clojure has something that's called a software transactional memory system. If you've ever worked with a database, this should all be familiar to you. You can only change a Ref, and here I'm talking about that STM Ref, you can only change them in a transaction. Any set of changes you make inside a transaction will happen atomically. Your transaction will see a consistent view of the world while it's running, as of a point in time. In addition, Clojure's STM is kind of unique in supporting consistency rules for the values that you're going to take on. So you have isolation, you have atomicity, and then you have consistency. You can attach one to a Ref. You can say: validate any data anybody wants to put in here with this function. If that validation fails, that transaction will fail. I won't take the value. You can attach those validation functions to Vars, Refs, or Agents. So they all support consistency. No transaction sees any other transaction. Essentially what happens is your transaction sees the world as if it was at a particular point in time, and when you commit, it is as if all the changes you made happened at a particular point in time. The only caveat with this system is, unlike databases, which sort of do deadlock detection and keep guys waiting, and sort of have a big lock chain thing, in STM systems typically, what happens is they're semi-optimistic, or fully optimistic. Clojure's is only semi-optimistic, which means you can proceed along down your transaction and determine, "Eh! I am not gonna win. I am not going to be able to maintain a consistent view of the world and have a successful commit." So do over, and that's what happens in STMs. You get an automatic retry. Which is great, and it is absolutely what you want, but you have to avoid doing any side effects in the transaction. I can't enforce that, but if you have side effects, and you get retried, you'll have your side effects happen more than once. (slide 52) Using this is so simple. Put dosync around your block of code. Any changes you make to Refs inside that block participate in the same transaction, including inside nested functions, including if those nested functions run their own transactions. They all join the enclosing transaction. Under the hood, Clojure's STM is pretty unique. If you've done any reading about STMs, it's a multi-version concurrency control system, which is unusual. Yes?
http://wiki.zope.org/ZODB/MultiVersionConcurrencyControl
Audience: Can you imagine mutations to Refs that aren't wrapped in a dosync as participating in just a very small transaction? No, you can imagine them throwing exceptions. (laughter) It's a very clear image, too, because that's what happens. No, that's not OK. Although you can do flying reads, because the contents of any reference is consistent. So I don't care if you read it outside of a transaction. That's perfectly fine. If you want to read a bunch of things outside of a transaction, and you only cared that each one was consistent, that's also fine. If you want to read a bunch of things and know that they're all consistent as of a point in time, even those reads need to be in a transaction.
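(Transcriber's note: a sketch of a transaction coordinating two Refs, with a validator supplying the consistency rule described above; the account names are made up:
(def checking (ref 100 :validator #(>= % 0)))
(def savings (ref 0))
(dosync
  (alter checking - 20)
  (alter savings + 20))   ; both changes commit together, or neither does
[@checking @savings] -> [80 20]
(dosync (alter checking - 500)) -> IllegalStateException
The last transaction fails because the validator rejects a negative balance, so no change is taken.)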
Audience: Sorry, I'm being stupid here. You said that users have to carefully avoid side effects. Yes, in transactions. Audience: I thought this language doesn't have side effects. Well, you can call Java, you can print, you can read a database, you can ... Audience: Yeah, I'm trying to figure out what you mean by the language being functional, and not having side effects. Any language that has no side effects can only heat up your computer, right? Audience: Yeah, right. (laughter and applause) Yeah, that's not my line, but many times redone. Yeah, of course... Clojure is a realistic language in saying, of course your programs do I/O. Of course your program may want to have identities, which are associated with values that change over time. How can I help you do that right? I can't help you with I/O. You have to just do I/O separately. I can help you completely with this data part. Audience: So if you were to send a function to an agent, it will produce a ... Ah. No, no, no. No. In fact, I haven't talked about agents yet. So hang on to that question. No, that's beautiful. Audience: tbd Oh, how do you start multiple threads? I don't care. You can definitely use Java's Executor framework, or Thread.start, or whatever. You can also use the Agents, which I haven't gotten to yet. Audience: tbd I don't care. I guess I don't want to say, yes it would be the Agents, because that's one way, but I'm perfectly happy for you to use the Executor framework of Java to launch threads. I don't care how they start. Clojure does not require your threads to be adorned, by me, in any way, for this to work. So any way you can start a thread in Java, this will work, including Agents, which I'm getting to. Audience: So on the previous slide, you put that note about don't do I/O. Right. Audience: It's kind of a footnote, but it's a pretty big one. It's a big thing. That's true of all STMs. Audience: No, but like in past STMs, the type system guarantees that you're not going to do I/O inside a transaction. Well now ... Absolutely. I think if you're in a strongly typed system, you can have more enforcement. I'm telling you that I can't, in Clojure, because it's dynamically typed. Audience: No, I'm talking about in practice, how easy is it to shoot yourself in the foot? I don't think I'm doing I/O here, but in reality you are. It's the same as anything. I mean, this is a dynamic language, so you'll get run time errors. You'll start seeing things print three times and be like, hmm, Oh! I'm in a transaction. I got retried. And if you're in a dynamic language, you're used to dealing with problems that way. There's no enforcement. I can't enforce it. I'm not anti-enforcement. There are languages that try to enforce it. They come with very intricate type systems that are not yet expressive enough, in my opinion. But that's another way, and if you do that, you could do more. I would love to try to add just enough of an effect type system to this to help with that, and that's something that I'll probably research in the next couple of years, but right now it's on you. It could happen. OK? So, this is pretty cool. This makes sense from a database standpoint. This is a particularly unique thing about Clojure's STM, which is: what happens in an STM? Well, there could be multiple transactions going on that both want to increment this counter when they're done. In most STMs that means that one transaction is going to win. The other transaction is going to retry, which means doing all their work again in order to get in there. But it ends up that incrementing a counter does not need to be a read, modify, write action. You could do it that way. You could say: look up the counter. It's 41! Add one. It's 42! Set it back into the Ref. It's 42.
Now if I do that in a transaction, and somebody else does that in a transaction, one of us will win and the other one will redo. But, if you have something that I call commute, you could instead say: You know what? I don't care what value is in there. All I care is that when my transaction commits, it's one more than whatever it is. If you have a way to say that, then both those transactions can get all the way to the end, and one could win, and the other could win. They both can win. Neither one needs to retry. So commute is a beautiful facility. It's actually ... it was an old database thing called field calls, or something. Great idea. And it ends up that you can now build systems with even better concurrency, because if you only need to end up adding something to a map, or end up incrementing something, you can use commute instead of read-modify-write. (Field calls are described in the book by Jim Gray and Andreas Reuter referenced below.) Audience: So are you going to give an example of the syntax of commute ... (slide 53) Sure! (laughter) OK, so what does this look like to you? We define foo to be that map. (def foo (ref {:a "fred" :b "ethel" :c 42 :d 17 :e 6})) So we have a map, but oop! It's a reference to a map, which means that if we want to see the value, we have to dereference it. This (@foo) is just reader syntax for deref of foo. @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} So just like quote, there's deref, meta, there's a couple of other ones that keep me from too many tedious parens. So we deref foo. We see that value. Notice that it's in a different order, because it's a hash thing, so it's not necessarily what I typed in. I can use that value in calculations. This is a flying read: (assoc @foo :a "lucy") -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6} Read the value, associate this thing with it. Have not changed foo, right? I got the value out. It was this immutable thing. I did some logic on it. I got a new value. This is just that value. Nobody put it back in foo. So that's totally fine. I look at foo. It's the same. @foo -> {:d 17, :a "fred", :b "ethel", :c 42, :e 6} I haven't really changed foo. Now, I can try to commute foo. The way commute works is it says commute, this reference, with this function, and these values. Well what will happen is that function will be passed the current state of the ref, and those values, and the return value of that function will become the new value in the ref. It's beautiful. There's also alter, which is the read, modify, write. But commute is the one that says, whenever I'm done, do this. So it's nice. It's a functional interface. You say apply this function to the value that's in that reference, and the result of that function becomes the new value in the reference. So we do this: (commute foo assoc :a "lucy") -> IllegalStateException: No transaction running Errr! Right? We asked before, can we just do this with a single guy? No, you can't do this anywhere that's outside of a transaction. This isn't OK, because we didn't start a transaction. This is one in a transaction: (dosync (commute foo assoc :a "lucy")) Start with dosync. I do that. I could have done a bunch of things inside this dosync, but this is just one. Now this works, and when we dereference foo, we see the changed value: @foo -> {:d 17, :a "lucy", :b "ethel", :c 42, :e 6} It's pretty easy. It's very easy. And the nice thing about it is, you end up reusing a lot of the functions you already wrote. I could test this function right now. I just did, right?
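(Transcriber's note: that is, since the update is just a plain function, you can exercise it on plain values with no Ref and no transaction involved, e.g. (assoc {:a "fred"} :a "lucy") -> {:a "lucy"}.)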
In fact, I could do it with any piece of data I want. Because you're applying these functions inside transactions, you can test these functions on values, not burden your whole system with mock objects, and all this other nightmare stuff. Test your functions. You have some data, you are going to apply this function, it's supposed to do this transformation and return this result. Do all that before you get involved in references. There's nothing about that that involves references. Then you say, oh, I want to store these things in references. OK. (slide 54) Audience: Are there any requirements that the function actually be a function, and not write any other references inside of it? No, you can write other references. They'll all participate in the same transaction. Audience: If I say commute it, then it will ... Same thing. You can commute some references in a transaction and alter others. It's totally fine. Audience: It seems like there is a way I could shoot myself in the foot. No. Completely not. Because all those things participate in the same transaction. They're either all going to succeed, or none will succeed. Audience: So, if I'm commuting, and the function in the commute does some other random write to some other reference? Oh, the commuting function. No, the commuting function has to be pure. Audience: Can you check that? No, I can't check it. I'm a dynamic language. That ... Audience: You could check it dynamically. No. No. That's an interesting idea, though, checking it dynamically. I guess I could do that. Yeah, I could do that one. Audience: tbd It's already in the library, sure. Right. So you say commute x inc, and that will happen when it commits. Audience: Question about what happens if two transactions that do "commute foo inc" commit at the same time, and they seem to be leading up to wondering whether it is possible for both to read the current value, do the inc concurrently, and both write back the same value, ending up with 1 more than the original, instead of the desired two more than the original. They don't commit at the same time. What ends up happening is commutes get all the way through, right, any transaction that involves a commute happens all the way through to commit time. Right, now typically, if it's read, modify, write, you'll see the conflict. Two guys tried to modify foo. One will win. The other one will go back. With commutes what ends up happening is, two guys try to commute foo, the second guy in will wait. This guy will finish, he'll go. So he'll do 42, he'll do 43. Audience: How do you get 43? It read that value. No, no, no. Whatever happened inside your transaction in a commute, is a temporary value. It's just to make you feel good. At commit time, whatever actual value is in the Ref is going to have that function applied to it then. Audience: Right, so then it has to read from this ... It's going to happen twice. It's going to happen twice, and it's a pure function, and that's the cost. Audience: OK. So there's no difference between doing that and redoing the transaction, right? Well, you could have done a hundred other things, though. Let's say you have a really complicated transaction, and it's completely independent from this other one, but both of you have to increment the same counter. Well, if you're really doing that with read, modify, write, you've made both of those transactions completely restart in conflict with each other. With commute they're not.
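(Transcriber's sketch of the two-transactions case: run concurrently against the same illustrative counter Ref, both of these can commit without restarting each other, because each increment is re-applied to the Ref's actual value at commit time; with alter in place of commute, one of them would have to retry.)

(dosync (commute counter inc)) ; transaction on thread 1
(dosync (commute counter inc)) ; transaction on thread 2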
Audience: So you're only going to do a small part of it again. You do whatever you like. I'm just saying you have this facility and it has this property, and you can leverage that in your designs to get more throughput. Audience: So you could only do I/O conditionally, say No. Audience: I'll only do I/O if I succeed. No. Do not. No side effects. I'm going to give you a recipe for that in one second. Audience: If my ? is down, and I call alter early in a transaction, right, and you commute foo, does that make the That order will work. The other order will not work. I will track when you try to use that commuted value, which is really just a junk value to make you feel good. It can be useful. For instance, if you commuted something that was empty, your transaction would see that it had something in it. If you didn't care about the value that was in it, you can at least see that non-emptiness, which is important sometimes. (TBD: I'd like to see an example where getting this junk value back from commute is useful. It seems to me perhaps nice to have the option of a different variant of commute that only calls the function once at the end of the transaction, not twice.) Audience: If I commute first, and then try to alter it, that's You're going to get an exception. Yes. Audience: Is there a way after you commute to find out in which order that, like after the fact. You can go back and read. Not otherwise. Audience: This has all been worked out in the database world for a long time. It works. It's very cool. Except software transactional memory systems, the ones you read the papers about, don't work like this. Audience: No. No. I said in the database world. Yeah. This is very much database driven. MVCC and everything else is really inspired by like Oracle, and PostgreSQL and that database stuff. Yes. This is a very database-y kind of thing. All right, let's talk about Agents. So Agents are for managing independent state. We saw transactions are super powerful. They let you change more than one thing atomically, which is very powerful, but it has some overhead to it. Sometimes you just don't care. You want to make a system where you have loosely coupled processes and they just need to communicate with each other. So Agents are a way to do that. If you've read about the actor model, or whatever, Agents are sort of similar, but they're not the actor model. In particular, the actor model requires messages to read the values. Agents do not. So you get independent state. You have actions, which are, again, just functions: you say, apply this function to your state at some point in the future. It will happen in a thread pool. I don't care when. And that will transform your state. The sends happen immediately: you call send or send-off, you pass this function, you return right away, and then some time later, that will happen. And it ends up happening asynchronously in a thread pool. The system makes sure that only one thing is happening to an agent at a time, and it makes all kinds of other great promises. (TBD: Can send/send-off block if there is no memory available to store the message, e.g. if they build up a backlog because they are being sent faster than they are being consumed?) (slide 55) You can dereference Agents and see their values, but remember, all kinds of stuff is happening at the same time. You have to start programming that way. If you're still programming from a perspective of, the world has stopped any time I want to look at it, you're going to have to change that.
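(Transcriber's sketch of the send/deref shape just described, against an illustrative counter Agent; Rich's own slide 56 example is described below.)

(def counter (agent 0))
(send counter inc) ; returns right away; inc is applied later in a thread pool
(await counter)    ; blocks until actions sent from this thread have run
@counter -> 1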
When you have 600 cores, you have to start having looser ideas about how much of the world needs to be consistent in order for you to make decisions. Agents let you do that. What's neat about this is if you send actions during an action, they'll be held. Also if you send actions to an Agent during a transaction, they'll be held. So they will happen once, and once only, when you commit. This is your way to do side effects, because Agents can do side effects. So you're doing a transaction. As part of that transaction, you send a message, "I'm done! Woohoo! Tell somebody." And that will only go out when you commit. So it's kind of a cool thing that those two systems work together. Audience: Does Clojure itself manage the (something about thread pools) Yes. There are actually two ... Audience: tbd No, not so far, but I probably need to give you some more control to say this set of Agents lives in this pool. I don't have that now. Right now there are two pools for two different kinds of work. And as I said, they're not actors, and the actor model is a distributed programming model. The Agent model is only a local programming model -- the same process. (TBD: Same process means like a same JVM process with a common address space for the sender and the Agent? Why is that a restriction?) (slide 56) So what does this look like? It looks a lot like the transactions, which is a good thing. We say def foo to be an Agent with this map value. We can dereference it. We can see what's in it. No problem. We can send foo. Look how similar this looks to commute blah, or alter blah. So send foo, a function, and maybe some arguments. That function will be applied to the state of foo. The arguments will be passed to it as well, and that will become the new value of foo. If I send this, and immediately go and look, it may not have happened yet, right? It happens asynchronously. await will make sure anything that has been sent from this thread has already been consumed. Then if I look at it, I will see the change. So this is a powerful way to do asynchronous programming. And this is a way to launch threads. There was a question about how do you launch threads. Well, this does things in threads. Audience: tbd Effectively what happens is it puts a message in the queue, and then waits for it to be consumed. It's a way to allow you to coordinate. I want to make sure you heard everything before I tell you more. But other people could be talking to the same agent. (slide 57) OK, this is the last leg. You guys are doing great. Oh, look! We're good. (laughter) Well, we are pretty good. I'd like to have enough time to have some good talk. So I'll try to get through this quickly, to leave time for any lingering questions. Now you'll see all of Clojure, maybe you can put it together. So Clojure is specifically hosted. I haven't shown you much Java stuff, but again, the integration is not a mere implementation detail. I want to be able to call Java. I want to be able to interact with it. Java is my low level language. And the nice thing about a language like Java being your low level language is it actually shares the same GC, and memory model, and threading model, and everything else. Think about all those issues you have when you try to start doing multi-threaded programming, or even just GC. The interactions between GC and calling out to C. It's a nightmare. Yeah, I'm sure people are like uuuuuhhhhhh. You know, it's _bad_. So JVM is my runtime. I can't say enough about how high quality the infrastructure is in these JVMs.
You have a wide range of choices. Sun has their thing, but you can also get JRockit, and IBM has excellent VMs. I mean this is a really mature platform with a lot of competition. A lot of extremely bright people working on it. This is excellent technology. As goofy as Java is, the JVM is stunning technology. You cannot ignore it. These just in time compilers and these GC systems are the state of the art. Clojure allows wrapper free interaction with Java. A lot of languages that are hosted on Java, they put their own classes around every Java class. There's none of that. You call Java, you get a Java call. Of course the other advantage of being on Java is there are libraries for absolutely every single thing you could ever possibly want to do. They may not be the best, but they're there. Some are better than others. I'm not saying Java is perfect, or anything like that. But there will be choices of libraries, in every domain. (slide 58) So what is the integration, what form does it take? There's as much unification as I can possibly make happen. For instance, Java Strings are immutable. Pfff! That's perfectly fine for Clojure strings. They're immutable. They're Unicode. We're done. Clojure strings are Java Strings. The numbers I use are the boxed Java Numbers. The only thing I had to add to that set was Ratio. Otherwise, I used the, you know, capital-I Integer, and things like that. All my collections implement the Collection interface of Java, so if you want to use them from Java, you can pass them. If you have a thing that uses Collections, or Iterable, they're all Iterable. All my functions are Callable and Runnable. So if you want to use a Clojure fn, a Clojure closure as a callback for Swing, you don't have to do anything. You just pass it, because it's Runnable. And I showed you before, the Clojure sequence library works with everything Java. You know, Iterables, Strings, Arrays, the works. You can implement Java classes and interfaces in Clojure, which is nice. It keeps you from going there. In addition, I do support primitives. Java primitives as locals. So if you have a core piece of code that's, you know, your inner loop, where you want to max out the performance, you can make a few little type declarations and say this is an int, this is a long, this is an array of floats, and you'll get the Java primitives that correspond to that. In addition, when you start doing math with those things, you'll get unboxed primitive math. And that doesn't mean you'll necessarily get unsafe math. So there's two different levels. There's do I have the boxing, which you can turn off with the declaration, and then you're going to have to tell me explicitly, I want bad, you know, plus, and I'll give you bad plus. And it turns out that the boxing is much more overhead than the overflow detection. (slide 59) So this is what that looks like. You can access static members directly. It looks a lot like that namespace syntax, doesn't it? Because that's really what static class members are. They are things in a namespace, where the namespace is the class, so I just unify that. You can call methods. This is a macro, dot dot. It says put dots in between everything. As many as you want. (laughter) System dot get properties dot get Java version. And it ends up, this has no more ... it has fewer parentheses than the Java version. You rapidly have fewer parentheses than the equivalent Java. Anybody from Java is like, "Oh, Lisp! Parens!" You just show them _this_ one, right?
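(Transcriber's note: the slide text isn't captured in this transcript; reconstructed in Clojure syntax current as of this writing, the two forms being read aloud here and just below would look roughly like this, with JFrame and JLabel assumed imported from javax.swing.)

(.. System getProperties (get "java.version")) ; System.getProperties().get("java.version")

(doto (new JFrame)
  (.add (new JLabel "Hello World"))
  (.pack)
  (.show))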
Doto a new JFrame, add a new label with hello world, and then pack it, and then show it. It has a third of the number of parentheses and lines, well, it has no lines, one line. You can make Java completely easy. And interacting with Java stuff, if you use the sequence library, you don't even know it's Java. You totally don't care. Things like this. This doesn't look like Java, right? This looks like Lisp. into a map, filter, this is shorthand for fn, so this is like the most concise lambda expression you can do in Clojure, do this regular expression find across, ooh! This is actually Java! This is the only Java part of this. But it's just transparent. It doesn't turn your Lisp code into Java code. It turns Java code into Lisp code. That's the beauty of it. So don't be afraid Clojure is Java with parens, because there have been Lisps for the JVM that went that way. (slide 60) This is what it looks like to do Swing. Some imports. You know, it's Lisp. This is a callback, right? We're proxying. This is a one-off implementation of a Java interface. It's an anonymous inner class, effectively, that you would use in Java. And again, this is so much shorter. The Java for this is three times the size. (slide 61) I really like the JVM. It is a great platform. It is a serious piece of technology. My main complaint is lack of tail call optimization, as I've said before. I'm hopeful that it will get that. I also can't say enough about HotSpot, and I'm using HotSpot, which is like a Sun branded thing. There are equivalent ones from IBM and from JRockit, and whatnot. But these JITs are amazing. For instance, I talked about that primitive math. Actually Clojure doesn't emit any of the primitive arithmetic byte codes. If I can get to a static method call on primitive types, those JITs will finish the job, (a) emitting the byte codes for Java arithmetic, and (b) turning that into the right instructions for the CPU. In other words, I only have to get as far as a method call, and these optimizers get me to optimized machine code, which runs as fast as C, and often faster now, because they're doing run time profiling. There are things you can optimize only by looking at a running program, and that's the way these optimizers work. As I said before, ephemeral garbage is extremely cheap. I leverage that. You should, too. If you're not aware of it, it's great. And there's so many other features. There's the verifier, the security, there's a lot of... the ability to load code over the wire. Java has a lot of stuff under the hood. (slide 62) So the benefits are, you can just focus on your language. If you're looking to write a language and target this. Of course, I recommend just starting with Clojure. But for me, it really let me focus on the language. The infrastructure was great. Sharing with the implementation language, especially these kinds of things: GC, big deal. You get tools for almost nothing. I emit the correct stuff, the correct debugging information, you can debug Clojure, step debug, breakpoints, the whole works. I mean, I'm a LispWorks user. That stuff only came a couple years ago to LispWorks. You have excellent debugging, excellent profilers, and your choice, all work out of the box with Clojure. And then there are libraries for everything. Clojure users were instantly doing all kinds of stuff I had no ... I couldn't possibly imagine in advance. They were just talking to databases, and doing net kernel stuff, and big document processing, map reduce, Hadoop, lots of cool stuff.
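(Transcriber's note: the sequence-library-over-Java one-liner narrated above was along these lines, a reconstruction rather than a copy of the slide; the call to System/getProperties is the only Java in it.)

(into {} (filter #(re-find #"java" (key %)) (System/getProperties)))
; -> a Clojure map of just the system properties whose keys mention "java"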
There's also excellent infrastructure on the JVM for implementing things like the STM. This java.util.concurrent library is really good. (slide 63) There's a lot more to Clojure. It has list comprehensions, and the ad hoc hierarchies I mentioned before. With type hints you can get Clojure code to run exactly the same speed as Java, so Clojure's fast. I hinted at the relational set algebra, and the parallel libraries for doing parallel computations -- automatically leveraging all the CPUs to crunch big arrays. And namespaces. There's zippers, like Huet's paper, XML, and a whole bunch of other stuff. (applause) Any questions? Audience: So what does generic plus turn into in Java byte code, and then into, in practice, in execution? Generic plus, it turns into a Java type test quickly, and then a branch to, in my library, individual types, which implement logic for that type. They will do automatic promotion to the bigger type. Audience: But how bad is the type dispatch in Java byte codes? The thing is that these optimizers optimize a lot of that stuff away, and in fact they really can take something where you haven't hand coded a fast path, and they'll chop it out, and they'll stick it over here, and they'll say, well, let me try running that in all cases, and they'll have a little very inexpensive test they'll use to see if it's still the right way to go, and then it'll slow path the others. So I couldn't say, because each JVM does different things, but it's pretty fast. The main optimization required moving forward is something that they call escape analysis, where if you new up a boxed number, and immediately consume it, they should be able to completely avoid that allocation, and that is something that is just coming into the next generation of optimizers for the JVM, so that will help all of these things. And in addition, like I said, if you really care, and you need to go right to the metal, you say int x and you're done. No dispatch, no nothing. Audience: Could you describe your development environment when you're writing in Clojure? Do you use a debugger or something to debug ... I use Emacs. There's Emacs plugins. There's SLIME. There's all kinds of cool stuff already ported for Clojure. There's an Emacs mode, there's a VI mode, which is not as sophisticated. There's an environment for NetBeans, a plugin for NetBeans, which has syntax highlighting and everything else. That has the best debugger support right now. The NetBeans plugin. But you can use any debugger. You can use JSwat, or jdb, or profile with YourKit, which is just a gorgeous piece of technology. Really, all the tools work. But in terms of editing, I edit Java code with IntelliJ and I edit Clojure code with Emacs. And I don't debug. (laughter) Audience: I was just about to ask, is there a Clojure language level debugger? No, I'm not going to do that. Because these debuggers are so good. Oh, these debuggers, when debugging Clojure, debug at the Clojure language level. When I'm setting breakpoints and stepping, you're stepping through Clojure code. You're setting breakpoints in Clojure code. Clojure emits a mapping file, and line number information that's necessary for Java debuggers to say, line 42 is over here in the Clojure file. Audience: But if I'm typing away at the REPL, and somewhere down inside somewhere I take the car of 5, and it says, no you don't ... Attach a debugger. You can attach these debuggers ... you don't have to be in debug mode in Clojure.
If you start it with the debug API alive, you can attach a Java debugger to it whenever you want, including right then. But I'm not going to reimplement Common Lisp style debugging, because this stuff is ... You know, it's not the same, but it's really good, and I don't have time for the other. If somebody wants to do it, they could, but. Audience: You said you've also programmed in C# Sure. Audience: tbd the .NET tbd Yeah, they don't always do it, though. Audience: Well, it can always do it later. Yeah, well later is still ... I'm waiting for later, too. Audience: Is there anything in Clojure, other than STM stuff, which tbd No, I started Clojure for both platforms. Clojure ran on both platforms for a long time. I just got tired of doing everything twice. I also wanted to make a decision about what will library authors do? How will the library for Clojure grow? And as much as I'm expert enough to do C# and Java, and many other people are, everybody isn't, and it's a big thing to take on having to do everything twice, and do the least common denominator, and everything else. So I abandoned this C# port maybe 2 years ago now. It certainly would be possible to do. Yes. C# has everything you need to do this. Audience: How long have you been doing this? Three years. It's been out since last October, so it's coming up on a year. It's really taking off. I mean, I have a big group of users. I have over 500 people on the mailing list, and we get 5 or 600 messages a month, and I have thousands of SVN hits a month, so I think I have a couple of thousand users. Audience: What would you say is the most interesting tbd Well, you know, I don't hear from people unless they have a problem. (laughter) So I wish I knew. I do know there's a couple of companies using it, and it's sort of like their secret weapon thing. You know, they do document processing and some other stuff. I've seen database work. I've seen interesting graphics work, and XML things. There's some really neat stuff. Like there's a zipper library which allows you to sort of pretend you're manipulating something in a destructive way, when you're really not, and producing a changed version of a data structure. And somebody wrote a really neat, XPath-like way for processing XML functionally with that. Audience: How large is the runtime? Excuse me? Audience: The runtime. How large is it? 400K. Fits on a floppy disk. If you could find a floppy disk. And half of that is the ASM library that I use to emit the bytecode. Clojure is about 250, 300K. Audience: You've listed Common Lisp, Oracle, and Standard ML as influences. Anything else? Sure. Haskell, and all the STM work, and basically anything I could get my hands on to read. A lot of database stuff went into STM. That old Andreas Reuter book, or whatever, which will make you really not ever want to use a database that is not MVCC ever again. Really whatever I could get my hands on to read. I like to read. "Transaction Processing: Concepts and Techniques", Jim Gray, Andreas Reuter, 1992 Audience: tbd Clojure versus Scala. Well, Scala is statically typed, and has one of these sophisticated type systems. I think Clojure is more strictly functional, because it defaults to immutability, and defaults to ... the data structures are these persistent data structures, but it also has a recipe for concurrency built in.
There is some library work for that, for Scala, but Scala is really still a multi-paradigm language where the objects are still those old share-the-reference-to-the-block-o-memory objects, and I think that is doomed. Audience: So you mentioned running on these Azul 600 core boxes. Yeah. I highly recommend it. It's so much fun. Audience: So does the performance actually scale fairly linearly, or Well, he found it did on this problem. Of course. You know the whole thing with STMs -- everybody is like STM is this holy grail, and when you try to say something is the holy grail, it's going to fail, because nothing can do everything. You still have to architect your programs. You still have to have some sense of what is the contention profile in my program. You still may run into performance bottlenecks that you have to solve by changing the contention profile of your program. I mean, it's not a silver bullet. What it is, is a way to write something that's correct easily. And the tools for doing that. But there's still going to be work. Audience: So the moral is ... I was being sort of idiomatic here. You can probably add 100 times more cores than you can add ... If you've ever written programs with locks, you can be wrong on two cores right away. Audience: You can go to one core! You can be wrong on one core. It's so easy to be wrong. It's so easy to be wrong. Audience: Could you explain your logo? My brother did it for me. It's pretty. It's a little yin-yang, a little lambda. Yeah, he's good. Audience: In Haskell, it seems like every time a Haskell programmer runs into a speed or space issue, he goes through and makes everything strict. Have you found that in Clojure? Have you gotten around it? Is it just not happening? It's not the same thing. Because in Haskell, everything is lazy, so that's a big difference. When everything is lazy, it's very difficult to tell when you're going to have this explosion in memory requirements. And that's the problem people run into with Haskell. With Clojure, the laziness is limited to these sequences. And in particular, I'm very careful that inside of a sequence operation, as you move through, which is usually what you're doing -- usually you're just -- maybe you have a concatenated set of operations. But you're walking that window through your data set. I make sure, by cleaning up the stack, that you don't retain the head. It's easy to retain the head accidentally in Haskell, in which case, as soon as you've realized the thing, you've realized the thing. As soon as you've walked it, or touched it, you have the whole thing. In Clojure, you walk it, you don't have the whole thing. In fact, it can erase as you go. I mean, it can get GCed as you go. So I haven't seen that be a problem. In fact, I think the laziness is a real attribute in the low memory utilization of Clojure programs. Audience: How much of Clojure is written in Clojure, and how much is written in Java? It's about 15,000 lines of Java and 4,000 lines of Clojure. Audience: tbd Absolutely. In fact, when you define a Java class in Clojure, you end up with a Java class that is extremely dynamic, much more dynamic than a regular Java class. In fact, all it is is a set of stubs that hook into Clojure Vars. So you can, no, it's not funny. It's very powerful. (laughter) You can write Java classes in Clojure that you can fix while they're running. You can fix the definitions of methods. You can also write transactional Java objects in Clojure.
You can write Java objects in Clojure whose data can only be changed inside transactions. It's cool. Audience: I was a little surprised by your answer about the source code being 3 times more Java. Are you planning on moving towards a system that is more bootstrapped? Well, you know how it is when you have something that works? (laughter) There's a reason that the data structures are written in Java, and the interfaces are written in Java, and that's so that consumption from Java is straightforward. Clojure does not yet have all of the mechanisms in Clojure's Java-emitting syntax to do the lowest level, like field type declarations you need for the highest performance classes, and I'm not sure I want to go there, because it'll end up being Java with parentheses by the time you have private, protected, whatever, and all the declarations you need to support exactly what you would do in Java. So, the biggest thing that's in Java, that probably I would like not to be in Java, is the compiler itself, which is about 4,000 lines of Java. And people have asked about bootstrapping that. That certainly would be possible to do. It's the old "it's working right now" kind of problem. And I'd rather do new things. So I don't have a big thing, and I know it's like sort of a chest beating proof of whatever, and I really just don't care. (laughter) Audience: Could you do a mobile subset of Java? Some people have tried to do that. Unfortunately, even 450K is a bit much, but one of the other challenges is just that I do need a fair amount of Java concurrency mojo. Like I only go as low as Java 1.5, because I need a lot of the java.util.concurrent stuff to support some of the things. So that's definitely something I'd like to do. Another thing that's tough on the mobile phones is dynamic code generation. The byte code loaders, and some of the things are secured that way. One of the things on my to do list is ahead-of-time compilation to byte code, which will let me target some of those static platforms, like Google Android. Audience: There's a standard problem with promises in Scheme, where it makes you streams moderately high when you filter out all of the integers, numbers that are ??? 10 billion, and then you try to get one of them. If the promises were implemented naively, you'd get a string of 10 billion promises ??? Do you do the appropriate trampoline? I don't do it that way. No, that will walk through -- you'll have to walk to get to 10 billion, but it's going to throw everything away on the way. Audience: So ... So it's not a chain. There's no realized chain. If you haven't hung on to the first thing, you don't have it. If you say drop 10 billion from a cycle, whatever, and then take 3, you'll have the head of that 3, and you won't have 10 billion of anything. It's not a chain of promises. Audience: So there's a trampoline. No. It cleans up as it moves. That's the trick. What happens is, there are suspended expressions, yes? In the first, in the head. When they're realized, they're discarded. Now the realized versions are cached. Now we move to the next. Now we have new expressions. Now we evaluate those. Now the expressions are let go, of the value that they are. In addition, if this was held on the stack, and this is a nested call, or some reduce or something like that, no one else is looking at this, we erase the reference to this, boom! What's left? One guy, 2 values. No lingering anything. So no is the answer. It doesn't build up. Audience: tbd No, I can't talk about Haskell's implementation.
All I can say is, it is a known problem with some Haskell programs, that the laziness has these interactions that can make it difficult to predict the memory usage of your program in advance. I'm saying in Clojure that's not the case because the laziness is limited to these sequences, and I've just told you their behavior that makes it so that they don't accumulate anything that you haven't asked to hang onto. If you hang onto the head, and then go 4 billion nexts, or rests, well you're now holding on to a 4 billion item thing. But you didn't have to do that, and I won't do that accidentally for you. Audience: ??? That's kind of a weak spot in ?? Yeah, it is. There is hope. Right now I have the same as Java does. Arrays of arrays of arrays. Multidimensional arrays are arrays of arrays. But I've seen some cool stuff in IBM's X10 language, where they take a giant array, and then they pretend it's multidimensional, even though Java doesn't support it, and they have some very neat syntax for using little tuples to do subranges and to do multidimensional indexing, and I was inspired by that, and maybe if it comes to it, I may copy something like that. That is the neatest I've seen that gets around those problems. Audience: ??? Datalog. Anybody know any good Datalog implementations? Yes, I'd like to add some ... Where Clojure is going is I'd like to add some declarative data access stuff. Not really Prolog, because it's not that great for database stuff, but Datalog, which is sort of a subset of Prolog, for which a lot of research has been done in optimization, would be a good mini-rule system plus data access that's pretty. You know, it supports recursion and things like that. So I'm still researching whether I'm going to roll my own or use Iris Reasoner, which is a Java one that is open source. Audience: ??? Maps. Yes, there is actually something called a struct in Clojure, and it's an optimized implementation of map, where you say, I know I'm going to have 10 million of these things, that are going to have fields a, b, and c. So repeating a, b, and c in all of those little maps is kind of a waste of time. So I can make that go away. But it still implements the map interface, and for all intents and purposes, it is a map. You can still add extra stuff to it, just like you do. You can still assoc them. So it's nice. It's the best of both worlds. Audience: And for multiple values do you ... Multiple values, you can do whatever you want, but I don't have them on the stack. I can't do them. Clojure is stack compatible with Java, so there's no multiple returns. Audience: Is there any interaction that we should know about between metadata and transactions? No. No, I mean, like I said, most of the transformative functions of values will be metadata preserving. And you're just going to pass a function to the transaction to say modify the value with this function, which will be something like assoc, or something you wrote based on assoc. It will still preserve metadata. Metadata is very neat. You can do multimethods that key off metadata, so do this if it's data I don't trust. That kind of polymorphism is something we need, and doing everything with types really has limited us. Audience: So I read that at the JVM summit there was some talk about these dynamic languages. Did you learn anything interesting about these I learned that those guys need to learn a lot more about languages that aren't a lot like Smalltalk.
(laughter) No, I mean their first cut is to try to support the popular languages there, which are JRuby and Jython, which have very similar object models, so they have a distinguished receiver, and stuff like that in the MOP, and I did try to tell the guy, look, just move the parenthesis over, and let's just realize that it's all functions. Can we just do that? There's really no such thing as methods. Audience: You made a note on the discussion board about how, basically, having all these defns, ? generation wasn't a big deal for Clojure. Can you elaborate on that? Was tbd Audience: Yeah, so you do a defn and it gets implemented as a class, right? Oh, the classes, right. And there were some of these languages that need to generate tons of classes in order to, like for instance do these wrappers. I said Clojure had wrapper free access to Java. A lot of these languages don't. In addition, a lot of these languages have, like Ruby and Python, they were written by guys who were writing something fun for themselves. They weren't thinking about compiling those languages when they wrote them. Now these Java guys doing Jython and JRuby are trying to compile them, and it's very challenging. So one of the things they're trying to do is polymorphic inline caches, and all that stuff, and when they generate any code in Java, you have to generate a class, because that's the unit of code. So they are doing a lot of ephemeral class generation, and classes have an impact on this PermGen. Clojure really doesn't do any of that. Audience: tbd Yeah, which is not that many ... unless you really need to fix that function a lot. (laughter) You should be OK. Audience: What if you were doing genetic programming? Well, you know, those things are all compiled, so even if you see fns, that's not like a new class at runtime. It's a class at compile time. See, that's the stuff that I'm talking about. Yeah, if you were writing genetic programs, and you needed to do a lot of evaluation of function definitions, sure, you would get into PermGen issues, yes. Audience: Any advice or hints about what sort of applications do you think would shine in Clojure? ? the Lisp side ? I think anything. I mean, right now, if you want to write a program ... Clojure is designed to be useful anywhere you would use Java, and Java's being applied, you know, people already asked about phones and these 600 CPU machines, so it's a wide range. It's the same question for Lisp. I mean, what is Lisp good for? Pretty much anything. Audience: What would be dramatic? What would work really well as a 600 core machine tbd Anything that you can think of a way to parallelize. I mean, by the time you get to 600 cores. I actually think most applications will never be able to leverage 600 cores. There will be certain kinds of things that can, and I talked about that task parallelism. Most applications will be able to number their tasks in the low dozens, I think, really. So the job of those mega machines will be to run multiple of those applications. The main thing for Clojure is you can write that multiple-dozens-of-tasks parallelism without error, and in terms of application domain, every application I've ever written has needed to do that, and struggled to get it correct, so I think it's good for a wide variety. I wouldn't want to pin it down. Audience: Can you talk about your STM implementation? For instance, tbd No, it's a runtime thing, a transaction. Audience: So how does that work? It's a thread local thing that we're in a transaction.
So a transaction is going to be in a thread, and then the transaction starts in the thread, and there's a transaction object. References, all interaction with references, are going to test for the presence of that object and make sure that you are in a transaction. That's how we got that exception before. Audience: So you have to go through references. You have to go through references. That's the beauty of this STM. You have to have identified the things that can change. Either they're in an Agent, they're in a Var, or they're in a Ref. (some people talking at once that are difficult to understand) Audience: That's where you track the writes and reads in order to Correct, except I do not track reads. Clojure's STM is unique. It does not track reads. Audience: Oh, because you're doing MVCC. Right. It's MVCC. Audience: What about the condition system? You know, restarts ... One description I like of Common Lisp's condition system is a chapter in this book: "Practical Common Lisp", Peter Seibel, 2005 You can read that book on line. The chapter on the condition system is here: http://gigamonkeys.com/book/beyond-exception-handling-conditions-and-restarts.html You know, I looked at that. I mean, I actually think that the condition system is a hard coded special case in Common Lisp. That if you had some more general primitives, the condition system would just be one thing you could implement with them. In Clojure, you have two things you can utilize to help you with those kinds of problems. One is the exception system, which allows you to travel up and down the stack. The other is that dynamic binding of functions. You want to have a function. I'm going to bind this function to something that you should call if you have a problem. Well now you've got that ability to call some code from above down low to see what you should be doing. Combining dynamic functions and exceptions gives you some of the condition system. Audience: But it doesn't give you restarts. Audience: You can, once you throw an exception. But you don't throw an exception. That's what I'm saying. Let's say you're somebody who needs to talk to the file system. And you say, if I have a problem talking to the file system, I'm going to call "I had a problem talking to the file system". Somebody above you can bind that to do whatever you want. You're going to call that and use the return value, and not throw. Right, and so I think that gets you most of the way there. Dynamic rebinding of functions, plus exceptions, I think, is enough. I like the condition system, but it seems to me to be a hard wired special construct, rather than two ... Audience: It's just throw with some tbd Right. Audience: Did you do ??? for that to be rebindable, or is the important thing just the presence of dynamic variables? I've looked at whether you should, instead of having everything be potentially dynamic, you call out the things that you're going to allow to be dynamic. And the problem with that is, you don't know. Now your program is running, and you say, "Aww! We're having this problem in here. I wish I could log this function." Well, you didn't declare it dynamic, so now let's stop the running system. And so, rather than do that, I've made it so that all Vars are potentially rebindable, and I think that's the better way to go, only because I was able to make the check extremely inexpensive, so you can incur it everywhere. So far that's worked out. (Note 1: Does this check occur on every fn call? What exactly is that check again?)
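(Transcriber's sketch of the bind-a-handler-function idea just described, with illustrative names: the low-level code calls a Var and uses its return value instead of throwing, and a caller higher up rebinds that Var, which works because, as said above, all Vars are potentially rebindable.)

(defn on-missing [path] ; default behavior: throw
  (throw (Exception. (str path " not found"))))

(defn load-config [path]
  (if (.exists (java.io.File. path))
    :loaded
    (on-missing path))) ; the low-level code calls up via the Var

(binding [on-missing (fn [path] :use-defaults)]
  (load-config "/no/such/file"))
-> :use-defaults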
The cost of calling them out would be you wouldn't be able to, on an ad hoc basis, say "Oh! I wish I could dynamically rebind this." Audience: I take it you don't have first class continuations? http://en.wikipedia.org/wiki/Continuation No. (pause. laughter) Audience: Well someone had to say it. There was a talk about it at the JVM summit. I mean, if the JVM supports them, I don't know why I wouldn't, but otherwise, I think they're too complicated for me. Audience: First class what? Continuations. Audience: ??? No. (applause) Audience: The way the symbols are tbd No, no. I think the system is good. If you're coming from Common Lisp, and you get your head around how this is different, you'll see. It does get rid of ... I think 80-plus percent of the hygiene issues really just go away. Because when this macro was written, list meant the list function in Clojure. It will emit something that will make sure that it gets that binding. And that is what hygiene is about. But it doesn't have a sophisticated renaming system. Audience: Well, would you want to possibly get a hygienic macro system? I also don't understand those too well. Especially the implementation. It's like the sourdough implementation of that. No one ever reimplements it. Everybody just passes this one thing around, which means if your language isn't Scheme, you have to ... what do you do? I seem to see this one implementation of hygienic macros reused, let's say, everywhere. It's complicated! Audience: Do you have a white paper or something specifically on macros tbd No, just the material on the site. There's a lot of material on the site. But we have a good group, and any question or dialog you start will inform everyone. I encourage anyone who is using Clojure to join the Google group and I'll be happy to expound on any details. Audience: What's zipper? What's zipper? Some French guy whose name I can't quite pronounce. Huet? Huay? Or Huet? H U E T, wrote a beautiful functional pearl on zippers, which is a way to navigate through a tree-like data structure as if it was imperative, and you can go through, and you say boom, and you get here, and say replace this with something else, and then you can say up up up up up, and you go back up to the root of the tree, and what you have then is a changed thing. There's an implementation of that here in Clojure. http://en.wikipedia.org/wiki/Zipper_(data_structure) In latest Clojure as of Aug 2009, the Clojure implementation is in namespace clojure.zip Audience: I think we should probably stop here with the questions at this point. OK. Thank you very much. TBD: I'd like to see an example of extending one of these abstractions from Clojure, as mentioned in slide 23 discussion. TBD: I guess a transaction is always run within a single thread? i.e. there is no real way to get another thread to do some of the work of computing some of the results of the transaction for you? Or if there is a way to do that, do those other threads automatically get joined into the transaction with you? Answer: Right now that doesn't work by default, but apparently it can be made to work if you go to some extra effort with copying certain thread-local bindings to other threads: http://groups.google.com/group/clojure/browse_thread/thread/df7f24285e50dec8 It seems to me like in the longer term, it would be nice if it could work without such extra trouble on the programmer's part, especially if there start to be more common things in the Clojure library that do parallelism "under the hood".
Transactions should ideally be short things, if you can make them that way. TBD: If you use commute, can there be a "backlog" of commute operations that haven't finished yet, even after all transactions have completed? Or I guess they are still synchronous, guaranteed to have "effectively completed" by the time the dosync is finished? If not, then they would be more like Agents, right?
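(Transcriber's note, expanding on the zipper answer above: a small clojure.zip example, using the namespace mentioned earlier. It navigates down into a nested vector, replaces a node, and zips back up to the root to get the changed tree.)

(require '[clojure.zip :as zip])
(-> (zip/vector-zip [1 [2 3] 4])
    zip/down        ; at 1
    zip/right       ; at [2 3]
    zip/down        ; at 2
    (zip/replace 99)
    zip/root)
-> [1 [99 3] 4]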