Friday, January 13, 2012

What makes Clojure different?

A friend of mine asked me why Clojure matters, what makes it special, and why I think it is good for linguists. This post is an edited version of my answer to my dear friend. Since there are very good books on the market (my favourite is Clojure in Action) and the internet is full of good tutorials (4Clojure is especially good if you like learning by doing), my goal is only to give you a rough picture of functional programming.

An example

We are going to solve a "toy" problem stolen from the first chapter of Peter Norvig's seminal Paradigms of Artificial Intelligence Programming. The question is how to extract first and last names from someone’s full name. Before you dismiss this as too simple to be worth dealing with, consider names like Robert Downey Jr, Admiral Grace Hopper, or Staff Sergeant William "Wild Bill" Guarnere (a character in the Band of Brothers series). Machines have to be programmed to handle such cases, and even humans can have trouble with names. It took me years to figure out that Martin "Boban" Doktor (a well-known Czech Olympic champion sprint canoer) is not a real doctor...
First, we need some data to test our assumptions.
The special form def binds the symbol names to our test data (a vector of vectors, one vector of words per name). A first name is usually just the first word in a name.
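The original listing is not reproduced here, so this is a minimal sketch of what it could look like. The particular names in the vector are my assumption, borrowed from the examples mentioned above, and the parameter name full-name is my own choice.

```clojure
;; Test data (assumed for illustration): each name is a vector of words.
(def names
  [["Zoltán" "Varjú"]
   ["Robert" "Downey" "Jr"]
   ["Admiral" "Grace" "Hopper"]
   ["Staff" "Sergeant" "William" "Guarnere"]])

;; Naive version: the first name is simply the first word.
(defn first-name [full-name]
  (first full-name))

(first-name (first names)) ;; => "Zoltán"
```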
And the last name is the last word in a name.
Let's test our functions. Calling first-name and last-name on my name gives the right answers.
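A matching sketch of last-name, together with the two calls on my own name. Both functions are the naive versions: they just pick the first or last word.

```clojure
;; Naive versions: first and last word of a name.
(defn first-name [full-name]
  (first full-name))

(defn last-name [full-name]
  (last full-name))

(first-name ["Zoltán" "Varjú"]) ;; => "Zoltán"
(last-name ["Zoltán" "Varjú"])  ;; => "Varjú"
```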
We stored our test data in names, and now it's time to test our functions en masse. The higher-order function map helps us do so: it takes a function as its first argument and applies it to every element of the collection passed as its second argument, returning a (lazy) sequence of the results.
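Here is a sketch of mapping the naive functions over the test data (the same assumed list as before). The output already shows the trouble with titles and suffixes.

```clojure
;; Assumed test data: one vector of words per name.
(def names
  [["Zoltán" "Varjú"]
   ["Robert" "Downey" "Jr"]
   ["Admiral" "Grace" "Hopper"]
   ["Staff" "Sergeant" "William" "Guarnere"]])

(defn first-name [full-name] (first full-name))
(defn last-name [full-name] (last full-name))

;; map applies the function to each name in turn.
(map first-name names) ;; => ("Zoltán" "Robert" "Admiral" "Staff")
(map last-name names)  ;; => ("Varjú" "Jr" "Hopper" "Guarnere")
```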
Oooops, the program is having serious problems with "titles" or prefixes. Calling last-name on names gives interesting results too. Our program is not that bad, it captures the basic logic of identifying first and last names, but affixes cause problems. The first name should be the first word in a name if it is not a prefix. Let's store the affixes in vectors.
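A sketch of the affix vectors. The reader's core.logic code in the comments refers to a var called prefixes, so I use that name. The name suffixes and the contents of both vectors are my own assumptions. The prefixes are roughly the titles Norvig uses, plus "Staff" and "Sergeant" for our example.

```clojure
;; Assumed affix lists, for illustration only.
(def prefixes ["Mr" "Mrs" "Miss" "Ms" "Sir" "Madam" "Dr"
               "Admiral" "Major" "General" "Staff" "Sergeant"])

;; Hypothetical name for the list of suffixes.
(def suffixes ["Jr" "Jr." "Sr" "MD"])
```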
We want to test if the first word of the full name is a member of the titles. We need a function that tests membership.
The function member is recursive. First, it tests whether its second argument is a sequence. The second if supplies a terminating condition: if x and the first element of the second argument are equal, it returns the whole second argument. Otherwise it tests membership again on the rest of the sequence (i.e. everything but the first element of the original sequence). Now we can redefine our first-name function: if the first word of the full name is in the list of prefixes, call first-name on the rest of the full name; otherwise, return the first word of the full name.
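The original listing is missing, so this sketch follows the description above. The prefix list is assumed. I use recur for the self-calls. It is still recursion, but recur in tail position runs in constant stack space. member returns either the remaining sequence (truthy) or nil (falsy), so it works directly as the test of an if.

```clojure
;; Assumed prefix list, for illustration only.
(def prefixes ["Mr" "Mrs" "Miss" "Ms" "Sir" "Madam" "Dr"
               "Admiral" "Major" "General" "Staff" "Sergeant"])

(defn member [x coll]
  (when (seq coll)              ; nothing left to search: return nil
    (if (= x (first coll))
      coll                      ; found: return the rest of the sequence from x on
      (recur x (rest coll)))))  ; otherwise search the rest

(defn first-name [full-name]
  (if (member (first full-name) prefixes)
    (recur (rest full-name))    ; drop the prefix and try again
    (first full-name)))

(member "Admiral" prefixes) ;; => ("Admiral" "Major" "General" "Staff" "Sergeant")
(member "Zoltán" prefixes)  ;; => nil
(first-name ["Staff" "Sergeant" "William" "Guarnere"]) ;; => "William"
```

In everyday Clojure you would more likely keep the prefixes in a set and use contains?, or the set itself, as the predicate. The hand-written member is here to demonstrate recursion.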
Testing our new function shows it works correctly.
We can redefine last-name similarly.
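A sketch of the redefined last-name, under the same assumptions: a hypothetical suffixes vector and the member function sketched for first-name. It strips suffixes from the end with butlast, which returns everything but the last element.

```clojure
;; Assumed suffix list, for illustration only.
(def suffixes ["Jr" "Jr." "Sr" "MD"])

(defn member [x coll]
  (when (seq coll)
    (if (= x (first coll))
      coll
      (recur x (rest coll)))))

(defn last-name [full-name]
  (if (member (last full-name) suffixes)
    (recur (butlast full-name))  ; drop the suffix and try again
    (last full-name)))

(last-name ["Robert" "Downey" "Jr"]) ;; => "Downey"
```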
Storing names in vectors of strings is very unnatural (at least for humans; I guess machines don't care about these issues). Wouldn’t it be nicer to type names like "Zoltán Varjú" instead of ["Zoltán" "Varjú"]?
First, we need new test data, which is a vector of strings.
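A sketch of the new test data. The list itself is assumed and reuses the names mentioned at the beginning of the post, nicknames included.

```clojure
;; Assumed test data: one plain string per name.
(def names
  ["Zoltán Varjú"
   "Robert Downey Jr"
   "Admiral Grace Hopper"
   "Staff Sergeant William \"Wild Bill\" Guarnere"
   "Martin \"Boban\" Doktor"])
```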
We still want to use our first-name and last-name functions. Can we split a name into individual words? clojure.string provides a split function (that's why we put (:use [clojure.string :as str :only [split]] :reload) into the ns declaration), which splits a string into a vector of strings at every match of a regular expression. Here, the space character delimits the parts of a name. Our source code looks like this now:
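A sketch of the top of the file. The namespace name names.core is hypothetical. The post's (:use ... :only [split]) form refers split directly. The :require form below is the more common modern equivalent, and you call the function as str/split.

```clojure
;; Hypothetical namespace name; the definitions from earlier
;; (prefixes, member, first-name, last-name, ...) would follow it.
(ns names.core
  (:require [clojure.string :as str]))
```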
Now we can test split from clojure.string.
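A few sample calls. The separator is a regular expression literal, here a single space. One caveat: a doubled space produces an empty string between the two spaces.

```clojure
(require '[clojure.string :as str])

(str/split "Zoltán Varjú" #" ")         ;; => ["Zoltán" "Varjú"]
(str/split "Admiral Grace Hopper" #" ") ;; => ["Admiral" "Grace" "Hopper"]

;; Two spaces in a row leave an empty "word" in the middle.
(str/split "Grace  Hopper" #" ")        ;; => ["Grace" "" "Hopper"]
```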
Let's define a split-name function, just to save ourselves from repetitive strain injury caused by excessive typing.
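A possible split-name. Trimming first and splitting on #"\s+" (one or more whitespace characters) rather than a single space is my own choice. It means stray or doubled spaces don't produce empty words.

```clojure
(require '[clojure.string :as str])

;; "Zoltán Varjú" -> ["Zoltán" "Varjú"]
(defn split-name [s]
  (str/split (str/trim s) #"\s+"))

(split-name "Zoltán  Varjú") ;; => ["Zoltán" "Varjú"]
```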
Finally, we test whether our functions work on split names.
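Putting the pieces together, here is a self-contained sketch of the whole program. The prefix and suffix lists, the sample names, and the use of comp to chain split-name with first-name and last-name are my choices, not necessarily the original code. (comp f g) returns a function that applies g first, then f.

```clojure
(require '[clojure.string :as str])

;; Assumed affix lists, for illustration only.
(def prefixes ["Mr" "Mrs" "Miss" "Ms" "Sir" "Madam" "Dr"
               "Admiral" "Major" "General" "Staff" "Sergeant"])
(def suffixes ["Jr" "Jr." "Sr" "MD"])

(defn member [x coll]
  (when (seq coll)
    (if (= x (first coll))
      coll
      (recur x (rest coll)))))

(defn first-name [full-name]
  (if (member (first full-name) prefixes)
    (recur (rest full-name))
    (first full-name)))

(defn last-name [full-name]
  (if (member (last full-name) suffixes)
    (recur (butlast full-name))
    (last full-name)))

(defn split-name [s]
  (str/split (str/trim s) #"\s+"))

;; Assumed test data.
(def names
  ["Zoltán Varjú"
   "Robert Downey Jr"
   "Admiral Grace Hopper"
   "Staff Sergeant William \"Wild Bill\" Guarnere"
   "Martin \"Boban\" Doktor"])

(map (comp first-name split-name) names)
;; => ("Zoltán" "Robert" "Grace" "William" "Martin")
(map (comp last-name split-name) names)
;; => ("Varjú" "Downey" "Hopper" "Guarnere" "Doktor")
```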

Notes

I should note that the code could be made more concise and idiomatic. I hope you can see 1) how to solve a problem by writing functions and combining them, 2) roughly what recursion is, and 3) how to get from a basic problem to an acceptable solution.

What makes Clojure different?

Norvig lists eight features that make Lisp different:
  1. built-in support for lists
  2. automatic storage management
  3. dynamic typing
  4. first-class functions
  5. uniform syntax
  6. interactive environment
  7. extensibility
  8. history (see Paul Graham's essays, What Made Lisp Different and The Roots of Lisp)

Clojure is a Lisp on the JVM, which makes it unique. The Java Virtual Machine makes it portable, reliable and secure, and there is now also a JavaScript-based version called ClojureScript. SLIME is an excellent development environment, and Leiningen makes project automation easy. Java interoperability gives Clojure access to a great collection of libraries for almost everything.
However, Clojure is not for complete beginners. The Clojure community is very open and supportive, but asking the right questions requires a certain maturity. As this Reddit thread explains, you don't need to be a Java expert to pick up the language; you can learn what you need on the go. But you should know at least one 'conventional' language like Python before you start learning Clojure. You'll find more propaganda in our Why Clojure lx? post.

2 comments:

  1. Thank you for your series of blog posts - really getting a lot out of them.
    I'm just starting to look at core.logic and took this post as inspiration.
    I have tried to implement your first-name function in core.logic, and thought it would be good to share:

    It works, but I am unhappy with having to define the two cases of 'prefix' and 'not prefix' redundantly (i.e. how would I do this as an else?), and I'm also not sure if my not-prefixo is good style.

    Anyway, thanks for your enjoyable blog posts,

    Martin

    (defn not-prefixo [x]
      (conda
        [(membero x prefixes) u#]
        [s#]))

    (defne first-nameo [f n]
      ([a [h . t]]
        (membero h prefixes)
        (first-nameo a t))
      ([a [a . _]]
        (not-prefixo a)))

    (run* [q]
      (fresh [n]
        (membero n names)
        (first-nameo q n)))

    (run 1 [q]
      (first-nameo "Martin" q)) ;; It would be cool if this gave names with all the prefix options too

  2. Just to follow up, I've made a bit of progress. I now realise I have to have the negation of the first case in conde (that's just the way conde works), and also that I was needlessly recurring (unless we are going to have people called "Mr Sir Martin Jones"); I just need to strip the prefix.

    (defne first-nameo [f n]
      ([a [h . t]]
        (fresh [b]
          (membero h prefixes)
          (firsto t a)))
      ([a [a . _]]
        (conda
          [(membero a prefixes) u#]
          [s#])))

    (run* [q]
      (fresh [n]
        (membero n names)
        (first-nameo q n)))

    (run* [q]
      (first-nameo "Martin" q)) ;; Gives full names that could have Martin as a prefix
