A friend of mine asked me why Clojure matters and what makes it special and why I think it is good for linguists. This post is the edited version of my answer to my dear friend. Since there are very good books on the market (my favourite is Clojure in Action) and the internet is full of good tutorials (4Clojure is esp. good if you like the learning by doing method) my goal is only to give you a rough picture of functional programming.
An example
We are going to solve a "toy" problem stolen from the first chapter of Peter Norvig's seminal Paradigms of Artificial Intelligence. The question is how do you extract first and last names from someone’s full name. Before you think this is too simple and it doesn't worth dealing with, consider names like Robert Downey Jr, Admiral Grace Hopper, and what about Staff Sergeant William "Wild Bill" Guarnere (a character for the Band of Brothers series). Machines should be programmed to solve these problems, and even humans could have problems with names. It took me years to figure out that Martin "Boban" Doktor (a well known Czech Olympic champion sprint canoer) is not a real doctor...
First, we need some data to test our assumptions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(def names [["John" "Q" "Public"] | |
["Malcolm" "X"] | |
["Admiral" "Grace" "Hopper"] | |
["Spot"] | |
["Aristotle"] | |
["A" "A" "Milne"] | |
["Z" "Z" "Top"] | |
["Sir" "Larry" "Oliver"] | |
["Miss" "Scarlet"] | |
["Robert" "Downey" "Jr"] | |
["Gregory" "House" "MD"]]) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn first-name [name] | |
(first name)) |
And the last name is the last word in a name.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn last-name [name] | |
(last name)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
names> (first-name [ "Zoltán" "Varjú" ]) | |
"Zoltán" | |
names> (last-name [ "Zoltán" "Varjú" ]) | |
"Varjú" |
We stored out test data in names, and now it's time to test our functions en mass. The higher order function map helps us in doing so. Map takes a function as its first argument and applies it to every member of its second argument.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
names> (map first-name names) | |
("John" "Malcolm" "Admiral" "Spot" "Aristotle" "A" "Z" "Sir" "Miss" "Robert" "Gregory") | |
names> (map last-name names) | |
("Public" "X" "Hopper" "Spot" "Aristotle" "Milne" "Top" "Oliver" "Scarlet" "Jr" "MD") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(def prefixes ["Mr" "Mrs" "Miss" "Sir" "Madam" "Dr" "Admiral" "Major" "General"]) | |
(def suffixes ["MD" "Jr"]) |
We want to test if the first word of the full name is a member of the titles. We need a function that tests membership.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn member [x sq] | |
(if (seq sq) | |
(if (= x (first sq)) | |
sq | |
(recur x (rest sq))))) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn first-name [name] | |
(if (member (first name) prefixes) | |
(first-name (rest name)) | |
(first name))) |
Testing our new function shows it works correctly.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
names> (map first-name names) | |
("John" "Malcolm" "Grace" "Spot" "Aristotle" "A" "Z" "Larry" "Scarlet" "Robert" "Gregory") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn last-name [name] | |
(if (member (last name) suffixes) | |
(last-name (butlast name)) | |
(last name))) |
Storing names in vectors of strings is very unnatural (at least for humans, I guess machines don't care about these issues). Wouldn’t it be nicer to type names like "Zoltán Varjú" instead of ["Zoltán" "Varjú"]?
First, we need new test data, which is a vector of strings.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(def names2 ["John Q Public" | |
"Malcolm X" | |
"Admiral Grace Hopper" | |
"Spot" | |
"Aristotle" | |
"A A Milne" | |
"Z Z Top" | |
"Sir Larry Oliver" | |
"Miss Scarlet" | |
"Robert Downey Jr" | |
"Gregory House MD"]) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(ns names | |
(:use [clojure.string :as str :only [split]] :reload)) ; otherwise split messes up everything | |
(def names2 ["John Q Public" | |
"Malcolm X" | |
"Admiral Grace Hopper" | |
"Spot" | |
"Aristotle" | |
"A A Milne" | |
"Z Z Top" | |
"Sir Larry Oliver" | |
"Miss Scarlet" | |
"Robert Downey Jr" | |
"Gregory House MD"]) | |
(defn member [x sq] | |
(if (seq sq) | |
(if (= x (first sq)) | |
sq | |
(recur x (rest sq))))) | |
(defn last-name [name] | |
(if (member (last name) suffixes) | |
(last-name (butlast name)) | |
(last name))) | |
(defn first-name [name] | |
(if (member (first name) prefixes) | |
(first-name (rest name)) | |
(first name))) |
Now we can test split from clojure.string.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
names> (split "Zoltán Varjú" #" ") | |
["Zoltán" "Varjú"] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
(defn split-name [name] | |
(split name #" ")) |
Finally, we test if our functions work on splitted names.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
names> (map split-name names2) | |
(["John" "Q" "Public"] ["Malcolm" "X"] ["Admiral" "Grace" "Hopper"] ["Spot"] ["Aristotle"] ["A" "A" "Milne"] ["Z" "Z" "Top"] ["Sir" "Larry" "Oliver"] ["Miss" "Scarlet"] ["Robert" "Downey" "Jr"] ["Gregory" "House" "MD"]) | |
names> (map first-name (map split-name names2)) | |
("John" "Malcolm" "Grace" "Spot" "Aristotle" "A" "Z" "Larry" "Scarlet" "Robert" "Gregory") |
Notes
I have to note, you can make the code more concise and idiomatic. I hope you can see 1) how can you solve a problem with functions and by combining them 2) you have a basic idea of what is recursion 3) how can you go from a basic problem to an acceptable solution.
What makes Clojure different?
Norvig lists eight features that make Lisp different:
- built-in support for lists
- automatic storage management
- dynamic typing
- first-class functions
- uniform syntax
- interactive environment
- extensibility
- history (see Paul Graham's essays, What Made Lisp Different and The Roots of Lisp)
Clojure is a Lisp on the JVM which makes it unique. The Java Virtual Machine makes it portable, reliable and secure, but there is a new JavaScript based version called ClojureScript. Slime is an excellent development environment, leiningen makes project automation easy. Java interoperability means Clojure has got a great collection of libraries for almost everything.
However Clojure is not for complete beginners. The Clojure community is very open and supportive, but asking the right question requires some sort of maturity. As this Reddit thread explains you shouldn't be a Java expert to pick up the language, even you can learn what you have to know on the go. But you should know at least one 'conventional' language like Python before you start learning Clojure. More propaganda in our Why Clojure lx? post.
Thank you for your series of blog posts - really getting a lot of out them.
ReplyDeleteI'm just starting to look at core.logic and took this post as inspiration.
I have tried to implement your first-name function in core.logic, and thought it would be good to share:
It works, but I am unhappy with having to define the two cases of 'prefix' and 'not prefix' redundantly (i.e how would I do this as an else), also not sure if my not-prefixo is good style.
Anyway, thanks for your enjoyable blog posts,
Martin
(defn not-prefixo [x]
(conda
[(membero x prefixes) u#]
[s#]))
(defne first-nameo [f n]
([a [h . t]]
(membero h prefixes)
(first-nameo a t))
([a [a . _]]
(not-prefixo a)))
(run* [q]
(fresh [n]
(membero n names)
(first-nameo q n)))
(run 1 [q]
(first-nameo "Martin" q)) ;; It would be cool if this gave names with all the prefix options too
Just to follow up, i've made a bit of progress. I now realise I have to have the negation of the first case in conde (that's just the way conde works), also I was needlessly recurring (unless we are going to have people called "Mr Sir Martin Jones", I just need to strip the prefix.
ReplyDelete(defne first-nameo [f n]
([a [h . t]]
(fresh [b]
(membero h prefixes)
(firsto t a)))
([a [a . _]]
(conda
[(membero a prefixes) u#]
[s#])))
(run* [q]
(fresh [n]
(membero n names)
(first-nameo q n)))
(run* [q]
(first-nameo "Martin" q)) ;; Gives full names that could have Martin as a prefix