<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1506645095464847642</id><updated>2012-02-25T05:15:12.880-08:00</updated><category term='road-map'/><category term='manifesto'/><category term='nlp'/><category term='emacs'/><category term='hello'/><category term='linguistics'/><category term='Clojars'/><category term='word frequency'/><category term='intro'/><category term='Leiningen'/><category term='Zipf&apos;s law'/><category term='clojure-opennlp'/><category term='Clojure'/><category term='version control'/><category term='paip'/><category term='Incanter'/><category term='Lisp'/><category term='beginner'/><title type='text'>Clojure &amp; lx</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>10</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-8231190579981595643</id><published>2012-02-25T05:04:00.003-08:00</published><updated>2012-02-25T05:15:12.886-08:00</updated><title type='text'>lx in core.logic #3: Finite State Transducers</title><content type='html'>This is the third post in the series on using core.logic to implement basic constructs in computational linguistics. If you haven't already, you might want to have a look at the first two before you start:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://clojurelx.blogspot.com/2012/01/finite-state-machines-in-corelogic.html"&gt;Finite State Machines in Clojure core.logic&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://clojurelx.blogspot.com/2012/01/lx-in-corelogic-2-jumps-flexible.html"&gt;lx in core.logic #2: Jumps, Flexible Transitions and Parsing&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Today, we're gonna look at finite state transducers, which are commonly used to model and implement translation. 
While sounding fancy and powerful, they are straightforward extensions of finite automata.&lt;br /&gt;&lt;br /&gt;&lt;script src="https://gist.github.com/1908409.js"&gt; &lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-8231190579981595643?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/8231190579981595643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/02/lx-in-corelogic-3-finite-state.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8231190579981595643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8231190579981595643'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/02/lx-in-corelogic-3-finite-state.html' title='lx in core.logic #3: Finite State Transducers'/><author><name>Peteris Erins</name><uri>http://www.blogger.com/profile/07081849539043439792</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/-0Ox5rQz7JrY/TsnPw-pVxlI/AAAAAAAAAMA/BizPX9ZYrxc/s220/mendeley.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-8039378807867467615</id><published>2012-01-29T08:18:00.001-08:00</published><updated>2012-01-29T08:22:09.175-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Zipf&apos;s law'/><category scheme='http://www.blogger.com/atom/ns#' term='Incanter'/><category scheme='http://www.blogger.com/atom/ns#' term='word frequency'/><title type='text'>Counting words</title><content type='html'>&lt;a 
href="http://en.wikipedia.org/wiki/Zipf_law" target="_blank"&gt;Zipf's law&lt;/a&gt; is a well-known word frequency distribution. Let's assume you are learning a foreign language and your teacher gives you books to read. You have to take exams that test whether you have acquired the vocabulary of the books. You have other commitments, and you prefer reading blogs and books on computational linguistics, so you'd like to determine the most frequent words of the texts and learn them by rote memorization right before the exam. You know that the higher the frequency of a word, the higher the probability it will be on the test. At first, it seems obvious that we just have to count how many times each word occurs in a text, but it gets a bit more complicated.&lt;br /&gt;&lt;script src="https://gist.github.com/1699208.js?file=zipf01.clj"&gt;&lt;/script&gt;We need a text file; I'm using Austen's Persuasion from the NLTK corpora.&lt;script src="https://gist.github.com/1699239.js?file=zipf02.clj"&gt;&lt;/script&gt;&lt;br /&gt;Warning! slurp reads the whole file into memory! Counting the words is pretty straightforward.&lt;br /&gt;&lt;script src="https://gist.github.com/1699258.js?file=zipf03.clj"&gt;&lt;/script&gt;Plot the text with (graph-words austen) (or your text) and you will see something like this.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-BFsX1tuwG2A/TyVx0bslRDI/AAAAAAAAAsQ/JV_bNoe8PQY/s1600/zipf01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/-BFsX1tuwG2A/TyVx0bslRDI/AAAAAAAAAsQ/JV_bNoe8PQY/s320/zipf01.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Not an informative picture! Let's analyse our text before we modify our program. The raw text file contains a lot of "noise". For example, it is full of punctuation marks, and our program is case sensitive. 
Another problem lies in the nature of language.&lt;script src="https://gist.github.com/1699325.js?file=zipf04.clj"&gt;&lt;/script&gt;&lt;br /&gt;Function words like determiners and prepositions are high-frequency words. We are interested in the so-called content words like nouns and verbs.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-4RUecOobjGo/TyVx9Ke-ZkI/AAAAAAAAAsY/I4ku9bOLdqs/s1600/zipf02.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://1.bp.blogspot.com/-4RUecOobjGo/TyVx9Ke-ZkI/AAAAAAAAAsY/I4ku9bOLdqs/s320/zipf02.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Part-of-speech tagging is resource-intensive, so instead of removing function words identified by their POS tags, we are going to use a stopword list and a list of punctuation marks. I used the NLTK English stopword list and made my own list of punctuation marks.&lt;br /&gt;&lt;script src="https://gist.github.com/1699393.js?file=zipf05.clj"&gt;&lt;/script&gt;The stop lists are stored in sets because a set can be used as a predicate, so we can filter on its complement (in Clojure, filter keeps the elements that satisfy the predicate rather than removing them). 
It is a common practice to remove &lt;a href="http://en.wikipedia.org/wiki/Hapax_legomenon" target="_blank"&gt;hapax legomena&lt;/a&gt; from the distribution and to use logarithmic scales on the axes of the chart.&lt;script src="https://gist.github.com/1699437.js?file=zipf06.clj"&gt;&lt;/script&gt;&lt;br /&gt;Now we've got a nicer chart.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-QiNLWyrIlzI/TyVyIlur6ZI/AAAAAAAAAsg/-ZeqSS9kwlQ/s1600/zipf03.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://4.bp.blogspot.com/-QiNLWyrIlzI/TyVyIlur6ZI/AAAAAAAAAsg/-ZeqSS9kwlQ/s320/zipf03.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The chart shows you that you can get a decent score if you concentrate on the most frequent words.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-8039378807867467615?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/8039378807867467615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/01/counting-words.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8039378807867467615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8039378807867467615'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/01/counting-words.html' title='Counting words'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-BFsX1tuwG2A/TyVx0bslRDI/AAAAAAAAAsQ/JV_bNoe8PQY/s72-c/zipf01.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-8087266116812563109</id><published>2012-01-27T04:55:00.000-08:00</published><updated>2012-01-27T05:04:52.519-08:00</updated><title type='text'>lx in core.logic #2: Jumps, Flexible Transitions and Parsing</title><content type='html'>This is a continuation of the post &lt;a href="http://clojurelx.blogspot.com/2012/01/finite-state-machines-in-corelogic.html"&gt;Finite State Machines in Clojure core.logic&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The current plan for this series is to follow the book &lt;a href="http://www.coli.uni-saarland.de/projects/milca/courses/coal/html/"&gt;Algorithms for Computational Linguistics&lt;/a&gt; using Clojure core.logic instead of Prolog.&lt;br /&gt;&lt;br /&gt;Jumps, wildcard transitions and parsing are natural and useful ways to extend and leverage finite state machines for text analysis. This was an opportunity to introduce extensions of fact databases and non-deterministic matching. 
Here's the code:&lt;br /&gt;&lt;br /&gt;&lt;script src="https://gist.github.com/1688620.js"&gt; &lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-8087266116812563109?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/8087266116812563109/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/01/lx-in-corelogic-2-jumps-flexible.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8087266116812563109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8087266116812563109'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/01/lx-in-corelogic-2-jumps-flexible.html' title='lx in core.logic #2: Jumps, Flexible Transitions and Parsing'/><author><name>Peteris Erins</name><uri>http://www.blogger.com/profile/07081849539043439792</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://4.bp.blogspot.com/-0Ox5rQz7JrY/TsnPw-pVxlI/AAAAAAAAAMA/BizPX9ZYrxc/s220/mendeley.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-7174501278369150172</id><published>2012-01-25T05:28:00.000-08:00</published><updated>2012-01-25T05:38:08.450-08:00</updated><title type='text'>Finite State Machines in core.logic</title><content type='html'>This is an implementation of Finite State Machines in Clojure using core.logic. 
They are a good starting point for computational linguistics and illustrate several features of core.logic, such as various ways of defining new relations, pattern matching and also the invertibility of relations.&lt;br /&gt;&lt;br /&gt;It is not an introduction to core.logic. To learn the basics, I would recommend the &lt;a href="https://github.com/frenchy64/Logic-Starter/wiki"&gt;Logic Starter&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;script src="https://gist.github.com/1676242.js"&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-7174501278369150172?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/7174501278369150172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/01/finite-state-machines-in-corelogic.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/7174501278369150172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/7174501278369150172'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/01/finite-state-machines-in-corelogic.html' title='Finite State Machines in core.logic'/><author><name>Peteris Erins</name><uri>http://www.blogger.com/profile/07081849539043439792</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://4.bp.blogspot.com/-0Ox5rQz7JrY/TsnPw-pVxlI/AAAAAAAAAMA/BizPX9ZYrxc/s220/mendeley.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-3840811760503464648</id><published>2012-01-24T13:31:00.000-08:00</published><updated>2012-01-24T14:26:10.015-08:00</updated><title type='text'>Beginning with Clojure</title><content type='html'>&lt;span style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="font-size: 100%;"&gt;I am, at heart, a linguist, not a computational linguist. I was trained in Edinburgh, which is theoretically heavy, although not in Chomskyan, traditional linguistics. What I learned of Python I essentially taught myself, and there's no limit to my ignorance of traditional programming languages. That doesn't mean I'm not willing to try something new - far from it.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="font-size: 100%;"&gt;So, here we go. Rather than sit and pretend I haven't been twiddling my thumbs or busy for the past few months, I'm going to come straight out and say that is exactly what has been happening. I like blogging as I go along, though.&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="font-size: 100%;"&gt;Install Clojure from &lt;/span&gt;&lt;/span&gt;&lt;a href="http://clojure.org/getting_started" style="font-style: normal; font-variant: normal; font-weight: normal; font-family: Georgia, serif; font-size: 100%; line-height: normal; "&gt;this site&lt;/a&gt;&lt;span style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="font-size: 100%;"&gt;. And here is where I ran into my first problem. I downloaded Clojure 1.3.0, unzipped it into my 'code' folder, and then cd'd in there in the Terminal. 
(I run a Mac.) The site suggests running this:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;pre class="brush: java"&gt;java -cp clojure.jar clojure.main&lt;/pre&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="font-family: Georgia, serif; "&gt;Well, that didn't work. (Neither did posting code snippets on Blogger, it seems. update: &lt;a href="http://www.craftyfella.com/2010/01/syntax-highlighting-with-blogger-engine.html"&gt;nevermind.&lt;/a&gt;) Instead, I got this:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush:java"&gt;Exception in thread "main" java.lang.NoClassDefFoundError: clojure/main&lt;br /&gt;Caused by: java.lang.ClassNotFoundException: clojure.main&lt;br /&gt;at java.net.URLClassLoader$1.run(URLClassLoader.java:202)&lt;br /&gt;at java.security.AccessController.doPrivileged(Native Method)&lt;br /&gt;at java.net.URLClassLoader.findClass(URLClassLoader.java:190)&lt;br /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:306)&lt;br /&gt;at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)&lt;br /&gt;at java.lang.ClassLoader.loadClass(ClassLoader.java:247)&lt;/pre&gt;&lt;br /&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;This is because of &lt;a href="http://stackoverflow.com/questions/5904846/getting-exception-in-thread-main-java-lang-noclassdeffounderror"&gt;an issue&lt;/a&gt; with how it is packaged: the jar file name includes the version number, so you need to refer to it by its specific file name. Quick fix, and...:&lt;/div&gt;&lt;pre class="brush:java"&gt;java -cp clojure-1.3.0.jar clojure.main&lt;br /&gt;Clojure 1.3.0&lt;/pre&gt;We're properly off!&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;I've been messing around with Clojure on and off for a while now, over &lt;a href="http://tryclj.com/"&gt;here&lt;/a&gt;. I highly recommend the tutorial; it is great. 
(I also highly suggest checking out this post on why &lt;a href="http://www.sauria.com/blog/2011/11/15/clojure-conj-2011/"&gt;Clojure Conj&lt;/a&gt; was great, but that's not really on topic.)&lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;span style="color: rgb(39, 39, 39); font-family: 'Lucida Grande', 'Trebuchet MS', 'Bitstream Vera Sans', Verdana, Helvetica, sans-serif; font-size: 12px; line-height: 18px; text-align: -webkit-auto; background-color: rgb(255, 255, 255); "&gt;&lt;/span&gt;&lt;blockquote&gt;&lt;span style="color: rgb(39, 39, 39); font-family: 'Lucida Grande', 'Trebuchet MS', 'Bitstream Vera Sans', Verdana, Helvetica, sans-serif; font-size: 12px; line-height: 18px; text-align: -webkit-auto; background-color: rgb(255, 255, 255); "&gt;Depending on your development style, you may also want&lt;/span&gt;&lt;br style="color: rgb(39, 39, 39); font-family: 'Lucida Grande', 'Trebuchet MS', 'Bitstream Vera Sans', Verdana, Helvetica, sans-serif; font-size: 12px; line-height: 18px; text-align: -webkit-auto; background-color: rgb(255, 255, 255); "&gt;&lt;br style="color: rgb(39, 39, 39); font-family: 'Lucida Grande', 'Trebuchet MS', 'Bitstream Vera Sans', Verdana, Helvetica, sans-serif; font-size: 12px; line-height: 18px; text-align: -webkit-auto; background-color: rgb(255, 255, 255); "&gt;&lt;ul style="padding-top: 0px; padding-right: 0px; padding-bottom: 0px; padding-left: 3em; margin-right: 0px; margin-bottom: 0px; margin-left: 0px; margin-top: 0.5em; color: rgb(39, 39, 39); font-family: 'Lucida Grande', 'Trebuchet MS', 'Bitstream Vera Sans', Verdana, Helvetica, sans-serif; font-size: 12px; line-height: 18px; text-align: -webkit-auto; background-color: rgb(255, 255, 255); "&gt;&lt;li&gt;line editing and history at the REPL&lt;/li&gt;&lt;li&gt;a syntax-highlighting editor&lt;/li&gt;&lt;li&gt;package 
management&lt;/li&gt;&lt;li&gt;automated builds&lt;/li&gt;&lt;li&gt;a full IDE&lt;/li&gt;&lt;li&gt;a tutorial environment&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;&lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;The site suggests all of those. I'm not so sure. I generally do all of my code with the Terminal and with MacVim. I'll be relying heavily on &lt;a href="http://writequit.org/blog/?p=386"&gt;Vim for Clojure.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;That's enough for this post. I'll put more up tomorrow, I hope! I know I'm late, but I view this as a running project and not an end-based one. Again, I'm a linguist, not a coder. So this is a long process. &lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;Next: &lt;a href="http://dev.clojure.org/display/doc/Getting+Started"&gt;Getting Started.&lt;/a&gt;&lt;/div&gt;&lt;/span&gt;&lt;div style="font-style: normal; font-variant: normal; font-weight: normal; "&gt;&lt;div&gt;&lt;span&gt;&lt;span style="font-size: 100%;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-3840811760503464648?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/3840811760503464648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/01/beginning-with-clojure.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/3840811760503464648'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1506645095464847642/posts/default/3840811760503464648'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/01/beginning-with-clojure.html' title='Beginning with Clojure'/><author><name>Richard L.</name><uri>http://www.blogger.com/profile/01922168505806787799</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='26' height='32' src='http://2.bp.blogspot.com/-7BHS9Qd0y1M/TWB67WIiToI/AAAAAAAAACY/ZFo_TRDjf_Y/s220/Screen%2Bshot%2B2010-12-10%2Bat%2B23.53.54.png'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-8307737231145787715</id><published>2012-01-13T09:46:00.001-08:00</published><updated>2012-01-13T09:51:41.082-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='paip'/><category scheme='http://www.blogger.com/atom/ns#' term='Lisp'/><category scheme='http://www.blogger.com/atom/ns#' term='Clojure'/><title type='text'>What makes Clojure different?</title><content type='html'>&lt;div style="text-align: justify;"&gt;A friend of mine asked me why Clojure matters and what makes it special and why I think it is good for linguists. This post is the edited version of my answer to my dear friend. Since there are very good books on the market (my favourite is &lt;a href="http://www.manning.com/rathore/"&gt;Clojure in Action&lt;/a&gt;) and the internet is full of good tutorials (&lt;a href="https://www.4clojure.com/"&gt;4Clojure&lt;/a&gt; is esp. 
good if you like the learning-by-doing method), my goal is only to give you a rough picture of functional programming.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;An example&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;We are going to solve a "toy" problem stolen from the first chapter of Peter Norvig's seminal &lt;a href="http://norvig.com/paip.html"&gt;Paradigms of Artificial Intelligence Programming&lt;/a&gt;. The question is how to extract first and last names from someone’s full name. Before you think this is too simple and isn't worth dealing with, consider names like Robert Downey Jr. and Admiral Grace Hopper, or Staff Sergeant William "Wild Bill" Guarnere (a character in the Band of Brothers series). Machines should be programmed to solve these problems, and even humans can have problems with names. It took me years to figure out that Martin "Boban" Doktor (a well-known Czech Olympic champion sprint canoer) is not a real doctor...&lt;/div&gt;&lt;div style="text-align: justify;"&gt;First, we need some data to test our assumptions.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1602834.js?file=names01.clj"&gt;&lt;/script&gt;The special form 'def' binds the symbol 'names' to our test names (oh, a vector of vectors). A first name is usually just the first word in a name.&lt;script src="https://gist.github.com/1602843.js?file=names02.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;And the last name is the last word in a name.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1602847.js?file=names03.clj"&gt;&lt;/script&gt;Let's test our functions. 
Calling first-name and last-name on my name gives the right answers.&lt;script src="https://gist.github.com/1602871.js?file=names04.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;We stored our test data in names, and now it's time to test our functions en masse. The higher-order function map helps us in doing so. Map takes a function as its first argument, applies it to every member of its second argument, and returns a sequence of the results.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1602878.js?file=names05.clj"&gt;&lt;/script&gt;Oooops, the program is having serious problems with "titles" or prefixes. Calling last-name on names gives interesting results too. Our program is not that bad: it captures the basic logic of identifying first and last names, but affixes cause problems. The first name should be the first word in a name if it is not a prefix. Let's store the affixes in vectors.&lt;script src="https://gist.github.com/1605408.js?file=names06.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;We want to test whether the first word of the full name is a member of the titles. We need a function that tests membership.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1605411.js?file=names07.clj"&gt;&lt;/script&gt;The function member is recursive. First, it tests whether its second argument is a sequence. The second if gives us a terminating condition: if x and the first element of the second argument are equal, it returns the whole second argument. Otherwise it tests the membership again on the rest of the sequence (i.e. everything but the first element of the original sequence). Now, we can redefine our first-name function. 
If the first word of the full name is in the list of prefixes, call first-name on the rest of the full name; otherwise return the first word of the full name.&lt;script src="https://gist.github.com/1605415.js?file=names08.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Testing our new function shows it works correctly.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1605418.js?file=names09.clj"&gt;&lt;/script&gt;We can redefine last-name similarly.&lt;script src="https://gist.github.com/1605423.js?file=names10.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Storing names in vectors of strings is very unnatural (at least for humans, I guess machines don't care about these issues). Wouldn’t it be nicer to type names like "Zoltán Varjú" instead of ["Zoltán" "Varjú"]?&lt;/div&gt;&lt;div style="text-align: justify;"&gt;First, we need new test data, which is a vector of strings.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1605431.js?file=names11.clj"&gt;&lt;/script&gt;We want to use our first-name and last-name functions. Can we split a name into individual words? clojure.string provides a split function (that's why we put (:use [clojure.string :as str :only [split]] :reload) into ns), which splits a string into a vector of strings wherever a given regular expression matches. The space character delimits the parts of a name. 
Our source code looks like this now:&lt;script src="https://gist.github.com/1607641.js?file=names12.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Now we can test split from clojure.string.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1607665.js?file=names13.clj"&gt;&lt;/script&gt;Let's define a split-name function just to save ourselves from repetitive strain injury caused by excessive typing.&lt;script src="https://gist.github.com/1607675.js?file=names14.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Finally, we test whether our functions work on split names.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1607686.js?file=names15.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;Notes&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;I have to note that the code could be made more concise and idiomatic. 
I hope you can see 1) how you can solve a problem by writing functions and combining them, 2) what recursion is, and 3) how you can go from a basic problem to an acceptable solution.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;What makes Clojure different?&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Norvig lists eight features that make Lisp different:&lt;/div&gt;&lt;ol&gt;&lt;li&gt; built-in support for lists&lt;/li&gt;&lt;li&gt; automatic storage management&lt;/li&gt;&lt;li&gt; dynamic typing&lt;/li&gt;&lt;li&gt; first-class functions&lt;/li&gt;&lt;li&gt; uniform syntax&lt;/li&gt;&lt;li&gt; interactive environment&lt;/li&gt;&lt;li&gt; extensibility&lt;/li&gt;&lt;li&gt; history (see Paul Graham's essays, &lt;a href="http://www.paulgraham.com/diff.html"&gt;What Made Lisp Different&lt;/a&gt; and &lt;a href="http://www.paulgraham.com/rootsoflisp.html"&gt;The Roots of Lisp&lt;/a&gt;)&lt;/li&gt;&lt;/ol&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Clojure is a Lisp on the JVM, which makes it unique. The Java Virtual Machine makes it portable, reliable and secure, and there is also a new JavaScript-based version called &lt;a href="https://github.com/clojure/clojurescript"&gt;ClojureScript&lt;/a&gt;. &lt;a href="http://common-lisp.net/project/slime/"&gt;Slime&lt;/a&gt; is an excellent development environment, and Leiningen makes project automation easy. Java interoperability means Clojure has a great collection of libraries for almost everything.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;However, Clojure is not for complete beginners. The Clojure community is very open and supportive, but asking the right question requires some sort of maturity. 
As &lt;a href="http://www.reddit.com/r/Clojure/comments/n90id/about_how_much_java_do_you_use_or_need_to_know/"&gt;this&lt;/a&gt; Reddit thread explains, you don't need to be a Java expert to pick up the language; you can learn what you need to know as you go. But you should know at least one 'conventional' language like Python before you start learning Clojure. More propaganda in our &lt;a href="http://clojurelx.blogspot.com/2011/11/why-clojure-lx.html"&gt;Why Clojure lx?&lt;/a&gt; post.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-8307737231145787715?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/8307737231145787715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2012/01/what-makes-clojure-different.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8307737231145787715'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/8307737231145787715'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2012/01/what-makes-clojure-different.html' title='What makes Clojure different?'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-4065921524046595083</id><published>2011-12-29T11:48:00.000-08:00</published><updated>2011-12-29T11:48:38.846-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='nlp'/><category scheme='http://www.blogger.com/atom/ns#' term='hello'/><category scheme='http://www.blogger.com/atom/ns#' term='clojure-opennlp'/><category scheme='http://www.blogger.com/atom/ns#' term='intro'/><title type='text'>Hello nlp!</title><content type='html'>&lt;div style="text-align: justify;"&gt;This post assumes you have already installed Leiningen and that you are comfortable with your programmer's editor of choice.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;Starting a new project&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529357.js?file=hello-nlp01"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;This command creates a new directory (hello-nlp). Navigate into that new directory and open the file project.clj. You are going to see something like this:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529367.js?file=hello-nlp02.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;We are going to use the Apache Software Foundation's &lt;a href="http://incubator.apache.org/opennlp/"&gt;OpenNLP&lt;/a&gt; library with the help of &lt;a href="http://writequit.org/"&gt;Lee Hinman&lt;/a&gt;'s Clojure library interface (and this post is based on Hinman's tutorial). Searching for “opennlp” gives various results, so we picked the first (ending with 0.1.7). The information page contains everything you might want to know: the location of the GitHub repo and a short code snippet for Leiningen users, [clojure-opennlp "0.1.7"]. 
Copy and paste the code into project.clj as follows:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529375.js?file=hello-nlp03.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Now your project.clj knows everything and is ready to serve you. The command&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529382.js?file=hello-nlp04.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;downloads dependencies (e.g. the clojure-opennlp library) and puts them on your project's classpath. Have a look at the lib directory in your project directory and you'll see jar files.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The core &lt;/b&gt;&lt;br /&gt;Now navigate into the hello-nlp/src/hello_nlp/ directory. You'll find a core.clj file there.  Open it in your editor.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;You'll see something like this:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529392.js?file=hello-nlp05.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;To “enable” OpenNLP, modify the file:&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1529399.js?file=hello-nlp06.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;You need a few additional files. Make a models directory in hello-nlp and download the pre-trained models from here (&lt;a href="http://opennlp.sourceforge.net/models-1.5/"&gt;http://opennlp.sourceforge.net/models-1.5/&lt;/a&gt;). In this post, we are using English models, but feel free to change to another one. 
You need the Sentence Detector (en-sent.bin), Tokenizer (en-token.bin) and the POS Tagger (en-pos-maxent.bin).&lt;/div&gt;&lt;div style="text-align: justify;"&gt;Now, we can add user-defined functions to core.clj. In the example, we made a sentence detector (get-sentences), a tokenizer (tokenize) and a POS tagger (pos-tag) based on the downloaded models.&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;script src="https://gist.github.com/1535805.js?file=hello-nlp08.clj"&gt;&lt;/script&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;&lt;b&gt;Get your hands dirty&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: justify;"&gt;You can try out the newly defined functions on your own sentences! &lt;/div&gt;&lt;script src="https://gist.github.com/1529406.js?file=hello-nlp07.clj"&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-4065921524046595083?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/4065921524046595083/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2011/12/hello-nlp.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/4065921524046595083'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/4065921524046595083'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2011/12/hello-nlp.html' title='Hello nlp!'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-2683856577447914942</id><published>2011-11-28T10:14:00.001-08:00</published><updated>2011-11-28T10:14:45.848-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='road-map'/><title type='text'>Road-map – or n+1 steps to enlightenment (or loonybind)</title><content type='html'>&lt;p&gt;As we expressed in our previous post, we'd like to experiment with Clojure. Let us emphasize again: we are NOT developing a new library, we just believe that using Clojure in linguistic computing might be fruitful. In order to prove this assumption (or refute it), we are going to try some tools out, and summarize and share our experiences as blog posts. Here is our tentative road-map.&lt;br /&gt; &lt;a name='more'&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Topics&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;We don't want to cover everything since we are neither omniscient, nor experienced Clojure hackers &amp;ndash; so take our words with a grain of salt. We've chosen a few &amp;ldquo;core&amp;rdquo; topics interesting to us. Naturally, the topics are divided into two categories: &amp;ldquo;classics&amp;rdquo; and &amp;ldquo;using Java power tools&amp;rdquo;. Please leave a comment; we'd appreciate your thoughts (even your critique!).&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Classics&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;Algorithms for Computational Linguistics &amp;ndash; we stole the title from the &lt;a href="http://www.coli.uni-saarland.de/projects/milca/courses/coal/html/"&gt;great (and open) book&lt;/a&gt; by Striegnitz, Blackburn, Erk, Walter, Burchardt and Tsovaltzi. We'd like to approach finite state techniques from two perspectives: logical and functional.&lt;/p&gt;&lt;p&gt;Zipf's law &amp;ndash; the well-known distribution &amp;ndash; is the guinea pig of linguistic statistics. 
Inspired by the &lt;a href="http://zipfr.r-forge.r-project.org/"&gt;ZipfR&lt;/a&gt; package and &lt;a href="http://incanter.org/"&gt;Incanter&lt;/a&gt;, we will examine some very basic statistics about texts, such as word length and frequency, and try plotting our results.&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Java power tools&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://incubator.apache.org/opennlp/"&gt;OpenNLP&lt;/a&gt; and the &lt;a href="http://nlp.stanford.edu/software/lex-parser.shtml"&gt;Stanford parser&lt;/a&gt; are real power tools. Tagging, chunking and parsing are &lt;span style="font-style: normal;"&gt;indispensable when we are working with data.&lt;/span&gt;&lt;/p&gt;&lt;p&gt;Latent semantics &amp;ndash; using &lt;a href="https://github.com/davidandrzej/chisel"&gt;chisel&lt;/a&gt; to do LDA analysis with the &lt;a href="http://mallet.cs.umass.edu/"&gt;MALLET&lt;/a&gt; package.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-2683856577447914942?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/2683856577447914942/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2011/11/road-map-or-n1-steps-to-enlightenment.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/2683856577447914942'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/2683856577447914942'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2011/11/road-map-or-n1-steps-to-enlightenment.html' title='Road-map – or n+1 steps to enlightenment (or loonybind)'/><author><name>Zoltán 
Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-4293717577287666309</id><published>2011-11-25T01:27:00.001-08:00</published><updated>2011-11-25T01:29:26.008-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Leiningen'/><category scheme='http://www.blogger.com/atom/ns#' term='version control'/><category scheme='http://www.blogger.com/atom/ns#' term='emacs'/><category scheme='http://www.blogger.com/atom/ns#' term='Clojure'/><category scheme='http://www.blogger.com/atom/ns#' term='beginner'/><category scheme='http://www.blogger.com/atom/ns#' term='Clojars'/><title type='text'>Hints for newbies</title><content type='html'>&lt;div style="margin-bottom: 0in;"&gt;&lt;i&gt;We received emails from interested folks who are new to Clojure. We hope they can find enough information about setting up a convenient environment for working with us so that they can give us feedback. Here we give them a few tips. Please share your experiences in the comments!&lt;/i&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;Installing Clojure requires some expertise; you should be comfortable with your operating system. The easiest way to run Clojure is to download the clojure.jar file and run the java -cp clojure.jar clojure.main command from the command line. However, this isn't the most effective way. A search engine will turn up instructions for installing Clojure on your platform. 
Ubuntu users will find everything in &lt;a href="http://riddell.us/ClojureOnUbuntu.html"&gt;&lt;b&gt;Clojure on Ubuntu&lt;/b&gt;&lt;/a&gt;, but please note that the Clojure GitHub repo has moved to &lt;a href="https://github.com/clojure/clojure"&gt;https://github.com/clojure/clojure&lt;/a&gt; and clojure-contrib has also been split into individual repos, so don't follow the description literally!&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;You'll also need to install &lt;a href="https://github.com/technomancy/leiningen"&gt;&lt;b&gt;Leiningen&lt;/b&gt;&lt;/a&gt;. Why? As you can read on its repo, “Working on Clojure projects with tools designed for Java can be an exercise in frustration. With Leiningen, you just write Clojure”. We are going to use Java tools, and the &lt;a href="http://clojars.org/"&gt;&lt;b&gt;Clojars&lt;/b&gt;&lt;/a&gt; community repository provides us with these tools. Although using Leiningen to include various Java libraries in our projects looks very tedious (have a look at the &lt;a href="https://github.com/technomancy/leiningen/blob/stable/sample.project.clj"&gt;sample file&lt;/a&gt;), taking some time before getting into coding can give us goodies like the Stanford parser, OpenNLP and WEKA.&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;&lt;a href="" name="toc_1"&gt;&lt;/a&gt;If you haven't found your text editor of choice, &lt;a href="http://www.gnu.org/s/emacs/"&gt;Emacs&lt;/a&gt; with &lt;a href="http://common-lisp.net/project/slime/"&gt;SLIME&lt;/a&gt; is what you need. The &lt;a href="http://riddell.us/ClojureSwankLeiningenWithEmacsOnLinux.html"&gt;Clojure, Swank, and Leiningen with Emacs on Linux&lt;/a&gt; guide shows you how to set up your development environment.&lt;/div&gt;&lt;div style="margin-bottom: 0in;"&gt;We won't cover version control here, but using it is good housekeeping. 
If you are new to the topic and haven't committed yourself to a tool yet, have a look at &lt;a href="http://git-scm.com/"&gt;git&lt;/a&gt; and &lt;a href="https://github.com/"&gt;github&lt;/a&gt;, and read the &lt;a href="http://book.git-scm.com/"&gt;git community book&lt;/a&gt;.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-4293717577287666309?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/4293717577287666309/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2011/11/hints-for-newbies.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/4293717577287666309'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/4293717577287666309'/><link rel='alternate' type='text/html' href='http://clojurelx.blogspot.com/2011/11/hints-for-newbies.html' title='Hints for newbies'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1506645095464847642.post-5629746026631647628</id><published>2011-11-16T09:30:00.000-08:00</published><updated>2011-11-16T21:57:57.898-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='manifesto'/><category scheme='http://www.blogger.com/atom/ns#' term='linguistics'/><category scheme='http://www.blogger.com/atom/ns#' term='Clojure'/><title type='text'>Why Clojure 
lx?</title><content type='html'>&lt;p style="text-align: justify;"&gt;&lt;strong&gt;The NLTK is a natural choice for students of linguistics and computer science. It has matured into a stable project, its users are very active, and it is now used outside of academia. Those who are into functional programming can use the &lt;a href="http://www.snltk.org/"&gt;Scheme Natural Language Toolkit&lt;/a&gt;, or learn from the &lt;a href="http://nlpwp.org/"&gt;Natural Language Processing for the Working Programmer&lt;/a&gt;, and those who need the JVM can turn to &lt;a href="http://www.scalanlp.org/"&gt;ScalaNLP&lt;/a&gt;. So why bother with Clojure?&lt;/strong&gt;&lt;/p&gt;&lt;p style="text-align: justify;"&gt;First of all, we are NOT proposing a new framework/library here! Our main goal is to examine what Clojure offers to linguists. Although more and more linguistics departments offer courses in statistics and probability theory, the vast majority of students graduate with some background in discrete maths, mostly taught in an implicit way through a class in syntax and/or semantics (and the same is true for philosophy education). Using computer programs to test our scientific ideas is becoming a common practice in the sciences, and this is true for linguists too. &lt;a href="http://szamitogepesnyelveszet.blogspot.com/2010/11/on-computational-corpus-linguistics.html"&gt;Stefan Th. Gries distinguishes&lt;/a&gt; linguistic computing from computational linguistics; following him, we think linguistic computing will become a common methodology used in the language sciences.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;So, what's the difference between computational linguistics and linguistic computing? Well, there is no clear boundary! We'd say computational linguistics (or natural language processing) is a kind of applied science and engineering, and as such it is more &amp;ldquo;goal oriented&amp;rdquo;. 
&lt;a href="http://norvig.com/chomsky.html"&gt;Norvig's recent critique of Chomsky&lt;/a&gt; shows that commercial success is a measure of ideas, but despite the proliferation of statistical methods linguists are still doing research on rule based systems like HPSG, minimalism, etc., and new interdisciplinary research themes have emerged like &lt;a href="http://www.sci.brooklyn.cuny.edu/cis/parikh/"&gt;Parikh&lt;/a&gt;'s idea of &lt;a href="http://www.sci.brooklyn.cuny.edu/cis/parikh/softsen.pdf"&gt;social software&lt;/a&gt; (and &lt;a href="http://ibe.eller.arizona.edu/docs/2008/blume/jaeger-semantics.pdf"&gt;game theoretic semantics&lt;/a&gt; and &lt;a href="http://www.csc.liv.ac.uk/%7Edel/"&gt;dynamic epistemic logic&lt;/a&gt;, among others). But what is &amp;ldquo;pure&amp;rdquo; research today can become applied research tomorrow. To foster communication between pure and applied research, between linguistic computing and computational linguistics, we need a lingua franca.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;As Clojure is the Lisp for the JVM, it is a convenient language for linguists. In the not-so-distant past, Touretzky wrote his &lt;a href="http://www.cs.cmu.edu/%7Edst/LispBook/"&gt;Gentle Introduction to Symbolic Computation&lt;/a&gt;, an excellent book for beginners in the humanities. Gazdar and Mellish's &lt;span style="text-decoration: underline;"&gt;Natural Language Processing in X&lt;/span&gt; (where X stands for &lt;a href="http://www.informatics.susx.ac.uk/research/groups/nlp/gazdar/nlp-in-prolog/"&gt;Prolog&lt;/a&gt;, &lt;a href="http://www.informatics.susx.ac.uk/research/groups/nlp/gazdar/nlp-in-lisp/"&gt;Lisp&lt;/a&gt; or &lt;a href="http://www.informatics.susx.ac.uk/research/groups/nlp/gazdar/nlp-in-pop11/index.html"&gt;Pop11&lt;/a&gt;) is a good introduction to finite state techniques, grammars and parsing, and it even has a chapter on question answering. 
We don't deny that these techniques are old, but they are still part of the well-educated linguist's body of knowledge. Also, although Norvig's &lt;a href="http://norvig.com/paip.html"&gt;PAIP&lt;/a&gt; is a real gem, one cannot argue against the &amp;ldquo;old&amp;rdquo; AI paradigm without seeing the past, and those ideas are still important for linguists, philosophers and cognitive scientists. Logic programming is a natural companion to functional programming. The basic techniques of computational linguistics can be expressed in logic programs, and although they have their computational limitations, these little programs have unquestionable educational value.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;Porting the classics to Clojure is not a novel idea: some Google searching shows that people are turning classic Lisp books like PAIP or the Structure and Interpretation of Computer Programs into modern Clojure. The core.logic library opens up the possibility to do the same with the Prolog literature.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;The most common argument against NLTK is that you can't use mature, industry standard tools like the GATE framework, Stanford CoreNLP, and OpenNLP. Clojure's Java interoperability solves this problem. If you are into machine learning, Weka, MALLET, etc. are at your service. The Incanter package provides an R-like statistical library.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;With these tools in your hand, you can test your ideas in a language that's very close to what you learned about formal languages. Using Java libraries is like using rapid prototyping material when you are a marble sculptor. And as the end result of your work can be shared with computational linguists, you can get more feedback, and even help from the greater community.&lt;/p&gt;&lt;p style="text-align: justify;"&gt;That's why we think that Clojure lx is an idea worth exploring. We'd like to test ourselves! 
Can we use Clojure to express our simple ideas? How easy is it to use Java libraries for a project? If you would like to join us, please send an email to zoltan.varju(at)gmail.com. We welcome everyone, linguists and Clojure hackers, philosophers, digital humanists, everyone who is interested!&lt;/p&gt;&lt;p&gt;&lt;strong&gt;About us&lt;/strong&gt;&lt;br /&gt; &lt;a href="http://about.me/zoltanvarju"&gt;Zolt&amp;aacute;n Varj&amp;uacute;&lt;/a&gt; &amp;ndash; computational linguist at Weblib LLC, &lt;a href="http://twitter.com/#%21/zoltanvarju"&gt;@zoltanvarju&lt;/a&gt;, &lt;a href="http://szamitogepesnyelveszet.blogspot.com/"&gt;Sz&amp;aacute;m&amp;iacute;t&amp;oacute;g&amp;eacute;pes nyelv&amp;eacute;szet&lt;/a&gt;&lt;br /&gt; &lt;a href="http://www.burntfen.net/hub.php"&gt;Richard Littauer&lt;/a&gt; &amp;ndash; MSc computational linguistics student at the University of Saarland, &lt;a href="http://twitter.com/#%21/richlitt"&gt;@richlitt&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;strong&gt;Special thanks to&lt;/strong&gt;&lt;br /&gt; Neil Ashton - &lt;a href="http://twitter.com/#%21/nmashton"&gt;@nmashton&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1506645095464847642-5629746026631647628?l=clojurelx.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://clojurelx.blogspot.com/feeds/5629746026631647628/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://clojurelx.blogspot.com/2011/11/why-clojure-lx.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/5629746026631647628'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1506645095464847642/posts/default/5629746026631647628'/><link rel='alternate' type='text/html' 
href='http://clojurelx.blogspot.com/2011/11/why-clojure-lx.html' title='Why Clojure lx?'/><author><name>Zoltán Varjú</name><uri>https://profiles.google.com/102852068976721430833</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh3.googleusercontent.com/-kiZQY3iBnOo/AAAAAAAAAAI/AAAAAAAAAfs/n2Nb-MYeGE8/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry></feed>
