Sunday, May 23, 2010

Stat 101 and a little more in 25 lines of Clojure

If you like numbers, or statistics to be exact, Clojure is your best friend. There is a wonderful statistical graphing library called Incanter, which provides an R-like environment for all of your data crunching and visualization needs. And Clojure itself is expressive enough that you can write out your Stat 101 (probability, permutations, combinations, number of trials and degree of certainty) in a mere 25 lines of Clojure code (blank lines not counted). Without further ado, here is the whole thing:
Note: DC stands for the degree of certainty that an event will appear, P stands for the probability of the event and N stands for the number of trials. The terminology comes from the author of Probability Theory, Live!
(defn N [p dc]
  (/ (Math/log (- 1 dc))
     (Math/log (- 1 p))))

(defn DC [p n]
  (let [np (- 1 p)
        pp (Math/pow np n)]
    (- 1 pp)))

(defn k-factorial [n]
  (reduce * (range 1 (inc n))))

(defn permutation [n k]
  (/ (k-factorial n)
     (k-factorial (- n k))))

(defn combination [n k]
  (/ (k-factorial n)
     (* (k-factorial k)
        (k-factorial (- n k)))))

(defn- k-filter [el coll]
  (remove #(= el %) coll))

(defn permutations [n coll]
  (if (= n 1)
    (map vector coll)
    (for [el coll nlis (permutations (- n 1) (k-filter el coll))]
      (conj nlis el))))

(defn combinations [n coll]
  (set (map set (permutations n coll))))
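As a quick sanity check, the counting formulas can be compared against the brute-force enumeration. The sketch below repeats the combinatorics functions in compact form so that it runs on its own; the sample vector `items` is only an illustration, and the comparison assumes the elements are distinct, because k-filter drops every element equal to the one just picked.

```clojure
;; Definitions repeated from the listing above so the snippet runs on its own.
(defn k-factorial [n] (reduce * (range 1 (inc n))))
(defn permutation [n k] (/ (k-factorial n) (k-factorial (- n k))))
(defn combination [n k]
  (/ (k-factorial n) (* (k-factorial k) (k-factorial (- n k)))))
(defn- k-filter [el coll] (remove #(= el %) coll))
(defn permutations [n coll]
  (if (= n 1)
    (map vector coll)
    (for [el coll
          nlis (permutations (- n 1) (k-filter el coll))]
      (conj nlis el))))
(defn combinations [n coll] (set (map set (permutations n coll))))

;; Illustrative input: three distinct items, choosing k = 2 of them.
(def items [:a :b :c])

(println (permutation 3 2) (count (permutations 2 items)))   ; both counts are 6
(println (combination 3 2) (count (combinations 2 items)))   ; both counts are 3
(println (combinations 2 items))   ; the three unordered pairs, as a set of sets
```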



The combinations and permutations implementations are recursive and can overflow the stack on large inputs. For a non-recursive implementation, you can check clojure-contrib's combinatorics library. It is fast and makes use of lazy-seq, meaning you can generate combinations lazily for arbitrarily large amounts of data. However, the point of this post is to show you that it is very easy to implement statistical formulas in Clojure.
You might wonder what you can do with the above code. Here is one interesting statistical riddle you can solve with it: let us say you have a bag with 100 coins, of which 7 are gold and 93 are silver. If you pick coins without looking inside the bag, what is the minimum number of coins you have to pick to be able to say with 50% confidence that this bag has 7 gold coins? Now we can put the code above into action:
First, we need to clarify that the riddle is asking for the number of trials, which is N as defined above. The event in question is a gold coin turning up.
Second, we know that the probability P of picking a gold coin is 7/100. The degree of certainty, denoted DC above, is given as 0.5, or 1/2. (The formula treats every pick as an independent trial with the same probability, as if each coin went back into the bag.) So let us plug those numbers into the function N:
=> (println (N (/ 7 100) 0.5))
This will print 9.551337509447352, which rounds up to 10 trials. So after picking about 10 coins, you can say with 50% confidence that a gold coin will have turned up.
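The rounding step can be checked by feeding the neighboring whole numbers back into DC, which computes 1 - (1 - p)^n. In the sketch below, N and DC are repeated from the listing so that the snippet runs on its own, while `min-trials` is a hypothetical helper name introduced only for this illustration.

```clojure
;; N and DC are repeated from the listing so the snippet runs on its own.
(defn N [p dc]
  (/ (Math/log (- 1 dc))
     (Math/log (- 1 p))))

(defn DC [p n]
  (- 1 (Math/pow (- 1 p) n)))

;; min-trials is a hypothetical helper: the smallest whole number of trials
;; whose degree of certainty reaches dc.
(defn min-trials [p dc]
  (long (Math/ceil (N p dc))))

(println (min-trials 7/100 0.5))   ; 10
(println (DC 7/100 9))             ; just under 0.5
(println (DC 7/100 10))            ; just over 0.5
```

Nine picks fall short of the 50% mark and ten picks clear it, so 10 is the whole-number answer. Because N works in floating point, a result that lands almost exactly on a whole number may deserve a second look before rounding up.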

Again, if you are a math-inclined person, you will have a lot more fun with Clojure than you might have imagined. Note that the DC and N formulas come directly from the following website: http://saliu.com/Saliu2.htm. I also got help and feedback on my permutations function from the nice folks who hang around the Clojure Google group, which I highly recommend signing up for right away if you want to learn from the masters. I might post a non-recursive version of the function when time permits. Until then, enjoy your Clojure journey.
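For readers curious about the lazy approach mentioned earlier, here is a minimal sketch built on lazy-seq. It is not the algorithm used by the clojure-contrib library, and the name `lazy-combinations` is made up for this illustration. It is still written recursively, but each step is wrapped in lazy-seq, so results are only computed as they are consumed.

```clojure
;; lazy-combinations is a hypothetical name; this is an illustration only,
;; not the clojure-contrib implementation.
(defn lazy-combinations
  "Lazily yields every k-element combination of coll, in input order."
  [k coll]
  (cond
    (zero? k)     (list ())   ; exactly one way to choose nothing
    (empty? coll) ()          ; nothing left to choose from
    :else
    (lazy-seq
      (concat
        ;; combinations that include the first element
        (map #(cons (first coll) %)
             (lazy-combinations (dec k) (rest coll)))
        ;; combinations that skip the first element
        (lazy-combinations k (rest coll))))))

(println (lazy-combinations 2 [1 2 3]))                   ; ((1 2) (1 3) (2 3))
(println (take 3 (lazy-combinations 3 (range 1000000))))  ; ((0 1 2) (0 1 3) (0 1 4))
```

Because only the requested results are realized, taking the first few combinations of a million-element range returns immediately, something the eager version above cannot do.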
 

Sunday, May 16, 2010

The most portable Clojure REPL box - Eee PC 701

I have been trying to put my Eee PC 701 to good use. This weekend I realized that turning it into a portable Clojure box would be ideal, since I could then have REPL access whenever I want. As it turned out, it is quite easy, and here is how I got it done:
  1. Download the Arch Linux ISO (here: http://www.archlinux.org/download/) and burn it onto a CD.
  2. Use an external CD drive to boot your Eee PC from the Arch Linux image and follow the instructions to install the base system. You will find very detailed step-by-step instructions here: http://wiki.archlinux.org/index.php/Asus_Eee_PC_701. Even though the installation process is pretty straightforward, here are a few things you need to pay attention to:
    • Make sure you select the ext2 file system
    • Run "pacman -Syu" to do a system upgrade before installing any extra programs
    • Make sure to add hal and fam to the daemon list by putting "DAEMONS=(syslog-ng network netfs crond @hal @fam)" in your /etc/rc.conf if you are planning to add a graphical desktop environment
    • Now you can install Mercurial, Git, Java and Clojure. (You can install all of them from the Arch repo except Clojure. You can easily install Clojure with: hg clone http://bitbucket.org/kasim/clojurew/, which is a project I created to make setting up Clojure less painful.)
You can also install Xorg and a lightweight desktop called LXDE very easily if you want. However, all I needed was a shell login so I could play with the Clojure REPL. The Clojurew project includes the latest versions of clojure-contrib and JLine, so you pretty much have all of the Clojure API docs and source code browsing at your fingertips, along with command line history. After installing Mercurial, Git, Java and Clojure with clojure-contrib, you will still have close to 3 GB of free space left on the drive (it only has a 4 GB drive). That means you can still install Incanter, which I have been playing with a lot recently. (Note: you can just copy the Incanter app jar from the binary download to Clojurew's lib directory to have Incanter available at the REPL.) It took me around an hour to set everything up, and the joy of being able to carry a little Clojure box around is well worth the effort. Go ahead and try it out if you own an Eee PC 701.
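To give an idea of what doc and source browsing at the REPL looks like, here is a small sketch. It assumes a newer Clojure (1.4 or later) than the one this setup was built with, where the helpers live in the clojure.repl namespace; with the clojure-contrib setup described above, the equivalent helpers may live elsewhere.

```clojure
;; Assumption: Clojure 1.4 or later, where doc and source are in clojure.repl.
(require '[clojure.repl :refer [doc source]])

(doc reduce)   ; prints the argument lists and docstring of clojure.core/reduce
(source inc)   ; prints the definition of inc when its source is on the classpath
```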

Friday, May 7, 2010

Easy Clojure development with Vim

Starting with Vim or Clojure can be difficult. Starting with both at the same time is not recommended. But it can be difficult even for someone who knows both. The following is the simple way I do my development:
I edit Clojure scripts in Vim. I found putting the following lines in .vimrc very helpful:
set showmatch 
map <F5> <Esc>:!clj '%:p'<CR>
The first one highlights the matching (, { or [. The second one executes the file currently being edited. Please note that !clj '%:p' just invokes the shell command clj with the full path of the buffer you are editing, and that it is mapped to the F5 function key. If you are using MacVim, make sure that you put the following line in your .vimrc file:
let $PATH="path/to/clojure/home/bin:".$PATH
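To try out the F5 mapping, any small script will do. The following is a hypothetical hello.clj (both the file name and the `greeting` function are made up for illustration); open it in Vim and press F5, assuming a clj command is on your PATH as described above.

```clojure
;; hello.clj: a throwaway script (hypothetical name) for trying the F5 mapping.
(defn greeting
  "Builds the line the script prints."
  [who]
  (str "Hello from " who "!"))

(println (greeting "Vim"))   ; prints: Hello from Vim!
```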
I also use a Vim plugin called “AutoClose” (here: http://www.vim.org/scripts/script.php?script_id=1849) that automatically closes parentheses, brackets and curly braces. You can also do yourself a favor by installing a plugin called “VimClojure” (here: http://www.vim.org/scripts/script.php?script_id=2501). I only use its syntax highlighting feature and do not run an interactive REPL with it (I do not recommend using the nailgun server). The above is just enough for small-scale development. For more involved development, however, there are dependency and class path hurdles to overcome. It is even worse for someone who is new to the Java world and comes from another dialect of Lisp. To ease the pain, I created a project called Clojurew (here: http://bitbucket.org/kasim/clojurew/src) that does automatic class path resolution. Any jar dependency is resolved by just copying the jar to the lib folder of Clojurew. If you want to include your own script in the class path, you can even create a soft link to your script within the lib directory and it will be included in the class path as well. Here is how you can do this:
 ln  -s path/to/your/script linkname
After this, you can use any function in your script that is defined in other files.  Happy Vimming and Clojuring!
I hope someone finds this helpful.