Sunday, May 23, 2010

Stat 101 and little more in 25 lines of Clojure

If you like numbers or statistics to be exact, Clojure is your best friend. There is this wonderful statistical graphing library called Incanter, which is an R like environment for all of your data crunching and visualization needs. And there is such an expressive power of Clojure at your fingertips that you can pretty much express your stat 101 (probability, permutation, combination, number of trial and degree of certainty) with a mere 25 (blank lines are not counted) lines of Clojure code. OK, without further ado, here is the whole thing:
Note: DC stands for degree of certainty that an event will appear, P stands for probability of the event and N stands for number of trials (events). from the auther of Probability Theory, Live!
1:  (defn N [p dc] 
2:   (/ (Math/log (- 1 dc)) 
3:     (Math/log (- 1 p)))) 
4:   
5:  (defn DC [p n] 
6:   (let [np (- 1 p) 
7:      pp (Math/pow np n)] 
8:    (- 1 pp))) 
9:   
10:  (defn k-factorial [n] 
11:   (reduce * (range 1 (inc n)))) 
12:   
13:  (defn permutation [n k] 
14:   (/ (k-factorial n) 
15:     (k-factorial (- n k)))) 
16:   
17:  (defn combination [n k] 
18:   (/ (k-factorial n) 
19:     (* (k-factorial k) 
20:      (k-factorial (- n k))))) 
21:   
22:  (defn- k-filter [el coll] 
23:   (filter #(not (= el %)) coll))   
24:   
25:  (defn permutations [n coll] 
26:    (if (= n 1) 
27:     (map #(vector %) coll) 
28:     (for [el coll nlis (permutations (- n 1) (k-filter el coll))] 
29:      (conj nlis el))))  
30:   
31:  (defn combinations [n coll] 
32:   (set (map set (permutations n coll)))) 



The combinations and permutations implementation is recursive and will result in stack over flow for large data. For non recursive implementation, you can check clojure-contrib's combinatorics library. It is fast and makes use of lazy-seq, meaning you can virtually generate combinations for arbitrary large amount of data. However, the point of this post is to show you that it is very easy to implement statistical formulas in Clojure. You might wander what you can do with the above code. Here is one interesting statistical riddle you can solve with it: Let us say you have a bag with 100 coins. There are 7 gold coins and 93 silver coins. If you are to pick up a coin without looking inside the bag, what is minimum number of coins that you had to pick up to be able to say with 50% confidence that this bag has 7 gold coins? Now, we can put the code above into action:
First we need to clarify the riddle is asking us to find the number of trials, which is N as defined above.
Second, we know that the possibility P of picking up a gold coin is 7/100.  The degree of certainty, which is denoted as DC above, is given as .5 or 1/2. So let us just plug in those numbers to the function N:
=>(println (N (/ 7 100) 0.5))
This will produce 9.551337509447352, which roughly around 10 trials.  So we can say that after picking about 10 coins, you can say with 50% percent confidence that the bag has 7 gold coins.

Again, if you are math inclined person, you will have a lot more fun with Clojure than you might have imagined.  Note that the DC and N formula is directly from following website:
http://saliu.com/Saliu2.htm. I also got help and feedback to my permutations function from nice folks hang around in Google Clojure groups, which I highly recommend you sign up immediately if you want to learn from a master or masters.  I might do another post with the non recursive version of the function in my next post when time permits. Until then, enjoy your Clojure journey.
 

No comments:

Post a Comment