Thursday, July 16, 2009

Practical uses of n-euclidean distance

One thing I really hate is tech blogs, or programming blogs for that matter, that talk about a given subject domain - throw some code on the screen and that's it for the discussion.

It doesn't really explain the practicality of their opinion or even their intelligence. I've seen some very messy code written by some very intelligent folks (myself included) so, just because you can splatter some pixels in the shape of keywords doesn't mean that you really know what you're talking about.

Most importantly - talking is not communicating. It only becomes communication when the reader understands what you're blathering on about. Just ask my wife - I communicate very poorly A LOT of the time. ;-)

Anywho -

I thought I would share some examples of using the Scheme code I tossed up there the other day in some practical manners. So, if you don't recall, just scroll down and re-read the previous post.

If you're following along with a PocketPC with Scheme then please leave a comment and we'll talk. Seriously.

You can also follow along with PLT Scheme just remember to put it in "R5RS" mode before you start.

Example 1:

You're looking at the map for a place to eat. You've been driving all day, (perhaps on a road trip between Halifax and Toronto) and the map shows that there is a Jack Astors and a Moxies both within a reasonable amount of distance.

But how close are they? Well, let's say you have one of those maps that has a row of numbers and a row of letters. Replace the letters with numbers, such as A = 1, B = 2, etc and you now have a map with Cartesian notation and you are ready to use the n-euclidean function I wrote for you (yes, just for you) to find out which one is closer.

Remember, the higher the number in this function, the less the distance between you and the object. (Counter intuitive but there you have it).

So, here we go:


(define you (list 12 12)) ;you are at 12, 12 on the map
(define moxies (list 4 3))
(define jack-astors (list 23 20))

> (n-euclidean jack-astors you)
0.07352146220938077

> (n-euclidean moxies you)
0.08304547985373997
So, based on this it appears that Moxies is closer to you than Jack Astors by 0.0095240176443592 units on your map.

So, next up is a more personal type of usage. Let's say that my friend has been raving about this new online movie critic that knows her stuff and really can pick good movies. Now, fortunately she's been posting for a while and has been going through a list of older titles and giving her reviews on them as well so I'm able to compile a list of movies that I think are brilliant and ones that I think suck eggs.

Here is my responses to the following titles:

Lost Horizon 5
Big Trouble Lt. China 4
Matrix 5
American Beauty 4
LA Confidential 4
F&L in Las Vegas 5
English Patient 1
Far and Away 1
Donnie Darko 4
Good Will Hunting 5

and the critic's ratings:


Lost Horizon 5
Big Trouble Lt. China 2
Matrix 4
American Beauty 5
LA Confidential 5
F&L in Las Vegas 1
English Patient 5
Far and Away 5
Donnie Darko 2
Good Will Hunting 5

and for good measure the scores from Rotten-Tomatoes.com (score is divided by 20 to maintain a 1-5 range):
Lost Horizon 5
Big Trouble Lt. China 4.15
Matrix 4.3
American Beauty 4.45
LA Confidential 4.95
F&L in Las Vegas 2.4
English Patient 4.2
Far and Away 2.4
Donnie Darko 4.15
Good Will Hunting 4.85

So, with all the data - we're ready to find out which critic matches my mindset the most.


(define me (list 5 4 5 4 4 5 1 1 4 5))
(define online-critic (list 5 2 4 5 5 1 5 5 2 5))
(define rotten-tomatoes (list 5 4.15 4.3 4.45 4.95 2.4 4.2 2.4 4.15 4.85))

> (n-euclidean me online-critic)
0.13018891098082389

> (n-euclidean me rotten-tomatoes)
0.2202060992539127
So, it would appear that I should stick with picking my reviews from Rotten Tomatoes and ignore this new would be darling of the online movie criticism space.

The last example is somewhat contrived but, gives an indication on some workplace usages for these types of criteria judging systems.

Let's say that the program manager over at Acme Widgets has hired me to create a system which will help him pair up his programmers into XP teams in such a way that the programmers don't kill each other or devolve into hostile silence with zero productivity.

So, I generate a list of touchy-feely questions such as:
  • I prefer rainy days to sunny days
  • I prefer cats to dogs
  • One must create the plan for the application before starting
  • Blue is a nice colour
And so on, for about 19 questions. The users are to rate the 'truth' of the statement on a scale of 1 to 10 with 1 being "I totally disagree" and 10 being "Amen brother - preach it".

So, let's see what we get:

(define joe-javascript (list 1 2 4 3 2 1 1 2 4 5 6 9 7 6 4 5 6 1 1 1))
(define robert-ruby (list 3 4 6 5 4 3 3 4 6 7 0 0 0 0 6 7 0 3 3 3))
(define lucius-lisp (list 1 2 2 9 9 8 7 9 8 8 7 8 9 0 8 8 7 7 8 7))
(define charlie-c (list 2 3 5 4 3 2 2 3 5 6 3 5 4 3 5 6 3 2 2 2))

> (n-euclidean joe-javascript robert-ruby)
0.057928444636349226
> (n-euclidean joe-javascript lucius-lisp)
0.047836487323493986
> (n-euclidean joe-javascript charlie-c)
0.12216944435630522
> (n-euclidean robert-ruby lucius-lisp)
0.047619047619047616
> (n-euclidean robert-ruby charlie-c)
0.10976425998969035
> (n-euclidean lucius-lisp charlie-c)
0.052999894000318
So, let's piece this together. At first look, it appears that the best match is Joe and Charlie with a ~.12216 but, the side effect of pair them would mean that Lucius and Robert would have to work together and that gives us the lowest score of the set with a ~.0476

This leads us to the logical conclusion that the best pairing is going to be Joe and Robert and Lucius and Charlie. Charlie seems to be the most balanced of the group, getting along with everyone except for Lucius.

Tuesday, July 14, 2009

n-dimensional sweetener

Well, I couldn't wait - I had this:


\sqrt{(p_1-q_1)^2 + (p_2-q_2)^2+...+(p_n-q_n)^2}.
In my head all afternoon so I worked up a quick and dirty set of functions to deal with a n-dimensional pair of lists.


; sub-list, a function to subtract one list from the other
(define (sub-list xlist ylist)
(let* ((blah (list 'head)))
(let iter ((i 0))
(if (>= i (length xlist)) (cdr blah)
(begin
(set! blah
(append blah
(list (- (list-ref xlist i)
(list-ref ylist i)))))
(iter (+ i 1)))))))

; sum of squares
(define (sum-of-squares xlist)
(let* ((sum 0))
(for-each (lambda (x) (set! sum (+ sum (square x))))
xlist) sum))

; n-euclidean distance
(define (n-euclidean n n1)
(/ 1 (sqrt
(sum-of-squares
(sub-list n n1))) 1)))

Combined with the square() function from previously today one now has the ability to do such things as the following:
  1. Use a standardised list of questions to create "compatibility" charts
  2. Use this as a performance metric - using the final result as a range from 0-1.0 with regards to predictions vs realities
  3. Plot the distance between objects in n-dimensional space
#1 would be useful for creating teams -as in, ask a bunch of questions and find the best matching partners.

#2 is used in Build Your Own Expert System which is a book I have read, marked up, and re-read so many times it is looking very much worse for the wear.

#3 - well how many n-dimensional spaces do you know?

Euclidean Distance

So, I'm packing things up and got to thinking about my copy of "Programming Collective Intelligence" from which I've been making notes on a port of the code to Ruby from Python as a way of building my Python-Fu.

But, I confirmed something that Rich Hickey talks about in his Clojure presentations. That "thing" is that the increase in power one receives from every expression being evaluated is far superior to most other languages vs. a Lisp. (Ex Clojure, Kawa, PocketScheme, PLT, CLISP etc)

He calls it "ceremony" and talks a bit about how Clojure removes 90% of the nonsense that goes on when building a Java program by abstracting it all away. (I mean, realisitically if Groovy can get away with being dynamically typed, what's the deal with Java remaining static? How 20th century can one be?)

Anywho - I had a few minutes break in the morning while my daughter was trying to take her mornig nap (all sounds must cease in the universe for her to find her "happy place") and so I huddled up in the living room with my PocketPC and a copy of PCI went to work.

The first example in the book is how to classify various data as a relation to the other. The example given is of the user trying to find how close two critics are (which inferentially speaking could mean the user would provide ratings for a set of movies and then the system would match them up with a critic they would agree with but, I digress) using the Euclidean Distance.

For the sake of not being viewed as a copyright infringer, I won't quote the book but, you can look at the Python code on page 10 where the formula is presented as an interaction with the Python REPL.

Here's my translation of that into Scheme:


; square function
(define (square n)
(expt n 2))

; euclidean distance
(define (euc-distance x y x1 y1)
(/ 1
(+
(sqrt (+ (square (- x x1)
(square (- y y1)))) 1)))

Essentially one takes the 1/(square root of the (sum of the squares of the (distances between vertices)) + 1)

Which is pretty much how I wrote this function - by working inside out.

It was a nice way to spend about twenty minutes.

Now what's interesting to me is that while my formula is only applicable to two dimensions - one could easily abstract it to be applicable for n dimensions in alignment with the actual theorem.


\sqrt{(p_1-q_1)^2 + (p_2-q_2)^2+...+(p_n-q_n)^2}.

Meaning one would need to iterate over the whole set of differences between the two items along their dimensions.

For someone coming from a liberal arts background - all this n dimension stuff is exciting.

Sunday, July 12, 2009

Scheming Away

I've been averaging about a function a day (at this rate it'll take a thousand days to complete my Ruby port) and some days are better than others.

The interesting thing is that there is so many options to port that when I get stuck on one, I usually shift sideways mentally and solve another before I come back.

Example, I'm stuck on the reverse of the list-join function which would be (in Ruby) a split function usually used as thus:

"1,2,3".split(',') => [1 2 3]


I haven't found an elegant way inside my head to solve that issue however while working on it I was able to "improve" on Ruby somewhat by addressing what I have always griped about with regards to the index function.

Example, in Ruby one can use the index function to return the index (string or array) of what you are searching for by using the index method on that object.

The problem (for me) has always been that the index function only returns the first instance of the item in the set. Which is (again for me) kind of silly because it precludes being able to do more interesting manipulations.

Ruby Way (ugh)
[1,1,2,3,4,5].index(1) => 0

Now, if I'm interested in all of the locations of the value "1" in the set I'm going to have to do something annoying like build a separate array and store in it the location of the item as I iterate over the set.

I think by now you can see why this annoys me because my thinking along the lines of the Object.split function requires knowing not only what the delimiter is but also the location of each instance of the delimiter within the set.

So, when I created the index function, I made my Ruby port drift a little (some amateur language designing if you will) with the following:

;index
(define (index lst val)
(let* ((blah (list 'head)))
(let bump ((i 0))
(if (= (length lst) i)
(cdr blah)
(if (equal? (list-ref lst i) val)
(begin
(set! blah
(append blah (list i)))
(bump (+ 1 i)))
(bump (+ 1 i)))))))

Which can be shown like this:
> (define l-ist (list 0 1 1 2 3 4 4 5 6))
> (index l-ist 1)
(1 2)

Now, I can convert my string to a list (string->list) and find out the locations of all the delimiters as follows:

> (define commas "1,2345,789,0")
> (index (string->list commas) #\,)
(1 6 10)

Which will allow me to use my slice function to carve up the interior bits of the string.

At the moment, the index function only works on lists so it should be called list-index but, duck-typing is an easy mimic as well so I would write something along the lines of:

(define (index object val)
(if (list? object) (list-index val)
(if (string? object) (string-index val)
(if (vector? object) (vector-index val)))))

Elegant it ain't but, it works. What I hope to do is look at some of the clojure source code and see if I can grok what Rich Hickey did by abstracting the seq function over every object in clojure.

The more I look at clojure, the more I find myself disappointed somewhat that Groovy looks so freakin' ugly. It's as if the language designers want to slowly wean the Java-istas off the curly braces without creating culture shock.

From someone who's spent enough time mucking about with Lispy languages the curly braces only mean one thing to me: hash-table.

Clojure is brilliant for a number of reasons but, my two favourites are:

1. It removes the ugly java syntax and replaces it with something beautiful
2. It does #1 while preserving the power of the Virtual Machine, which is second to none.

Clojure for Lisp Programmers is a presentation Rich gave for the Boston Lisp meetup.

There is one for Java programmers being introduced to Lisp but, if you take the time to go through either SICP or "Teach Yourself Scheme in Fixnum Days" you can probably ignore it and go right to the meat of the matter. He explains more about Clojure to the Lisp programmers because he doesn't have to explain Lisp first.

Unless
I love unabashedly the "unless" conditional much in the same way that I prefer to invert my sentence structures. But, mostly it's because any language that has the "unless" allows me to write a conditional the way English speakers normally write - that is to say "Give me that bucket, unless it has paint in it".

Well, PocketScheme doesn't have "unless". Only if, cond, equal?, eqv?, =, case are part of the language.

But, no fear - some googling around and I found a procedural definition of the Unless statement on Google Books that goes like this:

(unless test expression1 expression2)

Well, that makes tonnes of sense to me. So, here it is in Scheme:

(define (unless test expr1 expr2)
(if (eqv? test #t) expr2 expr1))

usage:
> (unless (= 2 3) 't 'f)
t

I hope to have the first fifty or so functions up on a github location by the end of the month and when I do, I'll provide the link.

Tuesday, July 7, 2009

Test Driven Development with PocketScheme

During the last few days, I've been feverishly pounding out more and more Scheme code, reading a bunch of online documentation and generally staying up until the wee hours of the morning learning more about the rich history and power of Scheme.

Days -> spent looking for a job and packing/sorting/etc
Nights -> Scheme both using PLT and PocketScheme on my ipaq.

However, since I spent the last year in the middle of a programming culture where the first thing one was to do was develop a test for the function you wanted to write, I began to feel a little uneasy about my code until I figured out what it was: I had no unit testing ability.

So, I wrote a small little function called "assert" and it goes like this:
(define (assert test result)
(if (equal? test result)
(display ".")
(display "F")
)
)
which then has a unit test for the list-join function described above as:

(assert (list-join (list 1 2 3) #\,) "1,2,3")
While I was waiting at the doctor's office, I coded up a half dozen tests for my little framework and it wasn't until I googled "unit testing Scheme" that I found this gem: http://www.neilvandyke.org/testeez/ which is Unit Testing for R5RS.

Ah well.

Tuesday, June 30, 2009

ruby-fying scheme

So, I was reviewing ID3A and realized that there was a potential usage (mostly for myself) to port it into PocketScheme for my ipaq.

Then, I hit upon the brick wall that is string manipulation (or lack thereof) in Scheme.
For example, there is no String.reverse equivalent (hint: convert it to a list, then reverse, then back to a string) so I thought I would begin my porting all of my favourite Ruby commands over to Scheme.

First up: Array.join => list-join


; list-join works the same way that Array.join in ruby works
; in other words, take an array and make a delimited string
; "one,two,three"
; a simple way to build strings for file output
; x-> list
; y-> delimiter (must be a char)
(define (list-join x y)
(define o (open-output-string))
(let iter (( i 0 ))
(if (= i (length x)) (newline)
(begin
(display (list-ref x i) o)
(if (< i (- (length x) 1)) (write-char y o))
(iter (+ i 1)
)
)
)
)
(get-output-string o)
)

Sunday, June 28, 2009

Scheme Goodness

But after several months of fighting in my spare time with getting the eVB up and running on my virtualised XP system, I just gave up and figured that this blog would not really go anywhere interesting.

Until, I chanced upon the fact that PocketScheme has the hooks in it to generate and deal with actual WindowsCE forms.

Of course, it's as ugly as writing windowed apps in C but, hey - it's Scheme for pete's sake. One should be able to abstract much of that ugliness that is windowed apps in C. (*shudder)

Here's a taste of some Scheme goodness in a simple app that adds two to a number you give it:

(begin
(define (x n) (+ 2 n))
(display "enter a number:")
(define mynum (read))
(define res (x mynum))
(display #\newline)
(display "result:")
(display "res)
)