Chicken Scheme and CouchDB

But Why?

I’ve recently been playing around with Chicken Scheme, leaving poor Clojure all high and dry (minus some visits to the newly started DFW-Clojure User Group). After building a service in C at work, I’ve caught the high-performance bug and Clojure hasn’t particularly been helpful in that regard. I’ve also found Chicken Scheme to be very simple with a more straightforward syntax than Clojure. I’ll still use Clojure, and am using it on a web-app side-project, but client side I’m really digging Chicken.

I’ve also been using MongoDB, but (OMG!) there’s no driver currently for MongoDB in Chicken Scheme. Until I have the chops to write my own version, I decided I might as well learn a similar DB: enter Couch. In my naive view, it’s very similar to Mongo – it’s a document-oriented database with JSON (or variation thereof) as its representative data structure. Even cooler, it’s already been ported to BlackBerry 10. This gives me some really good ideas for new projects.

The Setup

I’m currently running Linux Mint, a “fork” of Ubuntu, so this setup guide follows what was necessary for me to get up and going with Chicken Scheme and CouchDB.

Chicken Scheme

There are two options. The easy way:

sudo apt-get install chicken-bin.

This installs everything for you with no issue, however it’s an older version (as seems to be a common issue with the Ubuntu repositories). At the time of writing it’s at 4.7.0. You can also install Chicken by compiling from source. Download the tarball from call-cc.org, untar, and follow the INSTALL instructions. Easy!

CouchDB Egg

Chicken Scheme has an extension system which packages and distributes “eggs.” An egg consists of Scheme sources plus some meta information such as build scripts and egg dependencies. You can install an egg using the chicken-install utility that comes with Chicken.

chicken-install -sudo couchdb

This will download the couchdb egg along with all eggs in the dependency tree. The utility then begins compiling and installing all eggs along the way. If you don’t do a lot of development, chances are likely that you will run into system dependencies that are unmet. Chicken Scheme compiles to C, and as such many eggs have dependencies on C libraries.

The one I hit with couchdb in particular was openssl. These system lib dependencies can be a little tricky to identify. Two tips:

  • See which egg is failing, then find it in the egg index and look through it’s notes which usually indicate dependencies
  • Read the error message and decipher from there.

For example, my first attempt at installing the couchdb egg failed due to missing openssl header files. The install utility was attempting to compile the openssl egg, which was a dependency of the http-client egg, which is a dependency of the couchdb egg.

A few searches for openssl and I satisfied the dependency with the following command:

sudo apt-get install libssl-dev

With that successfully in place, another call to install the couchdb egg succeeded.

Playing Around

I didn’t have time to compile and install a CouchDB server on my machine, so to the cloud I went. I signed up for an account at cloudant.com which provides a free-for-small-databases CouchDB compatible database. After signing up, I imported the example database aptly named “crud” and began toying around.

Booting up csi, the Chicken Scheme interpreter, I called (use couchdb) and set out to play. The first hurdle: making a connection. The current documentation says to use something along the lines of (define couch (make-connection "databasename" "http://localhost:123")).

That failed miserably.

No matter what I tried, the command (get-server-info couch) would fail with a connection refused. (connection-uri couch) showed that the URI struct was still pointing at localhost. Luckily, the source for Chicken eggs are available, even for CouchDB. The make-connection function is created by the (defstruct connection ...) expression. I noticed that the server attribute of connection struct is a uri-referencenot a string. Reading through the defstruct docs I figured out the following worked – woot.

(import uri-common)
(define cloudant-url server: (uri-reference "https://username:password@username.cloudant.com"))
(define cloudant (make-connection server: cloudant-url database: "crud"))
(get-server-info cloudant)

This returns JSON data in Scheme structure:

#(("couchdb" . "Welcome")
  ("version" . "1.0.2")
  ("cloudant_build" . "768"))

Since I’m new to Scheme, getting the data out was a bit odd, and I’m sure completely inefficient.

(define server-info (vector->list (get-server-info cloudant)))
(alist-ref "version" server-info equal?)

That about sums it up. Hopefully I figure more stuff out and post some updates. Feel free to comment if I’m reading the JSON structure in a completely asinine way. The vector->list doesn’t seem right.

Update

The couchdb egg documentation has been updated to cover most of this. I’ve also noticed a generally handy function that is exported by the couchdb module, json-ref, that makes it simple to grab top-level JSON items like “version” above.

(define server-version (json-ref 'version (get-server-info cloudant)))

ClojureScript Gotchas & Tips

I’ve been playing a lot lately with ClojureScript, especially with regards to mobile development (more on that to come later). Clojure has never been known for great error messages, and ClojureScript is probably even worse in that regard.

Included below are a list of things that gave me a rough time. Some are pretty simple, but needless to say they got me. Hopefully they won’t get you.

Gotchas

namespaces etc

The ns function is very special in ClojureScript when compared to Clojure. I won’t go over all the differences, those have been covered, but I’ll highlight a few things quickly: only one namespace per file; use requires only and require requires as; REPL requires load-namespace instead of ns call; invalid require statements don’t give compile errors.

A big one that got me is requiring/using the Google Closure libs and classes. When you require them in your namespace, the ClojureScript compiler just outputs goog.require(“goog.foobar”). The :as makes no difference here. When you refer to the functions/objects, you have to use the fully qualified name, e.g. (goog.foobar/baz 42).

To sum that annoying issue up, you have to both require and use the fully-qualified name.

Interop With JS Objects

Sometimes you just have to fall back. Use the js* function to create some straight-up JavaScript:

((js* "alert") "Hello")

There will also come a time when you have to create an object to interop with a JavaScript library. It would be nice if ClojureScript maps/lists could be output as an object, but that may just be wishful thinking. In other words, you’ll have to copy and paste one of the dozen or so implementations into your code.

You can also use the goog.object package for help which is wrapped by several CLJS functions, the primary one being js-obj which takes arguments and simply applies them to goog.object/create.

Example:

(js-keys (js-obj "name" "Ben Franklin" "age" 88 "height" 60))

Arity With Functions

When you define a function using defn, it creates a JavaScript function in the background. JavaScript functions do not enforce arity. This means you can do something like this without any failure, only unexpected results.

(defn tip-cows [])
(tip-cows 1 "zebra")

The Compiler Doesn’t Care

Don’t expect the compiler to help you out chasing and picking out errors. The compiler only ensures the code can be read and checks for a few minor things (like :require has :as, etc). The vast majority of errors will be found at runtime. You’ll have to change your mindset from compiled language to interpreted language.

Tips

Himera ClojureScript REPL - An online REPL. Very useful! Don’t miss the Summary page.

Don’t expect (= :ClojureScript :Clojure). They are different.

Surf the source!

When you get a “call undefined” error, you most likely forgot your require, misspelled it, or similar.

When you get an error, or something goes bad, load up the JavaScript console and play around. It obviously helps to know JavaScript to some extent here. For example, if you get an undefined error in something like:

goog.foobar.baz.call(null, helloworld.core.mydef);

You could use the console to probe and find the culprit, e.g. type

goog;
goog.foobar;
goog.foobar.baz;

Keep probing the environment to see what is not as is expected.


A human being…

A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects. – Robert A. Heinlein

I love this quote. I’ve not read the book, but I now plan on it.

  1. Change a diaper
  2. Plan an invasian
  3. Butcher a hog
  4. Conn a ship
  5. Design a building
  6. Write a sonnet
  7. Balance accounts
  8. Build a wall
  9. Set a bone
  10. Comfort the dying
  11. Take orders
  12. Give orders
  13. Cooperate
  14. Act alone
  15. Solve equations
  16. Analyze a new problem
  17. Pitch manure
  18. Program a computer
  19. Cook a tasty meal
  20. Fight efficiently
  21. Die gallantly

I’m pretty far along on the list, though I think I’ll hold off on completing #21 for as long as possible.


A Quick Fix

First off, I love using the keyboard. When I’m forced to use the mouse, my heart cries just a bit. The mouse is an excellent and convenient tool, there is no doubt, but if it’s an action I will perform multiple times, I want to automate it with the quick press of a key (or two).

After looking at XMonad a bit, I wasn’t quite ready to give up my comfortable XFCE-provided window manager but did want to resize my windows in a way that was convenient. A bit of google-fu led me to wmctrl and stiler.py. Once installed, I setup a new Runner command in Launchy, a simple call to stiler.py with a corresponding argument, and voila, a new tiling manager. It’s a bit on the slow side, but it’s simple to install and use and fits into my current workflow without much thinking.

As a side note, I think I’m going to use ramdump.com as more of a Tumblog-type site versus a full-on blog. My notebook is full of ideas, but the perfectionist in me doesn’t like putting out the smaller posts that my time allows. I’ll fight that feeling and use this as my brain dump that I envisioned to begin with.


Obsidian Theme for IDLE

I attempt to use dark backgrounds in all of my editors. I find it much easier on my eyes when I’m staring at code all day long or hacking on something early in the morning. I generally use something close to my favorite color scheme on Notepad++, Obsidian, which I believe is based on Obsidian Coast in KDE.

With that said, Python’s IDLE application is harshly bright compared to my normal editors. After a quick Google search I found only a few themes available for IDLE, so I created IDLE highlighting settings that mimic my favorite color scheme. Here is the first draft for all to enjoy. You can also find the latest updates in this Gist repo. Installation is easy: just copy and paste these settings into your .idlerc\config-highlight.cfg file in your home directory (creating as necessary), then choose it in your Highlighting settings in IDLE by selecting “Use Custom Theme” and “Obsidian” from the dropdown.

Screenshot of Obsidian for IDLE

[Obsidian]
definition-foreground = #678CB1
error-foreground = #FF0000
string-background = #293134
keyword-foreground = #93C763
normal-foreground = #E0E2E4
comment-background = #293134
hit-foreground = #E0E2E4
builtin-background = #293134
stdout-foreground = #678CB1
cursor-foreground = #E0E2E4
break-background = #293134
comment-foreground = #66747B
hilite-background = #2F393C
hilite-foreground = #E0E2E4
definition-background = #293134
stderr-background = #293134
hit-background = #000000
console-foreground = #E0E2E4
normal-background = #293134
builtin-foreground = #E0E2E4
stdout-background = #293134
console-background = #293134
stderr-foreground = #FB0000
keyword-background = #293134
string-foreground = #EC7600
break-foreground = #E0E2E4
error-background = #293134

Project Euler Problem 1 in Clojure

I’ve been trying out Clojure lately. Reasons, non-exclusive, for choosing Clojure?

  • It’s a functional programming language. I’m unfamiliar with this paradigm and want to expand my horizons.
  • It provides for easy concurrency. With multi-core processors the norm now, I want to make sure I get the most out of them.
  • It runs on the JVM, which should mean that I can tie it in to my everyday work, and if I get really stuck provide a mechanism for me to code in something more familiar.
  • It’s new, which satisfies my desire for shiny things.

So with that, the best way to learn a language is to try it out, and Project Euler provides many fun challenges to get you going. Problem 1 is summarized as such: “Find the sum of all the multiples of 3 or 5 below 1000.” Remember, I’m new to Clojure and functional programming in general, so if you see issues, feel free to comment.

The first thing that crosses my mind is how I would do this in procedural languages. First, the naive approach in Java

public int sumOfMultiples(int[] multiples, int max)  {
    int sum = 0;
    for ( int i=1 ; i<1000 ; i++ )  {
        for ( int j=0 ; j<multiples.length ; j++ )  {
            if ( i % multiples[j] == 0 )  {
                sum++;
                break;   //don't sum the same multiple twice
            }
        }
    }

    return sum;
}

public void printTest()  {
    System.out.println("Result = " + sumOfMultiples([3,5], 1000));
}

Like I said, this is a naive approach and has several issues that I will cover later in the post. But as a first shot, I try porting the idea to Clojure.

;Project Euler Problem 1
;Return the sum of all unique multiples of 3 and 5 up to 1000

(defn is-multiple [x y]
  (zero? (mod x y)))

(defn multiple3? [x] (is-multiple x 3))
(defn multiple5? [x] (is-multiple x 5))

(defn sum-it-up [sum adder max]
  (if (< adder max)
    (let [sumadder (if (multiple3? adder) adder (if (multiple5? adder) adder 0))]
      (sum-it-up (+ sum sumadder) (inc adder) max))
    sum))

(defn do-summation [max]
  (sum-it-up 0 1 max))

(println "Result=" (do-summation 1000))

The idea here is the same as the procedural style: loop through numbers 1-1000, and if the number is a multiple of 3 or 5, add it up. Since I don’t want an external state to manipulate throughout the loop, I use recursion and keep track of the state within the function itself. This is the code the got me the correct answer first, so I kept refactoring it as I read more about Clojure.

(defn is-multiple? [x y]
  (zero? (mod x y)))

(defn sum-it-up
  [max]
  (loop [sum 0 adder 1]
    (if (< adder max)
      (let [mult3 (is-multiple? adder 3)
            mult5 (is-multiple? adder 5)
            sumadder (if (or mult3 mult5) adder 0)]
        (recur (+ sum sumadder) (inc adder)))
      sum)))

(time (println "Result=" (sum-it-up 1000)))

After a few iterations, this was the result. Things I added include:

  • Use loop/recur – this allows me to define a single function versus two (one to recurse and one to setup) and also allows for tail call optimization. Without the use of recur, the JVM would quickly run out of stack space.
  • Removed the specialized is-multiple functions – they were more for experimentation anyways.
  • Added a time function to keep track of optimizations. Something like this is a pain in Java, full of boilerplate.

However, this solution has a few things that I don’t like: it still requires iterating over all numbers 1 to max, the conditional for multiples looks “wonky” (that’s an official term by the way) and out of place, and it’s still long. There has to be a cleaner way.

Clojure has the ability to generate lazy sequences – you define the sequences via functions and the values are generated only when retrieved. If I can create a sequence consisting of all multiples of 3 and 5 from 1 to max, I could do a simple call to sum them all. After playing around with the different sequence functions available, I came up with the following:

(defn is-multiple? [x y] (zero? (mod x y)))
(defn sum-it-up [max]
  (reduce + (filter #(or (is-multiple? % 3) (is-multiple? % 5)) (range (inc max)))))

(time (println "Result=" (sum-it-up 1000)))

So what does it mean? It’s easiest to read the code from the innermost forms to the outermost. So I’ll start:

(range (inc max))

generates a lazy-sequence from [0-max] (note inclusive). By lazy, it means it’s not actually in memory and will only be created as called. Note that I could have also done the following:

(map #(inc %) (range max))

I actually think this makes a little more sense — it generates 1-max inclusive since we really don’t want 0. However, since 0 doesn’t matter in a summation I went ahead and left it in. Timing showed a difference of 3 ms favoring my chosen implementation.

Next up we filter that range to only include our multiples using the following:

(filter #(or (is-multiple? % 3) (is-multiple? % 5)) RANGE)

Note that this also returns a lazy sequence as well. Filter takes two arguments, a predicate function that should return true or false and a collection. In this case, the predicate function is an anonymous function that returns true if the value is a multiple of 3 or 5.

Now that we have a sequence of all numbers [1-max] that are multiples of 3 or 5, it’s time to sum them up. The code

(reduce + SEQUENCE-OF-MULTIPLES)

is fairly straightforward. It acts like a looping accumulator taking each item off of the sequence and sending it, along with the accumulated value, to the predicate function (plus in this case) and then storing the result into the accumulator. Once all items have been processed, the accumulated result is returned.


On Descriptive Conditionals

if ( ((x > y) || (y == z)) && isSomethingTrue() )  {
    //Do Something
}

OK, what just happened there? This type of conditional can be painful to read and debug. Is there a better way?

boolean isXBigEnough = x > y;
boolean isYValid     = y == z;
boolean isInCorrectState = isXBigEnough || isYValid;

if ( isInCorrectState && isSomethingTrue() )  {
    //Do Something
}

Pulling anonymous conditionals out into named variables can make reading and debugging code much easier. You can come back to this code a year later and know exactly what it was attempting to do based on the names of the variables. Walking through with a debugger is a piece of cake; we can see exactly which conditional was true or false at runtime without having to perform the checks manually. The second format also allows us to easily modify the conditional values via the debugger to change the execution path at run-time – something difficult to accomplish otherwise. But as with anything in coding, there are tradeoffs. In this particular case, we have a performance hit, and a pretty large one at that.

I’ve taken the following Java code and examined the bytecode produced to get a better idea of what is happening. First up is the straight conditional.

private boolean methodOne(int a, int b, int c, int d)  {
    if ( a < b && b < c && c < d )  {
        return true;
    } else {
        return false;
    }
}

On compilation, this will produce the following bytecode

private boolean methodOne(int, int, int, int);
  Code:
   0:	iload_1
   1:	iload_2
   2:	if_icmpge	18
   5:	iload_2
   6:	iload_3
   7:	if_icmpge	18
   10:	iload_3
   11:	iload	4
   13:	if_icmpge	18
   16:	iconst_1
   17:	ireturn
   18:	iconst_0
   19:	ireturn

Reading this, you can see that the compiler takes advantage of short-circuit evaluation. At first, only the first two arguments are loaded and compared. If the comparison is true, it goes to the return statement. If it’s false, the next two arguments are loaded and evaluated, and so on.

Now let’s compare to the documented version:

private boolean methodTwo(int a, int b, int c, int d)  {
    boolean first  = a<b;
    boolean second = b<c;
    boolean third  = c<d;

    if ( first && second && third )  {
        return true;
    } else {
        return false;
    }
}

Here, we have separated the conditionals out and described what they actually mean. This is a very simple example, and arguably useless in the real world, but the idea is the same. Below is the bytecode produced:

private boolean methodTwo(int, int, int, int);
  Code:
   0:	iload_1
   1:	iload_2
   2:	if_icmpge	9
   5:	iconst_1
   6:	goto	10
   9:	iconst_0
   10:	istore	5
   12:	iload_2
   13:	iload_3
   14:	if_icmpge	21
   17:	iconst_1
   18:	goto	22
   21:	iconst_0
   22:	istore	6
   24:	iload_3
   25:	iload	4
   27:	if_icmpge	34
   30:	iconst_1
   31:	goto	35
   34:	iconst_0
   35:	istore	7
   37:	iload	5
   39:	ifeq	54
   42:	iload	6
   44:	ifeq	54
   47:	iload	7
   49:	ifeq	54
   52:	iconst_1
   53:	ireturn
   54:	iconst_0
   55:	ireturn

The code is now noticeably longer. Breaking it down, you can see that 0-10 stores the first boolean’s conditional result. 12-22 store’s second and 24-35 stores third. Now to the important conditional, 37-55. This looks almost exactly like methodOne‘s bytecode but it only loads a single value versus two values in methodOne. It still provides for short-circuit evaluation of the final conditional.

The issues should be obvious at this point. methodTwo must perform all of the conditional checks, whereas methodOne takes advantage of short-circuit evaluation to only perform work as necessary. We must also store an additional value during runtime for each conditional, and with that comes the cost of cleaning it up when it goes out of scope.

To compare the run-time performance of the two approaches, I benchmarked the simplest cases that do not take advantage of short-circuit evaluation since the short-circuit may or may not run.

    private boolean methodOne(int a, int b)  {
        if ( a < b )  {
            return true;
        } else {
            return false;
        }
    }
    
    private boolean methodTwo(int a, int b)  {
        boolean first  = a<b;
    
        if ( first )  {
            return true;
        } else {
            return false;
        }
    }

Benchmarking this simple code shows that the mere act of storing a new boolean slows the method down by 18%. With more conditionals and use of short-circuit evaluation, the performance hit will only go up from there.

With that said, this is still extremely fast code; on my laptop running each of these methods 10 million times each, methodOne was only faster than methodTwo by 14 ms. That’s only a 1.4 nanosecond difference between each method.

So the question remains, is descriptively naming your conditionals a worthy practice? The answer will depend on your exact situation. For me, I prefer the more descriptive format that naming conditionals provides, and the debugging aspect really puts it over the top. There have been several times where I was able to effectively identify and resolve buggy code by breaking down large and obscure conditionals into their component parts and labeling them with descriptive variables.

My general approach to writing complex conditionals is to use descriptive naming first, then optimizing later if it is needed. The majority of the time, it’s not, and I’d rather be able to read and debug my code than worry about a few nanoseconds performance hit. Refactoring into the optimized form is a trivial exercise. And with the knowledge gained by actually testing both approaches and looking at the bytecode that is produced, you should be able to make a more informed decision as to which is better for any given situation.


Follow

Get every new post delivered to your Inbox.