Incanter: an R-like statistical package for the JVM
Monday, January 4th, 2010
Incanter is a statistical package written in Clojure. It brings parts of the R language to the JVM.
I didn’t realize that there was a web framework built using Clojure, but Compojure has been around for over a year now. Brian Carper wrote a small online game using Clojure/Compojure, and it sounds like an impressive combo. (My one dislike: templates made up of code.) He is highly critical of Rails:
This was a nightmare to do properly in Rails. Rails wants to dispatch to controllers, which are classes. To make this work in Rails I had to mangle this concept to fit into the idea of a hierarchy of classes and subclasses. There might have been an elegant way to do it, but I couldn’t think of one. I suspect someone will leave me a flame comment telling me how to do it.
I’m also surprised to learn that Clojure is over 2 years old. It is weird how much time I spend reading tech news, and yet so many developments go past me without me noticing.
Colin Steele writes about the challenges that he and his team are facing at Hotelicopter:
We’ve officially run headlong into one of Ruby on Rails’ deficiencies: programming in the large. We’re not interested in computer science-y solutions, only pragmatic ones.
I’m curious what is excluded by the phrase “computer science-y solutions”. I would normally interpret that to mean “we are looking for well-tested solutions with wide deployment”, but elsewhere he writes:
We’re currently investigating a spectrum of new technologies in the NoSQL realm, including Tokyo Tyrant, MongoDB, Amazon’s SDB, CouchDB, Voldemort, and more. Tis’ a dizzying mix, and things are popping in the space.
So clearly they are looking at some cutting-edge technologies. Colin links to an essay about programming in the large, which includes this:
Maintenance and locality are strong arguments in favor of immutability. The less aspects of an object can be changed, the less you have to worry about the execution history. If an object has a two-phase initialization sequence (e.g. this is so in C++ if you need a virtual function during initialization), you have to make sure that the objects are properly initialized; code that gets handed over such an object will have to check that it’s initialized (if only in an assert()). This all vanishes if the language makes sure that no object remains uninitialized, and doesn’t force a two-phase initialization on programmers like C++. (In C++, the “wrong” design decision was that objects mutate from the base type to the subtype when the various constructors are run. It’s this kind of far-reaching consequences (IOW non-locality) that makes language design an art.) If you take immutability to its extremes, nothing can ever be changed. If you wish to change the world, you write a function that returns a list of changes and let the run-time system inspect that list and execute the proper actions. If you have an interactive program, you emit a list that has a function pointer at its end; the function gets fed the next input and is expected to generation another action list.
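The last idea in that quote, a function that returns a list of changes for the run-time system to carry out, is easier to see in code. Here is a minimal sketch in Java, assuming a single kind of action (printing a line). The names ActionListSketch, Print, greet and run are hypothetical and do not come from the essay.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: all names here are made up for this sketch.
final class ActionListSketch {
    // A description of a change, not the change itself.
    static final class Print {
        final String text;
        Print(String text) { this.text = text; }
    }

    // Pure: decides what should happen and returns that decision as data.
    static List<Print> greet(String name) {
        return Collections.unmodifiableList(Arrays.asList(
                new Print("Hello, " + name),
                new Print("Goodbye, " + name)));
    }

    // The only place with side effects: it plays the role of the run-time
    // system that inspects the list and executes each action.
    static void run(List<Print> actions) {
        for (Print p : actions) {
            System.out.println(p.text);
        }
    }

    public static void main(String[] args) {
        run(greet("world"));
    }
}
```

Because greet only builds a list, it can be checked without capturing any output. Only run touches the outside world.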
In the last few years, many programmers have pointed to mutability as one of the main problems they face when they build larger systems. Chas Emerick does a good job of highlighting the problem in his post “All my methods take 316 arguments, and I like it that way”.
316 arguments to a method (which I don’t think is actually possible in the jvm, but bear with me)? “That’s absurd!”, you’d say. The problem, of course, is that the 3-arg doSomething actually has far more arguments than its signature implies:
The behaviour of every function in a mutable, imperative environment is dependent upon the state of all of the other (variables|attributes|bindings|whatever) in your program at the time the function is invoked.
So, if you have 313 other variables in your program, that 3-arg doSomething is functionally (ha!) operating over 316 arguments.
Would you ever intentionally write a method signature that takes 316 arguments? Would you use any library that contained such a function signature? No? Then why are you using tools that force such craziness upon you?
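To make the “hidden argument” point concrete, here is a small Java sketch of my own; it is not code from Chas’s post. The class HiddenArguments, the field taxPercent and both methods are hypothetical names. One method reads a mutable static field, so its result depends on something its signature never mentions. The other takes every input as a parameter.

```java
// Illustration only: all names here are made up for this sketch.
final class HiddenArguments {
    // Mutable state visible to the whole program: an undeclared "argument".
    static int taxPercent = 10;

    // The signature claims one input, but the result also depends on taxPercent.
    static int priceWithTaxImplicit(int netCents) {
        return netCents * (100 + taxPercent) / 100;
    }

    // Every input is named in the signature: same arguments, same result.
    static int priceWithTaxExplicit(int netCents, int percent) {
        return netCents * (100 + percent) / 100;
    }

    public static void main(String[] args) {
        System.out.println(priceWithTaxImplicit(10000));     // 11000
        taxPercent = 20; // some other part of the program changes the hidden input
        System.out.println(priceWithTaxImplicit(10000));     // 12000
        System.out.println(priceWithTaxExplicit(10000, 10)); // 11000
    }
}
```

The two calls to priceWithTaxImplicit look identical at the call site and still return different values. That is the 316-argument problem in miniature.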
Chas says “The languages are ready” and he links to some of the major functional languages: Erlang, Clojure, F#, and Fantom. In comments, his readers add in their favorite functional languages: OCaml, Haskell, etc.
One of his readers challenges the idea that functional languages are safer than imperative languages by offering this:
Regarding functions changing state, what about things like this in clojure, Isn’t it like global variables in imperative lang?
(def state (ref #{}))
(defn function that updates state)
(defn another function that updates state)
Chas responds:
You bet. Clojure is not a purely functional programming language, so you can have as much shared state as you want – but the language is going to make you work for those bits of shared state, so you have to “pay” for them. Conversely, imperative languages like Java et al. make you work to achieve immutability, and provide nothing in the way of enabling persistent data structures, etc.
The point is that defaults matter, a lot.
I like the word “default” in this context. In his 2001 book, Effective Java, Joshua Bloch wrote “Favor immutable objects over mutable.” When writing a big system in Java, you work to make your system immutable. In a language like Clojure, the default is just the opposite – you work to make parts of your system mutable.
I have very little experience with functional programming. I am just learning Clojure now (Lisp redone for the JVM). I cannot say what benefits it brings. I’m looking forward to learning more in this area. It’ll be interesting to see whether functional programming comes to be regarded as a “best practice”. Certainly, it will be interesting to see if CTOs start using these languages at startups, or whether they will be regarded as “computer science-y solutions”.
I’ve somewhat more experience with the web app frameworks that have emerged since 2004. I’m interested in what Colin wrote here:
I suspect as we muddle along we’ll develop a component-level (service level if you prefer) version of the Law of Demeter, which will drive us to make the right decisions for decoupling. I’m not too worried about that. However, we definitely have issues with reuse. Currently the Ruby on Rails state of the art solution for reuse is the gem. Which, let’s face it, is a pathetic solution.
Some of the frameworks seem to encourage bad habits. I’ve already written of Symfony’s weaknesses in “Symfony versus The Law Of Demeter: does Symfony promote bad habits?”
When Ruby on Rails first emerged it was targeting web apps, not web services. Rails has a lot of imitators: Groovy/Grails, PHP/Symfony, etc. These all help create web sites, but not necessarily web services. I suspect a new generation of frameworks will be needed to make this kind of work easier:
The place we’re aiming for is a highly decoupled (and scalable), cohesive set of services, joined through REST APIs and/or fully reused common business models.
In their book RESTful Web Services, authors Leonard Richardson and Sam Ruby talk about the “human web” and the “programmable web”. This is from page 2:
The Web you use is full of data: book information, opinions, prices, arrival times, messages, photographs, and miscellaneous junk. It’s full of services: search engines, online stores, weblogs, wikis, calculators, and games. Rather than installing all this data and all these programs on your own computer, you install one program – a web browser – and access the data and services through it.
The programmable web is just the same. The main difference is that instead of arranging its data in attractive HTML pages with banner ads and cute pastel logos, the programmable web usually serves stark, brutal XML documents. The programmable web is not necessarily for human consumption. Its data is intended as input to a software program that does something amazing.
Originally, frameworks like Rails were created to speed the production of sites for the human web. They have evolved since then, Rails in particular. In fact, Richardson and Ruby use Rails for many of the examples in the book that show how to correctly build a RESTful web service. And yet, the scaffolding systems in these frameworks still tend to automate the production of CRUD web pages, rather than PGPD (POST, GET, PUT, DELETE) services. (I do not know the state of the art with Rails, so someone can tell me if I’m wrong about its scaffolding.)
Richardson and Ruby suggest that every module (resource) in a RESTful web service should expose just 4 actions:
POST
GET
PUT
DELETE
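These are the HTTP verbs, and they roughly correspond to the standard CRUD actions, though not one-to-one: POST creates a new resource under an existing one (and in practice is often used for updates too), while PUT creates or replaces the resource at a URI the client already knows:
Create (often Update as well)
Read
Create or replace at a known URI
Delete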
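Here is a rough sketch of what one resource answering those four verbs might look like. It is plain Java with an in-memory map standing in for a database, and it is not tied to any real framework. The class NoteResource, the /notes paths and the handle method are hypothetical, and the status codes are one common choice rather than the only correct one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a toy resource, not a real web framework API.
final class NoteResource {
    private final Map<String, String> notes = new LinkedHashMap<>();
    private int nextId = 1;

    // Returns a status code, followed by a body or URI where there is one.
    String handle(String verb, String path, String body) {
        switch (verb) {
            case "POST": {
                // Create a new note under the collection; the server picks the URI.
                String created = "/notes/" + nextId++;
                notes.put(created, body);
                return "201 " + created;
            }
            case "GET":
                return notes.containsKey(path) ? "200 " + notes.get(path) : "404";
            case "PUT":
                // Store the supplied representation at the URI the client named.
                notes.put(path, body);
                return "200 " + path;
            case "DELETE":
                return notes.remove(path) != null ? "204" : "404";
            default:
                return "405";
        }
    }

    public static void main(String[] args) {
        NoteResource r = new NoteResource();
        System.out.println(r.handle("POST", "/notes", "hello")); // 201 /notes/1
        System.out.println(r.handle("GET", "/notes/1", null));   // 200 hello
        System.out.println(r.handle("PUT", "/notes/1", "hi"));   // 200 /notes/1
        System.out.println(r.handle("DELETE", "/notes/1", null)); // 204
    }
}
```

A scaffolding generator for services would, in effect, emit something shaped like handle for every model instead of a set of HTML forms.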
I suspect we need a new generation of frameworks, or at least new scaffolding systems for the existing frameworks, that automate the setup of PGPD services. That seems like the next obvious step forward.
Rich Hickey offers a great write-up of the problems facing the funding of open source software:
There *are* companies that make software themselves, whose consumers
see a value in it and willingly pay to obtain that value. The money
produced by this process pays the salaries of the people who are
dedicated to making it, and some profit besides. It’s called
“proprietary software”. People pay for proprietary software because
they have to, but otherwise the scenario is very similar to open
source – people make software, consumers get value from it. In fact,
we often get a lot less with proprietary software – vendor lock-in, no
source etc. Most alarmingly, this is the only model that associates
value with software itself, and therefore with the people who make it.
…As should be obvious, Clojure is a labor of love on my part. Started
as a self-funded sabbatical project, Clojure has come to occupy me far
more than full-time. However, Clojure does not have institutional or
corporate sponsorship, and was not, and is not, the by-product of
another profitable endeavor. I have borne the costs of developing
Clojure myself, but 2009 is the last year I, or my family, can bear
that.
Many generous people have made donations (thanks all!), but many more
have not, and, unfortunately, donations are not adding up to enough
money to pay the bills. So far, less than 1% of the time I’ve spent on
Clojure has been compensated.
Right now, it is economically irrational for me to work on Clojure,
yet, I want to continue working on Clojure, and people are clearly
deriving benefit from my work. How can we rectify this? Barring the
arrival of some white knight, I’m asking the users of Clojure to fund
its core development (i.e. my effort) directly, and without being
forced to do so.
This is a great interview between Bill Venners and Rich Hickey. Hickey is the guy behind Clojure, the new JVM language that brings Lisp to the world of Java. I have so far only heard good things about Clojure. Hickey makes the argument that what is needed in modern languages is the automatic management of time and concurrency issues. In the same way that, during the mid-90s, Java brought automatic garbage collection to the mainstream, now we need languages to automatically manage the problem of multiple threads accessing the same object at the same time:
Bill Venners: What do you mean when you say the problem of mutable state is a time problem?
Rich Hickey: If somebody hands you something mutable—let’s say it has methods to get this, get that, and get the other attribute—can you walk through those and know you’ve seen a consistent object? The answer is you can’t, and that’s a problem of time. Because if there were no other actors in the world, and if time wasn’t passing between when you looked at the first, second, and third attribute, you would have no problems. But because nothing is captured of the aggregate value at a point in time, you have to spend time to look at the pieces. And while that time is elapsing, someone else could be changing it. So you won’t necessarily see something consistent.
For example, take a mutable Date class that has year, month, and day. To me, changing a date is like trying to change 42 into 43. That’s not something we should be doing, but we think we can, because the architecture of classes is such that we could make a Date object that has mutable year, month, and day. Say it was March 31, 2009, and somebody wanted to make it February 12, 2009. If they changed the month first there would be, at some point in time, February 31, 2009, which is not a valid date. That’s not actually a problem of shared state as much as it is a problem of time. The problem is we’ve taken a date, which should be just as immutable as 42 is, and we’ve turned it into something with multiple independent pieces. And then we don’t have a model for the differences in time of the person who wants to read that state and the person who wants to change it.
…The time problem is not easy to see in today’s mainstream languages because there are no constructs that make time explicit. It is implicit in the system. We don’t even know that’s what we’re doing when we use locks to try to make this work. Because what we’re trying to do is partition time up to say, I’m going to get a portion of time when I get to look at it, and you’re going to get a separate portion of time when you’ll get to write it. That time management we have to do manually. We have to use locks and come up with some kind of convention, because it’s not automatic. So that’s why I’m saying, the problem here is a lack of automatic time management. We have to do that manually, just like we had to call delete before we had garbage collection. Somebody allocated something, and we had to call delete. It was our problem. It was manual. Now when we want to change a date or look at a date coherently, we have this time management problem that we use locks to try to solve.
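Hickey’s date example translates almost directly into Java. What follows is my own sketch, not code from the interview. MutableDate is a deliberately naive class written for the illustration. java.time.LocalDate is the immutable date type that ships with Java 8 and later, so it postdates the interview. The names DateAsValue, observeMidUpdate and rejectsFebruary31 are hypothetical.

```java
import java.time.DateTimeException;
import java.time.LocalDate;

// Illustration only: all class and method names here are made up for this sketch.
final class DateAsValue {
    // A date with independently mutable pieces, as in Hickey's example.
    static final class MutableDate {
        int year, month, day;
        MutableDate(int year, int month, int day) {
            this.year = year; this.month = month; this.day = day;
        }
        @Override public String toString() { return year + "-" + month + "-" + day; }
    }

    // Returns what a reader would see if it looked between the two writes.
    static String observeMidUpdate() {
        MutableDate d = new MutableDate(2009, 3, 31);
        d.month = 2;                // first write
        String seen = d.toString(); // a reader looks now and sees February 31
        d.day = 12;                 // second write
        return seen;
    }

    // True if the immutable type refuses to represent the invalid date at all.
    static boolean rejectsFebruary31() {
        try {
            LocalDate.of(2009, 2, 31);
            return false;
        } catch (DateTimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(observeMidUpdate());         // 2009-2-31
        LocalDate march = LocalDate.of(2009, 3, 31);
        LocalDate february = LocalDate.of(2009, 2, 12); // a new value; march is unchanged
        System.out.println(march + " " + february);     // 2009-03-31 2009-02-12
        System.out.println(rejectsFebruary31());        // true
    }
}
```

With the immutable type there is no moment at which a reader can catch the date halfway between March 31 and February 12. A "change" is a new value, just as 43 is a different number from 42.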
This idea has been gaining strength for at least 2 years now. Automatic thread management was one of the reasons why Sam Ruby argued that Erlang would be one of the key technologies of the next 5 years.