Archive for the ‘file systems’ Category

btfs

Sunday, August 2nd, 2009

Yesterday I learned about ZFS. Apparently there is a file system called btfs, which bring some of the same features to Linux:

Imagine you are a Linux file system developer. It’s 2007, and you are at the Linux Storage and File systems workshop. Things are looking dim for Linux file systems: Reiserfs, plagued with quality issues and an unsustainable funding model, has just lost all credibility with the arrest of Hans Reiser a few months ago. ext4 is still in development; in fact, it isn’t even called ext4 yet. Fundamentally, ext4 is just a straightforward extension of a 30-year-old format and is light-years behind the competition in terms of features. At the same time, companies are clamping down on funding for Linux development; IBM’s Linux division is coming to the end of its grace period and needs to show profitability now. Other companies are catching wind of an upcoming recession and are cutting research across the board. They want projects with time to results measured in months, not years.

Ever hopeful, the file systems developers are meeting anyway. Since the workshop is co-located with USENIX FAST ‘07, several researchers from academia and industry are presenting their ideas to the workshop. One of them is Ohad Rodeh. He’s invented a kind of btree that is copy-on-write (COW) friendly [PDF]. To start with, btrees in their native form are wildly incompatible with COW. The leaves of the tree are linked together, so when the location of one leaf changes (via a write – which implies a copy to a new block), the link in the adjacent leaf changes, which triggers another copy-on-write and location change, which changes the link in the next leaf… The result is that the entire btree, from top to bottom, has to be rewritten every time one leaf is changed.

Rodeh’s btrees are different: first, he got rid of the links between leaves of the tree – which also “throws out a lot of the existing b-tree literature”, as he says in his slides [PDF] – but keeps enough btree traits to be useful. (This is a fairly standard form of btrees in file systems, sometimes called “B+trees”.) He added some algorithms for traversing the btree that take advantage of reference counts to limit the amount of the tree that has to be traversed when deleting a snapshot, as well as a few other things, like proactive split and merge of interior nodes so that inserts and deletes don’t require any backtracking. The result is a simple, robust, generic data structure which very efficiently tracks extents (groups of contiguous data blocks) in a COW file system. Rodeh successfully prototyped the system some years ago, but he’s done with that area of research and just wants someone to take his COW-friendly btrees and put them to good use.

Very cool technology in OpenSolaris

Friday, July 31st, 2009

I’ve always been happy with Linux on the server, and I’ve never felt any need to experiment with OpenSolaris, until I read this bit about ZFS:

ZFS is the most amazing filesystem I’ve ever come across. Integrated volume management. Copy-on-write. Transactional. End-to-end data integrity. On-the-fly corruption detection and repair. Robust checksums. No RAID-5 write hole. Snapshots. Clones (writable snapshots). Dynamic striping. Open source software. It’s not available on Linux. Ugh. Ok, that sucks. (GPL is a double-edged sword, and this is a perfect example). Since it’s open-source, it’s available on other OSes, like FreeBSD and Mac OS X, but Linux is a no go. *sigh* I have a feeling Sun is working towards GPL’ing ZFS, but these things take time and I’m sick of waiting.

The OpenSolaris project is working towards making Solaris resemble the Linux (GNU) userland plus the Solaris kernel. They’re not there yet, but the goal is commendable and the package management system has taken a few good steps in the right direction. It’s still frustrating, but massively less so. Despite all the rough edges, though, ZFS is just so compelling I basically have no choice. I need end-to-end data integrity. The rest of the stuff is just icing on an already delicious cake.