There is so much data on the web that all the rules of databases need to be rewritten
The modern relational database came together around 1970. SQL (Structured Query Language), the language for getting data into and out of databases, was created shortly thereafter. The early relational databases worked well when a company needed to manage a few million records. Banks could keep track of their customers’ transactions, and airlines could keep track of their customers’ tickets. But now, it seems, we’ve reached a point where relational databases no longer work. At least, not for the giant stores of data that exist on the web, stores that might have billions, or tens of billions, of records. Apparently the increased speed of CPUs and the much greater availability of RAM have not been enough to give databases the extra speed they need to handle the larger data sets. The implications of that are worrisome.
Joe Gregorio has seen a new pattern emerge for handling very large data sets:
Sure you can store a lot of data in a relational database, but when I say large, I mean really large; a billion or more records. I know we need this because I keep seeing people build it.
He goes on to outline the main features of the new pattern that he sees:
- Distributed
  - The data has to be distributed across multiple machines.
- Joinless
  - No joins, and no referential integrity, at least at the data store level.
- De-Normalized
  - No one said this explicitly, but I presume there is a lot of de-normalization going on if you are avoiding joins.
- Transactionless
  - No transactions.
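To make the four features concrete, here is a minimal sketch in Python of what such a store might look like. It is not taken from Gregorio's post or from any real system: the `ShardedStore` class, the key format, and the record fields are all hypothetical, and plain in-memory dictionaries stand in for separate machines.

```python
import hashlib


class ShardedStore:
    """Toy key-value store; each dict stands in for one machine (an assumption)."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Distributed: a stable hash of the key decides which "machine" holds it.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, record):
        # Transactionless: each write is a single, independent operation.
        self._shard_for(key)[key] = record

    def get(self, key):
        # Joinless: a read touches exactly one shard and one record.
        return self._shard_for(key).get(key)


store = ShardedStore(num_shards=4)

# De-normalized: the customer's name is copied into every order record,
# so reading an order never requires a lookup in a separate customer table.
store.put("order:1001", {"customer_id": "c42", "customer_name": "Ada", "item": "ticket"})
store.put("order:1002", {"customer_id": "c42", "customer_name": "Ada", "item": "upgrade"})

order = store.get("order:1001")
print(order["customer_name"], order["item"])
```

The cost of this design shows up on update: if the customer changes their name, every order record carrying a copy has to be rewritten, and without transactions nothing guarantees that all the copies change together.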