Sunday, May 9, 2010

Trends in the NoSQL space - what is social web infrastructure using?

If you follow trends in social web infrastructure, perhaps the most significant change under way is the abandonment of MySQL in favor of NoSQL alternatives. This is shocking for many people who have spent years developing highly scalable applications on relational databases.
Why is LAMP being dumped?
MySQL has been the main ingredient of the LAMP architecture, and there are several reasons companies are dumping it for today's applications, which are heavily write-intensive and growing fast with no end in sight. This new breed of application requires data to be partitioned both vertically and horizontally and distributed over many nodes across multiple data centers. Relational databases can do this only at a very heavy price.
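To make the two kinds of partitioning concrete, here is a minimal Python sketch. It is not taken from any particular database: the node names, the record fields, and the helpers `vertical_split`, `shard_for` and `place` are all hypothetical, chosen only to illustrate the idea.

```python
import hashlib

# Hypothetical node names; a real deployment would use host addresses.
NODES = ["node-a", "node-b", "node-c"]

def vertical_split(user):
    """Vertical partitioning: split one wide record into two column groups
    that could live in different stores. Both halves keep the row id."""
    profile = {"id": user["id"], "name": user["name"], "email": user["email"]}
    activity = {"id": user["id"], "last_login": user["last_login"], "posts": user["posts"]}
    return profile, activity

def shard_for(key, nodes=NODES):
    """Horizontal partitioning: choose a node from a stable hash of the row key,
    so the same key always lands on the same node."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def place(users, nodes=NODES):
    """Group user ids by the node that would store them."""
    placement = {node: [] for node in nodes}
    for user in users:
        placement[shard_for(user["id"], nodes)].append(user["id"])
    return placement
```

With a relational database this routing logic usually ends up in the application layer, which is part of the heavy price mentioned above.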
There are many NoSQL alternatives to choose from today. Here are the major questions a CTO weighs when choosing one (they all run on commodity hardware, so that is a non-issue):
  • Does it scale linearly as machines are added, and how easy is it to add a new machine to the infrastructure?
  • Are there any single points of failure?
  • What is the human cost of maintaining the system?
  • Does the system scale under massive write loads?
  • What underlying technology is used, and is it easy to find people who know it?
  • Is there a healthy community around the project?
  • How easy is it to rewrite and deploy?
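The first question, how painful it is to add a machine, often comes down to how keys are mapped to nodes. One common technique in Dynamo-style systems is consistent hashing. The sketch below is a toy version under my own assumptions (the `HashRing` class, the `replicas` count and the node names are made up for illustration) and not the implementation of any of the products discussed here.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per physical node
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

If you record `node_for` for a set of keys, call `add_node` with a new machine, and look again, the only keys that change owner are the ones that move to the new machine. With plain modulo hashing, most keys would be reshuffled.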
These are the options companies are choosing today to build and port their applications:
  1. Cassandra: Originally developed at Facebook, released as open source in 2008, and later donated to the Apache Software Foundation. It is a distributed database with a data model like Google's BigTable that runs on an infrastructure like Amazon's Dynamo. Companies such as Facebook, Digg, and Twitter use Cassandra. IBM Research uses Cassandra in BlueRunner, a hosted email service in the cloud.
  2. HBase: HBase is the Hadoop database, an open source implementation of Google's BigTable. This is the best article that explains HBase and BigTable concepts to a newbie. IBM Research has built a distributed text index that leverages the scalable control layer of HBase.
  3. MemcacheDB: MemcacheDB uses Berkeley DB as its backend. It is a persistent storage engine for fast and reliable key-value object storage and retrieval. "MemcacheDB - A complete guide" is the best resource if you would like to evaluate it.
  4. MongoDB: Companies such as Shutterfly and SourceForge, the open source repository, use MongoDB. It is a document-oriented distributed database that supports full indexing on any attribute.
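The four options above fall into three rough data models. The sketch below mimics each one with plain Python dictionaries so the differences are easier to see. It does not use any client library, and every name and value in it (`wide_rows`, `kv_set`, `find`, the sample user) is a made-up stand-in.

```python
import json

# Column-family model (Cassandra, HBase):
# row key -> column family -> column name -> value.
wide_rows = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2010-05-09", "posts": "17"},
    }
}

def read_column(rows, row_key, family, column):
    """Fetch a single column without touching the rest of the row."""
    return rows[row_key][family][column]

# Key-value model (MemcacheDB): the store sees only an opaque value per key,
# so the application serializes and deserializes the whole object.
kv_store = {}

def kv_set(key, value):
    kv_store[key] = json.dumps(value)

def kv_get(key):
    return json.loads(kv_store[key])

# Document model (MongoDB): records are nested documents that can be
# queried by any attribute. A real store would use indexes, not a scan.
documents = [
    {"_id": 1, "name": "Ada", "city": "London", "tags": ["admin"]},
    {"_id": 2, "name": "Linus", "city": "Helsinki", "tags": []},
]

def find(docs, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]
```

For example, `find(documents, city="London")` returns the first document only, while the key-value store can answer nothing except a lookup by exact key.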
