Sunday, July 1, 2012

Analytic databases are also all about ACID but .....

One acronym everyone in computer science has learned sounds like a drug: ACID. The ACID properties (Atomicity, Consistency, Isolation, and Durability) form the basis of the transactional systems that run the business world. However, maintaining these properties comes at a heavy cost. In modern computing, where information is expected to be analyzed without waiting, some are trading these properties away for scalability and throughput. The drug itself has changed, but fortunately the name has remained the same: some data scientists cleverly kept the name ACID and gave its letters a new meaning. The new ACID is defined as follows:
  • A – Associative  
    • Let's look at the classic mathematics example: x+(y+z) = (x+y)+z. The associative property means that a series of operations can be grouped in any way without changing the result. The distributed computing paradigm is all about this lack of rigidity and of dependency between operations.
  • C – Commutative
    • The commutative property in mathematics means that changing the order of the operands does not change the result: x*y = y*x. An everyday example makes this concrete. If you pay cash for an item, it does not matter in which order you hand the two bills to the cashier. The total will always be the same.
  • I – Idempotency
    • A function is idempotent if applying it again to its own result does not change that result: f(x) = f(f(x)).
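  • D – Distributive
    • The distributive property in mathematics means that an operation can be applied to each part separately and the partial results combined: x*(y+z) = x*y + x*z. Spreading work across parts in this way is a fundamental rule for modern business systems.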
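
The four properties above can be spot-checked in a few lines of Python. This is only a sketch: the helper names (`is_associative`, `is_commutative`, `is_idempotent`, `distributes_over`) and the sample values are made up for illustration, and testing a handful of integers demonstrates a property rather than proving it.

```python
import operator
from itertools import product

# Small illustrative sample of integers (an assumption; not a proof).
SAMPLES = [-3, 0, 1, 2, 7]

def is_associative(op, values=SAMPLES):
    """Check op(x, op(y, z)) == op(op(x, y), z) for every sampled triple."""
    return all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(values, repeat=3))

def is_commutative(op, values=SAMPLES):
    """Check op(x, y) == op(y, x) for every sampled pair."""
    return all(op(x, y) == op(y, x)
               for x, y in product(values, repeat=2))

def is_idempotent(f, values=SAMPLES):
    """Check f(f(x)) == f(x) for every sampled value."""
    return all(f(f(x)) == f(x) for x in values)

def distributes_over(outer, inner, values=SAMPLES):
    """Check outer(x, inner(y, z)) == inner(outer(x, y), outer(x, z))."""
    return all(outer(x, inner(y, z)) == inner(outer(x, y), outer(x, z))
               for x, y, z in product(values, repeat=3))

print("add is associative:      ", is_associative(operator.add))
print("mul is commutative:      ", is_commutative(operator.mul))
print("abs is idempotent:       ", is_idempotent(abs))
print("mul distributes over add:", distributes_over(operator.mul, operator.add))
print("sub is associative:      ", is_associative(operator.sub))
```

Subtraction is included as a counter-example: it fails both the associativity and the commutativity check, which is why it cannot be freely regrouped or reordered.
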
Now let's see whether these properties make sense for today's big data analytic systems. A typical requirement is to process a large set of records, find certain records, and aggregate them. First we distribute the operation (Distributive): divide the input set into multiple chunks, process each chunk on a separate system, and aggregate the results as they become available (Associative and Commutative). These operations can be invoked repeatedly (Idempotency).

To understand idempotency in the context of computer science, consider looking up a customer's data in a database. No matter how many times you look it up, the record does not change. Placing an order for an item on Amazon repeatedly, however, is not idempotent, because each request changes the state by creating another order.

I hope you get used to this new drug for the new era of cloud computing. It is a mantra for success. Let me know what you think.
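
The chunk-and-aggregate flow and the lookup-versus-order contrast described above can be sketched in Python. Everything here is a simplified assumption: the records, the customer table, and the function names (`chunks`, `process_chunk`, `aggregate`, `lookup`, `place_order`) are invented for illustration, and the "separate systems" are simulated by an ordinary loop in one process.

```python
from functools import reduce

def chunks(records, size):
    """Divide the input set into fixed-size chunks (one per hypothetical worker)."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def process_chunk(chunk, predicate):
    """Find the matching records in one chunk and return their partial sum."""
    return sum(r["amount"] for r in chunk if predicate(r))

def aggregate(partials):
    """Combine partial sums. Addition is associative and commutative,
    so the order in which partial results arrive does not matter."""
    return reduce(lambda a, b: a + b, partials, 0)

# Made-up input: ten records spread over three customers.
records = [{"customer": "c%d" % (i % 3), "amount": i} for i in range(10)]

def is_c1(record):
    return record["customer"] == "c1"

partials = [process_chunk(c, is_c1) for c in chunks(records, 4)]
total = aggregate(partials)
total_reversed = aggregate(list(reversed(partials)))  # results arriving in another order
print(total, total_reversed)

# Idempotent: a lookup only reads, so repeating it changes nothing.
customers = {"c1": {"name": "Alice"}}  # made-up customer table

def lookup(customer_id):
    return customers[customer_id]

# Not idempotent: every call appends another order.
def place_order(orders, customer_id, item):
    orders.append((customer_id, item))
    return len(orders)

orders = []
place_order(orders, "c1", "book")
place_order(orders, "c1", "book")
print(lookup("c1") == lookup("c1"), len(orders))
```

Reversing the list of partial sums before aggregating gives the same total, which is what lets a real system combine results in whatever order the workers finish. The two identical `place_order` calls, by contrast, leave two orders behind.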
