Monday, October 29, 2012

Real Time HBase

Over the last two years the HBase market has changed dramatically. HBase began as an open source implementation of Google's Bigtable, and its most common early application was log processing.

In the last couple of years, companies like Vertica have provided solutions for data that did not fit into HBase. For users who want an open source approach, there is activity around Apache S4 and Twitter Storm.

HBase still has the largest installed base among semistructured big data repositories, largely because of its integration with HDFS. HDFS helps the data owner prevent data silos: anybody with the correct permissions can access and copy HDFS files or objects.

Most systems that capture data in MongoDB for write performance still store that data in HDFS.

Users who provide data services still run MapReduce jobs, but this is changing quickly.

Continuuity has the best approach so far. They tried to implement with Twitter Storm and found that its "at least once" semantics were less useful for real time analytics than "only once" semantics. They modified the HBase RegionServer API and added a distributed queue and a transaction manager. Amazing team. The result is an integrated solution in which developers build applications on top of a data management layer with real time streaming performance.
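
To see why the difference between "at least once" and "only once" matters for analytics, here is a minimal Python sketch. It is not Continuuity's code and does not use any Storm or HBase API. The names Event, redeliver, NaiveCounter and TransactionalCounter are all made up for illustration, and the "transaction" is only simulated in memory. It shows a counter that overcounts when the queue redelivers an event, next to one that records processed event ids together with the state update.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str  # hypothetical unique id assigned by the producer
    key: str       # the thing being counted, e.g. "clicks"


def redeliver(events, every=2):
    """Simulate an at-least-once queue: every `every`-th event arrives twice."""
    for i, event in enumerate(events, start=1):
        yield event
        if i % every == 0:
            yield event  # duplicate, as happens after a lost acknowledgement


@dataclass
class NaiveCounter:
    """Applies every delivery, so duplicates inflate the counts."""
    counts: dict = field(default_factory=dict)

    def process(self, event):
        self.counts[event.key] = self.counts.get(event.key, 0) + 1


@dataclass
class TransactionalCounter:
    """Applies each event id once by remembering what was already processed."""
    counts: dict = field(default_factory=dict)
    seen: set = field(default_factory=set)

    def process(self, event):
        if event.event_id in self.seen:
            return  # already applied, ignore the redelivery
        # Assumption: in a real system these two writes commit in one
        # transaction, so a crash cannot leave one without the other.
        self.counts[event.key] = self.counts.get(event.key, 0) + 1
        self.seen.add(event.event_id)


events = [Event("e1", "clicks"), Event("e2", "clicks"), Event("e3", "views")]
naive, exact = NaiveCounter(), TransactionalCounter()
for delivered in redeliver(events):
    naive.process(delivered)
    exact.process(delivered)

print("at least once:", naive.counts)
print("only once:    ", exact.counts)
```

With the redelivery simulated here, the naive counter reports one click too many while the transactional one matches the real input. For a dashboard that counts events in real time, that gap is the whole problem.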

