Monday, December 17, 2007

Google's BigTable

I just read Google's BigTable paper. From the title, "Bigtable: A Distributed Storage System for Structured Data", you can tell this paper is not about the PageRank algorithm. Instead, it reports how Google manages its structured data in a distributed way. In other words, it describes the infrastructure behind various services Google provides, such as webpage indexing, Google Finance, Google Earth, etc.

It's an interesting paper. Many database-related issues are addressed in it. Some topics I found interesting are listed below.
  1. Data are modeled as a sorted map. The key is a combination of a row key, a column key and a timestamp, while the value is an uninterpreted array of bytes.
  2. Coordination relies on the Chubby lock service, which in turn uses the Paxos algorithm.
  3. Data are served by many tablet servers, with one active master server that assigns tablets to them.
  4. Data are summarized via MapReduce, a parallel computation method.
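To make the sorted-map idea in item 1 concrete, here is a minimal in-memory sketch in Python. It is only a toy under my own assumptions: the class name TinySortedMap and its methods are made up for illustration and are not Bigtable's API, and the sample rows are placeholder data. It keeps keys sorted by row, then column, then newest timestamp first, and stores values as raw bytes.

```python
import bisect


class TinySortedMap:
    """Toy map from (row, column, timestamp) to bytes. Hypothetical, not Bigtable's API."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> bytes

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so that newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        # The first key at or after (row, column, -inf) is the newest version, if any.
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        # Yield entries with start_row <= row < end_row, in sorted key order.
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


# Placeholder data for illustration only.
table = TinySortedMap()
table.put("com.cnn.www", "contents:", 3, b"<html>v1</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>v2</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 1, b"<html>hi</html>")

latest = table.get_latest("com.cnn.www", "contents:")
com_entries = list(table.scan_rows("com.", "com.z"))
```

Because the keys stay sorted, reading a range of neighbouring rows is a single contiguous scan, which is the property the sorted-map model is built around.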
A brief introduction to BigTable is here, and I just found that chapter 23 in Beautiful Code describes the MapReduce method. I think it should be useful and/or important for practitioners, i.e., programmers, as computers move toward multi-core systems. I should give it a read.
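
For a rough feel of the MapReduce idea, here is a small single-process word count in Python. It is a sketch under my own assumptions, not Google's implementation: the function names map_words, reduce_counts and run_mapreduce are hypothetical, and everything runs sequentially, whereas a real system would spread the map and reduce calls over many cores or machines.

```python
from collections import defaultdict


def map_words(document):
    # Map phase: emit a (word, 1) pair for every word in one document.
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Reduce phase: combine all values emitted for the same key.
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    # Shuffle step: group the mapped values by key before reducing.
    groups = defaultdict(list)
    for document in documents:
        for key, value in mapper(document):
            groups[key].append(value)
    # Each key is reduced independently, which is what makes the work parallelizable.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


# Placeholder input for illustration only.
word_counts = run_mapreduce(
    ["big table big data", "big ideas"], map_words, reduce_counts
)
```

The point of the split is that each map call and each reduce call touches only its own input, so they can run side by side without sharing state.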
