I just read Google's Bigtable paper. From the title, "Bigtable: A Distributed Storage System for Structured Data", you can tell this paper is not about the PageRank algorithm. Instead, it reports how Google manages its structured data in a distributed way, so it covers the infrastructure behind various Google services, such as web indexing, Google Finance, and Google Earth.
It's an interesting paper that addresses many database-related issues. Some topics I found interesting are listed below.
- Data are modeled as a sorted map. Each key is a combination of a row key, a column key, and a timestamp, while each value is an uninterpreted array of bytes.
- Coordination is handled by the Chubby lock service, which relies on the Paxos algorithm.
- Data are split into tablets, which are served by many tablet servers; one active master server assigns tablets to them.
- Data are summarized via a parallel computation method, MapReduce.
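
To make the sorted-map data model a bit more concrete, here is a minimal in-memory sketch in Python. It is only a toy under my own assumptions: the class name `ToyTable` and its methods `put`, `get`, and `scan_rows` are made up for illustration and are not Bigtable's API, and the row and column names are just sample data. It shows the two ideas from the first bullet: keys are (row, column, timestamp) triples kept in sorted order, and values are plain bytes.

```python
import bisect


class ToyTable:
    """Toy in-memory model of a sorted (row, column, timestamp) -> bytes map.

    Hypothetical illustration only; this is not how Bigtable is implemented.
    """

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> bytes

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so that newer versions sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the most recent value for (row, column), or None."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            key = self._keys[i]
            yield key[0], key[1], -key[2], self._values[key]
            i += 1


if __name__ == "__main__":
    table = ToyTable()
    table.put("com.cnn.www", "contents:", 3, b"<html>old</html>")
    table.put("com.cnn.www", "contents:", 5, b"<html>new</html>")
    table.put("com.example", "contents:", 1, b"<html>hi</html>")
    print(table.get("com.cnn.www", "contents:"))
    for cell in table.scan_rows("com.cnn", "com.d"):
        print(cell)
```

Because the keys are sorted, reading a contiguous range of rows is a single forward walk, which is the property that makes splitting the data into row ranges natural.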
A brief introduction to Bigtable is here, and I just found that chapter 23 of Beautiful Code describes the MapReduce method. I think it should be useful, even important, for practitioners (i.e., programmers) as computers move toward multi-core systems. I should give it a read.
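
For a rough idea of what the MapReduce method does, here is a single-process sketch in Python that counts words. It is an assumption-laden toy: the names `map_phase`, `reduce_phase`, and `run_mapreduce` are hypothetical and do not come from the paper or the book, and a real system would run the map and reduce steps in parallel across many machines rather than in one loop.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_phase(word, counts):
    """Reduce step: combine all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents, mapper, reducer):
    """Run mapper over every document, group by key, then run reducer.

    Sequential stand-in for the distributed shuffle; illustration only.
    """
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    docs = {"doc1": "big table big data", "doc2": "table of data"}
    print(run_mapreduce(docs, map_phase, reduce_phase))
```

The point of the split is that each call to the map step and each call to the reduce step is independent of the others, so they can be spread over many cores or machines.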