Hey guys, I stumbled on an article on shard (chunk) replication that we
should probably keep in mind when we design our data
distribution/replication scheme.
-Daniel
Full article: http://hackingdistributed.com/2014/02/14/chainsets/
----
Copysets and Chainsets: A Better Way to Replicate
In distributed storage systems, data is often partitioned across
multiple servers for scalability and replicated for fault tolerance. The
traditional technique for performing such partitioning and replication
is to randomly assign data to replicas. Although such random assignment
is relatively easy to implement, it suffers from a fatal drawback: as
cluster size grows, it becomes almost guaranteed that a failure of a
small percentage of the cluster will lead to permanent data loss.
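To make that claim concrete, here is a rough Monte Carlo sketch (not from the article; the function name and all parameters are illustrative) estimating how often a simultaneous failure of a handful of nodes destroys all replicas of some chunk under fully random placement:

```python
import itertools
import random

def loss_probability(n_nodes=100, r=3, n_chunks=10_000,
                     n_failed=10, trials=100, seed=1):
    """Estimate the chance that a simultaneous failure of `n_failed`
    nodes permanently loses data, when each chunk's r replicas sit on
    a uniformly random set of nodes ("random replication")."""
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    losses = 0
    for _ in range(trials):
        # Random placement: each chunk gets an independent random r-subset.
        copysets = {frozenset(rng.sample(nodes, r)) for _ in range(n_chunks)}
        failed = rng.sample(nodes, n_failed)
        # A chunk is lost iff all r of its replicas are among the failed nodes.
        if any(frozenset(c) in copysets
               for c in itertools.combinations(failed, r)):
            losses += 1
    return losses / trials
```

With these illustrative numbers (100 nodes, 3 replicas, 10,000 chunks), failing just 10 nodes loses at least one chunk in almost every trial, because the r-subsets of the failed nodes quickly cover some chunk's replica set.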
There is a far smarter way to place replicas, explored by a paper from our
colleagues at Stanford, titled "Copysets: Reducing the Frequency of Data
Loss in Cloud Storage" [CRSK+13]. It was presented at the USENIX
conference, co-chaired by one of us, where it received the best student
paper award. In this post, we first explain how random replica sets, as
used in many NoSQL solutions, actually increase the likelihood of data
loss, then show how Copysets mitigate that risk, and finally describe
what it takes to bring Copysets into practice in a NoSQL data store.
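For contrast with random placement, here is a minimal sketch of the copyset idea (simplified: the paper's actual scheme uses several random permutations to trade scatter width against loss probability; the names and parameters here are mine):

```python
import random

def copyset_groups(n_nodes=100, r=3, seed=1):
    """Build a small, fixed family of replica groups by cutting one
    random permutation of the nodes into disjoint r-sized copysets."""
    rng = random.Random(seed)
    nodes = list(range(n_nodes))
    rng.shuffle(nodes)
    usable = n_nodes - n_nodes % r  # drop the ragged tail, if any
    return [frozenset(nodes[i:i + r]) for i in range(0, usable, r)]

def place_chunks(n_chunks, groups, seed=2):
    """Place each chunk on one whole copyset rather than a fresh random
    node set, so only len(groups) failure patterns can lose data."""
    rng = random.Random(seed)
    return [rng.choice(groups) for _ in range(n_chunks)]
```

With 100 nodes and r = 3 this yields only 33 possible replica sets instead of up to 10,000 distinct random ones, so far fewer node-failure combinations can wipe out every copy of some chunk; the cost is less flexibility in load spreading and recovery, which is the scatter-width trade-off the paper analyzes.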
########################################################################
Use REPLY-ALL to reply to list
To unsubscribe from the QSERV-L list, click the following link:
https://listserv.slac.stanford.edu/cgi-bin/wa?SUBED1=QSERV-L&A=1