Big Data - Seifedine Kadry
2.2 Distribution Models
The main reason for distributing data over a large cluster is to avoid the difficulty and cost of buying expensive servers. Several distribution models make it possible to handle growing data volumes and heavy read or write loads, and to make the network highly available. The downside of this type of architecture is that its complexity grows with the number of computers added to the cluster. Replication and sharding are the two major techniques of data distribution. Figure 2.5 shows the distribution models.
Replication—Replication is the process of placing the same set of data over multiple nodes. Replication can be performed using a peer‐to‐peer model or a master‐slave model.
Sharding—Sharding is the process of placing different sets of data on different nodes.
Sharding and Replication—Sharding and replication can either be used alone or together.
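The three options above can be illustrated with a minimal in-memory sketch. It is not taken from the book: the names `shard_for`, `replica_nodes`, and `Cluster` are hypothetical, and hash-based key placement with replicas on the following nodes is only one assumed way of assigning data to nodes. A replication factor of 1 gives sharding alone, a factor equal to the number of nodes gives replication alone, and anything in between combines the two.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (an assumed placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replica_nodes(shard, nodes, replication_factor):
    """Return the nodes that hold a shard: its first node plus the next ones in list order."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    return [nodes[(shard + i) % len(nodes)] for i in range(replication_factor)]


class Cluster:
    """Toy cluster: one dict per node stands in for that node's storage."""

    def __init__(self, nodes, replication_factor=1):
        self.nodes = list(nodes)
        self.replication_factor = replication_factor
        self.storage = {node: {} for node in self.nodes}

    def owners(self, key):
        # Sharding picks the first node; replication adds further copies.
        shard = shard_for(key, len(self.nodes))
        return replica_nodes(shard, self.nodes, self.replication_factor)

    def put(self, key, value):
        # Write the same value to every node that owns the key.
        for node in self.owners(key):
            self.storage[node][key] = value

    def get(self, key):
        # Read from the first owner that has the key.
        for node in self.owners(key):
            if key in self.storage[node]:
                return self.storage[node][key]
        raise KeyError(key)


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b", "node-c"], replication_factor=2)
    cluster.put("user:1", "Ada")
    print(cluster.owners("user:1"), cluster.get("user:1"))
```

In this sketch every node accepts writes for the keys it owns, so it does not model the difference between master-slave and peer-to-peer replication; it only shows where copies of the data end up.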