
2.1 Cluster Computing


Cluster computing is a distributed or parallel computing system composed of multiple stand-alone PCs connected to work together as a single, integrated, highly available resource. The connected computing resources constitute one larger and more powerful virtual computer, with each resource running its own instance of the operating system. The cluster components are connected through local area networks (LANs). Cluster computing technology is used for high availability as well as load balancing, giving better system performance and reliability. The benefits of massively parallel processors and cluster computers are high availability, scalable performance, fault tolerance, and the use of cost-effective commodity hardware. Scalability is achieved by adding or removing nodes on demand without hindering system operation. The servers in a cluster are called nodes, and a cluster groups these nodes to share critical computational tasks. Cluster computing may follow a client-server architecture or a peer-to-peer model.

Cluster computing provides high-speed computational power for processing the data-intensive applications associated with big data technologies. With its distributed computation infrastructure, a cluster delivers fast and reliable data processing to very large big data solutions by integrating autonomous, geographically separated resources. Clusters are a cost-effective platform for big data because they allow multiple applications to share the computing resources, and they are flexible enough to take on additional computing resources as the big data workload requires. Clusters can also change size dynamically: they shrink when a server shuts down and grow when additional servers are added to handle more load. They survive failures with little or no impact, adopting a failover mechanism to eliminate service interruptions. Failover is the process of switching to a redundant node upon the abnormal termination or failure of a previously active node. It is an automatic mechanism that requires no human intervention, which differentiates it from a switch-over operation.
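To make the failover idea concrete, the following is a minimal Python sketch of an automatic failover monitor. The node addresses, health-check endpoint, and polling interval are all illustrative assumptions, not part of the book's text; a real cluster manager would obtain node membership from its own configuration and coordination services.

import time
import urllib.request

# Hypothetical node addresses and health endpoints (assumptions for illustration only).
PRIMARY = "http://primary-node:8080/health"
STANDBY = "http://standby-node:8080/health"

def is_alive(url, timeout=2):
    """Return True if the node answers its health-check endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

def monitor(poll_interval=5):
    """Poll the active node and fail over to the standby when it stops responding."""
    active = PRIMARY
    while True:
        if not is_alive(active):
            # Automatic failover: the active role switches with no human intervention.
            active = STANDBY if active == PRIMARY else PRIMARY
            print(f"Failover: switching active node to {active}")
        time.sleep(poll_interval)

The sketch polls a single active node and promotes the standby when the health check fails; production systems typically add quorum checks and fencing so that a briefly unreachable node cannot cause two nodes to act as primary at once.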


Figure 2.2 Cluster computing.

Figure 2.2 shows an overview of cluster computing: multiple stand-alone PCs connected together through a dedicated switch. The login node acts as the gateway into the cluster. When the cluster is accessed by users from a public network, each user must first log in to the login node, which prevents unauthorized access. Cluster computing follows either a master-slave model or a peer-to-peer model. There are two major types of clusters, namely, the high-availability cluster and the load-balancing cluster; the cluster types are briefly described in the following section.
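As a simple illustration of the load-balancing idea mentioned above, the sketch below distributes incoming jobs across cluster nodes in round-robin order. The node names and job labels are hypothetical, and round-robin is only one of several possible load-balancing policies.

import itertools

# Hypothetical worker nodes sitting behind the login/gateway node (illustrative names).
NODES = ["node01", "node02", "node03", "node04"]

def round_robin(nodes):
    """Yield nodes in a repeating cycle, the simplest load-balancing policy."""
    return itertools.cycle(nodes)

def dispatch(jobs, nodes=NODES):
    """Assign each job to the next node in the cycle and return the mapping."""
    cycle = round_robin(nodes)
    return {job: next(cycle) for job in jobs}

if __name__ == "__main__":
    # Five jobs across four nodes: the fifth job wraps back to node01.
    print(dispatch(["job-a", "job-b", "job-c", "job-d", "job-e"]))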
