Big Data - Seifedine Kadry
2.3 Distributed File System
A file system is a method of storing and organizing data on storage devices such as hard drives and DVDs, and of keeping track of the files stored on them. The file is the smallest logical unit of storage that the file system defines for holding data. A file system stores and retrieves data so that applications can run effectively and efficiently on the operating system. A distributed file system stores files across the nodes of a cluster and lets clients access those files from the cluster. Although the files are physically distributed across the nodes, logically they appear to the client as if they resided on its local machine. Because a distributed file system serves more than one client at the same time, the server needs a mechanism for coordinating updates so that every client accesses the current version of a file and no version conflicts arise. Big data systems widely adopt a distributed file system known as the Hadoop Distributed File System (HDFS).
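The two ideas in this paragraph, location transparency (clients name a file by its path, not by the node holding it) and coordinated updates, can be illustrated with a small in-memory model. This is a hypothetical sketch, not how HDFS works internally: the class `ToyDFS`, the CRC32-based node choice, and the optimistic version check in `write` are all illustrative assumptions. HDFS itself coordinates writers differently.

```python
import zlib
from dataclasses import dataclass, field


class VersionConflict(Exception):
    """Raised when a client writes against an outdated version of a file."""


@dataclass
class ToyDFS:
    """Hypothetical in-memory model: each file lives on one cluster node,
    but clients address it only by its logical path."""
    nodes: list[str]
    storage: dict[str, dict[str, bytes]] = field(default_factory=dict)  # node -> {path: data}
    location: dict[str, str] = field(default_factory=dict)  # path -> node holding it
    versions: dict[str, int] = field(default_factory=dict)  # path -> current version

    def __post_init__(self) -> None:
        for node in self.nodes:
            self.storage.setdefault(node, {})

    def create(self, path: str, data: bytes) -> int:
        # Assumption: spread files over nodes with a deterministic hash of the path.
        node = self.nodes[zlib.crc32(path.encode()) % len(self.nodes)]
        self.storage[node][path] = data
        self.location[path] = node
        self.versions[path] = 1
        return 1

    def read(self, path: str) -> tuple[bytes, int]:
        # The client never names a node; the lookup is hidden behind the path.
        node = self.location[path]
        return self.storage[node][path], self.versions[path]

    def write(self, path: str, data: bytes, expected_version: int) -> int:
        # Assumed optimistic scheme: reject updates based on a stale version.
        current = self.versions[path]
        if expected_version != current:
            raise VersionConflict(f"{path}: expected v{expected_version}, current v{current}")
        self.storage[self.location[path]][path] = data
        self.versions[path] = current + 1
        return current + 1


dfs = ToyDFS(nodes=["node-a", "node-b", "node-c"])
dfs.create("/logs/day1.txt", b"hello")
data, version = dfs.read("/logs/day1.txt")  # client only knows the path
new_version = dfs.write("/logs/day1.txt", b"hello, world", expected_version=version)
try:
    # A second client still holding version 1 is refused instead of overwriting.
    dfs.write("/logs/day1.txt", b"stale edit", expected_version=version)
except VersionConflict as err:
    print("rejected:", err)
```

The version check is only one way to avoid conflicting updates; it was chosen here because it makes the "current version" requirement explicit in a few lines.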
The key concept of a distributed file system is data replication: copies of the data, called replicas, are distributed across multiple cluster nodes so that there is no single point of failure, which increases reliability. A client can read from the closest available node that holds a replica, reducing latency and network traffic. Replication also provides fault tolerance: because the data is stored redundantly across nodes, it is not lost when a node fails.
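A minimal sketch of replication, nearest-replica reads, and failover might look like the following. The class `ReplicatedStore`, the per-node `distance` table (network distance from the reading client), and the placement rule (consecutive nodes starting from a CRC32 hash of the path) are hypothetical assumptions; HDFS uses its own rack-aware placement policy. The replication factor of 3 mirrors the HDFS default for `dfs.replication`.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ReplicatedStore:
    """Hypothetical model of replica placement, nearest-replica reads and
    failover. Not HDFS's actual placement policy."""
    distance: dict[str, int]  # node -> assumed network distance from the client
    replication: int = 3  # copies per file
    replicas: dict[str, list[str]] = field(default_factory=dict)  # path -> nodes
    data: dict[tuple[str, str], bytes] = field(default_factory=dict)  # (node, path) -> bytes
    failed: set[str] = field(default_factory=set)

    def put(self, path: str, payload: bytes) -> list[str]:
        nodes = sorted(self.distance)
        if self.replication > len(nodes):
            raise ValueError("not enough nodes for the replication factor")
        start = zlib.crc32(path.encode()) % len(nodes)
        # Pick `replication` distinct nodes so no single node holds the only copy.
        chosen = [nodes[(start + i) % len(nodes)] for i in range(self.replication)]
        for node in chosen:
            self.data[(node, path)] = payload
        self.replicas[path] = chosen
        return chosen

    def fail(self, node: str) -> None:
        # Simulate a node crash; its copies become unreachable.
        self.failed.add(node)

    def get(self, path: str) -> tuple[bytes, str]:
        # Read from the nearest live replica to cut latency and network traffic.
        live = [n for n in self.replicas[path] if n not in self.failed]
        if not live:
            raise OSError(f"all replicas of {path} are unavailable")
        nearest = min(live, key=lambda n: self.distance[n])
        return self.data[(nearest, path)], nearest


store = ReplicatedStore(distance={"rack1-a": 0, "rack1-b": 2, "rack2-a": 4, "rack2-b": 4})
placed = store.put("/data/part-0", b"records")
payload, served_by = store.get("/data/part-0")
store.fail(served_by)  # lose the node that just served the read
payload_after, fallback = store.get("/data/part-0")  # another replica answers
```

After the nearest node fails, the read is still served from one of the remaining replicas, which is the fault-tolerance property described above.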