Читать книгу Data Science For Dummies - Lillian Pierson - Страница 43

Reminiscing about Hadoop

Because big data’s three Vs (volume, velocity, and variety) don’t allow for the handling of big data using traditional RDMSs, data engineers had to become innovative. To work around the limitations of relational systems, data engineers originally turned to the Hadoop data processing platform to boil down big data into smaller datasets that are more manageable for data scientists to analyze. This was all the rage until about 2015, when market demands had changed to the point that the platform was no longer able to meet them.

When people refer to Hadoop, they’re generally referring to an on-premise Hadoop storage environment that includes the HDFS (for data storage), MapReduce (for bulk data processing), Spark (for real-time data processing), and YARN (for resource management).

Подняться наверх