Data Science For Dummies - Lillian Pierson - Page 48

Processing big data in real-time


A real-time processing framework is, as its name implies, a framework that processes data in real time (or near-real time) as the data streams into the system. Real-time frameworks process data in microbatches: small groups of records collected over a short interval. They return results in a matter of seconds rather than the hours or days that batch processing frameworks like MapReduce typically take. Real-time processing frameworks do one of the following:

 Increase the overall time efficiency of the system: Solutions in this category include Apache Storm and Apache Spark for near-real-time stream processing.

 Deploy innovative querying methods to facilitate the real-time querying of big data: Some solutions in this category are Google’s Dremel, Apache Drill, Shark for Apache Hive, and Cloudera’s Impala.

In-memory refers to processing data within the computer’s memory, without reading and writing intermediate computational results to disk. In-memory computing returns results much faster, but the amount of data it can process per processing interval is limited by the memory available.
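The microbatch and in-memory ideas can be illustrated with a minimal sketch in plain Python. This is not how Storm or Spark is implemented; it only shows the general pattern of buffering incoming records in memory for a short interval and emitting a result as each interval closes. The event shape `(timestamp, value)`, the function names, the sample data, and the 3-second interval are all assumptions made for illustration, and the sketch assumes events arrive in timestamp order.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, measured value).
Event = Tuple[float, float]


def micro_batches(events: Iterable[Event], interval_s: float) -> Iterator[Tuple[int, List[float]]]:
    """Yield (window index, values) each time a fixed-length time window closes.

    Only the current window's values are held in memory; nothing is written
    to disk. Assumes events arrive ordered by timestamp.
    """
    current_window = None
    buffer: List[float] = []
    for timestamp, value in events:
        window = int(timestamp // interval_s)
        if current_window is not None and window != current_window:
            # The previous window is complete: hand it off and start a new buffer.
            yield current_window, buffer
            buffer = []
        current_window = window
        buffer.append(value)
    if buffer:
        # Flush the final, possibly partial, window.
        yield current_window, buffer


def summarize(events: Iterable[Event], interval_s: float) -> List[Tuple[int, float]]:
    """Compute the mean value of each microbatch."""
    return [(window, sum(values) / len(values))
            for window, values in micro_batches(events, interval_s)]


# Made-up sample stream, grouped into arbitrary 3-second microbatches.
sample = [(0.2, 10.0), (1.1, 20.0), (2.9, 30.0), (3.0, 40.0), (4.5, 60.0), (7.2, 5.0)]
results = summarize(sample, interval_s=3.0)
print(results)  # [(0, 20.0), (1, 50.0), (2, 5.0)]
```

Because the buffer holds only one interval's worth of records, memory use stays bounded by the interval length and the arrival rate. That is the trade-off described above: results come back quickly, but each processing interval can cover only as much data as fits in memory.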

Apache Spark is an in-memory computing application that you can use to query, explore, analyze, and even run machine learning algorithms on incoming streaming data in near-real time. Its power lies in its processing speed: the ability to process and make predictions from streaming big data sources in three seconds flat is no laughing matter.

Real-time, stream-processing frameworks are quite useful in a multitude of industries — from stock and financial market analyses to e-commerce optimizations and from real-time fraud detection to optimized order logistics. Regardless of the industry in which you work, if your business is impacted by real-time data streams that are generated by humans, machines, or sensors, a real-time processing framework would be helpful to you in optimizing and generating value for your organization.
