2 Data Stream Computing
Data stream computing refers to the real‐time processing of vast amounts of data generated at high velocity from numerous sources, with different schemas and different temporal resolutions [12]. It is a new paradigm required by emerging data‐generation scenarios, including the ubiquity of mobile phones, location‐based services, and sensors [13].
The principal premise of stream computing is that the value of data lies in its freshness. Data are therefore analyzed the moment they arrive in the stream, rather than being stored first and analyzed later as in batch processing. This places serious demands on suitable platforms for scalable computing with parallel architectures [14]. With stream computing, it is feasible for organizations to analyze and respond to rapidly changing data in real‐time [15]. Integrating streaming data into the decision‐making process gives rise to the programming paradigm called stream computing. Stream processing solutions should be able to handle high volumes of data from different sources in real‐time while providing availability, scalability, and fault tolerance. Data stream analysis involves ingesting data as an unbounded sequence of tuples, analyzing them, and producing significant results as a stream [16].
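To make the contrast with batch processing concrete, the following Python sketch processes each tuple the moment it arrives and keeps only an incremental aggregate rather than a stored batch. The generator sensor_readings() and the alert threshold are hypothetical stand-ins for an unbounded source and an application-specific rule; they are not drawn from the text.

```python
import random
import time

def sensor_readings():
    """Hypothetical unbounded source: yields one tuple the moment it is produced."""
    while True:
        yield {"sensor_id": "s1", "value": random.gauss(20.0, 2.0), "ts": time.time()}

def process_stream(stream, alert_threshold=5.0):
    """Analyze each tuple on arrival; keep only an incremental aggregate, never the full history."""
    count, running_sum = 0, 0.0
    for reading in stream:
        count += 1
        running_sum += reading["value"]
        running_mean = running_sum / count  # updated per tuple, no stored batch
        if abs(reading["value"] - running_mean) > alert_threshold:
            # react immediately to fresh data instead of waiting for a batch job
            print(f"alert: sensor {reading['sensor_id']} deviates at {reading['ts']:.0f}")

# process_stream(sensor_readings())  # runs indefinitely over the unbounded source
```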
In a stream processor, an application is represented by a dataflow graph comprising operators and the streams that interconnect them. A stream processing workflow consists of programs. Formally, a composition is a pair (P, <p), where P is a set of transaction programs and <p is the program order, also called the partial order. The partial order captures the dataflow and control order of the data stream. The composition graph is the acyclic graph representing the partial order. Input streams to the composition are called source streams, while the output streams are called derived streams [17]. In a streaming analytics system, the application takes the form of continuous queries; data are continuously ingested, analyzed, and correlated to produce results in streaming fashion. Streaming analytics frameworks must be able to recognize new data, build models incrementally, and detect deviations from model predictions [18].
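As an illustration of these ideas rather than any particular framework's API, the sketch below wires a small acyclic dataflow of operators: a source stream feeds an operator holding an incremental model (a running mean and variance updated per tuple), which emits a derived stream of deviation alerts. The Operator class, the 3-sigma threshold, and the sample values are assumptions made for the example.

```python
class Operator:
    """One node of the dataflow graph: consumes an input stream, emits a derived stream."""
    def __init__(self, name, fn):
        self.name, self.fn, self.downstream = name, fn, []

    def connect(self, other):
        self.downstream.append(other)  # an edge of the acyclic composition graph
        return other                   # allows chaining: source -> detector -> sink

    def push(self, item):
        out = self.fn(item)
        if out is not None:            # None means "nothing emitted for this tuple"
            for op in self.downstream:
                op.push(out)           # the derived stream flows to downstream operators


class DeviationDetector:
    """Incremental model: running mean/variance (Welford's update) plus a deviation check."""
    def __init__(self, threshold=3.0):
        self.n, self.mean, self.m2, self.threshold = 0, 0.0, 0.0, threshold

    def __call__(self, x):
        alert = None
        if self.n > 1:  # compare the new tuple against the model built so far
            std = (self.m2 / self.n) ** 0.5
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                alert = f"deviation: {x:.2f} vs predicted mean {self.mean:.2f}"
        # then fold the tuple into the model incrementally (no reprocessing of history)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return alert


source = Operator("source", lambda x: x)            # source stream enters here
detector = Operator("detector", DeviationDetector())
sink = Operator("alerts", print)                    # derived stream of alerts leaves here
source.connect(detector).connect(sink)

for value in [10.1, 9.8, 10.2, 10.0, 9.9, 25.0]:    # tuples arriving in order
    source.push(value)
```

The last value, which departs sharply from the model built on the earlier tuples, is flagged the instant it arrives, illustrating incremental model building and deviation detection within a single pass over the stream.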