Big Data - Seifedine Kadry

1.8.3.3 Data Reduction

Processing a massive volume of data can take so long that analysis becomes infeasible or impractical. Data reduction addresses this by reducing either the volume of the data or its dimensionality, that is, the number of attributes. The goal is to analyze the data in a reduced form that preserves the integrity of the original data and still yields high-quality results. Data reduction techniques include data compression, dimensionality reduction, and numerosity reduction.

Data compression techniques produce a compressed, or reduced, representation of the original data. If the original data can be reconstructed from the compressed data without any loss of information, the reduction is called lossless. If only an approximation of the original data can be recovered, it is called lossy.

Dimensionality reduction reduces the number of attributes. Its techniques include wavelet transforms, in which the original data is transformed and projected onto a smaller space, and attribute subset selection, which removes irrelevant or redundant attributes.

Numerosity reduction reduces the data volume by replacing the original data with smaller, alternative representations. It is implemented using parametric and nonparametric methods. Parametric methods fit a model to the data, such as a regression model, and store only the model parameters instead of the actual data. Nonparametric methods store reduced representations of the original data, such as histograms or samples.
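
The minimal Python sketch below illustrates the lossless/lossy distinction. It is not the book's method. It uses the standard-library `zlib` module for lossless compression and a simple uniform quantizer for lossy reduction. The function names `lossless_roundtrip`, `quantize`, and `dequantize`, the step size, and the sample data are hypothetical choices. Quantization keeps only a bucket index per value, so the reconstruction is accurate only to within half a step.

```python
from __future__ import annotations

import zlib


def lossless_roundtrip(data: bytes) -> tuple[bytes, bytes]:
    """Lossless reduction: DEFLATE-compress with zlib, then decompress.

    The decompressed bytes are identical to the input.
    """
    compressed = zlib.compress(data, 9)
    return compressed, zlib.decompress(compressed)


def quantize(values: list[float], step: float) -> list[int]:
    """Lossy reduction (hypothetical helper): store only the nearest bucket index."""
    return [round(v / step) for v in values]


def dequantize(codes: list[int], step: float) -> list[float]:
    """Approximate reconstruction: each value is off by at most step / 2
    (up to floating-point rounding), and the exact original is lost."""
    return [c * step for c in codes]


if __name__ == "__main__":
    raw = b"status=ok;" * 1000  # highly repetitive, so it compresses well
    compressed, restored = lossless_roundtrip(raw)
    print(len(raw), len(compressed), restored == raw)

    readings = [20.13, 20.17, 20.52, 21.04]
    codes = quantize(readings, step=0.5)
    print(codes, dequantize(codes, step=0.5))
```

The lossless round trip reports `True`. The quantized readings come back as `[20.0, 20.0, 20.5, 21.0]`, which are cheaper to store, but the original values cannot be recovered exactly.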

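The Python sketch below illustrates the dimensionality and numerosity reduction techniques described above. It is a simplified illustration under stated assumptions, not the book's implementation:

- `haar_step` is one level of the unnormalized (averaging) Haar wavelet transform.
- `select_attributes` is a deliberately naive attribute subset selection that drops constant and exactly duplicated attributes.
- `fit_line` is a parametric method that keeps only the two parameters of a least-squares line.
- `equal_width_histogram` and `simple_random_sample` are nonparametric methods.

All function names and sample data are hypothetical.

```python
from __future__ import annotations

import random
import statistics

# --- Dimensionality reduction -------------------------------------------


def haar_step(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the unnormalized Haar wavelet transform.

    Keeping only the averages halves the number of values; the details
    hold what is needed to restore the original exactly.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def haar_inverse(averages: list[float], details: list[float]) -> list[float]:
    """Invert haar_step; passing zeroed details yields a coarser approximation."""
    out: list[float] = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out


def select_attributes(columns: dict[str, list[float]]) -> list[str]:
    """Naive attribute subset selection (hypothetical heuristic)."""
    kept: list[str] = []
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant attribute: irrelevant for telling records apart
        if any(columns[k] == values for k in kept):
            continue  # exact copy of an attribute already kept: redundant
        kept.append(name)
    return kept


# --- Numerosity reduction -----------------------------------------------


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Parametric: least-squares line y = intercept + slope * x.

    Only the two parameters are stored instead of all (x, y) pairs.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def equal_width_histogram(values: list[float], buckets: int) -> tuple[float, float, list[int]]:
    """Nonparametric: store (lower bound, bucket width, counts) instead of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        idx = min(int((v - lo) / width), buckets - 1)  # the maximum goes in the last bucket
        counts[idx] += 1
    return lo, width, counts


def simple_random_sample(values: list[float], k: int, seed: int = 0) -> list[float]:
    """Nonparametric: keep k values drawn without replacement."""
    return random.Random(seed).sample(values, k)


if __name__ == "__main__":
    avg, det = haar_step([2, 2, 0, 2, 3, 5, 4, 4])
    print(avg, det, haar_inverse(avg, det))
    print(select_attributes({"age": [25, 32, 47], "age_copy": [25, 32, 47],
                             "country": [1, 1, 1], "income": [30, 45, 60]}))
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
    print(equal_width_histogram([1, 2, 2, 3, 7, 8, 9, 10], buckets=3))
    print(simple_random_sample([1, 2, 2, 3, 7, 8, 9, 10], k=3))
```

On this sample data:

- `haar_step` turns eight values into the four averages `[2.0, 1.0, 4.0, 4.0]` plus four details.
- `select_attributes` keeps `['age', 'income']`.
- `fit_line` returns `(1.0, 2.0)`, so two numbers replace four points that lie exactly on a line. For real data the stored line is only an approximation.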