Читать книгу Big Data - Seifedine Kadry - Страница 39
1.9.2 Heterogeneity and Incompleteness
ОглавлениеThe data types of big data are heterogeneous in nature as the data is integrated from multiple sources and hence has to be carefully structured and presented as homogenous data before big data analysis. The data gathered may be incomplete, making the analysis much more complicated. Consider an example of a patient online health record with his name, occupation, birth data, medical ailment, laboratory test results, and previous medical history. If one or more of the above details are missing in multiple records, the analysis cannot be performed as it may not turn out to be valuable. In some scenarios a NULL value may be inserted in the place of missing values, and the analysis may be performed if that particular value does not have a great impact on the analysis and if the rest of the available values are sufficient to produce a valuable outcome.