Big Data Analytics and Machine Intelligence in Biomedical and Health Informatics (group of authors)

1.6 Challenges in Big Data Analytics

Several difficult issues must be considered when collecting large amounts of data. The cost of experimental measurements is one factor in obtaining high-throughput ‘omics’ data. Before these heterogeneous data are integrated and data mining methods are applied, it is necessary to account for the heterogeneity of the data sources, the noise in the experimental ‘omics’ data, and the variety of experimental techniques, environmental conditions, and biological characteristics of the samples. A range of data mining techniques can be applied to these heterogeneous biomedical data sets, including anomaly detection, clustering, classification, and association rule mining, as well as summarization and visualization of large data sets.

Noise and heterogeneity of this kind can make individual data points unreliable, for example through missing values or outliers. Beyond these limitations of ‘omics’ data, EHR data are heavily influenced by the staff who enter the patient’s data, which can lead to missing values or incorrect entries caused by human error, misunderstanding, or misinterpretation of the original data [20]. Integrating data from disparate databases and standardizing laboratory protocols and values also remain difficult [21].
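As a rough illustration of what standardizing laboratory values and catching entry errors can involve, the sketch below harmonizes glucose results reported in two units and flags missing or implausible entries. The record layout, the field names (`patient_id`, `value`, `unit`), and the plausibility range are hypothetical choices made for this example and do not come from the source or from any EHR standard. Only the conversion factor is a fixed quantity (glucose has a molar mass of about 180.16 g/mol, so 1 mmol/L corresponds to 18.016 mg/dL).

```python
# Hypothetical sketch: harmonize glucose units and flag unusable entries.
# Field names and the plausibility range are illustrative assumptions.

MGDL_PER_MMOLL = 18.016  # glucose: 180.16 g/mol, so 1 mmol/L = 18.016 mg/dL


def to_mmol_per_l(value, unit):
    """Convert a glucose value to mmol/L; return None if it cannot be interpreted."""
    if value is None:
        return None
    if unit == "mmol/L":
        return float(value)
    if unit == "mg/dL":
        return float(value) / MGDL_PER_MMOLL
    return None  # unknown unit: treat as missing rather than guess


def clean_glucose(records, low=1.0, high=50.0):
    """Return one (patient_id, glucose_mmol_l or None, status) tuple per record."""
    cleaned = []
    for rec in records:
        mmol = to_mmol_per_l(rec.get("value"), rec.get("unit"))
        if mmol is None:
            cleaned.append((rec["patient_id"], None, "missing"))
        elif not (low <= mmol <= high):
            cleaned.append((rec["patient_id"], None, "implausible"))
        else:
            cleaned.append((rec["patient_id"], round(mmol, 2), "ok"))
    return cleaned


records = [
    {"patient_id": "p1", "value": 99.0, "unit": "mg/dL"},
    {"patient_id": "p2", "value": 5.4, "unit": "mmol/L"},
    {"patient_id": "p3", "value": None, "unit": "mg/dL"},
    {"patient_id": "p4", "value": 5400.0, "unit": "mg/dL"},  # likely a data-entry error
]

cleaned = clean_glucose(records)
for row in cleaned:
    print(row)
```

Flagging suspicious values instead of silently dropping them keeps the decision about how to handle each case, whether by imputation, exclusion, or manual review, visible to later pre-processing steps.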

‘Omics’ data are high-dimensional: they contain many more dimensions, or features, than samples. EHR data, in contrast, hold information about individual patients. Both characteristics make data mining techniques more difficult to apply.
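To make the "more features than samples" situation concrete, the following sketch builds a small synthetic matrix with 20 samples and 1000 features and keeps only the ten features with the largest variance. Variance filtering is just one simple, unsupervised way to shrink the feature set and is used here purely as an illustration; the source does not prescribe a particular method, and the data, sizes, and function name are assumptions of this example.

```python
import random
import statistics

random.seed(0)
n_samples, n_features = 20, 1000  # far more features than samples

# Synthetic matrix: rows are samples, columns are features.
# Every tenth feature gets a larger spread so the filter has something to find.
data = [
    [random.gauss(0.0, 3.0 if j % 10 == 0 else 0.1) for j in range(n_features)]
    for _ in range(n_samples)
]


def top_variance_features(matrix, k):
    """Return the sorted indices of the k features with the largest sample variance."""
    columns = list(zip(*matrix))
    variances = [statistics.variance(col) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


keep = top_variance_features(data, k=10)
reduced = [[row[j] for j in keep] for row in data]
print(len(data[0]), "features reduced to", len(reduced[0]))
```

A filter like this ignores any outcome variable, so in a real analysis it would normally be combined with, or replaced by, methods chosen for the specific question being asked.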

The next stage is data pre-processing, which typically involves handling noisy data, outliers, and missing values, as well as data transformation and normalization. Pre-processing makes it possible to apply statistical techniques and data mining methods, improving the quality and outcomes of big data analytics and potentially leading to the discovery of novel knowledge. Knowledge obtained by integrating ‘omics’ and EHR data should in turn lead to better healthcare delivery for patients and better-informed decisions by healthcare policymakers.
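The pre-processing steps named above can be sketched on a single numeric variable as follows. This is a minimal example under stated assumptions, not the authors' pipeline: median imputation, clipping to Tukey fences (1.5 times the interquartile range), and z-score normalization are common default choices picked here for illustration, and the sample values are invented.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def clip_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def zscore(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Invented measurements with two missing entries and one extreme value.
raw = [4.8, 5.1, None, 5.0, 47.0, 5.3, 4.9, None, 5.2]

imputed = impute_median(raw)
clipped = clip_outliers_iqr(imputed)
scaled = zscore(clipped)
print([round(v, 2) for v in scaled])
```

The order matters in this sketch: imputing with the median first keeps the extreme value from distorting the fill-in value, and clipping before normalization keeps that same value from dominating the mean and standard deviation.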
