Data Mining and Machine Learning Applications (various authors)
1.2 Knowledge Discovery in Databases (KDD)
KDD helps discover new, previously unknown patterns in data, i.e., it extracts hidden patterns from massive datasets [3, 6]. Figure 1.1 gives an overview of Knowledge Discovery in Databases (KDD), which consists of the following phases:
Data cleaning: This step removes irrelevant, unwanted data; such records can simply be dropped. The collected data may also contain missing values, which must either be removed or imputed [7].
Figure 1.1 Knowledge discovery in databases (KDD).
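A minimal sketch of the cleaning step in plain Python, assuming records arrive as dictionaries. The function name `clean`, the field names `id` and `age`, and the decision to treat duplicate records and records without an identifier as the "unwanted" data are illustrative assumptions. Missing numeric values are imputed with the column mean, which is only one of several common imputation strategies.

```python
from statistics import mean


def clean(records, required="id", numeric=("age",)):
    """Drop unwanted records, then impute missing numeric values with the column mean.

    Assumption (hypothetical): "unwanted" means exact duplicates or records
    that lack the required identifier field.
    """
    seen = set()
    kept = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint for duplicate detection
        if rec.get(required) is None or key in seen:
            continue  # remove records with no identifier and repeated records
        seen.add(key)
        kept.append(dict(rec))  # copy so the caller's data is not mutated
    # Impute each numeric column with the mean of its observed (non-missing) values.
    for col in numeric:
        observed = [r[col] for r in kept if r.get(col) is not None]
        fill = mean(observed) if observed else None
        for r in kept:
            if r.get(col) is None:
                r[col] = fill
    return kept
```

For example, given two identical records for `id` 1, one record with a missing `age`, and one record with no `id`, the duplicate and the identifier-less record are dropped, and the missing age is filled with the mean of the remaining observed ages.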
Data integration: Data is collected from heterogeneous sources and integrated into a common store such as a data warehouse (DW). The widely used Extract-Transform-Load (ETL) technique is well suited to this task. Integrating data from multiple sources requires proper synchronization between the systems [2].
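The ETL idea can be sketched with Python's standard library. Two hypothetical sources, a CSV export and a JSON feed that use different field names, are extracted, mapped onto one common schema, and loaded into an in-memory SQLite table that stands in for the data warehouse. The function `etl`, the table `customers`, and all field names are assumptions for illustration. `INSERT OR REPLACE` is one simple conflict policy (the last source loaded wins). It is not a complete answer to keeping source systems synchronized.

```python
import csv
import io
import json
import sqlite3


def etl(csv_text, json_text):
    """Hypothetical ETL: merge a CSV source and a JSON source into one SQLite table."""
    # Extract: read both heterogeneous sources.
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
    json_rows = json.loads(json_text)

    # Transform: map both sources onto a common schema (customer_id, name).
    unified = [(int(r["cust_id"]), r["cust_name"].strip().title()) for r in csv_rows]
    unified += [(int(r["id"]), r["fullName"].strip().title()) for r in json_rows]

    # Load: write into a single target table that stands in for the warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    # On a key conflict, the row loaded later replaces the earlier one.
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?)", unified)
    conn.commit()
    return conn
```

After loading, a query such as `SELECT customer_id, name FROM customers` sees both sources through the same column names, which is the point of integrating into a common store.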
Data selection & transformation: Once the data relevant to the analysis has been selected, the next task is data transformation. As the name suggests, this step converts the selected data into the form required by the mining procedure [8, 9].
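As one illustration, selection can mean keeping only the attributes a mining task needs, and transformation can mean min-max normalization of those attributes to the range [0, 1]. This is a sketch under that assumption. The helper `select_and_transform` and the attribute names are hypothetical, and other mining procedures may call for different transformations.

```python
def select_and_transform(records, attributes):
    """Keep only the chosen attributes, then min-max normalize each to [0, 1]."""
    # Selection: project every record onto the relevant attributes.
    selected = [{a: r[a] for a in attributes} for r in records]
    # Transformation: rescale each attribute using its observed min and max.
    for a in attributes:
        values = [r[a] for r in selected]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in selected:
            # A constant column carries no information, so map it to 0.0.
            r[a] = (r[a] - lo) / span if span else 0.0
    return selected
```

Dropping an identifier such as `id` before normalization is deliberate here: an identifier is usually not a meaningful input to the mining algorithm.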
Pattern evaluation: The discovered patterns are evaluated using specific measures. Once these measures have been applied, the retrieved results are rigorously compared against the stored patterns [9–11].
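The text does not name specific measures. Assuming the mined patterns are association rules, support and confidence are two widely used measures, and a rule is kept only if it meets both thresholds. The function names `support` and `evaluate_rules`, as well as the thresholds 0.5 and 0.7, are illustrative choices and are not taken from the source.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def evaluate_rules(rules, transactions, min_support=0.5, min_confidence=0.7):
    """Keep rules (antecedent, consequent) that meet both thresholds.

    Returns (antecedent, consequent, support, confidence) tuples.
    """
    kept = []
    for antecedent, consequent in rules:
        both = support(set(antecedent) | set(consequent), transactions)
        ante = support(antecedent, transactions)
        confidence = both / ante if ante else 0.0  # estimate of P(consequent | antecedent)
        if both >= min_support and confidence >= min_confidence:
            kept.append((antecedent, consequent, both, confidence))
    return kept
```

With the transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}` and `{milk}`, the rule `{bread} -> {milk}` has support 0.5 but confidence of only about 0.67, so it is rejected. The rule `{butter} -> {bread}` has support 0.5 and confidence 1.0, so it passes both thresholds.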
Knowledge representation: The processed results are presented in the required formats, such as tables and reports. In other words, knowledge representation expresses what was discovered as rules, which can then be shown with an appropriate visualization [10].
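To illustrate the "tables and reports" form, the sketch below renders evaluated rules as a plain-text table. The rules are given as hypothetical `(antecedent, consequent, support, confidence)` tuples. The function `rules_report` and its column layout are assumptions. A real system might emit HTML, spreadsheets, or charts instead.

```python
def rules_report(rules):
    """Render (antecedent, consequent, support, confidence) tuples as a text table."""
    header = f"{'Rule':<25}{'Support':>9}{'Confidence':>12}"
    lines = [header, "-" * len(header)]
    for antecedent, consequent, sup, conf in rules:
        # Write each rule as "{a, b} -> {c}".
        rule = "{" + ", ".join(antecedent) + "} -> {" + ", ".join(consequent) + "}"
        lines.append(f"{rule:<25}{sup:>9.2f}{conf:>12.2f}")
    return "\n".join(lines)


print(rules_report([(("butter",), ("bread",), 0.5, 1.0)]))
```

Keeping the measures next to each rule lets a reader of the report judge how strong each discovered pattern is, rather than only seeing the rule itself.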