Data Mining and Machine Learning Applications, by a group of authors
1.6 Data Mining Techniques
Decision trees: A decision tree is a tree-like structure that helps identify the possible outcomes or consequences of a series of decisions. It is commonly used in decision support systems, for both classification and prediction. Each internal node tests an attribute, and the leaf nodes represent the outcomes, as shown in Figure 1.4. Classification or prediction starts at the root node and follows one branch at each node until it reaches a leaf. One benefit is that making a prediction requires little computation [1–6].
If the ‘n’ nodes of the tree are arranged in an ordered, reasonably balanced way, a prediction follows only one path from the root to a leaf, so the desired option can be found quickly, in roughly log n tests rather than n.
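The root-to-leaf traversal described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` and `Leaf` classes, the loan-screening features, and the thresholds are all hypothetical and do not come from the text or from Figure 1.4.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    outcome: str  # the result stored at a leaf node


@dataclass
class Node:
    feature: str      # attribute tested at this node
    threshold: float  # go left if value <= threshold, otherwise right
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def predict(tree, sample):
    """Walk from the root to a leaf; return (outcome, number of tests made)."""
    node, tests = tree, 0
    while isinstance(node, Node):
        tests += 1
        if sample[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.outcome, tests


# Hypothetical hand-built tree (feature names and thresholds are invented).
tree = Node("income", 30000,
            Leaf("reject"),
            Node("debt_ratio", 0.4, Leaf("approve"), Leaf("review")))

print(predict(tree, {"income": 45000, "debt_ratio": 0.25}))  # ('approve', 2)
```

Only the nodes on one path are visited, which is why a prediction stays cheap even when the tree has many nodes.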
Figure 1.4 Decision Tree.
Genetic algorithms (GAs): GAs search for possible solutions to an optimization problem and progressively improve them. The solutions they find can be categorized as optimal or near-optimal. A GA may comprise ‘n’ rounds of computation, each building on the previous one, and is therefore known as an evolutionary approach to searching for the best solution. For NP-hard problems, GAs have been shown to find usable near-optimal solutions, although they do not guarantee an optimum. The concept borrows its terms from biology: chromosomes, genes, and populations. In the computation, these terms are used as follows:
Chromosome: one possible solution
Population: a set (or subset) of all possible solutions
Gene: one element of a chromosome
A GA typically involves the following steps:
Population initialization
Fitness function calculation
Crossover (combining parts of parent chromosomes to produce offspring, applied with a set probability)
Mutation (randomly altering genes to introduce new solutions)
Survivor selection (selecting the required and removing the unwanted)
Return the best solution.
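The steps above can be sketched as a small Python program. It is a minimal illustration, not a reference implementation: the toy objective (maximizing the number of 1-genes, often called OneMax), the constants, the choice of the fitter half as parents, and the names `run_ga`, `crossover`, and `mutate` are all assumptions made for the example.

```python
import random

GENES = 20             # genes per chromosome (assumed problem size)
POP_SIZE = 30          # chromosomes in the population
GENERATIONS = 40       # number of evolutionary rounds
CROSSOVER_RATE = 0.9   # probability that two parents are recombined
MUTATION_RATE = 0.02   # probability of flipping each gene


def fitness(chromosome):
    """Toy objective: the number of 1-genes in the chromosome."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point crossover, applied with probability CROSSOVER_RATE."""
    if rng.random() < CROSSOVER_RATE:
        point = rng.randrange(1, GENES)
        return a[:point] + b[point:]
    return a[:]


def mutate(chromosome, rng):
    """Flip each gene independently with probability MUTATION_RATE."""
    return [1 - g if rng.random() < MUTATION_RATE else g for g in chromosome]


def run_ga(seed=0):
    rng = random.Random(seed)
    # 1. Population initialization
    population = [[rng.randint(0, 1) for _ in range(GENES)]
                  for _ in range(POP_SIZE)]
    history = []  # best fitness at the start of each generation
    for _ in range(GENERATIONS):
        # 2. Fitness calculation, used to rank the population
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[:POP_SIZE // 2]  # assumption: fitter half breeds
        # 3 and 4. Crossover and mutation produce the offspring
        offspring = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(POP_SIZE)
        ]
        # 5. Survivor selection: keep the fittest of old and new chromosomes
        population = sorted(population + offspring,
                            key=fitness, reverse=True)[:POP_SIZE]
    # 6. Return the best solution found
    return population[0], history


best, history = run_ga()
print(fitness(best), best)
```

Because survivor selection here keeps the fittest chromosomes from both the old population and the offspring, the best fitness never decreases from one generation to the next. Other selection schemes trade that guarantee for more diversity.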
Nearest neighbor method: As its name suggests, the nearest neighbor method classifies or predicts new data based on its similarity to known data. The proximity between a new object and the given objects is calculated, and the objects closest to it (the ‘k’ nearest, or those within a set threshold) are selected to decide its class or value. An example is KNN, the ‘k’ nearest neighbor algorithm. One has to choose the value of ‘k’ so that a suitable number of objects is involved. With k = 1, predictions become unstable because a single noisy neighbor can decide the outcome; as ‘k’ increases, more objects take part in a majority vote, which generally gives more stable predictions, although a very large ‘k’ can blur the boundaries between classes. Such algorithms can be used in banking and financial systems, for example to assess the creditworthiness of users.
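The effect of ‘k’ can be seen in a small Python sketch of KNN classification. The applicant data, the labels, and the function name `knn_predict` are invented for illustration; the features are also left unscaled, which a real credit model would need to address.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Return the majority label among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical applicants: (income in thousands, debt ratio) -> credit label.
train = [
    ((20, 0.9), "bad"), ((25, 0.8), "bad"), ((30, 0.7), "bad"),
    ((60, 0.2), "good"), ((65, 0.3), "good"), ((70, 0.1), "good"),
    ((28, 0.75), "good"),  # a noisy, atypical point
]
query = (27, 0.8)

print(knn_predict(train, query, k=1))  # good (decided by the noisy point alone)
print(knn_predict(train, query, k=3))  # bad  (two of the three nearest are "bad")
print(knn_predict(train, query, k=7))  # good (every point votes; locality is lost)
```

In this toy data set, k = 1 follows the single noisy neighbor, k = 3 recovers the local majority, and k = 7 simply returns the overall majority class, so a moderate ‘k’ works best here.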