Читать книгу Artificial Intelligence and Quantum Computing for Advanced Wireless Networks - Savo G. Glisic - Страница 31

2.2.2 Decision Tree Classifiers

Оглавление

As indicated in Section 2.1, decision trees are considered to be one of the most popular approaches for representing classifiers. Here, we present a number of methods for constructing decision tree classifiers in a top‐down manner. This section suggests a unified algorithmic framework for presenting these algorithms and describes the various splitting criteria and pruning methodologies. The material is discussed along the lines presented in [8–11].

Supervised methods are methods that attempt to discover relationship between the input attributes and the target attribute. The relationship discovered is represented in a structure referred to as a model. Usually, models can be used for predicting the value of the target attribute knowing the values of the input attributes. It is useful to distinguish between two main supervised models: classification models (classifiers) and regression models. Regression models map the input space into a real‐valued domain, whereas classifiers map the input space into predefined classes. For instance, classifiers can be used to classify students into two groups: those passing exams on time and those passing exams with a delay. Many approaches are used to represent classifiers. The decision tree is probably the most widely used approach for this purpose.

Terminology: In a typical supervised learning, a training set of labeled examples is given, and the goal is to form a description that can be used to predict previously unseen examples. Most frequently, the training sets are described as a bag instance of a certain bag schema. The bag schema provides the description of the attributes and their domains. Formally, bag schema is denoted as R(Ay), where A denotes the set of n attributes A = {a1, … ai, … an}, and y represents the class variable or the target attribute.

Attributes can be nominal or numeric. When the attribute ai is nominal, it is useful to denote by its domain values, where ∣ dom(ai)∣ stands for its finite cardinality. In a similar way, dom(y) = {c1, …, c∣ dom(y)∣} represents the domain of the target attribute. Numeric attributes have infinite cardinalities. The set of all possible examples is called the instance space, which is defined as the Cartesian product of all the input attributes domains: X = dom(a1) × dom(a2) × … × dom(an). The universal instance space (or the labeled instance space) U is defined as the Cartesian product of all input attribute domains and the target attribute domain, that is, U = X × dom(y). The training set is a bag instance consisting of a set of m tuples (also known as records). Each tuple is described by a vector of attribute values in accordance with the definition of the bag schema. Formally, the training set is denoted as


Usually, it is assumed that the training set tuples are generated randomly and independently according to some fixed and unknown joint probability distribution D over U. Note that this is a generalization of the deterministic case when a supervisor classifies a tuple using a function y = f(x). Here‚ we use the common notation of bag algebra to present the projection (π) and selection (σ) of tuples.

The ML community, which is target audience of this book, has introduced the problem of concept learning. To learn a concept is to infer its general definition from a set of examples. This definition may be either explicitly formulated or left implicit, but either way it assigns each possible example to the concept or not. Thus, a concept can be formally regarded as a function from the set of all possible examples to the Boolean set {true, false}.

On the other hand, the data mining community prefers to deal with a straightforward extension of the concept learning, known as the classification problem. In this case, we search for a function that maps the set of all possible examples into a predefined set of class labels and is not limited to the Boolean set.

An inducer is an entity that obtains a training set and forms a classifier that represents the generalized relationship between the input attributes and the target attribute.

The notation I represents an inducer and I(S) represents a classifier that was induced by performing I on a training set S. Most frequently, the goal of the classifier’s inducers is formally defined as follows: Given a training set S with input attributes set A = {a1, a2, … an} and a target attribute y from a unknown fixed distribution D over the labeled instance space, the goal is to induce an optimal classifier with minimum generalization error.

Generalization error is defined as the misclassification rate over the distribution D. In case of the nominal attributes, it can be expressed as

(2.11)

where L(y, I(S)(x) is the loss function defined as L(y, I(S)(x)) = 0, if y = I(S)(x) and 1, if yI(S)(x). In the case of numeric attributes, the sum operator is replaced with the appropriate integral operator.

Tree representation: The initial examples of tree representation including the pertaining terminology are given in Figures 2.3 and 2.4.

Artificial Intelligence and Quantum Computing for Advanced Wireless Networks

Подняться наверх