1.5 Basic Concepts of Artificial Intelligence

AI in industrial applications is a discipline that aids data analysis: it can self‐learn from data, predict and classify data, and formulate KPIs. Prediction typically concerns machine failure or, in broader cases, the evolution of production malfunctions. AI contributes to the main areas of sustainable manufacturing, such as:

 supply chain management

 predictive maintenance

 quality control

 energy consumption optimization

Of particular industrial interest is the self‐classification of images, classifying and predicting production defects. AI algorithms such as deep learning (DL) can simultaneously process multiple variables with different “calculation weights.” AI models are specific to their application and therefore are not generic; the training dataset to be analyzed must be chosen carefully. A notable AI algorithm is the ANN. A neural network is composed of:

 Input layer (the level that receives information from outside and learns to recognize and process it).

 Hidden layer (a layer connecting the input level with the output level, helping the neural network learn the complex relationships in the data; there is often more than one hidden level, as in DL neural networks).

 Output layer (the final layer, presenting the results of what the algorithm has processed).

Each connection between neurons is associated with a weight determining the importance of the input value (input variable xi); the initial weights are set randomly. Each neuron, in turn, applies a mathematical activation function to its inputs. Figure 1.17a shows a basic neural network and Figure 1.17b shows a DL neural network. ANN training is based on the back‐propagation of the error, which tunes the weights according to the error rate obtained in the previous iterations; each iteration is called an epoch. Proper refinement of the weight tuning lowers the error rate, optimizing the model for the specific case study. Figure 1.18a sketches the principle of the back‐propagation feedback system, which enables self‐adjusting weights. Figure 1.18b shows a basic neural network implementing the mathematical function defining the node output (unit step functions, named activation functions).


Figure 1.17 (a) Simple ANN. (b) DL neural network.


Figure 1.18 (a) Feedback system minimizing calculation error in the training model. (b) Neural network model implementing unit step function.

The pseudocode of the ANN training process is as follows:

1. Train_ANN (fi, wi, oj)
2. Randomly initialize wi = {w1, w2, …, wn};
3. For epoch = 1 to N Do
4.   While (j ≤ m) Do
5.     Input oj from {o1, o2, …, om} into the input layer and forward propagate (fi · wi) through the layers until the predicted result ŷ is obtained;
6.     Compute the error e = y − ŷ;
7.     Back propagate e from the right to the left of the ANN through the layers;
8.     Update wi;
9.   End While
10. End For
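As an illustration, the following is a minimal NumPy sketch of this training loop under stated assumptions: a single hidden layer, sigmoid activations, and gradient updates on a squared error. The function name train_ann and the XOR toy data are illustrative choices, not the book's implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ann(X, y, hidden=4, epochs=5000, lr=0.5, seed=0):
    """Minimal one-hidden-layer ANN trained with back-propagation."""
    rng = np.random.default_rng(seed)
    # Random initial weights (step 2 of the pseudocode).
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 1))
    for _ in range(epochs):                      # one pass = one epoch
        # Forward propagation (step 5): input -> hidden -> predicted y.
        h = sigmoid(X @ W1)
        y_hat = sigmoid(h @ W2)
        # Error between actual and predicted output (step 6).
        e = y - y_hat
        # Back-propagation (step 7): error flows output -> hidden.
        d_out = e * y_hat * (1.0 - y_hat)        # sigmoid derivative
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        # Weight update (step 8).
        W2 += lr * h.T @ d_out
        W1 += lr * X.T @ d_hid
    return W1, W2

# Toy usage: learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train_ann(X, y)
print(np.round(sigmoid(sigmoid(X @ W1) @ W2), 2))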

The pseudocode highlights the two mechanisms at work in the ANN: the forward propagation that estimates the predicted output y (Figure 1.18b), and the back propagation of the error function (Figure 1.18a). The output is estimated by considering the summation of the input contributions and is defined as:

$y = f\left(\sum_{i=1}^{n} w_i x_i\right)$ (1.10)

where f is the activation function. Some examples of activation functions are plotted in Figure 1.19, where the analytical forms are:

(1.11)

(1.12)

(1.13)

(1.14)

(1.15)


Figure 1.19 Basic mathematical functions defining activation functions.
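The analytical forms of Eqs. (1.11)–(1.15) are not reproduced here; as an assumption, the basic functions of Figure 1.19 correspond to typical elementary choices such as the unit step, linear, sigmoid, tanh, and ReLU activations. A minimal NumPy sketch of these candidate functions:

import numpy as np

# Commonly used basic activation functions (assumed candidates for
# Eqs. (1.11)-(1.15); the book's exact analytical forms may differ).
unit_step = lambda z: np.where(z >= 0, 1.0, 0.0)   # binary/unit step
linear    = lambda z: z                            # identity
sigmoid   = lambda z: 1.0 / (1.0 + np.exp(-z))     # logistic, range (0, 1)
tanh      = np.tanh                                # range (-1, 1)
relu      = lambda z: np.maximum(0.0, z)           # rectified linear unit

z = np.linspace(-3, 3, 7)
for name, f in [("step", unit_step), ("linear", linear),
                ("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(f"{name:8s}", np.round(f(z), 2))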

Other mathematical activation functions are the following [68]:

(1.16)

(1.17)

(1.18)

(1.19)

(1.20)

(1.21)

(1.22)

(1.23)

(1.24)

(1.25)

(1.26)

(1.27)

(1.28)

(1.29)

(1.30)

(1.31)

The activation function is a basic research element of considerable importance: its correct choice defines the best implementation of the logic producing the outputs. The analytical model must therefore be appropriately weighted by the various variables and must be “calibrated” for the specific case study. Another important aspect is the ability of the activation function to self‐adapt [69] to the specific case study, providing a certain flexibility [70]. Of particular interest is the possibility of combining activation functions (activation ensemble [71]). A promising approach is therefore to define a flexible and modular activation function, as in the case of the adaptive spline activation function [72].
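As a hedged illustration of this adaptivity, the sketch below implements a PReLU-style activation whose negative-branch slope is trained together with the weights. The adaptive spline activation of [72] is more general; this simplified parametric form is only an assumption used for illustration.

import numpy as np

class ParametricReLU:
    """PReLU-style adaptive activation: the slope of the negative
    branch is a trainable parameter updated during learning."""
    def __init__(self, alpha=0.25):
        self.alpha = alpha                       # learnable slope for z < 0

    def forward(self, z):
        self.z = z
        return np.where(z >= 0, z, self.alpha * z)

    def backward(self, grad_out, lr=0.01):
        # Gradient w.r.t. alpha accumulates over the negative inputs.
        d_alpha = np.sum(grad_out * np.where(self.z < 0, self.z, 0.0))
        self.alpha -= lr * d_alpha               # self-adaptation step
        # Gradient passed back to the previous layer.
        return grad_out * np.where(self.z >= 0, 1.0, self.alpha)

act = ParametricReLU()
print(act.forward(np.array([-2.0, -0.5, 0.0, 1.5])), act.alpha)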

Concerning the training models, the full dataset of the neural network is divided into a training set, a validation set, and a test set (Figure 1.20). The training set is used to fit the model; the validation set is a small partition of the full dataset used to estimate the prediction error of the selected model during development; finally, the test set is used to assess the final model. A correct choice of the three partitions depends on the signal‐to‐noise ratio (SNR) of the full dataset.


Figure 1.20 Supervised artificial network model: partitioning of the available dataset into training set, validation set, and test set.
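As a minimal sketch, assuming illustrative 70/15/15 proportions (the book does not prescribe specific ratios), the three-way partition can be implemented as follows:

import numpy as np

def split_dataset(X, y, train=0.70, val=0.15, seed=0):
    """Shuffle and partition a dataset into training, validation,
    and test sets; the proportions are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(train * len(X))
    n_val = int(val * len(X))
    tr, va, te = np.split(idx, [n_tr, n_tr + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

X, y = np.arange(100).reshape(100, 1), np.arange(100)
(train_set, val_set, test_set) = split_dataset(X, y)
print(len(train_set[0]), len(val_set[0]), len(test_set[0]))  # 70 15 15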

The intelligent algorithms constituting the core of the Industry 5.0 system are classified in Figure 1.21. Generic data analysis algorithms, including statistical data processing and classification roles, fall under computer science and data mining; other high‐level algorithms are included among AI algorithms. The main function of the processing engine is to manage big data and data processing. As previously mentioned, a basic criterion for classifying algorithms is the supervision of learning: in a supervised learning model, the algorithm learns on a pre‐selected dataset whose attributes are labeled and filtered by the user; in an unsupervised model, all attributes are unlabeled and the algorithm tries to extract features and patterns without guidance. Supervised algorithms mainly support the user in solving a specific problem, such as finding a specific defect category or a specific system failure.


Figure 1.21 Algorithm classification and Industry 5.0 facilities.

A simple way to analyze a data trend is regression analysis, where a linear approach is often enough to model the relationship between dependent and independent variables (see the example plot in Figure 1.22a). Typically, linear regression provides information about a linear trend prediction. Classification is based on the concept of data categorization: data are classified by considering a generic classification pattern curve, as shown in Figure 1.22b, where data above and below the curve belong to different classes. Data classification is typically addressed with supervised learning models. Finally, data clustering groups records into clusters with similar features and is commonly addressed with unsupervised learning models. All of these analyses can be performed in a multidimensional domain that includes the time variable, which is fundamental for forecasting approaches.


Figure 1.22 (a) Regression analysis, (b) data classification, and (c) data clustering.
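A compact scikit-learn sketch of the three analyses of Figure 1.22 on synthetic data; the datasets and the model choices (LinearRegression, LogisticRegression, KMeans) are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# (a) Regression: linear trend between independent x and dependent y.
x = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * x.ravel() + 1.0 + rng.normal(0, 0.5, 50)
reg = LinearRegression().fit(x, y)
print("slope (expected ~2):", reg.coef_[0])

# (b) Classification: supervised separation of two labeled classes.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
labels = np.array([0] * 50 + [1] * 50)
clf = LogisticRegression().fit(X, labels)
print("training accuracy:", clf.score(X, labels))

# (c) Clustering: unsupervised grouping of the same points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))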

The ensemble approach is an alternative method for data classification. An ensemble is a set of classifiers that learn a target function; by combining the outputs of several classifiers, the risk of selecting a single poorly performing classifier is reduced. The typical ensemble procedure is given by the following pseudocode, where T denotes the original training dataset, k is the number of base classifiers, and B is the test data:

1. For i = 1 to k Do
2.   Create training set Ti from T.
3.   Build a base classifier Ci from Ti.
4. End For
5. For each test record x ∈ B Do
6.   C*(x) = Vote(C1(x), C2(x), …, Ck(x))
7. End For
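A minimal Python sketch of this procedure, assuming bootstrap resampling to create the training sets Ti, DT base learners, and majority voting for C*(x); labels are assumed to be non-negative integers:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def ensemble_vote(T_X, T_y, B_X, k=5, seed=0):
    """Build k base classifiers Ci on resampled training sets Ti,
    then classify each test record x by majority vote C*(x)."""
    rng = np.random.default_rng(seed)
    classifiers = []
    for _ in range(k):
        idx = rng.integers(0, len(T_X), len(T_X))   # bootstrap sample Ti
        Ci = DecisionTreeClassifier(random_state=0).fit(T_X[idx], T_y[idx])
        classifiers.append(Ci)
    votes = np.array([Ci.predict(B_X) for Ci in classifiers])  # k x |B|
    # Majority vote per test record (labels: non-negative integers).
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Toy usage on two trivially separable classes.
X = np.r_[np.zeros((20, 2)), np.ones((20, 2))]
y = np.array([0] * 20 + [1] * 20)
print(ensemble_vote(X, y, np.array([[0.1, 0.1], [0.9, 0.9]])))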

The individual predictions of classifiers are combined to classify new samples, thus optimizing the classifier performance on a specific domain. Figure 1.23 shows an example of an ensemble approach based on the best classification of a single feature.


Figure 1.23 Ensemble method and classification.

The RFo method [73] is a class of ensemble methods specifically designed for DT classifiers. Its main property is to combine the predictions of multiple DT models: each tree is generated from a part of the training dataset, and the values are grouped into a set of random vectors. The algorithm is structured as follows (Figure 1.24), with a minimal code sketch after the figure:

 F input features are randomly selected to split at each node (step 1 of creation of random vectors).

 A linear combination of the input features is created to split at each node (step 2 of using a random vector to build multiple DTs).

 A combination of DTs is created (step 3).


Figure 1.24 RFo method: random vectors, multiple DTs, and their combination.
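As referenced above, a scikit-learn sketch of the three steps; the synthetic dataset and parameter values are assumptions, with max_features playing the role of F:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a defect-feature dataset (an assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_features = F: the number of input features randomly considered
# when splitting each node of each DT (steps 1-2); n_estimators sets
# how many DTs are combined (step 3).
rfo = RandomForestClassifier(n_estimators=50, max_features=3,
                             random_state=0).fit(X, y)
print("training accuracy:", rfo.score(X, y))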

The RFo classification technique is also applied in image processing to detect defect features. The logic of the DT algorithm is given by the following pseudocode:

Decision_Tree Function
1. Compute the Gain values for all attributes and select the attribute with the highest value, creating a node for that attribute.
2. Make a branch from this node for every value of the attribute.
3. Assign all possible values of the attribute to the branches.
4. Follow each branch, partitioning the dataset to only the instances in which the branch's value (or a similar value) is present, then go back to step 1.
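A minimal sketch of the Gain computation in step 1, assuming entropy-based information gain (the book does not specify which gain measure is used):

import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(attribute, labels):
    """Gain of splitting `labels` on the values of `attribute`."""
    total = entropy(labels)
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        total -= (len(subset) / len(labels)) * entropy(subset)
    return total

# Step 1: pick the attribute with the highest Gain as the node.
outlook = np.array(["sun", "sun", "rain", "rain", "sun"])
windy   = np.array(["y", "n", "y", "n", "n"])
play    = np.array([0, 1, 0, 1, 1])
gains = {"outlook": information_gain(outlook, play),
         "windy": information_gain(windy, play)}
print(max(gains, key=gains.get), gains)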

A particular class of neural network algorithms is the long short‐term memory (LSTM) network, an artificial recurrent neural network (RNN) architecture [74] used for DL applications. The basic architecture is shown in Figure 1.25a: the structure comprises a cell (cell state), an input gate (input gate state at time step t), an output gate (output gate state at time step t), and a forget gate (forget gate state at time step t). The gates calculate their activations at time step t by taking into account the activation of the memory cell C at time step t − 1. Figure 1.25b shows the basic model of a network composed of LSTM nodes.


Figure 1.25 (a) LSTM unit cell. (b) LSTM network and its memory.
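A minimal NumPy sketch of one LSTM time step with the standard gate equations; the weight shapes and toy sizes are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: the gates read the input x_t and the previous
    hidden state, and the cell state C carries memory from t-1 to t."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f_t * c_prev + i_t * c_tilde     # cell state update (memory)
    h_t = o_t * np.tanh(c_t)               # new hidden state
    return h_t, c_t

# Toy usage: input size 3, hidden size 2.
rng = np.random.default_rng(0)
W = {g: rng.normal(size=(2, 3)) for g in "fioc"}
U = {g: rng.normal(size=(2, 2)) for g in "fioc"}
b = {g: np.zeros(2) for g in "fioc"}
h, c = lstm_step(rng.normal(size=3), np.zeros(2), np.zeros(2), W, U, b)
print(h, c)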

Another important parameter used for choosing the training dataset is the correlation coefficient, which indicates the relationship between two attributes. The correlation coefficient takes a value between −1 and 1 (−1 when the attributes are inversely correlated, +1 when they are perfectly correlated, and 0 when there is no correlation) and accounts for the magnitudes of the deviations from the mean value. In particular, the Pearson–Bravais correlation coefficient is estimated as the ratio between the covariance of the two variables and the product of their standard deviations, as follows [75]:

$r_{xy} = \dfrac{\mathrm{cov}(x,y)}{\sigma_x \sigma_y} = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$ (1.32)

The correlation coefficients are plotted in a correlation matrix with the structure shown in Table 1.9.

Table 1.9 Example of a correlation matrix for a model with five attributes (v1, v2, v3, v4, v5).

        v1     v2     v3     v4     v5
v1     1      0.4   −0.8    0.6   −0.3
v2     0.4    1      0.1    0.8   −0.5
v3    −0.8    0.1    1      0.2    0.1
v4     0.6    0.8    0.2    1     −0.3
v5    −0.3   −0.5    0.1   −0.3    1
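Such a correlation matrix can be computed directly; a minimal pandas sketch on synthetic data (the five attributes and their built-in correlations are illustrative):

import numpy as np
import pandas as pd

# Five illustrative attributes, some deliberately correlated.
rng = np.random.default_rng(0)
v1 = rng.normal(size=100)
df = pd.DataFrame({"v1": v1,
                   "v2": 0.5 * v1 + rng.normal(scale=0.8, size=100),
                   "v3": -0.8 * v1 + rng.normal(scale=0.5, size=100),
                   "v4": rng.normal(size=100),
                   "v5": rng.normal(size=100)})

# Pearson-Bravais coefficients, Eq. (1.32), for every attribute pair.
print(df.corr(method="pearson").round(2))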