1.10.4 The Significant Challenges in Machine Learning

 Identifying a good hypothesis space

 Optimizing accuracy on unseen data

 Insufficient Training Data

Most Machine Learning algorithms need a great deal of data to work properly. Even for simple problems you typically need thousands of examples, and for complex problems such as image or speech recognition you may need millions of examples.

 Representation of Training Data

To generalize well to new cases, it is crucial that the training data be representative of the cases you want to generalize to. A model trained on a non-representative training set is unlikely to make accurate predictions, especially for cases that were under-represented in the sample (for example, very poor and very rich countries). This is often harder than it sounds: if the sample is too small, you will have sampling noise (non-representative data as a result of chance), but even very large samples can be non-representative if the sampling method is flawed. This is called sampling bias.
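
As a rough illustration (not code from this book), the sketch below shows one common way to keep a held-out set representative: stratified sampling. It assumes scikit-learn and pandas are available, and the "income_category" column is a hypothetical attribute chosen only for demonstration.

# A minimal sketch of stratified sampling to keep the test set
# representative of the full dataset (assumes scikit-learn and a
# hypothetical "income_category" column, used only for illustration).
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "income_category": ["low", "low", "mid", "mid", "mid", "high"] * 50,
    "feature": range(300),
})

# stratify= makes the split preserve the category proportions,
# reducing the risk of a non-representative (biased) sample.
train_set, test_set = train_test_split(
    data, test_size=0.2, stratify=data["income_category"], random_state=42
)
print(test_set["income_category"].value_counts(normalize=True))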

 Quality of Data

If the training data is full of errors, outliers, and noise, it will be harder for the system to detect the underlying patterns, so the system is less likely to perform well. It is often well worth the effort to spend time cleaning up the training data; in fact, most data scientists spend a significant part of their time doing just that. For example: if some instances are clearly outliers, it may help to simply discard them or to try to fix the errors manually. If some instances are missing a few features (e.g., 5% of your customers did not specify their age), you must decide whether to ignore this attribute altogether, drop these instances, fill in the missing values (e.g., with the median age), or train one model with the feature and one model without it.
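
As a hedged sketch (not this book's code), the following lines show two of the cleaning steps just described: discarding outliers and filling in missing ages with the median. It assumes pandas and scikit-learn; the column names and the outlier threshold are hypothetical.

# A minimal sketch of two common cleaning steps: dropping outliers
# and median-imputing a missing "age" column (illustrative data only).
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [25, 31, np.nan, 44, 38, np.nan, 29, 52],
    "income": [30000, 42000, 39000, 950000, 51000, 47000, 36000, 58000],
})

# Option 1: discard obvious outliers (here: incomes more than ten
# times the median income; the threshold is an assumption).
median_income = df["income"].median()
df = df[df["income"] <= 10 * median_income].copy()

# Option 2: fill in missing ages with the median age instead of
# dropping the attribute or the affected rows.
imputer = SimpleImputer(strategy="median")
df["age"] = imputer.fit_transform(df[["age"]]).ravel()
print(df)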

 Unimportant Features

The Machine Learning system will only be capable of learning if the training data contains enough relevant features and not too many irrelevant ones. This is why feature engineering has become essential for developing any type of model. The feature engineering process includes selecting the most useful features to train on among the existing features (feature selection), combining existing features to produce a more useful one (feature extraction; as we saw earlier, dimensionality reduction algorithms can help), and creating new features by gathering new data.
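
As a hedged sketch (assuming scikit-learn and its bundled iris dataset, neither of which is referenced in this book), the code below illustrates the first two steps: feature selection with a univariate test and feature extraction with PCA.

# A minimal sketch of feature selection and feature extraction
# on a toy dataset (all choices here are illustrative assumptions).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep the 2 features most related to the target.
X_selected = SelectKBest(score_func=f_classif, k=2).fit_transform(X, y)

# Feature extraction: combine the original features into 2 new
# components that capture most of the variance (PCA).
X_reduced = PCA(n_components=2).fit_transform(X)

print(X_selected.shape, X_reduced.shape)  # (150, 2) (150, 2)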

 Overfitting

Overfitting means that the model performs well on the training data but does not generalize well to new data. Overfitting happens when the model is too complex relative to the amount and the noisiness of the training data.

Potential solutions to the overfitting problem are:

 To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model) or by reducing the number of attributes in the training data

 To gather more training data

 To reduce the noise in the training data (e.g., by fixing data errors and removing outliers)

 Constraining a model to make it simpler and thus reduce the risk of overfitting is called regularization, as illustrated in the sketch below.
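
A minimal sketch of regularization, assuming scikit-learn (not referenced in this book) and illustrative parameter values: the same degree-15 polynomial model is fit once without regularization and once with Ridge (L2) regularization, which constrains the learned weights toward smaller values.

# A minimal sketch of regularization (assumes scikit-learn; the
# degree and alpha values are illustrative, not prescriptive).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(-3, 3, size=(30, 1)), axis=0)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=1.0, size=30)  # noisy quadratic

# A degree-15 polynomial model: complex enough to overfit 30 points.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
# The same model constrained by Ridge (L2) regularization.
regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0))

overfit.fit(X, y)
regularized.fit(X, y)

# The regularized model yields a smoother curve that is more likely
# to generalize to unseen inputs.
X_new = np.linspace(-3, 3, 100).reshape(-1, 1)
print(overfit.predict(X_new)[:3], regularized.predict(X_new)[:3])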
