Probability with R - Jane M. Horgan - Page 57

3.6 MACHINE LEARNING AND THE LINE OF BEST FIT

Machine learning is the science of getting computer systems to use algorithms and statistical models to study patterns and learn from data. Supervised learning is the machine learning task of using past data to learn a function in order to predict a future output.

The line of best fit is one of the many techniques that machine learning has borrowed from the field of Probability and Statistics to “train” the machine to make predictions. In this technique, known in statistics as the simple linear regression line, a set of $n$ pairs of data $(x_i, y_i)$, $i = 1, \ldots, n$, is obtained; $x$ is referred to as the independent variable, and $y$ is the dependent variable. The objective is to estimate $y$ from $x$. The line of best fit, $y = a + bx$, is obtained by choosing the intercept $a$ and slope $b$ so that the sum of the squared distances from the observed $y_i$ to the estimated $\hat{y}_i = a + b x_i$ is minimized. The algebraic details of the derivations of $a$ and $b$ are given in Appendix B.
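To make the least-squares idea concrete, here is a minimal Python sketch that computes an intercept and slope from paired data using the standard closed-form solution (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function names `fit_line` and `sum_squared_errors` and the sample data are illustrative assumptions, not taken from the book; the book's own derivation is the one referred to in Appendix B.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys).

    Uses the standard closed-form solution for simple linear regression.
    The name fit_line is hypothetical, chosen for this sketch.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances from observed y to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print("intercept:", a, "slope:", b)
print("sum of squared errors:", sum_squared_errors(xs, ys, a, b))
```

Any other choice of intercept or slope gives a larger sum of squared errors on the same data, which is what "best fit" means here.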

Often, the data for supervised learning are randomly divided into two parts, one for training and the other for testing. In machine learning, we derive the line of best fit, $y = a + bx$, from the training set.

The testing set is used to see how well the line actually fits. Usually, an 80:20 breakdown of the data is made: 80% is used for “training,” that is, to obtain the line, and the remaining 20% is used to decide whether the line really fits the data and to ascertain whether the model is appropriate for future predictions. The model is updated as new data become available.
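The train-and-test procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the book's own code: the data are synthetic (generated from a known line plus random noise), the helper names `fit_line`, `train_test_split`, and `mean_squared_error` are hypothetical, and mean squared error on the testing set is just one reasonable way to judge how well the line fits.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope (standard closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def train_test_split(pairs, train_fraction=0.8, seed=0):
    """Randomly divide pairs into a training part and a testing part."""
    rng = random.Random(seed)
    shuffled = list(pairs)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mean_squared_error(pairs, intercept, slope):
    """Average squared distance from observed y to the fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in pairs) / len(pairs)


# Synthetic data (an assumption for this sketch): y = 3 + 2x plus noise.
noise = random.Random(42)
data = [(float(x), 3.0 + 2.0 * x + noise.gauss(0, 1)) for x in range(50)]

# 80:20 breakdown: fit on the training set only.
train, test = train_test_split(data, train_fraction=0.8, seed=0)
a, b = fit_line([x for x, _ in train], [y for _, y in train])

# Compare the fit on data the line has seen with data it has not.
train_mse = mean_squared_error(train, a, b)
test_mse = mean_squared_error(test, a, b)
print("intercept:", a, "slope:", b)
print("training MSE:", train_mse, "testing MSE:", test_mse)
```

A testing error far larger than the training error would suggest that the line does not generalize to new data. When further data arrive, the same steps can be rerun to update the line.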
