Machine Learning with Dynamics 365 and Power Platform - Vinnie Bansal
Lifecycle of Machine Learning
Data-driven organizations face a range of challenges in taking ML models from prototype to production. To derive practical business value, data scientists and data engineers feed the model large amounts of data and train it with ML algorithms. To build a dependable ML system, businesses need to understand the ML lifecycle. Let's look at why the ML lifecycle is so important for businesses.
According to sas.com, 50 percent of models never make it to production, for the following reasons:
Insufficient data. Training a model on too little data increases its variance. In this context, variance shows up as the gap between the model's prediction accuracy on training data and its prediction accuracy on test data. If that gap is large, the model produces accurate results on training data but performs poorly as soon as test data is fed into it.
Nonrepresentative training data. This is training data that does not reflect the cases the model will encounter in its deployment environment. This problem is also called sampling bias. Make sure that the sample you feed to the model matches the environment in which it will be deployed.
Poor quality data. This refers to data that has missing observations, errors, outliers (values that deviate markedly from the other observations), and noise (spurious and unnecessary data).
Overfitting the data. This occurs when the model learns the detail and noise in the training data so well that it performs poorly on new data.
Underfitting the data. This situation occurs when you try to build an accurate model with too little data or with a model that is too simple. In either case, the model is unable to capture the underlying trend of the data.
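The gap between training accuracy and test accuracy described in the list above can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a method from the book: the helper names (`accuracy`, `accuracy_gap`, `make_data`), the 20 percent label noise, and the two toy models are all assumptions chosen for the example.

```python
import random

def accuracy(predict, data):
    """Fraction of (x, label) pairs that predict() labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)

def accuracy_gap(predict, train, test):
    """Training accuracy minus test accuracy."""
    return accuracy(predict, train) - accuracy(predict, test)

def make_data(n, rng, noise=0.2):
    """Synthetic 1-D data (assumed): label is 1 when x > 0.5,
    then flipped with probability `noise` to imitate noisy observations."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)          # fixed seed so runs are repeatable
train = make_data(30, rng)      # deliberately small training set
test = make_data(200, rng)

def memorizer(x):
    """Overfitting extreme: return the label of the closest training point,
    so every training example, noise included, is reproduced exactly."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Most common label in the training set (ties go to 1).
majority = int(sum(y for _, y in train) * 2 >= len(train))

def constant(x):
    """Underfitting extreme: ignore x and always predict the majority label."""
    return majority

for name, model in [("memorizer", memorizer), ("constant", constant)]:
    print(name,
          "train:", accuracy(model, train),
          "test:", accuracy(model, test),
          "gap:", accuracy_gap(model, train, test))
```

By construction, the memorizing model scores 1.0 on the training set, so any shortfall on the test set appears directly as the gap. The constant model ignores its input entirely, which is the underfitting extreme: it cannot capture the trend no matter which data set it is scored on.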
So, to build a model, it is crucial to have the right data, at the right time, in the right location. The ML lifecycle plays a key role in building custom ML algorithms to support learning models. Its main purpose is to create a model with a good workflow that can be reproduced, revisited, and deployed to production easily.
Now let's understand what the machine learning lifecycle is and how it works:
The machine learning lifecycle is an iterative process for building an efficient machine learning system, called a “model.”
The machine learning lifecycle consists of four phases: (1) data preparation, (2) machine learning model, (3) validation, and (4) deployment. This lifecycle is all about gaining deeper insights from data. It is leveraged by data engineers, data scientists, and those working with data to develop, train, validate, and serve machine learning models. Figure 1.3 depicts a typical ML lifecycle and its phases.
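The four phases can be pictured as four small functions chained together. The following is only a rough sketch under stated assumptions: the function names (`prepare_data`, `train_model`, `validate`, `deploy`), the toy threshold classifier, the synthetic data, and the 0.7 acceptance threshold are all hypothetical and are not taken from the book or from any Dynamics 365 or Power Platform API.

```python
import random

def prepare_data(raw, rng, test_fraction=0.25):
    """Phase 1, data preparation: drop rows with missing values,
    shuffle, and split into training and hold-out sets."""
    clean = [(x, y) for x, y in raw if x is not None and y is not None]
    rng.shuffle(clean)
    cut = int(len(clean) * (1 - test_fraction))
    return clean[:cut], clean[cut:]

def train_model(train):
    """Phase 2, model: fit a threshold halfway between the two class means.
    Assumes both classes are present and class 1 has the larger mean."""
    def mean(values):
        return sum(values) / len(values)
    m0 = mean([x for x, y in train if y == 0])
    m1 = mean([x for x, y in train if y == 1])
    threshold = (m0 + m1) / 2
    return lambda x: int(x > threshold)

def validate(model, holdout):
    """Phase 3, validation: accuracy on data the model has not seen."""
    return sum(model(x) == y for x, y in holdout) / len(holdout)

def deploy(model, score, min_accuracy):
    """Phase 4, deployment: release the model only if validation passed.
    Returning None signals that the cycle should be repeated."""
    return model if score >= min_accuracy else None

# Synthetic raw data (assumed): label is 1 when x >= 20,
# plus two rows with missing values that preparation should drop.
raw = [(float(i), int(i >= 20)) for i in range(40)] + [(None, 1), (3.0, None)]

rng = random.Random(42)
train, holdout = prepare_data(raw, rng)
model = train_model(train)
score = validate(model, holdout)
served = deploy(model, score, min_accuracy=0.7)

print("training rows:", len(train), "hold-out rows:", len(holdout))
print("validation accuracy:", score)
print("deployed:", served is not None)
```

When `deploy` returns `None`, the natural next step is to go back to an earlier phase (collect more data, clean it differently, or change the model) and run the chain again, which is what makes the lifecycle a loop and not a one-way pipeline.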
Let's jump into these phases one by one.