Profit Maximization Techniques for Operating Chemical Plants - Sandip K. Lahiri
2.6.2 Data Refinement is a Two‐Step Iteration
Once all relevant raw data of the chemical industry has been captured and stored, the next step is to translate the enormous amount of unrefined raw data into actionable business insights. This is a critical step, and it is where most of the challenges lie. The two‐step process consists of enrichment and extraction (Holger Hürtgen, 2018).
Step 1: Enriching data with additional information and/or domain knowledge. It is important to understand that a data engineer working alone cannot do this translation. A domain expert who runs the chemical company is absolutely necessary to enrich the data with additional domain knowledge, which is a somewhat more complex process. This essentially means that human expertise and domain knowledge are as important to making data useful as the power of analytics and algorithms.
The blend of data analytics capability with a domain expert's knowledge is still the optimal approach to data enrichment, although this may well change one day due to further developments in the AI space. Therefore, before we even start with machine learning, we need to involve human experts who use their expertise to explain their hypotheses.
The task of a data scientist (or sometimes a data engineer at this stage) is now to translate, i.e. codify, additional information and/or this domain knowledge into variables. Concretely, this means transforming existing data into new variables – often also called feature engineering.
Man + machine example: Predictive maintenance for large compressors in the chemical industry. Deep knowledge of the data science field is important in choosing the right machine learning algorithm and in fine-tuning models so that they best predict machine/component failure in compressors. At the same time, engineering domain knowledge specific to compressors makes a big difference when interpreting and identifying the root causes of failures. Sometimes data collected from other industries running similar types of compressors can further improve these models by suggesting additional drivers of failure that can be used in the predictive model to increase its predictive power. Domain knowledge can also help to interpret the results and derive concrete business‐based solutions to prevent failures in the future. Finally, business knowledge is critical in implementing the recommendations, so that processes can be appropriately aligned, e.g. training maintenance engineers to schedule predictive maintenance using the outputs of the predictive maintenance model in their daily work.
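As an illustration of the feature engineering described above, the following is a minimal Python sketch for the compressor case. It assumes hypothetical hourly sensor readings. The field names (`suction_bar`, `discharge_bar`, `vibration_mm_s`), the values, and the two derived variables are illustrative assumptions rather than anything prescribed by the book. They stand in for the kind of hypotheses a compressor engineer might ask a data scientist to codify.

```python
from statistics import mean

# Hypothetical hourly compressor readings; all field names and values are
# illustrative assumptions, not data from the book.
readings = [
    {"suction_bar": 2.0, "discharge_bar": 8.0, "vibration_mm_s": 2.1},
    {"suction_bar": 2.0, "discharge_bar": 8.2, "vibration_mm_s": 2.3},
    {"suction_bar": 1.9, "discharge_bar": 8.4, "vibration_mm_s": 2.9},
    {"suction_bar": 1.8, "discharge_bar": 8.6, "vibration_mm_s": 3.8},
]


def engineer_features(rows, window=2):
    """Derive new variables from raw sensor values.

    pressure_ratio: discharge pressure divided by suction pressure, a
        quantity a domain expert might propose as a driver of mechanical
        stress (an assumption made for this sketch).
    vibration_trend: current vibration minus the mean of the previous
        `window` readings, so a rising trend shows up as a positive value.
        The first row has no history and gets 0.0.
    """
    features = []
    for i, row in enumerate(rows):
        ratio = row["discharge_bar"] / row["suction_bar"]
        history = [r["vibration_mm_s"] for r in rows[max(0, i - window):i]]
        trend = row["vibration_mm_s"] - mean(history) if history else 0.0
        features.append({"pressure_ratio": ratio, "vibration_trend": trend})
    return features


features = engineer_features(readings)
for row in features:
    print(row)
```

The raw columns are left untouched. The derived variables simply make the expert's hypothesis ("a rising vibration trend at a high pressure ratio precedes failure") explicit, so that a learning algorithm can test it.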
Step 2: Extracting insights using the machine learning algorithm. The purpose of this step is to select and run the appropriate machine learning algorithm and to do the actual maths and number crunching. The objective is to find the patterns in the data and to select the features that matter. Though a sophisticated AI‐based algorithm may be capable of finding the features in the data on its own, involving domain experts during this process helps to generate insights and makes the resulting solutions easier to explain. Creating new features helps the machine to find patterns more easily and also helps humans to describe and act on these patterns.
The purpose of this step is to use different machine learning algorithms to identify these patterns. Typically, one can distinguish among descriptive analytics (what happened in the past and why?), predictive analytics (what will happen in the future?), and prescriptive analytics (how can we change the future?). For each of these, both simple and quite sophisticated methods can be used. More and more advanced techniques in AI and machine learning are being used due to the increased amount of available data and computing power. Figure 2.6 depicts how data science becomes an iterative process that leverages both human domain expertise and advanced AI‐based machine learning techniques.
Figure 2.6 Data science is an iterative process that leverages both human domain expertise and advanced AI‐based machine learning techniques
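Step 2 can be sketched in a deliberately simple form. The Python example below ranks candidate features by their absolute Pearson correlation with a failure label (a basic filter-style feature selection) and then fits a one-feature threshold rule (a decision stump) on the top-ranked feature. Both techniques are stand-ins chosen for brevity, not the method used in the book. The sample values, the `failed` label, and the `ambient_temp_c` feature are hypothetical.

```python
from math import sqrt

# Hypothetical engineered features with failure labels (1 = failure
# observed); values are illustrative assumptions only.
samples = [
    {"pressure_ratio": 4.0, "vibration_trend": 0.0, "ambient_temp_c": 30.0, "failed": 0},
    {"pressure_ratio": 4.1, "vibration_trend": 0.2, "ambient_temp_c": 25.0, "failed": 0},
    {"pressure_ratio": 4.4, "vibration_trend": 0.7, "ambient_temp_c": 31.0, "failed": 1},
    {"pressure_ratio": 4.8, "vibration_trend": 1.2, "ambient_temp_c": 26.0, "failed": 1},
    {"pressure_ratio": 4.0, "vibration_trend": 0.1, "ambient_temp_c": 27.0, "failed": 0},
    {"pressure_ratio": 4.6, "vibration_trend": 0.9, "ambient_temp_c": 29.0, "failed": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(rows, target="failed"):
    """Sort features by absolute correlation with the target, strongest first."""
    labels = [r[target] for r in rows]
    names = [k for k in rows[0] if k != target]
    scores = {n: abs(pearson([r[n] for r in rows], labels)) for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def fit_threshold(rows, feature, target="failed"):
    """Pick the cut-off on `feature` that misclassifies the fewest rows."""
    values = sorted({r[feature] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum((r[feature] > t) != bool(r[target]) for r in rows)

    return min(candidates, key=errors)


ranking = rank_features(samples)
best_feature = ranking[0][0]
threshold = fit_threshold(samples, best_feature)
print(ranking)
print(f"predict failure when {best_feature} > {threshold:.2f}")
```

In the iterative loop of Figure 2.6, a ranking like this would go back to the domain expert, who might explain why a weakly correlated variable can be dropped or suggest a new feature to engineer before the next pass.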