Читать книгу Profit Driven Business Analytics - Baesens Bart - Страница 9
CHAPTER 1
A Value-Centric Perspective Towards Analytics
ANALYTICS PROCESS MODEL
ОглавлениеFigure 1.1 provides a high-level overview of the analytics process model (Hand, Mannila, and Smyth 2001; Tan, Steinbach, and Kumar 2005; Han and Kamber 2011; Baesens 2014). This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.
Figure 1.1 The analytics process model.
(Baesens 2014)
As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit-cards. Defining the perimeter of the analytical modeling exercise requires a close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Whereas this may seem self-evident, it appears to be a crucial success factor to make sure a common understanding of the goal and some key concepts is agreed on by all involved stakeholders.
Next, all source data that could be of potential interest need to be identified. This is a very important step as data are the key ingredient to any analytical exercise and the selection of data will have a deterministic impact on the analytical models that will be built in a subsequent step. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using for instance OLAP facilities for multidimensional analysis (e.g., roll-up, drill down, slicing and dicing). This will be followed by a data-cleaning step to get rid of all inconsistencies such as missing values, outliers and duplicate data. Additional transformations may also be considered such as binning, alphanumeric to numeric coding, geographical aggregation, to name a few, as well as deriving additional characteristics that are typically called features from the raw data. A simple example concerns the derivation of the age from the birth date; yet more complex examples are provided in Chapter 3.
In the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist. In Table 1.1, an overview was provided of various tasks and types of analytics. Alternatively, one may consider the various types of analytics listed in Table 1.1 to be the basic building blocks or solution components that a data scientist employs to solve the problem at hand. In other words, the business problem needs to be reformulated in terms of the available tools enumerated in Table 1.1.
Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns (e.g., an association rule is found stating that spaghetti and spaghetti sauce are often purchased together) that may be detected by the analytical model are interesting as they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities. Before putting the resulting model or patterns into operation, an important evaluation step is to consider the actual returns or profits that will be generated, and to compare these to a relevant base scenario such as a do-nothing decision or a change-nothing decision. In the next section, an overview of various evaluation criteria is provided; these are discussed to validate analytical models.
Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and backtested on an ongoing basis.
It is important to note that the process model outlined in Figure 1.1 is iterative in nature in the sense that one may have to return to previous steps during the exercise. For instance, during the analytics step, a need for additional data may be identified that will necessitate additional data selection, cleaning, and transformation. The most time-consuming step typically is the data selection and preprocessing step, which usually takes around 80 % of the total efforts needed to build an analytical model.