Profit Driven Business Analytics - Baesens Bart
CHAPTER 1
A Value-Centric Perspective Towards Analytics
INTRODUCTION
In this first chapter, we set the scene for what is ahead by broadly introducing profit-driven business analytics. The value-centric perspective toward analytics proposed in this book will be positioned and contrasted with a traditional statistical perspective. The implications of adopting a value-centric perspective toward the use of analytics in business are significant: a mind shift is needed from both managers and data scientists in developing, implementing, and operating analytical models. This, however, calls for deep insight into the underlying principles of advanced analytical approaches. Providing such insight is our general objective in writing this book. More specifically:
◼ We aim to provide the reader with a structured overview of state-of-the-art analytics for business applications.
◼ We want to assist the reader in gaining a deeper practical understanding of the inner workings and underlying principles of these approaches from a practitioner's perspective.
◼ We wish to advance managerial thinking on the use of advanced analytics by offering insight into how these approaches may either generate significant added value or lower operational costs by increasing the efficiency of business processes.
◼ We seek to promote and facilitate the use of analytical approaches that are customized to the needs and requirements of a business context.
As such, we envision that our book will help organizations step up to the next level in the adoption of analytics for decision making by embracing the advanced methods introduced in the subsequent chapters. Doing so requires an investment in acquiring and developing knowledge and skills but, as is demonstrated throughout the book, also generates increased profits. An interesting feature of the approaches discussed in this book is that they have often been developed at the intersection of academia and business, by academics and practitioners joining forces to tune a multitude of approaches to the particular needs and problem characteristics encountered and shared across diverse business settings.
Most of these approaches emerged only after the millennium, which should not be surprising. Since the millennium, we have witnessed a continuous and accelerating development, and an expanding adoption, of information, network, and database technologies. Key technological evolutions include the massive growth and success of the World Wide Web and Internet services, the introduction of smart phones, the standardization of enterprise resource planning systems, and many other applications of information technology. This dramatic change of scene has spurred the development of analytics for business applications as a rapidly growing and thriving branch of science and industry.
To achieve the stated objectives, we have chosen to adopt a pragmatic approach in explaining techniques and concepts. We do not focus on providing extensive mathematical proofs or detailed algorithms. Instead, we pinpoint the crucial insights and underlying reasoning, as well as the advantages and disadvantages, related to the practical use of the discussed approaches in a business setting. For this, we ground our discourse in solid academic research expertise as well as in many years of practical experience in carrying out industrial analytics projects in close collaboration with data science professionals. Throughout the book, a plethora of illustrative examples and case studies are discussed. Example datasets, code, and implementations are provided on the book's companion website, www.profit-analytics.com, to further support the adoption of the discussed approaches.
In this chapter, we first introduce business analytics. Next, the profit-driven perspective toward business analytics that will be elaborated in this book is presented. We then introduce the subsequent chapters of this book and how the approaches introduced in these chapters allow us to adopt a value-centric approach for maximizing profitability and, as such, to increase the return on investment of big data and analytics. Next, the analytics process model is discussed, detailing the subsequent steps in elaborating an analytics project within an organization. Finally, the chapter concludes by characterizing the ideal profile of a business data scientist.
Business Analytics
"Data is the new oil" is a popular quote that pinpoints the increasing value of data and, in our view, accurately characterizes data as raw material. Data are to be seen as an input or basic resource that needs further processing before actually being of use. In a subsequent section of this chapter, we introduce the analytics process model, which describes the iterative chain of processing steps involved in turning data into information or decisions and is actually quite similar to an oil refinery process. Note the subtle but significant difference between the words "data" and "information" in the sentence above. Whereas data can fundamentally be defined as a sequence of zeroes and ones, information is essentially the same but additionally implies a certain utility or value to the end user or recipient. So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to become information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.
Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a multitude of indicators or statistics that can be distilled from raw data. The example below works out a number of sales indicators in a retail setting.
Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about. Typically, visualizations are also adopted to represent indicators and their evolution in time, in easy-to-interpret ways. Visualizations provide support by facilitating the user's ability to acquire understanding and insight in the blink of an eye. Personalized dashboards, for instance, are widely adopted in the industry and are very popular with managers to monitor and keep track of business performance. A formal definition of business intelligence is provided by Gartner (http://www.gartner.com/it-glossary):
Example
For managerial purposes, a retailer requires the development of real-time sales reports. Such a report may include a wide variety of indicators that summarize raw sales data. Raw sales data, in fact, concern transactional data that can be extracted from the online transaction processing (OLTP) system that is operated by the retailer. Some example indicators and the required selection and aggregation operations for calculating these statistics are:
◼ Total amount of revenues generated over the last 24 hours: Select all transactions over the last 24 hours and sum the paid amounts, with paid meaning the price net of promotional offers.
◼ Average paid amount in online store over the last seven days: Select all online transactions over the last seven days and calculate the average paid amount.
◼ Fraction of returning customers within one month: Select all transactions over the last month, identify the customer IDs that appear more than once, and divide their number by the number of distinct customer IDs in that month.
Note that calculating these indicators involves basic selection operations on characteristics or dimensions of transactions stored in the database, as well as basic aggregation operations such as sum, count, and average, among others.
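The three indicators of the example can be sketched in a few lines of Python to make the selection and aggregation steps concrete. This is only an illustrative sketch under stated assumptions: the transaction records, the field layout, the reference time NOW, and all function names are hypothetical and are not taken from the book or its companion website. "One month" is approximated as 30 days, and the fraction of returning customers is assumed to be the number of customer IDs that appear more than once divided by the number of distinct customer IDs in the window.

```python
from datetime import datetime, timedelta

# Hypothetical transactions: (customer_id, channel, paid_amount, timestamp).
# paid_amount is assumed to be the price net of promotional offers.
NOW = datetime(2024, 1, 31, 12, 0)
transactions = [
    ("C1", "online", 40.0, NOW - timedelta(hours=2)),
    ("C2", "store", 25.0, NOW - timedelta(hours=20)),
    ("C1", "online", 60.0, NOW - timedelta(days=3)),
    ("C3", "online", 20.0, NOW - timedelta(days=6)),
    ("C2", "store", 15.0, NOW - timedelta(days=12)),
    ("C4", "store", 30.0, NOW - timedelta(days=25)),
]


def since(rows, now, window):
    """Selection: keep the transactions whose timestamp lies in the window."""
    return [r for r in rows if now - window <= r[3] <= now]


def revenue_last_24h(rows, now):
    """Aggregation (sum) over the transactions of the last 24 hours."""
    return sum(r[2] for r in since(rows, now, timedelta(hours=24)))


def avg_online_paid_last_7d(rows, now):
    """Aggregation (average) over online transactions of the last seven days."""
    recent = since(rows, now, timedelta(days=7))
    amounts = [r[2] for r in recent if r[1] == "online"]
    return sum(amounts) / len(amounts) if amounts else 0.0


def returning_fraction_last_30d(rows, now):
    """Aggregation (count): share of customer IDs seen more than once."""
    counts = {}
    for r in since(rows, now, timedelta(days=30)):
        counts[r[0]] = counts.get(r[0], 0) + 1
    if not counts:
        return 0.0
    returning = sum(1 for n in counts.values() if n > 1)
    return returning / len(counts)


print(revenue_last_24h(transactions, NOW))             # 65.0
print(avg_online_paid_last_7d(transactions, NOW))      # 40.0
print(returning_fraction_last_30d(transactions, NOW))  # 0.5
```

In an operational setting such indicators would more likely be computed with queries against the transactional database. The plain-Python version is meant only to show that each indicator is a selection followed by an aggregation.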
Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.
Note that this definition explicitly mentions the required infrastructure and best practices as an essential component of BI, which is typically also provided as part of the package or solution offered by BI vendors and consultants. More advanced analysis of data may further support users and optimize decision making. This is exactly where analytics comes into play. Analytics is a catch-all term covering a wide variety of what are essentially data-processing techniques. In its broadest sense, analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) and machine learning. Analytics, to us, is a toolbox containing a variety of instruments and methodologies allowing users to analyze data for a diverse range of well-specified purposes. Table 1.1 identifies a number of categories of analytical tools that cover diverse intended uses or, in other words, allow users to complete a diverse range of tasks.
Table 1.1 Categories of Analytics from a Task-Oriented Perspective
A first main group of tasks identified in Table 1.1 concerns prediction. Based on observed variables, the aim is to accurately estimate or predict an unobserved value. The applicable subtype of predictive analytics depends on the type of target variable, which we intend to model as a function of a set of predictor variables. When the target variable is categorical in nature, meaning the variable can only take a limited number of possible values (e.g., churner or not, fraudster or not, defaulter or not), then we have a classification problem. When the task concerns the estimation of a continuous target variable (e.g., sales amount, customer lifetime value, credit loss), which can take any value over a certain range of possible values, we are dealing with regression. Survival analysis and forecasting explicitly account for the time dimension by either predicting the timing of events (e.g., churn, fraud, default) or the evolution of a target variable in time (e.g., churn rates, fraud rates, default rates). Table 1.2 provides simplified example datasets and analytical models for each type of predictive analytics for illustrative purposes.
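The distinction between classification and regression can be illustrated with a small sketch. The data, the variable names, and the two deliberately simple models (a single-threshold classifier and a one-variable least-squares line) are assumptions chosen for illustration; they are not the models or datasets of Table 1.2. Survival analysis and forecasting are not covered by this sketch.

```python
# Hypothetical observations: age (predictor), monthly spend (continuous
# target), and churn indicator (categorical target, 1 = churner).
ages = [20, 30, 40, 50, 60]
spend = [30.0, 50.0, 70.0, 90.0, 110.0]
churned = [1, 1, 0, 0, 0]


def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def fit_threshold(xs, labels):
    """Classification: pick the threshold t for which the rule
    'x < t -> class 1' makes the fewest errors on the training data.
    Assumes at least two distinct predictor values."""
    points = sorted(set(xs))
    candidates = [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]

    def errors(t):
        return sum((x < t) != bool(y) for x, y in zip(xs, labels))

    return min(candidates, key=errors)


a, b = fit_line(ages, spend)        # continuous target -> regression
t = fit_threshold(ages, churned)    # categorical target -> classification

predicted_spend = a + b * 45        # an estimated amount
predicted_churn = int(28 < t)       # a predicted class label (0 or 1)
print(predicted_spend, predicted_churn)
```

The regression model returns a value on a continuous scale, whereas the classifier returns one of a limited number of labels. Which model type applies is determined by the type of the target variable.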
Table 1.2 Example Datasets and Predictive Analytical Models
The second main group of analytics comprises descriptive analytics that, rather than predicting a target variable, aim at identifying specific types of patterns. Clustering or segmentation aims at grouping entities (e.g., customers, transactions, employees) that are similar in nature. The objective of association analysis is to find groups of events that frequently co-occur and therefore appear to be associated. The basic observations that are being analyzed in this problem setting consist of groups of events of variable size; for instance, transactions involving various products that are being bought by a customer at a certain moment in time. The aim of sequence analysis is similar to that of association analysis but concerns the detection of events that frequently occur sequentially, rather than simultaneously as in association analysis. As such, sequence analysis explicitly accounts for the time dimension. Table 1.3 provides simplified examples of datasets and analytical models for each type of descriptive analytics.
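As a rough illustration of association analysis, the sketch below counts how often pairs of products occur together in the same transaction and keeps the pairs whose relative frequency (support) reaches a minimum level. The baskets, the function name frequent_pairs, and the 0.4 threshold are hypothetical. The code is a brute-force pair count for illustration only, not an implementation of a full association rule mining algorithm, and it does not cover clustering or sequence analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of products bought
# together by one customer at one moment in time.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]


def frequent_pairs(baskets, min_support):
    """Return product pairs whose support (share of baskets that contain
    both products) is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


pairs = frequent_pairs(baskets, min_support=0.4)
for pair, support in sorted(pairs.items()):
    print(pair, support)
```

With these baskets, the pairs (beer, chips), (bread, butter), and (bread, milk) each appear in two of the five transactions and therefore reach a support of 0.4. Sequence analysis would additionally have to take the order of events in time into account.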
Table 1.3 Example Datasets and Descriptive Analytical Models
Note that Tables 1.1 through 1.3 identify and illustrate categories of approaches that are able to complete a specific task from a technical rather than an applied perspective. These different types of analytics can be applied in quite diverse business and nonbusiness settings and consequently lead to many specialized applications. For instance, predictive analytics and, more specifically, classification techniques may be applied for detecting fraudulent credit-card transactions, for predicting customer churn, for assessing loan applications, and so forth. From an application perspective, this leads to various groups of analytics such as, respectively, fraud analytics, customer or marketing analytics, and credit risk analytics. A wide range of business applications of analytics across industries and business departments is discussed in detail in Chapter 3.
With respect to Table 1.1, it needs to be noted that these different types of analytics apply to structured data. An example of a structured dataset is shown in Table 1.4. The rows in such a dataset are typically called observations, instances, records, or lines, and represent or collect information on basic entities such as customers, transactions, accounts, or citizens. The columns are typically referred to as (explanatory or predictor) variables, characteristics, attributes, predictors, inputs, dimensions, effects, or features. The columns contain information on a particular entity as represented by a row in the table. In Table 1.4, the second column represents the age of a customer, the third column the postal code, and so on. In this book we consistently use the terms observation and variable (and sometimes more specifically, explanatory, predictor, or target variable).
Table 1.4 Structured Dataset
Because of the structure that is present in the dataset in Table 1.4 and the well-defined meaning of rows and columns, it is much easier to analyze such a structured dataset than to analyze unstructured data such as text, video, or networks, to name a few. Specialized techniques exist that facilitate analysis of unstructured data, for instance text analytics with applications such as sentiment analysis, video analytics that can be applied for face recognition and incident detection, and network analytics with applications such as community mining and relational learning (see Chapter 2). Given the rough estimate that over 90% of all data are unstructured, clearly there is a large potential for these types of analytics to be applied in business.
However, because analyzing unstructured data is inherently complex, and because the often-significant development costs appear to pay off only in settings where these techniques add substantially to what the easier-to-apply structured analytics already deliver, we currently see relatively few applications in business being developed and implemented. In this book, we therefore focus on analytics for analyzing structured data, and more specifically the subset listed in Table 1.1. For unstructured analytics, one may refer to the specialized literature (Elder IV and Thomas 2012; Chakraborty, Murali, and Satish 2013; Coussement 2014; Verbeke, Martens and Baesens 2014; Baesens, Van Vlasselaer, and Verbeke 2015).