Profit Driven Business Analytics - Baesens Bart
CHAPTER 1
A Value-Centric Perspective Towards Analytics
ANALYTICS TEAM
Profiles
The analytics process is essentially a multidisciplinary exercise where many different job profiles need to collaborate. First of all, there is the database or data warehouse administrator (DBA). The DBA ideally is aware of all the data available within the firm, the storage details and the data definitions. Hence, the DBA plays a crucial role in feeding the analytical modeling exercise with its key ingredient, which is data. Since analytics is an iterative exercise, the DBA may continue to play an important role as the modeling exercise proceeds.
Another very important profile is the business expert. This could, for instance, be a credit portfolio manager, brand manager, fraud investigator, or e-commerce manager. The business expert has extensive business experience and business common sense, which usually proves very valuable and crucial for success. It is precisely this knowledge that will help to steer the analytical modeling exercise and interpret its key findings. A key challenge here is that much of the expert knowledge is tacit and may be hard to elicit at the start of the modeling exercise.
Legal experts are gaining in importance since not all data can be used in an analytical model because of factors such as privacy and discrimination. For instance, in credit risk modeling, one typically cannot discriminate between good and bad customers based on gender, beliefs, ethnic origin, or religion. In Web analytics, information is typically gathered by means of cookies, which are files that are stored on the user's browsing computer. However, when gathering information using cookies, users should be appropriately informed. This is subject to regulation at various levels (regional, national, and supranational, e.g., at the European level). A key challenge here is that privacy and other regulatory issues vary widely depending on the geographical region. Hence, the legal expert should have good knowledge about which data can be used when, and which regulation applies in which location.
The software tool vendors should also be mentioned as an important part of the analytics team. Different types of tool vendors can be distinguished here. Some vendors only provide tools to automate specific steps of the analytical modeling process (e.g., data preprocessing). Others sell software that covers the entire analytical modeling process. Some vendors also provide analytics-based solutions for specific application areas, such as risk management, marketing analytics, or campaign management.
The data scientist, modeler, or analyst is the person responsible for doing the actual analytics. The data scientist should possess a thorough understanding of all big data and analytical techniques involved and know how to implement them in a business setting using the appropriate technology. In the next section, we discuss the ideal profile of a data scientist.
Data Scientists
Whereas in a previous section we discussed the characteristics of a good analytical model, in this section we elaborate on the key characteristics of a good data scientist from the perspective of the hiring manager. The discussion is based on our consulting and research experience, having collaborated with many companies worldwide on the topic of big data and analytics.
A Data Scientist Should Have Solid Quantitative Skills
Obviously, a data scientist should have a thorough background in statistics, machine learning, and/or data mining. The distinction between these disciplines is becoming more and more blurred and is actually no longer that relevant. They all provide a set of quantitative techniques to analyze data and find business-relevant patterns within a particular context such as fraud detection or credit risk management. A data scientist should be aware of which technique can be applied, when, and how, and should not focus too much on the underlying mathematical (e.g., optimization) details but, rather, have a good understanding of what analytical problem a technique solves and how its results should be interpreted. In this context, the education of engineers in computer science and/or business/industrial engineering should aim at an integrated, multidisciplinary view, with graduates trained both in the use of the techniques and in the business acumen necessary to bring new endeavors to fruition. It is also important to spend enough time validating the analytical results obtained so as to avoid situations often referred to as data massage and/or data torture, whereby data are (intentionally) misrepresented and/or too much time is spent discussing spurious correlations. When selecting the optimal quantitative technique, the data scientist should consider the specificities of the context and the business problem at hand. Key requirements for analytical models have been discussed in the previous section, and the data scientist should have a basic understanding of, and intuition for, all of those. Based on a combination of these requirements, the data scientist should be capable of selecting the best analytical technique to solve the particular business problem.
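The risk of spurious correlations mentioned above can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the function names `pearson` and `strongest_noise_correlation`, the sample size, and the feature counts are hypothetical choices made for illustration and do not come from the text. It generates features that are pure noise and reports the strongest correlation with an equally random target, which tends to grow as more candidate features are screened.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def strongest_noise_correlation(n_rows, n_features, seed=0):
    """Largest absolute correlation between a random target and random features.

    Every feature is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)  # fixed seed keeps the illustration repeatable
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


# Hypothetical settings: 30 observations, 5 versus 500 candidate features.
few = strongest_noise_correlation(n_rows=30, n_features=5)
many = strongest_noise_correlation(n_rows=30, n_features=500)
print(f"strongest correlation among   5 noise features: {few:.2f}")
print(f"strongest correlation among 500 noise features: {many:.2f}")
```

Because both calls share the same seed, the 500-feature run includes the 5 features of the smaller run, so its maximum can never be lower. A pattern that looks strong after screening many candidates should therefore be validated on separate data before it is discussed as a finding.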
A Data Scientist Should Be a Good Programmer
By definition, data scientists work with data. This involves plenty of activities such as sampling and preprocessing of data, model estimation, and post-processing (e.g., sensitivity analysis, model deployment, backtesting, model validation). Although many user-friendly software tools are on the market nowadays to automate and support these tasks, every analytical exercise requires tailored steps to tackle the specificities of a particular business problem and setting. Performing these steps successfully requires programming. Hence, a good data scientist should possess sound programming skills in, for example, SAS, R, or Python, among others. The programming language itself is not that important, as long as the data scientist is familiar with the basic concepts of programming and knows how to use these to automate repetitive tasks or perform specific routines.
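As an illustration of automating a repetitive preprocessing task, consider the following minimal Python sketch. It is not taken from the text: the helper names `impute_median` and `preprocess`, the column names, and the toy customer records are all hypothetical, and median imputation is just one of many possible choices for handling missing values.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of the rows with missing (None) values in `column`
    replaced by the median of the observed values in that column."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


def preprocess(rows, numeric_columns):
    """Apply the same imputation routine to every listed column."""
    for column in numeric_columns:
        rows = impute_median(rows, column)
    return rows


# Hypothetical toy data with one missing age and one missing income.
customers = [
    {"age": 34, "income": 2500.0},
    {"age": None, "income": 4100.0},
    {"age": 51, "income": None},
]

cleaned = preprocess(customers, ["age", "income"])
for row in cleaned:
    print(row)
```

Writing the routine once as a function and looping over the columns means the same step can be reused on new data sets, which is the kind of automation the paragraph above has in mind.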
A Data Scientist Should Excel in Communication and Visualization Skills
Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization facilities are key! Hence, a data scientist should know how to represent analytical models and their accompanying statistics and reports in user-friendly ways by using, for example, traffic light approaches, OLAP (online analytical processing) facilities, or if-then business rules, among others. A data scientist should be capable of communicating the right amount of information without getting lost in complex (e.g., statistical) details, which would inhibit a model's successful deployment. By doing so, business users will better understand the characteristics and behavior in their (big) data, which will improve their attitude toward and acceptance of the resulting analytical models. Educational institutions must learn to balance theory and practice, since it is known that many academic degrees mold students who are skewed toward either too much analytical or too much practical knowledge.
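A traffic light approach combined with if-then rules can be sketched in a few lines of Python. The example below is only an assumption about what such a representation might look like: the function name `traffic_light`, the interpretation of the score as a predicted probability, and the cut-offs of 0.10 and 0.30 are hypothetical and would in practice be agreed with the business users.

```python
def traffic_light(probability, amber_from=0.10, red_from=0.30):
    """Translate a predicted probability into a traffic light color.

    The thresholds are illustrative placeholders, not recommended values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability >= red_from:
        return "red"      # high risk: refer to an expert
    if probability >= amber_from:
        return "amber"    # medium risk: review manually
    return "green"        # low risk: accept


# Hypothetical model scores for three customers.
scores = {"customer A": 0.05, "customer B": 0.10, "customer C": 0.45}
report = {name: traffic_light(score) for name, score in scores.items()}
for name, color in report.items():
    print(f"{name}: {color}")
```

The business user sees only three colors and the rule behind each of them, while the statistical details of the underlying model stay in the background.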
A Data Scientist Should Have a Solid Business Understanding
While this might seem obvious, we have witnessed (too) many data science projects that failed because the data scientist did not understand the business problem at hand. By business we refer to the respective application area. Several examples of such application areas have been introduced in Table 1.5. Each of those fields has its own particularities that are important for a data scientist to know and understand in order to be able to design and implement a customized solution. The more aligned the solution is with the environment, the better its performance will be, as evaluated according to each of the dimensions or criteria discussed in Table 1.7.
A Data Scientist Should Be Creative!
A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation, and cleaning. These steps of the standard analytics process have to be adapted to each particular application, and the right guess can often make a big difference. Second, big data and analytics is a fast-evolving field. New problems, technologies, and corresponding challenges pop up on an ongoing basis. Therefore, it is crucial that a data scientist keeps up with these new evolutions and technologies and has enough creativity to see how they can create new opportunities. Figure 1.2 summarizes the key characteristics and strengths constituting the ideal data scientist profile.
Figure 1.2 Profile of a data scientist.