PART I
THE TOP 12 FALLACIES ABOUT MIND+MACHINE
FALLACY #2
MORE DATA MEANS MORE INSIGHT
Companies complain that they have far more data than insight. In 2014, the International Data Corporation (IDC) predicted that the amount of data available to companies would increase tenfold by 2020, doubling every two years.4 In one conversation, a client compared this situation to a desert with the occasional oasis of insight.
“Marc, we are inundated with reports and tables, but who'll give me the ‘so what'? I don't have enough time in the day to study all the data, and my junior people don't have the experience to come up with interesting insights.”
The ratio seems to worsen daily: the amount of available data rises rapidly, while the level of insight remains constant or increases only slightly. The advent of the Internet of Things risks tilting the ratio even further, as more devices produce more data.
As Devex reports:
Stanley Wood, senior program officer for data, evidence and learning at the Bill & Melinda Gates Foundation, has said that while large sums have been invested to collect various types of data, many of the results these efforts yielded were nowhere to be seen. In a previous interview, Wood even told Devex that one of the biggest problems open data can help address is the billions of dollars wasted on data collection.5
FT.com writes:
According to Juerg Zeltner, CEO of UBS Wealth Management, a mass of information does not equal a wealth of knowledge. With global financial markets continuing to be volatile, the need for interpretation has never been greater.6
Before we proceed, let me introduce a simple but necessary concept for talking about data. It is quite surprising how confused the discussion of the term data still is, even among data scientists and vendors. In reality, there are four fundamentally different levels, depicted in Figure I.3:
Figure I.3 Pyramid of Use Cases (Levels 1–4)
● Level 1: Data – raw data and “cleansed” or “preprocessed” data
This could be a sequence of measurements sent from a temperature or vibration sensor in a packaging machine, a set of credit card transactions, or a few pictures from a surveillance camera. No meaning can be gleaned without further processing or analysis. You may know the term cleansing, but this just refers to readying data for further analysis (e.g., by changing some formats).
Returning to our restaurant analogy from the start of this section, raw data are like raw vegetables just delivered from the grocery store, but not really scrutinized by the chef. Data quality remains a very big issue for companies. In the 2014 Experian Data Quality survey, 75 percent of responding UK companies said that they had wasted 14 percent of their revenue due to bad data quality.7
● Level 2: Information – data already analyzed to some extent
Simple findings have already been derived. For example, the sensor data has been found to contain five unexpected outliers where vibrations exceed the limits in the technical specifications, or a market analysis has ranked a product's market share across various countries in a table or a pie chart (a short illustrative sketch of such a threshold check follows after this list). The key point is that we have some initial findings, but certainly no “so what.”
In the restaurant analogy, the chef might have cut and cooked the vegetables, but they haven't been arranged on the plate yet.
● Level 3: Insight – the “so what” that helps in making value-adding decisions
This is what the decision maker is looking for. In our restaurant analogy, the vegetables have now been served on the plate as part of the full meal, and the patron's brain has signaled this wonderful moment of visual and gustatory enjoyment.
There is clearly room for improvement, as shown in a BusinessIntelligence.com survey sponsored by Domo: only 10 percent of 302 CEOs and CXOs believed that their reports provided a solid foundation for decision making,8 and 85 percent of 600 executives interviewed by the Economist Intelligence Unit (EIU) said that the biggest hurdle in analytics was deriving actionable insights from the data.9
● Level 4: Knowledge – a group of Level 3 insights available to others across time and space
This is the essence of what analytics, and indeed research, aims for: insights have been made reusable over time by multiple people in multiple locations. The decision maker might still decide to ignore the knowledge (not everyone learns from history!), but the insights are available in a format that can be used by others. In the restaurant analogy, our guest was actually a reviewer for a major and popular food blog, magazine, or even the Michelin Guide. The reviewer's description informs others, sharing the experience and helping in decisions about the next evening out.
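To make the step from Level 1 to Level 2 a bit more concrete, here is a minimal illustrative sketch in Python. The vibration readings and the specification limit are made up for illustration; the point is simply that flagging the readings above the limit is the kind of simple finding that turns raw data into information.

```python
# Illustrative sketch only: turning Level 1 raw sensor readings into a
# Level 2 finding by flagging values above an assumed vibration limit.
VIBRATION_LIMIT_MM_S = 4.5  # assumed specification limit (made up for illustration)

# Level 1: raw data, e.g., vibration readings in mm/s from a packaging machine
raw_readings = [1.2, 0.9, 5.1, 1.1, 6.3, 0.8, 4.9, 1.0, 5.7, 1.3, 4.8]

# Level 2: a simple finding derived from the data - which readings are outliers?
outliers = [(i, v) for i, v in enumerate(raw_readings) if v > VIBRATION_LIMIT_MM_S]

print(f"{len(outliers)} readings exceed the {VIBRATION_LIMIT_MM_S} mm/s limit:")
for index, value in outliers:
    print(f"  reading #{index}: {value} mm/s")
```

Note that even this small piece of information still carries no “so what”: deciding whether the machine needs maintenance remains a Level 3 question.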
A core question that I am posing here is how these four levels of data relate to the concept of mind+machine: where does Mind have a unique role, and where can Machine assist? The short answer is that machines are essential at Level 1 and are becoming better at their role there. At Level 2, some success has been achieved with machines creating information out of data automatically. However, Levels 3 and 4 will continue to require the human mind for 99 percent of analytics use cases in the real world for quite some time.
It is interesting to see that companies are experiencing challenges across Levels 1 to 3, with a higher focus on Level 3. A 2013 survey sponsored by Infogroup and YesMail with more than 700 marketers showed that 38 percent were planning to improve data analysis, 31 percent data cleansing, and 28 percent data collection capabilities.10 The survey did not include questions pertinent to Level 4.
To illustrate the variation in data volumes for each level, we'll take the use case of the chef explaining the process of cooking a great dish in various ways: in a video, in an audio recording, and in a recipe book. Let's assume that all these media ultimately contain the same Level 4 knowledge: how to prepare the perfect example of this dish.
A video can easily have a data volume between 200 megabytes (1 MB = 1 million bytes = 8 million bits) and about 1 gigabyte (1 GB = 1 billion bytes = 8 billion bits), depending on the resolution. A one-hour audio book describing the same meal would be about 50 to 100 megabytes – roughly 4–10 times less data than the video – and the 10 pages of text required to describe the same process would be only about 0.1 megabytes – about 2,000 times less data than the video.
The actual Level 3 insights and the Level 4 knowledge consume only a very small amount of storage space (equal to or less than the text in the book), compared to the initial data volumes. If we take all the original video cuts that never made it into the final video, the Level 1 data volume might have even been 5 to 10 times bigger.
Therefore, the actual “from raw data to insight” compression factor could easily be 10,000 in this example. Please be aware that this compression factor is different from the more technical compression factor used to store pictures or data more efficiently (e.g., in a file format such as .jpeg or .mp3). This insight compression factor is probably always higher than the technical compression factor because we elevate basic data to higher-level abstract concepts the human brain can comprehend more easily.
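To make the arithmetic behind this compression factor explicit, here is a short back-of-the-envelope calculation in Python using the illustrative figures above (the 200 MB final video, a factor of 5 for unused raw footage, and roughly 0.1 MB of recipe text); the numbers are examples, not measurements.

```python
# Back-of-the-envelope calculation with the illustrative figures from the text.
MB = 1_000_000  # 1 megabyte = 1 million bytes

final_video_bytes = 200 * MB      # lower bound for the final cooking video
raw_footage_factor = 5            # unused cuts make the Level 1 volume 5-10x larger
level1_bytes = final_video_bytes * raw_footage_factor   # roughly 1 GB of raw footage
recipe_text_bytes = 0.1 * MB      # about 10 pages of text carrying the Level 3/4 content

insight_compression = level1_bytes / recipe_text_bytes
print(f"Raw-data-to-insight compression factor: ~{insight_compression:,.0f}x")
# With these assumptions the factor is ~10,000x, as mentioned in the text.
```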
The key point is that decision makers really want the compressed insight and the knowledge, not the raw data or even the information. The reality we see with our clients is exactly the opposite. Everyone seems to focus on creating Level 1 data pools or Level 2 reports and tables with the help of very powerful machines, but true insights are as rare as oases in the desert.
If you're not convinced, answer this question. Who in your organization is getting the right level of insight at the right time in the right delivery format to make the right decision?
Here is a funny yet sad real-life situation I encountered a few years ago. An individual in a prospective client's operations department had spent half of their working time for the previous seven years producing a list of records with various types of customer data. The total annual cost to the company, including overhead, was USD 40,000. When we spoke to the internal customer, we received confirmation that they had received the list every month since joining the company a few years previously. They also told us that they deleted it each time because they did not know its purpose. The analysis never made it even to Level 2 – it was in fact a total waste of resources.
Regarding delivery, I can share another story. A senior partner in a law firm had his team produce regular reports on the key accounts for business development – or, as they referred to it, client development. The team produced well-written, insightful 2–3 MB reports in MS Word for each account and sent them to the partner via email. However, he found this format inconvenient – he perceived scrolling through documents on his BlackBerry as a hassle and didn't even realize that his team summarized the key takeaways from each report in the body of the email itself.
In this case, the Level 3 insights actually existed but had zero impact: right level of insight, right timing, but wrong format for that decision maker. You can imagine what wasted resources it took to create these reports. This example also illustrates the need to change the delivery of insights from lengthy reports into a model where relevant events trigger insightful and short alerts to the end users, prompting them to take action.
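As a sketch of what such trigger-based delivery could look like, the following Python fragment uses hypothetical event types and account names (none of them from the law firm example): only events on an agreed trigger list generate a short alert containing the “so what” and a suggested action, and everything else stays out of the decision maker's inbox.

```python
# Hypothetical sketch of trigger-based alerting (names and events are invented).
from dataclasses import dataclass
from typing import Optional

@dataclass
class AccountEvent:
    account: str
    event_type: str   # e.g., "rfp_published", "leadership_change"
    summary: str      # the one-line "so what" for the decision maker

# Assumed list of events that actually warrant the end user's attention.
TRIGGER_EVENTS = {"rfp_published", "leadership_change", "litigation_filed"}

def build_alert(event: AccountEvent) -> Optional[str]:
    """Return a short alert for trigger events; return None for everything else."""
    if event.event_type not in TRIGGER_EVENTS:
        return None  # no alert - do not add to the report pile
    return f"[{event.account}] {event.summary} -> suggested action: call the client sponsor"

alert = build_alert(AccountEvent("Acme Corp", "rfp_published",
                                 "New RFP for multi-year advisory work published today"))
if alert:
    print(alert)
```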
These two examples show the need to understand the value chain of analytics in more detail. The value is created largely at the end, when the decision is made, while the effort and cost are spent mostly at the beginning of the analytics cycle, or the Ring of Knowledge (Figure I.4):
Figure I.4 The Ring of Knowledge
Step 1: Gather new data and existing knowledge (Level 1).
Step 2: Cleanse and structure data.
Step 3: Create information (Level 2).
Step 4: Create insights (Level 3).
Step 5: Deliver to the right end user in the right format, channel, and time.
Step 6: Decide and take action.
Step 7: Create knowledge (Level 4).
Step 8: Share knowledge.
If any step fails, the efforts of the earlier steps go to waste and no insight is generated. In our first example, step 3 never happened so steps 1 and 2 were a waste of time and resources; in the second, step 5 failed: the insight desert was not successfully navigated!
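The chain logic of the Ring of Knowledge can be illustrated with a small Python sketch; the step list mirrors Figure I.4, while the failure behavior is a simplified illustration rather than a model of any real workflow.

```python
# Simplified illustration of the Ring of Knowledge as a chain of steps.
RING_OF_KNOWLEDGE = [
    "Gather new data and existing knowledge (Level 1)",
    "Cleanse and structure data",
    "Create information (Level 2)",
    "Create insights (Level 3)",
    "Deliver in the right format, channel, and time",
    "Decide and take action",
    "Create knowledge (Level 4)",
    "Share knowledge",
]

def run_ring(failing_step=None):
    """Walk the ring; if one step fails, all earlier effort is wasted."""
    for number, step in enumerate(RING_OF_KNOWLEDGE, start=1):
        if number == failing_step:
            wasted = ", ".join(str(n) for n in range(1, number))
            print(f"Step {number} ({step}) failed -> effort of steps {wasted} is wasted")
            return
        print(f"Step {number} OK: {step}")
    print("Insight delivered, decision made, knowledge shared.")

# The law firm example: the insights existed, but delivery (step 5) failed.
run_ring(failing_step=5)
```

Running the sketch with failing_step=5 reproduces the law firm situation: four steps of solid work, zero impact.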
The insight desert is filled with treacherous valleys and sand traps that could block the road to the oasis at each stage:
● Steps 1 and 2: Functional or geographic silos lead to disparate data sets. Data structures and elements are defined inconsistently and carry varying time stamps. Imperfect, outdated copies of the original sources force tens, hundreds, or thousands of manual adjustments and introduce further errors into the data.
● Step 3: Too much information means that the truly interesting signals get lost. There is no proper hypothesis about what to analyze.
● Step 4: There is a lack of thinking and business understanding. Data scientists sometimes do not fully understand the end users' needs. Contextual information is lacking, making interpretation difficult or impossible. Prior knowledge is not applied, either because it does not exist or because it is not accessible in time and at a reasonable cost.
● Steps 5 and 6: Communication problems occur between the central data analytics teams and the actual end users. Distribution issues prevent the insights from being delivered – the so-called Last Mile problem. The packaging and delivery model is ineffective for the specific needs of the end user.
● Steps 7 and 8: There is a lack of accountability for creating and managing the knowledge. Central knowledge management systems contain a lot of obsolete and irrelevant content, and a lack of documentation leads to loss of knowledge (e.g., in cases of employee attrition).
Any of these failures means a very significant waste of resources. These issues keep business users from making the right decisions when they are needed. The problem is only exacerbated by the increasing abundance of computing power, data storage capacity, and huge new data sources.
What would a world of insight and knowledge look like? Our clients mention the following key ingredients:
● Less but more insightful analysis addressing 100 percent of the analytics use case
● Analytic outputs embedded in the normal workflow
● More targeted and trigger-based delivery highlighting the key issues rather than just regular reporting or pull analysis
● Short, relevant alerts to the end user rather than big reports deposited on some central system
● Lower infrastructure cost and overhead allocations to the end users
● No requirement to start a major information technology (IT) project to get even simple analyses
● Simple, pay-as-you-go models for analytics output rather than significant fixed costs
● Full knowledge management of analytics use cases, so that the world does not need to be reinvented each time there is a change
The case example “Innovation Scouting: Finding Suitable Innovations” shows very large amounts of data being condensed into a high-impact set of insights, and knowledge being extracted to ensure learning for the future.
4. Vernon Turner, John F. Gantz, David Reinsel, and Stephen Minton, “The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things,” IDC iView, 2014, www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm.
5. Mario Villamor, “Is #globaldev Optimism over Big Data Based More on Hype Than Value?,” Devex.com, 2015, https://www.devex.com/news/is-globaldev-optimism-over-big-data-based-more-on-hype-than-value-86705.
6. Jürg Zeltner, “A Mass of Information Does Not Equal a Wealth of Knowledge,” FT.com, January 2015, www.ft.com/cms/s/0/69b0154c-959a-11e4-a390-00144feabdc0.html#axzz491J3nqhQ.
7. Experian News, “New Experian Data Quality Research Shows Inaccurate Data Preventing Desired Customer Insight,” Experian, 2015, https://www.experianplc.com/media/news/2015/new-experian-data-quality-research-shows-inaccurate-data-preventing-desired-customer-insight.
8. Domo and BusinessIntelligence.com, “What Business Leaders Hate about Big Data,” Domo, 2013, https://web-assets.domo.com/blog/wp-content/uploads/2013/09/Data_Frustrations_Final2.pdf.
9. Aaron Kahlow, “Data Driven Marketing, Is 2014 the Year?,” Online Marketing Institute, January 2014, https://www.onlinemarketinginstitute.org/blog/tag/analytics.
10. Infogroup and YesMail, “Data-Rich and Insight-Poor,” Infogroup and YesMail, 2013, www.infogrouptargeting.com/lp/its/data-rich-insight-poor/index-desk.html.