Artificial Intelligence for Marketing, by Jim Sterne
CHAPTER 1
Welcome to the Future
MACHINE LEARNING'S BIGGEST ROADBLOCK
That would be data. Even before machine learning was applied to marketing, the glory of big data was that you could sort, sift, slice, and dice through more data than had previously been computationally possible.
Massive numbers of website interactions, social engagements, and mobile phone swipes could be sucked into an enormous database in the cloud, and millions of small computers, so much better, faster, and cheaper than the Big Iron of the good old mainframe days, could process the heck out of it all. The problem then – and the problem now – is that these data sets do not play well together.
The best and the brightest data scientists and analysts are still spending an enormous and unproductive amount of time performing janitorial work. They are ensuring that new data streams are properly vetted, that legacy data streams continue to flow reliably, that the data that comes in is formatted correctly, and that the data is appropriately groomed so that all the bits line up.
■ Data set A starts each week on Monday rather than Sunday.
■ Data set B drops leading zeros from numeric fields.
■ Data set C uses dashes instead of parentheses in phone numbers.
■ Data set D stores dates European style (day, month, year).
■ Data set E has no field for a middle initial.
■ Data set F stores transaction numbers but not customer IDs.
■ Data set G does not include in‐page actions, only clicks.
■ Data set H stores a smartphone's IMEI or MEID number rather than its phone number.
■ Data set I is missing a significant number of values.
■ Data set J uses a different scale of measurement.
■ Data set K, and so on.
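To make a few of these mismatches concrete, here is a minimal Python sketch of the kind of grooming involved. It covers only data sets A through D. The function names, the sample record, the five-digit field width, and the slash-separated date format are all assumptions made for illustration; real feeds will differ.

```python
import re
from datetime import date, datetime, timedelta


def sunday_week_start(d: date) -> date:
    """Return the Sunday that begins the week containing d (data set A)."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def restore_leading_zeros(value: str, width: int) -> str:
    """Pad a numeric string back to a fixed width (data set B).

    The width is an assumption; it must come from the field's definition.
    """
    return value.strip().zfill(width)


def normalize_phone(raw: str) -> str:
    """Keep digits only, so dashes and parentheses stop mattering (data set C)."""
    return re.sub(r"\D", "", raw)


def parse_european_date(raw: str) -> date:
    """Parse a day/month/year string (data set D).

    Assumes the hypothetical format DD/MM/YYYY with slashes.
    """
    return datetime.strptime(raw, "%d/%m/%Y").date()


# A hypothetical incoming record, invented for this example.
record = {"postal_code": "2134", "phone": "(555) 123-4567", "signup": "03/04/2017"}

signup = parse_european_date(record["signup"])
clean = {
    "postal_code": restore_leading_zeros(record["postal_code"], 5),
    "phone": normalize_phone(record["phone"]),
    "signup": signup,
    "week_start": sunday_week_start(signup),
}
```

Each rule is trivial on its own. The effort lies in discovering which rule applies to which field in which data set, and in noticing when a source quietly changes its conventions.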
It's easy to see how much work goes into data cleansing and normalization. This seems to be a natural challenge for a machine learning application.