Smarter Data Science - Cole Stryker
Quae Quaestio (Question Everything)
Different users might phrase comparable questions using different terminology, and even the same user from query to query might introduce nuances and various idiosyncrasies. Users are not always succinct or clear about their objectives or informational needs. Users may not necessarily know what to request.
Consequently, in business, there is a need to question everything to gain understanding. Although it might seem that to “question everything” stymies progress in an endless loop (Figure 2-5), ironically, to “question everything” opens up all possibilities for exploration, and this is where the aforementioned trust matrix can help guide the development of a line of inquiry. This is also why human salespeople, as a technique, will often engage a prospect in conversation about their overall needs, rather than asking outright what they are looking for.
Figure 2-5: Recognizing that the ability to skillfully ask questions is the root to insight
In Douglas Adams' The Hitchhiker's Guide to the Galaxy, when the answer to the ultimate question was met with a tad bit of disdain, the computer said, “I think the problem, to be quite honest with you, is that you've never actually known what the question is” (New York: Harmony Books, 1980). The computer then surmised that unless you fully come to grips with what you are asking, you will not always understand the answer. Being able to appropriately phrase a question (or query) is a topic that cannot be taken too lightly.
Inserting AI into a process is going to be more effective when users know what they want and can clearly articulate that want. Because AI systems vary in type, and because many classes of algorithms can make up an AI system, the basis for handling variation in the quality of questions is to first seek quality and organization in the data.
However, data quality and data organization can seem out-of-place topics if an AI system is built to leverage many of its answers from unstructured data. For unstructured data that is textual—versus image, video, or audio—the data is typically in the form of text from pages, documents, comments, surveys, social media, and so on. But even nontextual data can yield text in the form of metadata, annotations, or tags via transcribing (in the case of audio) or annotating/tagging words or objects found in an image, as well as any other derivative information such as location, object sizes, time, etc. All types of unstructured data can still yield structured data from parameters associated with the source and the data's inherent context.
Social media data, for example, requires various additional data points to describe users, their posts, relationships, time of posts, location of posts, links, hashtags, and so on. This additional data is a form of metadata and is not characteristic of the typical meta-triad: business metadata, technical metadata, and operational metadata. While data associated with social media is regarded as unstructured data, there is still a need for an information architecture to manage the correlations between the core content (the unstructured data) and the supporting content (the structured metadata). Taken in concert, the entire package of data can be used to shape patterns of interest.
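As a rough illustration of pairing core content with supporting content, the following Python sketch keeps the unstructured text of a post alongside a structured metadata record. The class names, field names, and the `build_post` helper are hypothetical and chosen only for this example. They do not represent any particular platform's schema.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical layout: the field names below are illustrative assumptions.
@dataclass
class PostMetadata:
    author: str
    posted_at: datetime
    location: Optional[str] = None
    hashtags: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)

@dataclass
class SocialPost:
    text: str               # core content: the unstructured data
    metadata: PostMetadata  # supporting content: the structured metadata

LINK = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")

def build_post(text: str, author: str, posted_at: datetime,
               location: Optional[str] = None) -> SocialPost:
    """Derive structured fields from the text and attach them to the post."""
    links = LINK.findall(text)
    # Remove links before searching for hashtags so that URL fragments
    # such as "#section" are not mistaken for hashtags.
    text_without_links = LINK.sub(" ", text)
    hashtags = [tag.lower() for tag in HASHTAG.findall(text_without_links)]
    metadata = PostMetadata(author=author, posted_at=posted_at,
                            location=location, hashtags=hashtags, links=links)
    return SocialPost(text=text, metadata=metadata)

post = build_post(
    "Loving the new release! #DataScience #AI https://example.com/notes",
    author="user_1",
    posted_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(post.metadata.hashtags)
print(post.metadata.links)
```

Keeping the original text untouched while storing derived fields separately is one possible design choice. It lets the metadata be queried or corrected without altering the core content.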
Even in the case of unsupervised machine learning (a class of application that derives signals from data that has not previously been predefined by a person), the programmer must still describe the data with attributes/features and values.
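To make this point concrete, here is a minimal sketch of k-means clustering in plain Python. K-means is used only as one familiar example of unsupervised learning. The feature names, the sample values, and the starting centroids are all invented for illustration. Notice that although no row carries a label, the programmer has still decided which attributes describe each row and supplied their values.

```python
from math import dist

# Hypothetical features chosen by the programmer; the algorithm never
# sees a label, but it does depend on these attributes and values.
FEATURES = ("visits_per_month", "avg_basket_value")
rows = [
    (1.0, 10.0), (2.0, 12.0), (1.5, 11.0),
    (9.0, 80.0), (10.0, 85.0), (9.5, 82.0),
]

def nearest_index(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep the old
        # centroid if no points were assigned to it.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels

# Assumption: two starting centroids picked by hand from the sample rows.
centroids, labels = k_means(rows, centroids=[rows[0], rows[3]])
print(labels)
```

In practice the features would usually be scaled before clustering so that one attribute does not dominate the distance calculation. That step is omitted here to keep the sketch short.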