Enterprise AI For Dummies - Zachary Jarvinen

Identifying data sources

Before you start, you should perform a data audit to determine what data you already have and identify gaps in your data that you must fill to accomplish your business goals.

As mentioned in Chapter 1, for the enterprise, data falls into two categories: structured data (databases and spreadsheets) and unstructured data (email, text messages, voice mail, social media, connected sensors, and so on). Potential sources for data include:

 Internal data: The first place to look is the IT department, but depending on the organization, you may not find everything you need in one place. The most common challenges associated with big data aren’t analytics problems; they are information integration problems. To reap the benefits of big data, you must first slay the data silo dragon, from department-level tribal thinking down to that one app on that one computer in that one person’s office.

 Data capture: The second place to look is the data entering your organization. It can arrive in many forms, and you can use data extraction, metadata extraction, and categorization to enrich it. For example, you can run paper documents, whether handwritten or printed, through an optical character recognition (OCR) system to digitize them in preparation for processing. They can then join the rest of your digital data, such as emails, PDF files, Word documents, images, voice mail messages, and videos, to be classified and loaded into the data store that will feed your AI insights.

 Data as a service (DaaS): If your data still has gaps, you can turn to third-party data: either commercial datasets for purchase, such as AccuWeather, or public datasets, such as data.gov and Kaggle.com. Broadening your datasets can surface more of the insights lurking in your own data.
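To make the idea of a data audit concrete, here is a minimal sketch in Python. It assumes you can list the data fields your business goal requires and the fields each known source supplies. The source names, field names, and the `audit` helper are all hypothetical, chosen only for illustration; they do not come from any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in the audit inventory (all values here are illustrative)."""
    name: str
    category: str       # "internal", "capture", or "daas"
    structured: bool    # True for databases and spreadsheets
    fields: set[str] = field(default_factory=set)


def audit(required_fields: set[str], sources: list[DataSource]):
    """Map each required field to the sources that supply it, and list the gaps."""
    coverage = {
        f: sorted(s.name for s in sources if f in s.fields)
        for f in sorted(required_fields)
    }
    # A gap is a required field that no known source supplies.
    gaps = sorted(f for f, names in coverage.items() if not names)
    return coverage, gaps


# Hypothetical inventory: one internal source and one data-capture source.
sources = [
    DataSource("crm_database", "internal", True,
               {"customer_id", "purchase_history"}),
    DataSource("scanned_invoices", "capture", False,
               {"customer_id", "invoice_total"}),
]

# Hypothetical fields needed for the business goal.
required = {"customer_id", "purchase_history", "invoice_total", "local_weather"}

coverage, gaps = audit(required, sources)
print("Gaps to fill:", gaps)
```

In this made-up inventory, the weather field is the only one with no supplier, which is the kind of gap a third-party dataset might fill. A field supplied by more than one source, such as the customer identifier here, is a hint that integration work may be needed before the data can be combined.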
