Decision Intelligence For Dummies, by Pamela Baker

MIA: Chunks of crucial but hard-to-get real-world data


At a 2019 Microsoft workshop, the powers that be gave tech journalists and industry analysts hands-on experience programming AI chatbots, along with a preview of upcoming Microsoft data-related technologies, including AI, quantum computing, and bioinformatics. One topic that came up was the need for synthetic data, although, if I recall correctly, Microsoft called it something else at the time. (Virtual data? Augmented data?)

Regardless of what people call it, you might ask why anyone would need artificially created data, given the exponential growth of real-world data. International Data Corporation (IDC), a premier global market intelligence firm, has reported that the amount of real-world data expected to be created from 2020 through 2023 will surpass the amount created over the previous 30 years. IDC’s analysts also say that the world will create more than three times as much data over the next five years as it did in the previous five. Statista, another global leader in market and consumer data, pegs the total volume of data at more than 181 zettabytes by 2025.

I don’t care how big your data center is; that’s an overwhelming amount of data! So why on earth would you need to create artificial data on top of what you already have? It comes down to the fact that data sets are incomplete by nature. Furthermore, some real-world data is extremely difficult, impossible, or too dangerous to capture.

Synthetic, augmented, and virtual data aren’t the same thing as false data made up out of whole cloth, although false or manipulated data can certainly be injected into both real-world and synthetic data sets. (Those are problems for cybersecurity specialists and data validators to address.) Here I’m talking about creating data that you cannot easily, safely, or affordably obtain through other means. For example, you might think that getting wind speed data from the blades of a wind turbine, like the ones shown in Figure 3-1, would be a simple matter of taking readings from a sensor on the blades. But what do you do if those sensors fail?

You can’t safely send a repairperson to replace the sensor in the middle of a commercial wind farm, where the wind coming off the blades of numerous high-powered turbines can reach hurricane force. You can, however, infer the missing readings from the failed sensor’s previous data and from neighboring turbines’ data under current weather conditions. In other words, you fill in the missing data with values inferred from previous metrics, neighboring devices’ measurements, or both. As a simpler example of inference: if a structure measured 6 feet tall yesterday and has no ability to grow, you can infer, without measuring it again, that it is still 6 feet tall today. A better inference would also confirm that the structure has not toppled or sunk into the ground.
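
Gap filling of this kind can be sketched in a few lines of Python. This is a minimal illustration under one loudly stated assumption: that wind speed at the failed turbine changes by roughly the same proportion as it does at its neighbors. The function names, the m/s units, and the proportional model are all hypothetical, not taken from any particular turbine-monitoring system. Taking the median of the neighbors’ ratios limits the damage if one neighbor’s sensor is also misbehaving, and when no neighbor data is usable, the code simply carries the last known value forward, the same reasoning as the 6-foot structure example.

```python
from statistics import median
from typing import Optional


def impute_reading(last_own: float, last_neighbors: list[float],
                   current_neighbors: list[float]) -> float:
    """Estimate a failed sensor's current wind speed (m/s, assumed units).

    Hypothetical proportional model, for illustration only: scale this
    turbine's last known reading by how much its neighbors' readings changed.
    """
    if not last_neighbors or len(last_neighbors) != len(current_neighbors):
        raise ValueError("need matching, non-empty neighbor readings")
    # Ratio of current to previous reading for each neighbor with a usable value
    ratios = [cur / prev for prev, cur in zip(last_neighbors, current_neighbors)
              if prev > 0]
    if not ratios:
        return last_own  # no usable neighbor info: carry the last value forward
    # The median keeps one faulty neighbor from skewing the estimate
    return last_own * median(ratios)


def fill_gaps(own: list[Optional[float]],
              neighbor_series: list[list[float]]) -> list[float]:
    """Replace None entries in one turbine's series with inferred values.

    neighbor_series holds one list per neighboring turbine, aligned in time
    with `own`.
    """
    filled: list[float] = []
    for t, value in enumerate(own):
        if value is not None:
            filled.append(value)
        elif t == 0:
            raise ValueError("cannot infer the very first reading")
        else:
            prev = [series[t - 1] for series in neighbor_series]
            cur = [series[t] for series in neighbor_series]
            filled.append(impute_reading(filled[-1], prev, cur))
    return filled


if __name__ == "__main__":
    own = [10.0, 11.0, None, None]  # sensor failed at steps 2 and 3
    neighbors = [[9.0, 10.0, 12.0, 12.0],
                 [11.0, 12.0, 14.4, 12.0]]
    print(fill_gaps(own, neighbors))
```

With this sample data, the two gaps are filled with values of about 13.2 and 12.1 m/s.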

Alternatively, you can build a simulation based on known laws of physics, wind turbine specifications, and other factors, and then collect the synthetic data it produces safely and use it in decision-making. Most, but not all, synthetic data is created by simulations.
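
As a rough illustration of simulation-based synthetic data, the following sketch samples wind speeds from a Weibull distribution (a common statistical model for the wind at a site) and converts each one to power output with the standard wind-power equation P = ½ρAC_pv³, clipped to the turbine’s cut-in, rated, and cut-out speeds. Every spec value here (rotor diameter, power coefficient, operating speeds) and the Weibull scale and shape are hypothetical placeholders, not figures from a real turbine or from this book; a serious simulation would also account for factors such as turbulence and wake effects.

```python
import math
import random
from dataclasses import dataclass

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air


@dataclass(frozen=True)
class TurbineSpec:
    # Hypothetical spec values, for illustration only
    rotor_diameter_m: float = 120.0
    power_coefficient: float = 0.40  # must stay below the Betz limit (~0.593)
    cut_in_ms: float = 3.0
    rated_ms: float = 12.0
    cut_out_ms: float = 25.0


def power_output_w(v: float, spec: TurbineSpec) -> float:
    """Power from P = 0.5 * rho * A * Cp * v^3, limited to the operating range."""
    if v < spec.cut_in_ms or v >= spec.cut_out_ms:
        return 0.0  # too little wind, or shut down for safety
    area = math.pi * (spec.rotor_diameter_m / 2) ** 2
    effective_v = min(v, spec.rated_ms)  # output is capped above rated speed
    return 0.5 * AIR_DENSITY * area * spec.power_coefficient * effective_v ** 3


def simulate(n: int, spec: TurbineSpec, scale: float = 8.0,
             shape: float = 2.0, seed: int = 42) -> list[tuple[float, float]]:
    """Generate n synthetic (wind speed m/s, power W) samples.

    Weibull scale and shape are assumed values, not site measurements.
    A fixed seed makes the synthetic data set reproducible.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        v = rng.weibullvariate(scale, shape)  # alpha=scale, beta=shape
        samples.append((v, power_output_w(v, spec)))
    return samples


if __name__ == "__main__":
    spec = TurbineSpec()
    for v, p in simulate(5, spec):
        print(f"wind {v:5.2f} m/s -> {p / 1e6:5.2f} MW")
```

Because every sample comes from the model rather than a sensor on a spinning blade, nobody has to climb a turbine to get it; the trade-off is that the data is only as good as the physics and specs fed into the simulation.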


FIGURE 3-1: How fast are these spinning again?

Another example is facial recognition data. Many countries regulate how much facial data, if any, can be collected or used without a person’s prior consent. This can significantly limit the amount of facial data available for training facial recognition machine learning models. To overcome the shortage, companies turn to AI-generated faces of people who don’t actually exist. Data from fake faces also helps machine learning models learn to tell real faces from fake ones, a distinction that is useful in many endeavors, including detecting deepfake videos.

In decision intelligence, data considerations aren’t the first priority. Focus first on the outcome you want, and then determine the tools and data you need to get there. Once you have a map in hand, it’s easier to determine whether the data you need is already available or must be obtained.
