Outsmarting AI - Brennan Pursell
The Math of AI
As we said in the introduction, AI is software at work on computer hardware, and it performs sophisticated statistical analysis of your digitized data. AI is just math.
So let’s start with the math. Now don’t close the book!
I want to equip you against the torrents of numbers and statistical calculations coming from data scientists. I’ll explain the essential principles and spare you the formulas. The math, in some cases, is centuries old, and computers do it all today anyway, but you as the human have to understand what it’s doing, because it doesn’t.
Remember that the goal is to obtain business value from your data. The terms below will empower you as you get to know AI tools and implement them in your organization.[2]
At the heart of it all is conditional probability, which is just a percentage. What is the chance that something is going to happen, or not happen, given what has already taken place? For example, given the data, what is the chance that this or that transaction will occur? And these percentages are constantly changing. There is, for example, no single fixed percentage chance that you will develop colon cancer over the course of your adulthood. Doctors hawking colonoscopies won’t show you that your chance of having cancerous polyps is not just the national average, but that average adjusted over time by your age, your weight, the prevalence of the disease in your family, your daily diet, your level of physical activity, the incidence of intestinal inflammation, whether you have already been checked and cleared once before, etc.
Netflix’s recommendation system, a major part of its market success, is likewise based on conditional probability. Given the films you have seen and liked, what other films should be recommended to you? Your own viewing history, however, is a very limited data set. What about everyone else who liked the films that you liked? What other films did they like? The recommendations Netflix makes for you are based on a vast range of data entries about people’s viewing histories and many other factors as well. AI algorithms process that data, then calculate and update those recommendations for each user. Netflix paid $1 million in prize money for the winning algorithm model in 2009. The paper about it is available online for free.[3]
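To make the idea concrete, here is a minimal sketch in Python of a conditional probability calculated by simple counting. It is not Netflix’s method. The film titles, the five viewing histories, and the function names are all invented for illustration.

```python
# Hypothetical viewing data: each set holds the films one user liked.
liked_by_user = [
    {"Film A", "Film B", "Film C"},
    {"Film A", "Film B"},
    {"Film A", "Film C"},
    {"Film B", "Film D"},
    {"Film A", "Film B", "Film D"},
]


def conditional_probability(given, target, histories):
    """Estimate P(user liked `target` | user liked `given`) by counting."""
    liked_given = [h for h in histories if given in h]
    if not liked_given:
        return 0.0  # no evidence; a real system would need a better fallback
    liked_both = [h for h in liked_given if target in h]
    return len(liked_both) / len(liked_given)


def recommend(given, histories):
    """Rank every other film by its estimated chance of also being liked."""
    films = set().union(*histories) - {given}
    scores = {f: conditional_probability(given, f, histories) for f in films}
    # Highest probability first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for film, chance in recommend("Film A", liked_by_user):
    print(f"{film}: {chance:.0%}")
```

Of the four invented users who liked Film A, three also liked Film B, so Film B tops the list. Add one more viewing history and every percentage can shift, which is the “constantly changing” part.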
Conditional probability is a key component of the math-mix that allows AI to constantly update and improve its predictive calculations in just about every imaginable application. Many call it “personalization,” but the software and hardware that calculate it could not be more impersonal.
Prediction rules are just mathematical equations that describe the relationship between input data and the calculated output. You can also call them models. The easiest example is your maximum heart rate, given your age. Subtract your age (the input) from 220, and you have your maximum number of heart beats per minute (the output). As you age, your maximum heart rate declines. Add more and more data points, and the prediction rules become necessarily more complex. They are the “patterns” that AI can “detect.” When you “train” an AI system on a data set, and it “learns a pattern,” that means that it fits the prediction rules to match the data inputs and outputs.[4] I’ll come back to this idea with “backpropagation,” below.
Regression analysis calculates the statistical relationship between variables, usually an input and an output. You’ve probably seen a regression graph before, perhaps many times. A picture is worth a thousand words.[5]
Regression Analysis
Source: Wikipedia.org, “Regression analysis.” Image by Sewaqu, 4 Nov. 2010. Public Domain.
The line, expressed as a mathematical equation, is the prediction rule for the relationship between the X and Y data entries for each point. The best equation is the one with the smallest sum of squared vertical distances from the points to the line, a method known as least squares. This mathematical achievement comes from Adrien-Marie Legendre in 1805.
In its simplest form, regression analysis is linear, showing the relationship between one dependent variable and one independent variable. If more than one independent variable influences a dependent variable, then “multiple regression” can show those statistical relationships in linear and nonlinear ways. We won’t go deeper into the math.
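For readers who would rather see the idea than the formula, here is a short Python sketch of fitting a straight line to a handful of points by least squares, the method usually credited to Legendre. The five data points and the function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y vary together, divided by how much x varies alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Total squared vertical distance from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical measurements, invented for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(xs, ys)
print(f"prediction rule: y = {slope:.2f} * x + {intercept:.2f}")
print(f"squared error of fitted line: {sum_squared_error(xs, ys, slope, intercept):.3f}")
print(f"squared error of a guess (y = 2x): {sum_squared_error(xs, ys, 2.0, 0.0):.3f}")
```

The fitted line has a smaller squared error than the hand-picked guess. That is all “best” means here: no other straight line gets the total any lower for these points.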
But now apply the idea of conditional probability, and that is how real-time language translation works in computers. Which word or phrase in English, for example, is most likely to match a word or phrase in German, based on the vast data set of translated texts, and given what you have written so far? Such a translation system is nowhere near perfect, because it has no idea what the words mean, but it is getting better all the time.
Parameters are just numbers that you can manipulate in prediction rules to obtain the best result. An example is 220 in your maximum heart rate (MHR) equation: MHR = 220 – Age. That number can move a bit depending on your data set. Adding parameters, or adjusting them, can improve your model’s accuracy, although a model with too many parameters for its data can end up fitting noise rather than the real pattern. MHR = 208 – 0.7 × Age is actually a better prediction rule. AI can handle many, many parameters, and they are essential for image recognition and identification. A certain model developed to distinguish between digitized images of two dog breeds has nearly 390,000 parameters.[6] No, you don’t have to program or enter them from scratch. AI can set and adjust them automatically, based on the data sets used for training the model.
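Here is a small Python sketch of what “manipulating parameters” looks like in practice. The two heart-rate rules come from the text above. The four measurements, the function names, and the choice of average error as the yardstick are my own assumptions, invented purely for illustration.

```python
def max_heart_rate(age, intercept=220.0, slope=1.0):
    """Prediction rule MHR = intercept - slope * age; both numbers are parameters."""
    return intercept - slope * age


def mean_absolute_error(intercept, slope, data):
    """Average gap between what the rule predicts and what was observed."""
    errors = [abs(max_heart_rate(age, intercept, slope) - observed)
              for age, observed in data]
    return sum(errors) / len(errors)


# Hypothetical (age, observed maximum heart rate) pairs, NOT real measurements.
observations = [(20, 195), (35, 183), (50, 172), (65, 163)]

for intercept, slope in [(220.0, 1.0), (208.0, 0.7)]:
    error = mean_absolute_error(intercept, slope, observations)
    print(f"MHR = {intercept:g} - {slope:g} x Age -> average error {error:.2f} beats/min")
```

On this invented data the second rule comes out ahead, but that proves nothing about real hearts. The point is only that changing the parameters changes how closely the rule matches the data, and training a model is an automated search for the parameters with the smallest error.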
Bayes’s rule (also called Bayes’s law or theorem, or “Bayesian inference”) simply means that you have to update your current state of knowledge as more data becomes available. Bayes’s rule is built on conditional probability, and the basic idea is easy: prior probability + new facts = revised probability. The formula takes the prior probability that something is the case (a percentage), multiplies it by the probability of seeing the new data if that is the case (another percentage), and then divides by the overall probability of seeing the new data (a final percentage). (The actual formula is in this footnote.[7]) Thomas Bayes, by the way, is another oldie, a mathematically gifted English Presbyterian minister who lived and worked in the 1700s.
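A worked example may help. The Python sketch below applies Bayes’s rule to a yes-or-no question with one piece of new evidence. The scenario and all three percentages (a 1 percent prior, a 90 percent hit rate, a 5 percent false-alarm rate) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def bayes_update(prior, true_positive_rate, false_positive_rate):
    """Return the revised probability after one positive piece of evidence.

    prior               -- chance the thing is true before the new data
    true_positive_rate  -- chance of seeing this evidence if it is true
    false_positive_rate -- chance of seeing this evidence if it is false
    """
    # Overall chance of seeing the evidence at all, true or not.
    evidence = prior * true_positive_rate + (1.0 - prior) * false_positive_rate
    return prior * true_positive_rate / evidence


# Hypothetical numbers: 1% of transactions are fraudulent; an alarm fires
# on 90% of fraudulent ones and on 5% of legitimate ones.
after_one_alarm = bayes_update(0.01, 0.90, 0.05)
print(f"after one alarm:  {after_one_alarm:.1%}")

# The revised probability becomes the new prior when more data arrives.
after_two_alarms = bayes_update(after_one_alarm, 0.90, 0.05)
print(f"after two alarms: {after_two_alarms:.1%}")
```

One alarm lifts the chance of fraud from 1 percent to roughly 15 percent, not 90 percent, because false alarms on the many legitimate transactions outnumber true alarms on the few fraudulent ones. A second, independent alarm pushes it much higher. That is updating in action.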
A good applied example of Bayes’s rule is in self-driving cars. It is crucial for autonomous vehicles, drones, robots, and other devices to “know” where they are at any given moment, taking into account all the incoming sensor and location data. The vehicle’s AI system must constantly calculate the need to adjust its position relative to other vehicles and potential obstacles (such as pedestrians), and even perhaps take evasive action.
A vector is just a set of numbers. Think of it as a horizontal row in a table with many columns. All the numbers in the row are associated with the item in the first cell. AI uses vectors to process language. AI “understands” human language by associating one word with others through numbers ranging from 0 to 1. Think of each number as the percentage chance that one word will appear near another. For example, “happy,” generally speaking, has a low association (close to 0) with “new,” but “happy” followed by “new” has an association much closer to 1 when typed at the end of December and the start of January, followed by “year.” Happy New Year. The words and their “semantic closeness” expressed in numbers, compiled as “word co-location statistics,” power AI speech recognition: Alexa, Siri, Cortana, Google voice, and all chatbots.[8]
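As a toy illustration of word association by numbers, the Python sketch below counts which words follow a given word in a tiny, made-up set of sentences and turns the counts into values between 0 and 1. Real language systems use far larger data sets and more elaborate vectors. The five sentences and the function name are assumptions for the sake of the example.

```python
from collections import Counter

# Tiny hypothetical corpus, invented for this example.
sentences = [
    "happy new year",
    "happy new year to you",
    "happy birthday to you",
    "a new car",
    "happy new year everyone",
]


def next_word_probabilities(word, corpus):
    """Estimate the chance that each word directly follows `word`."""
    followers = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        # Walk through adjacent pairs: (first, second), (second, third), ...
        for current, following in zip(tokens, tokens[1:]):
            if current == word:
                followers[following] += 1
    total = sum(followers.values())
    if total == 0:
        return {}  # the word never appears with a follower
    return {w: count / total for w, count in followers.items()}


print("happy ->", next_word_probabilities("happy", sentences))
print("new   ->", next_word_probabilities("new", sentences))
```

Each result is one row of numbers tied to a single word, which is the table-row picture of a vector described above. The software never learns what “happy” means. It only counts.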
Variability is crucial for AI to identify statistical anomalies in data sets. Banks and credit card companies use it for fraud detection, and supply chain managers, forensic accountants, equipment maintenance managers, and sports teams apply it to data to predict when things may start to go wrong before a full breakdown occurs. The square-root rule specifies how much an average is expected to vary, given the sample size. You take the variability of a single measurement and divide it by the square root of the sample size. Formula One teams use this to check streams of data from their cars’ engines, tires, brakes, etc., to look for signs of impending failure. “Smart cities” use it to monitor and target inspections for many kinds of problems, from gas leaks to illegal subdivision of apartments.[9]
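The square-root rule fits in a few lines of Python. The sketch below computes the expected variability of an average and uses it to flag a suspicious reading. The sensor numbers, the function names, and the threshold of three are assumptions chosen for illustration, not an industry standard.

```python
import math


def standard_error(single_measurement_sd, sample_size):
    """Square-root rule: variability of an average of `sample_size` readings."""
    return single_measurement_sd / math.sqrt(sample_size)


def is_anomalous(sample_mean, expected_mean, single_measurement_sd,
                 sample_size, threshold=3.0):
    """Flag an average that sits too many standard errors from what is expected."""
    se = standard_error(single_measurement_sd, sample_size)
    return abs(sample_mean - expected_mean) > threshold * se


# Hypothetical sensor: readings normally average 100 and vary by about 10 each.
print(standard_error(10.0, 100))             # variability of a 100-reading average
print(is_anomalous(103.5, 100.0, 10.0, 100)) # same drift, many readings
print(is_anomalous(103.5, 100.0, 10.0, 4))   # same drift, few readings
```

The same drift of 3.5 is alarming across 100 readings but unremarkable across 4, because small samples bounce around far more. That is why sample size has to enter the calculation before anyone cries “anomaly.”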
The constant threat of fraud in millions of electronic payment transactions demands the best tools for cost-effective, automated oversight. AI applications are cutting-edge, built on old math. The square-root rule was discovered by Abraham de Moivre, a French-born mathematician working in London, in 1718.
See how little is new in the math that underlies AI? This chapter does not get into calculus, derivatives, and linear algebra, but their roots are even older, reaching back to the 1600s. Without them, the algorithms you will meet next would be unthinkable.