The Software of AI
Now for the software—the coded algorithms that perform the math.
You have probably heard the terms machine learning, predictive analytics, deep learning, and neural networks, which refer to groups of algorithms in code. We’re going to go out on a limb here: In artificial intelligence, all four are pretty much the same thing.[10] Okay, “deep learning” algorithms are usually associated with image classification and voice transcription, but they use “neural networks” just like the others can. All four use the prediction rules or models discussed above. All four involve mathematical, statistical algorithms working on data. There’s no need to parse technical jargon flaunted by marketers.
And, just to reiterate, machines cannot “learn” the way a human does; hardware and software are nothing like neurons; and “deep” can mean anything. Computers do not have self-awareness, independent consciousness, feelings, or even thoughts. AI software is a set of data-analysis tools. All code is prone to bugs, and all computer systems crash from time to time. AI is down-to-earth.
One final observation before we get to term definitions: Software is a lot like the law. They both find commonality in reason. Our point is that business software and the law are not natural enemies. Both classify and perform procedures: the former with numbers, the latter with words.
Indulge a shallow dive into distant history: The same person, Gottfried Wilhelm Leibniz (1646–1716), developed calculus at about the same time as Isaac Newton, devised mechanical calculators, and created the binary number system that computers use today, whereby all numbers are expressed in 1s and 0s. Leibniz also developed a rational, legal “machine,” a code, for classifying disputes (input data) and generating rulings (classification outputs). For him, math and the law complemented each other.
Okay. We’re ready for the software.
Data captured in software must be accurate, reliable, and correctly classified for any of the above procedures to produce useful results. Computer data can come from many sources: keyboard entries, audio recordings, visual images, sonar readings, GPS, document files, spreadsheets, etc. The devices gathering sensory inputs must themselves be of high quality for the sake of accuracy. All the data is reduced to series of numbers and is normally stored in tables, with many rows and columns. If the data is flawed in any way, the procedures will be risky at best. If the procedures are inaccurate, then your business may make poor decisions and take the wrong actions.
As we said before, “Garbage in, garbage out” is as true today as it ever was. An AI firm proclaiming that its products or services can take any kind and quality of data and turn it into perfect predictions and decisions is just peddling alchemy from the Middle Ages. People tried in vain for centuries to turn common metals such as lead into gold. Even a genius like Isaac Newton poured many hours into that total waste of time and energy.
“Garbage in, garbage out” is true of the law as well. If the evidence is faulty or fake, the results of the court proceedings are going to be skewed, distorted, or just dead wrong. The goal is to reach the right determinations and benefit stakeholders. Data science and the law are compatible, indispensable tools in the struggle to make the right move and do the right thing.
Structured data refers to data stored in tables of rows and columns and formatted in a database for queries and analysis. Queries retrieve data, update it, insert more, delete some, and so on. SQL (structured query language), a computer programming language dating from the 1970s, is still widely used, especially by Microsoft and Amazon Web Services (AWS), alongside dozens of alternatives. Structured data has to be clean and complete. It has to be accurate.
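To make those query types concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers table, its columns, and the values are invented purely for illustration:

```python
import sqlite3

# A hypothetical "customers" table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

# Insert rows: structured data means every row has the same columns.
cur.executemany(
    "INSERT INTO customers (id, name, balance) VALUES (?, ?, ?)",
    [(1, "Alice", 120.50), (2, "Bob", 0.00), (3, "Cara", 87.25)],
)

# Retrieve: find customers with a positive balance.
cur.execute("SELECT name, balance FROM customers WHERE balance > 0")
print(cur.fetchall())  # [('Alice', 120.5), ('Cara', 87.25)]

# Update and delete work the same way.
cur.execute("UPDATE customers SET balance = 10.00 WHERE id = 2")
cur.execute("DELETE FROM customers WHERE id = 3")
conn.commit()
```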
Let’s use the 80/20 rule here: 80 percent of the time on your AI projects will go to data preparation. (More on that in chapter 5.)
AI guru Andrew Ng says that structured data “is driving massive value today and will continue as companies across all industries transform themselves with AI.”[11]
Unstructured data gets more of the media attention, because people are more impressed when a machine can identify objects in pictures and respond to written and spoken language in a human-like way. Unstructured data includes digitized photographs and video, audio recordings, and many kinds of documents, which people can readily understand much faster and more thoroughly than a computer can. It took years and many millions of images and dollars to get AI algorithms to identify cats in photos with a high degree of reliability. Your average three-year-old child would get it right for the rest of her life after one or two encounters.
Back to data in general. It is the source of your organization’s knowledge, not the AI. It is a key part of your institution’s historical record, actually. Your data is a cross between a gold mine and a swamp. Working through it can really pay off, but you have to be very, very careful. Always maintain a healthy, skeptical attitude toward your data. How accurate is it, really? How was it compiled? Are there obvious or hidden biases that might skew your analysis and its conclusions?
Historical data presents a big problem for AI applications in the US criminal justice system. Historically, inmates with lighter skin have received parole at a much higher rate than those whose skin is darker, and it is no surprise that AI predictive models trained on those records recently reproduced these results in similar decisions.[12] To what extent does data about past court decisions reflect poor legal representation in court or base prejudice instead of actual guilt or innocence? There is no easy answer to this loaded question. Drug enforcement efforts quite frequently concentrate on neighborhoods with darker-skinned, poorer inhabitants, although drug trade and use are relatively color-blind and prevalent among all socioeconomic classes. Data collected from these efforts and used to predict the best time and place for the next drug bust is virtually guaranteed to continue the trend. The computer certainly hasn’t the foggiest idea that anything could be wrong in the data’s origin or derivation.
The problem is nearly universal. In Durham, in the north of England, a model was used to predict whether a person released from prison would commit another crime—until people noticed that it strongly correlated repeat offending with residence in poorer neighborhoods. Authorities then removed residential address data from the system, and the resulting predictions became more accurate.[13]
People are just people. We are sometimes rational and sometimes not, depending on the circumstances. If only our legal systems were as rational and reliable as mathematical analysis. The two meet in data and the analysis of it. Our great, shared challenge in this age of AI is to ensure that the machines serve the people, and not the reverse.
Let’s return to our key term definitions for AI software.
Rules in software say what your organization will do with the outputs. You set the operational rules. You can have an AI system perform numerous tasks, but you and your organization, not the machine, are responsible for the results. Your rules must keep your organization’s actions compliant with the law.
Will your chatbot or texting app use or suggest words associated with hate speech, however popular those words may be? Will you grant or deny credit to an individual? Will you interview this or that person for a possible job? Will you grant an employee a promotion or not? Will you place this order or not? Will you call the customer about a suspected case of credit card fraud? Will you dispatch a squad car at that time and place? Your organization is completely responsible for the data source—did you obtain it legally?—for the data classification, the procedures, and the rules. The law bans discrimination against people based on race, religion, gender, sexual orientation, etc., even if you made no such data entries. Other laws protect personal privacy about certain topics. (Joshua will explore this topic further in chapter 6.)
Scorecards set up the different factors that contribute to a complex prediction, such as the likelihood of whether someone will contract lung cancer, and algorithms working through masses of historical examples assign points to each factor that accumulate into a final score. Age and the incidence of the disease in family history tend to count more than gender, smoking more than income level or education. Because the computer calculates without thinking rationally, it can point out statistical relationships among the factors without any prior expectations. It may detect possible connections that health experts had not thought of. A computer does not bother with the distinction between cause and coincidence, so some of the correlations might prove medically absurd. (Recall the statistical connection between a person’s IQ and their shoe size!)
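In code, a scorecard reduces to weighted factors summed into a score. Here is a minimal sketch in Python; the factors and point values are invented for illustration, whereas a real scorecard’s points would be fitted by algorithms from masses of historical cases:

```python
# Hypothetical scorecard for lung-cancer risk. Points are made up;
# in practice they are assigned by fitting to historical examples.
POINTS = {
    "age_over_60": 30,
    "family_history": 25,
    "smoker": 40,
    "male": 5,        # gender tends to count less...
    "low_income": 2,  # ...and income less still
}

def risk_score(patient: dict) -> int:
    """Sum the points for every factor the patient exhibits."""
    return sum(points for factor, points in POINTS.items() if patient.get(factor))

patient = {"age_over_60": True, "smoker": True, "family_history": False}
print(risk_score(patient))  # 70
```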
Decision trees are used to model predictions based on answers to a series of simple classification questions. Using a breast cancer example, routine mammogram results divide patients into two groups. Those with no abnormalities are classified in the negative; the others move to the next question: Were the mammogram results suspicious? The “no” answers are set aside, and the “yes” answers move on to a biopsy. That test may reveal a nonmalignant cyst or, in case of a “yes,” lead to a recommendation for surgery and chemotherapy. To a certain extent, medical professionals are trained to think in decision trees—as should AI systems in the same field, no?
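That chain of questions translates directly into code. A minimal sketch in Python, with hypothetical field names standing in for the test results:

```python
# The mammogram example as a hand-rolled decision tree: each "if" is
# one yes/no split in the tree.
def classify(patient: dict) -> str:
    if not patient["abnormality_found"]:
        return "negative"                       # first split
    if not patient["mammogram_suspicious"]:
        return "set aside / routine follow-up"  # second split
    if patient["biopsy_malignant"]:
        return "recommend surgery and chemotherapy"
    return "nonmalignant cyst"

print(classify({"abnormality_found": True,
                "mammogram_suspicious": True,
                "biopsy_malignant": False}))    # nonmalignant cyst
```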
In AI, decision trees can manage multiple sources of data and run constantly. They can “decide” whether a vehicle accelerates, cruises, or brakes; turns left, turns right, or heads straight; remains squarely in its lane or heads to the side of the road to take evasive action.
Neural networks are a key component of most AI systems, but the term is fundamentally misleading. Recall the table about the differences between human brains and computers (see introduction). Neural networks are computerized functions performed by software on hardware, nothing else. They take digitized data, make many calculations quick as lightning, and end in a number. A real neuron is a living human cell that accepts and sends electro-chemical signals in the body. Biologists still do not fully understand how a neuron actually works. To compare a neuron to an electric transistor that is either on or off (reads either 1 or 0) is wildly misleading. But there is no use trying to change the name now.
The best way to explain what a neural network in AI does is by way of example. Think of everything that can go into determining the actual price of a house at sale.[14] Square footage, lot size, year built, address, zip code, number of bedrooms, number of bathrooms, basement, floors, attached/detached garage, pool, facade type, siding type, window quality, inside flooring, family size, school quality, local tax burden, recent sale price of a house nearby, and so forth. Those are the inputs. For simplicity’s sake, let’s say there are thirty of them in a single column of thirty rows.
The neural network sets up an inner “hidden layer” of calculations—imagine another column of say, ten rows—in which each of the original inputs is “weighted,” or multiplied by a parameter (a number you can change) and results in a single, new output number. Think of the hidden layer as a stack of ten functions that push the house price up or down. One could be for, say, “curb appeal,” another for “in-demand trend,” another for “family fit,” another for “convenience,” etc. All thirty input data points are included, each weighted differently by a set parameter, in each of the ten processed inputs in this first hidden layer.
The next layer does the same as the first, adjusting the numbers from the first hidden layer further, bringing them closer to a final recommended price. The final output is the price number. The input data, the inner “hidden” layers, and the final output comprise the neural network.
Neural Network
Source: Image by Brennan Pursell
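In code, the forward pass just described takes only a few lines. Here is a minimal sketch in Python with NumPy, assuming the thirty inputs and ten hidden functions from the example. The weights are random placeholders, which is why the output is meaningless until the network is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Thirty input features for one house (square footage, lot size, etc.),
# here just random placeholder numbers.
x = rng.random(30)

# Hidden layer: ten functions, each a weighted sum of all thirty inputs.
W1 = rng.normal(size=(10, 30))        # 10 x 30 parameters ("weights")
b1 = np.zeros(10)
hidden = np.maximum(0, W1 @ x + b1)   # ReLU keeps the math nonlinear

# Output layer: one weighted sum of the ten hidden values -> the price.
W2 = rng.normal(size=(1, 10))
b2 = np.zeros(1)
price = (W2 @ hidden + b2)[0]
print(f"predicted price: {price:.2f}")  # meaningless until trained
```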
Although the statistical calculations linking the layers can become very complex, to say the least, the computer performs them accurately, except where bugs intervene—and they can be fiendishly difficult to detect. Given enough data entries and enough hidden layers, neural networks can produce some very accurate calculations. Neural networks can have one, few, or many hidden layers. The more there are, the “deeper” the neural network. “Deep” networks usually work better than “shallow” networks with fewer layers.
Here’s the catch: The neural network has no way of knowing whether its output house price is accurate or not. It needs to be “trained” based on example data sets. You need to train it based on good data—actual examples, in this case, of houses successfully sold at a given price.
The beauty is that when you enter into the network the output and its matching set of inputs, the network’s algorithms can adjust the parameters in the hidden layers automatically.
There are three key algorithms that make neural networks work. In them you will see that “deep learning” and “machine learning” really have a lot to do with training, but almost nothing to do with human learning.
Backpropagation makes the neural network adjust the weights in the hidden layers. Backpropagation is usually at the heart of “deep learning,” “machine learning,” and AI today. The algorithm was a breakthrough by Geoffrey Hinton and, independently, Yann LeCun, in the mid-1980s, although their work relied on research from the 1960s and ’70s. Backpropagation became an effective tool only recently, when available data and processing speeds both grew exponentially over the last decade.
Backpropagation starts at the output—the house price, in our example—and then works backward through the hidden layers toward the inputs. Say that the neural network predicted a price of $500,000 for a certain house, but you know that the actual price was $525,000. Backpropagation takes the correct price and adjusts the weights in each calculation in the hidden layers so that, given the same initial inputs, the network’s output comes closer to the correct price.
But that is just one example. If you train the network on many examples, then backpropagation averages all the adjustments that it needs to make, in order to optimize accurate performance across the entire data set.
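Here is a minimal sketch of the whole training loop in Python with NumPy: a forward pass to get a predicted price, a backward pass that computes how each weight should change to shrink the error, and a small adjustment to every weight. The data is invented, and real systems use libraries that perform the backward pass automatically:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: 200 "houses," 30 features each, with prices that
# really are a (noisy) function of the features -- all invented data.
X = rng.random((200, 30))
true_w = rng.normal(size=30)
y = X @ true_w + 0.1 * rng.normal(size=200)

# Same shape as before: 30 inputs -> 10 hidden units -> 1 price.
W1 = rng.normal(size=(10, 30)) * 0.1; b1 = np.zeros(10)
W2 = rng.normal(size=(1, 10)) * 0.1;  b2 = np.zeros(1)
lr = 0.01  # learning rate: how big a step each adjustment takes

for epoch in range(200):
    for x_i, y_i in zip(X, y):
        # Forward pass: compute the predicted price.
        pre = W1 @ x_i + b1
        h = np.maximum(0, pre)                    # ReLU
        y_hat = (W2 @ h + b2)[0]

        # Backward pass: start from the error at the output and push
        # weight adjustments back through the layers.
        d_out = y_hat - y_i                       # how far off we were
        dW2 = d_out * h[None, :]; db2 = np.array([d_out])
        dh = (W2.T * d_out).ravel() * (pre > 0)   # gradient through ReLU
        dW1 = np.outer(dh, x_i);  db1 = dh

        # Gradient descent: nudge every weight downhill on the error.
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1

pred = np.array([(W2 @ np.maximum(0, W1 @ x + b1) + b2)[0] for x in X])
print("mean absolute error after training:", np.mean(np.abs(pred - y)))
```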
The more you train it, the more you test its results, the better it gets.
Gradient descent refers to the mathematical process of determining how to adjust the weights—the parameters—in the hidden layers to improve the accuracy of the neural network’s calculations and minimize its prediction error. Think of it as steps toward the sweet spot, the best set of weights in the network for generating the best, most realistic output, given the inputs. Gradient descent relies on derivatives in good old calculus, which determine the slopes of functions.
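Stripped down to a single weight, the process is easy to see. This minimal sketch minimizes an invented error function whose sweet spot is w = 3; the weight-update lines in the previous sketch do exactly this, just across thousands of weights at once:

```python
# Minimize the error function f(w) = (w - 3)**2, whose derivative is
# f'(w) = 2 * (w - 3). Gradient descent finds the minimum at w = 3
# by repeatedly stepping against the slope.
w = 0.0        # arbitrary starting weight
lr = 0.1       # step size
for step in range(50):
    slope = 2 * (w - 3)   # derivative from good old calculus
    w -= lr * slope       # step downhill
print(round(w, 4))        # ~3.0
```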
Finally, the sigmoid and rectified linear unit (ReLU) functions help neural networks generate clear “yes” and “no” answers and classifications (in the form of a number that can be read as a 1 or a 0) or sort data inputs into various categories. These functions matter wherever the system has to turn a continuous score into a crisp split, much like the branches of a decision tree. ReLUs in particular enable neural networks to reach their results faster.
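Both functions are one-liners. A minimal sketch in Python:

```python
import math

def sigmoid(z: float) -> float:
    """Squash any number into (0, 1) -- readable as a yes/no score."""
    return 1 / (1 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positives through, zero out negatives -- cheap to compute."""
    return max(0.0, z)

print(sigmoid(4.2))   # ~0.985 -> a confident "yes" (rounds to 1)
print(sigmoid(-4.2))  # ~0.015 -> a confident "no"  (rounds to 0)
print(relu(-2.5), relu(2.5))  # 0.0 2.5
```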
“Recurrent neural networks,” “convolutional neural networks,” “deep belief networks,” “generative adversarial networks,” and even “feedforward multilayer perceptron deep networks” all rely on the software you just learned about. And none of them are worth anything without good quality data, and lots of it.
Equipped with the basic math and software underlying AI, you can readily face down any aggressive sales associate who tries to persuade you that his or her AI thinks like a human, one smarter than you.
As you have seen, the technical ideas behind AI have been around for decades, and their applicability in the workplace has soared in just the past few years. The amount of available raw data has exploded as more and more applications on more and more mobile devices collect data and share it on the Internet with service companies and their partners. We communicate and work increasingly through apps. The “Internet of Things” (IoT) is a volcano, disgorging data everywhere faster than anyone can measure. No one can stop it. “Smart” devices and sensors proliferate. Better hardware, from the individual device to the network systems, helps to process and transfer that data faster than ever.
AI systems are increasingly capable of recognizing images, processing human language, and managing information in structured and unstructured data through statistical procedures. (I will revisit what applied AI can do for your organization in chapter 3.)
To sum it all up: Software and hardware put statistics on steroids—and the combination will get much, much more powerful over time.
We need to use AI because, when the data are well-labeled and the procedures are correct, computers run great numbers of them at high speed and low cost. Computers are also immune to human errors such as prejudice, favoritism, distraction, and mood swings (although we all wonder sometimes, when our systems suddenly slow down, freeze up, or crash).
Nonetheless, AI has three big problems: dependency, consistency, and transparency. Dependency refers to the machines’ need for large amounts of high-quality, correctly classified training data. “Garbage in, garbage out,” as we said earlier. Consistency is a problem because adjustments made to algorithms produce different end results, regardless of data completeness and quality. Different AI systems produce different results on the same darned data. Finally, and most importantly, transparency in neural network processes is limited at best. Backpropagation makes it extremely difficult to know why the network produces the result that it does. The chains of self-adjusting calculations get so long and complicated that they turn the AI system into a “black box.”[15]
Despite these problems, AI functionalities are improving all the time, and applications of AI technology are proliferating in just about every sector of the economy. For anyone who uses a smartphone, there is literally no avoiding it.
1. See World Intellectual Property Organization, Technology Trends 2019: Artificial Intelligence, https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf. The executive summary can be downloaded from https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055-exe_summary1.pdf.
2. The best book on AI mathematics is Nick Polson and James Scott, AIQ (New York: St. Martin’s Press, 2018).
3. See https://www.netflixprize.com/community/topic_1537.html and https://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf. The irony is that Netflix never actually used the prize-winning algorithm, supposedly because of the excessive engineering expense involved. It has used other algorithms developed by the prize-winning team. Casey Johnston, “Netflix Never Used Its $1 Million Algorithm Due to Engineering Costs,” Ars Technica, April 16, 2012, https://www.wired.com/2012/04/netflix-prize-costs/.
4. See Polson and Scott, ch. 2.
5. Public-domain image from Wikipedia: https://en.wikipedia.org/wiki/Regression_analysis.
6. See Polson and Scott, ch. 2.
7. P(H|D) = P(H) × P(D|H) / P(D). See Polson and Scott, ch. 3.
8. See Polson and Scott, ch. 4.
9. See Polson and Scott, ch. 5.
10. An excellent book on this topic is Steven Finlay, Artificial Intelligence and Machine Learning for Business: A No-Nonsense Guide to Data-Driven Technologies, 2nd ed. (Relativistic, 2017).
11. Quoted in Clive Cookson, “Huge Surge in AI Patent Applications in Past 5 Years,” Financial Times, January 31, 2019.
12. Cathy O’Neil, Weapons of Math Destruction (New York: Broadway Books, 2016).
13. Madhumita Murgia, “How to Stop Computers Being Biased: The Bid to Prevent Algorithms Producing Racist, Sexist or Class-Conscious Decisions,” Financial Times, February 12, 2019.
14. We recommend Andrew Ng’s courses on machine learning, available for free through Coursera, the online learning platform that he founded. See https://www.coursera.org/.
15. See the interview with Erik Cambria in Kim Davis (ed.), The Promises and Perils of Modern AI (DMN, 2018), eBook: https://forms.dmnews.com/whitepapers/usadata/0218/?utm_source=DMNTP04302018&email_hash=BBFC7DC6CF59388D09E7DF83D3FE564F&spMailingID=19474637&spUserID=NTMyMTI1MTM4MTMS1&spJobID=1241853091&spReportId=MTI0MTg1MzA5MQS2. For Erik Cambria’s work on teaching computers emotion recognition using semantics and linguistics, see http://sentic.net/.