TWO
The New Electricity
WHAT IS ARTIFICIAL INTELLIGENCE?
Popular and scientific literature identifies several benchmark events in the development of AI. In 1950, for example, the English computer scientist Alan Turing wrote an article titled “Computing Machinery and Intelligence.” He asked, can machines think, and can they learn from experience as a child does? “The Turing test” was Turing’s name for an experiment testing the capacity of a computer to think and act like a human. The test would be passed when a computer could communicate with a person in an adjacent room without the person realizing he was communicating with a computer.
In 1956, Dartmouth College hosted the first conference to study AI. The host, Professor John McCarthy, is credited by many with coining the term “artificial intelligence.” As quoted by Nick Bostrom, author of the best-selling book Superintelligence: Paths, Dangers, Strategies, the funding proposal submitted to the Rockefeller Foundation stated: We propose that a two-month, ten-man study of artificial intelligence be carried out … We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together.1
Despite this optimistic beginning, progress in the field was not linear or exponential. Progress occurred in fits and starts. As a result, AI development went through a series of “AI winters,” periods of low funding and low results. Continuing the seasonal metaphor, there is little doubt that we are now in an AI spring. The first question is, what will blossom? The second question is, if this is spring, when will summer arrive and how hot will it be? This leads to the third question: are we ready for the results?
AI Defined
Artificial intelligence (AI) is defined in different ways by different people and institutions. Here is one definition:
Artificial intelligence is a science and a set of computational technologies that are inspired by—but typically operate quite differently from—the ways people use their nervous systems and bodies to sense, learn, reason, and take action.2
The reality is, there is no common or widely accepted definition. One reason it is hard to define is that the term itself is inapt. AI invites human comparison, including anthropomorphism, the instinct to give human or animate feelings to inanimate objects like cars and other machines. People also associate the word “intelligence” with inherently human traits, like kindness, compassion, and a desire or will to do good or evil. But these are not machine traits, at least not yet.
Machines do what they are programmed to do, not because they choose to do so, but because they are programmed to do so. Software drives machines. The fact that many cultural representations of AI involve robots—as do some applications—does not help. Moreover, the use of voice triggers and natural language conversion, as in the case of talking navigational aids and “personal assistants” such as Siri, Alexa, and Watson, adds to the anthropomorphic tendency. Additionally, much of the current research into machine learning, described below, is predicated on trying to reproduce the human brain—literally, in the case of efforts to replicate the human brain using 3D printers, or with neurological metaphors such as artificial neural networks. Artificial intelligence may mimic, and in some cases outperform, human intelligence, but it is not human intelligence. It is machine capacity and optimization, hence the preferable term human-level machine intelligence (HLMI).
Another definitional challenge with AI is that when it becomes embedded in our everyday lives—in shopping algorithms, navigational aids, and search engines—it is no longer referred to as AI. It is treated as just one more bit of technology in the stuff around us. AI, after all, is a thing of the future. Maybe we should call this sort of AI “regular AI.”
Most significantly, AI is hard to define because it draws on a wide spectrum of technologies, subfields, and capacities that make a crisp and singular definition difficult. These other technologies have reached fruition at different times, but all have synergistically propelled AI research and capacity. These include the following:
Computational capacity. Moore’s Law (actually an engineer’s prediction regarding the period 1965–85) posits that the number of transistors in an integrated circuit, the basic electronic chip used to store and transmit data in a computer, will double every eighteen months. Today, for example, the iPhone 5 has 2.7 times the processing capacity of a 1985 Cray-2 supercomputer.3 Nanotechnology affects Moore’s Law by allowing an ever-greater number of transistors and circuits to operate in smaller and smaller places. (How small? Nanotechnology deals with materials smaller than 100 nanometers and thus the manipulation of individual atoms and molecules. A nanometer is one-billionth of a meter; to illustrate, a human hair is 80,000 to 100,000 nanometers wide.) The advent of quantum computing (QC) has potential to fundamentally change Moore’s Law, which is to say replace it altogether as a unit of measure. One reason for the growth in QC research is the ultimate finite capacity of silicon chips and circuits to store and transmit data.
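To make the compounding concrete, here is a minimal sketch, in Python, of what an eighteen-month doubling period implies over time; the function and the sample year spans are illustrative assumptions, not figures drawn from the sources cited above.

```python
# Illustrative only: the compound growth implied by a doubling every eighteen months.
def transistor_growth(years: float, doubling_months: float = 18.0) -> float:
    """Return the multiplication factor after `years` if capacity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

for span in (3, 10, 20):
    print(f"{span} years -> roughly {transistor_growth(span):,.0f}x the transistors")
```

Twenty years of such doubling yields a factor of more than ten thousand, which is why small, steady improvements in chips translate into qualitative leaps in what algorithms can do.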
Big data, cloud computing, sensors, and the Internet of Things. The internet and subsequently the Internet of Things (IoT)—the ubiquitous connection of devices such as cars, refrigerators, and doorbells, along with “traditional” electronic devices such as televisions, computers, and phones—have resulted in an explosion of data, metadata, and stored data. This data is available because of the revolution in storage capacity brought on by cloud computing. It is also available and accessible because of the development of finer and smaller multispectrum sensors. Light Imaging Detection and Ranging (LIDAR) is a type of sensor, for example, that uses light pulses to make high-resolution maps, as well as guide driverless cars.
Algorithms, software, and data analytics. The revolution in computing capacity is matched by the development in software and algorithmic reasoning, which has transformed the capacity of engineers to search stored data. An algorithm is a set of instructions or calculations to perform a task, often in the form of a mathematical formula. A program is an algorithm that completes a task.4 Among other things, algorithms are used to search for and detect patterns in data and metadata. Electrical computer circuits communicate with short bursts of energy in the form of ones and zeros. Computer software code, therefore, translates letters, words, and other symbols into ones and zeros that can then be searched and analyzed by algorithms. (Metadata is data about data; in the context of telephones, the data is the content of your phone call while the metadata is the time, duration, and number you called.) Most algorithms are complex. The Google search algorithm, for example, is believed to consist of over 2 billion lines of code. The word is “believed” because the algorithm is proprietary information, a trade secret. Moreover, it continuously adjusts based on user data—that is, what we search for on an individual and collective basis. The process involves engineering input as well as software changes the algorithm itself makes.
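A toy sketch may help fix the idea that software reduces symbols to ones and zeros that algorithms then search; the encoding scheme and the message below are invented for illustration and have nothing to do with Google’s proprietary code.

```python
# Toy illustration: translate text into ones and zeros, then search the bits for a pattern.
def to_bits(text: str) -> str:
    """Represent each character of the text as its 8-bit binary code."""
    return "".join(format(byte, "08b") for byte in text.encode("utf-8"))

def find_pattern(haystack_bits: str, needle_text: str) -> int:
    """A simple algorithm: scan the bit string for the encoded pattern; return its position or -1."""
    return haystack_bits.find(to_bits(needle_text))

message = to_bits("call at 0900")
print(message[:32])                    # the first four characters, rendered as bits
print(find_pattern(message, "0900"))   # where the encoded pattern appears in the bit string
```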
Knowledge and mapping of the human brain. The past ten years have seen a remarkable growth in human knowledge about the brain. This is a product of means, such as the ubiquity of CAT scans and MRIs, and of opportunity, including new imperatives and funding to study battlefield injuries such as traumatic brain injury (TBI) and chronic traumatic encephalopathy (CTE) in athletes.
Robotics and autonomous systems. AI has benefited from the parallel development of civil, military, and commercial autonomous architectures dependent on AI-enabled computers, such as robotics, drones, and smart grids. Each has spurred the others. The Department of Defense often uses the acronym RAS—robotic and autonomous systems—to describe AI initiatives rather than just AI.
The dot.com to Facebook phenomenon. Silicon Valley, along with social media and data giants Amazon, Alphabet (Google), Facebook, Microsoft, and Apple, has demonstrated that algorithms and data make money. To borrow Napoleon’s phrase about “a marshal’s baton in every soldier’s backpack,” entrepreneurs imagine that in every college dorm room there is a billionaire’s portfolio. Or so many hope and think; the will to explore and invent continues, as does the desire to make money.
Machine learning. It all comes together, or so it seems, with machine learning. Machine learning is just that: the capacity of a computer using algorithms, calculation, and data to learn to better perform programmed tasks and thus optimize function. “The network learns to recognize features and cluster similar examples, thus revealing hidden groups, links, or patterns within data.”5 Machine learning occurs in different ways using different mathematical theories and formulas. As explained by Buchanan and Miller in Machine Learning for Policymakers, there are currently three common processes for teaching machines to learn: supervised learning, unsupervised learning, and reinforcement learning. (Note the use of the qualifier “currently.”) Moreover, there are multiple mathematical theories and models for implementing these methods. The United Nations Institute for Disarmament Research, for example, listed seven illustrative methods in 2018: (1) evolutionary or genetic algorithms; (2) inductive reasoning; (3) computational game theory; (4) Bayesian statistics; (5) fuzzy logic; (6) hand-coded expert knowledge; and (7) analogical reasoning. The most promising present method is known as deep learning.6
Supervised learning occurs when a computer is fed data that is mathematically weighted to train the computer to better analyze new data and predict outcomes. “The ‘supervised’ part of the name comes from the fact that each piece of data given to the algorithm also contains the correct answer about the characteristic of interest …”7 Supervised learning is time-consuming. Engineers must feed the data to the computer (that is, enter the data into the computer program), rewarding the algorithm when it is correct in identifying an object, pattern, or sequence, and penalizing the program when it is not. “Reward” here refers to symbolic language, that is, code the algorithm is programmed to seek and compile, like coins in video games. In a homeland security context, for example, this methodology might be useful in identifying or searching for specific people or characteristics of people boarding flights or transiting checkpoints. Likewise, it has obvious potential to facilitate the analysis of overhead surveillance in search of missile silos, underground bunkers, and camouflaged weapons.
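As a minimal sketch of the idea, the snippet below trains a classifier on a handful of labeled examples using the scikit-learn library (assumed to be installed); the traveler features and labels are hypothetical and are not drawn from any screening system described in this book.

```python
# A minimal supervised-learning sketch with scikit-learn (assumed installed).
# Hypothetical traveler features: [bags_checked, prior_trips, ticket_bought_days_ahead]
from sklearn.linear_model import LogisticRegression

X_train = [
    [0, 1, 1],   # each row is one labeled example ...
    [2, 8, 30],
    [1, 0, 2],
    [3, 12, 45],
]
y_train = [1, 0, 1, 0]  # ... and each label is the "correct answer" (1 = flag for extra screening)

model = LogisticRegression()
model.fit(X_train, y_train)              # training adjusts the model's weights to match the labels

print(model.predict([[1, 2, 3]]))        # predict a label for a new, unseen traveler
print(model.predict_proba([[1, 2, 3]]))  # and the probabilities behind that prediction
```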
Unsupervised learning occurs when the machine seeks to find patterns in data based on algorithmic models but without being trained on data first. “Unsupervised learning is more useful when there is not a clear outcome of interest about which to make a prediction or assessment. Unsupervised learning algorithms are given large amounts of data and try to identify key structures, or patterns, within them.”8 In a security context, unsupervised learning might be useful in determining if there are travel patterns identifying links between otherwise unconnected passengers. Likewise, this methodology might help identify patterns in financial transactions or shipments that are anomalous or indicative of illicit economic or proliferation activity.
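A companion sketch, again with scikit-learn and invented numbers, shows the unsupervised version: no labels are supplied, and the algorithm is left to find whatever grouping the data itself suggests.

```python
# A minimal unsupervised-learning sketch: no labels, only structure found in the data.
# Hypothetical transaction features: [amount_in_thousands, transfers_per_month]
from sklearn.cluster import KMeans

X = [
    [1, 2], [1, 3], [2, 2],       # one dense group of routine transactions
    [9, 40], [10, 38], [11, 42],  # a second, very different pattern
]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # the algorithm's own grouping of the six transactions
print(kmeans.cluster_centers_)  # the "typical" transaction in each group
```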
Reinforcement learning introduces what is referred to as a “software change agent” in the form of algorithmic programming to encourage the machine to learn from its experience in order to optimize its objective. “An agent takes an action, observes the effect on its environment, and then determines whether that action helped it achieve its goal.”9 Thus, AlphaZero—the next iteration of AlphaGo—“played many games against itself and learned over time which moves increased the probability of winning.”10 In other words, reinforcement software teaches itself by finding and implementing preferred strategies based on weighing and predicting outcomes. In 2019, DeepMind, a subsidiary of Google, reported that it was able to train virtual agents to play capture-the-flag within a video game using reinforcement learning. The game was played with teams of five, including human actors, and thus created the impression of coordination between agents as well as with humans, which mimics in part the sort of coordination that might occur with AI-enabled drones and swarms. However, commentators note, “the agents are responding to what is happening in the game rather than trading messages with one another, as human players do.” The experiment also illustrates the scale of effort required. “DeepMind’s autonomous agents learned capture-the-flag by playing roughly 450,000 rounds of it, tallying about four years of game experience over weeks of training.” To do so, the lab rented tens of thousands of computer chips at significant financial and energy cost.11 One can imagine how this methodology might be useful in war-gaming, testing policy alternatives, or planning logistics options.
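The toy program below gives the flavor of reinforcement learning: an agent in a five-cell corridor is rewarded only for reaching the goal and, over many episodes, learns which moves increase its probability of getting there. The environment, reward values, and learning parameters are invented for illustration and bear no relation to DeepMind’s systems.

```python
# A minimal tabular Q-learning sketch: an agent learns, by trial and reward,
# to walk right along a 5-cell corridor to reach the goal. Illustrative only.
import random

n_states, actions = 5, [-1, +1]          # the agent may step left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate

for _ in range(500):                     # many short episodes of trial and error
    state = 0
    while state != n_states - 1:
        if random.random() < epsilon:
            action = random.choice(actions)                     # explore
        else:
            action = max(actions, key=lambda a: Q[(state, a)])  # exploit what it has learned
        nxt = min(max(state + action, 0), n_states - 1)
        reward = 1.0 if nxt == n_states - 1 else 0.0            # reward only at the goal
        # nudge the estimate of "how good is this action here" toward observed experience
        Q[(state, action)] += alpha * (
            reward + gamma * max(Q[(nxt, a)] for a in actions) - Q[(state, action)]
        )
        state = nxt

# learned policy for states 0-3: [1, 1, 1, 1], i.e., always move right toward the goal
print([max(actions, key=lambda a: Q[(s, a)]) for s in range(n_states - 1)])
```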
Deep learning is a machine learning method used with supervised, unsupervised, and reinforcement learning. With deep learning, a computer’s software connects different layers and segments of data internally in a “neural network.” The computer adjusts the weights (inputs) in response to the outputs at each stage as it recognizes and rewards itself for correct outputs. Deep learning is often compared to how the human brain sorts and connects information through neural brain networks. Scientists are not quite sure how this occurs in humans, but we know it is happening. That is why some engineers are trying to imitate and map the brain. “Deep learning can combine an unsupervised process to learn the features of the underlying data (such as the edge of a face), then provide that information to a supervised learning algorithm to recognize features as well as the final result (correctly identifying the person in the picture).”12 There is, in essence, a cascade of computations and predictions. At each level of cascade, the computer is making probability assessments regarding the next set of output data, all inside the software unseen. Nick Bostrom likens computational neural networks to the sifting of sand through increasingly fine screens until one is left with the finest sand output.13
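The sketch below, using only the NumPy library, shows that layered cascade in miniature: each layer weights its inputs, sums them, and passes the result on, ending in a set of rough probabilities. The layer sizes are arbitrary and the weights are random placeholders; in real deep learning, those weights are what training adjusts.

```python
# A minimal sketch of the layered "cascade" in a neural network, using NumPy only.
# Weights here are random placeholders; training would adjust them toward correct outputs.
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, n_out):
    """One layer: weight the inputs, sum them, and squash the result (here with a sigmoid)."""
    weights = rng.normal(size=(inputs.shape[0], n_out))
    return 1.0 / (1.0 + np.exp(-(inputs @ weights)))

x = rng.normal(size=8)        # input layer: say, eight pixel values
h1 = layer(x, 6)              # first hidden layer extracts coarse features
h2 = layer(h1, 4)             # a deeper layer combines them into finer features
out = layer(h2, 2)            # output layer: scores for two possible answers
print(out / out.sum())        # normalized into rough "probabilities" for illustration
```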
Engineers know this process is going on. They caused it to happen. But they cannot describe with certainty exactly what occurs between the input and output layer, that is, why the program has selected one picture, or pixel formulation, or pattern, to weigh in one cascade versus the next. As a result, this part of the process is sometimes referred to as the “black box.” What engineers do know is what they started with (the input) and the machine’s answer (the output) and how often in test scenarios the machine correctly performs its assigned task of moving from input to desired output. They also know how that accuracy percentage compares to humans performing the same task with the benefit of the same input data, for example, looking for tumors, anomalies in data, or camouflaged tanks. In many cases, AI is more accurate.
Algorithms are designed to vary the breadth and depth of the process. Breadth refers to how many different variables or symbols an algorithm will search for at a given level. Depth refers to how many levels of input and output the program might process before delivering an external output. With image recognition, for example, depth appears to offer more accuracy in prediction than breadth.14 Here is the mysterious part. Because the machine is learning and adjusting as it goes, engineers cannot always determine what the machine is weighing and how. As a facial recognition algorithm breaks down data into subordinate parts, engineers cannot be sure which data—for example, facial line, pixel, hue, or angle—the algorithm is using or will use to predict which images or portions of images best match or predict the desired output and thus get passed to the next neural network and ultimately on to the output layer. This is important, for example, where coding or data bias may affect the quality of the ultimate predictive outcome. An algorithm designed to predict recidivism or flight risk, for example, may use a hundred different data inputs as well as patterns it derives from the data of the general population in generating a predictive risk that an individual defendant will return to prison. But with deep learning, the judge and lawyers may not know what decisional weight, if any, was assigned to race, location, gang affiliation, the letter “S” in a defendant’s name, or population trends inapt to a specific defendant or inappropriate in determining bail or parole.
So, What Is AI?
AI is predicated on the notion that if you can express an idea, thought, or purpose in numeric fashion, you can code that purpose into software and subsequently zeros and ones for a machine to perform. This includes the ability of machines to shift from task to task and to learn to better optimize their function(s). AI is an umbrella term comprising different techniques to do this. But there are some common characteristics to most current AI applications, like pattern recognition, multitasking, sensors, and speed. Ryan Calo, a leading AI academic and lab director, writes, “The entire purpose of AI is to spot patterns people cannot.”15 In short, AI is a series of technologies designed to promote machine optimization based on computational capacity. This definition avoids the necessity of designating which element of the process is AI versus computation, versus robotics, versus algorithmic design.
NARROW AND STRONG AI
Readers should also know that experts refer to three kinds of AI. Narrow AI is where we are today. This is the ability of computational machines to perform singular tasks at optimal levels, or near optimal levels, and usually better, although sometimes just in different ways, than humans. This type of AI is all around us, performing single-purpose tasks, generally based on the capacity of AI to spot patterns, identify structure, and predict—and outperform humans in speed and computational capacity doing so.
When an AI-enabled machine is able to shift from task to task and “think on its own,” experts say, machines—or rather the engineers who programmed them—will have achieved what is called artificial general intelligence (AGI), also known as strong AI. There is variation in how experts define AGI. But generally, the definitions share three attributes.
First, fluidity, which is to say an ability to engage in task variance not only by performing more than one task but by shifting from task to task as needed.
Second, the ability to learn by training itself using the internet and any other source of knowledge available, such as data.
And third, the ability to write code—create—and thus rewrite and improve its own programming direction, returning the AGI cycle full-circle to fluidity.
AGI represents a level of intelligence (that is, capacity) at least equivalent to that of humans, at least when it comes to IQ.
National security generalists should also know that AI philosophers and science fiction writers refer to a third hypothetical category of AI as superintelligent artificial intelligence (SAI), or superintelligence. In theory, SAI is achieved when machines not only fluidly switch from task to task but are generally more intelligent than humans. That is because SAI contemplates that the computational machine knows everything that is known to man, because it is connected to the internet. The theory further holds that it has unlimited sources of energy and thus computational capacity, allowing it to progress to new levels of knowledge and invention that humans cannot even comprehend.
SNAPSHOT ASSESSMENT: THE STRENGTHS AND WEAKNESSES OF NARROW AI
Many narrow AI applications are known to consumers who rely on them daily. If you shop on Amazon, you are using AI algorithms. Amazon back-propagates training data from all purchases made on Amazon as well as data from individual consumers. Algorithms then identify patterns in the data and weight those patterns, allowing the algorithm to suggest (predict) additional purchases to the shopper. The algorithm adjusts as it goes, based on the responses (or lack of responses) from recipients. This is an example of predictive big data analytics. It is also an example of a push, predictive, and recommendation algorithm. Former secretary of the navy Richard J. Danzig writes:
… machines can record, analyze, and accordingly anticipate our preferences, evaluate our opportunities, perform our work, and so on, better than we do. With 10 Facebook “likes” as inputs, an algorithm predicts a subject’s other preferences better than the average work colleague, with 70 likes better than a friend, with 150 likes better than a family member, and with 300 likes better than a spouse.16
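A toy version of the recommendation logic described above might look like the following; the purchase histories are invented, and real systems rely on vastly more data and far more sophisticated models.

```python
# A toy recommendation sketch: suggest items that co-occur with what a shopper already bought.
# Purchase histories are invented for illustration only.
from collections import Counter

purchase_histories = [
    {"tent", "stove", "lantern"},
    {"tent", "lantern", "sleeping bag"},
    {"novel", "reading lamp"},
    {"tent", "sleeping bag"},
]

def recommend(basket: set, top_n: int = 2) -> list:
    """Score items by how often they appear alongside the shopper's current basket."""
    scores = Counter()
    for history in purchase_histories:
        if basket & history:                 # this past shopper overlaps with ours ...
            scores.update(history - basket)  # ... so count the items we don't yet have
    return [item for item, _ in scores.most_common(top_n)]

print(recommend({"tent"}))   # e.g., ['lantern', 'sleeping bag']
```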
Narrow AI is also embedded in mapping applications, which sort through route alternatives with constant near-instantaneous calculations—factoring speed, distance, and traffic to determine the optimum route from A to B. Then the application uses AI to convert numeric code into natural language telling the driver to turn left or right. AI computations and algorithms are also used to spot finite changes in stock pricing and generate automatic sales and purchases of stock as well as spot anomalies that generate automatic sales and purchases. All of this is based on algorithms created and initiated by humans, but programmed to act autonomously and automatically, because the calculations are too large and the margins and speed too small and too fast for humans to keep pace and make decisions in real time. Of course, as one trader’s algorithm gets faster, the next trader must either change his algorithm’s design, its speed, or both, to achieve advantage. AI machine learning and pattern recognition are also used for translation, logistics planning, and spam detection. The beat goes on. All of which is why in 2017, the former chief scientist for Baidu, Andrew Ng, declared AI “the new electricity.”
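For the mapping example, the core route-sorting step can be sketched with a classic shortest-path algorithm (Dijkstra’s) over a toy road network; the nodes and travel times below are invented, and production navigation systems add live traffic, far larger graphs, and additional heuristics, repeating such calculations continuously as conditions change.

```python
# A minimal route-finding sketch: Dijkstra's algorithm over a toy road network
# where edge weights stand in for current travel times in minutes. Illustrative only.
import heapq

roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

def fastest_route(start: str, goal: str):
    queue = [(0, start, [start])]                 # (elapsed minutes, current node, path so far)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)   # always expand the cheapest partial route first
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, minutes in roads[node].items():
            heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return float("inf"), []

print(fastest_route("A", "D"))   # (8, ['A', 'C', 'B', 'D'])
```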
Perhaps the most prominent illustration of next generation AI is the driverless car. AI empowers driverless cars by performing a myriad of data input and output tasks simultaneously, like a driver does. But in a different way. Human drivers rely on intuition, instinct, experience, and rules to drive—all at once, it seems, using the actual neural networks of the brain. In the case of driverless cars, sensors instantaneously feed computers data based on speed, conditions, images, and so on, of the sort ordinarily processed by the driver’s eyes and brain. The car’s software processes the data to determine the best outcome based on probabilities and based on what it has been programmed to understand and decide. This requires constant algorithmic calculations of the sort a human actor could not make in real time if humans relied on math to drive cars.
This is impressive stuff. But it is still narrow AI. And, in its current state, it comes with significant shortcomings. The driverless vehicle can make calculated choices based on sensors, pattern recognition, and calculations, but it cannot make moral or ethical choices. And it can only make calculated choices based on what it has been taught, rather than what it can intuit.
Chatham House scholar M. L. Cummings describes this process:
… estimating or guessing what other drivers will do is a key component of how humans drive, but humans do this with little cognitive effort. It takes a computer significant computation power to keep track of all these variables while also trying to maintain and update its current world model. Given this immense problem of computation, in order to maintain safe execution times for action, a driverless car will make best guesses based on probabilistic distributions. In effect, therefore, the car is guessing which path or action is best, given some sort of confidence interval.17
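In code, the kind of “best guess, given some sort of confidence interval” that Cummings describes might be reduced to something like the following; the candidate maneuvers, probabilities, and threshold are invented for illustration and do not reflect any actual vehicle’s software.

```python
# A minimal sketch of a "best guess under a confidence threshold" choice.
# Candidate maneuvers and their estimated probabilities of avoiding a collision are invented.
candidate_paths = {
    "brake_hard":  0.92,
    "swerve_left": 0.81,
    "hold_course": 0.40,
}
CONFIDENCE_THRESHOLD = 0.90   # assumed policy: act only on guesses above this level

best_path, best_prob = max(candidate_paths.items(), key=lambda kv: kv[1])
if best_prob >= CONFIDENCE_THRESHOLD:
    print(f"execute {best_path} (confidence {best_prob:.0%})")
else:
    print("fall back to the safest default maneuver")
```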
If a ball bounces into the road between two parked vehicles and an AI-enabled self-driving vehicle has not been “trained” to identify the ball-between-two-cars pattern, it will not intuitively surmise, as a human would, that a child may soon follow the ball into the street.
AI philosophers prefer a different example to explain the moral and other limitations in current AI—the ubiquitous crosswalk dilemma or “trolley problem,” a famous ethical thought experiment. The scenarios vary. Imagine two persons entering a crosswalk: one a bank robber fleeing the scene of a crime, the other a pregnant woman running after a child. Here comes a car or trolley. The driverless vehicle AI is likely to calculate what to do based on mathematical inputs that might predict the course with the highest probability of avoiding both individuals, and if that is not possible, to be certain to avoid at least one of the individuals, likely seeing the pedestrians as having equivalent value. But the calculations will be based on what is already embedded in the machine’s software and training data, not the new contextual information on site, in the moment, about the characteristics of the people within the crosswalk. Engineers refer to AI that lacks this sort of situational awareness and flexibility as being “brittle.” In contrast, a human driver, if alert, will adjust and select a new course of action based on experience, judgment, intuition, and moral choice involving the actual pedestrians, erring we assume on the side of missing the pregnant woman at the risk of hitting the bank robber.
Alas, there are real-world examples of this problem. In 2018, a driverless Uber test vehicle hit and killed a bicyclist at night on a Phoenix road. There was no moral dilemma to address; the vehicle’s sensors and computer failed because they were not trained to identify a bicycle at night. According to press reports, the computer initially classified the person and bike as “an unrecognized object,” apparently without reference to the human on board. The vehicle eventually sought to stop, but not in time. Neither did the human safety driver in the test vehicle respond in time.18 No wonder there are strongly held views about the safety of driverless cars; proponents seek to deter anecdotal reasoning and invite consideration of trend lines and safety percentages in comparison to human drivers. The case is presented here not to take sides in the driverless car debate, but because it illustrates a present weakness in narrow AI. Policymakers and lawyers should now imagine how this lack of situational awareness might affect military applications of AI.
Perhaps the most compelling (and successful) illustrations of supervised machine learning enabled by narrow AI come in the area of medicine. Two examples illustrate. In India, diabetes is a significant cause of blindness. Seventy million Indians are diabetic and thus at risk of diabetic retinopathy, which can cause blindness. The condition is treatable if identified before the onset of blindness. However, the ophthalmologist-to-patient ratio in India is on the order of eleven doctors for every million people. One solution is to prescreen the populace using an AI application designed to detect retinal patterns in the eye that presage the onset of diabetes-induced blindness. The limited number of doctors can then focus on the prescreened high-risk patients and bypass the others. As a result, Google is testing an AI application in India to screen for diabetic retinopathy. The patient data is compared to centralized data in the United States. While the system has been approved for use in Europe, it is pending approval in India and the United States. There remain concerns about false positives and the longer-term validation of accuracy.19
Three points emerge. First, this is the centaur model at work. The centaur is usually better than the machine or the human acting alone. It is similar to the way common-law judges approach criminal confessions. Confessions offer powerful and often conclusive evidence of criminal conduct and guilt. However, for reasons beyond the scope of this book, there are psychological and other reasons why some persons falsely confess. There are also instances where confessions are the product of interrogation pressure, perhaps coercion, and therefore are less reliable, or not reliable, as a determinant of culpability. As a result, the law requires that any confession be corroborated by independent evidence—something more—like a human validating an AI result.
If one prefers a national security example of the centaur model, consider how the intelligence community uses polygraphs as a counterintelligence tool, but rarely a counterintelligence determinant. Polygraph machines measure physiological indicators, such as pulse, blood pressure, and respiration that can be associated with deception. However, these indicia can also manifest fear and stress. Proponents of polygraphs argue that the machine is only as good as the operator who is, or is not, specially trained to distinguish between deceptive indication and stress. Courts are not so sure. The Supreme Court has stated,
… there is simply no consensus that polygraph evidence is reliable. To this day, the scientific community remains extremely polarized about the reliability of polygraph techniques.… Rule 707—excluding polygraph evidence in all military trials—is a rational and proportional means of advancing the legitimate interest in barring unreliable evidence.20
Here is the point. When used effectively, polygraphs are a supplement to human judgment. They are a tool that might prompt recipients to disclose information or help agencies determine who warrants additional scrutiny. Once again, the centaur model.
Second, empirical studies show that AI-enabled machines identify and locate a higher percentage of cancerous tumors than do radiologists reviewing the same images. It does not take a specialist to recognize how this capacity could apply to satellite analysis or cyber-malware detection. This is an area where machine optimization in spotting and recognizing patterns, as well as anomalies in patterns, demonstrates the realized potential of AI to serve a greater good, beyond shopping algorithms. It also illustrates the potential impact AI will have on the workforce, including white-collar workforce, as well as the sometime tension and difference between what it means to augment human capacity and what it means to replace it. Ask the question: would you be prepared to learn you have cancer from a machine, alone? Or would you want to validate the results of an AI-analyzed scan with a second opinion from a trusted doctor? And would you not want that doctor to convey to you not just the probability that you will live or die, but also the hope that comes from the touch of a human hand and the knowledge of your personal circumstances? The distinction between human intelligence and narrow AI may, in the end, rest on the difference between what we measure with IQ and what we measure with EQ. Perhaps emotional intelligence is the essential human trait.
Third, human actors and decisionmakers who rely on AI should always ask: what is missing? Machines can now act with certain attributes of intelligence and outperform humans at certain intelligent tasks. The list is growing. “It is expected that machines will reach and exceed human performance on more and more tasks.”21 But is something missing? Here are some possibilities: intuition, compassion, creativity, and judgment. Judgment lets us fill in the gaps between experience and what is new. It also allows us to address competing norms and interests and make rational choices. These are also the traits that inform situational awareness allowing humans to adapt to the unexpected, unknown, or changing circumstances.
Let’s look at intuition. There is no doubt that a trained machine can usually better identify objects in pictures. It sees more in depth and in breadth. It can see everything at once and it can break the picture down into discrete quadrants. It sees patterns and pixels the human eye cannot, all instantaneously. But narrow AI does not know what it does not know, and, at present, it lacks the intuition to find out, other than through brute force matching. Once again, M. L. Cummings:
Expertise leverages judgment and intuition as well as the quick assessment of a situation, especially in a time-critical environment such as weapons release.… In humans, the ability to cope with the highest situations of uncertainty is one of the hallmarks of the true expert, but in comparison such behavior is very difficult for computers to replicate.22
The point is illustrated with reference to a famous experiment in a different field—the gorilla experiment. In the experiment, a set of observers is asked to watch two groups of people who are busy passing an object back and forth in an office-type environment. They are told to focus on the object. At some point during the experiment, a person dressed in a gorilla suit enters the office and walks behind the people passing the object back and forth; in other words, an anomalous event or pattern occurs. When asked to record everything they observed during the experiment, only half the participants report observing the gorilla. The experiment was used to study the tendency of the mind to focus, and the cognitive bias that occurs, when a person is focusing on one task to the exclusion of others. Paul Slovic, professor of psychology at the University of Oregon, calls this the “prominence effect.” But it illustrates a point about narrow AI as well. An AI-enabled machine would not miss the gorilla. What is more, AI could have generated a ten-second microfilm of all that occurred during the course of the hours-long experiment, allowing a human to skip the boredom and fatigue of downtime and easily notice the gorilla walk across the stage. At the same time, the machine, unless it is trained to identify gorillas, would not identify the gorilla as a gorilla. In contrast, a human would intuitively determine that the object was a gorilla, like a gorilla, or an ape, based on life experience and perhaps having once seen a picture of a gorilla. One might call this judgment. In short, humans are less able to detect the anomaly but more capable of interpreting its meaning, at least at this time.
Likewise, a machine can be programmed to repaint the “Mona Lisa” as “Paint by Numbers,” a 1950s arts-and-crafts technique. It could also be programmed and trained to find and mix the exact hues of oils, and age them, to mimic the “Mona Lisa” canvas. It might even pass the forger’s test. But could AI conceive of the “Mona Lisa”? Would AI, on its own, wonder if Mona Lisa was smiling or why? It raises the rhetorical question Whitney Griswold asked in 1957: Is there something divine about artistic creation? Or, can creativity and imagination be learned, taught, or programmed? What if a machine had access to the internet and the capacity to draw on all the world’s accumulated knowledge and know-how? Or all the world’s art? Could it then create the “Mona Lisa”? We can imagine Griswold’s response. There is something more to creativity than knowledge, computation, and code. Although writing in a different time and context, Griswold asked rhetorically,
Could Hamlet have been written by a committee, or the “Mona Lisa” painted by a club? Could the New Testament have been composed as a conference report? Creative ideas do not spring from groups. They spring from individuals. The divine spark leaps from the finger of God to the finger of Adam, whether it takes ultimate shape in a law of physics or a law of the land, a poem or a policy, a sonata or a mechanical computer.23
Why is this important? Because one of the first topics of debate about AI is whether the next category of AI will bridge this divide and allow machines to effectively teach themselves to create, for example, by identifying objects they have not been trained to see before, and performing tasks they have not been programmed to perform.
The debate highlights a string of essential philosophical, technical, and legal questions about AI. What does it mean to be human? Are there inherently human functions that cannot, or should not, be replicated by code-driven machines? If so, what are those attributes, traits, or functions? And if they can be replicated, are there any traits or attributes we should regulate or prohibit? If it can be done, should we allow it to be done, and if so, subject to what substantive and procedural limitations? Identifying these attributes, if any, allows us to understand what we might be giving up, or risking, by turning certain functions over to AI-enabled machines with and without human control.
A THIRD WAVE?
AI has blossomed with machine learning, which is why many experts contend we are skipping spring and entering an AI summer. The National Artificial Intelligence Research and Development Strategic Plan (2016) described the first wave of AI machine learning as if-then linear learning. That is, a process of AI that relies on the brute force computational power of today’s computers. The computer is, in essence, “trained” that if something occurs, then it should take a countervailing or corresponding step. This is essentially how the IBM computer Deep Blue beat Garry Kasparov in chess in 1997, a significant AI milestone. The computer was optimizing its computational capacity to sort through and weigh every possible move in response to each of Kasparov’s actual and potential moves, through the end of the game. It did so with the knowledge of all of Kasparov’s prior games, while on the clock in real time. But Deep Blue was a display of computational force, an endless and near-instantaneous series of if-this-then-that questions and calculations. Watson would go on to defeat Ken Jennings in Jeopardy! in 2011, using much the same method.
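The if-this-then-that search Deep Blue performed can be illustrated, in drastically simplified form, with a few lines of minimax search over a toy game tree; the tree and its scores are invented, and Deep Blue’s actual engine evaluated on the order of hundreds of millions of chess positions per second.

```python
# A minimal game-tree sketch of brute-force "if this move, then that reply" search,
# written as plain minimax over a toy tree. Illustrative only; not IBM's engine.
def minimax(node, maximizing: bool) -> int:
    if isinstance(node, int):            # leaf: a scored end position
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Each inner list is a set of possible replies; the integers score the final positions.
toy_game = [
    [3, 5],       # if we play move A, the opponent steers toward the 3
    [2, 9],       # if we play move B, the opponent steers toward the 2
    [0, 1],       # if we play move C, the opponent steers toward the 0
]
best = max(range(len(toy_game)), key=lambda i: minimax(toy_game[i], maximizing=False))
print(f"best opening move: {'ABC'[best]}")   # move A: the opponent's best reply still leaves us 3
```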
The report describes the second wave of AI as “machine learning.” That is where we were in 2016 and where we are in 2020. The current benchmark of the second wave of AI is AlphaGo—the Google computer that beat the world’s best Go player in 2016. This was a milestone beyond Deep Blue’s mastery of chess and Watson’s mastery of Jeopardy!, not just because Go is a more complex, multidimensional game, but because AlphaGo won using unsupervised learning. It got better at the game through experience and by adjusting its own decisional weights internally—in the black box—without training data or other if-this-then-that learning. This represented more than brute force computational power; it was a machine optimizing its capacity. Thinking? No. Learning? Yes.
Experts are now debating whether we are entering a third wave of AI and machine learning. This is what the National Artificial Intelligence Research and Development Strategic Plan said in October 2016 on the prospect of a third wave of AI development:
The AI field is now in the beginning stages of a possible third wave, which focuses on explanatory and general AI technologies … If successful, engineers could create systems that construct explanatory models for classes of real-world phenomena, engage in natural communication with people, learn and reason as they encounter new tasks and situations, and solve novel problems by generalizing from past experience.24
Imagine a computer linked to the internet, the Cloud, and the IoT. Next imagine that the computer is not programmed to play chess or Go, a single task and limitation, but is programmed to solve problems or answer questions generally. It moves fluidly from one task to the next. Now consider that if a computer can do that, it could not only write code, which computers can do now, but it could rewrite, improve, and change its own code. It might do this to optimize the task it was originally programmed to perform and find new and unanticipated paths to optimize the execution of assigned tasks. The questions with general AI are: whether and when?
In 2015 and 2016, a group of scholars associated with the Oxford Future of Humanity Institute, AI Impacts, and Yale University surveyed “all researchers who published at the 2015 NIPS and ICML [Workshop on Neural Information Processing Systems and International Conference on Machine Learning] conferences (two of the premier venues for peer-reviewed research in machine learning).” The survey asked respondents to estimate when HLMI would arrive. The 2018 study did not define AGI or SAI, but stipulated that “Human-level machine learning is achieved when unaided machines can accomplish every task better and more cheaply than human workers.” Three-hundred and fifty-two researchers responded, a return rate of 21 percent. The results ranged across the board from never to beyond a hundred years. What is noteworthy is that the “aggregate forecast gave a 50 percent chance of HLMI occurring within forty-five years and a 10 percent chance of it occurring within nine years.” Most optimistic—or alarming, depending on one’s perspective—the median response for the two countries with the most respondents in the survey, the United States and China, was seventy-six years for the American respondents and twenty-eight years for the Chinese respondents.25
TAKEAWAYS
Having defined AI, reviewed its constituent parts, and explored some strengths and weaknesses of narrow AI, one might now ask what policymakers and lawyers should know and understand to better imagine and apply AI for national security benefit as well as mitigate its risks. Here are eight takeaways.
Narrow (contemporary) AI is best at pattern recognition, classifying, and predicting. AI applications can detect patterns exponentially faster and more reliably than humans in many cases, as well as detect patterns humans cannot. That makes AI a preferred tool for facial recognition, voice recognition, navigation, data mining, and natural language conversion, among other things. AI is also an essential component for enabling autonomous systems, such as smart grids, robots, and unmanned vehicles. If you want a sense of how good or bad AI is at the moment, consider how good shopping algorithms and other push algorithms are at predicting your behavior. Now ask yourself to what extent you would be willing to rely on a shopping-quality algorithm to make national security decisions or inform those decisions.
Narrow (contemporary) AI is not (yet?) good at tasks that require or involve situational awareness, moral choice, intuition, judgment, empathy, or creativity. Specialists refer to AI as brittle—not able to adjust to new situations or react to unforeseen circumstances. Thus, current AI is better at limited tasks, such as follow-the-leader logistics trains, than exercising independent choice such as distinguishing between potential military targets or, for that matter, persons in a crosswalk. If policymakers want a sense of when AI might be nimble enough to handle swarm deployments, they might watch for the first AI application to successfully master a video game, such as StarCraft II, without extensive reinforcement learning, as well as switch from game to game.26
Mastering the ’ilities. In order for narrow AI to get better at the things it is good at as well as the tasks it is not good at, engineers must overcome what the 2017 MITRE Corporation JASON Report, “Perspectives on Research in Artificial Intelligence and Artificial General Intelligence Relevant to DOD,” identified as the “so-called ’ilities,” which “are of particular importance to DOD applications.”27
Reliability
Maintainability
Accountability
Verifiability
Evolvability
Attackability, and
So forth.
In the “so forth” category, one might also include interoperability.
The centaur model. Until such time as the ’ilities are successfully addressed, the centaur model offers the only sure and effective model of operation. The challenge for policymakers, technologists, and lawyers is to move beyond human-in-the-loop bromides in defining law and policy. With AI, a human is always in the loop in some manner, for a human always writes the initial code and chooses to deploy an AI-enabled machine or capacity. The better question is when, where, and how should a human be involved in each specific AI function? The next question is, when a specific human is not directly in control, which humans should be held responsible or accountable for what occurs and based on what mechanism of adjudication and decision? If we do not get the human part of the centaur right, it may not matter how good the AI part is at performing its programmed task.
Experts do not know where this is all headed or how soon. You do not need to buy into the debate about artificial general intelligence and superintelligent artificial intelligence to realize that experts do not know where AI is ultimately headed. The debate itself evidences uncertainty as to the outcome, timeline, and milestones along the way.
AI predicts. It does not conclude. It is a statistical and computational tool. Current AI is all about calculating probabilities. As Chris Meserole of the Brookings Institution has written, “The core insight of machine learning is that much of what we recognize as intelligence hinges on probability rather than reason or logic.”28 That means that where AI is used, policymakers must ask such questions as: How accurate is the algorithm? What confidence threshold have decisionmakers applied? What is the false positive rate? What is the false negative rate? When the algorithm is wrong, why is it wrong?
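The arithmetic behind those questions is simple enough to show directly; the counts below are invented, but they illustrate how a system can report impressive overall accuracy while still missing a large share of the cases that matter.

```python
# A minimal sketch of the accuracy questions to ask of any classifier.
# The counts are invented for illustration.
true_positives, false_positives = 45, 15     # flagged correctly / flagged wrongly
true_negatives, false_negatives = 920, 20    # cleared correctly / missed entirely

total = true_positives + false_positives + true_negatives + false_negatives
accuracy = (true_positives + true_negatives) / total
false_positive_rate = false_positives / (false_positives + true_negatives)
false_negative_rate = false_negatives / (false_negatives + true_positives)

print(f"accuracy:            {accuracy:.1%}")             # 96.5% -- sounds impressive ...
print(f"false positive rate: {false_positive_rate:.1%}")  # ... yet 1.6% of innocent cases are flagged
print(f"false negative rate: {false_negative_rate:.1%}")  # ... and 30.8% of real cases are missed
```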
The importance of data. Prediction is one reason the internet, IoT, and data banks are so important to the development of AI and machine learning. Accuracy depends on the amount and quality of data, just as it depends on the algorithm used to derive meaning from data. The more data the better. Supervised or not, the more data that is fed to or known to a computer the more accurately it can adjust its coding weights to classify data and select the right outputs. This makes sense. A human who has never seen a cheetah can infer, based on having seen a domestic cat, that a cheetah is a type of cat. However, at present, a computer can only know what it is programmed to know. Thus the more pictures of different types of cats a supervised learning algorithm is exposed to the more likely it is to discern a shape, a color, or a pattern that is predictive of a cat when it is shown its first picture of a cheetah. But if the algorithm has only seen one picture of a domesticated cat, and hundreds of pictures of fur coats, it might predict that the picture of a cheetah is a fur coat. This is why the development of AI will depend in part on the procurement of data—overtly, covertly, and through synthetic means. Likewise, a facial recognition algorithm trained on predominantly male images, perhaps because the software engineer anticipates that most terrorists are or will be male, will be less apt at identifying female faces.
This means policymakers and lawyers must define the right and left boundaries of data collection, use, and retention in the private as well as public sectors and do so conscious of how security and privacy values are affected. In addition, intelligence analysts should watch who is collecting data and what data they are collecting. Lawyers should help to define the left and right boundaries of data collection and use, while technologists should determine if they can embed those boundaries in code.
AI has a low threshold of entry, but a high threshold for accuracy. Any business or government that is using an algorithm to process data is arguably using AI. A non-state actor programming an unmanned vehicle is likely using AI. In this sense, AI may become “the Kalashnikovs of tomorrow.”29 But strong AI, accurate AI, requires significant amounts of data, storage capacity, energy, sustained commitment, and financial resources.
Equipped with this background, the next two chapters consider the national security applications and implications of AI.