Game Boy extraordinaire
At the beginning of 2016 it was announced that a program had been created to play Go that its developers were confident could hold its own against the best humans had to offer. Go players around the world were extremely sceptical, given the failure of past efforts. So the company that developed the program offered a challenge. It set up a public contest with a huge prize and invited one of the world’s leading Go players to take up the challenge. An international champion, Lee Sedol from Korea, stepped forward. The competition would be played over five games with the winner taking home a prize of one million dollars. The name of Sedol’s challenger: AlphaGo.
AlphaGo is the brainchild of Demis Hassabis. Hassabis was born in London in 1976 to a Greek Cypriot father and a mother from Singapore. Both parents are teachers and what Hassabis describes as bohemian technophobes. His sister and brother went the creative route, one becoming a composer, the other choosing creative writing. So Hassabis isn’t quite sure where his geeky scientific side came from. But as a kid Hassabis was someone who quickly marked himself out as gifted and talented, especially when it came to playing games. His abilities at chess were such that at eleven he was the second-highest-ranked child of his age in the world.
But then at an international match in Liechtenstein that year Hassabis had an epiphany: what on earth were they all doing? The hall was full of so many great minds exploring the logical intricacies of this great game. And yet Hassabis suddenly recognised the total futility of such a project. In a radio interview on the BBC he admitted thinking at the time: ‘We were wasting our minds. What if we used that brain power for something more useful like solving cancer?’
His parents were pretty shocked when after the tournament (which he narrowly lost after battling for ten hours with the adult Dutch world champion) he announced that he was giving up chess competitions. Everyone had thought this was going to be his life. But those years playing chess weren’t wasted. A few years earlier he’d used the £200 prize money he’d won for beating a US opponent, Alex Chang, to buy his first computer: a ZX Spectrum. That computer sparked his obsession with getting machines to do the thinking for him.
Hassabis soon graduated on to a Commodore Amiga, which could be programmed to play the games he enjoyed. Chess was still too complicated, but he managed to program the Commodore to play Othello, a game that looks rather similar to Go with black and white stones that get flipped when they are trapped between stones of the opposite colour. It’s not a game that merits grandmasters, so he tried his program out on his younger brother. It beat him every time.
This was classic ‘if …, then …’ programming: he needed to code in by hand the response to each of his opponent’s moves. It was: ‘If your opponent plays that move, then reply with this move.’ The creativity all came from Hassabis and his ability to see what the right responses were to win the game. It still felt a bit like magic though. Code up the right spell and then, rather like the Sorcerer’s Apprentice, the Commodore would go through the work of winning the game.
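The flavour of that style is easy to sketch. Here is a minimal illustration in Python (standing in for whatever Hassabis actually wrote on the Commodore, which isn’t specified); the moves in the table are invented placeholders rather than genuine Othello strategy.

```python
# Classic 'if ..., then ...' game programming: every reply is authored
# by the programmer, so all the creativity lives in this lookup table.
# The moves below are invented placeholders, not real Othello strategy.

RESPONSES = {
    "d3": "c5",  # if the opponent plays d3, then reply c5
    "c4": "e3",
    "f5": "f6",
}

def reply_to(opponent_move: str) -> str:
    # Fall back to passing when no hand-written rule matches.
    return RESPONSES.get(opponent_move, "pass")

print(reply_to("d3"))  # -> c5
```

Every branch the program might face has to be anticipated by its author, which is why this approach could cope with Othello but buckled under the complexity of chess, let alone Go.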
Hassabis raced through school, culminating in an offer from Cambridge to study computer science at the age of sixteen. He’d set his heart on Cambridge after seeing Jeff Goldblum in the film The Race for the Double Helix. ‘I thought, is this what goes on at Cambridge? You go there and you invent DNA in the pub? Wow.’
Cambridge wouldn’t let him start his degree at the age of sixteen, so he had to defer for a year. To fill his time he won a place working for a game developer after having come second in a competition run by Amiga Power magazine. While he was there, he created his own game, Theme Park, where players had to build and run their own theme park. The game was hugely successful, selling several million copies and winning a Golden Joystick award. With enough funds to finance his time at university, Hassabis set off for Cambridge.
His course introduced him to the greats of the AI revolution: Alan Turing and his test for intelligence, Arthur Samuel and his program to play draughts, John McCarthy, who coined the term artificial intelligence, Frank Rosenblatt and his first experiments with neural networks. These were the shoulders on which Hassabis aspired to stand. It was while sitting in his lectures at Cambridge that he heard his professor repeating the mantra that a computer could never play Go because of the game’s creative and intuitive characteristics. This was like a red rag to the young Hassabis. He left Cambridge determined to prove his professor wrong.
His idea was that rather than trying to write a program himself that could play Go, he would write a meta-program that would be responsible for writing the program that would play Go. It sounded a crazy idea, but the point was that the meta-program would be created so that as the Go-playing program played more and more games it would learn from its mistakes.
Hassabis had learned about a similar idea implemented by the artificial-intelligence researcher Donald Michie in the 1960s. Michie had written an algorithm called ‘MENACE’ that learned from scratch the best strategy to play noughts and crosses. (MENACE stood for Machine Educable Noughts And Crosses Engine.) To demonstrate the algorithm, Michie had rigged up 304 matchboxes representing all the possible layouts of noughts and crosses encountered while playing. Each matchbox was filled with different-coloured balls to represent possible moves. Balls were removed or added to the boxes to punish losses or reward wins. As the algorithm played more and more games, the reassignment of the balls eventually led to an almost perfect strategy for playing. It was this idea of learning from your mistakes that Hassabis wanted to use to train an algorithm to play Go.
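To see the mechanics without the matchboxes, here is a minimal MENACE-style sketch in Python; the initial bead counts and the sizes of the reward and punishment are illustrative choices, not Michie’s exact tuning.

```python
import random
from collections import defaultdict

# MENACE in miniature: each board state is a 'matchbox' holding beads
# for every legal move. Moves are chosen with probability proportional
# to their bead count; wins add beads, losses remove them, so winning
# moves gradually dominate. (Bead and reward numbers are illustrative.)

boxes = defaultdict(dict)  # board state -> {move: bead count}

def choose_move(state, legal_moves):
    box = boxes[state]
    for move in legal_moves:
        box.setdefault(move, 3)  # seed each unseen move with three beads
    moves, beads = zip(*box.items())
    return random.choices(moves, weights=beads)[0]

def update(history, won):
    # history: the (state, move) pairs played during one finished game
    for state, move in history:
        if won:
            boxes[state][move] += 3  # reward: add beads to each box used
        else:
            # punish: remove a bead, keeping at least one so the move
            # is never ruled out entirely (a simplification of Michie)
            boxes[state][move] = max(1, boxes[state][move] - 1)
```

Played over enough games, the bead counts converge on something close to a perfect strategy: exactly the learning-from-mistakes loop Hassabis wanted to scale up to Go.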
Hassabis had a good model to base his strategy on. A newborn baby does not have a brain that is pre-programmed to cope with making its way through life. It is programmed instead to learn as it interacts with its environment.
If Hassabis was going to tap into the way the brain learned to solve problems, then knowing how the brain works was clearly going to help in his dream of creating a program to play Go. So he decided to do a PhD in neuroscience at University College London. It was during coffee breaks from lab work that Hassabis started discussing with a neuroscientist, Shane Legg, his plans to create a company to try out his ideas. It says something about the low status of AI even a decade ago that they never admitted to their professors their dream of dedicating their lives to it. But they felt they were on to something big, so in September 2010 the two scientists decided to create a company with Mustafa Suleyman, a friend of Hassabis from childhood. DeepMind was incorporated.
The company needed money, but initially Hassabis just couldn’t raise any capital. A pitch built on playing games as a route to solving intelligence did not sound serious to most investors. A few, however, did see the vision. Among those who put money in right at the outset were Elon Musk and Peter Thiel. Thiel had never invested outside Silicon Valley and tried to persuade Hassabis to relocate to the West Coast. A born-and-bred Londoner, Hassabis held his ground, insisting that there was more untapped talent in London that could be exploited. Hassabis remembers a crazy conversation he had with Thiel’s lawyer. ‘Does London have law on IP?’ she asked innocently. ‘I think they thought we were coming from Timbuctoo!’ The founders had to give up a huge amount of stock to the investors, but they had their money to start trying to crack AI.
The challenge of creating a machine that could learn to play Go still felt like a distant dream. They set their sights at first on a seemingly less cerebral goal: playing 1980s Atari games. Atari is probably responsible for a lot of students flunking courses in the late 1970s and early 1980s. I certainly remember wasting a huge amount of time playing the likes of Pong, Space Invaders and Asteroids on a friend’s Atari 2600 console. The console was one of the first whose hardware could play multiple games that were loaded via a cartridge. It allowed a whole range of different games to be developed over time. Previous consoles could only play games that had been physically programmed into the units.
One of my favourite Atari games was called Breakout. A wall of coloured bricks sat at the top of the screen and you controlled a paddle at the bottom that could be moved left or right using a joystick. A ball would bounce off the paddle and head towards the bricks. Each time it hit a brick, the brick would disappear. The aim was to clear the bricks. The yellow bricks at the bottom of the wall scored one point; the red bricks at the top got you seven points. As you cleared bricks, the paddle would shrink and the ball would speed up, making the game progressively harder.
We were particularly pleased one afternoon when we found a clever way to hack the game. If you dug a tunnel up through the bricks on the edge of the screen, once the ball made it through to the top it bounced back and forward off the top of the screen and the upper high-scoring bricks, gradually clearing the wall. You could sit back and watch until the ball eventually came back down through the wall. You just had to be ready with the paddle to bat the ball back up again. It was a very satisfying strategy!
Hassabis and the team he was assembling also spent a lot of time playing computer games in their youth. Their parents may be happy to know that the time and effort they put into those games did not go to waste. It turned out that Breakout was a perfect test case to see if the team at DeepMind could program a computer to learn how to play games. It would have been a relatively straightforward job to write a program for each individual game. Hassabis and his team were going to set themselves a much greater challenge.
They wanted to write a program that would take as its input the state of the pixels on the screen and the current score, and would play with the goal of maximising that score. The program was not told the rules of the game: it had to experiment randomly with different ways of moving the paddle in Breakout or firing the laser cannon at the descending aliens of Space Invaders. Each time it made a move it could assess whether that move had helped increase the score or had had no effect.
The code implements an idea dating from the 1990s called reinforcement learning, which aims to update the probability of actions based on their effect on a reward function, or score. For example, in Breakout the only decision is whether to move the paddle at the bottom left or right. Initially the choice will be 50:50. But if moving the paddle randomly results in it hitting the ball, then a short time later the score goes up. The code then recalibrates the probability of going left or right based on this new information, increasing the chance of moving the paddle towards the ball. The new feature was to combine this learning with neural networks that would assess the state of the pixels and decide which features correlated with the increase in score.
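A stripped-down sketch of that recalibration, leaving out the neural network entirely, might look like the following Python. The two-action table and the learning rate are illustrative assumptions; DeepMind’s actual agent learned action values with a deep network reading the raw pixels.

```python
import random

# A toy version of the learning rule described above: two actions, each
# with a weight, chosen with probability proportional to that weight.
# When an action is soon followed by the score going up, its weight is
# nudged upwards, shifting the initial 50:50 split towards what works.
# (The 0.1 learning rate is an illustrative assumption.)

weights = {"LEFT": 1.0, "RIGHT": 1.0}  # start at 50:50

def choose_action():
    actions, w = zip(*weights.items())
    return random.choices(actions, weights=w)[0]

def reinforce(action, reward, lr=0.1):
    # Increase the chosen action's weight in proportion to the reward
    # that followed it.
    weights[action] += lr * reward

# One imagined step: the paddle happened to move towards the ball, hit
# it, and the score went up by one a moment later.
action = choose_action()
reinforce(action, reward=1.0)
```

In the real system the table of weights is replaced by a neural network, which is what let the same program move from the pixels of Breakout to the pixels of Space Invaders.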
At the outset, because the computer was just trying random moves, it was terrible, hardly scoring anything. But each time it made a random move that bumped up the score, it would remember the move and reinforce its use in future. Gradually the random moves disappeared and a more informed set of moves began to emerge: moves the program had learned, through experiment, seemed to boost its score.
It’s worth watching the supplementary video the DeepMind team included in the paper they eventually wrote. It shows the program learning to play Breakout. At first you see it randomly moving the paddle back and forward to see what will happen. Then, when the ball finally hits the paddle, bounces back and hits a brick, and the score goes up, the program starts to rewrite itself. If the pixels of the ball and the pixels of the paddle connect, that seems to be a good thing. After 400 games it’s doing really well, getting the paddle to continually bat the ball back and forward.
The shock for me came when you see what it discovered after 600 games. It found our hack! I’m not sure how many games it took us as kids to find this trick, but judging by the amount of time I wasted with my friend it could well have been more. But there it is. The program manipulated the paddle to tunnel its way up the sides, such that the ball would be stuck in the gap between the top of the wall and the top of the screen. At this point the score goes up very fast without the computer’s having to do very much. I remember my friend and I high-fiving when we’d discovered this trick. The machine felt nothing.
By 2014, four years after the creation of DeepMind, the program had learned how to outperform humans on twenty-nine of the forty-nine Atari games it had been exposed to. The paper the team submitted to Nature detailing their achievement was published in early 2015. To be published in Nature is one of the highlights of a scientist’s career. But their paper achieved the even greater accolade of being featured as the cover story of the whole issue. The journal recognised that this was a huge moment for artificial intelligence.
It has to be reiterated what an amazing feat of programming this was. From just the raw data of the state of the pixels and the changing score, the program had gone from randomly moving the paddle of Breakout back and forth to learning that tunnelling up the sides of the wall would win you the top score. But Atari games are hardly on a par with the ancient game of Go. Hassabis and his team at DeepMind decided they were ready to create a new program that could take it on.
It was at this moment that Hassabis decided to sell the company to Google. ‘We weren’t planning to, but three years in, focused on fundraising, I had only ten per cent of my time for research,’ he explained in an interview in Wired at the time. ‘I realised that there’s maybe not enough time in one lifetime to both build a Google-sized company and solve AI. Would I be happier looking back on building a multi-billion business or helping solve intelligence? It was an easy choice.’ The sale put Google’s firepower at his fingertips and provided the space for him to create code to realise his goal of solving Go … and then intelligence.