A Framework for Scientific Discovery through Video Games - Seth Cooper
3 Framework
3.1 Introduction
This chapter introduces a general framework for scientific discovery games. We present guidelines for mapping a scientific problem into a game, and address the often conflicting goals of engagement and scientific relevance. The driving example is Foldit, a game for scientific discovery in biochemistry. We describe the architecture of the game. The architecture is flexible and able to coevolve, along with the game’s players, to improve as a tool. We discuss the teaching and reward structures in the game, intended to appeal to a wide variety of players, regardless of biochemistry background.
A scientific discovery game translates a class of computationally difficult scientific problems into puzzles, and provides a game-like mechanism for non-scientist players to help solve these problems. Many traditional aspects of game design apply to scientific discovery games, including the design of introductory levels to draw newcomers and explain game mechanics, the use of a client-server architecture for competition and collaboration, and the requirement that the game be fun. However, unlike games whose goal is entertainment or education, scientific discovery games introduce a unique challenge: enabling non-scientist natural problem solvers to advance a specific scientific domain. This challenge influences all aspects of the game design. First, visualization and graphics need to promote human ability to see complex solutions and convey accurate scientific information while remaining accessible to beginners. Second, interaction design must optimize for natural interactions suitable for the human exploration process, while still respecting scientific constraints. Finally, the scoring mechanism needs to be informative enough to promote multiple human strategies, while remaining true to the latest models of the underlying scientific phenomenon. Perhaps the most distinguishing feature and the greatest difficulty of design for this type of game is that the solution to the scientific problem, and thus the solution to the corresponding puzzles, is unknown. Since we do not know the solution a priori, we cannot design the game with specific solutions in mind.
Figure 3.1 Foldit webpage. The front page shows recent news about the game, the top players and groups for the current puzzles, and allows the player to log in.
To explore this space, we focused on human ability to reason about 3D structures and on the biochemistry domain, where many problems tend to be structural. We developed Foldit, a biochemical discovery game. In this chapter, we discuss the framework for Foldit’s design, with emphasis on the game’s initial focus on protein structure prediction—determining a protein’s shape given its sequence of constituent amino acids. Protein structure prediction involves finding favorable interactions that form when the protein’s chemical groups come into contact—essentially a 3D jigsaw puzzle. We believe that humans’ innate spatial reasoning ability makes it possible for non-scientists to make useful contributions to this problem. We leverage scientists’ knowledge to shape the rules of the game, thus enabling a much larger pool of non-scientists to make discoveries within this framework.
The webpage for Foldit is located at http://fold.it. The front page is shown in Figure 3.1. Foldit was publicly released in May 2008. During the first two years following release, we ran roughly 600 structure prediction puzzles and had over 57,000 players from a wide variety of backgrounds participate.
The rest of this chapter describes our experience designing Foldit, with a special emphasis on the unique challenges posed by making biochemistry problems accessible to anyone. The creation of Foldit was a challenging and multidisciplinary project, drawing together computer science, art, game design and biochemistry. Moreover, we did not know ahead of time which parts of the problem players would be best at solving, or which in-game manipulation tools they would use most effectively. The only way to find out was to have people play Foldit. In order to deal with these and other uncertainties, we took an iterative approach both before and after releasing the game to the public. We have continually evolved the gameplay in response to massive gameplay traces, player feedback and scientists’ analysis, and continue even now with this iterative process as we add features and expand the set of biochemical problems to which the Foldit community can contribute.
Games are often designed with an iterative approach, which involves designing, testing, and evaluating repeatedly until the player’s experience meets some criteria [Fullerton 2008]. For most games, the main criterion for the player’s experience is simply to have fun. Player feedback and playtesting are an integral part of the process, and there are a number of methods of gathering and incorporating this information from players [Ambinder 2009]. We have also continued the design process after the game’s release, to incorporate data gathered from the players in a continual process of evolutionary redesigning [Kennerly 2003]. Our work differs from the standard iterative approach in that the game design space is constrained to conform with existing physical models, we include the input of scientists in the evaluation of the game, and we include the long-term coevolution of the players and game in the design.
3.2 Biochemistry Background
Here we provide some background on biochemistry and proteins that will be used throughout the rest of this work.
DNA, a cellular chemical perhaps more widely recognized than proteins, derives its entire purpose from encoding protein sequences. Proteins are coded for by DNA, and are created in the cell as a long chain of amino acids. A protein’s amino acid sequence is known as its primary structure. There are twenty different types of amino acids. Regardless of type, some of the atoms making up the amino acid will be the same; these are connected together and form the protein’s backbone. However, the remaining atoms are different for each type; these extend outward from the backbone and are called sidechains. The atoms that make up the sidechains divide the amino acids into two main groups: hydrophobic, which prefer to be buried on the interior away from water; and hydrophilic, which prefer to be exposed on the exterior near water. These preferences impact how the protein folds. As the amino acids are connected together, the protein begins to fold up; after the amino acids join together, they are often called residues. Local characteristics of the fold are referred to as secondary structure. These include: helices, which are tightly coiled; sheets, which are extended straight; and loops, which are everything else. The positions of the atoms making up a folded protein constitute its tertiary structure; the tertiary structure taken in nature is a native structure. The native structure is the one that is lowest in free energy; that is, it has the most favorable set of chemical interactions. It is well known that sequence determines structure [Anfinsen 1973]. In this book, the term sequence will refer to a protein’s primary structure, and structure will refer to its tertiary structure, unless otherwise specified.
3.3 Framework Description
3.3.1 Architecture
Here we give an overview of the architecture of Foldit, which can be seen at a high level in Figure 3.2. Foldit uses a client-server architecture. Players must create an account and download the game in order to play. The game then communicates with a central server to send information about the local player and get information about other players.
Scientists post problems to the server; in the case of Foldit, these are protein structures for which the players are meant to find the native structures. An initial protein structure is associated with metadata such as a title and description, and parameterization such as which energy function terms to use. We call these puzzles, and they are posted on the server for a fixed amount of time (usually a week). While a puzzle is active, players can download it and interactively reshape the protein to try to achieve the best score. This often requires significant changes to the puzzle structures, which are given in various partially-folded states, and in some cases need to be completely refolded from a straight line. Players’ structures, or solutions, are reported back to the server, and players are ranked against other players who are playing the same puzzle. Players can form groups with which to share their solutions through the server, allowing them to work together to find even better solutions than they could working alone. When one player shares a solution by uploading it to the server, other players in the same group are able to see it and download it. The social aspect of the game is supported by in-game chat, a website with forums, and a player-created wiki. At the close of a puzzle, the solution data is aggregated and presented to the scientists for analysis.
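The puzzle lifecycle described above can be illustrated with a small Python sketch. It is not Foldit's server code: the class names (Puzzle, Solution), the field names, and the group lookup table are all hypothetical, and the one-week default merely mirrors the "usually a week" posting period mentioned in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Solution:
    """One structure reported back to the server (hypothetical record)."""
    player: str
    score: float
    shared_with_group: bool = False


@dataclass
class Puzzle:
    """A posted problem plus the solutions received while it is active."""
    title: str
    description: str
    opens: datetime
    duration: timedelta = timedelta(weeks=1)  # assumption: "usually a week"
    solutions: list = field(default_factory=list)

    def is_active(self, now):
        return self.opens <= now < self.opens + self.duration

    def submit(self, solution, now):
        if not self.is_active(now):
            raise ValueError("puzzle is closed")
        self.solutions.append(solution)

    def leaderboard(self):
        """Rank players by their best score on this puzzle, highest first."""
        best = {}
        for s in self.solutions:
            if s.player not in best or s.score > best[s.player]:
                best[s.player] = s.score
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    def shared_solutions_for(self, player, group_of):
        """Solutions that other members of the player's group have shared.

        group_of maps player name to group name (hypothetical lookup).
        """
        group = group_of.get(player)
        if group is None:
            return []
        return [
            s for s in self.solutions
            if s.shared_with_group
            and s.player != player
            and group_of.get(s.player) == group
        ]
```

In this sketch a player is ranked only against others on the same puzzle, and a shared solution is visible only inside the sharer's group, which matches the behavior described in the text at a very coarse level.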
The game is designed to be flexible, and the client allows automatic updating so that we can continually evolve the gameplay. The puzzle posting cycle and automatic updates allow us to respond to not only player feedback, but also to scientists’ analysis, as we introduce and refine gameplay elements.
Figure 3.2 Overview of architecture for scientific discovery games. The biochemistry team provides structure prediction and design problems for the server. These problems become puzzles and are sent to each player’s client. Players collaborate and compete to solve these problems and upload their solutions to the server, where they are aggregated and sent back to the biochemistry team for analysis. This analysis can then be used to improve the design of the game and puzzles. (Figure from Cooper et al. [2010b])
Foldit is built on top of the Rosetta molecular modeling suite, which has proven useful at a wide variety of protein modeling tasks [Rohl et al. 2004, Bradley et al. 2005, Qian et al. 2007, Kuhlman et al. 2003]. The suite contains an energy function which captures the interaction energies between protein elements, as well as a set of structural optimization subroutines. For protein structure prediction, structures closer to the native structure will have a lower energy than structures further away from it. Foldit uses this state-of-the-art energy function to compute players’ scores, and also takes advantage of the optimization routines Rosetta makes available.
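Because lower energy means a better structure while players expect higher scores to be better, some sign-flipping conversion from energy to score is needed. The Python sketch below shows one simple possibility. The term names, weights, scale, and offset are invented for illustration; they are not Rosetta's energy terms or Foldit's actual scoring constants.

```python
def total_energy(term_values, weights, enabled_terms=None):
    """Weighted sum of energy terms, restricted to the terms a puzzle enables.

    term_values and weights map hypothetical term names to floats.
    """
    if enabled_terms is None:
        enabled_terms = term_values.keys()
    return sum(weights[name] * term_values[name] for name in enabled_terms)


def energy_to_score(energy, scale=10.0, offset=8000.0):
    """Map energy to a game score so that lower energy gives a higher score.

    scale and offset are arbitrary illustrative constants.
    """
    return offset - scale * energy
```

For example, with hypothetical terms `{"clash": 12.0, "hbond": -8.0, "packing": -3.0}` and unit weights, removing the clash contribution lowers the energy and therefore raises the score, which is the behavior a player needs to see.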
3.3.2 Coevolution Strategy
In order to arrive at the current state of Foldit, we took a coevolution approach to the game’s design. Given the complexity of this undertaking, we realized that it was unlikely that all our initial decisions would be the best. There are three major groups relevant to our approach: (1) the scientists whose problems the game is meant to help solve; (2) the players; and (3) the game development team. The development team must incorporate feedback from the players to make sure the game is understandable and fun, and from the scientists to make sure that the results produced will be useful to them. An overview of the interactions between these three groups is given in Figure 3.3.
Figure 3.3 Overview of the interactions between the three iterative design groups. (Figure from Cooper et al. [2010b])
During the game’s initial development, the development team and scientists must work together closely to determine an initial direction. This involves defining which problems to approach, which fundamental gameplay mechanics are needed, and what the desired results are. Once possible games have been prototyped, player feedback can begin to be incorporated. Early playtesting helps to uncover which elements of the problem are fun and which are most confusing and difficult to understand. This can help both to focus the gameplay and to narrow the scope of the game to where players will most likely be able to contribute.
After making the game available to the public, a large amount of data and feedback can become available to help improve the game. As in a traditional game, data on gameplay can be gathered from players for an objective analysis of what players are doing, and feedback from the player community is extremely useful in determining new features. However, in a scientific discovery game, as scientists post puzzles and player solutions are analyzed, this analysis must then be incorporated in the design of the game, progressing towards ever better results.
Following this pattern, Foldit has evolved significantly since its initial release. A timeline of significant events in the evolution of the game is given in Figure 3.4.
Figure 3.4 Selected events from the game’s evolution over time. The timeline is shown on the top. Screenshots are included from before release (bottom left) and the current version (bottom right). (Figure from Cooper et al. [2010b])
3.3.3 Categorization as a Game
Although it relies heavily on simulation and visualization, Foldit can be classified as a game, as it possesses the qualities of a game set forth by Schell [Morgan Kaufmann]. Here we list the qualities and how Foldit embodies each.
1. Games are entered willfully: We do not require players to play Foldit.
2. Games have goals: Foldit’s goal is to find the best scoring structure.
3. Games have conflict: Foldit has conflict with both the protein itself, trying to find a better score, and with other players, trying to outrank them.
4. Games have rules: The rules of Foldit are given by the scoring function, available moves, global point structure, and so forth.
5. Games can be won and lost: Each puzzle has a ranking, which could be broken down into “winners” and “losers”.
6. Games are interactive: Foldit allows players to interactively reshape a protein and gives them immediate feedback.
7. Games have challenge: Similar to conflict, Foldit’s challenge arises from achieving higher scores and competing with other players.
8. Games create their own internal value: Foldit’s global points have value for ranking within the game.
9. Games engage players: Foldit keeps players engaged in manipulating protein structures.
10. Games are closed, formal systems: Foldit’s rules define the pieces of the system and how they work together.
3.4 Game Design Challenges
3.4.1 Visualizations
While a user is playing Foldit, several visualizations are available. These help the player determine when they are or aren’t doing well, and show which areas of the protein they could improve and what is wrong with them, so the player can think about how to fix any problems. Figure 3.5 shows a screenshot of the game’s main screen. We intend for the game to look like a game and not necessarily a scientific illustration. While scientific illustration techniques are useful for scientists, they may not be for our purposes, and may in fact be intimidating for non-scientists. Many of the visualizations have options, or can be turned off and on by the player. They include the following.
The protein. The protein itself is rendered in a cartoon-like style. This style is abstract and does not show the exact positions of all the atoms in the protein. The helices, sheets, and loops appear differently along the backbone, and sidechains are rendered very simply. The protein is colored by the score of each residue.
Clashes. These are red flashing spiky balls. They appear where two atoms are too close together, which will severely reduce the score.
Figure 3.5 Foldit’s main game screen. The puzzle Collagen is shown. The protein is in the center; some clashes are visible. The panel in the top right shows the player’s rank and score, leaderboards for groups and individuals in the current puzzle, and chat. Menus and information are in the other corners of the screen.
Hydrogen bonds. These will appear as blue and white ladders where hydrogen bonds have been formed. These bonds improve the score and help hold the protein together.
Hydrophobic sidechains. Hydrophobic and hydrophilic sidechains are shown in different colors. Burying hydrophobic sidechains in the core of the protein can improve the score.
Voids. These yellow spheres will appear where there is empty space in the protein. Filling in the space can improve the score.
The visualizations in a scientific discovery game must achieve several purposes in order to allow players to apply their problem-solving skills. They must reflect and illuminate the natural rules of the system, in a way that makes the state of the system evident to the player and directs them to where their contribution will be most useful. At the same time, the visualizations need to manage and hide the complexity of the system, so that players are not immediately overwhelmed by information. They must be approachable by players who have no knowledge of the scientific problem at hand. Thus, they should look inviting and fun, and not bring back memories of high school textbooks. Ideally, they should be customizable, because as with other aspects of the game, it is not clear from the outset what the best visualization will be, and different players may have different preferences.
In order to make the visualizations in Foldit reflect and illuminate the fundamental properties of proteins, we worked with scientists to distill simple rules upon which to base them. The first rule is to avoid clashes. Clashes occur when atoms are unrealistically close to each other, causing a large repulsive force. These can be prevented by keeping the atoms from overlapping, and are represented by spiky, rotating spheres that float between the overlapping atoms. The second rule is to fill voids, or empty spaces in the protein. Packing the protein tightly will remove voids. Voids are represented as bubble-like objects that pop when they come in contact with the protein. Clashes and voids appear red, as natural proteins should not generally have any. The third rule is to bury exposed hydrophobics. Hydrophobics are sidechains whose chemical properties are such that it is favorable for them to be on the interior of the protein. Exposed hydrophobics are represented as small, pulsing spheres that move along their sidechain. These are drawn in yellow, rather than red, because natural proteins may have some exposed hydrophobics. The fourth rule is to maintain and create hydrogen bonds, which form between particular pairs of atoms and hold the protein together. Hydrogen bonds appear as undulating bars between the bonded atoms, and are drawn in blue, because they are good.
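The first rule, avoiding clashes, amounts to a distance test between atoms. The Python sketch below shows a minimal version of such a test. The atom representation, the radii, and the tolerance factor are assumptions made for illustration; Rosetta's actual repulsive term is more detailed than a simple overlap check.

```python
import math
from itertools import combinations


def find_clashes(atoms, tolerance=0.8):
    """Return (i, j, overlap) for every pair of atoms that sit too close.

    atoms is a list of (residue_index, (x, y, z), radius) tuples.
    Two atoms clash when their distance is below tolerance times the sum
    of their radii. Atoms in the same residue are skipped, a crude
    stand-in for ignoring bonded neighbors.
    """
    clashes = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        if a[0] == b[0]:
            continue
        distance = math.dist(a[1], b[1])
        limit = tolerance * (a[2] + b[2])
        if distance < limit:
            clashes.append((i, j, limit - distance))
    return clashes
```

The overlap value gives a rough severity that a renderer could use to decide where to draw a clash marker between the two atoms.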
Due to the spatial nature of the problem, the visualization of the protein closely matches the actual geometry of the protein. To make the overall structure stand out, sheets, helices, and loops are stylized, similar to many scientific visualization tools. Sheets appear with a zig-zag pattern that will form hydrogen bonds when properly fit together. Color also plays a large role in the visualization of the protein. The backbone color reflects the score of the protein in a particular region (going from red in poor scoring regions to green in good scoring regions), so players can see where they can gain the most points. The sidechains are colored by hydrophobicity, so players can quickly see if they are extending them in the preferred direction. By coloring backbone and sidechain independently we can display more information while not introducing too much visual clutter.
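The red-to-green backbone coloring can be sketched as a linear interpolation over per-residue scores. The following Python function is an illustrative assumption about how such a mapping might work; the linear ramp, the clamping, and the parameter names are not taken from Foldit itself.

```python
def score_to_color(score, worst, best):
    """Map a per-residue score to an (R, G, B) color from red to green.

    Scores at or below worst are pure red; scores at or above best are
    pure green; values in between are blended linearly.
    """
    if best <= worst:
        raise ValueError("best must be greater than worst")
    t = (score - worst) / (best - worst)
    t = max(0.0, min(1.0, t))  # clamp out-of-range scores
    return (round(255 * (1.0 - t)), round(255 * t), 0)
```

A renderer would call this once per residue and paint the corresponding stretch of backbone, leaving the sidechain color free to encode hydrophobicity.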
Foldit takes a number of approaches to manage and hide the complexity of huge networks of interconnected atoms that make up a protein. Many unimportant details are hidden. Hydrogen atoms, which are plentiful on the protein but do not add a lot of structural information, are hidden. However, hidden information will reappear if it becomes important to the player: sidechains can disappear entirely to make the overall structure of the protein’s backbone clearer, but will reappear if they are causing a problem, such as if they are involved in a clash. Many actual clashes themselves are also hidden: only the worst clash is shown on a per-amino acid basis. This prevents the player from being overwhelmed by the number of clashes if the protein is compressed too tightly.
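Showing only the worst clash per amino acid is a simple reduction over the full clash list. The Python sketch below is one reading of that rule, in which a clash counts against both residues involved; the tuple layout and the notion of a numeric severity are assumptions for illustration.

```python
def worst_clash_per_residue(clashes):
    """Keep only the most severe clash for each residue.

    clashes is an iterable of (residue_a, residue_b, severity) tuples.
    Returns a dict mapping each residue to (other_residue, severity)
    for its worst clash.
    """
    worst = {}
    for res_a, res_b, severity in clashes:
        for residue, other in ((res_a, res_b), (res_b, res_a)):
            if residue not in worst or severity > worst[residue][1]:
                worst[residue] = (other, severity)
    return worst
```

However many atom pairs overlap when a protein is compressed, the number of markers drawn under this scheme can never exceed the number of residues.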
To make the game approachable, we gave the protein itself a bright, cartoonish look. Many pieces of the visualizations move playfully around the protein. There are a wide variety of visualization options available in the game as well, such as alternative colorings and geometries for the protein. These can be accessed through a special menu option that is turned off by default. This approach allows more advanced players the ability to customize their view in the view options menu, but keeps things simple for newcomers.
Visualizations such as voids and exposed hydrophobics can be expensive to compute. To keep the game interactive, we compute such visualizations in a separate thread, which will update the visualization after a delay.
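The pattern of computing an expensive visualization off the main thread and picking up the result later can be sketched in Python as follows. This is an illustration of the general technique, not Foldit's implementation; the class name, the generation counter used to discard stale results, and the `compute` callback are all hypothetical.

```python
import threading


class BackgroundVisualization:
    """Run an expensive visualization computation without blocking the UI."""

    def __init__(self, compute):
        self._compute = compute        # hypothetical expensive function
        self._lock = threading.Lock()
        self._latest = None
        self._generation = 0

    def request(self, structure):
        """Start a computation for the given structure and return at once."""
        with self._lock:
            self._generation += 1
            generation = self._generation
        worker = threading.Thread(
            target=self._worker, args=(structure, generation), daemon=True
        )
        worker.start()
        return worker

    def _worker(self, structure, generation):
        result = self._compute(structure)
        with self._lock:
            # Discard results for a structure the player has since changed.
            if generation == self._generation:
                self._latest = result

    def latest(self):
        """Most recent finished result, or None if nothing is ready yet."""
        with self._lock:
            return self._latest
```

The render loop would call `latest()` every frame and keep drawing the previous markers until a newer result arrives, which is what produces the visible delay mentioned above.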
3.4.2 Interactions
Foldit also provides several different methods of acting on the protein. Figure 3.6 is a screenshot of user interaction. Players need actions that allow them to manipulate the proteins in a way which will bring them closer to their goal. They should be intuitive and useful; we have tried to structure our input around the idea of touchability, or direct manipulation. Whenever possible, we have tried to make operations act directly on the protein itself. Some of the possible actions the user can perform are:
Pulling. This is intended to be the primary method of interaction. When the user clicks and drags on part of the protein, a purple arrow extends from the protein to the location of the user’s mouse cursor. The protein will then try to move to stay under the mouse, while still satisfying some energetic preferences.
Bands. Purple bands can be placed by the user to attach one residue to another, or a residue to a point in space. When the user performs another operation on the protein, bands will pull on the attached residues. This can allow the user to keep parts of the protein in place, pull to specific points in space, or pull in multiple places at once.
Figure 3.6 Foldit’s main game screen during interaction. The puzzle Collagen is shown. The player is acting on the protein. The icy blue sheet is locked and the purple cylinder with the round end is a band. The purple cylinder with the pointed end shows where the user is pulling on the backbone. The dark blue part of the backbone is affected by the pull.
Locks. Locks prevent the protein from being affected by operations. The user can lock individual residues or whole secondary structures, giving them an ice-like appearance. Locks allow a kind of implicit selection; by locking two residues, the user can then easily operate on the residues between them.
Wiggle and shake. Wiggle and shake are two automatic actions the user can launch. Wiggle performs an optimization over the backbone, and shake performs an optimization over the sidechains. They can be performed globally as well as locally by using locks.
Rebuild. Rebuild allows the user to specify a section of the protein to be modified primarily by Rosetta fragment insertion, a process of copying backbone angles from similar native structures. This operation has a large element of randomness and can result in drastic changes to the structure.
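The interplay of bands and locks can be pictured with a toy relaxation step in Python. This is only a sketch: each band is treated as a spring that moves its unlocked endpoints a fixed fraction of the way toward each other, which is far simpler than the energy-based optimization Rosetta performs. The function name, the data layout, and the step size are hypothetical.

```python
def apply_bands(positions, bands, locked=frozenset(), step=0.25):
    """Move banded residues one small step toward their band targets.

    positions maps residue index to an (x, y, z) tuple.
    bands is a list of (residue, target) pairs, where target is either
    another residue index or a fixed (x, y, z) point in space.
    Residues listed in locked never move.
    """
    moved = {residue: list(p) for residue, p in positions.items()}
    for residue, target in bands:
        start = positions[residue]
        goal = positions[target] if isinstance(target, int) else target
        if residue not in locked:
            for axis in range(3):
                moved[residue][axis] += step * (goal[axis] - start[axis])
        # A residue-to-residue band pulls on both ends.
        if isinstance(target, int) and target not in locked:
            for axis in range(3):
                moved[target][axis] += step * (start[axis] - goal[axis])
    return {residue: tuple(p) for residue, p in moved.items()}
```

Calling the function repeatedly draws banded residues together while locked residues stay fixed, which mirrors how a player can hold one part of the protein in place and pull on another.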
The interactions in a scientific discovery game must also meet several criteria. They must respect the constraints that the system requires. However, they must also be sufficient to explore the space of solutions well enough to solve the problem. They should also be as intuitive and fun as possible.