The State of Science, Marc Zimmer
The Internet Opens Science to Amateurs: Citizen Scientists
The growth of the internet, the need for data, low-cost sensor technology, the ubiquity of cell phones and home computers, and user-friendly mail-order gene-editing kits have led to a burgeoning do-it-yourself (DIY) science movement. This amateur science movement has spawned biohackers and citizen scientists.
In 2014, the Oxford English Dictionary added the term “citizen science,” defining it as “scientific work undertaken by members of the general public, often in collaboration with or under the direction of professional scientists and scientific institutions.” The concept of crowdsourcing data collection is not new. More than 2,000 years ago, ancient China enlisted its residents to monitor swarms of migratory locusts. Since then, many large research projects have actively involved members of the public. The practice became so common that Alan Irwin and Rick Bonney coined the phrase “citizen science.”[6]
I am a computational chemist interested in the structures of bioluminescent proteins, so it should come as no surprise that my first example of citizen science is folding@home, a massive crowdsourced project to understand how proteins fold. Many research groups are trying to use computers to calculate the three-dimensional structures of proteins from their amino acid sequences. Unfortunately, this is not yet possible. How proteins fold is still a mystery waiting to be solved, and an important one, with many consequences for science and medicine. If proteins misfold, they no longer function properly, and diseases such as mad cow disease and Alzheimer’s disease can result. Protein folding normally takes about one-thousandth of a second. This is extremely fast, but the process is so complicated that even the largest supercomputer cannot solve it. Hundreds of computational chemistry groups are working on the protein folding problem, attacking it with simplified models, massive computing power, or machine learning, as described in chapter 7. It ranks as one of the toughest problems in biology.
Folding@home is an attempt to harness the computing capacity that goes to waste when home computers sit idle. Anyone can download the free folding@home software from the web; to the user, it looks just like a screensaver. Once the computer has been idle for a predetermined time, the folding@home software kicks in and starts calculating. Any keystroke stops the calculation, and the picture of a molecule disappears from the screen. The folding@home app runs on the Linux, Windows, macOS, and Android operating systems. It is one of the world’s largest distributed computing networks. As of October 2016, folding@home had a total computing power of over 100 petaflops, making it the fastest computing system in the world.
Since October 2000, more than 900,000 people have downloaded the folding@home software, and it now runs on over 100,000 computers around the world every day. Buying and running 100,000 computers would cost about $50 million, and maintaining them would be a nightmare. According to Professor Vijay Pande of Stanford University, who is in charge of the project, “This is like having a whole new kind of funding agency for research: namely, the general public donating its computers. When you factor in the maintenance they are doing, the operating system upgrades, and so on, that’s a gigantic resource!”[7] (Since this writing, Pande has left academia to become a venture capitalist focused on artificial intelligence in the biopharma space.) The program is primarily downloaded by people interested in computers, biology, and fighting diseases, as well as by teachers who find folding@home a unique way to get students interested in science. Using the power of distributed computing, the folding@home group has published more than 200 papers. To reward users for donating their computer time, monthly donor statistics are calculated, and FoldingCoins are awarded. These are cryptocurrency tokens that can be exchanged for bitcoins on cryptocurrency exchanges. Poem@home and Rosetta@home are two distributed computing networks that work much like folding@home; both also calculate protein structures.
“Traditional forms of large-scale computing—building your own cluster, buying time on a supercomputer, buying time on commercial clouds—are all expensive. Many scientists can’t afford large-scale computing,” says David Anderson, a research scientist at UC Berkeley’s Space Science Lab and director of SETI@home. “Volunteer computing tries to solve this problem.”[8] This is why there are dozens of distributed computing networks hoping to use your computer’s downtime more effectively. Most are addressing chemistry and astronomy problems. SETI@home’s home page describes it as “a scientific experiment, based at UC Berkeley, that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data.” The SETI@home app searches through data collected from the Arecibo radio telescope in Puerto Rico and the Green Bank Telescope in West Virginia to find radio transmissions that might indicate the existence of extraterrestrial intelligence.
Although distributed computing systems certainly provide an important service to science, it could easily be argued that the computer owners are funding the research rather than doing it, that they are science supporters rather than amateur scientists.
The Global Biodiversity Information Facility (GBIF) and iNaturalist rely on traditional amateur scientists and are perhaps better examples of citizen science in action. iNaturalist.org was founded in 2008. On the site, citizen scientists post photos of plants and animals along with their locations and observations. Other naturalists and scientists on the site identify the species and can use the information to monitor changes in biodiversity. iNaturalist.org has also used the vast collection of photos and information gathered from its citizen scientists to train an artificial neural network that can identify the species in most of the animal and plant pictures. In June 2017, the site released an app that uses an artificial intelligence algorithm to identify the species of a photographed plant or bird.[9] Many iNaturalist.org observations are deposited in the GBIF, where they join a database of hundreds of millions of “species occurrence” records. Half of these records come from citizen scientists. In its own words, the GBIF “is an international network and research infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.”[10] The facility estimates that its database has been used in more than 2,500 peer-reviewed papers over the last 10 years.[11] Together, the GBIF and iNaturalist.org use this large body of citizen science observations to give us a global picture of what is happening to biodiversity, while also educating us and encouraging us to help protect it.
Citizen science also raises some concerns. The animal sightings and geospatial information sent to sites like iNaturalist.org could be used by poachers to find rare and elusive wildlife. And there are no restrictions on how health-monitoring apps such as PatientsLikeMe use the medical data they collect.