The Language of the Genes - Steve Jones
Chapter Three HERODOTUS REVISED
The Greek traveller Herodotus felt that he knew the world well. He voyaged around the Mediterranean and heard much of the Phoenicians’ journeys into Africa. By putting what he knew of the globe’s landmarks together he came to the conclusion that ‘Europe is as long as Africa and Asia put together, and for breadth is not, in my opinion, even to be compared with them.’ Herodotus had things in about the right places in relation to each other but the physical distances between them were hopelessly wrong.
For two thousand years maps could only be made in the Greek way. They were relative things, made by trying to fit landmarks together, with no measure of the absolute distances involved. Familiar bits of the countryside loomed far larger than they deserved. Mediaeval charts were not much better. Although the shape of Africa is recognisable it is much distorted. The cartographers’ perception of remoteness was determined by how long it took to travel between two points rather than how far apart they really were.
Genetics, like geography, is about maps; in this case the inherited map of ourselves. Not until the invention of accurate clocks and compasses two thousand years after Herodotus was it possible to measure real distances on the earth’s surface. Once these had been perfected, good maps soon appeared and Herodotus was made to look somewhat foolish. Now the same thing is happening in biology. Geneticists, it appears, were until not long ago making the same mistakes as the ancient Greeks.
Just as in mapping the world, progress in charting genes had to wait for technology. Now that it has arrived the shape of the biological atlas has been revolutionised, with a change in world-view far greater than that which separates the geography of the Athenians from that of today. What, even three decades ago, seemed a simple and reliable chart of the genome (based, as it was, on landmarks such as the colour of peas or of inborn disease) now looks very deformed.
The great age of cartography was driven, in the end, by economics: by the desire to find new materials and new markets. The mappers’ Columbian ambitions needed a Ferdinand and Isabella. Even fifty years ago, to those in the know, there seemed to be money in DNA, and many great foundations gave cash to the subject. Not until the 1980s did it seem feasible to chart the whole lot and, even then, it seemed that the task would take decades. Such is the rate of progress that the job is now, just after the millennium, in effect complete. The politician’s ear and the scientist’s ego shifted cash into Programs, Institutes and Centres as the free market in science was abandoned in favour of the planned economy; but, in the end, the Human Genome Project worked and at last we have the map of ourselves. Taxpayers (most of them American) played an important part, but in its latter days the job was split, with some acrimony, between governments in consort with charities (such as the Wellcome Trust at its campus near Cambridge) and private institutions, the biggest run by a defector from an American government laboratory. There was a mad rush to patent genes. Large sums changed hands. The rights to one technology were sold to a Swiss company for three hundred million dollars. At the end of the DNA bonanza the altruists were ahead and large parts of the information were fed onto the internet, where it is available to all.
The idea of a gene map came first not from technology but from deviations from Mendel’s laws. Morgan, with his flies, found lots of inherited attributes that followed the rules. Their lines of transmission down the generations were not connected to each other; like pea colour and shape the traits were independently inherited. There was one big exception. Certain combinations of characters, those on the sex chromosomes, did not behave in this way. Soon, they were joined by others.
Mendel found that the inherited ratios for the colour of peas were not affected by whether the peas were round or wrinkled. Morgan, in contrast, discovered that, quite often, pairs of characteristics (such as eye colour and sex) travelled down the generations together. Soon, many different genes (such as those for eye colour, reduced wings and forked body hairs) in flies were found to share a pattern of inheritance with sex and, as a result, with the X chromosome. They were, in flagrant disregard of Mendel’s rules, not independent. To use Morgan’s term, they were linked.
Within a few years, many other traits turned out to be transmitted together. Experiments with millions of flies showed that all Drosophila genes could be arranged into groups on the basis of whether or not their patterns of inheritance were independent. Some combinations behaved as Mendel expected. For others, pairs of traits from one parent tended to stay together in later generations. The genes involved were, as Morgan put it, in the same linkage group. The number of groups was the same as the number of chromosomes. This discovery began the ‘linkage map’ of Drosophila and became the connection between Mendelism and molecular biology.
Linkage is the tendency of groups of genes to travel together down the generations. It is not absolute. Genes may be closely associated or may show only a feeble preference for each other’s company. Such incompleteness is explained by some odd events when sperm and egg are formed. Every cell contains two copies of each of the chromosomes. The number is halved during a special kind of cell division in testis or ovary. The chromosomes lie together in their pairs and exchange parts of their structure. Sperm or egg cells hence contain combinations of chromosomal material that differ from those in the cells of the parents who made them.
That is why, within a linkage group, certain genes are inherited in close consort while others have a less intimate association. If genes are near each other they are less likely to be parted when chromosomes exchange material. If they are a long way apart, they split more often. Pairs of genes that each follow Mendel are on different chromosomes. Recombination, as the process is called, is like shuffling a red and a black hand of cards together. Two red cards a long way apart in the hand are more likely to find themselves split from each other when the new deck is divided than are two such cards close together. Such rearrangements mean that each chromosome in the next generation is a new mixture of the genetic material made up of reordered pieces of the chromosome pairs of each parent.
Recombination helped make the first genetic maps. Like the cards in a hand held by a skilled player, genes are arranged in a sequence. Their original position can be determined by how much this is disturbed each generation as the inherited cards are shuffled. By studying the inheritance of groups of genes Morgan worked out their order and their relative distance apart. Combining the information from small sets of inherited characters allowed what he called a ‘linkage map’ to be made.
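The logic of ordering genes by how often they are parted can be put in a few lines. What follows is a minimal sketch and not Morgan's own procedure: it assumes just three linked genes, and the recombination fractions are invented for illustration. The pair of genes split most often must sit at the two ends, which leaves the third in the middle.

```python
from itertools import combinations

def order_three_genes(rf):
    """Infer the order of three linked genes from pairwise recombination fractions.

    rf maps a frozenset of two gene names to the fraction of offspring
    in which that pair was separated by recombination.
    """
    genes = sorted({g for pair in rf for g in pair})
    if len(genes) != 3:
        raise ValueError("expected exactly three genes")
    # The pair that is split most often lies at the two ends of the map.
    outer = max(combinations(genes, 2), key=lambda p: rf[frozenset(p)])
    middle = next(g for g in genes if g not in outer)
    return [outer[0], middle, outer[1]]

# Hypothetical recombination fractions, invented for illustration only.
example = {
    frozenset({"a", "b"}): 0.05,
    frozenset({"b", "c"}): 0.12,
    frozenset({"a", "c"}): 0.16,
}
order = order_three_genes(example)  # ['a', 'b', 'c']
```

In the invented figures the outer pair is split slightly less often than the two inner intervals added together. Real data tend to behave that way, because two exchanges between the end genes cancel each other out.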
Linkage maps, based as they are on exceptions to Mendelism, are very useful. They have been made for bacteria, tomatoes, mice and many other beings. Thousands of genes have been mapped in this way. In Drosophila almost all have been arranged in order along the chromosomes and in mice almost as many.
Because this work needs breeding experiments, the human linkage map remained for many years almost a perfect and absolute blank. Most families are too small to look for deviations from Mendel’s rules and too few variants were known to look for them. There seemed little hope that a genetic chart of humankind could be made.
The one exception to this terra incognita was sex linkage. If genes are linked to the X chromosome, they must be linked to each other. It did not take long for dozens of traits to be mapped there. To draw the linkage map for other chromosomes was a painfully slow business. The gene for colour-blindness was mapped to the X in 1911, but the first linkage on other chromosomes did not emerge until 1955, when the gene for the ABO blood groups was found to be close to that for an abnormality of the skeleton. The actual number of human chromosomes was established in the following year and the first non-sex linked gene mapped onto a specific chromosome in 1968.
Now, genetics has been transformed. The technology involved is to linkage mapping as satellites are to sextants. It does not depend on crosses and comes up with much more than a biological chart based on patterns of inheritance. Geneticists have now made a more conventional (but much more detailed) kind of chart, a physical map of the actual order of all the bases along the DNA. The new atlas of ourselves has changed our views of what genes are.
In the infancy of human genetics, thirty years ago, biologists had a childish view of what the world looks like. As in the mental map of an eleven-year-old (or of Herodotus) linkage was based on a few familiar landmarks placed in relation to each other. The tedious but objective use of a measure of distance changed all that. Thirty years ago, molecular biologists were full of hubris. They had, they thought, solved the problems of inheritance. The new ability to read the DNA message would do the job that family studies and linkage mapping had failed to complete; it would show where all our genes were in relation to each other. The edifice whose foundations were laid by Mendel would then be complete. Optimism was, at the time, reasonable. It seemed a fair guess that the physical map of the genes would look much like a biological map based on patterns of inheritance and might in time replace it.
Such optimism was soon modified. The first explorations of the unknown territory which lay along the DNA chain showed that the physical map was quite different from the linkage map as inferred from peas or fruit-flies. The genes themselves are not beads lined up on a chromosomal string, but have a complicated and unexpected structure.
The successes of the molecular explorers depended, like those of their geographical predecessors, on new surveying instruments which made the world a bigger and more complicated place. The tools used in molecular geography deserve a mention.
The first device is electrophoresis, the separation of molecules in an electric field. Many biological substances, DNA included, carry an electrical charge. When placed between a positive and a negative terminal they move towards one or the other. A gel (which acts as a sieve) is used to improve the separation. Gels were once made of potato starch, while modern ones are made of chemical polymers. I have tried strawberry jelly, which works quite well. The gel separates molecules by size and shape. Large molecules move more slowly as they are pulled through the sieve while smaller ones pass with less difficulty. Various tricks improve the process. Thus, a reversal of the current every few seconds means that longer pieces of DNA can be electrophoresed, as they wind and unwind each time the power is interrupted. The latest technology uses arrays of fine glass tubes filled with gel, into each of which a sample is loaded. With various tricks the whole process becomes a production line and tens of thousands of samples can be analysed each day.
The computer on which I wrote this book has some fairly useless talents. It can – if asked – sort all sentences by length. This sentence, with its twenty words, would line up with many otherwise unrelated sentences from the rest of the book. Electrophoresis does this with molecules. The length of each DNA piece can be measured by how far it has moved into the gel. Its position is defined with ultraviolet light (absorbed by DNA), with chemical stains, fluorescent dyes that light up when a laser of the correct wavelength is shone on them, or with radioactive labels. Each piece lines up with all the others which contain the same number of DNA letters.
Another tool uses enzymes extracted from bacteria to divide the landscape into manageable pieces. Bacteria are attacked by viruses which insert themselves into their genetic message and force the host to copy the invader. They have a defence: enzymes which cut foreign DNA in specific places. These ‘restriction enzymes’ can be used to slice human genes into pieces. Dozens are available, each able to cut a particular group of DNA letters. The length of the pieces that emerge depends on how often the cutting-site is repeated. If each sentence in this volume was severed whenever the word ‘and’ appeared, there would be thousands of short fragments. If the enzyme recognised the word ‘but’, there would be fewer, longer sections; and an enzyme that sliced through the much less frequent word ‘banana’ (which, I assure you, does appear now and again) would produce just a few fragments thousands of letters long.
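The cutting of a text wherever a chosen word appears is easy to imitate. The sketch below is an illustration under stated assumptions and no more: the DNA sequence is invented, and the site GAATTC, cut after its first letter, is the pattern usually quoted for the enzyme EcoRI. The function name `digest` and its `cut_offset` parameter are hypothetical. Sorting the pieces by length then stands in for what the gel does.

```python
def digest(dna, site, cut_offset=0):
    """Cut dna at every occurrence of site and return the fragments in order.

    cut_offset is how many letters into the site the cut falls.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(dna[start:cut])
            start = cut
        pos = dna.find(site, pos + 1)
    if start < len(dna):
        fragments.append(dna[start:])
    return fragments

# Invented sequence containing the site twice.
dna = "AAGAATTCCCGGGAATTCTT"
fragments = digest(dna, "GAATTC", cut_offset=1)

# Electrophoresis, in effect, sorts the pieces by length.
by_length = sorted(fragments, key=len)
```

A rarer site, like the word ‘banana’, would be found less often by `dna.find` and so would leave fewer, longer fragments.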
The positions of the cuts (like those of the words and, but and banana) provide a set of landmarks along the DNA. To track them down is a first step to reconstituting the book itself. The process is close to that carried out by the students who stormed the American Embassy in Tehran after the fall of the Shah. With extraordinary labour they pieced together secret documents which had been put through a shredder. By putting the fragments together the students reconstituted a long, complicated and compromising message.
Molecular biology does much the same. First, it needs to multiply the number of copies of the message to allow each short piece to be surveyed in detail as a preliminary to the complete map. Various tricks allow cut pieces of DNA to be inserted into that of a bacterium or yeast. The DNA has been cloned. Whenever the host divides, it multiplies not only its own genetic message but the foreign gene. As a result, millions of copies of an original are ready for study in the exquisite detail needed for genetic geography.
Cloning has been supplemented by another contrivance, the polymerase chain reaction. This takes advantage of an enzyme used in the natural replication of DNA to make replicas of the molecule in the laboratory. To pursue our rather tortured literary analogy, the method is a biological photocopier which can produce many duplicates of each page in the genetic manual. The photocopying enzyme comes from a bacterium which lives in hot springs. The reaction is started with a pair of short artificial DNA sequences which bind to the natural DNA on either side of the length to be amplified. By heating and cooling the reaction mixture and feeding it with a supply of the four bases, the targeted strands of DNA unwind, copy themselves with the help of the enzyme, and re-form. Each time the cycle is repeated, the number of copies doubles and millions of replicas of the original piece of DNA are soon generated.
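The arithmetic of doubling explains why the photocopier is so quick. The sketch below assumes an idealised reaction in which every cycle doubles every copy without fail, which real reactions only approximate; the function names are hypothetical.

```python
def copies_after(cycles, starting_copies=1):
    """Number of copies after a given number of perfect doubling cycles."""
    return starting_copies * 2 ** cycles

def cycles_to_reach(target, starting_copies=1):
    """Smallest number of perfect doubling cycles needed to reach target copies."""
    cycles = 0
    copies = starting_copies
    while copies < target:
        copies *= 2
        cycles += 1
    return cycles

million_cycles = cycles_to_reach(1_000_000)  # 20
copies = copies_after(20)                    # 1048576
```

On those assumptions a single molecule passes the million mark after twenty rounds of heating and cooling.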
Another piece of trickery exploits DNA’s ability to bind to a mirror image of itself. DNA bases form two matched pairs; A with T and G with C. To find a gene, a complementary copy is made in the laboratory. When added to a cell this seeks out and binds to its equivalent on the chromosome. My computer can do the same. On a simple command, it will search for any word I choose and highlight it in an attractive purple. It does the job best with rare words (like ‘banana’). A DNA probe labelled with a fluorescent dye shows up genes in the same way. The method is known as FISHing (for Fluorescent In-Situ Hybridisation) for genes. A modified kind of FISH involves unwinding the DNA before it is stained. This makes the method more sensitive.
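The computer's search-and-highlight can be made to follow the pairing rules directly. The following is a minimal sketch with invented sequences and hypothetical function names. One detail is added that the analogy glosses over: the two strands of DNA run in opposite directions, so the probe pairs with its complement read backwards.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the strand that pairs with seq, read in its own direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def find_probe_sites(target_strand, probe):
    """Return the start positions on target_strand where the probe would pair."""
    wanted = reverse_complement(probe)
    sites = []
    pos = target_strand.find(wanted)
    while pos != -1:
        sites.append(pos)
        pos = target_strand.find(wanted, pos + 1)
    return sites

# Invented target and probe, for illustration only.
target = "ACGTTGACCGTATTGACCA"
probe = "GGTCAA"
sites = find_probe_sites(target, probe)  # [3, 12]
```

As with the word ‘banana’, a longer or rarer probe sequence gives fewer chance matches.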
All this and much more has revolutionised the mapping of human DNA. First, it has improved the linkage map. Patterns of inheritance of short sequences of DNA can be tracked through the generations just as well as can those of colour-blindness or stubby fingers. There are millions of sites which vary from person to person. All can be used in pedigree studies. Another scheme is to use the polymerase chain reaction to multiply copies of DNA from single sperm cells. The linkage map is made from a comparison of the reordered chromosomes in the sperm with that in the man who made them. This avoids the problem of family size altogether.
Linkage mapping in humans took a long time to get started and still has some way to go. Before the days of high technology the great problem was a shortage of differences; of variable genes, or segments of genetic material, whose joint patterns of inheritance could be studied. That problem has been solved. Our DNA is now known to be saturated with hundreds of thousands of variable sites, many based on individual variation in the numbers and positions of repeats of the two letters C and A. As a result, a whole new industry based on the most traditional kind of genetics has burst into existence.
It needs, like any industry, raw material. The French, together with the Americans, have identified sixty or so large families with long and complicated pedigrees, well suited for gene mapping. They come from various parts of the world, from Venezuela to Bangladesh. From each individual, lines of cells are kept alive in the laboratory and thousands of variants have been identified, tightly packed along the entire length of the chromosomes. Patients with, say, heart disease can be screened to see whether they also tend to carry other inherited variants. If they do, there is a good chance that the actual gene involved is nearby, and is dragging its anonymous fellows along with it. To find such a milestone may be the first step to the gene itself.
The descendants of Morgan have at last managed to do for humans what was long ago achieved for the fruit fly, and a linkage map of man is close at hand. That of woman, it transpires, is rather longer. Such maps depend on the sexual reshuffling of genes. This takes place, for some reason, more in females than in males and, as a result, their chart works to a different scale.
The human linkage map is useful, but biologists have always wanted to make a different kind of chart, one rather like that used by geographers, based on a straightforward description of the genetic material. Now, it is here. The approach was brutal: to assault the genome with time, money and tedium until the whole lot was read from one end to the other.
The first move in tying the linkage map to one based on the physical structure of DNA depended on a stroke of luck. Morgan noticed that in one of his fly stocks a gene which was usually sex-linked started behaving as if it was not on the X chromosome at all. A glance down the microscope showed why. The X was stuck to one of the other chromosomes and was inherited with it. A change in the linkage relationships of the gene was due to a shift in its physical position.
Such chromosomal accidents were used to begin the human physical map. Sometimes, because of a mistake in the formation of sperm or egg, part of a chromosome shifts to a new home. Any parallel change in the pattern of inheritance of a particular gene shows where it must be. Now and again a tiny segment of chromosome is absent. That can lead to several inborn diseases at once. One unfortunate American boy had a deficiency of the immune system, a form of inherited blindness, and muscular dystrophy. A minute section of his X chromosome had been deleted. It must have included the length of DNA which carried these genes. He gave a vital hint as to just where the gene for muscular dystrophy – one of the most frequent and most distressing of all inherited diseases – was located. The absent segment was a landmark upon which a physical map of the area around this gene could be anchored.
To map genes with changes in chromosomes need not wait for natural accidents. Human cells, or those of mice or hamsters, can be cultured in the laboratory. When mixtures of mouse and human cells are grown together, the cells may fuse to give a hybrid with chromosomes from both species. As the hybrids divide, they lose the chromosomes (and the genes) from one species or the other. Some specifically human genes are lost each time a human chromosome is ejected. To match the loss of particular genes with that of chromosomes (or of their short segments) shows where they must be.
All these methods hint at a gene’s position rather than giving its precise coordinates. Small-scale cartography (or mindless sequencing, as it is affectionately known) involves various clever ruses. One depends on the ability of DNA to copy itself when a special enzyme is provided and the mixture fed with the A, G, C and T bases. It is possible to gradually lengthen pieces of a DNA strand from one end to the other, in four separate experiments (each using a different base). By chemical trickery, some of the growing strands are stopped each time a base is added. This produces a set of DNA pieces of different length, each stopped at an A, a G, a C or a T. Electrophoresis of the mixtures on the same gel gives four parallel lines of DNA fragments arranged by length. A scan across and down the gel gives the order of the bases. This is a most tedious task. It has been supplanted by machines that do the job in other ways.

The most important change in genetics is a conceptual one. Because the three-letter code for each amino acid is known, it is possible to deduce the order of the amino acids made by a piece of the DNA once its sequence of bases has been established. What any gene does can be inferred by comparing that sequence with the computer database of others whose job is known. The fit need not be precise; after all, a French dictionary contains thousands of words similar enough to those in English to allow its meaning to be guessed at. It is also sometimes possible to work out the three-dimensional structure of the protein from its amino acid sequence and to deduce what its function might be.
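The scan across and down the four lanes of the gel, described earlier, is simple enough to mimic. The sketch below is an idealisation with hypothetical names and an invented sequence: it assumes that every possible stopped fragment is present and that lengths are measured exactly, neither of which a real gel guarantees.

```python
def terminated_lengths(strand):
    """For each base, list the lengths of the copies that stop at that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(strand, start=1):
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Recover the sequence by reading the bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

# Invented sequence, for illustration only.
lanes = terminated_lengths("GATTACA")
sequence = read_gel(lanes)  # 'GATTACA'
```

Each lane holds only the fragments that end in one letter, yet between them the four lanes account for every length exactly once, which is why reading upwards from the shortest band spells out the sequence.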
There are some remarkable similarities among inherited vocabularies. The genes that control development are similar in humans and fruit flies, as are those that make their brains. Genes that, when they go wrong, damage the nervous system have close analogues in yeast (which do not have nerves at all) and one of our own genes is almost identical to another that alters the pattern of veins on an insect wing. Such conservatism has had a radical influence on human genetics.
The parts catalogue for a Mercedes C-class car contains four and a half thousand named items, from accelerator pedal to wing mirror to wheel nuts. Some (like individual bolts or washers) may be repeated dozens of times; but the factory has to make fewer than five thousand pieces to feed its assembly line and, in the end, to make its contribution to the European traffic jam. To make a human takes ten times as many – an executive jet’s worth – and the task of seeing how that vast number of pieces is bolted together might seem almost impossible. Even the yeast cell (scarcely the Mercedes of the living world) needs more than the car, with six thousand proteins.
The yeast gene sequence itself, like any other, is no more than a factory manual, containing information on castings, mouldings and blanks but also on various extraneous bits which are removed before the assembly line gets them. Then, as in the Mercedes factory, the parts have to be put together to make a functional piece of machinery. Even that is of no use to someone who cannot drive, and even a skilled driver is no help when dumped in a strange city without a road map. To understand the workings of the cell demands even more.
DNA dismantlers, like car wreckers, generate only a box of bits and pieces; the biological equivalents of the nuts, bolts, relays, springs, struts, wires and all the other things needed to make an automobile. The shape of a human protein can be inferred from a DNA sequence, but even that usually gives no hint as to how it fits into the cellular machinery. Yeasts are simpler, and rather more is known about their mechanics. Life’s unwillingness to change allows the yeast machine to be used to explore our own cells. One approach in the human gene hunt is rather like fishing. Take a protein whose job is known, and attach a molecular hook and a separate float to it. Insert it into a male (or a cell showing what passes for maleness in yeasts). Then, mate that alluring individual to a female and drift his gene past all her thousands of cell parts until one takes the bait by slotting into it. The float causes the female cell to light up and the match is made.
A fishing expedition with two hundred or so bait proteins from yeast captured more than a thousand genes in human cells. One whole set of yeast proteins attached themselves to a single human protein that tells the cell when to start dividing and when to stop. The yeast bait is similar to one that, when it goes wrong, causes human cancer: and a quick test proved that the newly hooked human equivalents represented crucial parts of our own cells’ brake and accelerator systems. Such a discovery is of great interest to medicine, and marked the first step in what may become an era of hunting for genes in complex creatures with a lure based on more humble beings.
The genetic languages spoken by different organisms are close indeed; close enough, in fact, to give an even chance that a newly-discovered human gene sequence will be related to something else, either another of our genes or one from a creature remote from ourselves. Human genetics has been transformed. No longer does it start with an inherited change (such as a genetic disease) and search for its location. Instead, it uses the opposite strategy, with a logic precisely opposite that of Mendel: from inherited particle to function, rather than the other way around. Genetics is the first science to have accelerated by going into reverse.
The first breakthrough of this new approach was the successful hunt for the cystic fibrosis gene in 1990. It gave a hint as to what was possible and was the introduction to the advances that led to the complete map a mere decade or so later. The job cost one hundred and fifty million dollars, but the costs per gene have now dropped by hundreds of times.
Cystic fibrosis is the most common inherited abnormality among white-skinned people. In Europe, it affects about one child in two thousand five hundred. Until a few years ago those with the disease died young. Their lungs filled with mucus and became infected. Those with the illness find it hard to digest food as they cannot produce enough gut enzymes. Its dangers have long been recognised. Swiss children sing a song that says ‘The child will die whose brow tastes salty when kissed.’ These symptoms seem at first sight unrelated, but all are due to a failure to pump salt across the membranes which surround cells. Medicine has improved the lives of those affected, but few survive beyond their mid-thirties.
Family studies showed long ago that the disease is due to a recessive gene that is not carried on the sex chromosomes. In 1985, pedigrees revealed that it was linked to another DNA sequence which controls a liver enzyme, although it was not then known upon which chromosome that was. Within a year or so, a kindred was discovered in which this pair of genes was linked to a DNA variant that had already been mapped to chromosome seven. The relevant segment of that chromosome was inserted into a mouse cell line, cut into short lengths and the painful task of sequencing begun. By 1988 the crucial region had been tracked down to a segment of DNA one and a half million base-pairs long. Fragments were tested to see if (like the yeast and human sequences later found to control cell division) they had sequences in common with the DNA of other animals as, if they did, the order of letters must have been retained through evolution because they did some unknown but useful job. Several such sections were uncovered. One had an order of DNA letters similar to that of other proteins involved in transport across membranes. It followed the pattern of inheritance of cystic fibrosis. The gene had been tracked down.
The cystic fibrosis gene is a quarter of a million DNA bases long, although the protein has only about one and a half thousand amino acids. Computer models of its shape show that it spans the cell membrane several times, just as expected for a molecule whose job is to act as a pump. Many families with the disease have just one change in the protein: a single amino acid is missing. That changes its shape and stops the new protein from going to the right place in the cell. Instead it is picked up and destroyed by the internal quality-control network.
The discovery of the gene allowed carriers (together with foetuses bearing two copies) to be identified. Unfortunately, cystic fibrosis, which once seemed a simple disorder, can, we now know, be caused by many different DNA changes that vary from place to place and from family to family. The illness gave the first hint about the unexpected and unwelcome complexity that the full map was to reveal.
Mapping exploded after that first discovery. At first, the mappers behaved like any explorer in a new territory. A cartographer does not start with a plan of the beach which is then extended in excruciating detail until the whole country is covered. Instead he picks out the major landmarks and leaves the details until later, when he knows what is likely to be interesting. Before today’s triumph of technology, most mappers were concerned with a small proportion of the genes, those that lead to inherited disease.
All the most important single-gene inherited illnesses were tracked down within a few years. Huntington’s Disease leads to a degeneration of the nervous system and death in middle age. It was once called Huntington’s Chorea (a word with the same root as choreography) after the involuntary dancing movements of those afflicted. An eighteenth-century Harvard professor claimed that those with the disease were blasphemers as their gestures were imitations of the movements of Christ on the Cross and some sufferers were burned. It is a dominant, but with a nasty twist: because of the late onset of symptoms, those at risk are left in uncertainty about their predicament. In 1983 came a breakthrough helped by great good luck. Soon after the search started, the approximate site of the Huntington’s gene was found by following its association with a linked DNA variant some distance away on the same chromosome. Then, luck ran out, and it took ten years to find the gene. It has now been tracked to the tip of chromosome 4. The shape of the protein which has gone wrong – huntingtin, as it is with some lack of imagination called – has been worked out to give, for the first time, some insight into the nature of the disease, which involves nerve cells in effect committing suicide when the aberrant protein (which looks like nothing else in the cell) instructs them to do so. Many more damaged genes soon fell victim to the genetic explorers and were pinned onto the map.
Type in the four letters OMIM – Online Mendelian Inheritance in Man – into any search engine and a list of ten thousand inherited diseases at once appears; symptoms, inheritance patterns, and, for nearly all, chromosomal grid reference. From the hunt for inherited illness, the search shifted to a wider set of genes. No longer were diseases needed as a first clue. To look for genes only when they go wrong is like trying to work out the principles of the internal combustion engine from car breakdowns. Now, the machine itself can be dismantled and its mechanism inferred directly.
When a gene makes something, it generates a complementary molecule – a messenger, as it is known – which transfers information from DNA to the main part of the cell. Because it produces nothing, most DNA generates no messengers at all. To find such molecules is hence an excellent way to search out working genes. There are tens of thousands of distinct messengers. What most do is quite unknown. In most cells, most are switched off but in the brain a large proportion are at work at any time. The brain is more active than is any other tissue (which may help to explain why more than a quarter of all inherited diseases lead to mental illness).
The hunt for genes is more like that for Timbuctu than for El Dorado. The mappers soon found that genes are oases of sense in a desert of nonsense. At one time, it seemed scarcely worth sifting the sands between the genetic cities, but, in the end, the complete map was made mainly on the grounds that it was worth while as one never knows what might turn up. It reaffirmed one of the most misunderstood facts in science; that it is possible to solve most problems by throwing money at them.
The assault on the physical map is best compared to surveying a country with a six-inch ruler, starting at one end and driving on to the opposite frontier. Twenty and more years ago, when the job began, one person could do about five thousand DNA bases a year. Now, it is routine to do thousands of times as many. Much of the intellectual effort of the job has moved from the simple accumulation of information to understanding it. Computer wizardry has played as important a part in the gene map as has biochemical machinery.
Once a segment of DNA has been sequenced, the local maps – the town plans – must be put in the right order. One way to build up a larger chart is to make a series of overlapping sequences of short pieces of DNA. The approach is a little like putting pages ripped out of a street guide back together by looking at the overlaps at the edge of each page in an attempt to find streets which run into each other. Sophisticated programs look for superimposed segments, long or short, and reassemble the torn fragments of DNA. That is much harder than it seems. An alphabet of just four letters and – like the map of an American city – many repeats of the same pattern of streets, gives plenty of chances for confusion. There are some short cuts. One trick, useful in the early days, was to jump several pages in the guide in the hope of missing out particularly tedious parts of the neighbourhood but for completion even the dullest parts of town must be charted.
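The page-matching idea can be sketched in a few lines of Python. This is only a toy illustration, not the software the mappers used: the function names, the three-letter minimum overlap and the greedy "merge the best pair first" rule are all assumptions made for the example.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return pieces  # nothing left that overlaps
        merged = pieces[i] + pieces[j][size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]


# Three made-up "torn pages", each sharing four letters with its neighbour.
pages = ["ATGGCGT", "GCGTACC", "TACCTGA"]
assembled = greedy_assemble(pages)
print(assembled)
```

With fragments as tidy as these the pages fall back into a single strip. Feed the same routine a sequence full of repeated motifs and it will happily join pages that do not belong together, which is exactly the confusion described above.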
New and powerful computers have made it possible, in principle at least, to make a whole genetic atlas at once, rather than piecing it together page by page. The ‘random shotgun’ approach lives up to its name. It blasts copies of the genome into thousands of segments, again and again, and, like a taxidermist rebuilding a single pheasant from the casual slaughter of many by a blind man with a twelve-bore, reconstitutes the whole thing from scratch. A giant program puts all the shattered pieces together, until at last they look like a map (or a game-bird). That approach worked well in fruit-flies, whose genome was sequenced before that of our own, but flies have a tenth as many DNA letters and far less repetition of easily-confused short sequences than we do. The less audacious ‘clone by clone’ approach takes tiny fragments (each about a twenty-thousandth of the whole of human DNA) and sequences them one by one. Then, it reassembles short segments of genes and, in time, re-forms the whole atlas. The approach, plodding as it may be, has worked well with humans and was used by the publicly-funded mappers to publish each clone as it appeared and to help thwart the privatised plan to sequence (and patent) the whole of our DNA at one fell swoop.
The physical map does not look at all like the linkage maps which emerged from family studies. The central difficulty is one of scale. A few tens of thousands of functional genes fit into three thousand million DNA letters. As most genes use only the information coded into several thousand bases there seems to be far more DNA than is needed. Mapping shows that just one part in twenty represents part of a gene. Our genome has an extraordinary and quite unexpected structure.
A geographical analogy may help. Imagine the journey along the whole of your own DNA as a trip from Land’s End to John o’Groat’s via London; about a thousand miles altogether. To fit all the DNA letters into a road map on this scale, there have to be fifty DNA bases per inch, or about three million per mile. The journey passes through twenty-three counties of different sizes. These administrative divisions, conveniently enough, are the same in number as the twenty-three chromosomes into which human DNA is packaged. With the exception of some short segments a few hundred yards long which, for various technical reasons, have proved recalcitrant, the whole lot has been mapped out with an accuracy of one part in fifty thousand – an inch in a mile (which is as good as or better than the maps sold by the Ordnance Survey).
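The scale of this imaginary road map is easy to check. The short Python sketch below uses only the round figures quoted in the text (three thousand million letters, a thousand miles); the variable and function names are invented for the example.

```python
GENOME_LETTERS = 3_000_000_000   # "three thousand million DNA letters"
JOURNEY_MILES = 1_000            # Land's End to John o'Groat's via London
INCHES_PER_MILE = 63_360         # 1,760 yards of 36 inches each

letters_per_mile = GENOME_LETTERS / JOURNEY_MILES
letters_per_inch = letters_per_mile / INCHES_PER_MILE


def inches_on_map(letters: int) -> float:
    """Distance on the road map, in inches, covered by a run of DNA letters."""
    return letters / letters_per_inch


print(f"{letters_per_mile:,.0f} letters per mile")
print(f"{letters_per_inch:.1f} letters per inch")
```

The answer comes out at three million letters per mile and a little over forty-seven per inch, which the text rounds to fifty.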
The scenery for most of the trip is tedious. Like much of modern Britain it seems to be unproductive. About a third of the whole distance is covered by repeats of the same message. Fifty miles, more or less, is filled with words of five, six or more letters, repeated next to each other. Many are palindromes. They read the same backwards as forwards, like the obituary of Ferdinand de Lesseps – ‘A man, a plan, a canal: Panama!’ Some of these ‘tandem repeats’ are scattered in blocks all over the genome. The position and length of each block varies from person to person. The famous ‘genetic fingerprints’, the unique inherited signature used in forensic work, depend on variation in the number and position of such segments. Other repeated sequences involve just the two letters, C and A, multiplied thousands of times while yet more are remnants of ancient viruses. Large sections of the genome are given over to long and complicated messages that seem to say nothing.
It is dangerous to dismiss all this DNA as useless because we do not understand what it says. The Chinese term ‘Shi’ can – apparently – have seventy-three different meanings depending on how it is pronounced. It is possible to construct a sentence such as ‘The master is fond of licking lion spittle’ just by using ‘Shi’ again and again. This would seem like empty repetition to those who cannot speak Chinese.
Much of the inherited landscape is littered with the corpses of abandoned genes, sometimes the same one again and again. The DNA sequences of these ‘pseudogenes’ look rather like that of their functional relatives, but are riddled with decay and no longer make anything. At some time in their history a crucial part of the machinery was damaged. Since then they have been rusting. Oddly enough, the same pseudogenes may turn up at several points along the journey.
After many miles of dull and repetitive DNA terrain, we begin to see places where some product is made. These are the functional genes. They, too, have some surprises in their structure. Each can be recognised by the order of the letters in the DNA alphabet, which start to read in words of three letters written in the genetic code, as a hint that it could produce a protein. In most cases there are few clues about what its product does, although its structure can be deduced (and its shape inferred) from the order of its DNA letters.
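The hint that a stretch of DNA might make a protein can be illustrated with a toy search for an "open reading frame": a run of three-letter words that begins with the usual start word, ATG, and continues until one of the three stop words (TAA, TAG or TGA) of the standard genetic code. The Python sketch below is a simplification under stated assumptions. It reads one strand only, ignores the interruptions found in most human genes, and its function name and minimum length are invented for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop words of the standard genetic code
START_CODON = "ATG"                  # the usual start word


def open_reading_frames(dna: str, min_codons: int = 3) -> list[str]:
    """Return stretches running from a start codon to the first in-frame stop."""
    dna = dna.upper()
    found = []
    for frame in range(3):  # the three ways of dividing the letters into triplets
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        start = None
        for index, codon in enumerate(codons):
            if start is None and codon == START_CODON:
                start = index
            elif start is not None and codon in STOP_CODONS:
                if index - start >= min_codons:
                    found.append("".join(codons[start:index + 1]))
                start = None
    return found


# A made-up sequence with one short reading frame hidden in it.
print(open_reading_frames("CCATGGCTAAATTTTGACC"))
```

Real gene-finding programs weigh many more clues than this, but the principle of looking for sense among the triplets is the same.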
Most genes are arranged in groups that make related products, with about a thousand of these ‘gene families’ altogether. One is involved in the manufacture of the red pigment of the blood. Most of the DNA in the bone-marrow cells which produce the red cells of the blood is switched off. One small group of genes is hard at work. As a result they are better known than any other. Much of human molecular biology grew from research on this particular genetic industrial centre, the globin genes.
They have two factories. One is halfway along the genetic road to John o’Groat’s – in Leeds. It makes one part of the protein involved in carrying oxygen. The beta-globin industrial estate contains about half a dozen sections of DNA that code for related things. That responsible for part of adult haemoglobin (and involved, when it goes wrong, in sickle-cell disease) is quite small: about three feet long on this map’s scale. A few feet away is another one which makes a globin found in the embryo. Close to that is the decayed hulk of some equipment which stopped working years ago. The beta-globin factory covers about a hundred feet altogether, most of which seems to be unused space between functional genes. It co-operates with a sister estate, the alpha-globin unit, a long way away (near London, on this mythical map), which produces a related protein. When joined together, the two products make the red blood pigment itself. Most genes are arranged in families, either close together or scattered all over the genome.
The map of ourselves shows that genes are of very different size, from about five hundred letters long to more than two million. One makes the largest known protein, titin, a molecular shock-absorber; a long, pleated structure found in muscles, in blood cells and in chromosomes. Whatever the size of its product, titin is by no means the largest gene. Most human genes have their functional segments interrupted by lengths of non-coding DNA – in Huntington’s disease, for example, by nearly seventy. In many genes (such as the one which goes wrong in muscular dystrophy) the great majority of the DNA codes for nothing. The non-coding material, whose importance varies greatly from gene to gene, participates in the first part of the production process, but this segment of the genetic alphabet is snipped out of the message before the protein is assembled. This seems an odd way to go about things, but it is the one which evolution has come up with.
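The snipping-out can be pictured with a small Python sketch. Everything in it is made up for illustration: the sequence, the positions of the coding segments and the function names. Coding letters are written in capitals and the interruption in lower case so that the cut is easy to see.

```python
def splice(message: str, coding_segments: list[tuple[int, int]]) -> str:
    """Join the coding segments, given as (start, end) positions, in order."""
    return "".join(message[start:end] for start, end in sorted(coding_segments))


def coding_fraction(message: str, coding_segments: list[tuple[int, int]]) -> float:
    """Share of the original message that survives the snipping."""
    kept = sum(end - start for start, end in coding_segments)
    return kept / len(message)


# Two coding segments separated by thirteen letters that code for nothing.
raw_message = "ATGGCC" + "gtaagtttttcag" + "AAATGA"
segments = [(0, 6), (19, 25)]

print(splice(raw_message, segments))
print(coding_fraction(raw_message, segments))
```

In this toy gene just under half the message is kept. In a gene like the one behind muscular dystrophy, the share would be far smaller.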
The general picture began to emerge as soon as the mappers began work. In the year 2000 – almost exactly a century after the rediscovery of Mendel’s rules – their labours were, in effect, complete and the whole human gene sequence was laid out in all its tedium before a less than startled world. Three thousand million letters (or, as now it appears, slightly more) is a lot. For accuracy, each section had to be sequenced ten times or more and even at a thousand DNA bases a second (which is what the machinery pumps out) that was not easy. Sixteen centres, in France, Japan, Germany, China, Britain and the United States combined to do the job. Most were funded by governments or charities, with the notorious exception of the Celera Genomics company (their motto: ‘Discovery Can’t Wait!’), whose head defected from a government programme. Advances in technology reduced the original estimate of three billion dollars by ten times which, for a project – described by President Clinton as the most wondrous map ever produced – with far more scientific weight than the Moon landings, was a remarkable bargain. For much of the time, the private and public sectors were at daggers drawn (vividly illustrated by Celera’s description of the director of one public laboratory behaving as if he had been bitten by a rabid dog).
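The size of the sequencing task itself can be put into figures with a short Python calculation. It rests on two assumptions beyond the text: that each letter was read exactly ten times, and that a thousand bases a second was the combined, uninterrupted output of the machinery. The variable names are invented for the example.

```python
GENOME_LETTERS = 3_000_000_000   # three thousand million letters
PASSES = 10                      # "ten times or more"; exactly ten assumed here
BASES_PER_SECOND = 1_000         # assumed to be the combined output
SECONDS_PER_DAY = 24 * 60 * 60

bases_to_read = GENOME_LETTERS * PASSES
seconds_needed = bases_to_read / BASES_PER_SECOND
days_needed = seconds_needed / SECONDS_PER_DAY

print(f"{bases_to_read:,} bases to read")
print(f"{days_needed:.0f} days of non-stop sequencing")
```

On these assumptions the machines would have to run without a pause for the best part of a year.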