JUNK DNA
A Journey Through the Dark Matter of the Genome
NESSA CAREY
COLUMBIA UNIVERSITY PRESS
NEW YORK
Columbia University Press
Publishers Since 1893
New York Chichester, West Sussex
cup.columbia.edu
Copyright © 2015 Nessa Carey
All rights reserved
E-ISBN 978-0-231-53941-8
Published simultaneously in the United Kingdom by Icon Books Ltd.
ISBN 978-0-231-17084-0 (cloth : alk. paper)
ISBN 978-0-231-53941-8 (e-book)
Library of Congress Control Number: 2014955417
A Columbia University Press E-book.
Cover design by Edward Bettison
Illustration by Edward Bettison
References to websites (URLs) were accurate at the time of writing. Neither the author nor Columbia University Press is responsible for URLs that may have expired or changed since the manuscript was prepared.
For Abi Reynolds, who is always by my side
And for Sheldon, good to see you again
I am lucky that for my second book I continue to have the support of a great agent, Andrew Lownie, and of lovely publishers. At Icon Books I'd particularly like to thank Duncan Heath, Andrew Furlow and Robert Sharman, but not forgetting their former colleagues Simon Flynn and Henry Lord. At Columbia University Press I'm very grateful to Patrick Fitzgerald, Bridget Flannery-McCoy and Derek Warker.
As always, entertainment and enlightenment have been obtained from some unusual quarters. Conor Carey, Finn Carey and Gabriel Carey all played a role in this, and outside the genetic clan I'd also like to thank Iona Thomas-Wright. Endless support and lots of biscuits have been provided by my ever-patient, delightful mother-in-law, Lisa Doran.
I've had a blast delivering lots of science talks to non-specialist audiences since my first book was published. The various organisations that have invited me to speak are too many to namecheck but they know who they are and I've enjoyed the privilege immensely. It's been very inspiring. Thank you all.
And finally Abi. Who is mercifully forgiving of the fact that, despite my promises, I still haven't had that ballroom dancing lesson yet.
There's a bit of a linguistic difficulty in writing a book on junk DNA, because it is a constantly shifting term. This is partly because new data change our perception all the time. Consequently, as soon as a piece of junk DNA is shown to have a function, some scientists will say (logically enough) that it's not junk. But that approach runs the risk of losing perspective on how radically our understanding of the genome has changed in recent years.
Rather than spend time trying to knit a sweater with this ball of fog, I have adopted the most hard-line approach. Anything that doesn't code for protein will be described as 'junk', as it originally was in the old days (second half of the twentieth century). Purists will scream, and that's OK. Ask three different scientists what they mean by the term 'junk', and we would probably get four different answers. So there's merit in starting with something straightforward.
I also start by using the term 'gene' to refer to a stretch of DNA that codes for a protein. This definition will evolve through the course of the book.
After my first book, The Epigenetics Revolution, was published, I realised the readership was quite binary with respect to gene names. Some people love knowing which gene is being discussed, but for other readers it disrupts the flow horribly. So this time I have only used specific gene names in the text where absolutely necessary. But if you want to know them, they are in the footnotes, and the citations for the original references are at the back of the book.
Imagine a written script for a play, or film, or television programme. It is perfectly possible for someone to read a script just as they would a book. But the script becomes so much more powerful when it is used to produce something. It becomes more than just a string of words on a page when it is spoken aloud, or better yet, acted.
DNA is rather similar. It is the most extraordinary script. Using a tiny alphabet of just four letters it carries the code for organisms from bacteria to elephants, and from brewer's yeast to blue whales. But DNA in a test tube is pretty boring. It does nothing. DNA becomes far more exciting when a cell or an organism uses it to stage a production. The DNA is used as the code for creating proteins and these proteins are vital for breathing, feeding, getting rid of waste, reproducing and all the other activities that characterise living organisms.
Proteins are so important that in the twentieth century scientists used them to define what they meant by a gene. A gene was described as a sequence of DNA that codes for a protein.
Let's think about the most famous scriptwriter in history, William Shakespeare. It can take a while for us to tune in to Shakespeare's writings because of the way the English language has changed in the centuries since his death. But even so, we are always confident that the bard only wrote the words he needed his actors to speak.
Shakespeare did not, for example, write the following:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewq icxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzo wqmroseuiednrcvtycuxmqpzjmoimxdcnibyrwvyteb anyhcuxqimokzqoxkmdcifwrvjhentbubygdecftywer ftxunihzxqwemiuqwjiqpodqeotherpowhdymrxname hnfeicvbrgytrchguthhhhhhhgcwouldupaizmjdpq smellmjzufernnvgbyunasechuxhrtgcnionytuiongdjsi oniodefnionihyhoniosdreniokikiniourvjcxoiqweopap qsweetwxmocviknoitrbiobeierrrrrrruorytnihgfiwosw akxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
Instead, he just wrote the words which are underlined:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewq icxhj a fvurytnpemxoqp[etjhnuvrwwwebcxewmoipzo wqm rose uiednrcvtycuxmqpzjmoimxdcni by rwvyteb any hcuxqimokzqoxkmdcifwrvjhentbubygdecftywer ftxunihzxqwemiuqwjiqpodqe other powhdymrx name hnfeicvbrgytrchguthhhhhhhgc would upaizmjdpq smell mjzufernnvgbyun as echuxhrtgcnionytuiongdjsi oniodefnionihyhoniosdreniokikiniourvjcxoiqweopap q sweet wxmocviknoitrbiobeierrrrrrruorytnihgfiwosw akxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
That is, 'A rose by any other name would smell as sweet'.
But if we look at our DNA script it is not sensible and compact, like Shakespeare's line. Instead, each protein-coding region is like a single word adrift in a sea of gibberish.
For years, scientists had no explanation for why so much of our DNA doesn't code for proteins. These non-coding parts were dismissed with the term 'junk DNA'. But gradually this position has begun to look less tenable, for a whole host of reasons.