JUNK DNA
Also by Nessa Carey
The Epigenetics Revolution
JUNK DNA
A Journey Through the Dark Matter of the Genome
NESSA CAREY
Published in the UK in 2015 by
Icon Books Ltd, Omnibus Business Centre,
39-41 North Road, London N7 9DP
email:
www.iconbooks.com
Sold in the UK, Europe and Asia
by Faber & Faber Ltd, Bloomsbury House,
74-77 Great Russell Street,
London WC1B 3DA or their agents
Distributed in the UK, Europe and Asia
by TBS Ltd, TBS Distribution Centre, Colchester Road,
Frating Green, Colchester CO7 7DW
Distributed in Australia and New Zealand
by Allen & Unwin Pty Ltd,
PO Box 8500, 83 Alexander Street,
Crows Nest, NSW 2065
Distributed in South Africa by
Jonathan Ball, Office B4, The District,
41 Sir Lowry Road, Woodstock 7925
Distributed in India by Penguin Books India,
7th Floor, Infinity Tower C, DLF Cyber City,
Gurgaon 122002, Haryana
ISBN: 978-184831-826-7
Text copyright 2015 Nessa Carey
The author has asserted her moral rights.
No part of this book may be reproduced in any form, or by any means, without prior permission in writing from the publisher.
Typeset in Janson Text by Marie Doherty
Printed and bound in the UK
by Clays Ltd, St Ives plc
For Abi Reynolds, who is always by my side
And for Sheldon: good to see you again
Acknowledgements
I am lucky that for my second book I continue to have the support of a great agent, Andrew Lownie, and of lovely publishers. At Icon Books I'd particularly like to thank Duncan Heath, Andrew Furlow and Robert Sharman, not forgetting their former colleagues Simon Flynn and Henry Lord. At Columbia University Press I'm very grateful to Patrick Fitzgerald, Bridget Flannery-McCoy and Derek Warker.
As always, entertainment and enlightenment have been obtained from some unusual quarters. Conor Carey, Finn Carey and Gabriel Carey all played a role in this, and outside the genetic clan I'd also like to thank Iona Thomas-Wright. Endless support and lots of biscuits have been provided by my ever-patient, delightful mother-in-law, Lisa Doran.
I've had a blast delivering lots of science talks to non-specialist audiences since my first book was published. The various organisations that have invited me to speak are too many to namecheck, but they know who they are, and I've enjoyed the privilege immensely. It's been very inspiring. Thank you all.
And finally Abi. Who is mercifully forgiving of the fact that, despite my promises, I still haven't had that ballroom dancing lesson.
Notes on Nomenclature
There's a bit of a linguistic difficulty in writing a book on junk DNA, because it is a constantly shifting term. This is partly because new data change our perception all the time. Consequently, as soon as a piece of junk DNA is shown to have a function, some scientists will say (logically enough) that it's not junk. But that approach runs the risk of losing perspective on how radically our understanding of the genome has changed in recent years.
Rather than spend time trying to knit a sweater with this ball of fog, I have adopted the most hard-line approach. Anything that doesn't code for protein will be described as 'junk', as it originally was in the old days (the second half of the twentieth century). Purists will scream, and that's OK. Ask three different scientists what they mean by the term 'junk', and you would probably get four different answers. So there's merit in starting with something straightforward.
I also start by using the term 'gene' to refer to a stretch of DNA that codes for a protein. This definition will evolve through the course of the book.
After my first book, The Epigenetics Revolution, was published, I realised the readership was quite binary with respect to gene names. Some people love knowing which gene is being discussed, but for other readers it disrupts the flow horribly. So this time I have used specific gene names in the text only where absolutely necessary. But if you want to know them, they are in the footnotes, and the citations for the original references are at the back of the book.
An Introduction to Genomic Dark Matter
Imagine a written script for a play, or film, or television programme. It is perfectly possible for someone to read a script just as they would a book. But the script becomes so much more powerful when it is used to produce something. It becomes more than just a string of words on a page when it is spoken aloud, or better yet, acted.
DNA is rather similar. It is the most extraordinary script. Using a tiny alphabet of just four letters it carries the code for organisms from bacteria to elephants, and from brewer's yeast to blue whales. But DNA in a test tube is pretty boring. It does nothing. DNA becomes far more exciting when a cell or an organism uses it to stage a production. The DNA is used as the code for creating proteins, and these proteins are vital for breathing, feeding, getting rid of waste, reproducing and all the other activities that characterise living organisms.
Proteins are so important that in the twentieth century scientists used them to define what they meant by a gene. A gene was described as a sequence of DNA that codes for a protein.
Let's think about the most famous scriptwriter in history, William Shakespeare. It can take a while for us to tune in to Shakespeare's writings because of the way the English language has changed in the centuries since his death. But even so, we are always confident that the Bard only wrote the words he needed his actors to speak.
Shakespeare did not, for example, write the following:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewqicxhjafvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqmroseuiednrcvtycuxmqpzjmoimxdcnibyrwvytebanyhcuxqimokzqoxkmdcifwrvjhentbubygdecftywerftxunihzxqwemiuqwjiqpodqeotherpowhdymrxnamehnfeicvbrgytrchguthhhhhhhgcwouldupaizmjdpqsmellmjzufernnvgbyunasechuxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapqsweetwxmocviknoitrbiobeierrrrrrruorytnihgfiwoswakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
Instead, he wrote only the words picked out here in capitals:
vjeqriugfrhbvruewhqoerahcxnqowhvgbutyunyhewqicxhjAfvurytnpemxoqp[etjhnuvrwwwebcxewmoipzowqmROSEuiednrcvtycuxmqpzjmoimxdcniBYrwvytebANYhcuxqimokzqoxkmdcifwrvjhentbubygdecftywerftxunihzxqwemiuqwjiqpodqeOTHERpowhdymrxNAMEhnfeicvbrgytrchguthhhhhhhgcWOULDupaizmjdpqSMELLmjzufernnvgbyunASechuxhrtgcnionytuiongdjsioniodefnionihyhoniosdreniokikiniourvjcxoiqweopapqSWEETwxmocviknoitrbiobeierrrrrrruorytnihgfiwoswakxdcjdrfuhrqplwjkdhvmogmrfbvhncdjiwemxsklowe
That is, 'A rose by any other name would smell as sweet.'
But if we look at our DNA script, it is not sensible and compact like Shakespeare's line. Instead, each protein-coding region is like a single word adrift in a sea of gibberish.
For years, scientists had no explanation for why so much of our DNA doesn't code for proteins. These non-coding parts were dismissed as 'junk DNA'. But gradually this position has begun to look less tenable, for a whole host of reasons.
Perhaps the most fundamental reason for the shift in emphasis is the sheer volume of junk DNA that our cells contain. One of the biggest shocks when the human genome sequence was completed in 2001 was the discovery that over 98 per cent of the DNA in a human cell is junk. It doesn't code for any proteins. The Shakespeare analogy used above is in fact a simplification. In genome terms, the ratio of gibberish to text is about four times as high as shown. There are over 50 letters of junk for every one letter of sense.
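The proportions in this paragraph come from one line of arithmetic. If a percentage p of the genome is non-coding, each coding letter is accompanied by p / (100 − p) non-coding letters. At exactly 98 per cent that gives 49 to 1, and the ratio climbs quickly as the figure edges higher. The short Python sketch below is purely illustrative. The function names are hypothetical, and 'letters' means alphabetic characters only, ignoring spaces. It works out the ratio and how much gibberish Shakespeare's 36-letter line would need to be buried in at genome-like proportions.

```python
def junk_per_coding_letter(junk_percent: float) -> float:
    """Return how many non-coding letters accompany each protein-coding letter.

    junk_percent is the share of the genome (from 0 up to, but not
    including, 100) that does not code for protein.
    """
    if not 0.0 <= junk_percent < 100.0:
        raise ValueError("junk_percent must be at least 0 and below 100")
    return junk_percent / (100.0 - junk_percent)


def letters_only(text: str) -> int:
    """Count alphabetic characters, ignoring spaces and punctuation."""
    return sum(ch.isalpha() for ch in text)


line = "A rose by any other name would smell as sweet"
sense = letters_only(line)  # 36 letters of 'sense'

# Illustrative percentages around the 'over 98 per cent' figure in the text.
for pct in (98.0, 98.5, 99.0):
    ratio = junk_per_coding_letter(pct)
    print(f"{pct}% junk: {ratio:.1f} junk letters per coding letter, "
          f"about {round(sense * ratio)} junk letters around Shakespeare's line")
```

For the 98 per cent case, the first line printed is `98.0% junk: 49.0 junk letters per coding letter, about 1764 junk letters around Shakespeare's line`.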
There are other ways of envisaging this. Let's imagine we visit a car factory, perhaps one making something high-end like a Ferrari. We would be pretty surprised if, for every two people building a shiny red sports car, there were another 98 sitting around doing nothing. This would be ridiculous, so why would it be reasonable in our genomes? It's a very fair point that the imperfections in organisms are often the strongest evidence for descent from common ancestors (we humans really don't need an appendix). But this seems like taking imperfection rather too far.