Bioinformatics
Sequence Alignment and Markov Models
Kal Renganathan Sharma, Ph.D., P.E.
Adjunct Professor
Department of Chemical Engineering
Prairie View A&M University
Prairie View, Texas
Copyright 2009 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.
0071593071
The material in this eBook also appears in the print version of this title: 0-07-159306-3.
All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.
McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at george_hoare@mcgraw-hill.com or (212)904-4069.
TERMS OF USE
This is a copyrighted work and The McGraw-Hill Companies, Inc. ("McGraw-Hill") and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hills prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.
THE WORK IS PROVIDED AS IS. McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.
DOI: 10.1036/0071593063
Want to learn more?
We hope you enjoy this McGraw-Hill eBook! If youd like more information about this book, its author, or related books and websites, please .
This work is dedicated to my son
R. Hari Subrahmanyan Sharma
(alias Ramkishan, born August 13, 2001)
with unconditional love.
About the Author
.
Kal Renganathan Sharma, Ph.D., P.E., has written five books, 12 journal articles, and 448 conference papers. He has earned three degrees in chemical engineeringa B.Tech. from the Indian Institute of Technology, Chennai, and an M.S. and a Ph.D. from West Virginia University, Morgantown. He has held a number of high-level positions at engineering colleges and universities. Dr. Sharma currently teaches at Prairie View A&M University in Prairie View, Texas.
Contents
Preface
.
I was requested by the former controller of examinations at the University of Madras, India, A. Sivamurthy, to prepare the curriculum and syllabus for a B.Tech. degree in bioinformatics at the Vellore Institute of Technology, Vellore, India, in 2001. I was requested by letter to prepare a project report on a course program for an M.Tech. degree in bioinformatics by Dr. B. Srinivasan, Vice Chancellor at Sri. Chandrasekharendra Saraswati Viswa Mahavidyalaya University, Kancheepuram, India, in 2002. The Vice Chancellor at SASTRA University, Thanjavur, India, R. Sethuraman, chartered me with the task of writing a book entitled Lecture Notes in Computational Molecular Biology, to be used for instruction in the newly formed M.Tech. and B.Tech. bioinformatics programs in 2003.
Since I wrote Lecture Notes in Computational Molecular Biology, a number of interesting developments in the field of bioinformatics has come about. The Human Genome Project has been completed ahead of time. The biologic databases double in size every 10 months, and the computing speed of microprocessors doubles in speed every 18 months. So a database search that cost $2 today would, two years from now, quadruple in cost to $8 on account of the explosive growth of databases and would be cut back in half to $4 on account of the increase in computing power. There is scope for the development of data search and data storage algorithms and methods. It can be viewed as a marriage between information technology and computational biology. Bioinformatics is emerging as a distinct discipline of its own. Textbooks need to be neither mathematically intimidating nor biologically intensive and laborious. Over 560 end-of-chapter exercises are provided in this book. Appendices with Internet hotlinks to public-domain databases and PERL code commands are given. This book can be used as a textbook for core subjects in a bioinformatics undergraduate program and as an elective for chemical engineering and biotechnology undergraduate and graduate programs.
The dynamic programming methods of Needleman and Wunsch and Smith and Waterman for global, local, and semiglobal alignment of sequences are discussed. The affine gap model and the different scoring schemes to make the alignment more biologically meaningful are treated with worked examples. Further reductions in time and space efficiciency from O (n2) needed for the dynamic programming algorithms are introduced. These include the greedy algorithms that tap into the existing similarity of biologic sequences that are homologous. The X-drop algorithm for very similar sequences that can be completed in O (en) time, where e is much smaller than n, are discussed. Dynamic array techniques that only need O (n) space for dynamic programming methods are introduced. Sparse dynamic programming table problems are reviewed. Methods discussed in this text feature the software used in industry, such as MUMer, Genie, LAGAN, CHAOS, GLASS, QSAR, AVID, REPuter, CLUSTALW, T-Coffee, DIALIGN, MAFTT, PSI-BLAST, BLAST, FASTA, STAMP, JalView, SAM, HMMER, HMMPRO, Meta-Meme, PFAM, Profile HMMs, GLIMMER, GENEMARK, PROCRUSTES, GRAIL, fGENEH, ROSETTA, GENSCAN, SLAM, HMMSTER, PHDSec, DISULFIND, SAM-T99, JPRED, etc.