1.1 The Meeting
The 1980s were an active decade for basic advances in sequence comparison algorithms. Michael Waterman, Temple Smith, Esko Ukkonen, Webb Miller, Gad Landau, David Lipman, Bill Pearson, and I, among others, were all very active in this period, working out the basic algorithms for comparing sequences, approximate pattern matching, and database searching (e.g. []).
In 1988, Webb Miller and I organized a small bioinformatics meeting in Bethesda, Maryland that included David Sankoff, Michael Waterman, Temple Smith, Eric Lander, Zvi Galil, Esko Ukkonen, David Lipman, and other notable investigators of that time. At the meeting, Zvi Galil gave a talk about suffix trees [] and raised two questions that were ultimately both answered:
1. Can suffix trees be built in a way that is independent of the alphabet size s?
2. Can a precomputed index, such as a suffix tree of a large text, be used to speed up searches for approximate matches to a string?
To understand the first question, one must recall that one can either use an s-element array at each suffix tree node, permitting a search for a string of length p in a text of length n in O(p) time but requiring O(ns) space for the suffix tree, or use only O(n) space by employing binary trees to decide which edge to follow out of a node, at the cost of O(p log s) search time. This question ultimately led Udi Manber and me to develop suffix arrays in late 1990 [], which can be built in O(n log n) time, occupy only O(n) space, and support searches whose cost is independent of the alphabet size.
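To make this trade-off concrete, here is a minimal sketch, in Python and of my own devising rather than from any of the works cited, of the two classic ways a suffix-tree node can index its outgoing edges; a sorted array with binary search stands in for the binary tree mentioned above and gives the same O(log s) branching cost.

```python
import bisect

S = 4  # assumed alphabet size s (e.g. the DNA alphabet)

class ArrayNode:
    """O(s) space per node: child lookup is O(1), so a length-p search costs O(p)."""
    def __init__(self):
        self.children = [None] * S          # one slot per alphabet symbol

    def child(self, symbol_index):
        return self.children[symbol_index]  # constant-time branch

class SortedNode:
    """Space proportional to the edges actually present: lookup is O(log s),
    so a length-p search costs O(p log s)."""
    def __init__(self):
        self.symbols = []                   # sorted symbol indices that have an outgoing edge
        self.targets = []                   # child node for each such symbol

    def child(self, symbol_index):
        i = bisect.bisect_left(self.symbols, symbol_index)
        if i < len(self.symbols) and self.symbols[i] == symbol_index:
            return self.targets[i]          # logarithmic-time branch
        return None
```

Summed over the O(n) nodes of the tree, the first representation costs O(ns) space and the second O(n).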
1.2 Filters and Neighborhoods
But it was the second question, on the use of an index such as a suffix tree to speed searches for approximate matches, that captured my attention immediately after the meeting. Most algorithmicists work on deterministic search algorithms, meaning that the method finds exactly the set of locations in the text where a query approximately matches within some specified threshold. A heuristic, by contrast, is an algorithm that finds most of the matches sought but may miss a few, called false negatives, and may further report a few locations where a match does not actually occur, called false positives. In between these two types of algorithms, a filter is an algorithm that has no false negatives but may produce false positives. That is, it produces a superset of the instances sought, or equivalently, it filters out most of the locations where matches do not occur. A filter can be the first step of a deterministic algorithm simply by running a deterministic checking algorithm on the subset of locations reported by the filter. If the filter is much more efficient than the deterministic checker, then one ends up with a much more efficient search.
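To make the filter-plus-checker composition concrete, the following short Python sketch verifies, with a standard dynamic-programming edit-distance check, only those positions that some filter has reported; the names filtered_search and candidate_positions are illustrative placeholders, not any particular published method.

```python
def edit_distance(v, w):
    """Standard dynamic-programming Levenshtein distance between v and w."""
    prev = list(range(len(w) + 1))
    for i, a in enumerate(v, 1):
        cur = [i]
        for j, b in enumerate(w, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def filtered_search(text, query, e, candidate_positions):
    """Deterministically verify only the positions reported by a filter.

    As long as candidate_positions is a superset of the true match starts
    (no false negatives), no match is lost; false positives are simply
    rejected by the exact check below.
    """
    p = len(query)
    hits = []
    for pos in sorted(set(candidate_positions)):
        window = text[pos:pos + p + e]       # a match consumes at most p + e text symbols
        if any(edit_distance(query, window[:m]) <= e
               for m in range(min(len(window), p + e) + 1)):
            hits.append(pos)
    return hits
```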
At the time, there were surprisingly no published methods using the simple idea of finding exact matches to k-mers (strings of length k) from the query string [], even though this was fairly obvious and had been used in the heuristic method of FASTA. Shortly after the Bethesda meeting, I had the first and most important idea of looking for exact matches to strings in the neighborhood of k-mers selected from the query string. Let δ be a sequence comparison measure that, given two strings v and w, returns a numeric measure δ(v, w) of the degree to which they differ (e.g. the generalized Levenshtein metric). Given a string w, the ε-neighborhood of w with respect to δ is the set of all strings v whose best alignment with w under scoring scheme δ scores at most ε, i.e. N_δ^ε(w) = { v : δ(v, w) ≤ ε }.
For this paper, except where mentioned otherwise, we focus on the approximate match problem where δ is the simple Levenshtein metric, that is, the minimum number of insertions, deletions, and substitutions needed in an alignment between the two strings in question. That is, we seek matches of a query of length p to a text of length n, where up to e differences are allowed. Another way to phrase this, which we will use interchangeably, is that we seek ε-matches, where ε = e/p is the fraction of differences allowed relative to the query length. To illustrate the idea of a neighborhood under this metric, the 1-neighborhood of abba (or 25%-neighborhood) is N_δ^1(abba) = { aaba, aabba, aba, abaa, ababa, abb, abba, abbaa, abbab, abbb, abbba, babba, bba, bbba }.
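The example is small enough to verify mechanically; the brute-force Python sketch below, which is exponential in the string length and therefore meant only for toy cases such as this one, enumerates an e-neighborhood under the Levenshtein metric.

```python
from itertools import product

def levenshtein(v, w):
    """Minimum number of insertions, deletions, and substitutions turning v into w."""
    prev = list(range(len(w) + 1))
    for i, a in enumerate(v, 1):
        cur = [i]
        for j, b in enumerate(w, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def neighborhood(w, e, alphabet):
    """Brute-force e-neighborhood: all strings within e differences of w."""
    result = set()
    for length in range(max(0, len(w) - e), len(w) + e + 1):
        for letters in product(alphabet, repeat=length):
            v = "".join(letters)
            if levenshtein(v, w) <= e:
                result.add(v)
    return result

print(sorted(neighborhood("abba", 1, "ab")))   # prints the 14 strings listed above
```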
For the example above, notice that wherever one finds abaa one will also find aba, as the latter is a prefix of the former. So to find all matches to neighborhood strings it suffices to look up in an index only those strings that are not extensions of a shorter string in the same neighborhood. Let the condensed ε-neighborhood of w be the subset of such strings, i.e. { v ∈ N_δ^ε(w) : no proper prefix of v is in N_δ^ε(w) }. For our example, the condensed 1-neighborhood of abba is { aaba, aabba, aba, abb, babba, bba, bbba }, a considerably smaller set of strings.
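Computing the condensed neighborhood from a full neighborhood is then a simple prefix test; the Python sketch below, with the abba example hard-coded as input, reproduces the seven strings listed above.

```python
def condensed(nbhd):
    """Keep only strings with no proper prefix in the same neighborhood.

    nbhd is any e-neighborhood, e.g. the set produced by the brute-force
    sketch above; every dropped string is an extension of a kept one, so an
    index lookup of the condensed set finds the same text positions.
    """
    kept = set()
    for v in nbhd:
        if not any(v[:i] in nbhd for i in range(1, len(v))):
            kept.add(v)
    return kept

# Toy check against the example in the text: the 1-neighborhood of "abba".
n1 = {"aaba", "aabba", "aba", "abaa", "ababa", "abb", "abba",
      "abbaa", "abbab", "abbb", "abbba", "babba", "bba", "bbba"}
print(sorted(condensed(n1)))
# -> ['aaba', 'aabba', 'aba', 'abb', 'babba', 'bba', 'bbba']
```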
To illustrate the advantage of using (condensed) neighborhoods, consider looking for a match with nine differences to a query of length, say, 100. If one partitions the query into 10 strings of length 10, then by the pigeonhole principle one of the 10 strings must exactly match in the database. So one can filter the text by looking for one of these 10 strings of length 10. But if one instead partitions the query into 5 strings of length 20, then by the pigeonhole principle a string in the 1-neighborhood of one of the five query parts must exactly match in the database. A rough estimate for the number of strings in the condensed e-neighborhood of a string of length k over an alphabet of size s is C(k, e)·(2s)^e: each of the e differences can occur at roughly any of k positions and be realized in about 2s ways (a deletion, one of s−1 substitutions, or one of s insertions). Thus in our example (k = 20, e = 1, s = 4) we can filter the text by looking for one of 5·160 = 800 strings of length 20. Which filter is better? The probability of a random false positive for the k-mer filter is 10/s^10, and for the neighborhood filter it is 800/s^20. Thus the latter filter produces s^10/80-fold fewer false positives. If s is 4 (e.g. the DNA alphabet) and n is 3×10^9 (e.g. the size of the human genome), then the neighborhood filter produces about 13,000 times fewer false positives, reporting only 2.18 false positives in expectation, whereas the k-mer filter reports over 28,600!
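For readers who wish to check the arithmetic, the short Python calculation below, a back-of-the-envelope check using the same s = 4 and n = 3×10^9, recovers the figures quoted above.

```python
# Expected false positives of the two filters on a random text, using the
# estimates from the text: 10 exact 10-mers vs. about 800 neighborhood strings of length 20.
s = 4                  # alphabet size (DNA)
n = 3_000_000_000      # text length (human genome)

kmer_strings = 10               # 10 query parts of length 10
nbhd_strings = 5 * 20 * 2 * s   # 5 parts, each with about C(20,1)*(2s) = 160 neighbors

fp_kmer = n * kmer_strings / s**10    # expected false positives of the k-mer filter
fp_nbhd = n * nbhd_strings / s**20    # expected false positives of the neighborhood filter

print(round(fp_kmer), round(fp_nbhd, 2), round(fp_kmer / fp_nbhd))
# -> 28610 2.18 13107   (the s^10/80, roughly 13,000-fold, improvement in the text)
```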