logo资料库

Bioinformatics Algorithms - vol2.pdf

第1页 / 共314页
第2页 / 共314页
第3页 / 共314页
第4页 / 共314页
第5页 / 共314页
第6页 / 共314页
第7页 / 共314页
第8页 / 共314页
资料共314页,剩余部分请下载后查看
List of Code Challenges
About the Textbook
Meet the Authors
Meet the Development Team
Acknowledgments
Chapter 7: Which Animal Gave Us SARS?
The Fastest Outbreak
Trouble at the Metropole Hotel
The evolution of SARS
Transforming Distance Matrices into Evolutionary Trees
Constructing a distance matrix from coronavirus genomes
Evolutionary trees as graphs
Distance-based phylogeny construction
Toward An Algorithm for Distance-Based Phylogeny Construction
A quest for neighboring leaves
Computing limb lengths
Additive Phylogeny
Trimming the tree
Attaching a limb
An algorithm for distance-based phylogeny construction
Constructing an evolutionary tree of coronaviruses
Using Least Squares to Construct Approximate Distance-Based Phylogenies
Ultrametric Evolutionary Trees
The Neighbor-Joining Algorithm
Transforming a distance matrix into a neighbor-joining matrix
Analyzing coronaviruses with the neighbor-joining algorithm
Limitations of distance-based approaches to evolutionary tree construction
Character-Based Tree Reconstruction
Character tables
From anatomical to genetic characters
How many times has evolution invented insect wings?
The Small Parsimony Problem
The Large Parsimony Problem
Epilogue: Evolutionary Trees Fight Crime
Detours
When did HIV jump from primates to humans?
Searching for a tree fitting a distance matrix
The four point condition
Did bats give us SARS?
Why does the neighbor-joining algorithm find neighboring leaves?
Computing limb lengths in the neighbor-joining algorithm
Giant panda: bear or raccoon?
Where did humans come from?
Bibliography Notes
Chapter 8: How Did Yeast Become a Wine Maker?
An Evolutionary History of Wine Making
How long have we been addicted to alcohol?
The diauxic shift
Identifying Genes Responsible for the Diauxic Shift
Two evolutionary hypotheses with different fates
Which yeast genes drive the diauxic shift?
Introduction to Clustering
Gene expression analysis
Clustering yeast genes
The Good Clustering Principle
Clustering as an Optimization Problem
Farthest First Traversal
k-Means Clustering
Squared error distortion
k-means clustering and the center of gravity
The Lloyd Algorithm
From centers to clusters and back again
Initializing the Lloyd algorithm
k-means++ Initializer
Clustering Genes Implicated in the Diauxic Shift
Limitations of k-Means Clustering
From Coin Flipping to k-Means Clustering
Flipping coins with unknown biases
Where is the computational problem?
From coin flipping to the Lloyd algorithm
Return to clustering
Making Soft Decisions in Coin Flipping
Expectation maximization: the E-step
Expectation maximization: the M-step
The expectation maximization algorithm
Soft k-Means Clustering
Applying expectation maximization to clustering
Centers to soft clusters
Soft clusters to centers
Hierarchical Clustering
Introduction to distance-based clustering
Inferring clusters from a tree
Analyzing the diauxic shift with hierarchical clustering
Epilogue: Clustering Tumor Samples
Detours
Whole genome duplication or a series of duplications?
Measuring gene expression
Microarrays
Proof of the Center of Gravity Theorem
Transforming an expression matrix into a distance/similarity matrix
Clustering and corrupted cliques
Bibliography Notes
Chapter 9: How Do We Locate Disease-Causing Mutations?
What Causes Ohdo Syndrome?
Introduction to Multiple Pattern Matching
Herding Patterns into a Trie
Constructing a trie
Applying the trie to multiple pattern matching
Preprocessing the Genome Instead
Introduction to suffix tries
Using suffix tries for pattern matching
Suffix Trees
Suffix Arrays
Constructing a suffix array
Pattern matching with the suffix array
The Burrows-Wheeler Transform
Genome compression
Constructing the Burrows-Wheeler transform
From repeats to runs
Inverting the Burrows-Wheeler Transform
A first attempt at inverting the Burrows-Wheeler transform
The First-Last Property
Using the First-Last property to invert the Burrows-Wheeler transform
Pattern Matching with the Burrows-Wheeler Transform
A first attempt at Burrows-Wheeler pattern matching
Moving backward through a pattern
The Last-to-First mapping
Speeding Up Burrows-Wheeler Pattern Matching
Substituting the Last-to-First mapping with count arrays
Getting rid of the first column of the Burrows-Wheeler matrix
Where are the Matched Patterns?
Burrows and Wheeler Set Up Checkpoints
Epilogue: Mismatch-Tolerant Read Mapping
Reducing approximate pattern matching to exact pattern matching
BLAST: Comparing a sequence against a database
Approximate pattern matching with the Burrows-Wheeler transform
Charging Stations
Constructing a suffix tree
Solving the Longest Shared Substring Problem
Partial suffix array construction
Detours
The reference human genome
Rearrangements, insertions, and deletions in human genomes
The Aho-Corasick algorithm
From suffix trees to suffix arrays
From suffix arrays to suffix trees
Binary search
Bibliography Notes
Chapter 10: Why Have Biologists Still Not Developed an HIV Vaccine?
Classifying the HIV Phenotype
How does HIV evade the human immune system?
Limitations of sequence alignment
Gambling with Yakuza
Two Coins up the Dealer's Sleeve
Finding CG-Islands
Hidden Markov Models
From coin flipping to a Hidden Markov Model
The HMM diagram
Reformulating the Casino Problem
The Decoding Problem
The Viterbi graph
The Viterbi algorithm
How fast is the Viterbi algorithm?
Finding the Most Likely Outcome of an HMM
Profile HMMs for Sequence Alignment
How do HMMs relate to sequence alignment?
Building a profile HMM
Transition and emission probabilities of a profile HMM
Classifying proteins with profile HMMs
Aligning a protein against a profile HMM
The return of pseudocounts
The troublesome silent states
Are profile HMMs really all that useful?
Learning the Parameters of an HMM
Estimating HMM parameters when the hidden path is known
Viterbi learning
Soft Decisions in Parameter Estimation
The Soft Decoding Problem
The forward-backward algorithm
Baum-Welch Learning
The Many Faces of HMMs
Epilogue: Nature is a Tinkerer and not an Inventor
Detours
The Red Queen Effect
Glycosylation
DNA methylation
Conditional probability
Bibliography Notes
Chapter 11: Was T. rex Just a Big Chicken?
Paleontology Meets Computing
Which Proteins Are Present in This Sample?
Decoding an Ideal Spectrum
From Ideal to Real Spectra
Peptide Sequencing
Scoring peptides against spectra
Where are the suffix peptides?
Peptide sequencing algorithm
Peptide Identification
The Peptide Identification Problem
Identifying peptides in the unknown T. rex proteome
Searching for peptide-spectrum matches
Peptide Identification and the Infinite Monkey Theorem
False discovery rate
The monkey and the typewriter
Statistical significance of a peptide-spectrum match
Spectral Dictionaries
T. rex Peptides: Contaminants or Treasure Trove of Ancient Proteins?
The hemoglobin riddle
The dinosaur DNA controversy
Epilogue: From Unmodified to Modified Peptides
Post-translational modifications
Searching for modifications as an alignment problem
Building a Manhattan grid for spectral alignment
Spectral alignment algorithm
Detours
Gene prediction
Finding all paths in a graph
The Anti-Symmetric Path Problem
Transforming spectra into spectral vectors
The infinite monkey theorem
The probabilistic space of peptides in a spectral dictionary
Are terrestrial dinosaurs really the ancestors of birds?
Solving the Most Likely Peptide Vector Problem
Selecting parameters for transforming spectra into spectral vectors
Bibliography Notes
Bibliography
Image Courtesies
Welcome! Thank you for joining us! As you explore this book, you will find a number of active learning components that help you learn the material at your own pace. 1. CODE CHALLENGES ask you to implement the algorithms that you will en- counter (in any programming language you like). These code challenges are hosted in the “Bioinformatics Textbook Track” location on Rosalind (http:// rosalind.info), a website that will automatically test your implementations. 2. CHARGING STATIONS provide additional insights on implementing the algo- rithms you encounter. However, we suggest trying to solve a Code Challenge before you visit a Charging Station. 3. EXERCISE BREAKS offer “just in time” assessments testing your understanding of a topic before moving to the next one. 4. STOP and Think questions invite you to slow down and contemplate the current material before continuing to the next topic. 5. DETOURS provide extra content that didn’t quite fit in the main text. DETOUR 6. FINAL CHALLENGES ask you to apply what you have learned to real experi- mental datasets. This textbook powers our popular online courses on Coursera. We encourage you to sign up for a session and learn this material while interacting with thousands of other talented students from around the world. You can also find lecture videos and PowerPoint slides at the textbook website, http://bioinformaticsalgorithms.org. i
Bioinformatics Algorithms:An Active Learning Approach2nd Edition, Vol. IIhttp://bioinformaticsalgorithms.org© 2015Phillip Compeau & Pavel Pevzner
Copyright © 2015 by Phillip Compeau and Pavel Pevzner. All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatso- ever without the express written permission of the publisher except for the use of brief quotations in a book review. Printed in the United States of America First Printing, 2015 ISBN: 978-0-9903746-2-6 Library of Congress Control Number: 2015945208 Active Learning Publishers, LLC 9768 Claiborne Square La Jolla, CA 92037
To my family. — P. C. To my parents. — P. P.
In Case You Missed Volume I. . . CHAPTER 1 CHAPTER 2 vi
CHAPTER 3 CHAPTER 4 CHAPTER 5 CHAPTER 6 vii
分享到:
收藏