Bioinformatics Algorithms - vol2.pdf

发布时间：2022-06-18 发布人：admin 分类：说明书资料大小：6.42M 资料格式：pdf 举报版权申诉

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第1页.png

第1页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第2页.png

第2页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第3页.png

第3页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第4页.png

第4页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第5页.png

第5页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第6页.png

第6页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第7页.png

第7页 / 共314页

e38e686b-4398-462a-b9da-6c3db826ded0.pdf-第8页.png

第8页 / 共314页

List of Code Challenges

About the Textbook

Meet the Authors

Meet the Development Team

Acknowledgments

Chapter 7: Which Animal Gave Us SARS?

The Fastest Outbreak

Trouble at the Metropole Hotel

The evolution of SARS

Transforming Distance Matrices into Evolutionary Trees

Constructing a distance matrix from coronavirus genomes

Evolutionary trees as graphs

Distance-based phylogeny construction

Toward An Algorithm for Distance-Based Phylogeny Construction

A quest for neighboring leaves

Computing limb lengths

Additive Phylogeny

Trimming the tree

Attaching a limb

An algorithm for distance-based phylogeny construction

Constructing an evolutionary tree of coronaviruses

Using Least Squares to Construct Approximate Distance-Based Phylogenies

Ultrametric Evolutionary Trees

The Neighbor-Joining Algorithm

Transforming a distance matrix into a neighbor-joining matrix

Analyzing coronaviruses with the neighbor-joining algorithm

Limitations of distance-based approaches to evolutionary tree construction

Character-Based Tree Reconstruction

Character tables

From anatomical to genetic characters

How many times has evolution invented insect wings?

The Small Parsimony Problem

The Large Parsimony Problem

Epilogue: Evolutionary Trees Fight Crime

Detours

When did HIV jump from primates to humans?

Searching for a tree fitting a distance matrix

The four point condition

Did bats give us SARS?

Why does the neighbor-joining algorithm find neighboring leaves?

Computing limb lengths in the neighbor-joining algorithm

Giant panda: bear or raccoon?

Where did humans come from?

Bibliography Notes

Chapter 8: How Did Yeast Become a Wine Maker?

An Evolutionary History of Wine Making

How long have we been addicted to alcohol?

The diauxic shift

Identifying Genes Responsible for the Diauxic Shift

Two evolutionary hypotheses with different fates

Which yeast genes drive the diauxic shift?

Introduction to Clustering

Gene expression analysis

Clustering yeast genes

The Good Clustering Principle

Clustering as an Optimization Problem

Farthest First Traversal

k-Means Clustering

Squared error distortion

k-means clustering and the center of gravity

The Lloyd Algorithm

From centers to clusters and back again

Initializing the Lloyd algorithm

k-means++ Initializer

Clustering Genes Implicated in the Diauxic Shift

Limitations of k-Means Clustering

From Coin Flipping to k-Means Clustering

Flipping coins with unknown biases

Where is the computational problem?

From coin flipping to the Lloyd algorithm

Return to clustering

Making Soft Decisions in Coin Flipping

Expectation maximization: the E-step

Expectation maximization: the M-step

The expectation maximization algorithm

Soft k-Means Clustering

Applying expectation maximization to clustering

Centers to soft clusters

Soft clusters to centers

Hierarchical Clustering

Introduction to distance-based clustering

Inferring clusters from a tree

Analyzing the diauxic shift with hierarchical clustering

Epilogue: Clustering Tumor Samples

Detours

Whole genome duplication or a series of duplications?

Measuring gene expression

Microarrays

Proof of the Center of Gravity Theorem

Transforming an expression matrix into a distance/similarity matrix

Clustering and corrupted cliques

Bibliography Notes

Chapter 9: How Do We Locate Disease-Causing Mutations?

What Causes Ohdo Syndrome?

Introduction to Multiple Pattern Matching

Herding Patterns into a Trie

Constructing a trie

Applying the trie to multiple pattern matching

Preprocessing the Genome Instead

Introduction to suffix tries

Using suffix tries for pattern matching

Suffix Trees

Suffix Arrays

Constructing a suffix array

Pattern matching with the suffix array

The Burrows-Wheeler Transform

Genome compression

Constructing the Burrows-Wheeler transform

From repeats to runs

Inverting the Burrows-Wheeler Transform

A first attempt at inverting the Burrows-Wheeler transform

The First-Last Property

Using the First-Last property to invert the Burrows-Wheeler transform

Pattern Matching with the Burrows-Wheeler Transform

A first attempt at Burrows-Wheeler pattern matching

Moving backward through a pattern

The Last-to-First mapping

Speeding Up Burrows-Wheeler Pattern Matching

Substituting the Last-to-First mapping with count arrays

Getting rid of the first column of the Burrows-Wheeler matrix

Where are the Matched Patterns?

Burrows and Wheeler Set Up Checkpoints

Epilogue: Mismatch-Tolerant Read Mapping

Reducing approximate pattern matching to exact pattern matching

BLAST: Comparing a sequence against a database

Approximate pattern matching with the Burrows-Wheeler transform

Charging Stations

Constructing a suffix tree

Solving the Longest Shared Substring Problem

Partial suffix array construction

Detours

The reference human genome

Rearrangements, insertions, and deletions in human genomes

The Aho-Corasick algorithm

From suffix trees to suffix arrays

From suffix arrays to suffix trees

Binary search

Bibliography Notes

Chapter 10: Why Have Biologists Still Not Developed an HIV Vaccine?

Classifying the HIV Phenotype

How does HIV evade the human immune system?

Limitations of sequence alignment

Gambling with Yakuza

Two Coins up the Dealer's Sleeve

Finding CG-Islands

Hidden Markov Models

From coin flipping to a Hidden Markov Model

The HMM diagram

Reformulating the Casino Problem

The Decoding Problem

The Viterbi graph

The Viterbi algorithm

How fast is the Viterbi algorithm?

Finding the Most Likely Outcome of an HMM

Profile HMMs for Sequence Alignment

How do HMMs relate to sequence alignment?

Building a profile HMM

Transition and emission probabilities of a profile HMM

Classifying proteins with profile HMMs

Aligning a protein against a profile HMM

The return of pseudocounts

The troublesome silent states

Are profile HMMs really all that useful?

Learning the Parameters of an HMM

Estimating HMM parameters when the hidden path is known

Viterbi learning

Soft Decisions in Parameter Estimation

The Soft Decoding Problem

The forward-backward algorithm

Baum-Welch Learning

The Many Faces of HMMs

Epilogue: Nature is a Tinkerer and not an Inventor

Detours

The Red Queen Effect

Glycosylation

DNA methylation

Conditional probability

Bibliography Notes

Chapter 11: Was T. rex Just a Big Chicken?

Paleontology Meets Computing

Which Proteins Are Present in This Sample?

Decoding an Ideal Spectrum

From Ideal to Real Spectra

Peptide Sequencing

Scoring peptides against spectra

Where are the suffix peptides?

Peptide sequencing algorithm

Peptide Identification

The Peptide Identification Problem

Identifying peptides in the unknown T. rex proteome

Searching for peptide-spectrum matches

Peptide Identification and the Infinite Monkey Theorem

False discovery rate

The monkey and the typewriter

Statistical significance of a peptide-spectrum match

Spectral Dictionaries

T. rex Peptides: Contaminants or Treasure Trove of Ancient Proteins?

The hemoglobin riddle

The dinosaur DNA controversy

Epilogue: From Unmodified to Modified Peptides

Post-translational modifications

Searching for modifications as an alignment problem

Building a Manhattan grid for spectral alignment

Spectral alignment algorithm

Detours

Gene prediction

Finding all paths in a graph

The Anti-Symmetric Path Problem

Transforming spectra into spectral vectors

The infinite monkey theorem

The probabilistic space of peptides in a spectral dictionary

Are terrestrial dinosaurs really the ancestors of birds?

Solving the Most Likely Peptide Vector Problem

Selecting parameters for transforming spectra into spectral vectors

Bibliography Notes

Bibliography

Image Courtesies

Welcome! Thank you for joining us! As you explore this book, you will ﬁnd a number of active learning components that help you learn the material at your own pace. 1. CODE CHALLENGES ask you to implement the algorithms that you will en- counter (in any programming language you like). These code challenges are hosted in the “Bioinformatics Textbook Track” location on Rosalind (http:// rosalind.info), a website that will automatically test your implementations. 2. CHARGING STATIONS provide additional insights on implementing the algo- rithms you encounter. However, we suggest trying to solve a Code Challenge before you visit a Charging Station. 3. EXERCISE BREAKS offer “just in time” assessments testing your understanding of a topic before moving to the next one. 4. STOP and Think questions invite you to slow down and contemplate the current material before continuing to the next topic. 5. DETOURS provide extra content that didn’t quite ﬁt in the main text. DETOUR 6. FINAL CHALLENGES ask you to apply what you have learned to real experi- mental datasets. This textbook powers our popular online courses on Coursera. We encourage you to sign up for a session and learn this material while interacting with thousands of other talented students from around the world. You can also ﬁnd lecture videos and PowerPoint slides at the textbook website, http://bioinformaticsalgorithms.org. i

Copyright © 2015 by Phillip Compeau and Pavel Pevzner. All rights reserved. This book or any portion thereof may not be reproduced or used in any manner whatso- ever without the express written permission of the publisher except for the use of brief quotations in a book review. Printed in the United States of America First Printing, 2015 ISBN: 978-0-9903746-2-6 Library of Congress Control Number: 2015945208 Active Learning Publishers, LLC 9768 Claiborne Square La Jolla, CA 92037

To my family. — P. C. To my parents. — P. P.

In Case You Missed Volume I. . . CHAPTER 1 CHAPTER 2 vi

CHAPTER 3 CHAPTER 4 CHAPTER 5 CHAPTER 6 vii

分享到：

赞收藏

资料库

Bioinformatics Algorithms - vol2.pdf

相关推荐

信息化

热门标签

最新资料