Bioinformatics with Python Cookbook (original English edition)

Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Table of Contents
Preface
Chapter 1: Python and the Surrounding Software Ecology
Introduction
Installing the required software with Anaconda
Installing the required software with Docker
Interfacing with R via rpy2
Performing R magic with IPython
Chapter 2: Next-generation Sequencing
Introduction
Accessing GenBank and moving around NCBI databases
Performing basic sequence analysis
Working with modern sequence formats
Working with alignment data
Analyzing data in variant call format
Studying genome accessibility and filtering SNP data
Chapter 3: Working with Genomes
Introduction
Working with high-quality reference genomes
Dealing with low-quality genome references
Traversing genome annotations
Extracting genes from a reference using annotations
Finding orthologues with the Ensembl REST API
Retrieving gene ontology information from Ensembl
Chapter 4: Population Genetics
Introduction
Managing datasets with PLINK
Introducing the Genepop format
Exploring a dataset with Bio.PopGen
Computing F-statistics
Performing Principal Components Analysis
Investigating population structure with Admixture
Chapter 5: Population Genetics Simulation
Introduction
Introducing forward-time simulations
Simulating selection
Simulating population structure using island and stepping-stone models
Modeling complex demographic scenarios
Simulating the coalescent with Biopython and fastsimcoal
Chapter 6: Phylogenetics
Introduction
Preparing the Ebola dataset
Aligning genetic and genomic data
Comparing sequences
Reconstructing phylogenetic trees
Playing recursively with trees
Visualizing phylogenetic data
Chapter 7: Using the Protein Data Bank
Introduction
Finding a protein in multiple databases
Introducing Bio.PDB
Extracting more information from a PDB file
Computing molecular distances on a PDB file
Performing geometric operations
Implementing a basic PDB parser
Animating with PyMol
Parsing mmCIF files using Biopython
Chapter 8: Other Topics in Bioinformatics
Introduction
Accessing the Global Biodiversity Information Facility
Geo-referencing GBIF datasets
Accessing molecular-interaction databases with PSIQUIC
Plotting protein interactions with Cytoscape the hard way
Chapter 9: Python for Big Genomics Datasets
Introduction
Setting the stage for high-performance computing
Designing a poor human concurrent executor
Performing parallel computing with IPython
Computing the median in a large dataset
Optimizing code with Cython and Numba
Programming with laziness
Thinking with generators
Index
Bioinformatics with Python Cookbook

Learn how to use modern Python bioinformatics libraries and applications to do cutting-edge research in computational biology

Tiago Antao

BIRMINGHAM - MUMBAI
Bioinformatics with Python Cookbook

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2015

Production reference: 1230615

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78217-511-7

www.packtpub.com
Credits

Author: Tiago Antao
Reviewers: Cho-Yi Chen, Giovanni M. Dall'Olio
Commissioning Editor: Nadeem N. Bagban
Acquisition Editor: Kevin Colaco
Content Development Editor: Gaurav Sharma
Technical Editor: Shashank Desai
Copy Editor: Relin Hedly
Project Coordinator: Harshal Ved
Proofreader: Safis Editing
Indexer: Monica Ajmera Mehta
Production Coordinator: Arvindkumar Gupta
Cover Work: Arvindkumar Gupta
About the Author

Tiago Antao is a bioinformatician. He is currently studying the genomics of the mosquito Anopheles gambiae, the main vector of malaria. Tiago was originally a computer scientist who crossed over to computational biology with an MSc in bioinformatics from the Faculty of Sciences of the University of Porto, Portugal. He holds a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine, UK. Tiago is one of the coauthors of Biopython, a major bioinformatics package written in Python. He has also developed Lositan, a Jython-based selection detection workbench.

In his postdoctoral career, he has worked with human datasets at the University of Cambridge, UK, and with mosquito whole-genome sequence data at the University of Oxford, UK. He is currently working as a Sir Henry Wellcome Fellow at the Liverpool School of Tropical Medicine.

I would like to take this opportunity to acknowledge everyone at Packt Publishing, especially Gaurav Sharma, my very patient development editor. The quality of this book owes much to the excellent work of the reviewers, who provided outstanding comments. Finally, I would like to thank Ana for all that she endured during the writing of this book.
About the Reviewers

Cho-Yi Chen is an Olympic swimmer, a bioinformatician, and a computational biologist. He majored in computer science and later devoted himself to biomedical research. Cho-Yi Chen received his MS and PhD degrees in bioinformatics, genomics, and systems biology from National Taiwan University. He was a founding member of the Taiwan Society of Evolution and Computational Biology and is now a postdoctoral research fellow at the Department of Biostatistics and Computational Biology at the Dana-Farber Cancer Institute, Harvard University. As an active scientist and software developer, Cho-Yi Chen strives to advance our understanding of cancer and other human diseases.

Giovanni M. Dall'Olio is a bioinformatician with a background in human population genetics and cancer. He maintains a personal blog on bioinformatics tips and best practices at http://bioinfoblog.it. Giovanni was one of the early moderators of Biostar, a Q&A site for bioinformatics (http://biostars.org/). He is also a Python enthusiast and was a co-organizer of the Barcelona Python Meetup community for many years. After earning a PhD in human population genetics at Pompeu Fabra University in Barcelona, he moved to King's College London, where he applies his knowledge and programming skills to the study of cancer genetics. He is also responsible for maintaining the Network of Cancer Genes (http://ncg.kcl.ac.uk/), a database of system-level properties of genes involved in cancer.
www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?
- Fully searchable across every book published by Packt
- Copy and paste, print, and bookmark content
- On demand and accessible via a web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.