R Programming for Bioinformatics.pdf

Title
Copyright
Contents
Chapter 1: Introducing R
Chapter 2: R Language Fundamentals
Chapter 3: Object-Oriented Programming in R
Chapter 4: Input and Output in R
Chapter 5: Working with Character Data
Chapter 6: Foreign Language Interfaces
Chapter 7: R Packages
Chapter 8: Data Technologies
Chapter 9: Debugging and Profiling
References
Index
R Programming for Bioinformatics
Chapman & Hall/CRC Computer Science and Data Analysis Series

The interface between the computer and statistical sciences is increasing, as each discipline seeks to harness the power and resources of the other. This series aims to foster the integration between the computer sciences and statistical, numerical, and probabilistic methods by publishing a broad range of reference works, textbooks, and handbooks.

SERIES EDITORS
David Blei, Princeton University
David Madigan, Rutgers University
Marina Meila, University of Washington
Fionn Murtagh, Royal Holloway, University of London

Proposals for the series should be sent directly to one of the series editors above, or submitted to:

Chapman & Hall/CRC
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ
UK

Published Titles

Bayesian Artificial Intelligence
Kevin B. Korb and Ann E. Nicholson

Computational Statistics Handbook with MATLAB®, Second Edition
Wendy L. Martinez and Angel R. Martinez

Pattern Recognition Algorithms for Data Mining
Sankar K. Pal and Pabitra Mitra

Exploratory Data Analysis with MATLAB®
Wendy L. Martinez and Angel R. Martinez

Clustering for Data Mining: A Data Recovery Approach
Boris Mirkin

Correspondence Analysis and Data Coding with Java and R
Fionn Murtagh

Design and Modeling for Computer Experiments
Kai-Tai Fang, Runze Li, and Agus Sudjianto

Introduction to Machine Learning and Bioinformatics
Sushmita Mitra, Sujay Datta, Theodore Perkins, and George Michailidis

R Graphics
Paul Murrell

R Programming for Bioinformatics
Robert Gentleman

Semisupervised Learning for Computational Linguistics
Steven Abney

Statistical Computing with R
Maria L. Rizzo
R Programming for Bioinformatics

Robert Gentleman
Fred Hutchinson Cancer Research Center
Seattle, Washington, U.S.A.
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2009 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-1-4200-6367-7 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The Authors and Publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Gentleman, Robert, 1959-
R programming for bioinformatics / Robert Gentleman.
p. cm. -- (Chapman & Hall/CRC computer science and data analysis series)
Bibliographical references (p. ) and index.
ISBN 978-1-4200-6367-7
1. Bioinformatics. 2. R (Computer program language) I. Title. II. Series.
QH324.2.G46 2008
572.80285’5133--dc22 2008011352

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
and the CRC Press Web site at http://www.crcpress.com
Contents

1 Introducing R
  1.1 Introduction
  1.2 Motivation
  1.3 A note on the text
  1.4 Acknowledgments

2 R Language Fundamentals
  2.1 Introduction
    2.1.1 A brief introduction to R
    2.1.2 Attributes
    2.1.3 A very brief introduction to OOP in R
    2.1.4 Some special values
    2.1.5 Types of objects
    2.1.6 Sequence generating and vector subsetting
    2.1.7 Types of functions
  2.2 Data structures
    2.2.1 Atomic vectors
    2.2.2 Numerical computing
    2.2.3 Factors
    2.2.4 Lists, environments and data frames
  2.3 Managing your R session
    2.3.1 Finding out more about an object
  2.4 Language basics
    2.4.1 Operators
  2.5 Subscripting and subsetting
    2.5.1 Vector and matrix subsetting
  2.6 Vectorized computations
    2.6.1 The recycling rule
  2.7 Replacement functions
  2.8 Functional programming
  2.9 Writing functions
  2.10 Flow control
    2.10.1 Conditionals
  2.11 Exception handling
  2.12 Evaluation
    2.12.1 Standard evaluation
    2.12.2 Non-standard evaluation
    2.12.3 Function evaluation
    2.12.4 Indirect function invocation
    2.12.5 Evaluation on exit
    2.12.6 Other topics
    2.12.7 Name spaces
  2.13 Lexical scope
    2.13.1 Likelihoods
    2.13.2 Function optimization
  2.14 Graphics

3 Object-Oriented Programming in R
  3.1 Introduction
  3.2 The basics of OOP
    3.2.1 Inheritance
    3.2.2 Dispatch
    3.2.3 Abstract data types
    3.2.4 Self-describing data
  3.3 S3 OOP
    3.3.1 Implicit classes
    3.3.2 Expression data example
    3.3.3 S3 generic functions and methods
    3.3.4 Details of dispatch
    3.3.5 Group generics
    3.3.6 S3 replacement methods
  3.4 S4 OOP
    3.4.1 Classes
    3.4.2 Types of classes
    3.4.3 Attributes
    3.4.4 Class unions
    3.4.5 Accessor functions
    3.4.6 Using S3 classes with S4 classes
    3.4.7 S4 generic functions and methods
    3.4.8 The syntax of method declaration
    3.4.9 The semantics of method invocation
    3.4.10 Replacement methods
    3.4.11 Finding methods
    3.4.12 Advanced topics
  3.5 Using classes and methods in packages
  3.6 Documentation
    3.6.1 Finding documentation
    3.6.2 Writing documentation
  3.7 Debugging
  3.8 Managing S3 and S4 together
    3.8.1 Getting and setting the class attribute
    3.8.2 Mixing S3 and S4 methods
  3.9 Navigating the class and method hierarchy

4 Input and Output in R
  4.1 Introduction
  4.2 Basic file handling
    4.2.1 Viewing files
    4.2.2 File manipulation
    4.2.3 Working with R’s binary format
  4.3 Connections
    4.3.1 Text connections
    4.3.2 Interprocess communications
    4.3.3 Seek
  4.4 File input and output
    4.4.1 Reading rectangular data
    4.4.2 Writing data
    4.4.3 Debian Control Format (DCF)
    4.4.4 FASTA Format
  4.5 Source and sink: capturing R output
  4.6 Tools for accessing files on the Internet

5 Working with Character Data
  5.1 Introduction
  5.2 Builtin capabilities
    5.2.1 Modifying text
    5.2.2 Sorting and comparing
    5.2.3 Matching a set of alternatives
    5.2.4 Formatting text and numbers
    5.2.5 Special characters and escaping
    5.2.6 Parsing and deparsing
    5.2.7 Plotting with text
    5.2.8 Locale and font encoding
  5.3 Regular expressions
    5.3.1 Regular expression basics
    5.3.2 Matching
    5.3.3 Using regular expressions
    5.3.4 Globbing and regular expressions
  5.4 Prefixes, suffixes and substrings
  5.5 Biological sequences
    5.5.1 Encoding genomes
  5.6 Matching patterns
    5.6.1 Matching single query sequences
    5.6.2 Matching many query sequences
    5.6.3 Palindromes and paired matches
    5.6.4 Alignments