An-Introduction-to-Statistics-with-Python-With-Applications-in-t....pdf

Published: 2022-05-29 | Uploaded by: admin | Category: Manuals | Size: 6.44M | Format: pdf

Preview: pages 1–8 of 285

Preface

For Whom This Book Is

Additional Material

Acknowledgments

Contents

Acronyms

Part I Python and Statistics

1 Why Statistics?

2 Python

2.1 Getting Started

2.1.1 Conventions

2.1.2 Distributions and Packages

a) Python Packages for Statistics

b) PyPI: The Python Package Index

2.1.3 Installation of Python

a) Under Windows

b) Under Linux

c) Under Mac OS X

2.1.4 Installation of R and rpy2

a) Under Windows

b) Under Linux

2.1.5 Personalizing IPython/Jupyter

a) In Windows

b) In Linux

c) In Mac OS X

2.1.6 Python Resources

2.1.7 First Python Programs

a) Hello World

b) SquareMe

2.2 Python Data Structures

2.2.1 Python Datatypes

2.2.2 Indexing and Slicing

2.2.3 Vectors and Arrays

2.3 IPython/Jupyter: An Interactive Programming Environment

2.3.1 First Session with the Qt Console

2.3.2 Notebook and rpy2

a) The Notebook

b) rpy2

2.3.3 IPython Tips

2.4 Developing Python Programs

2.4.1 Converting Interactive Commands into a Python Program

2.4.2 Functions, Modules, and Packages

a) Functions

b) Modules

2.4.3 Python Tips

2.4.4 Code Versioning

2.5 Pandas: Data Structures for Statistics

2.5.1 Data Handling

a) Common Procedures

b) Notes on Data Selection

2.5.2 Grouping

2.6 Statsmodels: Tools for Statistical Modeling

2.7 Seaborn: Data Visualization

2.8 General Routines

2.9 Exercises

3 Data Input

3.1 Input from Text Files

3.1.1 Visual Inspection

3.1.2 Reading ASCII-Data into Python

a) Simple Text-Files

b) More Complex Text-Files

c) Regular Expressions

3.2 Input from MS Excel

3.3 Input from Other Formats

3.3.1 Matlab

4 Display of Statistical Data

4.1 Datatypes

4.1.1 Categorical

a) Boolean

b) Nominal

c) Ordinal

4.1.2 Numerical

a) Numerical Continuous

b) Numerical Discrete

4.2 Plotting in Python

4.2.1 Functional and Object-Oriented Approaches to Plotting

4.2.2 Interactive Plots

4.3 Displaying Statistical Datasets

4.3.1 Univariate Data

a) Scatter Plots

b) Histograms

c) Kernel-Density-Estimation (KDE) Plots

d) Cumulative Frequencies

e) Error-Bars

f) Box Plots

g) Grouped Bar Charts

h) Pie Charts

i) Programs: Data Display

4.3.2 Bivariate and Multivariate Plots

a) Bivariate Scatter Plots

b) 3D Plots

4.4 Exercises

Part II Distributions and Hypothesis Tests

5 Background

5.1 Populations and Samples

5.2 Probability Distributions

5.2.1 Discrete Distributions

5.2.2 Continuous Distributions

5.2.3 Expected Value and Variance

a) Expected Value

b) Variance

5.3 Degrees of Freedom

5.4 Study Design

5.4.1 Terminology

5.4.2 Overview

5.4.3 Types of Studies

a) Observational or Experimental

b) Prospective or Retrospective

c) Longitudinal or Cross-Sectional

d) Case–Control and Cohort Studies

e) Randomized Controlled Trial

f) Crossover Studies

5.4.4 Design of Experiments

a) Sample Selection

b) Sample Size

c) Bias

d) Randomization

e) Blinding

f) Factorial Design

5.4.5 Personal Advice

1) Preliminary Investigations and Murphy's Law

2) Calibration Runs

3) Documentation

4) Data Storage

5.4.6 Clinical Investigation Plan

6 Distributions of One Variable

6.1 Characterizing a Distribution

6.1.1 Distribution Center

a) Mean

b) Median

c) Mode

d) Geometric Mean

6.1.2 Quantifying Variability

a) Range

b) Percentiles

c) Standard Deviation and Variance

d) Standard Error

e) Confidence Intervals

6.1.3 Parameters Describing the Form of a Distribution

a) Location

b) Scale

c) Shape Parameters

6.1.4 Important Presentations of Probability Densities

6.2 Discrete Distributions

6.2.1 Bernoulli Distribution

6.2.2 Binomial Distribution

b) Example: Binomial Test

6.2.3 Poisson Distribution

6.3 Normal Distribution

6.3.1 Examples of Normal Distributions

6.3.2 Central Limit Theorem

6.3.3 Distributions and Hypothesis Tests

6.4 Continuous Distributions Derived from the Normal Distribution

6.4.1 t-Distribution

6.4.2 Chi-Square Distribution

a) Definition

b) Application Example

6.4.3 F-Distribution

a) Definition

b) Application Example

6.5 Other Continuous Distributions

6.5.1 Lognormal Distribution

6.5.2 Weibull Distribution

6.5.3 Exponential Distribution

6.5.4 Uniform Distribution

6.6 Exercises

7 Hypothesis Tests

7.1 Typical Analysis Procedure

7.1.1 Data Screening and Outliers

7.1.2 Normality Check

a) Probability-Plots

b) Tests for Normality

7.1.3 Transformation

7.2 Hypothesis Concept, Errors, p-Value, and Sample Size

7.2.1 An Example

7.2.2 Generalization and Applications

a) Generalization

b) Additional Examples

7.2.3 The Interpretation of the p-Value

7.2.4 Types of Error

a) Type I Errors

b) Type II Errors and Test Power

c) Pitfalls in the Interpretation of p-Values

7.2.5 Sample Size

a) Examples

b) Python Solution

c) Programs: Sample Size

7.3 Sensitivity and Specificity

7.3.1 Related Calculations

7.4 Receiver-Operating-Characteristic (ROC) Curve

8 Tests of Means of Numerical Data

8.1 Distribution of a Sample Mean

8.1.1 One Sample t-Test for a Mean Value

a) Example

8.1.2 Wilcoxon Signed Rank Sum Test

8.2 Comparison of Two Groups

8.2.1 Paired t-Test

8.2.2 t-Test between Independent Groups

8.2.3 Nonparametric Comparison of Two Groups: Mann–Whitney Test

8.2.4 Statistical Hypothesis Tests vs Statistical Modeling

a) Classical t-Test

b) Statistical Modeling

8.3 Comparison of Multiple Groups

8.3.1 Analysis of Variance (ANOVA)

a) Principle

b) Example: One-Way ANOVA

8.3.2 Multiple Comparisons

a) Tukey's Test

b) Bonferroni Correction

c) Holm Correction

8.3.3 Kruskal–Wallis Test

8.3.4 Two-Way ANOVA

8.3.5 Three-Way ANOVA

8.4 Summary: Selecting the Right Test for Comparing Groups

8.4.1 Typical Tests

8.4.2 Hypothetical Examples

8.5 Exercises

9 Tests on Categorical Data

9.1 One Proportion

9.1.1 Confidence Intervals

9.1.2 Explanation

9.1.3 Example

9.2 Frequency Tables

9.2.1 One-Way Chi-Square Test

9.2.2 Chi-Square Contingency Test

a) Assumptions

b) Degrees of Freedom

c) Example 1

d) Example 2

e) Comments

9.2.3 Fisher's Exact Test

a) Example: "A Lady Tasting Tea"

9.2.4 McNemar's Test

a) Example

9.2.5 Cochran's Q Test

a) Example

9.3 Exercises

10 Analysis of Survival Times

10.1 Survival Distributions

10.2 Survival Probabilities

10.2.1 Censorship

10.2.2 Kaplan–Meier Survival Curve

10.3 Comparing Survival Curves in Two Groups

Part III Statistical Modeling

11 Linear Regression Models

11.1 Linear Correlation

11.1.1 Correlation Coefficient

11.1.2 Rank Correlation

11.2 General Linear Regression Model

11.2.1 Example 1: Simple Linear Regression

11.2.2 Example 2: Quadratic Fit

11.2.3 Coefficient of Determination

a) Relation to Unexplained Variance

b) "Good" Fits

11.3 Patsy: The Formula Language

11.3.1 Design Matrix

a) Definition

b) Examples

11.4 Linear Regression Analysis with Python

11.4.1 Example 1: Line Fit with Confidence Intervals

11.4.2 Example 2: Noisy Quadratic Polynomial

11.5 Model Results of Linear Regression Models

11.5.1 Example: Tobacco and Alcohol in the UK

11.5.2 Definitions for Regression with Intercept

11.5.3 The R² Value

11.5.4 R̄²: The Adjusted R² Value

a) The F-Test

b) Log-Likelihood Function

c) Information Content of Statistical Models: AIC and BIC

11.5.5 Model Coefficients and Their Interpretation

a) Coefficients

b) Standard Error

c) t-Statistic

d) Confidence Interval

11.5.6 Analysis of Residuals

a) Skewness and Kurtosis

b) Omnibus Test

c) Durbin–Watson

d) Jarque–Bera Test

e) Condition Number

11.5.7 Outliers

11.5.8 Regression Using Sklearn

11.5.9 Conclusion

11.6 Assumptions of Linear Regression Models

11.7 Interpreting the Results of Linear Regression Models

11.8 Bootstrapping

11.9 Exercises

12 Multivariate Data Analysis

12.1 Visualizing Multivariate Correlations

12.1.1 Scatterplot Matrix

12.1.2 Correlation Matrix

12.2 Multilinear Regression

13 Tests on Discrete Data

13.1 Comparing Groups of Ranked Data

13.2 Logistic Regression

13.2.1 Example: The Challenger Disaster

13.3 Generalized Linear Models

13.3.1 Exponential Family of Distributions

13.3.2 Linear Predictor and Link Function

13.4 Ordinal Logistic Regression

13.4.1 Problem Definition

13.4.2 Optimization

13.4.3 Code

13.4.4 Performance

14 Bayesian Statistics

14.1 Bayesian vs. Frequentist Interpretation

14.1.1 Bayesian Example

14.2 The Bayesian Approach in the Age of Computers

14.3 Example: Analysis of the Challenger Disaster with a Markov-Chain–Monte-Carlo Simulation

14.4 Summing Up

Solutions

Problems of Chap. 2

Problems of Chap. 4

Problems of Chap. 6

Problems of Chap. 8

Problems of Chap. 9

Problems of Chap. 11

Glossary

References

Index

Statistics and Computing

Thomas Haslwanter

An Introduction to Statistics with Python
With Applications in the Life Sciences

Series editor: W.K. Härdle

More information about this series at http://www.springer.com/series/3022

Thomas Haslwanter
School of Applied Health and Social Sciences
University of Applied Sciences Upper Austria
Linz, Austria

Series Editor:
W.K. Härdle
C.A.S.E. Centre for Applied Statistics and Economics
School of Business and Economics
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany

The Python code samples accompanying the book are available at www.quantlet.de.

All Python programs and data sets can be found on GitHub: https://github.com/thomas-haslwanter/statsintro_python.git. Links to all material are available at http://www.springer.com/de/book/9783319283159.

The Python solution codes in the appendix are published under the Creative Commons Attribution-ShareAlike 4.0 International License.

ISSN 1431-8784
ISSN 2197-1706 (electronic)
Statistics and Computing
ISBN 978-3-319-28315-9
ISBN 978-3-319-28316-6 (eBook)
DOI 10.1007/978-3-319-28316-6

Library of Congress Control Number: 2016939946

© Springer International Publishing Switzerland 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland

To my two, three, and four-legged household companions: my wife Jean, Felix, and his sister Jessica.

Preface

In the data analysis for my own research work, I was often slowed down by two things: (1) I did not know enough statistics, and (2) the books available would provide a theoretical background, but no real practical help. The book you are holding in your hands (or on your tablet or laptop) is intended to be the book that will solve this very problem. It is designed to provide enough basic understanding so that you know what you are doing, and it should equip you with the tools you need.

I believe that the Python solutions provided in this book for the most basic statistical problems address at least 90% of the problems that most physicists, biologists, and medical doctors encounter in their work. So if you are the typical graduate student working on a degree, or a medical researcher analyzing the latest experiments, chances are that you will find the tools you require here, explanation and source code included.

This is the reason I have focused on statistical basics and hypothesis tests in this book and refer only briefly to other statistical approaches. I am well aware that most of the tests presented in this book can also be carried out using statistical modeling. But in many cases, this is not the methodology used in many life science journals. Advanced statistical analysis goes beyond the scope of this book and, to be frank, exceeds my own knowledge of statistics.

My motivation for providing the solutions in Python is based on two considerations. One is that I would like them to be available to everyone. While commercial solutions like Matlab, SPSS, Minitab, etc., offer powerful tools, most can only use them legally in an academic setting. In contrast, Python is completely free ("as in free beer" is often heard in the Python community).

The second reason is that Python is the most beautiful coding language that I have yet encountered; and around 2010 Python and its documentation matured to the point where one can use it without being a serious coder. Together, this book, Python, and the tools that the Python ecosystem offers today provide a beautiful, free package that covers all the statistics that most researchers will need in their lifetime.
