Data Science with Java.pdf

Preface
Who Should Read This Book
Why I Wrote This Book
A Word on Data Science Today
Navigating This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments
1. Data I/O
What Is Data, Anyway?
Data Models
Univariate Arrays
Multivariate Arrays
Data Objects
Matrices and Vectors
JSON
Dealing with Real Data
Nulls
Blank Spaces
Parse Errors
Outliers
Managing Data Files
Understanding File Contents First
Reading from a Text File
Parsing big strings
Parsing delimited strings
Parsing JSON strings
Reading from a JSON File
Reading from an Image File
Writing to a Text File
Mastering Database Operations
Command-Line Clients
Structured Query Language
Create
Select
Insert
Update
Delete
Drop
Java Database Connectivity
Connections
Statements
Prepared statements
Result sets
Visualizing Data with Plots
Creating Simple Plots
Scatter plots
Bar charts
Plotting multiple series
Basic formatting
Plotting Mixed Chart Types
Saving a Plot to a File
2. Linear Algebra
Building Vectors and Matrices
Array Storage
Block Storage
Map Storage
Accessing Elements
Working with Submatrices
Randomization
Operating on Vectors and Matrices
Scaling
Transposing
Addition and Subtraction
Length
Distances
Multiplication
Inner Product
Outer Product
Entrywise Product
Compound Operations
Affine Transformation
Mapping a Function
Decomposing Matrices
Cholesky Decomposition
LU Decomposition
QR Decomposition
Singular Value Decomposition
Eigen Decomposition
Determinant
Inverse
Solving Linear Systems
3. Statistics
The Probabilistic Origins of Data
Probability Density
Cumulative Probability
Statistical Moments
Entropy
Continuous Distributions
Uniform
Normal
Multivariate normal
Log normal
Empirical
Discrete Distributions
Bernoulli
Binomial
Poisson
Characterizing Datasets
Calculating Moments
Sample moments
Updating moments
Descriptive Statistics
Count
Sum
Min
Max
Mean
Median
Mode
Variance
Standard deviation
Error on the mean
Skewness
Kurtosis
Multivariate Statistics
Covariance and Correlation
Covariance
Pearson’s correlation
Regression
Simple regression
Multiple regression
Working with Large Datasets
Accumulating Statistics
Merging Statistics
Regression
Using Built-in Database Functions
4. Data Operations
Transforming Text Data
Extracting Tokens from a Document
Utilizing Dictionaries
Vectorizing a Document
Scaling and Regularizing Numeric Data
Scaling Columns
Min-max scaling
Centering the data
Unit normal scaling
Scaling Rows
L1 regularization
L2 regularization
Matrix Scaling Operator
Reducing Data to Principal Components
Covariance Method
SVD Method
Creating Training, Validation, and Test Sets
Index-Based Resampling
List-Based Resampling
Mini-Batches
Encoding Labels
A Generic Encoder
One-Hot Encoding
5. Learning and Prediction
Learning Algorithms
Iterative Learning Procedure
Gradient Descent Optimizer
Evaluating Learning Processes
Minimizing a Loss Function
Linear loss
Quadratic loss
Cross-entropy loss
Bernoulli
Multinomial
Two-Point
Minimizing the Sum of Variances
Silhouette Coefficient
Log-Likelihood
Classifier Accuracy
Unsupervised Learning
k-Means Clustering
DBSCAN
Dealing with outliers
Optimizing radius of capture and minPoints
Inference from DBSCAN
Gaussian Mixtures
Gaussian mixture model
Fitting with the EM algorithm
Optimizing the number of clusters
Supervised Learning
Naive Bayes
Gaussian
Multinomial
Bernoulli
Iris example
Linear Models
Linear
Logistic
Softmax
Tanh
Linear model estimator
Iris example
Deep Networks
A network layer
Feed forward
Back propagation
Deep network estimator
MNIST example
6. Hadoop MapReduce
Hadoop Distributed File System
MapReduce Architecture
Writing MapReduce Applications
Anatomy of a MapReduce Job
Hadoop Data Types
Writable and WritableComparable types
Custom Writable and WritableComparable types
Writable
WritableComparable
Mappers
Generic mappers
Customizing a mapper
Reducers
Generic reducers
Customizing a reducer
The Simplicity of a JSON String as Text
Deployment Wizardry
Running a standalone program
Deploying a JAR application
Including dependencies
Simplifying with a BASH script
MapReduce Examples
Word Count
Custom Word Count
Sparse Linear Algebra
A. Datasets
Anscombe’s Quartet
Sentiment
Gaussian Mixtures
Iris
MNIST
Index
Download from finelybook www.finelybook.com
Data Science with Java
Michael R. Brzustowicz, PhD
Data Science with Java
by Michael R. Brzustowicz, PhD

Copyright © 2017 Michael Brzustowicz. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Nan Barber and Brian Foster
Production Editor: Kristen Brown
Copyeditor: Sharon Wilkey
Proofreader: Jasmine Kwityn
Indexer: Lucie Haskins
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

June 2017: First Edition
Revision History for the First Edition
2017-05-30: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Data Science with Java, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-93411-1
[LSI]
Dedication
This book is for my cofounder and our two startups.
Preface
Data science is a diverse and growing field encompassing many subfields of both mathematics and computer science. Statistics, linear algebra, databases, machine intelligence, and data visualization are just a few of the topics that merge together in the realm of a data scientist. Technology abounds and the tools to practice data science are evolving rapidly. This book focuses on core, fundamental principles backed by clear, object-oriented code in Java. And while this book will inspire you to get busy right away practicing the craft of data science, it is my hope that you will take the lead in building the next generation of data science technology.
Who Should Read This Book
This book is for scientists and engineers already familiar with the concepts of application development who want to jump headfirst into data science. The topics covered here will walk you through the data science pipeline, explaining mathematical theory and giving code examples along the way. This book is the perfect jumping-off point into much deeper waters.
Why I Wrote This Book
I wrote this book to start a movement. As data science skyrockets to stardom, fueled by R and Python, very few practitioners venture into the world of Java. Clearly, the tools for data exploration lend themselves to the interpretive languages. But there is another realm of the engineering–science hybrid where scale, robustness, and convenience must merge. Java is perhaps the one language that can do it all. If this book inspires you, I hope that you will contribute code to one of the many open source Java projects that support data science.