Preface
Acknowledgments
Contributors
Contents
Part I Microanalysis
1 R Basics
1.1 Introduction
1.2 R and RStudio
1.3 Download and Install R
1.4 Download and Install RStudio
1.5 Download the Supporting Materials
1.6 RStudio
1.7 Let's Get Started
Practice
2 First Foray into Text Analysis with R
2.1 Loading the First Text File
2.2 Separate Content from Metadata
2.3 Reprocessing the Content
2.4 Beginning the Analysis
Practice
3 Accessing and Comparing Word Frequency Data
3.1 Accessing Word Data
3.2 Recycling
Practice
4 Token Distribution Analysis
4.1 Dispersion Plots
4.2 Searching with grep
4.2.1 Cleaning the Workspace
4.2.2 Identify the Chapter Break Positions in the Vector Using the grep Function
4.3 The for Loop and if Conditional
4.4 Accessing and Processing List Items
4.4.1 rbind
4.4.2 More Recycling
4.4.3 apply
4.4.4 do.call (Do Dot Call)
4.4.5 cbind
Practice
5 Correlation
5.1 Introduction
5.2 Correlation Analysis
5.3 A Word About Data Frames
5.4 Testing Correlation with Randomization
Practice
Part II Mesoanalysis
6 Measures of Lexical Variety
6.1 Lexical Variety and the Type-Token Ratio
6.2 Mean Word Frequency
6.3 Extracting Word Usage Means
6.4 Ranking the Values
6.5 Calculating the TTR Inside lapply
6.6 A Further Use of Correlation
Practice
7 Hapax Richness
7.1 Introduction
7.2 sapply
7.3 A Mini-Conditional Function
Practice
8 Do It KWIC
8.1 Introduction
8.2 Custom Functions
8.3 A Word List Making Function
8.4 Finding Words and Their Neighbors
Practice
9 Do It KWIC (Better)
9.1 Getting Organized
9.2 Separating Functions for Reuse
9.3 User Interaction
9.4 readline
9.5 Building a Better KWIC Function
9.6 Fixing a Problem
Practice
10 Text Quality, Text Variety, and Parsing XML
10.1 Introduction
10.2 The Text Encoding Initiative (TEI)
10.3 Parsing XML with R
10.4 Installing R Packages
10.5 Loading and Using the XML Package
10.6 Metadata
Practice
Part III Macroanalysis
11 Clustering
11.1 Introduction
11.2 Review
11.3 Some Oddities in R
11.4 Corpus Ingestion
11.5 Another Function
11.6 Unsupervised Clustering and the Euclidean Metric
11.7 Converting an R List into a Data Matrix
11.8 Preparing Data for Clustering
11.9 Clustering Data
Practice
12 Classification
12.1 Introduction
12.2 A Small Authorship Experiment
12.3 Text Segmentation
12.4 Converting an R List into a Matrix
12.5 Organizing the Data
12.6 Cross Tabulation
12.7 Mapping the Data to the Metadata
12.8 Reducing the Feature Set
12.9 Performing the Classification with SVM
Practice
13 Topic Modeling
13.1 Introduction
13.2 R and Topic Modeling
13.3 Text Segmentation and Preparation
13.4 The R mallet Package
13.5 Simple Topic Modeling with a Standard Stop List
13.6 Unpacking the Model
13.7 Topic Visualization
13.8 Topic Coherence and Topic Probability
13.9 Pre-processing with a POS Tagger
Practice
A Variable Scope Example
B The LDA Buffet
C Start up Code
C.1 Chapter 3
C.2 Chapter 4
C.3 Chapter 5
C.4 Chapter 6
C.5 Chapter 7
D R Resources for Further Reading
Practice Exercise Solutions
Index