Front cover
brief contents
contents
foreword
preface
acknowledgments
about this book
Roadmap
How to use this book
Code conventions and downloads
Author Online
About the cover illustration
Part 1—Meet Solr
1 Introduction to Solr
1.1 Why do I need a search engine?
1.1.1 Managing text-centric data
1.1.2 Common search-engine use cases
1.2 What is Solr?
1.2.1 Information retrieval engine
1.2.2 Flexible schema management
1.2.3 Java web application
1.2.4 Multiple indexes in one server
1.2.5 Extendable (plugins)
1.2.6 Scalable
1.2.7 Fault-tolerant
1.3 Why Solr?
1.3.1 Solr for the software architect
1.3.2 Solr for the system administrator
1.3.3 Solr for the CEO
1.4 Features overview
1.4.1 User-experience features
1.4.2 Data-modeling features
1.4.3 New features in Solr 4
1.5 Summary
2 Getting to know Solr
2.1 Getting started
2.1.1 Installing Solr
2.1.2 Starting the Solr example server
2.1.3 Understanding Solr home
2.1.4 Indexing the example documents
2.2 Searching is what it’s all about
2.2.1 Exploring Solr’s query form
2.2.2 What comes back from Solr when you search
2.2.3 Ranked retrieval
2.2.4 Paging and sorting
2.2.5 Expanded search features
2.3 Tour of the Solr administration console
2.4 Adapting the example to your needs
2.5 Summary
3 Key Solr concepts
3.1 Searching, matching, and finding content
3.1.1 What is a document?
3.1.2 The fundamental search problem
3.1.3 The inverted index
3.1.4 Terms, phrases, and Boolean logic
3.1.5 Finding sets of documents
3.1.6 Phrase queries and term positions
3.1.7 Fuzzy matching
3.1.8 Quick recap
3.2 Relevancy
3.2.1 Default similarity
3.2.2 Term frequency
3.2.3 Inverse document frequency
3.2.4 Boosting
3.2.5 Normalization factors
3.3 Precision and recall
3.3.1 Precision
3.3.2 Recall
3.3.3 Striking the right balance
3.4 Searching at scale
3.4.1 The denormalized document
3.4.2 Distributed searching
3.4.3 Clusters vs. servers
3.4.4 The limits of Solr
3.5 Summary
4 Configuring Solr
4.1 Overview of solrconfig.xml
4.1.1 Common XML data-structure and type elements
4.1.2 Applying configuration changes
4.1.3 Miscellaneous settings
4.2 Query request handling
4.2.1 Request-handling overview
4.2.2 Search handler
4.2.3 Browse request handler for Solritas: an example
4.2.4 Extending query processing with search components
4.3 Managing searchers
4.3.1 New searcher overview
4.3.2 Warming a new searcher
4.4 Cache management
4.4.1 Cache fundamentals
4.4.2 Filter cache
4.4.3 Query result cache
4.4.4 Document cache
4.4.5 Field value cache
4.5 Remaining configuration options
4.6 Summary
5 Indexing
5.1 Example microblog search application
5.1.1 Representing content for searching
5.1.2 Overview of the Solr indexing process
5.2 Designing your schema
5.2.1 Document granularity
5.2.2 Unique key
5.2.3 Indexed fields
5.2.4 Stored fields
5.2.5 Preview of schema.xml
5.3 Defining fields in schema.xml
5.3.1 Required field attributes
5.3.2 Multivalued fields
5.3.3 Dynamic fields
5.3.4 Copy fields
5.3.5 Unique key field
5.4 Field types for structured nontext fields
5.4.1 String fields
5.4.2 Date fields
5.4.3 Numeric fields
5.4.4 Advanced field type attributes
5.5 Sending documents to Solr for indexing
5.5.1 Indexing documents using XML or JSON
5.5.2 Using the SolrJ client library to add documents from Java
5.5.3 Other tools for importing documents into Solr
5.6 Update handler
5.6.1 Committing documents to the index
5.6.2 Transaction log
5.6.3 Atomic updates
5.7 Index management
5.7.1 Index storage
5.7.2 Segment merging
5.8 Summary
6 Text analysis
6.1 Analyzing microblog text
6.2 Basic text analysis
6.2.1 Analyzer
6.2.2 Tokenizer
6.2.3 Token filter
6.2.4 StandardTokenizer
6.2.5 Removing stop words with StopFilterFactory
6.2.6 LowerCaseFilterFactory—lowercase letters in terms
6.2.7 Testing your analysis with Solr’s analysis form
6.3 Defining a custom field type for microblog text
6.3.1 Collapsing repeated letters with PatternReplaceCharFilterFactory
6.3.2 Preserving hashtags, mentions, and hyphenated terms
6.3.3 Removing diacritical marks using ASCIIFoldingFilterFactory
6.3.4 Stemming with KStemFilterFactory
6.3.5 Injecting synonyms at query time with SynonymFilterFactory
6.3.6 Putting it all together
6.4 Advanced text analysis
6.4.1 Advanced field attributes
6.4.2 Per-language text analysis
6.4.3 Extending text analysis using a Solr plugin
6.5 Summary
Part 2—Core Solr capabilities
7 Performing queries and handling results
7.1 The anatomy of a Solr request
7.1.1 Request handlers
7.1.2 Search components
7.1.3 Query parsers
7.2 Working with query parsers
7.2.1 Specifying a query parser
7.2.2 Local params
7.3 Queries and filters
7.3.1 The fq and q parameters
7.3.2 Handling expensive filters
7.4 The default query parser (Lucene query parser)
7.4.1 Lucene query parser syntax
7.5 Handling user queries (eDisMax query parser)
7.5.1 eDisMax query parser overview
7.5.2 eDisMax query parameters
7.5.3 Searching across multiple fields
7.5.4 Boosting queries and phrases
7.5.5 Field aliasing
7.5.6 User-accessible fields
7.5.7 Minimum match
7.5.8 eDisMax benefits and drawbacks
7.6 Other useful query parsers
7.6.1 Field query parser
7.6.2 Term and Raw query parsers
7.6.3 Function and Function Range query parsers
7.6.4 Nested queries and the Nested query parser
7.6.5 Boost query parser
7.6.6 Prefix query parser
7.6.7 Spatial query parsers
7.6.8 Join query parser
7.6.9 Switch query parser
7.6.10 Surround query parser
7.6.11 Max Score query parser
7.6.12 Collapsing query parser
7.7 Returning results
7.7.1 Choosing a response format
7.7.2 Choosing fields to return
7.7.3 Paging through results
7.8 Sorting results
7.8.1 Sorting by fields
7.8.2 Sorting by functions
7.8.3 Fuzzy sorting
7.9 Debugging query results
7.9.1 Returning debug information
7.10 Summary
8 Faceted search
8.1 Navigating your content at a glance
8.2 Setting up test data
8.3 Field faceting
8.4 Query faceting
8.5 Range faceting
8.6 Filtering upon faceted values
8.6.1 Applying filters to your facets
8.6.2 Safely filtering on faceted values
8.7 Multiselect faceting, keys, and tags
8.7.1 Keys
8.7.2 Tags, excludes, and multiselect faceting
8.8 Beyond the basics
8.9 Summary
9 Hit highlighting
9.1 Overview of hit highlighting
9.2 How highlighting works
9.2.1 Set up a new Solr core for UFO sightings
9.2.2 Preprocess UFO sightings before indexing
9.2.3 Exploring the UFO sightings dataset
9.2.4 Hit highlighting out of the box
9.2.5 Nuts and bolts
9.2.6 Refining highlighter results
9.3 Improving performance using FastVectorHighlighter
9.4 PostingsHighlighter
9.5 Summary
10 Query suggestions
10.1 Spell-check
10.1.1 Indexing Wikipedia articles
10.1.2 Spell-check example
10.1.3 Spell-check search component
10.2 Autosuggesting query terms
10.2.1 Autosuggest request handler
10.2.2 Autosuggest search component
10.3 Suggesting document field values
10.3.1 Using n-grams for suggestions
10.3.2 N-gram-driven request handler
10.4 Suggesting queries based on user activity
10.5 Summary
11 Result grouping/field collapsing
11.1 Result grouping vs. field collapsing
11.2 Skipping duplicate documents
11.3 Returning multiple documents per group
11.4 Grouping by functions and queries
11.4.1 Grouping by function
11.4.2 Grouping by query
11.5 Paging and sorting grouped results
11.6 Grouping gotchas
11.6.1 Faceting upon result groups
11.6.2 Distributed result grouping
11.6.3 Returning a flat list
11.6.4 Grouping on multivalued and tokenized fields
11.6.5 Grouping performance
11.7 Efficient field collapsing with the Collapsing query parser
11.8 Summary
12 Taking Solr to production
12.1 Developing a Solr distribution
12.2 Deploying Solr
12.2.1 Building your Solr distribution
12.2.2 Embedded Solr
12.3 Hardware and server configuration
12.3.1 RAM and SSDs
12.3.2 JVM settings
12.3.3 The index shuffle
12.3.4 Useful system tricks
12.4 Data acquisition strategies
12.5 Sharding and replication
12.5.1 Choosing to shard
12.5.2 Choosing to replicate
12.6 Solr core management
12.7 Managing clusters of servers
12.7.1 Load balancers and Solr health check
12.7.2 Generic vs. customized configuration
12.8 Querying and interacting with Solr
12.8.1 REST API
12.8.2 Available Solr client libraries
12.8.3 Using SolrJ from Java
12.9 Monitoring Solr’s performance
12.9.1 Solr’s Plugins / Stats page
12.9.2 Solr cache performance
12.9.3 Pulling stats from request handlers and MBeans
12.9.4 External monitoring options
12.9.5 Solr logs
12.9.6 Load testing
12.10 Upgrading between Solr versions
12.11 Summary
Part 3—Taking Solr to the next level
13 SolrCloud
13.1 Getting started with SolrCloud
13.1.1 Starting Solr in cloud mode
13.1.2 Motivation behind the SolrCloud architecture
13.2 Core concepts
13.2.1 Collections vs. cores
13.2.2 ZooKeeper
13.2.3 Choosing the number of shards and replicas
13.2.4 Cluster-state management
13.2.5 Shard-leader election
13.2.6 Important SolrCloud configuration settings
13.3 Distributed indexing
13.3.1 Document shard assignment
13.3.2 Adding documents
13.3.3 Near real-time search
13.3.4 Node recovery process
13.4 Distributed search
13.4.1 Multistage query process
13.4.2 Distributed search limitations
13.5 Collections API
13.5.1 Create a collection
13.5.2 Collection aliasing
13.6 Basic system-administration tasks
13.6.1 Configuration updates
13.6.2 Rolling restart
13.6.3 Restarting a failed node
13.6.4 Is node X active?
13.6.5 Adding a replica
13.6.6 Offsite backup
13.7 Advanced topics
13.7.1 Custom hashing
13.7.2 Shard splitting
13.8 Summary
14 Multilingual search
14.1 Why linguistic analysis matters
14.2 Stemming vs. lemmatization
14.3 Stemming in action
14.4 Handling edge cases
14.4.1 KeywordMarkerFilterFactory
14.4.2 StemmerOverrideFilterFactory
14.5 Available language libraries in Solr
14.5.1 Language-specific analyzer chains
14.5.2 Dictionary-based stemming (Hunspell)
14.6 Searching content in multiple languages
14.6.1 Separate field per language
14.6.2 Separate index per language
14.6.3 Multiple languages in one field
14.6.4 Creating a field type to handle multiple languages per field
14.7 Language identification
14.7.1 Update processors for language identification
14.7.2 Dynamically assigning detected language analyzers within a field
14.8 Summary
15 Complex query operations
15.1 Function queries
15.1.1 Function syntax
15.1.2 Searching on functions
15.1.3 Returning functions like fields
15.1.4 Sorting on functions
15.1.5 Available functions in Solr
15.1.6 Implementing a custom function
15.2 Geospatial search
15.2.1 Searching near a single point
15.2.2 Advanced geospatial search
15.3 Pivot faceting
15.4 Referencing external data
15.5 Cross-document and cross-index joins
15.6 Big data analytics with Solr
15.7 Summary
16 Mastering relevancy
16.1 The impact of relevancy tuning
16.2 Debugging the relevancy calculation
16.3 Relevancy boosting
16.3.1 Per-field boosting
16.3.2 Per-term boosting
16.3.3 Payload boosting
16.3.4 Function boosting
16.3.5 Term-proximity boosting
16.3.6 Elevating the relevancy of important documents
16.4 Pluggable Similarity class implementations
16.5 Personalized search and recommendations
16.5.1 Search vs. recommendations
16.5.2 Attribute-based matching
16.5.3 Hierarchical matching
16.5.4 More Like This
16.5.5 Concept-based matching
16.5.6 Geographical matching
16.5.7 Collaborative filtering
16.5.8 Hybrid approaches
16.6 Creating a personalized search experience
16.7 Running relevancy experiments
16.8 Summary
appendix A—Working with the Solr codebase
A.1 Pulling the right version of Solr
A.2 Setting up Solr in your IDE
A.3 Debugging Solr code
A.4 Downloading and applying Solr patches
A.5 Contributing patches
appendix B—Language-specific field type configurations
appendix C—Useful data import configurations
C.1 Indexing Wikipedia
C.2 Indexing Stack Exchange
index
Symbols
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Z
Back cover