Solr in Action (latest complete edition).pdf

Published: 2022-06-16 | Uploaded by: admin | Category: Manuals | File size: 21.86M | Format: pdf

Preview: pages 1 to 8 of 666 (page images)

Front cover

brief contents

contents

foreword

preface

acknowledgments

about this book

Roadmap

How to use this book

Code conventions and downloads

Author Online

About the cover illustration

Part 1—Meet Solr

1 Introduction to Solr

1.1 Why do I need a search engine?

1.1.1 Managing text-centric data

1.1.2 Common search-engine use cases

1.2 What is Solr?

1.2.1 Information retrieval engine

1.2.2 Flexible schema management

1.2.3 Java web application

1.2.4 Multiple indexes in one server

1.2.5 Extendable (plugins)

1.2.6 Scalable

1.2.7 Fault-tolerant

1.3 Why Solr?

1.3.1 Solr for the software architect

1.3.2 Solr for the system administrator

1.3.3 Solr for the CEO

1.4 Features overview

1.4.1 User-experience features

1.4.2 Data-modeling features

1.4.3 New features in Solr 4

1.5 Summary

2 Getting to know Solr

2.1 Getting started

2.1.1 Installing Solr

2.1.2 Starting the Solr example server

2.1.3 Understanding Solr home

2.1.4 Indexing the example documents

2.2 Searching is what it’s all about

2.2.1 Exploring Solr’s query form

2.2.2 What comes back from Solr when you search

2.2.3 Ranked retrieval

2.2.4 Paging and sorting

2.2.5 Expanded search features

2.3 Tour of the Solr administration console

2.4 Adapting the example to your needs

2.5 Summary

3 Key Solr concepts

3.1 Searching, matching, and finding content

3.1.1 What is a document?

3.1.2 The fundamental search problem

3.1.3 The inverted index

3.1.4 Terms, phrases, and Boolean logic

3.1.5 Finding sets of documents

3.1.6 Phrase queries and term positions

3.1.7 Fuzzy matching

3.1.8 Quick recap

3.2 Relevancy

3.2.1 Default similarity

3.2.2 Term frequency

3.2.3 Inverse document frequency

3.2.4 Boosting

3.2.5 Normalization factors

3.3 Precision and Recall

3.3.1 Precision

3.3.2 Recall

3.3.3 Striking the right balance

3.4 Searching at scale

3.4.1 The denormalized document

3.4.2 Distributed searching

3.4.3 Clusters vs. servers

3.4.4 The limits of Solr

3.5 Summary

4 Configuring Solr

4.1 Overview of solrconfig.xml

4.1.1 Common XML data-structure and type elements

4.1.2 Applying configuration changes

4.1.3 Miscellaneous settings

4.2 Query request handling

4.2.1 Request-handling overview

4.2.2 Search handler

4.2.3 Browse request handler for Solritas: an example

4.2.4 Extending query processing with search components

4.3 Managing searchers

4.3.1 New searcher overview

4.3.2 Warming a new searcher

4.4 Cache management

4.4.1 Cache fundamentals

4.4.2 Filter cache

4.4.3 Query result cache

4.4.4 Document cache

4.4.5 Field value cache

4.5 Remaining configuration options

4.6 Summary

5 Indexing

5.1 Example microblog search application

5.1.1 Representing content for searching

5.1.2 Overview of the Solr indexing process

5.2 Designing your schema

5.2.1 Document granularity

5.2.2 Unique key

5.2.3 Indexed fields

5.2.4 Stored fields

5.2.5 Preview of schema.xml

5.3 Defining fields in schema.xml

5.3.1 Required field attributes

5.3.2 Multivalued fields

5.3.3 Dynamic fields

5.3.4 Copy fields

5.3.5 Unique key field

5.4 Field types for structured nontext fields

5.4.1 String fields

5.4.2 Date fields

5.4.3 Numeric fields

5.4.4 Advanced field type attributes

5.5 Sending documents to Solr for indexing

5.5.1 Indexing documents using XML or JSON

5.5.2 Using the SolrJ client library to add documents from Java

5.5.3 Other tools for importing documents into Solr

5.6 Update handler

5.6.1 Committing documents to the index

5.6.2 Transaction log

5.6.3 Atomic updates

5.7 Index management

5.7.1 Index storage

5.7.2 Segment merging

5.8 Summary

6 Text analysis

6.1 Analyzing microblog text

6.2 Basic text analysis

6.2.1 Analyzer

6.2.2 Tokenizer

6.2.3 Token filter

6.2.4 StandardTokenizer

6.2.5 Removing stop words with StopFilterFactory

6.2.6 LowerCaseFilterFactory—lowercase letters in terms

6.2.7 Testing your analysis with Solr’s analysis form

6.3 Defining a custom field type for microblog text

6.3.1 Collapsing repeated letters with PatternReplaceCharFilterFactory

6.3.2 Preserving hashtags, mentions, and hyphenated terms

6.3.3 Removing diacritical marks using ASCIIFoldingFilterFactory

6.3.4 Stemming with KStemFilterFactory

6.3.5 Injecting synonyms at query time with SynonymFilterFactory

6.3.6 Putting it all together

6.4 Advanced text analysis

6.4.1 Advanced field attributes

6.4.2 Per-language text analysis

6.4.3 Extending text analysis using a Solr plugin

6.5 Summary

Part 2—Core Solr capabilities

7 Performing queries and handling results

7.1 The anatomy of a Solr request

7.1.1 Request handlers

7.1.2 Search components

7.1.3 Query parsers

7.2 Working with query parsers

7.2.1 Specifying a query parser

7.2.2 Local params

7.3 Queries and filters

7.3.1 The fq and q parameters

7.3.2 Handling expensive filters

7.4 The default query parser (Lucene query parser)

7.4.1 Lucene query parser syntax

7.5 Handling user queries (eDisMax query parser)

7.5.1 eDisMax query parser overview

7.5.2 eDisMax query parameters

7.5.3 Searching across multiple fields

7.5.4 Boosting queries and phrases

7.5.5 Field aliasing

7.5.6 User-accessible fields

7.5.7 Minimum match

7.5.8 eDisMax benefits and drawbacks

7.6 Other useful query parsers

7.6.1 Field query parser

7.6.2 Term and Raw query parsers

7.6.3 Function and Function Range query parsers

7.6.4 Nested queries and the Nested query parser

7.6.5 Boost query parser

7.6.6 Prefix query parser

7.6.7 Spatial query parsers

7.6.8 Join query parser

7.6.9 Switch query parser

7.6.10 Surround query parser

7.6.11 Max Score query parser

7.6.12 Collapsing query parser

7.7 Returning results

7.7.1 Choosing a response format

7.7.2 Choosing fields to return

7.7.3 Paging through results

7.8 Sorting results

7.8.1 Sorting by fields

7.8.2 Sorting by functions

7.8.3 Fuzzy sorting

7.9 Debugging query results

7.9.1 Returning debug information

7.10 Summary

8 Faceted search

8.1 Navigating your content at a glance

8.2 Setting up test data

8.3 Field faceting

8.4 Query faceting

8.5 Range faceting

8.6 Filtering upon faceted values

8.6.1 Applying filters to your facets

8.6.2 Safely filtering on faceted values

8.7 Multiselect faceting, keys, and tags

8.7.1 Keys

8.7.2 Tags, excludes, and multiselect faceting

8.8 Beyond the basics

8.9 Summary

9 Hit highlighting

9.1 Overview of hit highlighting

9.2 How highlighting works

9.2.1 Set up a new Solr core for UFO sightings

9.2.2 Preprocess UFO sightings before indexing

9.2.3 Exploring the UFO sightings dataset

9.2.4 Hit highlighting out of the box

9.2.5 Nuts and bolts

9.2.6 Refining highlighter results

9.3 Improving performance using FastVectorHighlighter

9.4 PostingsHighlighter

9.5 Summary

10 Query suggestions

10.1 Spell-check

10.1.1 Indexing Wikipedia articles

10.1.2 Spell-check example

10.1.3 Spell-check search component

10.2 Autosuggesting query terms

10.2.1 Autosuggest request handler

10.2.2 Autosuggest search component

10.3 Suggesting document field values

10.3.1 Using n-grams for suggestions

10.3.2 N-gram-driven request handler

10.4 Suggesting queries based on user activity

10.5 Summary

11 Result grouping/field collapsing

11.1 Result grouping vs. field collapsing

11.2 Skipping duplicate documents

11.3 Returning multiple documents per group

11.4 Grouping by functions and queries

11.4.1 Grouping by function

11.4.2 Grouping by query

11.5 Paging and sorting grouped results

11.6 Grouping gotchas

11.6.1 Faceting upon result groups

11.6.2 Distributed result grouping

11.6.3 Returning a flat list

11.6.4 Grouping on multivalued and tokenized fields

11.6.5 Grouping performance

11.7 Efficient field collapsing with the Collapsing query parser

11.8 Summary

12 Taking Solr to production

12.1 Developing a Solr distribution

12.2 Deploying Solr

12.2.1 Building your Solr distribution

12.2.2 Embedded Solr

12.3 Hardware and server configuration

12.3.1 RAM and SSDs

12.3.2 JVM settings

12.3.3 The index shuffle

12.3.4 Useful system tricks

12.4 Data acquisition strategies

12.5 Sharding and replication

12.5.1 Choosing to shard

12.5.2 Choosing to replicate

12.6 Solr core management

12.7 Managing clusters of servers

12.7.1 Load balancers and Solr health check

12.7.2 Generic vs. customized configuration

12.8 Querying and interacting with Solr

12.8.1 REST API

12.8.2 Available Solr client libraries

12.8.3 Using SolrJ from Java

12.9 Monitoring Solr’s performance

12.9.1 Solr’s Plugins / Stats page

12.9.2 Solr cache performance

12.9.3 Pulling stats from request handlers and MBeans

12.9.4 External monitoring options

12.9.5 Solr logs

12.9.6 Load testing

12.10 Upgrading between Solr versions

12.11 Summary

Part 3—Taking Solr to the next level

13 SolrCloud

13.1 Getting started with SolrCloud

13.1.1 Starting Solr in cloud mode

13.1.2 Motivation behind the SolrCloud architecture

13.2 Core concepts

13.2.1 Collections vs. cores

13.2.2 ZooKeeper

13.2.3 Choosing the number of shards and replicas

13.2.4 Cluster-state management

13.2.5 Shard-leader election

13.2.6 Important SolrCloud configuration settings

13.3 Distributed indexing

13.3.1 Document shard assignment

13.3.2 Adding documents

13.3.3 Near real-time search

13.3.4 Node recovery process

13.4 Distributed search

13.4.1 Multistage query process

13.4.2 Distributed search limitations

13.5 Collections API

13.5.1 Create a collection

13.5.2 Collection aliasing

13.6 Basic system-administration tasks

13.6.1 Configuration updates

13.6.2 Rolling restart

13.6.3 Restarting a failed node

13.6.4 Is node X active?

13.6.5 Adding a replica

13.6.6 Offsite backup

13.7 Advanced topics

13.7.1 Custom hashing

13.7.2 Shard splitting

13.8 Summary

14 Multilingual search

14.1 Why linguistic analysis matters

14.2 Stemming vs. lemmatization

14.3 Stemming in action

14.4 Handling edge cases

14.4.1 KeywordMarkerFilterFactory

14.4.2 StemmerOverrideFilterFactory

14.5 Available language libraries in Solr

14.5.1 Language-specific analyzer chains

14.5.2 Dictionary-based stemming (Hunspell)

14.6 Searching content in multiple languages

14.6.1 Separate field per language

14.6.2 Separate index per language

14.6.3 Multiple languages in one field

14.6.4 Creating a field type to handle multiple languages per field

14.7 Language identification

14.7.1 Update processors for language identification

14.7.2 Dynamically assigning detected language analyzers within a field

14.8 Summary

15 Complex query operations

15.1 Function queries

15.1.1 Function syntax

15.1.2 Searching on functions

15.1.3 Returning functions like fields

15.1.4 Sorting on functions

15.1.5 Available functions in Solr

15.1.6 Implementing a custom function

15.2 Geospatial search

15.2.1 Searching near a single point

15.2.2 Advanced geospatial search

15.3 Pivot faceting

15.4 Referencing external data

15.5 Cross-document and cross-index joins

15.6 Big data analytics with Solr

15.7 Summary

16 Mastering relevancy

16.1 The impact of relevancy tuning

16.2 Debugging the relevancy calculation

16.3 Relevancy boosting

16.3.1 Per-field boosting

16.3.2 Per-term boosting

16.3.3 Payload boosting

16.3.4 Function boosting

16.3.5 Term-proximity boosting

16.3.6 Elevating the relevancy of important documents

16.4 Pluggable Similarity class implementations

16.5 Personalized search and recommendations

16.5.1 Search vs. recommendations

16.5.2 Attribute-based matching

16.5.3 Hierarchical matching

16.5.4 More Like This

16.5.5 Concept-based matching

16.5.6 Geographical matching

16.5.7 Collaborative filtering

16.5.8 Hybrid approaches

16.6 Creating a personalized search experience

16.7 Running relevancy experiments

16.8 Summary

appendix A—Working with the Solr codebase

A.1 Pulling the right version of Solr

A.2 Setting up Solr in your IDE

A.3 Debugging Solr code

A.4 Downloading and applying Solr patches

A.5 Contributing patches

appendix B—Language-specific field type configurations

appendix C—Useful data import configurations

C.1 Indexing Wikipedia

C.2 Indexing Stack Exchange

index

Symbols

Back cover

Trey Grainger, Timothy Potter

Foreword by Yonik Seeley

MANNING

Solr in Action

TREY GRAINGER
TIMOTHY POTTER

MANNING
SHELTER ISLAND

Download from BookDL (http://bookdl.com)

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: orders@manning.com

©2014 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Photographs in this book were created by Martin Evans and Jordan Hochenbaum, unless otherwise noted. Illustrations were created by Martin Evans, Joshua Noble, and Jordan Hochenbaum. Fritzing (fritzing.org) was used to create some of the circuit diagrams.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editors: Elizabeth Lexleigh, Susan Conant
Copyeditor: Melinda Rankin
Proofreader: Elizabeth Martin
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617291029
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 19 18 17 16 15 14

brief contents

PART 1 MEET SOLR 1
1 ■ Introduction to Solr 3
2 ■ Getting to know Solr 26
3 ■ Key Solr concepts 48
4 ■ Configuring Solr 82
5 ■ Indexing 116
6 ■ Text analysis 162

PART 2 CORE SOLR CAPABILITIES 195
7 ■ Performing queries and handling results 197
8 ■ Faceted search 250
9 ■ Hit highlighting 281
10 ■ Query suggestions 306
11 ■ Result grouping/field collapsing 330
12 ■ Taking Solr to production 356

PART 3 TAKING SOLR TO THE NEXT LEVEL 403
13 ■ SolrCloud 405
14 ■ Multilingual search 450
15 ■ Complex query operations 501
16 ■ Mastering relevancy 548

contents

foreword xv
preface xvii
acknowledgments xix
about this book xxi

PART 1 MEET SOLR 1

1 Introduction to Solr 3
1.1 Why do I need a search engine? 4
Managing text-centric data 4 ■ Common search-engine use cases 7
1.2 What is Solr? 9
Information retrieval engine 11 ■ Flexible schema management 13 ■ Java web application 13 ■ Multiple indexes in one server 15 ■ Extendable (plugins) 15 ■ Scalable 15 ■ Fault-tolerant 16
1.3 Why Solr? 17
Solr for the software architect 17 ■ Solr for the system administrator 18 ■ Solr for the CEO 19
1.4 Features overview 19
User-experience features 19 ■ Data-modeling features 21 ■ New features in Solr 4 23
1.5 Summary 24

2 Getting to know Solr 26
2.1 Getting started 27
Installing Solr 27 ■ Starting the Solr example server 28 ■ Understanding Solr home 32 ■ Indexing the example documents 33
2.2 Searching is what it’s all about 34
Exploring Solr’s query form 34 ■ What comes back from Solr when you search 38 ■ Ranked retrieval 39 ■ Paging and sorting 40 ■ Expanded search features 41
2.3 Tour of the Solr administration console 43
2.4 Adapting the example to your needs 45
2.5 Summary 46

3 Key Solr concepts 48
3.1 Searching, matching, and finding content 49
What is a document? 49 ■ The fundamental search problem 50 ■ The inverted index 53 ■ Terms, phrases, and Boolean logic 54 ■ Finding sets of documents 56 ■ Phrase queries and term positions 59 ■ Fuzzy matching 60 ■ Quick recap 65
3.2 Relevancy 65
Default similarity 65 ■ Term frequency 67 ■ Inverse document frequency 68 ■ Boosting 69 ■ Normalization factors 69
3.3 Precision and Recall 71
Precision 72 ■ Recall 73 ■ Striking the right balance 73
3.4 Searching at scale 74
The denormalized document 75 ■ Distributed searching 77 ■ Clusters vs. servers 78 ■ The limits of Solr 79
3.5 Summary 80

4 Configuring Solr 82
4.1 Overview of solrconfig.xml 85
Common XML data-structure and type elements 87 ■ Applying configuration changes 87 ■ Miscellaneous settings 88
4.2 Query request handling 90
Request-handling overview 90 ■ Search handler 93 ■ Browse request handler for Solritas: an example 94 ■ Extending query processing with search components 98
4.3 Managing searchers 103
New searcher overview 103 ■ Warming a new searcher 104
4.4 Cache management 107
Cache fundamentals 107 ■ Filter cache 109 ■ Query result cache 112 ■ Document cache 113 ■ Field value cache 113
4.5 Remaining configuration options 114
4.6 Summary 114

5 Indexing 116
5.1 Example microblog search application 117
Representing content for searching 117 ■ Overview of the Solr indexing process 119
5.2 Designing your schema 121
Document granularity 121 ■ Unique key 122 ■ Indexed fields 123 ■ Stored fields 123 ■ Preview of schema.xml 124
5.3 Defining fields in schema.xml 125
Required field attributes 126 ■ Multivalued fields 127 ■ Dynamic fields 128 ■ Copy fields 131 ■ Unique key field 133
5.4 Field types for structured nontext fields 133
String fields 134 ■ Date fields 135 ■ Numeric fields 137 ■ Advanced field type attributes 138
5.5 Sending documents to Solr for indexing 141
Indexing documents using XML or JSON 141 ■ Using the SolrJ client library to add documents from Java 144 ■ Other tools for importing documents into Solr 146
5.6 Update handler 147
Committing documents to the index 148 ■ Transaction log 151 ■ Atomic updates 152
5.7 Index management 155
Index storage 155 ■ Segment merging 158
5.8 Summary 160
