
Elasticsearch in Action(Manning,2015).pdf

Front cover
brief contents
contents
preface
acknowledgments
about this book
Roadmap
Code conventions and downloads
Author Online
about the cover illustration
Part 1
1 Introducing Elasticsearch
1.1 Solving search problems with Elasticsearch
1.1.1 Providing quick searches
1.1.2 Ensuring relevant results
1.1.3 Searching beyond exact matches
1.2 Exploring typical Elasticsearch use cases
1.2.1 Using Elasticsearch as the primary back end
1.2.2 Adding Elasticsearch to an existing system
1.2.3 Using Elasticsearch with existing tools
1.2.4 Main Elasticsearch features
1.2.5 Extending Lucene functionality
1.2.6 Structuring your data in Elasticsearch
1.2.7 Installing Java
1.2.8 Downloading and starting Elasticsearch
1.2.9 Verifying that it works
1.3 Summary
2 Diving into the functionality
2.1 Understanding the logical layout: documents, types, and indices
2.1.1 Documents
2.1.2 Types
2.1.3 Indices
2.2 Understanding the physical layout: nodes and shards
2.2.1 Creating a cluster of one or more nodes
2.2.2 Understanding primary and replica shards
2.2.3 Distributing shards in a cluster
2.2.4 Distributed indexing and searching
2.3 Indexing new data
2.3.1 Indexing a document with cURL
2.3.2 Creating an index and mapping type
2.3.3 Indexing documents from the code samples
2.4 Searching for and retrieving data
2.4.1 Where to search
2.4.2 Contents of the reply
2.4.3 How to search
2.4.4 Getting documents by ID
2.5 Configuring Elasticsearch
2.5.1 Specifying a cluster name in elasticsearch.yml
2.5.2 Specifying verbose logging via logging.yml
2.5.3 Adjusting JVM settings
2.6 Adding nodes to the cluster
2.6.1 Starting a second node
2.6.2 Adding additional nodes
2.7 Summary
3 Indexing, updating, and deleting data
3.1 Using mappings to define kinds of documents
3.1.1 Retrieving and defining mappings
3.1.2 Extending an existing mapping
3.2 Core types for defining your own fields in documents
3.2.1 String
3.2.2 Numeric
3.2.3 Date
3.2.4 Boolean
3.3 Arrays and multi-fields
3.3.1 Arrays
3.3.2 Multi-fields
3.4 Using predefined fields
3.4.1 Controlling how to store and search your documents
3.4.2 Identifying your documents
3.5 Updating existing documents
3.5.1 Using the update API
3.5.2 Implementing concurrency control through versioning
3.6 Deleting data
3.6.1 Deleting documents
3.6.2 Deleting indices
3.6.3 Closing indices
3.6.4 Re-indexing sample documents
3.7 Summary
4 Searching your data
4.1 Structure of a search request
4.1.1 Specifying a search scope
4.1.2 Basic components of a search request
4.1.3 Request body–based search request
4.1.4 Understanding the structure of a response
4.2 Introducing the query and filter DSL
4.2.1 Match query and term filter
4.2.2 Most used basic queries and filters
4.2.3 Match query and term filter
4.2.4 Phrase_prefix query
4.3 Combining queries or compound queries
4.3.1 bool query
4.3.2 bool filter
4.4 Beyond match and filter queries
4.4.1 Range query and filter
4.4.2 Prefix query and filter
4.4.3 Wildcard query
4.5 Querying for field existence with filters
4.5.1 Exists filter
4.5.2 Missing filter
4.5.3 Transforming any query into a filter
4.6 Choosing the best query for the job
4.7 Summary
5 Analyzing your data
5.1 What is analysis?
5.1.1 Character filtering
5.1.2 Breaking into tokens
5.1.3 Token filtering
5.1.4 Token indexing
5.2 Using analyzers for your documents
5.2.1 Adding analyzers when an index is created
5.2.2 Adding analyzers to the Elasticsearch configuration
5.2.3 Specifying the analyzer for a field in the mapping
5.3 Analyzing text with the analyze API
5.3.1 Selecting an analyzer
5.3.2 Combining parts to create an impromptu analyzer
5.3.3 Analyzing based on a field’s mapping
5.3.4 Learning about indexed terms using the terms vectors API
5.4 Analyzers, tokenizers, and token filters, oh my!
5.4.1 Built-in analyzers
5.4.2 Tokenization
5.4.3 Token filters
5.5 Ngrams, edge ngrams, and shingles
5.5.1 1-grams
5.5.2 Bigrams
5.5.3 Trigrams
5.5.4 Setting min_gram and max_gram
5.5.5 Edge ngrams
5.5.6 Ngram settings
5.5.7 Shingles
5.6 Stemming
5.6.1 Algorithmic stemming
5.6.2 Stemming with dictionaries
5.6.3 Overriding the stemming from a token filter
5.7 Summary
6 Searching with relevancy
6.1 How scoring works in Elasticsearch
6.1.1 How scoring documents works
6.1.2 Term frequency
6.1.3 Inverse document frequency
6.1.4 Lucene’s scoring formula
6.2 Other scoring methods
6.2.1 Okapi BM25
6.3 Boosting
6.3.1 Boosting at index time
6.3.2 Boosting at query time
6.3.3 Queries spanning multiple fields
6.4 Understanding how a document was scored with explain
6.4.1 Explaining why a document did not match
6.5 Reducing scoring impact with query rescoring
6.6 Custom scoring with function_score
6.6.1 weight
6.6.2 Combining scores
6.6.3 field_value_factor
6.6.4 Script
6.6.5 random
6.6.6 Decay functions
6.6.7 Configuration options
6.7 Tying it back together
6.8 Sorting with scripts
6.9 Field data detour
6.9.1 The field data cache
6.9.2 What field data is used for
6.9.3 Managing field data
6.10 Summary
7 Exploring your data with aggregations
7.1 Understanding the anatomy of an aggregation
7.1.1 Structure of an aggregation request
7.1.2 Aggregations run on query results
7.1.3 Filters and aggregations
7.2 Metrics aggregations
7.2.1 Statistics
7.2.2 Advanced statistics
7.2.3 Approximate statistics
7.3 Multi-bucket aggregations
7.3.1 Terms aggregations
7.3.2 Range aggregations
7.3.3 Histogram aggregations
7.4 Nesting aggregations
7.4.1 Nesting multi-bucket aggregations
7.4.2 Nesting aggregations to get result grouping
7.4.3 Using single-bucket aggregations
7.5 Summary
8 Relations among documents
8.1 Overview of options for defining relationships among documents
8.1.1 Object type
8.1.2 Nested type
8.1.3 Parent-child relationships
8.1.4 Denormalizing
8.2 Having objects as field values
8.2.1 Mapping and indexing objects
8.2.2 Searching in objects
8.3 Nested type: connecting nested documents
8.3.1 Mapping and indexing nested documents
8.3.2 Searches and aggregations on nested documents
8.4 Parent-child relationships: connecting separate documents
8.4.1 Indexing, updating, and deleting child documents
8.4.2 Searching in parent and child documents
8.5 Denormalizing: using redundant data connections
8.5.1 Use cases for denormalizing
8.5.2 Indexing, updating, and deleting denormalized data
8.5.3 Querying denormalized data
8.6 Application-side joins
8.7 Summary
Part 2
9 Scaling out
9.1 Adding nodes to your Elasticsearch cluster
9.1.1 Adding nodes to your cluster
9.2 Discovering other Elasticsearch nodes
9.2.1 Multicast discovery
9.2.2 Unicast discovery
9.2.3 Electing a master node and detecting faults
9.2.4 Fault detection
9.3 Removing nodes from a cluster
9.3.1 Decommissioning nodes
9.4 Upgrading Elasticsearch nodes
9.4.1 Performing a rolling restart
9.4.2 Minimizing recovery time for a restart
9.5 Using the _cat API
9.6 Scaling strategies
9.6.1 Over-sharding
9.6.2 Splitting data into indices and shards
9.6.3 Maximizing throughput
9.7 Aliases
9.7.1 What is an alias, really?
9.7.2 Alias creation
9.8 Routing
9.8.1 Why use routing?
9.8.2 Routing strategies
9.8.3 Using the _search_shards API to determine where a search is performed
9.8.4 Configuring routing
9.8.5 Combining routing with aliases
9.9 Summary
10 Improving performance
10.1 Grouping requests
10.1.1 Bulk indexing, updating, and deleting
10.1.2 Multisearch and multiget APIs
10.2 Optimizing the handling of Lucene segments
10.2.1 Refresh and flush thresholds
10.2.2 Merges and merge policies
10.2.3 Store and store throttling
10.3 Making the best use of caches
10.3.1 Filters and filter caches
10.3.2 Shard query cache
10.3.3 JVM heap and OS caches
10.3.4 Keeping caches up with warmers
10.4 Other performance tradeoffs
10.4.1 Big indices or expensive searches
10.4.2 Tuning scripts or not using them at all
10.4.3 Trading network trips for less data and better distributed scoring
10.4.4 Trading memory for better deep paging
10.5 Summary
11 Administering your cluster
11.1 Improving defaults
11.1.1 Index templates
11.1.2 Default mappings
11.2 Allocation awareness
11.2.1 Shard-based allocation
11.2.2 Forced allocation awareness
11.3 Monitoring for bottlenecks
11.3.1 Checking cluster health
11.3.2 CPU: slow logs, hot threads, and thread pools
11.3.3 Memory: heap size, field, and filter caches
11.3.4 OS caches
11.3.5 Store throttling
11.4 Backing up your data
11.4.1 Snapshot API
11.4.2 Backing up data to a shared file system
11.4.3 Restoring from backups
11.4.4 Using repository plugins
11.5 Summary
Appendix A—Working with geospatial data
A.1 Points and distances between them
A.2 Adding distance to your sort criteria
A.2.1 Sorting by distance and other criteria at the same time
A.3 Filter and aggregate based on distance
A.4 Does a point belong to a shape?
A.4.1 Bounding boxes
A.4.2 Geohashes
A.5 Shape intersections
A.5.1 Indexing shapes
A.5.2 Filtering overlapping shapes
Appendix B—Plugins
B.1 Working with plugins
B.2 Installing plugins
B.3 Accessing plugins
B.4 Telling Elasticsearch to require certain plugins
B.5 Removing or updating plugins
Appendix C—Highlighting
C.1 Highlighting basics
C.1.1 What should be passed on to the user
C.1.2 Too many fields contain highlighted terms
C.2 Highlighting options
C.2.1 Size, order, and number of fragments
C.2.2 Highlighting tags and fragment encoding
C.2.3 Highlight query
C.3 Highlighter implementations
C.3.1 Postings Highlighter
C.3.2 Fast Vector Highlighter
Appendix D—Elasticsearch monitoring plugins
D.1 Bigdesk: visualize your cluster
D.2 ElasticHQ: monitoring with management
D.3 Head: advanced query building
D.4 Kopf: snapshots, warmers, and percolators
D.5 Marvel: fine-grained analysis
D.6 Sematext SPM: the Swiss Army knife
Appendix E—Turning search upside down with the percolator
E.1 Percolator basics
E.1.1 Define a mapping, register queries, then percolate documents
E.1.2 Percolator under the hood
E.2 Performance tips
E.2.1 Options for requests and replies
E.2.2 Separating and filtering percolator queries
E.3 Functionality tricks
E.3.1 Highlighting percolated documents
E.3.2 Ranking matching queries
E.3.3 Aggregations on matching query metadata
Appendix F—Using suggesters for autocomplete and did-you-mean functionality
F.1 Did-you-mean suggesters
F.1.1 Term suggester
F.1.2 Phrase suggester
F.2 Autocomplete suggesters
F.2.1 Completion Suggester
F.2.2 Context Suggester
index
Symbols
Numerics
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Y
Back cover
Radu Gheorghe, Matthew Lee Hinman, Roy Russo
MANNING
Elasticsearch in Action Licensed to Thomas Snead
Elasticsearch in Action
RADU GHEORGHE
MATTHEW LEE HINMAN
ROY RUSSO
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2016 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Susan Conant
Technical development editor: David Pombal
Copyeditor: Linda Recktenwald
Proofreader: Melody Dolab
Technical proofreader: Valentin Crettaz
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617291623
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – EBM – 20 19 18 17 16 15
brief contents

PART 1 ... 1
1 ■ Introducing Elasticsearch 3
2 ■ Diving into the functionality 20
3 ■ Indexing, updating, and deleting data 53
4 ■ Searching your data 83
5 ■ Analyzing your data 118
6 ■ Searching with relevancy 148
7 ■ Exploring your data with aggregations 179
8 ■ Relations among documents 215

PART 2 ... 259
9 ■ Scaling out 261
10 ■ Improving performance 293
11 ■ Administering your cluster 340
contents

preface xv
acknowledgments xvii
about this book xix
about the cover illustration xxiii

PART 1 ... 1

1 Introducing Elasticsearch 3
1.1 Solving search problems with Elasticsearch 4
    Providing quick searches 5
    Ensuring relevant results 6
    Searching beyond exact matches 7
1.2 Exploring typical Elasticsearch use cases 8
    Using Elasticsearch as the primary back end 9
    Adding Elasticsearch to an existing system 9
    Using Elasticsearch with existing tools 11
    Main Elasticsearch features 12
    Extending Lucene functionality 13
    Structuring your data in Elasticsearch 15
    Installing Java 15
    Downloading and starting Elasticsearch 16
    Verifying that it works 16
1.3 Summary 18