Designing.Data-Intensive.Applications.2017.3.pdf

Published: 2022-06-03 | Uploaded by: admin | Category: Manuals | Size: 28.00 MB | Format: PDF

Table of Contents

Preface

Who Should Read This Book?

Scope of This Book

Outline of This Book

References and Further Reading

O’Reilly Safari

How to Contact Us

Acknowledgments

Part I. Foundations of Data Systems

Chapter 1. Reliable, Scalable, and Maintainable Applications

Thinking About Data Systems

Reliability

Hardware Faults

Software Errors

Human Errors

How Important Is Reliability?

Scalability

Describing Load

Describing Performance

Approaches for Coping with Load

Maintainability

Operability: Making Life Easy for Operations

Simplicity: Managing Complexity

Evolvability: Making Change Easy

Summary

Chapter 2. Data Models and Query Languages

Relational Model Versus Document Model

The Birth of NoSQL

The Object-Relational Mismatch

Many-to-One and Many-to-Many Relationships

Are Document Databases Repeating History?

Relational Versus Document Databases Today

Query Languages for Data

Declarative Queries on the Web

MapReduce Querying

Graph-Like Data Models

Property Graphs

The Cypher Query Language

Graph Queries in SQL

Triple-Stores and SPARQL

The Foundation: Datalog

Summary

Chapter 3. Storage and Retrieval

Data Structures That Power Your Database

Hash Indexes

SSTables and LSM-Trees

B-Trees

Comparing B-Trees and LSM-Trees

Other Indexing Structures

Transaction Processing or Analytics?

Data Warehousing

Stars and Snowflakes: Schemas for Analytics

Column-Oriented Storage

Column Compression

Sort Order in Column Storage

Writing to Column-Oriented Storage

Aggregation: Data Cubes and Materialized Views

Summary

Chapter 4. Encoding and Evolution

Formats for Encoding Data

Language-Specific Formats

JSON, XML, and Binary Variants

Thrift and Protocol Buffers

Avro

The Merits of Schemas

Modes of Dataflow

Dataflow Through Databases

Dataflow Through Services: REST and RPC

Message-Passing Dataflow

Summary

Part II. Distributed Data

Chapter 5. Replication

Leaders and Followers

Synchronous Versus Asynchronous Replication

Setting Up New Followers

Handling Node Outages

Implementation of Replication Logs

Problems with Replication Lag

Reading Your Own Writes

Monotonic Reads

Consistent Prefix Reads

Solutions for Replication Lag

Multi-Leader Replication

Use Cases for Multi-Leader Replication

Handling Write Conflicts

Multi-Leader Replication Topologies

Leaderless Replication

Writing to the Database When a Node Is Down

Limitations of Quorum Consistency

Sloppy Quorums and Hinted Handoff

Detecting Concurrent Writes

Summary

Chapter 6. Partitioning

Partitioning and Replication

Partitioning of Key-Value Data

Partitioning by Key Range

Partitioning by Hash of Key

Skewed Workloads and Relieving Hot Spots

Partitioning and Secondary Indexes

Partitioning Secondary Indexes by Document

Partitioning Secondary Indexes by Term

Rebalancing Partitions

Strategies for Rebalancing

Operations: Automatic or Manual Rebalancing

Request Routing

Parallel Query Execution

Summary

Chapter 7. Transactions

The Slippery Concept of a Transaction

The Meaning of ACID

Single-Object and Multi-Object Operations

Weak Isolation Levels

Read Committed

Snapshot Isolation and Repeatable Read

Preventing Lost Updates

Write Skew and Phantoms

Serializability

Actual Serial Execution

Two-Phase Locking (2PL)

Serializable Snapshot Isolation (SSI)

Summary

Chapter 8. The Trouble with Distributed Systems

Faults and Partial Failures

Cloud Computing and Supercomputing

Unreliable Networks

Network Faults in Practice

Detecting Faults

Timeouts and Unbounded Delays

Synchronous Versus Asynchronous Networks

Unreliable Clocks

Monotonic Versus Time-of-Day Clocks

Clock Synchronization and Accuracy

Relying on Synchronized Clocks

Process Pauses

Knowledge, Truth, and Lies

The Truth Is Defined by the Majority

Byzantine Faults

System Model and Reality

Summary

Chapter 9. Consistency and Consensus

Consistency Guarantees

Linearizability

What Makes a System Linearizable?

Relying on Linearizability

Implementing Linearizable Systems

The Cost of Linearizability

Ordering Guarantees

Ordering and Causality

Sequence Number Ordering

Total Order Broadcast

Distributed Transactions and Consensus

Atomic Commit and Two-Phase Commit (2PC)

Distributed Transactions in Practice

Fault-Tolerant Consensus

Membership and Coordination Services

Summary

Part III. Derived Data

Chapter 10. Batch Processing

Batch Processing with Unix Tools

Simple Log Analysis

The Unix Philosophy

MapReduce and Distributed Filesystems

MapReduce Job Execution

Reduce-Side Joins and Grouping

Map-Side Joins

The Output of Batch Workflows

Comparing Hadoop to Distributed Databases

Beyond MapReduce

Materialization of Intermediate State

Graphs and Iterative Processing

High-Level APIs and Languages

Summary

Chapter 11. Stream Processing

Transmitting Event Streams

Messaging Systems

Partitioned Logs

Databases and Streams

Keeping Systems in Sync

Change Data Capture

Event Sourcing

State, Streams, and Immutability

Processing Streams

Uses of Stream Processing

Reasoning About Time

Stream Joins

Fault Tolerance

Summary

Chapter 12. The Future of Data Systems

Data Integration

Combining Specialized Tools by Deriving Data

Batch and Stream Processing

Unbundling Databases

Composing Data Storage Technologies

Designing Applications Around Dataflow

Observing Derived State

Aiming for Correctness

The End-to-End Argument for Databases

Enforcing Constraints

Timeliness and Integrity

Trust, but Verify

Doing the Right Thing

Predictive Analytics

Privacy and Tracking

Summary

Glossary

Index

About the Author

Colophon

Designing Data-Intensive Applications
The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
Martin Kleppmann

Beijing, Boston, Farnham, Sebastopol, Tokyo

Designing Data-Intensive Applications
by Martin Kleppmann

Copyright © 2017 Martin Kleppmann. All rights reserved.

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Ann Spencer and Marie Beaugureau
Production Editor: Kristen Brown
Copyeditor: Rachel Head
Proofreader: Amanda Kersey
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

March 2017: First Edition

Revision History for the First Edition
2017-03-01: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781449373320 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Designing Data-Intensive Applications, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-449-37332-0
[LSI]

Technology is a powerful force in our society. Data, software, and communication can be used for bad: to entrench unfair power structures, to undermine human rights, and to protect vested interests. But they can also be used for good: to make underrepresented people’s voices heard, to create opportunities for everyone, and to avert disasters. This book is dedicated to everyone working toward the good.

Computing is pop culture. […] Pop culture holds a disdain for history. Pop culture is all about identity and feeling like you’re participating. It has nothing to do with cooperation, the past or the future—it’s living in the present. I think the same is true of most people who write code for money. They have no idea where [their culture came from]. —Alan Kay, in interview with Dr Dobb’s Journal (2012)
