NoSQL Distilled
A Brief Guide to the Emerging World of Polyglot Persistence
Pramod J. Sadalage
Martin Fowler
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City
Many of the designations used by manufacturers and sellers to distinguish their products are claimed
as trademarks. Where those designations appear in this book, and the publisher was aware of a
trademark claim, the designations have been printed with initial capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no expressed or
implied warranty of any kind and assume no responsibility for errors or omissions. No liability is
assumed for incidental or consequential damages in connection with or arising out of the use of the
information or programs contained herein.
Library of Congress Cataloging-in-Publication Data:
Sadalage, Pramod J.
NoSQL distilled : a brief guide to the emerging world of polyglot
persistence / Pramod J Sadalage, Martin Fowler.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-321-82662-6 (pbk. : alk. paper) -- ISBN 0-321-82662-0 (pbk. :
alk. paper) 1. Databases--Technological innovations. 2. Information
storage and retrieval systems. I. Fowler, Martin, 1963- II. Title.
QA76.9.D32S228 2013
005.74--dc23
Copyright © 2013 Pearson Education, Inc.
All rights reserved. Printed in the United States of America. This publication is protected by
copyright, and permission must be obtained from the publisher prior to any prohibited reproduction,
storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical,
photocopying, recording, or likewise. To obtain permission to use material from this work, please
submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper
Saddle River, New Jersey 07458, or you may fax your request to (201) 236–3290.
ISBN-13: 978-0-321-82662-6
ISBN-10: 0-321-82662-0
Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.
First printing, August 2012
For my teachers Gajanan Chinchwadkar,
Dattatraya Mhaskar, and Arvind Parchure. You
inspired me the most, thank you.
—Pramod
For Cindy
—Martin
Contents
Preface
Part I: Understand
Chapter 1: Why NoSQL?
1.1 The Value of Relational Databases
1.1.1 Getting at Persistent Data
1.1.2 Concurrency
1.1.3 Integration
1.1.4 A (Mostly) Standard Model
1.2 Impedance Mismatch
1.3 Application and Integration Databases
1.4 Attack of the Clusters
1.5 The Emergence of NoSQL
1.6 Key Points
Chapter 2: Aggregate Data Models
2.1 Aggregates
2.1.1 Example of Relations and Aggregates
2.1.2 Consequences of Aggregate Orientation
2.2 Key-Value and Document Data Models
2.3 Column-Family Stores
2.4 Summarizing Aggregate-Oriented Databases
2.5 Further Reading
2.6 Key Points
Chapter 3: More Details on Data Models
3.1 Relationships
3.2 Graph Databases
3.3 Schemaless Databases
3.4 Materialized Views
3.5 Modeling for Data Access
3.6 Key Points
Chapter 4: Distribution Models
4.1 Single Server
4.2 Sharding
4.3 Master-Slave Replication
4.4 Peer-to-Peer Replication
4.5 Combining Sharding and Replication
4.6 Key Points
Chapter 5: Consistency
5.1 Update Consistency
5.2 Read Consistency
5.3 Relaxing Consistency
5.3.1 The CAP Theorem
5.4 Relaxing Durability
5.5 Quorums
5.6 Further Reading
5.7 Key Points
Chapter 6: Version Stamps
6.1 Business and System Transactions
6.2 Version Stamps on Multiple Nodes
6.3 Key Points
Chapter 7: Map-Reduce
7.1 Basic Map-Reduce
7.2 Partitioning and Combining
7.3 Composing Map-Reduce Calculations
7.3.1 A Two Stage Map-Reduce Example
7.3.2 Incremental Map-Reduce
7.4 Further Reading
7.5 Key Points
Part II: Implement
Chapter 8: Key-Value Databases
8.1 What Is a Key-Value Store?
8.2 Key-Value Store Features
8.2.1 Consistency
8.2.2 Transactions
8.2.3 Query Features
8.2.4 Structure of Data
8.2.5 Scaling
8.3 Suitable Use Cases
8.3.1 Storing Session Information
8.3.2 User Profiles, Preferences
8.3.3 Shopping Cart Data
8.4 When Not to Use
8.4.1 Relationships among Data
8.4.2 Multioperation Transactions
8.4.3 Query by Data
8.4.4 Operations by Sets
Chapter 9: Document Databases
9.1 What Is a Document Database?
9.2 Features
9.2.1 Consistency
9.2.2 Transactions
9.2.3 Availability
9.2.4 Query Features
9.2.5 Scaling
9.3 Suitable Use Cases
9.3.1 Event Logging
9.3.2 Content Management Systems, Blogging Platforms
9.3.3 Web Analytics or Real-Time Analytics
9.3.4 E-Commerce Applications
9.4 When Not to Use
9.4.1 Complex Transactions Spanning Different Operations
9.4.2 Queries against Varying Aggregate Structure
Chapter 10: Column-Family Stores
10.1 What Is a Column-Family Data Store?
10.2 Features
10.2.1 Consistency
10.2.2 Transactions
10.2.3 Availability
10.2.4 Query Features
10.2.5 Scaling
10.3 Suitable Use Cases
10.3.1 Event Logging
10.3.2 Content Management Systems, Blogging Platforms
10.3.3 Counters
10.3.4 Expiring Usage