Acknowledgments
Note to the Reader
Introduction
1.1 Warehouse-Scale Computers
1.2 Cost Efficiency at Scale
1.3 Not Just a Collection of Servers
1.4 One Datacenter vs. Several Datacenters
1.5 Why WSCs Might Matter to You
1.6 Architectural Overview of WSCs
1.6.1 Storage
1.6.2 Networking Fabric
1.6.3 Storage Hierarchy
1.6.4 Quantifying Latency, Bandwidth, and Capacity
1.6.5 Power Usage
1.6.6 Handling Failures
Workloads and Software Infrastructure
2.1 Datacenter vs. Desktop
2.2 Performance and Availability Toolbox
2.3 Platform-Level Software
2.4 Cluster-Level Infrastructure Software
2.4.1 Resource Management
2.4.2 Hardware Abstraction and Other Basic Services
2.4.3 Deployment and Maintenance
2.4.4 Programming Frameworks
2.5 Application-Level Software
2.5.1 Workload Examples
2.5.2 Online: Web Search
2.5.3 Offline: Scholar Article Similarity
2.6 A Monitoring Infrastructure
2.6.1 Service-Level Dashboards
2.6.2 Performance Debugging Tools
2.6.3 Platform-Level Health Monitoring
2.7 Buy vs. Build
2.8 Tail-Tolerance
2.9 Further Reading
Hardware Building Blocks
3.1 Cost-Efficient Server Hardware
3.1.1 The Impact of Large SMP Communication Efficiency
3.1.2 Brawny vs. Wimpy Servers
3.1.3 Balanced Designs
3.2 WSC Storage
3.2.1 Unstructured WSC Storage
3.2.2 Structured WSC Storage
3.2.3 Interplay of Storage and Networking Technology
3.3 WSC Networking
3.4 Further Reading
Datacenter Basics
4.1 Datacenter Tier Classifications and Specifications
4.2 Datacenter Power Systems
4.2.1 Uninterruptible Power Systems
4.2.2 Power Distribution Units
4.2.3 Alternative: DC Distribution
4.3 Datacenter Cooling Systems
4.3.1 CRACs, Chillers, and Cooling Towers
4.3.2 CRACs
4.3.3 Chillers
4.3.4 Cooling Towers
4.3.5 Free Cooling
4.3.6 Air Flow Considerations
4.3.7 In-Rack, In-Row Cooling, and Cold Plates
4.3.8 Case Study: Google’s In-Row Cooling
4.3.9 Container-Based Datacenters
4.4 Summary
Energy and Power Efficiency
5.1 Datacenter Energy Efficiency
5.1.1 The PUE Metric
5.1.2 Issues with the PUE Metric
5.1.3 Sources of Efficiency Losses in Datacenters
5.1.4 Improving the Energy Efficiency of Datacenters
5.1.5 Beyond the Facility
5.2 The Energy Efficiency of Computing
5.2.1 Measuring Energy Efficiency
5.2.2 Server Energy Efficiency
5.2.3 Usage Profile of Warehouse-Scale Computers
5.3 Energy-Proportional Computing
5.3.1 Causes of Poor Energy Proportionality
5.3.2 Improving Energy Proportionality
5.3.3 Energy Proportionality—The Rest of the System
5.4 Relative Effectiveness of Low-Power Modes
5.5 The Role of Software in Energy Proportionality
5.6 Datacenter Power Provisioning
5.6.1 Deploying the Right Amount of Equipment
5.6.2 Oversubscribing Facility Power
5.7 Trends in Server Energy Usage
5.7.1 Using Energy Storage for Power Management
5.8 Conclusions
5.8.1 Further Reading
Modeling Costs
6.1 Capital Costs
6.2 Operational Costs
6.3 Case Studies
6.3.1 Real-World Datacenter Costs
6.3.2 Modeling a Partially Filled Datacenter
6.3.3 The Cost of Public Clouds
Dealing with Failures and Repairs
7.1 Implications of Software-Based Fault Tolerance
7.2 Categorizing Faults
7.3 Machine-Level Failures
7.4 Repairs
7.5 Tolerating Faults, Not Hiding Them
Closing Remarks
8.1 Hardware
8.2 Software
8.3 Economics
8.4 Key Challenges
8.4.1 Rapidly Changing Workloads
8.4.2 Building Responsive Large-Scale Systems
8.4.3 Energy Proportionality of Non-CPU Components
8.4.4 Overcoming the End of Dennard Scaling
8.4.5 Amdahl’s Cruel Law
8.5 Conclusions
Bibliography
Author Biographies