Apache Hadoop YARN.pdf (complete electronic edition)

Posted: 2022-06-15 | Posted by: admin | Category: Manuals | Size: 8.17 MB | Format: PDF, 337 pages


Contents

Foreword by Raymie Stata

Foreword by Paul Dix

Preface

Acknowledgments

About the Authors

1 Apache Hadoop YARN: A Brief History and Rationale

Introduction

Apache Hadoop

Phase 0: The Era of Ad Hoc Clusters

Phase 1: Hadoop on Demand

HDFS in the HOD World

Features and Advantages of HOD

Shortcomings of Hadoop on Demand

Phase 2: Dawn of the Shared Compute Clusters

Evolution of Shared Clusters

Issues with Shared MapReduce Clusters

Phase 3: Emergence of YARN

Conclusion

2 Apache Hadoop YARN Install Quick Start

Getting Started

Steps to Configure a Single-Node YARN Cluster

Step 1: Download Apache Hadoop

Step 2: Set JAVA_HOME

Step 3: Create Users and Groups

Step 4: Make Data and Log Directories

Step 5: Configure core-site.xml

Step 6: Configure hdfs-site.xml

Step 7: Configure mapred-site.xml

Step 8: Configure yarn-site.xml

Step 9: Modify Java Heap Sizes

Step 10: Format HDFS

Step 11: Start the HDFS Services

Step 12: Start YARN Services

Step 13: Verify the Running Services Using the Web Interface

Run Sample MapReduce Examples

Wrap-up

3 Apache Hadoop YARN Core Concepts

Beyond MapReduce

The MapReduce Paradigm

Apache Hadoop MapReduce

The Need for Non-MapReduce Workloads

Addressing Scalability

Improved Utilization

User Agility

Apache Hadoop YARN

YARN Components

ResourceManager

ApplicationMaster

Resource Model

ResourceRequests and Containers

Container Specification

Wrap-up

4 Functional Overview of YARN Components

Architecture Overview

ResourceManager

YARN Scheduling Components

FIFO Scheduler

Capacity Scheduler

Fair Scheduler

Containers

NodeManager

ApplicationMaster

YARN Resource Model

Client Resource Request

ApplicationMaster Container Allocation

ApplicationMaster–Container Manager Communication

Managing Application Dependencies

LocalResources Definitions

LocalResource Timestamps

LocalResource Types

LocalResource Visibilities

Lifetime of LocalResources

Wrap-up

5 Installing Apache Hadoop YARN

The Basics

System Preparation

Step 1: Install EPEL and pdsh

Step 2: Generate and Distribute ssh Keys

Script-based Installation of Hadoop 2

JDK Options

Step 1: Download and Extract the Scripts

Step 2: Set the Script Variables

Step 3: Provide Node Names

Step 4: Run the Script

Step 5: Verify the Installation

Script-based Uninstall

Configuration File Processing

Configuration File Settings

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

Start-up Scripts

Installing Hadoop with Apache Ambari

Performing an Ambari-based Hadoop Installation

Step 1: Check Requirements

Step 2: Install the Ambari Server

Step 3: Install and Start Ambari Agents

Step 4: Start the Ambari Server

Step 5: Install an HDP2.X Cluster

Wrap-up

6 Apache Hadoop YARN Administration

Script-based Configuration

Monitoring Cluster Health: Nagios

Monitoring Basic Hadoop Services

Monitoring the JVM

Real-time Monitoring: Ganglia

Administration with Ambari

JVM Analysis

Basic YARN Administration

YARN Administrative Tools

Adding and Decommissioning YARN Nodes

Capacity Scheduler Configuration

YARN WebProxy

Using the JobHistoryServer

Refreshing User-to-Groups Mappings

Refreshing Superuser Proxy Groups Mappings

Refreshing ACLs for Administration of ResourceManager

Reloading the Service-level Authorization Policy File

Managing YARN Jobs

Setting Container Memory

Setting Container Cores

Setting MapReduce Properties

User Log Management

Wrap-up

7 Apache Hadoop YARN Architecture Guide

Overview

ResourceManager

Overview of the ResourceManager Components

Client Interaction with the ResourceManager

Application Interaction with the ResourceManager

Interaction of Nodes with the ResourceManager

Core ResourceManager Components

Security-related Components in the ResourceManager

NodeManager

Overview of the NodeManager Components

NodeManager Components

NodeManager Security Components

Important NodeManager Functions

ApplicationMaster

Overview

Liveliness

Resource Requirements

Scheduling

Scheduling Protocol and Locality

Launching Containers

Completed Containers

ApplicationMaster Failures and Recovery

Coordination and Output Commit

Information for Clients

Security

Cleanup on ApplicationMaster Exit

YARN Containers

Container Environment

Communication with the ApplicationMaster

Summary for Application-writers

Wrap-up

8 Capacity Scheduler in YARN

Introduction to the Capacity Scheduler

Elasticity with Multitenancy

Security

Resource Awareness

Granular Scheduling

Locality

Scheduling Policies

Capacity Scheduler Configuration

Queues

Hierarchical Queues

Key Characteristics

Scheduling Among Queues

Defining Hierarchical Queues

Queue Access Control

Capacity Management with Queues

User Limits

Reservations

State of the Queues

Limits on Applications

User Interface

Wrap-up

9 MapReduce with Apache Hadoop YARN

Running Hadoop YARN MapReduce Examples

Listing Available Examples

Running the Pi Example

Using the Web GUI to Monitor Examples

Running the Terasort Test

Run the TestDFSIO Benchmark

MapReduce Compatibility

The MapReduce ApplicationMaster

Enabling Application Master Restarts

Enabling Recovery of Completed Tasks

The JobHistory Server

Calculating the Capacity of a Node

Changes to the Shuffle Service

Running Existing Hadoop Version 1 Applications

Binary Compatibility of org.apache.hadoop.mapred APIs

Source Compatibility of org.apache.hadoop.mapreduce APIs

Compatibility of Command-line Scripts

Compatibility Tradeoff Between MRv1 and Early MRv2 (0.23.x) Applications

Running MapReduce Version 1 Existing Code

Running Apache Pig Scripts on YARN

Running Apache Hive Queries on YARN

Running Apache Oozie Workflows on YARN

Advanced Features

Uber Jobs

Pluggable Shuffle and Sort

Wrap-up

10 Apache Hadoop YARN Application Example

The YARN Client

The ApplicationMaster

Wrap-up

11 Using Apache Hadoop YARN Distributed-Shell

Using the YARN Distributed-Shell

A Simple Example

Using More Containers

Distributed-Shell Examples with Shell Arguments

Internals of the Distributed-Shell

Application Constants

Client

ApplicationMaster

Final Containers

Wrap-up

12 Apache Hadoop YARN Frameworks

Distributed-Shell

Hadoop MapReduce

Apache Tez

Apache Giraph

Hoya: HBase on YARN

Dryad on YARN

Apache Spark

Apache Storm

REEF: Retainable Evaluator Execution Framework

Hamster: Hadoop and MPI on the Same Cluster

Wrap-up

A: Supplemental Content and Code Downloads

Available Downloads

B: YARN Installation Scripts

install-hadoop2.sh

uninstall-hadoop2.sh

hadoop-xml-conf.sh

C: YARN Administration Scripts

configure-hadoop2.sh

D: Nagios Modules

check_resource_manager.sh

check_data_node.sh

check_resource_manager_old_space_pct.sh

E: Resources and Additional Information

F: HDFS Quick Reference

Quick Command Reference

Starting HDFS and the HDFS Web GUI

Get an HDFS Status Report

Perform an FSCK on HDFS

General HDFS Commands

List Files in HDFS

Make a Directory in HDFS

Copy Files to HDFS

Copy Files from HDFS

Copy Files within HDFS

Delete a File within HDFS

Delete a Directory in HDFS

Decommissioning HDFS Nodes

Index


Apache Hadoop™ YARN

The Addison-Wesley Data and Analytics Series

Visit informit.com/awdataseries for a complete list of available publications.

The Addison-Wesley Data and Analytics Series provides readers with practical knowledge for solving problems and answering questions with data. Titles in this series primarily focus on three areas:

1. Infrastructure: how to store, move, and manage data
2. Algorithms: how to mine intelligence or make predictions based on data
3. Visualizations: how to represent data and insights in a meaningful and compelling way

The series aims to tie all three of these areas together to help the reader build end-to-end systems for fighting spam; making recommendations; building personalization; detecting trends, patterns, or problems; and gaining insight from the data exhaust of systems and user interactions.

Make sure to connect with us! informit.com/socialconnect

Apache Hadoop™ YARN

Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Arun C. Murthy
Vinod Kumar Vavilapalli
Doug Eadline
Joseph Niemiec
Jeff Markham

Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.

The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

For information about buying this title in bulk quantities, or for special sales opportunities (which may include electronic versions; custom cover designs; and content particular to your business, training goals, marketing focus, or branding interests), please contact our corporate sales department at corpsales@pearsoned.com or (800) 382-3419.

For government sales inquiries, please contact governmentsales@pearsoned.com.

For questions about sales outside the United States, please contact international@pearsoned.com.

Visit us on the Web: informit.com/aw

Library of Congress Cataloging-in-Publication Data

Murthy, Arun C.
Apache Hadoop YARN : moving beyond MapReduce and batch processing with Apache Hadoop 2 / Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham.
pages cm
Includes index.
ISBN 978-0-321-93450-5 (pbk. : alk. paper)
1. Apache Hadoop. 2. Electronic data processing—Distributed processing. I. Title.
QA76.9.D5M97 2014
004'.36—dc23
2014003391

Copyright © 2014 Hortonworks Inc.

Apache, Apache Hadoop, Hadoop, and the Hadoop elephant logo are trademarks of The Apache Software Foundation. Used with permission. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Hortonworks is a trademark of Hortonworks, Inc., registered in the U.S. and other countries.

All rights reserved.
Printed in the United States of America. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. To obtain permission to use material from this work, please submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River, New Jersey 07458, or you may fax your request to (201) 236-3290.

ISBN-13: 978-0-321-93450-5
ISBN-10: 0-321-93450-4

Text printed in the United States on recycled paper at RR Donnelley in Crawfordsville, Indiana.

First printing, March 2014

