Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop
201602
Introduction
Chapter 1
Course Chapters
§ Introduction
§ Hadoop Fundamentals
§ Introduction to Pig
§ Basic Data Analysis with Pig
§ Processing Complex Data with Pig
§ Multi-Dataset Operations with Pig
§ Pig Troubleshooting and Optimization
§ Introduction to Impala and Hive
§ Querying with Impala and Hive
§ Impala and Hive Data Management
§ Data Storage and Performance
§ Relational Data Analysis with Impala and Hive
§ Working with Impala
§ Analyzing Text and Complex Data with Hive
§ Hive Optimization
§ Extending Hive
§ Choosing the Best Tool for the Job
§ Conclusion
© Copyright 2010-2016 Cloudera. All rights reserved. Not to be reproduced or shared without prior written consent from Cloudera.
Chapter Topics
Introduction
§ About this Course
§ About Cloudera
§ Course Logistics
§ Introductions
Course Objectives (1)
During this course, you will learn
§ The purpose of Hadoop and its related tools
§ The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
§ How to identify typical use cases for large-scale data analysis
§ How to load data from relational databases and other sources
§ How to manage data in HDFS and export it for use with other systems
§ How Pig, Hive, and Impala improve productivity for typical analysis tasks
§ The language syntax and data formats supported by these tools
Course Objectives (2)
§ How to design and execute queries on data stored in HDFS
§ How to join diverse datasets to gain valuable business insight
§ How Hive and Impala can be extended with custom functions and scripts
§ How to analyze structured, semi-structured, and unstructured data
§ How to store and query data for better performance
§ How to determine which tool is the best choice for a given task
About Cloudera (1)
§ The leader in Apache Hadoop-based software and services
§ Founded by Hadoop experts from Facebook, Yahoo, Google, and Oracle
§ Provides support, consulting, training, and certification for Hadoop users
§ Staff includes committers to virtually all Hadoop projects
§ Many authors of industry-standard books on Apache Hadoop projects