Hadoop Explained
An introduction to the most popular Big Data platform in the world
Aravind Shenoy
BIRMINGHAM - MUMBAI
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, without the prior written permission of the publisher,
except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the
information presented. However, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers
and distributors will be held liable for any damages caused or alleged to be caused directly or
indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies
and products mentioned in this book by the appropriate use of capitals. However, Packt
Publishing cannot guarantee the accuracy of this information.
First Published: June 2014
Production reference: 1100614
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
www.packtpub.com
Cover Image by Feroze Babu (mail@feroze.me)
Credits
Author: Aravind Shenoy
Reviewer: Mark Kerzner
Proofreader: Paul Hindle
Graphics: Sheetal Aute
Commissioning Editor: Edward Gordon
Production Coordinator: Adonia Jones
Content Development Editor: Madhuja Chaudhari
Cover Work: Adonia Jones
Technical Editors: Pankaj Kadam, Veena Pagare
About the Author
Aravind Shenoy is an in-house author at Packt Publishing. He is an Engineering graduate
from the Manipal Institute of Technology, and his core interests include technical writing, web
design, and software testing. He is a native of Mumbai, India, and currently resides there.
He has authored books such as Thinking in JavaScript and Thinking in CSS. He has also
authored the bestselling book HTML5 and CSS3 Transition, Transformation, and Animation,
Packt Publishing (http://www.packtpub.com/html5-and-css3-for-transition-transformation-animation/book).
He is a music buff, with The Doors, Oasis, and R.E.M. ruling his playlists.
Overview
With the almost unfathomable increase in web traffic over recent years, driven by millions
of connected users, businesses are gaining access to massive amounts of complex,
unstructured data from which to gain insight.
When Hadoop emerged in the mid-2000s, with Yahoo as its principal early backer, it brought
with it a paradigm shift in how this data was stored and analyzed. Hadoop allowed small- and
medium-sized companies to store huge amounts of data on cheap commodity servers in racks.
This data could then be processed and used to make business decisions backed by ‘Big Data’.
Hadoop is now used by major organizations such as Amazon, IBM, Cloudera, and Dell, to
name a few. This book introduces you to Hadoop and to concepts such as MapReduce, Rack
Awareness, YARN, and HDFS Federation, which will help you get acquainted with the
technology.
Understanding Big Data
Many organizations used to run their data processing on structured databases. The data
was limited and maintained in a systematic manner using database management systems
(think RDBMS). When Google developed its search engine, it was faced with the task of
maintaining a large amount of data, as web pages were updated regularly. A lot of
information had to be stored, and most of it was in an unstructured format. Video, audio,
and web logs produced an enormous amount of data. The Google search engine is an
amazing tool that lets users find the information they need with ease. Analyzing this data
can reveal user preferences, which can be used to grow the customer base as well as
improve customer satisfaction. The targeted advertisements found on Google are one example.
Facebook is widely used today, and its users upload a lot of content in the form of videos and
other posts, so large volumes of data have to be managed quickly. With the advent of social
networking applications, the amount of data to be stored increased day by day, and so did the
rate of that increase. Financial institutions are another example. They target customers using
the data at their disposal, which reveals trends in the market, and they can also determine
user preferences from transaction histories. Online portals, likewise, track the purchase history
of their customers; based on the patterns they find, they gauge customer needs and customize
the portals according to market trends. Data is therefore crucial, and its rapid growth is of
significant importance. This is how the concept of Big Data came into existence.