Hadoop Explained.pdf

Published: 2022-05-30 | Uploaded by: admin | Category: Manuals | File size: 0.75M | File format: pdf

Hadoop Explained

An introduction to the most popular Big Data platform in the world

Aravind Shenoy

BIRMINGHAM - MUMBAI

Hadoop Explained

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First Published: June 2014

Production reference: 1100614

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

www.packtpub.com

Cover Image by Feroze Babu (mail@feroze.me)

Credits

Author: Aravind Shenoy
Reviewer: Mark Kerzner
Commissioning Editor: Edward Gordon
Content Development Editor: Madhuja Chaudhari
Technical Editors: Pankaj Kadam, Veena Pagare
Proofreader: Paul Hindle
Graphics: Sheetal Aute
Production Coordinator: Adonia Jones
Cover Work: Adonia Jones

About the Author

Aravind Shenoy is an in-house author at Packt Publishing. An Engineering graduate from the Manipal Institute of Technology, his core interests include technical writing, web designing, and software testing. He is a native of Mumbai, India, and currently resides there. He has authored books such as Thinking in JavaScript and Thinking in CSS. He has also authored the bestselling book HTML5 and CSS3 Transition, Transformation, and Animation, Packt Publishing (http://www.packtpub.com/html5-and-css3-for-transition-transformation-animation/book). He is a music buff with The Doors, Oasis, and R.E.M. ruling his playlists.

Overview

With the almost unfathomable increase in web traffic over recent years, driven by millions of connected users, businesses are gaining access to massive amounts of complex, unstructured data from which to gain insight. When Hadoop was introduced by Yahoo in 2007, it brought with it a paradigm shift in how this data was stored and analyzed. Hadoop allowed small- and medium-sized companies to store huge amounts of data on cheap commodity servers in racks. This data could thus be processed and used to make business decisions that were supported by "Big Data". Hadoop is now implemented in major organizations such as Amazon, IBM, Cloudera, and Dell, to name a few.

This book introduces you to Hadoop and to concepts such as MapReduce, Rack Awareness, YARN, and HDFS Federation, which will help you get acquainted with the technology.

Hadoop Explained

Understanding Big Data

Many organizations used to run their data processing on structured databases. The data was limited in volume and maintained systematically using database management systems (think RDBMS). When Google developed its search engine, it was confronted with the task of maintaining a large amount of data, as web pages were updated on a regular basis. A lot of information had to be stored, and most of it was in an unstructured format. Video, audio, and web logs added up to a huge amount of data.

The Google search engine lets users find the information they need with ease. Research on data can reveal user preferences, which can be used to increase both the customer base and customer satisfaction. An example of that would be the advertisements found on Google.

Facebook is widely used today, and its users upload a lot of content in the form of videos and other posts, so a lot of data has to be managed quickly. With the advent of social networking applications, the amount of data to be stored increased day by day, and so did the rate of increase.

Another example is financial institutions. They target customers using the data at their disposal, which shows trends in the market, and they can determine user preferences from transaction histories. Online portals likewise track customers' purchase history and, based on the patterns they find, gauge customer needs and customize the portals according to market trends.

Hence, data is crucial, and its rapid growth is of significant importance. This is how the concept of Big Data came into existence.
