
Cloudera Introduction Official Documentation.pdf

Table of Contents
About Cloudera Introduction
Documentation Overview
CDH Overview
Impala Overview
Impala Benefits
How Impala Works with CDH
Primary Impala Features
Cloudera Search Overview
How Cloudera Search Works
Understanding Cloudera Search
Cloudera Search and Other Cloudera Components
Cloudera Search Architecture
Cloudera Search Configuration Files
Cloudera Search Tasks and Processes
Ingestion
Indexing
Querying
Apache Sentry Overview
Apache Spark Overview
Cloudera Manager 5 Overview
Terminology
Architecture
State Management
Configuration Management
Process Management
Software Distribution Management
Host Management
Resource Management
User Management
Security Management
Cloudera Management Service
Cloudera Manager Admin Console
Starting and Logging into the Admin Console
Cloudera Manager Admin Console Home Page
Starting and Logging into the Cloudera Manager Admin Console
Displaying Cloudera Manager Documentation
Displaying the Cloudera Manager Server Version and Server Time
Cloudera Manager API
Backing Up and Restoring the Cloudera Manager Configuration
Using the Cloudera Manager Java API for Cluster Automation
Cluster Automation Use Cases
Java API Examples
Extending Cloudera Manager
Cloudera Navigator 2 Overview
Cloudera Navigator Data Management Overview
Cloudera Navigator Data Management UI
Cloudera Navigator Data Management API
Accessing API Documentation
Capturing and Downloading API Calls
Displaying Cloudera Navigator Data Management Documentation
Displaying the Cloudera Navigator Data Management Component Version
Cloudera Navigator Data Encryption Overview
Cloudera Navigator Encryption Architecture
Cloudera Navigator Encryption Integration with an EDH
Cloudera Navigator Key Trustee Server Overview
Key Trustee Server Architecture
Cloudera Navigator Key HSM Overview
Key HSM Architecture
Cloudera Navigator Encrypt Overview
Process-Based Access Control List
Encryption Key Storage and Management
Frequently Asked Questions About Cloudera Software
Cloudera Express and Cloudera Enterprise Features
Cloudera Manager 5 Frequently Asked Questions
General Questions
Cloudera Navigator 2 Frequently Asked Questions
Impala Frequently Asked Questions
Trying Impala
Impala System Requirements
Supported and Unsupported Functionality In Impala
How do I?
Impala Performance
Impala Use Cases
Questions about Impala And Hive
Impala Availability
Impala Internals
SQL
Partitioned Tables
HBase
Cloudera Search Frequently Asked Questions
General
What is Cloudera Search?
What is the difference between Lucene and Solr?
What is Apache Tika?
How does Cloudera Search relate to web search?
How does Cloudera Search relate to enterprise search?
How does Cloudera Search relate to custom search applications?
Do Search security features use Kerberos?
Do I need to configure Sentry restrictions for each access mode, such as for the admin console and for the command line?
Does Search support indexing data stored in JSON files and objects?
How can I set up Cloudera Search so that results include links back to the source that contains the result?
Why do I get an error "no field name specified in query and no default specified via 'df' param" when I query a Schemaless collection?
Performance and Fail Over
How large of an index does Cloudera Search support per search server?
What is the response time latency I can expect?
What happens when a write to the Lucene indexer fails?
What hardware or configuration changes can I make to improve Search performance?
Are there settings that can help avoid out of memory (OOM) errors during data ingestion?
Schema Management
If my schema changes, will I need to re-index all of my data and files?
Can I extract fields based on regular expressions or rules?
Can I use nested schemas?
What is Apache Avro and how can I use an Avro schema for more flexible schema evolution?
Supportability
Does Cloudera Search support multiple languages?
Which file formats does Cloudera Search support for indexing? Does it support searching images?
Getting Support
Cloudera Support
Information Required for Logging a Support Case
Community Support
Get Announcements about New Releases
Report Issues
Cloudera Introduction
Important Notice

© 2010-2016 Cloudera, Inc. All rights reserved.

Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.

Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.

Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. For information about patents covering Cloudera products, see http://tiny.cloudera.com/patents.

The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.

Cloudera, Inc.
1001 Page Mill Road, Bldg 3
Palo Alto, CA 94304
info@cloudera.com
US: 1-888-789-1488
Intl: 1-650-362-0488
www.cloudera.com

Release Information
Version: 5.5.x
Date: May 16, 2016
About Cloudera Introduction

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.

Cloudera provides the following products and tools:

• CDH: The Cloudera distribution of Apache Hadoop and other related open-source projects, including Cloudera Impala and Cloudera Search. CDH also provides security and integration with numerous hardware and software solutions.
  – Cloudera Impala: A massively parallel processing SQL engine for interactive analytics and business intelligence. Its highly optimized architecture makes it ideally suited for traditional BI-style queries with joins, aggregations, and subqueries. It can query Hadoop data files from a variety of sources, including those produced by MapReduce jobs or loaded into Hive tables. The YARN resource management component lets Impala coexist on clusters running batch workloads concurrently with Impala SQL queries. You can manage Impala alongside other Hadoop components through the Cloudera Manager user interface, and secure its data through the Sentry authorization framework.
  – Cloudera Search: Provides near real-time access to data stored in or ingested into Hadoop and HBase. Search provides near real-time indexing, batch indexing, full-text exploration and navigated drill-down, as well as a simple, full-text interface that requires no SQL or programming skills. Fully integrated in the data-processing platform, Search uses the flexible, scalable, and robust storage system included with CDH. This eliminates the need to move large data sets across infrastructures to perform business tasks.
• Cloudera Manager: A sophisticated application used to deploy, manage, monitor, and diagnose issues with your CDH deployments. Cloudera Manager provides the Admin Console, a web-based user interface that makes administration of your enterprise data simple and straightforward. It also includes the Cloudera Manager API, which you can use to obtain cluster health information and metrics, as well as configure Cloudera Manager.
• Cloudera Navigator: An end-to-end data management and security tool for the CDH platform. Cloudera Navigator enables administrators, data managers, and analysts to explore the large amounts of data in Hadoop, and simplifies the storage and management of encryption keys. The robust auditing, data management, lineage management, lifecycle management, and encryption key management in Cloudera Navigator allow enterprises to adhere to stringent compliance and regulatory requirements.

This introductory guide provides a general overview of CDH, Cloudera Manager, and Cloudera Navigator. This guide also includes frequently asked questions about Cloudera products and describes how to get support, report issues, and receive information about updates and new releases.

Documentation Overview

The following guides are included in the Cloudera documentation set:

Guide | Description
Overview of Cloudera and the Cloudera Documentation Set | Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.
Cloudera Release Guide | This guide contains release and download information for installers and administrators. It includes release notes as well as information about versions and downloads. The guide also provides a release matrix that shows which major and minor release version of a product is supported with which release version of Cloudera Manager, CDH and, if applicable, Cloudera Impala.
Cloudera QuickStart | This guide describes how to quickly install Cloudera software and create initial deployments for proof of concept (POC) or development. It describes how to download and use the QuickStart virtual machines, which provide everything you need to start a basic installation. It also shows you how to create a new installation of Cloudera Manager 5, CDH 5, and managed services on a cluster of four hosts. Quick start installations should be used for demonstrations and POC applications only and are not recommended for production.
Cloudera Installation and Upgrade | This guide provides Cloudera software requirements and installation information for production deployments, as well as upgrade procedures. This guide also provides specific port information for Cloudera software.
Cloudera Administration | This guide describes how to configure and administer a Cloudera deployment. Administrators manage resources, availability, and backup and recovery configurations. In addition, this guide shows how to implement high availability, and discusses integration.
Cloudera Data Management | This guide describes how to perform data management using Cloudera Navigator. Data management activities include auditing access to data residing in HDFS and Hive metastores, reviewing and updating metadata, and discovering the lineage of data objects.
Cloudera Operation | This guide shows how to monitor the health of a Cloudera deployment and diagnose issues. You can obtain metrics and usage information and view processing activities. This guide also describes how to examine logs and reports to troubleshoot issues with cluster configuration and operation as well as monitor compliance.
Cloudera Security | This guide is intended for system administrators who want to secure a cluster using data encryption, user authentication, and authorization techniques. This topic also provides information about Hadoop security programs and shows you how to set up a gateway to restrict access.
Impala Guide | This guide describes Impala, its features and benefits, and how it works with CDH. This topic introduces Impala concepts, describes how to plan your Impala deployment, and provides tutorials for first-time users as well as more advanced tutorials that describe scenarios and specialized features. You will also find a language reference, performance tuning, instructions for using the Impala shell, troubleshooting information, and frequently asked questions.
Cloudera Search Guide | This guide explains how to configure and use Cloudera Search. This includes topics such as extracting, transforming, and loading data, establishing high availability, and troubleshooting.
Spark Guide | This guide describes Apache Spark, a general framework for distributed computing that offers high performance for both batch and interactive processing. The guide provides tutorial Spark applications, how to develop and run Spark applications, and how to use Spark with other Hadoop components.
Cloudera Glossary | This guide contains a glossary of terms for Cloudera components.
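As a rough illustration of the Cloudera Manager API mentioned in this introduction, the sketch below builds a REST URL for a resource and pulls cluster names out of a JSON list response. It is a minimal sketch under stated assumptions, not the documented client: the host name cm.example.com, the port 7180, the API version string "v10", the helper names api_url and cluster_names, and the sample response body are all placeholders chosen for illustration. Check the API documentation for your Cloudera Manager release before relying on any of them.

```python
import json


def api_url(host, resource, port=7180, version="v10"):
    """Build the URL of a Cloudera Manager REST API resource.

    The port and version defaults are assumptions for illustration;
    use the values that apply to your own deployment.
    """
    return f"http://{host}:{port}/api/{version}/{resource.lstrip('/')}"


def cluster_names(payload):
    """Return the cluster names from a JSON body shaped like {"items": [...]}."""
    return [item["name"] for item in json.loads(payload).get("items", [])]


# Hypothetical response body, hand-written for this example
# (not captured from a real server).
sample = '{"items": [{"name": "Cluster 1"}, {"name": "Cluster 2"}]}'

print(api_url("cm.example.com", "clusters"))
print(cluster_names(sample))
```

The sketch deliberately makes no network call. In practice the URL would be requested with an HTTP client and valid Cloudera Manager credentials, and the response body passed to cluster_names.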