Introduction to Hive Basics (201804).pptx

Published: 2022-06-03; Uploaded by: admin; Category: Manuals; Size: 1.12M; Format: pptx

Text preview

Introduction to Hive Basics
2018.04.19
陈彬 (Chen Bin)

Contents
1 Hadoop & Hive Overview
2 Hive SQL Basics
3 Common Problems and Conventions
4 Hive SQL Optimization

Chapter One: Hadoop & Hive Overview
• Hadoop and Hive
• Software versions of the offline analysis platform
• Hive access clients

1 Hadoop & Hive Overview | Hadoop and Hive

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

• Hadoop HDFS: A distributed file system that provides high-throughput access to application data. (Distributed file system)
• Hadoop YARN: A framework for job scheduling and cluster resource management. (Resource management system)
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (Parallel computing framework)

Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.

A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system. The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks.

How does a job that counts the words in a book become a distributed job? How does a group-and-count job become a distributed job?

(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
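The word-count question above can be sketched in a few lines of plain Python. This is only a single-process illustration of the map -> combine -> reduce flow, not Hadoop code: the function names (`map_phase`, `combine_phase`, `reduce_phase`, `word_count`) and the sample chunks are assumptions made up for this example, and each inner list merely stands in for the input of one map task.

```python
from collections import defaultdict
from itertools import groupby


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine_phase(pairs):
    """Combine: pre-aggregate the pairs produced by a single map task."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_phase(word, counts):
    """Reduce: sum all partial counts that share the same word."""
    return word, sum(counts)


def word_count(chunks):
    """Run map -> combine -> sort -> reduce over independent input chunks."""
    combined = []
    for chunk in chunks:  # each chunk stands in for one map task
        pairs = []
        for line in chunk:
            pairs.extend(map_phase(line))
        combined.extend(combine_phase(pairs))

    # Stand-in for the framework sorting map outputs by key.
    combined.sort(key=lambda kv: kv[0])

    result = {}
    for word, group in groupby(combined, key=lambda kv: kv[0]):
        key, total = reduce_phase(word, (count for _, count in group))
        result[key] = total
    return result


# Hypothetical input: two chunks, as if the book were split in two.
chunks = [
    ["the quick brown fox", "the lazy dog"],
    ["the fox jumps"],
]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped and combined without looking at any other chunk, those steps could run on different machines; only the sort and reduce steps need to bring pairs with the same key together.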

The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.

Hive carries out data processing by translating SQL statements into MapReduce jobs.

Hive is not designed for online transaction processing (OLTP) workloads. It is best used for traditional data warehousing tasks, that is, high-I/O batch runs.
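To make the two ideas above more concrete (projecting structure onto data already in storage, and turning SQL into map and reduce steps), here is a small Python sketch. It is a simplified illustration under stated assumptions, not how Hive's planner actually works: the `orders` table, its columns, the `SCHEMA` list and all function names are hypothetical.

```python
from collections import defaultdict

# Raw delimited lines as they might already sit in distributed storage.
RAW_LINES = [
    "1001,books,12.50",
    "1002,games,30.00",
    "1003,books,7.25",
    "1004,music,9.99",
]

# Hypothetical schema, applied only when the data is read.
SCHEMA = [("order_id", int), ("category", str), ("amount", float)]


def project(line):
    """Apply the schema to one raw line, yielding a typed row."""
    fields = line.split(",")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


def map_row(row):
    """Map side of GROUP BY: emit (grouping key, value to aggregate)."""
    return row["category"], row["amount"]


def shuffle(pairs):
    """Bring all values with the same key together, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce side: compute COUNT(*) and SUM(amount) for one group."""
    return key, len(values), round(sum(values), 2)


def run_query(lines):
    """Roughly: SELECT category, COUNT(*), SUM(amount)
    FROM orders GROUP BY category"""
    pairs = [map_row(project(line)) for line in lines]
    return sorted(reduce_group(k, v) for k, v in shuffle(pairs).items())


rows = run_query(RAW_LINES)
for row in rows:
    print(row)
```

The stored lines never change; only the schema used at read time decides how they are interpreted. The GROUP BY column becomes the map output key, and the aggregate functions become the reduce step.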

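Usage scenarios: data warehousing and analytical processing?

Scenarios: real-time query; online transaction processing (OLTP); large-volume batch processing; ad hoc query; multidimensional data analysis; data mining

Components: RDBMS, NoSQL, Hive, Kylin, Spark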

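1 Hadoop & Hive Overview | Software versions of the offline analysis platform

The offline analysis platform is built on Huawei FusionInsight. There are currently two clusters, Shanghai (OFA) and Shenzhen (OFB): the Shanghai cluster has 65 nodes and the Shenzhen cluster has 105 nodes. The software versions are as follows:

Software        OFB version    OFA version    Description
RHEL            7.2            6.4            Operating system
FusionInsight   C70            C60            Huawei release
Hadoop          2.7.2          2.7.2          Distributed platform
Hive            1.2.1          1.2.1          SQL data warehouse tool
Spark           2.1.0/1.5.1    1.5.1          Distributed processing framework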
