
Distributed Machine Learning in Practice with Spark and the Cloudera Data Science Workbench.pdf

Enterprise Applications of Data Science and Their Automation
© Cloudera, Inc. All rights reserved.

Data science is transforming traditional industries: Connected Car, Smart Industry, Smart Cities & Ports, Environment Sensing, Usage-Based Insurance, Predictive Maintenance, Aerospace & Aviation, Smart Healthcare.

Advanced Analytics & Machine Learning

Data Science: It’s All About Curiosity & Passion
Jeff Hammerbacher, Cloudera co-founder and chief scientist

The Data Science Hierarchy of Needs

Exploratory Data Science
Required Capabilities:
● Unified Platform
○ Supports more workloads on a single platform, in an integrated data science and engineering environment.
● Performance
○ Developers and data scientists have access to high-scale, high-performance query engines for distributed analytics.
● Ease of Use
○ Built for data scientists, not only for hackers.
“Cloudera’s approach was more aligned with our own philosophies, such as building simpler, more prescriptive libraries that broaden the audience for the platform.”
“Cloudera Enterprise expedites round-trips to access and compute data for data discovery, translating into significant reductions in R&D time. This will have a very meaningful scientific upside.”

Machine Learning for Enterprise
Required Capabilities:
● Expanded Data Access
○ Expand your empirical data
● Test and Train Faster
○ Iterative development
● Use Familiar Tools
○ Increase developer productivity with familiar APIs
● Integrated Batch and Streaming
○ Unified batch and streaming programming model
“Cloudera, using complex machine learning algorithms, analyzes large amounts of data in real time and allows personalization of game interaction with players through recommendations.”
“Machine learning and big data are like a marriage made in heaven. I mean it works really well with some other tooling that's already there on the stack. It used to take time. It was cumbersome.”
“CDSW + Spark is a perfect combination for machine learning on Hadoop. It simplified our work and saved costs in our data business.”

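The Development of Data Science
A large number of open source machine learning frameworks
- Data science teams want to use the latest open source tools
- IT and the business don’t know how to support them with existing analytics investments
The Hadoop platform
- Avoids data science “silos”
- Addresses data supply and visual data processing
- Data governance is an essential foundation for data science
- Data science needs large amounts of compute, so distributed resource scheduling, data lakes, private cloud, and multi-tenancy become inevitable
Organizational structure for data science
- Self-service: data services, analytics services, modeling services
- Personalized data environments
- Business and technology departments are of equal value and need to collaborate closely
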