Building a Scalable Data Warehouse with Data Vault 2.0.pdf

Building a Scalable Data Warehouse with Data Vault 2.0

Daniel Linstedt
Michael Olschimke

AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an Imprint of Elsevier

Publisher: Todd Green
Editorial Project Manager: Amy Invernizzi
Project Manager: Paul Prasad Chandramohan
Designer: Matthew Limbert

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

Copyright © 2016 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-802510-9

For information on all Morgan Kaufmann publications visit our website at www.mkp.com/

Author Biographies

DANIEL LINSTEDT

Daniel has more than 25 years of experience in the Data Warehousing and Business Intelligence field and is internationally known for inventing the Data Vault 1.0 model and the Data Vault 2.0 System of Business Intelligence. He helps business and government organizations around the world achieve BI excellence by applying his proven knowledge in Big Data, unstructured information management, agile methodologies, and product development. He has held training classes and presented at TDWI, Teradata Partners, DAMA, Informatica, Oracle user groups, and the Data Modeling Zone conference. He has a background in SEI/CMMI Level 5, has contributed architecture efforts to petabyte-scale data warehouses, and offers high-quality online training and consulting services for Data Vault.

MICHAEL OLSCHIMKE

Michael has more than 15 years of experience in IT and has been working on business intelligence topics for the past eight years. He has consulted for a number of clients in the automotive industry, the insurance industry, and nonprofits. In addition, he has consulted for government organizations in Germany on business intelligence topics. Michael is responsible for the Data Vault training program at Dörffler + Partner GmbH, a German consulting firm specializing in data warehousing and business intelligence. He is also a lecturer at the University of Applied Sciences and Arts in Hannover, Germany. In addition, he maintains DataVault.guru, a community site on Data Vault topics.

Foreword

I first met Daniel Linstedt during a speech at Lockheed Martin in the early 1990s. At the time, he was an employee of the company, working on government projects. He approached me because he wanted my opinion about a concept that he had invented at the Department of Defense in order to store large amounts of data. Back then, the term Big Data had not yet been coined. But from what Daniel explained to me, a concept for dealing with such huge amounts of data had been born.

Back then, the end user had cried, "Give me my data!" But over time the end user became more sophisticated. The end user learned that it was not enough to get one's data. What a person needed was the RIGHT data. And then the sophisticated end user cried, "Give me my accurate and correct data!"

The data warehouse represented the architectural solution to the issue of needing a single version of the truth. The primary reason for the existence of the data warehouse was the corporate need for integrity and believability of data. As such, the data warehouse became the major architectural evolutionary leap beyond the early application systems.

But the data warehouse was not the end of architecture. Indeed, the data warehouse was only one stepping stone, architecturally speaking, in the evolution of architecture. It was Daniel's idea that followed the data warehouse. In many ways the data warehouse set the stage for him.

Daniel used the term "common foundational modeling architecture" to describe a model based on three simple entities, focusing on business keys, their relationships, and descriptive information for both. By doing so, the model closely followed the way the business was using the data in the source systems. It allowed all kinds of data to be sourced, regardless of structure, in a fully auditable manner. This was a core requirement of government agencies at the time. And due to Enron, a host of other corporate failures, and Basel and SOX compliance, auditability was pushed to the forefront of the industry. Not only that, the model was able to evolve with changing data structures. It was also easy to extend by adding more and more source systems. Daniel later called it the "Data Vault Model," and it was groundbreaking. The data vault became the next architectural extension of the data warehouse.

But the data vault concept, like all evolutions, continued to evolve. He asked me what to do about it and, as a professional author, I gave him the advice to "publish the heck out of it." But Daniel decided to take the long view. Over multiple years, he improved the Data Vault and evolved it into Data Vault 2.0. Today, this System of Business Intelligence includes not only a more sophisticated model, but also an agile methodology, a reference architecture for enterprise data warehouse systems, and best practices for implementation.

The Data Vault 2.0 System of Business Intelligence is groundbreaking, again. It incorporates concepts from massively parallel architectures, Big Data, real-time, and unstructured data. And after all this time, I'm glad that he followed my advice and has started to publish more on the topic.

This book represents the latest, most current step in the larger evolution of the Data Vault. It has been carefully and thoughtfully prepared by leaders in the thought and implementation of the Data Vault.

Bill Inmon
June 29, 2015

Preface

When I was asked by the Department of Defense to build a scalable data warehouse, I was confronted with a problem. Back then, before the term Big Data was coined, there was no approach for building such systems: systems that could accommodate large data sets, delivered at high frequencies, and in multiple structures.

I started intensive research to come up with a viable solution for this challenge. The analysis was based on patterns from nature, because I expected that a partial solution would already exist somewhere. Over more than 10 years, from 1990 to early 2000, I tested the applicability of these natural patterns in data warehousing. By doing so, I reduced the initial list of 50 potential entities down to three. These remaining entity types were based on a hub-and-spoke architecture that scaled well and was easy to extend. This model is known today as Data Vault modeling. The three entities are: hubs, which provide a unique list of business keys from business processes; links, which integrate the business keys within and across source system boundaries; and satellites, which provide descriptive data.

This model enabled my clients to build the most sophisticated systems and complete their assigned tasks. When I left government work, the system was storing and processing more than 15 petabytes of data, and it is still growing today.

However, over the years, Data Vault modeling evolved. It became one of the pillars of the Data Vault 2.0 Standard. The Data Vault 2.0 Architecture and the Data Vault 2.0 Methodology are the other pillars, in conjunction with the Data Vault 2.0 Implementation best practices. Without these other pillars, a Data Vault 2.0 model is just a model. Together, the pillars provide a set of best practices, standards, and techniques that organizations rely on to build scalable data warehouse systems using agile practices.
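The three entity types described above can be illustrated with a small in-memory sketch. It is not the book's implementation: the Python classes, the table names (`hub_customer`, `hub_product`, `link_customer_product`, `sat_customer`), the sample rows, and the use of MD5 hash keys over trimmed, upper-cased business keys are all assumptions made for illustration. The hubs keep each business key once, the link records which business keys occur together in a source record, and the satellite stores descriptive attributes, adding a row only when they change.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    # Assumption: MD5 over trimmed, upper-cased business keys joined by ';'.
    normalized = ";".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


@dataclass
class Hub:
    """Unique list of business keys, indexed by their hash key."""
    name: str
    rows: dict = field(default_factory=dict)

    def load(self, business_key: str) -> str:
        key = hash_key(business_key)
        self.rows.setdefault(key, business_key)  # insert only unseen keys
        return key


@dataclass
class Link:
    """Relationships between business keys, stored as tuples of hub hash keys."""
    name: str
    rows: set = field(default_factory=set)

    def load(self, *hub_keys: str) -> None:
        self.rows.add(hub_keys)  # a set ignores relationships already loaded


@dataclass
class Satellite:
    """Descriptive attributes of a hub entry, historized by load date."""
    name: str
    rows: list = field(default_factory=list)

    def load(self, parent_key: str, attributes: dict, load_date: datetime) -> None:
        history = [r for r in self.rows if r["parent_key"] == parent_key]
        if history and history[-1]["attributes"] == attributes:
            return  # nothing changed since the latest row, so skip it
        self.rows.append(
            {"parent_key": parent_key, "load_date": load_date, "attributes": attributes}
        )


# Hypothetical source records; "c-100 " is the same business key as "C-100".
source_rows = [
    {"customer_no": "C-100", "product_no": "P-1", "name": "Alice"},
    {"customer_no": "c-100 ", "product_no": "P-2", "name": "Alice"},
    {"customer_no": "C-200", "product_no": "P-1", "name": "Bob"},
]

customers = Hub("hub_customer")
products = Hub("hub_product")
purchases = Link("link_customer_product")
customer_details = Satellite("sat_customer")
load_date = datetime(2015, 6, 29, tzinfo=timezone.utc)

for row in source_rows:
    customer_key = customers.load(row["customer_no"])
    product_key = products.load(row["product_no"])
    purchases.load(customer_key, product_key)
    customer_details.load(customer_key, {"name": row["name"]}, load_date)

print(len(customers.rows), len(products.rows), len(purchases.rows), len(customer_details.rows))
# prints: 2 2 3 2
```

In this sketch, loading the same records a second time changes nothing, because each `load` method only inserts rows it has not seen before. The satellite compares attribute dictionaries directly to keep the example readable.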
Data Vault 2.0 enables data warehouse teams around the world to exploit the Data Vault as a system of business intelligence. This is what I teach: how to take advantage of the Data Vault 2.0 Standard in rapid, small steps. And it is what this book is all about.

Daniel Linstedt
Inventor of Data Vault modeling and the Data Vault 2.0 System of Business Intelligence
St. Albans, Vermont, USA

This book is the result of my own evolution regarding the Data Vault. When I first heard of the concept in 2011 or 2012 from Oliver Cramer, I remained very skeptical. This was because, at that time, Data Vault was seen primarily as a model. It was different, but the model by itself was not enough to convince me of its value. But Christian Haedrich, CEO of Dörffler, wanted to find out what was behind Data Vault and decided in 2013 to attend a training class in Vermont with its inventor, Daniel Linstedt.

To be honest, my first thought was: "What a waste of time." I was not very happy to board a plane for six or more hours, head over to Vermont, sit in a training class for four days, and spend another six hours on the return trip. And because I hate to waste time, I decided to take advantage of it. My goal became not to waste my time during the flight or in Vermont. Instead, I wanted to seriously understand what the Data Vault is, but certainly not in order to use it in business. Rather, I wanted to rule it out with confidence. That's not a lot of value, honestly, but at least it removes the uncertainty that you might be missing some great technology because you don't understand it.

That was the plan, and I failed miserably at it. In fact, Daniel convinced me that the Data Vault was the technology you don't want to miss if you're building data warehouse solutions. Most people in the industry are unaware that he has further developed the concept and integrated best practices for implementation and methodology, as well as a reference architecture. These were the pieces that were missing for me. They explained to me why the model is the way it is, along with all the background information that describes why some designs are fundamentally different in Data Vault.

Since then, I have asked Daniel many questions, because I wanted to fully understand the Data Vault, the concepts behind it, and the intentions behind his design decisions. Our discussions back then started a work relationship and learning experience that I have truly enjoyed. This book is the outcome of this time spent.

I might have failed when I tried to rule out Data Vault as a viable solution for business intelligence projects. But I always try to make mistakes only once in life. I'm glad that I changed my mind. Since that time, the Data Vault has become part of my daily work and my success in the industry. My personal wish is that this book becomes part of your success, too.

The file names of the source code files are listed on the companion site; please refer to the site for more details: http://booksite.elsevier.com/9780128025109

Michael Olschimke
Hannover, Germany

Acknowledgments

DANIEL LINSTEDT

I would like to acknowledge my wife and family for granting me the support and love I needed to finish this book. I would also like to acknowledge my co-author Michael Olschimke for working extremely hard at trying to understand my writing, and for spending countless hours on Skype calls with me to discuss my ideas. Furthermore, I would like to personally thank Scott Ambler for all his contributions over time (especially to my last book); many of these ideas have made it into the foundations of Disciplined Agile Delivery embedded in the Data Vault 2.0 methodology. I am also pleased to thank Bill Inmon (the father of the data warehouse) not only for writing the foreword but also for creating the industry I earn a living in. Without the "Data Warehouse," I would not have been able to create the Data Vault 2.0 System of Business Intelligence. In addition, I would like to thank Roelant Vos for kick-starting the Australian Data Vault market, as well as my partners, Doerffler & Partner and Analytics8, who assist me with training in the Data Vault 2.0 space. I would also like to thank AnalytixDS for their brilliant work on automating Data Vault 2.0 templates through their incredible product, Mapping Manager. Without their assistance, we could not generate much of the work that goes into Data Vault 2.0 systems worldwide.

In addition, there are some customers I would like to thank for trying out the Data Vault 2.0 ideas as I refined them over the past several years. These include Commonwealth Bank in Australia, QSuper in Australia, Intact Financial in Canada, and Microsoft, not only for creating the wonderful technology we have applied in this book, but also for utilizing Data Vault Modeling in-house for their own solutions.

MICHAEL OLSCHIMKE

My acknowledgments go to Dörffler + Partner, who financed my contributions to this book and gave me a safe harbor to be able to focus on writing.
This certainly includes the management team around Werner Dörffler, Christian Hädrich, and Siegfried Heger, but also the current and former employees of the firm, especially Timo Cirkel, Dominik Kroner, and Jens Lehmann. I would also like to thank our customers, especially the team of Gabriela Goldner at SwissLife and the team of Marcus Jacob at DEVK, for giving me some valuable opportunities and feedback.

Furthermore, I'd like to thank all those who have helped me become what I am today. This includes my parents, Barbara and Paul Olschimke, for obvious reasons; Udo Bornschier, who encouraged me to pursue an academic career; Prof. Cornelius Wille (Bingen), who promoted my scientific interest and encouraged me to continue my academic career; Dr. Betty Robbins (OU), who taught me how to write, with the help of large amounts of red ink, which I deserved; Dr. Albert Schwarzkopf (OU), who helped me discover my interest in data warehousing; Udo Apel, who supervised my bachelor's thesis at Borland and gave me some valuable advice when I started my graduate studies at Santa Clara University; Prof. Manoochehr Ghiassi (SCU), who taught me how to organize a research team, among other valuable things (such as data mining and the value of taking notes); Oliver Cramer, who discovered the Data Vault for me; and Daniel Linstedt, for explaining it to me. The faculty at Santa Clara University deserves credit for helping me to understand the value of the Data Vault and see the glory in service to others.

But the most life-changing person, and the one who enabled me to make my contribution to this book, is Christina Woitzik, my partner for the last ten years. We strayed through darkness and went all the way through hell. But in the early light of dawn, our love is still there. By the time this book is published, she should be my lovely wife.
