Practical Machine Learning_A New Look at Anomaly Detection-O'Rei....pdf

Copyright
Table of Contents
Chapter 1. Looking Toward the Future
Chapter 2. The Shape of Anomaly Detection
Finding “Normal”
If you enjoy math, read this description of a probabilistic model of “normal”…
Human Insight Helps
Finding Anomalies
Once again, if you like math, this description of anomalies is for you…
Take-Home Lesson: Key Steps in Anomaly Detection
A Simple Approach: Threshold Models
Chapter 3. Using t-Digest for Threshold Automation
The Philosophy Behind Setting the Threshold
Using t-Digest for Accurate Calculation of Extreme Quantiles
Issues with Simple Thresholds
Chapter 4. More Complex, Adaptive Models
Windows and Clusters
Matches with the Windowed Reconstruction: Normal Function
Mismatches with the Windowed Reconstruction: Anomalous Function
A Powerful But Simple Technique
Looking Toward Modeling More Problematic Inputs
Chapter 5. Anomalies in Sporadic Events
Counts Don’t Work Well
Arrival Times Are the Key
And Now with the Math…
Event Rate in a Worked Example: Website Traffic Prediction
Extreme Seasonality Effects
Chapter 6. No Phishing Allowed!
The Phishing Attack
The No-Phishing-Allowed Anomaly Detector
How the Model Works
Putting It All Together
Chapter 7. Anomaly Detection for the Future
Appendix A. Additional Resources
GitHub
Apache Mahout Open Source Project
Additional Publications
About the Authors
Practical Machine Learning: A New Look at Anomaly Detection
Ted Dunning and Ellen Friedman
Practical Machine Learning
by Ted Dunning and Ellen Friedman

Copyright © 2014 Ellen Friedman and Ted Dunning. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
June 2014: First Edition

Revision History for the First Edition:
2014-05-14: First release
See http://oreilly.com/catalog/errata.csp?isbn=9781491904084 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Practical Machine Learning: A New Look at Anomaly Detection and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

Photos are copyright Ellen Friedman.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-491-90408-4
[LSI]
CHAPTER 1
Looking Toward the Future

Everyone loves a mystery, and at the heart of it, that’s what anomaly detection is: spotting the unusual, catching the fraud, discovering the strange activity. Anomaly detection has a wide range of useful applications, from banking security to natural sciences to medicine to marketing. Anomaly detection carried out by a machine-learning program is actually a form of artificial intelligence. With the ever-increasing volume of data and the new types of data, such as sensor data from an increasingly large variety of objects that needs to be considered, it’s no surprise that there also is a growing interest in being able to handle more decisions automatically via machine-learning applications. But in the case of anomaly detection, at least some of the appeal is the excitement of the chase itself.
Figure 1-1. Finding anomalies is the detective work of machine learning.

When are anomaly-detection methods a good choice? Unlike fictional detective stories, in anomaly detection, you may not have a clear suspect to search for, and you may not even know what the “crime” is. In fact, one way to think about when to turn to anomaly detection is this:

Anomaly detection is about finding what you don’t know to look for.

You are searching for anomalies, but you don’t know what their characteristics will be. If you did, you could use a different form of machine learning, called classification, or you would just write specific rules to find the anomalies. But that’s not generally where you start.

Classification is a form of supervised learning where you have examples of each kind of thing you are looking for. You apply a learning algorithm to these examples to build a model that can use features of new data to classify them into categories that represent each kind of data of interest. When you have examples of normal and some number of abnormal situations, classifiers can help you mark new situations as normal or abnormal. Even when you know about some kinds of anomalies, it is always good to keep an eye out for new kinds that you don’t know about. That is where anomaly detection is applied.
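
The contrast between classification and anomaly detection can be made concrete with a small sketch. This is not the authors' method: the nearest-centroid classifier, the three-standard-deviation rule, the function names, and the toy numbers are all assumptions chosen only for illustration. The classifier learns from labeled examples of each kind, while the anomaly detector sees only normal data and flags whatever falls far from it.

```python
from statistics import mean, stdev


def train_centroid_classifier(examples):
    """Supervised learning: average the values seen for each label.

    `examples` is a list of (value, label) pairs (hypothetical toy data).
    """
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(centroids, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def fit_normal(values):
    """Anomaly detection: describe "normal" using unlabeled normal data only."""
    return mean(values), stdev(values)


def is_anomaly(model, value, k=3.0):
    """Flag values more than k standard deviations from the mean (assumed rule)."""
    mu, sigma = model
    return abs(value - mu) > k * sigma


# Classification needs examples of every kind of thing it should recognize.
labeled = [(1.0, "normal"), (1.2, "normal"), (0.9, "normal"),
           (5.0, "abnormal"), (5.3, "abnormal")]
centroids = train_centroid_classifier(labeled)

# The anomaly detector is built from normal observations alone.
model = fit_normal([1.0, 1.2, 0.9, 1.1, 0.95])

print(classify(centroids, 4.8))   # resembles the known abnormal examples
print(classify(centroids, -3.0))  # a new kind of oddity, closest to "normal"
print(is_anomaly(model, -3.0))    # far from anything seen as normal
print(is_anomaly(model, 1.05))    # well within the normal range
```

In this toy data, the value -3.0 looks nothing like the known abnormal examples, so the classifier assigns it the nearest label, "normal", while the anomaly detector flags it. That is the sense in which anomaly detection finds what you don't know to look for.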