Bandit Algorithms for Website Optimization

John Myles White
Bandit Algorithms for Website Optimization
by John Myles White

Copyright © 2013 John Myles White. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Christopher Hearse
Proofreader: Christopher Hearse
Cover Designer: Randy Comer
Interior Designer: David Futato
Illustrator: Rebecca Demarest

December 2012: First Edition

Revision History for the First Edition:
2012-12-07: First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449341336 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Bandit Algorithms for Website Optimization, the image of an eastern barred bandicoot, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

ISBN: 978-1-449-34133-6

LSI
Table of Contents

Preface .......................................................... v

1. Two Characters: Exploration and Exploitation ................... 1
    The Scientist and the Businessman  1
    Cynthia the Scientist  1
    Bob the Businessman  2
    Oscar the Operations Researcher  3
    The Explore-Exploit Dilemma  4

2. Why Use Multiarmed Bandit Algorithms? .......................... 7
    What Are We Trying to Do?  7
    The Business Scientist: Web-Scale A/B Testing  8

3. The epsilon-Greedy Algorithm .................................. 11
    Introducing the epsilon-Greedy Algorithm  11
    Describing Our Logo-Choosing Problem Abstractly  13
    What’s an Arm?  13
    What’s a Reward?  14
    What’s a Bandit Problem?  14
    Implementing the epsilon-Greedy Algorithm  15
    Thinking Critically about the epsilon-Greedy Algorithm  19

4. Debugging Bandit Algorithms ................................... 21
    Monte Carlo Simulations Are Like Unit Tests for Bandit Algorithms  21
    Simulating the Arms of a Bandit Problem  22
    Analyzing Results from a Monte Carlo Study  26
    Approach 1: Track the Probability of Choosing the Best Arm  26
    Approach 2: Track the Average Reward at Each Point in Time  28
    Approach 3: Track the Cumulative Reward at Each Point in Time  29
    Exercises  31

5. The Softmax Algorithm ......................................... 33
    Introducing the Softmax Algorithm  33
    Implementing the Softmax Algorithm  35
    Measuring the Performance of the Softmax Algorithm  37
    The Annealing Softmax Algorithm  40
    Exercises  46

6. UCB – The Upper Confidence Bound Algorithm .................... 47
    Introducing the UCB Algorithm  47
    Implementing UCB  49
    Comparing Bandit Algorithms Side-by-Side  53
    Exercises  56

7. Bandits in the Real World: Complexity and Complications ....... 59
    A/A Testing  60
    Running Concurrent Experiments  60
    Continuous Experimentation vs. Periodic Testing  61
    Bad Metrics of Success  62
    Scaling Problems with Good Metrics of Success  62
    Intelligent Initialization of Values  63
    Running Better Simulations  63
    Moving Worlds  63
    Correlated Bandits  64
    Contextual Bandits  65
    Implementing Bandit Algorithms at Scale  65

8. Conclusion .................................................... 69
    Learning Life Lessons from Bandit Algorithms  69
    A Taxonomy of Bandit Algorithms  71
    Learning More and Other Topics  72
Preface

Finding the Code for This Book

This book is about algorithms. But it’s not a book about the theory of algorithms. It’s a short tutorial introduction to algorithms that’s targeted at people who like to learn about new ideas by experimenting with them in practice.

Because we want you to experiment, this book is meant to be read while you’re near an interpreter for your favorite programming language. In the text, we illustrate every algorithm we describe using Python. As part of the accompanying online materials, there is similar code available that implements all of the same algorithms in Julia, a new programming language that is ideally suited for implementing bandit algorithms. Alongside the Python and Julia code, there are also links to similar implementations in other languages like JavaScript.

We’ve chosen to use Python for this book because it seems like a reasonable lingua franca for programmers. If Python isn’t your style, you should be able to translate our Python code into your favorite programming language fairly easily.

Assuming you are happy using Python or Julia, you can find the code for the book on GitHub at https://github.com/johnmyleswhite/BanditsBook. If you find mistakes or would like to submit an implementation in another language, please make a pull request.

Dealing with Jargon: A Glossary

While this book isn’t meant to introduce you to the theoretical study of the Multiarmed Bandit Problem or to prepare you to develop novel algorithms for solving the problem, we want you to leave this book with enough understanding of existing work to be able to follow the literature on the Multiarmed Bandit Problem. In order to do that, we have to introduce quite a large number of jargon words. These jargon words can be a little
odd at first, but they’re universally used in the academic literature on Multiarmed Bandit Problems. As you read this book, you will want to return to the list of jargon words below to remind yourself what they mean. For now, you can glance through them, but we don’t expect you to understand these words yet. Just know that this material is here for you to refer back to if you’re ever confused about a term we use.

Reward
    A quantitative measure of success. In business, the ultimate rewards are profits, but we can often treat simpler metrics like click-through rates for ads or sign-up rates for new users as rewards. What matters is that (A) there is a clear quantitative scale and (B) it’s better to have more reward than less reward.

Arm
    What options are available to us? What actions can we take? In this book, we’ll refer to the options available to us as arms. The reasons for this naming convention will be easier to understand after we’ve discussed a little bit of the history of the Multiarmed Bandit Problem.

Bandit
    A bandit is a collection of arms. When you have many options available to you, we call that collection of options a Multiarmed Bandit. A Multiarmed Bandit is a mathematical model you can use to reason about how to make decisions when you have many actions you can take and imperfect information about the rewards you would receive after taking those actions. The algorithms presented in this book are ways of trying to solve the problem of deciding which arms to pull when. We refer to the problem of choosing arms to pull as the Multiarmed Bandit Problem.

Play/Trial
    When you deal with a bandit, it’s assumed that you get to pull on each arm multiple times. Each chance you have to pull on an arm will be called a play or, more often, a trial. The term “play” helps to invoke the notion of gambling that inspires the term “arm”, but the term trial is quite commonly used.

Horizon
    How many trials will you have before the game is finished? The number of trials left is called the horizon. If the horizon is short, you will often use a different strategy than you would use if the horizon were long, because having many chances to play each arm means that you can take greater risks while still having time to recover if anything goes wrong.

Exploitation
    An algorithm for solving the Multiarmed Bandit Problem exploits if it plays the arm with the highest estimated value based on previous plays.
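To make these terms concrete before the book’s own implementations arrive in later chapters, here is a minimal sketch in Python. The BernoulliArm class, its draw method, and the variable names are our illustration here, not code from the book: an arm’s reward is 1 with some fixed probability and 0 otherwise, a simple stand-in for a click/no-click event.

import random

class BernoulliArm:
    def __init__(self, p):
        self.p = p  # true success probability, unknown to the algorithm

    def draw(self):
        # One play/trial of this arm returns a reward of 0.0 or 1.0.
        return 1.0 if random.random() < self.p else 0.0

# A bandit is just a collection of arms.
bandit = [BernoulliArm(0.1), BernoulliArm(0.3), BernoulliArm(0.2)]

# Pure exploitation: always play the arm with the highest estimated value.
counts = [0] * len(bandit)    # plays of each arm so far
values = [0.0] * len(bandit)  # running average reward of each arm

horizon = 1000  # total number of trials before the game is finished
for t in range(horizon):
    arm = values.index(max(values))  # exploit; ties go to the earliest arm
    reward = bandit[arm].draw()
    counts[arm] += 1
    # Incrementally update this arm's running average reward.
    values[arm] += (reward - values[arm]) / counts[arm]

Because this loop only ever exploits, an arm that starts badly may never be played again; the explore-exploit dilemma introduced in Chapter 1 is exactly the problem of correcting that kind of premature commitment.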