MACHINE LEARNING FUNDAMENTALS
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Author: Hyatt Saleh
Managing Editor: Neha Nair
Acquisitions Editor: Aditya Date
Production Editor: Samita Warang
Editorial Board: David Barnes, Ewan Buckingham, Simon Cox, Manasa Kumar, Alex Mazonowicz, Douglas Paterson, Dominic Pereira, Shiny Poojary, Saman Siddiqui, Erol Staveley, Ankita Thakur, and Mohita Vyas
First Published: November 2018
Production Reference: 1291118
ISBN: 9781789803556
Table of Contents
Preface
Introduction to Scikit-Learn
INTRODUCTION
SCIKIT-LEARN
ADVANTAGES OF SCIKIT-LEARN
DISADVANTAGES OF SCIKIT-LEARN
DATA REPRESENTATION
TABLES OF DATA
FEATURES AND TARGET MATRICES
EXERCISE 1: LOADING A SAMPLE DATASET AND CREATING THE FEATURES AND TARGET MATRICES
ACTIVITY 1: SELECTING A TARGET FEATURE AND CREATING A TARGET MATRIX
DATA PREPROCESSING
MESSY DATA
EXERCISE 2: DEALING WITH MESSY DATA
DEALING WITH CATEGORICAL FEATURES
EXERCISE 3: APPLYING FEATURE ENGINEERING OVER TEXT DATA
RESCALING DATA
EXERCISE 4: NORMALIZING AND STANDARDIZING DATA
ACTIVITY 2: PREPROCESSING AN ENTIRE DATASET
SCIKIT-LEARN API
HOW DOES IT WORK?
SUPERVISED AND UNSUPERVISED LEARNING
SUPERVISED LEARNING
UNSUPERVISED LEARNING
SUMMARY
Unsupervised Learning: Real-Life Applications
INTRODUCTION
CLUSTERING
CLUSTERING TYPES
APPLICATIONS OF CLUSTERING
EXPLORING A DATASET: WHOLESALE CUSTOMERS DATASET
UNDERSTANDING THE DATASET
DATA VISUALIZATION
LOADING THE DATASET USING PANDAS
VISUALIZATION TOOLS
EXERCISE 5: PLOTTING A HISTOGRAM OF ONE FEATURE FROM THE NOISY CIRCLES DATASET
ACTIVITY 3: USING DATA VISUALIZATION TO AID THE PREPROCESSING PROCESS
K-MEANS ALGORITHM
UNDERSTANDING THE ALGORITHM
EXERCISE 6: IMPORTING AND TRAINING THE K-MEANS ALGORITHM OVER A DATASET
ACTIVITY 4: APPLYING THE K-MEANS ALGORITHM TO A DATASET
MEAN-SHIFT ALGORITHM
UNDERSTANDING THE ALGORITHM