A practical guide to implementing supervised and unsupervised machine learning algorithms in Python

Ready to kickstart your machine learning career but feeling lost in a sea of resources?

This guide is your personal compass, empowering you to navigate the complexities and become a machine learning expert.

"The book is the perfect read for anyone who wants to transition into machine learning. It broadly covers all the key algorithms with an insightful practitioner's perspective"

"I've written this book to help you start your journey in machine learning. With over a decade of practical experience and a postgraduate degree in the field, I've gained the expertise to bridge theory and practice. This book, my second in the data-related domain, aims to be an enjoyable resource for you.

The focus of this book is Scikit-Learn, a versatile and popular library among machine learning practitioners. However, it goes beyond Scikit-Learn, introducing complementary libraries like NumPy, Pandas, SpaCy, imbalanced-learn, and Scikit-Surprise. By understanding the theoretical concepts covered in this book, you'll also be well-prepared to explore other libraries such as TensorFlow and PyTorch, expanding your knowledge and skills in the field."

**Start your machine learning journey by visiting this link**

Here are some example reviews:

*Ali Faizan* rated it: **5 out of 5** stars.

"For a machine learning noob like me, it was pleasing to see that the book did not dive straight into the nitty-gritty of machine learning algorithms: it first established the raison d’être for machine learning and cohesively captured the whole gamut of developing a machine learning model. This helped me quite a bit to understand the bigger picture later on in the book where it demonstrated the practical use of various machine learning algorithms. I'll happily recommend this book to anyone interested in scikit-learn, and machine learning in general too".

*Paul Schmidt* rated it: **5 out of 5** stars.

"This book is information rich with practical examples. I, who had never read about or touched this area, was surprised to learn the weight that data analysis carries in machine learning. Yes, this book also teaches you about data analysis. Throughout the chapters you learn what not to do when building machine learning and deep learning models: the author teaches this by analysing the data at hand and improving the models upon that knowledge. The book is very information rich and can easily be reread from chapter to chapter. There are some things to keep in mind: this book is not for Python beginners, and I urge you to know some of the basics of the pandas and matplotlib modules. In other words, this book is strongly recommended."

*Przemyslaw Chojecki* rated it: **5 out of 5** stars.

"If you've already done a couple of data science projects, have a basic understanding of Python, have done some visualisation, and want to go deeper into what it means to analyse data, then this book is for you. This is a practical guide to both supervised and unsupervised learning with plenty of examples in code. The main focus is on imperfect data and how to make sense of these imperfections through various machine learning algorithms. The author discusses standard data science algorithms using the scikit-learn library, which gives a coherent overview of the subject. You will learn decision trees, KNN classification, Naive Bayes, and much more, applied to classic datasets like the Iris dataset, Boston housing prices, or Fashion-MNIST. Recommended for beginning data scientists!"

*Adam Powell* rated it: **5 out of 5** stars.

"The perfect read for an analyst that wants to transition into machine learning. It broadly covers all the key algorithms with an insightful practitioner's perspective. Highly recommended!".

DigitalSreeni: Book Review - Machine Learning with scikit-learn and scientific python toolkits

Dimitri Bianco: Hands-On Machine Learning with scikit-learn and Scientific Python Toolkit

This book is composed of 13 chapters. Here is a brief overview of each chapter:

Embark on an illuminating journey into the realm of machine learning. Curious about how machines acquire knowledge? This chapter unveils the big picture, laying a solid foundation for the captivating algorithms we delve into next.

Introducing our first supervised learning algorithm in this book: decision trees.

We chose this versatile and easily comprehensible algorithm to kickstart your journey. As you progress, you'll discover its vital role as a foundation for advanced algorithms like Random Forest and Gradient Boosted Trees.

Each chapter is designed to expand your knowledge of machine learning and statistical concepts alongside the main topic. Here, we will explore data splitting, model evaluation, and hyper-parameter tuning.

By the chapter's end, you'll have mastered:

- The inner workings of decision trees and how they learn
- Optimal strategies for data splitting
- Harnessing cross-validation for reliable scores
- Unveiling hyper-parameters and their effective tuning
- Visualizing decision boundaries within the tree
- Leveraging decision trees for regression tasks
- Tailoring weights for diverse training samples

Get ready for a transformative learning experience that equips you with the tools to unlock the potential of decision trees and beyond.
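As a minimal sketch of that workflow, here is a decision tree trained with a depth limit and scored by cross-validation. The Iris dataset and the particular hyper-parameter values are illustrative choices, not the book's own example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load a small, well-known dataset.
X, y = load_iris(return_X_y=True)

# max_depth is a hyper-parameter: limiting it helps the tree generalise.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)

# 5-fold cross-validation gives a more reliable score than a single split.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Swapping different `max_depth` values into this snippet is a quick way to see hyper-parameter tuning in action.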

Linear models are possibly the most commonly used algorithms in statistics and machine learning. They are used for both regression and classification. In this chapter we will start by looking into the basic least-squares algorithm, then move on to more advanced algorithms as the chapter progresses.

The secondary topics you will be introduced to alongside the linear models are regularization and regression intervals. Regularization is a very powerful concept that you will meet over and over again throughout your machine learning journey, which is why I decided to introduce it early on in the book. Regression intervals are also a very useful tool to quantify your uncertainty about your predictions.

By the end of this chapter, you will have a very good understanding of the following topics:

- Understanding linear models and their history
- Regression model evaluation criteria (MSE, MAE, and the Coefficient of Determination, i.e. R^2)
- Using confidence intervals to get more reliable scores
- Engineering new features and finding their importances (e.g. polynomial features)
- What regularisation is, and what solvers are
- Your first Generalised Linear Model (GLM): logistic regression
- Additional linear models (Stochastic Gradient Descent, Elastic-net, RANSAC, etc.)
- Finding regression intervals
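To make the regularisation idea concrete, here is a hedged sketch contrasting ordinary least squares with Ridge regression on synthetic data; the data and the `alpha` value are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

# Synthetic data: a known linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 regularisation shrinks the coefficients

print(r2_score(y, ols.predict(X)))
print(np.linalg.norm(ridge.coef_) < np.linalg.norm(ols.coef_))
```

The second printed value illustrates the shrinkage effect: the L2 penalty pulls the Ridge coefficients toward zero relative to the unregularised fit.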

You have probably heard one version or another of the saying, "Data scientists spend 80% of their time cleaning data". Data cleaning is an essential part of the job, and even when the data is clean, many algorithms demand that it be processed in certain ways before they can operate on it. In this chapter we will talk about the following:

- Imputing missing values (e.g. SimpleImputer and IterativeImputer)
- Encoding non-numerical features (e.g. One-hot Encoding, Ordinal Encoding, Target Encoding, Leave-one-out Encoding, etc.)
- Feature Scaling (MinMax Scaler, Standard Scaler, Robust Scaler, etc.)
- Feature Selection (Variance Threshold, Mutual Information, etc.)
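The first three of these steps can be sketched in a few lines; the tiny arrays below are invented purely to show each transformer's effect:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Impute a missing value with the column mean.
X = np.array([[1.0], [np.nan], [3.0]])
imputed = SimpleImputer(strategy="mean").fit_transform(X)

# Scale the column into the [0, 1] range.
scaled = MinMaxScaler().fit_transform(imputed)

# One-hot encode a non-numerical feature into one column per category.
colors = np.array([["red"], ["green"], ["red"]])
onehot = OneHotEncoder().fit_transform(colors).toarray()

print(imputed.ravel(), scaled.ravel(), onehot.shape)
```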

Image processing is an essential part of machine learning. I find the Nearest Neighbor Algorithm a good way to understand how image classification works before getting into more complex algorithms that may obscure things. In this chapter we will learn about the following topics:

- K-Nearest Neighbors Algorithm (KNN)
- Different Distances (e.g. Euclidean, Cosine, Manhattan, Minkowski, etc.)
- Creating a custom distance to use with KNN
- Radius Neighborhood
- Nearest Centroid Algorithm
- Principal Component Analysis (PCA)
- Neighborhood Component Analysis (NCA)
- Bias-Variance Tradeoff
- Hyper-parameter tuning via GridSearchCV
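As a minimal sketch of the last item, here is KNN tuned over its neighbour count and distance metric with `GridSearchCV`; the Iris dataset and the grid values are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Search over the number of neighbours and the distance metric.
param_grid = {"n_neighbors": [1, 3, 5, 7], "metric": ["euclidean", "manhattan"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```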

"A word after a word after a word is power" - Margaret Atwood. In this chapter we will learn about Natural Language Processing (NLP) and text classification. Here are the topics covered:

- Tokenization and Vector Space Model
- Bag of Words, TF-IDF and Word Embedding (Word2Vec)
- Bayes Rule and Naive Bayes Classifier
- Multinomial vs Bernoulli Naive Bayes Classifier
- Gaussian Naive Bayes Classifier
- Additive Smoothing (Lidstone and Laplace Smoothing)
- F1-Score for combining Precision and Recall Scores
- Scikit-Learn Pipelines
- Creating a custom Scikit-Learn Transformer
- Using NLTK and SpaCy with Scikit-Learn
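Several of these pieces combine naturally in a single scikit-learn Pipeline. Here is a hedged sketch of TF-IDF feeding a Multinomial Naive Bayes classifier; the four-document corpus is invented purely for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# A toy corpus, invented here purely for illustration.
docs = ["free money now", "win a free prize", "meeting at noon", "project deadline tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

# A Pipeline chains vectorisation and classification into one estimator.
clf = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
clf.fit(docs, labels)

print(clf.predict(["win free money"]))
```

Because the pipeline is a single estimator, the same object can later be dropped into cross-validation or grid search unchanged.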

The term deep learning refers to deep Artificial Neural Networks (ANNs). The latter concept comes in different forms and shapes. In this chapter, we are going to cover one subset of feedforward neural networks known as the Multilayer Perceptron (MLP). It is one of the most commonly used types and is implemented by scikit-learn. As its name suggests, it is composed of multiple layers, and it is a feedforward network as there are no cyclic connections between its layers. The more layers there are, the deeper the network is. These deep networks can exist in multiple forms, such as MLP, Convolutional Neural Networks (CNNs), or Long Short-Term Memory (LSTM). The latter two are not implemented by scikit-learn, yet this will not stop us from discussing the main concepts behind CNNs and manually mimicking them using the tools available from the scientific Python ecosystem.

In this chapter, we are going to cover the following topics:

- Getting to know the Multilayer Perceptron (MLP)
- Monitoring and tuning your neural network's learning rate
- Judging whether you need more training data or more epochs
- Activation functions such as Softmax, ReLU, Leaky ReLU, etc.
- Adding your own activation function to scikit-learn
- Classifying items of clothing
- Learning about convolutions, kernels and max pooling
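As a small taste of scikit-learn's MLP, here is one trained on the built-in handwritten-digits dataset; the layer size and iteration count are illustrative, not the book's own settings:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Neural networks train much better on scaled inputs.
scaler = StandardScaler().fit(X_train)

# A single hidden layer of 64 units; ReLU is the default activation.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=42)
mlp.fit(scaler.transform(X_train), y_train)

print(round(mlp.score(scaler.transform(X_test), y_test), 3))
```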

- Bagging vs Boosting
- Random Forest
- Bagging Meta Estimator (with KNN)
- AdaBoost
- Gradient Boosting
- Voting and Stacking Ensembles
- Random Tree Embedding
- Learning Deviance
- Quantile Regression and Regression Ranges
- Early Stopping and Adaptive Learning Rate
- The ROC Curve
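The bagging-versus-boosting contrast at the top of this list can be sketched by scoring one ensemble of each kind side by side; the dataset and default settings are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging-style ensemble: many de-correlated trees, averaged.
rf_score = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5).mean()

# Boosting-style ensemble: trees added sequentially to fix earlier errors.
gb_score = cross_val_score(GradientBoostingClassifier(random_state=42), X, y, cv=5).mean()

print(round(rf_score, 3), round(gb_score, 3))
```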

- Regression Target Scaling
- Multi-Class vs Multi-Label (Multi-Output) Classifiers
- OneVsOne vs OneVsRest Classifiers
- Classifier Probability Calibration (Sigmoid / Isotonic)
- Precision at K
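To illustrate the OneVsRest strategy from this list: it decomposes a multi-class problem into one binary classifier per class, which you can verify directly. The base estimator here is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# OneVsRest fits one binary classifier per class.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # one estimator for each of the 3 Iris classes
```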

- Predicting Click-through Rate (CTR)
- Reweighting the training samples
- Random Oversampling (ROS)
- Random Undersampling (RUS)
- Combining Sampling with Ensemble Methods (e.g. Balanced Random Forest and Balanced Bagging)
- Area Under the Curve (AUC)
- Fairness in Machine Learning (Equal Opportunity Score)
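The sample-reweighting idea can be sketched without any extra libraries: scikit-learn's `class_weight="balanced"` option reweights training samples inversely to class frequency. The synthetic imbalanced dataset below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Synthetic imbalanced data: 950 majority vs 50 minority samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 950 + [1] * 50)

plain = LogisticRegression().fit(X, y)
reweighted = LogisticRegression(class_weight="balanced").fit(X, y)

# Reweighting trades some overall accuracy for better minority-class recall.
print(recall_score(y, plain.predict(X)), recall_score(y, reweighted.predict(X)))
```

The resampling techniques in the list above (ROS, RUS, balanced ensembles) pursue the same goal by changing the data rather than the loss weights.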

- Understanding Clustering
- K-means Clustering Algorithm
- Agglomerative Clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- The Silhouette Score
- The Adjusted Rand Index
- Affinity and Linkage
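As a minimal sketch of the first and fourth items together, here is K-means evaluated with the silhouette score on synthetic blobs; the data and cluster count are illustrative:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# A silhouette score near 1 means tight, well-separated clusters.
print(round(silhouette_score(X, labels), 3))
```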

- EllipticEnvelope (Mahalanobis Distance)
- Local Outlier Factor (LOF)
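A hedged sketch of the Local Outlier Factor in action, on invented data with one obvious outlier planted in a dense cluster:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# A dense cluster around the origin, plus one obvious outlier at (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), [[5.0, 5.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)  # -1 marks outliers, 1 marks inliers

print(labels[-1])
```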

Recommender systems are probably the first thing that comes to a layperson's mind when they hear about machine learning. These systems are everywhere, from Spotify to Netflix and Amazon. In this chapter we will be using a sister library to scikit-learn called Surprise. You will learn the difference between content-based and collaborative filtering algorithms, how to solve the cold-start problem, and how to package your final model and serve it behind a REST API. Here are the main topics of this chapter:

- How the different recommendation paradigms work
- How the K-Nearest Neighbors (KNN) algorithm helps in recommendation
- What Singular Value Decomposition (SVD) is
- The best options for a baseline recommender
- How to deploy your machine learning models to production
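The collaborative-filtering idea itself needs nothing beyond NumPy to demonstrate. This is not the Surprise library the chapter uses; it is a bare-bones item-similarity sketch on a made-up ratings matrix:

```python
import numpy as np

# Toy user-item ratings matrix (rows: users, columns: items; 0 = unrated).
# Invented here purely for illustration.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Item-based collaborative filtering: cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Items 0 and 1 attract similar ratings, so they are more alike than 0 and 2.
print(round(sim[0, 1], 3), round(sim[0, 2], 3))
```

Real systems such as those built with Surprise refine this idea with rating normalisation and matrix factorisation (e.g. SVD).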

"Hands-On Machine Learning with Scikit-Learn" is generally well-regarded in the machine learning community. It is known for its practical approach, providing readers with hands-on examples and exercises using the Scikit-Learn library.

The book covers fundamental concepts and techniques in machine learning, making it suitable for beginners and intermediate learners. It is often praised for its clear explanations and code examples that help readers understand and apply machine learning algorithms effectively.

**Start your machine learning journey by visiting this link**

*Links to Amazon are affiliate links.*