Guide to Supervised Machine Learning

May 26, 2020

Insights are what competitive advantage is built on. Understanding what your data can tell you is one of the most important requirements for running a successful business.

Supervised machine learning helps uncover patterns in data that are not obvious at first glance. It turns raw data into insights that show which options are available and how to reach your goals.

The key to using machine learning successfully is knowing exactly what you want it to do. In this article, we take a closer look at the business applications of supervised learning algorithms.

What is supervised machine learning?

Supervised learning is a type of machine learning that uses the known to predict the unknown: it learns from data where the correct answer is already given and applies what it has learned to new data.

The algorithm sorts through the data and extracts the patterns that matter, so that you can anticipate likely future outcomes. Its work is based on well-defined inputs and outputs.

Supervised machine learning is all about:

Working with data at scale;
Uncovering the hidden patterns in the data;
Extracting the most relevant insights;
Discovering relationships between entities;
Enabling predictions of future outcomes based on available data.

How does Supervised Learning work?

The supervised learning algorithm is trained on a labeled dataset, i.e. the one where input and output are clearly defined.

Data Labeling means:

Defining an input: the types of information in the dataset that the algorithm trains on. It shows what kinds of data are present and what their defining features are;
Defining an output: labeling sets the desired results for the algorithm. It determines what the algorithm should return for each input (for example, matching data against “yes/no” or “true/false” criteria).

The labeled dataset contains everything the algorithm needs in order to operate. It sets the ground rules by showing which inputs correspond to which outputs. For training, the dataset is usually split into two parts, commonly 80% training data and 20% testing data, so that the model can be evaluated on examples it has not seen.
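
The 80/20 split can be sketched in a few lines of Python. This is only a minimal illustration: the helper name split_dataset, the toy dataset, and the fixed random seed are assumptions made for the example, not part of any particular library.

```python
import random

def split_dataset(rows, test_ratio=0.2, seed=42):
    """Shuffle labeled rows and split them into training and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = rows[:]             # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled dataset: (input features, output label)
dataset = [((i, i % 3), "yes" if i % 2 == 0 else "no") for i in range(10)]

train, test = split_dataset(dataset)
print(len(train), len(test))  # prints: 8 2
```

Shuffling before the split matters when the rows are stored in a meaningful order (for example, by date), because otherwise the test set would not resemble the training set.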

Clearly defined values are what make the “learning” possible: they tell the algorithm what it is supposed to look for.

In other words, you know what you need, and the goal of the supervised algorithm is to find it or, to be more exact, to find what matches the criteria.

From the algorithm’s perspective, the whole process turns into something akin to a “connect the dots” exercise.

Now let’s look at two fundamental processes of supervised machine learning — classification and regression.

Classification — Sorting Out The Data

Classification is the process of assigning the items in a dataset to discrete categories. In other words, it is the “sorting out” part of the operation.

Here’s how it works:

The algorithm labels new data according to the labeled samples on which it was trained.
It recognizes certain types of entities, looks for similar elements and groups them into relevant categories.
The algorithm can also flag anomalies, provided that anomalous examples are labeled in the training data.

Classification covers tasks such as optical character recognition and image recognition, as well as binary classification (deciding, in a “yes” or “no” manner, whether a particular piece of data complies with certain requirements).

Regression — Calculating the Possibilities

Regression is the part of supervised learning that is responsible for predicting numeric values from the available data. It estimates a target value from specific predictors and describes the relationships between the variables (relationships that, on their own, do not prove cause and effect).

Regression can be described as finding a model that maps the data to continuous real values. In addition, regression can identify trends derived from past data.

The purpose of regression is:

To understand the values in the data;
To identify the relations or patterns between them;
To calculate predictions of certain outcomes based on past data.

Now let’s look at the most widely used algorithms.

Supervised Machine Learning Real Life Examples

Decision Trees — Sentiment Analysis & Lead Classification

Decision trees are one of the most basic ways of organizing a decision process in machine learning. A decision tree represents a prediction as a sequence of simple questions.

A decision tree can be used for both classification and regression models. It breaks the dataset down into progressively smaller subsets, defining an entity more precisely at each level. This provides the algorithm with a decision framework.

Structure-wise, a decision tree consists of nodes (questions about the data) connected by branches (the possible answers), going from general to specific. Each path from the root to a leaf is a sequence of answers to the node questions.

Usually, the node questions are formulated as simply as “yes” or “no”. Each answer leads either to a further question or to a leaf, which concludes the operation with a result.

The depth of the decision tree depends on the requirements of the particular operation. For example, suppose the algorithm is recognizing images of apples in a dataset. One of the basic nodes asks whether the color in the image is red. If “yes”, the sequence moves on to the next question. If “no”, the image is rejected as not an apple.
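
The apple example can be written down as a tiny tree of yes/no questions. The sketch below is hand-built purely for illustration: the feature names ("color", "shape") and the second question about shape are assumptions, and a real decision tree algorithm would learn its questions from the labeled data rather than have them typed in.

```python
# Hypothetical tree: each inner node asks one yes/no question about the image.
APPLE_TREE = {
    "question": lambda image: image["color"] == "red",
    "yes": {
        "question": lambda image: image["shape"] == "round",
        "yes": "apple",         # leaf
        "no": "not an apple",   # leaf
    },
    "no": "not an apple",       # leaf
}

def classify(tree, image):
    """Follow yes/no answers from the root until a leaf (a string) is reached."""
    node = tree
    while isinstance(node, dict):
        answer = "yes" if node["question"](image) else "no"
        node = node[answer]
    return node

print(classify(APPLE_TREE, {"color": "red", "shape": "round"}))
print(classify(APPLE_TREE, {"color": "yellow", "shape": "long"}))
```

Each additional level of questions makes the tree deeper and the resulting categories more specific.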

Overall, decision trees use cases include:

Customer sentiment analysis
Sales funnel analysis

Linear Regression — Predictive Analytics, Sentiment Analysis

Linear regression is a type of machine learning model that is commonly used to quantify how one variable changes with others.

It involves determining the linear relationship between one or more input variables and a single output variable. The output value is calculated as a linear combination of the input variables.

There are two types of linear regression:

Simple linear regression: a single independent variable is used to predict the value of a dependent variable;
Multiple linear regression: multiple independent variables are used to predict the value of a dependent variable.

It is a simple and easily interpreted way of extracting insight from data.
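
For the simple (single-variable) case, the line can be fitted with ordinary least squares in plain Python. This is a sketch under stated assumptions: the function name and the ad-spend and sales figures are made up for the example, and the data was chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: ad spend (input) and sales (output), exactly y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(ad_spend, sales)
print(slope, intercept)          # prints: 2.0 1.0
print(slope * 6.0 + intercept)   # prediction for a new input, prints: 13.0
```

The slope is what makes the model easy to interpret: here each extra unit of ad spend is associated with two extra units of sales.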

Use cases of linear regression include:

Predictive Analytics
Price Optimization (Marketing and sales)
Analyzing sales drivers (pricing, volume, distribution, etc.)
Basic Sentiment Analysis, where you need to determine the basic polarity of the sentiment.

Logistic Regression — Audience Segmentation and Lead Classification

Logistic regression is similar to linear regression, but instead of a numeric dependent variable it predicts a categorical one, most commonly a binary “yes/no” or “true/false” outcome.

Its primary use case is binary prediction. For example, it is used by banks to determine whether to issue a credit card to a customer or to decline the application.

Despite its name, logistic regression is a classification method: it estimates the probability of each class and assigns the dependent variable to one of the available classes.
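
A binary approve/decline decision of this kind can be sketched with a one-feature logistic regression trained by gradient descent. Everything specific here is an assumption for illustration: the single feature (a standardized credit score), the toy labels, the learning rate and the number of epochs.

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(xs, ys, learning_rate=0.5, epochs=1000):
    """Fit weight w and bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict_approval(x, w, b, threshold=0.5):
    """Turn the predicted probability into a yes/no style decision."""
    return "approve" if sigmoid(w * x + b) >= threshold else "decline"

# Hypothetical data: standardized credit score -> 1 (approved) or 0 (declined).
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
approved = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic_regression(scores, approved)
print(predict_approval(1.8, w, b))
print(predict_approval(-1.2, w, b))
```

The model outputs a probability first, and the class is assigned only when that probability is compared with a threshold.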

Use cases of logistic regression include:

Classifying the contacts, leads, customers into specific categories
Segmenting target audience based on relevant criteria
Predicting various outcomes from input data

Random Forest Classifier — Recommender engine, Image Classification, Feature Selection

Random Forest Classifier is one of the more elaborate variations on decision trees.

It builds a collection of decision trees, each trained on a random sample of the training dataset. Then it gathers the predictions of all the trees and decides on the final class of the test object by majority vote.

The difference from a traditional decision tree is that it applies a greater element of randomness than usual. Instead of searching all features for the best one at each node split, it looks for the best feature within a random subset of features.

This brings a large degree of diversity to the model, which generally improves the quality of its predictions.

Deep decision trees may suffer from overfitting, but random forests reduce it by building trees on random subsets of the data. Combining the predictions of all the trees (a majority vote for classification, an average for regression) cancels out much of the individual trees’ errors.
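
The three ingredients (random samples of rows, random subsets of features, and a majority vote) can be sketched as follows. This is a deliberately reduced illustration, not a full random forest: each "tree" is a one-question stump, so the random feature subset is drawn once per tree rather than at every split, and the data, names and parameters are all assumptions made for the example.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, feature_indices):
    """Fit a one-question tree, considering only the given features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for feature in feature_indices:
        values = sorted({features[feature] for features, _ in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for feats, lab in rows if feats[feature] <= threshold]
            right = [lab for feats, lab in rows if feats[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:  # no split possible: always predict the majority label
        fallback = majority([lab for _, lab in rows])
        return lambda features: fallback
    _, best_feature, best_threshold, left_label, right_label = best
    return lambda features: (left_label if features[best_feature] <= best_threshold
                             else right_label)

def fit_random_forest(rows, n_trees=25, n_features_per_tree=1, seed=0):
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: draw with replacement
        chosen = rng.sample(range(n_features), n_features_per_tree)  # random features
        forest.append(fit_stump(sample, chosen))
    return forest

def predict_forest(forest, features):
    """Final class = majority vote over all trees."""
    return majority([tree(features) for tree in forest])

# Hypothetical data: (pages viewed, minutes on site) -> 1 if the user clicked.
rows = [((1, 1), 0), ((2, 2), 0), ((1, 2), 0), ((2, 1), 0),
        ((8, 9), 1), ((9, 8), 1), ((8, 8), 1), ((9, 9), 1)]

forest = fit_random_forest(rows)
print(predict_forest(forest, (8.5, 8.5)), predict_forest(forest, (1.5, 1.5)))
```

Because every tree sees a different sample and a different feature, a mistake made by one tree is unlikely to be repeated by most of the others, which is what the vote relies on.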

Random Forest Classifier use cases include:

Content Customization according to the User Behavior and Preferences
Image recognition and classification
Feature selection of the datasets (general data analysis)

Gradient Boosting Classifier — Predictive Analysis

Gradient Boosting Classifier is another method of making predictions. Boosting can be described as combining weaker (less accurate) learners into a stronger whole.

Instead of creating a pool of independent predictors, as in bagging, boosting produces a cascade of them, where each learner builds on the output of the previous ones. It is used to reduce prediction bias.

Gradient boosting takes a sequential approach to obtaining predictions. Each new decision tree is trained to predict the errors left by the trees before it (formally, the negative gradient of the loss function), and adding its output to the model reduces those errors step by step.
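
The sequential idea is easiest to see in a regression setting with squared error, where the negative gradient is simply the residual (actual value minus current prediction). The sketch below makes several assumptions for brevity: it predicts a number rather than a class, each learner is a one-split stump, and the weekly demand figures, learning rate and number of rounds are invented for the example.

```python
def fit_regression_stump(xs, targets):
    """Find the single threshold split that minimizes squared error."""
    best = None  # (squared_error, threshold, left_mean, right_mean)
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    base = sum(ys) / len(ys)            # start from the average
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # Residuals are the errors left by all learners fitted so far.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Hypothetical data: week number -> units sold.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
units = [10, 12, 15, 19, 24, 30, 37, 45]

average_only = lambda x: sum(units) / len(units)
one_round = fit_gradient_boosting(weeks, units, n_rounds=1)
boosted = fit_gradient_boosting(weeks, units, n_rounds=20)
for name, model in [("average", average_only), ("1 round", one_round), ("20 rounds", boosted)]:
    print(name, round(mean_squared_error(model, weeks, units), 3))
```

The learning rate scales down each learner's contribution, so that no single weak learner dominates the final prediction.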

Gradient Boosting is widely used in sales, especially in retail and eCommerce sectors. The use cases include:

Inventory Management
Demand Forecasting
Price Prediction

Support Vector Machines (SVM) — Data Classification, Sentiment Analysis

Support Vector Machines (SVM) are a type of algorithm that can be used for both regression and classification purposes.

At its core, an SVM defines decision boundaries that separate entities of different classes. Each side of a boundary corresponds to a different class.

The algorithm performs classification by finding the hyperplane (a flat boundary that generalizes a line or a plane to any number of dimensions) that maximizes the margin between the two classes. The support vectors are the data points closest to this boundary, and they alone determine its position. The resulting boundary shows which features of the data separate the classes and what they might mean in a specific context.

As such, SVM handles multi-dimensional data well, including datasets in which each entity is described by many features.
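
The margin criterion can be illustrated without training anything. The sketch below does not implement an SVM solver. It only measures the margin of two hand-picked candidate boundaries and lists the points closest to the better one, which is the quantity an SVM maximizes. The points, the candidate boundaries and all function names are assumptions made for the example.

```python
import math

def signed_distance(w, b, point):
    """Distance from a point to the hyperplane w.x + b = 0 (sign shows the side)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def margin(w, b, labeled_points):
    """Smallest distance from any point to the boundary, on its correct side."""
    return min(label * signed_distance(w, b, point)
               for point, label in labeled_points)

def support_vectors(w, b, labeled_points, tolerance=1e-9):
    """Points that lie exactly at the margin, i.e. closest to the boundary."""
    m = margin(w, b, labeled_points)
    return [point for point, label in labeled_points
            if abs(label * signed_distance(w, b, point) - m) <= tolerance]

# Hypothetical two-class data; labels are -1 and +1.
points = [((1, 1), -1), ((2, 1), -1), ((1, 2), -1),
          ((4, 4), +1), ((5, 4), +1), ((4, 5), +1)]

diagonal = ((1, 1), -5.5)   # boundary x + y = 5.5
vertical = ((1, 0), -3.0)   # boundary x = 3

print(margin(*diagonal, points), margin(*vertical, points))
print(support_vectors(*diagonal, points))
```

Both boundaries separate the classes, but the diagonal one leaves more room on either side, so it is the one a maximum-margin classifier would prefer.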

Support Vector Machine algorithms are widely used in ad tech. The use cases include:

Segmenting audience
Sentiment Analysis — determining the elements of the opinion and the context of the statement.
Managing Ad Inventory
Estimating the likelihood of conversions for specific types of ads within specific audience segments
Text Classification

Naive Bayes — Sentiment Analysis

The Naive Bayes classifier is based on Bayes’ theorem with an independence assumption between predictors, i.e. it assumes that the presence of a feature in a class is unrelated to the presence of any other feature. Even if these features depend on each other or on the existence of other features, all of them are treated as contributing independently to the probability. Hence the name Naive Bayes.

It is used for classification. One common variant, Gaussian Naive Bayes, assumes that continuous features follow a normal distribution within each class.

A Naive Bayes model is easy to build, with no complicated iterative parameter estimation, which makes it particularly useful for very large datasets.
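
For text, such as spam detection, a word-count variant is a natural fit, and building it really is just counting. The sketch below is one possible illustration rather than a reference implementation: it uses word counts with add-one (Laplace) smoothing instead of the Gaussian variant, ignores words it has never seen, and the four training messages are invented.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(labeled_texts):
    """Count how often each class and each word within a class occurs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in labeled_texts:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary

def predict_naive_bayes(model, text):
    """Pick the class with the highest log prior plus summed log word likelihoods."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # unseen words carry no evidence in this sketch
            # Each word contributes independently: the "naive" assumption.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages.
messages = [
    ("win free prize now", "spam"),
    ("free money now", "spam"),
    ("meeting agenda for tomorrow", "not spam"),
    ("lunch tomorrow with team", "not spam"),
]

model = fit_naive_bayes(messages)
print(predict_naive_bayes(model, "free prize now"))
print(predict_naive_bayes(model, "team meeting tomorrow"))
```

Training is a single pass over the data with no iterative optimization, which is why the approach scales to very large datasets.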

Naive Bayes use cases include:

Data Classification (such as spam detection)
Lead Classification
Sentiment Analysis (based on input texts, such as reviews or comments)

***

Supervised machine learning is one of the preeminent tools for making every bit of a company’s data work for its benefit.

When applied properly, a supervised algorithm shows the real value of the information and uncovers opportunities to use it in the most effective manner.