What is a Confusion Matrix?
A confusion matrix is a performance-measurement technique for machine learning classification. It summarizes the prediction results of a classifier on a set of test data for which the true values are known, measuring the classifier's performance in depth.
It gives you insight not only into the number of errors being made by your classifier but, more importantly, into the types of errors that are being made.
Outcomes of Confusion Matrix
The four different combinations from the predicted and actual values of a classifier are:
- True Positive (TP): You predicted a positive value, and it is actually positive.
- False Positive (FP): You predicted a positive value, but it is actually negative.
- False Negative (FN): You predicted a negative value, but it is actually positive.
- True Negative (TN): You predicted a negative value, and it is actually negative.
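As a minimal sketch, the four outcomes can be counted directly from paired lists of actual and predicted labels. The labels and data below are chosen purely for illustration (1 = positive class, 0 = negative class):

```python
# Count TP, FP, FN, TN from paired actual/predicted labels.
# 1 = positive class, 0 = negative class (example data for illustration).
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print(tp, fp, fn, tn)  # → 3 2 1 2
```

In practice a library routine such as scikit-learn's `confusion_matrix` would do this counting for you, but the logic is exactly these four comparisons.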
Example of Confusion Matrix
Suppose you predict the result of a cricket match between India and Australia.
- True Positive: When you predict a positive outcome and it turns out to be correct. Suppose you predict that India will win and it wins.
- True Negative: When you predict a negative outcome and it turns out to be correct. Suppose you predict that India will lose and it loses.
- False Positive: When you predict a positive outcome and it turns out to be wrong. Suppose you predict that India will win and it loses.
- False Negative: When you predict a negative outcome and it turns out to be wrong. Suppose you predict that India will lose but it wins.
Important Terms Derived from a Confusion Matrix
Precision: It measures how good our model is when the prediction is positive, i.e., how likely a positive prediction is to be correct. It is the ratio of true positives to all positive predictions: TP / (TP + FP). It is useful when a false positive is a greater concern than a false negative.
Recall: Precision alone is not very helpful because it ignores the positives the model misses. Recall measures how good our model is at finding the positive class. It is the ratio of correct positive predictions to all actual positives: TP / (TP + FN). It is useful when false negatives are costlier than false positives.
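The two formulas can be sketched as follows; the counts here are example values chosen for illustration, not from any real model:

```python
# Precision = TP / (TP + FP): of all positive predictions, how many were right.
# Recall    = TP / (TP + FN): of all actual positives, how many were found.
tp, fp, fn = 3, 2, 1  # example counts, chosen for illustration

precision = tp / (tp + fp)  # 3 / 5 = 0.6
recall = tp / (tp + fn)     # 3 / 4 = 0.75

print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Note how the two metrics pull in different directions: a model that predicts positive for everything has perfect recall but poor precision, and vice versa.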
How to Calculate a Confusion Matrix?
- You need a test dataset or a validation dataset with expected outcome values.
- Make a prediction for each row in the test dataset.
- From the expected outcomes and the predictions, count:
- The total correct predictions for each class.
- The total incorrect predictions for each class.
- Each row of the matrix corresponds to a predicted class.
- Each column of the matrix corresponds to an actual class.
- Enter the total counts of correct and incorrect classifications into the table.
- The count of correct predictions for a class goes into the cell where that class's predicted row meets its actual column (the diagonal).
- The count of incorrect predictions goes into the cell at the row of the predicted class and the column of the actual class.
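The steps above can be sketched in a few lines of Python. The class names and data are invented for illustration, and the orientation follows this article's convention (rows = predicted, columns = actual); note that some libraries, such as scikit-learn, use the opposite orientation (rows = actual), so always check your tool's convention before reading off cells:

```python
# Build a 2x2 confusion matrix following the steps above:
# rows = predicted class, columns = actual class (this article's convention).
from collections import Counter

classes = ["win", "lose"]  # illustrative class labels
actual    = ["win", "lose", "win", "win", "lose"]
predicted = ["win", "win", "lose", "win", "lose"]

# Count each (predicted, actual) pair, then lay the counts out as a table.
counts = Counter(zip(predicted, actual))
matrix = [[counts[(p, a)] for a in classes] for p in classes]

for p, row in zip(classes, matrix):
    print(p, row)
# → win [2, 1]
#   lose [1, 1]
```

The diagonal cells (predicted class equals actual class) hold the correct predictions; everything off the diagonal is an error.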
Benefits of Using a Confusion Matrix
- It provides insight not only into the number of errors made by a classifier but also into the types of errors being made.
- It reflects where a classification model gets confused while making predictions.
- It is the basis for useful evaluation metrics such as Recall, Precision, Accuracy, and the AUC-ROC curve.
- It helps overcome the limitations of relying on classification accuracy alone.
In short, a confusion matrix is a summary table of the number of correct and incorrect predictions made on a classification task. By visualizing it, you can judge the accuracy of the model from the diagonal values, which count the correct classifications.