The business has undergone a revolution thanks to machine learning, enabling us to create complex apps that tackle challenging issues. Classification, regression, and clustering methods can be used to solve problems utilizing supervised and unsupervised machine learning models. In this post, we’ll talk about the Python implementation of the supervised machine learning technique known as logistic regression. Both classification and regression issues can be resolved using logistic regression.
A labeled training dataset is used by supervised machine learning algorithms to discover insights, patterns, and relationships. It indicates that each record in the collection already has a known value for the target variable. Because an algorithm learns from the training dataset under the supervision of an instructor, this process is known as supervised learning. The algorithm iteratively makes predictions based on the training data, and the instructor corrects it. You already know the right answers. Learning is complete when the algorithm reaches the desired performance and accuracy level
Supervised learning problems can be further classified into regression and classification problems.
- Classification: In a classification problem, the output variable is a category, such as “red” or “blue,” “disease” or “no disease,” “true” or “false,” etc.
- Regression: In a regression problem, the output variable is a real continuous value, such as “dollars” or “weight.”
What is Logistic Regression?
When creating machine learning models, logistic regression is a statistical technique used when the dependent variable is dichotomous, or binary. Data and the relationship between one dependent variable and one or more independent variables are described using logistic regression. Nominal, ordinal, or interval types are all acceptable for the independent variables. The concept of the logistic function that it employs is where the term “logistic regression” originates. The sigmoid function is another name for the logistic function. This logistic function has a value between 0 and 1.
Advantages of Logistic Regression Algorithm
- Logistic regression performs better when the data is linearly separable
- It does not require too many computational resources as it’s highly interpretable
- There is no problem scaling the input features—It does not require tuning
- It is easy to implement and train a model using logistic regression
- It gives a measure of how relevant a predictor (coefficient size) is, and its direction of association (positive or negative)
How Does the Logistic Regression Algorithm Work?
Think about the following instance: A company wants to decide whether to enhance an employee’s pay based on their performance. They will make their decision using a linear regression technique. They can simplify their task by plotting a regression line using the employee’s performance as the independent variable and the salary rise as the dependent variable. What happens if a company wants to determine whether a worker deserves a promotion based on their performance? The linear graph shown above won’t work in this situation. In order to create a sigmoid curve, we clip the line at zero and one (S curve). The company can decide whether or not to raise an employee’s wage based on the threshold values.
To understand logistic regression, let’s go over the odds of success.
Odds (𝜃) = Probability of an event happening / Probability of an event not happening
𝜃 = p / 1 – p
The values of odds range from zero ∞ and the values of probability lie between zero and one.
Consider the equation of a straight line:
𝑦 = 𝛽0 + 𝛽1* 𝑥
Here, 𝛽0 is the y-intercept
𝛽1 is the slope of the line
x is the value of the x coordinate
y is the value of the prediction
Now to predict the odds of success, we use the following formula:
Exponentiating both sides, we have:
Let Y = e 𝛽0+𝛽1 * 𝑥
Then p(x) / 1 – p(x) = Y
p(x) = Y(1 – p(x))
p(x) = Y – Y(p(x))
p(x) + Y(p(x)) = Y
p(x)(1+Y) = Y
p(x) = Y / 1+Y
The equation of the sigmoid function is:
The sigmoid curve obtained from the above equation is as follows:
Applications of Logistic Regression
- Using the logistic regression algorithm, banks can predict whether a customer would default on loans or not
- To predict the weather conditions of a certain place (sunny, windy, rainy, humid, etc.)
- Ecommerce companies can identify buyers if they are likely to purchase a certain product
- Companies can predict whether they will gain or lose money in the next quarter, year, or month based on their current performance
- To classify objects based on their features and attributes
The assumption in a Logistic Regression Algorithm
- In a binary logistic regression, the dependent variable must be binary
- For a binary regression, the factor level one of the dependent variables should represent the desired outcome
- Only meaningful variables should be included
- The independent variables should be independent of each other. This means the model should have little or no multicollinearity
- The independent variables are linearly related to the log odds
- Logistic regression requires quite large sample sizes
Predict the Digits in Images Using a Logistic Regression Classifier in Python
We’ll be using the digits dataset in the sci-kit learn library to predict digit values from images using the logistic regression model in Python.
- Importing libraries and their associated methods
- Determining the total number of images and labels
- Displaying some of the images and their labels
- Dividing the dataset into “training” and “test” set
- Importing the logistic regression model
- Making an instance of the model and training it
- Predicting the output of the first element of the test set
- Predicting the output of the first 10 elements of the test set
- Prediction for the entire dataset
- Determining the accuracy of the model
- Representing the confusion matrix in a heat map
- Presenting predictions and actual output
The images above depict the actual numbers and the predicted digit values from our logistic regression model.
We anticipated that this essay would help you become familiar with the fundamentals of logistic regression and supervised learning. We discussed the logistic regression algorithm in depth using a complex case. The list of assumptions you should make to develop a logistic regression model was then discussed, after which we looked at the various applications of logistic regression. Finally, to predict the digits in photos, we developed a model using the logistic regression approach.