Logistic Regression
Learn all about Logistic Regression in this comprehensive tutorial.
- Logistic regression aims to solve classification problems.
- In Python we have modules that will do the work for us.
- In logistic regression the coefficient is the expected change in log-odds of having the outcome per unit change in X.
- The coefficient and intercept values can be used to find the probability that each tumor is cancerous.
- To find the log-odds for each observation, we must first create a formula that looks similar to the one from linear regression, extracting the coefficient and the intercept.
Logistic regression aims to solve classification problems. It does this by predicting categorical outcomes, unlike linear regression, which predicts a continuous outcome.
In the simplest case there are two outcomes, which is called binomial; an example is predicting whether a tumor is malignant or benign. When there are more than two outcomes to classify, it is called multinomial. A common example of multinomial logistic regression is predicting the species of an iris flower from among 3 different species.
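As a brief illustration of the multinomial case, the sketch below fits a logistic regression to the iris dataset that ships with scikit-learn. The variable names and the choice of `max_iter=1000` are assumptions made for this example, not part of the original tutorial.

```python
from sklearn import datasets, linear_model

# Load the bundled iris data: 4 measurements per flower, 3 species labels (0, 1, 2).
iris = datasets.load_iris()
X_iris, y_iris = iris.data, iris.target

# max_iter is raised above the default only to give the solver room to converge.
iris_model = linear_model.LogisticRegression(max_iter=1000)
iris_model.fit(X_iris, y_iris)

# One probability per species for the first flower in the dataset.
first_flower_probs = iris_model.predict_proba(X_iris[:1])
print(iris_model.classes_)
print(first_flower_probs)
```

With three classes, the model returns one probability per class for each observation instead of a single yes/no probability.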
Here we will be using basic logistic regression to predict a binomial variable. This means it has only two possible outcomes.
How does it work?
In Python we have modules that will do the work for us. Start by importing the NumPy module.
Store the independent variables in X.
Store the dependent variable in y.
Below is a sample dataset:
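The original listing is not reproduced here, so the following is a minimal sketch of what such a dataset could look like. Only the sizes 3.78, 2.44 and 2.09 appear in this tutorial's results; the remaining sizes and all of the labels are assumed values chosen for illustration.

```python
import numpy as np

# Tumor sizes. Only the first three values appear in the tutorial text;
# the rest are assumed for illustration.
# reshape(-1, 1) turns the row into a single column, because scikit-learn
# expects a 2-D array with one row per observation.
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)

# Assumed labels: 0 = not cancerous, 1 = cancerous.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

print(X.shape, y.shape)
```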
We will use a class from the sklearn module, so we will have to import that module as well:
From the sklearn module we will use the LogisticRegression class to create a logistic regression object.
This object has a method called fit() that takes the independent and dependent values as parameters and fills the regression object with data that describes the relationship:
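A minimal sketch of creating and fitting the model might look like this. The dataset values are assumed for illustration, and the variable name `logr` is a choice made for this example.

```python
import numpy as np
from sklearn import linear_model

# Assumed illustrative data: tumor sizes and labels (0 = not cancerous, 1 = cancerous).
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Create the logistic regression object and fit it to the data.
logr = linear_model.LogisticRegression()
logr.fit(X, y)

# After fitting, the learned parameters are stored on the object.
print(logr.coef_, logr.intercept_)
```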
Now we have a logistic regression object that is ready to predict whether a tumor is cancerous based on the tumor size:
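The prediction step can be sketched as follows, again using assumed illustrative data. The `predict()` method expects a 2-D array, so the single size is wrapped in a nested list.

```python
import numpy as np
from sklearn import linear_model

# Assumed illustrative data: tumor sizes and labels (0 = not cancerous, 1 = cancerous).
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Predict the class for a tumor of size 3.46 (one row, one feature).
predicted = logr.predict(np.array([[3.46]]))
print(predicted)
```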
We have predicted that a tumor with a size of 3.46mm will not be cancerous.
Coefficient
In logistic regression the coefficient is the expected change in log-odds of having the outcome per unit change in X.
This is not very intuitive, so let's use it to compute something that makes more sense: the odds. Exponentiating the coefficient gives the factor by which the odds change per unit change in X.
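Converting the coefficient from log-odds to odds can be sketched as below. The data is assumed for illustration, so the exact number printed depends on those values.

```python
import numpy as np
from sklearn import linear_model

# Assumed illustrative data: tumor sizes and labels (0 = not cancerous, 1 = cancerous).
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# The coefficient is a change in log-odds; exponentiating it gives
# the multiplicative change in odds per unit increase in size.
log_odds = logr.coef_
odds = np.exp(log_odds)
print(odds)
```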
This tells us that as the size of a tumor increases by 1mm, the odds of it being cancerous increase by a factor of 4.
Probability
The coefficient and intercept values can be used to find the probability that each tumor is cancerous.
Create a function that uses the model's coefficient and intercept values to return a new value. This new value represents the probability that the given observation is a cancerous tumor:
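One way such a function could be written is sketched here. The name `logit2prob` and the dataset values are assumptions made for this example.

```python
import numpy as np
from sklearn import linear_model

def logit2prob(logr, x):
    """Return the predicted probability of class 1 for the value(s) in x."""
    log_odds = logr.coef_ * x + logr.intercept_   # linear formula
    odds = np.exp(log_odds)                       # log-odds -> odds
    probability = odds / (1 + odds)               # odds -> probability
    return probability

# Assumed illustrative data: tumor sizes and labels (0 = not cancerous, 1 = cancerous).
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Probability for a single tumor size.
single_prob = logit2prob(logr, 3.78)
print(single_prob)
```

The result should agree with the second column of scikit-learn's own `predict_proba()`, which is a convenient way to check the function.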
Function Explained
To find the log-odds for each observation, we must first create a formula that looks similar to the one from linear regression, extracting the coefficient and the intercept.
To then convert the log-odds to odds we must exponentiate the log-odds.
Now that we have the odds, we can convert them to a probability by dividing the odds by 1 plus the odds.
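The three steps can be written out as formulas. The symbols are notation assumed for this sketch: \beta_1 stands for the coefficient, \beta_0 for the intercept, and x for the tumor size.

```latex
\begin{align*}
\text{log-odds} &= \beta_0 + \beta_1 x \\
\text{odds} &= e^{\text{log-odds}} = e^{\beta_0 + \beta_1 x} \\
p &= \frac{\text{odds}}{1 + \text{odds}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\end{align*}
```

The last expression is the logistic (sigmoid) function, which is where logistic regression gets its name.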
Let us now use the function with what we have learned to find out the probability that each tumor is cancerous.
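Applying the function to every observation could look like the sketch below. The function name `logit2prob` and the dataset values are assumptions for this example, so the printed probabilities depend on that assumed data.

```python
import numpy as np
from sklearn import linear_model

def logit2prob(logr, x):
    """Return the predicted probability of class 1 for the value(s) in x."""
    log_odds = logr.coef_ * x + logr.intercept_
    odds = np.exp(log_odds)
    return odds / (1 + odds)

# Assumed illustrative data: tumor sizes and labels (0 = not cancerous, 1 = cancerous).
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65,
              4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# One probability per tumor, in the same order as X.
probs = logit2prob(logr, X)
for size, p in zip(X.ravel(), probs.ravel()):
    print(f"{size:.2f}  {p:.2f}")
```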
Results Explained
Size 3.78, probability 0.61: The probability that a tumor with the size 3.78mm is cancerous is 61%.
Size 2.44, probability 0.19: The probability that a tumor with the size 2.44mm is cancerous is 19%.
Size 2.09, probability 0.13: The probability that a tumor with the size 2.09mm is cancerous is 13%.