How to Do Logistic Regression in Excel (Step-by-Step Guide)

Logistic regression is one of the most effective tools for analyzing binary outcomes, such as forecasting whether a consumer will purchase a product (Yes/No) or whether a student will pass an exam (Pass/Fail). Although Excel does not have a built-in function for logistic regression, you can fit the model with the Solver Add-in. This step-by-step guide shows how to do logistic regression in Excel. By the end, you will be able to estimate how the independent variables affect a binary dependent variable and to predict probabilities.

Key Takeaways

Logistic regression relates multiple independent variables and a binary dependent variable.
➤ The initial coefficient values are chosen arbitrarily.
Logit, n: =c0+(c1*Income)+(c2*Credit Score)+(c3*Loan Amount)
Exponent: =e^n
P(n): =IF(dependent variable outcome=1,e^n/(1+e^n),1-(e^n/(1+e^n)))
Log likelihood: =LN(P(n))
Enable Solver: File >> Options >> Add-ins >> Excel Add-ins >> Go >> Solver Add-in >> OK
Solver: Data tab >> Analyze >> Solver >> Set Objective >> To: Max >> By Changing Variable Cells >> Uncheck Make Unconstrained Variables Non-Negative >> GRG Nonlinear >> Solve
➤ The resulting coefficients give the probability of y=1. Reverse their signs to get the probability of y=0.

Logistic regression in Excel

In this article, we’ll learn about logistic regression and how to perform logistic regression in Excel using the Solver Add-in.

What Is Logistic Regression in Statistics?

Logistic regression models the relationship between a set of independent variables (predictors) and a binary dependent variable (outcome) that can take only two possible values, such as “yes” or “no”, “pass” or “fail”, or “1” and “0”.

Instead of fitting a line, as in linear regression, logistic regression uses the logistic function to model the probability that an event will occur. The mathematical expression is given below:

P(y) = e^(c0 + c1x1 + c2x2 + c3x3) / (1 + e^(c0 + c1x1 + c2x2 + c3x3))

where,

  • P(y) is the probability of success (y=1) or failure (y=0)
  • c0 is the intercept
  • c1, c2, c3 are the coefficients
  • x1, x2, x3 are the independent variables
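The logistic function described in this section can be sketched in a few lines of Python. This is only an illustration: the function name `logistic_probability` and the sample numbers are assumptions made for the example, not part of the workbook.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """Return the modeled probability for one observation."""
    # Linear part: c0 + c1*x1 + c2*x2 + ...
    n = intercept + sum(c * x for c, x in zip(coefficients, values))
    # Logistic function: e^n / (1 + e^n), always between 0 and 1
    return math.exp(n) / (1 + math.exp(n))

# Made-up coefficients and inputs, only to show the call.
p = logistic_probability(0.5, [1.0, -2.0], [1.5, 0.5])
```

However large or small the linear part becomes, the result stays between 0 and 1, which is why this function suits a binary outcome better than a straight line does.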

Performing Logistic Regression in Excel

Consider a loan default dataset containing the income, credit score, loan amount, and whether the borrower defaulted, in columns A through D.

Loan default dataset for performing logistic regression in Excel

Here, the independent variables are the income, credit score, and loan amount. Defaulted is the dependent variable, where 1 means yes (defaulted) and 0 means no (not defaulted).

Let’s use the above dataset to model how income, credit score, and loan amount affect the chance of a default. Our objective is to determine the logistic regression equation that best predicts default based on the other factors.

Steps:

➤ Choose arbitrary starting values for the coefficients c0, c1, c2, and c3 in cells C19:C22. For instance, we’ve set all of them to 0.01. Excel’s Solver Add-in will optimize these values later on.

Choosing arbitrary values for the coefficients

➤ You’ll need a few helper columns to perform the logistic regression. The first is the logit, which we call n. Its formula is given below.

n = c0 + (c1*Income) + (c2*Credit Score) + (c3*Loan Amount)

➤ Select the output cell (E2) and enter the formula. Use the Fill Handle tool to copy the formula.

=$C$19+$C$20*A2+$C$21*B2+$C$22*C2

Applying the logit formula and using Fill Handle tool

➤ Calculate the values of e^n in column F and copy the formula to the cells below.

=EXP(E2)

Calculating exponential values

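➤ The formula below gives the probability of the outcome observed in each row: the modeled probability of default when Defaulted is 1, and its complement when Defaulted is 0.

P(n) = e^n / (1 + e^n) if the outcome is 1, otherwise 1 - e^n / (1 + e^n)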

➤ Apply the formula to get the probabilities and autofill the cells below.

=IF(D2=1,F2/(1+F2),1-(F2/(1+F2)))

Calculating the probabilities

➤ The log likelihood of each row is the natural logarithm of its probability.

=LN(G2)

Log likelihood

➤ Use the SUM function in cell H17 to calculate the total log likelihood.

=SUM(H2:H16)

Total log likelihood
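To see what the helper columns compute, here is a rough Python sketch of the same chain: logit, exponent, probability, and log likelihood. The three data rows are invented for illustration and are not the workbook's values. The name `row_log_likelihood` is also an assumption.

```python
import math

# Hypothetical rows: (income, credit score, loan amount, defaulted).
rows = [
    (45, 610, 30, 1),
    (80, 720, 20, 0),
    (60, 680, 25, 0),
]

# Arbitrary starting coefficients, as in the steps above.
c0, c1, c2, c3 = 0.01, 0.01, 0.01, 0.01

def row_log_likelihood(income, score, loan, defaulted):
    n = c0 + c1 * income + c2 * score + c3 * loan        # Logit column
    e_n = math.exp(n)                                    # Exponent column
    p_default = e_n / (1 + e_n)
    p = p_default if defaulted == 1 else 1 - p_default   # P(n) column
    return math.log(p)                                   # Log likelihood column

# Equivalent of the SUM over the log likelihood column.
total_log_likelihood = sum(row_log_likelihood(*r) for r in rows)
```

Each row's log likelihood is negative or zero because the probability never exceeds 1. The closer the total is to zero, the better the coefficients explain the observed outcomes.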

Now let’s use the Solver Add-in to perform the logistic regression in Excel. Note that you must activate the Solver Add-in before use. Follow these steps to activate this tool.

➤ Click on the File tab.

File tab

➤ Select Options. You can also use the shortcut Alt + F + T to open Excel Options.

Excel Options

➤ Select Add-ins >> Choose Excel Add-ins from the dropdown >> Go.

Excel add-ins

➤ Check the Solver Add-in option >> OK.

Activating Solver add-in

Once activated, the Solver Add-in stays available in the Data tab for every workbook you open.

➤ Go to the Data tab >> Analyze >> Solver.

Locating solver in the Data tab

  • Set Objective: Select the H17 cell containing the sum of log likelihoods
  • To: Max
  • By Changing Variable Cells: Choose the regression coefficients in C19:C22
  • Make Unconstrained Variables Non-Negative: Uncheck this option
  • Select a Solving Method: GRG Nonlinear
  • Click Solve

Solver inputs

➤ Select Keep Solver Solution >> OK.

Keeping solver solution for the coefficients in logistic regression in Excel
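Conceptually, Solver adjusts the coefficients until the total log likelihood stops improving. The sketch below does the same on a small invented dataset, but with plain gradient ascent rather than Excel's GRG Nonlinear method, so treat it as an illustration of the idea and not as a reproduction of Solver. The data, the function names, the step size, and the iteration count are all assumptions.

```python
import math

# Hypothetical data: (x1, x2, outcome). The point (2.0, 1.0) appears with
# both outcomes, so no straight boundary can split the 0s from the 1s.
data = [
    (0.5, 1.2, 0), (1.0, 0.8, 0), (2.0, 1.0, 0), (2.0, 1.0, 1),
    (1.5, 1.9, 1), (3.0, 0.9, 0), (3.5, 1.1, 1), (4.0, 1.7, 1),
]

def sigmoid(n):
    return 1 / (1 + math.exp(-n))

def total_log_likelihood(c):
    """Sum of LN(probability of the observed outcome) over all rows."""
    total = 0.0
    for x1, x2, y in data:
        p1 = sigmoid(c[0] + c[1] * x1 + c[2] * x2)
        total += math.log(p1 if y == 1 else 1 - p1)
    return total

def fit(steps=5000, rate=0.01):
    """Maximize the total log likelihood by gradient ascent."""
    c = [0.01, 0.01, 0.01]  # arbitrary starting values
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, y in data:
            err = y - sigmoid(c[0] + c[1] * x1 + c[2] * x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        # Step uphill along the gradient of the log likelihood.
        c = [ci + rate * gi for ci, gi in zip(c, grad)]
    return c

coefficients = fit()
```

Comparing `total_log_likelihood(coefficients)` with the value at the starting coefficients shows the same improvement you see in the objective cell after Solver finishes.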

➤ Solver returns the following coefficients, which model the probability of y=1 (default).

  • c0 = -113.58
  • c1 = -1.251
  • c2 = 0.2181
  • c3 = 0.7209

Coefficients for positive outcome in logistic regression in Excel

➤ To model the probability of y=0 (no default) instead, just reverse the signs of the coefficients, i.e.

  • c0 = 113.58
  • c1 = 1.251
  • c2 = -0.2181
  • c3 = -0.7209

Coefficients for negative outcome in logistic regression in Excel
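The sign reversal works because negating every coefficient negates the logit n, and the logistic function of -n equals one minus the logistic function of n. A short Python check of that identity, using arbitrary logit values chosen only for illustration:

```python
import math

def logistic(n):
    return math.exp(n) / (1 + math.exp(n))

for n in (-3.0, -0.5, 0.0, 2.0):
    p_yes = logistic(n)    # probability of y=1 at logit n
    p_no = logistic(-n)    # same model with every coefficient negated
    # The two probabilities are complements of each other.
    assert abs((1 - p_yes) - p_no) < 1e-12
```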

➤ For example, with a borrower’s income ($52K), credit score (678), and loan amount ($33K) entered in cells D24:D26, the probability that the borrower will default is:

=EXP(C19+C20*D24+C21*D25+C22*D26)/(1+EXP(C19+C20*D24+C21*D25+C22*D26))

Since the probability is close to zero, the model predicts that this borrower is unlikely to default on the loan.

Results of logistic regression in Excel
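As a cross-check of the prediction formula, the same calculation can be sketched in Python with the coefficients reported for y=1. The sketch assumes that income and loan amount are entered in thousands (52 and 33), which the article's figures suggest but do not state outright.

```python
import math

# Coefficients reported for y=1 in this article.
c0, c1, c2, c3 = -113.58, -1.251, 0.2181, 0.7209

# Assumed inputs: income and loan amount in thousands, raw credit score.
income, credit_score, loan_amount = 52, 678, 33

n = c0 + c1 * income + c2 * credit_score + c3 * loan_amount
p_default = math.exp(n) / (1 + math.exp(n))
print(round(n, 4), p_default)
```

Under that assumption the logit comes out at about -6.97, which puts the default probability below 0.1%, in line with the conclusion that this borrower is unlikely to default.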


Frequently Asked Questions

What is logistic regression, and when should I use it?

Logistic regression models the relationship between multiple independent variables and a binary dependent variable. You can use logistic regression when the dependent variable can take only two possible values, like “yes” or “no”, “pass” or “fail”, or “1” and “0”.

What is the formula for logistic regression?

The formula for logistic regression models the probability of the binary dependent variable as:

P(n) = e^n / (1 + e^n), where n = c0 + c1x1 + c2x2 + c3x3 + … + ckxk for k independent variables

What are common errors when performing logistic regression in Excel?

  • Solver not converging: This can occur if initial guesses for coefficients are far from the optimal values.
  • Perfect separation: Logistic regression struggles when a predictor perfectly separates the outcome (e.g., all 1s or 0s for a range of predictor values).
  • Small sample size: Logistic regression requires sufficient data to provide reliable estimates.
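Perfect separation is easy to miss by eye. The following Python sketch checks whether a single predictor splits the outcomes with one threshold. The helper name `perfectly_separates` and the sample values are hypothetical. It only tests one predictor at a time, so it will not detect separation that arises from a combination of predictors.

```python
def perfectly_separates(values, outcomes):
    """True if one threshold on `values` splits the 0s from the 1s."""
    zeros = [v for v, y in zip(values, outcomes) if y == 0]
    ones = [v for v, y in zip(values, outcomes) if y == 1]
    if not zeros or not ones:
        return True  # only one outcome present, so there is nothing to fit
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# Made-up credit scores and default flags.
separated = perfectly_separates([600, 650, 700, 750], [1, 1, 0, 0])
overlapping = perfectly_separates([600, 650, 700, 750], [1, 0, 1, 0])
```

When a check like this returns True, the coefficients that Solver finds tend to grow very large, because pushing them further always increases the log likelihood a little more.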

Wrapping Up

In this tutorial, we’ve learned about logistic regression and covered detailed steps on how to do logistic regression in Excel using the Solver Add-in. Feel free to download the practice file and share your thoughts with us.

ExcelInsider Team