Predicting Loan Approval Chances Using Machine Learning

Yonathan Levy
Mar 8, 2024

In the last U.S. census, in 2020, the US counted 331.4 million residents, almost 80% of them aged 18 or older. Combined with the fact that Americans hold 84 million mortgages, that’s about 1 mortgage for every 3 American adults. Mortgages make up a substantial part of American debt, with about $12.14 trillion owed! That’s more than the GDPs of Germany, France and the UK combined.

According to Bankrate, the average balance on a first mortgage reached a record high of $323,780 in 2022. Including property taxes and homeowners insurance, the average monthly mortgage payment is $2,883 on a 30-year fixed mortgage and $3,759 on a 15-year fixed mortgage.

With most Americans living paycheck to paycheck, understanding how to improve the chances of loan approval is essential for one’s “financial health”. Moreover, banks often offer lower mortgage rates to customers with high credit scores.

The data

My project aimed both to understand the factors taken into account when approving or refusing a loan and to predict future decisions. My dataset comprised data on 4,269 loan seekers. Besides my target variable, whether the loan was approved or refused, the dataset included the following information about each applicant:

  • Number of dependents of the applicant.
  • Education, either graduate or not graduate.
  • Whether the applicant is self-employed or not (employees, unemployed, students and retirees are all counted as “not self-employed”).
  • Annual income.
  • Requested loan amount and loan term in years.
  • Applicant’s credit score.
  • Assets belonging to the applicant, divided into 4 asset classes — residential, commercial, luxury and bank.
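To make the list above concrete, here is a minimal sketch of how one applicant record could be represented and read from a CSV file. The column names, the `LoanApplicant` class and the two sample rows are hypothetical illustrations, not the actual dataset’s schema or values.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LoanApplicant:
    """One loan seeker (hypothetical field names)."""
    dependents: int
    graduate: bool          # education: graduate or not graduate
    self_employed: bool     # employees, unemployed, students, retirees -> False
    annual_income: float
    loan_amount: float      # requested loan amount
    loan_term_years: int
    credit_score: int       # CIBIL score, 300 to 900
    residential_assets: float
    commercial_assets: float
    luxury_assets: float
    bank_assets: float
    approved: bool          # target variable

    def total_assets(self) -> float:
        """Sum of the four asset classes."""
        return (self.residential_assets + self.commercial_assets
                + self.luxury_assets + self.bank_assets)


def parse_row(row: dict) -> LoanApplicant:
    """Convert one CSV row (all values are strings) into a typed record."""
    return LoanApplicant(
        dependents=int(row["dependents"]),
        graduate=row["education"].strip() == "Graduate",
        self_employed=row["self_employed"].strip() == "Yes",
        annual_income=float(row["annual_income"]),
        loan_amount=float(row["loan_amount"]),
        loan_term_years=int(row["loan_term"]),
        credit_score=int(row["credit_score"]),
        residential_assets=float(row["residential_assets"]),
        commercial_assets=float(row["commercial_assets"]),
        luxury_assets=float(row["luxury_assets"]),
        bank_assets=float(row["bank_assets"]),
        approved=row["loan_status"].strip() == "Approved",
    )


def load_applicants(fh) -> list:
    """Read every row of an open CSV file into LoanApplicant records."""
    return [parse_row(row) for row in csv.DictReader(fh)]


# Two made-up rows, for illustration only.
SAMPLE_CSV = """dependents,education,self_employed,annual_income,loan_amount,loan_term,credit_score,residential_assets,commercial_assets,luxury_assets,bank_assets,loan_status
2,Graduate,No,900000,2500000,10,760,1000000,500000,2000000,300000,Approved
4,Not Graduate,Yes,400000,1500000,16,420,200000,0,600000,100000,Rejected
"""

applicants = load_applicants(io.StringIO(SAMPLE_CSV))
for a in applicants:
    print(a.credit_score, a.approved, a.total_assets())
```

Keeping each row as a typed record means the string-to-number conversion happens once, in `parse_row`, rather than scattered across the analysis.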

Should you increase your requested loan?

The data suggests a linear correlation between the requested loan amount and the chances of approval: the higher the loan, the higher the approval chances.

Before rushing to the bank to ask for a Guinness-record-breaking loan, you should know that there is an almost perfect correlation between an individual’s income and their requested mortgage, so the loan amount may largely be standing in for income. Perhaps this can be explained by the fact that high-income individuals tend to buy homes in more expensive neighborhoods to secure a better future for their children.
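“Linear correlation” here presumably refers to Pearson’s correlation coefficient, which measures how close two variables are to a straight-line relationship (1 is perfect, 0 is none, -1 is perfectly inverse). Below is a minimal sketch using only the standard library; the `pearson` function name and the income and loan figures are made up for illustration. A value near 1 between income and loan amount is what allows the loan amount to act as a stand-in for income when it is compared with approval.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up incomes and requested loans (same currency unit), for illustration.
incomes = [300, 500, 700, 900, 1100]
loans = [1000, 1600, 2200, 2900, 3400]
print(round(pearson(incomes, loans), 3))
```

A high coefficient only says the two move together; it does not say which one drives approval.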

Also, from the raw data alone it might seem that rejected loan applicants have more dependents: approved applicants have about 2.5 dependents, while rejected applicants have about 3. However, measuring the strength of the association between the number of dependents and loan approval shows little to no effect. Perhaps rejected applicants share some other “thing” that approved applicants lack. More specifically, the two groups tend to differ in other amounts as well.
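The article does not say which measure was used to check this strength. One common choice for a numeric variable against a yes/no outcome is the point-biserial correlation, which is numerically equal to Pearson’s r with the outcome coded as 0/1. It shows why a gap in group means can still amount to a weak association: the gap is divided by the overall spread of the variable. A sketch under that assumption; the function name and the data are hypothetical and made up.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between numeric values and 0/1 labels.

    Numerically equal to Pearson's r with the labels coded as 0 and 1.
    """
    n = len(values)
    if n != len(labels):
        raise ValueError("values and labels must have the same length")
    group1 = [v for v, y in zip(values, labels) if y]
    group0 = [v for v, y in zip(values, labels) if not y]
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")
    mean_all = sum(values) / n
    # Population standard deviation of all values (divide by n, not n - 1).
    std_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if std_all == 0:
        raise ValueError("values are constant")
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    p = len(group1) / n  # share of approved applicants
    return (mean1 - mean0) / std_all * math.sqrt(p * (1 - p))


# Made-up example: number of dependents and approval (1 = approved).
dependents = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
approved = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(point_biserial(dependents, approved), 3))
```

In this toy data the rejected group has about half a dependent more on average, yet the coefficient stays small and negative because dependents range widely within both groups.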

Queen of the debt kingdom

The next factor in loan approval might as well be called “riskiness”, or: “can we trust the applicant’s ability to pay us back eventually?” The applicant’s credit score accounts for about 80% of mortgage approval considerations(!).

There is no single standard metric for calculating credit scores; every institution calculates its own a little differently. The dataset I chose uses CIBIL scores, which range from 300 to 900. Under this method, a score of 300–499 is considered “poor”, 500–649 “average”, 650–749 “good”, and 750–900, the highest group, “excellent”.
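The banding above can be written as a small lookup. `cibil_band` is a hypothetical helper that simply encodes the ranges listed in this article; actual lenders may use different cut-offs.

```python
def cibil_band(score: int) -> str:
    """Map a CIBIL score (300-900) to the bands used in this article."""
    if not 300 <= score <= 900:
        raise ValueError(f"CIBIL scores range from 300 to 900, got {score}")
    if score <= 499:
        return "poor"
    if score <= 649:
        return "average"
    if score <= 749:
        return "good"
    return "excellent"


for s in (420, 600, 720, 810):
    print(s, cibil_band(s))
```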

The method takes several factors into account, the first two of which make up about two thirds of the final score:

  • Repayment history: the most important factor, contributing 35% to the score.
  • Credit balance and utilisation (about 30% of the score): the total credit available and how much of it has already been used. Customers who spend more than they have are considered risky borrowers.
  • Length of credit history: older adults, and people who got their first credit card as soon as they could, benefit from this time factor. Banks tend to believe that if you were a responsible customer in the past, you will remain one in the future.
  • New credit: people who frequently open new lines of credit are seen as “credit-hungry” by lenders, who worry that the applicant may default on the money.
  • Credit mix: all kinds of credit already taken, such as loans and debt of all shapes and sizes.

Approved applicants have a median credit score of 720, with the middle 50% of them scoring roughly 600 to 800, much higher than rejected applicants. The median score of the rejected group is only a little above 400.
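The “middle 50%” figure is the interquartile range: the span between the 25th and 75th percentiles. Here is a sketch of computing it, together with the median, using Python’s `statistics` module; the `summarize_scores` name and the score lists are made up, not the project’s data.

```python
import statistics


def summarize_scores(scores):
    """Median and interquartile range (the middle 50%) of a list of scores."""
    # "inclusive" treats the lowest and highest scores as the 0th and 100th
    # percentiles, i.e. the list is taken as the whole group.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"median": statistics.median(scores), "q1": q1, "q3": q3}


# Made-up scores, for illustration only.
approved_scores = [600, 650, 720, 780, 800]
rejected_scores = [350, 390, 410, 430, 560]
print("approved:", summarize_scores(approved_scores))
print("rejected:", summarize_scores(rejected_scores))
```

Medians and quartiles are less sensitive to a few extreme scores than the mean, which is why they suit a comparison like this one.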

In other words, poor credit leads to poor approval chances, while good or excellent credit can help get your next loan approved.

Other, more minor factors

Credit score is a major factor, but not the only one that determines loan approval. Other factors include assets the lender can fall back on if your loan defaults. However, as we have seen, owning assets will not automatically guarantee your next loan.

Approved loans are, on average, about 2 years shorter than rejected loans. About half of approved loans have terms between 4.5 and 16 years, and a 10-year loan might be ideal.

Note that most loans are too large to be repaid within 10 years. That is why you should pay attention to your credit score before applying for a loan.

Using technology to determine loan approval

In 2024, some banks let customers apply for loans from their couch. Other banks might ask you to sit down for a meeting, but the final decision will still rest on the algorithm the bank has built.

Using machine learning, I built 4 models to predict whether a loan should be approved or not. I’ll leave my GitHub page here as well.
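Which four models were built is not listed here. As a hedged illustration of the general idea only, here is a from-scratch logistic regression, one common baseline for approve/reject decisions, trained on a single made-up feature (credit score). The names `train_logistic` and `predict_proba` are hypothetical; in practice a library such as scikit-learn’s `LogisticRegression` would usually be used instead.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one applicant: predicted probability - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that applicant x is approved."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up training data: credit score only, 1 = approved, 0 = rejected.
scores = [350, 400, 450, 500, 550, 650, 700, 750, 800, 850]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Standardize the feature so gradient descent behaves well.
mean = sum(scores) / len(scores)
std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
X = [[(s - mean) / std] for s in scores]

w, b = train_logistic(X, labels)
for s in (380, 620, 820):
    p = predict_proba(w, b, [(s - mean) / std])
    print(s, round(p, 3))
```

With real data one would add the other features (income, loan amount and term, assets), hold out a test set, and compare several models rather than rely on a single one.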

Conclusions

If you are American, or anyone else who might take out a loan in the near future, you should take one very important factor into account: your financial behavior. If you pay your credit card bills on time and repay any loans you have, then you are a responsible customer, and you might be eligible for reduced rates.

P.S.

This guide was written about my data science project. The information and data in this article do not constitute a recommendation regarding loans specifically or any other financial matter in general, nor are they a substitute for personalized advice that takes into account the individual’s data and needs.
