I have always been fascinated by financial modeling because the stakes are usually huge! While people tend to vilify banks, the truth remains that the world is currently facing a savings glut, and banks are under immense pressure to lend (read: low interest rates). In this project-blog, I look only at the peer-to-peer lending company LendingClub. LendingClub's platform enables borrowers to obtain loans, and investors to purchase notes backed by the payments made on those loans. Since it started in 2006, it has grown exponentially to become the largest peer-to-peer lending platform of its kind. Personally, I found it interesting because it's a great example of a company earning profits by playing within the rules of our free-market economic system. It also goes without saying that LendingClub grew up in the aftermath of the 2008 financial crisis.
Timeline of Loans Funded by LendingClub
Interest Rates versus Loan Grades
US States by Mean Interest Rates
LendingClub's data is publicly available so that investors can judge how well it manages its portfolio. As a result, I expected the data to be well suited to modeling: the 2007-2015 data covers about $13.1 billion in loans. During this period, the average loan amount was $14,742, and a total of 887,449 loans were funded. Because the dataset is larger than usual, loading it directly into memory in a Jupyter Notebook (Python) is likely to crash a typical computer. It takes some experience to recognize the boundary between ordinary data and Big Data, and with an array of 887,449 rows by 135 features, we are beginning to enter that territory. To deal with the size, I inserted the data into a PostgreSQL database hosted on my Amazon AWS server, so that I could query it remotely using Python's SQLAlchemy module. Note that this was done only after cleaning the data.
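To illustrate why a database helps, here is a minimal sketch of letting the database do the aggregation so that only a handful of summary rows ever reach Python. It assumes a hypothetical `loans` table with `issue_year` and `loan_amnt` columns filled with made-up values, and it uses Python's built-in `sqlite3` in memory purely as a local stand-in for the remote PostgreSQL server. With SQLAlchemy, one would instead build an engine with `create_engine` from a PostgreSQL connection URL and send the same SQL.

```python
import sqlite3

# Local stand-in for the remote PostgreSQL database. The table, its columns,
# and the rows are hypothetical toy values, not the real LendingClub data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (issue_year INTEGER, loan_amnt REAL)")
conn.executemany(
    "INSERT INTO loans (issue_year, loan_amnt) VALUES (?, ?)",
    [(2013, 10000.0), (2013, 20000.0), (2014, 15000.0), (2015, 5000.0), (2015, 25000.0)],
)

# Aggregate inside the database: one summary row per year comes back,
# instead of every loan with all of its columns.
yearly = conn.execute(
    """
    SELECT issue_year, COUNT(*), SUM(loan_amnt), AVG(loan_amnt)
    FROM loans
    GROUP BY issue_year
    ORDER BY issue_year
    """
).fetchall()

# When row-level data is needed, stream it in fixed-size batches rather than all at once.
cursor = conn.execute("SELECT loan_amnt FROM loans")
running_total = 0.0
while True:
    batch = cursor.fetchmany(2)  # a realistic batch size would be in the thousands
    if not batch:
        break
    running_total += sum(amount for (amount,) in batch)

print(yearly)
print(running_total)
```

The same pattern (aggregate in SQL, stream the rest in batches) keeps memory use bounded regardless of how many rows the table holds.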
A few clear trends emerged: a) lending standards have loosened since late 2014 (read: lower interest rates), and total loan volume has increased every year since 2007; b) with a few exceptions, lending standards are broadly uniform across all 50 states; c) the default rate is strongly correlated with the grade of the loan being funded, which in turn is correlated with credit score, debt-to-income ratio, and number of delinquencies; d) homeowners are much less likely to default than renters.
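Trends c) and d) both come down to computing a default rate per group. A minimal standard-library sketch follows, using a handful of made-up records. The tuple layout `(grade, home_ownership, defaulted)` and the `OWN`/`MORTGAGE`/`RENT` labels are assumptions for illustration, not necessarily the dataset's exact schema.

```python
from collections import defaultdict

# Hypothetical toy records: (loan grade, home-ownership status, defaulted 0/1).
loans = [
    ("A", "OWN", 0), ("A", "RENT", 0), ("A", "MORTGAGE", 0), ("A", "RENT", 1),
    ("C", "OWN", 0), ("C", "RENT", 1), ("C", "MORTGAGE", 0), ("C", "RENT", 1),
    ("E", "OWN", 0), ("E", "RENT", 1), ("E", "MORTGAGE", 1), ("E", "RENT", 1),
]

def default_rate_by(records, key_index):
    """Return {group: fraction of loans that defaulted}, grouped on column key_index."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number of defaults, number of loans]
    for row in records:
        group = row[key_index]
        counts[group][0] += row[2]
        counts[group][1] += 1
    return {group: d / n for group, (d, n) in sorted(counts.items())}

print(default_rate_by(loans, 0))  # by loan grade
print(default_rate_by(loans, 1))  # by home-ownership status
```

On the real data, the same grouping would more naturally be a SQL `GROUP BY` or a pandas `groupby`, but the quantity being compared is the same.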
After exploring the data, I wanted to develop a model that could predict the probability that a borrower would stop making payments. Since classification models typically produce a binary output, I needed each model to output predicted probabilities instead. With probabilities in hand, I could then apply different thresholds to loosen or tighten the lending standards and observe the effect on profits/revenues. Yet this approach still requires selecting the best model. One could look at accuracy, recall (also called sensitivity or True Positive Rate, TPR), precision (also called Positive Predictive Value, PPV), the False Positive Rate (FPR), etc. However, these all depend on a single chosen threshold, so a better way to compare the models' predicted probabilities is to examine their ROC (Receiver Operating Characteristic) curves and the corresponding AUC (Area Under the Curve) values, which summarize performance across all thresholds.
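To show what an ROC curve and its AUC actually summarize, here is a from-scratch sketch on made-up scores. In practice one would likely call scikit-learn's `roc_curve` and `roc_auc_score` on the output of a model's `predict_proba`; the helper names `roc_points` and `auc` below are hypothetical and only illustrate the idea. Each distinct predicted probability is tried as a threshold, and the resulting (FPR, TPR) pairs trace the curve.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs as the decision threshold sweeps from high to low.

    Assumes both classes are present (otherwise a rate would divide by zero).
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Sort by descending score; each distinct score acts as a candidate threshold.
    pairs = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all examples tied at this score are counted.
        if i == len(pairs) - 1 or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical outcomes (1 = defaulted) and predicted default probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of defaulters above non-defaulters, which puts the 0.68 reported below in context.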
ROC Curves for the Loan Default (Churn) Data
As seen in the table above, Logistic Regression achieved the best AUC value of 0.68, with Gradient Boosting and Linear SVC close behind. Note that these values were obtained after tuning many different parameters for each model, and that the features had to be scaled (normalized) for kNN, Logistic Regression, Linear SVC, and SVC.
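Scaling matters for kNN (which relies on distances) and for SVC and regularized Logistic Regression (whose penalties and margins depend on feature magnitudes), while tree ensembles such as Gradient Boosting are largely insensitive to it. Below is a minimal standard-library sketch of z-score standardization, which is what scikit-learn's `StandardScaler` does. The feature choice and values are hypothetical, and `fit_scaler`/`transform` are illustrative names, not a library API.

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Learn a (mean, std) pair per feature from the training rows only."""
    params = []
    for column in zip(*rows):
        std = pstdev(column)
        # Guard against constant features, which would otherwise divide by zero.
        params.append((fmean(column), std if std > 0 else 1.0))
    return params

def transform(rows, params):
    """Standardize each feature to zero mean and unit variance using fitted params."""
    return [[(x - m) / s for x, (m, s) in zip(row, params)] for row in rows]

# Hypothetical training rows: [annual income, debt-to-income ratio, FICO score].
train = [
    [40000.0, 12.0, 680.0],
    [85000.0, 25.0, 720.0],
    [60000.0, 5.0, 760.0],
]
params = fit_scaler(train)
scaled_train = transform(train, params)

# New applicants are transformed with the training statistics, never refit,
# so no information from evaluation data leaks into the scaler.
scaled_new = transform([[55000.0, 18.0, 700.0]], params)
print(scaled_train)
print(scaled_new)
```

Without this step, income (in the tens of thousands) would dominate any distance or penalty computed alongside a debt-to-income ratio in the tens.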
Next, I wanted to pickle my Logistic Regression model and develop a Flask app to evaluate a borrower's chance of getting funded. But I still had not set an optimal threshold for my model. It was easy to see that a stricter threshold would reduce the default rate but also decrease the total volume of loans funded. Similarly, a more lenient threshold would increase the total volume of loans funded but also increase the default rate. At this point, one could get into further microeconomic complexities with this dataset. By playing with thresholds and the corresponding recall values, I came to understand the kind of modeling that must have enabled LendingClub's success. There is an inherent limit to how well any individual's chance of default can be predicted. But by grouping borrowers into low-risk and high-risk bundles, LendingClub mitigates risk by charging higher interest rates to high-risk borrowers. Thus, any room to improve profits/revenues is likely to be narrow, because LendingClub's own models already fit the data tightly.
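The threshold trade-off can be made concrete with a short sketch. It assumes a convention that a loan is approved when its predicted default probability is below a cutoff, so a stricter standard means a lower cutoff. The function `lending_tradeoff` and all numbers are hypothetical.

```python
def lending_tradeoff(probs, defaulted, amounts, cutoff):
    """Approve loans whose predicted default probability is below `cutoff`.

    Returns (total approved volume, default rate among approved loans).
    """
    approved = [(d, a) for p, d, a in zip(probs, defaulted, amounts) if p < cutoff]
    volume = sum(a for _, a in approved)
    rate = sum(d for d, _ in approved) / len(approved) if approved else 0.0
    return volume, rate

# Hypothetical predicted default probabilities, actual outcomes, and loan amounts.
probs = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70]
defaulted = [0, 0, 0, 1, 0, 1]
amounts = [10000, 15000, 12000, 8000, 20000, 5000]

# From strict (low cutoff) to lenient (approve everyone).
for cutoff in (0.15, 0.4, 1.0):
    print(cutoff, lending_tradeoff(probs, defaulted, amounts, cutoff))
```

Moving from the strict to the lenient cutoff raises approved volume from 25,000 to 70,000 but also raises the default rate among approved loans from 0 to one in three, which is exactly the tension described above.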
A Snapshot of a Flask-Based Tool to Re-evaluate a Customer's Chance of Getting Funded
Define Thresholds for Lending Standards
Maximize Recall for Defaulters (Ideal range: 0.29-0.70)
Maximize F-1 for Regular Borrowers (Ideal range: 0.89-0.68)
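The two settings above use different positive classes: recall is computed for defaulters, while F1 is computed for regular borrowers. A minimal sketch of both at a single threshold follows. The data and helper names are hypothetical; scikit-learn's `recall_score` and `f1_score` with the `pos_label` argument compute the same quantities.

```python
def counts(y_true, y_pred, positive):
    """Return (TP, FP, FN), treating `positive` as the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def recall(y_true, y_pred, positive):
    """Fraction of actual `positive` cases that were predicted as such."""
    tp, _, fn = counts(y_true, y_pred, positive)
    return tp / (tp + fn) if tp + fn else 0.0

def f1(y_true, y_pred, positive):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn = counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical outcomes (1 = defaulted) and predicted default probabilities.
y_true = [0, 0, 0, 1, 0, 1]
probs = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70]
threshold = 0.4  # flag as a likely defaulter when probability >= threshold
y_pred = [int(p >= threshold) for p in probs]

defaulter_recall = recall(y_true, y_pred, positive=1)  # share of defaulters caught
regular_f1 = f1(y_true, y_pred, positive=0)            # quality on regular borrowers
print(defaulter_recall, regular_f1)
```

Lowering the threshold catches more defaulters (higher recall) at the cost of rejecting more regular borrowers (lower F1 for that class), which is consistent with the paired ranges listed above.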
For example, when the lending standard is tightened to about 29% recall for defaulters in our Logistic Regression model, initial analysis indicates significant savings. However, these values should be taken with a grain of salt, because any change in the default rate or market conditions would significantly sway the results one way or the other:
Savings from Decreased Default (~29% Recall): $588 million
Loss from Stricter Lending (~29% Recall): $578 million
Net Increase in Profits (~29% Recall): $10 million
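The three figures above are related by a simple subtraction. The short sketch below reproduces that and also computes how much larger the loss would have to be to wipe out the gain, which illustrates why the result should be taken with a grain of salt. Treating the two figures as independent inputs is a simplifying assumption, and the variable names are mine.

```python
# Figures from the ~29% recall scenario above, in US dollars.
savings_from_fewer_defaults = 588e6
loss_from_stricter_lending = 578e6

net_gain = savings_from_fewer_defaults - loss_from_stricter_lending

# Relative increase in the loss that would reduce the net gain to zero.
break_even_increase = savings_from_fewer_defaults / loss_from_stricter_lending - 1

print(f"Net gain: ${net_gain:,.0f}")
print(f"Break-even increase in loss: {break_even_increase:.2%}")
```

A net gain of $10 million on flows of nearly $600 million leaves a margin of under 2%, so a small shift in either estimate could flip the sign.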
In conclusion, for specific values of recall/threshold, one can estimate savings for LendingClub. However, because the models are a tight fit, any increase in the overall default rate is likely to lead to significant losses. Estimating savings or losses precisely would require accounting for a much wider variety of economic preferences than the given data captures.