Developing a credit scoring model using social network analysis
Student thesis: Doctoral Thesis
This research examines the effects of adding social network attributes to credit scoring. Many lenders have realised the potential of borrowers with thin credit files, who lack a sufficient credit history. Because lenders know little about such borrowers, they face an information asymmetry problem; to overcome it, there has been a trend towards examining borrowers' behaviour. In many cases, this behaviour is influenced by peers within borrowers' social circles, an influence explained in sociology and network science by the concept of homophily. This research has two aims: first, to reduce information asymmetry and, second, to increase financial inclusion. These aims are pursued by finding meaningful information in the social data of unbanked and underbanked people and measuring how such data affect their credit scores; examples of such data are network types and network sizes. Nine exploratory in-depth interviews were conducted with professional bankers and regulators to explore the effects of social networks on borrowers' loan performance. Additionally, a dataset of loans issued by a European lender to 307,000 borrowers was used to confirm and explain the effects of social network attributes on credit scoring. Within this dataset, alternative data consisting of social and behavioural artefacts were identified, as were the traditional data used by financial institutions. A Mann-Whitney U test revealed that, at the 1% significance level, bad social network types are significantly higher in the sample of defaulters than in the sample of transactors, that is, borrowers who repay their loans. Thereafter, a preliminary tree-based Bayesian analysis was carried out, and a logistic regression model was applied to the dataset as a machine learning technique. The results show that one of the two social network types tested, defaulting ties, has a significant relationship with the probability of default and, accordingly, with borrowers' credit scores.
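The group comparison described above can be illustrated with a minimal one-sided Mann-Whitney U test written against the Python standard library. This is a sketch under stated assumptions, not the thesis's code: the function names `rank_with_ties` and `mann_whitney_u` are hypothetical, the test uses the large-sample normal approximation with a tie correction, and the per-borrower counts of defaulting ties are invented toy data. For these toy samples the statistic is U = 112.5 out of a possible 10 × 12 = 120.

```python
import math
from statistics import NormalDist


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """One-sided Mann-Whitney U test of H1: values in x tend to be larger than in y.

    Returns (U_x, z, p). Uses the normal approximation with a tie correction
    and no continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x

    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())

    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_one_sided = 1 - NormalDist().cdf(z)
    return u_x, z, p_one_sided


# Hypothetical counts of "defaulting ties" per borrower (illustrative only).
defaulters = [3, 2, 4, 1, 3, 5, 2, 4, 3, 2]
repayers = [0, 1, 0, 2, 1, 0, 1, 0, 2, 1, 0, 1]

u, z, p = mann_whitney_u(defaulters, repayers)
print(f"U = {u}, z = {z:.2f}, one-sided p = {p:.5f}")
print("reject H0 at the 1% level" if p < 0.01 else "fail to reject H0 at the 1% level")
```

In practice, `scipy.stats.mannwhitneyu(x, y, alternative='greater')` provides the same one-sided test; SciPy may choose an exact or asymptotic method depending on sample size and ties, so its p-value can differ slightly from this approximation.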
This variable had a coefficient of 0.22 in two trials in which social data were added separately to the financial data and to the behavioural data, and 0.18 in the final trial, which combined all types of data. The area under the curve (AUC) produced by the model was 0.58. To evaluate the applicability of social data in lending practice, the dataset with the greatest explanatory power, which included the social network variables, was tested with machine learning classification algorithms, and the XGBoost classifier achieved an accuracy of 0.68. Empirically, this research contributes to the understanding of credit scores built on new variables (i.e. social network types). Finally, the study provides a theoretical framework and industry evidence on when social data become important, and justifies selecting a social credit score in such cases.
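To make the reported figures concrete, the following sketch shows how a logistic regression coefficient is usually read and how an AUC is computed. It assumes that the 0.22 and 0.18 values are log-odds coefficients from a standard logistic regression with the defaulting-ties variable in its original units, which the abstract does not state explicitly; the helper names `odds_ratio` and `roc_auc` and the toy scores are hypothetical. Under that reading, a coefficient of 0.22 corresponds to an odds ratio of exp(0.22) ≈ 1.25 (roughly 25% higher odds of default per one-unit increase, other variables held constant), and 0.18 to about 1.20. An AUC of 0.5 corresponds to ranking borrowers at random and 1.0 to perfect separation.

```python
import math


def odds_ratio(coef):
    """Odds multiplier per one-unit increase in a predictor, given its log-odds coefficient."""
    return math.exp(coef)


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    Pairwise O(n_pos * n_neg) version, written for clarity rather than speed.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Coefficients reported in the abstract, read as log-odds coefficients (assumption).
for coef in (0.22, 0.18):
    print(f"coefficient {coef}: odds ratio {odds_ratio(coef):.3f}")

# Hypothetical predicted default probabilities (label 1 = defaulter).
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(f"toy AUC = {roc_auc(scores, labels):.3f}")
```

The pairwise loop is quadratic in the number of borrowers; for a dataset of hundreds of thousands of loans, a library routine such as scikit-learn's `roc_auc_score` (true labels first, scores second) would normally be used instead.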