CN117172910A - Credit evaluation method and device based on EBM model, electronic equipment and storage medium - Google Patents

Publication number
CN117172910A
Authority
CN
China
Prior art keywords
ebm
model
data
credit
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311196623.6A
Other languages
Chinese (zh)
Inventor
刘佳明
张雪妹
王刘安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Beijing Technology and Business University
Original Assignee
Beijing Institute of Technology BIT
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT, Beijing Technology and Business University
Priority to CN202311196623.6A
Publication of CN117172910A

Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

A credit assessment method, device, electronic device, and storage medium based on an EBM model. The credit assessment method comprises: acquiring first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted; and inputting the first data into an EBM model and acquiring a prediction result output by the EBM model, the prediction result including a first result indicating whether the object to be predicted will default after obtaining a loan. The training process of the EBM model comprises: obtaining a credit-risk-related data set, each sample of which includes a plurality of features; and training the EBM model based on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions. The invention can more accurately predict whether a borrower will default.

Description

Credit assessment method, device, electronic equipment, and storage medium based on an EBM model

Technical field

The present invention relates to the technical field of credit assessment, and in particular to a credit assessment method, device, electronic equipment, and storage medium based on an EBM model.

Background

Industries such as third-party payment platforms are gradually gaining ground, and banks are promoting personal consumer credit. This business generates income but also carries operating risk: personal consumption such as home mortgages, car loans, and bank cards all depends on credit protection. Because the personal credit system is still being improved, the assets and economic circumstances of many credit customers are unclear, which creates opportunities for borrowers with a weak sense of integrity. This risk is borne largely by commercial banks. Loan defaults undermine the stable operation of commercial banks and may in turn lead to more serious systemic financial risk. Most current research on credit risk assessment models pursues predictive accuracy but neglects the interpretability of decisions; accuracy and interpretability are often hard to achieve together, giving rise to an accuracy-interpretability dilemma. There is therefore an urgent need to study the key factors that lead borrowers to default and how those factors influence one another.

The information disclosed in this Background section is intended only to enhance understanding of the general background of the invention, and should not be taken as an acknowledgment or any form of suggestion that this information constitutes prior art already known to those skilled in the art.

Summary of the invention

In view of the problems in the prior art, the present invention provides a credit assessment method, device, electronic equipment, and storage medium based on an EBM model.

The technical solution of the present invention provides a credit assessment method based on an EBM model, the method comprising:

obtaining first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;

inputting the first data into an EBM model and obtaining a prediction result output by the EBM model, the prediction result including a first result, the first result indicating whether the object to be predicted will default after obtaining a loan;

wherein the training process of the EBM model comprises:

obtaining a credit-risk-related data set, each sample of the data set including a plurality of features;

training the EBM model based on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single feature and an interaction-effect shape function is a shape function of two different features.

Optionally, the prediction result further includes a second result, the second result including the degree of influence of the plurality of features on the first result, which is used to analyze why the first result was predicted.

Optionally, obtaining a credit-risk-related data set, each sample of which includes a plurality of features, further comprises:

obtaining a credit-risk-related sample set, each sample of the sample set including a plurality of items;

screening the plurality of items with the Lasso algorithm, and taking the items that pass the screening as the plurality of features.

Optionally, obtaining a credit-risk-related sample set further comprises:

obtaining personal credit data disclosed by banks;

preprocessing the personal credit data to form the sample set;

wherein the items included in each sample of the sample set include at least two of the following:

current loan interest rate, company type, years of employment, home ownership, review status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, cumulative amount of early repayments by the borrower, job type, number of mortgage accounts, education type, number of family members, number of repayment months of the loan, and whether the borrower defaulted.

Optionally, the preprocessing includes at least one of the following:

missing value processing, outlier processing, data balancing, and data normalization.

Optionally, the formula of the Lasso algorithm is:

where y is the vector indicating whether each borrower defaulted, X is the matrix of the plurality of items, β is the coefficient vector, N is the number of samples, and α is the regularization strength hyperparameter.

Optionally, the training process of the EBM model further comprises:

generating a global explanation after training of the EBM model is complete;

wherein the global explanation includes the importance of each feature, and/or the functional relationship between each feature and the first result.

The technical solution of the present invention further provides a credit assessment device based on an EBM model, the device comprising:

an acquisition module, configured to obtain first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;

a prediction module, configured to input the first data into an EBM model and obtain a prediction result output by the EBM model, the prediction result including a first result, the first result indicating whether the object to be predicted will default after obtaining a loan;

wherein the training process of the EBM model comprises:

obtaining a credit-risk-related data set, each sample of the data set including a plurality of features;

training the EBM model based on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single feature and an interaction-effect shape function is a shape function of two different features.

The technical solution of the present invention further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the EBM-model-based credit assessment method described in any of the above.

The technical solution of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the EBM-model-based credit assessment method described in any of the above.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.

Description of the drawings

To illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic flow chart of a credit assessment method based on an EBM model provided by an embodiment of the present invention;

Figure 2 is a schematic structural diagram of a credit assessment device based on an EBM model provided by an embodiment of the present invention;

Figure 3 is a schematic diagram of the data flow of a credit risk assessment and interpretation system based on an EBM model provided by an embodiment of the present invention;

Figure 4 is a ranking chart of feature importance provided by an embodiment of the present invention;

Figure 5 is a schematic diagram of the shape function between the "current loan interest rate" feature and the target variable, together with the distribution of the feature's own values, provided by an embodiment of the present invention;

Figure 6 is a schematic diagram of the local explanation of one sample in the data set of an embodiment of the present invention;

Figure 7 is a schematic diagram of the physical structure of an electronic device provided by the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

The EBM-model-based credit assessment method provided by the embodiments of this application is described in detail below, with reference to the drawings, through specific embodiments and their application scenarios.

Figure 1 is a schematic flow chart of a credit assessment method based on an EBM model provided by an embodiment of the present invention. As shown in Figure 1, the technical solution of the present invention provides a credit assessment method based on an EBM (Explainable Boosting Machine) model, the method including the following steps:

S110. Obtain first data of the object to be predicted, the first data being credit-risk-related data of the object to be predicted;

S120. Input the first data into the EBM model and obtain the prediction result output by the EBM model, the prediction result including a first result, the first result indicating whether the object to be predicted will default after obtaining a loan;

The training process of the EBM model includes:

obtaining a credit-risk-related data set, each sample of the data set including a plurality of features;

training the EBM model based on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single feature and an interaction-effect shape function is a shape function of two different features.
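The additive structure described above can be written compactly. The following is a sketch in the usual notation for generalized additive models with pairwise interactions; the link function $g$, the intercept $\beta_0$, and the index symbols are assumptions introduced here for illustration and do not appear in the source text.

```latex
g\bigl(\mathbb{E}[y]\bigr) = \beta_0 + \sum_{i} f_i(x_i) + \sum_{i < j} f_{ij}(x_i, x_j)
```

Here each $f_i$ plays the role of a main-effect shape function of a single feature and each $f_{ij}$ the role of an interaction-effect shape function of two different features. For a binary default label, $g$ would typically be the logit, so the sum on the right is a log-odds score.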

The EBM model described above improves the accuracy of loan default prediction for the object to be predicted (the borrower).

Optionally, the prediction result further includes a second result, the second result including the degree of influence of the plurality of features on the first result, which is used to analyze why the first result was predicted. On this basis, the object to be predicted or the approving officer can be told the main factors that prevent the loan from being granted.
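A minimal sketch of how such per-feature influence could be read off an additive model, assuming each shape function is stored as a binned lookup table and the model output is a log-odds score. The feature names, bin edges, contribution values, and intercept below are hypothetical and are not taken from the source.

```python
import math

# Hypothetical shape functions, stored as (upper_bin_edge, contribution) pairs.
# Feature names, bin edges, and scores are illustrative, not from the source.
SHAPE_FUNCTIONS = {
    "interest_rate": [(8.0, -0.6), (12.0, 0.1), (float("inf"), 0.9)],
    "years_employed": [(2.0, 0.4), (10.0, -0.1), (float("inf"), -0.5)],
}
INTERCEPT = -1.0  # hypothetical baseline log-odds of default


def lookup(bins, value):
    """Return the contribution of the first bin whose upper edge is >= value."""
    for upper_edge, contribution in bins:
        if value <= upper_edge:
            return contribution
    raise ValueError("value not covered by any bin")


def explain(sample):
    """Return (default probability, per-feature contributions sorted by magnitude)."""
    contributions = {
        name: lookup(bins, sample[name]) for name, bins in SHAPE_FUNCTIONS.items()
    }
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


probability, ranked = explain({"interest_rate": 15.0, "years_employed": 1.0})
```

Because the score is a plain sum, each feature's contribution can be reported on its own; the ranked list corresponds to the "degree of influence" that would be shown to the borrower or the approving officer.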

Optionally, obtaining a credit-risk-related data set, each sample of which includes a plurality of features, further comprises:

obtaining a credit-risk-related sample set, each sample of the sample set including a plurality of items;

screening the plurality of items with the Lasso (least absolute shrinkage and selection operator) algorithm, and taking the items that pass the screening as the plurality of features.

This item screening controls which features are allowed to affect the result, and thereby controls both the computational cost of the whole algorithm and the specific items that influence the prediction.

Optionally, obtaining a credit-risk-related sample set further comprises:

obtaining personal credit data disclosed by banks;

preprocessing the personal credit data to form the sample set;

wherein the items included in each sample of the sample set include at least two of the following:

current loan interest rate, company type, years of employment, home ownership, review status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, cumulative amount of early repayments by the borrower, job type, number of mortgage accounts, education type, number of family members, number of repayment months of the loan, and whether the borrower defaulted. Whether the borrower defaulted is the target variable; the others are explanatory variables.

Optionally, the preprocessing includes at least one of the following:

missing value processing, outlier processing, data balancing, and data normalization.

Optionally, the formula of the Lasso algorithm is:

where y is the vector indicating whether each borrower defaulted, X is the matrix of the plurality of items, β is the coefficient vector, N is the number of samples, and α is the regularization strength hyperparameter.

During fitting, the sparsity of the items is controlled by adjusting the regularization term α||β||_1. The optimal regularization strength hyperparameter can be found with methods such as cross-validation or grid search.
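A form of the Lasso objective consistent with the symbols defined above would be the following. The scaling factor $1/(2N)$ on the squared-error term is an assumption (it matches the objective documented for scikit-learn's `Lasso`); the source may scale that term differently.

```latex
\min_{\beta}\; \frac{1}{2N}\,\lVert y - X\beta \rVert_2^2 + \alpha\,\lVert \beta \rVert_1
```

Under this form, a larger $\alpha$ places more weight on $\lVert \beta \rVert_1$ and drives more coefficients to exactly zero, which is what makes the method usable for item screening.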

From the trained Lasso regression model, the coefficients of all items are obtained and sorted by absolute value in descending order. A suitable threshold is then determined from prior knowledge and practical requirements, and the items whose coefficients exceed the threshold are retained as the selected features.
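The ranking-and-threshold step can be sketched as follows. The coefficient values, item names, and threshold are hypothetical, and comparing the absolute value of each coefficient against the threshold is an assumption made here because the items are ranked by absolute value.

```python
def select_features(coefficients, threshold):
    """Rank items by |coefficient| (descending) and keep those above threshold."""
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    # Assumption: the threshold is applied to the absolute value of each coefficient.
    return [name for name, coef in ranked if abs(coef) > threshold]


# Hypothetical coefficients from a fitted Lasso model (names and values are illustrative).
coefficients = {
    "interest_rate": 0.42,
    "years_employed": -0.18,
    "family_size": 0.0,
    "early_repayments": -0.31,
    "education_type": 0.03,
}
selected = select_features(coefficients, threshold=0.05)
```

With these illustrative numbers, the item with a zero coefficient and the item with a near-zero coefficient are dropped, and the remaining three are returned in order of influence.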

Optionally, the training process of the EBM model further comprises:

generating a global explanation after training of the EBM model is complete;

wherein the global explanation includes the importance of each feature, and/or the functional relationship between each feature and the first result.

The global explanation makes it possible to keep adjusting the EBM model so that its emphasis better matches user needs.

The EBM-model-based credit assessment device provided by the present invention is described below. The device described below and the method described above may be read with reference to each other. Note that the device described here includes virtual devices running on program-executing equipment such as computers and processors.

Figure 2 is a schematic structural diagram of a credit assessment device based on an EBM model provided by an embodiment of the present invention. As shown in Figure 2, the technical solution of the present invention further provides a credit assessment device based on an EBM model, the device including:

an acquisition module, configured to obtain first data of the object to be predicted, the first data being credit-risk-related data of the object to be predicted;

a prediction module, configured to input the first data into the EBM model and obtain the prediction result output by the EBM model, the prediction result including a first result, the first result indicating whether the object to be predicted will default after obtaining a loan;

The training process of the EBM model includes:

obtaining a credit-risk-related data set, each sample of the data set including a plurality of features;

training the EBM model based on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single feature and an interaction-effect shape function is a shape function of two different features.

This embodiment can more accurately predict whether a borrower will default.

In another embodiment, the present invention proposes a credit risk assessment and interpretation method based on an EBM model, used by financial institutions to predict the probability of credit default so as to lower loan risk and reduce losses. The method includes the following steps:

Step 1): collect and preprocess data from credit-risk-related data sources, the preprocessing including missing value processing, outlier processing, data balancing, and data normalization, to obtain a preprocessed data set;

Step 2): apply Lasso feature screening to the feature data of the preprocessed data set, retain the selected features, and use the selected feature data of the preprocessed data set as the data set;

Step 3): input the credit risk data into the interpretable EBM model, which learns and selects important features from the data, as well as interactions in the data, and generates interpretable prediction results;

Step 4): based on the prediction results of the EBM model, evaluate and classify the borrower's credit risk, and output the borrower's default probability together with evaluation metrics such as prediction accuracy;

Step 5): based on information such as the global explanation and the local explanation (i.e., the second result) of the EBM model, interpret and analyze the credit risk assessment results, and output information such as the borrower's credit risk factors, credit risk trends, and credit risk recommendations.

The present invention first collects real borrower data from bank lending business, preprocesses the data, performs Lasso-based feature selection, and then applies the EBM method to monitor and identify credit risk; the approach was verified experimentally. The experimental results show that the invention performs well in predicting defaults on bank personal loans, has good predictive robustness, and provides a reasonable explanation of the key factors behind borrower default and the influence of each factor.

Further, in step 1):

According to the information-collection needs of the financial institution's actual lending business, the relevant feature data are collected at the bank. The collected information includes explanatory variables covering three main aspects: basic personal information, repayment ability, and willingness to repay. Basic personal information consists mainly of features such as age, occupation, and education; repayment ability consists mainly of features such as assets, salary, and social relationships; willingness to repay is assessed mainly by whether the person has a corresponding default record, the number of early repayments, and the cumulative repayment amount.

Specifically, the explanatory variables are the current loan interest rate, company type, years of employment, home ownership, review status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, cumulative amount of early repayments by the borrower, job type, number of mortgage accounts, education type, number of family members, number of repayment months of the loan, and so on.

Finally, there is a target variable that indicates whether the borrower defaulted.

The set of explanatory variables is written X = {x1, x2, ..., xM}, where M is the number of explanatory variables. The label of the target variable is written Y = {0, 1}, where 0 means no default and 1 means a default occurred.

Missing value processing step: because samples with missing values account for only 9% of all samples, deleting them directly still leaves a large data set; 9142 records were ultimately retained.

Outlier processing step: because of recording errors during information collection and similar causes, clearly anomalous samples appear in the data set. To prevent abnormal data from interfering with default judgment, the interquartile range (IQR) method is used to screen and adjust each feature in the original data set. The quartiles split the sorted data into four equal parts: the first quartile (Q1, 25th percentile), the second quartile (Q2, 50th percentile), and the third quartile (Q3, 75th percentile). The IQR is the difference between Q3 and Q1 and describes the spread of the middle 50% of the data. The data are sorted in ascending order and Q1 and Q3 are computed: Q1 is the value at the 25% position and Q3 is the value at the 75% position. The IQR is then given by IQR = Q3 - Q1. Outliers are defined as values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. Data points beyond these bounds are treated as outliers and deleted.
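The IQR rule above can be sketched with the Python standard library. The column name and sample values are hypothetical, and the choice of the "inclusive" quantile method (linear interpolation between sorted values) is an assumption, since the source does not say how the 25% and 75% positions are interpolated.

```python
import statistics


def iqr_bounds(values):
    """Return (lower, upper) outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Assumption: quartiles are computed with the "inclusive" interpolation method.
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def drop_outlier_rows(rows, column):
    """Delete rows whose value in `column` lies outside the IQR fences."""
    lower, upper = iqr_bounds([row[column] for row in rows])
    return [row for row in rows if lower <= row[column] <= upper]


# Illustrative data: one clearly anomalous value (100).
rows = [{"interest_rate": r} for r in [1, 2, 3, 4, 5, 6, 7, 8, 100]]
cleaned = drop_outlier_rows(rows, "interest_rate")
```

In this toy column Q1 is 3 and Q3 is 7, so the fences are -3 and 13 and only the row with the value 100 is removed.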

Data balancing step: after missing value and outlier processing, the data still suffer from severe class imbalance, which seriously affects fitting of the EBM model. SMOTE (Synthetic Minority Over-sampling Technique) is used to balance the data. For each minority-class sample x_i in the original data, a sample x_j is chosen at random from its nearest neighbors; a point is then chosen at random on the line segment between x_i and x_j and taken as a newly synthesized minority-class sample, which is added to the original data set to reduce the imbalance. The new data x_new obtained from the minority samples of the original data set are computed as follows:

x_new = x + rand(0,1) × (x_n - x)

where x_n is a sample chosen at random from the k nearest neighbors of sample x among the minority-class samples;
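A minimal sketch of this interpolation, assuming Euclidean distance for the nearest-neighbor search and a fixed random seed for reproducibility. The function name, the sample points, and the default k are illustrative; this is not a full SMOTE implementation.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        x_n = rng.choice(neighbours)
        gap = rng.random()  # plays the role of rand(0, 1)
        # The new point lies on the segment from x towards x_n.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, x_n)))
    return synthetic


# Illustrative minority-class samples with two features each.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (3.0, 3.0)]
new_points = smote(minority, n_new=5)
```

Every synthetic point is a convex combination of two existing minority samples, so it stays inside the region already occupied by the minority class.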

Normalization step: different explanatory variables have different units, and their ranges differ by orders of magnitude, which can cause some variables to be effectively ignored and distort the results of the analysis. A normalization method is therefore used to replace each feature value in the original data set with its normalized value. The normalized value x′_ij of the i-th feature of the j-th sample in the original data set is computed as follows:

where x_ij is the i-th feature of the j-th sample in the original data set, x_min is the minimum value of the i-th feature column in the original data set, and x_max is the maximum value of the i-th feature column in the original data set;
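Given the definitions of x_min and x_max, the calculation is presumably standard min-max scaling, x′_ij = (x_ij - x_min) / (x_max - x_min); this form is an assumption inferred from the symbol definitions. A minimal sketch follows, with hypothetical data and with constant columns mapped to 0.0 as a further assumption.

```python
def min_max_normalize(rows):
    """Scale every column of a list of equal-length numeric rows to [0, 1]."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has x_max == x_min; map it to 0.0 (an assumption).
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return normalized


# Illustrative rows: an interest rate and a cumulative repayment amount.
rows = [[5.0, 1000.0], [10.0, 3000.0], [15.0, 2000.0]]
scaled = min_max_normalize(rows)
```

After scaling, both columns span the same [0, 1] range even though the raw values differ by two orders of magnitude.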

进一步地,在步骤2)中:Further, in step 2):

收集的原始数据中包含了借款人的解释变量,其中不乏存在对判断是否违约性能较弱的特征,所以这部分内容采用了Lasso的特征选择方法,对原始数据集中的特征进行选择,一方面提高模型的预测效果,另一方面降低计算的复杂度。The collected original data contains explanatory variables of borrowers, many of which have features that are weak in determining whether to default. Therefore, this part of the content uses Lasso's feature selection method to select features in the original data set. On the one hand, it improves The prediction effect of the model, on the other hand, reduces the computational complexity.

Lasso进行特征选择的主要思路是简单线性回归加上L1正则化。在普通的线性回归中,优化目标是最小化预测值与实际值之间的平方误差。而在Lasso中,优化目标是最小化损失函数,该损失函数由两部分组成:平方误差项(类似于线性回归)和L1正则化项。L1正则化项是模型系数的绝对值之和,乘以一个正则化参数λ(lambda)。这一项的存在使得优化过程中模型的系数受到约束,倾向于将一些系数压缩为零。The main idea of Lasso's feature selection is simple linear regression plus L1 regularization. In ordinary linear regression, the optimization goal is to minimize the squared error between the predicted value and the actual value. In Lasso, the optimization goal is to minimize the loss function, which consists of two parts: the squared error term (similar to linear regression) and the L1 regularization term. The L1 regularization term is the sum of the absolute values of the model coefficients, multiplied by a regularization parameter λ (lambda). The existence of this term constrains the coefficients of the model during the optimization process, tending to compress some coefficients to zero.

The Lasso optimization problem can be stated as: minimize (squared error term) + λ·Σ|β|, where the squared error term measures the gap between the model's predictions and the actual values, λ is the regularization parameter, and β is the model's coefficient vector. As λ increases, the coefficients of some features shrink to zero, because the regularization term penalizes the absolute value of each coefficient and so favors solutions that use fewer features. Features whose coefficients become zero are dropped, which is how feature selection is achieved; features with non-zero coefficients are considered more important for explaining the target variable and are retained. The Lasso objective can be written equivalently as:

$$\min_{\beta}\; \frac{1}{2N}\lVert y - X\beta\rVert_2^2 + \lambda \sum_i \lvert\beta_i\rvert$$

where N is the number of samples. The first part of the expression fits the original objective function and the second part is the penalty on the parameters, with penalty coefficient λ ∈ [0, +∞) (written α in claim 6). The smaller λ is, the weaker the penalty and the more feature variables are retained; the larger λ is, the stronger the penalty and the more coefficients are set to 0. In short, by introducing the L1 term and tuning λ, Lasso shrinks the model coefficients and selects features during optimization, reducing model complexity and guarding against overfitting.
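To make the shrink-to-zero behaviour concrete, here is a minimal coordinate-descent sketch of the penalized objective above. It is an illustration only: the source does not say which solver is used, the function names are hypothetical, the toy data is invented, and the sketch assumes centred data so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`; exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimise (1/(2N))*||y - X b||^2 + lam*sum|b_i| by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for i in range(p):
            rho = 0.0  # correlation of feature i with the partial residual
            z = 0.0    # mean squared value of feature i
            for row, target in zip(X, y):
                partial = sum(row[k] * beta[k] for k in range(p) if k != i)
                rho += row[i] * (target - partial)
                z += row[i] ** 2
            rho /= n
            z /= n
            beta[i] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Hypothetical centred toy data: feature 0 drives y, feature 1 is nearly irrelevant.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [i for i, b in enumerate(beta) if b != 0.0]
print(beta, selected)
```

With the penalty at 0.5 the weak feature's coefficient lands exactly on zero and only feature 0 is kept; lowering `lam` toward 0 would let both features survive.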

Further, in step 3):

EBM model training process: the EBM model is an additive model that fits each feature with boosted trees and searches for interaction effects between features. The personal credit data is input into the EBM model; the single features and the second-order (pairwise) interaction features of the model are determined, their respective outputs are computed, and those outputs are summed to give the final prediction.

Identifying the second-order interaction features of the EBM model specifically includes:

The EBM model incorporates boosted trees into a generalized additive model. The structure of the EBM model is:

$$g(E(Y \mid X)) = \beta_0 + \sum_i f_i(x_i) + \sum_{i,j} f_{ij}(x_i, x_j)$$

where β_0 is the intercept, Σ_i f_i(x_i) is the main effect, Σ_{i,j} f_{ij}(x_i, x_j) is the second-order interaction effect, and g(E(Y|X)), with g the link function, expresses how the features affect the model's output.

Given any pair of features (x_i, x_j), the model is trained to obtain the shape function f_{ij}(x_i, x_j); the shape function of the corresponding single feature is f_i(x_i), which is the main effect. The first group of parameters, corresponding to the main effects, is trained first. After the main effects of all features have been trained, the residual is computed, and the second group of parameters, corresponding to the interactions f_{ij}(x_i, x_j), is then trained with the objective of reducing that residual. If x_i and x_j are strongly related, their interaction term f_{ij}(x_i, x_j) reduces the residual substantially.
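The two-stage order described above (main effects first, then interactions fitted on what is left) can be illustrated with a deliberately simplified sketch. The assumptions are significant: per-bin means stand in for boosted trees, each term is fitted in a single pass, the target is treated as a regression target, and the XOR-style toy data is invented. None of this is the applicants' training code.

```python
from collections import defaultdict


def fit_bins(keys, residuals):
    """Piecewise-constant shape function: mean residual for each distinct key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, res in zip(keys, residuals):
        sums[key] += res
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def sse(residuals):
    """Sum of squared residuals."""
    return sum(r * r for r in residuals)


# Hypothetical toy data: y depends only on the interaction of the two features.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
y = [float(a != b) for a, b in rows]

intercept = sum(y) / len(y)
residual = [target - intercept for target in y]

# Stage 1: fit one main-effect shape function f_i per feature on the residual.
main_effects = []
for i in range(2):
    keys = [row[i] for row in rows]
    f_i = fit_bins(keys, residual)
    residual = [res - f_i[key] for res, key in zip(residual, keys)]
    main_effects.append(f_i)
sse_after_main = sse(residual)

# Stage 2: fit the pairwise term f_12 on the residual left by the main effects.
f_12 = fit_bins(rows, residual)
residual = [res - f_12[key] for res, key in zip(residual, rows)]
sse_after_pair = sse(residual)

print(sse_after_main, sse_after_pair)
```

On this toy data the main effects explain nothing, so the whole residual is removed only once the pairwise term is added, which is the situation in which an interaction term is worth keeping.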

For the bivariate function f_{ij}(x_i, x_j) of each feature pair, the FAST algorithm (the fast pairwise-interaction ranking method used with additive models of this kind, not the corner-detection algorithm of the same name) is used to screen the pairs and select the significant second-order interaction terms. The outputs of the single features and of the selected second-order interaction features are then added linearly to give the final prediction of the EBM model.

Further, in step 4):

To benchmark the predictive performance of the EBM model in credit risk assessment, five commonly used models were selected for comparison: decision tree, K-nearest neighbors, XGBoost, logistic regression, and random forest. Accuracy, recall, precision, and F1 score (F1-score) are used as the evaluation criteria.

In credit risk assessment and classification, borrowers are assigned to credit risk categories according to the EBM model's predictions. The difference between the model's predicted values and the true values is also evaluated in order to assess the model's predictive performance. A confusion matrix is commonly used for this; see Table 1 below.

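Table 1. Confusion matrix

                     Predicted Positive      Predicted Negative
Actual Positive      True Positive (TP)      False Negative (FN)
Actual Negative      False Positive (FP)     True Negative (TN)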

Accuracy: the proportion of all observations in the data set that the EBM model classifies correctly, accuracy=(TP+TN)/(TP+TN+FP+FN).

Precision: among the samples that the EBM model predicts as Positive, the proportion that are actually Positive, precision=TP/(TP+FP).

Sensitivity (recall): among the samples in the data set whose true class is Positive, the proportion that the EBM model predicts correctly, recall=TP/(TP+FN).

F1-score: the harmonic mean of precision and recall, F1=2*precision*recall/(precision+recall). Its maximum is 1 and its minimum is 0; the closer to 1, the better. Here TP (True Positive) is the number of samples the model correctly identifies as positive; FN (False Negative) is the number of actually positive samples the model wrongly identifies as negative; FP (False Positive) is the number of actually negative samples the model wrongly identifies as positive; and TN (True Negative) is the number of samples the model correctly identifies as negative.
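The four criteria follow directly from the confusion-matrix counts. The sketch below shows the arithmetic; the function name and the counts are hypothetical and are not taken from the patent's experiments.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    def safe_div(num, den):
        # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a default / non-default classifier.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which is why it is reported alongside accuracy on data sets where defaults are the minority class.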

Further, in step 5):

Based on the structure of the EBM model, credit risk decisions are analyzed from two perspectives: global explanation and local explanation. The global explanation interprets the EBM model's results in terms of the feature variables in the data set. The local explanation takes the prediction for each input sample and analyzes which features and interaction terms influenced it, and how strongly and in which direction. For the shape function f_u of every term of the EBM model, whether a single feature or a second-order interaction feature, a weight is computed; the terms are then sorted by weight to find the most important single features or interaction features. To compute the weight, the EBM model measures f_u by

$$\sqrt{E(f_u^2)}$$

which can be regarded as the standard deviation of f_u. The importance of the intercept term is not considered in the EBM model, so E(f_u) = 0 and therefore

$$\sqrt{E(f_u^2)} = \sqrt{E(f_u^2) - \left(E(f_u)\right)^2} = \sqrt{\mathrm{Var}(f_u)}$$

Thus, once the data has been normalized, the standard deviation is used as the weight that measures the importance of each term. The credit risk assessment results are interpreted and analyzed on this basis, and information such as the borrower's credit risk factors, credit risk trends, and credit risk recommendations is output. Note that step 4) and step 5) are parallel: the result of step 4) does not affect step 5).
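As a small illustration of ranking terms by the standard deviation of their shape-function outputs, consider the sketch below. The term names and per-sample outputs are invented, and the helper `term_importance` is hypothetical; it only shows the arithmetic of the weight described above.

```python
import math


def term_importance(outputs):
    """sqrt(E[f_u^2]) over the samples, after centring so that E[f_u] = 0."""
    mean = sum(outputs) / len(outputs)
    centred = [value - mean for value in outputs]
    return math.sqrt(sum(value * value for value in centred) / len(centred))


# Hypothetical per-sample outputs of three terms (two main effects, one pair).
term_outputs = {
    "interest_rate": [3.0, -3.0, 3.0, -3.0],
    "years_employed": [1.0, -1.0, 1.0, -1.0],
    "interest_rate x years_employed": [2.0, 2.0, -2.0, -2.0],
}

importances = {name: term_importance(vals) for name, vals in term_outputs.items()}
ranking = sorted(importances, key=importances.get, reverse=True)
print(ranking)
```

A term whose output barely varies across borrowers gets a weight near zero regardless of its average level, which matches the decision to leave the intercept out of the ranking.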

Figure 3 is a schematic diagram of the data flow in a credit risk assessment and interpretation system based on the EBM model, provided by an embodiment of the present invention. As shown in Figure 3, the system includes:

a data acquisition module, used to obtain data related to a bank's personal loan business and send it to the data preprocessing module. The feature information covers three areas (basic personal information, repayment ability, and repayment willingness) and mainly includes: current loan interest rate, type of employer, years of employment, home ownership, review status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, cumulative amount of early repayments by the borrower, job type, number of mortgage accounts, education type, number of family members, loan repayment term in months, and whether the borrower defaulted;

a data preprocessing module, used to receive the loan data set sent by the data acquisition module, apply missing-value handling, outlier handling, data balancing, and data normalization to the feature data in the loan data set, and send the preprocessed loan data set to the EBM model training module;

a feature extraction module, which uses Lasso to screen features, retains the selected features, and uses the selected feature data from the preprocessed data set as the data set;

an EBM model training module, used to receive the preprocessed loan data set sent by the data preprocessing module, train and validate on the data, automatically learn and select important features and their interactions from the data, and generate interpretable predictions;

a credit risk assessment and classification module, used to receive the predictions produced by the EBM model training and validation module, compare them with the true outcomes in the data set, and assess and classify the borrower's credit risk;

a credit risk interpretation and analysis module, used to receive the predictions produced by the EBM model training and validation module, and to interpret and analyze the credit risk assessment results using the EBM model's global and local explanations;

a user interface module, used to receive the predictions produced by the EBM model training and validation module, the performance evaluation produced by the credit risk assessment and classification module, and the explanations produced by the credit risk interpretation and analysis module, to present the credit risk assessment and interpretation results to the user, and to provide interactive functions.

To verify the performance of the method in the EBM-model-based credit risk assessment and interpretation system, an empirical experiment was carried out on a real loan data set collected from Zhongyuan Bank. The sample comprises 15,216 records of defaulted and non-defaulted loans.

The present invention provides a credit risk assessment and interpretation method based on the EBM model. The EBM model is an interpretable machine learning model that, by selecting and weighting key evidence, can predict a borrower's credit risk more accurately. Compared with the other machine learning methods tested, it gives the best results in accuracy, precision, recall, and sensitivity. This helps reduce the risk of false alarms and missed defaults, and so improves lending decisions.

By using the EBM model for credit risk assessment and interpretation, the present invention can make full use of customers' credit data and improve the accuracy and reliability of credit risk assessment. The invention can also output the customer's credit rating from the EBM model, identify the factors that influence credit risk and the degree of their influence, and explain why a given borrower is assigned to a particular credit risk category. This interpretability helps financial institutions understand the basis for the model's decisions and gives customers transparency, increasing trust.

The present invention can also adjust the parameters and feature selection of the EBM model flexibly for different scenarios and requirements, enabling personalized credit risk assessment and interpretation for different types of customers. In addition, by continuously updating the EBM model, the invention can dynamically optimize the assessment and interpretation of customer credit risk.

To demonstrate performance empirically, five commonly used machine learning classifiers were selected for comparison with the method proposed by the present invention: decision tree (Decision tree), K-nearest neighbors (KNN), XGBoost, logistic regression (Logistic), and random forest (Random forest). The evaluation metrics are accuracy (Accuracy), recall (Recall, i.e. sensitivity), precision (Precision), and F1-score.

To give an intuitive view of the bank loan data set, sample records from it are shown in Table 2 below:

Table 2

The columns of Table 2 are the nine explanatory variables that Lasso selected as most influential (current loan interest rate, type of employer, years of employment, home ownership, review status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, and cumulative amount of early repayments by the borrower), and y is the target variable (y=1 means default, y=0 means no default).

The data was divided into training and test samples in a 7:3 ratio. The performance of the EBM method of the present invention and of the comparison methods (Decision tree, KNN, XGBoost, Logistic, and Random forest), with training set : test set = 70:30, is shown in Table 3 below:

Table 3

The classification results in Table 3 show that the EBM model achieves the best predictive performance: compared with the other five methods it performs very well on accuracy, recall, precision, and F1 score. Its accuracy reaches 86.6%, nearly 6% higher than that of the other models. In bank lending, it is commonly held that every 1% gain in accuracy can reduce losses by nearly one million.
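The 7:3 division of the 15,216 records mentioned above could be produced along the following lines. The source does not say whether the split was random or stratified, so the seeded random shuffle here is an assumption, as is the function name `split_train_test`.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=0):
    """Shuffle `records` reproducibly and cut them into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Record indices stand in for the 15,216 loan records of the data set.
train, test = split_train_test(range(15216))
print(len(train), len(test))
```

Fixing the seed keeps the comparison between the EBM model and the five baseline models on exactly the same held-out records.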

Global explanation of the data set by the EBM model: after feature selection the data set contains nine feature variables, and the target variable is whether the loan defaults. The EBM model's global explanation of the feature variables has two levels: the ranking of feature importance, and the functional relationship between each feature and the target variable. Figure 4 is the feature importance ranking provided by an embodiment of the present invention. As shown in Figure 4, the global explanation shows how strongly each feature influences the EBM model, helping bank loan officers identify the key factors behind a decision and giving them a more complete picture.

Figure 5 is a schematic diagram, provided by an embodiment of the present invention, of the shape function relating the "current loan interest rate" feature to the target variable, together with the distribution of the feature's own values. As shown in Figure 5, the upper panel shows the relationship between the "current loan interest rate" feature and the target variable, and the lower panel shows the probability distribution of that feature in the data set. The figure shows that the default probability rises gradually as the loan interest rate goes from 0 to 0.5, and falls gradually with increasing interest rate once the rate exceeds 0.7.

The local explanation of the EBM model explains how each feature of a sample contributes to the result, that is, the score of each feature in the sample and how strongly each feature affects the prediction for that sample. Figure 6 is a schematic diagram of the local explanation of one sample in the data set of an embodiment of the present invention. As shown in Figure 6, the true outcome for this sample is 0 and the predicted outcome is also 0. To show why the model makes this prediction, the bar chart gives the individual scores, sorted from top to bottom by degree of influence. The features on the right-hand side, such as "current loan interest rate", "years of employment", "review status", "home ownership", and "type of employer", have a negative effect on the prediction, and the model's prediction is obtained by summing the scores of the individual features. Local explanations help bank loan officers pinpoint the specific reason a borrower's loan was not approved (a loan is not approved when the prediction is that the borrower will default), down to an individual condition, which makes it easier to adjust the decision.
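A local explanation of this kind can be sketched as follows. The per-feature scores and the intercept are invented, and passing the summed score through a logistic (sigmoid) link to obtain a probability is an assumption about how the additive score is converted; the source only states that the prediction comes from adding up the feature scores.

```python
import math


def explain_locally(intercept, contributions):
    """Sum per-term scores, convert to a probability, and rank terms by impact."""
    logit = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # assumed logistic link
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


# Hypothetical scores for one borrower; negative values push toward "no default".
scores = {
    "current loan interest rate": -0.9,
    "years of employment": -0.4,
    "home ownership": -0.2,
    "loan purpose category": 0.3,
}
probability, ranked = explain_locally(intercept=-0.8, contributions=scores)
predicted_label = int(probability >= 0.5)  # 1 = default, 0 = no default

for name, score in ranked:
    print(f"{name:30s} {score:+.2f}")
print(f"default probability: {probability:.4f}, predicted label: {predicted_label}")
```

Sorting by absolute score reproduces the top-to-bottom ordering of a bar chart like the one described for Figure 6, and the sign of each score shows the direction in which the feature moved the prediction.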

The present invention is a credit risk assessment and interpretation method and system based on the EBM model. The method includes: collecting and preprocessing data from credit-risk-related data sources, including the borrower's personal information, loan information, repayment records, credit scores, and so on; inputting the data into the interpretable machine learning model Explainable Boosting Machine (EBM) and using the EBM to generate highly interpretable rules; and using the EBM to assess and predict credit risk and to output explanations of the predictions, so that decision-makers can understand the basis for the model's judgments. Experimental results show that the present invention has good predictive performance and interpretability, and can be widely applied to credit risk assessment and decision support in the financial field.

Figure 7 is a schematic diagram of the physical structure of an electronic device provided by the present invention. As shown in Figure 7, the electronic device may include a processor 710, a communications interface 720, a memory 730, and a communication bus 740, where the processor 710, the communications interface 720, and the memory 730 communicate with one another over the communication bus 740. The processor 710 can call logic instructions in the memory 730 to execute the credit evaluation method based on the EBM model, the method including:

obtaining first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;

inputting the first data into the EBM model and obtaining the prediction result output by the EBM model, the prediction result including a first result, the first result including whether the object to be predicted will default after obtaining a loan;

wherein the training process of the EBM model includes:

obtaining a credit-risk-related data set, each sample of the data set including multiple features;

training the EBM model on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single one of the features and an interaction-effect shape function is a shape function of two different ones of the features.

In addition, the logic instructions in the memory 730 described above may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is able to execute the credit evaluation method based on the EBM model provided by the methods above, the method including:

obtaining first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;

inputting the first data into the EBM model and obtaining the prediction result output by the EBM model, the prediction result including a first result, the first result including whether the object to be predicted will default after obtaining a loan;

wherein the training process of the EBM model includes:

obtaining a credit-risk-related data set, each sample of the data set including multiple features;

training the EBM model on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single one of the features and an interaction-effect shape function is a shape function of two different ones of the features.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program carries out the credit evaluation method based on the EBM model provided above, the method including:

obtaining first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;

inputting the first data into the EBM model and obtaining the prediction result output by the EBM model, the prediction result including a first result, the first result including whether the object to be predicted will default after obtaining a loan;

wherein the training process of the EBM model includes:

obtaining a credit-risk-related data set, each sample of the data set including multiple features;

training the EBM model on the data set, the EBM model being the sum of all main-effect shape functions and all interaction-effect shape functions, where a main-effect shape function is a shape function of a single one of the features and an interaction-effect shape function is a shape function of two different ones of the features.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.

From the description of the embodiments above, those skilled in the art will clearly understand that each embodiment can be implemented by software together with the necessary general-purpose hardware platform, or alternatively by hardware. On this understanding, the above technical solution, in essence, or the part of it that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A credit assessment method based on an EBM model, characterized in that the method comprises:
obtaining first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;
inputting the first data into an EBM model and obtaining a prediction result output by the EBM model, the prediction result comprising a first result, the first result comprising whether the object to be predicted will default after obtaining a loan;
wherein the training process of the EBM model comprises:
obtaining a credit-risk-related data set, each sample of the data set comprising a plurality of features;
training the EBM model based on the data set, the EBM model being the sum of all main effect shape functions and all interaction effect shape functions, each main effect shape function being a shape function of a single one of the features, and each interaction effect shape function being a shape function of two different ones of the features.

2. The credit assessment method based on an EBM model according to claim 1, characterized in that the prediction result further comprises a second result, the second result comprising the degree of influence of the plurality of features on the first result, which is used to analyze why the first result was predicted.

3. The credit assessment method based on an EBM model according to claim 1, characterized in that obtaining the credit-risk-related data set, each sample of which comprises a plurality of features, further comprises:
obtaining a credit-risk-related sample set, each sample of the sample set comprising a plurality of items;
screening the plurality of items with the Lasso algorithm and using the selected items as the plurality of features.

4. The credit assessment method based on an EBM model according to claim 3, characterized in that obtaining the credit-risk-related sample set further comprises:
obtaining personal credit data disclosed by a bank;
preprocessing the personal credit data to form the sample set;
wherein the items included in each sample of the sample set comprise at least two of the following:
current loan interest rate, type of employer, years of employment, whether the borrower owns a home, verification status, loan purpose category, initial listing status of the loan, number of early repayments by the borrower, cumulative amount of early repayments by the borrower, type of job, number of mortgage accounts, type of education, number of family members, repayment term of the loan in months, and whether the borrower defaulted.

5. The credit assessment method based on an EBM model according to claim 4, characterized in that the preprocessing comprises at least one of the following:
missing value processing, outlier processing, data balancing and data normalization.

6. The credit assessment method based on an EBM model according to claim 5, characterized in that the formula of the Lasso algorithm is: [formula not reproduced in the source text], where y is the vector indicating whether each borrower defaulted, X is the matrix of the plurality of items, β is the coefficient vector, N is the number of samples, and α is the regularization strength hyperparameter.

7. The credit assessment method based on an EBM model according to claim 1, characterized in that the training process of the EBM model further comprises:
generating a global explanation after training of the EBM model is completed;
wherein the global explanation comprises the importance of each of the features and/or the functional relationship between each of the features and the first result.

8. A credit assessment device based on an EBM model, characterized in that the device comprises:
an acquisition module, configured to obtain first data of an object to be predicted, the first data being credit-risk-related data of the object to be predicted;
a prediction module, configured to input the first data into an EBM model and obtain a prediction result output by the EBM model, the prediction result comprising a first result, the first result comprising whether the object to be predicted will default after obtaining a loan;
wherein the training process of the EBM model comprises:
obtaining a credit-risk-related data set, each sample of the data set comprising a plurality of features;
training the EBM model based on the data set, the EBM model being the sum of all main effect shape functions and all interaction effect shape functions, each main effect shape function being a shape function of a single feature, and each interaction effect shape function being a shape function of two different features.

9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the credit assessment method based on an EBM model according to any one of claims 1-7.

10. A non-transitory computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the credit assessment method based on an EBM model according to any one of claims 1-7.
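
The additive structure recited in claims 1, 2 and 8 can be sketched in a few lines of Python. The following is a minimal illustration and not the patented implementation: the feature names, bin edges, scores, intercept, logistic link and 0.5 decision threshold are all assumptions made for the example, and the shape functions are written as hypothetical lookup tables instead of being learned from data.

```python
import math
from bisect import bisect_right

# Hypothetical main effect shape functions: bin edges plus one additive
# score (in log-odds) per bin. All numbers are invented for illustration.
MAIN_EFFECTS = {
    "interest_rate": {"edges": [8.0, 15.0], "scores": [-0.6, 0.1, 0.9]},
    "years_employed": {"edges": [2.0, 10.0], "scores": [0.5, 0.0, -0.4]},
}

# Hypothetical interaction effect shape function for one pair of features,
# indexed by the bin of the first feature, then the bin of the second.
INTERACTIONS = {
    ("interest_rate", "years_employed"): [
        [0.0, 0.0, 0.0],
        [0.1, 0.0, -0.1],
        [0.4, 0.1, -0.2],
    ],
}

INTERCEPT = -1.2  # assumed baseline log-odds of default


def bin_index(feature, value):
    """Return the index of the bin that `value` falls into for `feature`."""
    return bisect_right(MAIN_EFFECTS[feature]["edges"], value)


def explain(sample):
    """Return (default probability, per-term contributions) for one sample."""
    contributions = {}
    # Main effects: one shape function per single feature.
    for feature, table in MAIN_EFFECTS.items():
        contributions[feature] = table["scores"][bin_index(feature, sample[feature])]
    # Interaction effects: one shape function per pair of different features.
    for (a, b), grid in INTERACTIONS.items():
        contributions[f"{a} x {b}"] = grid[bin_index(a, sample[a])][bin_index(b, sample[b])]
    # The model output is the sum of all shape function values (plus the
    # assumed intercept), mapped to a probability by an assumed logistic link.
    logit = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


probability, contributions = explain({"interest_rate": 18.5, "years_employed": 1.0})
will_default = probability >= 0.5  # "first result"; the threshold is an assumption
```

Here `will_default` plays the role of the first result, while `contributions` corresponds loosely to the second result of claim 2, since each entry shows how much one feature or feature pair pushed the prediction toward or away from default.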
Application CN202311196623.6A, priority date 2023-09-15, filing date 2023-09-15: Credit evaluation method and device based on EBM model, electronic equipment and storage medium. Status: Pending. Publication: CN117172910A (en).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311196623.6A CN117172910A (en) 2023-09-15 2023-09-15 Credit evaluation method and device based on EBM model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311196623.6A CN117172910A (en) 2023-09-15 2023-09-15 Credit evaluation method and device based on EBM model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117172910A (en) 2023-12-05

Family

ID=88942832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311196623.6A Pending CN117172910A (en) 2023-09-15 2023-09-15 Credit evaluation method and device based on EBM model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117172910A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118940973A (en) * 2024-10-12 2024-11-12 山东国研自动化有限公司 A big data-based energy operation command and management system and method thereof

Similar Documents

Publication Publication Date Title
Shen et al. A cost-sensitive logistic regression credit scoring model based on multi-objective optimization approach
US20220122171A1 (en) Client server system for financial scoring with cash transactions
CN113095927B (en) Method and equipment for identifying suspected transactions of backwashing money
Zhu et al. Explainable prediction of loan default based on machine learning models
Bravo et al. Granting and managing loans for micro-entrepreneurs: New developments and practical experiences
Van Thiel et al. Artificial intelligence credit risk prediction: An empirical study of analytical artificial intelligence tools for credit risk prediction in a digital era
CN107633030B (en) Credit evaluation method and device based on data model
Van Thiel et al. Artificial Intelligent Credit Risk Prediction: An Empirical Study of Analytical Artificial Intelligence Tools for Credit Risk Prediction in a Digital Era.
Zhang et al. An attention‐based Logistic‐CNN‐BiLSTM hybrid neural network for credit risk prediction of listed real estate enterprises
CN111709826A (en) Target information determination method and device
Ruyu et al. A comparison of credit rating classification models based on spark-evidence from lending-club
Li et al. Credit Risk management of P2P network Lending
KR20220074327A (en) Loan regular auditing system using artificia intellicence
CN117172910A (en) Credit evaluation method and device based on EBM model, electronic equipment and storage medium
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN114612239A (en) Stock public opinion monitoring and wind control system based on algorithm, big data and artificial intelligence
Gorle et al. A semi-supervised Anti-Fraud model based on integrated XGBoost and BiGRU with self-attention network: an application to internet loan fraud detection
CN117934162A (en) Multi-dimensional dynamic assessment of real estate mortgage financial risk prevention and control method and system
CN112329862A (en) Decision tree-based anti-money laundering method and system
Yang et al. An evidential reasoning rule-based ensemble learning approach for evaluating credit risks with customer heterogeneity
CN117114812A (en) Financial product recommendation method and device for enterprises
Li Credit card fraud identification based on unbalanced data set based on fusion model
CN117764692A (en) Method for predicting credit risk default probability
CN117575595A (en) Payment risk identification method, device, computer equipment and storage medium
Anglekar et al. Machine learning based risk assessment analysis for smes loan grant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination