CN115293889A

CN115293889A - Credit risk prediction model training method, electronic device and readable storage medium

Info

Publication number: CN115293889A
Application number: CN202210995711.1A
Authority: CN
Inventors: 谭蕴琨; 陈婷; 吴三平; 庄伟亮; 张鹏; 壮青; 吴轶凡; 陈庆麟; 徐朔; 黄勇卫
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-11-04

Abstract

The application discloses a credit risk prediction model training method, an electronic device and a readable storage medium, applied in the field of financial technology. The credit risk prediction model training method comprises: acquiring a training sample set and the sample weight of each training sample in the training sample set; iteratively training, according to the training sample set and the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model consists of a plurality of risk prediction sub-models; clustering the risk prediction sub-models according to their sub-model prediction results to obtain sub-model groups; augmenting the training sample set according to the sub-model prediction results and the sub-model groups; and returning to the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model meets a preset iteration-update end condition, thereby obtaining a target overall credit risk prediction model. The application addresses the technical problem that the prediction accuracy for user credit risk is low.

Description

Credit risk prediction model training method, electronic device and readable storage medium

Technical field

The present application relates to the field of artificial intelligence within financial technology (Fintech), and in particular to a credit risk prediction model training method, an electronic device and a readable storage medium.

Background

With the continuous development of financial technology, especially Internet-based finance, more and more technologies (such as distributed computing and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher demands on these technologies, for example in assessing users' credit levels.

At present, to evaluate a user's credit level, an overall risk model to be trained is usually trained on the credit risk that multiple trained risk sub-models predict from user behavior data and on the real credit risk corresponding to that data, and the resulting overall risk model is then used to predict the user's credit risk. In an overall risk model trained this way, however, a small number of risk sub-models may carry high model weights while the many remaining risk sub-models carry low ones. The overall risk model then relies too heavily on a few risk sub-models, which lowers its prediction accuracy, so the current prediction accuracy for user credit risk is low.

Summary of the invention

The main purpose of the present application is to provide a credit risk prediction model training method, an electronic device and a readable storage medium, aiming to solve the technical problem in the prior art that the prediction accuracy for user credit risk is low.

To achieve the above purpose, the present application provides a credit risk prediction model training method, applied to a credit risk prediction device, the credit risk prediction model training method comprising:

acquiring historical behavior data of users as a training sample set, together with the sample weight of each training sample in the training sample set;

iteratively training, according to the training sample set and the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of multiple risk prediction sub-models;

obtaining the sub-model prediction result of each risk prediction sub-model for the training sample set, and clustering the risk prediction sub-models according to the sub-model prediction results to obtain at least one sub-model group;

optimizing the sample weights according to the sub-model prediction results and the sub-model groups, so as to augment the training sample set;

returning to the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model meets a preset iteration-update end condition, thereby obtaining a target overall credit risk prediction model.

To achieve the above purpose, the present application also provides a credit risk prediction apparatus, applied to a credit risk prediction device, the credit risk prediction apparatus comprising:

an acquisition module, configured to acquire historical behavior data of users as a training sample set, together with the sample weight of each training sample in the training sample set;

a training module, configured to iteratively train, according to the training sample set and the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of multiple risk prediction sub-models;

a clustering module, configured to obtain the sub-model prediction result of each risk prediction sub-model for the training sample set, and to cluster the risk prediction sub-models according to the sub-model prediction results to obtain at least one sub-model group;

an augmentation module, configured to optimize the sample weights according to the sub-model prediction results and the sub-model groups, so as to augment the training sample set;

an optimization module, configured to return to the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model meets a preset iteration-update end condition, thereby obtaining a target overall credit risk prediction model.

The present application also provides an electronic device, comprising a memory, a processor, and a program of the credit risk prediction model training method that is stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the credit risk prediction model training method described above.

The present application also provides a computer-readable storage medium storing a program that implements the credit risk prediction model training method; when executed by a processor, the program implements the steps of the credit risk prediction model training method described above.

The present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the credit risk prediction model training method described above.

The present application provides a credit risk prediction model training method, an electronic device and a readable storage medium. Compared with the approach of training an overall risk model on the credit risk that multiple trained risk sub-models predict from user behavior data and on the real credit risk corresponding to that data, the present application proceeds as follows. It acquires historical behavior data of users as a training sample set, together with the sample weight of each training sample in the training sample set, and iteratively trains, according to the training sample set and the sample weights, an overall credit risk prediction model composed of multiple risk prediction sub-models. It then obtains the sub-model prediction result of each risk prediction sub-model for the training sample set and clusters the risk prediction sub-models according to those results to obtain at least one sub-model group; this cluster analysis yields sub-model groups that can provide supplementary information to the overall credit risk prediction model. Next, it optimizes the sample weights according to the sub-model prediction results and the sub-model groups, thereby augmenting the training sample set so that the overall credit risk prediction model can learn incrementally from the augmented training samples. Finally, it returns to the iterative training step until the overall credit risk prediction model meets the preset iteration-update end condition, obtaining the target overall credit risk prediction model. Iterating toward suitable sample weights in this way makes the model weights of the risk prediction sub-models, trained on the training sample set augmented according to the iteratively optimized sample weights, evenly distributed, which improves the data-supplementing effect of differentiated risk prediction sub-models. This avoids the technical defect of the earlier approach, in which the overall risk model depends too heavily on a small number of risk sub-models and therefore predicts with low accuracy, and so improves the prediction accuracy for user credit risk.

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Those of ordinary skill in the art can obviously derive other drawings from these drawings without creative effort.

Fig. 1 is a schematic flow chart of the first embodiment of the credit risk prediction model training method of the present application;

Fig. 2 is a schematic flow chart of the second embodiment of the credit risk prediction model training method of the present application;

Fig. 3 is a schematic diagram of the apparatus involved in the credit risk prediction model training method of the present application;

Fig. 4 is a schematic structural diagram of the device in the hardware operating environment involved in the credit risk prediction model training method in the embodiments of the present application.

The realization of the purpose, the functional features and the advantages of the present application are further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

To make the above purposes, features and advantages of the present application clearer and easier to understand, the technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the drawings in those embodiments. The described embodiments are obviously only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.

Embodiment 1

This embodiment of the present application provides a credit risk prediction model training method. In the first embodiment of the credit risk prediction model training method of the present application, referring to Fig. 1, the credit risk prediction model training method comprises:

Step S10: acquiring historical behavior data of users as a training sample set, together with the sample weight of each training sample in the training sample set;

In this embodiment, it should be noted that there may be one user or multiple users. The historical behavior data is the user's behavior data with respect to the bank during the previous time step, the time step may be set to the period at which data is periodically extracted, and the historical behavior data contains multiple data items.

Exemplarily, step S10 includes: obtaining the behavior data of the user in the previous time step as historical behavior data, and using the historical behavior data as the training sample set. The time step may be one month, one year or three years; it may be an empirical value of the evaluation period that works best for judging a user's credit risk behavior, or it may be set by the party evaluating the user's credit risk. The historical behavior data may be behavior data of one user over multiple time periods, behavior data of different users over multiple time periods, or behavior data of different users over the same time period. Step S10 further includes obtaining the sample weight of each training sample in the training sample set, where the sample weight may be calculated from each risk prediction sub-model's prediction result for each training sample, may be an empirical weight value of each training sample that gives the best credit risk prediction effect, or may be set by the party evaluating the user's credit risk.

Step S20: iteratively training, according to the training sample set and the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of multiple risk prediction sub-models;

Step S30: obtaining the sub-model prediction result of each risk prediction sub-model for the training sample set, and clustering the risk prediction sub-models according to the sub-model prediction results to obtain at least one sub-model group;

Step S40: optimizing the sample weights according to the sub-model prediction results and the sub-model groups, so as to augment the training sample set;

Step S50: returning to the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model meets a preset iteration-update end condition, thereby obtaining a target overall credit risk prediction model.
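By way of illustration only, the control flow of steps S20 to S50 can be sketched as an outer loop. The function name `train_target_model`, its callable parameters and the `max_rounds` cap are hypothetical placeholders introduced for this sketch; the application does not prescribe any of them, nor the form of the end condition.

```python
from typing import Any, Callable, List, Sequence, Tuple


def train_target_model(
    samples: Sequence[Any],
    weights: List[float],
    train_overall: Callable[[Sequence[Any], List[float]], Any],  # step S20
    predict_submodels: Callable[[Any, Sequence[Any]], List[List[float]]],  # step S30, part 1
    cluster: Callable[[List[List[float]]], List[List[int]]],  # step S30, part 2
    reweight: Callable[[List[List[float]], List[List[int]], List[float]], List[float]],  # step S40
    finished: Callable[[Any, int], bool],  # end condition checked in step S50
    max_rounds: int = 10,  # assumed safety cap, not taken from the application
) -> Tuple[Any, List[float]]:
    """Repeat S20 -> S30 -> S40 until the end condition holds (S50)."""
    model = None
    for round_idx in range(max_rounds):
        # S20: train the overall model on the current sample weights.
        model = train_overall(samples, weights)
        # S50: stop once the preset end condition is met.
        if finished(model, round_idx):
            break
        # S30: per-sub-model predictions, then grouping of sub-models.
        predictions = predict_submodels(model, samples)
        groups = cluster(predictions)
        # S40: optimize the sample weights for the next round.
        weights = reweight(predictions, groups, weights)
    return model, weights
```

Passing the individual steps in as callables keeps the sketch independent of any particular sub-model type, clustering method or reweighting rule.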

In this embodiment, it should be noted that a risk prediction sub-model may be a scorecard model, a tree model or a deep learning model. A sub-model group is a cluster of at least one risk prediction sub-model.

Exemplarily, steps S20 to S50 include: iteratively training the overall credit risk prediction model to be trained, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model; obtaining the sub-model prediction result of each risk prediction sub-model for the training sample set, performing cluster analysis or similarity analysis on the sub-model prediction results to obtain an analysis result, and clustering the risk prediction sub-models according to the analysis result to obtain at least one sub-model group; optimizing the sample weights according to the sub-model prediction results and the sub-model groups, and augmenting the training sample set according to the sample weights; and returning to the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model. If the overall credit risk prediction model is detected to meet the preset iteration-update end condition, it is taken as the target overall credit risk prediction model. Because the training sample set is augmented by optimizing the sample weights, the overall credit risk prediction model is trained on diverse training sample sets over the course of the continued iterations, which strengthens the generalization of the target overall credit risk prediction model finally obtained.
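The application names only "cluster analysis or similarity analysis" for forming sub-model groups and does not fix a method. As one possible reading, the sketch below groups sub-models greedily by the mean absolute difference between their prediction vectors. The function names, the distance measure and the `max_distance` parameter are assumptions made for this sketch.

```python
from typing import List, Sequence


def mean_abs_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Average absolute gap between two prediction vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def group_submodels(
    predictions: Sequence[Sequence[float]],
    max_distance: float = 0.05,  # assumed similarity cut-off
) -> List[List[int]]:
    """Greedy grouping: a sub-model joins the first group whose first
    member predicts similarly; otherwise it starts a new group."""
    groups: List[List[int]] = []
    for idx, pred in enumerate(predictions):
        for group in groups:
            representative = predictions[group[0]]
            if mean_abs_difference(representative, pred) <= max_distance:
                group.append(idx)
                break
        else:
            # No existing group is close enough: open a new one.
            groups.append([idx])
    return groups
```

Each returned group is a list of sub-model indices, so every sub-model belongs to exactly one group and at least one group always exists when there is at least one sub-model.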

In step S20, the step of iteratively training, according to the training sample set and the sample weights, to obtain the overall credit risk prediction model, wherein the overall credit risk prediction model is composed of multiple risk prediction sub-models, includes:

Step S21: selecting a training sample from the training sample set, and determining a sample to be predicted according to the training sample and the sample weight corresponding to the training sample;

Step S22: inputting the sample to be predicted into each risk prediction sub-model to obtain each sub-model output prediction result;

Step S23: performing weighted aggregation on the sub-model output prediction results according to the model weight corresponding to each risk prediction sub-model, to obtain an overall model output prediction result;

Step S24: optimizing the model weights and the risk prediction sub-models according to the overall model output prediction result;

Step S25: returning to the step of selecting a training sample from the training sample set and determining a sample to be predicted according to the training sample and the sample weight corresponding to the training sample, until the model weights meet a preset weight condition and the sub-model parameters meet a preset model parameter condition, thereby obtaining the overall credit risk prediction model.

In this embodiment, it should be noted that the sample to be predicted includes a training sample and the sample weight corresponding to that training sample, and is used for iteratively training the overall credit risk prediction model.

Exemplarily, steps S21 to S25 include: selecting a training sample from the training sample set at random or in a preset order, where the preset order may be generated from the sample weights, and generating a sample to be predicted according to the training sample and its corresponding sample weight; inputting the sample to be predicted into each risk prediction sub-model to obtain that sub-model's output prediction result for the sample to be predicted; performing weighted aggregation on the sub-model output prediction results through a preset aggregation method, according to the model weight corresponding to each risk prediction sub-model, to obtain sub-model output features, and mapping the sub-model output features to an overall model output prediction result through the overall credit risk prediction model, where the preset aggregation method may be a mean aggregation method, a linear aggregation method, a stacked (accumulative) aggregation method or another aggregation method; obtaining the real label corresponding to the training sample, and optimizing the model weight corresponding to each risk prediction sub-model, as well as each risk prediction sub-model, according to the degree of difference between the real label and the overall model output prediction result; and returning to the step of selecting a training sample from the training sample set and determining a sample to be predicted according to the training sample and its corresponding sample weight, until the model weights meet the preset weight condition and the sub-model parameters meet the preset model parameter condition, thereby obtaining the overall credit risk prediction model.
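To make steps S23 and S24 concrete, the following minimal sketch uses linear aggregation and a single gradient-descent update of the model weights. Several simplifications here are assumptions of the sketch and are not stated in the application: the aggregated value is used directly as the overall prediction, the degree of difference is a sample-weighted squared error, the sub-models themselves are held fixed, and the names `aggregate`, `update_model_weights` and `learning_rate` are hypothetical.

```python
from typing import List, Sequence


def aggregate(sub_outputs: Sequence[float], model_weights: Sequence[float]) -> float:
    """Step S23 (linear variant): weighted sum of sub-model outputs."""
    return sum(a * p for a, p in zip(model_weights, sub_outputs))


def update_model_weights(
    sub_outputs: Sequence[float],
    model_weights: Sequence[float],
    label: float,
    sample_weight: float,
    learning_rate: float = 0.1,  # assumed step size
) -> List[float]:
    """Step S24 (model weights only): one gradient step on the loss
    sample_weight * (prediction - label) ** 2."""
    error = aggregate(sub_outputs, model_weights) - label
    # d(loss)/d(a_k) = 2 * sample_weight * error * p_k
    return [
        a - learning_rate * 2.0 * sample_weight * error * p
        for a, p in zip(model_weights, sub_outputs)
    ]
```

In this sketch a larger sample weight scales the gradient contributed by that sample, which is one plausible way for the sample weights to influence how the model weights end up distributed.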

As an example, whether the model weights meet the preset weight condition may be determined in either of two ways. In the first, if every model weight falls within a preset weight range, where the preset weight range is a preset range of model weights used to decide that the model weights are evenly distributed, the model weights are determined to meet the preset weight condition; if any target model weight falls outside the preset weight range, the model weights are determined not to meet the preset weight condition. In the second, it is determined whether any target model weight is greater than a first preset weight threshold or less than a second preset weight threshold; if such a target model weight exists, the model weights are determined not to meet the preset weight condition, and if none exists, the model weights are determined to meet the preset weight condition.
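Both variants of the weight condition reduce to checking that every model weight lies between a lower and an upper bound. A minimal sketch follows, in which the function name and the parameter names `lower` (the second preset weight threshold) and `upper` (the first preset weight threshold) are chosen for this sketch only.

```python
from typing import Sequence


def weights_meet_condition(
    model_weights: Sequence[float], lower: float, upper: float
) -> bool:
    """True when no model weight is below `lower` or above `upper`."""
    return all(lower <= w <= upper for w in model_weights)
```

For three sub-models with bounds 0.2 and 0.5, the weights 0.3, 0.35 and 0.35 satisfy the condition, whereas 0.8, 0.1 and 0.1 do not, because one sub-model dominates.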

As an example, the step of obtaining the overall credit risk prediction model once the model weights meet the preset weight condition and the sub-model parameters meet the preset model parameter condition may be: constructing, according to the degree of difference, the model loss corresponding to the overall credit risk prediction model; determining whether the model loss has converged; if it has converged, taking the overall credit risk prediction model under the current model weights and the current risk prediction sub-models as the overall credit risk prediction model; if it has not converged, updating the overall credit risk prediction model according to the gradient computed from the model loss, and returning to the step of selecting a training sample from the training sample set and determining a sample to be predicted according to the training sample and its corresponding sample weight, until the computed model loss converges.
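The convergence test itself is left open by the application. One common choice, assumed here, is to stop when the loss changes by less than a tolerance between consecutive updates. In the sketch below, `step` is a hypothetical callable that performs one pass of steps S21 to S24 and returns the resulting model loss; `tolerance` and `max_steps` are likewise assumptions.

```python
from typing import Callable, Tuple


def train_until_converged(
    step: Callable[[], float],
    tolerance: float = 1e-6,  # assumed convergence tolerance
    max_steps: int = 1000,    # assumed safety cap
) -> Tuple[float, int]:
    """Call `step` repeatedly until the loss stops changing noticeably.

    Returns the last loss and the number of steps performed."""
    previous = step()
    for count in range(2, max_steps + 1):
        loss = step()
        if abs(previous - loss) < tolerance:
            return loss, count
        previous = loss
    return previous, max_steps
```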

It can be understood that each risk prediction sub-model is obtained by iteratively training a risk prediction sub-model to be trained on the training samples. In this embodiment of the present application, the risk prediction sub-models are iteratively trained multiple times on the training samples and the corresponding sample weights after repeated optimization, so that the model weights of the risk prediction sub-models become evenly distributed.

In step S40, the step of optimizing the sample weights according to the sub-model prediction results and the sub-model groups, so as to augment the training sample set, includes:

Step S41: selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition;

Step S42: adjusting the sample weights according to the target sub-model group and the sub-model prediction results, so as to augment the training sample set.

In this embodiment, it should be noted that the preset condition is a preset sub-model group condition used to screen out the target sub-model groups for which sample weight optimization is needed.

Exemplarily, steps S41 to S42 include: selecting, from the sub-model groups, a target sub-model group that satisfies the preset condition; and adjusting the sample weight of each training sample in the training sample set according to the target sub-model group and the training sample set to obtain adjusted weights, so as to augment the training sample set.

In step S41, the step of selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition, includes:

Step A10: obtaining the overall model prediction result of the overall credit risk prediction model for the training sample set;

Step A20: generating, according to the sub-model prediction results and the overall model prediction result, the sub-model group correlation between each sub-model group and the overall credit risk prediction model;

Step A30: selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than a preset correlation threshold.

在本实施例中，需要说明的是，所述预设相关性阈值为预先设置的判定需要进行样本权重优化的子模型群与信用风险预测总模型之间的子模型群相关性的临界值。In this embodiment, it should be noted that the preset correlation threshold is a preset critical value of the sub-model group correlation between the sub-model group that needs sample weight optimization and the overall credit risk prediction model.

Exemplarily, steps A10 to A30 include: inputting each training sample in the training sample set into the overall credit risk prediction model to obtain the overall model prediction result; obtaining the result correlation between each sub-model prediction result and the overall model prediction result, and taking that result correlation as the sub-model correlation between the corresponding risk prediction sub-model and the overall credit risk prediction model; generating, from the sub-model correlations, the sub-model group correlation between each sub-model group and the overall credit risk prediction model; and selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than the preset correlation threshold.

As an example, the step of generating the sub-model group correlation between each sub-model group and the overall credit risk prediction model according to the sub-model correlations may be: taking the sum or the average of the sub-model correlations of the risk prediction sub-models in a sub-model group as the sub-model group correlation between that sub-model group and the overall credit risk prediction model.
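Steps A10 to A30 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the result correlation is a Pearson correlation computed over per-sample predictions, and it aggregates by the average. The names `pearson`, `select_low_correlation_groups`, `groups`, `sub_preds` and `total_pred` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant (an assumed convention,
    since the coefficient is undefined for zero variance).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_low_correlation_groups(groups, sub_preds, total_pred, threshold):
    """Return ids of groups whose average sub-model correlation is below threshold.

    groups:     {group_id: [sub_model_name, ...]}
    sub_preds:  {sub_model_name: [prediction per training sample]}
    total_pred: [overall model prediction per training sample]
    """
    selected = []
    for group_id, members in groups.items():
        # Step A20: sub-model correlation, then group correlation (average).
        correlations = [pearson(sub_preds[m], total_pred) for m in members]
        group_correlation = sum(correlations) / len(correlations)
        # Step A30: keep groups below the preset correlation threshold.
        if group_correlation < threshold:
            selected.append(group_id)
    return selected
```

Replacing the average with a plain sum gives the other aggregation the text mentions; the threshold would then need to be chosen on the summed scale.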

In step S41, the step of selecting, from the sub-model groups, a target sub-model group that satisfies the preset condition, where the preset condition includes at least one of a correlation condition and an importance condition, further includes:

步骤B10，获取各所述风险预测子模型的模型加权权重；Step B10, obtaining the model weights of each of the risk prediction sub-models;

Step B20: determining, according to the model weights, the sub-model group importance of each sub-model group for the overall credit risk prediction model;

Step B30: selecting, from the sub-model groups, a target sub-model group whose sub-model group importance is less than a preset importance threshold.

In this embodiment, it should be noted that the preset importance threshold is a preset critical value of the sub-model group importance of a sub-model group for the overall credit risk prediction model, used to determine which sub-model groups need sample weight optimization.

Exemplarily, steps B10 to B30 include: obtaining the model weight of each risk prediction sub-model; taking each model weight as the sub-model importance of the corresponding risk prediction sub-model for the overall credit risk prediction model; generating, from the sub-model importances, the sub-model group importance of each sub-model group for the overall credit risk prediction model; and selecting, from the sub-model groups, a target sub-model group whose sub-model group importance is less than the preset importance threshold.

As an example, the step of generating the sub-model group importance of each sub-model group for the overall credit risk prediction model according to the sub-model importances may be: taking the sum or the average of the sub-model importances of the risk prediction sub-models in a sub-model group as the sub-model group importance of that sub-model group for the overall credit risk prediction model.

Step C10: selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than the preset correlation threshold and whose sub-model group importance is less than the preset importance threshold.

Exemplarily, step C10 includes: according to the sub-model group correlations and the sub-model group importances, selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than the preset correlation threshold and whose sub-model group importance is less than the preset importance threshold.
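Steps B10 to B30 and the combined selection of step C10 can be sketched as follows. This is an illustrative sketch under stated assumptions: group correlations are taken as already computed, and the function and parameter names (`group_importance`, `select_target_groups`, `use_mean`) are hypothetical.

```python
def group_importance(groups, model_weights, use_mean=True):
    """Steps B10-B20: aggregate sub-model weights into a per-group importance.

    groups:        {group_id: [sub_model_name, ...]}
    model_weights: {sub_model_name: weight of that sub-model in the overall model}
    use_mean:      average the weights if True, otherwise sum them
    """
    importance = {}
    for group_id, members in groups.items():
        total = sum(model_weights[m] for m in members)
        importance[group_id] = total / len(members) if use_mean else total
    return importance

def select_target_groups(group_corr, group_imp, corr_threshold, imp_threshold):
    """Step C10: keep groups that are both weakly correlated and of low importance."""
    return [
        group_id
        for group_id in group_corr
        if group_corr[group_id] < corr_threshold
        and group_imp[group_id] < imp_threshold
    ]
```

Dropping either comparison in `select_target_groups` gives the correlation-only (step A30) or importance-only (step B30) variant.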

其中，在步骤S42中，所述依据所述目标子模型群和各所述子模型预测结果，调整各所述样本权重的步骤包括：Wherein, in step S42, the step of adjusting each sample weight according to the target sub-model group and each sub-model prediction result includes:

步骤D10，依据所述目标子模型群和各所述子模型预测结果，确定所述目标子模型群的预测结果分布信息；Step D10, according to the target sub-model group and the prediction results of each of the sub-models, determine the distribution information of the prediction results of the target sub-model group;

步骤D20，依据所述预测结果分布信息，调整各所述样本权重。Step D20, adjusting the weight of each sample according to the prediction result distribution information.

在本实施例中，需要说明的是，所述预测结果分布信息包括所述目标子模型群中各风险预测子模型的子模型预测结果的分布信息。所述预测结果分布信息可以为各子模型预测结果的数据分布信息，也可以为各子模型预测结果的向量分布信息，还可以为各子模型预测结果的占比信息。In this embodiment, it should be noted that the prediction result distribution information includes distribution information of sub-model prediction results of each risk prediction sub-model in the target sub-model group. The prediction result distribution information may be the data distribution information of the prediction results of each sub-model, the vector distribution information of the prediction results of each sub-model, or the proportion information of the prediction results of each sub-model.

示例性地，步骤D10至步骤D20包括：依据各所述子模型预测结果，确定所述目标子模型群中各风险预测子模型的子模型预测结果的分布信息，得到所述预测结果分布信息；依据所述预测结果分布信息，调整各所述样本权重。Exemplarily, steps D10 to D20 include: according to the prediction results of each of the sub-models, determine the distribution information of the sub-model prediction results of each risk prediction sub-model in the target sub-model group, and obtain the distribution information of the prediction results; Adjusting the weight of each sample according to the distribution information of the prediction result.

其中，在步骤D20中，所述依据所述预测结果分布信息，调整各所述样本权重的步骤包括：Wherein, in step D20, the step of adjusting the weight of each sample according to the distribution information of the prediction result includes:

Step D21: obtaining, from the prediction result distribution information, the correct prediction proportion of each training sample, that is, the proportion of the risk prediction sub-models that predict the training sample correctly, where the prediction result distribution information includes the sub-model prediction result of each risk prediction sub-model for the training sample;

步骤D22，判断所述正确预测占比是否大于预设占比阈值；Step D22, judging whether the correct predicted proportion is greater than a preset proportion threshold;

步骤D23，若是，则调低各所述训练样本对应的样本权重；Step D23, if yes, lower the sample weight corresponding to each of the training samples;

步骤D24，若否，则调高各所述训练样本对应的样本权重。Step D24, if not, increase the sample weight corresponding to each of the training samples.

In this embodiment, it should be noted that the preset proportion threshold is a preset critical value of the correct prediction proportion, used to judge whether a training sample has been successfully predicted by the target sub-model group.

As an example, steps D21 to D24 include: the prediction result distribution information includes prediction proportion information of the sub-model prediction results, and the prediction proportion information includes a correct prediction proportion and an incorrect prediction proportion; judging whether the correct prediction proportion is greater than the preset proportion threshold; if the correct prediction proportion is greater than the preset proportion threshold, lowering the sample weight of the corresponding training sample; and if the correct prediction proportion is not greater than the preset proportion threshold, raising the sample weight of the corresponding training sample.

As an example, steps D21 to D24 include: the prediction result distribution information includes data distribution information of the sub-model prediction results; if the data distribution information falls within a preset data range, the sample weight of the corresponding training sample is lowered, and if it does not, the sample weight is raised. Here the preset data range is a preset data distribution range of sub-model prediction results within which a training sample is judged to have been successfully predicted by the target sub-model group. Alternatively, the prediction result distribution information includes vector distribution information of the sub-model prediction results; if the vector distribution information falls within a preset vector range, the sample weight of the corresponding training sample is lowered, and if the vector distribution information does not fall within the preset vector range, the sample weight is raised. Here the preset vector range is a preset vector distribution range of sub-model prediction results within which a training sample is judged to have been successfully predicted by the target sub-model group.
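The proportion-based variant of steps D21 to D24 can be sketched as follows. The text does not say by how much a weight is lowered or raised, so the multiplicative `step` factor below is purely an assumption, as are the function and parameter names.

```python
def adjust_sample_weights(weights, labels, group_preds, ratio_threshold, step=0.1):
    """Lower the weight of samples the target group mostly gets right, raise the rest.

    weights:         [current weight per training sample]
    labels:          [real label per training sample]
    group_preds:     [[predicted label per sample] for each sub-model in the target group]
    ratio_threshold: preset proportion threshold
    step:            hypothetical relative adjustment size (not given in the source)
    """
    adjusted = []
    for i, (weight, label) in enumerate(zip(weights, labels)):
        # Step D21: share of sub-models in the target group that predict sample i correctly.
        correct = sum(1 for preds in group_preds if preds[i] == label)
        ratio = correct / len(group_preds)
        # Steps D22-D24: compare with the threshold and move the weight accordingly.
        if ratio > ratio_threshold:
            adjusted.append(weight * (1 - step))
        else:
            adjusted.append(weight * (1 + step))
    return adjusted
```

With three sub-models and a threshold of 0.5, a sample that all three predict correctly ends up with a lower weight, while a sample that only one predicts correctly ends up with a higher one.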

This embodiment of the present application provides a credit risk prediction model training method. The baseline approach trains a total risk model from the credit risk that multiple pre-trained risk sub-models predict for user behavior data, together with the real credit risk corresponding to that data. In contrast, this embodiment obtains users' historical behavior data and determines from it the training sample set of the overall credit risk prediction model and the sample weight of each training sample, where the overall credit risk prediction model is obtained from multiple risk prediction sub-models by model weighting. It collects the sub-model prediction result of each risk prediction sub-model for the training sample set and classifies the risk prediction sub-models according to those results to obtain at least one sub-model group. This cluster analysis yields sub-model groups that can provide supplementary information for the overall credit risk prediction model. The sample weights are then optimized according to the training sample set and the sub-model groups, which amplifies the training sample set so that the overall model can learn incrementally from the amplified training samples. The method then returns to the step of iteratively training the overall credit risk prediction model according to the training sample set and the sample weights, until the model satisfies the preset iteration update end condition, giving the target overall credit risk prediction model. Iterating suitable sample weights in this way makes the model weights of the risk prediction sub-models trained on the amplified training sample set evenly distributed, which improves the data-supplementing effect of differentiated risk prediction sub-models. It avoids the technical defect of the baseline approach, in which the total risk model tends to rely too heavily on a small number of risk sub-models and therefore predicts with low accuracy, and thus improves the prediction accuracy of user credit risk.

实施例二Embodiment two

进一步地，参照图2，基于本申请第一实施例，在本申请另一实施例中，与上述实施例一相同或相似的内容，可以参考上文介绍，后续不再赘述。在此基础上，其中，在步骤S30中，所述依据各所述子模型预测结果，对所述风险预测子模型进行聚类，得到至少一个子模型群的步骤包括：Further, referring to FIG. 2 , based on the first embodiment of the present application, in another embodiment of the present application, for the same or similar content as the first embodiment above, reference may be made to the introduction above, and details will not be repeated hereafter. On this basis, wherein, in step S30, the step of clustering the risk prediction sub-models according to the prediction results of each of the sub-models to obtain at least one sub-model group includes:

步骤S31，获取各所述子模型预测结果之间的子模型预测结果相似度，将所述子模型预测结果相似度作为各所述风险预测子模型之间的子模型相似度；Step S31, obtaining the sub-model prediction result similarity between the sub-model prediction results, and using the sub-model prediction result similarity as the sub-model similarity between the risk prediction sub-models;

步骤S32，依据所述子模型相似度，对各所述风险预测子模型进行聚类得到至少一个子模型群。Step S32, clustering each of the risk prediction sub-models according to the similarity of the sub-models to obtain at least one sub-model group.

Exemplarily, steps S31 to S32 include: calculating the sub-model prediction result similarity between the sub-model prediction results using a preset similarity algorithm, which may be the Euclidean distance, the Pearson correlation coefficient, or cosine similarity. Because each sub-model prediction result is a predicted user credit risk, the differences between sub-model prediction results show up in their numerical values. The Euclidean distance is simple to compute but requires all indicators to be on the same scale; the Pearson correlation coefficient cannot be computed for data whose variance is 0; cosine similarity emphasizes differences in vector direction rather than magnitude. The Euclidean distance is therefore preferred, balancing efficiency and accuracy in obtaining the sub-model similarity. The sub-model prediction result similarity is taken as the sub-model similarity between the risk prediction sub-models, and the risk prediction sub-models whose sub-model similarity is greater than a model similarity threshold are clustered together to obtain at least one sub-model group, where the model similarity threshold is the critical value above which risk prediction sub-models are judged to be highly similar.
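Steps S31 to S32 can be sketched as follows. Two choices here are assumptions rather than anything the text specifies: the Euclidean distance d is turned into a similarity as 1 / (1 + d), and sub-models are merged into groups transitively (connected components via a small union-find). All names are hypothetical.

```python
from math import dist

def similarity(p, q):
    """Map the Euclidean distance between two prediction vectors to (0, 1]."""
    return 1.0 / (1.0 + dist(p, q))

def cluster_by_similarity(sub_preds, threshold):
    """Group sub-models whose pairwise similarity exceeds the threshold.

    sub_preds: {sub_model_name: [prediction per training sample]}
    Returns a list of groups, each a list of sub-model names.
    """
    names = list(sub_preds)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(sub_preds[a], sub_preds[b]) > threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```

Because the predictions are all credit risk scores on one scale, the same-scale requirement of the Euclidean distance is met without extra normalization in this sketch.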

As an example, step S30 includes: selecting, from the risk prediction sub-models, a first central model for each sub-model group, either empirically or manually; obtaining the model distance between each risk prediction sub-model other than the first central models and each first central model; assigning each risk prediction sub-model to the sub-model group of the first central model with the smallest model distance; determining a second central model of each sub-model group according to the risk prediction sub-models in that group; and judging whether the second central model is consistent with the first central model. If they are consistent, the sub-model group is taken as a target sub-model group; if they are not, the method returns to the step of selecting a first central model for each sub-model group, until the second central model is consistent with the first central model, thereby obtaining at least one sub-model group.
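The center-based procedure above resembles k-medoids clustering and can be sketched as follows. The text does not say how the second central model is determined or how centers are re-selected, so this sketch assumes the second central model is the group member with the smallest total Euclidean distance to the other members, and that it becomes the first central model of the next round. The names and the `max_iter` guard are hypothetical.

```python
from math import dist

def cluster_around_centers(sub_preds, initial_centers, max_iter=100):
    """Assign sub-models to their nearest central model until the centers stop changing.

    sub_preds:       {sub_model_name: [prediction per training sample]}
    initial_centers: names of the first central models, one per group
    Returns {center_name: [member names, center first]}.
    """
    centers = list(initial_centers)
    groups = {}
    for _ in range(max_iter):
        # Assign every non-center sub-model to the closest first central model.
        groups = {center: [center] for center in centers}
        for name, pred in sub_preds.items():
            if name in groups:
                continue
            nearest = min(centers, key=lambda c: dist(pred, sub_preds[c]))
            groups[nearest].append(name)
        # Second central model: the member closest, in total, to its group.
        new_centers = [
            min(members, key=lambda m: sum(dist(sub_preds[m], sub_preds[o]) for o in members))
            for members in groups.values()
        ]
        if set(new_centers) == set(centers):
            return groups  # second centers match first centers: done
        centers = new_centers
    return groups
```

Starting from centers `"a"` and `"e"` on the one-dimensional predictions 0.0, 0.1, 0.2, 1.0 and 1.1, the first center moves to the middle model of the low-risk group and the assignment then stabilizes.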

其中，在步骤S10中，在获取用户的历史行为数据作为训练样本集以及所述训练样本集中各训练样本的样本权重的步骤之前，还包括：Wherein, in step S10, before the step of obtaining the user's historical behavior data as the training sample set and the sample weights of each training sample in the training sample set, it also includes:

步骤S11，获取各所述训练样本对应的真实标签；Step S11, obtaining a real label corresponding to each of the training samples;

步骤S12，依据所述训练样本集和各所述风险预测子模型，生成各所述风险预测子模型对于所述训练样本集的子模型预测结果；Step S12, according to the training sample set and each of the risk prediction sub-models, generate the sub-model prediction results of each of the risk prediction sub-models for the training sample set;

步骤S13，依据所述真实标签和子模型预测结果，确定各所述训练样本被各所述风险预测子模型预测正确的子模型数量；Step S13, according to the real label and sub-model prediction results, determine the number of sub-models that are correctly predicted by each of the risk prediction sub-models for each of the training samples;

步骤S14，依据所述子模型数量、预设参数和权重平滑系数，生成各所述训练样本的样本权重。Step S14, generating sample weights for each of the training samples according to the number of sub-models, preset parameters and weight smoothing coefficients.

在本实施例中，需要说明的是，所述真实标签为各训练样本中用户的真实信用风险。In this embodiment, it should be noted that the real label is the real credit risk of the users in each training sample.

Exemplarily, steps S11 to S14 include: obtaining the real label corresponding to each training sample; mapping each training sample to a user credit risk through each risk prediction sub-model to obtain the sub-model prediction result of each risk prediction sub-model for the training sample set; and judging, for each sub-model prediction result, whether it is consistent with the real label. If they are consistent, the corresponding risk prediction sub-model is judged to have predicted correctly and the count of correctly predicting sub-models is incremented; if they are inconsistent, the corresponding risk prediction sub-model is judged to have predicted incorrectly. The judgment is repeated until every sub-model prediction result has been checked, giving, for each training sample, the number of risk prediction sub-models that predicted it correctly. The sample weight of each training sample is then generated according to the number of sub-models, the preset parameter, and the weight smoothing coefficient.
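The counting part of steps S11 to S13 can be sketched as follows, assuming discrete labels so that a prediction is correct when it equals the real label. The function and parameter names are hypothetical, and the weight formula of step S14 is deliberately not reproduced here.

```python
def count_correct_submodels(labels, sub_preds):
    """For each training sample, count the sub-models whose prediction matches the real label.

    labels:    [real label per training sample]
    sub_preds: {sub_model_name: [predicted label per training sample]}
    """
    return [
        sum(1 for preds in sub_preds.values() if preds[i] == label)
        for i, label in enumerate(labels)
    ]
```

A sample with a low count is one that few sub-models handle well, which is the kind of sample the method gives a higher initial weight.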

可选地，所述依据所述子模型数量、预设参数和权重平滑系数，生成各所述训练样本的样本权重的步骤具体可以为：Optionally, the step of generating sample weights of each of the training samples according to the number of sub-models, preset parameters and weight smoothing coefficients may specifically be:

Where w is the sample weight of each training sample; m_i is the number of risk prediction sub-models that correctly predict each training sample; α is the weight smoothing coefficient; and β is the preset parameter.

As an example, the step of generating the sample weight of each training sample may also be: obtaining the number of training samples, generating that many normally distributed random numbers, and assigning the random numbers to the training samples as their sample weights.
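The random initialization alternative can be sketched as follows. The text gives no mean, standard deviation, or handling of non-positive draws, so the values `mean=1.0`, `std=0.1` and the small positive floor below are assumptions, as are the names.

```python
import random

def random_normal_weights(num_samples, mean=1.0, std=0.1, floor=1e-6, seed=None):
    """Draw one normally distributed weight per training sample.

    mean, std and floor are hypothetical defaults; the floor keeps every
    weight strictly positive, which the source does not require.
    """
    rng = random.Random(seed)
    return [max(floor, rng.gauss(mean, std)) for _ in range(num_samples)]
```

Passing a fixed `seed` makes the initial weights reproducible across training runs.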

可以理解的是，根据训练样本被各所述风险预测子模型预测正确的子模型数量确定各训练样本的样本权重，且子模型数量与样本权重呈负相关关系，而子模型数量越少，说明该样本为与其他样本具有差异化的补充样本数据，从而通过赋予补充样本数据较高的样本权重，在一定程度上减少了后续调整样本权重的迭代优化步骤，进而提高了信用风险预测总模型的训练效率。It can be understood that the sample weight of each training sample is determined according to the number of sub-models for which the training sample is correctly predicted by each of the risk prediction sub-models, and the number of sub-models is negatively correlated with the sample weight, and the smaller the number of sub-models, the This sample is supplementary sample data that is different from other samples, so by assigning a higher sample weight to the supplementary sample data, the iterative optimization steps for subsequent adjustment of sample weights are reduced to a certain extent, thereby improving the performance of the overall credit risk prediction model. training efficiency.

实施例三Embodiment three

本申请实施例还提供一种信用风险预测装置，所述信用风险预测装置应用于信用风险预测设备，参照图3，所述信用风险预测装置包括：The embodiment of the present application also provides a credit risk prediction device, the credit risk prediction device is applied to credit risk prediction equipment, referring to Figure 3, the credit risk prediction device includes:

可选地，所述训练模块还用于：Optionally, the training module is also used for:

从所述训练样本集中选取训练样本，并根据所述训练样本和所述训练样本对应的样本权重确定待预测样本；selecting a training sample from the training sample set, and determining a sample to be predicted according to the training sample and the sample weight corresponding to the training sample;

将所述待预测样本分别输入各所述风险预测子模型，得到各子模型输出预测结果；Inputting the samples to be predicted into each of the risk prediction sub-models to obtain the output prediction results of each sub-model;

根据各所述风险预测子模型对应的模型加权权重，对各所述子模型输出预测结果进行加权聚合，得到总模型输出预测结果；Perform weighted aggregation on the output prediction results of each of the sub-models according to the model weighting weights corresponding to each of the risk prediction sub-models to obtain the total model output prediction results;

Optimizing each model weight and each risk prediction sub-model according to the total model output prediction result;

返回执行步骤：从所述训练样本集中选取训练样本，并根据所述训练样本和所述训练样本对应的样本权重确定待预测样本，直至各所述模型加权权重满足预设权重条件以及各所述子模型参数满足预设模型参数条件，得到所述信用风险预测总模型。Return to the execution step: select training samples from the training sample set, and determine the samples to be predicted according to the training samples and the sample weights corresponding to the training samples, until the weighted weights of each of the models meet the preset weight conditions and each of the The sub-model parameters meet the preset model parameter conditions, and the overall credit risk prediction model is obtained.
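The weighted aggregation step of the training module can be sketched as follows. This shows only how sub-model outputs might be combined into the total model output as a plain weighted sum; the optimization of the model weights and sub-model parameters is omitted, and the names are hypothetical.

```python
def aggregate_predictions(sub_outputs, model_weights):
    """Combine sub-model outputs into the total model output by a weighted sum.

    sub_outputs:   {sub_model_name: [output per sample]}
    model_weights: {sub_model_name: model weight of that sub-model}
    """
    num_samples = len(next(iter(sub_outputs.values())))
    return [
        sum(model_weights[name] * outputs[i] for name, outputs in sub_outputs.items())
        for i in range(num_samples)
    ]
```

If the model weights sum to 1, the total output stays on the same scale as the individual sub-model outputs.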

可选地，所述聚类模块还用于：Optionally, the clustering module is also used for:

获取各所述子模型预测结果之间的子模型预测结果相似度，将所述子模型预测结果相似度作为各所述风险预测子模型之间的子模型相似度；Obtaining the sub-model prediction result similarity between the sub-model prediction results, using the sub-model prediction result similarity as the sub-model similarity between the risk prediction sub-models;

依据所述子模型相似度，对各所述风险预测子模型进行聚类，得到至少一个子模型群。According to the similarity of the sub-models, each of the risk prediction sub-models is clustered to obtain at least one sub-model group.

可选地，所述扩增模块还用于：Optionally, the amplification module is also used for:

在各所述子模型群中选取满足预设条件的目标子模型群，其中，所述预设条件包括相关性条件和重要度条件中的至少一种；Selecting a target sub-model group that satisfies a preset condition in each of the sub-model groups, wherein the preset condition includes at least one of a correlation condition and an importance condition;

依据所述目标子模型群和各所述子模型预测结果，调整各所述样本权重，以扩增所述训练样本集。Adjusting the weight of each sample according to the target sub-model group and the prediction results of each sub-model, so as to amplify the training sample set.

获取所述信用风险预测总模型对于所述训练样本集的总模型预测结果；Obtain the overall model prediction result of the overall credit risk prediction model for the training sample set;

依据各所述子模型预测结果和所述总模型预测结果，生成各所述子模型群与所述信用风险预测总模型之间的子模型群相关性；generating a sub-model group correlation between each of the sub-model groups and the overall credit risk prediction model according to the prediction results of each of the sub-models and the prediction result of the overall model;

Selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than a preset correlation threshold.

获取各所述风险预测子模型的模型加权权重；Obtaining the model weights of each of the risk prediction sub-models;

Determining, according to the model weights, the sub-model group importance of each sub-model group for the overall credit risk prediction model;

Selecting, from the sub-model groups, a target sub-model group whose sub-model group importance is less than a preset importance threshold.

Selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than the preset correlation threshold and whose sub-model group importance is less than the preset importance threshold.

依据所述目标子模型群和各所述子模型预测结果，确定所述目标子模型群的预测结果分布信息；According to the target sub-model group and the prediction results of each of the sub-models, determine the distribution information of the prediction results of the target sub-model group;

依据所述预测结果分布信息，调整各所述样本权重。Adjusting the weight of each sample according to the distribution information of the prediction result.

Obtaining, from the prediction result distribution information, the correct prediction proportion of each training sample, that is, the proportion of the risk prediction sub-models that predict the training sample correctly, where the prediction result distribution information includes the sub-model prediction result of each risk prediction sub-model for the training sample;

判断所述正确预测占比是否大于预设占比阈值；judging whether the proportion of the correct prediction is greater than a preset proportion threshold;

若是，则调低各所述训练样本对应的样本权重；If so, lower the sample weight corresponding to each of the training samples;

若否，则调高各所述训练样本对应的样本权重。If not, increase the sample weight corresponding to each of the training samples.

可选地，在所述获取用户的历史行为数据作为训练样本集以及所述训练样本集中各训练样本的样本权重的步骤之前，所述信用风险预测装置还用于：Optionally, before the step of acquiring the user's historical behavior data as the training sample set and the sample weights of each training sample in the training sample set, the credit risk prediction device is further used for:

获取各所述训练样本对应的真实标签；Obtain a true label corresponding to each of the training samples;

依据各所述训练样本和各所述风险预测子模型，生成各所述风险预测子模型对于所述训练样本集的子模型预测结果；According to each of the training samples and each of the risk prediction sub-models, generate a sub-model prediction result of each of the risk prediction sub-models for the training sample set;

依据所述真实标签和子模型预测结果，确定各所述训练样本被各所述风险预测子模型预测正确的子模型数量；According to the real label and sub-model prediction results, determine the number of sub-models that are correctly predicted by each of the risk prediction sub-models for each of the training samples;

依据所述子模型数量、预设参数和权重平滑系数，生成各所述训练样本的样本权重。Generate sample weights for each of the training samples according to the number of sub-models, preset parameters, and weight smoothing coefficients.

本申请提供的信用风险预测装置，采用上述实施例中的信用风险预测模型训练方法，解决了用户信用风险的预测准确度低的技术问题。与现有技术相比，本申请实施例提供的信用风险预测装置的有益效果与上述实施例提供的信用风险预测模型训练方法的有益效果相同，且该信用风险预测装置中的其他技术特征与上述实施例方法公开的特征相同，在此不做赘述。The credit risk prediction device provided by the present application adopts the credit risk prediction model training method in the above-mentioned embodiments, and solves the technical problem of low prediction accuracy of user credit risk. Compared with the prior art, the beneficial effect of the credit risk prediction device provided by the embodiment of the present application is the same as that of the credit risk prediction model training method provided by the above embodiment, and other technical features of the credit risk prediction device are the same as those described above The features disclosed in the methods of the embodiments are the same, and will not be repeated here.

实施例四Embodiment four

本申请实施例提供一种电子设备，所述电子设备包括：至少一个处理器；以及，与至少一个处理器通信连接的存储器；其中，存储器存储有可被至少一个处理器执行的指令，指令被至少一个处理器执行，以使至少一个处理器能够执行上述实施例中的信用风险预测模型训练方法。An embodiment of the present application provides an electronic device, and the electronic device includes: at least one processor; and a memory connected in communication with the at least one processor; wherein, the memory stores instructions executable by the at least one processor, and the instructions are executed by Execution by at least one processor, so that at least one processor can execute the credit risk prediction model training method in the above-mentioned embodiments.

下面参考图4，其示出了适于用来实现本公开实施例的电子设备的结构示意图。本公开实施例中的电子设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图4示出的电子设备仅仅是一个示例，不应对本公开实施例的功能和使用范围带来任何限制。Referring now to FIG. 4 , it shows a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure. The electronic equipment in the embodiment of the present disclosure may include but not limited to such as mobile phone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable multimedia player), vehicle terminal (such as mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers and the like. The electronic device shown in FIG. 4 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.

As shown in FIG. 4, the electronic device may include a processing device (for example, a central processing unit or a graphics processing unit) that can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) or a program loaded from a storage device into a random access memory (RAM). The RAM also stores various programs and data required for the operation of the electronic device. The processing device, the ROM, and the RAM are connected to one another via a bus. An input/output (I/O) interface is also connected to the bus.

Generally, the following systems may be connected to the I/O interface: input devices such as a touch screen, touchpad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; output devices such as a liquid crystal display (LCD), speaker, or vibrator; storage devices such as a magnetic tape or hard disk; and a communication device. The communication device may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although the figure shows an electronic device having various systems, it should be understood that not all of the illustrated systems need to be implemented or present. More or fewer systems may alternatively be implemented or present.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication device, installed from the storage device, or installed from the ROM. When the computer program is executed by the processing device, the above functions defined in the methods of the embodiments of the present disclosure are performed.

The electronic device provided by the present application adopts the credit risk prediction model training method of the above embodiments and thereby solves the technical problem of low accuracy in predicting user credit risk. Compared with the prior art, the beneficial effects of the electronic device provided by this embodiment are the same as those of the credit risk prediction model training method provided by the above embodiments, and the other technical features of the electronic device are the same as those disclosed for the method of the above embodiments; they are not repeated here.

It should be understood that the various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the description of the above implementations, specific features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples.

The above are only specific implementations of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Embodiment Five

This embodiment provides a computer-readable storage medium having computer-readable program instructions stored thereon, the computer-readable program instructions being used to perform the credit risk prediction model training method of the above embodiments.

The computer-readable storage medium provided by the embodiments of the present application may be, for example, a USB flash drive, and may be, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system or device. The program code contained on the computer-readable storage medium may be transmitted by any appropriate medium, including but not limited to: electrical wire, optical cable, RF (radio frequency), or any suitable combination of the above.

The above computer-readable storage medium may be included in the electronic device, or it may exist separately without being assembled into the electronic device.

The above computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain historical behavior data of a user as a training sample set, together with a sample weight of each training sample in the training sample set; iteratively train, according to the training sample set and each of the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of a plurality of risk prediction sub-models; obtain a sub-model prediction result of each of the risk prediction sub-models for the training sample set, and cluster the risk prediction sub-models according to each of the sub-model prediction results to obtain at least one sub-model group; optimize each of the sample weights according to each of the sub-model prediction results and each of the sub-model groups, so as to augment the training sample set; and return to the step of iteratively training, according to the training sample set and each of the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model satisfies a preset iterative update end condition, thereby obtaining a target overall credit risk prediction model.
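The control flow of the steps listed above can be sketched as follows. This is only an illustrative outline, not the applicant's implementation: the function name `train_target_model`, the four callables passed into it, the `max_rounds` cap, and the point in the loop at which the end condition is checked are all assumptions introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple

# A sub-model is modeled here as any callable mapping a sample to a label.
SubModel = Callable[[object], int]


def train_target_model(
    samples: Sequence[object],
    weights: List[float],
    train_total_model: Callable[[Sequence[object], List[float]], List[SubModel]],
    cluster_submodels: Callable[[List[List[int]]], List[List[int]]],
    update_weights: Callable[[List[List[int]], List[List[int]], List[float]], List[float]],
    is_finished: Callable[[List[SubModel], int], bool],
    max_rounds: int = 10,  # hypothetical safety cap, not from the source
) -> Tuple[List[SubModel], List[float]]:
    """Outer loop: train, predict, cluster, re-weight, repeat."""
    submodels: List[SubModel] = []
    for round_index in range(max_rounds):
        # Train the overall model (a set of sub-models) on the weighted samples.
        submodels = train_total_model(samples, weights)
        # Collect each sub-model's predictions on the whole training set.
        predictions = [[model(x) for x in samples] for model in submodels]
        # Group sub-models whose predictions behave alike.
        groups = cluster_submodels(predictions)
        # Assumed placement of the "preset iterative update end condition" check.
        if is_finished(submodels, round_index):
            break
        # Otherwise adjust the sample weights and train again.
        weights = update_weights(predictions, groups, weights)
    return submodels, weights
```

Passing the training, clustering, and re-weighting steps in as callables keeps the sketch independent of any particular model type or clustering algorithm, neither of which is fixed by the passage above.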

Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the unit itself.

The computer-readable storage medium provided by the present application stores computer-readable program instructions for performing the above credit risk prediction model training method and thereby solves the technical problem of low accuracy in predicting user credit risk. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the credit risk prediction model training method provided by the above embodiments and are not repeated here.

Embodiment Six

The computer program product provided by the present application solves the technical problem of low accuracy in predicting user credit risk. Compared with the prior art, the beneficial effects of the computer program product provided by this embodiment are the same as those of the credit risk prediction model training method provided by the above embodiments and are not repeated here.

The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A credit risk prediction model training method, characterized in that the credit risk prediction model training method comprises:

obtaining historical behavior data of a user as a training sample set, and a sample weight of each training sample in the training sample set;

iteratively training, according to the training sample set and each of the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of a plurality of risk prediction sub-models;

obtaining a sub-model prediction result of each of the risk prediction sub-models for the training sample set, and clustering the risk prediction sub-models according to each of the sub-model prediction results to obtain at least one sub-model group;

optimizing each of the sample weights according to each of the sub-model prediction results and each of the sub-model groups, so as to augment the training sample set;

returning to the step of iteratively training, according to the training sample set and each of the sample weights, to obtain the overall credit risk prediction model, until the overall credit risk prediction model satisfies a preset iterative update end condition, thereby obtaining a target overall credit risk prediction model.

2. The credit risk prediction model training method according to claim 1, characterized in that the step of iteratively training, according to the training sample set and each of the sample weights, to obtain an overall credit risk prediction model, wherein the overall credit risk prediction model is composed of a plurality of risk prediction sub-models, comprises:

selecting a training sample from the training sample set, and determining a sample to be predicted according to the training sample and the sample weight corresponding to the training sample;

inputting the sample to be predicted into each of the risk prediction sub-models to obtain a sub-model output prediction result of each sub-model;

performing weighted aggregation on the sub-model output prediction results according to the model weight corresponding to each of the risk prediction sub-models to obtain an overall model output prediction result;

optimizing each of the model weights and each of the risk prediction sub-models according to the overall model output prediction result;

returning to the step of selecting a training sample from the training sample set and determining a sample to be predicted according to the training sample and the sample weight corresponding to the training sample, until each of the model weights satisfies a preset weight condition and the sub-model parameters of each of the risk prediction sub-models satisfy a preset model parameter condition, thereby obtaining the overall credit risk prediction model.

3. The credit risk prediction model training method according to claim 1, characterized in that the step of clustering the risk prediction sub-models according to each of the sub-model prediction results to obtain at least one sub-model group comprises:

obtaining a sub-model prediction result similarity between the sub-model prediction results, and using the sub-model prediction result similarity as a sub-model similarity between the risk prediction sub-models;

clustering the risk prediction sub-models according to the sub-model similarity to obtain at least one sub-model group.

4. The credit risk prediction model training method according to claim 1, characterized in that the step of optimizing each of the sample weights according to each of the sub-model prediction results and each of the sub-model groups, so as to augment the training sample set, comprises:

selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition;

adjusting each of the sample weights according to the target sub-model group and each of the sub-model prediction results, so as to augment the training sample set.

5. The credit risk prediction model training method according to claim 4, characterized in that the step of selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition, comprises:

obtaining an overall model prediction result of the overall credit risk prediction model for the training sample set;

generating, according to each of the sub-model prediction results and the overall model prediction result, a sub-model group correlation between each of the sub-model groups and the overall credit risk prediction model;

selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than a preset correlation threshold.

6. The credit risk prediction model training method according to claim 4, characterized in that the step of selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition, further comprises:

obtaining the model weight of each of the risk prediction sub-models;

determining, according to the model weights, a sub-model group importance of each of the sub-model groups to the overall credit risk prediction model;

selecting, from the sub-model groups, a target sub-model group whose sub-model group importance is less than a preset importance threshold.

7. The credit risk prediction model training method according to any one of claims 4 to 6, characterized in that the step of selecting, from the sub-model groups, a target sub-model group that satisfies a preset condition, wherein the preset condition includes at least one of a correlation condition and an importance condition, further comprises:

selecting, from the sub-model groups, a target sub-model group whose sub-model group correlation is less than a preset correlation threshold and whose sub-model group importance is less than a preset importance threshold.

8. The credit risk prediction model training method according to claim 4, characterized in that the step of adjusting each of the sample weights according to the target sub-model group and each of the sub-model prediction results comprises:

determining prediction result distribution information of the target sub-model group according to the target sub-model group and each of the sub-model prediction results;

adjusting each of the sample weights according to the prediction result distribution information.

9. The credit risk prediction model training method according to claim 8, characterized in that the step of adjusting each of the sample weights according to the prediction result distribution information comprises:

obtaining, from the prediction result distribution information, a correct prediction proportion with which each of the training samples is correctly predicted by the risk prediction sub-models, wherein the prediction result distribution information includes the sub-model prediction result of each risk prediction sub-model for the training samples;

judging whether the correct prediction proportion is greater than a preset proportion threshold;

if so, decreasing the sample weight corresponding to each of the training samples;

if not, increasing the sample weight corresponding to each of the training samples.

10. The credit risk prediction model training method according to claim 1, characterized in that, before the step of obtaining historical behavior data of a user as a training sample set and a sample weight of each training sample in the training sample set, the method further comprises:

obtaining a true label corresponding to each of the training samples;

generating, according to each of the training samples and each of the risk prediction sub-models, a sub-model prediction result of each of the risk prediction sub-models for the training sample set;

determining, according to the true labels and the sub-model prediction results, the number of risk prediction sub-models that correctly predict each of the training samples;

generating the sample weight of each of the training samples according to the number of sub-models, a preset parameter, and a weight smoothing coefficient.

11. An electronic device, characterized in that the electronic device comprises:

at least one processor; and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the credit risk prediction model training method according to any one of claims 1 to 10.

12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program for implementing a credit risk prediction model training method, and the program is executed by a processor to implement the steps of the credit risk prediction model training method according to any one of claims 1 to 10.
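For illustration only, the similarity-based grouping of claim 3 and the threshold rule of claim 9 might look like the sketch below. The claims do not fix a similarity measure, a clustering algorithm, or the size of a weight adjustment, so the agreement-rate similarity, the greedy grouping that stands in for clustering, and the names `agreement`, `cluster_by_agreement`, `adjust_weights`, `threshold`, `ratio_threshold`, and `step` are all assumptions made for this example.

```python
from typing import List, Sequence


def agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of samples on which two sub-models predict the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_by_agreement(
    predictions: Sequence[Sequence[int]], threshold: float = 0.7
) -> List[List[int]]:
    """Greedy grouping (an assumed stand-in for clustering): a sub-model joins
    the first group whose first member it agrees with at least `threshold`."""
    groups: List[List[int]] = []
    for index, row in enumerate(predictions):
        for group in groups:
            if agreement(predictions[group[0]], row) >= threshold:
                group.append(index)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([index])
    return groups


def adjust_weights(
    weights: Sequence[float],
    predictions: Sequence[Sequence[int]],
    labels: Sequence[int],
    group: Sequence[int],
    ratio_threshold: float = 0.5,  # assumed "preset proportion threshold"
    step: float = 0.1,             # assumed adjustment size
) -> List[float]:
    """Lower the weight of samples the group mostly gets right, raise the rest."""
    adjusted = []
    for i, weight in enumerate(weights):
        correct = sum(predictions[m][i] == labels[i] for m in group)
        ratio = correct / len(group)
        factor = 1.0 - step if ratio > ratio_threshold else 1.0 + step
        adjusted.append(weight * factor)
    return adjusted
```

With three sub-models predicting `[1, 0, 1, 1]`, `[1, 0, 1, 0]`, and `[0, 1, 0, 0]`, the first two agree on three of four samples and fall into one group, while the third forms its own group. Samples that the chosen group predicts correctly more often than the threshold are down-weighted, so later training rounds concentrate on the samples that group handles poorly.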