CN114676936A - Method for predicting default time and related device - Google Patents

Publication number
CN114676936A
Authority
CN
China
Prior art keywords
model
customer
default
data
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210460740.8A
Other languages
Chinese (zh)
Other versions
CN114676936B (en)
Inventor
谢伟
王磊
吴冕冠
程鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210460740.8A
Publication of CN114676936A
Application granted
Publication of CN114676936B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present application provides a method for predicting default time and a related device, which relate to the field of artificial intelligence. The method includes: acquiring data of a first customer, where the data includes parameters that reflect the credit risk of the first customer; inputting the data into a first model to obtain the default probability of the first customer; when the default probability of the first customer is greater than a preset value, predicting, by a second model, the time interval to which the default time of the first customer belongs; and, based on the time interval predicted by the second model, predicting the default time of the first customer by the corresponding third model. By predicting the default time of the first customer, a financial institution can formulate more reasonable risk control measures targeted at the default time, which helps improve the efficiency of its credit risk control.

Description

A method for predicting default time and a related device

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a method for predicting default time and a related device.

Background

With the rapid development of China's economy and financial sector, consumer credit has gradually gained public acceptance. In recent years, credit products such as auto loans, education loans, small cash loans, and beauty loans have flourished. For credit business, a financial institution's assessment of its customers' credit risk is crucial. In the era of big data, the data available for credit risk assessment is increasingly abundant, but this abundance also brings many challenges to credit risk assessment.

The current mainstream approach to credit risk assessment uses statistical models to predict whether a customer will default or to calculate the customer's default probability. However, predicting the default probability alone is not comprehensive enough for a financial institution's risk control, and the resulting risk control is not very efficient.

Summary of the Invention

The present application provides a method for predicting default time and a related device. By predicting a customer's default time, a financial institution can formulate more reasonable risk control measures targeted at that default time, thereby improving the efficiency of its credit risk control.

In a first aspect, the present application provides a method for predicting default time. The method may be executed by a server, by a component configured in the server (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the server's functions; the present application does not limit this.

The server is configured with a first model, a second model, and at least one third model. The first model is used to predict a customer's default probability; the second model is used to predict the time interval to which the customer's default time belongs; the at least one third model corresponds to at least one time interval, and each third model is used to predict the number of defaults and the default time of the customer within its corresponding time interval.

Exemplarily, the method includes: acquiring data of a first customer, where the data of the first customer includes parameters that reflect the credit risk of the first customer; inputting the data into the first model to obtain the default probability of the first customer; when the default probability of the first customer is greater than a preset value, predicting, by the second model, the time interval to which the default time of the first customer belongs; and, based on the time interval predicted by the second model, predicting the default time of the first customer by the corresponding third model.

Based on the above technical solution, the acquired data of the first customer is input into the first model to obtain the customer's default probability. When the default probability is greater than the preset value, the second model predicts the time interval to which the customer's default time belongs, that is, how far in the future the customer is likely to default. Then, based on that predicted time interval, the corresponding third model predicts the customer's specific default time. In this way, not only can the customer's default probability be predicted, but the default time can also be predicted for customers who are likely to default. More comprehensive information is therefore available, which helps a financial institution formulate more reasonable risk control measures and improves the efficiency of its credit risk control.
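The three-stage flow described above can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the function name `predict_default_time`, the interval labels, the default threshold of 0.5, and the toy stand-in models are all assumptions introduced for the example.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def predict_default_time(
    features: Features,
    first_model: Callable[[Features], float],
    second_model: Callable[[Features], str],
    third_models: Dict[str, Callable[[Features], float]],
    preset_value: float = 0.5,  # hypothetical threshold, not from the source
) -> Optional[float]:
    """Return a predicted default time, or None when the default
    probability does not exceed the preset value."""
    probability = first_model(features)
    if probability <= preset_value:
        return None
    # The second model picks the time interval; the interval picks the third model.
    interval = second_model(features)
    return third_models[interval](features)


# Toy stand-ins for trained models (hypothetical behaviour).
def toy_first(x: Features) -> float:
    return 0.9 if x[0] > 1.0 else 0.1


def toy_second(x: Features) -> str:
    return "0-6m" if x[1] < 5 else "6-12m"


toy_third = {"0-6m": lambda x: 3.0, "6-12m": lambda x: 9.0}
```

Passing the models as plain callables keeps the control flow independent of any particular modelling library.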

With reference to the first aspect, in a possible implementation of the first aspect, the first model is a multivariate logistic model, the second model is a Gaussian mixture model (GMM), and the third model is an autoregressive integrated moving average (ARIMA) model.
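For orientation, the standard textbook forms of these three model families are given below. The text here names the families but gives no formulas, so the symbols are assumptions: x is a customer's feature vector, y = 1 denotes default (a binary outcome is assumed for the logistic model), K is the number of mixture components, B is the backshift operator, and (p, d, q) are the ARIMA orders.

```latex
% Logistic model: default probability given features x
P(y = 1 \mid x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{\top} x)\bigr)}

% Gaussian mixture model with K components
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{K} \pi_k = 1, \quad \pi_k \ge 0

% ARIMA(p, d, q) for a time series y_t with white noise \varepsilon_t
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr) (1 - B)^{d} \, y_t
  = \Bigl(1 + \sum_{j=1}^{q} \theta_j B^{j}\Bigr) \varepsilon_t
```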

With reference to the first aspect, in a possible implementation of the first aspect, the data includes one or more of the following: industry category, executed interest rate, loan amount, number of loan periods, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer grade.

With reference to the first aspect, in a possible implementation of the first aspect, the first model includes multiple first sub-models that correspond to different customer types, where the customer type is determined by the age group or region to which the customer belongs. Inputting the data into the first model to obtain the default probability of the first customer includes: determining, from the multiple first sub-models, the first sub-model that corresponds to the customer type of the first customer; and inputting the data of the first customer into that first sub-model to obtain the default probability of the first customer.
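Selecting a sub-model by customer type amounts to a keyed lookup. The sketch below assumes the customer type is derived from age; the age bands, the names `age_band` and `select_sub_model`, and the constant toy sub-models are hypothetical and do not come from the application.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
SubModel = Callable[[Features], float]


def age_band(age: int) -> str:
    """Map an age to a customer type (hypothetical bands)."""
    if age < 30:
        return "under_30"
    if age < 50:
        return "30_to_49"
    return "50_plus"


def select_sub_model(sub_models: Dict[str, SubModel], customer_type: str) -> SubModel:
    """Return the sub-model registered for the given customer type."""
    try:
        return sub_models[customer_type]
    except KeyError:
        raise KeyError(
            f"no sub-model registered for customer type {customer_type!r}"
        ) from None


# Toy registry: each sub-model returns a fixed default probability.
first_sub_models: Dict[str, SubModel] = {
    "under_30": lambda x: 0.2,
    "30_to_49": lambda x: 0.4,
}
probability = select_sub_model(first_sub_models, age_band(35))([1.0, 2.0])
```

The same lookup applies unchanged if the customer type is derived from region instead of age.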

With reference to the first aspect, in a possible implementation of the first aspect, the second model includes multiple second sub-models that correspond to different customer types, where the customer type is determined by the age group or region to which the customer belongs. Predicting, by the second model, the time interval to which the default time of the first customer belongs includes: determining, from the multiple second sub-models, the second sub-model that corresponds to the customer type of the first customer; and predicting, by that second sub-model, the time interval to which the default time of the first customer belongs.

With reference to the first aspect, in a possible implementation of the first aspect, each third model includes multiple third sub-models, any two of which correspond to different customer types, where the customer type is determined by the age group or region to which the customer belongs. Predicting the default time of the first customer by the corresponding third model, based on the time interval predicted by the second model, includes: determining, in the third model that corresponds to the time interval predicted by the second model, the third sub-model that corresponds to the customer type of the first customer; and predicting the default time of the first customer by that third sub-model.
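Because a third sub-model is identified by both a time interval and a customer type, one way to hold them is a two-level mapping. This is a minimal sketch under that assumption; the interval labels, customer-type labels, and constant toy predictors are hypothetical.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
SubModel = Callable[[Features], float]


def select_third_sub_model(
    third_models: Dict[str, Dict[str, SubModel]],
    interval: str,
    customer_type: str,
) -> SubModel:
    """Pick the third model by time interval, then its sub-model by customer type."""
    return third_models[interval][customer_type]


# Outer key: time interval predicted by the second model.
# Inner key: customer type of the first customer.
third_models: Dict[str, Dict[str, SubModel]] = {
    "0-6m": {"under_30": lambda x: 2.0, "30_plus": lambda x: 4.0},
    "6-12m": {"under_30": lambda x: 8.0, "30_plus": lambda x: 10.0},
}

sub_model = select_third_sub_model(third_models, "6-12m", "under_30")
predicted_time = sub_model([0.3, 1.2])
```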

With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: acquiring a training set that includes historical data of multiple customers; and training the first model, the second model, and the at least one third model separately based on the training set.

With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: grouping the training set based on the customer types of the multiple customers to obtain multiple groups of training sets, where the groups correspond to different customer types and the customer type is determined by the age group or region to which the customer belongs. Training the first model, the second model, and the at least one third model separately based on the training set then includes: training the first model, the second model, and the at least one third model on each group of training sets, to obtain one trained first sub-model, one trained second sub-model, and multiple trained third sub-models per group.
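The grouping step can be sketched as below. The record layout, the `region` key, and the function names are assumptions for illustration, and the "training" is deliberately a placeholder (the observed default rate per group) rather than the fitting of the first, second, and third models described in the text.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List

Record = Dict[str, Any]


def group_training_set(
    records: Iterable[Record],
    customer_type: Callable[[Record], Hashable],
) -> Dict[Hashable, List[Record]]:
    """Split the training set into one group per customer type."""
    groups: Dict[Hashable, List[Record]] = defaultdict(list)
    for record in records:
        groups[customer_type(record)].append(record)
    return dict(groups)


def train_sub_models(
    groups: Dict[Hashable, List[Record]],
    train: Callable[[List[Record]], Any],
) -> Dict[Hashable, Any]:
    """Train one sub-model per group with the supplied training function."""
    return {ctype: train(rows) for ctype, rows in groups.items()}


history = [
    {"region": "north", "defaulted": 1},
    {"region": "south", "defaulted": 0},
    {"region": "north", "defaulted": 0},
]
groups = group_training_set(history, lambda r: r["region"])
# Placeholder "training": the observed default rate in each group.
sub_models = train_sub_models(
    groups, lambda rows: sum(r["defaulted"] for r in rows) / len(rows)
)
```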

With reference to the first aspect, in a possible implementation of the first aspect, the method further includes: updating the training set at a preset period to obtain an updated training set; and training the first model, the second model, and the at least one third model separately based on the updated training set.
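A periodic update needs only a check of whether the preset period has elapsed since the last training run. The sketch below assumes a 30-day period and a hypothetical function name `retraining_due`; neither is specified in the application.

```python
from datetime import datetime, timedelta


def retraining_due(last_trained: datetime, now: datetime, period: timedelta) -> bool:
    """Return True once at least one full preset period has passed."""
    return now - last_trained >= period


PRESET_PERIOD = timedelta(days=30)  # hypothetical period
last_run = datetime(2022, 1, 1)

due_early = retraining_due(last_run, datetime(2022, 1, 15), PRESET_PERIOD)
due_later = retraining_due(last_run, datetime(2022, 2, 1), PRESET_PERIOD)
```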

In a second aspect, the present application provides a model training method, which may be executed by a server. The server is configured with a first model, a second model, and at least one third model. The first model is used to predict a customer's default probability; the second model is used to predict the time interval to which the customer's default time belongs; the at least one third model corresponds to at least one time interval, and each third model is used to predict the number of defaults and the default time of the customer within its corresponding time interval.

Exemplarily, the method includes: acquiring data of multiple customers, where the data of each customer includes parameters that reflect the customer's credit risk; inputting the data of the multiple customers into the first model to obtain the default probabilities of the multiple customers; when a customer's default probability is greater than a preset value, predicting, by the second model, the time interval to which the customer's default time belongs; and, based on the time interval predicted by the second model, predicting the customer's default time by the corresponding third model.

Based on the above technical solution, the acquired data of multiple customers is input into the first model to obtain the default probabilities of the multiple customers. For each customer whose default probability is greater than the preset value, the second model predicts the time interval to which the customer's default time belongs, that is, how far in the future the customer is likely to default. Then, based on that predicted time interval, the corresponding third model predicts the customer's specific default time. In this way, the first model, the second model, and the at least one third model can be trained on the data of multiple customers, which helps improve the accuracy of the models.

In a third aspect, the present application provides a server configured with a first model, a second model, and at least one third model. The first model is used to predict a customer's default probability; the second model is used to predict the time interval to which the customer's default time belongs; the at least one third model corresponds to at least one time interval, and each third model is used to predict the number of defaults and the default time of the customer within its corresponding time interval.

Exemplarily, the server includes an acquisition unit, an input unit, and a processing unit. The acquisition unit is configured to acquire data of a first customer, where the data of the first customer includes parameters that reflect the credit risk of the first customer. The input unit is configured to input the data into the first model to obtain the default probability of the first customer. The processing unit is configured to predict, by the second model, the time interval to which the default time of the first customer belongs when the default probability of the first customer is greater than a preset value. The processing unit is further configured to predict the default time of the first customer by the corresponding third model, based on the time interval predicted by the second model.

In a fourth aspect, the present application provides a server that includes a processor. The processor is coupled to a memory and can execute a computer program in the memory to implement the method described in the first aspect, the second aspect, or any possible implementation of the first or second aspect.

Optionally, the server in the fourth aspect further includes a memory.

Optionally, the server in the fourth aspect further includes a communication interface, and the processor is coupled to the communication interface.

In a fifth aspect, the present application provides a chip system that includes at least one processor configured to support the functions involved in the first aspect, the second aspect, or any possible implementation of the first or second aspect, for example, receiving or processing the data involved in the above methods.

In a possible design, the chip system further includes a memory for storing program instructions and data, and the memory is located inside or outside the processor.

The chip system may consist of chips, or may include chips and other discrete devices.

In a sixth aspect, the present application provides a computer-readable storage medium that stores a computer program (also referred to as code or instructions). When the computer program is run by a processor, the method described in the first aspect, the second aspect, or any possible implementation of the first or second aspect is performed.

In a seventh aspect, the present application provides a computer program product that includes a computer program (also referred to as code or instructions). When the computer program is run, the method described in the first aspect, the second aspect, or any possible implementation of the first or second aspect is performed.

It should be understood that the third to seventh aspects of the present application correspond to the technical solutions of the first and second aspects. The beneficial effects of each aspect and its corresponding feasible implementations are similar and are not repeated here.

It should also be understood that the method for predicting default time and the related device provided by the present application can be applied in the field of artificial intelligence and in other fields. The present application does not limit this.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of an application scenario applicable to the method provided by an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for predicting default time provided by an embodiment of the present application;

FIG. 3 is another schematic flowchart of the method for predicting default time provided by an embodiment of the present application;

FIG. 4 is a schematic block diagram of a server provided by an embodiment of the present application;

FIG. 5 is another schematic block diagram of a server provided by an embodiment of the present application.

Detailed Description

The following detailed description of the embodiments of the present application shown in the accompanying drawings is not intended to limit the scope of the application as claimed; it merely represents selected embodiments. All other embodiments obtained by those of ordinary skill in the art from the embodiments in the present application, without creative effort, fall within the protection scope of the present application.

To facilitate understanding of the method for predicting default time provided by the embodiments of the present application, application scenarios applicable to the embodiments are described below. It should be understood that the application scenarios described here are intended to illustrate the technical solutions of the embodiments more clearly and do not limit those technical solutions.

FIG. 1 is a schematic diagram of an application scenario applicable to the method provided by an embodiment of the present application. As shown in FIG. 1, a technician can use the electronic device 110 to input data relevant to a customer's credit risk assessment. The electronic device 110 can be communicatively connected to the server 120, and the server 120 can present a user interface through the electronic device 110. The user interface lets the technician interact with the server 120: by entering or selecting items in the user interface, the technician sends data or information to the server 120. Accordingly, based on the data or information entered by the technician, the server 120 can present the customer's credit risk assessment result through the electronic device 110.

It should be understood that the scenario shown in FIG. 1 is only an example. The server 120 may be a single physical device or a server cluster composed of multiple physical devices.

With the rapid development of China's economy and financial sector, consumer credit has gradually gained public acceptance. In recent years, credit products such as auto loans, education loans, small cash loans, and beauty loans have flourished. For credit business, a financial institution's assessment of its customers' credit risk is crucial. In the era of big data, the data available for credit risk assessment is increasingly abundant, but this abundance also brings many challenges to credit risk assessment.

The current mainstream approach to credit risk assessment uses statistical models to predict whether a customer will default or to calculate the customer's default probability. However, predicting the default probability alone is not comprehensive enough for a financial institution's risk control, and the resulting risk control is not very efficient.

To solve the above problems, the present application provides a method for predicting default time. The acquired data of a first customer is input into a first model to obtain the customer's default probability. When the default probability is greater than a preset value, a second model predicts the time interval to which the customer's default time belongs, that is, how far in the future the customer is likely to default. Each time interval corresponds to a third model that predicts the customer's default time. Based on the predicted time interval, the corresponding third model then predicts the customer's specific default time, so that a financial institution can formulate more reasonable risk control measures based on the default time.

The technical solutions of the present application, and how they solve the above technical problems, are described in detail below using specific embodiments. The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present application are described below with reference to the accompanying drawings.

FIG. 2 is a schematic flowchart of a method 200 for predicting default time provided by an embodiment of the present application. The method 200 shown in FIG. 2 may include steps 210 to 240. Each step of the method 200 is described in detail below.

It should be understood that the method 200 shown in FIG. 2 uses a server as the executing entity, but this does not limit the executing entity of the method in any way. Any entity that can run a program recording the code of the method provided by the present application can execute the method provided by the embodiments. For example, the server may be replaced by a component configured in the server (such as a chip or a chip system), or by another functional module capable of calling and executing the program.

It should also be understood that the server is configured with a first model, a second model, and at least one third model. The first model is used to predict a customer's default probability. The second model is used to predict the time interval to which the customer's default time belongs, that is, how soon the customer is likely to default. The at least one third model corresponds to at least one time interval, and each third model is used to predict the number of defaults and the default time of the customer within its corresponding time interval.

Step 210: acquire data of a first customer, where the data of the first customer includes parameters that reflect the credit risk of the first customer.

The first customer may be an incremental customer, that is, a new customer, or an existing customer, which is not limited in the embodiments of the present application. The historical data of existing customers can be used to train the first model, the second model, and the at least one third model.

In one possible implementation, the server acquires the data of the first customer in response to an input operation by a technician. That is, the technician can enter data reflecting the credit risk of the first customer through a user interface, and the server accordingly acquires that data for use in predicting the default time of the first customer.

In another possible implementation, the server can acquire the data of the first customer from its own platform or from a partner platform, where the platform stores the data of the first customer and can communicate with the server. The acquisition may be triggered by a technician's operation such as a click: the technician clicks, through the user interface, to predict the default time of the first customer, which triggers the server to acquire the data of the first customer from the platform.

Optionally, the data includes one or more of the following: industry category, executed interest rate, loan amount, number of loan periods, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer level.

The customer level can be used to reflect the importance of the customer. For example, a higher customer level indicates a more important customer.

In one example, the server may, in response to an input operation by a technician, acquire one or more of the first customer's industry category, executed interest rate, loan amount, number of loan periods, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer level, so that the server can predict the default probability of the first customer based on multiple influencing factors.

In another example, the server may acquire, from its own platform or a partner platform, one or more of the first customer's industry category, executed interest rate, loan amount, number of loan periods, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer level. The platform stores the above data of the first customer and can communicate with the server, and the acquisition is triggered in response to a technician's operation such as a click.

It should be understood that, in the present application, the collection, storage, use, processing, transmission, provision, and disclosure of financial data, users' personal data, and other such information all comply with relevant laws and regulations and do not violate public order and good morals.

By providing the above kinds of data that may affect a customer's default behavior, the embodiments of the present application take into account the influence of multiple factors on that behavior, which helps improve the accuracy of the predicted default probability of the customer.

Optionally, the first model is a Logistic model, the second model is a Gaussian mixture model (GMM), and the third model is an autoregressive integrated moving average (ARIMA) model.

The Logistic model, the GMM, and the ARIMA model are each described in detail below.

1. Logistic model

The Logistic model has a form similar to the following:

$$P(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-\beta^{\top} x_i}}$$

where $P(y_i = 1 \mid x_i)$ denotes the default probability of customer i, $x_i$ denotes the data of customer i, and $\beta$ is the parameter to be trained.

It can be understood that the above $y_i = 1$ denotes that customer i defaults, which should not constitute any limitation on the embodiments of the present application. For example, $y_i = 1$ may instead denote that customer i does not default. Correspondingly, when the probability that the customer does not default is greater than a preset value, the customer's credit is good and the customer can be considered to keep the contract during the loan period; in other words, the server does not need to further predict the default time of the customer. When the probability that the customer does not default is less than or equal to the preset value, the customer's credit is average and the customer can be considered likely to default during the loan period (such a customer may be called a high-risk customer); in other words, the server needs to further predict the default time of the customer.
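The screening rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the two numeric features, the weights, the bias, and the threshold of 0.5 are all hypothetical, and real inputs such as industry category would first need to be encoded numerically.

```python
import math


def default_probability(features, weights, bias=0.0):
    """Logistic model: P(default) = 1 / (1 + exp(-(w . x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Branch on the sign of z so that math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def needs_default_time_prediction(p_default, threshold=0.5):
    """High-risk customers (probability above the preset value) go on to the next stage."""
    return p_default > threshold


# Hypothetical, already-scaled features and trained weights.
p = default_probability([1.0, 0.5], weights=[0.8, -1.2], bias=-0.1)
high_risk = needs_default_time_prediction(p)
```

If the model is instead trained to output the probability of not defaulting, the comparison is simply inverted, as the paragraph above describes.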

2. GMM

The one-dimensional Gaussian distribution has a form similar to the following:

$$N(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

where $N(x \mid \mu, \sigma)$ denotes the probability density, $\sigma$ denotes the standard deviation, $\mu$ denotes the mean (that is, the expectation), and $x$ denotes the customer's data, such as household income or age. The formula gives the probability density near $\mu$: the closer $x$ is to $\mu$, the larger the density, and the smaller $\sigma$ is, the more the density concentrates around $\mu$.

The d-dimensional Gaussian distribution (d being a positive integer greater than 1) has a form similar to the following:

$$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)$$

where $d$ denotes the dimension of $x$, $\Sigma$ denotes the $d \times d$ covariance matrix, $|\Sigma|$ is the determinant of the covariance matrix, $\mu$ denotes the mean (that is, the expectation), and $x$ denotes the customer's data, such as household income and age.

A Gaussian mixture model mixes multiple Gaussian models together and uses weight parameters to adjust the mixing proportions of the different Gaussian models (each representing a category in the data sample). The Gaussian mixture model has a form similar to the following:

$$p(x) = \sum_{j=1}^{k} p(C_j)\, p(x \mid C_j)$$

where $p(x \mid C_j) = N(x \mid \mu_j, \Sigma_j)$ is the conditional probability density of category (or group) j, which follows a Gaussian distribution; $p(C_j) \ge 0$ is the weight parameter of category j ($w_j = p(C_j)$); and

$$\sum_{j=1}^{k} p(C_j) = 1.$$

The parameters of the model are $p(C_j)$, $\mu_j$, and $\Sigma_j$, where $j = 1, \dots, k$, and $k$, a positive integer, is the total number of categories or groups. $\mu_j$ is the mean of category j and $\Sigma_j$ is the covariance of category j.

In the embodiments of the present application, a category represents the time interval of a customer's default time, that is, within how long in the future the customer will default. For example, the categories may be: default within one year, default within two years, default within three years, and default within four years. Based on the above Gaussian mixture model, the posterior probability can be computed for the first customer, and the first customer is assigned to one of the Gaussian models. This yields the category to which the first customer belongs, that is, the time interval to which the first customer's default time belongs.
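The posterior-based assignment can be illustrated with a short sketch. It assumes a one-dimensional feature and an already fitted mixture; the weights, means, standard deviations, and interval labels below are hypothetical values chosen for illustration only.

```python
import math


def gaussian_pdf(x, mean, std):
    """One-dimensional Gaussian density N(x | mean, std)."""
    coef = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-0.5 * ((x - mean) / std) ** 2)


def posterior(x, weights, means, stds):
    """p(C_j | x) = p(C_j) p(x | C_j) / sum_l p(C_l) p(x | C_l)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # assumes x is not so far from every component that this underflows to 0
    return [j / total for j in joint]


def predict_interval(x, weights, means, stds, labels):
    """Assign x to the category with the largest posterior probability."""
    post = posterior(x, weights, means, stds)
    best = max(range(len(post)), key=post.__getitem__)
    return labels[best], post


labels = ["default within one year", "default within two years"]
label, post = predict_interval(1.0, [0.5, 0.5], [0.0, 10.0], [1.0, 1.0], labels)
```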

It can be understood that, before the above Gaussian mixture model is used, its parameters need to be estimated in order to obtain a Gaussian mixture model for predicting the category to which the first customer belongs. For example, the expectation-maximization (EM) algorithm can be used to estimate the parameters of the Gaussian mixture model. The parameter estimation process is described in detail below.

The goal of fitting the GMM to data $x_1, \dots, x_n$ is to find $p(C_j)$, $\mu_j$, $\Sigma_j$ that maximize

$$\prod_{i=1}^{n} p(x_i) = \prod_{i=1}^{n} \sum_{j=1}^{k} p(C_j)\, p(x_i \mid C_j)$$

where $p(x \mid C_j) = N(x \mid \mu_j, \Sigma_j)$. Taking the logarithm of both sides gives the log-likelihood function of the GMM:

$$\sum_{i=1}^{n} \log p(x_i) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} p(C_j)\, N(x_i \mid \mu_j, \Sigma_j)$$

The goal is therefore to maximize this log-likelihood function, for which the EM algorithm is used. The EM algorithm includes an initialization step and an iterative step. The initialization step initializes k clusters $C_1, \dots, C_k$; each cluster j has parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$. The iterative step estimates the cluster membership $p(C_j \mid x_i)$ of each data point and computes the expectation of the likelihood function (the expectation step), then re-estimates the parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$ of each cluster j (the maximization step).

The specific procedure of the EM algorithm is as follows.

Step 1: let $z_1, \dots, z_n$ denote the true sources (that is, the categories) of the corresponding data $x_1, \dots, x_n$. Each $z_i$ is a discrete variable taking a value $j = 1, \dots, k$, where $k$ is the number of categories. The log-likelihood function is:

$$\log p(X, z \mid \theta) = \sum_{i=1}^{n} \log \left[ p(C_{z_i})\, N(x_i \mid \mu_{z_i}, \Sigma_{z_i}) \right]$$

Step 2: let $\theta$ denote the model parameters, and replace $\log p(X, z \mid \theta)$ with its expectation

$$E_{z \mid X, \theta^{(t)}}\left[ \log p(X, z \mid \theta) \right]$$

Step 3: this requires the distribution of $z$ given the current parameter estimate $\theta^{(t)}$, which, according to Bayes' rule, satisfies

$$p(z_i = c \mid x_i, \theta^{(t)}) = \frac{p(C_c)\, N(x_i \mid \mu_c, \Sigma_c)}{\sum_{j=1}^{k} p(C_j)\, N(x_i \mid \mu_j, \Sigma_j)}$$

with all parameters on the right-hand side taken from $\theta^{(t)}$.

Step 4: $p(z_i = c \mid x_i, \theta^{(t)})$ is called the "responsibility" that cluster c takes for data point i, written $r_{ic} = p(z_i = c \mid x_i, \theta^{(t)})$.

Step 5: the expectation step of the GMM. Denoting $x_1, \dots, x_n$ as $X$,

$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^{n} \sum_{c=1}^{k} r_{ic} \left[ \log p(C_c) + \log N(x_i \mid \mu_c, \Sigma_c) \right]$$

Step 6: the maximization step of the GMM. Take the partial derivative of $Q(\theta, \theta^{(t)})$ with respect to each parameter and set these partial derivatives to zero, which gives the new parameter estimates

$$\mu_c = \frac{1}{N_c} \sum_{i=1}^{n} r_{ic}\, x_i, \qquad \Sigma_c = \frac{1}{N_c} \sum_{i=1}^{n} r_{ic}\, (x_i - \mu_c)(x_i - \mu_c)^{\top}, \qquad p(C_c) = \frac{N_c}{n}$$

where

$$N_c = \sum_{i=1}^{n} r_{ic}.$$

It can be understood that more specific descriptions of the GMM and the EM algorithm can be found in known techniques and are not repeated here.
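As an illustration of the expectation and maximization steps, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture, so that each covariance reduces to a scalar variance. The function names, the variance floor, the fixed iteration count, and the toy data are assumptions made for this example and are not taken from the application.

```python
import math


def gaussian_pdf(x, mean, var):
    """One-dimensional Gaussian density parameterised by variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial parameters."""
    n, k = len(data), len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # Expectation step: responsibilities r[i][c] = p(z_i = c | x_i, current parameters).
        resp = []
        for x in data:
            joint = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # Maximization step: closed-form updates from the weighted data.
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, var_floor)  # floor keeps a cluster from collapsing
            weights[c] = n_c / n
    return means, variances, weights


# Toy data with two well-separated groups.
data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
means, variances, weights = em_gmm_1d(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

A production implementation would normally add a convergence check on the log-likelihood and handle the multivariate case; this sketch only shows the alternation between the two steps.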

3. ARIMA model

An autoregressive model first requires an order p, which specifies how many periods of historical values are used to predict the current value. The p-order autoregressive model is defined as:

$$y_t = \mu + \sum_{i=1}^{p} \gamma_i\, y_{t-i} + e_t$$

where $y_t$ is the current value, $\mu$ is a constant term, $p$ is the order, $\gamma_i$ are the autocorrelation coefficients, and $e_t$ is the error. The moving average model focuses on the accumulation of the error terms of the autoregressive model and is defined as:

$$y_t = \mu + e_t + \sum_{i=1}^{q} \theta_i\, e_{t-i}$$

where $q$ is the order of the moving average and $\theta_i$ are its coefficients. Combining the autoregressive model with the moving average model gives the autoregressive moving average model ARMA(p, q):

$$y_t = \mu + \sum_{i=1}^{p} \gamma_i\, y_{t-i} + e_t + \sum_{i=1}^{q} \theta_i\, e_{t-i}$$

Combining the autoregressive model, the moving average model, and differencing gives ARIMA(p, d, q), where d is the order of differencing applied to the data.
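The roles of the differencing order d and the autoregressive order p can be shown with a small sketch. It covers only differencing and a one-step autoregressive forecast (the moving-average part is omitted), and the series and coefficients are made-up values rather than fitted ones.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def ar_forecast(history, mu, gammas):
    """One-step AR(p) forecast: mu + sum_i gamma_i * y_{t-i}; the error term has zero mean."""
    p = len(gammas)
    recent = history[-p:][::-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return mu + sum(g * y for g, y in zip(gammas, recent))


series = [1, 3, 6, 10, 15]
second_diff = difference(series, d=2)             # constant after two differences
next_second_diff = ar_forecast(second_diff, mu=0.0, gammas=[1.0])
# Undo the differencing to return to the original scale.
next_first_diff = difference(series, d=1)[-1] + next_second_diff
next_value = series[-1] + next_first_diff
```

In practice the coefficients would be estimated from data by a fitting routine rather than chosen by hand.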

It should be understood that the above introduction to the Logistic model, the GMM, and the ARIMA model is intended only to describe the method provided by the embodiments of the present application more clearly and should not constitute any limitation on those embodiments. More detailed introductions can be found in known techniques and are not repeated here.

In the embodiments of the present application, the first model uses the Logistic model to predict a customer's default probability, which helps to preliminarily screen out high-risk customers, that is, customers with a high default probability. This in turn helps financial institutions focus their evaluation on such customers and reduce their losses as far as possible. The second model uses the GMM, which can identify relatively complex distributions. Using the GMM makes it possible to take into account the influence of industry, income, employment status, and other factors on a customer's default behavior, which helps improve the accuracy of the predicted time interval to which the customer's default time belongs. The third model uses the ARIMA model to predict the customer's default time, which can improve the accuracy of the prediction.

Step 220: input the above data into the first model to obtain the default probability of the first customer.

After acquiring the data of the first customer, the server inputs the data into the first model to obtain the default probability of the first customer.

Exemplarily, the server inputs the first customer's industry category, executed interest rate, loan amount, number of loan periods, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer level into the Logistic model to obtain the default probability of the first customer.

It should be understood that the above description takes obtaining the default probability of the first customer as an example, which should not constitute any limitation on the embodiments of the present application. For example, the server may instead use the Logistic model to calculate the probability that the first customer does not default. Correspondingly, when the probability that the customer does not default is greater than a preset value, the customer's credit is good and the customer can be considered to keep the contract during the loan period; in other words, the server does not need to further predict the default time of the customer. When the probability that the customer does not default is less than or equal to the preset value, the customer's credit is average and the customer can be considered likely to default during the loan period (such a customer may be called a high-risk customer); in other words, the server needs to further predict the default time of the customer.

Step 230: when the default probability of the first customer is greater than a preset value, predict, through the second model, the time interval to which the default time of the first customer belongs.

After the server obtains the default probability of the first customer, if that probability is less than or equal to the preset value, the first customer's credit can be considered good, that is, the first customer keeps the contract during the loan period. In other words, the server does not need to further predict the default time of the customer.

When the default probability of the first customer is greater than the preset value, the server predicts, through the second model, the time interval to which the default time of the first customer belongs, that is, within how long in the future the first customer is likely to default.

Exemplarily, when the default probability of the first customer is greater than the preset value, the server predicts, through the GMM, the time interval to which the default time of the first customer belongs. For example, the server predicts through the GMM that the first customer is likely to default within one year after the loan.

Step 240: based on the time interval, predicted by the second model, to which the default time of the first customer belongs, predict the default time of the first customer through the corresponding third model.

Each time interval corresponds to one third model. For example, the interval "default within one year" corresponds to a third model that predicts the specific time within that year at which the customer defaults, such as the tenth month of the year.

After predicting the time interval to which the default time of the first customer belongs, the server predicts the default time of the first customer through the corresponding third model.

Exemplarily, the server predicts through the GMM that the first customer is likely to default within one year after the loan, and then, through the corresponding ARIMA model, predicts that the first customer is likely to default in the tenth month after the loan. This helps financial institutions formulate more reasonable risk control measures.
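The cascade formed by steps 220 to 240 can be sketched as a single function. The three models are passed in as plain callables, and the stand-in lambdas, the interval keys, and the threshold of 0.5 are hypothetical placeholders rather than trained Logistic, GMM, or ARIMA models.

```python
def predict_default_time(data, first_model, second_model, third_models, threshold=0.5):
    """Run steps 220-240: probability, then time interval, then default time."""
    probability = first_model(data)  # step 220
    result = {"default_probability": probability, "interval": None, "default_time": None}
    if probability <= threshold:
        return result  # customer treated as compliant; no further prediction
    interval = second_model(data)  # step 230
    result["interval"] = interval
    result["default_time"] = third_models[interval](data)  # step 240
    return result


# Stand-in models for illustration only; the rule on customer_level is arbitrary.
first = lambda d: 0.9 if d["customer_level"] < 2 else 0.1
second = lambda d: "within_1_year"
third = {"within_1_year": lambda d: 10, "within_2_years": lambda d: 18}

risky = predict_default_time({"customer_level": 1}, first, second, third)
safe = predict_default_time({"customer_level": 5}, first, second, third)
```

Keeping one third model per interval in a dictionary mirrors the one-to-one correspondence between time intervals and third models described above.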

It should be understood that customers of different types have different economic levels, so their likelihood of default may differ considerably. The server can therefore select, according to the customer type to which the first customer belongs, the corresponding first sub-model, second sub-model, and third sub-model. The following describes in detail how the server predicts a customer's default time after customers have been classified by customer type.

Optionally, the first model includes a plurality of first sub-models that correspond to different customer types, where the customer type is determined according to the age group or region to which the customer belongs. Inputting the data into the first model to obtain the default probability of the first customer includes: determining, from the plurality of first sub-models, the first sub-model corresponding to the customer type of the first customer; and inputting the data of the first customer into that first sub-model to obtain the default probability of the first customer.

Exemplarily, the first model includes a first sub-model 1 and a first sub-model 2, where first sub-model 1 corresponds to southern customers and first sub-model 2 corresponds to northern customers. The server determines, based on the type to which the first customer belongs, which first sub-model the data is input into. For example, if the first customer is a southern customer, the server inputs the data of the first customer into first sub-model 1 to obtain the default probability of the first customer.

Optionally, the second model includes a plurality of second sub-models that correspond to different customer types, where the customer type is determined according to the age group or region to which the customer belongs. Predicting, through the second model, the time interval to which the default time of the first customer belongs includes: determining, from the plurality of second sub-models, the second sub-model corresponding to the customer type of the first customer; and predicting, through that second sub-model, the time interval to which the default time of the first customer belongs.

Exemplarily, the second model includes a second sub-model 1 and a second sub-model 2, where second sub-model 1 corresponds to southern customers and second sub-model 2 corresponds to northern customers. The server determines, based on the type to which the first customer belongs, which second sub-model the data is input into. For example, if the first customer is a southern customer and the default probability predicted by first sub-model 1 is greater than the preset value, the server inputs the data of the first customer into second sub-model 1 to predict the time interval to which the default time of the first customer belongs.

Optionally, each third model includes a plurality of third sub-models, any two of which correspond to different customer types, where the customer type is determined according to the age group or region to which the customer belongs. Predicting the default time of the first customer through the corresponding third model, based on the time interval predicted by the second model, includes: determining, in the third model corresponding to the time interval to which the default time of the first customer predicted by the second model belongs, the third sub-model corresponding to the customer type of the first customer; and predicting the default time of the first customer through that third sub-model.

Exemplarily, the first customer is a southern customer, and the server predicts, based on second sub-model 1, the time interval to which the default time of the first customer belongs. Different time intervals correspond to different third sub-models: for example, "within one year after the loan" corresponds to third sub-model 1, "within two years after the loan" corresponds to third sub-model 2, and "within three years after the loan" corresponds to third sub-model 3. Assuming that the time interval to which the default time of the first customer belongs is "within one year after the loan", the server predicts the default time of the first customer through third sub-model 1, for example, that the first customer is likely to default in the tenth month after the loan.
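The lookup of sub-models by customer type can be sketched as a nested dictionary. The registry layout, the `region` key, and the string placeholders standing in for trained sub-models are assumptions made for illustration.

```python
def select_submodels(customer, registry, type_key="region"):
    """Return the first, second, and third sub-models for the customer's type."""
    ctype = customer[type_key]
    if ctype not in registry:
        raise KeyError(f"no sub-models trained for customer type {ctype!r}")
    return registry[ctype]


# Strings stand in for trained sub-models; third sub-models are keyed by time interval.
registry = {
    "south": {
        "first": "first sub-model 1",
        "second": "second sub-model 1",
        "third": {"within_1_year": "third sub-model 1",
                  "within_2_years": "third sub-model 2",
                  "within_3_years": "third sub-model 3"},
    },
    "north": {
        "first": "first sub-model 2",
        "second": "second sub-model 2",
        "third": {"within_1_year": "northern third sub-model 1",
                  "within_2_years": "northern third sub-model 2",
                  "within_3_years": "northern third sub-model 3"},
    },
}

chosen = select_submodels({"region": "south"}, registry)
third_for_interval = chosen["third"]["within_1_year"]
```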

Optionally, before the data of the first customer is acquired, the method shown in FIG. 2 further includes: acquiring a training set, where the training set includes historical data of a plurality of customers; and training the first model, the second model, and the at least one third model separately based on the training set.

Exemplarily, the server may acquire historical data of a plurality of customers and divide it into training data and validation data. That is, the data of some customers is used to train the models and the data of other customers is used to validate the models, so as to obtain the trained first model, second model, and at least one third model.
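One way to divide the historical data by customer is sketched below. The record layout with a `customer_id` field, the 20% validation ratio, and the fixed random seed are hypothetical choices; the application does not specify how the split is made.

```python
import random


def split_by_customer(records, validation_ratio=0.2, seed=0):
    """Split records so that each customer's data falls entirely in one part."""
    ids = sorted({r["customer_id"] for r in records})
    random.Random(seed).shuffle(ids)  # seeded for a reproducible split
    n_val = max(1, int(len(ids) * validation_ratio))
    val_ids = set(ids[:n_val])
    train = [r for r in records if r["customer_id"] not in val_ids]
    validation = [r for r in records if r["customer_id"] in val_ids]
    return train, validation


# Ten hypothetical customers with two monthly records each.
records = [{"customer_id": c, "month": m} for c in range(10) for m in (1, 2)]
train, validation = split_by_customer(records)
```

Splitting on customers rather than on individual records keeps one customer's history from appearing on both sides of the split.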

In the embodiments of the present application, training the first model, the second model, and the at least one third model separately on the training set helps improve the accuracy of these models and, in turn, the accuracy of the predicted default time of the customer.

Optionally, the method shown in FIG. 2 further includes: grouping the training set based on the customer types of the plurality of customers to obtain multiple groups of training sets, where the groups correspond to different customer types and the customer type is determined according to the age group or region to which the customer belongs. Training the first model, the second model, and the at least one third model separately based on the training set then includes: training the first model, the second model, and the at least one third model separately based on each group of training sets, to obtain one trained first sub-model, one trained second sub-model, and a plurality of trained third sub-models.

The server can group the customers according to the region in which they are located or the age group to which they belong. That is, the training set is divided into multiple groups, each corresponding to one type of customer, and the first model, the second model, and the at least one third model are trained on each group to obtain one trained first sub-model, one trained second sub-model, and a plurality of trained third sub-models.

Exemplarily, the server acquires the data of 100 customers, of whom 40 are in the south and 60 are in the north. Because the economic levels of the north and the south differ, the 100 customers are divided into two groups. The data of the 40 southern customers is used to train the first model, the second model, and the at least one third model separately, giving one trained first sub-model, one trained second sub-model, and a plurality of trained third sub-models. Likewise, the data of the 60 northern customers is used to train the first model, the second model, and the at least one third model separately, giving another trained first sub-model, second sub-model, and plurality of third sub-models. In this way, two first sub-models, two second sub-models, and several third sub-models are obtained (the number of third sub-models is twice the number of third models). For a new customer, the server can make predictions using the first, second, and third sub-models that correspond to the customer's type.
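The grouping and per-group training in this example can be sketched as follows. The trainer functions are stubs that only record which model was trained on how many customers; the interval names, the `region` key, and the stub names are hypothetical.

```python
from collections import defaultdict

INTERVALS = ["within_1_year", "within_2_years", "within_3_years"]  # hypothetical intervals


def group_by_customer_type(records, key="region"):
    """Partition the training set by customer type."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def train_submodels(records, trainers, key="region"):
    """Train one first, one second, and one third sub-model per interval for each type."""
    models = {}
    for ctype, group in group_by_customer_type(records, key).items():
        models[ctype] = {
            "first": trainers["first"](group),
            "second": trainers["second"](group),
            "third": {iv: trainers["third"](group, iv) for iv in INTERVALS},
        }
    return models


def make_stub_trainer(name):
    """Stub standing in for a real training routine."""
    def train(group, *args):
        return (name, len(group)) + args
    return train


trainers = {"first": make_stub_trainer("logistic"),
            "second": make_stub_trainer("gmm"),
            "third": make_stub_trainer("arima")}
records = [{"region": "south"}] * 40 + [{"region": "north"}] * 60
models = train_submodels(records, trainers)
```

With two customer types, the total number of third sub-models is twice the number of intervals, which matches the count given in the example above.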

In the embodiments of the present application, classifying customers by customer type helps reduce the influence of economic differences between customer types on the predicted default probability.

Optionally, the method shown in FIG. 2 further includes: updating the training set according to a preset period to obtain an updated training set; and training the first model, the second model, and the at least one third model separately based on the updated training set.

In other words, the server can periodically update the training set and train the first model, the second model, and the at least one third model separately based on the updated training set. For example, the server can periodically acquire the historical data of different customers and train the first model, the second model, and the at least one third model on the acquired data.

In the embodiments of the present application, periodically updating the training set and retraining the models on it, that is, training the models multiple times, helps improve the accuracy of the models and, in turn, the accuracy of the predicted default time of the customer.

FIG. 3 is another schematic flowchart of the method for predicting a default time provided by an embodiment of the present application.

As shown in FIG. 3, in step 310, the server starts.

In step 320, the server maintains or updates the models and configures the model parameters. For example, the server maintains or updates the first model, the second model and the at least one third model, and configures the parameters of these models.

In step 330, the server enables the model used for customer risk monitoring. For example, the server enables the first model described above.

In step 340, the server classifies and clusters the customers. For example, the server determines, based on the first model, whether a customer is a high-risk customer or a compliant customer. For instance, the server predicts the customer's default probability with the first model; if the default probability is greater than a preset value, the customer is considered high-risk, and the time interval to which the customer's default time belongs and the specific default time need to be further predicted. For the detailed process, refer to the description of FIG. 2; details are not repeated here.

In step 350, the server displays the customer classification and clustering results. This shows technical staff which customers are high-risk, so that business staff can review them with priority. For example, the default time of a high-risk customer may be further predicted.

In step 360, the server performs calculations on historical records and displays the monitoring performance of the models. In other words, the server inputs a customer's historical data into the models, determines the default time, and checks whether the predicted default time is accurate. For example, if the customer's actual default time is the same as the default time calculated by the models, the monitoring performance of the models is good.

It can be understood that, based on the monitoring performance, developers can instruct the server to adjust the models to improve their accuracy.

Based on the above technical solution, the server inputs the acquired data of the first customer into the first model to obtain the customer's default probability. When the default probability is greater than the preset value, the server predicts, through the second model, the time interval to which the customer's default time belongs, that is, how far in the future the customer is likely to default. Then, based on the predicted time interval, the server predicts the customer's specific default time through the corresponding third model. In this way, the server not only predicts the customer's default probability but also predicts the default time for customers who are likely to default. This yields more comprehensive information, makes it easier for financial institutions to formulate more reasonable risk control measures, and helps improve the efficiency of their credit risk control.
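The three-stage flow summarized above can be sketched as follows. The three models are represented by plain callables, and the field names (`risk_score`, `loan_terms`), interval labels and returned values are hypothetical placeholders chosen only to make the sketch runnable; they do not reflect the trained models of the application.

```python
def predict_default(customer, first_model, second_model, third_models, threshold):
    """Cascade: default probability -> time interval -> specific default time."""
    probability = first_model(customer)
    if probability <= threshold:
        # Not greater than the preset value: no further prediction is made.
        return {"probability": probability, "interval": None, "default_time": None}

    interval = second_model(customer)  # time interval of the default time
    default_time = third_models[interval](customer)  # third model for that interval
    return {"probability": probability, "interval": interval,
            "default_time": default_time}


# Stub models, for illustration only.
first_model = lambda c: c["risk_score"]
second_model = lambda c: "within_1_year" if c["loan_terms"] <= 12 else "1_to_3_years"
third_models = {
    "within_1_year": lambda c: 7,   # e.g. default predicted in month 7
    "1_to_3_years": lambda c: 20,   # e.g. default predicted in month 20
}

high_risk = predict_default({"risk_score": 0.83, "loan_terms": 12},
                            first_model, second_model, third_models, threshold=0.5)
low_risk = predict_default({"risk_score": 0.10, "loan_terms": 36},
                           first_model, second_model, third_models, threshold=0.5)
```

Keeping one third model per time interval in a dictionary mirrors the one-to-one correspondence between third models and time intervals described in the embodiments.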

Optionally, an embodiment of the present application further provides a model training method, which may be executed by a server. The server is configured with a first model, a second model and at least one third model. The first model is used to predict a customer's default probability; the second model is used to predict the time interval to which the customer's default time belongs; the at least one third model corresponds to at least one time interval, and each third model is used to predict the number of defaults and the default time of the customer within its corresponding time interval.

For example, the method includes: acquiring data of a plurality of customers, where each customer's data includes parameters reflecting that customer's credit risk; inputting the data of the plurality of customers into the first model to obtain the default probabilities of the plurality of customers; when a customer's default probability is greater than a preset value, predicting, through the second model, the time interval to which the customer's default time belongs; and, based on the time interval predicted by the second model, predicting the customer's default time through the corresponding third model.

For the specific process of training the first model, the second model and the at least one third model, refer to the description of the embodiment shown in FIG. 2; details are not repeated here.

Based on the above technical solution, the acquired data of the plurality of customers is input into the first model to obtain their default probabilities. For each customer whose default probability is greater than the preset value, the second model predicts the time interval to which the customer's default time belongs, that is, how far in the future the customer is likely to default, and the corresponding third model then predicts the customer's specific default time based on that interval. In this way, the first model, the second model and the at least one third model are trained on the data of a plurality of customers, which helps improve the accuracy of the models.

FIG. 4 is a schematic block diagram of a server provided by an embodiment of the present application.

As shown in FIG. 4, the apparatus 400 may include an acquisition unit 410, an input unit 420 and a processing unit 430. The server 400 may be used to implement the method described in the embodiment shown in FIG. 2 or FIG. 3.

For example, when the apparatus 400 is used to implement the method described in the embodiment shown in FIG. 2, the acquisition unit 410 is configured to acquire data of a first customer, where the data of the first customer includes parameters reflecting the credit risk of the first customer; the input unit 420 is configured to input the data into the first model to obtain the default probability of the first customer; the processing unit 430 is configured to predict, through the second model, the time interval to which the default time of the first customer belongs when the default probability of the first customer is greater than a preset value; and the processing unit 430 is further configured to predict the default time of the first customer through the corresponding third model, based on the time interval predicted by the second model.

Optionally, the first model is a Logistic model, the second model is a GMM, and the third model is an ARIMA model.
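As one illustration of how a GMM could assign a customer to a time interval, the sketch below computes the posterior responsibility of each mixture component for a scalar input and picks the most likely one. It assumes, purely for illustration, that each component stands for one time interval and that the input is a single scalar feature; the weights, means and standard deviations are made-up values, not fitted parameters from the application.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, components):
    """Posterior probability of each (weight, mean, std) component given x."""
    weighted = [w * gaussian_pdf(x, m, s) for (w, m, s) in components]
    total = sum(weighted)
    return [v / total for v in weighted]


def most_likely_component(x, components):
    """Index of the component with the highest responsibility for x."""
    resp = responsibilities(x, components)
    return max(range(len(resp)), key=resp.__getitem__)


# Hypothetical two-component mixture: component 0 ~ "within one year",
# component 1 ~ "one to three years" (values in months, made up).
components = [(0.5, 6.0, 2.0), (0.5, 24.0, 6.0)]
short_term = most_likely_component(5.0, components)
long_term = most_likely_component(30.0, components)
```

The index returned by `most_likely_component` could then be used to select the third model that corresponds to the predicted interval.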

Optionally, the data includes one or more of the following: industry category, execution interest rate, loan amount, number of loan terms, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, and customer rating.

Optionally, the first model includes a plurality of first sub-models, the plurality of first sub-models correspond to different customer types, and the customer type is determined according to the age group or region to which the customer belongs; and the input unit 420 is specifically configured to determine, from the plurality of first sub-models, the first sub-model corresponding to the customer type of the first customer, and to input the data of the first customer into that first sub-model to obtain the default probability of the first customer.

Optionally, the second model includes a plurality of second sub-models, the plurality of second sub-models correspond to different customer types, and the customer type is determined according to the age group or region to which the customer belongs; the processing unit 430 is specifically configured to determine, from the plurality of second sub-models, the second sub-model corresponding to the customer type of the first customer, and to predict, through that second sub-model, the time interval to which the default time of the first customer belongs.

Optionally, each third model includes a plurality of third sub-models, any two of which correspond to different customer types, and the customer type is determined according to the age group or region to which the customer belongs; the processing unit 430 is specifically configured to determine, in the third model corresponding to the time interval predicted by the second model for the default time of the first customer, the third sub-model corresponding to the customer type of the first customer, and to predict the default time of the first customer through that third sub-model.

Optionally, the processing unit 430 is further configured to acquire a training set, where the training set includes historical data of a plurality of customers, and to train the first model, the second model and the at least one third model respectively based on the training set.

Optionally, the processing unit 430 is further configured to group the training set based on the customer types of the plurality of customers to obtain multiple groups of training sets, where the groups correspond to different customer types and the customer type is determined according to the age group or region to which the customer belongs; and the processing unit 430 is specifically configured to train the first model, the second model and the at least one third model respectively on each group of training sets, to obtain one trained first sub-model, one trained second sub-model and a plurality of trained third sub-models per group.

Optionally, the processing unit 430 is further configured to update the training set according to a preset period to obtain an updated training set, and to train the first model, the second model and the at least one third model respectively based on the updated training set.

It should be understood that the division of units in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in actual implementations. In addition, the functional units in the embodiments of the present application may be integrated into one processor, may exist separately as physical units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.

For example, the server may include a data management unit, a model management unit and a risk monitoring unit. The data management unit is used to store or update loan customer data, which is used for training and validating the machine learning models. The model management unit is used to configure and update the machine learning models used for data analysis and risk analysis, such as the Logistic model, the GMM and the ARIMA model. The risk monitoring unit is used to apply the machine learning models to perform cluster analysis, risk prediction and the like on the loan customer data, and to display the credit risk, that is, the customer's default time and/or default probability.

The data management unit may be further divided into a loan customer data module, a customer group data module and a prediction record data module.

The loan customer data module is used to store the data of loan customers. The data includes industry category, execution interest rate, loan amount, number of loan terms, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security flag, customer rating, and so on. The customer group data module is used to generate customer group data (that is, the set of data of customers belonging to the same category) through cluster analysis. The prediction record data module is used to store and record the customer risk prediction results.

The model management unit may be further divided into a model instance module, a model parameter module and a model update module.

The model instance module is used to configure the machine learning models used for classification and cluster analysis, including the Logistic model and the GMM. The model parameter module is used to configure and update the parameters of the machine learning models. For the GMM, the Metropolis-Hastings algorithm is used to update the model parameters until convergence. The model update module is used to update existing models, delete old models, or add new models.
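The embodiments do not detail how the Metropolis-Hastings algorithm is applied to the GMM parameters. As a minimal sketch of the algorithm itself, the example below uses a random-walk proposal to sample a single Gaussian mean with a known standard deviation and a flat prior; the data, step size and iteration count are assumptions for illustration, and a full GMM update would sample all weights, means and variances.

```python
import math
import random


def log_likelihood(mean, data, std=1.0):
    """Gaussian log-likelihood of `data` given `mean`, up to a constant."""
    return sum(-0.5 * ((x - mean) / std) ** 2 for x in data)


def metropolis_hastings(data, n_steps=5000, step_size=0.5, start=0.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a single mean parameter."""
    rng = random.Random(seed)
    current = start
    current_ll = log_likelihood(current, data)
    samples = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_ll = log_likelihood(proposal, data)
        # Symmetric proposal: accept with probability
        # min(1, likelihood(proposal) / likelihood(current)).
        accept_prob = math.exp(min(0.0, proposal_ll - current_ll))
        if rng.random() < accept_prob:
            current, current_ll = proposal, proposal_ll
        samples.append(current)
    return samples


data = [4.8, 5.1, 5.3, 4.9, 5.2, 5.0, 4.7, 5.4]  # made-up observations
samples = metropolis_hastings(data)
kept = samples[1000:]  # discard burn-in before averaging
posterior_mean = sum(kept) / len(kept)
```

After the burn-in samples are discarded, `posterior_mean` should lie close to the sample mean of `data`, which is one simple indication that the chain has converged.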

The risk monitoring unit may be further divided into a model enabling module, a customer clustering result module and a model monitoring effect module.

The model enabling module is used to turn risk monitoring by the machine learning models on or off. The customer clustering result module uses the models to classify and cluster loan customers and displays the results; if a customer is judged to be high-risk, it shows the predicted time to default, for example: customer A will default within one year, with a probability of 83%. The model monitoring effect module aggregates the historical data of loan customers and displays the risk monitoring performance of the models, such as the proportion of correct classifications, the error rate and the recall rate, so that model developers can analyze these indicators and optimize the models.
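The indicators named above can be computed from historical outcomes as sketched below. The boolean encoding (True meaning that the customer defaulted) and the sample lists are assumptions made for illustration.

```python
def monitoring_metrics(actual, predicted):
    """Accuracy, error rate and recall for boolean default labels."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("at least one labeled record is required")
    n = len(pairs)
    correct = sum(1 for a, p in pairs if a == p)
    true_positives = sum(1 for a, p in pairs if a and p)
    actual_positives = sum(1 for a, _ in pairs if a)
    return {
        "accuracy": correct / n,             # proportion classified correctly
        "error_rate": (n - correct) / n,     # proportion classified wrongly
        # Share of actual defaulters that the model flagged.
        "recall": true_positives / actual_positives if actual_positives else 0.0,
    }


actual = [True, True, False, False, True]      # made-up historical outcomes
predicted = [True, False, False, True, True]   # made-up model outputs
metrics = monitoring_metrics(actual, predicted)
```

For credit risk monitoring, recall is often the indicator of most interest, because a missed defaulter is typically costlier than a compliant customer flagged for review.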

FIG. 5 is another schematic block diagram of a server provided by an embodiment of the present application.

The server 500 may be used to implement the method described in the embodiment shown in FIG. 2 or FIG. 3 above. The server 500 may be a chip system. In the embodiments of the present application, the chip system may consist of a chip, or may include a chip and other discrete devices.

As shown in FIG. 5, the server 500 may include at least one processor 510.

For example, the processor 510 may be configured to: acquire data of a first customer, where the data of the first customer includes parameters reflecting the credit risk of the first customer; input the data into the first model to obtain the default probability of the first customer; when the default probability of the first customer is greater than a preset value, predict, through the second model, the time interval to which the default time of the first customer belongs; and, based on the time interval predicted by the second model, predict the default time of the first customer through the corresponding third model. For details, refer to the description in the method embodiments; they are not repeated here.

The server 500 may further include at least one memory 520, which may be used to store program instructions and/or data. The memory 520 is coupled to the processor 510. Coupling in the embodiments of the present application is an indirect coupling or communication connection between apparatuses, units or modules; it may be electrical, mechanical or of another form, and is used for information exchange between the apparatuses, units or modules. The processor 510 may operate in cooperation with the memory 520. The processor 510 may execute the program instructions stored in the memory 520. At least one of the at least one memory may be included in the processor.

The server 500 may further include a communication interface 530 for communicating with other devices through a transmission medium, so that the server 500 can communicate with other devices. The communication interface 530 may be, for example, a transceiver, an interface, a bus, a circuit, or an apparatus capable of implementing transceiving functions. The processor 510 may use the communication interface 530 to send and receive data and/or information, and to implement the method described in the embodiment shown in FIG. 2 or FIG. 3.

The embodiments of the present application do not limit the specific connection medium between the processor 510, the memory 520 and the communication interface 530. In FIG. 5, as an example, the processor 510, the memory 520 and the communication interface 530 are connected by a bus 540. The bus 540 is represented by a thick line in FIG. 5; the connections between other components are shown only schematically and are not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or only one type of bus.

The present application further provides a chip system. The chip system includes at least one processor, configured to implement the method described in the embodiment shown in FIG. 2 or FIG. 3 above.

In one possible design, the chip system further includes a memory for storing program instructions and data, and the memory is located inside or outside the processor.

The chip system may consist of a chip, or may include a chip and other discrete devices.

The present application further provides a computer program product. The computer program product includes a computer program (which may also be referred to as code or instructions) that, when run, causes a computer to perform the method described in the embodiment shown in FIG. 2 or FIG. 3.

The present application further provides a computer-readable storage medium storing a computer program (which may also be referred to as code or instructions). When the computer program is run, it causes a computer to perform the method described in the embodiment shown in FIG. 2 or FIG. 3.

It should be noted that the method for predicting a default time and the related apparatus provided in the embodiments of the present application may be applied in the field of artificial intelligence, and may also be applied in any field other than artificial intelligence, which is not limited in the present application.

It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.

It should also be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM) and direct rambus random access memory (direct rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.

The terms "unit", "module" and the like used in this specification may refer to a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution.

A person of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present application. In the several embodiments provided in the present application, it should be understood that the disclosed apparatuses, devices and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into modules is merely a division by logical function, and other divisions are possible in actual implementations: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or modules, and may be electrical, mechanical or of another form.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist separately as a physical module, or two or more units may be integrated into one module.

In the above embodiments, the functions of the functional modules may be implemented in whole or in part by software, hardware, firmware or any combination thereof. When software is used, the functions may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions (programs) are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (for example, coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (for example, infrared, radio or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.

所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence, or the part that contributes to the prior art or the part of the technical solution. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method for predicting default time, applied to a server, wherein the server is configured with a first model, a second model and at least one third model, the first model is used for predicting a default probability of a customer, the second model is used for predicting a time interval to which a default time of the customer belongs, the at least one third model corresponds to at least one time interval, and each third model is used for predicting a number of defaults and a default time of the customer in the corresponding time interval, the method comprising:
obtaining data of a first customer, the data of the first customer comprising a parameter for reflecting credit risk of the first customer;
inputting the data into the first model to obtain a default probability of the first customer;
when the default probability of the first customer is greater than a preset value, predicting, through the second model, a time interval to which the default time of the first customer belongs; and
predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
2. The method of claim 1, wherein the first model is a multivariate Logistic model, the second model is a Gaussian mixture model (GMM), and the third model is an autoregressive integrated moving average (ARIMA) model.
3. The method of claim 1, wherein the data comprises one or more of: industry category, applied interest rate, loan amount, number of loan terms, gender, age, education level, annual household income, employment status, employer type, residence status, job title, social security indicator, and customer rating.
4. The method of claim 1, wherein the first model comprises a plurality of first sub-models corresponding to different customer types, the customer types being determined according to the age group or region to which the customer belongs;
the inputting the data into the first model to obtain a default probability of the first customer comprises:
determining a first sub-model corresponding to a customer type of the first customer from the plurality of first sub-models;
and inputting the data of the first customer into the first sub-model to obtain the default probability of the first customer.
5. The method of claim 1, wherein the second model comprises a plurality of second sub-models corresponding to different customer types, the customer types being determined according to the age group or region to which the customer belongs;
the predicting, by the second model, a time interval to which the default time of the first customer belongs comprises:
determining a second sub-model corresponding to the customer type of the first customer from the plurality of second sub-models;
and predicting a time interval to which the default time of the first customer belongs through the second sub-model.
6. The method of claim 1, wherein each third model comprises a plurality of third sub-models, any two third sub-models in the plurality of third sub-models correspond to different customer types, and the customer types are determined according to the age groups or regions to which the customers belong;
the predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs comprises:
determining a third sub-model corresponding to the customer type of the first customer in a third model corresponding to a time interval to which the default time of the first customer predicted by the second model belongs;
and predicting, by the third sub-model, a default time for the first customer.
7. The method of claim 1, wherein the method further comprises:
acquiring a training set, wherein the training set comprises historical data of a plurality of customers;
training the first model, the second model, and the at least one third model, respectively, based on the training set.
8. The method of claim 7, wherein the method further comprises:
grouping the training set based on the customer types respectively corresponding to the plurality of customers to obtain a plurality of groups of training sets, wherein different groups of training sets correspond to different customer types, and the customer types are determined according to the age groups or regions to which the customers belong; and
the training the first model, the second model, and the at least one third model, respectively, based on the training set, includes:
and training the first model, the second model and the at least one third model respectively based on each group of training sets to obtain a trained first sub-model, a trained second sub-model and a plurality of trained third sub-models.
9. The method of claim 7 or 8, wherein the method further comprises:
updating the training set according to a preset period to obtain an updated training set;
training the first model, the second model, and the at least one third model, respectively, based on the updated training set.
10. A server, wherein the server is configured with a first model, a second model and at least one third model, the first model is used for predicting default probability of a customer, the second model is used for predicting a time interval to which default time of the customer belongs, the at least one third model corresponds to at least one time interval, each third model is used for predicting number of times of default and default time of the customer in the corresponding time interval, the server comprises:
an acquisition unit for acquiring data of a first customer, the data of the first customer comprising a parameter for reflecting credit risk of the first customer;
an input unit for inputting the data into the first model to obtain the default probability of the first customer;
a processing unit for predicting, through the second model, a time interval to which default time of the first customer belongs when the default probability of the first customer is greater than a preset value;
the processing unit being further used for predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
11. A server comprising a processor and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1 to 9.
12. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 9.
13. A computer program product, comprising a computer program which, when executed, causes a computer to perform the method of any one of claims 1 to 9.
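
As an illustration only, the three-stage flow recited in claim 1, together with the per-customer-type selection of claims 4 to 6, might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the class name `DefaultTimePredictor`, the helper `predict_for_customer`, the 0.5 threshold, the interval labels, the feature names, and the toy stand-in models are all hypothetical. Any callables with the same shape (for example fitted Logistic, GMM, and ARIMA models as in claim 2) could take their place.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional

Features = Mapping[str, float]


@dataclass
class DefaultTimePredictor:
    """Cascade of the three model stages of claim 1 (illustrative only)."""
    first_model: Callable[[Features], float]               # default probability
    second_model: Callable[[Features], str]                # time-interval label
    third_models: Dict[str, Callable[[Features], float]]   # one model per interval
    threshold: float = 0.5                                 # hypothetical preset value

    def predict(self, data: Features) -> Optional[float]:
        probability = self.first_model(data)
        # Later stages run only when the probability exceeds the preset value.
        if probability <= self.threshold:
            return None
        interval = self.second_model(data)
        # The third model matching the predicted interval gives the default time.
        return self.third_models[interval](data)


def predict_for_customer(
    predictors: Dict[str, DefaultTimePredictor],
    customer_type: str,
    data: Features,
) -> Optional[float]:
    """Select the cascade trained for this customer type, then run it."""
    return predictors[customer_type].predict(data)


# Toy stand-ins for trained models; real models would be fitted on a training set.
def toy_first(data: Features) -> float:
    return min(1.0, data["loan_amount"] / 100_000)


def toy_second(data: Features) -> str:
    return "0-6 months" if data["loan_terms"] <= 12 else "6-12 months"


toy_third = {
    "0-6 months": lambda data: 3.0,    # default time in months (made-up value)
    "6-12 months": lambda data: 9.0,
}

predictor = DefaultTimePredictor(toy_first, toy_second, toy_third)
predictors_by_type = {"age_18_30": predictor}  # hypothetical customer type key

risky = predictor.predict({"loan_amount": 80_000, "loan_terms": 6})
safe = predictor.predict({"loan_amount": 10_000, "loan_terms": 6})
routed = predict_for_customer(
    predictors_by_type, "age_18_30", {"loan_amount": 90_000, "loan_terms": 24}
)
```

In this sketch a customer whose predicted default probability does not exceed the threshold yields `None`, reflecting that the second and third models are consulted only when the preset value is exceeded.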
CN202210460740.8A 2022-04-28 2022-04-28 A method for predicting default time and related device Active CN114676936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210460740.8A CN114676936B (en) 2022-04-28 2022-04-28 A method for predicting default time and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210460740.8A CN114676936B (en) 2022-04-28 2022-04-28 A method for predicting default time and related device

Publications (2)

Publication Number Publication Date
CN114676936A true CN114676936A (en) 2022-06-28
CN114676936B CN114676936B (en) 2025-06-20

Family

ID=82080049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210460740.8A Active CN114676936B (en) 2022-04-28 2022-04-28 A method for predicting default time and related device

Country Status (1)

Country Link
CN (1) CN114676936B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL257381A (en) * 2017-02-06 2018-03-29 Neural Algorithms Ltd System and method for automatic data modelling
CN110490379A (en) * 2019-08-13 2019-11-22 山东建筑大学 MCMC-Based Prediction Method and System of Office Personnel's Energy Use Behavior
CN111191825A (en) * 2019-12-20 2020-05-22 北京淇瑀信息科技有限公司 User default prediction method and device and electronic equipment
CN111324862A (en) * 2020-02-10 2020-06-23 深圳华策辉弘科技有限公司 Method and system for monitoring behavior in loan
CN111352794A (en) * 2018-12-24 2020-06-30 鸿富锦精密工业(武汉)有限公司 Abnormality detection method, device, computer device, and storage medium
CN112526378A (en) * 2019-09-18 2021-03-19 中车时代电动汽车股份有限公司 Battery inconsistency fault early warning method and equipment
US20210201400A1 (en) * 2019-12-27 2021-07-01 Lendingclub Corporation Intelligent servicing
CA3135469A1 (en) * 2020-09-30 2022-03-30 10353744 Canada Ltd. Default loss rate prediction method and device

Also Published As

Publication number Publication date
CN114676936B (en) 2025-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant