CN114676936A - Method for predicting default time and related device - Google Patents
- Publication number
- CN114676936A (application number CN202210460740.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- customer
- default
- predicting
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The application provides a method for predicting default time and a related device, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring data of a first customer, wherein the data of the first customer comprises a parameter reflecting the credit risk of the first customer; inputting the data into a first model to obtain a default probability of the first customer; in the case that the default probability of the first customer is greater than a preset value, predicting, through a second model, a time interval to which the default time of the first customer belongs; and predicting the default time of the first customer through a corresponding third model based on the time interval predicted by the second model. By predicting the default time of the first customer, a financial institution can formulate more reasonable risk control measures based on the default time, which helps improve the credit risk control efficiency of the financial institution.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and a related apparatus for predicting default time.
Background
With the rapid development of the economy and finance in China, consumer credit is gradually being accepted by the public. In recent years, various credit businesses such as automobile loans, education loans, small cash loans, and beauty loans have developed vigorously. For credit businesses, assessing the credit risk of customers is critical to financial institutions. In the big data era, data for credit risk assessment is becoming more abundant, but this abundance also poses many challenges to credit risk assessment.
At present, the mainstream credit risk assessment method is to use a statistical model to predict whether a customer will default or to calculate the customer's default probability. However, predicting only the default probability is not comprehensive enough for the risk control of financial institutions, and the efficiency of risk control is not high.
Disclosure of Invention
The application provides a method for predicting default time and a related device. By predicting the default time of a customer, a financial institution can formulate more reasonable risk control measures based on the default time, which improves the credit risk control efficiency of the financial institution.
In a first aspect, the present application provides a method for predicting default time, which may be performed by a server, or may be performed by a component (e.g., a chip system, etc.) configured in the server, or may be implemented by a logic module or software capable of implementing all or part of the functions of the server, and is not limited in this application.
The server is configured with a first model, a second model and at least one third model, wherein the first model is used for predicting the default probability of a customer, the second model is used for predicting the time interval to which the default time of the customer belongs, the at least one third model corresponds to at least one time interval, and each third model is used for predicting the number of defaults and the default time of the customer in the corresponding time interval.
Illustratively, the method comprises: obtaining data of a first customer, the data of the first customer comprising a parameter for reflecting credit risk of the first customer; inputting the data into the first model to obtain a default probability of the first customer; under the condition that the default probability of the first customer is larger than a preset value, predicting a time interval to which default time of the first customer belongs through the second model; and predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
Based on this technical solution, the acquired data of the first customer is input into the first model to obtain the default probability of the customer. When the default probability is greater than the preset value, the time interval to which the default time of the customer belongs, that is, how far in the future the customer may default, is predicted through the second model. The specific default time of the customer is then predicted through the corresponding third model based on the predicted time interval.
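The three-stage flow described above can be sketched as follows. This is a minimal illustration only: the function name `predict_default_time`, the interval labels, the default threshold of 0.5, and the toy stand-in models are all assumptions made for the example and are not taken from the application.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]


def predict_default_time(
    features: Features,
    first_model: Callable[[Features], float],
    second_model: Callable[[Features], str],
    third_models: Dict[str, Callable[[Features], float]],
    preset_value: float = 0.5,  # assumed threshold, not specified in the source
) -> Optional[float]:
    """Return a predicted default time, or None if the customer is not flagged."""
    # Stage 1: default probability from the first model.
    probability = first_model(features)
    if probability <= preset_value:
        return None  # not above the preset value, so no time prediction is made
    # Stage 2: time interval the default time is predicted to fall in.
    interval = second_model(features)
    # Stage 3: the third model that corresponds to that interval.
    return third_models[interval](features)


# Toy stand-ins for trained models (purely illustrative).
toy_first = lambda f: 0.8 if f[0] > 1.0 else 0.1
toy_second = lambda f: "year_1" if f[1] < 5 else "year_2"
toy_third = {"year_1": lambda f: 7.0, "year_2": lambda f: 18.0}

high_risk = predict_default_time([2.0, 3.0], toy_first, toy_second, toy_third)
low_risk = predict_default_time([0.5, 3.0], toy_first, toy_second, toy_third)
print(high_risk, low_risk)  # 7.0 None
```

Passing the models in as callables keeps the cascade independent of how each stage is implemented, so any concrete first, second, or third model could be substituted.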
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the first model is a multivariate logistic (Logistic) model, the second model is a Gaussian Mixture Model (GMM), and the third model is an autoregressive integrated moving average (ARIMA) model.
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the data includes one or more of the following: industry category, executed interest rate, loan amount, number of loan terms, gender, age, educational background, annual household income, employment status, employer type, residence status, job title, social security label, and customer rating.
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the first model includes a plurality of first sub-models, the plurality of first sub-models correspond to different client types, and the client type is determined according to an age group or a region to which the client belongs; the inputting the data into the first model to obtain the default probability of the first customer comprises: determining a first sub-model corresponding to the customer type of the first customer from the plurality of first sub-models; and inputting the data of the first customer into the first sub-model to obtain the default probability of the first customer.
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the second model includes a plurality of second sub-models, the second sub-models correspond to different client types, and the client type is determined according to an age group or an area to which the client belongs; the predicting, by the second model, a time interval to which the default time of the first customer belongs includes: determining a second sub-model corresponding to the customer type of the first customer from the plurality of second sub-models; and predicting a time interval to which the default time of the first customer belongs through the second submodel.
With reference to the first aspect, in a certain possible implementation manner of the first aspect, each third model includes a plurality of third sub-models, where client types corresponding to any two third sub-models in the plurality of third sub-models are different, and the client type is determined according to an age group or an area to which the client belongs; the predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs comprises: determining a third sub-model corresponding to the customer type of the first customer in a third model corresponding to a time interval to which the default time of the first customer predicted by the second model belongs; predicting, by the third submodel, a time of default for the first customer.
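Selecting a sub-model by customer type, as described in the implementations above, amounts to a keyed lookup. The sketch below assumes the customer type is derived from age alone; the age bands, the names `age_group` and `select_sub_model`, and the constant-valued stand-in sub-models are hypothetical, since the application does not specify the cut-offs.

```python
from typing import Callable, Dict, Sequence

SubModel = Callable[[Sequence[float]], float]


def age_group(age: int) -> str:
    """Map an age to a customer type (hypothetical age bands)."""
    if age < 30:
        return "under_30"
    if age < 50:
        return "30_to_49"
    return "50_and_over"


def select_sub_model(sub_models: Dict[str, SubModel], age: int) -> SubModel:
    """Pick the sub-model that corresponds to the customer's type."""
    key = age_group(age)
    try:
        return sub_models[key]
    except KeyError:
        raise KeyError(f"no sub-model trained for customer type {key!r}") from None


# Stand-in first sub-models, one per customer type (illustrative values only).
first_sub_models: Dict[str, SubModel] = {
    "under_30": lambda x: 0.30,
    "30_to_49": lambda x: 0.20,
    "50_and_over": lambda x: 0.10,
}

chosen = select_sub_model(first_sub_models, age=42)
probability = chosen([1.0, 2.0])
```

The same lookup would apply to the second sub-models and to the third sub-models inside each interval-specific third model; a region-based customer type would only change the key function.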
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the method further includes: acquiring a training set, wherein the training set comprises historical data of a plurality of clients; training the first model, the second model, and the at least one third model, respectively, based on the training set.
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the method further includes: grouping the training sets based on the client types respectively corresponding to the clients to obtain a plurality of groups of training sets, wherein the client types corresponding to the training sets are different, and the client types are determined according to the age groups or the regions of the clients; and training the first model, the second model, and the at least one third model, respectively, based on the training set, comprising: and training the first model, the second model and the at least one third model respectively based on each group of training sets to obtain a trained first sub-model, a trained second sub-model and a plurality of trained third sub-models.
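Grouping the training set by customer type and training one sub-model per group can be sketched as below. The record layout, the helper names `group_training_set` and `train_per_group`, and the stand-in "training" step (which simply computes each group's observed default rate) are assumptions for illustration; real training would fit the first, second, and third models on each group.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def group_training_set(
    records: Iterable[dict],
    type_of: Callable[[dict], Hashable],
) -> Dict[Hashable, List[dict]]:
    """Split historical records into one training set per customer type."""
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for record in records:
        groups[type_of(record)].append(record)
    return dict(groups)


def train_per_group(
    groups: Dict[Hashable, List[dict]],
    train: Callable[[List[dict]], object],
) -> Dict[Hashable, object]:
    """Train one sub-model per customer type with the supplied training function."""
    return {ctype: train(rows) for ctype, rows in groups.items()}


# Hypothetical historical data, grouped here by region.
history = [
    {"region": "north", "defaulted": 1},
    {"region": "south", "defaulted": 0},
    {"region": "north", "defaulted": 0},
]
groups = group_training_set(history, lambda r: r["region"])

# Stand-in "training": the observed default rate of each group.
sub_models = train_per_group(
    groups, lambda rows: sum(r["defaulted"] for r in rows) / len(rows)
)
```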
With reference to the first aspect, in a certain possible implementation manner of the first aspect, the method further includes: updating the training set according to a preset period to obtain an updated training set; training the first model, the second model, and the at least one third model, respectively, based on the updated training set.
In a second aspect, the present application provides a model training method, which may be performed by a server. The server is configured with a first model, a second model and at least one third model, wherein the first model is used for predicting the default probability of a customer, the second model is used for predicting the time interval to which the default time of the customer belongs, the at least one third model corresponds to at least one time interval, and each third model is used for predicting the number of defaults and the default time of the customer in the corresponding time interval.
Illustratively, the method comprises: acquiring data of a plurality of clients, wherein the data of each client comprises a parameter for reflecting credit risk of the client; inputting the data of the customers into the first model to obtain default probabilities of the customers; under the condition that the default probability of the customer is larger than a preset value, predicting a time interval to which default time of the customer belongs through the second model; and predicting the default time of the customer through a corresponding third model based on the time interval to which the default time of the customer predicted by the second model belongs.
Based on this technical solution, the acquired data of the plurality of customers is input into the first model to obtain the default probabilities of the plurality of customers. For customers whose default probability is greater than the preset value, the time interval to which the default time of the customer belongs, that is, how far in the future the customer may default, is predicted through the second model, and the specific default time of the customer is then predicted through the corresponding third model based on the predicted time interval. In this way, the first model, the second model and the at least one third model can be trained with the data of the plurality of customers, which improves the accuracy of the models.
In a third aspect, the present application provides a server, where the server is configured with a first model, a second model, and at least one third model, the first model is used to predict a default probability of a customer, the second model is used to predict a time interval to which default time of the customer belongs, the at least one third model corresponds to the at least one time interval, and each third model is used to predict a number of times of default and the default time of the customer in the corresponding time interval.
Illustratively, the server includes an acquisition unit, an input unit, and a processing unit. The acquisition unit is used for acquiring data of a first client, wherein the data of the first client comprises a parameter for reflecting credit risk of the first client; the input unit is used for inputting the data into the first model to obtain the default probability of the first customer; the processing unit is used for predicting a time interval to which default time of the first customer belongs through the second model under the condition that the default probability of the first customer is greater than a preset value; the processing unit is further used for predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
In a fourth aspect, the present application provides a server comprising a processor. The processor is coupled to a memory and is operable to execute a computer program in the memory to implement the method described in the first aspect or the second aspect, or in any possible implementation of the first aspect or the second aspect.
Optionally, the server in the fourth aspect further comprises a memory.
Optionally, the server in the fourth aspect further comprises a communication interface, and the processor is coupled with the communication interface.
In a fifth aspect, the present application provides a chip system, which includes at least one processor configured to support implementation of the functions involved in the first aspect or the second aspect, or in any possible implementation of the first aspect or the second aspect, for example, receiving or processing the data involved in the above methods.
In one possible design, the chip system further includes a memory for storing program instructions and data, the memory being located within the processor or external to the processor.
The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In a sixth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program (which may also be referred to as code or instructions) that, when executed by a processor, causes the method described in the first aspect or the second aspect, or in any possible implementation of the first aspect or the second aspect, to be performed.
In a seventh aspect, the present application provides a computer program product comprising a computer program (which may also be referred to as code or instructions) that, when executed, causes the method described in the first aspect or the second aspect, or in any possible implementation of the first aspect or the second aspect, to be performed.
It should be understood that the third to seventh aspects of the present application correspond to the technical solutions of the first and second aspects of the present application, and the advantageous effects obtained by the aspects and the corresponding possible implementations are similar and will not be described again.
It should also be understood that the method for predicting default time and the related device provided by the application can be applied to the field of artificial intelligence and can also be applied to other fields. This is not a limitation of the present application.
Drawings
FIG. 1 is a schematic diagram of an application scenario suitable for a method provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a method for predicting default time provided by an embodiment of the present application;
FIG. 3 is a further schematic flow chart of a method for predicting default time provided by an embodiment of the present application;
FIG. 4 is a schematic block diagram of a server provided by an embodiment of the present application;
FIG. 5 is another schematic block diagram of a server provided by an embodiment of the present application.
Detailed Description
The following detailed description of the embodiments of the present application, presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to facilitate understanding of the default time prediction method provided in the embodiment of the present application, an application scenario applicable to the embodiment of the present application will be described below. It can be understood that the application scenarios described in the embodiments of the present application are for more clearly illustrating the technical solutions in the embodiments of the present application, and do not constitute limitations on the technical solutions provided in the embodiments of the present application.
Fig. 1 is a schematic view of an application scenario applicable to the method provided in the embodiment of the present application. As shown in fig. 1, a technician may enter relevant data for a customer's credit risk assessment via electronic device 110. Wherein the electronic device 110 is communicatively coupled to the server 120, the server 120 can present a user interface through the electronic device 110. The user interface provides an interface for a technician to interact with the server 120, and the technician may send data or information to the server 120 by entering or selecting, etc. from the user interface. Accordingly, the server 120 may present the customer's credit risk assessment results via the electronic device 110 based on data or information entered by the technician.
It should be understood that the scenario shown in fig. 1 is only an example, and the server 120 may be one physical device or a server cluster formed by multiple physical devices.
With the rapid development of the economy and finance in China, consumer credit is gradually being accepted by the public. In recent years, various credit businesses such as automobile loans, education loans, small cash loans, and beauty loans have developed vigorously. For credit businesses, assessing the credit risk of customers is critical to financial institutions. In the big data era, data for credit risk assessment is becoming more abundant, but this abundance also poses many challenges to credit risk assessment.
At present, the mainstream credit risk assessment method is to use a statistical model to predict whether a customer will default or to calculate the customer's default probability. However, predicting only the default probability is not comprehensive enough for the risk control of financial institutions, and the efficiency of risk control is not high.
To solve the above problems, the present application provides a method for predicting default time. The acquired data of a first customer is input into a first model to obtain the default probability of the customer. When the default probability is greater than a preset value, the time interval to which the default time of the customer belongs, that is, how far in the future the customer may default, is predicted through a second model. Each time interval corresponds to a third model for predicting the default time of the customer, so the specific default time of the customer is then predicted through the corresponding third model based on the predicted time interval, allowing a financial institution to formulate more reasonable risk control measures based on the default time.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a method 200 for predicting default time provided by an embodiment of the present application. The method 200 for predicting default time shown in fig. 2 may include steps 210 to 240. The various steps in method 200 are described in detail below.
It should be understood that the method 200 shown in fig. 2 has a server as an execution subject, but the execution subject of the method should not be limited in any way as long as the method provided by the embodiment of the present application can be executed by running a program recorded with codes of the method provided by the present application. For example, the server may be replaced with a component (e.g., a chip system, etc.) configured in the server or other functional modules capable of calling the program and executing the program.
It should also be understood that the server is configured with a first model for predicting the default probability of a customer, a second model for predicting the time interval to which the default time of the customer belongs, that is, how far in the future the customer may default, and at least one third model corresponding to at least one time interval, each third model being used for predicting the number of defaults and the default time of the customer in the corresponding time interval.
In step 210, data of the first client is obtained, the data of the first client including a parameter for reflecting credit risk of the first client.
The first customer may be a new customer (an incremental customer) or an existing customer (a stock customer), which is not limited in this embodiment of the application. Historical data of existing customers can be used to train the first model, the second model, and the at least one third model.
One possible implementation manner is that the server acquires the data of the first customer in response to the input operation of the technician, that is, the technician can input data reflecting the credit risk of the first customer through the user interface, and accordingly, the server acquires the data of the first customer for predicting the default time of the first customer.
In another possible implementation, the server may obtain the data of the first customer from an owned platform or a partner platform that stores the data of the first customer and can communicate with the server. The acquisition may be triggered by an operation of the technician, such as a click: the technician selects, through the user interface, the option to predict the default time of the first customer, which triggers the server to acquire the data of the first customer from the platform.
Optionally, the data includes one or more of: industry category, executed interest rate, loan amount, number of loan terms, gender, age, educational background, annual household income, employment status, employer type, residence status, job title, social security label, and customer rating.
Wherein the customer rating may be used to reflect the importance of the customer. For example, a higher customer rating indicates a higher importance level for the customer.
As an example, the server may, in response to input from a technician, obtain one or more of the industry category, executed interest rate, loan amount, number of loan terms, gender, age, educational background, annual household income, employment status, employer type, residence status, job title, social security label, and customer rating of the first customer, so that the server can predict the default probability of the first customer based on multiple factors.
As yet another example, the server may obtain one or more of the industry category, executed interest rate, loan amount, number of loan terms, gender, age, educational background, annual household income, employment status, employer type, residence status, job title, social security label, and customer rating of the first customer from an owned platform or a partner platform, wherein the owned platform or the partner platform stores the data of the first customer and can communicate with the server, and an operation of the technician, such as a click, may trigger the server to obtain the data of the first customer from the platform.
It should be understood that the collection, storage, use, processing, transmission, provision, and disclosure of information referred to in this application, such as financial data or personal data of users, all comply with relevant laws and regulations and do not violate public order and good morals.
By providing multiple kinds of data that may affect whether a customer defaults, the embodiment of the application comprehensively considers the influence of various factors on customer default, which helps improve the accuracy of the predicted default probability.
Optionally, the first model is a Logistic model, the second model is a GMM, and the third model is an ARIMA model.
The Logistic model, GMM and ARIMA model will be described in detail below, respectively.
I. Logistic model
The Logistic model has a form similar to the following:

$$P(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta^{T} x_i)}}$$

wherein $P(y_i = 1 \mid x_i)$ represents the default probability of customer i, $x_i$ represents the data of customer i, and $\beta_0$ and $\beta$ are parameters that require training.
It will be appreciated that, in the above description, $y_i = 1$ indicates that customer i defaults, which should not be construed as limiting the embodiments of the present application in any way. For example, $y_i = 1$ may also indicate that customer i does not default. Accordingly, when the probability that the customer does not default is greater than the preset value, the customer's credit is good, and the customer can be considered unlikely to default during the loan period; in other words, the server need not further predict the default time of the customer. When the probability that the customer does not default is less than or equal to the preset value, the customer's credit is mediocre, and the customer can be considered likely to default during the loan period (such a customer may be called a high-risk customer); in other words, the server needs to further predict the default time of the customer.
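How a logistic model turns customer data into a default probability, and how that probability is compared with the preset value, can be sketched as follows. The sketch assumes the standard sigmoid form with an intercept `beta0` and a weight vector `beta`; these names, the example weights, and the threshold of 0.5 are illustrative assumptions, not values from the application.

```python
import math
from typing import Sequence


def default_probability(x: Sequence[float], beta: Sequence[float], beta0: float) -> float:
    """Sigmoid of the linear score beta0 + beta . x (assumed standard logistic form)."""
    z = beta0 + sum(b * v for b, v in zip(beta, x))
    # Branch on the sign of z so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def needs_time_prediction(probability: float, preset_value: float = 0.5) -> bool:
    """The default time is predicted only when the probability exceeds the preset value."""
    return probability > preset_value


# Hypothetical customer with two standardized features and made-up weights.
p = default_probability([1.2, -0.4], beta=[0.9, 1.5], beta0=-0.2)
flagged = needs_time_prediction(p)
```

With these made-up numbers the linear score is 0.28, so the probability is slightly above 0.5 and the customer would be passed on to the second model.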
II. GMM
The 1-dimensional Gaussian distribution takes a form similar to the following:
$$N(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
where $N(x \mid \mu, \sigma)$ represents the probability density, $\sigma$ represents the standard deviation, $\mu$ represents the mean, i.e., the expectation, and $x$ represents customer data, such as family income or age. The formula gives the probability density of $x$ in the vicinity of $\mu$. It will be appreciated that the closer $x$ is to $\mu$, the greater the density, and that a smaller $\sigma$ concentrates the density more tightly around $\mu$.
The d-dimensional (d is a positive integer greater than 1) Gaussian distribution takes a form similar to the following:
$$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$
where $d$ represents the dimension of $x$, $\Sigma$ represents the $d \times d$ covariance matrix, $|\Sigma|$ is the determinant of the covariance matrix, $\mu$ represents the mean, i.e., the expectation, and $x$ represents customer data such as family income, age, etc.
The Gaussian mixture model is obtained by mixing a plurality of Gaussian models (each representing a class in the data sample) and using weight parameters to adjust their mixing proportions. The Gaussian mixture model takes a form similar to the following:
$$p(x) = \sum_{j=1}^{k} p(C_j)\, p(x \mid C_j)$$
where $p(x \mid C_j) = N(x \mid \mu_j, \Sigma_j)$ is the conditional probability density of class or group j (which obeys a Gaussian distribution), $p(C_j) \ge 0$ is the weight parameter of class j ($w_j = p(C_j)$), and $\sum_{j=1}^{k} p(C_j) = 1$. The parameters of the model are $p(C_j), \mu_j, \Sigma_j$, where $j = 1, \dots, k$ and $k$, a positive integer, is the total number of classes or groups. $\mu_j$ is the mean of class j, and $\Sigma_j$ is the covariance of class j.
In the embodiment of the present application, the category represents a time interval of default time of the customer, that is, how long in the future the customer will have the default. For example, the multiple categories are: default within one year, default within two years, default within three years, and default within four years. Based on the above gaussian mixture model, a posteriori probability calculation can be performed on the first customer, and the first customer is classified into one of the gaussian models, that is, a category to which the first customer belongs, that is, a time interval to which default time of the first customer belongs is obtained.
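The posterior probability calculation mentioned above can be sketched as follows. For brevity the sketch uses a 1-dimensional mixture over a single scaled feature; the component parameters and the three interval labels are hypothetical values chosen for illustration, not fitted ones.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-dimensional Gaussian N(x | mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def posterior(x, components):
    """Return p(C_j | x) for each component (weight, mu, sigma) using Bayes' rule."""
    joint = [w * gaussian_pdf(x, mu, sigma) for w, mu, sigma in components]
    total = sum(joint)
    return [v / total for v in joint]

# Hypothetical components, one per time interval.
labels = ["within one year", "within two years", "within three years"]
components = [(0.5, 0.0, 1.0), (0.3, 3.0, 1.0), (0.2, 6.0, 1.0)]

post = posterior(2.8, components)
# The customer is assigned to the class with the largest posterior probability.
predicted_interval = labels[max(range(len(post)), key=post.__getitem__)]
print(predicted_interval)
```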
It will be appreciated that before using the gaussian mixture model described above, parameter estimation is required to obtain the gaussian mixture model used to predict the class to which the first customer belongs. The gaussian mixture model may be subjected to parameter estimation using, for example, an Expectation Maximization (EM) algorithm, and the gaussian mixture model used for predicting the category to which the first client belongs may be obtained. The process of parameter estimation is described in detail below.
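The goal of fitting the GMM is to find $p(C_j), \mu_j, \Sigma_j$ that maximize the likelihood
$$\prod_{i=1}^{n} p(x_i) = \prod_{i=1}^{n} \sum_{j=1}^{k} p(C_j)\, p(x_i \mid C_j)$$
where $p(x \mid C_j) = N(x \mid \mu_j, \Sigma_j)$. Taking the logarithm of both sides gives the GMM log-likelihood function:
$$\log \prod_{i=1}^{n} p(x_i) = \sum_{i=1}^{n} \log \sum_{j=1}^{k} p(C_j)\, p(x_i \mid C_j)$$
The goal is to maximize this log-likelihood function, for which the EM algorithm is used. The EM algorithm includes an initialization step and an iteration step. The initialization step is as follows: initialize k clusters $C_1, \dots, C_k$; for each cluster j, there are parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$. The iteration step is as follows: estimate the cluster to which each data point belongs, $p(C_j \mid x_i)$, and calculate the expectation of the likelihood function (expectation step); then re-estimate the parameters $(\mu_j, \Sigma_j)$ and $p(C_j)$ of each cluster j (maximization step).
The specific process of the EM algorithm is as follows:
Step one: let $z_1, \dots, z_n$ represent the true sources (i.e., classes) of the corresponding data $x_1, \dots, x_n$. Each $z_i$ is a discrete variable taking a value in $j = 1, \dots, k$, where k is the number of classes. The log-likelihood function is then:
$$\log p(X, z) = \sum_{i=1}^{n} \log\left[p(C_{z_i})\, N(x_i \mid \mu_{z_i}, \Sigma_{z_i})\right]$$
Step two: the model parameters are denoted by $\theta$. Because z is not observed, $\log p(X, z \mid \theta)$ is replaced by its expectation:
$$Q(\theta, \theta^{(t)}) = E_{z \mid X, \theta^{(t)}}\left[\log p(X, z \mid \theta)\right]$$
Step three: to compute this expectation, $p(z_i = c \mid x_i, \theta^{(t)})$ needs to be estimated given the current parameters $\theta^{(t)}$. According to Bayes' rule, it satisfies:
$$p(z_i = c \mid x_i, \theta^{(t)}) = \frac{w_c^{(t)}\, N(x_i \mid \mu_c^{(t)}, \Sigma_c^{(t)})}{\sum_{j=1}^{k} w_j^{(t)}\, N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$$
Step four: $p(z_i = c \mid x_i, \theta^{(t)})$ is called the "responsibility" that cluster c assumes for data point i, denoted $r_{ic} = p(z_i = c \mid x_i, \theta^{(t)})$.
Step five: expectation step of the GMM. Denoting $x_1, \dots, x_n$ as X,
$$Q(\theta, \theta^{(t)}) = \sum_{i=1}^{n} \sum_{c=1}^{k} r_{ic}\left[\log w_c + \log N(x_i \mid \mu_c, \Sigma_c)\right]$$
Step six: maximization step of the GMM. Take the partial derivative of $Q(\theta, \theta^{(t)})$ with respect to each parameter and set these partial derivatives to zero, resulting in the new parameter estimate $\theta^{(t+1)}$, where
$$w_c = \frac{1}{n}\sum_{i=1}^{n} r_{ic}, \qquad \mu_c = \frac{\sum_{i=1}^{n} r_{ic}\, x_i}{\sum_{i=1}^{n} r_{ic}}, \qquad \Sigma_c = \frac{\sum_{i=1}^{n} r_{ic}\,(x_i - \mu_c)(x_i - \mu_c)^{T}}{\sum_{i=1}^{n} r_{ic}}$$
It is understood that, for a more detailed description of the GMM and the EM algorithm, reference may be made to known techniques, which are not described herein.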
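The expectation and maximization steps can be sketched for the 1-dimensional case as follows. This is an illustrative sketch only: the toy data, the initial parameters, the fixed iteration count, and the small variance floor are assumptions made for the example, and a production system would more likely rely on an established library.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, mus, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given initial parameters."""
    n, k = len(xs), len(mus)
    mus, variances, weights = list(mus), list(variances), list(weights)
    for _ in range(iterations):
        # Expectation step: responsibilities r[i][c] = p(z_i = c | x_i, theta_t)
        r = []
        for x in xs:
            joint = [weights[c] * gaussian_pdf(x, mus[c], variances[c]) for c in range(k)]
            total = sum(joint)
            r.append([v / total for v in joint])
        # Maximization step: closed-form updates obtained by setting dQ/dtheta = 0
        for c in range(k):
            n_c = sum(r[i][c] for i in range(n))
            weights[c] = n_c / n
            mus[c] = sum(r[i][c] * xs[i] for i in range(n)) / n_c
            var_c = sum(r[i][c] * (xs[i] - mus[c]) ** 2 for i in range(n)) / n_c
            variances[c] = max(var_c, 1e-6)  # assumed floor to avoid a collapsing component
    return mus, variances, weights

# Toy data with two well-separated groups (hypothetical values).
xs = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
mus, variances, weights = em_gmm_1d(xs, mus=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 2) for m in mus], [round(w, 2) for w in weights])
```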
III. ARIMA model
The autoregressive model first requires determining an order p, which indicates that the current value is predicted from the historical values of the previous p periods. The p-order autoregressive model is defined as:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i\, y_{t-i} + \epsilon_t$$
where $y_t$ is the current value, $\mu$ is a constant term, p is the order, $\gamma_i$ is the autoregressive coefficient, and $\epsilon_t$ is the error. The moving average model focuses on the accumulation of error terms in the autoregressive model and is defined as:
$$y_t = \mu + \epsilon_t + \sum_{i=1}^{q} \theta_i\, \epsilon_{t-i}$$
where q is the order of the moving average and $\theta_i$ is the moving average coefficient. Combining the autoregressive model with the moving average model yields the autoregressive moving average model ARMA(p, q), calculated as:
$$y_t = \mu + \sum_{i=1}^{p} \gamma_i\, y_{t-i} + \epsilon_t + \sum_{i=1}^{q} \theta_i\, \epsilon_{t-i}$$
Combining the autoregressive model, the moving average model, and the differencing method yields ARIMA(p, d, q), where d is the order of differencing applied to the data.
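To make the roles of differencing and autoregression concrete, the following sketch performs a one-step forecast with a simplified ARIMA(1, 1, 0), that is, an AR(1) model on the first difference with no moving average term. It estimates the coefficients by ordinary least squares rather than by the likelihood-based fitting that statistical packages typically use, and the monthly series is hypothetical.

```python
def difference(series, d=1):
    """Apply d-th order differencing to a series."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def fit_ar1(values):
    """Least-squares fit of y_t = mu + gamma * y_{t-1} + error; returns (mu, gamma)."""
    x, y = values[:-1], values[1:]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    gamma = sxy / sxx
    return mean_y - gamma * mean_x, gamma

def forecast_next(series):
    """One-step forecast: AR(1) on the first difference, then undo the differencing."""
    diffs = difference(series, d=1)
    mu, gamma = fit_ar1(diffs)
    next_diff = mu + gamma * diffs[-1]
    return series[-1] + next_diff

# Hypothetical monthly series.
history = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
print(forecast_next(history))
```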
It should be understood that the above descriptions of the Logistic model, the GMM, and the ARIMA model are only provided for clarity of the method provided in the embodiments of the present application, and should not constitute any limitation to the embodiments of the present application, and for more detailed descriptions, reference may be made to known technologies, and details are not repeated herein.
In the embodiment of the application, the first model uses the Logistic model to predict the default probability of the customer, which helps preliminarily screen out high-risk customers, namely customers with a higher default probability, so that the financial institution can evaluate such customers carefully and reduce its losses as much as possible. The second model uses the GMM, which can identify more complex distributions; by using the GMM, the influence of multiple factors such as industry, income, and employment status on the customer's default behavior can be comprehensively considered, which helps improve the accuracy of the predicted time interval to which the default time of the customer belongs. The third model uses the ARIMA model to predict the default time of the customer, which can improve the prediction accuracy.
After the server acquires the data of the first customer, the data of the first customer is input into the first model to obtain the default probability of the first customer.
Illustratively, the server inputs the industry category, executed interest rate, loan amount, loan term, gender, age, educational background, family annual income, employment status, employer (work unit) type, housing status, job title, social security label, and customer rating of the first customer into the Logistic model to obtain the default probability of the first customer.
It should be understood that obtaining the default probability of the first customer is described above only by way of example and should not be construed as limiting the embodiments of the present application in any way. For example, the server may also calculate, based on the Logistic model, the probability that the first customer will not default. Accordingly, when the probability that the customer does not default is greater than the preset value, the credit of the customer is good and the customer can be considered to perform the contract during the loan period; in other words, the server does not need to further predict the default time of the customer. When the probability that the customer does not default is less than or equal to the preset value, the credit of the customer is mediocre and the customer can be considered likely to default during the loan period (such a customer may be called a high-risk customer); in other words, the server needs to further predict the default time of the customer.
In step 230, when the default probability of the first customer is greater than a preset value, the server predicts, through a second model, the time interval to which the default time of the first customer belongs.
After the server obtains the default probability of the first customer, if the default probability is less than or equal to the preset value, the credit of the first customer can be considered good, that is, the first customer is expected to perform the contract during the loan period; in other words, the server does not need to further predict the default time of the customer.
If the default probability of the first customer is greater than the preset value, the server predicts, through the second model, the time interval to which the default time of the first customer belongs, that is, how far in the future the first customer is likely to default.
Illustratively, when the default probability of the first customer is greater than the preset value, the server predicts the time interval to which the default time of the first customer belongs through the GMM; for example, the server predicts through the GMM that the first customer may default within one year after the loan.
In step 240, the server predicts the default time of the first customer through the corresponding third model, based on the time interval to which the default time of the first customer belongs as predicted by the second model.
Each time interval corresponds to one third model. For example, if a customer may default within one year, that interval corresponds to a third model used to predict the specific time within the year at which the customer will default, for example, in the tenth month of the year.
After the server predicts the time interval to which the default time of the first customer belongs, it predicts the default time of the first customer through the corresponding third model.
Illustratively, the server predicts through the GMM that the first customer may default within one year after the loan, and further predicts through the corresponding ARIMA model that the first customer may default in the tenth month after the loan, which helps the financial institution formulate more reasonable risk control measures.
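The flow of steps 230 and 240 can be sketched as a three-stage dispatch. In this sketch the three models are passed in as plain callables; the stand-in lambdas, the interval labels, and the preset value of 0.5 are hypothetical placeholders for trained Logistic, GMM, and ARIMA models.

```python
from typing import Callable, Dict, Optional, Sequence

Features = Sequence[float]

def predict_default_time(
    data: Features,
    first_model: Callable[[Features], float],
    second_model: Callable[[Features], str],
    third_models: Dict[str, Callable[[Features], int]],
    preset_value: float = 0.5,
) -> Optional[int]:
    """Return the predicted default month, or None when no further prediction is needed."""
    # First model: default probability; stop if it does not exceed the preset value.
    if first_model(data) <= preset_value:
        return None
    # Second model: time interval to which the default time belongs.
    interval = second_model(data)
    # Third model: one model per interval predicts the specific default time.
    return third_models[interval](data)

# Stand-in models with fixed outputs, for illustration only.
first = lambda data: 0.83
second = lambda data: "within one year"
thirds = {"within one year": lambda data: 10, "within two years": lambda data: 18}
print(predict_default_time([1.0], first, second, thirds))
```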
It should be understood that different types of customers have different economic levels and therefore different default probabilities, so the server can select the first sub-model, second sub-model, and third sub-model corresponding to the customer type to which the first customer belongs. How the server predicts the default time of a customer after classifying customers by type is described in detail below.
Optionally, the first model includes a plurality of first sub-models, the plurality of first sub-models correspond to different client types, and the client type is determined according to the age group or region to which the client belongs; inputting the data into a first model to obtain a default probability for the first customer, comprising: determining a first sub-model corresponding to a customer type of a first customer from the plurality of first sub-models; and inputting the data of the first customer into the first sub-model to obtain the default probability of the first customer.
Illustratively, the first model comprises a first sub-model 1 and a first sub-model 2, where the customer type corresponding to the first sub-model 1 is southern customers and the customer type corresponding to the first sub-model 2 is northern customers. The server determines, based on the type of the first customer, which first sub-model the data of the first customer is input to. If the first customer is a southern customer, the server inputs the data of the first customer into the first sub-model 1 to obtain the default probability of the first customer.
Optionally, the second model includes a plurality of second submodels, the plurality of second submodels correspond to different client types, and the client type is determined according to the age group or region to which the client belongs; predicting, by a second model, a time interval to which the default time of the first customer belongs, including: determining a second submodel corresponding to the customer type of the first customer from the plurality of second submodels; and predicting a time interval to which the default time of the first customer belongs through the second submodel.
Illustratively, the second model comprises a second sub-model 1 and a second sub-model 2, where the customer type corresponding to the second sub-model 1 is southern customers and the customer type corresponding to the second sub-model 2 is northern customers. The server determines, based on the type of the first customer, which second sub-model the data of the first customer is input to. If the first customer is a southern customer and the default probability of the first customer predicted by the first sub-model 1 is greater than the preset value, the server inputs the data of the first customer into the second sub-model 1 to predict the time interval to which the default time of the first customer belongs.
Optionally, each third model includes a plurality of third submodels, the types of customers corresponding to any two of the plurality of third submodels are different, and the customer type is determined according to the age group or region to which the customer belongs; predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs, wherein the method comprises the following steps: determining a third submodel corresponding to the customer type of the first customer in a third model corresponding to a time interval to which the default time of the first customer predicted by the second model belongs; and predicting the default time of the first customer through the third submodel.
Illustratively, the first customer is a southern customer, and the server predicts the time interval to which the default time of the first customer belongs based on the second submodel 1. Different time intervals correspond to different third submodels; for example, the third submodel 1 corresponds to within one year after the loan, the third submodel 2 corresponds to within two years after the loan, and the third submodel 3 corresponds to within three years after the loan. If the time interval to which the default time of the first customer belongs is within one year after the loan, the server predicts the default time of the first customer through the third submodel 1, for example, that the first customer may default in a particular month after the loan.
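The selection of sub-models by customer type and time interval amounts to a pair of lookups, which might be organised as below. The registry layout and every name in it are hypothetical; the string values merely stand in for trained model objects.

```python
# Hypothetical registries; the string values stand in for trained model objects.
FIRST_SUBMODELS = {"south": "logistic_south", "north": "logistic_north"}
SECOND_SUBMODELS = {"south": "gmm_south", "north": "gmm_north"}
# One third model per time interval, each holding one third submodel per customer type.
THIRD_MODELS = {
    "within one year": {"south": "arima_y1_south", "north": "arima_y1_north"},
    "within two years": {"south": "arima_y2_south", "north": "arima_y2_north"},
    "within three years": {"south": "arima_y3_south", "north": "arima_y3_north"},
}

def select_submodels(customer_type, predicted_interval):
    """Pick the first, second, and third submodel for one customer.

    In practice predicted_interval is only known after the second submodel has run;
    it is passed in directly here to keep the lookup logic visible.
    """
    return (
        FIRST_SUBMODELS[customer_type],
        SECOND_SUBMODELS[customer_type],
        THIRD_MODELS[predicted_interval][customer_type],
    )

print(select_submodels("south", "within one year"))
```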
Optionally, before acquiring the data of the first client, the method shown in fig. 2 further includes: acquiring a training set, wherein the training set comprises historical data of a plurality of clients; the first model, the second model and the at least one third model are trained, respectively, based on a training set.
For example, the server may obtain historical data of a plurality of clients, and divide the historical data of the plurality of clients into training data and verification data, that is, a part of the data of the clients is used for training the model, and a part of the data of the clients is used for verifying the model, so as to obtain the trained first model, the trained second model, and the trained third model.
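A simple way to divide the historical data into training data and verification data is a shuffled split, sketched below. The 20% verification ratio and the fixed random seed are assumptions for illustration; the application does not specify how the split is made.

```python
import random

def split_history(records, validation_ratio=0.2, seed=0):
    """Shuffle customer records and split them into (training, verification) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical customer identifiers standing in for full historical records.
customers = [f"customer_{i}" for i in range(100)]
training_data, verification_data = split_history(customers)
print(len(training_data), len(verification_data))
```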
The embodiment of the application trains the first model, the second model and the at least one third model respectively through the training set, so that the accuracy of the first model, the second model and the at least one third model is improved, and the accuracy of the default time of the predicted customer is improved.
Optionally, the method shown in fig. 2 further includes: grouping the training sets based on the client types respectively corresponding to the clients to obtain a plurality of groups of training sets, wherein the client types corresponding to the training sets are different, and the client types are determined according to the age groups or the regions of the clients; and training the first model, the second model and the at least one third model respectively based on the training set, including: and respectively training the first model, the second model and at least one third model based on each group of training sets to obtain a trained first sub-model, a trained second sub-model and a plurality of trained third sub-models.
The server may group the clients according to areas where the clients are located or age groups to which the clients belong, that is, divide the training set into a plurality of groups of training sets, where each group of training set corresponds to one type of client, and train the first model, the second model, and the at least one third model based on each group of training set to obtain a trained first submodel, a trained second submodel, and a trained third submodel.
Illustratively, the server obtains data of 100 customers, of which 40 belong to the south and 60 belong to the north. Because the economic levels of the south and the north differ, the 100 customers are divided into two groups. For example, the first model, the second model, and at least one third model are each trained using the data of the 40 southern customers to obtain a trained first sub-model, a trained second sub-model, and the trained third sub-models; likewise, the first model, the second model, and at least one third model are each trained using the data of the 60 northern customers to obtain a trained first sub-model, a trained second sub-model, and the trained third sub-models. In this way, 2 first sub-models, 2 second sub-models, and a number of third sub-models equal to twice the number of third models are obtained. For a new customer, the server may perform prediction using the first sub-model, second sub-model, and third sub-model corresponding to the type to which the customer belongs.
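The grouping and per-group training in this example can be sketched as follows. The record layout, the `region` key, the interval labels, and the `stub_train` function are hypothetical; in particular, the sketch trains each third sub-model on the whole group for simplicity, whereas a real system would presumably select the records relevant to each time interval.

```python
from collections import defaultdict

def group_by_customer_type(training_set, key="region"):
    """Split the training set into one group per customer type."""
    groups = defaultdict(list)
    for record in training_set:
        groups[record[key]].append(record)
    return dict(groups)

def train_all_submodels(training_set, intervals, train_fn):
    """Train one first, one second, and one third submodel per interval for each group."""
    submodels = {}
    for customer_type, records in group_by_customer_type(training_set).items():
        submodels[customer_type] = {
            "first": train_fn("first", records),
            "second": train_fn("second", records),
            "third": {interval: train_fn("third", records) for interval in intervals},
        }
    return submodels

def stub_train(model_name, records):
    """Placeholder for real training; only reports what it was given."""
    return f"{model_name} model trained on {len(records)} customers"

training_set = [{"region": "south"}] * 40 + [{"region": "north"}] * 60
submodels = train_all_submodels(training_set, ["year 1", "year 2", "year 3"], stub_train)
third_count = sum(len(group["third"]) for group in submodels.values())
print(len(submodels), third_count)
```

With three third models and two customer types, the sketch produces six third sub-models, matching the "twice the number of third models" count in the example.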
In the embodiment of the application, the clients are classified according to the types of the clients, so that the influence of economic differences of the clients of different types on the default probability of the predicted clients is reduced.
Optionally, the method shown in fig. 2 further includes: updating the training set according to a preset period to obtain an updated training set; and respectively training the first model, the second model and at least one third model based on the updated training set.
In other words, the server may periodically update the training set and train the first model, the second model, and the at least one third model, respectively, based on the updated training set. For example, the server may periodically obtain historical data for different customers and train the first model, the second model, and the at least one third model based on the obtained data.
According to the embodiment of the application, the training set is periodically updated, and the model is trained based on the updated training set, namely the model is trained for multiple times, so that the accuracy of the model is improved, and the accuracy of the predicted default time of the client is improved.
Fig. 3 is a schematic flowchart of a default time prediction method according to an embodiment of the present application.
As shown in Fig. 3, in step 310, the server starts.
In step 320, the server maintains or updates the models and configures the model parameters. For example, the server maintains or updates the first model, the second model, and the at least one third model, and configures the parameters of these models.
In step 330, the server enables a model for customer risk monitoring. For example, the server enables the first model described above.
In step 340, the server classifies and clusters the customers. Illustratively, the server determines, based on the first model, whether a customer is a high-risk customer or a low-risk customer. For example, the server predicts the default probability of the customer based on the first model; when the default probability of the customer is greater than a preset value, the customer is considered a high-risk customer, and the time interval to which the default time belongs and the specific default time need to be further predicted. For the detailed process, reference may be made to the related description of Fig. 2, which is not repeated here.
In step 360, the server performs calculations based on historical records and displays the model monitoring effect. In other words, the server inputs the historical data of a customer into the models, determines the predicted default time, and checks whether the prediction is accurate; for example, if the actual default time of the customer is the same as the default time calculated by the models, the monitoring effect of the models is good.
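One way to quantify this comparison is the fraction of customers whose predicted default month matches the actual one, optionally within a tolerance. The function name, the tolerance parameter, and the sample values are assumptions for illustration; the application does not prescribe a particular measure.

```python
def default_time_hit_rate(predicted_months, actual_months, tolerance=0):
    """Fraction of customers whose predicted default month is within `tolerance` months."""
    hits = sum(
        1 for predicted, actual in zip(predicted_months, actual_months)
        if abs(predicted - actual) <= tolerance
    )
    return hits / len(actual_months)

# Hypothetical historical records: predicted versus actual default month after the loan.
predicted = [10, 5, 7, 12]
actual = [10, 6, 7, 9]
print(default_time_hit_rate(predicted, actual))
print(default_time_hit_rate(predicted, actual, tolerance=1))
```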
It can be appreciated that based on the model monitoring effect, a developer can instruct the server to adjust the model to improve the accuracy of the model.
Based on the technical scheme, the server inputs the acquired data of the first customer into the first model to obtain the default probability of the customer, under the condition that the default probability is greater than the preset value, the time interval to which the default time of the customer belongs is predicted through the second model, namely, how long the customer possibly defaults in the future, and further the specific default time of the customer is predicted through the corresponding third model on the basis of the predicted time interval to which the default time of the customer belongs.
Optionally, an embodiment of the present application further provides a model training method, which may be executed by a server. The server is provided with a first model, a second model and at least one third model, wherein the first model is used for predicting default probability of a customer, the second model is used for predicting a time interval to which default time of the customer belongs, the at least one third model corresponds to the at least one time interval, and each third model is used for predicting default times and default time of the customer in the corresponding time interval.
Illustratively, the method comprises: acquiring data of a plurality of clients, wherein the data of each client comprises a parameter for reflecting credit risk of the client; inputting data of a plurality of customers into a first model to obtain default probabilities of the customers; under the condition that the default probability of the customer is greater than a preset value, predicting a time interval to which default time of the customer belongs through a second model; and predicting the default time of the customer through a corresponding third model based on the time interval to which the default time of the customer predicted by the second model belongs.
For a specific process of training the first model, the second model and the at least one third model, reference may be made to the related description of the embodiment shown in fig. 2, and details are not repeated here.
Based on the technical scheme, the acquired data of the plurality of customers are input into the first model to obtain default probabilities of the plurality of customers, for the customers with the default probabilities larger than a preset value, the time interval to which default time of the customers belongs is predicted through the second model, namely, how long the customers possibly default in the future, and further, the specific default time of the customers is predicted through the corresponding third model based on the predicted time interval to which the default time of the customers belongs, so that the first model, the second model and at least one third model can be trained through the data of the plurality of customers, and the accuracy of the models is improved.
Fig. 4 is a schematic block diagram of a server provided in an embodiment of the present application.
As shown in Fig. 4, the server 400 may include: an acquisition unit 410, an input unit 420, and a processing unit 430. The server 400 may be used to implement the methods described in the embodiments shown in Fig. 2 or Fig. 3.
Exemplarily, when the server 400 is used to implement the method described in the embodiment shown in Fig. 2, the acquisition unit 410 is configured to obtain data of the first customer, where the data of the first customer includes a parameter reflecting the credit risk of the first customer; the input unit 420 is configured to input the data into the first model to obtain the default probability of the first customer; the processing unit 430 is configured to predict, through the second model, the time interval to which the default time of the first customer belongs when the default probability of the first customer is greater than a preset value; and the processing unit 430 is further configured to predict the default time of the first customer through the corresponding third model, based on the time interval to which the default time of the first customer belongs as predicted by the second model.
Optionally, the first model is a Logistic model, the second model is a GMM, and the third model is an ARIMA model.
Optionally, the data includes one or more of: industry category, executed interest rate, loan amount, loan term, gender, age, educational background, family annual income, employment status, employer (work unit) type, housing status, job title, social security label, and customer rating.
Optionally, the first model includes a plurality of first sub-models, the plurality of first sub-models correspond to different client types, and the client type is determined according to the age group or region to which the client belongs; the input unit 420 is specifically configured to determine a first sub-model corresponding to the customer type of the first customer from the plurality of first sub-models; and inputting the data of the first customer into the first submodel to obtain the default probability of the first customer.
Optionally, the second model includes a plurality of second submodels, the plurality of second submodels correspond to different client types, and the client type is determined according to the age group or region to which the client belongs; the processing unit 430 is specifically configured to determine, from the plurality of second submodels, a second submodel corresponding to the customer type of the first customer; and predicting a time interval to which the default time of the first customer belongs through the second submodel.
Optionally, each third model includes a plurality of third submodels, the types of customers corresponding to any two of the plurality of third submodels are different, and the customer type is determined according to the age group or region to which the customer belongs; the processing unit 430 is specifically configured to determine, in a third model corresponding to a time interval to which the default time of the first customer predicted by the second model belongs, a third submodel corresponding to the customer type of the first customer; and predicting the default time of the first customer through the third submodel.
Optionally, the processing unit 430 is further configured to obtain a training set, where the training set includes historical data of a plurality of clients; the first model, the second model and the at least one third model are trained, respectively, based on a training set.
Optionally, the processing unit 430 is further configured to group the training sets based on client types respectively corresponding to the multiple clients to obtain multiple groups of training sets, where the client types corresponding to the multiple groups of training sets are different, and the client types are determined according to the age groups or the regions to which the clients belong; and the processing unit 430 is specifically configured to train the first model, the second model, and the at least one third model based on each group of training sets, respectively, to obtain a trained first sub-model, a trained second sub-model, and a plurality of trained third sub-models.
Optionally, the processing unit 430 is further configured to update the training set according to a preset period, so as to obtain an updated training set; and respectively training the first model, the second model and at least one third model based on the updated training set.
It should be understood that the division of the units in the embodiment of the present application is illustrative, and is only one logical function division, and in actual implementation, there may be another division manner. In addition, functional units in the embodiments of the present application may be integrated into one processor, may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Illustratively, the server may include a data management unit, a model management unit, and a risk monitoring unit. The data management unit is used for storing or updating loan client data, and the loan client data is used for machine learning model training and verification. The model management unit is used for configuring and updating machine learning models for data analysis and risk analysis, such as Logistic model, GMM and ARIMA model. The risk monitoring unit is used for applying a machine learning model, performing cluster analysis, risk prediction and the like on loan customer data, and displaying credit risk, namely default time and/or default probability of customers.
The data management unit can be further divided into a loan customer data module, a customer group data module and a prediction record data module.
The loan customer data module is used for storing the data of loan customers. The data includes industry category, executed interest rate, loan amount, loan term, gender, age, educational background, family annual income, employment status, employer (work unit) type, housing status, job title, social security label, customer rating, etc. The customer group data module is used for generating customer group data (namely, a set of data of customers belonging to the same category) through cluster analysis. The prediction record data module is used for storing customer risk prediction results and record data.
The model management unit may be further specifically divided into a model instance module, a model parameter module, and a model update module.
The model instance module is used for configuring the machine learning models for classification and cluster analysis, which include the Logistic model, the GMM, and the like. The model parameter module is used for configuring and updating the parameters of the machine learning models. For the parameters of the GMM, convergence is achieved by updating the model parameters using the Metropolis-Hastings algorithm. The model updating module is used for updating an existing model, deleting an old model, or adding a new model.
The risk monitoring unit can be further divided into a model enabling module, a customer clustering result module, and a model monitoring effect module.
The model enabling module is used for enabling or disabling the machine learning models that perform risk monitoring. The customer clustering result module is used for classifying and clustering loan customers by using the models and displaying the results; if a loan customer is judged to be a high-risk customer, the module presents the predicted time to default, for example, that customer A will default within one year with a probability of 83%. The model monitoring effect module is used for integrating the historical data of loan customers and displaying the risk monitoring effect of the models, using metrics such as the proportion of correct classifications, the proportion of errors, and the recall rate, so that model developers can analyze the results and optimize the models.
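The metrics named above can be computed from paired predictions and outcomes as in the following sketch. The function name, the dictionary keys, and the sample data are hypothetical; recall is taken here to mean the share of customers who actually defaulted that the model had flagged as high-risk.

```python
def classification_metrics(predicted_high_risk, actually_defaulted):
    """Compute accuracy, error rate, and recall for high-risk classification."""
    pairs = list(zip(predicted_high_risk, actually_defaulted))
    correct = sum(1 for predicted, actual in pairs if predicted == actual)
    true_positives = sum(1 for predicted, actual in pairs if predicted and actual)
    false_negatives = sum(1 for predicted, actual in pairs if not predicted and actual)
    accuracy = correct / len(pairs)
    defaulted = true_positives + false_negatives
    recall = true_positives / defaulted if defaulted else 0.0
    return {"accuracy": accuracy, "error_rate": 1.0 - accuracy, "recall": recall}

# Hypothetical history: model flag versus whether the customer actually defaulted.
flags = [True, True, False, False, True]
outcomes = [True, False, False, True, True]
metrics = classification_metrics(flags, outcomes)
print(metrics)
```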
Fig. 5 is another schematic block diagram of a server provided in an embodiment of the present application.
The server 500 may be used to implement the methods described in the embodiments illustrated in fig. 2 or fig. 3 above. The server 500 may be a system on a chip. In the embodiment of the present application, the chip system may be formed by a chip, and may also include a chip and other discrete devices.
As shown in fig. 5, the server 500 may include at least one processor 510.
Illustratively, the processor 510 may be configured to: obtain data of a first customer, the data of the first customer including a parameter reflecting a credit risk of the first customer; input the data into a first model to obtain a default probability of the first customer; when the default probability of the first customer is greater than a preset value, predict, through a second model, a time interval to which the default time of the first customer belongs; and predict the default time of the first customer through the third model corresponding to the time interval predicted by the second model. For details, refer to the description in the method embodiments, which is not repeated herein.
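The three-stage flow performed by the processor can be sketched as follows. This is a simplified illustration, not the implementation of the application: the three models are passed in as plain callables, the interval labels are arbitrary strings, and the default threshold of 0.5 merely stands in for the unspecified preset value. All names are hypothetical.

```python
from typing import Callable, Dict, Mapping, Optional

Features = Mapping[str, float]


def predict_default_time(
    features: Features,
    first_model: Callable[[Features], float],
    second_model: Callable[[Features], str],
    third_models: Dict[str, Callable[[Features], float]],
    threshold: float = 0.5,  # assumed stand-in for the preset value
) -> Optional[dict]:
    """Run the first, second and third models in sequence for one customer."""
    # Stage 1: default probability from the first model.
    probability = first_model(features)
    if probability <= threshold:
        return None  # not high risk, so no default time is predicted
    # Stage 2: time interval to which the default time belongs.
    interval = second_model(features)
    # Stage 3: the third model that corresponds to that interval.
    default_time = third_models[interval](features)
    return {
        "default_probability": probability,
        "interval": interval,
        "default_time": default_time,
    }
```

Keeping the third models in a dictionary keyed by time interval mirrors the one-to-one correspondence between third models and time intervals described in the method embodiments.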
The server 500 may also include at least one memory 520 that may be used to store program instructions and/or data. The memory 520 is coupled to the processor 510. The coupling in the embodiments of the present application is an indirect coupling or a communication connection between devices, units or modules, and may be an electrical, mechanical or other form for information interaction between the devices, units or modules. The processor 510 may operate in conjunction with the memory 520. Processor 510 may execute program instructions stored in memory 520. At least one of the at least one memory may be included in the processor.
The server 500 may also include a communication interface 530 for communicating with other devices over a transmission medium such that the server 500 may communicate with other devices. The communication interface 530 may be, for example, a transceiver, an interface, a bus, a circuit, or a device capable of performing a transceiving function. Processor 510 may utilize communication interface 530 to send and receive data and/or information and to implement the methods described in the embodiments illustrated in fig. 2 or 3.
The specific connection medium between the processor 510, the memory 520 and the communication interface 530 is not limited in the embodiments of the present application. In fig. 5, the processor 510, the memory 520, and the communication interface 530 are connected by a bus 540. The bus 540 is shown in fig. 5 by a thick line, and the connection between other components is merely illustrative and not intended to be limiting. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but that does not indicate only one bus or one type of bus.
The present application further provides a chip system, which includes at least one processor, and is configured to implement the method in the embodiment shown in fig. 2 or fig. 3.
In one possible design, the system-on-chip further includes a memory to hold program instructions and data, the memory being located within the processor or external to the processor.
The chip system may be formed by a chip, and may also include a chip and other discrete devices.
The present application further provides a computer program product, the computer program product comprising: a computer program (also referred to as code, or instructions), which when executed, causes a computer to perform the method as described in the embodiments shown in fig. 2 or fig. 3.
The present application also provides a computer-readable storage medium having stored thereon a computer program (also referred to as code, or instructions) which, when executed, causes a computer to perform the method described in the embodiments shown in fig. 2 or fig. 3.
It should be noted that the method for predicting default time and the related apparatus provided in the embodiments of the present application may be applied to the field of artificial intelligence, and may also be applied to any field other than the field of artificial intelligence, which is not limited in this application.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM, an EPROM, or a register. The storage medium is located in a memory, and the processor reads information in the memory and completes the steps of the method in combination with its hardware.
It will also be appreciated that the memory in the embodiments of the present application can be either volatile memory or non-volatile memory, or can include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
As used in this specification, the terms "unit," "module," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is merely a logical division, and in actual implementation there may be other divisions; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more units are integrated into one module.
In the above embodiments, the functions of the functional modules may be wholly or partially implemented by software, hardware, firmware, or any combination thereof. When implemented in software, the functions may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions (programs) are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), among others.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the present application, or the portions thereof that contribute over the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, and various other media capable of storing program code.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (13)
1. A method for predicting default time, applied to a server, wherein the server is configured with a first model, a second model and at least one third model, the first model is used for predicting default probability of a customer, the second model is used for predicting a time interval to which default time of the customer belongs, the at least one third model corresponds to at least one time interval, and each third model is used for predicting the number of defaults and the default time of the customer in the corresponding time interval, the method comprising:
obtaining data of a first customer, the data of the first customer comprising a parameter for reflecting credit risk of the first customer;
inputting the data into the first model to obtain a default probability of the first customer;
under the condition that the default probability of the first customer is larger than a preset value, predicting a time interval to which default time of the first customer belongs through the second model;
and predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
2. The method of claim 1, wherein the first model is a multivariate Logistic model, the second model is a Gaussian mixture model (GMM), and the third model is an autoregressive integrated moving average (ARIMA) model.
3. The method of claim 1, wherein the data comprises one or more of: industry category, interest rate of performance, amount of loan, number of loan terms, gender, age, academic history, family annual income, employment status, unit type, occupancy status, job title, social security label, and customer rating.
4. The method of claim 1, wherein the first model comprises a plurality of first sub-models corresponding to different customer types, the customer types being determined according to the age group or region to which the customer belongs;
the inputting the data into the first model to obtain a default probability of the first customer comprises:
determining a first sub-model corresponding to a customer type of the first customer from the plurality of first sub-models;
and inputting the data of the first customer into the first submodel to obtain the default probability of the first customer.
5. The method of claim 1, wherein the second model comprises a plurality of second submodels corresponding to different customer types, the customer types being determined according to the age group or region to which the customer belongs;
the predicting, by the second model, a time interval to which the default time of the first customer belongs includes:
determining a second sub-model corresponding to the customer type of the first customer from the plurality of second sub-models;
and predicting a time interval to which the default time of the first customer belongs through the second submodel.
6. The method of claim 1, wherein each third model comprises a plurality of third submodels, any two third submodels in the plurality of third submodels correspond to different customer types, and the customer types are determined according to the age groups or regions to which the customers belong;
the predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs comprises:
determining a third sub-model corresponding to the customer type of the first customer in a third model corresponding to a time interval to which the default time of the first customer predicted by the second model belongs;
predicting, by the third submodel, a default time for the first customer.
7. The method of claim 1, wherein the method further comprises:
acquiring a training set, wherein the training set comprises historical data of a plurality of clients;
training the first model, the second model, and the at least one third model, respectively, based on the training set.
8. The method of claim 7, wherein the method further comprises:
grouping the training set based on the client types respectively corresponding to the clients to obtain a plurality of groups of training sets, wherein the client types corresponding to the groups of training sets are different, and the client types are determined according to the age groups or regions to which the clients belong; and
the training the first model, the second model, and the at least one third model, respectively, based on the training set, includes:
and training the first model, the second model and the at least one third model respectively based on each group of training sets to obtain a trained first sub-model, a trained second sub-model and a plurality of trained third sub-models.
9. The method of claim 7 or 8, wherein the method further comprises:
updating the training set according to a preset period to obtain an updated training set;
training the first model, the second model, and the at least one third model, respectively, based on the updated training set.
10. A server, wherein the server is configured with a first model, a second model and at least one third model, the first model is used for predicting default probability of a customer, the second model is used for predicting a time interval to which default time of the customer belongs, the at least one third model corresponds to at least one time interval, each third model is used for predicting number of times of default and default time of the customer in the corresponding time interval, the server comprises:
an acquisition unit for acquiring data of a first customer, the data of the first customer comprising a parameter for reflecting credit risk of the first customer;
an input unit for inputting the data into the first model to obtain the default probability of the first customer;
a processing unit for predicting, through the second model, a time interval to which default time of the first customer belongs under the condition that the default probability of the first customer is greater than a preset value;
wherein the processing unit is further used for predicting the default time of the first customer through a corresponding third model based on the time interval to which the default time of the first customer predicted by the second model belongs.
11. A server comprising a processor and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1 to 9.
12. A computer-readable storage medium, comprising a computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 9.
13. A computer program product, comprising a computer program which, when executed, causes a computer to perform the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210460740.8A CN114676936A (en) | 2022-04-28 | 2022-04-28 | Method for predicting default time and related device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210460740.8A CN114676936A (en) | 2022-04-28 | 2022-04-28 | Method for predicting default time and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114676936A (en) | 2022-06-28 |
Family
ID=82080049
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210460740.8A Pending CN114676936A (en) | 2022-04-28 | 2022-04-28 | Method for predicting default time and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114676936A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||