CN116258574B - Mixed effect logistic regression-based default rate prediction method and system - Google Patents

Mixed effect logistic regression-based default rate prediction method and system

Info

Publication number
CN116258574B
CN116258574B CN202310199292.5A CN202310199292A CN116258574B CN 116258574 B CN116258574 B CN 116258574B CN 202310199292 A CN202310199292 A CN 202310199292A CN 116258574 B CN116258574 B CN 116258574B
Authority
CN
China
Prior art keywords
subdivision
guest group
model
client
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310199292.5A
Other languages
Chinese (zh)
Other versions
CN116258574A (en)
Inventor
王宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310199292.5A
Publication of CN116258574A
Application granted
Publication of CN116258574B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a mixed effect logistic regression-based default rate prediction method and system. The method comprises the following steps: obtaining first customer information of a target customer and subdividing customer groups for the target customer according to the first customer information to obtain a subdivision result; inputting the subdivision result and second customer information of the target customer into a preset customer group subdivision sub-model to output a first result; constructing a customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer and output a second result; and coupling the first result and the second result to obtain a final risk score of the target customer, and predicting the default rate of the target customer according to the final risk score. Because the model is called end to end, the modeling flow and the downstream usage flow are greatly simplified. Because only a single model is used, model deployment is also simplified, the testing and monitoring burden is reduced, and both the maintainability of the model and the experience of downstream users are improved.

Description

Mixed effect logistic regression-based default rate prediction method and system
Technical Field
The application relates to the technical field of credit evaluation, in particular to a mixed effect logistic regression-based default rate prediction method and system.
Background
Currently, big data scoring models are widely applied in business scenarios such as credit card applications and installment approval. In general, the risk performance of a customer base is not homogeneous: risk performance differs across segments of the customer base, i.e., the relationship between the independent and dependent variables is not consistent across segments, a phenomenon known in statistics as Simpson's paradox. One common solution is for the modeler to first divide customers into sub-groups according to business experience and then build an independent model on each sub-group, which reduces within-group variance and improves prediction accuracy. Specifically, cluster analysis is first performed on the whole customer base, and a model is then trained on each cluster. However, this approach produces multiple customer group sub-models. The multiple sub-models burden downstream model users, who may be unable to select the proper sub-model for prediction, which degrades the user experience.
Disclosure of Invention
In view of the above problems, the application provides a mixed effect logistic regression-based default rate prediction method and system, to solve the problems mentioned in the Background: customer group clustering produces multiple customer group sub-models, the multiple sub-models burden downstream model users, downstream users may be unable to select the proper sub-model for prediction, and the user experience is degraded.
A mixed effect logistic regression-based default rate prediction method comprises the following steps:
acquiring first customer information of a target customer, and subdividing customer groups for the target customer according to the first customer information to obtain a subdivision result;
inputting the subdivision result and second customer information of the target customer into a preset customer group subdivision sub-model to output a first result;
constructing a customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer, and outputting a second result;
and coupling the first result and the second result to obtain a final risk score of the target customer, and predicting the default rate of the target customer according to the final risk score.
Preferably, acquiring the first customer information of the target customer and subdividing customer groups for the target customer according to the first customer information to obtain the subdivision result comprises:
acquiring identity information and consumption information of the target customer and confirming them as the first customer information;
determining the customer type of the target customer according to the identity information of the target customer, and extracting the common attributes of the customer type;
performing a first customer group definition on the target customer according to the common attributes and the experience rules corresponding to preset business experience, to obtain defined customer groups;
and determining the consumption level and consumption capacity of the target customer according to the consumption information of the target customer, deleting from the defined customer groups, based on the consumption level and consumption capacity, the first customer groups that do not meet the requirements, obtaining second customer groups, and confirming the second customer groups as the subdivided customer groups.
Preferably, inputting the subdivision result and the second customer information of the target customer into the preset customer group subdivision sub-model to output the first result comprises:
acquiring income information and credit record information of the target customer and confirming them as the second customer information;
confirming whether the second customer information is sequence data; if so, selecting a recurrent neural network model as the preset customer group subdivision sub-model, and if not, selecting a Softmax regression model as the preset customer group subdivision sub-model;
determining the customer group identification index parameters of each subdivided customer group in the subdivision result;
inputting the second customer information and the customer group identification index parameters of each subdivided customer group into the preset customer group subdivision sub-model, outputting the probability that the target customer belongs to each customer group, and confirming these probabilities as the first result.
Preferably, constructing the customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer and outputting the second result comprises:
selecting a general model architecture according to the general logic parameters of each subdivided customer group in the subdivision result;
constructing a customer group prediction sub-model for each subdivided customer group based on the general model architecture and the logic prediction parameters of that subdivided customer group;
inputting the second customer information of the target customer into the customer group prediction sub-model of each subdivided customer group to obtain the output first risk scores;
and confirming the first risk scores output by the customer group prediction sub-models as the second result.
Preferably, the method further comprises:
constructing a mixed effect logistic regression model for each subdivided customer group according to the preset customer group subdivision sub-model and the customer group prediction sub-model of that subdivided customer group;
detecting the number of control variables of the mixed effect logistic regression model of each subdivided customer group;
determining whether the number of control variables of the mixed effect logistic regression model of each subdivided customer group is greater than or equal to a preset value, and if so, training and optimizing the mixed effect logistic regression model of that subdivided customer group through a stochastic parallel gradient descent algorithm;
and testing the trained and optimized mixed effect logistic regression model of each subdivided customer group to evaluate its model accuracy.
Preferably, coupling the first result and the second result to obtain the final risk score of the target customer, and predicting the default rate of the target customer according to the final risk score, comprises:
computing a weighted average of the first risk scores output by the customer group prediction sub-models, using the probabilities that the target customer belongs to each customer group as the weights, to obtain a second risk score of the target customer;
confirming the second risk score as the final risk score of the target customer;
and mapping the final risk score of the target customer to a default probability through a preset function.
Preferably, mapping the final risk score of the target customer to the default probability through the preset function comprises:
calculating the default probability of the target customer by the following formula:
p = sigmoid(f(X) · g(X))
wherein p is the default probability of the target customer output by the mixed effect logistic regression model; sigmoid is the sigmoid function, which maps a real-valued risk score to a default probability; X is the second customer information of the target customer; f() is the preset customer group subdivision sub-model, whose output is the probability that the target customer belongs to each customer group; g() is the customer group prediction sub-model, whose output is the first risk score under each customer group; and f(X) · g(X) is the inner product of the two outputs, i.e., the probability-weighted average of the first risk scores.
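As a worked instance of the formula, the following minimal Python sketch couples a vector of group membership probabilities with a vector of per-group risk scores and maps the result through the sigmoid. The three groups and all numeric values are hypothetical and chosen only to make the arithmetic easy to follow; the function names `couple` and `sigmoid` are illustrative, not taken from the source.

```python
import math


def sigmoid(score):
    """Map a real-valued risk score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def couple(memberships, group_scores):
    """Inner product f(X) . g(X): probability-weighted average of the scores."""
    if len(memberships) != len(group_scores):
        raise ValueError("one membership probability is needed per group score")
    if abs(sum(memberships) - 1.0) > 1e-9:
        raise ValueError("membership probabilities must sum to 1")
    return sum(w * s for w, s in zip(memberships, group_scores))


# Hypothetical outputs for one customer and three customer groups.
memberships = [0.7, 0.2, 0.1]    # f(X): first result
group_scores = [-2.0, 0.5, 1.0]  # g(X): second result

final_score = couple(memberships, group_scores)  # -1.4 + 0.1 + 0.1 = -1.2
p = sigmoid(final_score)                         # about 0.23
```

A customer who most likely belongs to the low-risk first group therefore receives a default probability dominated by that group's score, while the other groups still contribute in proportion to their membership probabilities.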
Preferably, determining the customer type of the target customer according to the identity information of the target customer, and extracting the common attributes of the customer type, comprises:
acquiring latent features of the target customer according to the identity information of the target customer, and determining the degree of fit between the latent features of the target customer and the first attribute tags of each preset customer type;
determining the potential customer types of the target customer according to the degree of fit;
acquiring the attribute value fields of the second attribute tags of each potential customer type;
and screening out the general attribute values of the potential customer types based on the attribute value fields of the second attribute tags of each potential customer type, and acquiring the entity attributes corresponding to the general attribute values as the common attributes of the potential customer types of the target customer.
Preferably, performing the first customer group definition on the target customer according to the common attributes and the experience rules corresponding to the preset business experience, to obtain the defined customer groups, comprises:
acquiring experience classification factors according to the experience rules corresponding to the preset business experience;
classifying the target customer into customer groups by using the experience classification factors to obtain a first classification result;
determining attribute variables according to the common attributes and the consumption information of the target customer, and classifying the target customer with a preset decision tree customer group subdivision model based on the attribute variables to obtain a second classification result;
and integrating the first classification result and the second classification result to generate the first customer group definition and obtain the defined customer groups.
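The two-path classification above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the two classification results are integrated, so the sketch assumes an order-preserving union; the rule table, the group names, the spending threshold, and the hand-written function standing in for the preset decision tree are all hypothetical.

```python
# Hypothetical experience rules: customer type -> customer groups.
EXPERIENCE_RULES = {
    "student": ["campus_installment"],
    "teacher": ["salaried_stable"],
}


def rule_based_groups(customer_type):
    """First classification result, from business-experience rules."""
    return list(EXPERIENCE_RULES.get(customer_type, []))


def tree_based_groups(monthly_spend, has_fixed_income):
    """Second classification result; a hand-written stand-in for a trained tree."""
    if has_fixed_income:
        return ["salaried_stable"]
    if monthly_spend < 1000.0:
        return ["low_spend"]
    return ["high_spend_no_income"]


def integrate(first_result, second_result):
    """Assumed integration rule: union of both results, duplicates removed."""
    merged = []
    for group in first_result + second_result:
        if group not in merged:
            merged.append(group)
    return merged


defined_groups = integrate(
    rule_based_groups("student"),
    tree_based_groups(monthly_spend=800.0, has_fixed_income=False),
)
```

Other integration rules, such as intersection or voting, would fit the same interface; the union is used here only because it never discards a candidate group before the later pruning step.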
A mixed effect logistic regression-based default rate prediction system, the system comprising:
an acquisition module, used for acquiring first customer information of a target customer, and subdividing customer groups for the target customer according to the first customer information to obtain a subdivision result;
a first output module, used for inputting the subdivision result and second customer information of the target customer into a preset customer group subdivision sub-model to output a first result;
a second output module, used for constructing a customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer, and outputting a second result;
and a prediction module, used for coupling the first result and the second result to obtain a final risk score of the target customer, and predicting the default rate of the target customer according to the final risk score.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application.
FIG. 1 is a workflow diagram of a mixed effect logistic regression-based default rate prediction method according to the present application;
FIG. 2 is another workflow diagram of a mixed effect logistic regression-based default rate prediction method according to the present application;
FIG. 3 is a further workflow diagram of a mixed effect logistic regression-based default rate prediction method according to the present application;
FIG. 4 is a schematic structural diagram of a mixed effect logistic regression-based default rate prediction system according to the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
To solve the problems described in the Background, the present embodiment discloses a mixed effect logistic regression-based default rate prediction method.
A mixed effect logistic regression-based default rate prediction method, as shown in FIG. 1, comprises the following steps:
step S101, acquiring first customer information of a target customer, and subdividing customer groups for the target customer according to the first customer information to obtain a subdivision result;
step S102, inputting the subdivision result and second customer information of the target customer into a preset customer group subdivision sub-model to output a first result;
step S103, constructing a customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer, and outputting a second result;
and step S104, coupling the first result and the second result to obtain a final risk score of the target customer, and predicting the default rate of the target customer according to the final risk score.
In this embodiment, the first customer information is the identity information and consumption information of the target customer;
in this embodiment, customer group subdivision means setting up a plurality of consumer groups according to the identity information and consumption information of the target customer;
in this embodiment, the first result is the calculated probability that the target customer belongs to each consumer group;
in this embodiment, the second result is the risk prediction score of the target customer under each consumer group;
in this embodiment, the final risk score is the composite risk score of the target customer across the multiple consumer groups to which the customer may belong.
The working principle of the technical scheme is as follows: first customer information of a target customer is obtained, and customer groups are subdivided for the target customer according to the first customer information to obtain a subdivision result; the subdivision result and second customer information of the target customer are input into a preset customer group subdivision sub-model to output a first result; a customer group prediction sub-model is constructed for each customer group in the subdivision result to perform risk prediction on the target customer and output a second result; and the first result and the second result are coupled to obtain a final risk score of the target customer, according to which the default rate of the target customer is predicted.
The beneficial effects of the technical scheme are as follows: because the model is called end to end, the modeling flow and the downstream usage flow are greatly simplified; only a single score is output, and the user does not need to know that multiple underlying sub-models exist. Because only a single model is used, the deployment process is also greatly simplified: only one model needs to be deployed, the testing and monitoring burden is reduced, and both the maintainability of the model and the experience of downstream users are improved. This solves the problem in the prior art that customer group clustering produces multiple customer group sub-models, which burden downstream model users, leave them unable to select the proper sub-model for prediction, and degrade the user experience.
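The single-model property can be illustrated with a small wrapper that hides the sub-models behind one call. This is a sketch under stated assumptions rather than the patented implementation: the class name `MixedEffectScorer`, the linear form of both the subdivision sub-model and the prediction sub-models, and all weights are hypothetical.

```python
import math


class MixedEffectScorer:
    """Bundles the subdivision sub-model and all prediction sub-models."""

    def __init__(self, gate_weights, predictor_weights):
        if len(gate_weights) != len(predictor_weights):
            raise ValueError("need one gate row and one predictor row per group")
        self.gate_weights = gate_weights            # hypothetical linear gate
        self.predictor_weights = predictor_weights  # hypothetical linear predictors

    @staticmethod
    def _dot(weights, features):
        return sum(w * x for w, x in zip(weights, features))

    def memberships(self, features):
        """First result: softmax over one logit per customer group."""
        logits = [self._dot(w, features) for w in self.gate_weights]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def group_scores(self, features):
        """Second result: one risk score per customer group."""
        return [self._dot(w, features) for w in self.predictor_weights]

    def predict(self, features):
        """The only call a downstream user needs: one default probability."""
        score = self._dot(self.memberships(features), self.group_scores(features))
        return 1.0 / (1.0 + math.exp(-score))


# Two hypothetical groups; zero gate weights give equal memberships of 0.5.
scorer = MixedEffectScorer(
    gate_weights=[[0.0, 0.0], [0.0, 0.0]],
    predictor_weights=[[1.0, 0.0], [3.0, 0.0]],
)
p = scorer.predict([1.0, 1.0])  # final score 0.5 * 1 + 0.5 * 3 = 2
```

Downstream code only ever sees `predict`, so adding or removing a customer group changes the weights inside the object but not the calling interface.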
In one embodiment, as shown in FIG. 2, acquiring the first customer information of the target customer and subdividing customer groups for the target customer according to the first customer information to obtain the subdivision result comprises:
step S201, acquiring identity information and consumption information of the target customer and confirming them as the first customer information;
step S202, determining the customer type of the target customer according to the identity information of the target customer, and extracting the common attributes of the customer type;
step S203, performing a first customer group definition on the target customer according to the common attributes and the experience rules corresponding to preset business experience, to obtain defined customer groups;
and step S204, determining the consumption level and consumption capacity of the target customer according to the consumption information of the target customer, deleting from the defined customer groups, based on the consumption level and consumption capacity, the first customer groups that do not meet the requirements, obtaining second customer groups, and confirming the second customer groups as the subdivided customer groups.
In this embodiment, the social identity of the target customer, such as student, teacher, administrative staff member, or self-employed individual, may be determined according to the identity information of the target customer. The corresponding customer types are student customers, teacher customers, administrative staff customers, and the like;
in this embodiment, the common attributes are the attributes shared by the customers of each customer type.
The beneficial effects of the technical scheme are as follows: determining the common attributes of the customer types to which the target customer belongs from the identity information makes it possible to identify all potential customer types of the target customer without omission, which improves stability and practicability. Further, the defined customer groups can be pruned according to the consumption capacity and consumption level of the target customer, so that the remaining subdivided customer groups match both the identity and the consumption level of the target customer, which improves reliability and objectivity.
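Steps S201 to S204 can be sketched as a filter over candidate customer groups. This is a minimal illustration under stated assumptions: the `CandidateGroup` record, the use of monthly spending as the sole measure of consumption level, and every group name and threshold are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateGroup:
    name: str
    customer_type: str        # identity-based type, e.g. "student"
    min_monthly_spend: float  # inclusive lower bound of the consumption level
    max_monthly_spend: float  # exclusive upper bound of the consumption level


def subdivide(customer_type, monthly_spend, candidates):
    """Define groups by customer type, then prune by consumption level."""
    # S202-S203: groups defined from the customer type (the "first" groups).
    defined = [g for g in candidates if g.customer_type == customer_type]
    # S204: keep only groups whose spending band contains the customer.
    return [
        g.name
        for g in defined
        if g.min_monthly_spend <= monthly_spend < g.max_monthly_spend
    ]


# Hypothetical candidate groups.
CANDIDATES = [
    CandidateGroup("student_low_spend", "student", 0.0, 1000.0),
    CandidateGroup("student_high_spend", "student", 1000.0, float("inf")),
    CandidateGroup("teacher_regular", "teacher", 0.0, float("inf")),
]

subdivided = subdivide("student", 650.0, CANDIDATES)
```

For a student who spends 650 per month, only the low-spend student group survives the pruning step.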
In one embodiment, as shown in FIG. 3, inputting the subdivision result and the second customer information of the target customer into the preset customer group subdivision sub-model to output the first result comprises:
step S301, acquiring income information and credit record information of the target customer and confirming them as the second customer information;
step S302, confirming whether the second customer information is sequence data; if so, selecting a recurrent neural network model as the preset customer group subdivision sub-model, otherwise selecting a Softmax regression model as the preset customer group subdivision sub-model;
step S303, determining the customer group identification index parameters of each subdivided customer group in the subdivision result;
and step S304, inputting the second customer information and the customer group identification index parameters of each subdivided customer group into the preset customer group subdivision sub-model, outputting the probability that the target customer belongs to each customer group, and confirming these probabilities as the first result.
In this embodiment, the income information is specific amount information such as the wages and other income of the target customer;
in this embodiment, the credit record information is the record of the target customer's borrowing and repayment on each lending application;
in this embodiment, the customer group identification index parameters are the specific index parameters used to identify and delimit each customer group.
In this embodiment, the output layer is a softmax regression even when a recurrent neural network is used; that is, the preset customer group subdivision sub-model is in fact softmax(RNN(X)) or softmax(FFN(X)), where FFN is a feedforward neural network. The RNN or FFN is simply a feature extractor that converts the customer's original credit features (a vector of values such as income and age) into another vector. Alternatively, the original features may be used directly as the input of softmax without any transformation.
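The softmax(FFN(X)) form can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the one-hidden-layer network with ReLU activation, the layer sizes, and all weights are hypothetical, and a trained model would learn these weights from data. The recurrent variant would replace the feature extractor but keep the same softmax output.

```python
import math


def softmax(logits):
    """Turn one logit per customer group into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(value):
    return max(0.0, value)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one output per weight row."""
    outputs = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return [activation(v) for v in outputs] if activation else outputs


def group_probabilities(features, hidden_layer, output_layer):
    """softmax(FFN(X)): the FFN extracts features, softmax assigns groups."""
    extracted = dense(features, *hidden_layer, activation=relu)
    logits = dense(extracted, *output_layer)
    return softmax(logits)


# Hypothetical scaled inputs (e.g. income, age) and hypothetical weights.
x = [0.8, 0.3]
hidden_layer = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]], [0.0, 0.1, 0.0])
output_layer = ([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 1.0]], [0.0, 0.0, 0.0])

probs = group_probabilities(x, hidden_layer, output_layer)  # one entry per group
```

Feeding the raw features straight into `softmax(dense(...))` with no hidden layer gives the plain Softmax regression case mentioned above.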
The beneficial effects of the technical scheme are as follows: selecting the preset customer group subdivision sub-model according to the data type of the customer data ensures that the processing model suits the data, which improves practicability, compatibility between the selected model and the customer data, and stability. Further, using the customer group identification index parameters of each subdivided customer group as model input allows the degree of match between the target customer and each subdivided customer group to be determined quickly and accurately, so the probability that the target customer belongs to each customer group is obtained quickly, improving the efficiency and accuracy of model evaluation.
In one embodiment, constructing the customer group prediction sub-model for each customer group in the subdivision result to perform risk prediction on the target customer and outputting the second result comprises:
selecting a general model architecture according to the general logic parameters of each subdivided customer group in the subdivision result;
constructing a customer group prediction sub-model for each subdivided customer group based on the general model architecture and the logic prediction parameters of that subdivided customer group;
inputting the second customer information of the target customer into the customer group prediction sub-model of each subdivided customer group to obtain the output first risk scores;
and confirming the first risk scores output by the customer group prediction sub-models as the second result.
In this embodiment, the general logic parameters are the partitioning logic parameters shared by all subdivided customer groups;
in this embodiment, the general model architecture is the common base architecture of the customer group prediction sub-models to be built;
in this embodiment, the first risk score is the default risk score of the target customer under each subdivided customer group.
The beneficial effects of the technical scheme are as follows: constructing a customer group prediction sub-model for each subdivided customer group allows the risk score of the target customer under each customer group to be predicted accurately, which lays the foundation for the subsequent composite scoring and improves practicability. In addition, because a dedicated model is built for each customer group, the prediction results are more objective and realistic, which improves stability.
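One way to read "a general model architecture plus group-specific parameters" is a factory that builds the same kind of model for every group and differs only in its parameters. The sketch below assumes a linear scoring architecture; the factory name `build_group_predictor`, the group names, and all weights are hypothetical.

```python
def build_group_predictor(weights, bias):
    """Shared architecture: a linear score; parameters differ per group."""
    def predict(features):
        if len(features) != len(weights):
            raise ValueError("feature vector has the wrong length")
        return sum(w * x for w, x in zip(weights, features)) + bias
    return predict


# Hypothetical group-specific parameters for two subdivided customer groups.
predictors = {
    "student_low_spend": build_group_predictor([1.0, -2.0], 0.5),
    "teacher_regular": build_group_predictor([0.5, -1.0], -1.0),
}

# Hypothetical second customer information after scaling (income, credit history).
x = [2.0, 1.0]

# Second result: the first risk score of the customer under every group.
second_result = {name: model(x) for name, model in predictors.items()}
```

Every sub-model receives the same second customer information, so the scores are directly comparable and can later be combined with the membership probabilities.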
In one embodiment, the method further comprises: constructing a mixed effect logistic regression model for each subdivided customer group according to the preset customer group subdivision sub-model and the customer group prediction sub-model of that subdivided customer group;
detecting the number of control variables of the mixed effect logistic regression model of each subdivided customer group;
determining whether the number of control variables of the mixed effect logistic regression model of each subdivided customer group is greater than or equal to a preset value, and if so, training and optimizing the mixed effect logistic regression model of that subdivided customer group through a stochastic parallel gradient descent algorithm;
and testing the trained and optimized mixed effect logistic regression model of each subdivided customer group to evaluate its model accuracy.
The beneficial effects of the technical scheme are as follows: training and optimizing the mixed effect logistic regression model of each subdivided customer group with a stochastic parallel gradient descent algorithm helps ensure the reliability and accuracy of each model.
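The source names the training algorithm but gives no details of it, so the sketch below swaps in plain single-threaded stochastic gradient descent on the log loss, with the parallel part omitted. It assumes a linear gate and linear per-group predictors, so that the score is s = Σ_k π_k g_k with π = softmax of the gate logits; the gradients used are ∂s/∂g_k = π_k and ∂s/∂z_k = π_k (g_k − s). The toy data set, learning rate, epoch count, and initial weights are hypothetical.

```python
import math
import random


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def forward(gate, experts, x):
    """Return (p, pi, g, s); the last entry of x is a constant 1 for the bias."""
    pi = softmax([dot(a, x) for a in gate])  # group membership probabilities
    g = [dot(w, x) for w in experts]         # per-group risk scores
    s = dot(pi, g)                           # coupled final risk score
    return 1.0 / (1.0 + math.exp(-s)), pi, g, s


def mean_log_loss(gate, experts, data):
    eps = 1e-12  # clip probabilities so that log() stays finite
    total = 0.0
    for x, y in data:
        p = min(max(forward(gate, experts, x)[0], eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def sgd_train(gate, experts, data, lr=0.1, epochs=200, seed=0):
    """Update gate and expert weights in place, one sample at a time."""
    rng = random.Random(seed)
    samples = list(data)  # copy so the caller's list is not reordered
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            p, pi, g, s = forward(gate, experts, x)
            err = p - y  # derivative of the log loss with respect to s
            for k in range(len(experts)):
                grad_w = err * pi[k]               # through g_k
                grad_a = err * pi[k] * (g[k] - s)  # through the softmax gate
                for j, xj in enumerate(x):
                    experts[k][j] -= lr * grad_w * xj
                    gate[k][j] -= lr * grad_a * xj


# Hypothetical toy data: one feature plus a bias term, label 1 means default.
data = [([-2.0, 1.0], 0), ([-1.0, 1.0], 0), ([1.0, 1.0], 1), ([2.0, 1.0], 1)]
init = random.Random(1)
gate = [[init.uniform(-0.01, 0.01) for _ in range(2)] for _ in range(2)]
experts = [[init.uniform(-0.01, 0.01) for _ in range(2)] for _ in range(2)]

loss_before = mean_log_loss(gate, experts, data)
sgd_train(gate, experts, data)
loss_after = mean_log_loss(gate, experts, data)
```

The small random initialization matters: with all-zero weights every group would receive identical updates and the gate gradient would stay at zero, so the model would collapse to an ordinary logistic regression.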
In this embodiment, testing the trained and optimized mixed effect logistic regression model of each subdivided customer group to evaluate its model accuracy comprises:
constructing an adapted test environment based on the applicable-scenario parameters of the mixed effect logistic regression model of each subdivided customer group;
selecting test sample data according to the customer group type, the customer group classification trigger mode, and the overall population characteristics of each subdivided customer group, and constructing a test data set;
constructing an evaluation task for each subdivided customer group based on the model parameters of its mixed effect logistic regression model;
serializing the test data set of the mixed effect logistic regression model of each subdivided customer group;
generating test cases for the mixed effect logistic regression model of each subdivided customer group according to the serialized data set;
configuring a data coverage criterion for each test case, and retrieving the test samples of the mixed effect logistic regression model of each subdivided customer group;
associating the mixed effect logistic regression model of each subdivided customer group with its test samples;
running the configured test cases of the mixed effect logistic regression model of each subdivided customer group on its test samples to obtain a test result;
and evaluating the model accuracy of the mixed effect logistic regression model of each subdivided customer group according to its test result.
The beneficial effects of the above technical solution are as follows: configuring the test cases and invoking the test samples for the model of each subdivision guest group allows each model to be tested accurately and more stably, which safeguards the objectivity and accuracy of the test data, lays the groundwork for the subsequent judgment of model accuracy, and further improves practicability.
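The test flow described above can be sketched as follows. This is only an illustrative outline, not the patented implementation: JSON is assumed as the serialization format, the coverage criterion (every label must occur at least once) is a hypothetical stand-in for the data coverage criterion, and all function names are invented for the example.

```python
import json

def serialize_test_set(samples):
    """Serialize a guest group's test data set (JSON is an assumed format)."""
    return json.dumps(samples, sort_keys=True)

def generate_test_cases(serialized):
    """Generate one test case per sample from the serialized data set."""
    samples = json.loads(serialized)
    return [
        {"case_id": i, "features": s["features"], "expected": s["label"]}
        for i, s in enumerate(samples)
    ]

def meets_coverage(cases, required_labels=(0, 1)):
    """Hypothetical coverage criterion: each required label occurs at least once."""
    seen = {c["expected"] for c in cases}
    return all(label in seen for label in required_labels)

def evaluate_accuracy(model, cases, threshold=0.5):
    """Fraction of test cases whose thresholded prediction matches the label."""
    if not cases:
        raise ValueError("no test cases")
    hits = sum(
        int(model(c["features"]) >= threshold) == c["expected"] for c in cases
    )
    return hits / len(cases)

def run_group_tests(models, test_sets):
    """Associate each guest group's model with its test set and evaluate it."""
    report = {}
    for group, model in models.items():
        cases = generate_test_cases(serialize_test_set(test_sets[group]))
        if not meets_coverage(cases):
            raise ValueError(f"coverage criterion not met for group {group!r}")
        report[group] = evaluate_accuracy(model, cases)
    return report
```

Here `models` maps a guest group name to any callable that returns a default probability, and `test_sets` maps the same name to a list of `{"features": ..., "label": ...}` records; the returned report holds one accuracy value per guest group.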
In one embodiment, coupling the first result and the second result to obtain a final risk score of the target client, and predicting the default rate of the target client according to the final risk score, includes:
computing a weighted average of the first risk scores output by the guest group prediction sub-models, using the probability that the target client belongs to each guest group as the weights, to obtain a second risk score of the target client;
confirming the second risk score as a final risk score for the target customer;
and mapping the final risk score of the target client into the default probability through a preset function.
The beneficial effects of the above technical solution are as follows: weighting the first risk score output by each guest group prediction sub-model by the probability that the target client belongs to the corresponding guest group takes the client's guest group membership probabilities into account, so the risk score attributable to each guest group is determined accurately before the scores are averaged. This ensures the reliability and objectivity of the scoring result and lays a foundation for the subsequent calculation of the default probability.
In one embodiment, the mapping the final risk score of the target client into the default probability through a preset function includes:
the default probability of the target client is calculated by the following formula:
p=sigmoid(f(X)·g(X))
where p is the default probability of the target client output by the mixed effect logistic regression model; sigmoid is the sigmoid function, which maps a real-valued risk score to a default probability; X is the second client information of the target client; f() is the preset guest group subdivision model; and g() is the guest group prediction sub-model.
In this embodiment, "·" denotes the dot product (inner product). For example, with five preset subdivision guest groups, f(X) and g(X) each output a five-dimensional vector (in R^5), and the inner product of the two vectors gives the risk score.
The beneficial effects of the technical scheme are as follows: the risk score can be directly mapped into the default probability by utilizing the sigmoid function, so that the working efficiency, the prediction accuracy and the objectivity are improved.
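As a small worked illustration of the formula p = sigmoid(f(X)·g(X)), the sketch below takes the two five-dimensional outputs as given and computes the default probability. The numeric values are made up for the example, and the function name `default_probability` is hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def default_probability(group_probs, group_scores):
    """Compute sigmoid(f(X) . g(X)) from the two model outputs.

    group_probs  -- f(X): probability that the client belongs to each guest group
    group_scores -- g(X): risk score (logit) from each guest group sub-model
    """
    if len(group_probs) != len(group_scores):
        raise ValueError("f(X) and g(X) must have the same dimension")
    final_score = sum(p * s for p, s in zip(group_probs, group_scores))
    return sigmoid(final_score)

# Illustrative values for five guest groups (not taken from the patent).
f_x = [0.1, 0.2, 0.4, 0.2, 0.1]
g_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
p = default_probability(f_x, g_x)
```

Because the weights in `f_x` sum to one, the inner product is exactly the weighted average of the sub-model scores, so the final score always lies between the smallest and largest per-group score.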
In one embodiment, the determining the client type of the target client according to the identity information of the target client, and extracting the common attribute of the client type includes:
acquiring potential characteristics of the target client according to the identity information of the target client, and determining the fit degree between the potential characteristics of the target client and the first attribute tags of each preset client type;
determining potential client types of the target clients according to the fit degree;
acquiring attribute value fields of second attribute tags of each potential client type;
and screening out the general attribute value of the potential client type based on the attribute value field of the second attribute label of each potential client type, and acquiring the entity attribute corresponding to the general attribute value as the common attribute of the potential client type of the target client.
The beneficial effects of the above technical solution are as follows: using the attribute value fields to obtain the general attribute values of the target client's potential client types allows the common attributes of those potential client types to be obtained quickly, which improves the accuracy and precision of attribute extraction and lays a foundation for subsequent work.
In one embodiment, performing the first guest group definition on the target client according to the common attributes and the experience rule corresponding to the preset business experience, to obtain the defined guest group, includes:
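One possible reading of the attribute extraction steps above is sketched below. The text does not say how the fit degree is measured or how general attribute values are screened, so the Jaccard overlap, the 0.5 threshold, the set intersection, and all names and sample data are assumptions made purely for illustration.

```python
def fit_degree(latent_features, type_tags):
    """Jaccard overlap between two tag sets (an assumed measure of fit)."""
    a, b = set(latent_features), set(type_tags)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def potential_client_types(latent_features, first_tags_by_type, min_fit=0.5):
    """Client types whose first attribute tags fit the client well enough."""
    return [
        client_type
        for client_type, tags in first_tags_by_type.items()
        if fit_degree(latent_features, tags) >= min_fit
    ]

def common_attributes(potential_types, value_fields_by_type):
    """Attribute values shared by every potential client type."""
    fields = [set(value_fields_by_type[t]) for t in potential_types]
    return set.intersection(*fields) if fields else set()

# Hypothetical sample data.
latent = {"salaried", "urban", "homeowner"}
first_tags = {
    "office_worker": {"salaried", "urban"},
    "young_family": {"salaried", "urban", "homeowner", "children"},
    "retiree": {"pension", "homeowner"},
}
value_fields = {
    "office_worker": {"stable_income", "credit_card"},
    "young_family": {"stable_income", "mortgage"},
    "retiree": {"pension_income"},
}
types = potential_client_types(latent, first_tags)
shared = common_attributes(types, value_fields)
```

The final mapping from general attribute values to entity attributes is omitted here because the text does not describe it.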
acquiring experience classification factors according to experience rules corresponding to the preset service experiences;
performing guest group classification on the target clients by using the experience classification factors to obtain a first classification result;
determining attribute variables according to the common attributes and consumption information of the target clients, and classifying the target clients by utilizing a preset decision tree guest group subdivision model based on the attribute variables to obtain a second classification result;
and integrating the first classification result and the second classification result to generate the first guest group definition, and acquiring a definition guest group.
The beneficial effects of the above technical solution are as follows: performing both experience-based classification and decision-tree classification on the target client and integrating the two results allows the candidate guest groups of the target client to be determined more comprehensively, which avoids omissions and improves practicability and stability.
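The two-path classification above might look like the following sketch. The integration step is assumed here to be a set union (consistent with the stated goal of avoiding omissions, but not confirmed by the text), and the rules, the hand-written tree, its thresholds, and the group names are all hypothetical.

```python
def classify_by_rules(client, rules):
    """First classification result: every group whose experience rule matches."""
    return {group for group, rule in rules.items() if rule(client)}

def classify_by_tree(client):
    """Second classification result from a tiny hand-written decision tree.

    The split variables and thresholds are hypothetical; a real system would
    use a fitted decision tree guest group subdivision model.
    """
    if client["monthly_spend"] >= 5000:
        return "high_spend" if client["age"] < 35 else "affluent_mature"
    return "budget"

def define_guest_groups(client, rules):
    """Integrate both results as a union so that no candidate group is missed."""
    return classify_by_rules(client, rules) | {classify_by_tree(client)}

# Hypothetical experience rules and client record.
rules = {
    "young_professional": lambda c: c["age"] < 35 and c["employed"],
    "student": lambda c: c["is_student"],
}
client = {"age": 28, "employed": True, "is_student": False, "monthly_spend": 6000}
defined_groups = define_guest_groups(client, rules)
```

A union keeps any group proposed by either path; a stricter integration (for example, an intersection or a vote) would trade coverage for precision.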
In one embodiment, the method draws on the ideas of the mixed effect logistic regression model (Mixed-effect Logistic Regression) and model integration to couple guest group subdivision with sub-model prediction, using a single overall model to perform guest group division and risk prediction for a client at the same time. The model consists of the following parts:
1. Setting the number of potential sub-guest groups: first, based on business experience, it is assumed that there are k sub-guest groups.
2. Guest group subdivision model: the model takes client information, such as credit records and income levels, as input and outputs a probability distribution over the k sub-guest groups to which the client may belong. For example, if k is 3, then for each input client the model outputs the probabilities that the client belongs to each of the three groups. Typically the model may employ a multi-class classification structure, such as a Softmax regression model, but it must be differentiable.
3. Guest group prediction sub-model: for each potential sub-guest group (k in total), a sub-model is constructed to predict the risk of the client; the input is still the client information, and the output is a risk score (logit). Each input client therefore has k risk scores. All sub-models may employ the same architecture, but they must be differentiable.
4. Combining sub-scores: the output of the guest group subdivision model is used as the weights in a weighted average of the k sub-scores, which gives the final risk score. In practice this can be computed directly as the dot product of the two vectors.
5. Model training: because every part is differentiable, the entire model is differentiable, so gradient-based update methods such as stochastic gradient descent (SGD) can be used for model training.
The method adopts an end-to-end paradigm, which greatly simplifies both the modeling process and the downstream use of the model: only a single score is output, and the user does not need to know that multiple sub-models exist underneath. Because only a single model is used, deployment is also greatly simplified: only one model needs to be deployed, which reduces the burden of model testing and monitoring and improves the maintainability of the model.
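The five parts listed above can be put together in a minimal end-to-end sketch. This is an illustration under stated assumptions, not the patented implementation: the subdivision model is taken to be a Softmax regression, each prediction sub-model a linear scorer, the loss binary cross-entropy, and training plain per-sample SGD with hand-derived gradients. The class name, hyperparameters, and toy data are all invented for the example.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

class CoupledModel:
    """f: softmax gate over k guest groups; g: k linear risk scorers."""

    def __init__(self, n_features, k, seed=0):
        rng = random.Random(seed)
        def init():
            return [[rng.uniform(-0.1, 0.1) for _ in range(n_features + 1)]
                    for _ in range(k)]
        self.gate = init()      # subdivision model weights, last column is bias
        self.experts = init()   # prediction sub-model weights, last column is bias

    def forward(self, x):
        xb = list(x) + [1.0]
        z = [sum(w * v for w, v in zip(row, xb)) for row in self.gate]
        s = [sum(w * v for w, v in zip(row, xb)) for row in self.experts]
        pi = softmax(z)                             # f(X)
        score = sum(p * q for p, q in zip(pi, s))   # f(X) . g(X)
        return xb, pi, s, score, sigmoid(score)

    def predict(self, x):
        return self.forward(x)[-1]

    def sgd_step(self, x, y, lr):
        xb, pi, s, score, p = self.forward(x)
        err = p - y  # derivative of cross-entropy with respect to the score
        for j in range(len(pi)):
            g_expert = err * pi[j]                  # d(score)/d(s_j) = pi_j
            g_gate = err * pi[j] * (s[j] - score)   # d(score)/d(z_j)
            for i, xi in enumerate(xb):
                self.experts[j][i] -= lr * g_expert * xi
                self.gate[j][i] -= lr * g_gate * xi

def mean_loss(model, data, eps=1e-12):
    total = 0.0
    for x, y in data:
        p = model.predict(x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

def train(model, data, epochs=30, lr=0.1, seed=0):
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            model.sgd_step(x, y, lr)

def make_toy_data(n=200, seed=1):
    """Synthetic clients: a segment flag and one feature; the threshold differs by segment."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        segment = rng.choice([-1.0, 1.0])
        x1 = rng.uniform(-2.0, 2.0)
        data.append(([segment, x1], 1 if x1 > 0.5 * segment else 0))
    return data
```

A caller would build the data with `make_toy_data()`, create `CoupledModel(n_features=2, k=2)`, call `train(model, data)`, and then use `model.predict(x)` as the single output score; the gate and the sub-models are never exposed, which mirrors the end-to-end usage described above.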
The embodiment also discloses a mixed effect logistic regression-based default rate prediction system, as shown in fig. 4, which comprises:
the obtaining module 401 is configured to obtain first client information of a target client, perform guest group subdivision on the target client according to the first client information, and obtain subdivision results;
a first output module 402, configured to invoke the subdivision result and second client information of the target client to output a first result through a preset guest group subdivision model;
a second output module 403, configured to construct a guest group prediction sub-model of each guest group in the subdivision result, perform risk prediction on the target client, and output a second result;
and the prediction module 404 is configured to couple the first result and the second result to obtain a final risk score of the target client, and predict the default rate of the target client according to the final risk score.
The working principle and the beneficial effects of the above technical solution are described in the method claims, and are not repeated here.
It will be appreciated by those skilled in the art that the first and second aspects of the present application refer to different phases of application.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (9)

1. The mixed effect logistic regression-based default rate prediction method is characterized by comprising the following steps of:
acquiring first client information of a target client, and performing client group subdivision on the target client according to the first client information to acquire subdivision results;
invoking the subdivision result and second client information of the target client to output a first result through a preset guest group subdivision model;
constructing a guest group prediction sub-model of each guest group in the subdivision result to predict risk of the target client, and outputting a second result;
coupling the first result and the second result to obtain a final risk score of the target client, and predicting the default rate of the target client according to the final risk score;
the method further comprises the steps of:
constructing a mixed effect logistic regression model of each subdivision guest group according to the preset guest group subdivision sub-model and the guest group prediction sub-model of the subdivision guest group;
detecting the number of control variables of the mixed effect logistic regression model of each subdivision guest group;
determining whether the number of the control variables of the mixed effect logistic regression model of each subdivision guest group is larger than or equal to a preset value, if so, training and optimizing the mixed effect logistic regression model of the subdivision guest group through a random parallel gradient descent algorithm;
testing the mixed effect logistic regression model of each sub-divided guest group after training and optimization to evaluate the model precision;
testing the trained and optimized mixed effect logistic regression model of each subdivision guest group to evaluate model accuracy thereof, comprising:
constructing an adaptive test environment based on the suitability scene parameters of the mixed effect logistic regression model of each subdivision guest group;
selecting test sample data according to the guest group type, the guest group classification triggering mode and the guest group crowd comprehensive characteristics of each subdivision guest group and constructing a test data set;
constructing an evaluation task of each subdivision guest group based on model parameters of a mixed effect logistic regression model of each subdivision guest group;
serializing the test data set of the mixed effect logistic regression model of each subdivision guest group;
generating test cases of a mixed effect logistic regression model of each subdivision guest group according to the serialized data set;
configuring a data coverage criterion of each test case, and calling a test sample of a mixed effect logistic regression model of each subdivision guest group;
associating a mixed effect logistic regression model for each subdivision guest group with its test sample;
testing the configured test cases of the mixed effect logistic regression model of each subdivision guest group by using the test sample of the mixed effect logistic regression model of each subdivision guest group to obtain a test result;
and evaluating the model precision of the mixed effect logistic regression model of each subdivision guest group according to the test result of the mixed effect logistic regression model of the subdivision guest group.
2. The mixed effect logistic regression-based default rate prediction method according to claim 1, wherein the acquiring first client information of the target client, and performing client group subdivision on the target client according to the first client information to acquire subdivision results, comprises:
acquiring the identity information and consumption information of a target client and confirming the identity information and consumption information as the first client information;
determining the client type of the target client according to the identity information of the target client, and extracting the common attribute of the client type;
performing first customer group definition on the target customer according to the common attribute and an experience rule corresponding to a preset service experience to obtain a definition customer group;
and determining the consumption level and consumption capacity of the target client according to the consumption information of the target client, deleting the first client group which is not satisfactory in the defined client group based on the consumption level and the consumption capacity, obtaining a second client group, and identifying the second client group as the sub-divided client group.
3. The mixed effect logistic regression-based default rate prediction method according to claim 1, wherein the invoking the subdivision result and the second client information of the target client to output the first result through a preset guest group subdivision model comprises:
acquiring income information and credit record information of the target client and confirming them as the second client information;
confirming whether the second client information is sequence data; if so, selecting a recurrent neural network model as the preset guest group subdivision model, and if not, selecting a Softmax regression model as the preset guest group subdivision model;
determining a guest group identification index parameter of each subdivision guest group in the subdivision result;
inputting the second client information and the guest group identification index parameters of each subdivision guest group into the preset guest group subdivision model, outputting the probability that the target client belongs to each guest group, and confirming the probability as the first result.
4. The mixed effect logistic regression-based default rate prediction method according to claim 1, wherein the constructing a guest group prediction sub-model of each guest group in the subdivision result to predict risk of the target client, and outputting a second result, comprises:
selecting a general model architecture according to general logic parameters of each subdivision guest group in the subdivision result;
constructing a guest group predictor model of each subdivision guest group based on the general model architecture and the logic prediction parameters of the subdivision guest group;
inputting second client information of the target clients into a guest group prediction sub-model of each subdivision guest group to obtain output first risk scores;
and confirming the first risk score output in each guest group predictor model as the second result.
5. The mixed effect logistic regression-based default rate prediction method according to any one of claims 1 to 4, wherein the coupling the first result and the second result to obtain a final risk score of the target client, and predicting the default rate of the target client according to the final risk score, comprises:
carrying out weighted average calculation on the first risk scores output in the guest group prediction sub-model of the guest group by utilizing the probability that the target client belongs to each guest group to obtain a second risk score of the target client;
confirming the second risk score as a final risk score for the target customer;
and mapping the final risk score of the target client into the default probability through a preset function.
6. The mixed effect logistic regression-based default rate prediction method according to claim 5, wherein the mapping the final risk score of the target client into the default probability through a preset function comprises:
the default probability of the target client is calculated by the following formula:
p=sigmoid(f(X)·g(X))
wherein p is the default probability of the target client output by the mixed effect logistic regression model; sigmoid is the sigmoid function, which maps a real-valued risk score to a default probability; X is the second client information of the target client; f() is the preset guest group subdivision model; and g() is the guest group prediction sub-model.
7. The mixed effect logistic regression-based default rate prediction method according to claim 2, wherein the determining the client type of the target client according to the identity information of the target client, and extracting the common attribute of the client type, comprises:
acquiring potential characteristics of the target client according to the identity information of the target client, and determining the fit degree between the potential characteristics of the target client and the first attribute tags of each preset client type;
determining potential client types of the target clients according to the fit degree;
acquiring attribute value fields of second attribute tags of each potential client type;
and screening out the general attribute value of the potential client type based on the attribute value field of the second attribute label of each potential client type, and acquiring the entity attribute corresponding to the general attribute value as the common attribute of the potential client type of the target client.
8. The mixed effect logistic regression-based default rate prediction method according to claim 2, wherein the performing first customer group definition on the target customer according to the common attribute and an experience rule corresponding to a preset service experience to obtain a definition customer group comprises:
acquiring experience classification factors according to experience rules corresponding to the preset service experiences;
performing guest group classification on the target clients by using the experience classification factors to obtain a first classification result;
determining attribute variables according to the common attributes and consumption information of the target clients, and classifying the target clients by utilizing a preset decision tree guest group subdivision model based on the attribute variables to obtain a second classification result;
and integrating the first classification result and the second classification result to generate the first guest group definition, and acquiring a definition guest group.
9. A mixed-effect logistic regression-based default rate prediction system, comprising:
the acquisition module is used for acquiring first client information of the target client, conducting client group subdivision on the target client according to the first client information, and acquiring subdivision results;
the first output module is used for calling the subdivision result and second client information of the target client to output a first result through a preset guest group subdivision model;
the second output module is used for constructing a guest group prediction sub-model of each guest group in the subdivision result to predict the risk of the target client and outputting a second result;
the prediction module is used for coupling the first result and the second result to obtain a final risk score of the target client, and predicting the default rate of the target client according to the final risk score;
the system is also for:
constructing a mixed effect logistic regression model of each subdivision guest group according to the preset guest group subdivision sub-model and the guest group prediction sub-model of the subdivision guest group;
detecting the number of control variables of the mixed effect logistic regression model of each subdivision guest group;
determining whether the number of the control variables of the mixed effect logistic regression model of each subdivision guest group is larger than or equal to a preset value, if so, training and optimizing the mixed effect logistic regression model of the subdivision guest group through a random parallel gradient descent algorithm;
testing the mixed effect logistic regression model of each sub-divided guest group after training and optimization to evaluate the model precision;
testing the trained and optimized mixed effect logistic regression model of each subdivision guest group to evaluate model accuracy thereof, comprising:
constructing an adaptive test environment based on the suitability scene parameters of the mixed effect logistic regression model of each subdivision guest group;
selecting test sample data according to the guest group type, the guest group classification triggering mode and the guest group crowd comprehensive characteristics of each subdivision guest group and constructing a test data set;
constructing an evaluation task of each subdivision guest group based on model parameters of a mixed effect logistic regression model of each subdivision guest group;
serializing the test data set of the mixed effect logistic regression model of each subdivision guest group;
generating test cases of a mixed effect logistic regression model of each subdivision guest group according to the serialized data set;
configuring a data coverage criterion of each test case, and calling a test sample of a mixed effect logistic regression model of each subdivision guest group;
associating a mixed effect logistic regression model for each subdivision guest group with its test sample;
testing the configured test cases of the mixed effect logistic regression model of each subdivision guest group by using the test sample of the mixed effect logistic regression model of each subdivision guest group to obtain a test result;
and evaluating the model precision of the mixed effect logistic regression model of each subdivision guest group according to the test result of the mixed effect logistic regression model of the subdivision guest group.
CN202310199292.5A 2023-02-28 2023-02-28 Mixed effect logistic regression-based default rate prediction method and system Active CN116258574B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310199292.5A CN116258574B (en) 2023-02-28 2023-02-28 Mixed effect logistic regression-based default rate prediction method and system

Publications (2)

Publication Number Publication Date
CN116258574A CN116258574A (en) 2023-06-13
CN116258574B true CN116258574B (en) 2023-10-13

Family

ID=86680652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310199292.5A Active CN116258574B (en) 2023-02-28 2023-02-28 Mixed effect logistic regression-based default rate prediction method and system

Country Status (1)

Country Link
CN (1) CN116258574B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978680A (en) * 2019-03-18 2019-07-05 杭州绿度信息技术有限公司 A kind of air control method and system segmenting objective group's credit operation air control differentiation price
CN115146731A (en) * 2022-07-15 2022-10-04 北京三快在线科技有限公司 Model training method, business wind control method and business wind control device
CN116451841A (en) * 2023-03-20 2023-07-18 中银金融科技有限公司 Enterprise loan default probability prediction method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN116258574A (en) 2023-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant