CN114820159A - Risk quota method, device, equipment and readable storage medium - Google Patents

Risk quota method, device, equipment and readable storage medium

Info

Publication number
CN114820159A
Authority
CN
China
Prior art keywords
overdue
model
risk
training
limit
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210344004.6A
Other languages
Chinese (zh)
Inventor
万世想
杨青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Du Xiaoman Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Du Xiaoman Technology Beijing Co Ltd
Priority to CN202210344004.6A
Publication of CN114820159A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Finance (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Accounting & Taxation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a risk quota method. Using a causal inference framework, it constructs a doubly robust two-stage depth model from loan feature data and compares the overdue performance of users who have similar features but were allocated different limits. This yields each client's maximum credit limit that does not trigger a high probability of default: the limit corresponding to the maximum overdue rate below the threshold is taken as the risk quota. In causal inference theory, the two-stage depth model guarantees unbiased risk inference and a fast training convergence rate, and it can also learn dense representations of massive numbers of users. It thereby avoids the low efficiency, complexity and high miss rate of conventional queries, as well as the correlation-only learning of traditional data mining algorithms. The method can give individualized risk quotas to large numbers of clients, balances global risk against credit limits, and has good robustness and practical applicability. The invention also discloses a risk quota device, equipment and a readable storage medium, which have corresponding technical effects.

Description

Risk quota method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular to a risk quota method, device, equipment, and readable storage medium.
Background
The loan service is one of the most important businesses of a financial institution. When a client submits a credit application, the financial institution grants a credit limit at a controllable risk level; this process is called risk quota. Generally, the higher the overall credit limit, the higher the proportion of clients whose borrowing exceeds their repayment capacity, and the higher the risk borne by the financial institution; an overall credit limit that is too low, on the other hand, reduces the institution's market competitiveness. The goal of modern risk quota is therefore to lower the limits of clients with low repayment capacity and raise the limits of clients with high repayment capacity while maintaining the existing risk level, thereby improving the institution's service level and profitability.
Existing risk quota systems generally adopt a statistical method based on mass data: the limits of groups with different overdue risk are counted, statistical rules are abstracted from the data, and the rules are stacked to determine a client's limit range.
The results of this method rely heavily on expert experience; the method lacks robustness and cannot be iterated efficiently as the external financial environment keeps changing. Moreover, data mining is currently based on correlation, which causes many misjudgments by the statistical rules and cannot give the causal risk performance of each client under different limits.
In summary, how to ensure the robustness and the practical application capability while ensuring the accuracy of risk inference is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
The invention aims to provide a risk quota method, a risk quota device, risk quota equipment and a readable storage medium, so that the accuracy of risk deduction is ensured, and meanwhile, the robustness and the practical application capability of the risk quota method are ensured.
In order to solve the technical problems, the invention provides the following technical scheme:
a risk rating method comprising:
receiving the loan characteristics of a client, and inputting the loan characteristics into a two-stage depth model; wherein the two-stage depth model comprises: an amount regression model, an overdue classification model and a potential output frame model;
calling the amount regression model to perform regression fitting according to the lending characteristics to obtain a predicted amount; calling the overdue classification model to perform classification fitting according to the lending characteristics to obtain predicted overdue rate;
calling the potential output frame model to carry out overdue rate risk unbiased inference on the target limit range according to the predicted limit and the predicted overdue rate so as to generate the overdue rate corresponding to each limit;
and extracting the maximum overdue rate lower than the maximum overdue rate threshold from the overdue rates, and taking the limit corresponding to the maximum overdue rate as a risk quota.
Optionally, the training method of the two-stage depth model includes:
splitting the sample data into a training sample and a test sample;
performing data preprocessing on the training sample and the test sample to be used as a training set and a test set;
inputting the training set into the limit regression model and the overdue classification model for prediction training to obtain a training limit and a training overdue rate;
and performing parameter optimization training on the potential output frame model according to the limit residual between the training limit and the corresponding real limit and the overdue residual between the training overdue rate and the corresponding real overdue state.
Optionally, the splitting the sample data into a training sample and a test sample includes:
and splitting the sample data into a training sample and a test sample which are not overlapped by time according to a time window.
Optionally, the performing data preprocessing on the training sample and the test sample includes:
deleting the characteristic that the proportion of the null value exceeds a first threshold value;
adjusting sample attribution in the training set and the test set until the difference value of the feature distribution of the training set and the feature distribution of the test set is smaller than a second threshold value;
and deleting the characteristic that the disturbance influence degree is less than the third threshold value.
Optionally, the risk rating method further comprises:
and performing model evaluation on the causal effect distinguishing capability of the client overdue rate according to the risk quota of the two-stage depth model.
Optionally, the risk rating method further comprises:
if the online duration of the double-stage depth model reaches a threshold value, collecting the received loan characteristics as accumulated characteristics;
and adding the accumulated features into sample data, and executing the step of splitting the sample data into a training sample and a test sample.
Optionally, the receiving a loan feature of the customer, comprising: receiving high-dimensional credit investigation characteristics, user portrait characteristics and platform interaction characteristics of a client.
A risk rating device comprising:
the characteristic receiving unit is used for receiving the loan characteristics of a client and inputting the loan characteristics into the two-stage depth model; wherein the two-stage depth model comprises: an amount regression model, an overdue classification model and a potential output frame model;
the first prediction unit is used for calling the quota regression model to perform regression fitting according to the loan feature to obtain a predicted quota; calling the overdue classification model to perform classification fitting according to the lending characteristics to obtain predicted overdue rate;
the second prediction unit is used for calling the potential output frame model to carry out overdue risk unbiased inference on the target limit range according to the predicted limit and the predicted overdue rate so as to generate the overdue rate corresponding to each limit;
and the limit determining unit is used for extracting the maximum overdue rate lower than the maximum overdue rate threshold from the overdue rates and taking the limit corresponding to the maximum overdue rate as a risk quota.
A computer device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the risk rating method described above when executing the computer program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the above-mentioned risk rating method.
The method provided by the embodiment of the invention uses a causal inference framework to construct a doubly robust two-stage depth model from loan feature data and compares the overdue performance of users who have similar features but were allocated different limits. This yields the highest-risk limit of each client, namely the maximum credit limit that does not trigger a high probability of default; the limit corresponding to the maximum overdue rate below the threshold is taken as the risk quota. In causal inference theory, the two-stage depth model guarantees unbiased risk inference and a fast training convergence rate, and it can also learn dense representations of massive numbers of users. It avoids the low efficiency, complexity and high miss rate of conventional enterprise qualification queries as well as the correlation-only learning of traditional data mining algorithms, can give individualized risk quotas to large numbers of clients, balances global risk against credit limits, and has good robustness and practical applicability.
Accordingly, embodiments of the present invention further provide a risk quota device, an apparatus, and a readable storage medium corresponding to the risk quota method, which have the above technical effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present invention or technical solutions in related arts, the drawings used in the description of the embodiments or related arts will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an embodiment of a risk rating method according to the present invention;
FIG. 2 is a schematic diagram of two-stage training of a two-stage depth model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of an overall training process according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a risk rating device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a risk quota method, which can ensure the accuracy of risk inference and simultaneously ensure the robustness and the practical application capability of the risk inference method.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a risk rating method according to an embodiment of the present invention, the method including the following steps:
S101, receiving the loan characteristics of a client, and inputting the loan characteristics into a two-stage depth model;
The method is mainly applied when a user submits a loan application to a financial institution: based on massive loan data of clients, it identifies the overdue risk of the current client under different limits and finally provides the highest limit whose overdue risk is below a reasonable threshold. The overdue rates of the user under different limits are inferred by a two-stage depth model, which ensures that the estimate is theoretically unbiased.
The loan features of the customer are received, where a loan feature refers to a characteristic of the customer's loan activity. This embodiment does not limit the types of information contained in the loan features; they may be adjusted according to the actual quota requirements and the information acquisition channels. Optionally, receiving the loan features of the customer comprises receiving the customer's high-dimensional credit investigation features, user portrait features and platform interaction features. The high-dimensional credit investigation features comprise the customer's historical lending behavior, loan records and overdue status at legitimate institutions; up to four thousand dimensions (without limitation) can be extracted, and after feature screening and feature combination they reflect the customer's financial needs more comprehensively. The user portrait features comprise the basic credit information assigned to the customer after risk assessment by the financial institution; they are a useful supplement to the credit investigation features and help the financial institution build a complete portrait of a new customer. The platform interaction features comprise the user's interaction data on legitimate loan platforms, such as whether recent borrowing is frequent and the historical number of credit grants; they reflect the customer's long-term and short-term needs more comprehensively.
In the present embodiment, only the three features are taken as examples to perform the analysis and introduction of the risk quota, and other features may be further added or adjusted, which are not limited herein.
After receiving the loan characteristics of the customer, the loan characteristics are input into the two-stage depth model. The method provides a brand-new two-stage depth model based on a potential output framework in causal inference. Under the traditional method, only one model is used for predicting the overdue rate of the user under different limits, and the overdue rate prediction of the single model can be abstracted into the following forms:
Y = θ(X)·T + g(X) + ε,  T = f(X) + η
[Formula image: BDA0003575729350000051]
where the variable X represents the high-dimensional features of the credit client, T represents the client's credit limit, and Y represents the overdue performance under that credit limit. θ(X) represents the parameters to be estimated, which may take a parametric or a non-parametric form. The parametric form may be a linear function, a polynomial function, or a function defined in a reproducing kernel Hilbert space; the non-parametric form is fitted directly [formula image: BDA0003575729350000052].
Based on the above formula, the single model mixes the influence of the features X and the limit T. What it evaluates is the correlation weight of different limits T given the features X, whereas what actually needs to be evaluated is the overdue performance of different people under different limits. The former is correlational in nature and the latter is causal, so conclusions drawn from correlation are often biased in practice; that is, the weight of the limit on the overdue rate is often misestimated.
In view of this, the method provides a two-stage depth model, which specifically includes: a limit regression model, an overdue classification model, and a potential output frame model. The two-stage depth model can be formally represented as:
[Formula images: BDA0003575729350000061, BDA0003575729350000062, BDA0003575729350000063, BDA0003575729350000064]
The two-stage depth model comprises two recognition stages. In the first recognition stage, namely step S102, the limit regression model and the overdue classification model predict the limit T and the overdue rate Y from the input high-dimensional lending feature X. The purpose is to predict the natural tendency of the lending feature X toward the limit: in observational data, the limits of people with certain features were assigned by human intervention and therefore carry a tendency, which is one of the causes of correlation interference. Similarly, the overdue rate Y is predicted directly to obtain the natural overdue rate of samples with these features. In the second recognition stage, namely step S103, the two predictions from the first-stage limit regression model and overdue classification model, called the predicted limit and the predicted overdue rate, are passed to the potential output frame model. Using the differences learned in historical training between these predictions and the actual limits and overdue outcomes of different people, the model adjusts them so that the overdue rate of any person under any limit, and hence the overdue risk over the target limit range, can be inferred and accurately assessed.
The convergence rate of the two-stage depth model is
[Formula image: BDA0003575729350000065]
where N is the number of training samples. This guarantees that model training converges and that the convergence rate approaches a fixed value, thereby ensuring the model training speed.
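Because the formula images for the two-stage model are not available in this text, the following is only a sketch of the standard residual-on-residual formulation that is consistent with the single-model form Y = θ(X)·T + g(X) + ε, T = f(X) + η and with the training description in this document. The symbol q(X) and the tilde notation are assumptions introduced here for illustration; they are not taken from the source.

```latex
\begin{aligned}
q(X) &= \mathbb{E}[Y \mid X], \qquad f(X) = \mathbb{E}[T \mid X] \\
\tilde{Y} &= Y - q(X), \qquad \tilde{T} = T - f(X) \\
\tilde{Y} &= \theta(X)\,\tilde{T} + \epsilon \\
\hat{\theta} &= \arg\min_{\theta}\ \frac{1}{N}\sum_{i=1}^{N}\bigl(\tilde{Y}_i - \theta(X_i)\,\tilde{T}_i\bigr)^{2}
\end{aligned}
```

The third line follows by taking the conditional expectation of Y given X, which gives q(X) = θ(X)·f(X) + g(X) when ε has zero mean given X and T, and subtracting it from the original equation so that g(X) cancels.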
Specifically, the first-stage identification of the two-stage depth model is detailed in step S102.
S102, calling a credit regression model to perform regression fitting according to the loan characteristics to obtain a predicted credit; calling an overdue classification model to perform classification fitting according to the loan characteristics to obtain predicted overdue rate;
the credit regression model refers to a deep learning model for predicting the credit of a client through regression fitting, and specific model types and model structures are not limited in this embodiment, and may refer to settings of related prior art. The overdue classification model refers to a deep learning model for predicting whether a client is overdue through classification fitting, and similarly, the specific model type and model structure of the overdue classification model are not limited in this embodiment, and the setting of the related prior art can be referred to.
And the amount regression model and the overdue classification model respectively predict the amount and the overdue rate according to the natural tendency of the characteristics and the high-dimensional loan characteristics to generate a predicted amount and a predicted overdue rate. It should be noted that the execution sequence of the credit regression model and the overdue classification model is not limited, and the credit regression model and the overdue classification model can be executed synchronously or serially.
S103, calling a potential output frame model to carry out overdue rate risk unbiased inference on the target limit range according to the predicted limit and the predicted overdue rate to generate the overdue rate corresponding to each limit;
after the step S102 is executed to complete the identification of the first stage, the step S103 is triggered to execute the identification of the second stage calling the potential output frame model.
The potential output framework (commonly known as the potential outcome framework) is a core framework of the causal inference field and contains the core concepts and assumptions of causal inference. Assume that there are N complete experimental samples, of which m belong to the experimental group and the remaining N-m belong to the control group. Taking binary intervention (treatment) as an example, T takes the value 0 or 1. With the features denoted by X and the response by Y, the ATE (average treatment effect) of the whole sample is:
ATE = E[Y(1) - Y(0)]
ATE reflects the average improvement over the entire experimental sample. For an individual, the quantity of interest is the ITE (individual treatment effect):
ITE = Y_i(1) - Y_i(0)
ITE is difficult to observe directly because it is practically impossible to intervene on the same customer several times, for example by quoting the same customer several interest rates at the same time to observe the customer's willingness to borrow. ITE may therefore be approximated by its expectation, the CATE (conditional average treatment effect):
CATE = E[Y(1) - Y(0) | X = x_i]
It can be shown that the best estimate of the CATE is equivalent to the best estimate of the ITE. The above is the potential output framework.
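The three quantities above can be illustrated with a toy table in which both potential outcomes are known for every sample. This is a minimal sketch: the data, the group names and the function names are hypothetical, and real data never reveals both outcomes for the same client.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: each row holds a feature value x and BOTH potential
# outcomes (y0 without the intervention, y1 with it). In practice only one of
# the two is ever observed per client, which is why ITE is not observable.
SAMPLES = [
    {"x": "low_income", "y0": 0.10, "y1": 0.30},
    {"x": "low_income", "y0": 0.20, "y1": 0.50},
    {"x": "high_income", "y0": 0.05, "y1": 0.10},
    {"x": "high_income", "y0": 0.05, "y1": 0.05},
]


def individual_effects(samples):
    """ITE for every sample: Y_i(1) - Y_i(0)."""
    return [s["y1"] - s["y0"] for s in samples]


def average_treatment_effect(samples):
    """ATE: the mean of the individual effects over the whole sample."""
    return mean(individual_effects(samples))


def conditional_average_treatment_effect(samples):
    """CATE: the mean individual effect within each feature group X = x."""
    groups = defaultdict(list)
    for s in samples:
        groups[s["x"]].append(s["y1"] - s["y0"])
    return {x: mean(effects) for x, effects in groups.items()}
```

In this toy table the group-level CATE differs between the two groups, which is the kind of heterogeneity that a single sample-wide ATE hides.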
Calling a potential output frame model to carry out overdue rate risk unbiased inference on the target limit range according to the predicted limit and the predicted overdue rate so as to generate the overdue rate corresponding to each limit. The potential output frame model carries out parameter optimization training according to the residual of the amount between the predicted amount of the sample data and the corresponding real amount and the residual of the predicted overdue rate and the corresponding real overdue state.
S104, extracting the maximum overdue rate lower than the maximum overdue rate threshold from the overdue rates, and taking the limit corresponding to the maximum overdue rate as a risk quota.
The processed client data is fed into the two-stage depth model, which infers the overdue rate risk over the set limit range. For example, if the highest overdue rate accepted by the financial institution is α, the maximum limit whose overdue rate is below α is the risk quota output by the model, that is, the loan limit that maximizes profit under controllable risk.
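The selection in step S104 can be sketched as follows, assuming the model has already produced one inferred overdue rate per candidate limit. The function name and the dictionary input are hypothetical choices made for this illustration.

```python
def select_risk_quota(overdue_by_limit, max_overdue_rate):
    """Pick the limit whose inferred overdue rate is the largest one still
    below the threshold; return None when no candidate limit qualifies.

    overdue_by_limit: dict mapping candidate limit -> inferred overdue rate.
    max_overdue_rate: the threshold alpha set by the financial institution.
    """
    eligible = [
        (rate, limit)
        for limit, rate in overdue_by_limit.items()
        if rate < max_overdue_rate
    ]
    if not eligible:
        return None
    # max() compares by rate first; ties on rate fall back to the larger limit.
    _, limit = max(eligible)
    return limit
```

When the inferred overdue rate does not decrease as the limit grows, this choice coincides with the largest limit whose overdue rate stays below α.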
Based on the above description, the technical scheme provided by the embodiment of the invention uses a causal inference framework to construct a doubly robust two-stage depth model from loan feature data and compares the overdue performance of users who have similar features but were allocated different limits. This yields the highest-risk limit of each client, namely the maximum credit limit that does not trigger a high probability of default; the limit corresponding to the maximum overdue rate below the threshold is taken as the risk quota. In causal inference theory, the two-stage depth model guarantees unbiased risk inference and a fast training convergence rate, and it can also learn dense representations of massive numbers of users. It avoids the low efficiency, complexity and high miss rate of conventional enterprise qualification queries as well as the correlation-only learning of traditional data mining algorithms, can give individualized risk quotas to large numbers of clients, balances global risk against credit limits, and has good robustness and practical applicability.
It should be noted that, based on the above embodiments, the embodiments of the present invention also provide corresponding improvements. In the preferred/improved embodiment, the same steps as those in the above embodiment or corresponding steps may be referred to each other, and corresponding advantageous effects may also be referred to each other, which are not described in detail in the preferred/improved embodiment herein.
In the above embodiment, the training process of the two-stage depth model is not limited, and in order to ensure the training speed and the accuracy of the parameters, the embodiment provides a training method, which specifically includes the following steps:
(1) splitting the sample data into a training sample and a test sample;
The collected offline data is split into training samples and test samples. Optionally, the data may be divided by time window; for example, data from March to September 2021 is used as the training set and data from October 2021 as the test set. The two sets do not overlap in time, which enhances the training effect.
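A time-window split of this kind can be sketched as below. The record layout and the field name `applied_on` are assumptions made for the example.

```python
from datetime import date


def split_by_time_window(records, train_start, train_end, test_start, test_end):
    """Split records into train and test sets by non-overlapping date windows.

    Each record is a dict with a hypothetical 'applied_on' date field.
    Window bounds are inclusive on both ends.
    """
    if train_end >= test_start:
        raise ValueError("training and test windows must not overlap in time")
    train = [r for r in records if train_start <= r["applied_on"] <= train_end]
    test = [r for r in records if test_start <= r["applied_on"] <= test_end]
    return train, test


# Usage with the windows from the example above: March to September 2021 for
# training, October 2021 for testing.
records = [
    {"id": 1, "applied_on": date(2021, 3, 15)},
    {"id": 2, "applied_on": date(2021, 9, 30)},
    {"id": 3, "applied_on": date(2021, 10, 1)},
    {"id": 4, "applied_on": date(2021, 12, 1)},
]
train_set, test_set = split_by_time_window(
    records,
    date(2021, 3, 1), date(2021, 9, 30),
    date(2021, 10, 1), date(2021, 10, 31),
)
```

Records that fall outside both windows, such as the December record here, are simply left out of both sets.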
(2) Carrying out data preprocessing on the training samples and the test samples to be used as a training set and a test set;
The data preprocessing mainly comprises sample screening and feature screening. Optionally, feature screening may consider the null value rate, feature stability, and feature importance. Null value rate: features whose proportion of null values exceeds the first threshold are deleted; removing features with a high null proportion helps ensure the interpretability of the model. Feature stability: the assignment of samples to the training set and the test set is adjusted, for example by means of the population stability index (PSI), until the difference between the feature distributions of the training set and the test set is smaller than the second threshold. This keeps the distribution of each feature consistent across the two sets; otherwise a feature may be strongly correlated with the time window and make model performance unstable. Feature importance: random disturbance is added to each feature, and features whose disturbance has an influence on the prediction result smaller than the third threshold are deleted, because such features behave like random features and provide no discriminative power. In this embodiment, only these three data preprocessing methods are described as examples; other preprocessing methods configured for the application scenario can refer to the description of this embodiment and are not described again here.
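The three screening criteria can be sketched in plain Python as follows. The function names, the fixed bin edges for PSI, and the use of shuffling as the random disturbance are assumptions made for this illustration; the source does not specify how the disturbance is generated or how PSI bins are chosen.

```python
import math
import random


def null_rate(values):
    """Share of missing (None) entries in one feature column."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_features(columns, first_threshold):
    """Keep only features whose null rate does not exceed the threshold."""
    return {
        name: vals for name, vals in columns.items()
        if null_rate(vals) <= first_threshold
    }


def psi(train_values, test_values, bin_edges, eps=1e-6):
    """Population stability index of one feature between two samples."""
    def shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            counts[sum(v >= edge for edge in bin_edges)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(train_values), shares(test_values)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def perturbation_influence(predict, rows, feature, seed=0):
    """Mean absolute change in prediction after shuffling one feature.

    Shuffling is one possible random disturbance (an assumption here).
    """
    rng = random.Random(seed)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    perturbed = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    changes = [abs(predict(a) - predict(b)) for a, b in zip(rows, perturbed)]
    return sum(changes) / len(rows)
```

A feature would then be dropped when its `perturbation_influence` falls below the third threshold, and a train/test split would be reconsidered when `psi` for some feature exceeds the second threshold.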
After data preprocessing, a two-stage depth model needs to be trained, triggering step (3).
It should be noted that the data processing process performed on the sample data and the data processing process of the online inference process in actual application need to be set correspondingly, in the above embodiment, the step S101 is only described by taking the direct acquisition of the loan feature of the client as an example, and if the acquired data is the original loan data, corresponding data preprocessing needs to be performed with reference to this step.
(3) Inputting the training set into a limit regression model and an overdue classification model for prediction training to obtain a training limit and a training overdue rate;
The two-stage depth model comprises two training stages. The first training stage, namely step (3), predicts the limit T and the overdue rate Y separately from the input high-dimensional feature X. On the one hand, predicting T captures the natural tendency of the feature X toward a limit: in the observational data, people with certain features have been subject to human intervention and therefore necessarily show such a tendency, which is one of the causes of the correlation interference. On the other hand, directly predicting the overdue rate Y follows a similar principle and yields the natural overdue rate of samples with this feature.
(4) Performing parameter optimization training on the potential output frame model according to the limit residual between the training limit and the corresponding real limit and the overdue residual between the training overdue rate and the corresponding real overdue state.
In the second training stage, the limit regression model (model T) and the overdue classification model (model Y) trained in the first stage each generate a prediction result, called the training limit (predicted T) and the training overdue rate (predicted Y) respectively. The predicted T is subtracted from the real T to obtain the residual T, and the predicted Y is subtracted from the real Y to obtain the residual Y. The functional form of θ(·) is set (for example, a polynomial function of X), θ(X) is multiplied by the residual T, the product is fitted to the residual Y by model training, and the parameters of θ(·) are finally determined, so that the overdue rates of different people under different limits can be estimated without bias.
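The second-stage fit can be illustrated with a small worked example. The sketch below assumes a scalar feature x and a first-degree polynomial θ(x) = a + b·x, and fits the residual Y to θ(x) multiplied by the residual T by ordinary least squares; the function name `fit_theta`, the scalar feature and the closed-form solver are assumptions made for illustration, whereas the embodiment trains this stage as part of a neural network.

```python
def fit_theta(x, resid_t, resid_y):
    """Least-squares fit of resid_y ~ (a + b * x) * resid_t; returns (a, b).

    Expanding the product gives a linear model without intercept in the two
    regressors z1 = resid_t and z2 = x * resid_t, so the 2x2 normal
    equations can be solved directly.
    """
    z1 = list(resid_t)
    z2 = [xi * ti for xi, ti in zip(x, resid_t)]

    s11 = sum(u * u for u in z1)
    s12 = sum(u * v for u, v in zip(z1, z2))
    s22 = sum(v * v for v in z2)
    r1 = sum(u * y for u, y in zip(z1, resid_y))
    r2 = sum(v * y for v, y in zip(z2, resid_y))

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; theta is not identified")

    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b

# Residuals generated from theta(x) = 0.5 + 0.25 * x, without noise.
x = [0.0, 1.0, 2.0, 3.0]
resid_t = [1.0, -1.0, 2.0, -2.0]
resid_y = [(0.5 + 0.25 * xi) * ti for xi, ti in zip(x, resid_t)]
a, b = fit_theta(x, resid_t, resid_y)
```

With the fitted parameters, θ(x) estimates how the overdue rate of a client with feature x changes per unit change of the limit.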
Fig. 2 is a schematic diagram of the two-stage training of the two-stage depth model. The high-dimensional feature X passes through a shared layer composed of several fully connected layers, which extracts a more expressive dense semantic representation; this representation is used to train the limit regression model T and the overdue classification model Y in the first stage, and the regression model in the second stage. Features specific to model T are further extracted from the shared layer and then connected to a mean-squared-error loss to complete the regression fit. Similarly, features specific to model Y are extracted from the shared layer and then connected to a cross-entropy loss to complete the classification fit. The limit regression model T and the overdue classification model Y trained in the first stage each generate a prediction result, called predicted T and predicted Y respectively. The predicted T is subtracted from the real T to obtain the residual T, and the predicted Y is subtracted from the real Y to obtain the residual Y. Finally, in the second stage, a regression is fitted on the residual T and the residual Y obtained in the first stage to complete the final model training. To avoid overfitting during training, cross-fitting can be adopted: the training set is divided into K parts and the procedure loops K times, each time using the k-th part (k ∈ K) for the first-stage training and the remaining parts for the second-stage training. In this way the training data is fully used, overfitting is avoided, and the unbiasedness of the parameter estimate of θ(·) is ensured. Other training modes can of course be adopted, without limitation.
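The cross-fitting loop can be sketched as an index generator. The assignment of folds follows the description above (one part for the first stage, the remaining parts for the second stage); the function name `cross_fit_splits` and the round-robin way of forming the K parts are assumptions made for illustration.

```python
def cross_fit_splits(n_samples, k):
    """Yield (first_stage_idx, second_stage_idx) for each of the k rounds.

    Samples are dealt into k parts round-robin. In round i, part i is used
    for first-stage training and all other parts for second-stage training,
    so no sample serves both stages within the same round.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, first_stage in enumerate(folds):
        second_stage = sorted(
            j for f, fold in enumerate(folds) if f != i for j in fold
        )
        yield first_stage, second_stage

splits = list(cross_fit_splits(6, 3))
```

Each round would train model T and model Y on the first index list, compute the residuals on the second, and fit θ(·) there; how the K fitted results are combined is not specified in the text.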
It should be noted that the model structure may be slightly adjusted according to the training results obtained on different data. For example, a Dropout layer, an Early Stopping mechanism, or L2 regularization may be added as appropriate; the adjustment depends on the actual situation and is not limited herein.
Further, the trained two-stage depth model can be evaluated on its ability to distinguish the causal effect of the risk quota on the client overdue rate, so as to determine whether the training effect is sufficient for online inference. Fig. 3 is a schematic diagram of the overall training process: after the acquired raw data is preprocessed, it is divided into a training set and a test set, the test set is used for offline evaluation, and the parameters of the trained model are adjusted by feedback according to the offline evaluation result, so as to optimize the recognition accuracy of the model. In particular, the ability to distinguish the causal effect of the risk quota on the client overdue rate can be evaluated by means such as the Qini Score. The Qini Score is computed by the model on a portion of real samples and reflects the accuracy of overdue rate inference under different limits; the higher the Qini Score, the stronger the distinguishing capability of the model and, in theory, the stronger its online inference capability. Other model evaluation methods may also be used and are not described in detail herein.
In addition, to keep the two-stage depth model close to the data encountered in practical application and thus achieve more accurate evaluation, the loan features received after the online duration of the two-stage depth model reaches a threshold can be collected as accumulated features; the accumulated features are added to the sample data and step (1) is triggered.
At regular intervals, the parameters of the offline model are updated with the data accumulated online. This process is repeated, and the offline model is continuously updated with the mass of data that accumulates in practical application, so that a real-time risk quota personalized to each client can be achieved more accurately.
Corresponding to the above method embodiments, the present invention further provides a risk rating device; the risk rating device described below and the risk rating method described above may be referred to in correspondence with each other.
Referring to fig. 4, the apparatus includes the following modules:
the feature receiving unit 110 is mainly configured to receive the loan features of a client and input the loan features into the two-stage depth model, wherein the two-stage depth model comprises: a limit regression model, an overdue classification model and a potential output frame model;
the first prediction unit 120 is mainly configured to call the limit regression model to perform regression fitting according to the loan features to obtain a predicted limit, and to call the overdue classification model to perform classification fitting according to the loan features to obtain a predicted overdue rate;
the second prediction unit 130 is mainly configured to call the potential output frame model to perform unbiased inference of overdue rate risk over the target limit range according to the predicted limit and the predicted overdue rate, generating the overdue rate corresponding to each limit;
the limit determining unit 140 is mainly configured to extract, from the overdue rates, the largest overdue rate that is below the maximum overdue rate threshold, and to use the limit corresponding to that overdue rate as the risk quota.
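The selection rule applied by the limit determining unit 140 can be sketched as follows. The function name `select_risk_quota`, the dictionary mapping each candidate limit to its inferred overdue rate, the tie-breaking toward the larger limit, and the `None` result when no limit qualifies are assumptions introduced for illustration.

```python
def select_risk_quota(overdue_by_limit, max_overdue_threshold):
    """Return the limit whose overdue rate is the largest one below the threshold.

    `overdue_by_limit` maps each candidate limit to the overdue rate inferred
    for it. Returns None if every rate reaches or exceeds the threshold.
    """
    eligible = [
        (rate, limit)
        for limit, rate in overdue_by_limit.items()
        if rate < max_overdue_threshold
    ]
    if not eligible:
        return None
    # max() compares rates first; equal rates fall back to the larger limit.
    _, limit = max(eligible)
    return limit

rates = {1000: 0.01, 5000: 0.03, 10000: 0.06}
quota = select_risk_quota(rates, 0.05)
```

Here the limits 1000 and 5000 both stay below the 0.05 threshold, and 5000 is chosen because its overdue rate is the larger of the two.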
Corresponding to the above method embodiment, the embodiment of the present invention further provides a computer device; the computer device described below and the risk rating method described above may be referred to in correspondence with each other.
The computer device includes:
a memory for storing a computer program;
a processor for implementing the steps of the risk rating method of the above method embodiments when executing the computer program.
Specifically, fig. 5 is a schematic structural diagram of the computer device provided in this embodiment. The computer device may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 322 (e.g., one or more processors) and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a data processing device. Still further, the central processor 322 may be configured to communicate with the memory 332 and to execute, on the computer device 301, the series of instruction operations in the memory 332.
The computer device 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341.
The steps in the risk rating method described above may be implemented by the structure of a computer device.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a readable storage medium; the readable storage medium described below and the risk rating method described above may be referred to in correspondence with each other.
A readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the risk rating method of the above-described method embodiments.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
Those of skill in the art would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A method of risk rating, comprising:
receiving the loan characteristics of a client, and inputting the loan characteristics into a two-stage depth model; wherein the two-stage depth model comprises: a limit regression model, an overdue classification model and a potential output frame model;
calling the limit regression model to perform regression fitting according to the loan characteristics to obtain a predicted limit; calling the overdue classification model to perform classification fitting according to the loan characteristics to obtain a predicted overdue rate;
calling the potential output frame model to perform unbiased inference of overdue rate risk over a target limit range according to the predicted limit and the predicted overdue rate, so as to generate the overdue rate corresponding to each limit;
and extracting, from the overdue rates, the largest overdue rate that is below a maximum overdue rate threshold, and taking the limit corresponding to that overdue rate as a risk quota.
2. The risk rating method according to claim 1, wherein the training method of the two-stage depth model comprises:
splitting the sample data into a training sample and a test sample;
performing data preprocessing on the training sample and the test sample to be used as a training set and a test set;
inputting the training set into the limit regression model and the overdue classification model for prediction training to obtain a training limit and a training overdue rate;
and performing parameter optimization training on the potential output frame model according to the limit residual between the training limit and the corresponding real limit and the overdue residual between the training overdue rate and the corresponding real overdue state.
3. The risk rating method of claim 2, wherein the splitting sample data into training samples and test samples comprises:
splitting the sample data, according to a time window, into a training sample and a test sample that do not overlap in time.
4. The risk rating method of claim 2, wherein the pre-processing of the data for the training samples and the test samples comprises:
deleting features whose proportion of null values exceeds a first threshold;
adjusting sample attribution in the training set and the test set until the difference between the feature distribution of the training set and the feature distribution of the test set is smaller than a second threshold;
and deleting features whose perturbation influence degree is smaller than a third threshold.
5. The risk rating method of claim 2, further comprising:
evaluating the two-stage depth model on its ability to distinguish the causal effect of the risk quota on the client overdue rate.
6. The risk rating method of claim 2, further comprising:
if the online duration of the two-stage depth model reaches a threshold value, collecting the received loan characteristics as accumulated features;
and adding the accumulated features to the sample data, and executing the step of splitting the sample data into a training sample and a test sample.
7. The risk rating method of claim 1, wherein the receiving the loan characteristics of a client comprises: receiving high-dimensional credit investigation characteristics, user portrait characteristics and platform interaction characteristics of the client.
8. A risk rating device, comprising:
the characteristic receiving unit is used for receiving the loan characteristics of a client and inputting the loan characteristics into the two-stage depth model; wherein the two-stage depth model comprises: a limit regression model, an overdue classification model and a potential output frame model;
the first prediction unit is used for calling the limit regression model to perform regression fitting according to the loan characteristics to obtain a predicted limit; calling the overdue classification model to perform classification fitting according to the loan characteristics to obtain a predicted overdue rate;
the second prediction unit is used for calling the potential output frame model to perform unbiased inference of overdue rate risk over a target limit range according to the predicted limit and the predicted overdue rate, so as to generate the overdue rate corresponding to each limit;
and the limit determining unit is used for extracting, from the overdue rates, the largest overdue rate that is below a maximum overdue rate threshold, and taking the limit corresponding to that overdue rate as a risk quota.
9. A computer device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the risk rating method according to any of claims 1 to 7 when executing the computer program.
10. A readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the risk rating method according to any of claims 1 to 7.
CN202210344004.6A 2022-03-31 2022-03-31 Risk quota method, device, equipment and readable storage medium Pending CN114820159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210344004.6A CN114820159A (en) 2022-03-31 2022-03-31 Risk quota method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210344004.6A CN114820159A (en) 2022-03-31 2022-03-31 Risk quota method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114820159A true CN114820159A (en) 2022-07-29

Family

ID=82532791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210344004.6A Pending CN114820159A (en) 2022-03-31 2022-03-31 Risk quota method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN114820159A (en)

Similar Documents

Publication Publication Date Title
US20180260891A1 (en) Systems and methods for generating and using optimized ensemble models
CN109389494B (en) Loan fraud detection model training method, loan fraud detection method and device
KR102009309B1 (en) Management automation system for financial products and management automation method using the same
CN111340616B (en) Method, device, equipment and medium for approving online loan
CN111324862A (en) Method and system for monitoring behavior in loan
US11804302B2 (en) Supervised machine learning-based modeling of sensitivities to potential disruptions
Zuev et al. Machine learning in IT service management
CN112700324A (en) User loan default prediction method based on combination of Catboost and restricted Boltzmann machine
CN110737641A (en) Construction method, device and system of confidence and audit models
Eddy et al. Credit scoring models: Techniques and issues
CN115249081A (en) Object type prediction method and device, computer equipment and storage medium
CN114638695A (en) Credit evaluation method, device, equipment and medium
CN115293598A (en) Enterprise financial management risk identification method based on financial big data
CN117764706A (en) Risk identification method and device and electronic equipment
CN112348685A (en) Credit scoring method, device, equipment and storage medium
CN114820159A (en) Risk quota method, device, equipment and readable storage medium
CN113421154B (en) Credit risk assessment method and system based on control chart
CN114612231A (en) Stock quantitative trading method and device, terminal device and readable storage medium
CN115641198A (en) User operation method, device, electronic equipment and storage medium
CN117455681A (en) Service risk prediction method and device
CN114612239A (en) Stock public opinion monitoring and wind control system based on algorithm, big data and artificial intelligence
Suganya et al. Stock price prediction using tech news based soft computing approach
US11004156B2 (en) Method and system for predicting and indexing probability of financial stress
CN111951099A (en) Credit card issuing model and application method thereof
CN112950392A (en) Information display method, posterior information determination method and device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination