CN115186489A - Scoring modeling method and system based on rejection inference using People's Bank of China (PBOC) credit report information - Google Patents

Publication number
CN115186489A
CN115186489A CN202210821327.XA CN202210821327A CN115186489A CN 115186489 A CN115186489 A CN 115186489A CN 202210821327 A CN202210821327 A CN 202210821327A CN 115186489 A CN115186489 A CN 115186489A
Authority
CN
China
Prior art keywords
sample
rejection
credit
samples
bad
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210821327.XA
Other languages
Chinese (zh)
Inventor
何再德
王云清
吴凡
汪涣涣
刘江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boc Consumer Finance Co ltd
Original Assignee
Boc Consumer Finance Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of financial technology and discloses a scoring modeling method and system based on rejection inference using People's Bank of China (PBOC) credit report information. The method comprises: obtaining a KGB sample and a rejected sample, the KGB sample being a sample with known good/bad labels; within the rejected sample, inferring each user's good/bad label by analyzing the post-loan performance data that the PBOC credit report collects from different institutions, to obtain an IGB sample; and building a scorecard model from the AGB sample, which comprises the KGB sample and the IGB sample. The method is based on target reclassification, takes both the business scenario and the behavior of the data into account, and uses the historical information in the PBOC credit report to fill in each user's post-loan performance, so that good/bad labels can be inferred for rejected samples and the application scorecard can be developed.

Description

Scoring modeling method and system based on rejection inference using People's Bank of China (PBOC) credit report information
Technical Field
The invention relates to the field of financial technology, and in particular to a scoring modeling method and a scoring modeling system based on rejection inference using PBOC credit report information.
Background
In recent years, the rapid development of internet technology has greatly promoted the internet finance industry and the construction of inclusive finance, but it has not changed the core of finance, which is risk management. The core logic of credit risk management is to evaluate a customer's willingness and ability to repay, and both questions come down to data. Internet technology allows a financial institution to call third-party data vendors in real time and obtain credit data as a basis for real-time risk decisions.
The credit life cycle consists mainly of the pre-loan, in-loan, and post-loan stages, and the scorecard models for these three stages are the core of the strategy system for the whole life cycle. However, developing a scorecard only on approved samples with known post-loan good/bad performance (KGB) is at odds with the actual application scenario, in which the strategy must evaluate the risk level of all applicants.
Current mainstream rejection inference techniques are mainly based on binary classification algorithms, survival analysis algorithms, the EM (unsupervised) algorithm, and the like. In different scenarios, the product's approval rate and the availability of features that can effectively evaluate the rejected samples strongly affect the quality of the rejection inference, and therefore the quality of the resulting scorecard.
Disclosure of Invention
The invention aims to provide a scoring modeling method and a scoring modeling system based on rejection inference using PBOC credit report information, in order to solve the problems described in the background.
To achieve this purpose, the invention provides the following technical solution:
a scoring modeling method based on rejection inference using PBOC credit report information, the method comprising:
obtaining a KGB sample and a rejected sample; the KGB sample is a sample with known good/bad labels;
within the rejected sample, inferring the user's good/bad label by analyzing the historical information in the PBOC credit report, to obtain an IGB sample; the IGB sample is a rejected sample carrying inferred good/bad labels;
building a scorecard model from the AGB sample; the AGB sample comprises the KGB sample and the IGB sample.
As a further limitation of the technical solution of the present invention, the step of obtaining the KGB sample and the rejected sample comprises:
acquiring product application data within a preset period;
performing vintage analysis and roll rate analysis on customers who were approved and have drawn on their credit line, and determining the target variable;
obtaining the KGB sample and the rejected sample based on the target variable.
As a further limitation of the technical solution of the present invention, the step of inferring, within the rejected sample, the user's good/bad label by analyzing the historical information in the PBOC credit report to obtain the IGB sample comprises:
acquiring the PBOC credit report queried for the sample at the application stage;
reading the customer's post-loan repayment data from the PBOC credit report;
inferring the good/bad label of each rejected sample from the PBOC credit report data and the post-loan repayment data, to obtain the IGB sample.
As a further limitation of the technical solution of the present invention, the step of inferring the good/bad label of the rejected sample from the PBOC credit report data and the post-loan repayment data to obtain the IGB sample comprises:
obtaining, from the PBOC credit report of each rejected customer, the post-loan performance for which the loan age has reached 6 months and the overdue duration has reached 30 days, and establishing an overdue evaluation rule set;
marking customers who trigger the rule set as bad, and customers who do not trigger the overdue criteria as good;
attaching the good/bad labels to the rejected samples according to the marking result, to obtain the IGB sample.
As a further limitation of the technical solution of the present invention, the step of building the scorecard model from the AGB sample comprises:
modeling the AGB sample with the machine learning algorithm LightGBM to obtain the scorecard model.
As a further limitation of the technical solution of the present invention, the rejection reasons of the rejected samples include anti-fraud score rejection, PBOC information admission rejection, and credit score rejection; when the rejected samples are obtained, the samples rejected by the anti-fraud score are excluded.
The technical solution of the invention also provides a scoring modeling system based on rejection inference using PBOC credit report information, the system comprising:
a sample acquisition module, used to obtain a KGB sample and a rejected sample; the KGB sample is a sample with known good/bad labels;
a sample labeling module, used to infer, within the rejected sample, the user's good/bad label by analyzing the historical information in the PBOC credit report, to obtain an IGB sample; the IGB sample is a rejected sample carrying inferred good/bad labels;
a model building module, used to build a scorecard model from the AGB sample; the AGB sample comprises the KGB sample and the IGB sample.
As a further limitation of the technical solution of the present invention, the sample acquisition module comprises:
a data acquisition unit, used to acquire product application data within a preset period;
a target variable determining unit, used to perform vintage analysis and roll rate analysis on customers who were approved and have drawn on their credit line, and to determine the target variable;
a sample generation unit, used to obtain the KGB sample and the rejected sample based on the target variable.
As a further limitation of the technical solution of the present invention, the sample labeling module comprises:
a PBOC data query unit, used to acquire the PBOC credit report queried for the sample at the application stage;
a repayment data extraction unit, used to read the customer's post-loan repayment data from the PBOC credit report;
a processing execution unit, used to infer the good/bad label of each rejected sample from the PBOC credit report data and the post-loan repayment data, to obtain the IGB sample.
As a further limitation of the technical solution of the present invention, the processing execution unit further comprises:
a rule set determining subunit, used to obtain, from the PBOC credit report of each rejected customer, the post-loan performance for which the loan age has reached 6 months and the overdue duration has reached 30 days, and to establish an overdue evaluation rule set;
a marking subunit, used to mark customers who trigger the rule set as bad and customers who do not trigger the overdue criteria as good;
a label attaching subunit, used to attach the good/bad labels to the rejected samples according to the marking result, to obtain the IGB sample.
Compared with the prior art, the beneficial effects of the invention are as follows: the invention is based on the target reclassification method, closely combines business knowledge with the behavior of the data, makes full use of the performance data that the PBOC collects from different institutions to complete the user's post-loan performance, and thereby infers good/bad labels for the rejected samples.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
Fig. 1 is a flow chart of the scoring modeling method based on rejection inference using PBOC credit report information.
Fig. 2 is a first sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information.
Fig. 3 is a second sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information.
Fig. 4 is a third sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information.
Fig. 5 is a block diagram of the structure of the scoring modeling system based on rejection inference using PBOC credit report information.
Fig. 6 is a block diagram of the structure of the sample acquisition module in the scoring modeling system.
Fig. 7 is a block diagram of the structure of the sample labeling module in the scoring modeling system.
Fig. 8 is a block diagram of the structure of the processing execution unit in the sample labeling module.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The credit life cycle consists mainly of the pre-loan, in-loan, and post-loan stages, and the scorecard models for these three stages are the core of the strategy system for the whole life cycle. Developing a scorecard only on approved samples with known post-loan good/bad performance (KGB) is at odds with the actual application scenario, in which the strategy must evaluate the risk level of all samples.
Rejection inference studies how to use rejected samples to correct the bias that arises when a model is applied. Common rejection inference techniques include:
1. sample re-weighting or augmentation (Re-weighting/Augmentation);
2. reclassification (Re-classification);
3. fuzzy augmentation or parceling (Fuzzy Augmentation/Parceling);
4. other methods, such as obtaining post-loan performance data for the full sample, developing a scorecard, and evaluating the effect of the policy's preset approve/reject decisions.
The invention is based on the second approach. For the rejected samples, it uses PBOC credit report data to evaluate each user's repayment behavior at other institutions and infers the user's good/bad label. The model is then trained on the full (All Good Bad, AGB) sample, which combines the known good/bad (KGB) sample with the rejected samples that carry inferred good/bad labels (IGB). The method mainly addresses two problems. First, it infers the good/bad status of rejected samples from PBOC credit report information so that the application scorecard can be developed. Second, because the performance of rejected samples is inferred directly from PBOC credit report information, the influence of rejections caused by third-party data is removed, which makes it possible to evaluate third-party data objectively on a uniform scale.
The concepts of the KGB sample, the IGB rejected sample, and the AGB sample mentioned above are defined as follows:
Known good/bad (Known Good Bad, KGB) sample: the set of samples approved by the admission model; a model trained on this sample is called a KGB model.
Inferred good/bad (Inferred Good Bad, IGB) rejected sample: the set of samples rejected by the admission model; these samples have no observed label and are not normally used to train a model.
Full (All Good Bad, AGB) sample: the full sample set containing both the KGB and the IGB parts.
Example 1
Fig. 1 is a flow chart of the scoring modeling method based on rejection inference using PBOC credit report information. In an embodiment of the present invention, the method comprises:
step S100: obtaining a KGB sample and a rejected sample; the KGB sample is a sample with known good/bad labels;
step S200: within the rejected sample, inferring the user's good/bad label by analyzing the historical information in the PBOC credit report, to obtain an IGB sample; the IGB sample is a rejected sample carrying inferred good/bad labels;
step S300: building a scorecard model from the AGB sample; the AGB sample comprises the KGB sample and the IGB sample.
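As a minimal sketch of how steps S100 to S300 fit together, the following Python assembles an AGB sample from a KGB sample and the rejected samples. The record layout, the field names (`bad`, `source`, `overdue_elsewhere`), and the toy labeling function are assumptions made for illustration; they do not come from the patent, which does not specify a data format.

```python
from typing import Callable, Dict, List, Optional

Sample = Dict[str, object]  # hypothetical record layout, not defined in the patent


def build_agb(
    kgb: List[Sample],
    rejected: List[Sample],
    infer_label: Callable[[Sample], Optional[int]],
) -> List[Sample]:
    """Return AGB = KGB + IGB, where IGB is the labelled part of the rejected samples."""
    igb = []
    for sample in rejected:
        label = infer_label(sample)  # 1 = bad, 0 = good, None = cannot be inferred
        if label is None:
            continue  # e.g. no credit report was queried for this applicant
        igb.append({**sample, "bad": label, "source": "IGB"})
    kgb_tagged = [{**sample, "source": "KGB"} for sample in kgb]
    return kgb_tagged + igb


def toy_labeler(sample: Sample) -> Optional[int]:
    """Stand-in for the report-based rule set; purely illustrative."""
    flag = sample["overdue_elsewhere"]
    return None if flag is None else int(flag)


kgb = [{"id": "a1", "bad": 0}, {"id": "a2", "bad": 1}]
rejected = [
    {"id": "r1", "overdue_elsewhere": True},
    {"id": "r2", "overdue_elsewhere": False},
    {"id": "r3", "overdue_elsewhere": None},
]
agb = build_agb(kgb, rejected, toy_labeler)
```

The scorecard model of step S300 would then be trained on `agb` using the `bad` field as the target.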
Rejection inference essentially addresses the sample bias between the KGB sample and the population to which the model is applied, that is, the problem that the dependent variable is missing for part of the samples. Missing data can be classified as missing completely at random, missing at random, and missing not at random. Because the admission policy rejects the higher-risk customer segments, the default probability of the rejected samples is higher than the bad rate of the samples actually approved, so the missing post-loan performance of policy-rejected samples in a credit business can be classified as missing not at random. To correct this sample bias, a rejection inference technique is needed to infer the good/bad status of the rejected samples so that the scorecard model can be developed.
Inferring labels for rejected samples with an algorithm is currently the most common approach; once the good/bad status of the rejected samples has been inferred, the AGB sample is obtained and the scorecard is developed on it. The common rejection inference techniques work as follows. First, the re-weighting or augmentation method: a model is trained on the full sample for a certain period, with the target variable defined as whether the application was approved, and is used to score each sample with an approval probability. Either the reciprocal of the approval probability, or the reciprocal of the approval rate of each bin after binning, is used as a weight so as to approximate the full sample, and the scorecard model is trained on the observed "good" and "bad" samples with these weights. Second, the target reclassification method: the observed "bad" samples are treated as rejected samples, an approval-rate model is trained to predict the approval probability of all samples, a threshold is set to label part of the rejected samples as "bad", and the AGB sample obtained as KGB + IGB is used to train the model and develop the application scorecard. Third, fuzzy augmentation (parceling): the rejected samples are scored with a model trained on KGB to obtain P(G) and P(B), which are used as weights for the new samples when training the model. Other methods include obtaining full-sample data through stress testing and various extensions of the techniques above.
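The per-bin variant of the re-weighting method described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the approval probabilities are taken as already produced by some approval model, the equal-width binning and the function names are hypothetical, and the resulting weights would be applied to the approved samples in each bin.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bin_index(probability: float, n_bins: int) -> int:
    """Equal-width bin for a probability in [0, 1] (assumed binning scheme)."""
    return min(int(probability * n_bins), n_bins - 1)


def inverse_pass_rate_weights(
    applicants: List[Tuple[float, bool]], n_bins: int = 5
) -> Dict[int, float]:
    """Map each bin to 1 / (approval rate of that bin).

    applicants holds (approval_probability, approved) pairs for ALL applicants.
    Bins with no approved applicant are skipped because no weight is defined.
    """
    total: Dict[int, int] = defaultdict(int)
    passed: Dict[int, int] = defaultdict(int)
    for probability, approved in applicants:
        b = bin_index(probability, n_bins)
        total[b] += 1
        passed[b] += int(approved)
    return {b: total[b] / passed[b] for b in total if passed[b] > 0}


applicants = [
    (0.10, False), (0.15, False), (0.12, True), (0.18, False),
    (0.90, True), (0.95, True), (0.85, False), (0.92, True),
]
weights = inverse_pass_rate_weights(applicants, n_bins=5)
```

In the low-probability bin only one of four applicants was approved, so that approved sample stands in for four applicants; in the high-probability bin each approved sample stands in for about 1.33 applicants.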
The approach has several advantages. First, the Credit Reference Center of the People's Bank of China is the most authoritative credit reporting institution in China, and the PBOC credit report collects financial data from most financial institutions, including all major banks, licensed consumer finance companies, most internet finance institutions, and microloan companies. The user's good/bad label can be computed directly from the report data, so there is no model accuracy issue and the prediction bias that a model would introduce is reduced. Second, the complexity of scorecard development is reduced. At present, scorecard development generally requires two models, with the inference done by a model built on KGB or on the approval decision. Here, the IGB labels are computed from PBOC credit report data using the same logic that defines good and bad samples, and the model is developed directly on IGB and AGB. Finally, the method makes it possible to evaluate objectively the effect of data other than the PBOC credit report, including third-party data sources and data accumulated on the institution's own platform.
The method effectively handles two types of scenario: first, customer populations that are biased because of rejections in a serial decision flow; second, features that already take part in the decision flow. In practice, a typical risk strategy uses a scorecard developed on PBOC credit report data or the platform's own data to segment applicants, and then applies third-party scores in serial or in parallel. Each node in a serial decision flow decides only on the customers that passed the previous node, so the population seen by each node is biased relative to the applicant population, and the default probability differs from one decision node i to the next. In addition, for a feature that already takes part in the decision, evaluating it only on the samples that remain after the decision understates its performance. Inferring the performance of the rejected samples from PBOC data restores the pre-decision population, so the effect of the feature can be evaluated objectively on a uniform scale.
Fig. 2 is a first sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information, in which the step of obtaining the KGB sample and the rejected sample comprises steps S101 to S103:
step S101: acquiring product application data within a preset period;
step S102: performing vintage analysis and roll rate analysis on customers who were approved and have drawn on their credit line, and determining the target variable;
step S103: obtaining the KGB sample and the rejected sample based on the target variable.
In one example of the technical solution of the invention, application data for one of the company's products is collected over a preset period; vintage analysis and roll rate analysis are performed on customers who were approved and have drawn on their credit line; and the target variable is defined with the scale of the business taken into account. After sampling, there are 10,000 application records, of which 7,000 are approved applicants with known post-loan performance (KGB) and 3,000 are rejected samples (rejected by strategies developed on PBOC credit data, strategies developed on third-party data, combined strategies, and so on). For all 10,000 records, the third-party data returned at the application stage includes multi-lender borrowing data, behavioral data, score data, user profiles, and so on.
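The patent does not describe how the roll rate analysis is computed. As a rough sketch of the general technique, the following Python computes, for each delinquency bucket, the share of accounts that move to a worse bucket between two observation points; a bucket from which most accounts keep deteriorating is a natural candidate for the "bad" definition of the target variable. The bucket names and the toy transitions are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple

BUCKETS = ["M0", "M1", "M2", "M3+"]  # hypothetical delinquency buckets, best to worst


def roll_rates(transitions: List[Tuple[str, str]]) -> Dict[str, float]:
    """For each starting bucket, the share of accounts that roll to a worse bucket.

    transitions holds (bucket_at_first_observation, bucket_at_next_observation).
    """
    started: Counter = Counter()
    worsened: Counter = Counter()
    for before, after in transitions:
        started[before] += 1
        if BUCKETS.index(after) > BUCKETS.index(before):
            worsened[before] += 1
    return {b: worsened[b] / started[b] for b in BUCKETS if started[b]}


transitions = (
    [("M0", "M0")] * 8 + [("M0", "M1")] * 2
    + [("M1", "M0")] * 3 + [("M1", "M2")] * 1
    + [("M2", "M1")] * 1 + [("M2", "M3+")] * 3
)
rates = roll_rates(transitions)
```

In this toy data, accounts in M2 roll forward far more often than accounts in M0 or M1, which would argue for placing the bad threshold at or before M2.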
Fig. 3 is a second sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information, in which the step of inferring, within the rejected sample, the user's good/bad label by analyzing the post-loan performance data that the PBOC credit report collects from different institutions, to obtain the IGB sample, comprises steps S201 to S203:
step S201: acquiring the PBOC credit report queried for the sample at the application stage;
step S202: reading the customer's post-loan repayment data from the PBOC credit report;
step S203: inferring the good/bad label of each rejected sample from the PBOC credit report data and the post-loan repayment data, to obtain the IGB sample.
Fig. 4 is a third sub-flow diagram of the scoring modeling method based on rejection inference using PBOC credit report information, in which the step of labeling the rejected samples based on the post-loan performance evaluation mechanism to obtain the IGB sample comprises:
step S2031: obtaining, from the PBOC credit report of each rejected customer, the post-loan performance for which the loan age has reached 6 months and the overdue duration has reached 30 days, and establishing an overdue evaluation rule set;
step S2032: marking customers who trigger the rule set as bad, and customers who do not trigger the overdue criteria as good;
step S2033: attaching the good/bad labels to the rejected samples according to the marking result, to obtain the IGB sample.
In one example of the technical solution, the PBOC credit report data queried for each sample at the application stage is used to define a post-loan performance evaluation mechanism based on the customer's post-loan repayment data at other institutions. The mechanism focuses on post-loan performance in the rejected customer's PBOC credit report where the loan age is 6 months or more and the overdue duration is 30 days or more, and an overdue evaluation rule set is built on it. Customers who trigger the rule set are marked as bad, and customers who do not trigger the overdue criteria are marked as good (anything not bad is good), which gives the rejected samples their good/bad labels and yields the IGB sample. In practice, the proportion of bad samples in the IGB sample is about 2 times that in the KGB sample.
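The overdue evaluation rule described above can be sketched in a few lines of Python. This is one possible reading, not the patent's rule set: the `LoanRecord` fields are hypothetical, the thresholds are read as "6 months or more" and "30 days or more", and a customer is treated as triggering the rule when any single loan meets both conditions. Everything that does not trigger the rule is labeled good, following the "anything not bad is good" convention in the text.

```python
from dataclasses import dataclass
from typing import List

MIN_LOAN_AGE_MONTHS = 6    # loan age threshold taken from the text
BAD_DAYS_PAST_DUE = 30     # overdue threshold taken from the text


@dataclass
class LoanRecord:
    """One loan at another institution, as read from the credit report (assumed layout)."""
    months_on_book: int      # months since the loan was issued
    max_days_past_due: int   # worst overdue duration observed on the loan


def infer_label(loans: List[LoanRecord]) -> int:
    """Return 1 (bad) if any sufficiently aged loan was seriously overdue, else 0 (good)."""
    triggered = any(
        loan.months_on_book >= MIN_LOAN_AGE_MONTHS
        and loan.max_days_past_due >= BAD_DAYS_PAST_DUE
        for loan in loans
    )
    return 1 if triggered else 0


label_bad = infer_label([LoanRecord(8, 45), LoanRecord(12, 0)])
label_young = infer_label([LoanRecord(3, 60)])   # loan too young to count under this reading
label_clean = infer_label([LoanRecord(12, 10)])
```

Whether young loans with serious delinquency should instead be excluded from labeling altogether is a design choice the text leaves open.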
As a preferred embodiment of the technical solution of the present invention, the step of building scorecard models on the KGB sample and the AGB sample respectively comprises:
modeling the AGB sample with the machine learning algorithm LightGBM to obtain the scorecard model.
Scorecard models are built on KGB and AGB respectively, using both traditional logistic regression and the mainstream machine learning algorithm LightGBM, in order to evaluate the effect of the rejection inference technique. Traditional logistic regression is simple and easy to understand, gives good model interpretability, and shows clearly how each feature weight affects the final result, so it is widely used for scorecard modeling in credit businesses. LightGBM is a framework that implements the GBDT algorithm; it supports efficient parallel training and offers faster training, lower memory consumption, higher accuracy, distributed support, and the ability to process massive data quickly. It has broad application prospects in credit risk research.
As a preferred embodiment of the technical solution of the present invention, the rejection reasons of the rejected samples include anti-fraud score rejection, PBOC information admission rejection, and credit score rejection; when the rejected samples are obtained, the samples rejected by the anti-fraud score are excluded.
In one example of the technical solution of the invention, the rejected samples are drawn from the rejection reasons that the production process plans to optimize, which fall into three groups: anti-fraud score rejection, PBOC information admission rejection, and credit score rejection. Because of how the production process is deployed, the PBOC credit report is not called for samples rejected by the anti-fraud score, so the rejection inference method cannot be applied to them. Of the 3,000 rejected samples, 2,500 actually have PBOC data, and the derived data is generated from them; that is, the actual IGB sample contains 2,500 records, of which 1,500 are PBOC information admission rejections and 1,000 are credit score rejections. When good/bad labels are assigned to the rejected samples, the inference relies on the overdue information in the customer's PBOC credit report, which overlaps heavily with the PBOC admission rejection rules used in production. As a result, the proportion of inferred bad samples is far higher among PBOC admission rejections than among credit score rejections (the latter have already passed the PBOC-related strategies). Overall, the rejected samples have a high proportion of bad samples, which is consistent with production practice: in actual credit risk control, the proportion of bad samples among rejected applicants is usually significantly higher than among approved ones.
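The sample counts in this example can be checked with a short filter over the rejection reasons. The reason codes below are hypothetical, and attributing the 500 samples without credit report data to anti-fraud score rejection is an assumption: the text gives 3,000 rejected samples and 2,500 with report data, and the 500 is the difference between the two.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical reason codes; only these two groups have a credit report to infer from.
REASONS_WITH_REPORT = {"pboc_admission", "credit_score"}


def select_inferable(rejected: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rejected samples whose rejection reason implies a report was queried."""
    return [r for r in rejected if r["reason"] in REASONS_WITH_REPORT]


rejected = (
    [{"reason": "anti_fraud"}] * 500          # assumed: no report called for these
    + [{"reason": "pboc_admission"}] * 1500
    + [{"reason": "credit_score"}] * 1000
)
inferable = select_inferable(rejected)
counts = Counter(r["reason"] for r in inferable)
```

The filtered list is what the labeling rules would be applied to in order to produce the IGB sample.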
To make the beneficial effects of the technical solution easier to understand, the feature effect and the model effect are described below:
1. Feature effect:
In rejection inference practice, the utility of several external credit scores is evaluated using both the sample with observed performance and the rejection-inferred sample. Introducing rejection inference addresses the sample bias caused by strategy screening at the application stage when the scores are evaluated, and restores an objective, uniform-scale evaluation of third-party data. The KS value of each score is used as the evaluation criterion: the KS computed on the disbursed sample alone is compared with the KS computed after the rejection-inferred samples are added (the scores are selected from multiple test scores of the same organization). The comparison is as follows:
TABLE 1 KS comparison of test scores across different samples

Score name   KGB     AGB     Change in KS (absolute, AGB - KGB)
Score A      0.236   0.247    0.01
Score B      0.237   0.248    0.01
Score C      0.344   0.351    0.01
Score D      0.297   0.290   -0.01
Score E      0.230   0.208   -0.02
Score F      0.177   0.145   -0.03
Score G      0.256   0.217   -0.04
Score H      0.366   0.312   -0.05
Score I      0.369   0.280   -0.09
Score J      0.454   0.353   -0.10
It can be seen that after the IGB sample is added, the performance of the scores changes to different degrees, depending on the actual rejection ratio of each score in the risk decision and on its position in the decision order. The IGB sample obtained with this patent allows the scores to be evaluated objectively and comprehensively. With KS computed per single variable, Scores I and J show the largest KS attenuation, which is consistent with their use in the actual decision process.
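For reference, the KS value of a single score can be computed as the largest gap between the cumulative distributions of the score among bad and among good samples. The sketch below is a plain-Python illustration of that standard definition; the function name and the toy data are assumptions, and the patent does not state which implementation was used.

```python
from typing import Sequence


def ks_statistic(scores: Sequence[float], bad: Sequence[int]) -> float:
    """Maximum absolute gap between the bad and good cumulative score distributions.

    bad[i] is 1 for a bad sample and 0 for a good sample.
    """
    n_bad = sum(bad)
    n_good = len(bad) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("KS needs at least one good and one bad sample")
    pairs = sorted(zip(scores, bad))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every sample tied at this score before comparing the two CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            cum_bad += pairs[j][1]
            cum_good += 1 - pairs[j][1]
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


scores = [300, 350, 400, 450, 500, 550, 600, 650]
bad = [1, 1, 0, 1, 0, 0, 1, 0]
ks_value = ks_statistic(scores, bad)
```

Running the same function once on the KGB sample and once on the AGB sample gives the two columns compared in Table 1.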
2. The model effect is as follows:
Two algorithms are used to evaluate the model effect: a linear model (logistic regression, LR) and a tree model (LightGBM). The variables used by the tree model are kept consistent with those used by LR. Both are trained on the KGB sample and on the AGB sample (obtained with the rejection inference technique of the invention), and the LightGBM parameters are kept essentially the same for AGB and KGB. The effect is evaluated with the conventional model metrics AUC (Area Under the Curve of the Receiver Operating Characteristic) and KS (Kolmogorov-Smirnov) (see the results in Table 2). The conclusions are as follows:
1. Comparison of model effect between the AGB and KGB samples. With logistic regression, AGB and KGB perform fairly consistently; with LightGBM, performance is higher on KGB. For the model trained on AGB, the rank ordering across 10 bins is essentially the same on the IGB sample and on the KGB sample.
2. Comparison of model stability between the AGB and KGB samples. AUC and KS on the out-of-time validation set and the training set are more stable on AGB than on KGB. Analysis of the actual decision process shows that a rejection rule based on a certain score was added during the out-of-time period; rejection inference fills in the bad-sample information and keeps the sample space consistent, which is why the out-of-time validation and training results are more stable on AGB.
TABLE 2 Effect comparison of rejection inference
[Table 2 is provided as an image in the original publication: Figure BDA0003744598980000111]
Note: KGB refers to the approved samples with known "good" and "bad" labels;
AGB refers to the "good" and "bad" samples obtained by combining IGB and KGB.
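The stability comparison summarized for Table 2 rests on computing the same metric on the training set and on the cross-period validation set and looking at the gap. A minimal sketch of the AUC part follows; the function name `auc`, the pairwise formulation, and all toy numbers are assumptions for illustration and do not reproduce the values in Table 2.

```python
from typing import Sequence


def auc(risk_scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen bad sample receives a
    higher risk score than a randomly chosen good sample (ties count 0.5).

    labels: 1 = bad, 0 = good. The pairwise loop is O(n^2), which is fine
    for a small illustration but not for production-sized samples.
    """
    bad = [s for s, y in zip(risk_scores, labels) if y == 1]
    good = [s for s, y in zip(risk_scores, labels) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one good and one bad sample")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


# Hypothetical predicted bad probabilities on a training set and on a
# cross-period (out-of-time) validation set.
train_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
train_labels = [1, 1, 0, 1, 0, 0, 0]
oot_scores = [0.8, 0.6, 0.5, 0.4, 0.3, 0.1]
oot_labels = [1, 0, 1, 0, 0, 0]

train_auc = auc(train_scores, train_labels)
oot_auc = auc(oot_scores, oot_labels)
# A smaller gap between the two sets indicates a more stable model.
print(f"train AUC {train_auc:.3f}, cross-period AUC {oot_auc:.3f}, gap {abs(train_auc - oot_auc):.3f}")
```

Running the same comparison once for a model trained on KGB and once for a model trained on AGB gives the two gaps that the stability conclusion compares.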
Example 2
Fig. 5 is a block diagram of the structure of a scoring modeling system based on the People's Bank credit-report rejection inference technique. In an embodiment of the present invention, the system includes:
a sample obtaining module 11, configured to obtain a KGB sample and a rejection sample; the KGB sample is a sample containing known good and bad labels;
a sample marking module 12, configured to infer the good or bad label of each user in the rejection sample by analyzing the history information in the People's Bank credit report, so as to obtain an IGB sample; the IGB sample is a rejection sample carrying an inferred good or bad label;
a model establishing module 13, configured to establish a scorecard model from the AGB sample; the AGB sample includes the KGB sample and the IGB sample.
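The data flow between modules 11, 12 and 13 can be sketched as follows. The `Sample` class, the function names, and the choice to leave out rejected customers for whom no label can be inferred are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    customer_id: str
    approved: bool
    label: Optional[int] = None  # 1 = bad, 0 = good, None = not yet known
    source: str = ""             # "KGB" or "IGB" once the sample is labelled


def acquire_samples(apps: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Module 11: split applications into KGB samples and rejection samples."""
    kgb = [replace(a, source="KGB") for a in apps if a.approved and a.label is not None]
    rejects = [a for a in apps if not a.approved]
    return kgb, rejects


def label_rejects(rejects: List[Sample],
                  infer_label: Callable[[Sample], Optional[int]]) -> List[Sample]:
    """Module 12: attach an inferred label to each rejection sample (IGB)."""
    igb = []
    for r in rejects:
        y = infer_label(r)
        if y is not None:  # assumption: samples without an inferable label are skipped
            igb.append(replace(r, label=y, source="IGB"))
    return igb


def build_agb(kgb: List[Sample], igb: List[Sample]) -> List[Sample]:
    """Input to module 13: AGB is the union of KGB and IGB."""
    return kgb + igb


# Hypothetical applications and hypothetical inferred labels.
apps = [
    Sample("c1", approved=True, label=0),
    Sample("c2", approved=True, label=1),
    Sample("c3", approved=False),
    Sample("c4", approved=False),
    Sample("c5", approved=False),
]
inferred = {"c3": 1, "c4": 0}  # c5 has no usable credit history in this toy example

kgb, rejects = acquire_samples(apps)
igb = label_rejects(rejects, lambda s: inferred.get(s.customer_id))
agb = build_agb(kgb, igb)
print([(s.customer_id, s.label, s.source) for s in agb])
```

Keeping a `source` field on each sample makes it possible to report metrics on KGB, IGB and AGB separately after the scorecard model has been trained.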
Fig. 6 is a block diagram of the sample obtaining module 11 in the scoring modeling system based on the People's Bank credit-report rejection inference technique, where the sample obtaining module 11 includes:
a data acquiring unit 111, configured to acquire product application data within a preset period;
a target variable determination unit 112, configured to perform vintage analysis and roll rate analysis on customers who were approved for credit and have drawn on their credit line, so as to determine a target variable;
a sample generating unit 113, configured to obtain the KGB sample and the rejection sample based on the target variable.
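The patent does not spell out how the roll rate analysis in unit 112 is carried out. As a hedged illustration only, one common form counts how accounts move between delinquency buckets from one observation month to the next; the bucket encoding, the function names, and the toy transitions below are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed bucket encoding: 0 = current, 1 = 1-29 days past due,
# 2 = 30-59 days past due, 3 = 60+ days past due.


def roll_rates(transitions: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """Return, for each starting bucket, the share of accounts ending in each bucket."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for prev_bucket, next_bucket in transitions:
        counts[prev_bucket][next_bucket] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


def worsen_rate(rates: Dict[int, Dict[int, float]], bucket: int) -> float:
    """Share of accounts in `bucket` that roll forward to a worse bucket."""
    return sum(p for nxt, p in rates.get(bucket, {}).items() if nxt > bucket)


# Hypothetical (bucket this month, bucket next month) pairs.
transitions = [
    (0, 0), (0, 0), (0, 1), (0, 0),
    (1, 0), (1, 2),
    (2, 3), (2, 3), (2, 3), (2, 1),
]
rates = roll_rates(transitions)
for bucket in sorted(rates):
    print(bucket, f"{worsen_rate(rates, bucket):.2f}")
```

A bucket from which most accounts keep rolling forward is a natural candidate for the "bad" definition of the target variable; the threshold used to make that call is a modelling choice the patent leaves open.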
Fig. 7 is a block diagram of the sample marking module 12 in the scoring modeling system based on the People's Bank credit-report rejection inference technique, where the sample marking module 12 includes:
a People's Bank data query unit 121, configured to obtain the People's Bank credit report that was queried for the sample at the application stage;
a repayment data extraction unit 122, configured to read the customer's post-loan repayment data from the People's Bank credit report;
a processing execution unit 123, configured to infer the good or bad label of the rejection sample from the People's Bank credit report data and the post-loan repayment data, so as to obtain the IGB sample.
Fig. 8 is a block diagram of the processing execution unit 123 in the sample marking module, where the processing execution unit 123 further includes:
a rule set determining subunit 1231, configured to obtain, from the People's Bank credit report of the rejected customer, the post-loan performance of loans whose duration has reached 6 months and whose overdue duration has reached 30 days, and to establish an overdue evaluation rule set;
a marking subunit 1232, configured to mark customers who trigger the rule set as bad and customers who do not trigger the overdue criterion as good;
a label inserting subunit 1233, configured to insert the good or bad label into the rejection sample according to the marking result, so as to obtain the IGB sample.
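One possible reading of the overdue evaluation rule set handled by subunits 1231 to 1233 is sketched below. The `LoanRecord` fields, the constant names, and the decision to return no label when a customer has no loan that is at least 6 months old are assumptions; the patent only states the 6-month and 30-day criteria and the bad/good marking.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_MONTHS_ON_BOOK = 6   # loan duration criterion stated in the description
BAD_DAYS_PAST_DUE = 30   # overdue duration criterion stated in the description


@dataclass
class LoanRecord:
    """Hypothetical summary of one loan read from a credit report."""
    months_on_book: int
    max_days_past_due: int


def infer_label(loans: List[LoanRecord]) -> Optional[int]:
    """Return 1 (bad), 0 (good), or None when no loan is old enough to judge."""
    mature = [loan for loan in loans if loan.months_on_book >= MIN_MONTHS_ON_BOOK]
    if not mature:
        # Assumption: without a sufficiently seasoned loan no label is inferred.
        return None
    if any(loan.max_days_past_due >= BAD_DAYS_PAST_DUE for loan in mature):
        return 1  # rule set triggered
    return 0      # overdue criterion not triggered


# Hypothetical rejected customers and their post-application loans.
rejected = {
    "r1": [LoanRecord(months_on_book=8, max_days_past_due=45)],
    "r2": [LoanRecord(months_on_book=12, max_days_past_due=0),
           LoanRecord(months_on_book=7, max_days_past_due=10)],
    "r3": [LoanRecord(months_on_book=3, max_days_past_due=60)],
}
igb_labels = {cid: infer_label(loans) for cid, loans in rejected.items()}
print(igb_labels)
```

Customers for whom the function returns `None` would need a separate treatment, for example exclusion from the IGB sample; that choice is outside what the description specifies.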
All functions of the scoring modeling method based on the People's Bank credit-report rejection inference technique are performed by a computer device. The computer device comprises one or more processors and one or more memories; at least one program code is stored in the one or more memories, and the program code is loaded and executed by the one or more processors to implement the functions of the scoring modeling method.
The processor fetches instructions and analyzes the instructions one by one from the memory, then completes corresponding operations according to the instruction requirements, generates a series of control commands, enables all parts of the computer to automatically, continuously and coordinately act to form an organic whole, realizes the input of programs, the input of data, the operation and the output of results, and the arithmetic operation or the logic operation generated in the process is completed by the arithmetic unit; the Memory comprises a Read-Only Memory (ROM) for storing a computer program, and a protection device is arranged outside the Memory.
Illustratively, a computer program can be partitioned into one or more modules, which are stored in memory and executed by a processor to implement the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the terminal device.
Those skilled in the art will appreciate that the above description of the service device is merely exemplary and does not limit the terminal device, which may include more or fewer components than those described, combine certain components, or use different components; for example, it may include input/output devices, network access devices, buses, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and the like. The general-purpose processor may be a microprocessor or any conventional processor; it is the control center of the terminal device and connects the various parts of the entire user terminal through various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the terminal device by operating or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory mainly comprises a storage program area and a storage data area, wherein the storage program area can store an operating system, application programs (such as an information acquisition template display function, a product information publishing function and the like) required by at least one function and the like; the storage data area may store data created according to the use of the berth status display system (such as product information acquisition templates corresponding to different product categories, product information that needs to be issued by different product providers, and the like). In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The modules/units integrated in the terminal device, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the modules/units in the system according to the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and executed by a processor to implement the functions of the system embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution media, and the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims (10)

1. A scoring modeling method based on People's Bank credit-report rejection inference technology, characterized by comprising the following steps:
acquiring a KGB sample and a rejection sample; the KGB sample is a sample containing known good and bad labels;
in the rejection sample, inferring the good or bad label of the user by analyzing the history information in the People's Bank credit report, to obtain an IGB sample; the IGB sample is a rejection sample carrying an inferred good or bad label;
establishing a scorecard model according to the AGB sample; the AGB sample includes the KGB sample and the IGB sample.
2. The method of claim 1, wherein the step of acquiring a KGB sample and a rejection sample comprises:
acquiring product application data within a preset period;
performing vintage analysis and roll rate analysis on customers who were approved for credit and have drawn on their credit line, and determining a target variable;
acquiring the KGB sample and the rejection sample based on the target variable.
3. The scoring modeling method based on People's Bank credit-report rejection inference technology as claimed in claim 1, wherein the step of inferring the good or bad label of the user by analyzing the history information in the People's Bank credit report in the rejection sample to obtain an IGB sample comprises:
acquiring the People's Bank credit report that was queried for the sample at the application stage;
reading the customer's post-loan repayment data from the People's Bank credit report;
inferring the good or bad label of the rejection sample according to the People's Bank credit report data and the post-loan repayment data, to obtain the IGB sample.
4. The scoring modeling method based on People's Bank credit-report rejection inference technology according to claim 3, wherein the step of inferring the good or bad label of the rejection sample according to the People's Bank credit report data and the post-loan repayment data to obtain the IGB sample comprises:
obtaining, from the People's Bank credit report of the rejected customer, the post-loan performance of loans whose duration has reached 6 months and whose overdue duration has reached 30 days, and establishing an overdue evaluation rule set;
marking customers who trigger the rule set as bad, and marking customers who do not trigger the overdue criterion as good;
inserting the good or bad label into the rejection sample according to the marking result, to obtain the IGB sample.
5. The method according to claim 1, wherein the step of establishing a scorecard model according to the AGB sample comprises:
modeling the AGB sample with the machine learning algorithm LightGBM to obtain the scorecard model.
6. The scoring modeling method based on People's Bank credit-report rejection inference technology of claim 1, wherein the rejection reasons of the rejection samples include anti-fraud score rejection, People's Bank credit information admission rejection, and credit score rejection; and in the process of obtaining the rejection samples, samples rejected by the anti-fraud score are excluded.
7. A scoring modeling system based on People's Bank credit-report rejection inference technology, the system comprising:
a sample acquisition module, configured to acquire a KGB sample and a rejection sample; the KGB sample is a sample containing known good and bad labels;
a sample marking module, configured to infer the good or bad label of the user by analyzing the history information in the People's Bank credit report in the rejection sample, to obtain an IGB sample; the IGB sample is a rejection sample carrying an inferred good or bad label;
a model establishing module, configured to establish a scorecard model according to the AGB sample; the AGB sample includes the KGB sample and the IGB sample.
8. The system of claim 7, wherein the sample acquisition module comprises:
a data acquisition unit, configured to acquire product application data within a preset period;
a target variable determining unit, configured to perform vintage analysis and roll rate analysis on customers who were approved for credit and have drawn on their credit line, so as to determine a target variable;
a sample generation unit, configured to acquire the KGB sample and the rejection sample based on the target variable.
9. The system of claim 7, wherein the sample marking module comprises:
a People's Bank data query unit, configured to acquire the People's Bank credit report that was queried for the sample at the application stage;
a repayment data extraction unit, configured to read the customer's post-loan repayment data from the People's Bank credit report;
a processing execution unit, configured to infer the good or bad label of the rejection sample according to the People's Bank credit report data and the post-loan repayment data, to obtain the IGB sample.
10. The system of claim 9, wherein the processing execution unit further comprises:
a rule set determining subunit, configured to obtain, from the People's Bank credit report of the rejected customer, the post-loan performance of loans whose duration has reached 6 months and whose overdue duration has reached 30 days, and to establish an overdue evaluation rule set;
a marking subunit, configured to mark customers who trigger the rule set as bad and customers who do not trigger the overdue criterion as good;
a label inserting subunit, configured to insert the good or bad label into the rejection sample according to the marking result, to obtain the IGB sample.
CN202210821327.XA 2022-07-13 2022-07-13 Grading modeling method and system based on pedestrian credit information rejection inference technology Pending CN115186489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210821327.XA CN115186489A (en) 2022-07-13 2022-07-13 Grading modeling method and system based on pedestrian credit information rejection inference technology

Publications (1)

Publication Number Publication Date
CN115186489A true CN115186489A (en) 2022-10-14

Family

ID=83518835

Country Status (1)

Country Link
CN (1) CN115186489A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020069159A1 (en) * 2000-12-05 2002-06-06 Talbot Kevin L. Method and apparatus for recycling declined credit applications
CN109345371A (en) * 2018-08-30 2019-02-15 成都数联铭品科技有限公司 Personal reference report backtracking method and system
CN113919931A (en) * 2021-08-25 2022-01-11 北京睿知图远科技有限公司 Loan application scoring model use effect evaluation method and system
CN114066621A (en) * 2021-11-29 2022-02-18 武汉众邦银行股份有限公司 Customer rating method and device based on rejection inference and storage medium
CN114372862A (en) * 2021-12-08 2022-04-19 南京星云数字技术有限公司 Data processing method, data processing device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一只小白TWO: "金融风控中第三方数据源应用", 《博客园,HTTPS://WWW.CNBLOGS.COM/HOLE/P/16349359.HTML》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221014
