CN112785415A - Scoring card model construction method, device, equipment and computer readable storage medium - Google Patents
- Publication number
- CN112785415A (CN202110078425.4A)
- Authority
- CN
- China
- Prior art keywords
- node
- model
- abnormal
- gbdt model
- gbdt
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Finance (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Accounting & Taxation (AREA)
- Evolutionary Biology (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Strategic Management (AREA)
- Technology Law (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses a scoring card model construction method, device, equipment and computer-readable storage medium. The method comprises the following steps: acquiring credit behavior data of a client, and using the credit behavior data as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among them; and retraining the GBDT model based on the abnormal nodes, and obtaining the corresponding scoring card model once the GBDT model is trained. By controlling the internal structure of the GBDT model, the invention optimizes the GBDT model so that the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.
Description
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a method, a device and equipment for constructing a scoring card model and a computer readable storage medium.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology (Fintech). However, the security and real-time requirements of the financial industry also place higher demands on these technologies.
Existing scoring card models are typically either a generic scoring card model or a GBDT model. The generic scoring card model used in financial scenarios is built by applying variable binning, WOE (weight of evidence) conversion and logistic regression fitting in sequence. Every stage of this process can be manually intervened in, so the model can be prevented from fitting in a wrong direction. The GBDT model, by contrast, is a black-box model: given the input variables and the target label, it is trained directly in the direction that minimizes the fitting error. Because the data source may contain noise, the GBDT model may fit that noise, which makes the model inaccurate; when such a model is used in a financial scenario it can cause misjudgments and, in turn, capital loss.
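As an illustration of the WOE conversion step in the generic scoring card pipeline, the following is a minimal sketch. The source does not give a formula, so the sketch assumes the common definition WOE = ln(share of goods in the bin / share of bads in the bin); the opposite sign convention is also used in practice. The function name `woe_per_bin` and the smoothing constant `eps` are hypothetical.

```python
import math
from collections import defaultdict

def woe_per_bin(bins, labels, eps=0.5):
    """Return {bin: WOE}, where label 1 = bad (positive) and 0 = good.

    eps is an assumed smoothing count added to every cell so that a bin
    with no goods or no bads does not cause a division by zero.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    keys = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(keys)
    total_bad = sum(bad.values()) + eps * len(keys)
    return {
        b: math.log(((good[b] + eps) / total_good) / ((bad[b] + eps) / total_bad))
        for b in keys
    }

# Usage: two bins of a hypothetical variable, already binned.
woe = woe_per_bin(["low", "low", "high", "high"], [0, 0, 1, 1])
```

Under this convention a bin dominated by good clients gets a positive WOE and a bin dominated by bad clients gets a negative one; the WOE values then replace the raw variable as input to the logistic regression.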
The generic scoring card model is a simple, interpretable model, but its prediction performance is poor; the GBDT model is an ensemble tree model that performs well but is not interpretable.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a scoring card model construction method, a scoring card model construction device, scoring card model construction equipment and a computer readable storage medium, and aims to solve the technical problem that a GBDT model is poor in interpretability.
In order to achieve the above object, the present invention provides a score card model construction method, including the steps of:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among the pending abnormal nodes;
and retraining the GBDT model based on the abnormal nodes, and obtaining a corresponding scoring card model after the GBDT model is trained.
Optionally, the step of determining pending abnormal nodes among the non-leaf nodes of the decision tree when each decision tree in the GBDT model is trained includes:
when each decision tree in the GBDT model is trained, determining the positive sample fraction of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend.
Optionally, the step of determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend comprises:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Optionally, the step of determining a pending abnormal node in the non-leaf node based on the node trend of the GBDT model on the non-leaf node and the univariate trend comprises:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking the non-leaf node whose node trend is inconsistent with the univariate trend as the pending abnormal node in the non-leaf nodes.
Optionally, the step of determining the node trend of the GBDT model on the non-leaf node based on the first and second positive sample fractions comprises:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Optionally, the step of retraining the GBDT model based on the abnormal node, and obtaining a corresponding score card model after the GBDT model is trained, includes:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, stopping the traversal of the abnormal node and of its descendant nodes;
when the leaf node corresponding to the abnormal node is reached, correcting the residual output by that leaf node to the positive sample fraction of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Optionally, the step of retraining each decision tree in the GBDT model based on the structure file comprises:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
In addition, in order to achieve the above object, the present invention further provides a score card model construction device, including:
a first training module, configured to acquire credit behavior data of a client and use the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, configured to determine pending abnormal nodes among the non-leaf nodes of the decision tree when each decision tree in the GBDT model is trained, verify the pending abnormal nodes, and determine the abnormal nodes among the pending abnormal nodes;
and a second training module, configured to retrain the GBDT model based on the abnormal nodes and obtain a corresponding scoring card model after the GBDT model is trained.
In addition, in order to achieve the above object, the present invention also provides a score card model building apparatus, including: the system comprises a memory, a processor and a scoring card model building program which is stored on the memory and can run on the processor, wherein the scoring card model building program realizes the steps of the scoring card model building method when being executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a scorecard model construction program which, when executed by a processor, implements the steps of the scorecard model construction method as described above.
According to the invention, credit behavior data of a client is acquired and used as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree are determined, the pending abnormal nodes are verified, and the abnormal nodes among them are determined; and the GBDT model is retrained based on the abnormal nodes, and a corresponding scoring card model is obtained after the GBDT model is trained. The method thus intervenes in the GBDT model: abnormal nodes in the decision trees are determined during the first training, and the GBDT model is then trained again to construct the scoring card model. Because the internal structure of the GBDT model is controlled in this way, the GBDT model is optimized, and the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.
Drawings
FIG. 1 is a schematic structural diagram of a scoring card model building device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a scoring card model construction method according to the present invention;
FIG. 3 is a schematic flow chart of a second embodiment of the scoring card model construction method according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic structural diagram of a scoring card model building device of a hardware operating environment according to an embodiment of the present invention.
The scoring card model construction equipment in the embodiment of the invention can be a PC (personal computer), and can also be a mobile terminal equipment with a display function, such as a smart phone, a tablet computer, an electronic book reader, a portable computer and the like.
As shown in fig. 1, the scoring card model building apparatus may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
It will be understood by those skilled in the art that the score card model building apparatus configuration shown in fig. 1 does not constitute a limitation of the score card model building apparatus, and may include more or less components than those shown, or combine some components, or a different arrangement of components.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a scoring card model building program.
In the scoring card model building device shown in fig. 1, the network interface 1004 is mainly used for connecting with a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call the scorecard model builder stored in the memory 1005.
In this embodiment, the score card model building apparatus includes: a memory 1005, a processor 1001 and a scorecard model building program stored on the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the scorecard model building program stored in the memory 1005, the following operations are performed:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among the pending abnormal nodes;
and retraining the GBDT model based on the abnormal nodes, and obtaining a corresponding scoring card model after the GBDT model is trained.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
when each decision tree in the GBDT model is trained, determining the positive sample fraction of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking the non-leaf node whose node trend is inconsistent with the univariate trend as the pending abnormal node in the non-leaf nodes.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, stopping the traversal of the abnormal node and of its descendant nodes;
when the leaf node corresponding to the abnormal node is reached, correcting the residual output by that leaf node to the positive sample fraction of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
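One plausible reading of the coefficient optimization is that each decision tree's output is treated as an input feature and a logistic regression assigns one coefficient per tree. The sketch below follows that reading with plain gradient descent on the log-loss. It is an assumption, not the patent's procedure: the source does not say how its logistic regression model is pre-trained, and the names `fit_tree_coefficients`, `lr` and `epochs` are hypothetical.

```python
import math

def fit_tree_coefficients(tree_outputs, labels, lr=0.1, epochs=500):
    """Fit one coefficient per tree with logistic regression.

    tree_outputs[i][k] is the output of tree k for sample i;
    labels[i] is 1 for a positive sample and 0 otherwise.
    Returns (weights, bias).
    """
    n_trees = len(tree_outputs[0])
    n = len(labels)
    w = [0.0] * n_trees
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_trees
        grad_b = 0.0
        for row, y in zip(tree_outputs, labels):
            z = b + sum(wk * ok for wk, ok in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for k, ok in enumerate(row):
                grad_w[k] += err * ok
            grad_b += err
        # Average-gradient descent step on the log-loss.
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Usage: four samples scored by two (already pruned) trees.
outputs = [[0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
weights, bias = fit_tree_coefficients(outputs, [0, 0, 1, 1])
```

With the tree structures fixed, only the per-tree coefficients are adjusted, which keeps the final score a weighted sum of readable tree outputs.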
The invention also provides a scoring card model construction method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the scoring card model construction method of the invention.
In this embodiment, the method for constructing a score card model includes the following steps:
step S10, obtaining credit behavior data of a client, and using the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
The scoring card model construction method provided by the invention is applied at a lending institution and is used to construct an interpretable, GBDT-based scoring card model for financial scenarios. By intervening in the GBDT model and controlling its internal structure, the method retains the strong predictive performance of the GBDT model while ensuring the interpretability of the model. This solves the technical problem of the GBDT model's poor interpretability, so that the scoring card model trained on the basis of the GBDT model both performs well and is highly interpretable. Here, the GBDT (Gradient Boosting Decision Tree) model is a decision tree model trained with a gradient boosting strategy.
In this embodiment, the credit behavior data of the client comprises the client's credit history record and service performance record. The credit history record is the client's personal credit record held at the People's Bank, and the service performance record is the record of the client's loan transactions at the lending institution, including the loan amount, borrowing time, repayment time and the like. To construct the scoring card model, the credit behavior data of the client is first acquired and used as training data for the GBDT model; the training data is then input into the GBDT model to train it. The GBDT model is a decision tree model trained with a gradient boosting strategy and comprises a plurality of decision trees.
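The training step can be illustrated with a deliberately simplified sketch of gradient boosting. It is not the patent's implementation: it uses a single numeric feature, depth-1 trees (stumps), and hypothetical names (`train_gbdt`, `fit_stump`, `learning_rate`). It assumes label 1 marks a positive sample and that the feature takes at least two distinct values. Each tree is fitted to the residual between the label and the current predicted probability, so each leaf outputs a residual estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by least squares."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return {"threshold": thr, "left": lmean, "right": rmean}

def predict_stump(stump, x):
    return stump["left"] if x <= stump["threshold"] else stump["right"]

def train_gbdt(xs, ys, n_trees=5, learning_rate=0.5):
    """Boosting for log-loss: every new tree fits the residual y - p."""
    scores = [0.0] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        trees.append(stump)
        scores = [s + learning_rate * predict_stump(stump, x)
                  for s, x in zip(scores, xs)]
    return trees

def predict_proba(trees, x, learning_rate=0.5):
    return sigmoid(sum(learning_rate * predict_stump(t, x) for t in trees))

# Usage: a hypothetical "days overdue" feature and default labels.
trees = train_gbdt([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```

A production model would use many features and deeper trees, but the structure is the same: a list of decision trees whose outputs are summed.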
Step S20, when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among the pending abnormal nodes;
In this embodiment, when each decision tree in the GBDT model is trained, the nodes of every decision tree in the GBDT model are traversed to determine the non-leaf nodes of each tree. The decision results of the samples at each non-leaf node are then processed to determine the pending abnormal nodes among the non-leaf nodes. The nodes flagged at this stage are only pending abnormal nodes; they are analyzed further afterwards to determine which of them are actual abnormal nodes. That is, after the pending abnormal nodes among the non-leaf nodes of each decision tree have been determined, the pending abnormal nodes of all decision trees in the GBDT model are verified to determine the abnormal nodes among them.
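The traversal that collects non-leaf nodes can be sketched as a simple pre-order walk. The nested-dictionary tree layout (`feature`, `threshold`, `left`, `right`, `value`) and the path labels are assumptions made for illustration; the source does not describe how trees are stored.

```python
def is_leaf(node):
    """A node without children is a leaf (hypothetical dict layout)."""
    return "left" not in node and "right" not in node

def non_leaf_nodes(node, path="root"):
    """Yield (path, node) for every non-leaf node, in pre-order."""
    if is_leaf(node):
        return
    yield path, node
    yield from non_leaf_nodes(node["left"], path + ".L")
    yield from non_leaf_nodes(node["right"], path + ".R")

# Usage: one small tree with two non-leaf nodes.
tree = {
    "feature": "overdue_count", "threshold": 2,
    "left": {
        "feature": "loan_amount", "threshold": 5000,
        "left": {"value": -0.3},
        "right": {"value": 0.1},
    },
    "right": {"value": 0.4},
}
paths = [path for path, _ in non_leaf_nodes(tree)]
```

Running the same walk over every tree in the model gives the full set of candidate nodes to be checked.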
Step S30, retraining the GBDT model based on the abnormal nodes, and obtaining a corresponding scoring card model after the GBDT model is trained.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model have been determined, the credit behavior data of the client is acquired and used as training data to retrain the GBDT model. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, the traversal of the abnormal node and of its descendant nodes is stopped; and when the leaf node corresponding to the abnormal node is reached, the residual output by that leaf node is changed to the positive sample fraction of the leaf node, so that the scoring card model better conforms to expert experience.
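The pruning and correction described here can be sketched as follows. The sketch reads the step as collapsing each abnormal node into a leaf whose output is the positive sample fraction of the samples reaching it; that reading, the node layout (`id`, `n`, `pos`, `value`), and the use of JSON for the structure file are all assumptions, since the source does not specify them.

```python
import json

def prune_and_correct(node, abnormal_ids):
    """Return a pruned copy of the tree rooted at `node`.

    Hypothetical node layout: every node has "id", "n" (samples reaching
    it) and "pos" (positive samples among them); internal nodes also have
    "feature", "threshold", "left", "right"; leaves have "value".
    """
    if node["id"] in abnormal_ids:
        # Stop here: descendants are dropped and the node becomes a leaf
        # whose output is its positive sample fraction, not a residual.
        return {"id": node["id"], "n": node["n"], "pos": node["pos"],
                "value": node["pos"] / node["n"], "corrected": True}
    if "left" not in node:
        return dict(node)  # ordinary leaf, kept unchanged
    kept = {k: v for k, v in node.items() if k not in ("left", "right")}
    kept["left"] = prune_and_correct(node["left"], abnormal_ids)
    kept["right"] = prune_and_correct(node["right"], abnormal_ids)
    return kept

def to_structure_file(tree):
    """Serialize the pruned tree; JSON is an assumed file format."""
    return json.dumps(tree, indent=2, sort_keys=True)

# Usage: node 1 was verified as abnormal, so its subtree is collapsed.
tree = {
    "id": 0, "n": 20, "pos": 8, "feature": "overdue_count", "threshold": 2,
    "left": {
        "id": 1, "n": 10, "pos": 3, "feature": "income", "threshold": 5000,
        "left": {"id": 3, "n": 4, "pos": 2, "value": 0.1},
        "right": {"id": 4, "n": 6, "pos": 1, "value": -0.2},
    },
    "right": {"id": 2, "n": 10, "pos": 5, "value": 0.3},
}
pruned = prune_and_correct(tree, {1})
structure = to_structure_file(pruned)
```

The function returns a new tree and leaves the original untouched, so the first-pass model stays available for comparison.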
In the scoring card model construction method provided by this embodiment, credit behavior data of a client is acquired and used as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree are determined, the pending abnormal nodes are verified, and the abnormal nodes among them are determined; and the GBDT model is retrained based on the abnormal nodes, and a corresponding scoring card model is obtained after the GBDT model is trained. The method thus intervenes in the GBDT model: abnormal nodes in the decision trees are determined during the first training, and the GBDT model is then trained again to construct the scoring card model. Because the internal structure of the GBDT model is controlled in this way, the GBDT model is optimized, and the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.
Based on the first embodiment, a second embodiment of the scoring card model construction method of the present invention is provided, and referring to fig. 3, in this embodiment, step S20 includes:
step S21, when each decision tree in the GBDT model is trained, determining the positive sample fraction of each branch of the non-leaf node in the decision tree and determining the univariate trend corresponding to the non-leaf node;
and step S22, determining the pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend.
In this embodiment, when each decision tree in the GBDT model is trained, the nodes of every decision tree in the GBDT model are traversed to determine the non-leaf nodes of each tree. Specifically, the decision results of the samples at each non-leaf node are processed to determine the positive sample fraction of each branch of the non-leaf node, and the univariate trend corresponding to the non-leaf node is then determined based on expert experience. The positive sample fractions are compared against the univariate trend to determine the pending abnormal nodes among the non-leaf nodes. The nodes flagged at this stage are only pending abnormal nodes; they are analyzed further afterwards to determine which of them are actual abnormal nodes.
Further, the step of determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend comprises:
step S221, determining a first positive sample proportion of a left branch and a second positive sample proportion of a right branch of the non-leaf node;
step S222, determining the node trend of the GBDT model on the non-leaf nodes based on the first positive sample proportion and the second positive sample proportion;
step S223, determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf node and the univariate trend.
In this embodiment, the positive sample fraction of each branch of the non-leaf node includes a first positive sample fraction of a left branch of the non-leaf node and a second positive sample fraction of a right branch of the non-leaf node. When each decision tree in the GBDT model is trained, the decision results of the samples in the non-leaf nodes are processed, and the first positive sample proportion of the left branch and the second positive sample proportion of the right branch of the non-leaf nodes in the decision trees are determined.
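Computing the two positive sample fractions for one non-leaf node can be sketched as below. The sketch assumes that samples with a feature value less than or equal to the split threshold go to the left branch, which is a common convention but is not stated in the source; the function and field names are hypothetical.

```python
def branch_positive_fractions(samples, feature, threshold):
    """Return (left_fraction, right_fraction) for one split.

    samples: list of (features_dict, label) pairs reaching the node,
    with label 1 for a positive sample and 0 otherwise.
    Assumed routing: value <= threshold goes left, otherwise right.
    """
    left = [y for x, y in samples if x[feature] <= threshold]
    right = [y for x, y in samples if x[feature] > threshold]

    def fraction(labels):
        # An empty branch is reported as 0.0 to avoid division by zero.
        return sum(labels) / len(labels) if labels else 0.0

    return fraction(left), fraction(right)

# Usage: five samples split on a hypothetical "overdue_count" feature.
samples = [
    ({"overdue_count": 0}, 0),
    ({"overdue_count": 1}, 0),
    ({"overdue_count": 3}, 1),
    ({"overdue_count": 5}, 1),
    ({"overdue_count": 4}, 0),
]
first_fraction, second_fraction = branch_positive_fractions(
    samples, "overdue_count", 2)
```

Here the left branch holds two samples with no positives and the right branch holds three samples with two positives.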
After the positive sample fraction of each branch of the non-leaf node is determined, the first positive sample fraction is compared with the second positive sample fraction to determine the node trend on the non-leaf node, and the univariate trend corresponding to the non-leaf node is determined based on expert experience. The node trend is then compared with the univariate trend to determine the pending abnormal nodes among the non-leaf nodes. The nodes flagged at this stage are only pending abnormal nodes; they are analyzed further afterwards to determine which of them are actual abnormal nodes.
Further, the step of determining the pending abnormal node in the non-leaf node based on the node trend of the GBDT model on the non-leaf node and the univariate trend comprises:
step S2231, detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
step S2232, taking a non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
In this embodiment, for each non-leaf node in the decision tree, after the node trend on the non-leaf node and the corresponding univariate trend are determined, the node trend of the GBDT model on the non-leaf node is checked against the univariate trend derived from expert experience. Specifically, if the node trend of the non-leaf node is consistent with the corresponding univariate trend, the non-leaf node is a normal node; if the node trend is inconsistent with the corresponding univariate trend, the non-leaf node is a pending abnormal node. The non-leaf nodes whose node trend is inconsistent with the univariate trend are thus taken as the pending abnormal nodes, which detects the pending abnormal nodes among the non-leaf nodes of the decision trees of the GBDT model.
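The consistency check can be sketched as follows, using the rule stated in this embodiment that a non-leaf node whose node trend is inconsistent with the univariate trend is a pending abnormal node. This is a minimal illustration, not the patented implementation: the `SplitNode` class, the `find_pending_abnormal` function, the feature names, and the choice to skip features with no expert trend are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SplitNode:
    """Hypothetical record for one non-leaf node of a decision tree."""
    node_id: int
    feature: str      # feature the node splits on
    node_trend: str   # "positive" or "negative", as observed in the GBDT model


def find_pending_abnormal(nodes: List[SplitNode],
                          univariate_trends: Dict[str, str]) -> List[int]:
    """Return ids of nodes whose trend disagrees with the expert trend."""
    pending = []
    for node in nodes:
        expected = univariate_trends.get(node.feature)
        if expected is None:
            # Assumption: a feature without an expert trend is not checked.
            continue
        if node.node_trend != expected:
            pending.append(node.node_id)
    return pending


# Hypothetical expert experience: overdue count should raise risk,
# account age should lower it.
expert = {"overdue_count": "positive", "account_age": "negative"}
tree_nodes = [
    SplitNode(1, "overdue_count", "positive"),  # consistent: normal node
    SplitNode(2, "account_age", "positive"),    # inconsistent: pending abnormal
    SplitNode(3, "unknown_feature", "negative"),
]
pending_ids = find_pending_abnormal(tree_nodes, expert)
```

The nodes returned here are only candidates; the method still verifies them before treating any of them as abnormal.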
Further, the step of determining the node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions comprises:
step S2221, comparing the first positive sample proportion and the second positive sample proportion;
step S2222, if the first positive sample proportion is smaller than the second positive sample proportion, the node trend of the GBDT model on the non-leaf nodes is positive;
step S2223, if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf node is negative.
In this embodiment, after determining the positive sample fraction of each branch of the non-leaf node, the first positive sample fraction and the second positive sample fraction are compared to determine the node trend on the non-leaf node. Specifically, if the first positive sample ratio is smaller than the second positive sample ratio, that is, the first positive sample ratio corresponding to the left branch of the non-leaf node is smaller than the second positive sample ratio corresponding to the right branch, the node trend of the GBDT model on the non-leaf node is positive; if the first positive sample ratio is greater than or equal to the second positive sample ratio, that is, the first positive sample ratio corresponding to the left branch of the non-leaf node is greater than or equal to the second positive sample ratio corresponding to the right branch, the node trend of the GBDT model on the non-leaf node is negative.
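The comparison in steps S2221 to S2223 reduces to a single test. The sketch below is illustrative only; the function name `node_trend` and the string labels are assumptions, while the rule itself (strictly smaller left proportion gives a positive trend, otherwise negative) is taken from the text above.

```python
def node_trend(first_fraction: float, second_fraction: float) -> str:
    """Trend of a non-leaf node from its left and right positive sample proportions.

    first_fraction:  positive sample proportion of the left branch
    second_fraction: positive sample proportion of the right branch
    """
    # Left strictly smaller than right means the trend is positive;
    # equality falls into the negative case, as in step S2223.
    return "positive" if first_fraction < second_fraction else "negative"
```

Note that equal proportions are classified as negative, so a split that does not separate the positive samples at all is still assigned a definite trend.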
Further, the step of determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node comprises:
step S2211, obtaining a first positive sample number of the left branch of the non-leaf node, a second positive sample number of the right branch of the non-leaf node, and a sample number on the non-leaf node;
step S2212, determining a first positive sample proportion of the left branch of the non-leaf node based on the first positive sample number and the sample number;
step S2213, determining a second positive sample fraction of the right branch of the non-leaf node based on the second positive sample number and the sample number.
In this embodiment, the positive sample fraction of each branch of the non-leaf node includes a first positive sample fraction of a left branch of the non-leaf node and a second positive sample fraction of a right branch of the non-leaf node. When each decision tree in the GBDT model is trained, the decision results of the samples in the non-leaf nodes are processed, and the first positive sample proportion of the left branch and the second positive sample proportion of the right branch of the non-leaf nodes in the decision trees are determined. Specifically, for each decision tree in the GBDT model, a first positive sample number of a left branch of a non-leaf node, a second positive sample number of a right branch of the non-leaf node and a sample number on the non-leaf node in the decision tree are obtained; then, calculating the proportion of the first positive sample number in the sample number according to the first positive sample number and the total sample number of the non-leaf nodes to obtain the first positive sample proportion of the left branch of the non-leaf nodes; and calculating the proportion of the second positive sample number in the sample number according to the second positive sample number and the total sample number on the non-leaf node to obtain the second positive sample proportion of the right branch of the non-leaf node.
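As a worked illustration of steps S2211 to S2213, the two proportions can be computed from three counts. The function name and the error handling are assumptions; the denominator is the sample number on the non-leaf node, as the text describes, rather than the number of samples in each branch.

```python
from typing import Tuple


def branch_positive_fractions(first_positive_count: int,
                              second_positive_count: int,
                              node_sample_count: int) -> Tuple[float, float]:
    """Positive sample proportions of the left and right branches of one node."""
    if node_sample_count <= 0:
        # Assumption: a node with no samples cannot be evaluated.
        raise ValueError("node_sample_count must be positive")
    first_fraction = first_positive_count / node_sample_count
    second_fraction = second_positive_count / node_sample_count
    return first_fraction, second_fraction


# Example: 200 samples reach the node, with 50 positive samples in the
# left branch and 25 in the right branch.
left, right = branch_positive_fractions(50, 25, 200)
```

With these counts the left proportion is larger than the right one, so the node trend under the rule of steps S2222 and S2223 would be negative.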
Further, the step of retraining the GBDT model based on the abnormal node and obtaining a corresponding score card model after the GBDT model is trained includes:
step S31, retraining each decision tree in the GBDT model based on the training data;
step S32, when each decision tree of the GBDT model is trained, if the abnormal node in the decision tree is traversed, the traversing of the abnormal node and the descendant node corresponding to the abnormal node is stopped;
step S33, when the leaf node corresponding to the abnormal node is reached, correcting the residual output by that leaf node to the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and step S34, retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model are determined, the credit behavior data of the client is obtained and used as training data to retrain the GBDT model. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, traversal of the abnormal node and its descendant nodes is stopped, which prunes the branch where the abnormal node is located. When the leaf node corresponding to the abnormal node is reached, the residual output by that leaf node is replaced with the positive sample proportion of the leaf node, which corrects the node result of the branch where the abnormal node is located and brings the scoring card model more in line with expert experience. After the branch where the abnormal node is located has been pruned and corrected, a structure file of the pruned and corrected decision tree is generated, and the GBDT model is trained again based on this structure file to construct the scoring card model.
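One possible reading of the pruning and correction step is sketched below. It assumes that the abnormal node itself becomes the leaf of the pruned branch and that its output is set to its positive sample proportion; the text does not spell this out, so this is an interpretation. The `Node` class, the dictionary layout of the structure file, and the use of JSON are likewise assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    """Hypothetical binary tree node; non-leaf nodes have both children."""
    node_id: int
    positive_fraction: float          # positive sample proportion at this node
    output: Optional[float] = None    # residual output, set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune_and_correct(node: Node, abnormal_ids: Set[int]) -> dict:
    """Traverse a tree and return a pruned, corrected structure as a dict."""
    if node.node_id in abnormal_ids:
        # Stop traversal here: descendants are dropped, and the node is
        # emitted as a leaf whose output is its positive sample proportion.
        return {"node_id": node.node_id, "leaf": True,
                "output": node.positive_fraction}
    if node.left is None and node.right is None:
        # Ordinary leaf: keep the residual output unchanged.
        return {"node_id": node.node_id, "leaf": True, "output": node.output}
    return {"node_id": node.node_id, "leaf": False,
            "left": prune_and_correct(node.left, abnormal_ids),
            "right": prune_and_correct(node.right, abnormal_ids)}


tree = Node(1, 0.2,
            left=Node(2, 0.4,
                      left=Node(4, 0.5, output=0.3),
                      right=Node(5, 0.3, output=-0.1)),
            right=Node(3, 0.1, output=-0.2))
structure = prune_and_correct(tree, abnormal_ids={2})
structure_file_text = json.dumps(structure, indent=2)  # hypothetical file content
```

In this example node 2 is abnormal, so nodes 4 and 5 no longer appear in the structure and node 2 carries the output 0.4.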
Further, the step of retraining each decision tree in the GBDT model based on the structure file and determining the score card model includes:
step S341, retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and step S342, optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model are determined, credit behavior data of the client is obtained and used as training data to retrain the decision trees described by the structure file and obtain the output result of each decision tree in the GBDT model. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, traversal of the abnormal node and its descendant nodes is stopped, and when the leaf node corresponding to the abnormal node is reached, the output of the GBDT model at that leaf node is corrected to the positive sample proportion of the leaf node, so that the scoring card model is more in line with expert experience. After the output of the leaf node corresponding to the abnormal node has been corrected, the decision tree coefficients of the GBDT model are optimized according to the corrected output results and a pre-trained logistic regression model, and the scoring card model is obtained once the GBDT model has been optimized.
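The text does not say how the logistic regression model is used to optimize the decision tree coefficients. One common approach, shown here purely as an assumption, is to treat each tree's output as a feature and fit one logistic regression weight per tree. The sketch uses plain batch gradient descent from the standard library; the function name, learning rate, epoch count, and toy data are all hypothetical.

```python
import math
from typing import List, Tuple


def fit_tree_coefficients(tree_outputs: List[List[float]],
                          labels: List[int],
                          lr: float = 0.1,
                          epochs: int = 500) -> Tuple[List[float], float]:
    """Fit one logistic regression weight per decision tree.

    tree_outputs[i][j] is the (corrected) output of tree j for sample i.
    """
    n_samples = len(labels)
    n_trees = len(tree_outputs[0])
    weights = [0.0] * n_trees
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_trees
        grad_b = 0.0
        for row, y in zip(tree_outputs, labels):
            z = bias + sum(w * x for w, x in zip(weights, row))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            err = p - y                      # gradient of log loss w.r.t. z
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


# Toy data: the first tree's output tracks the label, the second does not.
outputs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.1], [0.1, 0.2]]
y = [1, 1, 0, 0]
coefficients, intercept = fit_tree_coefficients(outputs, y)
```

Under this reading, a tree whose output separates positive from negative samples receives a larger coefficient, and the weighted sum of tree outputs plays the role of the scoring card score.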
In addition, an embodiment of the present invention further provides a score card model construction device, where the score card model construction device includes:
a first training module, configured to acquire credit behavior data of a client and take the credit behavior data as training data so as to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, configured to determine, when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree, to verify the pending abnormal nodes, and to determine the abnormal nodes among the pending abnormal nodes;
and a second training module, configured to retrain the GBDT model based on the abnormal nodes and obtain a corresponding scoring card model after the GBDT model is trained.
Further, the determining module is further configured to:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node among the non-leaf nodes based on the positive sample proportion and the univariate trend.
Further, the determining module is further configured to:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Further, the determining module is further configured to:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking a non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
Further, the determining module is further configured to:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Further, the second training module is further configured to:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is traversed, the traversal of the abnormal node and a descendant node corresponding to the abnormal node is stopped;
when the leaf node corresponding to the abnormal node is reached, correcting the residual output by that leaf node to the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Further, the second training module is further configured to:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, in which a scorecard model construction program is stored, and when executed by a processor, the scorecard model construction program implements the steps of the scorecard model construction method according to any one of the above.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above scoring card model construction method, and will not be described in detail herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A scoring card model construction method is characterized by comprising the following steps:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining abnormal nodes among the pending abnormal nodes;
and retraining the GBDT model based on the abnormal nodes, and obtaining a corresponding scoring card model after the GBDT model is trained.
2. The method of claim 1, wherein the step of determining a pending abnormal node among non-leaf nodes of the decision tree when training each of the decision trees in the GBDT model comprises:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node among the non-leaf nodes based on the positive sample proportion and the univariate trend.
3. The method of claim 2, wherein the step of determining the pending abnormal node among the non-leaf nodes based on the positive sample proportion and the univariate trend comprises:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
4. The method of constructing a scorecard model according to claim 3, wherein said step of determining a pending abnormal node among said non-leaf nodes based on a node trend of said GBDT model on said non-leaf nodes and said univariate trend comprises:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking a non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
5. The method of constructing a scorecard model according to claim 3, wherein said step of determining a node trend of said GBDT model on said non-leaf nodes based on said first and second positive sample fractions comprises:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
6. A scoring card model construction method according to any one of claims 1 to 5, wherein said step of retraining said GBDT model based on said abnormal nodes and obtaining a corresponding scoring card model after training said GBDT model comprises:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is traversed, the traversal of the abnormal node and a descendant node corresponding to the abnormal node is stopped;
when the leaf node corresponding to the abnormal node is reached, correcting the residual output by that leaf node to the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
7. The method according to claim 6, wherein the step of retraining each decision tree in the GBDT model based on the structure file comprises:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
8. A score card model construction device, characterized by comprising:
a first training module, configured to acquire credit behavior data of a client and take the credit behavior data as training data so as to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, configured to determine, when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree, to verify the pending abnormal nodes, and to determine the abnormal nodes among the pending abnormal nodes;
and a second training module, configured to retrain the GBDT model based on the abnormal nodes and obtain a corresponding scoring card model after the GBDT model is trained.
9. A scoring card model construction apparatus, characterized by comprising: a memory, a processor and a scorecard model construction program stored on the memory and executable on the processor, the scorecard model construction program, when executed by the processor, implementing the steps of the scorecard model construction method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a scorecard model construction program is stored on the computer-readable storage medium, which when executed by a processor implements the steps of the scorecard model construction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110078425.4A CN112785415B (en) | 2021-01-20 | 2021-01-20 | Method, device and equipment for constructing scoring card model and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110078425.4A CN112785415B (en) | 2021-01-20 | 2021-01-20 | Method, device and equipment for constructing scoring card model and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112785415A (en) | 2021-05-11 |
CN112785415B (en) | 2024-01-12 |
Family
ID=75758025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110078425.4A Active CN112785415B (en) | 2021-01-20 | 2021-01-20 | Method, device and equipment for constructing scoring card model and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112785415B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018188543A1 (en) * | 2017-04-14 | 2018-10-18 | 腾讯科技(深圳)有限公司 | Real-time credit score adjustment processing method and device and processing server |
WO2019061187A1 (en) * | 2017-09-28 | 2019-04-04 | 深圳乐信软件技术有限公司 | Credit evaluation method and apparatus and gradient boosting decision tree parameter adjustment method and apparatus |
CN109587000A (en) * | 2018-11-14 | 2019-04-05 | 上海交通大学 | High latency method for detecting abnormality and system based on collective intelligence network measurement data |
WO2019080407A1 (en) * | 2017-10-25 | 2019-05-02 | 深圳壹账通智能科技有限公司 | Credit evaluation method, apparatus and device, and computer readable storage medium |
CN110796485A (en) * | 2019-10-11 | 2020-02-14 | 上海上湖信息技术有限公司 | Method and device for improving prediction precision of prediction model |
CN111311400A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | Modeling method and system of grading card model based on GBDT algorithm |
CN111382911A (en) * | 2020-03-20 | 2020-07-07 | 达而观信息科技(上海)有限公司 | High-cabinet personnel scheduling prediction method based on bank outlet business data |
2021-01-20: CN application CN202110078425.4A, patent CN112785415B (en), status: Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018188543A1 (en) * | 2017-04-14 | 2018-10-18 | 腾讯科技(深圳)有限公司 | Real-time credit score adjustment processing method and device and processing server |
WO2019061187A1 (en) * | 2017-09-28 | 2019-04-04 | 深圳乐信软件技术有限公司 | Credit evaluation method and apparatus and gradient boosting decision tree parameter adjustment method and apparatus |
WO2019080407A1 (en) * | 2017-10-25 | 2019-05-02 | 深圳壹账通智能科技有限公司 | Credit evaluation method, apparatus and device, and computer readable storage medium |
CN109587000A (en) * | 2018-11-14 | 2019-04-05 | 上海交通大学 | High latency method for detecting abnormality and system based on collective intelligence network measurement data |
CN110796485A (en) * | 2019-10-11 | 2020-02-14 | 上海上湖信息技术有限公司 | Method and device for improving prediction precision of prediction model |
CN111382911A (en) * | 2020-03-20 | 2020-07-07 | 达而观信息科技(上海)有限公司 | High-cabinet personnel scheduling prediction method based on bank outlet business data |
CN111311400A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | Modeling method and system of grading card model based on GBDT algorithm |
Non-Patent Citations (1)
Title |
---|
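SUN Quan; ZHAO Jintao: "Merchant risk scoring method and system based on data mining" (基于数据挖掘的商户风险评分方法和系统), Software Industry and Engineering (软件产业与工程), no. 01, pages 33-37 * |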
Also Published As
Publication number | Publication date |
---|---|
CN112785415B (en) | 2024-01-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20200210899A1 (en) | Machine learning model training method and device, and electronic device | |
JP6778273B2 (en) | Performance model adverse effects compensation | |
DE102020110536A1 (en) | Methods, systems, articles of manufacture, and devices for a context and complexity aware recommendation system for improved software development efficiency | |
CN108021934B (en) | Method and device for recognizing multiple elements | |
CN111160569A (en) | Application development method and device based on machine learning model and electronic equipment | |
EP3441891A1 (en) | Data source-based service customisation apparatus, method, system, and storage medium | |
WO2020238783A1 (en) | Information processing method and device, and storage medium | |
CN112906384B (en) | BERT model-based data processing method, BERT model-based data processing device, BERT model-based data processing equipment and readable storage medium | |
CN112101520A (en) | Risk assessment model training method, business risk assessment method and other equipment | |
CN106649661A (en) | Method and device for establishing knowledge base | |
CN110874710A (en) | Recruitment assistance method and device | |
CN113177700B (en) | Risk assessment method, system, electronic equipment and storage medium | |
CN111815169A (en) | Business approval parameter configuration method and device | |
CN111199469A (en) | User payment model generation method and device and electronic equipment | |
CN112671985A (en) | Agent quality inspection method, device, equipment and storage medium based on deep learning | |
CN116775879A (en) | Fine tuning training method of large language model, contract risk review method and system | |
CN111178656A (en) | Credit model training method, credit scoring device and electronic equipment | |
CN113919432A (en) | Classification model construction method, data classification method and device | |
CN116842263A (en) | Training processing method and device for intelligent question-answering financial advisor model | |
CN116542418A (en) | Deep learning-based business handling method and system for office hall | |
CN112785415B (en) | Method, device and equipment for constructing scoring card model and computer readable storage medium | |
CN114282498B (en) | Data knowledge processing system applied to electric power transaction | |
CN112529624B (en) | Method, device, equipment and storage medium for generating business prediction model | |
CN114581249B (en) | Financial product recommendation method and system based on investment risk bearing capacity assessment | |
CN113918817B (en) | Push model construction method, push model construction device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||