CN112785415A - Scoring card model construction method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN112785415A
Authority
CN
China
Prior art keywords
node
model
abnormal
gbdt model
gbdt
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110078425.4A
Other languages
Chinese (zh)
Other versions
CN112785415B (en)
Inventor
陈希蔓
陈婷
吴三平
庄伟亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a method, a device, equipment and a computer readable storage medium for constructing a scoring card model. The method comprises the following steps: acquiring credit behavior data of a client and using the credit behavior data as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among them; and retraining the GBDT model based on the abnormal nodes, and obtaining the corresponding scoring card model after the GBDT model is trained. By controlling the internal structure of the GBDT model, the invention optimizes the GBDT model so that the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.

Description

Scoring card model construction method, device, equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of financial technology (Fintech), in particular to a method, a device and equipment for constructing a scoring card model and a computer readable storage medium.
Background
With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are being applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech). The security and real-time requirements of the financial industry, however, also place higher demands on these technologies.
Existing scoring card models are typically built either as a general scoring card model or as a GBDT model. The general scoring card model for financial scenarios is constructed by applying variable binning, WOE (weight of evidence) conversion and logistic regression fitting in sequence. Every stage of this process can be adjusted manually, which prevents the model from being fitted in the wrong direction. The GBDT model, by contrast, is a black-box model: given input variables and target labels, it is trained directly in the direction that minimizes the fitting error. Because the data source may contain noise, the GBDT model may fit that noise and become inaccurate; when such a model is used in a financial scenario, it can cause misjudgments and is likely to lead to capital loss.
In short, the general scoring card model is a simple, interpretable model, but its prediction performance is poor; the GBDT model is an ensemble tree model that predicts well but is not interpretable.
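For context, the WOE conversion step of the general scoring card pipeline can be sketched as follows. This is only an illustrative sketch and not part of the disclosed method: the function name `woe_per_bin`, the label convention (1 = positive sample, 0 = negative sample) and the sign convention ln(negative share / positive share) are assumptions, and the sketch assumes every bin contains at least one sample of each class.

```python
import math

def woe_per_bin(bin_ids, labels):
    """Weight of evidence per bin (hypothetical helper, illustrative only).

    labels: 1 = positive sample, 0 = negative sample (assumed convention).
    Assumes each bin holds at least one positive and one negative sample.
    """
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    counts = {}
    for b, y in zip(bin_ids, labels):
        pos, neg = counts.get(b, (0, 0))
        counts[b] = (pos + y, neg + (1 - y))
    # WOE = ln(share of negatives in the bin / share of positives in the bin)
    return {
        b: math.log((neg / total_neg) / (pos / total_pos))
        for b, (pos, neg) in counts.items()
    }
```

A bin whose positive share exceeds its negative share receives a negative WOE under this convention; other references use the opposite sign.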
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a scoring card model construction method, a scoring card model construction device, scoring card model construction equipment and a computer readable storage medium, and aims to solve the technical problem that a GBDT model is poor in interpretability.
In order to achieve the above object, the present invention provides a score card model construction method, including the steps of:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among them;
and retraining the GBDT model based on the abnormal nodes, and obtaining the corresponding scoring card model after the GBDT model is trained.
Optionally, the step of determining pending abnormal nodes among the non-leaf nodes of the decision tree when each decision tree in the GBDT model is trained includes:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining the pending abnormal nodes among the non-leaf nodes based on the positive sample proportion and the univariate trend.
Optionally, the step of determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend comprises:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Optionally, the step of determining a pending abnormal node in the non-leaf node based on the node trend of the GBDT model on the non-leaf node and the univariate trend comprises:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking each non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
Optionally, the step of determining the node trend of the GBDT model on the non-leaf node based on the first and second positive sample fractions comprises:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Optionally, the step of retraining the GBDT model based on the abnormal node, and obtaining a corresponding score card model after the GBDT model is trained, includes:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, stopping the traversal of the abnormal node and of its descendant nodes;
when the leaf node corresponding to the abnormal node is reached, correcting the residual error output by that leaf node to the positive sample proportion of that leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Optionally, the step of retraining each decision tree in the GBDT model based on the structure file comprises:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
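The coefficient optimization described above can be illustrated with a small sketch in which each decision tree's output is treated as one input feature of a logistic regression, and the fitted weights play the role of the decision tree coefficients. The source refers to a pre-trained logistic regression model; as a simplifying assumption, this sketch fits one directly by batch gradient descent. The function name `fit_tree_coefficients` and the parameters `lr` and `epochs` are hypothetical.

```python
import math

def fit_tree_coefficients(tree_outputs, labels, lr=0.5, epochs=200):
    """Fit one logistic-regression weight per tree (illustrative sketch).

    tree_outputs: list of rows; row[j] is the output of tree j for one sample.
    labels: 1 = positive sample, 0 = negative sample (assumed convention).
    Returns (weights, bias).
    """
    n_trees = len(tree_outputs[0])
    n = len(labels)
    w = [0.0] * n_trees
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_trees
        grad_b = 0.0
        for row, y in zip(tree_outputs, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
            err = p - y                      # gradient of log loss w.r.t. z
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Because the final score is a weighted sum of per-tree outputs, each tree's contribution to a client's score can be read off directly, which is one way to understand how the re-weighting supports interpretability.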
In addition, in order to achieve the above object, the present invention further provides a score card model construction device, including:
a first training module, used for acquiring credit behavior data of a client and using the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, used for determining pending abnormal nodes among the non-leaf nodes of the decision tree when each decision tree in the GBDT model is trained, verifying the pending abnormal nodes, and determining the abnormal nodes among them;
and a second training module, used for retraining the GBDT model based on the abnormal nodes and obtaining the corresponding scoring card model after the GBDT model is trained.
In addition, in order to achieve the above object, the present invention also provides a score card model building apparatus, including: the system comprises a memory, a processor and a scoring card model building program which is stored on the memory and can run on the processor, wherein the scoring card model building program realizes the steps of the scoring card model building method when being executed by the processor.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a scorecard model construction program which, when executed by a processor, implements the steps of the scorecard model construction method as described above.
According to the invention, credit behavior data of a client are acquired and used as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree are determined, the pending abnormal nodes are verified, and the abnormal nodes among them are determined; the GBDT model is then retrained based on the abnormal nodes, and the corresponding scoring card model is obtained after the GBDT model is trained. The method thus intervenes in the GBDT model: abnormal nodes in the decision trees are identified during the first training, and the GBDT model is trained again once they are known. Controlling the internal structure of the GBDT model in this way optimizes it, so that the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.
Drawings
FIG. 1 is a schematic structural diagram of a scoring card model building device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a scoring card model construction method according to the present invention;
FIG. 3 is a schematic flow chart of a second embodiment of the scoring card model construction method according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic structural diagram of a scoring card model building device of a hardware operating environment according to an embodiment of the present invention.
The scoring card model construction equipment in the embodiment of the invention can be a PC (personal computer), and can also be a mobile terminal equipment with a display function, such as a smart phone, a tablet computer, an electronic book reader, a portable computer and the like.
As shown in FIG. 1, the scoring card model building apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
It will be understood by those skilled in the art that the configuration shown in FIG. 1 does not limit the scoring card model building apparatus, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a scoring card model building program.
In the scoring card model building apparatus shown in FIG. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting to a client (user side) and performing data communication with the client; and the processor 1001 may be used to call the scoring card model building program stored in the memory 1005.
In this embodiment, the score card model building apparatus includes: a memory 1005, a processor 1001 and a scorecard model building program stored on the memory 1005 and operable on the processor 1001, wherein when the processor 1001 calls the scorecard model building program stored in the memory 1005, the following operations are performed:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among them;
and retraining the GBDT model based on the abnormal nodes, and obtaining the corresponding scoring card model after the GBDT model is trained.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining the pending abnormal nodes among the non-leaf nodes based on the positive sample proportion and the univariate trend.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking each non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, stopping the traversal of the abnormal node and of its descendant nodes;
when the leaf node corresponding to the abnormal node is reached, correcting the residual error output by that leaf node to the positive sample proportion of that leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Further, the processor 1001 may call the scorecard model builder stored in the memory 1005, and further perform the following operations:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
The invention also provides a scoring card model construction method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the scoring card model construction method of the invention.
In this embodiment, the method for constructing a score card model includes the following steps:
step S10, obtaining credit behavior data of a client, and using the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
The scoring card model provided by the invention is applied at a lending institution and is an interpretable, GBDT-based scoring card model for financial scenarios. By intervening in the GBDT model to control its internal structure, the invention retains the strong predictive performance of the GBDT model while ensuring interpretability, thereby solving the technical problem that the GBDT model is poorly interpretable and ensuring that the scoring card model trained on the basis of the GBDT model is both effective and highly interpretable. The GBDT (Gradient Boosting Decision Tree) model is a decision tree model trained with a gradient boosting strategy.
In this embodiment, the credit behavior data of the client comprises the client's credit history record and service performance record. The credit history record is the client's personal credit investigation record held at the People's Bank; the service performance record is the record of the client's behavior in loan transactions at the lending institution, including the loan amount, the borrowing time, the repayment time and the like. In the process of constructing the scoring card model, the credit behavior data of the client is first acquired and used as the training data for the GBDT model; the training data is then input into the GBDT model to train it. The GBDT model is a decision tree model trained with a gradient boosting strategy and comprises a plurality of decision trees.
Step S20, when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among them;
In this embodiment, when each decision tree in the GBDT model is trained, the nodes of every decision tree in the GBDT model are traversed to identify the non-leaf nodes of each tree. The decision results of the samples at each non-leaf node are then processed to determine which non-leaf nodes are pending abnormal nodes. The nodes flagged at this stage are only candidates; they are analyzed further to determine which of them are actually abnormal. That is, once the pending abnormal nodes have been determined for every decision tree in the GBDT model, they are verified so that the abnormal nodes among them can be determined.
Step S30, retraining the GBDT model based on the abnormal nodes, and obtaining the corresponding scoring card model after the GBDT model is trained.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model have been determined, the credit behavior data of the client is acquired and used as training data to retrain the GBDT model. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, the traversal of that node and of its descendant nodes is stopped; when the leaf node corresponding to the abnormal node is reached, the residual error output by that leaf node is replaced with the positive sample proportion of the leaf node, so that the scoring card model conforms more closely to expert experience.
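The pruning and correction described in this step can be sketched on a toy tree representation. This is one possible reading, offered under stated assumptions rather than as the patented implementation: the tree is a nested dictionary, each split node carries a hypothetical `pos_frac` field holding the positive sample proportion of the samples reaching it, and an abnormal node is collapsed into a leaf whose output is that proportion instead of a residual. The function name `prune_abnormal` and all field names are hypothetical.

```python
def prune_abnormal(node, abnormal_ids):
    """Return a pruned copy of a tree (illustrative sketch).

    node: dict; a leaf has a "value" key, a split node has "left" and "right"
          children plus a hypothetical "pos_frac" field.
    abnormal_ids: set of ids of nodes verified as abnormal.
    """
    if "value" in node:
        # Ordinary leaf: keep its output unchanged.
        return dict(node)
    if node["id"] in abnormal_ids:
        # Stop descending: the abnormal node and its descendants are dropped,
        # and the node becomes a leaf that outputs the positive sample proportion.
        return {"id": node["id"], "value": node["pos_frac"]}
    pruned = {k: v for k, v in node.items() if k not in ("left", "right")}
    pruned["left"] = prune_abnormal(node["left"], abnormal_ids)
    pruned["right"] = prune_abnormal(node["right"], abnormal_ids)
    return pruned
```

The returned dictionary could then be serialized as the structure file of the pruned and corrected tree; the original tree is left unmodified so that the two versions can be compared.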
In the scoring card model construction method provided by this embodiment, credit behavior data of a client are acquired and used as training data to train a GBDT model, wherein the GBDT model comprises a plurality of decision trees; when each decision tree in the GBDT model is trained, pending abnormal nodes among the non-leaf nodes of the decision tree are determined, the pending abnormal nodes are verified, and the abnormal nodes among them are determined; the GBDT model is then retrained based on the abnormal nodes, and the corresponding scoring card model is obtained after the GBDT model is trained. The method thus intervenes in the GBDT model: abnormal nodes in the decision trees are identified during the first training, and the GBDT model is trained again once they are known. Controlling the internal structure of the GBDT model in this way optimizes it, so that the scoring card model retains the strong predictive performance of the GBDT model while remaining interpretable.
Based on the first embodiment, a second embodiment of the scoring card model construction method of the present invention is provided, and referring to fig. 3, in this embodiment, step S20 includes:
step S21, when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of the non-leaf node in the decision tree and determining the univariate trend corresponding to the non-leaf node;
and step S22, determining the pending abnormal nodes among the non-leaf nodes based on the positive sample proportion and the univariate trend.
In this embodiment, when each decision tree in the GBDT model is trained, the nodes of every decision tree in the GBDT model are traversed to identify the non-leaf nodes of each tree. Specifically, the decision results of the samples at each non-leaf node are processed to determine the positive sample proportion of each branch of the non-leaf node, and the univariate trend corresponding to the non-leaf node is then determined based on expert experience. The positive sample proportions are compared against the univariate trend to determine the pending abnormal nodes among the non-leaf nodes. The nodes flagged at this stage are only candidates; they are analyzed further to determine which of them are actually abnormal.
Further, the step of determining a pending abnormal node in the non-leaf nodes based on the positive sample fraction and the univariate trend comprises:
step S221, determining a first positive sample proportion of a left branch and a second positive sample proportion of a right branch of the non-leaf node;
step S222, determining the node trend of the GBDT model on the non-leaf nodes based on the first positive sample proportion and the second positive sample proportion;
step S223, determining the pending abnormal nodes among the non-leaf nodes based on the node trend of the GBDT model on the non-leaf node and the univariate trend.
In this embodiment, the positive sample fraction of each branch of the non-leaf node includes a first positive sample fraction of a left branch of the non-leaf node and a second positive sample fraction of a right branch of the non-leaf node. When each decision tree in the GBDT model is trained, the decision results of the samples in the non-leaf nodes are processed, and the first positive sample proportion of the left branch and the second positive sample proportion of the right branch of the non-leaf nodes in the decision trees are determined.
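As a minimal sketch of how the two positive sample fractions of a non-leaf node might be computed, the following assumes that samples with a feature value less than or equal to the split threshold go to the left branch, that label 1 marks a positive sample, and that an empty branch is given a fraction of 0.0. None of these conventions is stated in the source, and the function name `branch_positive_fractions` is hypothetical.

```python
def branch_positive_fractions(values, labels, threshold):
    """Return (left_fraction, right_fraction) for one split (illustrative sketch).

    values: feature values of the samples reaching the node.
    labels: 1 = positive sample, 0 = negative sample (assumed convention).
    threshold: split point; value <= threshold goes left (assumed convention).
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]

    def fraction(branch_labels):
        # Share of positive samples; 0.0 for an empty branch (assumption).
        return sum(branch_labels) / len(branch_labels) if branch_labels else 0.0

    return fraction(left), fraction(right)
```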
After the positive sample proportion of each branch of the non-leaf node has been determined, the first positive sample proportion is compared with the second positive sample proportion to determine the node trend on the non-leaf node, and the univariate trend corresponding to the non-leaf node is determined based on expert experience. The node trend is then compared with the univariate trend to determine the pending abnormal nodes among the non-leaf nodes. The nodes flagged at this stage are only candidates; they are analyzed further to determine which of them are actually abnormal.
Further, the step of determining the pending abnormal node in the non-leaf node based on the node trend of the GBDT model on the non-leaf node and the univariate trend comprises:
step S2231, detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
step S2232, taking each non-leaf node whose node trend is inconsistent with the univariate trend as a pending abnormal node among the non-leaf nodes.
In this embodiment, for each non-leaf node in the decision tree, once the node trend on the non-leaf node and the univariate trend corresponding to the non-leaf node have been determined, the two are compared: it is detected whether the node trend of the GBDT model on the non-leaf node is consistent with the univariate trend given by expert experience. Specifically, if the node trend corresponding to the non-leaf node is consistent with the corresponding univariate trend, the non-leaf node is a normal node; if the node trend is inconsistent with the corresponding univariate trend, the non-leaf node is a pending abnormal node. The non-leaf nodes whose node trend is inconsistent with the univariate trend are thus taken as the pending abnormal nodes, which detects the pending abnormal nodes among the non-leaf nodes of the decision trees of the GBDT model.
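The consistency check of steps S2231 and S2232 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `NonLeafNode` record, the trend labels `"positive"` and `"negative"`, the `expert_trends` mapping, and the feature names are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NonLeafNode:
    node_id: int      # hypothetical identifier of the node within its tree
    feature: str      # variable the node splits on (hypothetical names below)
    node_trend: str   # "positive" or "negative", as observed in the GBDT tree


def pending_abnormal_nodes(nodes: List[NonLeafNode],
                           expert_trends: Dict[str, str]) -> List[int]:
    """Return ids of nodes whose trend contradicts the expert univariate trend."""
    pending = []
    for node in nodes:
        expected = expert_trends.get(node.feature)
        # Assumption: a feature without an expert trend is never flagged.
        if expected is not None and node.node_trend != expected:
            pending.append(node.node_id)
    return pending


nodes = [
    NonLeafNode(0, "overdue_count", "positive"),
    NonLeafNode(1, "monthly_income", "positive"),
    NonLeafNode(2, "overdue_count", "negative"),
]
expert_trends = {"overdue_count": "positive", "monthly_income": "negative"}
print(pending_abnormal_nodes(nodes, expert_trends))  # [1, 2]
```

Node 0 agrees with the expert trend and is treated as normal; nodes 1 and 2 disagree and are returned as pending abnormal nodes for the later verification step.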
Further, the step of determining the node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions comprises:
step S2221, comparing the first positive sample proportion and the second positive sample proportion;
step S2222, if the first positive sample proportion is smaller than the second positive sample proportion, the node trend of the GBDT model on the non-leaf nodes is positive;
step S2223, if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf node is negative.
In this embodiment, after the positive sample proportion of each branch of the non-leaf node is determined, the first positive sample proportion (left branch) is compared with the second positive sample proportion (right branch) to determine the node trend on the non-leaf node. If the first positive sample proportion is smaller than the second positive sample proportion, the node trend of the GBDT model on the non-leaf node is positive; if the first positive sample proportion is greater than or equal to the second positive sample proportion, the node trend of the GBDT model on the non-leaf node is negative.
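The comparison rule of steps S2221 to S2223 is small enough to state directly in code. The sketch below is only illustrative; the function name `node_trend` and the string labels are assumptions made for this example.

```python
def node_trend(first_proportion: float, second_proportion: float) -> str:
    """Node trend from the left (first) and right (second) positive sample proportions.

    The trend is positive when the left proportion is strictly smaller than the
    right one; a tie or a larger left proportion yields a negative trend.
    """
    return "positive" if first_proportion < second_proportion else "negative"


print(node_trend(0.05, 0.15))  # positive
print(node_trend(0.20, 0.20))  # negative (ties count as negative)
```

Note that, as described, equal proportions are classified as a negative trend rather than as a separate neutral case.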
Further, the step of determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node comprises:
step S2211, obtaining a first positive sample number of the left branch of the non-leaf node, a second positive sample number of the right branch of the non-leaf node, and a sample number on the non-leaf node;
step S2212, determining a first positive sample proportion of the left branch of the non-leaf node based on the first positive sample number and the sample number;
step S2213, determining a second positive sample fraction of the right branch of the non-leaf node based on the second positive sample number and the sample number.
In this embodiment, the positive sample proportion of each branch of the non-leaf node includes a first positive sample proportion of the left branch and a second positive sample proportion of the right branch. These are determined from the decision results of the samples at the non-leaf nodes when each decision tree in the GBDT model is trained. Specifically, for each decision tree in the GBDT model, the first positive sample number of the left branch of a non-leaf node, the second positive sample number of the right branch of the non-leaf node, and the sample number on the non-leaf node are obtained. The first positive sample proportion of the left branch is then calculated as the ratio of the first positive sample number to the total sample number on the non-leaf node, and the second positive sample proportion of the right branch is calculated as the ratio of the second positive sample number to the total sample number on the non-leaf node.
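The calculation of steps S2211 to S2213 can be sketched as below. The function and parameter names are hypothetical. Following the wording of the embodiment, both proportions use the total sample number on the non-leaf node as the denominator, not the number of samples in each branch.

```python
from typing import Tuple


def branch_positive_proportions(first_positive: int,
                                second_positive: int,
                                node_samples: int) -> Tuple[float, float]:
    """Return (left, right) positive sample proportions for one non-leaf node.

    first_positive:  positive samples routed to the left branch
    second_positive: positive samples routed to the right branch
    node_samples:    all samples on the non-leaf node (shared denominator)
    """
    if node_samples <= 0:
        raise ValueError("node_samples must be positive")
    return first_positive / node_samples, second_positive / node_samples


left, right = branch_positive_proportions(10, 30, 200)
print(left, right)  # 0.05 0.15
```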
Further, the step of retraining the GBDT model based on the abnormal node and obtaining a corresponding score card model after the GBDT model is trained includes:
step S31, retraining each decision tree in the GBDT model based on the training data;
step S32, when each decision tree of the GBDT model is trained, if the abnormal node in the decision tree is traversed, the traversing of the abnormal node and the descendant node corresponding to the abnormal node is stopped;
step S33, when the leaf node corresponding to the abnormal node is reached, replacing the residual output by that leaf node with the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and step S34, retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model are determined, the credit behavior data of the client is obtained and used as training data to retrain the GBDT model. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, traversal of the abnormal node and of its descendant nodes is stopped, which prunes the branch on which the abnormal node is located. When the leaf node corresponding to the abnormal node is reached, the residual output by that leaf node is replaced with the positive sample proportion of the leaf node, which corrects the node result of the branch on which the abnormal node is located and brings the scoring card model more in line with expert experience. After the branch on which the abnormal node is located has been pruned and corrected, a structure file of the pruned and corrected decision tree is generated, and the GBDT model is trained again based on this structure file to construct the scoring card model.
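One possible reading of steps S32 and S33 is sketched below. It assumes that pruning turns the abnormal node itself into a leaf whose output is that node's positive sample proportion; the embodiment does not spell out the exact data layout, so the `Node` class, the dictionary keys, and the use of JSON as the structure file format are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    node_id: int
    positive_proportion: float          # share of positive samples at this node
    value: Optional[float] = None       # leaf output (residual); None for non-leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prune_and_correct(node: Node, abnormal_ids: Set[int]) -> dict:
    """Traverse a binary tree and return its pruned, corrected structure."""
    if node.node_id in abnormal_ids:
        # Stop here: descendants are dropped and the node becomes a leaf
        # that outputs its positive sample proportion instead of a residual.
        return {"id": node.node_id, "leaf": True, "value": node.positive_proportion}
    if node.left is None and node.right is None:
        return {"id": node.node_id, "leaf": True, "value": node.value}
    return {
        "id": node.node_id,
        "leaf": False,
        "left": prune_and_correct(node.left, abnormal_ids),
        "right": prune_and_correct(node.right, abnormal_ids),
    }


tree = Node(0, 0.10,
            left=Node(1, 0.04, value=-0.3),
            right=Node(2, 0.25,
                       left=Node(3, 0.40, value=0.6),
                       right=Node(4, 0.20, value=0.1)))
structure = prune_and_correct(tree, abnormal_ids={2})
structure_file = json.dumps(structure, indent=2)  # hypothetical structure file content
print(structure_file)
```

In this example node 2 is abnormal, so nodes 3 and 4 disappear from the structure and node 2 is emitted as a leaf with value 0.25.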
Further, the step of retraining each decision tree in the GBDT model based on the structure file and determining the score card model includes:
step S341, retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and step S342, optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
In this embodiment, after the abnormal nodes in the decision trees of the GBDT model are determined, the credit behavior data of the client is obtained and used as training data to retrain the decision trees described by the structure file, and the output result of each decision tree in the GBDT model is obtained. When each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is reached during traversal, traversal of the abnormal node and of its descendant nodes is stopped; when the leaf node corresponding to the abnormal node is reached, the output result of the GBDT model for that leaf node is replaced with the positive sample proportion of the leaf node, so that the scoring card model is more in line with expert experience. After the output results of the leaf nodes corresponding to the abnormal nodes have been corrected, the decision tree coefficients of the GBDT model are further optimized based on the corrected output results and a pre-trained logistic regression model, and the scoring card model is obtained once the GBDT model has been optimized.
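The coefficient optimization of step S342 is described only at a high level. One way to read it is that each tree's output is treated as an input feature and a logistic regression assigns one coefficient per tree. The sketch below fits such coefficients with plain gradient descent; it is an illustrative assumption rather than the procedure of the embodiment, which refers to a pre-trained logistic regression model. All names, the toy data, the learning rate, and the epoch count are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(rows: List[List[float]], labels: List[int],
             weights: List[float], bias: float) -> float:
    """Mean negative log-likelihood of the labels under the logistic model."""
    total = 0.0
    for x, y in zip(rows, labels):
        p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(labels)


def fit_tree_coefficients(rows: List[List[float]], labels: List[int],
                          lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit one coefficient per tree output (plus a bias) by gradient descent."""
    n, k = len(rows), len(rows[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            for j in range(k):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Each row holds the (corrected) outputs of two trees for one sample.
tree_outputs = [[0.1, 0.2], [0.8, 0.7], [0.2, 0.1], [0.9, 0.6]]
labels = [0, 1, 0, 1]
weights, bias = fit_tree_coefficients(tree_outputs, labels)
print(weights, bias, log_loss(tree_outputs, labels, weights, bias))
```

With zero initial coefficients every sample is predicted at 0.5, giving a loss of ln 2; the fitted coefficients should bring the loss below that baseline on this toy data.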
In addition, an embodiment of the present invention further provides a score card model construction device, where the score card model construction device includes:
a first training module, configured to acquire credit behavior data of a client and use the credit behavior data as training data so as to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, configured to determine pending abnormal nodes among the non-leaf nodes of a decision tree when each decision tree in the GBDT model is trained, to verify the pending abnormal nodes, and to determine the abnormal nodes among the pending abnormal nodes;
and a second training module, configured to retrain the GBDT model based on the abnormal nodes and to obtain a corresponding scoring card model after the GBDT model is trained.
Further, the determining module is further configured to:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node among the non-leaf nodes based on the positive sample proportion and the univariate trend.
Further, the determining module is further configured to:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
Further, the determining module is further configured to:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking the non-leaf node whose node trend is inconsistent with the univariate trend as the pending abnormal node among the non-leaf nodes.
Further, the determining module is further configured to:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
Further, the second training module is further configured to:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is traversed, the traversal of the abnormal node and a descendant node corresponding to the abnormal node is stopped;
when the leaf node corresponding to the abnormal node is reached, replacing the residual output by that leaf node with the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
Further, the second training module is further configured to:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, in which a scorecard model construction program is stored, and when executed by a processor, the scorecard model construction program implements the steps of the scorecard model construction method according to any one of the above.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above scoring card model construction method, and will not be described in detail herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A scoring card model construction method is characterized by comprising the following steps:
acquiring credit behavior data of a client, and taking the credit behavior data as training data to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
when each decision tree in the GBDT model is trained, determining pending abnormal nodes among the non-leaf nodes of the decision tree, verifying the pending abnormal nodes, and determining the abnormal nodes among the pending abnormal nodes;
and retraining the GBDT model based on the abnormal nodes, and obtaining a corresponding scoring card model after the GBDT model is trained.
2. The method of claim 1, wherein the step of determining a pending anomaly node among non-leaf nodes of the decision tree when training each of the decision trees in the GBDT model comprises:
when each decision tree in the GBDT model is trained, determining the positive sample proportion of each branch of a non-leaf node in the decision tree, and determining a univariate trend corresponding to the non-leaf node;
and determining a pending abnormal node among the non-leaf nodes based on the positive sample proportion and the univariate trend.
3. The method of claim 2, wherein the step of determining the pending anomaly node in the non-leaf nodes based on the positive sample fraction and the univariate trend comprises:
determining a first positive sample fraction of a left branch and a second positive sample fraction of a right branch of the non-leaf node;
determining a node trend of the GBDT model on the non-leaf nodes based on the first and second positive sample fractions;
determining a pending abnormal node in the non-leaf nodes based on the node trend of the GBDT model on the non-leaf nodes and the univariate trend.
4. The method of constructing a scorecard model according to claim 3, wherein said step of determining a pending anomaly node in said non-leaf nodes based on a node trend of said GBDT model on said non-leaf nodes and said univariate trend comprises:
detecting whether the node trend of the GBDT model on the non-leaf nodes is consistent with the univariate trend;
and taking the non-leaf node whose node trend is inconsistent with the univariate trend as the pending abnormal node among the non-leaf nodes.
5. The method of constructing a scorecard model according to claim 3, wherein said step of determining a node trend of said GBDT model on said non-leaf nodes based on said first and second positive sample fractions comprises:
comparing the first positive sample fraction to a second positive sample fraction;
if the first positive sample fraction is less than the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is positive;
if the first positive sample fraction is greater than or equal to the second positive sample fraction, the node trend of the GBDT model on the non-leaf nodes is negative.
6. A scoring card model construction method according to any one of claims 1 to 5, wherein said step of retraining said GBDT model based on said abnormal nodes and obtaining a corresponding scoring card model after training said GBDT model comprises:
retraining each decision tree in the GBDT model based on the training data;
when each decision tree of the GBDT model is trained, if an abnormal node in the decision tree is traversed, the traversal of the abnormal node and a descendant node corresponding to the abnormal node is stopped;
when the leaf node corresponding to the abnormal node is reached, replacing the residual output by that leaf node with the positive sample proportion of the leaf node, and, after every node in the decision tree has been traversed, generating a structure file of the pruned and corrected decision tree;
and retraining each decision tree in the GBDT model based on the structure file to construct a scoring card model.
7. The method according to claim 6, wherein the step of retraining each decision tree in the GBDT model based on the structure file comprises:
retraining the GBDT model based on the structure file, and determining an output result of each decision tree in the GBDT model;
and optimizing the decision tree coefficient of the GBDT model based on the output result of the decision tree and a pre-trained logistic regression model, and determining a scoring card model after the GBDT model is optimized.
8. A score card model construction device, characterized by comprising:
a first training module, configured to acquire credit behavior data of a client and use the credit behavior data as training data so as to train a GBDT model based on the training data, wherein the GBDT model comprises a plurality of decision trees;
a determining module, configured to determine pending abnormal nodes among the non-leaf nodes of a decision tree when each decision tree in the GBDT model is trained, to verify the pending abnormal nodes, and to determine the abnormal nodes among the pending abnormal nodes;
and a second training module, configured to retrain the GBDT model based on the abnormal nodes and to obtain a corresponding scoring card model after the GBDT model is trained.
9. A score card model building apparatus, characterized by comprising: memory, a processor and a scorecard model builder stored on the memory and executable on the processor, the scorecard model builder, when executed by the processor, implementing the steps of the scorecard model building method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a scorecard model construction program is stored on the computer-readable storage medium, which when executed by a processor implements the steps of the scorecard model construction method according to any one of claims 1 to 7.
CN202110078425.4A 2021-01-20 2021-01-20 Method, device and equipment for constructing scoring card model and computer readable storage medium Active CN112785415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110078425.4A CN112785415B (en) 2021-01-20 2021-01-20 Method, device and equipment for constructing scoring card model and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110078425.4A CN112785415B (en) 2021-01-20 2021-01-20 Method, device and equipment for constructing scoring card model and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112785415A true CN112785415A (en) 2021-05-11
CN112785415B CN112785415B (en) 2024-01-12

Family

ID=75758025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110078425.4A Active CN112785415B (en) 2021-01-20 2021-01-20 Method, device and equipment for constructing scoring card model and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112785415B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188543A1 (en) * 2017-04-14 2018-10-18 腾讯科技(深圳)有限公司 Real-time credit score adjustment processing method and device and processing server
WO2019061187A1 (en) * 2017-09-28 2019-04-04 深圳乐信软件技术有限公司 Credit evaluation method and apparatus and gradient boosting decision tree parameter adjustment method and apparatus
CN109587000A (en) * 2018-11-14 2019-04-05 上海交通大学 High latency method for detecting abnormality and system based on collective intelligence network measurement data
WO2019080407A1 (en) * 2017-10-25 2019-05-02 深圳壹账通智能科技有限公司 Credit evaluation method, apparatus and device, and computer readable storage medium
CN110796485A (en) * 2019-10-11 2020-02-14 上海上湖信息技术有限公司 Method and device for improving prediction precision of prediction model
CN111311400A (en) * 2020-03-30 2020-06-19 百维金科(上海)信息科技有限公司 Modeling method and system of grading card model based on GBDT algorithm
CN111382911A (en) * 2020-03-20 2020-07-07 达而观信息科技(上海)有限公司 High-cabinet personnel scheduling prediction method based on bank outlet business data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙权;赵金涛;: "基于数据挖掘的商户风险评分方法和系统", 软件产业与工程, no. 01, pages 33 - 37 *

Also Published As

Publication number Publication date
CN112785415B (en) 2024-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant