CN111105305A - Machine learning-based accounts receivable redemption risk control method and system - Google Patents
- Publication number
- CN111105305A (application number CN201911244056.0A)
- Authority
- CN
- China
- Prior art keywords
- random forest
- forest model
- receivable
- output accuracy
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Artificial Intelligence (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Technology Law (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention discloses a machine learning-based accounts receivable redemption risk control method and system. The method comprises the following steps: establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model; training the random forest model on the acquired verification data to obtain its output accuracy on that data; optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node; and classifying accounts receivable with the optimized random forest model to predict whether the debtor or the initial creditor will default. The invention reduces the redemption risk of accounts receivable at maturity and improves the sustainability of accounts receivable circulation.
Description
Technical Field
The invention relates to the technical field of financial risk management, and in particular to a machine learning-based accounts receivable redemption risk control method and system.
Background
The most important risk of accounts receivable is credit risk: if the debtor cannot pay, the creditor cannot receive the funds on the due date. Effectively identifying the redemption risk of the debtor who is committed to paying, or of the initial creditor who is committed to repurchasing, and taking targeted risk control measures to prevent redemption risk when accounts receivable fall due, is an important part of ensuring the sustained, healthy development of accounts receivable transfer business. Traditional risk control based on due diligence, expert scoring and other forms of manual investigation and judgment struggles to meet the requirements of rapid, efficient and low-cost business development.
Disclosure of Invention
Based on the technical problems in the background art, the invention provides a machine learning-based accounts receivable redemption risk control method and system, which reduce the redemption risk of accounts receivable at maturity and improve the sustainability of accounts receivable circulation.
The invention provides a machine learning-based accounts receivable redemption risk control method, comprising the following steps:
establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
classifying accounts receivable with the optimized random forest model to predict whether the debtor or the initial creditor will default.
Further, establishing a decision tree by the recursive method comprises:
S11: traversing, in turn, each possible value a of each feature A in the current data set, and calculating the Gini index of each split point (A, a);
S12: selecting the split point with the minimum Gini index as the optimal split point, and using it to divide the current data set D into two subsets D1 and D2, where D1 is the set of samples in the current data set that satisfy A = a and D2 is the set of samples that do not satisfy A = a;
S13: repeating steps S11 and S12 on each of D1 and D2 until the Gini index of the sample set is smaller than a preset threshold;
S14: generating the decision tree according to the Gini index minimization criterion.
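Further, in the step of calculating the Gini index of each split point (A, a), M classes are preset. If the probability that a sample in the current data set belongs to the k-th of the M classes is p_k, the Gini index of the probability distribution is:
$$\mathrm{Gini}(p)=\sum_{k=1}^{M}p_k(1-p_k)=1-\sum_{k=1}^{M}p_k^2$$
In a binary decision tree (M = 2), where p is the probability of the first class, the Gini index of the distribution is:
$$\mathrm{Gini}(p)=2p(1-p)$$
For a sample set D, the Gini index is:
$$\mathrm{Gini}(D)=1-\sum_{k=1}^{M}\left(\frac{|C_k|}{|D|}\right)^2$$
where C_k is the subset of samples in D that belong to the k-th class; the same formula gives Gini(D1) and Gini(D2) for the two subsets.
The Gini index Gini(D, A) of the current data set D under the condition A = a is:
$$\mathrm{Gini}(D,A)=\frac{|D_1|}{|D|}\mathrm{Gini}(D_1)+\frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$
where Gini(D1) is the Gini index of subset D1 and Gini(D2) is the Gini index of subset D2.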
Further, training the random forest model on the acquired verification data to obtain its output accuracy on the verification data comprises:
S21: acquiring verification data, the verification data comprising training set data and verification set data;
S22: training the classifier of the random forest model with the training set data;
S23: predicting the verification set data with the trained classifier to obtain the output accuracy E1 of the random forest model on the verification data;
S24: repeating steps S21 to S23 to obtain the output accuracies E2, E3, ..., EN of N-1 further random forest models on the verification data;
S25: weighting the N output accuracies E1, E2, ..., EN to obtain their average, and using this average as the final output accuracy of the random forest model.
Further, obtaining the output accuracy of the random forest model on the verification data in steps S23 and S24 comprises:
outputting, through the random forest model, the performance redemption risk value of the debtor or initial creditor corresponding to a matured account receivable;
acquiring the actual performance value of the debtor or initial creditor corresponding to that matured account receivable;
and obtaining the output accuracy of the random forest model from the deviation between the performance redemption risk value and the actual performance value.
Further, in the step of classifying accounts receivable with the optimized random forest model to obtain the performance redemption risk value of accounts receivable at maturity, when the performance redemption risk value is smaller than a preset threshold, credit enhancement measures are added to improve redemption of the accounts receivable at maturity.
Further, in optimizing the calling parameters of the random forest model, a two-level nested loop traverses arrays of values for the number of subtrees and the minimum number of samples per leaf node to obtain a plurality of output accuracies, and the parameter group with the highest output accuracy is selected as the optimal calling parameters of the random forest model.
Further, when selecting the optimized random forest model, i.e. one whose output accuracy is greater than the preset accuracy, to classify accounts receivable, a random forest model with an output accuracy above 90% is selected.
A machine learning-based accounts receivable redemption risk control system comprises a construction module, an accuracy module, a parameter optimization module and a prediction output module;
the construction module is used for establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
the accuracy module is used for training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
the parameter optimization module is used for optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than the preset accuracy as the optimized prediction model, the calling parameters comprising the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
and the prediction output module uses the optimized random forest model obtained by the parameter optimization module to classify accounts receivable and predict whether the debtor or the initial creditor will default.
A computer-readable storage medium stores a classification program that, when invoked by a processor, performs the following steps:
establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
and classifying accounts receivable with the optimized random forest model to predict whether the debtor or the initial creditor will default.
The machine learning-based accounts receivable redemption risk control method and system provided by the invention have the following advantages: the redemption risk at maturity of the debtor of an account receivable, or of the initial creditor who commits to repurchase at maturity, is identified quickly, and the performance redemption risk value of the debtor or initial creditor is obtained directly. From the predicted value, the redemption status of accounts receivable at maturity can be known in advance; by predicting early and adding credit enhancement measures in advance, the redemption risk of accounts receivable at maturity is reduced and the sustainability of accounts receivable transfer is improved.
Drawings
FIG. 1 is a schematic diagram of the steps of the machine learning-based accounts receivable redemption risk control method according to the present invention;
FIG. 2 is a schematic flow chart of the machine learning-based accounts receivable redemption risk control system according to the present invention.
Reference numerals: 1 - construction module; 2 - accuracy module; 3 - parameter optimization module; 4 - prediction output module.
Detailed Description
The present invention is described in detail below with reference to specific embodiments; numerous specific details are set forth in the following description to provide a thorough understanding of the invention. The invention may, however, be embodied in many forms other than those described herein and should not be construed as limited to the embodiments set forth herein; it is capable of modification in various respects without departing from its spirit and scope.
Referring to FIG. 1, the machine learning-based accounts receivable redemption risk control method provided by the invention comprises the following steps:
S1: establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
The random forest model is built with the RandomForestClassifier class of scikit-learn, the Python machine learning library.
The features of the sample data include at least: enterprise name, unified social credit code (industrial and commercial registration number), enterprise domicile, region (province, city, county and district), enterprise nature, industry, economic type, enterprise scale, number of employees, industry position, date of establishment, registered capital, credit status, total operating income, operating profit, ratio of accounts receivable to current assets, default records, and so on.
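A minimal sketch of step S1 with scikit-learn, assuming the features have already been converted to numbers. The toy data, the label rule and the parameter values are hypothetical. Note that scikit-learn's `max_features` limits the number of features considered at each split, which is the closest available counterpart of the per-tree feature limit described in the text; treating the two as equivalent is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: 200 records with 10 numeric features standing in for
# encoded attributes (registered capital, operating income, ...), and a binary
# default label: 1 = default, 0 = no default.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# The three calling parameters from step S3 mapped to scikit-learn arguments.
model = RandomForestClassifier(
    n_estimators=10,      # number of subtrees (decision trees) to build
    max_features=None,    # None: every split may consider all 10 features
    min_samples_leaf=1,   # minimum number of samples in a leaf node
    criterion="gini",     # split quality measured by the Gini index
    random_state=0,
)
model.fit(X, y)
predictions = model.predict(X[:5])  # 0/1 default predictions for 5 records
```

The fitted trees are available in `model.estimators_`, one per subtree.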
S2: training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
S3: optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than the preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node.
The accuracy of the model's predictions is improved by adjusting the calling parameters of RandomForestClassifier, and a random forest model with an output accuracy above 90% is selected as the optimized random forest model.
S4: classifying accounts receivable with the optimized random forest model to obtain the performance redemption risk value of accounts receivable at maturity, so as to predict whether the debtor or the initial creditor will default.
Steps S1 to S4 are mainly used to predict default by the debtor or the initial creditor. The prediction shows in advance whether accounts receivable will be redeemed at maturity; by predicting early and adding credit enhancement measures in advance, the redemption risk of accounts receivable at maturity is reduced and the sustainability of accounts receivable circulation is improved. Specifically, the method quickly identifies the redemption risk at maturity of the debtor of the account receivable, or of the initial creditor who commits to repurchase at maturity, and directly obtains the performance redemption risk value of that debtor or initial creditor. When this value is greater than a preset threshold, the redemption risk of the debtor or initial creditor is low and the account receivable can be redeemed at maturity; when it is smaller than the preset threshold, the debtor or initial creditor carries redemption risk, and corresponding credit enhancement measures can be required in advance to strengthen its ability to perform at maturity and avoid the risk. This supports the risk control workflow of accounts receivable management service institutions in terms of cost, efficiency and accuracy, and ensures the sustained, healthy development of accounts receivable transfer business.
The random forest model accepts only integer or floating-point inputs, so data must be preprocessed before it is fed to the model: string values are converted to integers or floating-point numbers so that they are compatible with the random forest model.
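One way to do this conversion, sketched with pandas under the assumption that categorical fields arrive as strings; the column names and values are hypothetical. `pandas.factorize` maps each distinct value to an integer code in order of first appearance.

```python
import pandas as pd

# Hypothetical raw records; column names and values are illustrative only.
raw = pd.DataFrame({
    "industry": ["manufacturing", "retail", "manufacturing", "logistics"],
    "enterprise_scale": ["large", "small", "medium", "small"],
    "registered_capital": [5000.0, 300.0, 1200.0, 800.0],
})


def encode_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Replace every non-numeric column with integer category codes."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # factorize returns (codes, uniques); codes follow first appearance
            codes, _ = pd.factorize(out[col])
            out[col] = codes
    return out


encoded = encode_strings(raw)
```

Integer codes impose an arbitrary order on categories; one-hot encoding (`pandas.get_dummies`) is a common alternative when that order could mislead threshold-based splits.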
Further, in step S1 of building the decision tree by the recursive method, x1, x2, ..., xn denote the n attributes of the accounts receivable debtors and initial creditors, and the n-dimensional feature space is recursively divided into non-overlapping rectangles. The dividing steps are:
S11: traversing, in turn, each possible value a of each feature A in the current data set, and calculating the Gini index of each split point (A, a);
M classes are preset. This application uses two classes, default or not: yes (1) and no (0). The default record of each debtor or initial creditor belongs to exactly one of them, default or no default, so M = 2. If the probability that a sample in the current data set belongs to the k-th of the M classes is p_k, the Gini index of the probability distribution is:
$$\mathrm{Gini}(p)=\sum_{k=1}^{M}p_k(1-p_k)=1-\sum_{k=1}^{M}p_k^2$$
In a binary decision tree, where p is the probability of the first class, this becomes:
$$\mathrm{Gini}(p)=2p(1-p)$$
S12: selecting the split point with the minimum Gini index as the optimal split point, and using it to divide the current data set D into two subsets D1 and D2, where D1 is the set of samples in the current data set that satisfy A = a and D2 is the set of samples that do not satisfy A = a;
The Gini index Gini(D, A) of the current data set D under the condition A = a is:
$$\mathrm{Gini}(D,A)=\frac{|D_1|}{|D|}\mathrm{Gini}(D_1)+\frac{|D_2|}{|D|}\mathrm{Gini}(D_2)$$
where Gini(D1), the Gini index of subset D1, represents the uncertainty of set D1, and Gini(D2), the Gini index of subset D2, represents the uncertainty of set D2. Gini(D, A) represents the uncertainty of set D after the split A = a. The larger the Gini index, the greater the uncertainty; the smaller the Gini index, the smaller the uncertainty and the cleaner the split of the data.
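A small worked example of these formulas, using hypothetical counts: a data set D of 10 records with 4 defaults is split by some condition A = a into D1 (4 records, 3 defaults) and D2 (6 records, 1 default).

$$\mathrm{Gini}(D)=2\cdot\frac{4}{10}\cdot\frac{6}{10}=0.48$$
$$\mathrm{Gini}(D_1)=2\cdot\frac{3}{4}\cdot\frac{1}{4}=0.375,\qquad \mathrm{Gini}(D_2)=2\cdot\frac{1}{6}\cdot\frac{5}{6}=\frac{10}{36}\approx 0.278$$
$$\mathrm{Gini}(D,A)=\frac{4}{10}\cdot 0.375+\frac{6}{10}\cdot\frac{10}{36}=0.15+\frac{1}{6}\approx 0.317$$

Because 0.317 < 0.48, this split reduces uncertainty; in step S12 it would be chosen only if no other split point (A, a) yields a smaller Gini(D, A).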
S13: repeating steps S11 and S12 on each of D1 and D2 until the Gini index of the sample set is smaller than the preset threshold.
The loop over steps S11 and S12 stops when the Gini index of the sample set is below the preset threshold, when the number of samples is below a preset threshold of 30, or when no features remain available for splitting.
S14: generating the decision tree according to the Gini index minimization criterion.
Following steps S11 to S14, n decision trees are constructed, each grown to its maximum extent without pruning; the n generated trees then form a random forest, and the random forest model is built through training on the data.
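A from-scratch sketch of steps S11 to S14 and of combining trees into a forest, in plain Python. It follows the equality splits (A = a versus A ≠ a) and the stop conditions described above; the Gini threshold of 0.05 is a hypothetical value, and drawing a bootstrap sample for each tree plus majority voting are standard random forest choices assumed here rather than stated in the text. All function names are illustrative; scikit-learn's own trees use threshold splits instead.

```python
import random
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(rows, labels, f, v):
    """Split D into D1 (feature f == v) and D2 (feature f != v)."""
    d1 = [(r, y) for r, y in zip(rows, labels) if r[f] == v]
    d2 = [(r, y) for r, y in zip(rows, labels) if r[f] != v]
    return d1, d2


def best_split(rows, labels):
    """S11-S12: return (f, v) minimising Gini(D, A=a), or None if nothing splits."""
    best, best_score, n = None, float("inf"), len(labels)
    for f in range(len(rows[0])):
        for v in sorted({r[f] for r in rows}):
            d1, d2 = partition(rows, labels, f, v)
            if not d1 or not d2:  # this (A, a) would not separate anything
                continue
            score = sum(len(d) / n * gini([y for _, y in d]) for d in (d1, d2))
            if score < best_score:
                best, best_score = (f, v), score
    return best


def build_tree(rows, labels, gini_threshold=0.05, min_samples=30):
    """S13-S14: recurse on D1 and D2 until a stop condition holds."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) < gini_threshold or len(labels) < min_samples:
        return majority                      # leaf node: predicted class
    split = best_split(rows, labels)
    if split is None:                        # no feature left that can split D
        return majority
    f, v = split
    children = []
    for part in partition(rows, labels, f, v):
        sub_rows = [r for r, _ in part]
        sub_labels = [y for _, y in part]
        children.append(build_tree(sub_rows, sub_labels, gini_threshold, min_samples))
    return (f, v, children[0], children[1])  # internal node


def predict_tree(node, row):
    while isinstance(node, tuple):           # descend until a leaf label
        f, v, left, right = node
        node = left if row[f] == v else right
    return node


def build_forest(rows, labels, n_trees=5, seed=0, **kwargs):
    """Grow unpruned trees on bootstrap samples (an assumption, see above)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], **kwargs))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]
```

The `min_samples=30` default mirrors the sample-count stop condition in the text.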
Further, step S2 of training the random forest model on the acquired verification data to obtain its output accuracy on the verification data comprises:
S21: acquiring verification data, the verification data comprising training set data and verification set data;
80% of the verification data is randomly selected as training set data, and the remaining 20% is used as verification set data.
S22: training the classifier of the random forest model with the training set data;
S23: predicting the verification set data with the trained classifier to obtain the output accuracy E1 of the random forest model on the verification data;
S24: repeating steps S21 to S23 to obtain the output accuracies E2, E3, ..., EN of N-1 further random forest models on the verification data;
S25: weighting the N output accuracies E1, E2, ..., EN to obtain their average, and using this average as the final output accuracy of the random forest model.
The output accuracy of the random forest model is calculated as:
$$\text{Accuracy}=\frac{TP}{TP+TN}$$
where TP is the number of correctly predicted records, TN is the number of incorrectly predicted records, and TP + TN is the total number of predicted records.
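A sketch of steps S21 to S25 with scikit-learn, assuming equal weights when averaging the N accuracies (the text says the accuracies are weighted but does not give the weights). Each round draws a fresh random 80%/20% split, trains a classifier, and scores it with the accuracy formula above, i.e. correct predictions divided by all predictions. The function names, N = 5 and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def accuracy(y_true, y_pred):
    """TP / (TP + TN) in the text's notation: correct predictions / all predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def repeated_holdout_accuracy(X, y, n_rounds=5, **rf_params):
    """S21-S25: N random 80/20 splits, one accuracy E_i per split, then the mean."""
    scores = []
    for i in range(n_rounds):
        # S21: random 80% training set / 20% verification set
        X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=i)
        # S22: train the random forest classifier on the training set
        clf = RandomForestClassifier(random_state=0, **rf_params).fit(X_tr, y_tr)
        # S23/S24: score the verification set, giving E_1, ..., E_N
        scores.append(accuracy(y_va, clf.predict(X_va)))
    # S25: equal-weight average (assumed) as the final output accuracy
    return float(np.mean(scores)), scores


rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)
final_accuracy, per_round = repeated_holdout_accuracy(X, y, n_rounds=5, n_estimators=20)
```

Repeating the split reduces the influence of any single lucky or unlucky 20% verification set on the reported accuracy.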
Obtaining the output accuracy of the random forest model on the verification data in steps S23 and S24 comprises:
outputting, through the random forest model, the performance redemption risk value of the debtor or initial creditor corresponding to a matured account receivable; and obtaining the output accuracy of the random forest model from the deviation between this value and the actual performance value.
That is, the output of the random forest model is compared with the actual outcome observed in operation to obtain the output accuracy.
Preferably, in step S3, accounts receivable are classified with the optimized random forest model to obtain the performance redemption risk value of accounts receivable at maturity, and when this value is smaller than a preset threshold, credit enhancement measures are added to improve redemption of the accounts receivable at maturity.
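A sketch of the threshold rule, under two assumptions not stated in the text: the performance redemption risk value is taken to be the model's predicted probability of class 0 (no default), so that larger values mean lower risk as described above, and the preset threshold is a hypothetical 0.5. Function names and data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def redemption_risk_values(model, X):
    """Hypothetical risk value: predicted probability of no default (class 0)."""
    col = list(model.classes_).index(0)
    return model.predict_proba(X)[:, col]


def needs_credit_enhancement(values, threshold=0.5):
    """True where the value is below the preset threshold, i.e. higher risk."""
    return np.asarray(values) < threshold


# Hypothetical training data: label 1 = default, 0 = no default.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(
    n_estimators=25, max_features=None, random_state=0
).fit(X, y)

values = redemption_risk_values(model, np.array([[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]))
flags = needs_credit_enhancement(values)  # accounts flagged for credit enhancement
```

Which of the credit enhancement measures to request for a flagged account remains a business decision outside the model.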
The credit enhancement measures include, but are not limited to, the following:
A. Payment of a performance bond of a certain proportion
B. Core enterprise commitment to pay
C. Guarantee by a guarantee company
D. Third-party guarantee
E. Insurance from an insurance company
F. Pledge of effective assets
G. Bank pre-authorized credit
H. Non-gold company transfer
Through a combination of one or more credit enhancement measures, the risk that the debtor cannot pay, or that the initial creditor cannot repurchase, after the accounts receivable mature can be effectively prevented, so that full settlement is obtained in time at maturity. The specific measures to adopt can be decided by the accounts receivable management service institution, combining system recommendations with manual judgment.
In step S3, when optimizing the calling parameters of the random forest model, a two-level nested loop traverses arrays of values for the number of subtrees and the minimum number of samples per leaf node to obtain a plurality of output accuracies, and the group with the highest output accuracy is selected as the optimal calling parameters of the random forest model. A specific example follows.
When optimizing the calling parameters of the random forest model, the first is the maximum number of features a single decision tree may use; since the debtor and initial creditor data contain few feature fields, only ten in total, all features are used. The remaining parameters, the number of subtrees built (n_estimators) and the minimum number of samples per leaf node (min_samples_leaf), are tuned by building arrays of their values and traversing them in a loop: n_estimators starts at 1 and increases in steps of 3 up to an upper bound of 50, and min_samples_leaf starts at 1 and increases in steps of 5 up to an upper bound of 100. Modeling over the two-level loop yields predictions, and the accuracy is calculated by comparing the actual results with the predictions, as shown in Table 1; the parameter group with the highest accuracy is finally selected as the optimal parameters of the random forest model.
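A sketch of the two-level loop with scikit-learn, using the ranges given above: `range(1, 51, 3)` yields 1, 4, ..., 49 subtrees and `range(1, 101, 5)` yields 1, 6, ..., 96 samples per leaf, matching the rows of Table 1. The single fixed train/verification split, the function name and the synthetic data are hypothetical; `max_features=None` reflects the choice to let each tree use all features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def grid_search(X_tr, y_tr, X_va, y_va,
                tree_counts=range(1, 51, 3), leaf_sizes=range(1, 101, 5)):
    """Return the best (n_estimators, min_samples_leaf, accuracy) and all rows."""
    rows = []
    for n_estimators in tree_counts:            # outer loop: number of subtrees
        for min_samples_leaf in leaf_sizes:     # inner loop: minimum leaf size
            clf = RandomForestClassifier(
                n_estimators=n_estimators,
                min_samples_leaf=min_samples_leaf,
                max_features=None,              # every tree may use all features
                random_state=0,
            ).fit(X_tr, y_tr)
            acc = float(np.mean(clf.predict(X_va) == y_va))
            rows.append((n_estimators, min_samples_leaf, acc))
    best = max(rows, key=lambda r: r[2])        # first row wins on ties
    return best, rows


rng = np.random.default_rng(3)
X = rng.normal(size=(120, 10))                  # ten features, as in the example
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
best, table = grid_search(X_tr, y_tr, X_va, y_va,
                          tree_counts=range(1, 10, 3), leaf_sizes=range(1, 21, 5))
```

The full default ranges fit 17 × 20 = 340 forests. With a small verification set many parameter pairs can tie on accuracy, as several rows of Table 1 do; scoring each pair with the repeated splits of steps S21 to S25 would make the comparison less noisy.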
TABLE 1
No. | Number of decision trees (n_estimators) | Minimum samples per leaf node (min_samples_leaf) | Accuracy |
1 | 1 | 1 | 0.8 |
2 | 1 | 6 | 0.9 |
3 | 1 | 11 | 0.9 |
4 | 1 | 16 | 0.95 |
5 | 1 | 21 | 0.95 |
6 | 1 | 26 | 0.95 |
7 | 4 | 1 | 0.75 |
8 | 4 | 6 | 0.75 |
9 | 4 | 11 | 0.75 |
… | … | … | … |
N-2 | 49 | 86 | 0.75 |
N-1 | 49 | 91 | 0.75 |
N | 49 | 96 | 0.75 |
The optimal parameters (1, 16, 0.95) can be read directly from Table 1: with 1 decision tree and a minimum of 16 samples per leaf node, the output accuracy of the random forest model is 0.95. The random forest model with these parameters can therefore be selected to classify accounts receivable and obtain a relatively accurate performance redemption risk value.
A machine learning-based accounts receivable redemption risk control system comprises a construction module 1, an accuracy module 2, a parameter optimization module 3 and a prediction output module 4;
the construction module 1 is used for establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
the accuracy module 2 is used for training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
the parameter optimization module 3 is used for optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than the preset accuracy as the optimized prediction model, the calling parameters comprising the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
and the prediction output module 4 uses the optimized random forest model obtained by the parameter optimization module 3 to classify accounts receivable and predict whether the debtor or the initial creditor will default.
A computer-readable storage medium stores a classification program that, when invoked by a processor, performs the following steps:
establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
and classifying accounts receivable with the optimized random forest model to predict whether the debtor or the initial creditor will default.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks or optical disks.
The above description is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change of the technical solutions and inventive concept of the present invention that a person skilled in the art could make within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A machine learning-based accounts receivable redemption risk control method, characterized by comprising the following steps:
establishing decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and building a random forest model;
training the random forest model on the acquired verification data to obtain the output accuracy of the random forest model on the verification data;
optimizing the calling parameters of the random forest model and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree of the random forest model is allowed to use, the number of subtrees built, and the minimum number of samples per leaf node;
and classifying accounts receivable with the optimized random forest model to predict whether the debtor or the initial creditor will default.
2. The machine learning-based receivable redemption risk control method of claim 1, wherein building a decision tree by a recursive method comprises:
S11: traversing, in turn, each possible value $a$ of each feature $A$ in the current data set and calculating the Gini index of each split point $(A, a)$;
S12: selecting the split point with the minimum Gini index as the optimal split point and splitting the current data set $D$ at that point into two subsets $D_1$ and $D_2$, where $D_1$ is the set of samples in the current data set that satisfy $A = a$ and $D_2$ is the set of samples in the current data set that do not satisfy $A = a$;
S13: repeating steps S11 and S12 on each of $D_1$ and $D_2$ until the Gini index of the sample set is smaller than a preset threshold;
S14: generating the decision tree according to the Gini-index minimization criterion.
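Steps S11 to S14 describe a CART-style recursive split search. The following is a minimal standard-library Python sketch of it, assuming categorical (or already discretized) features, equality splits exactly as in S12, and a hypothetical `min_leaf` guard that keeps every child non-empty so the recursion always terminates. The function names and the default threshold are illustrative, not taken from the application.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def build_tree(rows, labels, gini_threshold=0.1, min_leaf=1):
    """Recursively build a binary tree following steps S11-S14.

    Returns a class label (leaf) or a dict node {"feature", "value", "D1", "D2"}.
    """
    assert min_leaf >= 1, "each child must be non-empty so recursion terminates"
    leaf = Counter(labels).most_common(1)[0][0]
    # S13 stopping rule: the sample set is already pure enough
    if gini(labels) < gini_threshold:
        return leaf
    n = len(labels)
    best = None  # (weighted Gini index, feature A, value a)
    # S11: traverse every possible value a of every feature A
    for A in range(len(rows[0])):
        for a in sorted({r[A] for r in rows}):
            d1 = [y for r, y in zip(rows, labels) if r[A] == a]
            d2 = [y for r, y in zip(rows, labels) if r[A] != a]
            if len(d1) < min_leaf or len(d2) < min_leaf:
                continue
            score = len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
            # S12: keep the split point with the minimum Gini index
            if best is None or score < best[0]:
                best = (score, A, a)
    if best is None:  # no admissible split point remains
        return leaf
    _, A, a = best
    in_d1 = [r[A] == a for r in rows]
    # S13: repeat S11 and S12 on D1 (A == a) and on D2 (A != a)
    return {
        "feature": A,
        "value": a,
        "D1": build_tree([r for r, m in zip(rows, in_d1) if m],
                         [y for y, m in zip(labels, in_d1) if m],
                         gini_threshold, min_leaf),
        "D2": build_tree([r for r, m in zip(rows, in_d1) if not m],
                         [y for y, m in zip(labels, in_d1) if not m],
                         gini_threshold, min_leaf),
    }


def classify(node, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, dict):
        node = node["D1"] if row[node["feature"]] == node["value"] else node["D2"]
    return node
```

On a toy set where the second feature alone separates defaulters from non-defaulters, the root split lands on that feature, as S12 requires.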
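3. The machine learning-based receivable redemption risk control method of claim 2, wherein, in calculating the Gini index of each split point $(A, a)$, M classes of default conditions are predefined and the probability that the current data belongs to the $k$-th of the M classes is $p_k$, so that the Gini index of the probability distribution $p$ is:
$$\mathrm{Gini}(p) = \sum_{k=1}^{M} p_k (1 - p_k) = 1 - \sum_{k=1}^{M} p_k^2$$
in a binary decision tree, where $p$ is the probability of the first class, the Gini index of the distribution is:
$$\mathrm{Gini}(p) = 2p(1 - p)$$
and the Gini index $\mathrm{Gini}(D, A = a)$ of the current data set $D$ under the condition $A = a$ is:
$$\mathrm{Gini}(D, A = a) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $\mathrm{Gini}(D_1)$ is the Gini index of subset $D_1$ and $\mathrm{Gini}(D_2)$ is the Gini index of subset $D_2$.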
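As a purely illustrative check of these formulas (the sample counts are invented for the arithmetic and do not come from the application), suppose a binary default/no-default data set $D$ of 10 receivables is split at $(A, a)$ into $D_1$ with 4 samples, 3 of them defaults, and $D_2$ with 6 samples, 1 of them a default:

$$
\begin{aligned}
\mathrm{Gini}(D_1) &= 2 \cdot \tfrac{3}{4} \cdot \tfrac{1}{4} = 0.375,\\
\mathrm{Gini}(D_2) &= 2 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6} = \tfrac{5}{18} \approx 0.278,\\
\mathrm{Gini}(D, A = a) &= \tfrac{4}{10} \cdot 0.375 + \tfrac{6}{10} \cdot \tfrac{5}{18} = 0.15 + \tfrac{1}{6} \approx 0.317.
\end{aligned}
$$

Step S12 computes this value for every candidate split point and keeps the one with the smallest result.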
4. The machine learning-based receivable redemption risk control method of claim 1, wherein training the random forest model with the acquired validation data to obtain the output accuracy of the random forest model on the validation data comprises:
S21: acquiring validation data, the validation data comprising training-set data and validation-set data;
S22: training the classifier of the random forest model with the training-set data;
S23: predicting the validation-set data with the trained classifier of the random forest model to obtain the output accuracy E1 of the random forest model on the validation data;
S24: repeating steps S21 to S23 to obtain the output accuracies E2, E3, ..., EN of N-1 further random forest models on the validation data;
S25: computing a weighted average of the N output accuracies E1, E2, ..., EN and using this average as the final output accuracy of the random forest model.
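Steps S21 to S25 amount to repeated random train/validation splits followed by a weighted average of the per-round accuracies. A minimal Python sketch, assuming a hypothetical `train(rows, labels)` callable that returns a `predict(row)` function and, because the claim does not say how the weights are chosen, equal weights by default:

```python
import random


def repeated_holdout_accuracy(rows, labels, train, n_rounds=5,
                              val_fraction=0.3, weights=None, seed=0):
    """Sketch of S21-S25: repeat a random train/validation split n_rounds times
    and return (weighted mean accuracy, [E1, ..., EN]).

    `train(rows, labels)` is a hypothetical callable returning predict(row).
    `weights` defaults to equal weights, i.e. a plain average.
    """
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    accuracies = []
    for _ in range(n_rounds):
        rng.shuffle(idx)  # S21: draw a fresh training/validation split
        n_val = max(1, int(len(idx) * val_fraction))
        val, tr = idx[:n_val], idx[n_val:]
        predict = train([rows[i] for i in tr], [labels[i] for i in tr])  # S22
        hits = sum(predict(rows[i]) == labels[i] for i in val)  # S23
        accuracies.append(hits / n_val)  # E1, E2, ... (the loop is S24)
    if weights is None:
        weights = [1.0] * n_rounds
    if len(weights) != n_rounds:
        raise ValueError("need one weight per round")
    # S25: the weighted average is the final output accuracy
    final = sum(w * e for w, e in zip(weights, accuracies)) / sum(weights)
    return final, accuracies
```

Passing, for example, weights proportional to each round's validation-set size would be one alternative reading of "weighting"; the application does not specify.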
5. The machine learning-based receivable redemption risk control method of claim 4, wherein obtaining the output accuracy of the random forest model on the validation data in steps S23 and S24 comprises:
outputting, through the random forest model, a redemption risk value for the debtor or the initial creditor of a matured receivable;
acquiring the actual fulfillment value of the debtor or the initial creditor for that matured receivable;
obtaining the output accuracy of the random forest model from the deviation between the redemption risk value and the actual fulfillment value.
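The claim does not define how the deviation is turned into an accuracy. One possible reading, sketched below under loudly stated assumptions, counts a prediction as correct when the risk value lies within a tolerance of the actual outcome; the encoding (both values in [0, 1], with 1.0 meaning the receivable was not redeemed on time), the tolerance, and the function name are all hypothetical.

```python
def accuracy_from_deviation(risk_values, actual_values, tolerance=0.5):
    """Share of matured receivables whose predicted redemption-risk value
    deviates from the actual fulfillment value by less than `tolerance`.

    Assumed encoding (not specified in the claim): both values lie in [0, 1],
    with 1.0 meaning the receivable was not redeemed on time.
    """
    if len(risk_values) != len(actual_values) or not risk_values:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(r - a) < tolerance for r, a in zip(risk_values, actual_values))
    return hits / len(risk_values)
```

With four matured receivables scored 0.9, 0.2, 0.6 and 0.1 against outcomes 1, 0, 0 and 0, three deviations fall under 0.5, giving an accuracy of 0.75.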
6. The machine learning-based receivable redemption risk control method according to any one of claims 1 to 5, wherein, in the step of classifying the receivables with the optimized random forest model and outputting a redemption risk value for each receivable at maturity, when the redemption risk value is smaller than a preset threshold, redemption of the receivable at maturity is improved by adding a credit enhancement measure.
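Read literally, claim 6 triggers credit enhancement when the model's value falls below the threshold, which only makes sense if a higher value means redemption at maturity is more likely. A short Python sketch of that rule under exactly this assumption; the function name and receivable IDs are hypothetical:

```python
def flag_for_credit_enhancement(scores, threshold):
    """Return the IDs of receivables whose model output falls below `threshold`.

    Follows claim 6 literally: a value *below* the preset threshold triggers a
    credit-enhancement measure. Assumes (not stated in the claim) that higher
    values mean redemption at maturity is more likely.
    """
    return sorted(rid for rid, s in scores.items() if s < threshold)
```

For instance, with scores `{"AR-001": 0.82, "AR-002": 0.35, "AR-003": 0.58}` and a threshold of 0.6, receivables `AR-002` and `AR-003` would be flagged.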
7. The machine learning-based receivable redemption risk control method of claim 6, wherein, in optimizing the invocation parameters of the random forest model, a two-level nested loop traverses arrays of candidate values for the number of subtrees and the minimum number of leaf nodes to obtain a plurality of output accuracies, and the parameter combination with the highest output accuracy is selected as the optimal invocation parameters of the random forest model.
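The two-level loop of claim 7 is an exhaustive grid search over two parameters. A minimal Python sketch, assuming a hypothetical `evaluate(n_trees, min_leaf)` callable that returns a validation accuracy (for example the averaged accuracy of claim 4):

```python
def grid_search(evaluate, n_trees_grid, min_leaf_grid):
    """Evaluate every (number of subtrees, minimum leaf size) pair and keep
    the pair with the highest accuracy.

    `evaluate(n_trees, min_leaf)` is a hypothetical callable returning the
    output accuracy for that parameter pair.
    """
    results = {}
    for n_trees in n_trees_grid:        # outer loop: number of subtrees
        for min_leaf in min_leaf_grid:  # inner loop: minimum leaf size
            results[(n_trees, min_leaf)] = evaluate(n_trees, min_leaf)
    # Highest accuracy wins; on ties, max() keeps the first pair evaluated.
    best = max(results, key=results.get)
    return best, results[best], results
```

Under claim 8, the winning combination would then be accepted only if its accuracy exceeds 0.90.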
8. The machine learning-based receivable redemption risk control method according to any one of claims 1 to 5, wherein, in the step of selecting a random forest model whose output accuracy is greater than the preset accuracy as the optimized model for classifying receivables, a random forest model with an output accuracy above 90% is selected as the optimized random forest model.
9. A machine learning-based receivable redemption risk control system, characterized by comprising a construction module (1), an accuracy module (2), a parameter optimization module (3) and a prediction output module (4);
the construction module (1) is configured to build decision trees by a recursive method, train a plurality of decision trees on the acquired sample data, and construct a random forest model;
the accuracy module (2) is configured to train the random forest model with the acquired validation data to obtain the output accuracy of the random forest model on the validation data;
the parameter optimization module (3) is configured to optimize the invocation parameters of the random forest model and to select a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, the invocation parameters comprising the maximum number of features that a single decision tree in the random forest model is allowed to use, the number of subtrees to build, and the minimum number of leaf nodes;
the prediction output module (4) uses the optimized random forest model obtained by the parameter optimization module (3) to classify accounts receivable and output the result, so as to predict whether the debtor or the initial creditor will default.
10. A computer-readable storage medium storing a plurality of classification programs which, when invoked by a processor, perform the following steps:
building decision trees by a recursive method, training a plurality of decision trees on the acquired sample data, and constructing a random forest model;
training the random forest model with the acquired validation data to obtain the output accuracy of the random forest model on the validation data;
optimizing the invocation parameters of the random forest model, and selecting a random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the invocation parameters comprise the maximum number of features that a single decision tree in the random forest model is allowed to use, the number of subtrees to build, and the minimum number of leaf nodes;
classifying accounts receivable with the optimized random forest model and outputting the result, so as to predict whether the debtor or the initial creditor will default.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911244056.0A CN111105305A (en) | 2019-12-06 | 2019-12-06 | Machine learning-based receivable and receivable cash cashing risk control method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111105305A true CN111105305A (en) | 2020-05-05 |
Family ID: 70421828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911244056.0A Pending CN111105305A (en) | 2019-12-06 | 2019-12-06 | Machine learning-based receivable and receivable cash cashing risk control method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111105305A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117875969A (en) * | 2023-12-07 | 2024-04-12 | 指增(上海)科技有限责任公司 | Training method, payment route selection method, system, electronic equipment and medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109480833A (en) * | 2018-08-30 | 2019-03-19 | 北京航空航天大学 | The pretreatment and recognition methods of epileptic's EEG signals based on artificial intelligence |
WO2019088972A1 (en) * | 2017-10-30 | 2019-05-09 | Equifax, Inc. | Training tree-based machine-learning modeling algorithms for predicting outputs and generating explanatory data |
CN109949152A (en) * | 2019-04-15 | 2019-06-28 | 武汉理工大学 | A kind of personal credit's violation correction method |
CN110334737A (en) * | 2019-06-04 | 2019-10-15 | 阿里巴巴集团控股有限公司 | A kind of method and system of the customer risk index screening based on random forest |
US20190318421A1 (en) * | 2018-04-13 | 2019-10-17 | GDS Link, LLC | Decision-making system and method based on supervised learning |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Heyman et al. | The financial structure of private held Belgian firms | |
US7191150B1 (en) | Enhancing delinquent debt collection using statistical models of debt historical information and account events | |
KR102044205B1 (en) | Target information prediction system using big data and machine learning and method thereof | |
US20070124237A1 (en) | System and method for optimizing cross-sell decisions for financial products | |
KR20190064749A (en) | Method and device for intelligent decision support in stock investment | |
CN115271912A (en) | Credit business intelligent wind control approval system and method based on big data | |
CN116596659A (en) | Enterprise intelligent credit approval method, system and medium based on big data wind control | |
CN115545886A (en) | Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium | |
Laitinen | Data system for assessing probability of failure in SME reorganization | |
CN111105305A (en) | Machine learning-based receivable and receivable cash cashing risk control method and system | |
CN112085593A (en) | Small and medium-sized enterprise credit data mining method | |
CN117196808A (en) | Mobility risk prediction method and related device for peer business | |
CN114358519B (en) | Intelligent credit line interest rate adjusting method and device | |
CN113706300A (en) | Loan method and device for small and micro enterprises | |
Salihu et al. | A review of algorithms for credit risk analysis | |
CN107977804B (en) | Guarantee warehouse business risk assessment method | |
US8515841B2 (en) | Financial product application pull-through system | |
Sadatrasoul | Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering | |
Ma | Through the crisis: UK SMEs performance during the ‘credit crunch’ | |
Nazari et al. | Using the Hybrid Model for Credit Scoring (Case Study: Credit Clients of microloans, Bank Refah-Kargeran of Zanjan, Iran) | |
Glawion et al. | Applications of non-linear machine learning tree-based methods for prepayments forecasting of fixed-rate institutional loans | |
Motale | Predicting business failure in a South African business bank | |
CN117764692A (en) | Method for predicting credit risk default probability | |
Ertuğrul | Customer Transaction Predictive Modeling via Machine Learning Algorithms | |
CN116308590A (en) | Bill product pushing method, device and system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200505 |