CN111105305A - Machine learning-based accounts receivable redemption risk control method and system

Publication number
CN111105305A
Authority
CN
China
Prior art keywords
random forest
forest model
receivable
output accuracy
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911244056.0A
Other languages
Chinese (zh)
Inventor
黄林
梁樑
曾水保
吴斌
朱香友
黄晓漫
黄超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Sea Converge Financial Investment Group Co ltd
Original Assignee
Anhui Sea Converge Financial Investment Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Sea Converge Financial Investment Group Co ltd
Priority to CN201911244056.0A
Publication of CN111105305A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention discloses a machine learning-based accounts receivable redemption risk control method and system. The method comprises the following steps: establishing decision trees according to a recursive method, training a plurality of decision trees on the acquired sample data, and constructing a random forest model; training the random forest model with the acquired verification data to obtain the output accuracy of the random forest model based on the verification data; optimizing the calling parameters of the random forest model, and selecting the random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree is allowed to use, the number of subtrees built, and the minimum leaf node number; and classifying accounts receivable with the optimized random forest model so as to predict whether the debtor or the initial creditor will default. The method reduces the redemption risk of accounts receivable at maturity and improves the sustainability of accounts receivable circulation.

Description

Machine learning-based accounts receivable redemption risk control method and system
Technical Field
The invention relates to the technical field of financial risk management, and in particular to a machine learning-based accounts receivable redemption risk control method and system.
Background
The most important risk of accounts receivable is credit risk: if the debtor cannot pay, the creditor cannot recover the funds on the due date. Effectively identifying the redemption risk of the debtor who commits to pay, or of the initial creditor who commits to repurchase, and taking targeted risk control measures against the redemption risk of accounts receivable at maturity, is therefore an important link in ensuring the continuous and healthy development of the accounts receivable creditor's rights transfer business. Traditional risk control modes and techniques, which rely on manual investigation and judgment such as due diligence and expert scoring, can hardly meet the needs of fast, efficient, and low-cost business development.
Disclosure of Invention
In view of the technical problems described in the background, the invention provides a machine learning-based accounts receivable redemption risk control method and system, which reduce the redemption risk of accounts receivable at maturity and improve the sustainability of accounts receivable circulation.
The invention provides a machine learning-based accounts receivable redemption risk control method, which comprises the following steps:
establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
optimizing calling parameters in the random forest model, and selecting the random forest model with the output accuracy rate larger than the preset accuracy rate as an optimized prediction model, wherein the calling parameters comprise the maximum number of the use characteristics of a single decision tree allowed by the random forest model, the number of the built subtrees and the minimum leaf node number;
classifying and outputting accounts receivable by using the optimized random forest model so as to predict whether the debtor or the initial creditor will default.
Further, establishing the decision tree according to the recursive method includes:
S11: sequentially traversing the possible values a of each feature A in the current data set, and calculating the Gini index of each split point (A, a);
S12: selecting the split point with the minimum Gini index as the optimal split point, and splitting the current data set D at the optimal split point into two subsets $D_1$ and $D_2$, where $D_1$ is the set of samples in the current data set that satisfy A = a and $D_2$ is the set of samples that do not satisfy A = a;
S13: executing steps S11 and S12 recursively on $D_1$ and $D_2$ until the Gini index of the sample set is smaller than a preset threshold;
S14: generating the decision tree based on the Gini index minimization criterion.
Further, in calculating the Gini index of each split point (A, a), M classes are preset and $p_k$ denotes the probability that a sample in the current data set belongs to the k-th of the M classes; the Gini index of the probability distribution is:
$$\mathrm{Gini}(p) = \sum_{k=1}^{M} p_k (1 - p_k) = 1 - \sum_{k=1}^{M} p_k^2$$
In a binary decision tree (M = 2), where p is the probability of the first class, the Gini index of the distribution is:
$$\mathrm{Gini}(p) = 2p(1-p)$$
For the sample sets $D_1$ and $D_2$ obtained from the split, the Gini index Gini(D, A) of the current data set D under the condition that feature A = a is:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $\mathrm{Gini}(D_1)$ is the Gini index of subset $D_1$ and $\mathrm{Gini}(D_2)$ is the Gini index of subset $D_2$.
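As a quick numerical check of the Gini index of a split, consider a hypothetical data set whose counts are invented for illustration and do not come from the patent: D holds 10 accounts, 4 of which defaulted; the split A = a sends 4 accounts (3 defaults) to D1 and 6 accounts (1 default) to D2. Using the binary form 2p(1-p):

```latex
\begin{aligned}
\mathrm{Gini}(D)   &= 2 \cdot 0.4 \cdot 0.6 = 0.48 \\
\mathrm{Gini}(D_1) &= 2 \cdot \tfrac{3}{4} \cdot \tfrac{1}{4} = 0.375 \\
\mathrm{Gini}(D_2) &= 2 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6} \approx 0.278 \\
\mathrm{Gini}(D, A) &= \tfrac{4}{10} \cdot 0.375 + \tfrac{6}{10} \cdot 0.278 \approx 0.317
\end{aligned}
```

In this example the split lowers the weighted Gini index from 0.48 to about 0.317, so it would be preferred over any candidate split with a higher value.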
Further, the training of the random forest model on the acquired verification data to obtain the output accuracy of the random forest model based on the verification data includes:
S21: acquiring verification data, wherein the verification data comprises training set data and verification set data;
S22: training a classifier of the random forest model by using the training set data;
S23: predicting the verification set data with the trained classifier of the random forest model to obtain the output accuracy E1 of the random forest model based on the verification data;
S24: repeating steps S21 to S23 to obtain N-1 further output accuracies E2, E3, ..., EN of the random forest model based on the verification data;
S25: computing a weighted average of the N output accuracies E1, E2, ..., EN, and using this average as the final output accuracy of the random forest model.
Further, obtaining the output accuracy of the random forest model based on the verification data in steps S23 and S24 includes:
outputting, through the random forest model, a fulfillment redemption risk value of the debtor or the initial creditor corresponding to the matured account receivable;
acquiring the actual fulfillment value of the debtor or the initial creditor corresponding to the matured account receivable;
and obtaining the output accuracy of the random forest model from the deviation between the fulfillment redemption risk value and the actual fulfillment value.
Further, in the step of classifying accounts receivable with the optimized random forest model to obtain the fulfillment redemption risk value of an account receivable at maturity, when the fulfillment redemption risk value is smaller than a preset threshold, credit enhancement measures are added to improve the redemption of the account receivable at maturity.
Further, in optimizing the calling parameters of the random forest model, a two-level nested loop traverses the parameter arrays for the number of subtrees and the minimum leaf node number to obtain a plurality of output accuracies, and the group with the highest output accuracy is selected as the optimal calling parameters of the random forest model.
Further, in selecting the optimized random forest model whose output accuracy is greater than the preset accuracy to classify accounts receivable, a random forest model with an output accuracy of more than 90% is selected as the optimized random forest model.
A machine learning-based accounts receivable redemption risk control system comprises a construction module, an accuracy module, a parameter optimization module and a prediction output module;
the construction module is used for establishing a decision tree according to a recursion method, training a plurality of decision trees through the acquired sample data and constructing a random forest model;
the accuracy module is used for training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
the parameter optimization module is used for optimizing the calling parameters of the random forest model and selecting the random forest model whose output accuracy is greater than the preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that a single decision tree is allowed to use, the number of subtrees built and the minimum leaf node number;
the prediction output module uses the optimized random forest model obtained by the parameter optimization module to classify and output accounts receivable so as to predict whether the debtor or the initial creditor will default.
A computer readable storage medium having stored thereon a number of classification programs to be invoked by a processor to perform the following steps:
establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
optimizing calling parameters in the random forest model, and selecting the random forest model with the output accuracy rate larger than the preset accuracy rate as an optimized prediction model, wherein the calling parameters comprise the maximum number of the use characteristics of a single decision tree allowed by the random forest model, the number of the built subtrees and the minimum leaf node number;
and classifying and outputting accounts receivable by using the optimized random forest model so as to predict whether the debtor or the initial creditor will default.
The machine learning-based accounts receivable redemption risk control method and system provided by the invention have the following advantages: the redemption risk at maturity of the debtor of the receivable, or of the initial creditor who commits to repurchase at maturity, is identified quickly, and the fulfillment redemption risk value of the debtor or the initial creditor is obtained directly. The predicted fulfillment redemption risk value shows in advance how an account receivable is likely to be redeemed at maturity, so that early prediction and early credit enhancement measures reduce the redemption risk at maturity and improve the sustainability of accounts receivable circulation.
Drawings
FIG. 1 is a schematic diagram of the steps of the machine learning-based accounts receivable redemption risk control method according to the present invention;
FIG. 2 is a schematic flow chart of the machine learning-based accounts receivable redemption risk control system according to the present invention.
In the figures: 1 - construction module, 2 - accuracy module, 3 - parameter optimization module, 4 - prediction output module.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. In the following description, numerous specific details are set forth to provide a thorough understanding of the invention. The invention may, however, be embodied in many forms other than those described here, and those skilled in the art can make similar modifications without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
Referring to FIG. 1, the machine learning-based accounts receivable redemption risk control method provided by the invention comprises the following steps:
S1: establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
The random forest model is established with the RandomForestClassifier method of the Python machine learning library scikit-learn.
The features of the sample data include at least: enterprise name, unified social credit code (business registration number), enterprise domicile, region (province, city, county, district), enterprise nature, industry, economic type, enterprise scale, number of employees, industry status, date of establishment, registered capital, credit status, total operating income, operating profit, ratio of accounts receivable to current assets, default records, and the like.
S2: training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
s3: and optimizing calling parameters in the random forest model, and selecting the random forest model with the output accuracy rate greater than the preset accuracy rate as the optimized prediction model, wherein the calling parameters comprise the maximum number of the use characteristics of a single decision tree allowed by the random forest model, the number of the built subtrees and the minimum leaf node number.
The accuracy of the model's predictions is improved by adjusting the calling parameters of the RandomForestClassifier method, and a random forest model with an output accuracy of more than 90% is selected as the optimized random forest model.
S4: classifying the accounts receivable with the optimized random forest model to obtain the fulfillment redemption risk value of each account receivable at maturity, so as to predict whether the debtor or the initial creditor will default.
Steps S1 to S4 are used mainly to predict default by debtors or initial creditors. The prediction shows in advance how an account receivable is likely to be redeemed at maturity, so that early prediction and early credit enhancement measures reduce the redemption risk at maturity and improve the sustainability of accounts receivable circulation. The method first quickly identifies the redemption risk at maturity of the debtor of the receivable, or of the initial creditor who commits to repurchase at maturity, and directly obtains the fulfillment redemption risk value of that debtor or initial creditor. When the fulfillment redemption risk value is larger than a preset threshold, the redemption risk of the debtor or initial creditor is low and the account receivable can be redeemed at maturity. When the fulfillment redemption risk value is smaller than the preset threshold, the debtor or initial creditor carries redemption risk, and corresponding credit enhancement measures can be required in advance to strengthen the ability to perform at maturity and avoid the risk. This supports the risk control process of accounts receivable creditor's rights management service institutions in terms of cost, efficiency, and accuracy, and helps ensure the continuous and healthy development of the accounts receivable creditor's rights transfer business.
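The threshold rule described above can be sketched as a small function. This is only an illustration: the names `review_receivable` and `Decision`, the default threshold of 0.5, and the handling of a value exactly equal to the threshold (treated conservatively as risky) are assumptions, not details given in the patent.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    needs_credit_enhancement: bool
    reason: str


def review_receivable(risk_value: float, threshold: float = 0.5) -> Decision:
    """Apply the threshold rule to a fulfillment redemption risk value.

    Following the text, a value above the threshold means low redemption risk.
    The threshold default and the tie handling are hypothetical choices.
    """
    if risk_value > threshold:
        return Decision(False, "redemption risk is low; no extra measure required")
    return Decision(True, "redemption risk present; request credit enhancement in advance")


if __name__ == "__main__":
    print(review_receivable(0.8))
    print(review_receivable(0.2))
```

Keeping the decision in a small record makes it easy to log which receivables triggered a request for credit enhancement and why.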
In the random forest model, the input value must be an integer or a floating point number, so when data is input into the random forest model, the data needs to be preprocessed, and a character string is converted into the integer or the floating point number, so that the random forest model can be used compatibly.
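One simple way to carry out this preprocessing is to map each distinct string to an integer code. The sketch below uses only the standard library; the function name `encode_records` and the field names in the example are hypothetical, and the patent does not say which encoding it uses.

```python
def encode_records(records, categorical_fields):
    """Replace the string values of the given fields by integer codes.

    Returns the encoded records and the codebooks, so that the same
    mapping can be reused when new data is scored later.
    """
    codebooks = {field: {} for field in categorical_fields}
    encoded = []
    for record in records:
        row = dict(record)  # copy, so the input records stay untouched
        for field in categorical_fields:
            book = codebooks[field]
            # The first time a value is seen it receives the next free integer.
            row[field] = book.setdefault(record[field], len(book))
        encoded.append(row)
    return encoded, codebooks


if __name__ == "__main__":
    sample = [
        {"industry": "manufacturing", "registered_capital": 5000000.0},
        {"industry": "retail", "registered_capital": 800000.0},
        {"industry": "manufacturing", "registered_capital": 1200000.0},
    ]
    rows, books = encode_records(sample, ["industry"])
    print(rows)
    print(books)
```

Integer codes impose an arbitrary order on the categories; one-hot encoding is a common alternative when that order could mislead the model.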
Further, in step S1 of building the decision tree according to the recursive method, x1, x2, ..., xn denote the n attributes of the accounts receivable debtors and initial creditors, and the n-dimensional feature space is recursively divided into non-overlapping rectangles. The dividing steps include:
s11: sequentially traversing possible values a of each feature A in the current data set, and calculating the Gini index of each segmentation point (A, a);
M classes are preset. This application uses two classes for whether a default occurs: yes (1) and no (0). The record of each debtor or initial creditor belongs to exactly one of them, default or no default, so M = 2. Let $p_k$ be the probability that a sample in the current data set belongs to the k-th of the M classes; the Gini index of the probability distribution is:
$$\mathrm{Gini}(p) = \sum_{k=1}^{M} p_k (1 - p_k) = 1 - \sum_{k=1}^{M} p_k^2$$
In a binary decision tree, where p is the probability of the first class, the Gini index of the distribution is:
$$\mathrm{Gini}(p) = 2p(1-p)$$
S12: selecting the split point with the minimum Gini index as the optimal split point, and splitting the current data set D at the optimal split point into two subsets $D_1$ and $D_2$, where $D_1$ is the set of samples in the current data set that satisfy A = a and $D_2$ is the set of samples that do not satisfy A = a;
The Gini index Gini(D, A) of the current data set D under the condition that feature A = a is:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
where $\mathrm{Gini}(D_1)$ is the Gini index of subset $D_1$ and represents the uncertainty of $D_1$, and $\mathrm{Gini}(D_2)$ is the Gini index of subset $D_2$ and represents the uncertainty of $D_2$. Gini(D, A) represents the uncertainty of the set D after the split on A = a. The larger the Gini index, the greater the uncertainty; the smaller the Gini index, the smaller the uncertainty and the cleaner the split.
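Steps S11 and S12 can be sketched in a few lines of plain Python. The function names (`gini`, `gini_split`, `best_split`) and the representation of samples as tuples of feature values are assumptions made for illustration; the patent itself relies on scikit-learn rather than hand-written code.

```python
from collections import Counter


def gini(labels):
    """Gini index of a label list: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(rows, labels, feature, value):
    """Gini(D, A): weighted Gini index after splitting on rows[feature] == value."""
    d1 = [y for x, y in zip(rows, labels) if x[feature] == value]
    d2 = [y for x, y in zip(rows, labels) if x[feature] != value]
    n = len(labels)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


def best_split(rows, labels):
    """Traverse every (feature, value) pair and keep the smallest Gini(D, A)."""
    best = None
    for feature in range(len(rows[0])):
        for value in sorted({x[feature] for x in rows}):
            score = gini_split(rows, labels, feature, value)
            if best is None or score < best[0]:
                best = (score, feature, value)
    return best


if __name__ == "__main__":
    rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
    labels = [1, 1, 0, 0]  # 1 = default, 0 = no default
    print(best_split(rows, labels))
```

In the toy data, feature 0 separates the two classes completely, so the search settles on a split over that feature.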
S13: executing steps S11 and S12 recursively on $D_1$ and $D_2$ until the Gini index of the sample set is smaller than the preset threshold.
The recursion over steps S11 and S12 stops when the Gini index of the sample set is smaller than a preset threshold, when the number of samples is smaller than a preset threshold of 30, or when no features remain for splitting.
S14: generating the decision tree based on the Gini index minimization criterion.
Following steps S11 to S14, n decision trees can be constructed, each grown to its maximum extent without pruning; the n decision trees then form a random forest, and the random forest model is built through data training.
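The recursion with its three stop conditions, and the assembly of several trees into a forest, might look roughly as follows. This is a simplified sketch, not the patent's implementation: the dictionary tree representation, the function names, the Gini threshold of 0.01, and the use of bootstrap resampling with majority voting (standard random forest practice, but not spelled out in the text) are all assumptions.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (score, feature, value) with minimal weighted Gini, or None."""
    best, n = None, len(labels)
    for feature in range(len(rows[0])):
        for value in sorted({x[feature] for x in rows}):
            d1 = [y for x, y in zip(rows, labels) if x[feature] == value]
            d2 = [y for x, y in zip(rows, labels) if x[feature] != value]
            if not d1 or not d2:
                continue  # this value does not separate the data
            score = len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
            if best is None or score < best[0]:
                best = (score, feature, value)
    return best


def build_tree(rows, labels, gini_threshold=0.01, min_samples=30):
    majority = Counter(labels).most_common(1)[0][0]
    # Stop conditions: node is pure enough, or holds too few samples.
    if gini(labels) < gini_threshold or len(labels) < min_samples:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None:  # stop condition: no feature can split the data further
        return {"leaf": majority}
    _, feature, value = split
    d1 = [(x, y) for x, y in zip(rows, labels) if x[feature] == value]
    d2 = [(x, y) for x, y in zip(rows, labels) if x[feature] != value]
    return {
        "feature": feature,
        "value": value,
        "yes": build_tree([x for x, _ in d1], [y for _, y in d1], gini_threshold, min_samples),
        "no": build_tree([x for x, _ in d2], [y for _, y in d2], gini_threshold, min_samples),
    }


def predict_tree(node, x):
    while "leaf" not in node:
        node = node["yes"] if x[node["feature"]] == node["value"] else node["no"]
    return node["leaf"]


def build_forest(rows, labels, n_trees=5, seed=0, **tree_kwargs):
    """Train each tree on a bootstrap resample of the data (an assumption)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], **tree_kwargs))
    return forest


def predict_forest(forest, x):
    """Majority vote over the individual trees."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]
```

The default `min_samples=30` mirrors the sample-count threshold mentioned in the text; for small experiments it needs to be lowered, otherwise every tree collapses to a single leaf.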
Further, at step S2: the training of the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data comprises the following steps:
s21: acquiring verification data, wherein the verification data comprises training set data and verification set data;
randomly selecting 80% of the verification data as training set data, and using the rest 20% of the verification data as verification set data.
S22: training a classifier of the random forest model by using training set data;
s23: predicting and outputting verification set data through a classifier of the trained random forest model to obtain the output accuracy rate E1 of the random forest model based on the verification data;
S24: repeating steps S21 to S23 to obtain N-1 further output accuracies E2, E3, ..., EN of the random forest model based on the verification data;
S25: computing a weighted average of the N output accuracies E1, E2, ..., EN, and using this average as the final output accuracy of the random forest model.
The output accuracy of the random forest model is calculated with the following formula:
$$\mathrm{Accuracy} = \frac{TP}{TP + TN}$$
where TP is the number of records predicted correctly, TN is the number of records predicted wrongly, and TP + TN is the total number of predicted records.
The obtaining of the output accuracy of the random forest model based on the verification data in steps S23 and S24 includes:
outputting, through the random forest model, a fulfillment redemption risk value of the debtor or the initial creditor corresponding to the matured account receivable; and obtaining the output accuracy of the random forest model from the deviation between the fulfillment redemption risk value and the actual fulfillment value.
The output accuracy is obtained by comparing the results output by the random forest model with the actual results observed in operation.
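Steps S21 to S25 amount to a repeated random 80/20 hold-out evaluation. A minimal sketch under stated assumptions: `fit` is a hypothetical callback that trains on the training split and returns a prediction function, the N accuracies are given equal weights, and `fit_majority` is only a stand-in baseline so that the example runs without any model library.

```python
import random


def accuracy(y_true, y_pred):
    """Share of records predicted correctly among all predicted records."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def repeated_holdout(rows, labels, fit, n_rounds=5, train_fraction=0.8, seed=0):
    """Repeat S21-S23 n_rounds times and average the accuracies (S24-S25)."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_train = int(len(rows) * train_fraction)
    scores = []
    for _ in range(n_rounds):
        rng.shuffle(indices)  # S21: new random 80/20 split
        train, valid = indices[:n_train], indices[n_train:]
        predict = fit([rows[i] for i in train], [labels[i] for i in train])  # S22
        y_pred = [predict(rows[i]) for i in valid]  # S23
        scores.append(accuracy([labels[i] for i in valid], y_pred))
    return sum(scores) / len(scores), scores  # S25 with equal weights


def fit_majority(train_rows, train_labels):
    """Stand-in model: always predicts the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda row: majority


if __name__ == "__main__":
    rows = [(i,) for i in range(10)]
    labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
    mean_accuracy, per_round = repeated_holdout(rows, labels, fit_majority)
    print(mean_accuracy, per_round)
```

Averaging over several random splits makes the reported accuracy less dependent on one lucky or unlucky partition of the data.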
Preferably, in step S4, the accounts receivable are classified with the optimized random forest model to obtain the fulfillment redemption risk value of each account receivable at maturity; when the fulfillment redemption risk value is smaller than a preset threshold, credit enhancement measures are added to improve the redemption of the account receivable at maturity.
The credit-increasing measures include, but are not limited to, the following ways:
A. paying a certain proportion of performance bond
B. Core enterprise commitment to pay
C. The guarantee company guarantees
D. Third party warranty
E. Insurance company insurance
F. Effective asset pledge
G. Bank preauthorization credit
H. Non-gold company transfer
By combining one or more credit enhancement measures, the risk that the debtor cannot pay, or that the initial creditor cannot repurchase, after the account receivable matures can be effectively prevented, so that full settlement is obtained promptly once the receivable is due. Which measures to adopt can be decided by the accounts receivable creditor's rights management service institution, combining system recommendations with manual judgment.
In step S3, in optimizing the calling parameters of the random forest model, a two-level nested loop traverses the parameter arrays for the number of subtrees and the minimum leaf node number to obtain a plurality of output accuracies, and the group with the highest output accuracy is selected as the optimal calling parameters of the random forest model. One specific example is as follows:
The calling parameters of the random forest model are optimized as follows. First, for the maximum number of features that a single decision tree is allowed to use, all features are selected, because the data of the debtors or initial creditors contains few feature fields, only ten in total. The remaining two parameters, the number of subtrees built (n_estimators) and the minimum leaf node number (min_samples_leaf), are then tuned. Arrays of candidate values are built by loop traversal: n_estimators starts at 1 and increases in steps of 3 up to a maximum of 50, and min_samples_leaf starts at 1 and increases in steps of 5 up to a maximum of 100. A model is built and a prediction result obtained for every combination through a two-level nested loop, and the accuracy is calculated by comparing the predictions with the real results, as shown in Table 1. Finally, the group of parameters with the highest accuracy is selected as the optimal parameters of the random forest model.
TABLE 1
Serial number   Number of decision trees   Minimum leaf node number   Accuracy
1               1                          1                          0.8
2               1                          6                          0.9
3               1                          11                         0.9
4               1                          16                         0.95
5               1                          21                         0.95
6               1                          26                         0.95
7               4                          1                          0.75
8               4                          6                          0.75
9               4                          11                         0.75
...             ...                        ...                        ...
N-2             49                         86                         0.75
N-1             49                         91                         0.75
N               49                         96                         0.75
The optimal parameters (1, 16, 0.95) can be intuitively obtained through table 1, that is, when the number of decision trees is 1 and the number of minimum leaf nodes is 16, the output accuracy of the random forest model is 0.95. Therefore, the random forest model under the parameter can be selected to classify and output accounts receivable, and a relatively accurate fulfillment and payment risk value is obtained.
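The two-level traversal can be sketched as below. The function `grid_search` and the `evaluate` callback are hypothetical: in practice `evaluate` would train scikit-learn's RandomForestClassifier with the given `n_estimators` and `min_samples_leaf` and return its validation accuracy. The toy callback only imitates the shape of Table 1 and is not the patent's data.

```python
def grid_search(evaluate, tree_counts=range(1, 50, 3), leaf_sizes=range(1, 100, 5)):
    """Try every (n_estimators, min_samples_leaf) pair and keep the best one.

    range(1, 50, 3) yields 1, 4, ..., 49 and range(1, 100, 5) yields
    1, 6, ..., 96, matching the values listed in Table 1.
    """
    results = []
    for n_estimators in tree_counts:            # outer loop: number of subtrees
        for min_samples_leaf in leaf_sizes:     # inner loop: minimum leaf node number
            score = evaluate(n_estimators, min_samples_leaf)
            results.append((n_estimators, min_samples_leaf, score))
    # max() keeps the first entry among ties, i.e. the earliest row of the table.
    best = max(results, key=lambda row: row[2])
    return best, results


def toy_evaluate(n_estimators, min_samples_leaf):
    """Stand-in for model training; the accuracy values are invented."""
    if n_estimators == 1 and min_samples_leaf >= 16:
        return 0.95
    return 0.75


if __name__ == "__main__":
    best, results = grid_search(toy_evaluate)
    print(best, len(results))
```

Because several rows can share the top accuracy, the tie-breaking rule matters; here the first combination encountered wins, which favors the smallest parameter values.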
A machine learning-based accounts receivable redemption risk control system comprises a construction module 1, an accuracy module 2, a parameter optimization module 3 and a prediction output module 4;
the construction module 1 is used for establishing a decision tree according to a recursion method, training a plurality of decision trees through the acquired sample data, and constructing a random forest model;
the accuracy module 2 is used for training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
the parameter optimization module 3 is configured to optimize the calling parameters of the random forest model, and select the random forest model with an output accuracy greater than a preset accuracy as the optimized prediction model, where the calling parameters include the maximum number of features that a single decision tree is allowed to use, the number of subtrees to be built, and the minimum leaf node number;
the prediction output module 4 uses the optimized random forest model obtained by the parameter optimization module 3 to classify and output accounts receivable so as to predict whether the debtor or the initial creditor will default.
A computer readable storage medium having stored thereon a number of classification programs to be invoked by a processor to perform the following steps:
establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
the random forest model is trained by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
optimizing calling parameters in the random forest model, and selecting the random forest model with the output accuracy rate larger than the preset accuracy rate as an optimized prediction model, wherein the calling parameters comprise the maximum number of the use characteristics of a single decision tree allowed by the random forest model, the number of the built subtrees and the minimum leaf node number;
and classifying and outputting accounts receivable by using the optimized random forest model so as to predict whether the debtor or the initial creditor will default.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by program instructions running on the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the method embodiments; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The above description covers only the preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any equivalent substitution or modification that a person skilled in the art makes within the technical scope disclosed herein, according to the technical solutions and inventive concept of the invention, shall fall within the scope of protection of the invention.

Claims (10)

1. A machine learning-based accounts receivable redemption risk control method, characterized by comprising the following steps:
establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
the random forest model is trained by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
optimizing calling parameters in the random forest model, and selecting the random forest model with the output accuracy rate larger than the preset accuracy rate as an optimized prediction model, wherein the calling parameters comprise the maximum number of the use characteristics of a single decision tree allowed by the random forest model, the number of the built subtrees and the minimum leaf node number;
and classifying and outputting accounts receivable by using the optimized random forest model so as to predict whether the debtor or the initial creditor will default.
2. The machine learning-based receivables redemption risk control method of claim 1, wherein said building a decision tree according to a recursive method comprises:
S11: sequentially traversing each possible value a of each feature A in the current data set, and calculating the Gini index of each split point (A, a);
S12: selecting the split point with the minimum Gini index as the optimal split point, and splitting the current data set D into two subsets $D_1$ and $D_2$ at the optimal split point, wherein $D_1$ is the set of samples in the current data set that satisfy A = a and $D_2$ is the set of samples in the current data set that do not satisfy A = a;
S13: repeating step S11 and step S12 on each of $D_1$ and $D_2$ after splitting until the Gini index of the sample set is smaller than a preset threshold;
S14: generating a decision tree according to the Gini index minimization criterion.
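Steps S11 to S14 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names (`gini`, `best_split`, `build_tree`, `predict`), the dictionary tree layout, the threshold value 0.05, and the toy receivable records are all hypothetical, and features are treated as categorical so that a split tests A = a exactly as in S12.

```python
from collections import Counter

def gini(labels):
    """Gini index of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """S11-S12: score every split point (A, a) and return the one with the
    lowest weighted Gini index, or None if no split separates the samples."""
    best = None  # (weighted_gini, feature_index, value)
    n = len(rows)
    for f in range(len(rows[0])):
        for a in sorted({r[f] for r in rows}):
            d1 = [y for r, y in zip(rows, labels) if r[f] == a]  # satisfies A = a
            d2 = [y for r, y in zip(rows, labels) if r[f] != a]  # does not
            if not d1 or not d2:
                continue
            score = len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
            if best is None or score < best[0]:
                best = (score, f, a)
    return best

def build_tree(rows, labels, threshold=0.05):
    """S13-S14: recurse on both subsets until the Gini index of a node
    falls below the (assumed) threshold or no further split exists."""
    split = best_split(rows, labels) if gini(labels) >= threshold else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, a = split
    d1 = [(r, y) for r, y in zip(rows, labels) if r[f] == a]
    d2 = [(r, y) for r, y in zip(rows, labels) if r[f] != a]
    return {
        "feature": f,
        "value": a,
        "yes": build_tree([r for r, _ in d1], [y for _, y in d1], threshold),
        "no": build_tree([r for r, _ in d2], [y for _, y in d2], threshold),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feature"]] == tree["value"] else tree["no"]
    return tree["leaf"]

# Hypothetical toy records: (credit grade, payment history) -> outcome
rows = [("high", "late"), ("high", "ontime"), ("low", "late"), ("low", "ontime")]
labels = ["default", "ok", "default", "ok"]
tree = build_tree(rows, labels)
```

In this toy data only the second feature separates the classes, so the root split tests whether the payment history equals "late". A random forest would train many such trees on resampled data and random feature subsets and combine their votes.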
3. The machine-learning-based receivable redemption risk control method of claim 2, wherein, in calculating the Gini index of each split point (A, a), M classes of default conditions are predefined and the probability that the current data belongs to the k-th of the M classes is $p_k$, so that the Gini index of the probability distribution $p$ is:
$$\mathrm{Gini}(p) = \sum_{k=1}^{M} p_k (1 - p_k) = 1 - \sum_{k=1}^{M} p_k^2$$
in a binary decision tree, the Gini index of the probability distribution is:
$$\mathrm{Gini}(p) = 2p(1 - p)$$
the Gini index $\mathrm{Gini}(D, A)$ of the current data set D under the condition of feature A = a is:
$$\mathrm{Gini}(D, A) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)$$
wherein $\mathrm{Gini}(D_1)$ is the Gini index of subset $D_1$ and $\mathrm{Gini}(D_2)$ is the Gini index of subset $D_2$.
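The Gini quantities named in claim 3 can be checked numerically with a short sketch. It assumes the standard CART definitions (sum of p_k(1 - p_k) for a distribution, and a size-weighted sum over the two subsets for a split). The function names and the sample of ten receivables are hypothetical.

```python
def gini_from_probs(probs):
    """Gini index of a class distribution: sum_k p_k * (1 - p_k)."""
    return sum(p * (1.0 - p) for p in probs)

def gini_binary(p):
    """Two-class special case: 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)

def gini_of_labels(labels):
    """Gini index of a sample set, using observed class frequencies."""
    n = len(labels)
    if n == 0:
        return 0.0
    return gini_from_probs([labels.count(k) / n for k in set(labels)])

def gini_split(d1, d2):
    """Gini(D, A): Gini indices of D1 and D2 weighted by subset size."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini_of_labels(d1) + len(d2) / n * gini_of_labels(d2)

# Hypothetical sample: 10 receivables, 4 defaults (1 = default, 0 = paid).
before = gini_binary(0.4)            # about 0.48 before splitting
d1 = [1, 1, 1, 0]                    # samples with A = a
d2 = [1, 0, 0, 0, 0, 0]              # samples with A != a
after = gini_split(d1, d2)           # about 0.317 after splitting
```

Because the weighted index after the split is lower than the index before it, this split point would be preferred over one that leaves the class mix unchanged.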
4. The machine-learning-based receivables redemption risk control method of claim 1, wherein the training of the random forest model using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data comprises:
S21: acquiring verification data, wherein the verification data comprises training set data and verification set data;
S22: training a classifier of the random forest model by using the training set data;
S23: predicting and outputting the verification set data through the classifier of the trained random forest model to obtain the output accuracy E1 of the random forest model based on the verification data;
S24: repeating steps S21 to S23 to obtain N-1 further output accuracy rates E2, E3, ..., EN of the random forest model based on the verification data;
S25: weighting the N output accuracy rates E1, E2, ..., EN to obtain an average output accuracy, and using the average output accuracy as the final output accuracy of the random forest model.
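The loop in steps S21 to S25 can be sketched as repeated hold-out validation. The sketch makes several assumptions that the claim does not state: each round draws a fresh random split, the N accuracies are weighted equally, and a trivial `MajorityClassifier` stands in for the random forest so that the example runs without third-party libraries. All names and default values are hypothetical.

```python
import random
from collections import Counter

class MajorityClassifier:
    """Hypothetical stand-in for the random forest classifier."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

def mean_output_accuracy(X, y, make_model, n_rounds=5, val_fraction=0.3, seed=0):
    """Return (average accuracy, [E1, ..., EN]) over n_rounds random splits."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_rounds):                      # S24: repeat S21 to S23
        idx = list(range(len(X)))
        rng.shuffle(idx)                           # S21: new train/verification split
        n_val = max(1, int(len(idx) * val_fraction))
        val_idx, train_idx = idx[:n_val], idx[n_val:]
        model = make_model().fit(                  # S22: train on the training set
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        preds = model.predict([X[i] for i in val_idx])   # S23: predict verification set
        correct = sum(p == y[i] for p, i in zip(preds, val_idx))
        accuracies.append(correct / n_val)
    # S25: equal-weight average (an assumption; other weightings are possible)
    return sum(accuracies) / len(accuracies), accuracies
```

Passing a factory (`make_model`) rather than a fitted model keeps the rounds independent, since each round trains a fresh classifier on its own training split.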
5. The machine learning-based receivable redemption risk control method of claim 4, wherein obtaining the output accuracy of the random forest model based on the verification data in steps S23 and S24 comprises:
outputting, through the random forest model, a fulfillment redemption risk value of the debtor or the initial creditor corresponding to an account receivable that has fallen due;
acquiring the actual fulfillment value of the debtor or the initial creditor corresponding to the account receivable that has fallen due;
and obtaining the output accuracy of the random forest model according to the deviation of the fulfillment redemption risk value from the actual fulfillment value.
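Claim 5 does not state how the deviation is turned into an accuracy, so the following is only one possible reading. It assumes the model outputs a score in [0, 1] where higher means fulfillment is more likely, that the actual fulfillment value is 1 (paid) or 0 (defaulted), and that a prediction counts as correct when its absolute deviation is below a tolerance. The function name, the encoding, and the tolerance of 0.5 are all hypothetical.

```python
def output_accuracy(predicted_values, actual_values, tolerance=0.5):
    """Share of due receivables whose predicted value deviates from the
    actual fulfillment value by less than `tolerance`."""
    if len(predicted_values) != len(actual_values):
        raise ValueError("inputs must have the same length")
    if not predicted_values:
        raise ValueError("at least one due receivable is required")
    hits = sum(
        abs(p - a) < tolerance for p, a in zip(predicted_values, actual_values)
    )
    return hits / len(predicted_values)

# Hypothetical scores for four receivables that have fallen due
predicted = [0.9, 0.2, 0.6, 0.1]
actual = [1, 0, 0, 0]          # only the first one was actually paid
accuracy = output_accuracy(predicted, actual)   # 3 of 4 within tolerance
```

With a tolerance of 0.5 this reduces to ordinary classification accuracy at a 0.5 cut-off. A tighter tolerance would penalize predictions that are on the right side of the cut-off but poorly calibrated.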
6. The machine learning-based receivable exchange risk control method according to any one of claims 1 to 5, wherein, in the step of classifying the account receivable with the optimized random forest model to obtain a fulfillment redemption risk value for the receivable at maturity, when the fulfillment redemption risk value is smaller than a preset threshold, credit enhancement measures are added to improve redemption of the receivable at maturity.
7. The machine-learning-based receivables redemption risk control method of claim 6, wherein, in optimizing the calling parameters in the random forest model, a two-level nested loop traverses arrays of candidate values for the number of subtrees built and the minimum number of leaf nodes to obtain a plurality of output accuracy rates, and the parameter group with the highest output accuracy is selected as the optimal calling parameters of the random forest model.
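The nested traversal in claim 7 amounts to an exhaustive grid search over two parameters. The sketch below assumes an `evaluate` callback that trains a random forest with the given parameters and returns its output accuracy. A made-up scoring function, `toy_accuracy`, stands in for it here so the example is self-contained. The candidate grids and all names are hypothetical.

```python
def grid_search(n_trees_grid, min_leaf_grid, evaluate):
    """Try every (number of subtrees, minimum leaf count) pair and return
    the best (n_trees, min_leaf, accuracy) together with all results."""
    best = None
    results = []
    for n_trees in n_trees_grid:            # outer loop: number of subtrees
        for min_leaf in min_leaf_grid:      # inner loop: minimum leaf node number
            acc = evaluate(n_trees, min_leaf)
            results.append((n_trees, min_leaf, acc))
            if best is None or acc > best[2]:
                best = (n_trees, min_leaf, acc)
    return best, results

def toy_accuracy(n_trees, min_leaf):
    """Made-up stand-in for 'train the forest and measure its accuracy'."""
    return 0.95 - abs(n_trees - 100) / 1000 - abs(min_leaf - 5) / 100

best, results = grid_search([50, 100, 200], [1, 5, 10], toy_accuracy)
```

The cost grows with the product of the two grid sizes, since each pair requires a full training and evaluation run. In practice the selected accuracy could also be compared against a preset minimum before the model is accepted.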
8. The machine learning-based receivable exchange risk control method according to any one of claims 1-5, wherein in the step of selecting the optimized random forest model with the output accuracy rate greater than the preset accuracy rate to classify and output receivable accounts, the random forest model with the output accuracy rate of more than 90% is selected as the optimized random forest model.
9. A receivable cash redemption risk control system based on machine learning is characterized by comprising a construction module (1), an accuracy rate module (2), a parameter optimization module (3) and a prediction output module (4);
the construction module (1) is used for establishing a decision tree according to a recursion method, training a plurality of decision trees through the acquired sample data, and constructing a random forest model;
the accuracy rate module (2) is used for training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
the parameter optimization module (3) is used for optimizing calling parameters in the random forest model and selecting the random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that the random forest model allows a single decision tree to use, the number of subtrees built, and the minimum number of leaf nodes;
and the prediction output module (4) is used for classifying and outputting accounts receivable by using the optimized random forest model obtained by the parameter optimization module (3), so as to predict whether debtors or initial debtors default or not.
10. A computer-readable storage medium having stored thereon a plurality of classification programs, the programs being invoked by a processor to perform the following steps:
establishing a decision tree according to a recursion method, training a plurality of decision trees through the obtained sample data, and establishing a random forest model;
training the random forest model by using the acquired verification data to obtain the output accuracy of the random forest model based on the verification data;
optimizing calling parameters in the random forest model, and selecting the random forest model whose output accuracy is greater than a preset accuracy as the optimized prediction model, wherein the calling parameters comprise the maximum number of features that the random forest model allows a single decision tree to use, the number of subtrees built, and the minimum number of leaf nodes;
and classifying and outputting accounts receivable by using the optimized random forest model so as to predict whether debtors or initial debtors default or not.
CN201911244056.0A 2019-12-06 2019-12-06 Machine learning-based receivable and receivable cash cashing risk control method and system Pending CN111105305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244056.0A CN111105305A (en) 2019-12-06 2019-12-06 Machine learning-based receivable and receivable cash cashing risk control method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244056.0A CN111105305A (en) 2019-12-06 2019-12-06 Machine learning-based receivable and receivable cash cashing risk control method and system

Publications (1)

Publication Number Publication Date
CN111105305A true CN111105305A (en) 2020-05-05

Family

ID=70421828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244056.0A Pending CN111105305A (en) 2019-12-06 2019-12-06 Machine learning-based receivable and receivable cash cashing risk control method and system

Country Status (1)

Country Link
CN (1) CN111105305A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117875969A (en) * 2023-12-07 2024-04-12 指增(上海)科技有限责任公司 Training method, payment route selection method, system, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109480833A (en) * 2018-08-30 2019-03-19 北京航空航天大学 The pretreatment and recognition methods of epileptic's EEG signals based on artificial intelligence
WO2019088972A1 (en) * 2017-10-30 2019-05-09 Equifax, Inc. Training tree-based machine-learning modeling algorithms for predicting outputs and generating explanatory data
CN109949152A (en) * 2019-04-15 2019-06-28 武汉理工大学 A kind of personal credit's violation correction method
CN110334737A (en) * 2019-06-04 2019-10-15 阿里巴巴集团控股有限公司 A kind of method and system of the customer risk index screening based on random forest
US20190318421A1 (en) * 2018-04-13 2019-10-17 GDS Link, LLC Decision-making system and method based on supervised learning

Similar Documents

Publication Publication Date Title
Heyman et al. The financial structure of private held Belgian firms
US7191150B1 (en) Enhancing delinquent debt collection using statistical models of debt historical information and account events
KR102044205B1 (en) Target information prediction system using big data and machine learning and method thereof
US20070124237A1 (en) System and method for optimizing cross-sell decisions for financial products
KR20190064749A (en) Method and device for intelligent decision support in stock investment
CN115271912A (en) Credit business intelligent wind control approval system and method based on big data
CN116596659A (en) Enterprise intelligent credit approval method, system and medium based on big data wind control
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
Laitinen Data system for assessing probability of failure in SME reorganization
CN111105305A (en) Machine learning-based receivable and receivable cash cashing risk control method and system
CN112085593A (en) Small and medium-sized enterprise credit data mining method
CN117196808A (en) Mobility risk prediction method and related device for peer business
CN114358519B (en) Intelligent credit line interest rate adjusting method and device
CN113706300A (en) Loan method and device for small and micro enterprises
Salihu et al. A review of algorithms for credit risk analysis
CN107977804B (en) Guarantee warehouse business risk assessment method
US8515841B2 (en) Financial product application pull-through system
Sadatrasoul Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering
Ma Through the crisis: UK SMEs performance during the ‘credit crunch’
Nazari et al. Using the Hybrid Model for Credit Scoring (Case Study: Credit Clients of microloans, Bank Refah-Kargeran of Zanjan, Iran)
Glawion et al. Applications of non-linear machine learning tree-based methods for prepayments forecasting of fixed-rate institutional loans
Motale Predicting business failure in a South African business bank
CN117764692A (en) Method for predicting credit risk default probability
Ertuğrul Customer Transaction Predictive Modeling via Machine Learning Algorithms
CN116308590A (en) Bill product pushing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505