CN110310200B - Method and device for clearing overdue loan - Google Patents


Info

Publication number
CN110310200B
Authority
CN
China
Prior art keywords
clearing
clue
category
clearance
decision
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910604647.8A
Other languages
Chinese (zh)
Other versions
CN110310200A (en)
Inventor
张鑫
赵焕芳
侯鑫磊
张皓
董朝霞
刘晓龙
朱伟伟
李振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN201910604647.8A
Publication of CN110310200A
Application granted
Publication of CN110310200B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The invention provides a method and a device for clearing overdue loans. After a clearing request for an overdue loan to be cleared is received, clearing clues respectively corresponding to a plurality of associated accounts are obtained. The clearing clues are analyzed by using a decision-making auxiliary model, and recommended clearing clues are determined from them according to the analysis result. The recommended clearing clues are then used as a basis for formulating a clearing strategy for the overdue loan to be cleared. The associated accounts comprise the deposit accounts of the client of the overdue loan other than the agreed repayment account, and the deposit accounts of the client's guarantors. Each clearing clue comprises features of a plurality of categories, the features of each category are determined according to account information and client information, and each category has only one feature in a given clearing clue. Based on the scheme provided by the invention, the approver can formulate a clearing strategy by analyzing only the recommended clearing clues instead of every clearing clue of the overdue loan to be cleared, which improves the efficiency of clearing overdue loans.

Description

Method and device for clearing overdue loan
Technical Field
The invention relates to the field of data processing, in particular to a method and a device for clearing and collecting overdue loans.
Background
A loan transaction means that a commercial bank lends funds to a customer and the customer agrees to repay the principal and interest after a certain period. For any loan, if the customer does not repay the principal and interest on time, the loan is called an overdue loan, and the process of recovering the principal and interest of an overdue loan is called clearing.
For commercial banks, a large volume of overdue loans can adversely affect the bank's fund turnover, so efficient clearing of overdue loans is an important guarantee of a commercial bank's stable operation.
At present, for an overdue loan to be cleared, the main clearing method of a commercial bank is as follows: an approver analyzes the clearing clues of the overdue loan one by one, formulates a clearing strategy according to the analysis results, deducts money from the client's agreed repayment account according to the clearing strategy, and uses the deducted funds to repay the principal and interest of the overdue loan. However, an overdue loan to be cleared usually corresponds to a large number of clearing clues, and manually analyzing each clearing clue, as in the prior art, is inefficient, which limits the efficiency of clearing overdue loans.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method and a device for clearing overdue loans, which screen the clearing clues and use the resulting recommended clearing clues to assist in formulating a clearing strategy, thereby improving the efficiency of clearing overdue loans.
To solve the above problems, the following solutions are proposed:
the invention discloses a method for clearing overdue loan, which comprises the following steps:
after receiving a clearing request for the overdue loan to be cleared, acquiring a plurality of clearing clues corresponding to the overdue loan to be cleared. Each clearing clue corresponds to one associated account of the overdue loan to be cleared and comprises features of a plurality of categories. These features are determined according to the account information of the corresponding associated account and the customer information of the customer of the overdue loan to be cleared, and each category in a clearing clue corresponds to only one feature. The associated accounts of the overdue loan to be cleared comprise the customer's deposit accounts other than the agreed repayment account and/or the deposit accounts of the customer's guarantors;
analyzing each clearing clue by using a decision-making auxiliary model, and determining recommended clearing clues among the plurality of clearing clues according to the analysis result. The decision-making auxiliary model is established according to a plurality of historical samples. Each historical sample comprises a clearing clue and the deduction identifier of the associated account corresponding to that clearing clue, and the deduction identifier indicates whether the deposit of that associated account was used to pay the principal and interest of the overdue loan corresponding to that clearing clue.
Optionally, the analyzing the clearance clues by using the decision-making assistance model, and determining recommended clearance clues in the clearance clues according to an analysis result, includes:
for each clearing clue, analyzing the clearing clue by using the decision-making auxiliary model to obtain a corresponding analysis result;
judging whether an analysis result corresponding to each clearing clue meets a preset clearing condition or not;
and for each clearing clue, if the analysis result corresponding to the clearing clue meets the clearing condition, determining the clearing clue as a recommended clearing clue.
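The three optional steps above amount to scoring each clue independently and keeping those that pass a condition. The patent does not define the "preset clearing condition". The sketch below therefore assumes the analysis result is a probability-like score and the condition is `score >= threshold`. The names `recommend_clues` and `threshold=0.5` are hypothetical.

```python
from typing import Callable, Dict, List

# A clearance clue modelled as a mapping from category name to its single feature.
Clue = Dict[str, str]


def recommend_clues(
    clues: List[Clue],
    score: Callable[[Clue], float],  # stands in for the decision-making auxiliary model
    threshold: float = 0.5,          # hypothetical "preset clearing condition"
) -> List[Clue]:
    """Return the clues whose analysis result satisfies the clearing condition."""
    recommended = []
    for clue in clues:
        result = score(clue)      # analyse each clue on its own
        if result >= threshold:   # assumed clearing condition
            recommended.append(clue)
    return recommended
```

Only the clues returned here would be shown to the approver. All other clues are left out of the clearing strategy.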
Optionally, after the analyzing the clearing clues by using the decision-making assistance model and determining the recommended clearing clue in the plurality of clearing clues according to the analysis result, the method further includes:
for each recommended clearing clue, after it has been determined whether the deposit of the associated account corresponding to the recommended clearing clue is used to pay the principal and interest of the overdue loan to be cleared, optimizing the decision-making auxiliary model by using the recommended clearing clue and the deduction identifier of its associated account. The deduction identifier of that associated account is set according to whether its deposit is used to pay the principal and interest of the overdue loan to be cleared.
Optionally, the process of establishing the decision-making assistance model includes:
determining a key category in the multiple categories according to the importance of each category of the historical sample; calculating the importance of each category based on the plurality of historical samples by using a random forest algorithm;
for each historical sample, constructing a key sample corresponding to the historical sample by using the characteristics of the key category of the historical sample and the deduction identification of the historical sample;
calculating an initial value as a current auxiliary model according to the deduction marks of the plurality of key samples, and setting the iteration number to be 1;
calculating to obtain the negative gradient of each key sample by using the current auxiliary model;
calculating to obtain an update value of an auxiliary model according to each key sample and the negative gradient of each key sample;
updating the current auxiliary model based on the auxiliary model update value, and increasing the iteration number by 1;
taking the updated model as the current auxiliary model and judging whether the current iteration number is greater than a preset threshold. If the current iteration number is less than or equal to the threshold, returning to the step of calculating the negative gradient of each key sample by using the current auxiliary model. If the current iteration number is greater than the threshold, determining the current auxiliary model as the decision-making auxiliary model.
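The iterative procedure above has the same structure as gradient boosting: an initial value, negative gradients, an update value, and an iteration counter compared with a threshold. The text does not specify the loss, the initial-value formula, or the weak learner. This sketch therefore makes four assumptions:

- binary log loss with 0/1 deduction identifiers;
- a log-odds initial value;
- a fixed learning rate;
- one-level categorical stumps fitted by least squares to the negative gradient.

All names are hypothetical, and this is not presented as the patent's exact model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A key sample: (feature vector over the key categories, deduction identifier 0 or 1).
KeySample = Tuple[Tuple[str, ...], int]
Stump = Callable[[Tuple[str, ...]], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs: Sequence[Tuple[str, ...]], targets: Sequence[float]) -> Stump:
    """Least-squares stump on categorical features: split on x[j] == v vs. x[j] != v."""
    best = None
    for j in range(len(xs[0])):
        for v in sorted({x[j] for x in xs}):
            left = [t for x, t in zip(xs, targets) if x[j] == v]
            right = [t for x, t in zip(xs, targets) if x[j] != v]
            lv = sum(left) / len(left)
            rv = sum(right) / len(right) if right else 0.0
            sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, v, lv, rv)
    _, j, v, lv, rv = best
    return lambda x: lv if x[j] == v else rv


def train_assist_model(samples: List[KeySample], threshold: int = 50,
                       learning_rate: float = 0.1) -> Callable[[Tuple[str, ...]], float]:
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    # Initial value: log-odds of the share of identifiers equal to 1 (an assumption).
    p = min(max(sum(ys) / len(ys), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))
    stumps: List[Stump] = []

    def raw_score(x: Tuple[str, ...]) -> float:
        # Current auxiliary model = initial value + scaled sum of all updates so far.
        return f0 + learning_rate * sum(h(x) for h in stumps)

    t = 1  # iteration number
    while t <= threshold:
        # Negative gradient of log loss w.r.t. the raw score: y - sigmoid(F(x)).
        neg_grad = [y - sigmoid(raw_score(x)) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, neg_grad))  # auxiliary model update value
        t += 1
    return lambda x: sigmoid(raw_score(x))
```

The returned function maps a key-sample feature vector to a score in (0, 1). That score could serve as the analysis result in the recommendation step.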
Optionally, the calculating, by using a random forest algorithm, the importance of each category based on the plurality of historical samples includes:
establishing M decision trees based on the plurality of historical samples; wherein M is a preset positive integer, each decision tree is established by using a C4.5 algorithm, and the M decision trees form a random forest;
calculating a first out-of-bag data error of each decision tree in the random forest according to the plurality of historical samples;
for each category, applying random noise to all features of the category in the plurality of historical samples, and then calculating a second out-of-bag data error of each decision tree corresponding to the category according to the plurality of historical samples after the random noise is applied;
for each class, calculating the importance of the class according to the first error and the second error of the class; wherein the first error comprises a first out-of-bag data error for each decision tree in the random forest and the second error for the category comprises a second out-of-bag data error for each decision tree in the random forest corresponding to the category.
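The text does not give the formula that combines the first and second errors. This sketch assumes the usual random-forest choice: the average increase in out-of-bag error over the M trees, importance(c) = (1/M) Σ_k (err2_k(c) − err1_k).

It also shows one way to apply "random noise" to a category, by permuting that category's features across samples. Permutation is an assumed form of noise. The per-tree errors are taken as inputs rather than computed from C4.5 trees, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def permute_category(samples: List[Dict[str, str]], category: str,
                     rng: random.Random) -> List[Dict[str, str]]:
    """Return copies of the samples with one category's features shuffled across samples."""
    column = [s[category] for s in samples]
    rng.shuffle(column)
    noisy = []
    for s, value in zip(samples, column):
        copy = dict(s)  # leave the original samples untouched
        copy[category] = value
        noisy.append(copy)
    return noisy


def category_importance(first_errors: Sequence[float],
                        second_errors: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Average increase in out-of-bag error per category across the M trees."""
    m = len(first_errors)
    importance = {}
    for category, errors in second_errors.items():
        if len(errors) != m:
            raise ValueError(f"expected {m} second OOB errors for {category!r}")
        importance[category] = sum(e2 - e1 for e1, e2 in zip(first_errors, errors)) / m
    return importance
```

A category whose perturbation barely changes the out-of-bag error receives an importance near zero.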
In a second aspect, the invention discloses a device for clearing overdue loans, comprising:
an acquisition unit, configured to acquire, after a clearing request for the overdue loan to be cleared is received, a plurality of clearing clues corresponding to the overdue loan to be cleared. Each clearing clue corresponds to one associated account of the overdue loan to be cleared and comprises features of a plurality of categories. These features are determined according to the account information of the corresponding associated account and the customer information of the customer of the overdue loan to be cleared, and each category in a clearing clue corresponds to only one feature. The associated accounts of the overdue loan to be cleared comprise the customer's deposit accounts other than the agreed repayment account and/or the deposit accounts of the customer's guarantors;
a construction unit, configured to establish a decision-making auxiliary model according to a plurality of historical samples. Each historical sample comprises a clearing clue and the deduction identifier of the associated account corresponding to that clearing clue, and the deduction identifier indicates whether the deposit of that associated account was used to pay the principal and interest of the overdue loan corresponding to that clearing clue;
a recommending unit, configured to analyze each clearing clue by using the decision-making auxiliary model and to determine recommended clearing clues among the plurality of clearing clues according to the analysis result. The recommended clearing clues are used as a basis for formulating a clearing strategy for the overdue loan to be cleared.
Optionally, the recommending unit includes:
the analysis unit is configured to analyze, for each clearing clue, the clearing clue by using the decision-making auxiliary model to obtain a corresponding analysis result;
the judging unit is configured to judge whether the analysis result corresponding to each clearing clue satisfies a preset clearing condition;
the first determining unit is configured to determine, for each clearing clue, that the clearing clue is a recommended clearing clue if its analysis result satisfies the clearing condition.
Optionally, the device further includes:
an optimization unit, configured to, for each recommended clearing clue, after it has been determined whether the deposit of the associated account corresponding to the recommended clearing clue is used to pay the principal and interest of the overdue loan to be cleared, optimize the decision-making auxiliary model by using the recommended clearing clue and the deduction identifier of its associated account. The deduction identifier of that associated account is set according to whether its deposit is used to pay the principal and interest of the overdue loan to be cleared.
Optionally, the construction unit includes:
the first calculating unit is used for calculating the importance of each category of the plurality of historical samples based on the plurality of historical samples by using a random forest algorithm;
a second determining unit, configured to determine a key category in the multiple categories according to an importance of each category of the history samples;
the sample construction unit is used for constructing a key sample corresponding to each history sample by using the characteristics of the key category of the history sample and the deduction identification of the history sample;
the second calculation unit is used for calculating an initial value as a current auxiliary model according to the deduction marks of the plurality of key samples and setting the iteration number to be 1;
the third calculation unit is used for calculating the negative gradient of each key sample by using the current auxiliary model;
the fourth calculating unit is used for calculating to obtain an updated value of the auxiliary model according to each key sample and the negative gradient of each key sample;
an updating unit, configured to update the current auxiliary model based on the auxiliary model update value, and increment the iteration number by 1;
a judging unit, configured to take the updated auxiliary model as the current auxiliary model and to judge whether the current iteration number is greater than a preset threshold. If the current iteration number is less than or equal to the threshold, it triggers the third calculating unit to calculate the negative gradient of each key sample by using the current auxiliary model. If the iteration number is greater than the threshold, it determines the current auxiliary model as the decision-making auxiliary model.
Optionally, the first computing unit includes:
a decision tree building unit, configured to build M decision trees based on the multiple historical samples; wherein M is a preset positive integer, each decision tree is established by using a C4.5 algorithm, and the M decision trees form a random forest;
a first error calculation unit, configured to calculate a first out-of-bag data error of each decision tree in the random forest according to the plurality of historical samples;
a second error calculation unit, configured to apply random noise to all features of the category in the multiple historical samples for each category, and then calculate a second out-of-bag data error corresponding to the category for each decision tree according to the multiple historical samples to which the random noise is applied;
an importance calculation unit that calculates, for each category, an importance of the category from a first error and a second error of the category; wherein the first error comprises a first out-of-bag data error for each decision tree in the random forest, and the second error for the category comprises a second out-of-bag data error for each decision tree in the random forest corresponding to the category.
The invention provides a method and a device for clearing overdue loans. After a clearing request for an overdue loan to be cleared is received, clearing clues respectively corresponding to a plurality of associated accounts are obtained. The clearing clues are analyzed by using a decision-making auxiliary model, and recommended clearing clues are determined from them according to the analysis result. The recommended clearing clues are then used as a basis for formulating a clearing strategy for the overdue loan to be cleared. The associated accounts comprise the deposit accounts of the client of the overdue loan other than the agreed repayment account, and the deposit accounts of the client's guarantors. Each clearing clue comprises features of a plurality of categories, the features of each category are determined according to account information and client information, and each category has only one feature in a given clearing clue. Based on the scheme provided by the invention, the approver can formulate a clearing strategy by analyzing only the recommended clearing clues instead of every clearing clue of the overdue loan to be cleared, which improves the efficiency of clearing overdue loans.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a method for clearing an overdue loan according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for constructing a decision-assistance model according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for clearing an overdue loan according to another embodiment of the present invention;
FIG. 4 is a flowchart of a method for calculating the importance of each category in the historical samples according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an apparatus for clearing an overdue loan according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the construction unit of an apparatus for clearing overdue loans according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the first computing unit of an apparatus for clearing an overdue loan according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present application provides a method for clearing an overdue loan. Referring to FIG. 1, the method includes the following steps:
S101, after receiving a clearing request for the overdue loan to be cleared, obtaining a plurality of clearance clues corresponding to the overdue loan to be cleared.
Each clearance clue corresponds to one associated account of the overdue loan to be cleared and comprises features of a plurality of categories. These features are determined according to the account information of the associated account corresponding to the clearance clue and the customer information of the customer of the overdue loan to be cleared. Each category in a clearance clue corresponds to only one feature.
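Because each category contributes exactly one feature, a clearance clue can be held as a mapping from category to feature value. The sketch below is a hypothetical representation:

- the category keys reuse the Table 1 categories named later in this description (associated contract type, account usage, account status);
- the feature values are those of clearance clue 1;
- `account_id` is an assumed field linking the clue to its associated account.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class ClearanceClue:
    account_id: str           # associated account this clue describes (hypothetical field)
    features: Dict[str, str]  # category -> its single feature; a dict allows one per category

    def feature_vector(self, categories: Sequence[str]) -> Tuple[str, ...]:
        """Return the features for the given categories, in that order."""
        return tuple(self.features[c] for c in categories)


clue = ClearanceClue(
    account_id="acct-001",
    features={
        "associated_contract_type": "loan",
        "account_usage": "general",
        "account_status": "normal",
    },
)
print(clue.feature_vector(["associated_contract_type", "account_usage", "account_status"]))
# ('loan', 'general', 'normal')
```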
The clearance clues may be recorded in the manner shown in Table 1.
[Table 1 is reproduced only as an image in the original publication.]
TABLE 1
Row 1 of Table 1 lists the categories of the features in the clearance clues. Rows 2 to 6 list 5 clearance clues, each consisting of one feature per category.
It should be noted that the categories included in a clearance clue are not limited to those shown in Table 1. In addition to the categories in Table 1, a clearance clue may further include client risk preference, whether the client is involved, whether the owner is involved, client classification, client guarantor, and the like.
The associated account for the overdue loan to be cleared includes a deposit account of the client for the overdue loan to be cleared, other than the agreed repayment account, and/or a deposit account of a guarantor of the client. The deposit account refers to an account with a non-zero current deposit.
The account information of the associated accounts and the customer information of the customer of the overdue loan to be cleared can be obtained in advance by processing business data acquired from a plurality of data sources. These data sources include, but are not limited to:
- customer data registered in the bank system;
- historical data recording past business transactions between the bank and the customer and the customer's guarantors;
- data acquired from outside the bank system, for example via the internet.

Data from outside the bank system may include judicial data, customs data, public opinion data, and the like concerning the customer and the customer's guarantors.
By structuring the unstructured data in the acquired business data and performing the necessary data cleaning, the account information of the associated accounts of the overdue loan to be cleared and the customer information of its customer can be obtained. Optionally, the account information and the customer information may be stored in a data mart.
S102, analyzing each clearance clue by using a decision auxiliary model, and determining recommended clearance clues among the plurality of clearance clues according to the analysis result.
The determined recommended clearance clue can be used as a basis for formulating a clearance strategy of the overdue loan to be cleared.
Specifically, the clearing strategy for the overdue loan to be cleared is formulated by the bank's approvers. That is, after the recommended clearance clues are determined by the method provided in the embodiments of this application, the approvers can analyze the recommended clearance clues and formulate the clearing strategy for the overdue loan to be cleared accordingly.
Generally, the recommended clearance clues finally determined in the embodiments of this application are a subset of all the clearance clues corresponding to the overdue loan to be cleared. In other words, for one overdue loan to be cleared there are fewer recommended clearance clues than clearance clues. An approver therefore spends less time analyzing the recommended clearance clues than analyzing all the clearance clues of the overdue loan, and can formulate the clearing strategy more efficiently.
Further, the associated accounts corresponding to the recommended clearance clues may be referred to as recommended clearing accounts. The clearing strategy formulated by the approver may specify one or more of the recommended clearing accounts determined in step S102. The deposits of the specified recommended clearing accounts are used to pay the principal and interest of the overdue loan to be cleared (that is, to clear the overdue loan), and the deposits of the unspecified recommended clearing accounts are not.
Of course, the clearing strategy formulated by the approver may not specify any recommended clearing account. That is, depending on the approver's decision, none of the recommended clearing accounts may be used to clear the overdue loan.
For each clearance clue obtained in step S101 that is not determined as a recommended clearance clue, the deposit of its associated account is not used to pay the principal and interest of the overdue loan to be cleared.
The decision auxiliary model is established according to a plurality of historical samples. Each historical sample comprises a clearance clue and the deduction identifier of the associated account corresponding to that clue. The deduction identifier indicates whether the deposit of that associated account was used to pay the principal and interest of the overdue loan corresponding to that clue.
The clearing clue included in the history sample is the clearing clue corresponding to the overdue loan which has been cleared by the bank.
For any historical sample, if the deposit of the associated account corresponding to the sample was used to pay the principal and interest of the corresponding overdue loan, the deduction identifier of the sample can be set to "accept". Otherwise, it can be set to "reject".
According to the method for clearing an overdue loan provided in the embodiments of this application, after a clearing request for the overdue loan to be cleared is received, clearance clues corresponding to a plurality of associated accounts are obtained, one clue per associated account. The clearance clues are then analyzed by using the decision-making auxiliary model, and some of them are determined as recommended clearance clues according to the analysis result. The recommended clearance clues can be used as a basis for the approver to formulate the clearing strategy for the overdue loan to be cleared. The associated accounts comprise the deposit accounts of the client of the overdue loan other than the agreed repayment account, and the deposit accounts of the client's guarantors. Each clearance clue comprises features of a plurality of categories, the features of each category are determined according to the account information of the corresponding associated account and the client information, and each category has only one feature in a given clearance clue. The technical solution provided by the embodiments of this application can automatically screen the clearance clues of the overdue loan to be cleared and present the screened recommended clearance clues to the approver for analysis.
Furthermore, when formulating a clearing strategy, the approver can select recommended clearing accounts based on the analysis of the recommended clearance clues, and the deposits of the selected accounts are used to pay the principal and interest of the overdue loan to be cleared. Selecting one or more recommended clearing accounts widens the bank's channels for clearing overdue loans, so the clearing of overdue loans can be completed more quickly.
The key point of the method for clearing the overdue loan provided by any embodiment of the application is to analyze a plurality of clearance clues by using a decision-making auxiliary model so as to determine recommended clearance clues. Therefore, referring to fig. 2, a method for building a decision-assistance model based on historical samples is provided below, and the method for building a decision-assistance model includes the following steps:
S201, determining key categories among the plurality of categories according to the importance of each category of the historical samples.
The categories of a historical sample are the categories of the features of the clearance clue included in that sample, for example the associated contract type, the account usage, and the account status shown in Table 1.
The specific execution process of step S201 may be to sort the categories in order from the highest importance degree to the lowest importance degree, rank the category with the highest importance degree first, and so on, and then select the first N categories as the key categories. N is a positive integer preset according to the number of categories of the history samples. In general, N may be set to 20, that is, after sorting the categories according to importance, the top 20 categories are selected as key categories.
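Ranking the categories by importance and keeping the first N can be sketched as follows. `select_key_categories` is a hypothetical name, and N defaults to the value of 20 mentioned above. The importance values in the usage line are made up for illustration.

```python
from typing import Dict, List


def select_key_categories(importance: Dict[str, float], n: int = 20) -> List[str]:
    """Sort categories from most to least important and keep the first n."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:n]


print(select_key_categories({"account_status": 0.31, "account_usage": 0.12,
                             "associated_contract_type": 0.27}, n=2))
# ['account_status', 'associated_contract_type']
```

If fewer than N categories exist, all of them are returned in order of importance.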
The importance of each category is calculated based on a plurality of historical samples by using a random forest algorithm.
S202, aiming at each historical sample, constructing the key sample corresponding to the historical sample by using the characteristics of the key category of the historical sample and the deduction identification of the historical sample.
Step S202, for each history sample, extracting features corresponding to the aforementioned key category from all the features of the clearance clue of the history sample, and constructing the key sample corresponding to the clearance clue by using the features of the clearance clue corresponding to the key category and the deduction identifier corresponding to the clearance clue.
Taking table 1 as an example, if the clearance thread 1 to clearance thread 5 in table 1 belong to the history sample 1 and the history sample 2 … history sample 5, respectively, and of the 6 categories listed in table 1, three categories of the associated contract type, the account usage and the account status are determined as the key categories. Then for each of the 5 history samples, the key sample corresponding to the history sample is obtained by using the associated contract type of the clearing clue of the history sample, the characteristics of the three categories of the account usage and the account status, and the deduction identification combination of the history sample.
It should be noted that, for the subsequent construction of the decision-assistance model, the deduction identifier of each history sample needs to be represented numerically. For example, "reject", meaning that the deposit of the corresponding associated account was not used to pay the principal and interest of any overdue loan, may be represented by 0, and "accept", meaning that the deposit of the corresponding associated account was used to pay the principal and interest of the corresponding overdue loan, may be represented by 1.
Taking clearance clue 1 and the corresponding history sample 1 as an example, and assuming that the deduction identifier of history sample 1 is "reject", the key sample corresponding to history sample 1 is (loan, general, normal, 0), where the combination of all features of the key sample, i.e., (loan, general, normal), can be regarded as the feature vector of the key sample.
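The construction of a key sample in step S202 can be sketched as follows. The dictionary keys used as category names and the extra non-key feature are hypothetical; the numeric encoding (0 for "reject", 1 for "accept") and the resulting key sample follow the example above.

```python
# Numeric encoding of the deduction identifier.
DEDUCTION_CODES = {"reject": 0, "accept": 1}


def build_key_sample(clue: dict[str, str], deduction: str,
                     key_categories: list[str]) -> tuple:
    """Keep only the key-category features of a clearance clue, in key-category
    order, and append the numeric deduction identifier (step S202)."""
    features = tuple(clue[category] for category in key_categories)
    return features + (DEDUCTION_CODES[deduction],)


key_categories = ["associated_contract_type", "account_usage", "account_status"]
clearance_clue_1 = {
    "associated_contract_type": "loan",
    "account_usage": "general",
    "account_status": "normal",
    "placeholder_non_key_category": "ignored",  # hypothetical non-key feature
}
key_sample_1 = build_key_sample(clearance_clue_1, "reject", key_categories)
print(key_sample_1)  # ('loan', 'general', 'normal', 0)
```

Everything except the last element of the tuple is the feature vector of the key sample.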
S203, calculate an initial value from the deduction identifiers of the plurality of key samples, use it as the current auxiliary model, and set the iteration number to 1.
In this embodiment, the number of iterations may be denoted as t.
The above initial value is generally calculated based on the following formula (denoted as formula 1):
$$f_0(x) = \arg\min_{c} \sum_{i=1}^{m} L(y_i, c)$$
In the above formula, m is the number of key samples used to construct the decision-assistance model. Because part of the key samples is needed to test the constructed model, the total number of key samples is generally multiplied by 0.9 and the result rounded to give the number of key samples used for construction, while the remaining 10% are used for testing. For example, if there are currently 100 history samples and 100 key samples are constructed from them, 90 key samples are used to construct the decision-assistance model and the remaining 10 are used to test it, i.e., m in the above formula equals 90.
Optionally, the composition of the key samples used to construct the decision-assistance model may be adjusted by deleting some of the key samples whose deduction identifier is "accept". This increases the proportion of counterexamples among the key samples used for construction and thereby improves the ability of the constructed model to recognize counterexamples and give early warnings. Here, a counterexample refers to a key sample whose deduction identifier is "reject".
Here $y_i$ denotes the deduction identifier of key sample i, which after the conversion described above is 1 or 0; c is the initial value to be calculated; $L(y_i, c)$ is the initial loss function of key sample i, calculated from its deduction identifier and the initial value; and $f_0(x)$ on the left side of the formula denotes the current auxiliary model for the first iteration.
Calculating formula 1 amounts to finding, for the m key samples used to construct the decision-assistance model, an initial value that satisfies the following condition:
when the loss function of each of the m key samples is calculated with this initial value, the sum of the m results is minimal.
The calculated initial value can be denoted as $c_0$.
Since the deduction identifiers are represented by 1 and 0, the initial value calculated in step S203 is a number greater than 0 and less than 1 (provided that both identifiers occur among the key samples).
The loss function can take various specific forms; formula 2 below is one usable loss function:
$$L(y_i, c) = (y_i - c)^2$$
A loss function between two values reflects the size of the difference between them: in formula 2, the smaller the difference between $y_i$ and c, the smaller the loss, and the larger the difference, the larger the loss.
Formula 1 can therefore also be understood as determining an initial value $c_0$ that minimizes the overall difference between $c_0$ and the deduction identifiers of the m key samples.
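Assuming formula 2 is the squared error $(y_i - c)^2$, formula 1 has a closed-form solution: $c_0$ is the mean of the deduction identifiers, i.e., the share of "accept" samples. The sketch below computes it and checks it against a brute-force grid search; the labels are made up for illustration.

```python
def squared_loss(y: float, c: float) -> float:
    """Loss of formula 2, assumed here to be the squared error."""
    return (y - c) ** 2


def initial_value(labels: list[int]) -> float:
    """c0 = argmin_c sum_i L(y_i, c); for the squared error this is the mean."""
    return sum(labels) / len(labels)


labels = [0, 1, 1, 1, 0]  # hypothetical deduction identifiers of m = 5 key samples
c0 = initial_value(labels)

# Brute-force check of formula 1 over c in {0.00, 0.01, ..., 1.00}.
grid = [k / 100 for k in range(101)]
best_on_grid = min(grid, key=lambda c: sum(squared_loss(y, c) for y in labels))
print(c0, best_on_grid)
```

A different loss function would generally give a different $c_0$, which is why the same loss must be used in every step.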
S204, calculate the negative gradient of each key sample using the current auxiliary model.
The negative gradient of the key sample can be calculated based on the following equation 3:
$$r_i = -\left[\frac{\partial L\big(y_i, f(x_i)\big)}{\partial f(x_i)}\right]_{f(x) = f_{t-1}(x)}$$
Here $r_i$ denotes the negative gradient of key sample i, t the current iteration number, $x_i$ the feature vector of key sample i, and $f_{t-1}(x_i)$ the output obtained by inputting the feature vector of key sample i into the current auxiliary model; the current auxiliary model of the t-th iteration is denoted $f_{t-1}(x)$. For the first iteration, t = 1 and the current auxiliary model is $f_0(x)$, which outputs the initial value calculated in step S203 regardless of the input feature vector. For t ≥ 2, the current auxiliary model $f_{t-1}(x)$ of the t-th iteration is the updated current auxiliary model obtained through the subsequent steps in the (t-1)-th iteration; from the second iteration on, its output $f_{t-1}(x_i)$ varies with the feature vector $x_i$, as described in the following steps.
Formula 3 amounts to taking the partial derivative of the loss function with respect to the output of the current auxiliary model and multiplying the result by -1, which gives the negative gradient of the corresponding key sample. The partial derivative depends on the expression of the loss function. If the loss function is given by formula 2, formula 3 becomes the following formula 4:
$$r_i = 2\big(y_i - f_{t-1}(x_i)\big)$$
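For the squared loss $L = (y_i - f)^2$, the partial derivative with respect to f is $-2(y_i - f)$, so the negative gradient is $2(y_i - f_{t-1}(x_i))$. A minimal sketch of step S204 under that assumption, with hypothetical labels and the first-iteration outputs $f_0(x_i) = c_0$:

```python
def negative_gradients(labels: list[int], outputs: list[float]) -> list[float]:
    """r_i = -dL/df evaluated at f = f_{t-1}(x_i), for L(y, f) = (y - f)**2."""
    return [2.0 * (y - f) for y, f in zip(labels, outputs)]


labels = [0, 1, 1, 1, 0]
outputs = [0.6] * 5          # first iteration: f_0(x) = c0 for every key sample
r = negative_gradients(labels, outputs)
print(r)
```

Samples with deduction identifier 1 get a positive negative gradient (the model should move up), samples with 0 a negative one.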
S205, calculate an update value of the auxiliary model from each key sample and its negative gradient.
Specifically, in step S205 the update value of the auxiliary model is calculated from the feature vector of each key sample and the negative gradient of that key sample.
The specific implementation process of step S205 may include the following steps:
Fit the classification regression tree (CART) of this iteration from the feature vector and the negative gradient of each key sample; for the t-th iteration, the fitted CART may be denoted CART-t, i.e., classification regression tree t.
The fitted CART-t divides the m key samples into several sets according to differences in their features; each set corresponds to one leaf node of CART-t and contains at least one key sample together with its negative gradient. Each such set, or leaf node, may be denoted $R_{tj}$, where the subscript t indicates that it is a leaf node of classification regression tree t and j is the index of the leaf node within that tree, a positive integer from 1 to the total number of leaf nodes of classification regression tree t.
For each leaf node of the classification regression tree of this iteration, the best-fit value of the leaf node is calculated according to the following formula 5:
$$c_{tj} = \arg\min_{c} \sum_{x_i \in R_{tj}} L\big(y_i, f_{t-1}(x_i) + c\big)$$
The formula states that, for each leaf node $R_{tj}$ of classification regression tree t, the best-fit value $c_{tj}$ of the leaf node satisfies the following condition:
for each key sample i belonging to leaf node $R_{tj}$, the output $f_{t-1}(x_i)$ calculated from its feature vector $x_i$ is added to $c_{tj}$, and the loss function of key sample i in this iteration is calculated from this sum and the deduction identifier $y_i$ of key sample i. The best-fit value $c_{tj}$ is the value that minimizes the sum of these loss functions over all key samples in leaf node $R_{tj}$.
It should be noted that the loss function in formulas 3 and 5 must be the same as the loss function used to calculate the initial value in step S203. The method of this embodiment can be carried out with various loss functions, but the same loss function should be used throughout any particular embodiment.
The classification regression tree t fitted in this iteration, together with the best-fit value of each of its leaf nodes, constitutes the update value of the auxiliary model calculated in this iteration.
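With the squared loss, formula 5 also has a closed form: the best-fit value $c_{tj}$ of a leaf is the mean of the residuals $y_i - f_{t-1}(x_i)$ of the key samples in that leaf. The sketch below takes the leaf assignment produced by CART-t as given; the labels, outputs and leaf indices are hypothetical.

```python
from collections import defaultdict


def leaf_best_fit(labels, outputs, leaf_of_sample):
    """Best-fit value c_tj of every leaf j (formula 5, squared loss assumed).

    leaf_of_sample[i] is the index j of the leaf R_tj that key sample i falls
    into.  argmin_c sum_{i in R_tj} (y_i - (f_{t-1}(x_i) + c))**2 is the mean
    residual of the key samples in that leaf.
    """
    residual_sum = defaultdict(float)
    count = defaultdict(int)
    for y, f, j in zip(labels, outputs, leaf_of_sample):
        residual_sum[j] += y - f
        count[j] += 1
    return {j: residual_sum[j] / count[j] for j in residual_sum}


labels = [1, 1, 0, 0, 1]
outputs = [0.6] * 5
leaves = [1, 1, 2, 2, 2]        # hypothetical leaf indices j from CART-t
best_fit = leaf_best_fit(labels, outputs, leaves)
print(best_fit)
```

The tree t itself (which sample goes to which leaf) together with this dictionary is the update value of the auxiliary model.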
S206, update the current auxiliary model based on the update value of the auxiliary model, and increase the iteration number by 1.
The update of the current auxiliary model may be performed based on the following equation 6:
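$$f_t(x) = f_{t-1}(x) + \sum_{j=1}^{M} c_{tj}\, I(x \in R_{tj})$$
where $I(x \in R_{tj})$ equals 1 if x falls into leaf node $R_{tj}$ and 0 otherwise.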
In the above formula, M is the total number of leaf nodes of classification regression tree t fitted in the current iteration. Since the tree in every iteration is fitted on the same m feature vectors and their corresponding negative gradients, every classification regression tree t is taken to have the same total number M of leaf nodes.
$f_t(x)$ denotes the updated current auxiliary model, i.e., the current auxiliary model after the t-th update, and $f_{t-1}(x)$ denotes the current auxiliary model before the t-th update. When t = 1, $f_{t-1}(x)$ is the initial value calculated in step S203; when t ≥ 2, $f_{t-1}(x)$ is the current auxiliary model output at the end of the previous iteration, i.e., the current auxiliary model after the (t-1)-th update.
Formula 6 means that, when any feature vector x is input into the current auxiliary model after the t-th update, the model uses classification regression tree t to classify x according to its features and determine the leaf node $R_{tj}$ into which x falls; it then adds the output of x under the current auxiliary model after the (t-1)-th update to the best-fit value of leaf node $R_{tj}$, which gives the output of x under the current auxiliary model after the t-th update. Here the feature vector x consists of the features of all key categories of any one clearance clue.
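Formula 6 can be read as wrapping the previous model. The sketch below assumes a fitted tree is represented by a hypothetical function that maps a feature vector to its leaf index j, plus a dictionary of best-fit values $c_{tj}$; the tree and the numbers are made up.

```python
from typing import Callable, Hashable

Model = Callable[[tuple], float]
LeafLookup = Callable[[tuple], Hashable]


def update_model(previous: Model, leaf_of: LeafLookup,
                 best_fit: dict[Hashable, float]) -> Model:
    """Return f_t with f_t(x) = f_{t-1}(x) + c_tj, j being the leaf of x in tree t."""
    def f_t(x: tuple) -> float:
        return previous(x) + best_fit[leaf_of(x)]
    return f_t


def f_0(x: tuple) -> float:
    return 0.6  # initial value c0, independent of x


def cart_1_leaf(x: tuple) -> int:
    # Hypothetical CART-1 with two leaves, split on the associated contract type.
    return 1 if x[0] == "loan" else 2


f_1 = update_model(f_0, cart_1_leaf, {1: 0.4, 2: -0.2})
print(f_1(("loan", "general", "normal")), f_1(("bill", "general", "frozen")))
```

Because the indicator in formula 6 is 1 for exactly one leaf, the sum over j reduces to a single dictionary lookup.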
Increasing the iteration number by 1 means adding 1 to the current iteration number and assigning the result back to the current iteration number.
S207, take the updated current auxiliary model as the current auxiliary model.
In step S207, the updated current auxiliary model is assigned to the current auxiliary model.
S208, determine whether the current iteration number is greater than a preset threshold; if it is less than or equal to the threshold, return to step S204; if it is greater than the threshold, proceed to step S209.
The preset threshold can also be regarded as the maximum number of iterations, a positive integer set manually according to accuracy and time requirements. Within a certain range, increasing the maximum number of iterations makes the final decision-assistance model more accurate but lengthens the time needed to construct it, while decreasing it shortens the construction time but lowers the accuracy of the final model.
In general, the maximum number of iterations may be set to a positive integer greater than or equal to 5 and less than or equal to 10.
Steps S204 to S208 constitute one iteration: finishing step S208 completes the iteration, and returning from step S208 to step S204 starts the next iteration.
S209, determine the current auxiliary model as the decision-assistance model.
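Steps S203 to S209 can be combined into a compact gradient boosting loop. This is a sketch under simplifying assumptions: the loss is the squared error, and each CART-t is replaced by a depth-1 regression tree (one split of the form "feature k equals value v", two leaves) rather than a full CART. The toy data are hypothetical.

```python
def fit_stump(X, r):
    """Fit a two-leaf regression tree to the negative gradients r.

    Tries every split "x[k] == v" and keeps the one with the smallest
    squared error of r around each leaf's mean.  Returns a leaf lookup.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for k in range(len(X[0])):
        for v in sorted({x[k] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[k] == v]
            right = [ri for x, ri in zip(X, r) if x[k] != v]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, k, v)
    if best is None:                      # no valid split: a single leaf
        return lambda x: 1
    _, k, v = best
    return lambda x: 1 if x[k] == v else 2


def train(X, y, max_iterations=5):
    """Return the decision-assistance model f(x) built in steps S203 to S209."""
    c0 = sum(y) / len(y)                  # S203 (squared loss)
    trees = []                            # (leaf lookup, best-fit values) per iteration

    def model(x):
        return c0 + sum(fit[leaf_of(x)] for leaf_of, fit in trees)

    t = 1
    while t <= max_iterations:            # S208
        outputs = [model(x) for x in X]
        r = [2.0 * (yi - fi) for yi, fi in zip(y, outputs)]   # S204
        leaf_of = fit_stump(X, r)                             # S205: stand-in for CART-t
        sums, counts = {}, {}
        for x, yi, fi in zip(X, y, outputs):                  # formula 5
            j = leaf_of(x)
            sums[j] = sums.get(j, 0.0) + (yi - fi)
            counts[j] = counts.get(j, 0) + 1
        trees.append((leaf_of, {j: sums[j] / counts[j] for j in sums}))  # S206/S207
        t += 1
    return model                          # S209


X = [("loan", "general", "normal"), ("loan", "finance", "frozen"),
     ("bill", "general", "normal"), ("bill", "medical", "frozen"),
     ("loan", "medical", "normal")]
y = [1, 0, 1, 0, 1]
f = train(X, y)
print([f(x) for x in X])
```

The default of 5 iterations sits at the lower end of the range suggested for the maximum number of iterations; on real data a deeper tree per iteration would normally be used.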
The decision-making assistance model output in step S209 can be expressed as the following formula 7:
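$$f(x) = f_T(x) = c_0 + \sum_{t=1}^{T} \sum_{j=1}^{M} c_{tj}\, I(x \in R_{tj})$$
where $I(x \in R_{tj})$ equals 1 if x falls into leaf node $R_{tj}$ and 0 otherwise.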
In the above formula, $c_0$ is the initial value calculated in step S203, f(x) denotes the decision-assistance model, T is the threshold mentioned in step S208, i.e., the maximum number of iterations, and M is the total number of leaf nodes of each classification regression tree t.
It follows from the iteration process that the final decision-assistance model contains T classification regression trees, namely classification regression trees 1, 2, …, T; each tree has M leaf nodes and each leaf node has a best-fit value.
Optionally, the method provided in this embodiment may further include:
S210, test the decision-assistance model using key samples that were not used to construct it.
As described in step S203, part of the key samples may be selected from all key samples to construct the decision-assistance model, and the remaining key samples are used to test the constructed model.
The testing process consists of inputting the feature vector of each test key sample into the decision-assistance model and obtaining the model's output value for that feature vector. For the decision-assistance model of formula 7, this output value is typically a number between 0 and 1, and the difference between the output value and the deduction identifier of the test key sample is the model's analysis error for that key sample. The analysis errors of the test key samples are calculated one by one, and the accuracy of the decision-assistance model is evaluated from all of the analysis errors.
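A sketch of the split described in step S203 and of the test in step S210. Shuffling before the split, the fixed seed, the constant stand-in model and the use of the absolute difference as the analysis error are assumptions; the text only says that about 90% of the key samples (rounded) are used for construction and the rest for testing.

```python
import random


def split_key_samples(samples, build_ratio=0.9, seed=0):
    """Shuffle the key samples and split them into a build part and a test part."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    m = round(len(shuffled) * build_ratio)   # number of key samples used to build
    return shuffled[:m], shuffled[m:]


def analysis_errors(model, test_samples):
    """Absolute difference between model output and deduction identifier."""
    return [abs(model(x) - y) for x, y in test_samples]


samples = [((f"feature_{i}",), i % 2) for i in range(100)]  # hypothetical key samples
build, test = split_key_samples(samples)
print(len(build), len(test))  # 90 10

errors = analysis_errors(lambda x: 0.7, test)   # a constant stand-in model
print(sum(errors) / len(errors))                # mean analysis error
```

Averaging the errors is only one way to summarize them; the text leaves the evaluation criterion open.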
Building on the method for constructing a decision-assistance model described in the previous embodiment, another embodiment of the present application provides a method for clearing an overdue loan, which, referring to FIG. 3, includes the following steps:
S301, after a clearing request for the overdue loan to be cleared is received, obtain a plurality of clearance clues corresponding to the overdue loan to be cleared.
S302, for each clearance clue, analyze the clearance clue with the decision-assistance model to obtain a corresponding analysis result.
Analyzing a clearance clue with the decision-assistance model means inputting the clearance clue into the model and obtaining the model's output for that clue.
The implementation of step S302 is briefly described below with reference to the decision-assistance model shown in equation 7.
It should be noted that, according to steps S201 and S202 of the previous embodiment, each clearance clue contains a feature for each key category determined in that embodiment, and each key category has exactly one feature in a clearance clue. Inputting a clearance clue into the decision-assistance model is therefore equivalent to inputting the feature vector made up of that clue's key-category features.
Each classification regression tree of the decision-assistance model classifies the input feature vector according to its key-category features. Therefore, although an input clearance clue contains features of many non-key categories in addition to those of the key categories, the non-key features affect neither the analysis process nor the analysis result of the decision-assistance model, so inputting a clearance clue is equivalent to inputting its corresponding feature vector.
For the decision-assistance model of formula 7, the initial value does not change once construction is complete, so after a clearance clue is input, the model calls its classification regression trees one by one to classify the clue according to the features of its key categories.
The trees may be called in sequence: classification regression tree 1 is called first to classify the input clearance clue and determine the leaf node $R_{1j}$ of tree 1 to which the clue belongs, which gives the best-fit value $c_{1j}$ of the clue in tree 1; proceeding likewise through all trees of the model of formula 7 yields T best-fit values for the input clearance clue, $c_{1j}, c_{2j}, \dots, c_{Tj}$. These T best-fit values are added to the initial value of the decision-assistance model, and the resulting number is the analysis result of the model of formula 7 for the clearance clue.
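The analysis of a clearance clue by the model of formula 7 can be sketched as below. Representing each tree as a leaf-lookup function plus a dictionary of best-fit values is an assumption, and the trees and numbers are hypothetical; note that the non-key feature never enters the calculation.

```python
def analyse_clue(clue, key_categories, c0, trees):
    """Analysis result of formula 7: c0 plus the best-fit value c_tj that the
    clue reaches in each of the T trees."""
    x = tuple(clue[category] for category in key_categories)  # non-key features dropped
    return c0 + sum(best_fit[leaf_of(x)] for leaf_of, best_fit in trees)


key_categories = ["associated_contract_type", "account_usage", "account_status"]
trees = [  # hypothetical T = 2 trees, each with M = 2 leaves
    (lambda x: 1 if x[2] == "normal" else 2, {1: 0.3, 2: -0.5}),
    (lambda x: 1 if x[0] == "loan" else 2, {1: 0.05, 2: -0.1}),
]
clue = {"associated_contract_type": "loan", "account_usage": "general",
        "account_status": "normal", "placeholder_non_key_category": "any"}
print(analyse_clue(clue, key_categories, c0=0.6, trees=trees))
```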
S303, for each clearance clue, determine whether its analysis result satisfies a preset clearing condition.
For analysis results obtained with the decision-assistance model of formula 7, the preset clearing condition may be a threshold. Step S303 then amounts to determining, for each clearance clue, whether its analysis result, i.e., the value output by the decision-assistance model, is greater than the set threshold; if so, the clearance clue satisfies the clearing condition, otherwise it does not.
When the decision assistance model shown in equation 7 is constructed, for the deduction mark of the key sample, 1 is used for "accept", and 0 is used for "reject", so the threshold value may be set to a value slightly smaller than 1, for example, to 0.9.
S304, for each clearance clue, if its analysis result satisfies the clearing condition, determine the clearance clue as a recommended clearance clue.
For example, with the threshold set to 0.9, a clearance clue whose analysis result is 0.95 is determined to be a recommended clearance clue, whereas one whose analysis result is 0.8 is not.
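Steps S303 and S304 reduce to a threshold filter. A minimal sketch, assuming the analysis results are kept in a dictionary keyed by hypothetical clue identifiers; as in the text, a result exactly equal to the threshold does not qualify.

```python
def recommend(analysis_results: dict[str, float], threshold: float = 0.9) -> list[str]:
    """Clearance clues whose analysis result is greater than the threshold (S303/S304)."""
    return [clue_id for clue_id, result in analysis_results.items() if result > threshold]


results = {"clue_a": 0.95, "clue_b": 0.8, "clue_c": 0.9}  # hypothetical analysis results
print(recommend(results))  # ['clue_a']
```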
S305, for each recommended clearance clue, set a corresponding deduction identifier according to whether the associated account corresponding to the recommended clearance clue is used to pay the principal and interest.
For each recommended clearance clue, whether the corresponding associated account is used to pay the principal and interest of the overdue loan to be cleared is decided manually by the bank's approval staff.
After the recommended clearance clues are determined in step S304, each recommended clearance clue, together with its analysis result, account information and customer information, may be displayed to the bank's approval staff, who then select, from the recommended clearance clues, the associated accounts to be used to pay the principal and interest of the overdue loan to be cleared.
For each recommended clearance clue, if the approval staff decide that the corresponding associated account is to be used to pay the principal and interest of the overdue loan to be cleared, the deduction identifier of that associated account is set to "accept"; otherwise it is set to "reject".
S306, optimize the decision-assistance model using each recommended clearance clue and its deduction identifier.
It should be noted that step S306 uses every recommended clearance clue determined in step S304, regardless of whether the corresponding associated account is used to pay the principal and interest of the overdue loan to be cleared. That is, the recommended clearance clues used to optimize the decision-assistance model include both those whose deduction identifier is "reject" and those whose deduction identifier is "accept".
The method for constructing a decision-assistance model shown in FIG. 2 involves determining key categories from the categories of the history samples according to the importance of each category. One method of calculating the importance of each category is described below for reference; other importance calculation methods may of course also be applied in any embodiment of the present application. Referring to FIG. 4, the method for calculating importance includes the following steps:
S401, build M decision trees based on the history sample set.
The history sample set includes a plurality of history samples; step S401 amounts to constructing M decision trees from these history samples.
Here M is a preset positive integer; a larger M may be set if the calculated importance needs to reflect the actual situation more accurately.
Each decision tree in step S401 is created with the C4.5 algorithm, using the history samples as training samples.
The C4.5 algorithm is a mature algorithm for generating decision trees, so the detailed construction process is not described here.
It should be noted that not all history samples participate in the construction of each decision tree. For each decision tree, part of the history samples is randomly selected from all history samples as the in-bag data of that tree, and the tree is then constructed from its in-bag data.
It should be noted that the history samples are selected with replacement: after a decision tree is constructed, its in-bag data is returned to the unselected history samples to restore the original history sample set, and when the next decision tree is constructed, the above process is repeated, randomly selecting part of the history samples from the full set again. For each decision tree, all history samples not used to construct it are collectively referred to as the out-of-bag data of that tree.
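A sketch of drawing the in-bag and out-of-bag data of one decision tree, assuming the usual random-forest bootstrap: as many draws as there are history samples, made with replacement from the full set, so an index may appear in the bag more than once. The text does not fix the in-bag size, so that choice is an assumption.

```python
import random


def draw_bags(n_samples: int, rng: random.Random) -> tuple[list[int], list[int]]:
    """Return (in-bag indices, out-of-bag indices) for one decision tree."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]  # with replacement
    selected = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in selected]
    return in_bag, out_of_bag


rng = random.Random(42)
for tree in range(3):  # every tree draws again from the full history sample set
    in_bag, out_of_bag = draw_bags(10, rng)
    print(tree, sorted(set(in_bag)), out_of_bag)
```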
A decision tree corresponds to a classifier. A decision tree built on the history sample set takes the clearance clue of a history sample as input and outputs a deduction identifier for the associated account corresponding to that clue, i.e., "reject" or "accept", which amounts to deciding which of the two classes the input clearance clue belongs to.
Of course, the deduction identifier output by the decision tree is only the result of a specific algorithm responding to the features of the input clearance clue, and is not necessarily the true deduction identifier (i.e., the deduction identifier of the corresponding history sample). That is, when the clearance clue of a history sample whose deduction identifier is "reject" is input into a decision tree, the tree may output "accept", which is inconsistent with the true deduction identifier.
The M decision trees form a random forest.
S402, calculate the first out-of-bag data error of each decision tree in the random forest based on the history sample set.
The out-of-bag data error of a decision tree refers to the error calculated from the out-of-bag data of the decision tree.
The out-of-bag data error is calculated as follows: for a decision tree, the clearance clues of all history samples in its out-of-bag data are input one by one; for each input clue, the deduction identifier output by the tree is compared with the deduction identifier of the history sample corresponding to that clue, and if they are inconsistent, the tree's decision on that clearance clue (or history sample) is judged wrong. The number of wrong decisions of the tree is recorded as a; if the out-of-bag data of the tree contains b history samples, the out-of-bag data error of the tree equals a divided by b.
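The out-of-bag data error a / b can be sketched as follows. The toy tree is a hypothetical stand-in for a fitted C4.5 tree, and clues are represented as dictionaries for illustration only.

```python
def oob_error(tree_predict, out_of_bag):
    """a / b: share of the tree's out-of-bag history samples it misclassifies.

    out_of_bag is a list of (clearance_clue, true_deduction_identifier) pairs.
    """
    b = len(out_of_bag)
    if b == 0:
        raise ValueError("the tree has no out-of-bag data")
    a = sum(1 for clue, label in out_of_bag if tree_predict(clue) != label)
    return a / b


def toy_tree(clue):  # hypothetical stand-in for a fitted C4.5 tree
    return "accept" if clue["account_status"] == "normal" else "reject"


oob = [({"account_status": "normal"}, "accept"),
       ({"account_status": "normal"}, "reject"),
       ({"account_status": "frozen"}, "reject"),
       ({"account_status": "frozen"}, "reject")]
print(oob_error(toy_tree, oob))  # 0.25
```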
S403, for each category, apply random noise to the features of that category in all history samples of the set, and calculate the second out-of-bag data error of each decision tree corresponding to that category.
Taking Table 1 as an example, assume that the history sample set includes history samples 1 to 5, which contain clearance clues 1 to 5 of Table 1 in order. If random noise is to be applied to the account usage category, then for each clearance clue (or history sample), its account usage feature is randomly replaced with another feature of that category. For example, if the account usage of clearance clue 1 is "general", this feature of clearance clue 1 is replaced by "finance-through" or "medical insurance owner"; likewise, the "finance-through" of clearance clue 2 is replaced by "general" or "medical insurance owner", and so on.
The above is the method of applying random noise to the features of any one category in the history sample set.
It should be noted that "first" and "second" out-of-bag data errors only indicate that the former is calculated from the out-of-bag data before noise is applied and the latter from the out-of-bag data after noise is applied; both are out-of-bag data errors and are calculated in the same way.
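A sketch of the noise step in S403 as described above: each feature of the chosen category is replaced with a different, randomly chosen feature of the same category observed in the sample set, while all other categories stay unchanged. Representing clues as dictionaries is an assumption; the account-usage values come from the Table 1 example.

```python
import random


def apply_noise(clues, category, rng):
    """Return copies of the clues with `category` replaced by another observed value."""
    values = sorted({clue[category] for clue in clues})
    noisy = []
    for clue in clues:
        alternatives = [v for v in values if v != clue[category]]
        perturbed = dict(clue)                # other categories stay unchanged
        if alternatives:                      # only one value observed: nothing to swap
            perturbed[category] = rng.choice(alternatives)
        noisy.append(perturbed)
    return noisy


clues = [{"account_usage": "general", "account_status": "normal"},
         {"account_usage": "finance-through", "account_status": "normal"},
         {"account_usage": "medical insurance owner", "account_status": "frozen"}]
noisy = apply_noise(clues, "account_usage", random.Random(0))
print([c["account_usage"] for c in noisy])
```

The original clues are not modified, so the first out-of-bag errors can still be computed from them.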
In step S403, noise is applied to only one class of features at a time, and the features of the other classes are kept consistent with the features in the original historical sample set.
S404, calculate the importance of each category according to the first error and the second error of that category.
The first error comprises a first out-of-bag data error of each decision tree in the random forest, and the second error of any one category comprises a second out-of-bag data error of each decision tree in the random forest corresponding to the category.
One method of calculating the importance of a category in step S404 is: sum the first out-of-bag data errors of all decision trees in the random forest to obtain the first error; sum the second out-of-bag data errors of all decision trees corresponding to the category to obtain the second error of the category; subtract the first error from the second error of the category; and divide the difference by M to obtain the importance of the category. This process can be expressed by the following formula 8:
$$Im = \frac{1}{M} \sum_{i=1}^{M} \left(errOOB2_i - errOOB1_i\right)$$
where Im is the importance of a category, $errOOB1_i$ is the first out-of-bag data error of decision tree i, and $errOOB2_i$ is the second out-of-bag data error of decision tree i corresponding to the category.
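Formula 8 translates directly into code; the error values below are made up. A category whose noising raises the out-of-bag errors more receives a higher importance.

```python
def category_importance(err_oob1: list[float], err_oob2: list[float]) -> float:
    """Formula 8: Im = (1/M) * sum_i (errOOB2_i - errOOB1_i) over the M trees."""
    if len(err_oob1) != len(err_oob2):
        raise ValueError("need one first and one second error per decision tree")
    M = len(err_oob1)
    return sum(e2 - e1 for e1, e2 in zip(err_oob1, err_oob2)) / M


# Hypothetical out-of-bag errors of M = 3 trees before and after noising one category.
print(category_importance([0.10, 0.20, 0.15], [0.30, 0.25, 0.15]))
```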
Building on the method for clearing an overdue loan provided in any embodiment of the present application, another embodiment of the present application provides an apparatus for clearing an overdue loan, which, referring to FIG. 5, includes the following units:
The obtaining unit 501 is configured to obtain, after a clearing request for the overdue loan to be cleared is received, a plurality of clearance clues corresponding to the overdue loan to be cleared.
Each clearance clue corresponds to an associated account of the overdue loan to be cleared and contains features of a plurality of categories, determined from the account information of the corresponding associated account and the customer information of the customer of the overdue loan to be cleared; each category corresponds to exactly one feature in a clearance clue. The associated accounts of the overdue loan to be cleared include the customer's deposit accounts other than the agreed repayment account and/or the deposit accounts of the customer's guarantors.
The building unit 502 is configured to build a decision-assistance model from a plurality of history samples.
Each history sample contains a clearance clue and the deduction identifier of the associated account corresponding to that clearance clue, where the deduction identifier indicates whether the deposit of the corresponding associated account was used to pay the principal and interest of the overdue loan corresponding to the clearance clue.
The recommending unit 503 is configured to analyze each clearance clue with the decision-assistance model and determine the recommended clearance clues among the plurality of clearance clues according to the analysis results.
The recommended clearance clues are provided to the approval staff for analysis, helping them formulate a clearing strategy for the overdue loan to be cleared.
Optionally, when formulating the clearing strategy, the approval staff may select one or more recommended clearance accounts and use their deposits to pay the principal and interest of the overdue loan to be cleared. Of course, depending on the approval staff's analysis, the principal and interest of the overdue loan to be cleared may also be paid without using the deposit of any recommended clearance account. A recommended clearance account is the account corresponding to a recommended clearance clue determined by the recommending unit 503.
Optionally, the recommending unit 503 may include:
an analysis unit, configured to analyze each clearing clue by using the decision assistance model to obtain a corresponding analysis result;
a judging unit, configured to judge whether the analysis result corresponding to each clearing clue satisfies a preset clearing condition; and
a first determining unit, configured to determine, for each clearing clue, that the clearing clue is a recommended clearing clue if its analysis result satisfies the clearing condition.
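The three sub-units of the recommending unit map onto a simple filter loop. The sketch below is a minimal illustration, assuming the analysis result is a score in [0, 1] (for example, the model's estimated probability that the deduction flag would be 1) and that the preset clearing condition is `score >= threshold`; the patent specifies neither. `ClearingClue`, `recommend_clues`, `toy_model` and the 0.5 threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ClearingClue:
    account_id: str             # hypothetical ID of the associated account
    features: Dict[str, float]  # category name -> its single feature value


def recommend_clues(
    clues: List[ClearingClue],
    model: Callable[[Dict[str, float]], float],
    threshold: float = 0.5,
) -> List[ClearingClue]:
    """Return the clues whose analysis result meets the (assumed) clearing condition."""
    recommended = []
    for clue in clues:
        score = model(clue.features)  # analysis unit: score assumed to lie in [0, 1]
        if score >= threshold:        # judging unit: assumed condition score >= threshold
            recommended.append(clue)  # first determining unit: mark as recommended
    return recommended


# Usage with a toy stand-in model (not the patent's model): score grows with balance.
def toy_model(features: Dict[str, float]) -> float:
    return min(features["balance"] / 10000.0, 1.0)


clues = [
    ClearingClue("acct-A", {"balance": 9000.0}),
    ClearingClue("acct-B", {"balance": 50.0}),
]
print([c.account_id for c in recommend_clues(clues, toy_model)])  # ['acct-A']
```

Keeping the model as a plain callable keeps the filtering logic independent of how the decision assistance model itself is built.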
Optionally, the apparatus further comprises:
an optimizing unit 504, configured to, for each recommended clearing clue, after the approver determines whether the deposit of the associated account corresponding to the recommended clearing clue is used to repay the principal and interest of the overdue loan to be cleared, optimize the decision assistance model by using the recommended clearing clue and the deduction flag of the corresponding associated account.
The deduction flag of the associated account corresponding to the recommended clearing clue is set according to whether the deposit of that account is used to repay the principal and interest of the overdue loan to be cleared.
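One way to read the optimizing unit is as a feedback loop that turns each approver decision into a new historical sample. The sketch below assumes that "optimizing" means appending these labeled samples to the training set and rebuilding the model; the patent does not say whether the update is incremental or a full retrain. `HistoricalSample` and `add_approver_feedback` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HistoricalSample:
    features: Dict[str, float]  # one feature per category, as in a clearing clue
    deduction_flag: int         # 1 if the deposit was used to repay the loan, else 0


def add_approver_feedback(
    history: List[HistoricalSample],
    recommended: Dict[str, Dict[str, float]],  # account ID -> features of its clue
    deposit_used: Dict[str, bool],             # approver decision per account ID
) -> List[HistoricalSample]:
    """Append one labeled sample per recommended clue the approver has decided on."""
    feedback = [
        HistoricalSample(features, 1 if deposit_used[account_id] else 0)
        for account_id, features in recommended.items()
        if account_id in deposit_used  # skip clues still awaiting a decision
    ]
    return history + feedback
```

Under this assumption, the returned list would then be passed back to the model-building procedure of the building unit 502.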
Referring to fig. 6, the building unit 502 includes:
a first calculating unit 601, configured to calculate, by using a random forest algorithm, the importance of each category based on the plurality of historical samples.
A second determining unit 602, configured to determine key categories among the plurality of categories according to the importance of each category of the historical samples.
A sample construction unit 603, configured to, for each historical sample, construct a corresponding key sample from the features of the key categories of the historical sample and the deduction flag of the historical sample.
A second calculating unit 604, configured to calculate an initial value from the deduction flags of the plurality of key samples as the current auxiliary model, and to set the iteration count to 1.
A third calculating unit 605, configured to calculate the negative gradient of each key sample by using the current auxiliary model.
A fourth calculating unit 606, configured to calculate an auxiliary model update value according to each key sample and its negative gradient.
An updating unit 607, configured to update the current auxiliary model based on the auxiliary model update value, and to increment the iteration count by 1.
A determining unit 608, configured to take the updated auxiliary model as the current auxiliary model and judge whether the current iteration count is greater than a preset threshold. If the current iteration count is less than or equal to the threshold, it triggers the third calculating unit 605 to calculate the negative gradient of each key sample by using the current auxiliary model; if the iteration count is greater than the threshold, it determines the current auxiliary model as the decision assistance model.
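Units 604 to 608 describe a gradient boosting loop. The following is a minimal sketch under stated assumptions, not the patent's implementation: the loss is taken to be binary log loss (the deduction flag is 0 or 1), so the initial value is the log-odds of the flags and the negative gradient is `y - sigmoid(F(x))`; the weak learner that turns the negative gradients into an update value is assumed to be a depth-1 least-squares regression tree; and the learning rate `lr` is an added shrinkage factor. The patent names none of these choices, and `fit_stump`, `train_decision_model` and their parameters are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# (category index, split threshold, value if x[j] <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares depth-1 regression tree fitted to the negative gradients r."""
    n, best, best_sse = len(X), None, math.inf
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = (j, t, lv, rv), sse
    if best is None:  # no usable split: fall back to a constant update
        mean = sum(r) / n
        best = (0, math.inf, mean, mean)
    return best


def stump_value(s: Stump, x: List[float]) -> float:
    j, t, lv, rv = s
    return lv if x[j] <= t else rv


def train_decision_model(
    X: List[List[float]], y: List[int], threshold: int = 20, lr: float = 0.5
) -> Callable[[List[float]], float]:
    """Boosting loop over key samples (X: key-category features, y: deduction flags)."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial value: log-odds of the deduction flags
    stumps: List[Stump] = []

    def raw_score(x: List[float]) -> float:
        return f0 + lr * sum(stump_value(s, x) for s in stumps)

    iteration = 1
    while True:
        # negative gradient of the log loss w.r.t. the raw score: y - sigmoid(F(x))
        neg_grad = [yi - 1 / (1 + math.exp(-raw_score(xi))) for xi, yi in zip(X, y)]
        stumps.append(fit_stump(X, neg_grad))  # update value added to the model
        iteration += 1
        if iteration > threshold:  # stop once the count exceeds the preset threshold
            return lambda x: 1 / (1 + math.exp(-raw_score(x)))
```

The returned function gives the estimated probability that the deduction flag is 1. Leaf values here are plain means of the negative gradients, a simplification compared with the Newton-step leaf values used in some gradient boosting implementations.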
Referring to fig. 7, the first calculating unit 601 may include the following units:
a decision tree establishing unit 701, configured to establish M decision trees based on the plurality of historical samples, where M is a preset positive integer, each decision tree is established by using the C4.5 algorithm, and the M decision trees form a random forest.
A first error calculating unit 702, configured to calculate a first out-of-bag (OOB) data error of each decision tree in the random forest according to the plurality of historical samples.
A second error calculating unit 703, configured to, for each category, apply random noise to all features of that category in the plurality of historical samples, and then calculate, according to the noise-perturbed historical samples, a second out-of-bag data error of each decision tree corresponding to that category.
An importance calculating unit 704, configured to calculate, for each category, the importance of the category from the first error and the second error of the category, where the first error comprises the first out-of-bag data error of each decision tree in the random forest, and the second error of the category comprises the second out-of-bag data error of each decision tree in the random forest corresponding to the category.
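The importance calculation of units 701 to 704 resembles the familiar out-of-bag permutation scheme. The sketch below assumes the M trees have already been built (C4.5 training is not shown) and are exposed as predict functions, that `oob_sets[m]` lists the sample indices left out of tree m's bootstrap, that the "random noise" is a random shuffle of the category's column, and that importance is the average increase in OOB error over the M trees. It also covers the next steps of the building unit: taking the top `k` categories as key categories (the patent does not say how key categories are selected) and projecting a historical sample onto them to form a key sample. All function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Tree = Callable[[Row], int]  # hypothetical: a trained tree exposed as a predict function


def oob_error(tree: Tree, X: List[Row], y: Sequence[int], oob_idx: Sequence[int]) -> float:
    """Misclassification rate of one tree on its out-of-bag samples."""
    if not oob_idx:
        return 0.0
    return sum(tree(X[i]) != y[i] for i in oob_idx) / len(oob_idx)


def add_noise(X: List[Row], j: int, rng: random.Random) -> List[Row]:
    """Perturb category j by shuffling its column (one common choice of noise)."""
    column = [row[j] for row in X]
    rng.shuffle(column)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]


def category_importance(
    trees: List[Tree], oob_sets: List[List[int]], X: List[Row], y: List[int], seed: int = 0
) -> Dict[int, float]:
    rng = random.Random(seed)
    first = [oob_error(t, X, y, oob) for t, oob in zip(trees, oob_sets)]
    importance = {}
    for j in range(len(X[0])):
        noisy = add_noise(X, j, rng)
        second = [oob_error(t, noisy, y, oob) for t, oob in zip(trees, oob_sets)]
        # average increase in OOB error over the M trees
        importance[j] = sum(e2 - e1 for e1, e2 in zip(first, second)) / len(trees)
    return importance


def key_categories(importance: Dict[int, float], k: int) -> List[int]:
    """Assumed selection rule: keep the k most important categories."""
    return sorted(importance, key=importance.get, reverse=True)[:k]


def key_sample(row: Row, flag: int, keys: List[int]) -> Tuple[Row, int]:
    """Key sample: features of the key categories plus the deduction flag."""
    return [row[j] for j in keys], flag
```

A category that the trees never split on leaves every OOB error unchanged when perturbed, so its importance is zero; heavily used categories tend to show a large error increase.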
For the specific working principle of the device for clearing an overdue loan provided in the embodiments of the present application, reference may be made to the method for clearing an overdue loan provided in any embodiment of the present application; details are not repeated here.
With the device for clearing an overdue loan provided by the embodiment of the present application, after a clearing request for an overdue loan to be cleared is received, the acquisition unit 501 acquires clearing clues corresponding respectively to a plurality of associated accounts, each clearing clue corresponding to one associated account. The recommending unit 503 then analyzes the clearing clues by using the decision assistance model, which the building unit 502 builds from a plurality of historical samples, and determines the recommended clearing clues among the clearing clues of the overdue loan to be cleared according to the analysis result. The recommended clearing clues can be provided to an approver as a basis for formulating a clearing strategy for the overdue loan to be cleared. The associated accounts comprise deposit accounts of the customer of the overdue loan other than the agreed repayment account and/or deposit accounts of the customer's guarantors; each clearing clue comprises features of a plurality of categories, and the feature of each category is determined according to the account information of the associated account and the customer information of the customer of the overdue loan to be cleared. The scheme provided by the embodiment of the present application can therefore automatically screen the clearing clues of an overdue loan to be cleared; the recommended clearing clues obtained by screening serve as a basis for formulating the clearing strategy and assist the bank's approvers in clearing overdue loans.
Compared with existing methods for clearing overdue loans, the method provided by the embodiment of the present application lets the approver formulate the clearing strategy by analyzing only the screened recommended clearing clues rather than every clearing clue of the overdue loan to be cleared, which effectively reduces the approver's workload and improves the efficiency of clearing the overdue loan.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (4)

1. A method for clearing an overdue loan, comprising:
after receiving a clearing request for an overdue loan to be cleared, acquiring a plurality of clearing clues corresponding to the overdue loan to be cleared; each clearing clue corresponds to one associated account of the overdue loan to be cleared, and each clearing clue comprises features of a plurality of categories, the features being determined according to the account information of the corresponding associated account and the customer information of the customer of the overdue loan to be cleared, and each category in a clearing clue corresponding to exactly one feature; the associated accounts of the overdue loan to be cleared comprise deposit accounts of the customer other than the agreed repayment account and/or deposit accounts of a guarantor of the customer;
analyzing each clearing clue by using a decision assistance model, and determining recommended clearing clues among the plurality of clearing clues according to the analysis result; the decision assistance model is established according to a plurality of historical samples, each historical sample comprises a clearing clue and a deduction flag of the associated account corresponding to that clearing clue, and the deduction flag indicates whether the deposit of the corresponding associated account was used to repay the principal and interest of the overdue loan corresponding to that clearing clue;
for each recommended clearing clue, after an approver determines whether the deposit of the associated account corresponding to the recommended clearing clue is used to repay the principal and interest of the overdue loan to be cleared, optimizing the decision assistance model by using the recommended clearing clue and the deduction flag of the corresponding associated account; the deduction flag of the associated account corresponding to the recommended clearing clue is set according to whether the deposit of that account is used to repay the principal and interest of the overdue loan to be cleared;
the process of establishing the decision assistance model comprises: calculating the importance of each category based on the plurality of historical samples by using a random forest algorithm; determining key categories among the plurality of categories according to the importance of each category of the historical samples;
for each historical sample, constructing a corresponding key sample from the features of the key categories of the historical sample and the deduction flag of the historical sample;
calculating an initial value from the deduction flags of the plurality of key samples as the current auxiliary model, and setting the iteration count to 1;
calculating the negative gradient of each key sample by using the current auxiliary model;
calculating an auxiliary model update value according to each key sample and its negative gradient;
updating the current auxiliary model based on the auxiliary model update value, and incrementing the iteration count by 1;
taking the updated auxiliary model as the current auxiliary model, and judging whether the current iteration count is greater than a preset threshold; if the current iteration count is less than or equal to the threshold, returning to the step of calculating the negative gradient of each key sample by using the current auxiliary model; if the current iteration count is greater than the threshold, determining the current auxiliary model as the decision assistance model;
the process of calculating the importance of each category based on the plurality of historical samples by using a random forest algorithm comprises: building M decision trees based on the plurality of historical samples, where M is a preset positive integer, each decision tree is built by using the C4.5 algorithm, and the M decision trees form a random forest; each decision tree uses the plurality of historical samples as training samples;
calculating a first out-of-bag data error of each decision tree in the random forest according to the plurality of historical samples;
for each category, applying random noise to all features of that category in the plurality of historical samples, and then calculating, according to the noise-perturbed historical samples, a second out-of-bag data error of each decision tree corresponding to that category;
for each category, calculating the importance of the category according to the first error and the second error of the category; wherein the first error comprises the first out-of-bag data error of each decision tree in the random forest, and the second error of the category comprises the second out-of-bag data error of each decision tree in the random forest corresponding to the category.
2. The method according to claim 1, wherein analyzing each clearing clue by using the decision assistance model and determining recommended clearing clues among the plurality of clearing clues according to the analysis result comprises:
for each clearing clue, analyzing the clearing clue by using the decision assistance model to obtain a corresponding analysis result;
judging whether the analysis result corresponding to each clearing clue satisfies a preset clearing condition;
and for each clearing clue, if the analysis result corresponding to the clearing clue satisfies the clearing condition, determining the clearing clue as a recommended clearing clue.
3. An apparatus for clearing an overdue loan, comprising:
an acquisition unit, configured to acquire a plurality of clearing clues corresponding to an overdue loan to be cleared after receiving a clearing request for the overdue loan to be cleared; each clearing clue corresponds to one associated account of the overdue loan to be cleared, and each clearing clue comprises features of a plurality of categories, the features being determined according to the account information of the corresponding associated account and the customer information of the customer of the overdue loan to be cleared, and each category in a clearing clue corresponding to exactly one feature; the associated accounts of the overdue loan to be cleared comprise deposit accounts of the customer other than the agreed repayment account and/or deposit accounts of a guarantor of the customer;
a construction unit, configured to establish a decision assistance model according to a plurality of historical samples; each historical sample comprises a clearing clue and a deduction flag of the associated account corresponding to that clearing clue, where the deduction flag indicates whether the deposit of the corresponding associated account was used to repay the principal and interest of the overdue loan corresponding to that clearing clue; the construction unit comprises:
a first calculation unit, configured to calculate the importance of each category based on the plurality of historical samples by using a random forest algorithm;
a second determining unit, configured to determine key categories among the plurality of categories according to the importance of each category of the historical samples;
a sample construction unit, configured to construct, for each historical sample, a corresponding key sample from the features of the key categories of the historical sample and the deduction flag of the historical sample;
a second calculation unit, configured to calculate an initial value from the deduction flags of the plurality of key samples as the current auxiliary model and to set the iteration count to 1;
a third calculation unit, configured to calculate the negative gradient of each key sample by using the current auxiliary model;
a fourth calculation unit, configured to calculate an auxiliary model update value according to each key sample and its negative gradient;
an updating unit, configured to update the current auxiliary model based on the auxiliary model update value and to increment the iteration count by 1;
a judging unit, configured to take the updated auxiliary model as the current auxiliary model and judge whether the current iteration count is greater than a preset threshold; if the current iteration count is less than or equal to the threshold, trigger the third calculation unit to calculate the negative gradient of each key sample by using the current auxiliary model; if the iteration count is greater than the threshold, determine the current auxiliary model as the decision assistance model;
the first calculation unit comprises: a decision tree establishing unit, configured to establish M decision trees based on the plurality of historical samples, where M is a preset positive integer, each decision tree is established by using the C4.5 algorithm, and the M decision trees form a random forest; each decision tree uses the plurality of historical samples as training samples;
a first error calculation unit, configured to calculate a first out-of-bag data error of each decision tree in the random forest according to the plurality of historical samples;
a second error calculation unit, configured to, for each category, apply random noise to all features of that category in the plurality of historical samples, and then calculate, according to the noise-perturbed historical samples, a second out-of-bag data error of each decision tree corresponding to that category;
an importance calculation unit, configured to calculate, for each category, the importance of the category from the first error and the second error of the category; wherein the first error comprises the first out-of-bag data error of each decision tree in the random forest, and the second error of the category comprises the second out-of-bag data error of each decision tree in the random forest corresponding to the category;
a recommending unit, configured to analyze each clearing clue by using the decision assistance model and determine recommended clearing clues among the plurality of clearing clues according to the analysis result; the recommended clearing clues serve as a basis for formulating a clearing strategy for the overdue loan to be cleared;
an optimization unit, configured to, for each recommended clearing clue, after an approver determines whether the deposit of the associated account corresponding to the recommended clearing clue is used to repay the principal and interest of the overdue loan to be cleared, optimize the decision assistance model by using the recommended clearing clue and the deduction flag of the corresponding associated account; the deduction flag of the associated account corresponding to the recommended clearing clue is set according to whether the deposit of that account is used to repay the principal and interest of the overdue loan to be cleared.
4. The apparatus according to claim 3, wherein the recommending unit comprises:
an analysis unit, configured to analyze each clearing clue by using the decision assistance model to obtain a corresponding analysis result;
a judging unit, configured to judge whether the analysis result corresponding to each clearing clue satisfies a preset clearing condition;
a first determining unit, configured to determine, for each clearing clue, that the clearing clue is a recommended clearing clue if its analysis result satisfies the clearing condition.
CN201910604647.8A 2019-07-05 2019-07-05 Method and device for clearing overdue loan Active CN110310200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604647.8A CN110310200B (en) 2019-07-05 2019-07-05 Method and device for clearing overdue loan

Publications (2)

Publication Number Publication Date
CN110310200A CN110310200A (en) 2019-10-08
CN110310200B true CN110310200B (en) 2022-06-03

Family

ID=68078417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604647.8A Active CN110310200B (en) 2019-07-05 2019-07-05 Method and device for clearing overdue loan

Country Status (1)

Country Link
CN (1) CN110310200B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11809501B2 (en) * 2014-08-28 2023-11-07 Ebay Inc. Systems, apparatuses, and methods for providing a ranking based recommendation
CN106778836A (en) * 2016-11-29 2017-05-31 天津大学 A kind of random forest proposed algorithm based on constraints
CN108460590B (en) * 2018-02-06 2021-02-02 北京三快在线科技有限公司 Information recommendation method and device and electronic equipment

Also Published As

Publication number Publication date
CN110310200A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
CN111291816B (en) Method and device for carrying out feature processing aiming at user classification model
CN108898479B (en) Credit evaluation model construction method and device
WO2017133456A1 (en) Method and device for determining risk evaluation parameter
CN110930038A (en) Loan demand identification method, loan demand identification device, loan demand identification terminal and loan demand identification storage medium
US8984022B1 (en) Automating growth and evaluation of segmentation trees
CN106408325A (en) User consumption behavior prediction analysis method based on user payment information and system
US20210271666A1 (en) Analyzing a processing engine of a transaction-processing system
CN111709826A (en) Target information determination method and device
CN111798304A (en) Risk loan determination method and device, electronic equipment and storage medium
CN110310200B (en) Method and device for clearing overdue loan
CN115358852A (en) Bond data processing method and device
CN113537960A (en) Method, device and equipment for determining abnormal resource transfer link
CN114626940A (en) Data analysis method and device and electronic equipment
US11276117B2 (en) Generating payables and receivables netting proposals based on historical information
CN113034264A (en) Method and device for establishing customer loss early warning model, terminal equipment and medium
CN112823502A (en) Real-time feedback service configured for resource access rules
CN115082079B (en) Method and device for identifying associated user, computer equipment and storage medium
CN114611972B (en) Merchant credit rating system and method based on artificial intelligence
CN113807956B (en) Data processing method, medium, equipment and system for joint loan
KR102308098B1 (en) An apparatus and method for providing user interfaces of managing transaction information based on automatic matching between accounts receivables and deposit information
CN115689732A (en) Risk assessment method for anti-money laundering analysis
KR20200034857A (en) An apparatus and method for managing transaction information providing automatic matching between accounts receivables and deposit information
Vidia The Influence of Gender, Audit Experience, Code of Ethics, and Islamic Religiosity on Audit Judgment
KR20200140764A (en) Apparatus and method for building a pattern database for accounting
CN117333303A (en) Method and system for establishing foreign exchange business risk grade assessment and assessment engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant