CN113642923A

CN113642923A - Bad asset pack value evaluation method based on historical collection urging data

Info

Publication number: CN113642923A
Application number: CN202111004797.9A
Authority: CN
Inventors: 庄涤坤; 刘建新; 赵雪; 黄平
Original assignee: Jianyuan Heguang Beijing Technology Co ltd
Current assignee: Jianyuan Heguang Beijing Technology Co ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2021-11-12

Abstract

The invention discloses a bad asset pack value evaluation method based on historical collection urging data, which comprises the following steps: step 1, constructing a bad asset collection prediction model; step 2, ranking the variables by importance; step 3, taking the union of the most important variables of each model and selecting from it a final most important variable set; step 4, using the variables in the most important variable set to obtain newly constructed derived variables and their construction rules; step 5, creating a plurality of virtual asset packs by sampling, and obtaining the derived variables of each virtual asset pack according to step 4; step 6, combining the derived variables from step 5 with the total amount recovered through collection for each virtual asset pack to obtain a bad asset pack value evaluation model; and step 7, creating the derived variables of the bad asset pack to be valued, substituting them into the bad asset pack value evaluation model of step 6, and predicting the value of the asset pack. The invention abstracts asset pack features that are crucial to model construction, and thereby ensures the accuracy of the bad asset multi-factor regression model.

Description

Bad asset pack value evaluation method based on historical collection urging data

Technical Field

The invention relates to the technical field of asset assessment, and in particular to a bad asset pack value evaluation method based on historical collection urging data.

Background

Financial bad assets are the various equity, debt and physical assets held by licensed financial institutions, such as commercial banks, that cannot bring those institutions normal economic benefits. The main ways of disposing of financial bad assets include recovery through litigation, debt restructuring, transfer of creditor's rights, debt transfer, and asset securitization. All of these disposal modes depend on a reasonable evaluation and pricing of the bad assets, and the evaluation has become an important reference for transactions between buyers and sellers in the bad asset market.

The guiding opinions on the assessment of financial bad assets issued by the China Appraisal Society in 2017 make no specific requirements regarding the value type, the assessment method, or the specific assessment process. Only Article 17, which states that the asset assessment professional should clarify the basic circumstances of the assessment engagement and properly select the value type and the assessment method according to factors such as the assessment purpose, the assessment object, the asset disposal mode, and the available assessment data, sets out framework requirements, and it offers little practical guidance. At present the market has no mature method for valuing bad asset transfers, and assessment organizations cannot produce a mature valuation report in the short term. As a result, bad asset transfer prices on the market are highly arbitrary and uncertain.

A bad asset pack typically contains many bad asset cases, and the circumstances and nature of each case differ greatly. During the valuation of a bad asset pack, information is asymmetric between buyer and seller and reasonably complete financial information about the debtors is lacking, while the future income, that is, the realizable present value of the debt, depends on the actual financial condition and willingness to repay of each debtor.

The current bad asset pack estimation method mainly comprises the following steps:

1) Static discounted cash flow model: the key to this method is determining the interest rate and the cash flows. In practice, the greatest difficulties are determining the future cash flows and predicting the future trend of interest rates. Because detailed knowledge of the debtor in each case and quantification of the asset attributes are not possible during bad asset transactions, the quality and cash flow of individual cases are very difficult to judge and define. The method therefore has little practical significance for evaluation during the transaction process;

2) Monte Carlo simulation: a computational method based on probability theory and statistics. The basic principle is as follows: taking the initial price of the asset as the starting point and allowing for prepayment and default, various cash flow paths are simulated; the cash flow under each path is obtained and discounted, and the discounted values over all paths are averaged with weights to obtain the theoretical price of the asset. This approach is likewise limited by whether cash flow information can be obtained during bad asset transactions;

3) Multi-factor regression model: a regression model is built on sample data of bad asset packs. The factors that influence the final value of a bad asset pack are identified by summarizing historical bad asset packs, and a statistical model is then used to perform regression analysis on these factors. Multi-factor regression analysis is a statistical method and is relatively well suited to pricing bad assets, but it needs a large number of bad asset disposal cases, that is, bad asset packs, as its basis. In addition, the accuracy of the final estimate depends to a great extent on the variables selected when the regression equation is built: if the initially selected factors affecting the recovery rate of the bad assets are wrong, the final result may be far from the actual situation.

Among the existing methods, methods 1 and 2 are both based on calculating or simulating the cash flow of individual cases. That cash flow is affected by many objective factors (the various attributes of the bad asset cases), by subjective factors (the debtors' actual willingness to repay), and by other factors that are difficult to capture in case data, such as the debtor's current actual financial condition, job stability, family burden and health. These methods are therefore poorly suited to value evaluation during bad asset pack transactions. It is therefore desirable to provide a bad asset pack value evaluation method based on historical collection urging data.

Disclosure of Invention

An object of the present invention is to solve at least the above problems and to provide at least the advantages described later.

Another object of the invention is to provide a bad asset pack value evaluation method based on historical collection urging data. The method adopts a multi-factor regression model approach and creates a plurality of virtual asset packs to supply the large number of data samples required for machine learning model training; at the same time it abstracts asset pack features that are crucial to model construction and that describe the pack comprehensively and as a whole, thereby ensuring the accuracy of the bad asset multi-factor regression model.

To achieve these objects and other advantages in accordance with the purpose of the invention, there is provided a bad asset pack value evaluation method based on historical collection urging data, comprising:

Step 1, constructing a plurality of different machine learning models from existing bad asset cases and their collection results, and training them as classification models to obtain bad asset collection prediction models.

Step 2, ranking the variables selected for constructing each bad asset collection prediction model by their influence on that model, to obtain the subset of most important variables affecting the recovery of bad asset cases.

Step 3, taking the union of the several most important variables in each most important variable subset, and selecting a number of the most important variables from the union as the final most important variable set.

Step 4, deriving overall feature variables of the bad asset pack from the variables in the most important variable set, to obtain the newly constructed derived variables and the construction rules for the derived variables needed for the asset pack to be valued.

Step 5, sampling the existing bad asset case data to create a plurality of virtual asset packs, and obtaining the derived variables of each virtual asset pack according to the variables in the most important variable set and the construction rules of step 4.

Step 6, combining the derived variables from step 5 with the total amount recovered through collection for each virtual asset pack to obtain a bad asset pack value evaluation model.

Step 7, summarizing the bad asset pack to be valued, creating its derived variables according to the construction rules, and substituting them into the bad asset pack value evaluation model of step 6; several machine learning algorithms are used, the results predicted by the different algorithms are combined by a voting regressor, and the average predicted value is returned as the predicted value of the bad asset pack.
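The averaging of predictions in step 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three base models are hypothetical stand-in functions with made-up coefficients (a real system would use regressors trained on the virtual asset packs), the derived-variable names are invented for the example, and equal weighting of the models is assumed.

```python
# Hypothetical stand-ins for three trained regression models. Each maps the
# derived variables of one asset pack to a predicted recovered amount.
def model_1(features):
    return 0.10 * features["total_principal"]

def model_2(features):
    return 0.08 * features["total_principal"] + 5.0 * features["case_count"]

def model_3(features):
    return 0.12 * features["total_remaining_principal"]

def voting_predict(models, features):
    """Return the equal-weight average of the base models' predictions."""
    predictions = [model(features) for model in models]
    return sum(predictions) / len(predictions)

# Illustrative derived variables of one asset pack to be valued.
pack_features = {
    "case_count": 1000,
    "total_principal": 2_000_000.0,
    "total_remaining_principal": 1_500_000.0,
}

estimate = voting_predict([model_1, model_2, model_3], pack_features)
print(f"estimated pack value: {estimate:.2f}")
```

Averaging several differently built models tends to reduce the effect of any single model's error, which appears to be the motivation for combining them here.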

Preferably, the target variable of the model in step 1 is the collection result: if the collection result of a bad asset case is "recovered", the target variable takes the value 1; if the collection result is "not recovered", the target variable takes the value 0;

wherein the feature variables of the model are all available variables in the bad asset case data and its collection result data.
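The 0/1 encoding of the target variable can be sketched as follows. The label strings and the field name `collection_result` are assumptions made for illustration; the source only specifies that a case recovered through collection maps to 1 and any other case maps to 0.

```python
RECOVERED = "recovered"          # assumed label: case was recovered through collection
NOT_RECOVERED = "not recovered"  # assumed label: case was not recovered

def encode_target(collection_result):
    """Map a collection result label to the binary target variable."""
    if collection_result == RECOVERED:
        return 1
    if collection_result == NOT_RECOVERED:
        return 0
    raise ValueError(f"unknown collection result: {collection_result!r}")

def to_training_row(case, target_field="collection_result"):
    """Split one case record into (feature variables, binary target)."""
    features = {k: v for k, v in case.items() if k != target_field}
    return features, encode_target(case[target_field])

# Hypothetical case records.
cases = [
    {"principal": 12000.0, "account_age": "M12-M24", "collection_result": "recovered"},
    {"principal": 5000.0, "account_age": "M24+", "collection_result": "not recovered"},
]
rows = [to_training_row(case) for case in cases]
```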

Preferably, the algorithms used for training the classification models in step 1 include: logistic regression, random forest, and XGBoost.

Preferably, the concrete method for obtaining the union in step 3 is as follows:

The top m most important features of each of the three algorithms are taken. Let the top m most important features of the three models be:

A: the top m most important variables of model 1 = {a_1, a_2, ..., a_m};

B: the top m most important variables of model 2 = {b_1, b_2, ..., b_m};

C: the top m most important variables of model 3 = {c_1, c_2, ..., c_m};

the union is A ∪ B ∪ C = {x : x ∈ A or x ∈ B or x ∈ C};

wherein the models of the three algorithms are model 1, model 2 and model 3 respectively;

m is the number of most important variables selected by each algorithm;

A = {a_1, a_2, ..., a_m}, where a_i represents the i-th most important variable selected by model 1, and i ranges from 1 to m;

B = {b_1, b_2, ..., b_m}, where b_i represents the i-th most important variable selected by model 2, and i ranges from 1 to m;

C = {c_1, c_2, ..., c_m}, where c_i represents the i-th most important variable selected by model 3, and i ranges from 1 to m.
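The union of the top m variables can be sketched in a few lines. The importance scores and variable names below are invented for illustration and do not come from the source; m = 2 is an arbitrary choice.

```python
def top_m(importances, m):
    """Return the set of the m variable names with the highest importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:m])

# Hypothetical importance scores from three trained models.
model_1 = {"remaining_principal": 0.40, "account_age": 0.30, "age": 0.15,
           "interest": 0.10, "gender": 0.05}
model_2 = {"remaining_principal": 0.30, "account_age": 0.20, "age": 0.10,
           "interest": 0.35, "gender": 0.05}
model_3 = {"remaining_principal": 0.15, "account_age": 0.45, "age": 0.25,
           "interest": 0.10, "gender": 0.05}

m = 2
A = top_m(model_1, m)
B = top_m(model_2, m)
C = top_m(model_3, m)

union = A | B | C  # every x with x in A or x in B or x in C
```

Because the three sets usually overlap, the union holds between m and 3m variables.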

Preferably, the most important variables are selected from the union in step 3, wherein the importance of a given variable is defined as: Importance = importance of the variable in model 1 + importance of the variable in model 2 + importance of the variable in model 3.
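Ranking the candidates by summed importance can be sketched as follows. The scores and names are invented, and n = 3 is arbitrary. The source does not say whether the scores of the different models are normalized to a common scale before being added; the example assumes they are comparable as given.

```python
def combined_importance(variable, importances_by_model):
    """Sum the importance of one variable over all models."""
    return sum(scores.get(variable, 0.0) for scores in importances_by_model)

def select_top_n(candidates, importances_by_model, n):
    """Return the n candidates with the highest summed importance."""
    ranked = sorted(
        candidates,
        key=lambda v: combined_importance(v, importances_by_model),
        reverse=True,
    )
    return ranked[:n]

# Hypothetical importance scores from three trained models.
importances_by_model = [
    {"remaining_principal": 0.40, "account_age": 0.30, "age": 0.15,
     "interest": 0.10, "gender": 0.05},
    {"remaining_principal": 0.30, "account_age": 0.20, "age": 0.10,
     "interest": 0.35, "gender": 0.05},
    {"remaining_principal": 0.15, "account_age": 0.45, "age": 0.25,
     "interest": 0.10, "gender": 0.05},
]

# Candidate variables: the union of each model's top variables.
candidates = {"remaining_principal", "account_age", "interest", "age"}

final_variables = select_top_n(candidates, importances_by_model, n=3)
```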

Preferably, step 4 includes constructing new derived variables from numerical feature variables and constructing new derived variables from categorical feature variables.

Preferably, the sampling in step 5 is specifically: random sampling or conditional random sampling is performed, with the case as the sampling unit, on the collection result data of the historical bad asset cases and the data on case-related factors, and the number of cases drawn each time lies in the range [a, b];

wherein 0 < a < b < total number of bad asset cases; the number of cases in each virtual asset pack should be not less than 1000.
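Creating virtual asset packs by random or conditional random sampling can be sketched as follows. The function name, record fields, and the optional `condition` predicate are illustrative assumptions. The sketch also assumes that both bounds a and b are inclusive and that a case appears at most once within a pack, neither of which the source states explicitly.

```python
import random

def create_virtual_packs(cases, num_packs, a, b, condition=None, seed=0):
    """Draw num_packs virtual asset packs, each holding between a and b cases."""
    if not 0 < a < b < len(cases):
        raise ValueError("require 0 < a < b < total number of cases")
    # Conditional sampling: restrict the pool to cases meeting the condition.
    pool = [c for c in cases if condition is None or condition(c)]
    if b > len(pool):
        raise ValueError("too few cases satisfy the condition")
    rng = random.Random(seed)
    packs = []
    for _ in range(num_packs):
        size = rng.randint(a, b)              # pack size, inclusive bounds
        packs.append(rng.sample(pool, size))  # no repeated case within a pack
    return packs

# Hypothetical historical cases.
cases = [
    {"case_id": i, "account_age": "M24+" if i % 2 else "M12-M24"}
    for i in range(5000)
]

packs = create_virtual_packs(cases, num_packs=3, a=1000, b=1200, seed=42)
old_packs = create_virtual_packs(
    cases, num_packs=2, a=1000, b=1200,
    condition=lambda c: c["account_age"] == "M24+", seed=42,
)
```

Different packs may share cases, since each pack is drawn independently from the same historical pool.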

Preferably, the plurality of machine learning algorithms in step 7 includes: linear regression, random forest, and gradient boosting.

The invention at least comprises the following beneficial effects:

The method takes the historical collection and repayment data of bad assets as its basis: a collection prediction model is trained by machine learning, the model variables are ranked by importance and several of them are selected, the historical data are sampled to form a plurality of virtual bad asset packs, derived feature variables of each virtual asset pack are produced from the selected important variables, and a machine learning model is fitted to the virtual asset packs to generate the final bad asset pack valuation model.

First, a bad asset collection prediction model is constructed. Based on a large amount of existing case data and the collection result of each case, this model predicts the probability that each case will be recovered. After the model is trained, the variables that affect the model are ranked by importance, which yields the subset of most important variables affecting case recovery. Because several machine learning algorithms are used when building the collection prediction model, several such variable subsets are obtained. Finally, the union of the top m variables of each subset is taken, and the top n variables of the union are selected as the final most important variable set. From the variables in this set, the feature variables of the bad asset pack are derived by rule, so as to represent the relevant characteristics of a real asset pack. In this way, asset pack features that are crucial to model construction and that describe the pack comprehensively and as a whole are accurately abstracted from the individual, independent bad asset cases. This solves the problem that the accuracy of the selected variables cannot be determined because each asset pack contains many different debts or cases, each case has many original variables (such as account age, principal, interest, and the debtor's age, gender and occupation), and single cases differ greatly from one another; the accuracy of the bad asset multi-factor regression model is thereby further ensured.

At the same time, based on the historical case data of bad assets and the corresponding collection result data, random sampling or conditional random sampling is performed with the case as the sampling unit, so that a plurality of virtual asset packs are created to supply the large number of data samples required for machine learning model training. This solves the technical problems that very little bad asset pack valuation data is currently available on the market, that a large amount of bad asset pack valuation sample data is difficult to obtain, and that a machine learning algorithm therefore can hardly build a valuation model, since it needs a large amount of sample data.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.

Drawings

FIG. 1 is a flow chart of a bad asset pack value evaluation method based on historical collection urging data according to the present invention;

FIG. 2 is a graph of the importance ranking of variables according to the present invention.

Detailed Description

The present invention is further described in detail below with reference to the attached drawings so that those skilled in the art can implement the invention by referring to the description text.

It will be understood that terms such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.

As shown in fig. 1, the present invention provides a bad asset pack value evaluation method based on historical collection data, which includes:

In this scheme, the historical collection and repayment data of bad assets are taken as the basis: a collection prediction model is trained by machine learning, the model variables are ranked by importance and several of them are selected, the historical data are sampled to form a plurality of virtual bad asset packs, derived feature variables of each virtual asset pack are produced from the selected important variables, and a machine learning model is fitted to the virtual asset packs to generate the final bad asset pack valuation model. The importance ranking of the variables is shown in FIG. 2.

The virtual asset pack data required for model training are created, the feature variables of each virtual asset pack required for model training are created, and the total amount recovered through collection over all cases in a virtual asset pack is taken as the final value of that asset pack. Based on these three elements, the bad asset pack value evaluation model is constructed.
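Assembling the training data from these three elements can be sketched as follows. The field names `principal` and `recovered_amount` and the deliberately small `derive_features` function are assumptions made for illustration; a real feature builder would apply the full construction rules of step 4.

```python
def total_recovered(pack):
    """Target value: total amount recovered over all cases in the pack."""
    return sum(case["recovered_amount"] for case in pack)

def derive_features(pack):
    """Tiny stand-in for the derived-variable construction rules."""
    principals = [case["principal"] for case in pack]
    return {"case_count": len(pack), "principal_sum": sum(principals)}

def build_training_set(packs, feature_builder):
    """Return one feature row and one target value per virtual asset pack."""
    feature_rows = [feature_builder(pack) for pack in packs]
    targets = [total_recovered(pack) for pack in packs]
    return feature_rows, targets

# Two hypothetical, unrealistically small virtual asset packs.
packs = [
    [{"principal": 1000.0, "recovered_amount": 100.0},
     {"principal": 2000.0, "recovered_amount": 0.0}],
    [{"principal": 500.0, "recovered_amount": 500.0}],
]

X, y = build_training_set(packs, derive_features)
```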

When the bad asset pack value evaluation model is used to predict a new bad asset pack, all cases in the asset pack are first summarized according to the method of step 4 and the derived variables are created according to the rules; the value of the asset pack is then predicted using the model constructed in step 7.

In a preferred scheme, the target variable of the model in step 1 is the collection result: if the collection result of a bad asset case is "recovered", the target variable takes the value 1; if the collection result is "not recovered", the target variable takes the value 0.

In the above scheme, the feature variables of the model are all available variables in the data, such as: the debtor's location, gender, age, occupation, education level and income status, and the principal of the bad asset, principal repaid, remaining principal, interest, penalties, date of last repayment, etc.

In a preferred embodiment, the algorithms used for training the classification models in step 1 include: logistic regression, random forest, and XGBoost.

In a preferred embodiment, the concrete method for obtaining the union in step 3 is as follows:

A: the top m most important variables of model 1 = {a_1, a_2, ..., a_m};

B: the top m most important variables of model 2 = {b_1, b_2, ..., b_m};

C: the top m most important variables of model 3 = {c_1, c_2, ..., c_m};

the union is A ∪ B ∪ C = {x : x ∈ A or x ∈ B or x ∈ C};

m is the number of most important variables selected by each algorithm;

C = {c_1, c_2, ..., c_m}, where c_i represents the i-th most important variable selected by model 3, and i ranges from 1 to m;

the 2nd x represents all members of A, the 3rd x represents all members of B, and the 4th x represents all members of C.

In a preferred embodiment, the most important variables are selected from the union in step 3, wherein the importance of a given variable is defined as: Importance = importance of the variable in model 1 + importance of the variable in model 2 + importance of the variable in model 3.

In a preferred embodiment, step 4 includes constructing new derived variables from numerical feature variables and constructing new derived variables from categorical feature variables.

In the above scheme, for numerical feature variables (e.g., principal, remaining principal), the newly constructed derived variables include: 1. sum; 2. mean; 3. median; 4. maximum; 5. variance;

for categorical feature variables (e.g., age group, gender, account age), the newly constructed derived variables include the following (taking account age as the categorical variable, with its 1st category being "M12-M24" and its 2nd category being "M24+"):

1. Number of cases in the 1st category of the categorical feature

2. Total principal of the cases in the 1st category of the categorical feature

3. Total amount repaid of the cases in the 1st category of the categorical feature

4. Total remaining principal of the cases in the 1st category of the categorical feature

5. Total interest of the cases in the 1st category of the categorical feature

6. ...

7. Number of cases in the 2nd category of the categorical feature

8. Total principal of the cases in the 2nd category of the categorical feature

9. Total amount repaid of the cases in the 2nd category of the categorical feature

10. Total remaining principal of the cases in the 2nd category of the categorical feature

11. Total interest of the cases in the 2nd category of the categorical feature

12. ...

13. And so on for the remaining categories
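The construction rules listed above can be sketched as follows. The field names and the naming pattern of the derived variables are assumptions for illustration, and the use of population variance is also an assumption, since the source does not say which variance is meant.

```python
from collections import defaultdict
from statistics import mean, median, pvariance

def numeric_derived(cases, field):
    """Sum, mean, median, maximum and variance of one numerical variable."""
    values = [case[field] for case in cases]
    return {
        f"{field}_sum": sum(values),
        f"{field}_mean": mean(values),
        f"{field}_median": median(values),
        f"{field}_max": max(values),
        f"{field}_variance": pvariance(values),  # population variance (assumed)
    }

def categorical_derived(cases, field, amount_fields):
    """Per-category case count and per-category totals of the amount fields."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[field]].append(case)
    derived = {}
    for category, members in groups.items():
        prefix = f"{field}={category}"
        derived[f"{prefix}_count"] = len(members)
        for amount in amount_fields:
            derived[f"{prefix}_total_{amount}"] = sum(m[amount] for m in members)
    return derived

# A hypothetical, unrealistically small asset pack.
pack = [
    {"account_age": "M12-M24", "principal": 1000.0, "repaid": 200.0,
     "remaining_principal": 800.0, "interest": 50.0},
    {"account_age": "M12-M24", "principal": 3000.0, "repaid": 0.0,
     "remaining_principal": 3000.0, "interest": 150.0},
    {"account_age": "M24+", "principal": 2000.0, "repaid": 500.0,
     "remaining_principal": 1500.0, "interest": 300.0},
]

features = numeric_derived(pack, "principal")
features.update(categorical_derived(
    pack, "account_age",
    ["principal", "repaid", "remaining_principal", "interest"],
))
```

For model training, every pack needs the same set of derived variables, so in practice the list of categories would be fixed in advance and categories absent from a pack would be filled with zeros; the sketch omits that step.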

In a preferred embodiment, the sampling in step 5 is specifically: random sampling or conditional random sampling is performed, with the case as the sampling unit, on the collection result data of the historical bad asset cases and the data on case-related factors, and the number of cases drawn each time lies in the range [a, b].

In a preferred embodiment, the plurality of machine learning algorithms in step 7 includes: linear regression, random forest, and gradient boosting.

While embodiments of the invention have been described above, it is not limited to the applications set forth in the description and the embodiments, which are fully applicable in various fields of endeavor to which the invention pertains, and further modifications may readily be made by those skilled in the art, it being understood that the invention is not limited to the details shown and described herein without departing from the general concept defined by the appended claims and their equivalents.

Claims

1. A bad asset pack value evaluation method based on historical collection urging data comprises the following steps:

step 1, constructing a plurality of different machine learning models from existing bad asset cases and their collection results, and training them as classification models to obtain bad asset collection prediction models;

step 2, ranking the variables selected for constructing each bad asset collection prediction model by their influence on that model, to obtain the subset of most important variables affecting the recovery of bad asset cases;

step 3, taking the union of the several most important variables in each most important variable subset, and selecting a number of the most important variables from the union as the final most important variable set;

step 4, deriving overall feature variables of the bad asset pack from the variables in the most important variable set, to obtain the newly constructed derived variables and the construction rules for the derived variables needed for the asset pack to be valued;

step 5, sampling the existing bad asset case data to create a plurality of virtual asset packs, and obtaining the derived variables of each virtual asset pack according to the variables in the most important variable set and the construction rules of step 4;

step 6, combining the derived variables from step 5 with the total amount recovered through collection for each virtual asset pack to obtain a bad asset pack value evaluation model;

2. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the target variable of the model in step 1 is the collection result: if the collection result of a bad asset case is "recovered", the target variable takes the value 1; if the collection result is "not recovered", the target variable takes the value 0;

3. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the algorithms used for training the classification models in step 1 include: logistic regression, random forest, and XGBoost.

4. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the specific method for taking the union in step 3 is as follows:

A: the top m most important variables of model 1 = {a_1, a_2, ..., a_m};

B: the top m most important variables of model 2 = {b_1, b_2, ..., b_m};

C: the top m most important variables of model 3 = {c_1, c_2, ..., c_m};

the union is A ∪ B ∪ C = {x : x ∈ A or x ∈ B or x ∈ C};

wherein the models of the three algorithms are named model 1, model 2 and model 3 respectively;

m is the number of most important variables selected by each algorithm;

B = {b_1, b_2, ..., b_m}, where b_i represents the i-th most important variable selected by model 2, and i ranges from 1 to m;

5. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the most important variables are selected from the union in step 3, wherein the importance of a given variable is defined as: Importance = importance of the variable in model 1 + importance of the variable in model 2 + importance of the variable in model 3.

6. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein step 4 includes constructing new derived variables from numerical feature variables and constructing new derived variables from categorical feature variables.

7. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the sampling in step 5 is specifically as follows: random sampling or conditional random sampling is performed, with the case as the sampling unit, on the collection result data of the historical bad asset cases and the data on case-related factors, and the number of cases drawn each time lies in the range [a, b];

8. The bad asset pack value evaluation method based on historical collection urging data as claimed in claim 1, wherein the plurality of machine learning algorithms in step 7 comprises: linear regression, random forest, and gradient boosting.