CN112581254A - Method and device for measuring financial risk of small and micro enterprises

Info

Publication number: CN112581254A
Application number: CN202011475707.XA
Authority: CN
Inventors: 何泾沙; 夏新宇; 朱娜斐; 张宇晗; 宜裕紫; 陈宝存; 薛瑞昕
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2021-03-30

Abstract

The invention discloses a method, a device, electronic equipment and a storage medium for measuring the financial risk of small and micro enterprises, belonging to the technical field of financial industry risk. The method comprises: screening small and micro enterprises by taking their business registration data, credit investigation data and Tongdun data, and then their in-bank settlement data, blacklist data and historical data at the bank, as rejection strategies; performing data preprocessing on the invoice data of the enterprises that pass the screening to obtain a data feature set; calculating the data feature set with a logistic regression (LR) model and an XGBoost model respectively, and comparing the average of the two resulting probability mapping scores with a threshold, the customer being judged good if the average is greater than the threshold and bad otherwise; and classifying the bad customers into bad customers with different risk labels using an LR softmax multi-classification model. The method thus uses two-stage classification prediction: the first stage divides small and micro enterprises into good customers and bad customers, and the second stage divides the bad customers into different risk categories, giving banks and other financial institutions a more accurate prediction method.

Description

Method and device for measuring financial risk of small and micro enterprises

Technical Field

The invention belongs to the technical field of financial industry risks, and particularly relates to a method and a device for measuring financial risks of small and micro enterprises, electronic equipment and a storage medium.

Background

Traditional financial risk measurement methods focus on accounting data such as an enterprise's operating funds, total assets and total liabilities. Such accounting data is usually confidential and private to the company, so it is highly restricted and hard to obtain when risk is evaluated, which makes the risk difficult to assess; moreover, these methods are mostly aimed at large companies. In addition, existing financial risk measurement methods cannot divide risk into types.

Disclosure of Invention

In view of the above problems, the present invention provides a method, a device, an electronic device and a storage medium for measuring the financial risk of small and micro enterprises.

A method for measuring the financial risk of small and micro enterprises comprises the following steps:

screening the small and micro enterprises by taking their business registration data, credit investigation data and Tongdun data as rejection strategies;

screening the remaining small and micro enterprises again by taking their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies;

performing data preprocessing on the invoice data of the small and micro enterprises that pass the second screening, and selecting features to obtain a data feature set;

calculating the data feature set with an LR model and an XGBoost model respectively, to obtain an output probability mapping score of the LR model and an output probability mapping score of the XGBoost model;

comparing the average of the two output probability mapping scores with a threshold, and judging the small and micro enterprise to be a good customer if the average is greater than the threshold, or a bad customer if the average is smaller than the threshold;

and classifying the bad customers with an LR softmax multi-classification model into bad customers with different risk labels.

Preferably, the steps of screening the small and micro enterprises with their business registration data, credit investigation data and Tongdun data as rejection strategies, and of screening the remaining small and micro enterprises again with their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies, comprise:

setting a threshold, and continuing if the business registration data, credit investigation data and Tongdun data of the small and micro enterprise are all greater than the threshold; otherwise, ending;

setting a threshold, and continuing if the in-bank settlement data, blacklist data and historical data of the small and micro enterprise at the bank are all greater than the threshold; otherwise, ending.
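The two rejection rounds can be sketched as follows. This is only an illustration under stated assumptions: each data source is assumed to have already been reduced to a single numeric score, and the source names and threshold values are hypothetical, since the text says only that a threshold is set.

```python
from typing import Mapping

# Hypothetical source names and thresholds; the text only says "setting a threshold".
EXTERNAL_THRESHOLDS = {"business": 0.5, "credit": 0.5, "third_party": 0.5}
INTERNAL_THRESHOLDS = {"settlement": 0.5, "blacklist": 0.5, "history": 0.5}


def passes(scores: Mapping[str, float], thresholds: Mapping[str, float]) -> bool:
    """Return True only if every listed score is strictly above its threshold."""
    return all(scores[name] > limit for name, limit in thresholds.items())


def screen(scores: Mapping[str, float]) -> bool:
    """Apply the external rejection round, then the in-bank rejection round."""
    if not passes(scores, EXTERNAL_THRESHOLDS):
        return False  # rejected in the first round; processing ends here
    return passes(scores, INTERNAL_THRESHOLDS)
```

An enterprise reaches the invoice-data stage only when `screen` returns `True`; a single score at or below its threshold ends processing.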

Preferably, the step of performing data preprocessing on the invoice data of the small and micro enterprises that pass the second screening, and selecting features to obtain a data feature set, comprises:

selecting a different missing-value handling method for each variable feature in the invoice data;

for fixed data in the invoice data, using a box plot and the MAD, chosen according to the normality of the data distribution, to identify values that fall outside a specific distribution region or range, and replacing those values with the mean;

using forward selection and backward elimination to select the data with the best attributes from the invoice data with multiple attributes;

based on the correlation between each univariate feature in the invoice data and the predicted variable, deleting univariate features with low predictive power by combining the Pearson correlation coefficient, the chi-square test and a tree model;

and combining feature data in the invoice data that belong to the same category and are similar to one another, according to their respective weights, to obtain new feature data.
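As an illustration of the outlier step above, the sketch below flags values with either box-plot fences or MAD bounds and replaces them with a mean. Several points are assumptions rather than statements from the text: MAD is read as the median absolute deviation, the multipliers 1.5 and 3.0 are conventional defaults, and the replacement mean is taken over the values that are not flagged.

```python
import statistics


def boxplot_bounds(values, k=1.5):
    """Box-plot fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mad_bounds(values, k=3.0):
    """Median +/- k*MAD, reading MAD as the median absolute deviation (assumption)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med - k * mad, med + k * mad


def replace_outliers(values, bounds):
    """Replace values outside [low, high] with the mean of the values inside."""
    low, high = bounds
    inside = [v for v in values if low <= v <= high]
    mean_inside = statistics.fmean(inside)
    return [v if low <= v <= high else mean_inside for v in values]
```

Which detector to use would follow from the normality check mentioned in the text; the sketch leaves that choice to the caller by passing the bounds in explicitly.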

Preferably, the step of calculating the data feature set using the LR model comprises:

dividing the small and micro enterprises into positive samples and negative samples, and letting $E_\theta(X) = 0$ be the boundary, where the data feature set is $X = \{1, x_1, x_2, x_3, \ldots, x_n\}$;

letting $E_\theta(X) = X^T\theta$, where $T$ denotes transposition;

converting $E_\theta(X)$ into the function $h_\theta(X) = \mathrm{sigmoid}(E_\theta(X))$, so that $h_\theta(X) = 0.5$ is the boundary;

and iteratively computing the function to obtain the output probability mapping score $\theta$.
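A minimal sketch of this LR calculation is given below. The text does not say which iterative scheme is used; batch gradient descent on the log-loss is assumed here, and the learning rate and epoch count are illustrative values only.

```python
import math


def sigmoid(z):
    """Logistic function; equals 0.5 exactly when z = 0."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    """h_theta(X) = sigmoid(X^T theta), with X = (1, x_1, ..., x_n)."""
    features = [1.0] + list(x)
    return sigmoid(sum(f * t for f, t in zip(features, theta)))


def fit(samples, labels, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss (one possible iterative scheme)."""
    theta = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(samples, labels):
            err = predict(theta, x) - y
            for j, f in enumerate([1.0] + list(x)):
                grad[j] += err * f
        theta = [t - lr * g / len(samples) for t, g in zip(theta, grad)]
    return theta
```

With a fitted `theta`, `predict` returns a value in (0, 1), and the line $X^T\theta = 0$ corresponds to an output of exactly 0.5, matching the boundary stated above.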

Preferably, the step of establishing the LR softmax multi-classification model comprises:

carrying out unsupervised clustering on the data feature sets of the bad customers, and grouping the bad customers into a plurality of category sets according to risk type;

calculating each category set with a risk indicator model to obtain a judgment result for that category set, and adding the judgment result to the category set;

and establishing the LR softmax multi-classification model from the category sets to which the judgment results have been added.
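The first of these steps, unsupervised clustering, might be sketched as below. The text does not name a clustering algorithm; plain k-means is assumed here purely for illustration, with the first k points used as initial centroids so that the result is deterministic.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each resulting label identifies one category set, which would then be passed to the risk indicator model.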

Preferably, the step of establishing the risk indicator model comprises:

letting the feature set of the category set be $U = \{u_1, u_2, u_3, \ldots, u_n\}$ and the risk category set be $V = \{v_1, v_2, v_3, \ldots, v_m\}$;

carrying out fuzzy judgment on each feature in the feature set $U$ according to the risk categories $V$ to obtain the evaluation matrix

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{pmatrix},$$

where $r_{ij}$ represents the degree of membership of $u_i$ with respect to $v_j$;

determining the importance weight of each feature in the feature set $U$ by the analytic hierarchy process, giving $A = \{a_1, a_2, a_3, \ldots, a_n\}$;

and multiplying the weight vector $A$ by the matrix $R$ to obtain the judgment result $B = A \cdot R = \{b_1, b_2, b_3, \ldots, b_m\}$.
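The final multiplication can be illustrated with a small worked example. The weights and membership degrees below are made-up numbers, and the product is taken as an ordinary weighted sum (row vector times matrix), which is the reading suggested by "multiplying"; other fuzzy composition operators are not considered here.

```python
def fuzzy_evaluate(weights, membership):
    """Return B = A . R for a weight vector A (length n) and an n x m matrix R."""
    n, m = len(membership), len(membership[0])
    if len(weights) != n:
        raise ValueError("A must have one weight per feature (row of R)")
    return [sum(weights[i] * membership[i][j] for i in range(n)) for j in range(m)]


# Hypothetical example: 3 features, 2 risk categories.
A = [0.5, 0.3, 0.2]
R = [
    [0.8, 0.2],
    [0.4, 0.6],
    [0.1, 0.9],
]
B = fuzzy_evaluate(A, R)
best = max(range(len(B)), key=B.__getitem__)  # index of the dominant risk category
```

Here `B` has one entry per risk category, and the largest entry indicates the category that the feature set most strongly belongs to.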

Preferably, the step of establishing the LR softmax multi-classification model from the category sets to which the judgment results have been added comprises:

for the m judgment results, performing regression of each judgment result against the remaining m-1 judgment results;

and establishing the LR softmax multi-classification model for the probability of each judgment result using a linear predictor and a normalization factor.
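The combination of a linear predictor and a normalization factor corresponds to the usual softmax form, sketched below. The parameter layout (one coefficient list per class, with the intercept first) is an assumption made for this example, and fitting the coefficients is not shown.

```python
import math


def softmax_probabilities(thetas, x):
    """P(class k | x) = exp(theta_k . x) / sum_j exp(theta_j . x)."""
    features = [1.0] + list(x)
    # Linear predictor for each class.
    scores = [sum(t * f for t, f in zip(theta, features)) for theta in thetas]
    shift = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)  # normalization factor
    return [e / z for e in exps]
```

The returned probabilities sum to 1, and the risk label assigned to a bad customer would be the class with the largest probability.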

An embodiment of the invention provides a device for implementing the above method, comprising:

a rejection strategy module, configured to screen the small and micro enterprises by taking their business registration data, credit investigation data and Tongdun data as rejection strategies, and then to screen the remaining small and micro enterprises again by taking their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies;

a first-stage classification module, configured to perform data preprocessing on the invoice data of the small and micro enterprises that pass the second screening and select features to obtain a data feature set, to calculate the data feature set with an LR model and an XGBoost model respectively to obtain an output probability mapping score of each model, and to compare the average of the two output probability mapping scores with a threshold, the small and micro enterprise being judged a good customer if the average is greater than the threshold, or a bad customer if the average is smaller than the threshold;

and a second-stage classification module, configured to classify the bad customers with an LR softmax multi-classification model into bad customers with different risk labels.

An embodiment of the present invention provides an electronic device, which includes at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to execute the method described above.

An embodiment of the present invention provides a storage medium storing a computer program executable by an electronic device; when the program runs on the electronic device, it causes the electronic device to execute the method described above.

Compared with the prior art, the invention has the beneficial effects that:

The method performs two-stage classification prediction for small and micro enterprises: the first-stage classification divides the small and micro enterprises into good customers and bad customers, and the second-stage classification divides the bad customers into different risk categories, giving banks and other financial institutions a more accurate prediction method.

Drawings

FIG. 1 is a schematic structural diagram of the device for measuring financial risk of small and micro enterprises according to the present invention;

FIG. 2 is a flow chart of the method for measuring financial risk of small and micro enterprises according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly, e.g., as meaning a fixed connection, a removable connection, or an integral connection; a mechanical or an electrical connection; a direct connection, an indirect connection through an intervening medium, or an internal connection between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific case.

The invention provides a method and a device for measuring financial risk of a small and micro enterprise, electronic equipment and a storage medium.

Referring to FIG. 1, a schematic structural diagram of a device for measuring financial risk of small and micro enterprises according to an embodiment of the present application is shown. The device comprises:

a rejection strategy module 1, configured to screen the small and micro enterprises by taking their business registration data, credit investigation data and Tongdun data as rejection strategies, and to screen the remaining small and micro enterprises again by taking their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies;

specifically, a threshold is set, and processing continues if the business registration data, credit investigation data and Tongdun data of the small and micro enterprise are all greater than the threshold; otherwise, processing ends;

a threshold is set, and processing continues if the in-bank settlement data, blacklist data and historical data of the small and micro enterprise at the bank are all greater than the threshold; otherwise, processing ends;

a first-stage classification module 2, configured to perform data preprocessing on the invoice data of the small and micro enterprises that pass the second screening and select features to obtain a data feature set, to calculate the data feature set with an LR model and an XGBoost model respectively to obtain an output probability mapping score of each model, and to compare the average of the two output probability mapping scores with a threshold, the small and micro enterprise being judged a good customer if the average is greater than the threshold, or a bad customer if the average is smaller than the threshold;

and a second-stage classification module 3, configured to classify the bad customers with an LR softmax multi-classification model into bad customers with different risk labels.

Specifically, unsupervised clustering is carried out on the data feature sets of the bad customers, and the bad customers are grouped into a plurality of category sets according to risk type.

The feature set of a category set is $U = \{u_1, u_2, u_3, \ldots, u_n\}$, and the risk category set is $V = \{v_1, v_2, v_3, \ldots, v_m\}$.

Further, the risk indicator model is calculated as follows:

carrying out fuzzy judgment on each feature in the feature set $U$ according to the risk categories $V$ to obtain the evaluation matrix

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ r_{21} & r_{22} & \cdots & r_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nm} \end{pmatrix},$$

where $r_{ij}$ represents the degree of membership of $u_i$ with respect to $v_j$;

and multiplying the weight vector $A$ by the matrix $R$ to obtain the judgment result $B = A \cdot R = \{b_1, b_2, b_3, \ldots, b_m\}$.

As shown in FIG. 2, the present embodiment further provides a method for measuring financial risk of small and micro enterprises, comprising:

screening the small and micro enterprises by taking their business registration data, credit investigation data and Tongdun data as rejection strategies;

specifically, a threshold is set, and processing continues if the business registration data, credit investigation data and Tongdun data of the small and micro enterprise are all greater than the threshold; otherwise, processing ends.

Screening the remaining small and micro enterprises again by taking their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies;

specifically, a threshold is set, and processing continues if the in-bank settlement data, blacklist data and historical data of the small and micro enterprise at the bank are all greater than the threshold; otherwise, processing ends.

Performing data preprocessing on the invoice data of the small and micro enterprises that pass the second screening, and selecting features to obtain a data feature set, which comprises the following:

Specifically, missing values are filled in a customized way according to each variable's missing-value ratio, its importance, and whether it is continuous. For example:

variables such as the share of transaction amount of the top three customers in the last 12 months, the coefficient of variation over the last 24 months, and the number of newly added sales commodities in the last 12 months have a high missing-value ratio; if their importance is low, they are deleted;

variables such as the ratio of the special VAT invoice amount to the total invoiced amount (excluding voided invoices) in the last 12 months, and the voided invoice amount in the last 6 months / (valid invoiced amount in the last 6 months + voided invoice amount), have a low missing-value ratio; if their importance is low, they are filled with the median;

variables such as the share of valid invoiced amount in the last 6 months and the ratio of voided invoices to total invoices in the last 12 months have too many missing values; if their importance is high, the missing values are predicted and filled with a random forest;

and variables such as the red-letter invoice amount in the last 12 months / the blue-letter invoice amount in the last 12 months are discrete and take few distinct values, so they are converted into dummy variables for filling.
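The decision logic in this passage can be summarized in a small sketch. The numeric cutoffs (`high_missing`, `high_importance`, `few_values`) are hypothetical, since the text gives no values, and the case of a lightly missing but important variable is not covered by the text; the sketch falls back to the median there as a placeholder assumption.

```python
import statistics


def choose_fill_strategy(missing_ratio, importance, is_discrete, n_distinct,
                         high_missing=0.5, high_importance=0.5, few_values=5):
    """Map a variable's profile to one of the four treatments named in the text."""
    if is_discrete and n_distinct <= few_values:
        return "dummy"  # convert to dummy variables
    if missing_ratio >= high_missing:
        # Heavily missing: predict with a random forest if important, else delete.
        return "random_forest" if importance >= high_importance else "drop"
    # Lightly missing and unimportant: median. The important case is not
    # specified in the text; median is used here only as a placeholder.
    return "median"


def fill_median(values):
    """Fill None entries with the median of the observed values."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values]
```

The random-forest filling itself is not shown; it would train a model on rows where the variable is present and predict it for rows where it is missing.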

The fixed data are, for example, the ratio of the voided invoice amount to the invoiced amount in the last 12 months, the share of the largest commodity in circulation in the last 12 months, and other such data.

Forward selection and backward elimination are used to select the data with the best attributes from the invoice data with multiple attributes;

the data with multiple attributes are, for example: the valid invoiced amount of month L1 / the average valid invoiced amount of months L2 to L13, the valid invoiced amount of month L2 / the average valid invoiced amount of months L3 to L14, the valid invoiced amount of month L3 / the average valid invoiced amount of months L4 to L15, and the like.
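The example attributes above are all ratios of one month's amount to the average of the following 12 months. A sketch of that arithmetic is shown below; it assumes that month L1 is the most recent month and that the monthly amounts are stored in a list ordered L1, L2, and so on, neither of which is stated explicitly in the text.

```python
def ratio_to_trailing_average(monthly, start, window=12):
    """Return L{start} / mean(L{start+1} .. L{start+window}).

    monthly[0] is assumed to be month L1, monthly[1] month L2, and so on.
    """
    current = monthly[start - 1]
    trailing = monthly[start:start + window]
    if len(trailing) < window:
        raise ValueError("not enough history for the requested window")
    avg = sum(trailing) / window
    return current / avg if avg else None
```

For instance, `ratio_to_trailing_average(monthly, 1)` gives the month L1 amount divided by the average of months L2 to L13, and `start=2` or `start=3` gives the other two attributes.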

The univariate data is, for example, the growth rate of the valid invoiced amount in the last 12 months.
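Screening a univariate feature such as this one by its correlation with the target could look like the sketch below. Only the Pearson part is shown; the chi-square test and tree model that the method combines with it are omitted, and the cutoff `min_abs_corr` is a hypothetical value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def keep_predictive(features, target, min_abs_corr=0.1):
    """Keep feature names whose |r| with the target reaches the (hypothetical) cutoff."""
    return [name for name, col in features.items()
            if abs(pearson(col, target)) >= min_abs_corr]
```

Features that fall below the cutoff would be candidates for deletion, subject to the other two checks.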

Feature data of the same category that are similar to one another are, for example, the average valid invoicing of months L1 to L3 / months L4 to L15, the valid invoicing of month L1 / the average of months L2 to L13, the valid invoicing of month L2 / the average of months L3 to L14, and the valid invoicing of month L3 / the average of months L4 to L15.
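Combining such similar features by weight might be done as in the sketch below. The feature names and weights are hypothetical, and normalizing the weights so that they sum to 1 is an assumption; the text says only that the features are combined according to their respective weights.

```python
def combine_features(values, weights):
    """Weighted sum of same-category features, with weights normalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(values[name] * w / total for name, w in weights.items())


# Hypothetical ratios for three similar features and their weights.
ratios = {"l1_vs_l2_l13": 1.2, "l2_vs_l3_l14": 1.0, "l3_vs_l4_l15": 0.8}
weights = {"l1_vs_l2_l13": 2, "l2_vs_l3_l14": 1, "l3_vs_l4_l15": 1}
combined = combine_features(ratios, weights)
```

The single value `combined` would then replace the three correlated inputs in the data feature set.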

Specifically, the data feature set is calculated using the LR model as follows:

dividing the small and micro enterprises into positive samples and negative samples, and letting $E_\theta(X) = 0$ be the boundary, where the data feature set is $X = \{1, x_1, x_2, x_3, \ldots, x_n\}$;

letting $E_\theta(X) = X^T\theta$, where $T$ denotes transposition;

converting $E_\theta(X)$ into the function $h_\theta(X) = \mathrm{sigmoid}(E_\theta(X))$, so that $h_\theta(X) = 0.5$ is the boundary;

and iteratively computing the function to obtain the output probability mapping score $\theta$.

Calculating the data feature set with the XGBoost model is conventional and is not described in detail here.

The average of the output probability mapping scores of the LR model and the XGBoost model is compared with a threshold; if the average is greater than the threshold, the small and micro enterprise is judged a good customer, and if it is smaller than the threshold, the small and micro enterprise is judged a bad customer. This is the first-stage classification, which roughly divides the small and micro enterprises into good customers and bad customers.
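The first-stage decision reduces to a few lines once the two scores are available. In the sketch below the scores are passed in as plain numbers, so no model library is needed; the default threshold of 0.5 is an illustrative value, and treating an average exactly equal to the threshold as "bad" is an assumption, since the text leaves that case open.

```python
def classify_customer(lr_score, xgb_score, threshold=0.5):
    """Average the two probability mapping scores and compare with the threshold."""
    average = (lr_score + xgb_score) / 2.0
    # The text does not say what happens when average == threshold;
    # that case is treated as "bad" here (assumption).
    return "good" if average > threshold else "bad"
```

Only customers labelled "bad" are passed on to the second-stage LR softmax classification.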

The feature set of the category set is $U = \{u_1, u_2, u_3, \ldots, u_n\}$, and the risk category set is $V = \{v_1, v_2, v_3, \ldots, v_m\}$,

where $r_{ij}$ represents the degree of membership of $u_i$ with respect to $v_j$.

An embodiment of the present invention provides an electronic device, which includes at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to execute the method described above.

An embodiment of the present invention provides a storage medium, which stores a computer program executable by an electronic device, and when the program runs on the electronic device, the electronic device is caused to execute the method described above.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for measuring financial risk of small and micro enterprises, characterized by comprising the following steps:

2. The method for measuring financial risk of small and micro enterprises according to claim 1, wherein the steps of screening the small and micro enterprises with their business registration data, credit investigation data and Tongdun data as rejection strategies, and of screening the remaining small and micro enterprises again with their in-bank settlement data, blacklist data and historical data at the bank as rejection strategies, comprise:

3. The method for measuring financial risk of small and micro enterprises according to claim 1, wherein the step of performing data preprocessing on the invoice data of the small and micro enterprises that pass the second screening, and selecting features to obtain a data feature set, comprises:

4. The method for measuring financial risk of small and micro enterprises according to claim 1, wherein the step of calculating the data feature set using the LR model comprises:

dividing the small and micro enterprises into positive samples and negative samples, and letting $E_\theta(X) = 0$ be the boundary, where the data feature set is $X = \{1, x_1, x_2, x_3, \ldots, x_n\}$;

letting $E_\theta(X) = X^T\theta$, where $T$ denotes transposition;

then $h_\theta(X) = 0.5$ is the boundary;

5. The method for measuring financial risk of small and micro enterprises according to claim 1, wherein the step of establishing the LR softmax multi-classification model comprises:

6. The method for measuring financial risk of small and micro enterprises according to claim 5, wherein the step of establishing the risk indicator model comprises:

wherein $r_{ij}$ represents the degree of membership of $u_i$ with respect to $v_j$;

determining the importance weight of each feature in the feature set $U$ by the analytic hierarchy process, giving $A = \{a_1, a_2, a_3, \ldots, a_n\}$;

7. The method according to claim 6, wherein the step of establishing the LR softmax multi-classification model from the category sets to which the judgment results have been added comprises:

8. An apparatus for implementing the method of any one of claims 1 to 7, comprising:

9. An electronic device, comprising at least one processing unit and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the method of any of claims 1 to 7.

10. A storage medium storing a computer program executable by an electronic device, the program, when run on the electronic device, causing the electronic device to perform the method of any one of claims 1 to 7.