CN112232377A - Method and device for constructing an enterprise ESG (environmental, social and governance) three-excellence credit model - Google Patents


Info

Publication number
CN112232377A
Authority
CN
China
Prior art keywords
index
model
credit
level
sample
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011000208.5A
Other languages
Chinese (zh)
Inventor
王遥
施懿宸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongcai Lvzhi Beijing Information Consulting Co ltd
Original Assignee
Zhongcai Lvzhi Beijing Information Consulting Co ltd
Application filed by Zhongcai Lvzhi Beijing Information Consulting Co ltd
Priority to CN202011000208.5A
Publication of CN112232377A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Abstract

The invention provides a method and a device for constructing an enterprise ESG three-excellence credit model. The method comprises: step 1, determining a sample set, wherein the samples in the sample set include meta-information; step 2, collecting data on the model index factors according to the meta-information to obtain an index data set; step 3, standardizing the index data set to obtain a standardized index data set; step 4, scaling the standardized index data set to a percentage scale and then computing weighted sums to obtain an index score data set; step 5, scoring the credit of each sample according to the index score data set; and step 6, dividing the credit scores into intervals to obtain the credit rating of each sample. The invention innovatively incorporates enterprise environmental, social and governance (ESG) information into the enterprise credit rating system, can effectively identify the credit quality of bond issuers, and measures the overall credit of debtors more comprehensively.

Description

Method and device for constructing an enterprise ESG (environmental, social and governance) three-excellence credit model
Technical Field
The invention relates to the technical field of data processing, and in particular to a method and a device for constructing an enterprise ESG three-excellence credit model.
Background
Bonds are one of the important financing instruments for enterprises in China. Over the past decade, China's bond market has grown significantly and has become the third largest in the world. However, since the default of the "11 Chaori bond" in March 2014, bond defaults in China have increased, default events in the bond market have become frequent, and credit risk has become a focus of attention in the financial market. In 2018, a total of 125 bonds defaulted in China's credit bond market, with a default amount of 120.961 billion yuan; 44 issuers defaulted for the first time during the year; and both the amount and the number of defaulted bonds far exceeded those of 2017. Meanwhile, some rating agencies showed weak early-warning capability, defaults among highly rated bonds increased, and an AAA-rated issuer defaulted for the first time.
According to mature practice abroad, rating methods should be based mainly on quantitative analysis, supplemented by qualitative analysis, and the reliability of a rating depends more on the overall judgment of experienced, well-informed analysts. In China, however, analysts at bond rating agencies can only assign scores according to values calculated from predetermined indexes and cannot exercise their own overall judgment, so the credit of corporate bonds cannot be evaluated comprehensively. In addition, in terms of index design, the rating systems of China's major rating agencies are dominated by financial indexes and neglect non-financial ESG (environmental, social and governance) indexes. Traditional financial indexes generally measure an enterprise's past performance, and debtors have an incentive to conceal their true operating conditions, which creates information asymmetry between debtors and creditors.
As China's bond market becomes increasingly market-oriented, credit defaults are likely to become routine, which makes credit risk prevention very important. Investors should strengthen their ability to identify credit risk, tighten internal risk management and control, diversify their risk allocation appropriately, and stay alert to the possibility that a credit risk shock turns into a liquidity shock. When applied to bond issuers, traditional credit rating models suffer from lagging risk tracking, weak early-warning capability and model rigidity, so it is particularly important to improve them so that bond credit risk can be analyzed more comprehensively.
Disclosure of Invention
In view of the above problems, the present invention provides a method and a device for constructing an enterprise ESG three-excellence credit model.
To solve the above technical problems, the invention adopts the following technical solution:
In a first aspect, the present application provides a method for constructing an enterprise ESG three-excellence credit model, comprising:
step 1, determining a sample set, wherein the samples in the sample set include meta-information;
step 2, collecting data on the model index factors according to the meta-information to obtain an index data set;
step 3, standardizing the index data set to obtain a standardized index data set;
step 4, scaling the standardized index data set to a percentage scale and then computing weighted sums to obtain an index score data set;
step 5, scoring the credit of each sample according to the index score data set; and
step 6, dividing the credit scores into intervals to obtain the credit rating of each sample.
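Steps 4 and 5 can be illustrated with a minimal sketch. It assumes that "scaling to a percentage scale" means mapping each standardized value to 0-100 through the standard normal CDF, and that each level's score is a normalized weighted sum of the level below; the patent fixes neither choice, and the function names and example weights are hypothetical.

```python
import math


def to_percent(z):
    """Map a standardized (Z) value to a 0-100 scale via the standard normal CDF.

    Assumption: the normal-CDF mapping is one plausible reading of
    "scaled to a percentage scale"; min-max scaling would be another.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def weighted_score(scores, weights):
    """Normalized weighted sum of component scores (weights keyed like scores)."""
    total = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total


# Hypothetical example: two third-level factors roll up into one
# second-level index, and two second-level indexes into a credit score.
third = {"f1": to_percent(1.0), "f2": to_percent(-0.5)}
second = {
    "profitability": weighted_score(third, {"f1": 2, "f2": 1}),
    "governance": 55.0,
}
credit_score = weighted_score(second, {"profitability": 0.6, "governance": 0.4})
```

Normalizing inside `weighted_score` keeps every aggregated score on the same 0-100 scale regardless of whether the supplied weights already sum to 1.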
As a preferred scheme, the model index factors are divided into three levels: the first level consists of first-level indexes, the second level comprises the second-level indexes under each first-level index, and the third level comprises the third-level indexes under each second-level index; the third-level indexes form the model candidate factor pool.
As a preferred scheme, standardizing the index data set in step 3 specifically comprises: classifying the third-level indexes in the model candidate factor pool into class I and class II according to whether the index values are independent of, or specific to, the individual enterprise, wherein the class I third-level indexes are standardized with a moving-window Z-value and the class II third-level indexes are standardized with a current-quarter Z-value.
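The two standardization schemes can be sketched as follows. Reading class I as enterprise-independent series (for example macroeconomic indexes, which are identical for every enterprise in a quarter, so only a standardization over time is meaningful) and class II as enterprise-specific values standardized across the enterprises of one quarter (in the embodiment, sample set A is the set used for standardization) is an interpretation. The 8-quarter window, including the current quarter in the window, and the population standard deviation are all assumptions.

```python
from statistics import mean, pstdev


def moving_window_z(series, window=8):
    """Class I: standardize each value against its trailing window (current value included).

    `series` holds one index's quarterly values in chronological order;
    the default 8-quarter window is a hypothetical choice.
    """
    z = []
    for t, x in enumerate(series):
        hist = series[max(0, t - window + 1): t + 1]
        sd = pstdev(hist)
        z.append(0.0 if sd == 0 else (x - mean(hist)) / sd)
    return z


def cross_section_z(values_by_firm):
    """Class II: standardize each enterprise's value against all enterprises in the same quarter."""
    vals = list(values_by_firm.values())
    mu, sd = mean(vals), pstdev(vals)
    return {firm: 0.0 if sd == 0 else (v - mu) / sd for firm, v in values_by_firm.items()}


# Hypothetical data: one macro series over six quarters, one ratio across three firms.
macro_z = moving_window_z([6.8, 6.7, 6.5, 6.4, 6.2, 6.0], window=4)
ratio_z = cross_section_z({"firm_a": 0.12, "firm_b": 0.05, "firm_c": 0.08})
```

A moving window keeps the class I values comparable over time without using information from later quarters, since each Z-value looks only backwards.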
As a preferred scheme, after step 3 the third-level index factors in the standardized index data set are screened, specifically: a univariate logistic regression model is used to test the classification ability of each individual third-level index factor, so as to evaluate its significance and eliminate factors with low significance; the third-level index factors are transformed so that all of them are negatively correlated with the occurrence of a negative credit event; and a multicollinearity diagnosis is performed on the third-level index factors to remove some of the highly correlated ones.
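A dependency-free sketch of the first two screening steps follows: a univariate logistic regression fitted by Newton-Raphson, a two-sided Wald-test p-value for the slope, elimination at a significance level `alpha` (0.05 here, an assumption), and a sign flip so that every retained factor is negatively related to the negative credit event (label 1). A statistics package would normally be used instead; this version does not handle perfect separation, where the maximum-likelihood estimate does not exist.

```python
import math


def _score_and_info(x, y, b0, b1):
    """Gradient and Fisher information of the logistic log-likelihood at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def univariate_logit(x, y, iters=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))); return (b1, Wald p-value of b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            raise ValueError("factor has (almost) no variation")
        # Newton-Raphson step: beta <- beta + I^{-1} * gradient (explicit 2x2 inverse)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _score_and_info(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # [I^{-1}] entry for b1
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def screen_factors(factors, y, alpha=0.05):
    """Drop insignificant factors; flip signs so larger values mean lower risk."""
    kept = {}
    for name, x in factors.items():
        b1, p_value = univariate_logit(x, y)
        if p_value >= alpha:
            continue  # low significance: eliminate
        kept[name] = [-v for v in x] if b1 > 0 else list(x)
    return kept
```

The surviving factors then go through the multicollinearity check, which is a separate step.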
As a preferred scheme, the method further comprises optimizing the model: dividing the sample set into a training set and a test set; randomly assigning a weight combination on the training set and applying it to the model for training; calculating the recall of the model's predicted values against the actual values under that weight combination; repeating these steps many times for each training set and recording the recall for each weight combination; sorting the recalls from all repetitions in descending order and averaging the weight combinations with the highest recalls to obtain an average weight combination; and applying the average weight combination, as the global optimal weights, to the test set to obtain the test-set recall.
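The random-search weight optimization can be sketched as below. The patent does not state how a weight combination produces a prediction; as an assumption, a sample is predicted to have a negative credit event (label 1) when its weighted score falls in the lowest `flag_share` of the set, since factors have been sign-aligned so that low values mean high risk. Weights are drawn uniformly on the simplex (normalized exponential draws). `n_trials`, `top_k` and `flag_share` are illustrative values, and a single training set stands in for the repeated loops over training sets.

```python
import random


def recall(y_true, y_pred):
    """Share of actual negative-credit-event samples (label 1) that are flagged."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


def predict(X, weights, flag_share):
    """Flag the lowest-scoring `flag_share` of samples as 1 (ties at the cut-off are all flagged)."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in X]
    k = max(1, round(flag_share * len(scores)))
    cutoff = sorted(scores)[k - 1]
    return [1 if s <= cutoff else 0 for s in scores]


def random_weights(n, rng):
    """Uniform draw on the simplex: normalized Exp(1) variates, i.e. Dirichlet(1, ..., 1)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def search_weights(X_train, y_train, n_trials=2000, top_k=50, flag_share=0.2, seed=0):
    """Rank random weight combinations by training recall and average the top_k of them."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        w = random_weights(len(X_train[0]), rng)
        trials.append((recall(y_train, predict(X_train, w, flag_share)), w))
    trials.sort(key=lambda t: t[0], reverse=True)
    best = [w for _, w in trials[:top_k]]
    return [sum(col) / len(best) for col in zip(*best)]
```

Usage: `w_star = search_weights(X_train, y_train)` followed by `recall(y_test, predict(X_test, w_star, 0.2))` gives the test-set recall for the averaged weights. Averaging the top combinations, rather than keeping the single best, tends to smooth out weights that scored well on the training set by chance.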
Preferably, when third-level index data in the sample set are missing, the null data points are filled with 0 or with the mean.
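A small sketch of the two filling options. It assumes missing values arrive as `None` and that the mean is taken over the observed values of the same third-level index across the sample set; the patent does not say over which population the mean is computed.

```python
from statistics import mean


def fill_missing(values, strategy="mean"):
    """Replace None with the mean of the observed values ("mean") or with 0 ("zero").

    Falls back to 0 when every value is missing and strategy is "mean".
    """
    observed = [v for v in values if v is not None]
    if strategy == "mean" and observed:
        fill = mean(observed)
    elif strategy in ("mean", "zero"):
        fill = 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]
```

Filling with 0 places a missing standardized value at the cross-sectional average, whereas mean-filling raw values does the same before standardization.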
In a second aspect, the present application provides a device for constructing an enterprise ESG three-excellence credit model, comprising:
a sample set determination module, configured to determine a sample set, wherein the samples in the sample set include meta-information;
an index data set acquisition module, configured to collect data on the model index factors according to the meta-information to obtain an index data set;
a standardized index data set acquisition module, configured to standardize the index data set to obtain a standardized index data set;
an index score data set acquisition module, configured to scale the standardized index data set to a percentage scale and then compute weighted sums to obtain an index score data set;
a credit scoring module, configured to score the credit of each sample according to the index score data set; and
a credit rating module, configured to divide the credit scores into intervals to obtain the credit rating of each sample.
As a preferred scheme, the model index factors are divided into three levels: the first level consists of first-level indexes, the second level comprises the second-level indexes under each first-level index, and the third level comprises the third-level indexes under each second-level index; the third-level indexes form the model candidate factor pool, and the third-level indexes in the pool are classified into class I and class II according to whether the index values are independent of, or specific to, the individual enterprise;
the standardized index data set acquisition module comprises a first standardization module and a second standardization module, wherein the first standardization module standardizes the class I third-level indexes with a moving-window Z-value, and the second standardization module standardizes the class II third-level indexes with a current-quarter Z-value.
As a preferred scheme, the construction device further comprises an index factor screening module for screening the third-level index factors in the standardized index data set, specifically: a univariate logistic regression model is used to test the classification ability of each individual third-level index factor, so as to evaluate its significance and eliminate factors with low significance; the third-level index factors are transformed so that all of them are negatively correlated with the occurrence of a negative credit event; and a multicollinearity diagnosis is performed on the third-level index factors to remove some of the highly correlated ones.
As a preferred scheme, the construction device further comprises a model optimization module for: dividing the sample set into a training set and a test set; randomly assigning a weight combination on the training set and applying it to the model for training; calculating the recall of the model's predicted values against the actual values under that weight combination; repeating these steps many times for each training set and recording the recall for each weight combination; sorting the recalls from all repetitions in descending order and averaging the weight combinations with the highest recalls to obtain an average weight combination; and applying the average weight combination, as the global optimal weights, to the test set to obtain the test-set recall.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method and device innovatively incorporate enterprise environmental, social and governance (ESG) information into the enterprise credit rating system, can effectively identify the credit quality of bond issuers, and measure the overall credit of debtors more comprehensively.
(2) Adding ESG non-financial performance factors compensates for the shortcomings of traditional financial indexes: traditional financial indexes usually measure an enterprise's past performance, whereas ESG focuses more on its current and future development potential. This helps protect creditors' interests and promotes the safety of the financial system.
(3) When a second-level index is synthesized from its third-level indexes, the model uses a discretized score of the p-value from the coefficient test of each third-level index's univariate logistic regression as the basis for allocating weights. This reflects, to a certain extent, each factor's significance and hence its importance (the smaller the p-value, the more significant the factor), so that within each merged index (second-level index) the third-level indexes are weighted differently according to their ability to predict defaults and downgrades.
(4) During construction, the model tests the third-level indexes for multicollinearity and, taking their economic meaning into account, removes homogeneous third-level indexes from the index library. This controls model complexity, improves robustness to a certain extent, and, by grouping the third-level indexes into second-level indexes, makes the model easier to interpret.
(5) The method and device do not rely on expert scoring; the model is built only from public, objective data, which further ensures its objectivity and operability.
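The p-value-based weighting in effect (3) can be sketched as follows. The discretization bins (p < 0.01 scores 3, p < 0.05 scores 2, p < 0.10 scores 1, otherwise 0) are hypothetical; the text states only that smaller p-values indicate greater significance and should receive more weight within a second-level index.

```python
def p_value_score(p, bins=((0.01, 3), (0.05, 2), (0.10, 1))):
    """Discretize a p-value: the smaller the p-value, the higher the score (hypothetical bins)."""
    for threshold, score in bins:
        if p < threshold:
            return score
    return 0


def weights_from_p_values(p_values):
    """Weights of the third-level factors inside one second-level index, proportional to their scores."""
    scores = {name: p_value_score(p) for name, p in p_values.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no significant factor in this second-level index")
    return {name: s / total for name, s in scores.items()}
```

For example, p-values of 0.003, 0.03 and 0.2 give scores 3, 2 and 0, and hence weights 0.6, 0.4 and 0.0. Discretizing rather than using raw p-values keeps tiny differences between very small p-values from dominating the weights.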
Drawings
The disclosure of the present invention is illustrated with reference to the accompanying drawings. It should be understood that the drawings are provided for illustration only and do not define the limits of the invention. In the drawings, like reference numerals refer to like parts, wherein:
fig. 1 is a schematic flow chart of a method for constructing an enterprise ESG three-excellence credit model according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of another form of the method for constructing an enterprise ESG three-excellence credit model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of the model data set hierarchy of the method for constructing an enterprise ESG three-excellence credit model according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a device for constructing an enterprise ESG three-excellence credit model according to an embodiment of the present invention.
Detailed Description
It will be readily understood that, based on the technical solution of the present invention, a person skilled in the art may propose various alternative structures and implementations without departing from the spirit of the invention. Therefore, the following detailed description and the accompanying drawings merely illustrate the technical solution of the invention and should not be regarded as the whole of the invention or as limiting its technical solution.
An embodiment of the present invention is described with reference to fig. 1. The application provides a method for constructing an enterprise ESG three-excellence credit model, comprising the following steps:
step S101, determining a sample set, wherein the samples in the sample set include meta-information;
step S102, collecting data on the model index factors according to the meta-information to obtain an index data set;
step S103, standardizing the index data set to obtain a standardized index data set;
step S104, scaling the standardized index data set to a percentage scale and then computing weighted sums to obtain an index score data set;
step S105, scoring the credit of each sample according to the index score data set; and
step S106, dividing the credit scores into intervals to obtain the credit rating of each sample.
In this embodiment, the model index factors are divided into three levels: the first level consists of first-level indexes, the second level comprises the second-level indexes under each first-level index, and the third level comprises the third-level indexes under each second-level index; the third-level indexes form the model candidate factor pool.
In step S103, standardizing the index data set specifically comprises: classifying the third-level indexes in the model candidate factor pool into class I and class II according to whether the index values are independent of, or specific to, the individual enterprise, wherein the class I third-level indexes are standardized with a moving-window Z-value and the class II third-level indexes are standardized with a current-quarter Z-value.
After step S103, the third-level index factors in the standardized index data set are screened, specifically: a univariate logistic regression model is used to test the classification ability of each individual third-level index factor, so as to evaluate its significance and eliminate factors with low significance; the third-level index factors are transformed so that all of them are negatively correlated with the occurrence of a negative credit event; and a multicollinearity diagnosis is performed on the third-level index factors to remove some of the highly correlated ones.
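The multicollinearity step can be approximated by a greedy pairwise-correlation filter, used here as a simpler stand-in for a full diagnosis (which would typically also examine variance inflation factors). The filter visits the third-level factors of one second-level index from most to least significant and drops any factor whose absolute Pearson correlation with an already retained factor reaches a threshold; the 0.8 threshold and the ordering by p-value are assumptions, and the economic judgment the patent also applies cannot be captured in code.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def drop_collinear(factors, p_values, threshold=0.8):
    """Keep factors in order of significance unless |r| >= threshold with an already kept one."""
    kept = []
    for name in sorted(factors, key=lambda k: p_values[k]):
        if all(abs(pearson(factors[name], factors[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Visiting factors by significance means that, of two nearly duplicate factors, the one with the stronger univariate evidence survives.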
In addition, the method further comprises optimizing the model: dividing the sample set into a training set and a test set; randomly assigning a weight combination on the training set and applying it to the model for training; calculating the recall of the model's predicted values against the actual values under that weight combination; repeating these steps many times for each training set and recording the recall for each weight combination; sorting the recalls from all repetitions in descending order and averaging the weight combinations with the highest recalls to obtain an average weight combination; and applying the average weight combination, as the global optimal weights, to the test set to obtain the test-set recall.
As shown in fig. 2, the method for constructing the enterprise ESG three-excellence credit model is described in detail below, taking the A-share listed company sample set (sample set A) as the first sample set and the bond-issuing enterprise sample set (sample set B) as the second sample set.
In this embodiment, two sample sets are needed to construct and train the three-excellence credit model. The first sample set is the A-share listed company sample set (sample set A), which contains the A-share companies listed at the end of each quarter from the beginning of 2014 to the end of 2019. The second sample set is the bond-issuing enterprise sample set (sample set B), which contains: 1) bond-issuing enterprises that experienced a negative credit event (a bond default or a downgrade of the issuer's credit rating) in the four years from January 1, 2016 to December 31, 2019; and 2) matched bond-issuing enterprises corresponding to the above enterprises that had no negative credit event in the quarter in which the negative credit event occurred. Sample set A is mainly used for data standardization, for the overall distribution of credit scores and for computing the quantiles that map scores to ratings; sample set B is mainly used for model training (parameter optimization), evaluation and final model selection.
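Since sample set A supplies the quantiles that map credit scores to ratings, step 6 can be sketched as follows under stated assumptions: four hypothetical grades split at the quartiles of the sample-set-A credit scores. The actual number of grades, their labels and the quantile levels are not given in this part of the text.

```python
from bisect import bisect_right
from statistics import quantiles


def rating_cut_points(scores_a, n_grades=4):
    """Cut points splitting sample-set-A credit scores into n_grades roughly equal-frequency intervals."""
    return quantiles(scores_a, n=n_grades)


def to_rating(score, cut_points, labels):
    """Grade of a credit score; labels are ordered from the lowest to the highest interval."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return labels[bisect_right(cut_points, score)]


# Hypothetical grade labels, lowest scores (highest risk) first.
GRADES = ["D", "C", "B", "A"]
```

Computing the cut points on the broad listed-company population, rather than on the default-enriched sample set B, keeps the rating scale anchored to the overall market distribution.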
Note: the key ESG factors of the model (ESG data from the International Institute of Green Finance of the Central University of Finance and Economics) cover the years 2016 to 2019, and the number of enterprises with negative credit events in these four years is large enough to guarantee a sufficient number of samples for model construction; the sampling period of sample set B is therefore set to 2016 to 2019. The sampling period of the corresponding sample set A is set to 2014 to 2019.
Samples in sample set A take the form listed company/quarter-end date (e.g., Ping An Bank/March 31, 2014). The construction is straightforward: the list of A-share companies listed at each quarter end (March 31, June 30, September 30 and December 31) from the first quarter of 2014 to the fourth quarter of 2019 (24 quarters in total) can be obtained from Wind.
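The 24 quarter-end dates that index sample set A can be generated as below; pairing each date with the Wind list of A-share companies listed on that date (not reproduced here) yields the company/quarter-end samples. The helper name is illustrative.

```python
import calendar
from datetime import date


def quarter_ends(first_year, last_year):
    """All quarter-end dates (Mar 31, Jun 30, Sep 30, Dec 31) from first_year through last_year."""
    return [
        date(year, month, calendar.monthrange(year, month)[1])
        for year in range(first_year, last_year + 1)
        for month in (3, 6, 9, 12)
    ]


SAMPLE_A_DATES = quarter_ends(2014, 2019)  # 2014Q1 through 2019Q4
```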
Samples in sample set B take the form bond-issuing enterprise/quarter-end date (e.g., Xiwang Group/December 31, 2019). There are two types of bond-issuing enterprises. Type I enterprises experienced a negative credit event (a bond default or a downgrade of the issuer's credit rating) within the sampling period of sample set B (2016/1/1 to 2019/12/31); they are the label-1 samples in the classification problem. Type II enterprises are the matched enterprises corresponding to the type I enterprises that had no negative credit event in the quarter in which the type I enterprise's event occurred; they are the label-0 samples in the classification problem.
The sampling of type I and type II bond-issuing enterprises is described separately below.
Sampling type I enterprises is straightforward: all bond-issuing enterprises that had a bond default or an issuer credit rating downgrade within the sampling period are obtained from the Wind database, and the sample quarter is the quarter in which the negative credit event occurred, labeled by its quarter-end date (e.g., Xiwang Group had a bond default on 2019/12/30, so it is sampled as the type I sample Xiwang Group/2019/12/31).
Note: because the three-excellence credit model rates bond-issuing enterprises rather than individual bonds, and the same enterprise should be treated as a single entity within a quarter, each enterprise is sampled at most once per quarter to avoid duplicate samples.
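A sketch of type I sampling under the once-per-quarter rule. The input format (enterprise, event date, event type) is hypothetical, and keeping the earliest event when an enterprise has several in one quarter is an assumption; the text does not say which label prevails in that case.

```python
import calendar
from datetime import date


def quarter_end(d):
    """Quarter-end date of the quarter containing d."""
    month = 3 * ((d.month - 1) // 3 + 1)
    return date(d.year, month, calendar.monthrange(d.year, month)[1])


def type1_samples(events):
    """Map (enterprise, quarter_end) -> label of the earliest event in that quarter.

    `events` is an iterable of (enterprise, event_date, event_type) tuples.
    """
    samples = {}
    for enterprise, event_date, event_type in sorted(events, key=lambda e: e[1]):
        samples.setdefault((enterprise, quarter_end(event_date)), event_type)
    return samples
```

For instance, a default dated 2019/12/30 maps to the sample quarter 2019/12/31, matching the example above.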
Sampling type II enterprises is more complex: for each type I sample, a set of bond-issuing enterprises that had no negative credit event in the sample quarter is matched according to the principle of business similarity (e.g., same industry classification, same bond type).
The matching rule consists of the following three conditions:
1) Industry condition: the matched enterprise belongs to the same industry (bond second-level industry) as the original enterprise, which ensures that the matched and original samples are comparable in business scope. The bond second-level industry classification is used to match industries more precisely (the bond first-level classification is too broad) while keeping industry data as complete as possible (the CSRC industry classification does not fully cover bond-issuing enterprises);
2) Bond type condition: within the sample quarter, the matched enterprise has an outstanding bond of the same bond type (bond secondary type) as a bond of the original enterprise, which ensures that the matched sample is comparable to the original sample as a debtor in terms of bond characteristics. The bond secondary type is used to match bond types more precisely while keeping bond type data as complete as possible;
3) Sample size control condition: the matched enterprise must be an A-share listed company, which limits the number of matched samples and ensures that data for its index factors (e.g., ESG and financial indexes) are available. Without this restriction, the matched samples would outnumber the original samples by more than 1000 times, and such an extremely imbalanced sample set would hinder model training.
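The three matching conditions, together with the no-event requirement and the once-per-quarter rule for type II samples, can be sketched as a filter. The `Enterprise` record and its field names are hypothetical stand-ins for data that would come from Wind.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Enterprise:
    name: str
    industry: str  # bond second-level industry
    a_share_listed: bool
    # quarter-end date -> set of bond secondary types outstanding in that quarter
    bond_types: dict = field(default_factory=dict)
    # quarter-end dates in which a negative credit event occurred
    event_quarters: set = field(default_factory=set)


def match_type2(original, quarter, universe):
    """Names of enterprises matching the type I sample (original, quarter) under conditions 1-3."""
    wanted = original.bond_types.get(quarter, set())
    matches = []
    for cand in universe:
        if cand.name == original.name or quarter in cand.event_quarters:
            continue  # skip the enterprise itself and any enterprise with an event this quarter
        same_industry = cand.industry == original.industry                     # condition 1
        shared_bond_type = bool(cand.bond_types.get(quarter, set()) & wanted)  # condition 2
        if same_industry and shared_bond_type and cand.a_share_listed:         # condition 3
            matches.append(cand.name)
    return matches


def type2_samples(type1, universe):
    """De-duplicated (name, quarter_end) type II samples: each enterprise counted once per quarter."""
    return {(name, q) for orig, q in type1 for name in match_type2(orig, q, universe)}


# Hypothetical usage: one type I sample in 2019Q4 and a tiny candidate universe.
Q = date(2019, 12, 31)
ORIGINAL = Enterprise("X", "Food, Beverage & Tobacco", False, {Q: {"general corporate bond"}}, {Q})
UNIVERSE = [
    ORIGINAL,
    Enterprise("Y", "Food, Beverage & Tobacco", True, {Q: {"general corporate bond"}}),
    Enterprise("Z", "Food, Beverage & Tobacco", False, {Q: {"general corporate bond"}}),
]
```

Returning a set from `type2_samples` implements the rule that an enterprise matched to several type I samples in the same quarter is still sampled only once.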
The matching process is illustrated by the following example (for reference only; not an actual matching result).
Consider the type I sample Xiwang Group/December 31, 2019. Xiwang Group's Wind second-level industry is "Food, Beverage & Tobacco" (condition 1), and in the four quarters of 2019 it had outstanding bonds of the secondary types "general corporate bond" and "super short-term commercial paper" (condition 2). Under these conditions, 5 type II samples can be matched among A-share listed (condition 3) bond-issuing enterprises that had no negative credit event in the four quarters of 2019, as shown in Table 1.1:
Table 1.1 Example of type II matching samples
[Table 1.1 image not reproduced]
Note: consistent with the handling of type I samples, a type II enterprise is also sampled only once within a quarter (it may be matched several times but is sampled only once).
Sample set B contains 714 type I samples (enterprise/quarter), 5782 type II samples (enterprise/quarter), and 6496 samples in total (enterprise/quarter).
The scope and structure of the raw data used for model construction and training, and how these data are collected and organized, are discussed below.
Each sample carries "meta-information" about the sample object itself (information on attributes of the sample object, which is generally not a direct input to model building) and "model index factor" data, which participate directly in model building and training. The former usually serves as the key (parameter) for collecting the latter.
Sample meta-information mainly covers three kinds of attributes of the listed company samples (sample set A) and the bond-issuing enterprise samples (sample set B): 1) enterprise entity information; 2) information on securities issued by the enterprise; 3) date information.
Enterprise entity information includes the company code, CSRC industry, province and credit event information. The company code identifies the enterprise sample (it is self-assigned and can be replaced by the unified social credit code), and the CSRC industry and province are used to collect the corresponding industry development and regional economic indexes. The credit event information applies only to sample set B. It distinguishes samples that had a bond default or an issuer credit rating downgrade within the sample quarter (labeled "default" and "downgrade", respectively) from the remaining samples (labeled "none"); "default" and "downgrade" correspond to label 1 and "none" to label 0 in the classification problem.
Information on securities issued by the enterprise includes the securities code and the securities short name. The securities code is used to collect financial index data: for a listed company it is the company's stock code, and for a non-listed bond issuer it is the code of one of its bonds. The short name is non-essential supplementary information that makes securities easier to identify manually.
Date information includes the sample date, credit event date, credit event quarter, credit rating quarter, theoretical latest reporting period and actual latest reporting period. The first applies to sample set A and the last five apply to sample set B. The sample date indicates the quarter (labeled by its quarter-end date) in which a sample enterprise in sample set A is valid, i.e., which quarter's list of A-share listed companies the sample belongs to. For example, in the sample Ping An Bank/March 31, 2014, March 31, 2014 is the sample date. The sample date is used as the date parameter when collecting index factor data for sample set A. The credit event date marks the date on which the negative credit event of a type I enterprise in sample set B occurred (it does not apply to type II samples), and the credit event quarter is the quarter-end date of the quarter containing the credit event date (for a type II matched sample, the credit event quarter is that of the type I sample it was matched to). The credit rating quarter marks the quarter for which the sample's three-excellence credit rating is given (ratings are quarterly, four times a year, labeled by quarter-end date). The theoretical latest reporting period is the latest financial/macroeconomic reporting period (3/31: first-quarter report; 6/30: interim report; 9/30: third-quarter report; 12/31: annual report) that should, from a historical perspective, be available as of the credit rating quarter date; it is the quarter preceding the credit rating quarter, because reports are usually disclosed in the quarter after the reporting period.
The actual latest reporting period is the latest reporting period actually available. It usually coincides with the theoretical latest reporting period and in a few cases is one or more periods earlier (due to delayed disclosure of reports). Credit rating quarters are distinguished from macroeconomic/financial data reporting periods so that no future information is introduced from a historical perspective, ensuring that the data used in the model conform to historical reality.
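The date relationships above can be sketched in Python. This is a minimal illustration under stated assumptions: calendar quarters, and credit rating quarters taken as the consecutive quarters before the credit event quarter (as later described for index data set B). All function names are hypothetical.

```python
from datetime import date

def _quarter_end(year: int, month: int) -> date:
    # Quarter-end months are 3, 6, 9, 12; March and December end on the 31st.
    return date(year, month, 31 if month in (3, 12) else 30)

def credit_event_quarter(event_date: date) -> date:
    """End date of the calendar quarter that contains the credit event date."""
    month = ((event_date.month - 1) // 3 + 1) * 3
    return _quarter_end(event_date.year, month)

def previous_quarter(q_end: date) -> date:
    """Quarter-end date immediately before the given quarter-end date."""
    if q_end.month == 3:
        return _quarter_end(q_end.year - 1, 12)
    return _quarter_end(q_end.year, q_end.month - 3)

def theoretical_latest_period(rating_quarter: date) -> date:
    """Reports are assumed to be disclosed in the quarter after their period,
    so the theoretically latest available period is the preceding quarter."""
    return previous_quarter(rating_quarter)

def rating_quarters(event_date: date, n: int = 4) -> list:
    """The n consecutive quarters before the credit event quarter, nearest first."""
    quarters, q = [], credit_event_quarter(event_date)
    for _ in range(n):
        q = previous_quarter(q)
        quarters.append(q)
    return quarters
```

For a credit event on 2016/2/15, the credit event quarter is 2016/3/31 and the four rating quarters run from 2015/12/31 back to 2015/3/31.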
The types and uses of the above meta-information are summarized as follows:
[Table: types and uses of meta-information (image not available)]
Examples of meta-information collection:
Tables 1.2 and 1.3 illustrate meta-information collection for sample set A and sample set B, respectively:
Table 1.2 Example of meta-information collection for sample set A
Company code | CSRC industry | Province | Securities code | Securities abbreviation | Sample date
3 | Finance | Guangdong | 000001 | Ping An Bank | 2014/3/31
7 | Real estate | Guangdong | 000002 | Vanke A | 2014/6/30
49631 | Finance | Shanghai | 600000 | SPD Bank | 2016/9/30
73500 | Manufacturing | Jiangsu | 688001 | Huaxing Yuanchuang | 2019/12/31
Table 1.3 Example of meta-information collection for sample set B
[Table image not available]
The three-excellence credit model centers on "three-excellence credit" and depicts an enterprise's credit rating along three dimensions: 1) enterprise credit environment; 2) enterprise credit capability; 3) enterprise credit quality. The index system has three levels. The first level comprises three primary indexes: credit environment, credit capability and credit quality. The second level comprises the secondary indexes into which each primary index is subdivided. Credit environment comprises macroeconomic indexes in four aspects: national economy, market factors, industry development and regional economic environment. Credit capability comprises microeconomic financial indexes in six aspects: profitability, debt-paying capability, operating capability, development capability, debt structure and Altman Z-score early warning. Credit quality comprises ESG indexes in three aspects: environment, society and corporate governance. The third level comprises subdivided tertiary proxy indexes that actually measure the rated sample's performance on each secondary index; their specific number and content are determined by the specific steps of model construction.
As shown in Table 1.4, within this three-level index system the number and types of first- and second-level indexes are relatively fixed, while the number and types of third-level indexes may be adjusted dynamically according to the specific sample selection and model construction method.
Table 1.4 First- and second-level model index factors
[Table image not available]
The sample credit score in the three-excellence credit model is essentially a weighted sum of index data, and the weight of each index reflects its importance in the model, i.e., its ability to identify and classify negative credit events. The three-excellence credit rating is a discretization of the three-excellence credit score through a mapping relation, under the premise that enterprises assigned the same rating have equivalent credit levels. Unlike meta-information, which participates in model construction and training only indirectly as auxiliary data, model index factor data enter the calculation of three-excellence credit scores and ratings directly as the raw model input. Therefore, as the most fundamental link in the model index system, the selection and data collection of tertiary index factors are particularly important.
Based on existing literature and relatively mature credit model index systems in China and abroad, and taking data availability into account, a candidate factor pool consisting of the tertiary indexes under each secondary index can be determined preliminarily. The indexes in the candidate pool are general-purpose, i.e., applicable to all types of enterprises, and their data are updated frequently enough (at least quarterly) to keep the model index data valid for all types of enterprise samples. The 71 tertiary indexes in the candidate factor pool are listed in Table 1.5:
Table 1.5 Tertiary index candidate factor pool
[Table image not available]
Most factor data in the tertiary index candidate pool are public and can be obtained directly from the Wind database; the original data sources are the National Bureau of Statistics and enterprise financial reports. A few candidate tertiary indexes have other special sources or must be calculated; they are explained as follows:
Stock market volatility, CSI 300 (All-A), HV50, quarterly: this index measures the volatility of China's secondary stock market and is the quarterly average of the 50-day historical volatility of the CSI 300 index (CSI All-A index). The larger the value, the greater the stock market volatility.
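The quarterly averaging of a 50-day historical volatility can be sketched as follows. The text does not define HV50 precisely, so this sketch assumes it is the sample standard deviation of 50 daily log returns, annualized with 252 trading days; the function names and the annualization factor are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from datetime import date

def historical_volatility(closes, window=50, periods_per_year=252):
    """Rolling annualized std of daily log returns (assumed HV definition)."""
    rets = [math.log(closes[k] / closes[k - 1]) for k in range(1, len(closes))]
    return [
        statistics.stdev(rets[end - window:end]) * math.sqrt(periods_per_year)
        for end in range(window, len(rets) + 1)
    ]

def quarterly_average(dated_values):
    """Average (date, value) pairs within each calendar quarter -> {(year, q): mean}."""
    buckets = defaultdict(list)
    for d, v in dated_values:
        buckets[(d.year, (d.month - 1) // 3 + 1)].append(v)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```

Pairing each HV value with its trading date and passing the pairs to `quarterly_average` yields one volatility value per quarter, matching the quarterly frequency of the other credit environment indexes.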
HHI (Herfindahl-Hirschman index), an industry development index: the HHI of an industry is the sum of the squared market shares of all enterprises in that industry. The model uses the following formula:
$$HHI_A = \sum_{c \in A} S_c^2 = \sum_{c \in A} \left( \frac{X_c}{X_A} \right)^2$$
where:
$HHI_A$ - HHI of industry $A$ (CSRC industry classification)
$X_c$ - total operating revenue of enterprise $c$ (market size)
$X_A$ - sum of the total operating revenues of all enterprises in industry $A$ (total market size)
$S_c = X_c / X_A$ - market share of enterprise $c$ in industry $A$
HHI measures the market concentration of an industry: the larger the HHI (the closer to 1), the more concentrated the industry.
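A minimal sketch of the HHI formula for one quarter, assuming the input is a list of (CSRC industry, total operating revenue) pairs with positive revenues; `hhi_by_industry` is a hypothetical name.

```python
from collections import defaultdict

def hhi_by_industry(records):
    """records: iterable of (industry, total_operating_revenue) per enterprise.
    Returns {industry: HHI_A}, where HHI_A = sum_c (X_c / X_A) ** 2."""
    revenues = defaultdict(list)
    for industry, revenue in records:
        revenues[industry].append(revenue)
    result = {}
    for industry, xs in revenues.items():
        total = sum(xs)  # X_A: total market size of the industry
        result[industry] = sum((x / total) ** 2 for x in xs)  # sum of S_c squared
    return result
```

Two equal firms give an HHI of 0.5, while a 90/10 split gives 0.82, reflecting the higher concentration.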
ESG total score: the data source is the International Institute of Green Finance of the Central University of Finance and Economics. The score is the weighted sum of the E, S and G item scores of the rated enterprise. Because the ESG score is treated as a whole, the three secondary indexes E, S and G under the credit quality primary index are merged into a single index, "ESG total score", in the model's actual calculation.
The following discusses how the meta-information is used to collect the corresponding index data for sample sets A and B, and describes the time span and frequency of index data collection.
Factors in the tertiary index candidate pool can be divided into four categories according to the meta-information parameters used for data acquisition:
(1) Date parameter only: these factors comprise all tertiary indexes under the national economy and market factor secondary indexes of the credit environment. The index value is fully determined by the date (the quarter-end date), i.e., indexes of this type are identical for all enterprises in the same quarter.
(2) Date and region parameters: these factors comprise the two tertiary indexes under the regional economic environment secondary index of the credit environment. The index value depends on the date and the region (province in the meta-information), and indexes of this type are identical for all enterprises in the same province in the same quarter.
(3) Date and industry parameters: these factors comprise the three tertiary indexes under the industry development secondary index of the credit environment. The index value depends on the date and the industry (CSRC industry in the meta-information), and indexes of this type are identical for all enterprises in the same industry in the same quarter.
(4) Date and company parameters: these factors comprise all tertiary indexes under the secondary indexes of credit capability and credit quality. They are company-level data, and the index value is determined by the company and date parameters.
Sample set A contains the A-share listed-company lists for the 24 quarters from 2014 to 2019 (6 years at quarterly frequency). For each sample (listed company/quarter-end date) in the sample set, data for all factors in the tertiary index candidate pool are collected using the meta-information according to the four categories above (date parameter: sample date; region parameter: province; industry parameter: CSRC industry; company parameter: company code), forming index data set A. The sample date is mapped to the corresponding macroeconomic/financial data reporting period (e.g., 2016/3/31 corresponds to the 2016 first-quarter report).
Sample set B contains bond-issuing enterprises that experienced a negative credit event between early 2016 and late 2019, together with their matched bond-issuing enterprises that did not. To compare how the samples' three-excellence ratings change over time, each sample must be given ratings for multiple historical periods. Because the number of quarters of index data available before the credit event date varies, and considering data availability and sample validity, for each sample (bond issuer/quarter-end date) in the sample set all factor data in the tertiary index candidate pool are collected for the four consecutive quarters before the sample's credit event quarter (the credit rating quarters in the corresponding meta-information), which allows four periods of historical three-excellence ratings to be given. These data form index data set B. Index data set B can therefore be divided into four equal-sized subsets, each containing all samples of sample set B, labeled the "previous quarter", "second previous quarter", "third previous quarter" and "fourth previous quarter" subsets according to the distance between the credit rating quarter and the credit event quarter.
To avoid introducing future information when assigning a credit rating, only the latest index data available at each credit rating quarter (quarter end) are collected. Specifically, the theoretical latest reporting period is used as the time parameter for the credit environment indexes (macroeconomic indexes are usually published quarterly), and the actual latest reporting period is used as the time parameter for the finance-related indexes of credit capability (financial report disclosure lags). If, for any of a sample's four consecutive credit rating quarters, no financial report is available for collecting the relevant model index data, or the actual latest reporting period is too far (more than one year) from the credit rating quarter date, all quarterly data for that sample are deleted from index data set B. This ensures the completeness (all four quarters) and validity (within one year) of the remaining sample index data.
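The screening rule can be sketched as a predicate over one sample's four credit rating quarters and their actual latest reporting periods (`None` when no report is available). This assumes "more than one year" means a gap of more than 12 months between the two quarter-end dates; the names are hypothetical.

```python
from datetime import date

def months_between(later: date, earlier: date) -> int:
    """Whole-month distance between two quarter-end dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)

def keep_sample(rating_quarters, actual_latest_periods, max_months=12):
    """Keep a sample only if every credit rating quarter has an available
    financial report that is at most one year old; otherwise drop all four."""
    if len(actual_latest_periods) != len(rating_quarters):
        return False
    for rq, period in zip(rating_quarters, actual_latest_periods):
        if period is None or months_between(rq, period) > max_months:
            return False
    return True
```

Applying `keep_sample` to every sample and discarding the rejected ones leaves only samples with complete, timely data for all four quarters.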
After this cleaning and screening, sample set B contains 597 type I samples (enterprise/quarter), 5,654 type II samples (enterprise/quarter) and 6,251 samples in total (enterprise/quarter). The corresponding index data set B contains 25,004 (6,251 × 4) records.
Example of index data collection:
Taking Xiwang Group/2019/12/31 as an example, the data collected for the four consecutive quarters before its credit event quarter are shown in Table 1.6:
Table 1.6 Example of the four-period samples of Xiwang Group in index data set B
[Table image not available]
For the credit capability indexes, some companies' financial index data are often missing (the index is not disclosed in the latest financial report available for the current period). Possible reasons are that the index is disclosed only in annual or semi-annual reports while the latest report is a first- or third-quarter report, or that disclosure of the index has been interrupted or never occurred. In the former case, following the principle of using the latest available data, the missing value is filled with the last previously available value of the index; to keep the data valid, the filled value must come from a reporting period no more than one year earlier.
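A minimal sketch of this gap-filling rule over a quarterly series, assuming the series holds consecutive reporting periods and that "no more than one year" means at most four quarters back. It does not distinguish between the reasons a value is missing, so deciding which gaps qualify is left to the caller; `fill_missing` is a hypothetical name.

```python
def fill_missing(series, max_lag=4):
    """series: values for consecutive reporting quarters (None = missing).
    Fill a gap with the last available value if it is at most max_lag quarters old."""
    filled, last_value, last_idx = [], None, None
    for idx, value in enumerate(series):
        if value is not None:
            last_value, last_idx = value, idx
            filled.append(value)
        elif last_idx is not None and idx - last_idx <= max_lag:
            filled.append(last_value)  # carry forward the latest available value
        else:
            filled.append(None)  # too stale (or never disclosed): stays missing
    return filled
```

For example, a value disclosed only in an annual report is carried forward through the following three quarterly reports, but not beyond four quarters.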
The dimensions and orders of magnitude of the indexes in the tertiary index candidate pool differ greatly, so the data must be further standardized. For example, main business profit among the credit capability indexes is measured in units of ten thousand yuan, with values often in the tens of thousands, whereas HHI is dimensionless and lies between 0 and 1. If the raw tertiary index data were used without processing, main business profit would naturally influence the model more than HHI, but that influence would be determined by the numerical magnitude of the indexes and would not reflect their true importance. Therefore, to eliminate the effects of dimension and magnitude, the raw model index data must be standardized. The standardization method used by the model is similar to conventional Z-score standardization, $(x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation of index $x$. However, to keep standardized values consistent (i.e., the standardized value of a given enterprise's index in a given quarter does not change with the credit rating time or with the sample set of rated objects), the following special treatment is adopted:
Indexes in the tertiary index candidate pool are divided into class I and class II, and each class is standardized differently. Class I comprises the national economy and market factor secondary indexes of the credit environment and all their subordinate tertiary indexes (GDP growth (current prices, cumulative year-on-year), …, treasury bond yield-to-maturity spread: 10Y-6M: quarterly); their values in a given quarter do not depend on the company. Class II comprises all tertiary indexes other than class I; their values in a given quarter depend on the company. The two classes use different ways of computing the index mean and standard deviation during standardization.
For class I tertiary indexes, the index mean and standard deviation are computed over a 20-quarter moving window, so they reflect the statistical distribution of the index over the past 5 years. The index is normalized as follows:
$$\tilde{x}_{i,t} = \frac{x_{i,t} - \mu_{i,t}}{\sigma_{i,t}}, \quad i \in \text{class I}$$
where:
$\tilde{x}_{i,t}$ - normalized value of model tertiary index $i$ in quarter $t$
$x_{i,t}$ - original value of model tertiary index $i$ in quarter $t$
$\mu_{i,t}$ - mean of model tertiary index $i$ over the 20 quarters up to and including quarter $t$:
$$\mu_{i,t} = \frac{1}{20} \sum_{s=t-19}^{t} x_{i,s}$$
$\sigma_{i,t}$ - standard deviation of model tertiary index $i$ over the 20 quarters up to and including quarter $t$:
$$\sigma_{i,t} = \sqrt{\frac{1}{20} \sum_{s=t-19}^{t} \left( x_{i,s} - \mu_{i,t} \right)^2}$$
Note: a 20-quarter window ensures valid estimates of the mean and standard deviation: with enough observations, the data are kept as recent as possible, avoiding the bias that early historical data would introduce.
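The class I rule can be sketched as a rolling z-score over one index's quarterly series. This assumes a population standard deviation (divisor 20) and returns 0.0 when the window is constant; both choices, and the name `rolling_zscore`, are illustrative assumptions rather than details stated in the text.

```python
import statistics

def rolling_zscore(values, window=20):
    """Class I normalization: (x_t - mean) / std over the `window` quarters
    ending at and including quarter t. None until a full window exists."""
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)
            continue
        win = values[t + 1 - window:t + 1]
        mu = statistics.fmean(win)
        sigma = statistics.pstdev(win)  # population std: an assumption
        out.append((values[t] - mu) / sigma if sigma > 0 else 0.0)
    return out
```

Because the window depends only on the quarter, every enterprise gets the same normalized class I value in a given quarter, regardless of which sample set is being rated.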
For class II tertiary indexes, the mean and standard deviation for the current quarter are computed in index data set A over that quarter's list of A-share listed companies. Because class II tertiary indexes (especially some financial indexes) are strongly affected by extreme values, the raw data are de-extremized before standardization. A commonly used method is the median (MAD) de-extremization method:
$$\hat{x}^{c}_{i,t} = \begin{cases} M_{i,t} + n D_{i,t}, & x^{c}_{i,t} > M_{i,t} + n D_{i,t} \\ M_{i,t} - n D_{i,t}, & x^{c}_{i,t} < M_{i,t} - n D_{i,t} \\ x^{c}_{i,t}, & \text{otherwise} \end{cases}$$
$i \in$ class II; $c \in$ the list of A-share listed companies in quarter $t$
where:
$x^{c}_{i,t}$ - original value of model tertiary index $i$ of company $c$ in quarter $t$
$\hat{x}^{c}_{i,t}$ - de-extremized value of model tertiary index $i$ of company $c$ in quarter $t$
$M_{i,t}$ - median of model tertiary index $i$ in quarter $t$: $M_{i,t} = \mathrm{median}_c\, x^{c}_{i,t}$
$D_{i,t}$ - median of $|x^{c}_{i,t} - M_{i,t}|$ in quarter $t$: $D_{i,t} = \mathrm{median}_c\, |x^{c}_{i,t} - M_{i,t}|$
$n$ - distance multiple, generally taken as 5
The mean and standard deviation of index $i$ in quarter $t$ are calculated as follows:
$$\mu_{i,t} = \frac{1}{N_t} \sum_{c=1}^{N_t} \hat{x}^{c}_{i,t}, \qquad \sigma_{i,t} = \sqrt{\frac{1}{N_t} \sum_{c=1}^{N_t} \left( \hat{x}^{c}_{i,t} - \mu_{i,t} \right)^2}$$
$i \in$ class II; $c \in$ the list of A-share listed companies in quarter $t$
where:
$\mu_{i,t}$ - mean of model tertiary index $i$ in quarter $t$
$\sigma_{i,t}$ - standard deviation of model tertiary index $i$ in quarter $t$
$N_t$ - total number of A-share listed companies in quarter $t$
$\hat{x}^{c}_{i,t}$ - de-extremized value of model tertiary index $i$ of company $c$ in quarter $t$
The final normalization uses the following formula:
$$z^{c}_{i,t} = \frac{x^{c}_{i,t} - \mu_{i,t}}{\sigma_{i,t}}, \quad i \in \text{class II}$$
where:
$z^{c}_{i,t}$ - normalized value of model tertiary index $i$ of company $c$ in quarter $t$
$x^{c}_{i,t}$ - original value of model tertiary index $i$ of company $c$ in quarter $t$
$\mu_{i,t}$ - mean of model tertiary index $i$ in quarter $t$
$\sigma_{i,t}$ - standard deviation of model tertiary index $i$ in quarter $t$
The two standardization methods are applied to the original class I and class II tertiary index values in index data sets A and B, forming standardized index data sets A and B. After this treatment, each standardized tertiary index is in theory dimensionless and of a consistent order of magnitude (mean 0, standard deviation 1).
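The class II step can be sketched as two small functions. The parameters come only from the de-extremized quarter-t cross-section of index data set A, and they are then applied to the original value of any company in set A or set B, which is what keeps a company's standardized value independent of the rated sample set. A population standard deviation is assumed, and the names are hypothetical.

```python
import statistics

def class2_params(winsorized_a):
    """Mean and population std of the de-extremized quarter-t cross-section
    of A-share listed companies in index data set A."""
    return statistics.fmean(winsorized_a), statistics.pstdev(winsorized_a)

def class2_normalize(raw_value, mu, sigma):
    """Normalize an original index value (company in set A or set B)."""
    return (raw_value - mu) / sigma
```

For a de-extremized cross-section [1, 2, 3, 4, 8], the mean is 3.6 and the standard deviation is sqrt(5.84). An original value of 100 from the same quarter still normalizes against these set-A parameters.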
The mean and standard deviation used to standardize the class II indexes of index data set B are taken from index data set A, and the earliest credit rating quarter in index data set B is the fourth quarter of 2014 (for a credit event quarter in the first quarter of 2016). The A-share listed-company lists of sample set A therefore need to be collected starting from 2014. The form of standardized index data sets A/B is shown in Table 1.7:
Table 1.7 Example of standardized index data sets
[Table image not available]
Note: k and n are the numbers of meta-information columns and standardized tertiary index columns, respectively.
The effective factors in the three-level index are discussed below.
An effective tertiary index factor should satisfy two conditions. First, it should be related to the enterprise credit level in economic logic. Second, it should show a clear empirical correlation with the enterprise credit level, i.e., it can identify, to some extent, the occurrence of enterprise credit events (especially negative ones). The previous section listed, by category, candidate tertiary indexes that should relate to enterprise credit level in economic logic; the following describes how to verify and screen effective factors empirically.
Identifying negative credit events is a classification problem, so univariate logistic regression is used to test the classification ability of each single tertiary index factor and thereby evaluate its effectiveness. The univariate logistic regression model is:
$$P(y=1 \mid x) = \frac{1}{1 + e^{-z}}, \qquad z = \ln \frac{P(y=1 \mid x)}{1 - P(y=1 \mid x)} = \beta_0 + \beta_1 x$$
where:
$y$ - explained variable: credit event information (default or loss = 1; none = 0)
$x$ - explanatory variable: standardized candidate tertiary index factor
$P(y=1 \mid x)$ - probability that a negative credit event occurs given the company's candidate tertiary index factor value
$z$ - log-odds (logarithm of the odds)
$\beta_0, \beta_1$ - model parameters to be estimated
In the "previous quarter" subset of standardized index data set B (the quarter immediately before the credit event, where the factors are most timely), the logistic regression above is used to estimate the model parameters $\hat{\beta}_0, \hat{\beta}_1$ for every candidate standardized tertiary index. The factor significance level of each index (the p-value of $\hat{\beta}_1$) and its explanatory power (pseudo R-squared) are then examined. The smaller the p-value and the larger the pseudo R-squared, the more significant the tertiary index factor (its coefficient $\hat{\beta}_1$ differs significantly from 0) and the stronger its ability to identify negative credit events (the more effective the factor). Such factors should be considered for retention in the final credit model.
The sign of the coefficient $\hat{\beta}_1$ of a standardized tertiary index factor in the logistic regression shows whether the factor is positively or negatively correlated with negative credit events. When $\hat{\beta}_1$ is positive/negative, the standardized tertiary index is positively/negatively correlated with the occurrence of negative credit events, i.e., the larger $x$ is, the higher/lower the probability that the company will have a negative credit event in the next quarter.
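The univariate test can be sketched without external libraries. This sketch fits the model by Newton-Raphson, reports a two-sided Wald p-value for $\hat{\beta}_1$, and computes McFadden's pseudo R-squared. The text does not specify which pseudo R-squared or which test statistic is used, so these choices are assumptions, and the function names are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def _log_likelihood(b0, b1, x, y):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def fit_univariate_logit(x, y, max_iter=100, tol=1e-10):
    """Fit P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and Fisher information
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) from the inverse information, evaluated at the last iterate.
    se1 = math.sqrt(h00 / det)
    p_value = math.erfc(abs(b1 / se1) / math.sqrt(2.0))  # two-sided Wald test
    ybar = sum(y) / len(y)
    ll0 = len(y) * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))
    pseudo_r2 = 1.0 - _log_likelihood(b0, b1, x, y) / ll0  # McFadden
    return {"b0": b0, "b1": b1, "p_value": p_value, "pseudo_r2": pseudo_r2}
```

A negative `b1` means that larger factor values go with fewer negative credit events, which is the direction the model ultimately requires.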
The empirical results of univariate logistic regression show that the factor coefficients in the candidate standardized tertiary index pool take both positive and negative signs. Because the model contributions of positively and negatively signed tertiary factors under the same secondary index may cancel each other out, this is undesirable: the final model uses positive weights, so the contribution directions of all factors must be consistent. The tertiary index factors are therefore transformed so that all of them are negatively correlated with the occurrence of negative credit events ($\hat{\beta}_1 < 0$), i.e., the larger the tertiary index factor, the higher the enterprise credit level and the lower the probability of a negative credit event.
The tertiary index factors are transformed as follows:
Standardized tertiary index factors that are significantly negatively correlated ($\hat{\beta}_1 < 0$) are kept in their original form without transformation.
For a significantly positively correlated ($\hat{\beta}_1 > 0$) standardized tertiary index factor whose index takes a ratio form (e.g., the net asset-liability ratio), the reciprocal form is considered first (applied before standardization). Univariate logistic regression is then run on the reciprocal-transformed standardized factor, and the factor is treated according to its significance and the sign of its correlation in the regression result:
[Table: treatment of the factor by significance and sign after reciprocal transformation (image not available)]
Note: "inverting" here means taking the reciprocal of the factor.
For a significantly positively correlated standardized tertiary index factor in non-ratio form (e.g., HHI), the original tertiary index is directly negated (negation does not change the factor's significance).
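The orientation step can be sketched as a function applied to a factor that has already passed the significance test. The follow-up decision for ratio factors depends on the treatment table that is not reproduced here, so the sketch only produces the reciprocal candidate and flags it for re-testing. The function name and action labels are hypothetical.

```python
def orient_factor(raw_values, beta1, is_ratio):
    """Return (values, action) so that larger values mean better credit.
    beta1: coefficient of the standardized factor in univariate logistic
    regression; the factor is assumed to be significant already."""
    if beta1 < 0:
        return list(raw_values), "keep"
    if is_ratio:
        # Reciprocal of the raw ratio (before standardization). The result must be
        # re-tested by logistic regression before a final decision is made.
        return [None if v == 0 else 1.0 / v for v in raw_values], "reciprocal: retest"
    # Non-ratio index with a positive coefficient: negate the raw index.
    return [-v for v in raw_values], "negate"
```

Negating a standardized factor flips the sign of its logistic coefficient without changing its p-value, which is why the non-ratio case needs no re-test.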
The "effective" (transformed) factors screened from the standardized candidate tertiary index pool in this way are empirically significantly negatively correlated with the occurrence of negative credit events, i.e., positively related to the enterprise credit level. The correlation among the effective factors must also be considered: highly correlated indexes are homogeneous from the model's perspective, and the resulting multicollinearity may bias parameter estimates. Following Occam's razor, some of the highly correlated indexes should therefore be eliminated.
The variance inflation factor (VIF) and the correlation matrix of the standardized tertiary indexes are calculated in the "previous quarter" subset of standardized index data set B (after transformation). The VIF of "effective" standardized tertiary index $i$ is:
$$VIF_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination (R-squared) of the multiple regression of index $i$ on the other effective standardized tertiary indexes:
$$x_i = \alpha_0 + \sum_{j \neq i} \alpha_j x_j + \varepsilon$$
When $VIF_i > 10$, standardized tertiary index $i$ can be explained by the other effective standardized tertiary indexes. Its collinearity is high, and removing index $i$ (or the indexes highly correlated with it) can be considered. Specifically, the effective standardized tertiary indexes that have a correlation coefficient greater than 0.95 (or 0.9) with index $i$ and belong to the same secondary index are picked out from the correlation matrix. From among them, the most significant factor is added to the final list of model tertiary index factors, combined with economic intuition.
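The VIF formula can be sketched with NumPy by running the auxiliary regression for each column directly. This assumes the effective standardized factors are arranged as an (n_samples × n_factors) array; `vif` is a hypothetical name.

```python
import numpy as np

def vif(X):
    """VIF_i = 1 / (1 - R_i^2), with R_i^2 from regressing column i of X
    on the remaining columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        coef = np.linalg.lstsq(A, y, rcond=None)[0]
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out.append(1.0 / (1.0 - r2))
    return out
```

When one factor is nearly the sum of two others, all three exceed the threshold of 10, and the correlation-matrix step then decides which one to retain.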
Using the effective factor identification method above, 52 tertiary index factors were finally selected from the candidate pool for inclusion in the model. Their univariate logistic regression results are as follows:
Table 1.8 Selected tertiary index factors and their univariate logistic regression results
[Table image not available]
Note: in the actual factor testing, factor significance is divided into the following 5 grades by p-value. The more "*" signs, the more significant the factor; "**" and above can be regarded as statistically significant. For economic reasons, the final factor list still retains some factors that are not strictly significant. In the following sections, the tertiary index factors in the table above (the transformed standardized factors) are used to construct and train the three-excellence credit model.
p-value | (0, 0.001] | (0.001, 0.01] | (0.01, 0.05] | (0.05, 0.1] | (0.1, 1]
Significance level | **** | *** | ** | * | (none)
This section discusses how the secondary indexes are synthesized bottom-up from the standardized tertiary index factors. The synthesis consists of two steps: 1) percentile scaling of the standardized tertiary indexes; 2) weighted summation of the tertiary indexes.
1) Standardized three-level index percentile scaling
To confine the final credit score to 0-100 (the credit score is a weighted aggregate of the secondary index scores), the standardized tertiary indexes are first scaled to percentiles using the following mapping:
$$s^{c}_{i,t} = 100 \times \mathrm{CDF}\left( z^{c}_{i,t} \right)$$
where:
$s^{c}_{i,t}$ - percentile-scaled score of standardized tertiary index $i$ of company $c$ in quarter $t$
$\mathrm{CDF}$ - cumulative distribution function of the standard normal distribution
$z^{c}_{i,t}$ - value of standardized tertiary index $i$ of company $c$ in quarter $t$ before scaling
The mapping is strictly monotonic, and the scaled score reflects the percentile at which the factor lies (it can be viewed as a 0-100 "score" for the standardized tertiary index). Special cases: $100 \times \mathrm{CDF}(-\infty) = 0$; $100 \times \mathrm{CDF}(0) = 50$; $100 \times \mathrm{CDF}(+\infty) = 100$.
Null value handling:
For gaps that remain in the index data set after null-filling, the percentile-scaled standardized tertiary index scores use one of two null-handling schemes:
Zero filling: fill with 0, the worst value (non-disclosure is treated as the worst level).
Mean filling: fill with 50, the mean value (non-disclosure is treated as the average level).
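Percentile scaling and the two null-handling schemes can be sketched in a few lines. The standard normal CDF is written with `math.erf`, and `None` stands for a remaining gap; `percentile_score` is a hypothetical name.

```python
import math

def percentile_score(z, fill="mean"):
    """Map a standardized tertiary index value to 0-100 via 100 * Phi(z).
    Missing values (None) get 0 under "zero" filling or 50 under "mean" filling."""
    if z is None:
        if fill == "zero":
            return 0.0
        if fill == "mean":
            return 50.0
        raise ValueError("fill must be 'zero' or 'mean'")
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, a standardized value of 1.96 scores about 97.5, and an undisclosed value scores 0 or 50 depending on the scheme.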
2) Weighted summation of tertiary indexes
As noted above, the univariate logistic regression p-value of a standardized tertiary index reflects, to some extent, the factor's significance and hence its importance (the smaller the p-value, the more significant the factor). On this basis, the tertiary index weights are set so that highly (less) significant factors receive high (low) weights. Because the p-values of different tertiary indexes span many orders of magnitude, they are unsuitable as weights directly and are discretized instead. Two p-value partitions and their corresponding scores are shown in Tables 2.1 and 2.2:
Table 2.1 Five-level p-value partition
p-value | (0, 0.001] | (0.001, 0.01] | (0.01, 0.05] | (0.05, 0.1] | (0.1, 1]
Score | 5 | 4 | 3 | 2 | 1
Table 2.2 Ten-level p-value partition
p-value | (0, 10^-64] | (10^-64, 10^-32] | (10^-32, 10^-16] | (10^-16, 10^-8] | (10^-8, 10^-4]
Score | 10 | 9 | 8 | 7 | 6
p-value | (10^-4, 10^-3] | (0.001, 0.01] | (0.01, 0.05] | (0.05, 0.1] | (0.1, 1]
Score | 5 | 4 | 3 | 2 | 1
Note: the main difference between the two partitions is that the ten-level partition distinguishes more finely among factors of different significance levels than the five-level partition.
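Both partitions use left-open, right-closed intervals (a, b], so a lookup with `bisect_left` over the upper bounds returns the score directly. The sketch below assumes that a p-value that underflows to exactly 0 should receive the top score; the constant and function names are hypothetical.

```python
from bisect import bisect_left

FIVE_LEVEL = ([0.001, 0.01, 0.05, 0.1, 1.0], [5, 4, 3, 2, 1])
TEN_LEVEL = ([1e-64, 1e-32, 1e-16, 1e-8, 1e-4, 1e-3, 0.01, 0.05, 0.1, 1.0],
             [10, 9, 8, 7, 6, 5, 4, 3, 2, 1])

def p_value_score(p, scheme=FIVE_LEVEL):
    """Score of a p-value: the first upper bound b with p <= b selects the interval (a, b]."""
    bounds, scores = scheme
    if not 0.0 <= p <= 1.0:  # p == 0 (numerical underflow) maps to the top score
        raise ValueError("p-value must lie in [0, 1]")
    return scores[bisect_left(bounds, p)]
```

For example, p = 0.001 scores 5 under the five-level partition, while p = 1e-10 scores 7 under the ten-level partition.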
With these preparations complete, the secondary index score is calculated as follows:
$$S^{c}_{i,t} = \sum_{j \in J_i} w_j \, s^{c}_{j,t}$$
where:
$S^{c}_{i,t}$ - score of secondary index $i$ of company $c$ in quarter $t$
$s^{c}_{j,t}$ - percentile-scaled score of standardized tertiary index $j$ of company $c$ in quarter $t$, where $J_i$ is the set of tertiary indexes subordinate to secondary index $i$
$w_j$ - factor weight of tertiary index $j$ (its internal weight within the secondary index), calculated as:
$$w_j = \frac{\mathrm{score}_j}{\sum_{k \in J_i} \mathrm{score}_k}$$
$\mathrm{score}_k$ - determined by the p-value of tertiary index $k$, as shown in Table 2.1 or Table 2.2.
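The secondary index synthesis for one company and quarter can be sketched as follows. It assumes two dictionaries keyed by tertiary index name, one holding percentile scores and the other holding p-value scores from Table 2.1 or 2.2; `secondary_score` is a hypothetical name.

```python
def secondary_score(scaled, p_scores):
    """scaled: {tertiary index j: percentile score s_j (0-100)} under one secondary index.
    p_scores: {j: p-value score}. Returns sum_j w_j * s_j,
    with w_j = score_j / sum_k score_k over the same secondary index."""
    total = sum(p_scores[j] for j in scaled)
    return sum(p_scores[j] / total * s for j, s in scaled.items())
```

With percentile scores of 80 and 40 and p-value scores of 3 and 1, the weights are 0.75 and 0.25, giving a secondary score of 70. Since the weights sum to 1, the result stays within 0-100.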
This method is used to synthesize the secondary indexes of standardized index data sets A/B, forming secondary index score data sets A/B, whose form is shown in the following table:
Table 2.3 Example of the form of secondary index score data sets
[Table image not available]
Note: depending on the p-value partition and the null-filling scheme, secondary index score data sets A/B come in four variants, p-value partition × null filling (2 × 2):
5-level partition / zero filling | 5-level partition / mean filling
10-level partition / zero filling | 10-level partition / mean filling
The following describes how the secondary index score data obtained above are used to score the credit of the listed-company sample set (sample set A) and the bond-issuer sample set (sample set B). The subject's credit score is a weighted average of the secondary indexes: each secondary factor has a weight, and the products of the factors and their weights are summed to obtain the credit score. The secondary indexes are shown in the following table:
Secondary factor name   Factor score   Weight
National economy   X1   W1
Market factor   X2   W2
Regional economic development   X3   W3
Industry development   X4   W4
Profitability   X5   W5
Solvency   X6   W6
Operating capacity   X7   W7
Development capacity   X8   W8
Debt structure   X9   W9
Altman Z-score early-warning factor   X10   W10
ESG   X11   W11
The secondary index weights must satisfy the following conditions:
$$\sum_{i=1}^{k} W_i = 1, \qquad 0 < W_i < 1$$
wherein:
$W_i$: weight corresponding to each secondary index;
$k$: total number of secondary indexes; k = 10 without the ESG index and k = 11 with it.
During model construction, the ESG factor is treated as a controllable factor that can be added or removed at any time. Because the model's accuracy in predicting negative credit events (defaults and downgrades) may differ depending on whether the ESG factor is included, the extent to which its inclusion affects prediction accuracy needs to be verified. Verification is therefore performed by adding or removing only the ESG factor while keeping all other conditions unchanged, with k taking the following values: 10 secondary indexes (k = 10) without the ESG index and 11 secondary indexes (k = 11) with it.
When calculating the credit score, each secondary index is assigned its own weight. Initially, each weight is given a random value between 0 and 1, so that the weights of the secondary indexes form a weight group whose values sum to 1. The credit score is then obtained by multiplying each secondary index by its weight and summing the products. The calculation formula is as follows:
$$Score = \sum_{i=1}^{k} X_i W_i$$
wherein:
$X_i$: score of the i-th secondary factor;
$W_i$: weight corresponding to the i-th secondary factor;
$k$: number of secondary factors.
Matrix operations are used in the model to calculate the credit scores of all samples in sample set A and sample set B:
$$\begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,k} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m,1} & X_{m,2} & \cdots & X_{m,k} \end{bmatrix} \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_k \end{bmatrix} = \begin{bmatrix} Score_1 \\ Score_2 \\ \vdots \\ Score_m \end{bmatrix}$$
The resulting m × 1 matrix contains the credit scores of the debt-issuing subjects in the sample set.
wherein:
$X_{m,k}$: the value in row m, column k of the sample matrix (the k-th secondary factor score of the m-th sample point). The maximum value of m is the number of sample points selected for calculation: in sample set A, m is the total number of samples in the selected sampling quarter interval; in sample set B, m = 6251. k = 10 without the ESG index and k = 11 with it.
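The weight-group constraint and the matrix form of the credit score can be sketched with NumPy. Drawing the weights from a flat Dirichlet distribution is an assumption made here for convenience; the source only requires random weights in (0, 1) that sum to 1. The factor scores below are random placeholders, not data from the samples.

```python
import numpy as np


def random_weight_group(k: int, rng: np.random.Generator) -> np.ndarray:
    """One random weight group W: k weights in (0, 1) summing to 1.
    A flat Dirichlet draw satisfies both conditions (assumed sampler)."""
    return rng.dirichlet(np.ones(k))


def credit_scores(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Multiply the m x k secondary-factor score matrix by the k x 1 weight
    column, giving the m x 1 column of subject credit scores."""
    return X @ W.reshape(-1, 1)


rng = np.random.default_rng(0)
k = 11  # 11 secondary indexes with the ESG index, 10 without it
X = rng.uniform(0.0, 100.0, size=(5, k))  # placeholder scores for m = 5 subjects
W = random_weight_group(k, rng)
scores = credit_scores(X, W)
print(scores.shape)  # (5, 1)
```

Because each weight group sums to 1, every credit score is a convex combination of the factor scores and therefore stays within their 0 to 100 range.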
The following describes how subjects in the debt-issuing enterprise sample set (sample set B) are credit-rated. Three points should be noted before rating:
Weights: the training set of the samples to be rated (sample set B) and the A-share listed company sample set (sample set A) use the same weight combination to calculate subject credit scores.
Credit rating: for sample set A, the rating intervals are divided by the percentile cutting method into the following grades: "AAA, AA+, AA, AA-, A+, A, A-, BBB+, BBB, BBB-, BB+, BB, BB-, B+, B, B-, CCC, CC, C", where "AAA" is the highest subject credit rating and "C" the lowest.
Rating date range: the samples to be rated (sample set B) and sample set A must use rating intervals from the same quarter.
When rating the debt-issuing subjects in sample B, each debt-issuing enterprise is matched to sample set A by quarter date. Within the same quarter, the rating interval derived from sample A in which the subject's credit score falls is looked up, and the corresponding rating is recorded as that subject's rating for the quarter.
First, sample set A is sampled by date. To sample the A-share listed companies more fully, the model uses two sampling intervals:
Method 1: sample the A-share subjects by single quarter. For example, when sampling the fourth quarter of 2019, the quarter date is 2019/12/31 (each quarter is represented by its last day; e.g., the first quarter of 2019 corresponds to 2019/3/31).
Method 2: sample the A-share subjects over the four most recent quarters. For example, when sampling the fourth quarter of 2019, the four quarters selected are the first, second, third and fourth quarters of 2019.
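A small helper can generate the quarter-end dates used by the two sampling methods. The source only gives the fourth-quarter-2019 example; treating method 2 as a rolling window of the four most recent quarters (which may cross a year boundary) is an assumption.

```python
from datetime import date
from typing import List

# Each quarter is represented by its last day.
QUARTER_END = {1: (3, 31), 2: (6, 30), 3: (9, 30), 4: (12, 31)}


def quarter_end(year: int, quarter: int) -> date:
    month, day = QUARTER_END[quarter]
    return date(year, month, day)


def sampling_dates(year: int, quarter: int, method: int) -> List[date]:
    """Method 1: the given quarter only. Method 2 (assumed rolling window):
    the four most recent quarters ending with the given quarter."""
    if method == 1:
        return [quarter_end(year, quarter)]
    if method != 2:
        raise ValueError("method must be 1 or 2")
    dates = []
    y, q = year, quarter
    for _ in range(4):
        dates.append(quarter_end(y, q))
        y, q = (y, q - 1) if q > 1 else (y - 1, 4)
    return sorted(dates)


print(sampling_dates(2019, 4, 2))
```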
To keep the numbers of subjects in the best and worst ratings within a reasonable range, the rating intervals are divided at the sample quantiles corresponding to a set of percentiles, giving 19 ratings for the A-share listed company sample set (sample set A). Nineteen ratings are used because this division is finer-grained and widely accepted in the market. The percentiles and the corresponding ratings are as follows:
[Table: percentiles and corresponding ratings]
Example: this example uses the model that includes ESG data, with each secondary index in the weight group given the equal weight 1/11 × 100% ≈ 9.091%. The credit scores of the subjects in the A-share sample for the fourth quarter of 2019 are sorted in descending order, the sample quantiles corresponding to the percentiles are calculated, and the corresponding intervals are constructed.
Credit score interval   Rating
(60.0121,100.0] AAA
(57.5263,60.0121] AA+
(55.3494,57.5263] AA
(53.8226,55.3494] AA-
(52.4593,53.8226] A+
(51.1228,52.4593] A
(50.1600,51.1228] A-
(49.1918,50.1600] BBB+
(48.3791,49.1918] BBB
(47.4195,48.3791] BBB-
(46.5744,47.4195] BB+
(45.4332,46.5744] BB
(44.5816,45.4332] BB-
(43.3876,44.5816] B+
(41.7768,43.3876] B
(40.2099,41.7768] B-
(38.3641,40.2099] CCC
(34.5747,38.3641] CC
(0.0,34.5747] C
The rating intervals are left-open and right-closed: the left endpoint is the lower bound and the right endpoint the upper bound. To ensure that the intervals constructed from the quantiles cover the full 0 to 100 score range, the lowest interval (below the quantile at the lowest percentile, 0.0526) is extended down to 0, and the quantile at the highest percentile (1.0) is replaced by 100. The resulting 19 intervals are assigned ratings from high to low in order.
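The interval construction and lookup can be sketched as follows. Two assumptions are made: the 18 inner cut points sit at the percentiles i/19 (i = 1, ..., 18), which matches the lowest percentile 0.0526 ≈ 1/19 but cannot be checked against the percentile table; and the sample quantile uses linear interpolation between order statistics. The usage example reuses the 2019 fourth-quarter cut points listed in the interval table above.

```python
import bisect
import math
from typing import List, Sequence

RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
           "BB+", "BB", "BB-", "B+", "B", "B-", "CCC", "CC", "C"]


def quantile(sorted_vals: Sequence[float], p: float) -> float:
    """Sample quantile with linear interpolation between order statistics."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def build_cuts(scores: Sequence[float]) -> List[float]:
    """18 ascending inner cut points; the outer bounds are fixed at 0 and 100."""
    s = sorted(scores)
    return [quantile(s, i / 19) for i in range(1, 19)]  # assumed percentiles


def rate(score: float, cuts: Sequence[float]) -> str:
    """Left-open, right-closed intervals: (0, c1] -> C, ..., (c18, 100] -> AAA."""
    below = bisect.bisect_left(cuts, score)  # number of cut points < score
    return RATINGS[len(RATINGS) - 1 - below]


# Cut points of the 2019 Q4 example (equal weights, ESG included), ascending.
CUTS_2019Q4 = [34.5747, 38.3641, 40.2099, 41.7768, 43.3876, 44.5816, 45.4332,
               46.5744, 47.4195, 48.3791, 49.1918, 50.1600, 51.1228, 52.4593,
               53.8226, 55.3494, 57.5263, 60.0121]
print(rate(50.5, CUTS_2019Q4))  # A-
```

Using `bisect_left` makes a score equal to a cut point fall into the lower interval, which is what the right-closed convention requires.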
Using the credit score calculation above, the credit score of each sample subject is obtained, and the A-share rating intervals for the corresponding quarter are found from the report date. After the dates are matched, the interval containing the subject's credit score is located and its credit rating is output, which gives the credit rating of the debt-issuing subject in sample B.
Rating samples: the samples to be rated (sample B) used in the model are grouped by time into four sets. Considering the validity and comparability of the data over time, the model uses the data of two adjacent quarters, the last quarter and the second-to-last quarter, for testing.
Rating process: the rating intervals are calculated in the A-share listed company sample set (sample set A). The credit scores of the debt-issuing subjects in the "last quarter" and "second-to-last quarter" test samples are then mapped, quarter by quarter, onto the A-share rating intervals to obtain each subject's credit rating in both samples.
To facilitate subsequent calculation, the credit ratings of debt-issuing subjects are labeled with codes according to the following rating code table.
Example rating code table:
[Table: rating code table]
The following is a rating example.
Sample description:
1) Credit score intervals: sample set A, including the ESG secondary index, with 10-level P-value division.
2) Weights: as assigned to each secondary index.
3) Samples to be rated: the two "last quarter" and "second-to-last quarter" data sets with 10-level P-value division.
4) Null-value filling: mean filling.
The meta information of the samples to be rated is as follows:
[Table: meta information of the samples to be rated]
The rating results and rating codes are as follows:
[Table: rating results and rating codes]
The table above shows the ratings and corresponding codes of the three samples. For Xiwang Group, the debt-issuing subject of the bond "19 Xiwang SCP001", the negative credit event occurred on 2019/12/30. Its second-to-last matching quarter is 2019/3/31, in which it is rated C (code 19); its last matching quarter is 2019/6/30, in which it is also rated C (code 19). Subtracting the last-period code from the second-to-last-period code gives a code change of 0, meaning the subject's rating stayed at C and did not change across the two rating periods from 2019/3/31 to 2019/6/30.
For Anhui Waijing Construction (Group) Co., Ltd., the debt-issuing subject of the bond "18 Wanjingjian MTN003", the negative credit event occurred on 2019/12/24. Its second-to-last matching quarter is 2019/3/31, in which it is rated B (code 15); its last matching quarter is 2019/6/30, in which it is rated B- (code 16). Subtracting the last-period code from the second-to-last-period code gives a code change of -1, meaning the subject was downgraded across the two rating periods from 2019/3/31 to 2019/6/30.
For Kaidi Ecological and Environmental Technology Co., Ltd., the debt-issuing subject of "H6 Kaidi 03", the negative credit event occurred on 2019/12/16. Its second-to-last matching quarter is 2019/3/31, in which it is rated BB- (code 13); its last matching quarter is 2019/6/30, in which it is rated BB+ (code 11). Subtracting the last-period code from the second-to-last-period code gives a code change of +2, meaning the subject was upgraded across the two rating periods from 2019/3/31 to 2019/6/30.
A code change of 0 means the subject's rating did not change during the period; a change less than 0 means the rating was downgraded; a change greater than 0 means it was upgraded. With this calculation, the rating and code of a sample subject in a specific quarter can be obtained, and comparing the codes shows the trend of the subject's rating.
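The rating code table itself is not reproduced in this text. The sketch below assumes the codes run from 1 (AAA) to 19 (C) in rating order, which agrees with every code quoted in the three examples above (C = 19, B = 15, B- = 16, BB- = 13, BB+ = 11). It computes the code change as the second-to-last-period code minus the last-period code.

```python
RATINGS = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
           "BB+", "BB", "BB-", "B+", "B", "B-", "CCC", "CC", "C"]
# Assumed rating code table: AAA -> 1, ..., C -> 19.
CODE = {rating: i + 1 for i, rating in enumerate(RATINGS)}


def code_change(second_last: str, last: str) -> int:
    """Second-to-last-period code minus last-period code:
    < 0 downgrade, 0 unchanged, > 0 upgrade."""
    return CODE[second_last] - CODE[last]


def trend(second_last: str, last: str) -> str:
    change = code_change(second_last, last)
    if change < 0:
        return "downgraded"
    return "unchanged" if change == 0 else "upgraded"


# The three rating transitions from the examples in the text
for pair in [("C", "C"), ("B", "B-"), ("BB-", "BB+")]:
    print(pair, code_change(*pair), trend(*pair))
```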
The model is optimized as follows.
To measure model accuracy, the model chooses its objective function according to which aspect of performance is to be emphasized, since the various metrics weigh different considerations. Recall is used as the optimization objective because it measures, among the samples in which a default actually occurred, the proportion that the model predicts as defaults; the higher this proportion, the more accurately the model identifies the samples that actually default.
Description of each objective function:
1) Confusion matrix
Actual \ Predicted   0 (no negative credit event)   1 (negative credit event)
0                    TN                             FP
1                    FN                             TP
The table above is a confusion matrix, which records the correspondence between the actual values of the samples and the model's predictions. No negative credit event (0) means the subject had no default or downgrade in the corresponding period; a negative credit event (1) means the subject had a default or downgrade in the corresponding period. When the actual value is 0 and the prediction is 0, the result is a TN (true negative); actual 0 and predicted 1 is an FP (false positive); actual 1 and predicted 0 is an FN (false negative); actual 1 and predicted 1 is a TP (true positive).
2) Recall
Recall = TP / (TP + FN)
Recall is the proportion of samples that actually had a negative credit event which the model correctly predicts as having one; the larger it is, the more of the default samples the model predicts correctly. Therefore, in model construction, the aim is to find a suitable weight set (W) that maximizes the model's final recall.
3) Precision
Precision = TP / (TP + FP)
Precision is the proportion of the model's predicted negative credit events that are correct; the larger it is, the more reliable the model's predictions of negative credit events.
4) Accuracy
Accuracy = (TP + TN) / (TP + FN + FP + TN)
Accuracy measures the proportion of all samples that the model predicts correctly; the higher it is, the better the model's overall predictions.
5) F1 score
F1 = 2 × (Recall × Precision) / (Recall + Precision)
The F1 score is the harmonic mean of recall and precision and reflects the model's overall predictive ability.
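The four metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative: returning 0.0 when a denominator is zero is a convention chosen here rather than something the source specifies, and the label vectors are made up.

```python
from typing import Dict, Sequence


def confusion(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TN, FP, FN, TP with 1 = negative credit event, 0 = none."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        key = ("TP" if p == 1 else "FN") if a == 1 else ("FP" if p == 1 else "TN")
        counts[key] += 1
    return counts


def metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    c = confusion(actual, predicted)
    tp, tn, fp, fn = c["TP"], c["TN"], c["FP"], c["FN"]
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = (2 * recall * precision / (recall + precision)
          if recall + precision else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0]
print(metrics(actual, predicted))
```

Here the model catches 3 of the 4 actual events (recall 0.75) but raises 2 false alarms (precision 0.6), which illustrates why optimizing recall alone favors a model that flags generously.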
To ensure the model's accuracy, the sample B data set is divided into a training set and a test set. The training set is used to train the model and outputs the optimal weight combination (Wopt); the test set is used to judge the model's prediction accuracy with Wopt and outputs the value of the objective function. The bond-issuer samples in sample B are randomly divided into training and test sets at a ratio of 4:1, and model testing has two phases: a training phase and a testing phase. The training and test samples are independent of each other.
After the scores of the subjects in the training samples are calculated and mapped onto the A-share rating intervals of the same quarter, each subject's rating codes for the two quarters are obtained from the rating code table.
Code change = code of the second-to-last-period rating - code of the last-period rating
When judging the model's predictions, a subject is labeled as having a negative credit event (1) when its rating code change is less than a or its last-period rating code is greater than b:
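$$y = \begin{cases} 1, & \Delta C < a \ \text{or} \ C_{\text{last}} > b \\ 0, & \text{otherwise} \end{cases}$$
where $\Delta C$ is the code change (second-to-last-period code minus last-period code) and $C_{\text{last}}$ is the code of the last-period rating.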
The value a is a threshold on the change in the sample's credit rating code, computed by subtracting the last-period code from the second-to-last-period code, and b is a threshold on the last-period rating code. The model assigns two values to each, with the following meanings:
a value   Usage   Description
0   (a = 0)   The subject's rating is downgraded
-2   (a = -2)   The subject's rating is downgraded by at least three notches
b value   Usage   Description
16   (b = 16)   The last-period rating is CCC or below
18   (b = 18)   The last-period rating is C
From the a and b tables above, each a value can be paired with either of the 2 b values, so there are 2 × 2 = 4 combinations of a and b, as shown in the following table:
[Table: the four combinations of a and b]
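The labeling rule and its four (a, b) combinations can be expressed as a small function. This sketch takes rating codes as integers and assumes they run from 1 (AAA) to 19 (C), so that a last-period code above 16 means CCC or below. The three transitions used in the loop are the C to C, B to B- and BB- to BB+ examples given earlier in this section.

```python
from itertools import product
from typing import List, Tuple


def negative_event(change: int, last_code: int, a: int, b: int) -> int:
    """1 (negative credit event) if the code change is below a or the
    last-period code is above b, else 0."""
    return 1 if change < a or last_code > b else 0


# a in {0, -2}, b in {16, 18}: 2 x 2 = 4 combinations
COMBINATIONS: List[Tuple[int, int]] = list(product((0, -2), (16, 18)))

# (code change, last-period code) of the three example transitions
examples = {"C -> C": (0, 19), "B -> B-": (-1, 16), "BB- -> BB+": (2, 11)}
for a, b in COMBINATIONS:
    labels = {name: negative_event(ch, last, a, b)
              for name, (ch, last) in examples.items()}
    print((a, b), labels)
```

Under (a, b) = (-2, 16), the setting chosen later in the text, the one-notch B to B- downgrade is not flagged, whereas (0, 16) flags it.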
To handle missing tertiary index data in the model's test data sets, two missing-value filling schemes are used.
Scheme 1: before the tertiary indexes are synthesized into secondary indexes, data points with null values are filled with 0 to form the data set.
Scheme 2: before the tertiary indexes are synthesized into secondary indexes, data points with null values are filled with the mean of the corresponding tertiary index in the data set to form the sample data set.
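Both filling schemes operate on one tertiary index column at a time. A minimal sketch, assuming missing values are represented as `None` and that the mean is taken over the non-missing values of the same index in the data set:

```python
from statistics import mean
from typing import List, Optional


def fill_nulls(values: List[Optional[float]], scheme: str) -> List[float]:
    """Fill missing values (None) of one tertiary index column before it is
    synthesized into a secondary index.
    scheme="zero": scheme 1, fill with 0.
    scheme="mean": scheme 2, fill with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    if scheme == "zero":
        fill = 0.0
    elif scheme == "mean":
        # Assumption: an all-missing column falls back to 0.
        fill = mean(present) if present else 0.0
    else:
        raise ValueError("scheme must be 'zero' or 'mean'")
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, None]
print(fill_nulls(column, "zero"))  # [1.0, 0.0, 3.0, 0.0]
print(fill_nulls(column, "mean"))  # [1.0, 2.0, 3.0, 2.0]
```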
Two methods are used when synthesizing secondary indexes from tertiary indexes: Method 1 divides the tertiary indexes into 5 levels by P value; Method 2 divides them into 10 levels by P value.
Depending on whether ESG data are used, there are two cases. Case 1, without ESG data: ESG data are introduced into neither the sample nor the A-share data set. Case 2, with ESG data: ESG data are introduced into both the sample and the A-share data set, while all other factors are kept unchanged. The model data set hierarchy is shown in Fig. 3.
The weight combination is assigned at random, and the model is trained on the training set under the constraints that every weight is greater than 0 and all weights sum to 1; that is, for the given weight combination, the recall of the model's predicted values against the true values is calculated. This process is repeated many times on the training set of each data set, recording the model recall for each weight set. The recalls are then sorted from high to low, and the weight sets with the top thirty recalls are averaged. This average weight set is taken as the global optimal weight and applied to the test data set to calculate the test-set recall. The average weight set is the model's optimal weight under the given data set division and negative credit event definition, and the recall on the test set is the recall of the final model.
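The training procedure amounts to a random search over weight groups followed by averaging the 30 best. In the sketch below, `train_recall` is a hypothetical callable standing in for the whole scoring, rating, labeling and recall pipeline on the training set, and normalized exponential draws are one assumed way to produce random weights that are positive and sum to 1. The 4:1 random split of sample B is included for completeness.

```python
import random
from typing import Callable, List, Sequence, Tuple


def split_4_to_1(samples: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Randomly split sample B into disjoint training and test sets (4:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) * 4 // 5
    return items[:cut], items[cut:]


def random_weights(k: int, rng: random.Random) -> List[float]:
    """k positive weights summing to 1 (normalized exponential draws)."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def search_weights(train_recall: Callable[[Sequence[float]], float], k: int,
                   n_iter: int = 1000, top: int = 30, seed: int = 0) -> List[float]:
    """Random search: score n_iter random weight groups by training-set recall,
    keep the `top` best groups and return their element-wise mean (Wopt)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_iter):
        w = random_weights(k, rng)
        trials.append((train_recall(w), w))
    trials.sort(key=lambda t: t[0], reverse=True)
    best = [w for _, w in trials[:top]]
    return [sum(col) / len(best) for col in zip(*best)]


# Toy objective standing in for the real training-set recall.
train, test = split_4_to_1(range(100))
w_opt = search_weights(lambda w: w[0], k=11, n_iter=500)
print(len(train), len(test), round(sum(w_opt), 6))
```

Averaging the top groups keeps the result a valid weight group, since the mean of weight vectors that each sum to 1 also sums to 1, and it smooths out the noise of relying on a single best recall value.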
The following focuses on how, after the secondary index data are obtained, the secondary index scores of sample subjects are used to determine subject credit ratings, and how the model's optimal weight Wopt is calculated and used. Simulated scenario: a private fund company wants to use the three-excellence credit model to assess the risk of debt-issuing subjects within its investment universe. Requirements: predict as accurately as possible the subjects among its targets that actually experience negative credit events, while keeping the model's judgment of such subjects moderately strict. Approach: to meet this private fund's needs, the following conditions are set for the model:
1) The model uses recall as the objective function and finds the corresponding optimal weight combination Wopt that maximizes recall.
2) When a subject's rating falls by at least three notches during the evaluation period, or the subject is rated CCC or below in the last period, the subject is labeled as having a negative credit event (1); this condition keeps the model's judgment of subjects that actually experience negative credit events moderately strict.
3) Selected data sets: sample A with 10-level P-value division, and sample B consisting of the last-quarter and second-to-last-quarter data sets before the negative credit events, also with 10-level P-value division.
4) Training/test split: the negative credit event subject sample (sample B) is divided into a training set and a test set.
5) ESG data: ESG data are included in both sample A and sample B.
6) Sampling date: an appropriate sampling quarter is selected for sample A (this example uses single-quarter sampling).
Finally, a = -2 and b = 16 are used: when a subject's rating falls by at least three notches during the evaluation period, or the subject is rated CCC or below in the last period, the subject is labeled as having a negative credit event (the positive class, 1, in the classification problem).
After the secondary indexes are obtained, a weight set Wi satisfying the conditions above (see the factor weight section for details) is drawn at random, the credit scores of the sample A subjects are calculated with the same Wi, and the scores are divided into 19 grades corresponding to the credit rating intervals. The credit rating interval mapping is as follows:
[Table: credit rating interval mapping]
Note: the table uses the 2019 fourth-quarter scoring intervals of sample A.
After the mapping is obtained from sample A, the secondary factors of the subjects in the training sets of the last-quarter and second-to-last-quarter samples (subjects that experienced negative credit events) are combined by weighted averaging with the same weight set Wi to obtain subject credit scores. Using the mapping between credit score intervals and credit ratings obtained from sample A, these scores are mapped to the credit ratings of the debt-issuing subjects of sample B in the last-quarter and second-to-last-quarter samples. A debt-issuing subject is defined as having a negative credit event (labeled 1, the same below) if its ratings in the last and second-to-last quarters satisfy either of the following rules:
The subject's rating falls by at least three notches between the second-to-last and last quarters (e.g., if the rating falls from "A" in the second-to-last quarter to "BBB-" in the last quarter, the subject is labeled 1).
The subject is rated CCC or below in the last quarter (e.g., if the subject is rated "C" in the last quarter, it is labeled 1).
The recall rates obtained by the model over many iterations are sorted from high to low. Because judging the optimal weight group by a single recall value may be error-prone, the weight groups with the top 30 recall rates are averaged to obtain the global optimal weight group Wopt. Substituting Wopt into the test set gives the model's recall on the test sample set, which is 0.89. The optimal weight group Wopt is as follows:
[Table: optimal weight group Wopt]
The model can be used by clients with different risk preferences:
High risk-control requirements: these clients have low tolerance for asset risk and favor low-risk investments with stable returns. Scheme 2 is recommended when using the model to assess subjects: it defines the rating-downgrade threshold and the latest credit rating strictly, so potential risks among investment targets can be found more comprehensively.
Low risk-control requirements: these clients have higher tolerance for asset risk, favor higher-risk assets and pursue high returns. Scheme 1 is recommended when using the model to assess subjects: this general scheme defines sample downgrades and the latest rating loosely, which helps broaden the range of investment targets while keeping default judgments accurate.
Choice of optimization objective in the model (recall, precision, accuracy, F1 score):
Recall: the model aims to identify accurately the subjects in the sample that actually experienced negative credit events.
Precision: the model aims for its predicted negative credit events to be accurate.
Accuracy: the model aims to judge accurately both subjects with and subjects without negative credit events.
F1 score: the model balances the accuracy of predicting actual negative credit events against the accuracy of its predicted negative credit events.
The schemes are divided into a general scheme and an aggressive scheme.
Scheme 1: general scheme
[Table: general scheme]
Scheme 2: aggressive scheme
[Table: aggressive scheme]
Classification performance on the training and test sets under the different schemes:
1) Zero filling
[Tables: classification results with zero filling]
2) Mean filling
[Tables: classification results with mean filling]
As shown in Fig. 4, the present application further provides an apparatus for constructing an enterprise ESG three-excellence credit model, comprising:
a sample set determining module 101, configured to determine a sample set, where each sample in the sample set includes meta information;
an index data set acquisition module 102, configured to collect data on the model index factors according to the meta information to obtain an index data set;
a standardized index data set acquisition module 103, configured to standardize the index data set to obtain a standardized index data set;
an index score data set acquisition module 104, configured to percentage-scale the standardized index data set and then perform weighted summation to obtain an index score data set;
a credit scoring module 105, configured to score the credit of the samples according to the index score data set;
and a credit rating module 106, configured to divide the credit scores into intervals to obtain the credit ratings of the samples.
In the embodiment of the invention, the model index factors have three layers: the first layer consists of first-level indexes, the second layer contains the second-level indexes under each first-level index, and the third layer contains the third-level indexes under each second-level index; the third-level indexes form the model's candidate factor pool. The third-level indexes in the candidate factor pool are classified into class I and class II according to whether their values are independent of or specific to individual enterprises.
The standardized index data set acquisition module 103 comprises a first standardization module and a second standardization module: the first standardizes class I third-level indexes using a moving-window Z-score, and the second standardizes class II third-level indexes using the current-quarter Z-score.
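The two standardization modes can be sketched as follows. The source does not state the window length or whether the window includes the current quarter; the 8-quarter trailing window (current quarter included), the population standard deviation and returning 0 for a constant window are all assumptions made for illustration. Reading class I indexes as values shared by all enterprises (so they must be standardized over time) and class II indexes as enterprise-specific values (so they can be standardized across enterprises within a quarter) is likewise an interpretation of the classification above.

```python
from statistics import mean, pstdev
from typing import List


def moving_window_z(series: List[float], window: int = 8) -> List[float]:
    """Class I: standardize each quarter's value against the trailing `window`
    quarters (current quarter included). NaN until a full window exists."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(float("nan"))
            continue
        win = series[t + 1 - window: t + 1]
        sd = pstdev(win)
        out.append((series[t] - mean(win)) / sd if sd > 0 else 0.0)
    return out


def cross_section_z(values: List[float]) -> List[float]:
    """Class II: standardize each company's value against all companies in the
    same quarter."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


print(cross_section_z([1.0, 2.0, 3.0]))
print(moving_window_z([1.0, 2.0, 3.0, 4.0], window=3))
```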
Further, the construction apparatus also comprises an index factor screening module, configured to screen the third-level index factors in the standardized index data set, specifically by: testing the classification ability of each third-level index factor with a univariate logistic regression model to evaluate its significance, and removing third-level index factors of low significance; transforming the third-level index factors so that all of them are negatively correlated with the occurrence of negative credit events; and performing multicollinearity diagnostics on the third-level index factors and removing some of the highly correlated ones.
Further, the construction apparatus also comprises a model optimization module, configured to: divide the sample set into a training set and a test set; randomly assign a weight combination on the training set and feed it into the model for training; calculate the recall of the model's predicted values against the actual values under that weight combination; repeat these steps many times for each training set, recording the recall of each weight combination; sort the recalls from high to low and average the top-ranked weight combinations to obtain an average weight combination; and apply the average weight combination as the global optimal weight to the test set to obtain the test-set recall.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In summary, the beneficial effects of the invention include the following. By bringing enterprise environmental, social and governance (ESG) information into the enterprise credit rating system, the credit quality of bond issuers can be identified effectively and the overall credit of debtors measured more comprehensively. ESG non-financial performance factors help compensate for the shortcomings of traditional financial indicators: traditional financial indicators usually measure an enterprise's past performance, whereas ESG focuses more on its current and future development potential, which helps protect creditors' interests and promotes the safety of the financial system. When synthesizing secondary indexes, the model uses the discretized p-value score of each third-level index's single-factor logistic regression coefficient as the basis for distributing third-level index weights. This reflects, to some extent, the significance of each index factor and hence its importance (the smaller the p-value, the more significant the factor), and differentiates the third-level indexes within each merged secondary index according to their ability to predict defaults and downgrades. During construction, the model tests the third-level indexes for multicollinearity and, taking economic meaning into account, removes homogeneous third-level indexes from the index library; this controls model complexity, improves the model's robustness to a certain extent, and, by grouping third-level indexes, makes the model easier to understand. The method does not rely on expert scoring and builds the model only from public, objective data, further ensuring the model's objectivity and ease of use.
Compared with traditional sell-side credit rating models, the three-excellence (Sanyou) credit model is designed from the investor's perspective. It can mine the correlation between factors and bond credit risk more deeply and gives credit rating results from the buy-side perspective. The Sanyou credit model has the following main features. (1) ESG helps identify credit quality. The model incorporates a domestic ESG evaluation system independently developed by the International Institute of Green Finance of the Central University of Finance and Economics. This system comprehensively measures an enterprise's ESG performance through qualitative indicators, quantitative indicators and negative risk indicators across three dimensions: environmental protection, social responsibility and corporate governance. Compared with existing traditional models, the ESG credit model therefore provides multi-dimensional indicator references. (2) Theory is combined with practice. The model is built on a large body of credit research literature and on practical experience in bond credit evaluation; this close integration gives it both practical value and theoretical depth. (3) Early-warning capability is strengthened. The model provides a buy-side early-warning function that can flag bond defaults and rating downgrades. Through continued strengthening of model interpretability, adjustment of factor weights and empirical research, a complete ESG credit model is ultimately formed, whose rating adjustments can be more accurate and timely than the ratings of sell-side agencies in the market.
It should be understood that the method for constructing the enterprise ESG three-excellence credit model provided by the invention may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The technical scope of the present invention is not limited to the above description. Those skilled in the art may make various changes and modifications to the above embodiments without departing from the technical spirit of the present invention, and such changes and modifications shall fall within the protection scope of the present invention.

Claims (10)

1. A method for constructing an enterprise ESG three-excellence credit model, characterized by comprising the following steps:
step 1, determining a sample set, wherein samples in the sample set comprise meta-information;
step 2, collecting data for the model index factors according to the meta-information to obtain an index data set;
step 3, standardizing the index data set to obtain a standardized index data set;
step 4, converting the standardized index data set to a percentage scale and then performing weighted summation to obtain an index score data set;
step 5, performing credit scoring on the samples according to the index score data set;
and step 6, dividing the credit scores into intervals to obtain the credit ratings of the samples.
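Steps 4 to 6 of claim 1 can be illustrated with a small sketch. It assumes that conversion to a percentage scale means min-max rescaling of each standardized index to 0 to 100 across samples. The rating cut-offs in `RATING_BANDS` are invented for illustration; the claim fixes neither choice.

```python
from typing import Dict, List, Sequence, Tuple


def to_hundred_point(values: Sequence[float]) -> List[float]:
    """Assumed step 4a: min-max rescale one standardized index across samples to 0..100."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: place every sample at the midpoint
        return [50.0] * len(values)
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def credit_scores(std_data: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Steps 4b-5: weighted sum of the rescaled indexes for each sample."""
    scaled = {name: to_hundred_point(col) for name, col in std_data.items()}
    n_samples = len(next(iter(scaled.values())))
    return [sum(weights[name] * scaled[name][i] for name in weights) for i in range(n_samples)]


# Hypothetical rating bands: (lower bound of score, grade), highest band first.
RATING_BANDS: List[Tuple[float, str]] = [(80.0, "AAA"), (60.0, "AA"), (40.0, "A"), (20.0, "BBB"), (0.0, "BB")]


def to_rating(score: float) -> str:
    """Step 6: interval division of a credit score into a rating grade."""
    for lower, grade in RATING_BANDS:
        if score >= lower:
            return grade
    return RATING_BANDS[-1][1]


if __name__ == "__main__":
    # Two hypothetical standardized indexes for three samples.
    std = {"financial": [-1.0, 0.0, 1.0], "esg": [2.0, 0.0, -2.0]}
    scores = credit_scores(std, {"financial": 0.6, "esg": 0.4})
    print([round(s, 6) for s in scores], [to_rating(s) for s in scores])
```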
2. The method of claim 1, wherein the model index factors are divided into three levels: the first level consists of primary indexes, the second level comprises secondary indexes corresponding to the primary indexes, and the third level comprises tertiary indexes corresponding to the secondary indexes, the tertiary indexes forming a model candidate factor pool.
3. The method for constructing an enterprise ESG three-excellence credit model according to claim 2, wherein standardizing the index data set in step 3 specifically comprises: classifying the tertiary indexes in the model candidate factor pool into class I and class II according to the independence and relevance of the index values with respect to the enterprise, wherein class I tertiary indexes are standardized by a moving-window Z value and class II tertiary indexes are standardized by a current-quarter Z value.
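The two standardization schemes in claim 3 can be sketched on a small panel keyed by (firm, quarter). The pooling rules are assumptions, since the claim does not spell them out. The current-quarter Z value is taken against all firms in the same quarter. The moving-window Z value is taken against all observations in the trailing `window` quarters, where `window` is a hypothetical parameter.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# (firm_id, quarter_index) -> raw value of one tertiary index
Panel = Dict[Tuple[str, int], float]


def _z(x: float, pool: List[float]) -> float:
    """Z value of x against a reference pool; 0 when the pool has no dispersion."""
    sd = pstdev(pool)
    return 0.0 if sd == 0 else (x - mean(pool)) / sd


def current_quarter_z(panel: Panel) -> Panel:
    """Class II: standardize against all firms observed in the same quarter."""
    return {
        (firm, q): _z(x, [v for (_, q2), v in panel.items() if q2 == q])
        for (firm, q), x in panel.items()
    }


def moving_window_z(panel: Panel, window: int = 4) -> Panel:
    """Class I: standardize against all observations in quarters q-window+1 .. q (assumed pooling rule)."""
    return {
        (firm, q): _z(x, [v for (_, q2), v in panel.items() if q - window < q2 <= q])
        for (firm, q), x in panel.items()
    }


if __name__ == "__main__":
    panel = {("A", 1): 1.0, ("B", 1): 3.0, ("A", 2): 5.0, ("B", 2): 7.0}
    print(current_quarter_z(panel))
    print(moving_window_z(panel, window=2))
```

In this toy panel, both firms move up between quarters. The current-quarter Z value hides that shift because each quarter is centered separately. The moving-window Z value keeps it because the earlier quarter stays in the reference pool.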
4. The method for constructing an enterprise ESG three-excellence credit model according to claim 2, further comprising, after step 3, screening the tertiary index factors in the standardized index data set, specifically:
testing the classification capability of each individual tertiary index factor with a univariate logistic regression model to evaluate its significance, and eliminating tertiary index factors with low significance;
transforming the tertiary index factors so that every tertiary index factor is negatively correlated with the occurrence of negative credit events;
and performing a multicollinearity diagnosis on the tertiary index factors and removing some of the highly correlated tertiary index factors.
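A minimal, dependency-free sketch of the three screening steps in claim 4 follows. The univariate logistic regression is fitted by Newton-Raphson, and a Wald test is applied to the slope. The significance level `alpha` and the correlation threshold `r_max` are hypothetical. As a simplification, a greedy pairwise Pearson correlation filter stands in for the multicollinearity diagnosis; it is not a full diagnostic such as variance inflation factors. The sketch assumes each factor column is non-constant and that default and non-default samples are not perfectly separable.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _grad_and_info(x, y, b0, b1):
    """Score vector and Fisher information of the log-likelihood for P(y=1) = sigmoid(b0 + b1*x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def univariate_logit(x: Sequence[float], y: Sequence[int], iters: int = 25) -> Tuple[float, float]:
    """Fit a one-factor logistic regression; return (slope, two-sided Wald p-value of the slope)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = _grad_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton-Raphson step: beta += I^{-1} * score, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _grad_and_info(x, y, b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    return b1, math.erfc(abs(z) / math.sqrt(2.0))


def _pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def screen_factors(data: Dict[str, List[float]], y: List[int],
                   alpha: float = 0.10, r_max: float = 0.8) -> Dict[str, List[float]]:
    """Apply the three screening steps; y[i] = 1 marks a negative credit event (default/downgrade)."""
    oriented: Dict[str, List[float]] = {}
    for name, col in data.items():
        slope, p_value = univariate_logit(col, y)
        if p_value > alpha:  # step 1: drop factors with low significance
            continue
        # step 2: flip sign so that a higher value always means a lower chance of a negative event
        oriented[name] = [-v for v in col] if slope > 0 else list(col)
    selected: Dict[str, List[float]] = {}
    for name, col in oriented.items():  # step 3 (simplified): greedy pairwise-correlation filter
        if all(abs(_pearson(col, kept)) <= r_max for kept in selected.values()):
            selected[name] = col
    return selected
```

The greedy filter keeps whichever factor of a correlated pair appears first. The "economic meaning" review mentioned in the description would still have to decide which one of a redundant pair to retain.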
5. The method of claim 1, further comprising optimizing the model by: dividing the sample set into a training set and a test set;
randomly assigning a weight combination on the training set and substituting the weight combination into the model for training;
calculating the recall of the model's predicted values against the actual values under that weight combination;
repeating the above steps multiple times on each training set and successively recording the recall corresponding to each weight combination;
sorting the recalls from all repetitions in descending order, and averaging the weight combinations corresponding to the top-ranked recalls to obtain an average weight combination;
and substituting the average weight combination into the test set as the global optimal weights to obtain the recall on the test set.
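The weight-optimization loop of claim 5 can be sketched as a random search that keeps the weight combinations with the highest training recall and averages them. The claim fixes none of the following assumptions. Weights are drawn uniformly and normalized to sum to 1. A sample is predicted as a negative credit event when its weighted score falls below a hypothetical `threshold` of 50 on the 0 to 100 scale. `n_trials` and `top_k` are illustrative values.

```python
import random
from typing import List, Sequence


def recall(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Share of actual negative credit events (label 1) that the model flags (score below threshold)."""
    positives = sum(labels)
    hits = sum(1 for s, lab in zip(scores, labels) if lab == 1 and s < threshold)
    return hits / positives if positives else 0.0


def weighted_scores(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in X]


def random_weights(k: int, rng: random.Random) -> List[float]:
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def search_weights(X_train, y_train, n_trials=200, top_k=10, threshold=50.0, seed=0) -> List[float]:
    """Random search on the training set; return the mean of the top_k weight combinations by recall."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        w = random_weights(len(X_train[0]), rng)
        trials.append((recall(weighted_scores(X_train, w), y_train, threshold), w))
    trials.sort(key=lambda t: t[0], reverse=True)  # largest recall first
    best = [w for _, w in trials[:top_k]]
    return [sum(w[j] for w in best) / len(best) for j in range(len(best[0]))]


if __name__ == "__main__":
    # Toy data: feature 0 separates defaulters (low) from non-defaulters; feature 1 does not.
    X = [[10.0, 90.0], [90.0, 90.0], [15.0, 90.0], [85.0, 90.0]]
    y = [1, 0, 1, 0]
    w_avg = search_weights(X, y)
    print(w_avg, recall(weighted_scores(X, w_avg), y, 50.0))
```

Recall alone rewards weight combinations that flag many samples. A practical implementation would therefore likely track a second metric, such as precision, alongside it, although the claim names only recall.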
6. The method of claim 1, wherein, when tertiary index data are missing for samples in the sample set, data points with null values are filled with 0 or with the mean value.
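Claim 6's missing-value rule can be sketched for a single tertiary-index column. The claim does not say which population the mean is taken over. The sketch assumes it is the mean of the observed values in the same index column and uses `None` to stand in for a null value.

```python
from typing import List, Optional


def fill_missing(column: List[Optional[float]], strategy: str = "mean") -> List[float]:
    """Replace None (null) entries of one tertiary-index column with 0 or with the column mean."""
    if strategy not in ("zero", "mean"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    observed = [v for v in column if v is not None]
    if strategy == "zero" or not observed:
        fill = 0.0  # also used when the whole column is missing and no mean exists
    else:
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]
```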
7. A device for constructing an enterprise ESG three-excellence credit model, characterized by comprising:
a sample set determination module for determining a sample set, wherein the samples in the sample set comprise meta-information;
the index data set acquisition module is used for carrying out data collection on the model index factors according to the meta information to acquire an index data set;
the standardized index data set acquisition module is used for carrying out standardized processing on the index data set to acquire a standardized index data set;
the index score data set acquisition module is used for converting the standardized index data set to a percentage scale and then performing weighted summation to obtain an index score data set;
the credit scoring module is used for performing credit scoring on the samples according to the index score data set;
and the credit rating module is used for dividing the credit scores into intervals to obtain the credit ratings of the samples.
8. The apparatus of claim 7, wherein the model index factors are divided into three levels: the first level consists of primary indexes, the second level comprises secondary indexes corresponding to the primary indexes, and the third level comprises tertiary indexes corresponding to the secondary indexes, the tertiary indexes forming a model candidate factor pool;
the tertiary indexes in the model candidate factor pool are classified into class I and class II according to the independence and relevance of the index values with respect to the enterprise;
the standardized index data set acquisition module comprises a first standardization module and a second standardization module, wherein the first standardization module is used for standardizing the class I tertiary indexes by a moving-window Z value;
and the second standardization module is used for standardizing the class II tertiary indexes by a current-quarter Z value.
9. The apparatus of claim 8, further comprising an index factor screening module for screening the tertiary index factors in the standardized index data set, the screening specifically comprising:
testing the classification capability of each individual tertiary index factor with a univariate logistic regression model to evaluate its significance, and eliminating tertiary index factors with low significance;
transforming the tertiary index factors so that every tertiary index factor is negatively correlated with the occurrence of negative credit events;
and performing a multicollinearity diagnosis on the tertiary index factors and removing some of the highly correlated tertiary index factors.
10. The apparatus of claim 7, further comprising a model optimization module configured for: dividing the sample set into a training set and a test set;
randomly assigning a weight combination on the training set and substituting the weight combination into the model for training;
calculating the recall of the model's predicted values against the actual values under that weight combination;
repeating the above steps multiple times on each training set and successively recording the recall corresponding to each weight combination;
sorting the recalls from all repetitions in descending order, and averaging the weight combinations corresponding to the top-ranked recalls to obtain an average weight combination;
and substituting the average weight combination into the test set as the global optimal weights to obtain the recall on the test set.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011000208.5A CN112232377A (en) 2020-09-22 2020-09-22 Method and device for constructing ESG (environmental, social and governance) three-excellence credit model of enterprise

Publications (1)

Publication Number Publication Date
CN112232377A (en) 2021-01-15

Family

ID=74107330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011000208.5A Pending CN112232377A (en) 2020-09-22 2020-09-22 Method and device for constructing ESG (environmental, social and governance) three-excellence credit model of enterprise

Country Status (1)

Country Link
CN (1) CN112232377A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103154991A (en) * 2010-07-23 2013-06-12 汤森路透环球资源公司 Credit risk mining
CN108564465A (en) * 2018-05-03 2018-09-21 上海第二工业大学 A kind of enterprise credit management method
JP2018169873A (en) * 2017-03-30 2018-11-01 株式会社 みずほ銀行 Rating evaluation system, rating evaluation method, and rating evaluation program
WO2019140675A1 (en) * 2018-01-22 2019-07-25 大连理工大学 Method for determining credit rating optimal weight vector on basis of maximum default discriminating ability for approximating an ideal point
CN110472884A (en) * 2019-08-20 2019-11-19 深圳前海微众银行股份有限公司 ESG index monitoring method, device, terminal device and storage medium
CN110533528A (en) * 2019-08-30 2019-12-03 北京市天元网络技术股份有限公司 Assess the method and apparatus of business standing
CN111222790A (en) * 2020-01-06 2020-06-02 深圳前海微众银行股份有限公司 Method, device and equipment for predicting risk event occurrence probability and storage medium

Non-Patent Citations (5)

Title
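GUOTAI CHI et al.: "Multi Criteria Credit Rating Model for Small Enterprise Using a Nonparametric Method", Sustainability, pages 1 - 23 *
姜海燕 (Jiang Haiyan): "Construction and Testing of a Logistic Regression Model for Financial Crisis Early Warning: A Case Study of China's Listed Manufacturing Companies" (in Chinese), China Outstanding Doctoral and Master's Theses Full-text Database (Master's), Economics and Management Science, no. 12, pages 152 - 602 *
张德丰 (Zhang Defeng): "TensorFlow Deep Learning: From Beginner to Advanced" (in Chinese), China Machine Press, pages 121 - 122 *
郑亚男 (Zheng Yanan): "Research on Credit Rating of China's Power Equipment and New Energy Enterprises" (in Chinese), China Master's Theses Full-text Database, Economics and Management Science, no. 6, pages 150 - 240 *
饶泽炜 (Rao Zewei): "Design of a Corporate Bond Default Risk Identification Scheme Incorporating ESG Information" (in Chinese), China Master's Theses Full-text Database, Engineering Science and Technology I, no. 7, pages 027 - 323 *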

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN113222255A (en) * 2021-05-17 2021-08-06 上海生腾数据科技有限公司 Method and device for contract performance quantification and short-term default prediction
CN113222255B (en) * 2021-05-17 2024-03-05 上海生腾数据科技有限公司 Method and device for quantifying contract performance and predicting short-term violations
CN113570281A (en) * 2021-08-20 2021-10-29 瑞格人工智能科技有限公司 ESG index compiling method

Similar Documents

Publication Publication Date Title
Dalfard et al. Performance evaluation and prioritization of leasing companies using the super efficiency Data Envelopment Analysis model
CN112668945A (en) Enterprise credit risk assessment method and device
CN112598500A (en) Credit processing method and system for non-limit client
Chi et al. Debt rating model based on default identification: Empirical evidence from Chinese small industrial enterprises
Yoshino et al. A comprehensive method for credit risk assessment of small and medium-sized enterprises based on Asian data
CN112232377A (en) Method and device for constructing ESG (environmental, social and governance) three-excellence credit model of enterprise
Frade Credit Risk Modeling: Default Probabilities
Zhai et al. A financial ratio-based predicting model for hotel business failure
Malik et al. Z-score Model: analysis and implication on textile sector of Pakistan
KR20180104967A (en) Investment Value Index and Model
Khorasgani Optimal accounting based default prediction model for the UK SMEs
Abad et al. European government bond market integration in turbulent times [WP]
Tesfaye Determinants of Commercial Banks' Performance in Ethiopia
Estran et al. Development of a Shadow Rating Model 1
Azzahra et al. Comparative Analysis of Islamic Banks’ Performance in Indonesia and Malaysia with RGEC and the Islamicity Performance Index (2018-2021)
Nguyen et al. Applying appropriate models to predict bankruptcy for Vietnamese listed construction companies
Onyiri Predicting Financial Distress using Altman's Z-score and the Sustainable Growth Rate
Farhood Role of Accounting Information in Predicting the Financial Failure of Companies
Parnes A spline hazard model for current expected credit losses
Bawono et al. Analysis Of Indonesia's Islamic Banking Bankruptcy Prediction For Period 2014-2016
Sneidere et al. Predicting Business Insolvency: The Latvian Experience
Lausberg et al. Market data and methods for real estate portfolio ratings
Kaur et al. An MDA approach for early identification of firms requiring corporate debt restructuring
Essam Mahmoud et al. The Relationship between Earnings Management and Credit Ratings and their impact on Firm Performance: Evidence from Egypt
Motyani Relationship between financial risk and performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination