WO2019222902A1 - Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient - Google Patents

Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient

Info

Publication number
WO2019222902A1
Authority
WO
WIPO (PCT)
Prior art keywords
index
informedness
coefficient
default
indicators
Prior art date
Application number
PCT/CN2018/087773
Other languages
English (en)
French (fr)
Inventor
迟国泰
张志鹏
周颖
Original Assignee
大连理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大连理工大学
Priority to US16/969,476 (US20210056622A1)
Priority to PCT/CN2018/087773 (WO2019222902A1)
Publication of WO2019222902A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Definitions

  • The invention provides a method for selecting the optimal indicator combination of a credit rating system. The criterion for the optimal combination is that the Informedness coefficient of the credit score, which measures default discrimination ability, is maximal. Whether each indicator is selected into the combination is the decision variable, maximizing that Informedness coefficient is the objective function, and the requirement that indicators carrying duplicated information cannot be selected together is the constraint. A 0-1 programming model built from these elements selects the optimal indicator combination for credit rating. The method belongs to the field of credit service technology.
  • Credit is a borrowing activity subject to the repayment of principal and interest.
  • The purpose of credit rating is to evaluate a customer's credit grade and the corresponding default rate from the values and states of the rating indicators.
  • Selecting the optimal combination of credit rating indicators means choosing, among the many possible indicator combinations, the one with the highest default identification accuracy.
  • Existing research on building credit rating indicator systems by selecting indicator combinations falls mainly into three types: sequential selection methods, Lasso regression methods, and stepwise regression methods.
  • Sun Jie et al. (2011) used a sequential floating forward selection algorithm so that the finally selected indicator set is most similar in information content to the full indicator set.
  • Choi et al. (2015) used a hybrid Lasso method to screen an indicator set containing both discrete and continuous indicators and established an indicator system for a credit rating model.
  • Yiwen Chien et al. (2001) used stepwise regression to select indicators that affect credit card default, such as income and marital status.
  • The present invention uses 0-1 programming to find the indicator system with the largest Informedness coefficient, that is, the strongest default discrimination ability, so that the indicator system as a whole discriminates defaults well. By building into the 0-1 program the constraint "within a group of indicators carrying duplicated information, at most one is selected into the indicator combination", indicators carrying duplicated information are eliminated while the Informedness coefficient of the combination is maximized, which avoids information redundancy in the indicator system.
  • An object of the present invention is to provide a method for selecting the optimal indicator combination for credit rating that maximizes the Informedness coefficient of the credit score, i.e. its default discrimination ability.
  • Maximizing the Informedness coefficient IN of the credit score is used as the objective function.
  • With "within a group of indicators carrying duplicated information, at most one is selected into the indicator combination" as the constraint, a 0-1 programming model is established and solved for the set of 0-1 variables $c_i$ that mark whether each indicator is selected, together with the corresponding indicator combination. This ensures that the selected indicator system has the highest default identification accuracy while avoiding information redundancy.
  • The method for selecting the optimal combination of credit rating indicators based on the Informedness coefficient has 9 steps: steps 1-2 load and preprocess the data, steps 3-7 determine the objective function of the 0-1 program, step 8 determines its constraints, and step 9 solves the 0-1 programming model and determines the optimal indicator combination.
  • Step 2: Data preprocessing
  • There are several methods for standardizing indicator data; Max-Min is only one of them.
  • The Informedness coefficient $in_i$ of an indicator is used to measure the indicator's default discrimination ability.
  • a represents the number of customers who actually defaulted and were judged to be in default
  • b represents the number of customers who actually defaulted but were wrongly judged to be non-default
  • c represents the number of customers who did not actually default but were wrongly judged to be in default
  • d represents the number of customers who did not actually default and were judged to be non-default
  • In formula (1), a, b, c, and d are obtained by comparing the judged default state $D_j$ with the actual default state $T_j$. The judged default state is in turn obtained from a threshold $x_i^c$: when the value $x_{ij}$ of indicator i for customer j is greater than the threshold $x_i^c$ of indicator i, the customer is judged to be non-default; otherwise the customer is judged to be in default, that is:
  • The threshold that maximizes the Informedness coefficient $in_i$ of indicator i is taken as the threshold of indicator i
  • The corresponding maximum Informedness coefficient is taken as the Informedness coefficient of indicator i
  • Step 4: Remove the indicators with Informedness coefficient $in_i \le 0$, that is, the indicators that cannot discriminate the default state; the number of remaining indicators becomes $M_1$;
  • Step 5: Introduce the decision variable $c_i$ and assign the weights $w_i$ to the rating indicators
  • The rating indicators are weighted by their Informedness coefficients $in_i$, so that an indicator with a larger Informedness coefficient, and hence stronger default discrimination ability, receives a larger weight, i.e.:
  • $w_i$ represents the weight of the i-th indicator
  • $c_i$ is also the decision variable of the 0-1 programming model; $M_1$ represents the number of indicators that need to be weighted;
  • The expression for the customer credit score $S_j$ is constructed with a linear weighting formula, namely:
  • $w_i$ represents the weight of the i-th indicator
  • $x_{ij}$ is the value of the j-th customer under the i-th indicator
  • Step 7: Build the objective function of the 0-1 programming model by maximizing the Informedness coefficient IN of the credit score
  • The Informedness coefficient IN of the credit score therefore depends on the customers' credit scores;
  • When the selected indicators differ, that is, when $c_i$ differs, the indicator weights $w_i$ obtained in step 5 differ, the credit scores $S_j$ obtained in step 6 differ, and the Informedness coefficient IN of the credit score differs as well. With the maximum of IN as the objective function and the selection flags $c_i$ as the decision variables, a 0-1 program is built to select the group of indicators with the strongest default discrimination ability as the indicator system;
  • Step 8: Build the constraints of the 0-1 programming model
  • $c_k$ and $c_l$ are the 0-1 variables marking whether indicators k and l, a pair carrying duplicated information, are selected into the final indicator system; there are as many constraints of form (6) as there are pairs of indicators carrying duplicated information;
  • Step 9: Solve the 0-1 programming model and determine the optimal indicator combination
  • Among all indicator combinations, the group whose credit score has the largest Informedness coefficient, i.e. the strongest default discrimination ability, is selected as the optimal indicator combination, so that the final combination classifies default and non-default customers correctly to the greatest extent.
  • The present invention provides a method for selecting the optimal indicator combination for credit rating with maximum default discrimination ability based on the Informedness coefficient. It ensures that the default discrimination ability of the credit evaluation system as a whole is maximal, and offers a new method and a new line of thought for constructing credit rating indicator systems.
  • The present invention establishes a 0-1 programming model with maximization of the Informedness coefficient of the credit score as the objective function and, as the constraint, the requirement that indicators carrying duplicated information cannot be selected together. This approach solves the above problem.
  • FIG. 1 is a flowchart of the method for selecting the optimal credit rating indicator combination with maximum default discrimination ability based on the Informedness coefficient.
  • The Informedness coefficient is used to measure the default discrimination ability of the credit score.
  • Maximizing the Informedness coefficient (default discrimination ability) is the objective function, and the requirement that indicators carrying duplicated information cannot be selected together is the constraint of the programming model; the group of indicators whose credit score has the largest Informedness coefficient is selected to form the indicator system.
  • Step 1: Data loading
  • The first 81 indicators in column c of Table 1 are the observable candidate indicators from the initial broad screening.
  • Column b of Table 1 is the criterion layer corresponding to each indicator, and column d of Table 1 is the indicator type.
  • Rows 1-81 of columns 1-1451 of Table 1 hold the raw values of the credit rating indicators, and row 82 holds the default status values.
  • Step 2: Data preprocessing
  • There are several methods for standardizing indicator data; Max-Min is only one of them.
  • Rows 1-81 of columns 1452-2902 of Table 1 hold the standardized values of the 81 indicators.
  • Step 3: Calculate the default discrimination ability $in_i$ of each single candidate credit rating indicator
  • The Informedness coefficient $in_i$ of an indicator is used to measure the indicator's default discrimination ability. The larger an indicator's Informedness coefficient, the more actual default customers are judged as default and the more actual non-default customers are judged as non-default, that is, the indicator has default discrimination ability.
  • The formula for the Informedness coefficient of indicator $x_i$ is as follows:
  • a represents the number of customers who actually defaulted and were judged to be in default
  • b represents the number of customers who actually defaulted but were wrongly judged to be non-default
  • c represents the number of customers who did not actually default but were wrongly judged to be in default
  • d represents the number of customers who did not actually default and were judged to be non-default.
  • The values a, b, c, and d above are obtained by comparing the judged default state $D_j$ with the actual default state $T_j$.
  • The judged default state is obtained from the threshold $x_i^c$.
  • When the value $x_{ij}$ of indicator i for customer j is greater than the threshold $x_i^c$ of indicator i, the customer is judged to be non-default; otherwise the customer is judged to be in default, that is:
  • Step 4: Remove the indicators with Informedness coefficient $in_i \le 0$, that is, the indicators that cannot discriminate the default state; the number of remaining indicators becomes $M_1$.
  • Step 5: Introduce the decision variable $c_i$ and assign the weights $w_i$ to the rating indicators
  • $w_i$ represents the weight of the i-th indicator
  • $c_i$ is also the decision variable of the 0-1 programming model; $M_1$ represents the number of indicators that need to be weighted.
  • Step 6: Construct the functional relationship between the customer credit score $S_j$ and the indicator weights $w_i$.
  • $w_i$ represents the weight of the i-th indicator
  • $x_{ij}$ is the value of the j-th customer under the i-th indicator.
  • Step 7: Build the objective function of the 0-1 programming model by maximizing the Informedness coefficient IN of the credit score
  • When $c_i$ differs, the indicator weights $w_i$ obtained in step 5 differ, the credit scores $S_j$ obtained in step 6 differ, and the Informedness coefficient IN of the credit score differs as well. With the maximum of IN as the objective function and the selection flags $c_i$ as the decision variables, a 0-1 program is built to select the group of indicators with the strongest default discrimination ability as the indicator system.
  • Step 8: Build the constraints of the 0-1 programming model
  • $c_k$ and $c_l$ are 0-1 variables marking whether indicators k and l, respectively, are selected into the final indicator system.
  • Step 9: Solve the 0-1 programming model and determine the optimal indicator combination
  • The third column of Table 3 is the indicator combination made up of the 29 indicators with the largest individual Informedness coefficients among all non-redundant indicators.
  • The Informedness coefficient of the customer credit score based on that combination is 0.885, clearly smaller than the 0.973 obtained for the combination constructed with the method of this patent. This shows that a combination of indicators that are individually strong at discriminating defaults does not necessarily have strong default discrimination ability as a whole.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Physics (AREA)
  • Technology Law (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Development Economics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

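The invention provides a method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient, and belongs to the field of credit service technology. It addresses two problems: existing credit evaluation systems cannot guarantee the strongest overall default discrimination ability, and the correlation between indicators is not considered when a group of indicators is selected. The criterion for the optimal indicator combination is that the Informedness coefficient of the credit score, which measures default discrimination ability, is maximal. Whether each indicator is selected into the combination is the decision variable, maximizing the Informedness coefficient is the objective function, and the requirement that indicators carrying duplicated information cannot be selected together is the constraint of a 0-1 programming model that selects the optimal indicator combination for credit rating. The method ensures that the default discrimination ability of the credit evaluation system as a whole, measured by the Informedness coefficient, is maximal, and gives banks, credit rating agencies and similar institutions a decision basis for identifying credit risk effectively.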

Description

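Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient
Technical Field
The invention provides a method for selecting the optimal indicator combination of a credit rating system. The criterion for the optimal combination is that the Informedness coefficient of the credit score, which measures default discrimination ability, is maximal. Whether each indicator is selected into the combination is the decision variable, maximizing the Informedness coefficient is the objective function, and the requirement that indicators carrying duplicated information cannot be selected together is the constraint of a 0-1 programming model that selects the optimal indicator combination for credit rating. The method belongs to the field of credit service technology.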
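Background
Credit is a borrowing activity conditional on the repayment of principal and interest. Credit rating aims to evaluate a customer's credit grade and the corresponding default rate from the values and states of the rating indicators. Selecting the optimal indicator combination for credit rating means choosing, among the many possible combinations of credit rating indicators, the one with the highest default discrimination accuracy.
Because every indicator is either selected or not selected, the number of possible combinations is enormous, which makes finding the optimal one correspondingly hard. Each indicator has two possible states (in the combination or not), and whether one indicator is selected does not affect whether any other is selected. The number of combinations is therefore the product of the number of possibilities (2) for each indicator, so n indicators give 2×2×…×2 = 2^n combinations.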
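As a quick illustration of the count above (the values of n are chosen here only as examples and are not taken from this paragraph):

```latex
\underbrace{2 \times 2 \times \cdots \times 2}_{n\ \text{factors}} = 2^{n},
\qquad 2^{10} = 1024,
\qquad 2^{77} \approx 1.5 \times 10^{23}.
```

Even a few dozen candidate indicators already put a one-by-one comparison of all combinations out of practical reach.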
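Existing research on the selection of evaluation indicators falls into two categories: credit rating indicator selection based on single indicators, and credit rating indicator selection based on indicator combinations.
For indicator systems built by selecting single indicators, Guotai Chi (2017) started from an initial indicator set covering both repayment ability and willingness to repay, used the rank-sum test to screen single indicators able to discriminate default status and rank correlation analysis to eliminate indicators carrying duplicated information, and finally established a small-enterprise credit evaluation indicator system covering the 5C principles of character, capital, capacity, business conditions and collateral. Wang Di (2016) selected single indicators to form an indicator system using several selection methods, including F-score, information gain ratio and the Pearson correlation coefficient.
Existing research on indicator systems built by selecting indicator combinations mainly uses three types of method: sequential selection, Lasso regression, and stepwise regression. For example, Sun Jie et al. (2011) used a sequential floating forward selection algorithm so that the finally selected indicator set is most similar in information content to the full indicator set. Choi et al. (2015) used a hybrid Lasso method to screen an indicator set containing both discrete and continuous indicators and established an indicator system for a credit rating model. Yiwen Chien et al. (2001) used stepwise regression to select indicators that affect credit card default, such as income and marital status.
Existing research has the following problems when constructing indicator systems. On the one hand, it builds the system only from the viewpoint of whether each single indicator can discriminate defaults, and ignores the fact that individually strong indicators do not necessarily give an indicator system with strong overall default discrimination ability. On the other hand, even when a group of credit rating indicators is selected, sequential selection algorithms, Lasso algorithms and stepwise regression all ignore the correlation between indicators and are very likely to admit indicators that carry the same information, which makes the indicator system redundant.
The present invention uses 0-1 programming to find the indicator system with the largest Informedness coefficient, that is, the strongest default discrimination ability, which guarantees the ability of the indicator system as a whole to discriminate defaults. By building into the 0-1 program the constraint "within a group of indicators carrying duplicated information, at most one is selected into the indicator combination", indicators carrying duplicated information are eliminated while the Informedness coefficient of the combination is maximized, which avoids information redundancy in the indicator system.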
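Summary of the Invention
The object of the present invention is to provide a method for selecting the optimal indicator combination for credit rating that maximizes the Informedness coefficient of the credit score, i.e. its default discrimination ability.
Technical solution of the present invention:
The method rests on the idea that the more accurately customers' default states are judged, the larger the Informedness coefficient of the credit score. Maximizing the Informedness coefficient IN of the credit score is the objective function, and "within a group of indicators carrying duplicated information, at most one is selected into the indicator combination" is the constraint. The resulting 0-1 programming model is solved for the set of 0-1 variables $c_i$ that mark whether each indicator is selected, together with the corresponding indicator combination. This ensures that the selected indicator system has the highest default discrimination accuracy while avoiding information redundancy.
The method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient has 9 steps. Steps 1-2 load and preprocess the data, steps 3-7 determine the objective function of the 0-1 program, step 8 determines the constraints of the 0-1 program, and step 9 solves the 0-1 programming model and determines the optimal indicator combination. The steps are as follows: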
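Step 1: Data loading
Load the data of the $M_0$ initial credit rating indicators for N customers, together with the default status of the N customers, into an Excel file, where default = 1 and non-default = 0;
Step 2: Data preprocessing
Standardize the data of the candidate credit rating indicators to remove the effect of differing units;
There are several methods for standardizing indicator data; Max-Min is only one of them.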
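The text names Max-Min standardization without giving its formula, so the following is only a minimal sketch of one common form. The function name `min_max_scale`, the `positive` flag (larger-is-better versus smaller-is-better indicators) and the toy data are assumptions made for illustration.

```python
def min_max_scale(values, positive=True):
    """Scale one indicator's raw values to [0, 1] by min-max standardization.

    positive=True  : larger raw values map to larger scores (max -> 1).
    positive=False : smaller raw values map to larger scores (min -> 1).
    The positive/negative distinction is an assumption for illustration.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant indicator cannot be scaled; map every value to 0.
        return [0.0 for _ in values]
    if positive:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


# Toy data (hypothetical): one indicator observed for four customers.
raw = [10.0, 20.0, 15.0, 30.0]
scaled = min_max_scale(raw)
```

After scaling, every indicator lies in [0, 1], so indicators measured in different units can enter the same weighted score.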
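Step 3: Calculate the default discrimination ability $in_i$ of each single candidate credit rating indicator
The Informedness coefficient $in_i$ of an indicator measures the indicator's default discrimination ability. The larger the Informedness coefficient, the more actual default customers are judged as default and the more actual non-default customers are judged as non-default, that is, the indicator has default discrimination ability. The Informedness coefficient of indicator i is:
$$in_i = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{1}$$
In formula (1), a is the number of customers who actually defaulted and were judged to be in default; b is the number of customers who actually defaulted but were wrongly judged to be non-default; c is the number of customers who did not actually default but were wrongly judged to be in default; d is the number of customers who did not actually default and were judged to be non-default;
The values a, b, c and d in formula (1) are obtained by comparing the judged default state $D_j$ with the actual default state $T_j$. The judged default state is in turn obtained from the threshold $x_i^c$: when the value $x_{ij}$ of indicator i for customer j is greater than the threshold $x_i^c$ of indicator i, the customer is judged to be non-default; otherwise the customer is judged to be in default, that is:
$$D_j = \begin{cases} 0, & x_{ij} > x_i^c \\ 1, & x_{ij} \le x_i^c \end{cases} \tag{2}$$
Every value of indicator i observed among the customers is taken in turn as the threshold and used to judge the default state of all customers. The threshold that maximizes the Informedness coefficient $in_i$ of indicator i is the threshold of indicator i, and the corresponding maximum Informedness coefficient is the Informedness coefficient of indicator i;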
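The threshold scan of step 3 can be sketched as follows. This is an illustrative reading of the step, not the patent's implementation: it assumes the Informedness coefficient is the usual true-positive rate plus true-negative rate minus 1, and the function names and toy data are hypothetical.

```python
def informedness(actual, judged):
    """a/(a+b) + d/(c+d) - 1, with default coded as 1 and non-default as 0."""
    a = sum(1 for tj, dj in zip(actual, judged) if tj == 1 and dj == 1)
    b = sum(1 for tj, dj in zip(actual, judged) if tj == 1 and dj == 0)
    c = sum(1 for tj, dj in zip(actual, judged) if tj == 0 and dj == 1)
    d = sum(1 for tj, dj in zip(actual, judged) if tj == 0 and dj == 0)
    if a + b == 0 or c + d == 0:
        raise ValueError("need at least one default and one non-default customer")
    return a / (a + b) + d / (c + d) - 1.0


def best_threshold(values, actual):
    """Try every observed value as the threshold.

    Returns (largest informedness found, threshold that achieves it).
    """
    best_in, best_thr = None, None
    for thr in sorted(set(values)):
        # Rule from the text: value > threshold -> non-default (0), else default (1).
        judged = [0 if v > thr else 1 for v in values]
        score = informedness(actual, judged)
        if best_in is None or score > best_in:
            best_in, best_thr = score, thr
    return best_in, best_thr


# Toy data (hypothetical): seven customers, low values tend to default.
x = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
t = [1, 1, 1, 0, 1, 0, 0]
in_i, thr_i = best_threshold(x, t)
```

On this toy data the threshold 0.3 catches three of the four defaulters with no false alarms, which is the best trade-off among the seven candidate thresholds.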
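Step 4: Remove the indicators with Informedness coefficient $in_i \le 0$, that is, the indicators that cannot discriminate the default state; the number of remaining indicators becomes $M_1$
Step 5: Introduce the decision variable $c_i$ and assign the weights $w_i$ to the rating indicators
The rating indicators are weighted by their Informedness coefficients $in_i$, so that an indicator with a larger Informedness coefficient, and hence stronger default discrimination ability, receives a larger weight, that is:
$$w_i = \frac{c_i \, in_i}{\sum_{k=1}^{M_1} c_k \, in_k} \tag{3}$$
In formula (3), $w_i$ is the weight of the i-th indicator; $c_i$ marks whether the i-th indicator is selected into the indicator system, with $c_i = 1$ if it is selected and $c_i = 0$ otherwise, and $c_i$ is also the decision variable of the 0-1 programming model for the optimal indicator combination; $M_1$ is the number of indicators that need to be weighted;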
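A minimal sketch of the weighting in step 5 follows. It assumes that formula (3) normalizes $c_i \cdot in_i$ over the selected indicators so that the weights sum to 1; the function name and the numbers are hypothetical.

```python
def indicator_weights(in_values, selected):
    """Weights proportional to c_i * in_i; unselected indicators get weight 0.

    in_values : informedness coefficient of each remaining indicator (> 0)
    selected  : 0-1 flags c_i, one per indicator
    """
    total = sum(c * v for c, v in zip(selected, in_values))
    if total <= 0:
        raise ValueError("select at least one indicator with positive informedness")
    return [c * v / total for c, v in zip(selected, in_values)]


# Toy data (hypothetical): four indicators, the second one is not selected.
in_values = [0.5, 0.3, 0.2, 0.4]
selected = [1, 0, 1, 1]
w = indicator_weights(in_values, selected)
```

Because the weights depend on which indicators are selected, changing a single $c_i$ changes every non-zero weight, which is why the objective later has to be re-evaluated for each candidate combination.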
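Step 6: Construct the functional relationship between the customer credit score $S_j$ and the indicator weights $w_i$
The expression for the customer credit score $S_j$ is constructed with a linear weighting formula, that is:
$$S_j = \sum_{i=1}^{M_1} w_i \, x_{ij} \tag{4}$$
In formula (4), $w_i$ is the weight of the i-th indicator and $x_{ij}$ is the value of the j-th customer under the i-th indicator;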
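The linear weighting of step 6 can be sketched as below. The data layout (one row per indicator, one column per customer, mirroring the description of Table 1) and all numbers are assumptions for illustration.

```python
def credit_scores(x, w):
    """S_j = sum_i w_i * x_ij, where x[i][j] is indicator i for customer j."""
    n_customers = len(x[0])
    return [sum(w[i] * x[i][j] for i in range(len(w)))
            for j in range(n_customers)]


# Toy data (hypothetical): two standardized indicators, three customers.
x = [[0.0, 0.5, 1.0],   # indicator 1
     [1.0, 0.5, 0.0]]   # indicator 2
w = [0.75, 0.25]
s = credit_scores(x, w)
```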
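Step 7: Build the objective function of the 0-1 programming model by maximizing the Informedness coefficient IN of the credit score
Replacing the indicator values in step 3 with the credit scores gives the Informedness coefficient of the credit score, denoted IN. Maximizing IN is the objective function, as in formula (5):
$$\max \; IN = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{5}$$
The Informedness coefficient IN in formula (5) is determined by a and d, that is, by comparing the judged default states $D_j$ of all customers with their true default states $T_j$, so $IN = f(D_j, T_j)$. The judged default state is in turn determined by how each customer's credit score $S_j$ compares with the credit score threshold $S^c$, so $IN = f[g(S_j, S^c), T_j]$. The Informedness coefficient IN of the credit score therefore depends on the customers' credit scores;
The customer's credit score $S_j$ is the linear weighting of the customer's indicator values $x_{ij}$ by the indicator weights $w_i$, as shown in formula (4), so $IN = f[h(x_{ij}, w_i), T_j]$. The weight $w_i$ is in turn a function of the 0-1 programming variable $c_i$ and the indicator's Informedness coefficient $in_i$, as shown in formula (3), so $IN = f\{h[x_{ij}, q(c_i, in_i)], T_j\}$. The Informedness coefficient IN of the credit score is therefore a function of the decision variables $c_i$;
When the selected indicators differ, that is, when $c_i$ differs, the indicator weights $w_i$ obtained in step 5 differ, the credit scores $S_j$ obtained in step 6 differ, and the Informedness coefficient IN of the credit score differs as well. With the maximum of IN as the objective function and the selection flags $c_i$ as the decision variables, a 0-1 program is built to select the group of indicators with the strongest default discrimination ability as the indicator system;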
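The chain "selection flags, then weights, then scores, then IN" described in step 7 can be sketched as a single function of the 0-1 vector. This is an illustrative composition under the same assumptions as the formulas it follows (normalized weights, linear score, score threshold scanned over the observed scores); the names and toy data are hypothetical.

```python
def informedness(actual, judged):
    """a/(a+b) + d/(c+d) - 1, with default coded as 1."""
    a = sum(1 for tj, dj in zip(actual, judged) if tj == 1 and dj == 1)
    d = sum(1 for tj, dj in zip(actual, judged) if tj == 0 and dj == 0)
    n_default = sum(actual)
    n_good = len(actual) - n_default
    return a / n_default + d / n_good - 1.0


def objective_in(c, x, in_values, actual):
    """IN(c): largest informedness of the credit score for indicator subset c."""
    total = sum(ci * v for ci, v in zip(c, in_values))
    if total <= 0:
        return float("-inf")              # empty selection: no score can be formed
    # Weights proportional to c_i * in_i (step 5).
    w = [ci * v / total for ci, v in zip(c, in_values)]
    # Linear weighted credit score per customer (step 6).
    scores = [sum(w[i] * x[i][j] for i in range(len(c)))
              for j in range(len(actual))]
    best = float("-inf")
    for thr in set(scores):               # scan candidate score thresholds
        judged = [0 if s > thr else 1 for s in scores]
        best = max(best, informedness(actual, judged))
    return best


# Toy data (hypothetical): two indicators, four customers (first two default).
x = [[0.1, 0.2, 0.7, 0.9],
     [0.9, 0.1, 0.8, 0.2]]
actual = [1, 1, 0, 0]
in_values = [1.0, 0.5]
```

For example, `objective_in([1, 0], x, in_values, actual)` evaluates the combination that contains only the first indicator, and `objective_in([1, 1], x, in_values, actual)` evaluates the combination of both.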
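Step 8: Build the constraints of the 0-1 programming model
Indicators carrying duplicated information are identified by rank correlation analysis. If the rank correlation coefficient of a pair of indicators is greater than or equal to 0.8, the pair carries duplicated information. For each such pair an inequality constraint is established, which guarantees that within a group of indicators carrying duplicated information at most one is selected into the final system, as shown in formula (6):
$$c_k + c_l \le 1 \tag{6}$$
Here $c_k$ and $c_l$ are the 0-1 variables marking whether indicators k and l, a pair carrying duplicated information, are selected into the final indicator system. There are as many constraints of form (6) as there are pairs of indicators carrying duplicated information;
There are several methods for determining whether indicators carry duplicated information; the rank correlation method is only one of them;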
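A sketch of how the pairwise constraints of step 8 might be generated is given below. It assumes that "rank correlation" means the Spearman coefficient and applies the 0.8 cutoff to the signed coefficient exactly as the text states; the helper names and toy data are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for k in order[pos:end + 1]:
            ranks[k] = avg
        pos = end + 1
    return ranks


def spearman(u, v):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    ru, rv = average_ranks(u), average_ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((p - mu) * (q - mv) for p, q in zip(ru, rv))
    var_u = sum((p - mu) ** 2 for p in ru)
    var_v = sum((q - mv) ** 2 for q in rv)
    if var_u == 0 or var_v == 0:
        return 0.0                        # a constant indicator has no rank ordering
    return cov / (var_u * var_v) ** 0.5


def redundant_pairs(x, cutoff=0.8):
    """Index pairs (k, l), k < l, whose rank correlation is >= cutoff."""
    m = len(x)
    return [(k, l) for k in range(m) for l in range(k + 1, m)
            if spearman(x[k], x[l]) >= cutoff]


def satisfies_constraints(c, pairs):
    """Check c_k + c_l <= 1 for every redundant pair."""
    return all(c[k] + c[l] <= 1 for k, l in pairs)


# Toy data (hypothetical): indicators 0 and 1 rise together, indicator 2 does not.
x = [[1, 2, 3, 4, 5],
     [2, 4, 6, 8, 11],
     [5, 3, 4, 1, 2]]
pairs = redundant_pairs(x)
```

Each pair returned by `redundant_pairs` corresponds to one inequality of form (6), so the number of constraints equals the number of pairs found.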
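Step 9: Solve the 0-1 programming model and determine the optimal indicator combination
With formula (5) as the objective function and formula (6) as the constraints, build the 0-1 programming model and solve it to obtain the indicator combination whose credit score has the largest Informedness coefficient IN, together with that maximum Informedness coefficient (default discrimination ability);
Through the 9 steps above, the group of indicators whose credit score has the largest Informedness coefficient among all indicator combinations is selected as the optimal indicator combination, which ensures that the final combination classifies default and non-default customers correctly to the greatest extent.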
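The text does not say which algorithm solves the 0-1 program. Purely as an illustration of what "solving" means here, the sketch below enumerates every 0-1 vector, discards those that violate a constraint of form (6), and keeps the best objective value. Exhaustive enumeration is an assumption for toy sizes only; it is not practical for dozens of indicators. The `objective` argument stands for any function that evaluates formula (5) for a given selection vector, and the stand-in objective and its numbers are hypothetical.

```python
from itertools import product


def solve_by_enumeration(m, pairs, objective):
    """Return (best_value, best_c) over all feasible 0-1 vectors of length m."""
    best_value, best_c = float("-inf"), None
    for c in product((0, 1), repeat=m):
        if any(c[k] + c[l] > 1 for k, l in pairs):
            continue                      # violates a constraint c_k + c_l <= 1
        value = objective(c)
        if value > best_value:
            best_value, best_c = value, c
    return best_value, best_c


# Toy stand-in for IN(c) (hypothetical numbers, not from the patent):
# indicators 0 and 1 are redundant; indicator 2 slightly lowers the objective.
gain = [0.50, 0.45, -0.05, 0.30]


def toy_objective(c):
    return sum(g * ci for g, ci in zip(gain, c))


best_value, best_c = solve_by_enumeration(4, [(0, 1)], toy_objective)
```

With a real objective in place of `toy_objective`, the returned vector marks the selected indicators with 1 and the rejected ones with 0.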
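Beneficial effects of the present invention:
1. The invention provides a method for selecting the optimal indicator combination for credit rating with maximum default discrimination ability based on the Informedness coefficient. It ensures that the default discrimination ability of the credit evaluation system as a whole is maximal, and offers a new method and a new line of thought for constructing credit rating indicator systems.
2. How to find, among all indicator combinations, the one with the greatest default discrimination ability is a pressing problem in the construction of credit rating indicator systems. The invention solves it by building a 0-1 programming model with maximization of the Informedness coefficient of the credit score as the objective function and, as the constraint, the requirement that indicators carrying duplicated information cannot be selected together, and by selecting the group of indicators whose credit score has the largest Informedness coefficient to form the indicator system.
3. It provides a decision basis for credit rating by banks, credit rating agencies, credit reporting agencies, insurance companies offering credit default products and similar institutions, and an investment reference for investors buying corporate bonds and for lenders in online peer-to-peer (P2P) lending.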
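Brief Description of the Drawings
FIG. 1 is a flowchart of the method for selecting the optimal indicator combination for credit rating with maximum default discrimination ability based on the Informedness coefficient.
Detailed Description of the Embodiments
Specific embodiments of the invention are further described below with reference to the drawing and the technical solution.
The workflow of the method of the invention is as follows.
The method rests on the idea that the more accurately customers' default states are judged, the larger the Informedness coefficient of the credit score, and uses the Informedness coefficient to measure the default discrimination ability of the credit score. In a 0-1 programming model, whether each indicator is selected is the decision variable, maximizing the Informedness coefficient is the objective function, and the requirement that indicators carrying duplicated information cannot be selected together is the constraint. The group of indicators whose credit score has the largest Informedness coefficient is selected to form the indicator system.
The scheme of the invention is implemented in the following steps:
Loan data for 1451 small industrial enterprises from a Chinese commercial bank, covering roughly the past 20 years, are used as the empirical sample to illustrate the steps of the scheme.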
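Step 1: Data loading
The source data for all N = 1451 samples, the $M_0$ = 81 candidate rating indicators and the default status indicator (default = 1, non-default = 0) are loaded into an Excel file.
The first 81 indicators in column c of Table 1 are the observable candidate indicators from the initial broad screening. Column b of Table 1 gives the criterion layer of each indicator, and column d gives the indicator type. Rows 1-81 of columns 1-1451 of Table 1 hold the raw values of the credit rating indicators, and row 82 holds the default status values.
Step 2: Data preprocessing
Using a standardization method such as Max-Min, the raw data of the candidate credit rating indicators in rows 1-81 of columns 1-1451 of Table 1 are standardized to remove the effect of differing units.
There are several methods for standardizing indicator data; Max-Min is only one of them.
Rows 1-81 of columns 1452-2902 of Table 1 hold the standardized values of the 81 indicators.
Table 1: Raw and standardized data of the 81 candidate credit rating indicators
[Table 1: image not reproduced]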
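Step 3: Calculate the default discrimination ability $in_i$ of each single candidate credit rating indicator
The Informedness coefficient $in_i$ of an indicator measures the indicator's default discrimination ability. The larger the Informedness coefficient, the more actual default customers are judged as default and the more actual non-default customers are judged as non-default, that is, the indicator has default discrimination ability. The Informedness coefficient of indicator $x_i$ is:
$$in_i = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{1}$$
In formula (1), a is the number of customers who actually defaulted and were judged to be in default; b is the number of customers who actually defaulted but were wrongly judged to be non-default; c is the number of customers who did not actually default but were wrongly judged to be in default; d is the number of customers who did not actually default and were judged to be non-default.
The values a, b, c and d above are obtained by comparing the judged default state $D_j$ with the actual default state $T_j$. The judged default state is in turn obtained from the threshold $x_i^c$. When the value $x_{ij}$ of indicator i for customer j is greater than the threshold $x_i^c$ of indicator i, the customer is judged to be non-default; otherwise the customer is judged to be in default, that is:
$$D_j = \begin{cases} 0, & x_{ij} > x_i^c \\ 1, & x_{ij} \le x_i^c \end{cases} \tag{2}$$
Each value in row 1, columns 1452-2902 of Table 1 is taken in turn as the threshold $x_1^c$ of indicator $X_1$, and the values $x_{1j}$ of indicator $X_1$ in all of those columns are substituted into formula (2) to judge the default state of every customer. Counting the judged default states gives 1451 sets of values of a, b, c and d; substituting them into formula (1) gives 1451 Informedness coefficients for indicator $X_1$. The largest of these is taken as the final Informedness coefficient of indicator $X_1$. The Informedness coefficients of all indicators in the rows of Table 1 are obtained in the same way and are shown in column e of Table 1.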
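Step 4: Remove the indicators with Informedness coefficient $in_i \le 0$, that is, the indicators that cannot discriminate the default state; the number of remaining indicators becomes $M_1$
According to column e of Table 1, the 4 indicators with non-positive Informedness coefficients, such as "age", are deleted and marked "deleted in initial screening" in column f of Table 1. $M_1$ = 77 indicators remain; they are renumbered, with the new numbers shown in column g of Table 1. The optimal indicator combination is selected from these 77 indicators in what follows.
Step 5: Introduce the decision variable $c_i$ and assign the weights $w_i$ to the rating indicators
The rating indicators are weighted by their Informedness coefficients $in_i$, so that an indicator with a larger Informedness coefficient, and hence stronger default discrimination ability, receives a larger weight, that is:
$$w_i = \frac{c_i \, in_i}{\sum_{k=1}^{M_1} c_k \, in_k} \tag{3}$$
In formula (3), $w_i$ is the weight of the i-th indicator; $c_i$ marks whether the i-th indicator is selected into the indicator system, with $c_i = 1$ if it is selected and $c_i = 0$ otherwise, and $c_i$ is also the decision variable of the 0-1 programming model for the optimal indicator combination; $M_1$ is the number of indicators that need to be weighted.
Substituting into formula (3) the Informedness coefficients $in_i$ in column e of Table 1 for the indicators not marked "deleted in initial screening", together with $M_1$ = 77, gives the weights $w_i$ of the 77 indicators, as shown in formulas (3'-1) to (3'-77).
[Formulas (3'-1) to (3'-77): image not reproduced]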
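Step 6: Construct the functional relationship between the customer credit score $S_j$ and the indicator weights $w_i$.
The expression for the customer credit score $S_j$ is constructed with a linear weighting formula, that is:
$$S_j = \sum_{i=1}^{M_1} w_i \, x_{ij} \tag{4}$$
In formula (4), $w_i$ is the weight of the i-th indicator and $x_{ij}$ is the value of the j-th customer under the i-th indicator.
Substituting the indicator data $x_{ij}$ in columns 1452-2902 of Table 1 and the indicator weights $w_i$ of formulas (3'-1) to (3'-77) into formula (4) gives the credit score $S_j$ of the j-th customer, as shown in formulas (4'-1) to (4'-1451):
[Formulas (4'-1) to (4'-1451): image not reproduced]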
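Step 7: Build the objective function of the 0-1 programming model by maximizing the Informedness coefficient IN of the credit score
Replacing the indicator values in step 3 with the credit scores gives the Informedness coefficient of the credit score, denoted IN. Maximizing IN is the objective function, as in formula (5):
$$\max \; IN = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{5}$$
The Informedness coefficient IN in formula (5) is determined by a and d, that is, by comparing the judged default states $D_j$ of all customers with their true default states $T_j$, so $IN = f(D_j, T_j)$. The judged default state is in turn determined by how each customer's credit score $S_j$ compares with the credit score threshold $S^c$, so $IN = f[g(S_j, S^c), T_j]$. The Informedness coefficient IN of the credit score therefore depends on the customers' credit scores.
Furthermore, the customer's credit score $S_j$ is the linear weighting of the customer's indicator values $x_{ij}$ by the indicator weights $w_i$, as shown in formula (4) above, so $IN = f[h(x_{ij}, w_i), T_j]$. The weight $w_i$ is in turn a function of the 0-1 variable $c_i$ and the indicator's Informedness coefficient $in_i$, as shown in formula (3) above, so $IN = f\{h[x_{ij}, q(c_i, in_i)], T_j\}$. The Informedness coefficient IN of the credit score is therefore a function of the decision variables $c_i$.
When the selected indicators differ, that is, when $c_i$ differs, the indicator weights $w_i$ obtained in step 5 differ, the credit scores $S_j$ obtained in step 6 differ, and the Informedness coefficient IN of the credit score differs as well. With the maximum of IN as the objective function and the selection flags $c_i$ as the decision variables, a 0-1 program is built to select the group of indicators with the strongest default discrimination ability as the indicator system.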
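Step 8: Build the constraints of the 0-1 programming model
Indicators carrying duplicated information are identified by rank correlation analysis. If the rank correlation coefficient of a pair of indicators is greater than or equal to 0.8, the pair carries duplicated information. For each such pair an inequality constraint is established, which guarantees that within a group of indicators carrying duplicated information at most one is selected into the final system, as shown in formula (6):
$$c_k + c_l \le 1 \tag{6}$$
Here $c_k$ and $c_l$ are 0-1 variables marking whether indicators k and l, respectively, are selected into the final indicator system. There are as many constraints of form (6) as there are pairs of indicators carrying duplicated information.
Rank correlation analysis finds 23 pairs of indicators carrying duplicated information. The indicator names and the pairwise rank correlation coefficients are shown in Table 2.
Table 2: Highly correlated indicators
[Table 2: image not reproduced]
Substituting rows 1-23 of Table 2 into formula (6) gives:
[Formula (6'): image not reproduced]
There are several methods for determining whether indicators carry duplicated information; the rank correlation method is only one of them.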
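Step 9: Solve the 0-1 programming model and determine the optimal indicator combination
With formula (5) as the objective function and formula (6') as the constraints, build the 0-1 programming model and solve it to obtain the indicator combination whose credit score has the largest Informedness coefficient IN, together with that maximum Informedness coefficient (default discrimination ability).
Applying the method of the invention to the empirical data, the 1451 small industrial enterprise loans of a Chinese commercial bank over roughly the past 20 years, gives an optimal credit rating indicator combination of 29 indicators with maximum default discrimination ability based on the Informedness coefficient. The selected indicators are marked "1" in column f of Table 1 and the unselected ones "0". For ease of reading, the indicators marked "1" in column f of Table 1 are listed in column 2 of Table 3. The Informedness coefficient of this indicator combination is 0.973.
Table 3: The optimal indicator combination and the comparison combination
[Table 3: image not reproduced]
Column 3 of Table 3 is the indicator combination made up of the 29 indicators with the largest individual Informedness coefficients among all non-redundant indicators. The Informedness coefficient of the customer credit score based on that combination is 0.885, clearly smaller than the 0.973 obtained for the combination built with the method of this patent. This shows that a combination of indicators that are individually strong at discriminating defaults does not necessarily have strong default discrimination ability as a whole.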
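The comparison combination in column 3 of Table 3 can be thought of as a "pick the individually strongest indicators" baseline. The text does not say exactly how redundancy was handled when that baseline was formed, so the greedy rule below (skip an indicator if it forms a redundant pair with one already chosen) is an assumption, as are the function name and the toy numbers.

```python
def top_k_non_redundant(in_values, pairs, k):
    """Greedy baseline: take indicators in descending order of in_i,
    skipping any indicator that forms a redundant pair with one already taken."""
    blocked = {}
    for a, b in pairs:
        blocked.setdefault(a, set()).add(b)
        blocked.setdefault(b, set()).add(a)
    chosen = []
    for i in sorted(range(len(in_values)), key=lambda i: -in_values[i]):
        if len(chosen) == k:
            break
        if blocked.get(i, set()).isdisjoint(chosen):
            chosen.append(i)
    return sorted(chosen)


# Toy data (hypothetical): five indicators, indicators 0 and 1 are redundant.
in_values = [0.60, 0.55, 0.40, 0.35, 0.10]
pairs = [(0, 1)]
baseline = top_k_non_redundant(in_values, pairs, k=3)
```

Such a baseline never evaluates the combined credit score, which is the difference from the 0-1 program: the program scores each candidate combination as a whole rather than ranking indicators one at a time.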
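The present invention has many other specific embodiments. All technical solutions formed by equivalent replacement or equivalent transformation of the "method for selecting the optimal indicator combination for credit rating with maximum default discrimination ability based on the Informedness coefficient" described herein fall within the scope of protection claimed by the invention.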

Claims (1)

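  1. A method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient, characterized by the following steps:
    Step 1: Data loading
    Load the data of the $M_0$ initial credit rating indicators for N customers, together with the default status of the N customers, into an Excel file, where default = 1 and non-default = 0;
    Step 2: Data preprocessing
    Standardize the candidate credit rating indicator data to remove the effect of differing units;
    Step 3: Calculate the default discrimination ability $in_i$ of each single candidate credit rating indicator
    The Informedness coefficient $in_i$ of an indicator measures the indicator's default discrimination ability. The larger the Informedness coefficient, the more actual default customers are judged as default and the more actual non-default customers are judged as non-default, that is, the indicator has default discrimination ability. The Informedness coefficient of indicator i is:
    $$in_i = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{1}$$
    In formula (1), a is the number of customers who actually defaulted and were judged to be in default; b is the number of customers who actually defaulted but were wrongly judged to be non-default; c is the number of customers who did not actually default but were wrongly judged to be in default; d is the number of customers who did not actually default and were judged to be non-default;
    The values a, b, c and d in formula (1) are obtained by comparing the judged default state $D_j$ with the actual default state $T_j$. The judged default state is in turn obtained from the threshold $x_i^c$: when the value $x_{ij}$ of indicator i for customer j is greater than the threshold $x_i^c$ of indicator i, the customer is judged to be non-default; otherwise the customer is judged to be in default, that is:
    $$D_j = \begin{cases} 0, & x_{ij} > x_i^c \\ 1, & x_{ij} \le x_i^c \end{cases} \tag{2}$$
    Every value of indicator i observed among the customers is taken in turn as the threshold and used to judge the default state of all customers. The threshold that maximizes the Informedness coefficient $in_i$ of indicator i is the threshold of indicator i, and the corresponding maximum Informedness coefficient is the Informedness coefficient of indicator i;
    Step 4: Remove the indicators with Informedness coefficient $in_i \le 0$, that is, the indicators that cannot discriminate the default state; the number of remaining indicators becomes $M_1$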
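    Step 5: Introduce the decision variable $c_i$ and assign the weights $w_i$ to the rating indicators
    The rating indicators are weighted by their Informedness coefficients $in_i$, so that an indicator with a larger Informedness coefficient, and hence stronger default discrimination ability, receives a larger weight, that is:
    $$w_i = \frac{c_i \, in_i}{\sum_{k=1}^{M_1} c_k \, in_k} \tag{3}$$
    In formula (3), $w_i$ is the weight of the i-th indicator; $c_i$ marks whether the i-th indicator is selected into the indicator system, with $c_i = 1$ if it is selected and $c_i = 0$ otherwise, and $c_i$ is also the decision variable of the 0-1 programming model for the optimal indicator combination; $M_1$ is the number of indicators that need to be weighted;
    Step 6: Construct the functional relationship between the customer credit score $S_j$ and the indicator weights $w_i$
    The expression for the customer credit score $S_j$ is constructed with a linear weighting formula, that is:
    $$S_j = \sum_{i=1}^{M_1} w_i \, x_{ij} \tag{4}$$
    In formula (4), $w_i$ is the weight of the i-th indicator and $x_{ij}$ is the value of the j-th customer under the i-th indicator;
    Step 7: Build the objective function of the 0-1 programming model by maximizing the Informedness coefficient IN of the credit score
    Replacing the indicator values in step 3 with the credit scores gives the Informedness coefficient of the credit score, denoted IN. Maximizing IN is the objective function, as in formula (5):
    $$\max \; IN = \frac{a}{a+b} + \frac{d}{c+d} - 1 \tag{5}$$
    The Informedness coefficient IN in formula (5) is determined by a and d, that is, by comparing the judged default states $D_j$ of all customers with their true default states $T_j$, so $IN = f(D_j, T_j)$. The judged default state is in turn determined by how each customer's credit score $S_j$ compares with the credit score threshold $S^c$, so $IN = f[g(S_j, S^c), T_j]$. The Informedness coefficient IN of the credit score therefore depends on the customers' credit scores;
    The customer's credit score $S_j$ is the linear weighting of the customer's indicator values $x_{ij}$ by the indicator weights $w_i$, as shown in formula (4), so $IN = f[h(x_{ij}, w_i), T_j]$. The weight $w_i$ is in turn a function of the 0-1 programming variable $c_i$ and the indicator's Informedness coefficient $in_i$, as shown in formula (3), so $IN = f\{h[x_{ij}, q(c_i, in_i)], T_j\}$. The Informedness coefficient IN of the credit score is therefore a function of the decision variables $c_i$;
    When the selected indicators differ, that is, when $c_i$ differs, the indicator weights $w_i$ obtained in step 5 differ, the credit scores $S_j$ obtained in step 6 differ, and the Informedness coefficient IN of the credit score differs as well. With the maximum of IN as the objective function and the selection flags $c_i$ as the decision variables, a 0-1 program is built to select the group of indicators with the strongest default discrimination ability as the indicator system;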
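    Step 8: Build the constraints of the 0-1 programming model
    Indicators carrying duplicated information are identified by rank correlation analysis. If the rank correlation coefficient of a pair of indicators is greater than or equal to 0.8, the pair carries duplicated information. For each such pair an inequality constraint is established, which guarantees that within a group of indicators carrying duplicated information at most one is selected into the final system, as shown in formula (6):
    $$c_k + c_l \le 1 \tag{6}$$
    Here $c_k$ and $c_l$ are the 0-1 variables marking whether indicators k and l, a pair carrying duplicated information, are selected into the final indicator system. There are as many constraints of form (6) as there are pairs of indicators carrying duplicated information;
    There are several methods for determining whether indicators carry duplicated information; the rank correlation method is only one of them;
    Step 9: Solve the 0-1 programming model and determine the optimal indicator combination
    With formula (5) as the objective function and formula (6) as the constraints, build the 0-1 programming model and solve it to obtain the indicator combination whose credit score has the largest Informedness coefficient IN, together with that maximum Informedness coefficient (default discrimination ability).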
PCT/CN2018/087773 2018-05-22 2018-05-22 Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient WO2019222902A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/969,476 US20210056622A1 (en) 2018-05-22 2018-05-22 Optimal feature subset selection method in credit scoring based on informedness coefficient
PCT/CN2018/087773 WO2019222902A1 (zh) 2018-05-22 2018-05-22 Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/087773 WO2019222902A1 (zh) 2018-05-22 2018-05-22 Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient

Publications (1)

Publication Number Publication Date
WO2019222902A1 true WO2019222902A1 (zh) 2019-11-28

Family

ID=68616175

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/087773 WO2019222902A1 (zh) 2018-05-22 2018-05-22 Method for selecting the optimal indicator combination for credit rating based on the Informedness coefficient

Country Status (2)

Country Link
US (1) US20210056622A1 (zh)
WO (1) WO2019222902A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070129834A1 (en) * 2005-12-05 2007-06-07 Howard Michael D Methods and apparatus for heuristic search to optimize metrics in generating a plan having a series of actions
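CN105956915A (zh) * 2016-04-19 2016-09-21 大连理工大学 Optimal credit grade division method based on maximum credit similarity
CN107038511A (zh) * 2016-02-01 2017-08-11 腾讯科技(深圳)有限公司 Method and device for determining risk assessment parameters
CN107194803A (zh) * 2017-05-19 2017-09-22 南京工业大学 Device for assessing the credit risk of P2P online lending borrowers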

Also Published As

Publication number Publication date
US20210056622A1 (en) 2021-02-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920103

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18920103

Country of ref document: EP

Kind code of ref document: A1