CN111047193A - Enterprise credit scoring model generation algorithm based on credit big data label - Google Patents

Info

Publication number
CN111047193A
Authority
CN
China
Prior art keywords
credit
enterprise
big data
label
model
Legal status
Pending
Application number
CN201911278580.XA
Other languages
Chinese (zh)
Inventor
刘海滨
郭佳劼
叶林
沙凌峰
冉作舟
Current Assignee
Shanghai Dolphin Enterprise Credit Reporting Service Co Ltd
Original Assignee
Shanghai Dolphin Enterprise Credit Reporting Service Co Ltd
Priority date
2019-12-13
Filing date
2019-12-13
Publication date
2020-04-21
Application filed by Shanghai Dolphin Enterprise Credit Reporting Service Co Ltd
Priority to CN201911278580.XA
Publication of CN111047193A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an enterprise credit scoring model generation algorithm based on credit big data labels and relates to the technical field of scoring model generation. The algorithm comprises the following steps: 1. labeling raw data with a label classification and quantitative analysis method based on credit big data, to construct an enterprise label matrix; 2. screening by identity tags to construct an enterprise scenario tag library; 3. processing the enterprise label matrix with a (k, ε)-coreset; 4. screening the enterprise credit data indicators with a random forest algorithm; 5. using the IV value as the univariate screening criterion; 6. fitting the screened variables to a logistic regression model. The method uses big-data credit labels in a parent-child model structure: child models turn sparse big-data information into dense information, their outputs serve as input variables of the parent model, and the information is processed layer by layer, forming a model-nested-model technical framework.

Description

Enterprise credit scoring model generation algorithm based on credit big data label
Technical Field
The invention relates to the technical field of scoring model generation, in particular to an enterprise credit scoring model generation algorithm based on credit big data labels.
Background
Enterprise credit is a product of the market economy. It is a comprehensive analysis and determination of the ability of market participants to fulfill their economic contracts and of an enterprise's overall creditworthiness. In market-economy countries, the level of an enterprise's credit is directly linked to its financing cost: enterprises (entities) with a high credit rating and good credit standing pay low interest rates when issuing bonds or applying for loans, while enterprises (entities) with a low credit rating and poor credit standing pay correspondingly higher rates. Enterprises (entities) without a credit rating, i.e., those with no credit record, are not allowed to issue bonds in the market and generally find it difficult to obtain loans.
The most widely used scheme is the three-tier, ten-grade credit rating standard, in which AAA is the highest grade. An AAA enterprise has a high degree of creditworthiness, low debt risk, an excellent credit record, sound operations, strong profitability, and broad development prospects, and uncertain factors have very little influence on its operation and development. At present, however, no effective enterprise credit scoring model generation algorithm based on credit big data labels exists, so enterprise credit scoring carries considerable uncertainty.
Disclosure of Invention
(I) Technical problem to be solved
To address these shortcomings of the prior art, the invention provides an enterprise credit scoring model generation algorithm based on credit big data labels, which solves the problem that enterprise credit scoring carries considerable uncertainty because the prior art offers no effective algorithm of this kind.
(II) Technical solution
To achieve this purpose, the invention adopts the following technical solution: an enterprise credit scoring model generation algorithm based on credit big data labels, comprising the following steps:
1. labeling raw data with a label classification and quantitative analysis method based on credit big data, to construct an enterprise label matrix;
2. screening by identity tags to construct an enterprise scenario tag library;
3. processing the enterprise label matrix with a (k, ε)-coreset;
4. screening the enterprise credit data indicators with a random forest algorithm;
5. using the IV value as the univariate screening criterion;
6. fitting the screened variables to a logistic regression model;
7. combining the result with a scorecard model to obtain the credit score of the enterprise.
Preferably, in step 1, the raw enterprise data obtained from public credit information platforms are labeled with the label classification and quantitative analysis method based on credit big data, to construct the enterprise label matrix.
Preferably, in step 2, different identity tags are screened for different analysis scenarios, and a separate enterprise scenario tag library is constructed for each scenario.
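Step 2 names no concrete scenarios or tags. The sketch below is a minimal illustration, assuming each enterprise's labels are held as a dictionary of quantified values; the mapping `SCENARIO_TAGS`, its scenario and tag names, and the function `build_scenario_library` are all hypothetical.

```python
# Hypothetical scenario -> identity-tag mapping; the source does not name
# its scenarios or tags, so these entries are illustrative only.
SCENARIO_TAGS = {
    "loan_review": ["tax_credit_level", "debt_to_asset_ratio", "years_established"],
    "public_procurement": ["tax_credit_level", "admin_penalty_count"],
}


def build_scenario_library(label_rows, scenario):
    """Keep only the tags configured for `scenario` from each enterprise's labels.

    label_rows maps an enterprise id to a dict of {tag: quantified value};
    a tag the enterprise lacks is recorded as None rather than dropped.
    """
    tags = SCENARIO_TAGS[scenario]
    return {
        ent_id: {tag: row.get(tag) for tag in tags}
        for ent_id, row in label_rows.items()
    }


rows = {"E1": {"tax_credit_level": 1, "debt_to_asset_ratio": 0.4, "admin_penalty_count": 0}}
procurement_library = build_scenario_library(rows, "public_procurement")
```

Recording a missing tag as None keeps every enterprise aligned to the same tag list within a scenario.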
Preferably, the (k, ε)-coreset-based algorithm in step 3 is used to compress the sparse matrix and reduce the space and time complexity of the computation.
Preferably, constructing the random forest in step 4 comprises:
1) fitting a random forest model with the enterprises blacklisted in the current year as bad samples and all other enterprises as good samples;
2) after the importance of every indicator is obtained, removing the indicators whose share of total importance is below 0.1%, giving a preliminary set of screened data indicators.
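A minimal sketch of this screening, assuming scikit-learn's `RandomForestClassifier` stands in for the unspecified random forest implementation and that the "importance ratio" is each indicator's share of the impurity-based importance (which scikit-learn normalizes to sum to 1). The function name and `n_estimators=200` are illustrative choices, not values from the source.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def screen_by_importance(X, is_blacklisted, feature_names, threshold=0.001, seed=0):
    """Fit a random forest (bad = blacklisted this year, good = everyone else)
    and drop indicators whose share of total importance is below `threshold`.

    threshold=0.001 corresponds to the 0.1% cut-off in step 4.
    Returns (name, share) pairs sorted from most to least important.
    """
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(np.asarray(X), np.asarray(is_blacklisted))
    # Impurity-based importances are normalized to sum to 1, so each entry
    # is already the indicator's share of the total.
    shares = rf.feature_importances_
    kept = [(name, float(s)) for name, s in zip(feature_names, shares) if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

An indicator that never varies is never used for a split, so its importance is zero and it is always removed by this rule.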
Preferably, in step 5, the data indicators generated in step 3 are evaluated by their WOE and IV values:
1) WOE value calculation formula:
$$\mathrm{WOE}_i=\ln\frac{P_{good,i}}{P_{bad,i}}$$
where $P_{good,i}$ is the proportion of good samples that take the i-th value of the label, and $P_{bad,i}$ is the proportion of bad samples that take the i-th value of the label;
2) IV value calculation formula:
$$\mathrm{IV}=\sum_{i=1}^{N}\left(P_{good,i}-P_{bad,i}\right)\cdot\mathrm{WOE}_i$$
where N is the number of possible values of the label.
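As a worked illustration with invented counts (not data from the source), assume the standard forms $\mathrm{WOE}_i=\ln(P_{good,i}/P_{bad,i})$ and $\mathrm{IV}=\sum_i (P_{good,i}-P_{bad,i})\,\mathrm{WOE}_i$, and a label with two values over 100 good and 20 bad samples: value a covers 80 good and 5 bad samples, value b covers 20 good and 15 bad.

```latex
\begin{aligned}
P_{good,a} &= 80/100 = 0.80, & P_{bad,a} &= 5/20 = 0.25, & \mathrm{WOE}_a &= \ln(0.80/0.25) = \ln 3.2 \approx 1.163\\
P_{good,b} &= 20/100 = 0.20, & P_{bad,b} &= 15/20 = 0.75, & \mathrm{WOE}_b &= \ln(0.20/0.75) \approx -1.322\\
\mathrm{IV} &= (0.80-0.25)(1.163) + (0.20-0.75)(-1.322) \approx 0.640 + 0.727 = 1.367
\end{aligned}
```

Value b, where bad samples are over-represented, gets a negative WOE, yet both values add to IV because $P_{good,i}-P_{bad,i}$ and $\mathrm{WOE}_i$ always share a sign.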
Preferably, in step 6, a logistic regression model is fitted on the screened variables and the weight of each dimension is calculated; then, using the scorecard model of step 7, the WOE value and regression coefficient of each dimension are combined to obtain the credit score of the enterprise.
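The source does not spell out how the WOE values and regression coefficients are combined. A common scorecard decomposition, sketched here under two stated assumptions, is: (a) the logistic model predicts the probability P of a bad sample, so $\ln(P/(1-P))=\beta_0+\sum_j\beta_j\,\mathrm{WOE}_j$; (b) the score is $A+B\ln(\mathrm{odds})$ with $\mathrm{odds}=(1-P)/P$. The total then splits into base points $A-B\beta_0$ plus $-B\beta_j\,\mathrm{WOE}_j$ per dimension. The function name, dimension names, and numbers are hypothetical.

```python
import math


def scorecard_points(intercept, coefs, woe_values, A, B):
    """Split a scorecard total into base points plus one term per dimension.

    Assumes ln(P / (1 - P)) = intercept + sum(coef_j * woe_j) for the probability
    P of a bad sample, and score = A + B * ln((1 - P) / P).
    coefs and woe_values are dicts keyed by dimension name.
    """
    base = A - B * intercept
    per_dim = {name: -B * coefs[name] * woe_values[name] for name in coefs}
    total = base + sum(per_dim.values())
    return base, per_dim, total


# Illustrative numbers only: A = 50 and B = PDO / ln 2 with PDO = 5.
B = 5 / math.log(2)
base, per_dim, total = scorecard_points(
    intercept=-2.0,
    coefs={"tax_credit_level": -0.8, "debt_to_asset_ratio": -0.5},
    woe_values={"tax_credit_level": 1.2, "debt_to_asset_ratio": -0.4},
    A=50.0,
    B=B,
)
```

Because the per-dimension terms are additive, the points for each label value can be tabulated once and looked up at scoring time.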
(III) Beneficial effects
The invention provides an enterprise credit scoring model generation algorithm based on a credit big data label.
The method has the following beneficial effects:
1. The method uses big-data credit labels in a parent-child model structure: child models process sparse big-data information into dense information, the outputs of the child models serve as input variables of the parent model, and the information is processed layer by layer, forming a model-nested-model technical framework.
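The source does not specify the child or parent model types. As a minimal sketch, assuming logistic regression for both and scikit-learn as the library, each child model compresses one group of sparse tag columns into a single dense feature (its predicted probability of a bad sample), and the parent model is fitted on those dense features. The grouping `tag_groups` and both function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def fit_parent_child(X, y, tag_groups, cv=5):
    """Fit one child model per tag group, then a parent on the child outputs.

    tag_groups maps a group name to the list of column indices it owns.
    """
    children, dense_cols = {}, []
    for name, cols in tag_groups.items():
        child = LogisticRegression(max_iter=1000)
        # Out-of-fold probabilities: the parent never sees a child's output
        # for rows that child was trained on.
        oof = cross_val_predict(child, X[:, cols], y, cv=cv, method="predict_proba")[:, 1]
        dense_cols.append(oof)
        children[name] = child.fit(X[:, cols], y)
    parent = LogisticRegression(max_iter=1000).fit(np.column_stack(dense_cols), y)
    return children, parent


def predict_parent_child(children, parent, X, tag_groups):
    """Run each child on its columns, stack the outputs, score with the parent."""
    dense = np.column_stack([
        children[name].predict_proba(X[:, cols])[:, 1]
        for name, cols in tag_groups.items()
    ])
    return parent.predict_proba(dense)[:, 1]
```

Using out-of-fold child predictions for the parent's training data is a design choice to limit leakage between the two layers; the source does not say how it handles this.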
2. Through the enterprise credit scoring algorithm model, the invention analyzes the behavior and needs of credit subjects across industries, fields, regions, and application directions, and accurately characterizes and scores the credit risk profile of each enterprise. A data-driven credit evaluation system brings as much credit-related data as possible into the credit rating index system, a big-data-based credit rating model reduces manual intervention in the result, and the rating results can be applied in social governance, public services, economic activity, public welfare, and other fields.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Embodiment:
As shown in FIG. 1, an embodiment of the present invention provides an enterprise credit scoring model generation algorithm based on credit big data labels, with the following specific content:
First, raw data are labeled with the label classification and quantitative analysis method based on credit big data, and an enterprise label matrix is constructed:
Enterprises that had obtained a business qualification, both those that subsequently renewed it and those that closed, are taken as samples, and all available public credit records and industry credit records from the last two years are collected. The credit data come from registration and management authorities such as market supervision departments, from public credit information platforms, and from credit data on the internet.
After data cleaning with the company's credit big data labeling technology, all indicator data are quantified and an enterprise label matrix is constructed, in which each value represents the quantified information of a given enterprise in a specific dimension.
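A minimal sketch of assembling such a matrix, assuming the cleaned data arrive as already-quantified `(enterprise_id, dimension, value)` triples; the function name and dimension names are hypothetical, and the quantification itself is not shown. Cells with no collected record stay NaN, which is the sparsity the coreset step addresses.

```python
import numpy as np


def build_label_matrix(records):
    """Pivot (enterprise_id, dimension, quantified_value) triples into a matrix.

    Rows are enterprises and columns are label dimensions (both sorted);
    a cell with no record is left as NaN.
    """
    enterprises = sorted({e for e, _, _ in records})
    dimensions = sorted({d for _, d, _ in records})
    row = {e: i for i, e in enumerate(enterprises)}
    col = {d: j for j, d in enumerate(dimensions)}
    matrix = np.full((len(enterprises), len(dimensions)), np.nan)
    for e, d, v in records:
        matrix[row[e], col[d]] = v
    return enterprises, dimensions, matrix


records = [
    ("E1", "tax_credit_level", 1.0),
    ("E1", "years_established", 12.0),
    ("E2", "tax_credit_level", 3.0),
]
ents, dims, M = build_label_matrix(records)
```

For a very large, very sparse matrix a sparse storage format would be preferable to a dense NaN-filled array; the dense form is used here only to keep the sketch short.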
Second, the enterprise label matrix is processed with a (k, ε)-coreset:
Because enterprise credit data are collected through diverse and non-standardized channels, the rate of missing client information is high, and information about the same client is often incomplete across dimensions; ultimately this shows up as data sparsity. Processing the enterprise label matrix with a (k, ε)-coreset effectively reduces the dimension of the sparse enterprise matrix and highlights the core information in it.
The algorithm is defined as follows.
For a point set $S \subseteq \mathbb{R}^n$ in n-dimensional space and a vector $x \in \mathbb{R}^n$, the minimum Euclidean distance from the vector x to the point set S is defined as:
$$\mathrm{dist}(x,S)=\min_{s\in S}\lVert x-s\rVert_2 \qquad (1)$$
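For a finite point set, the minimum Euclidean distance defined above reduces to a row-wise norm and a minimum. A minimal NumPy sketch, assuming S is stored as a `(p, n)` array of p points; the function name is hypothetical.

```python
import numpy as np


def dist_to_point_set(x, S):
    """Minimum Euclidean distance from vector x (shape (n,)) to a finite
    point set S given as an array of shape (p, n)."""
    x = np.asarray(x, dtype=float)
    S = np.asarray(S, dtype=float)
    # Broadcasting subtracts x from every point; norms are taken row by row.
    return float(np.min(np.linalg.norm(S - x, axis=1)))
```

For example, the point (3, 0) is 3 away from (0, 0) and 4 away from (3, 4), so its distance to that two-point set is 3.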
For an m × n matrix A with row vectors $a_1, \dots, a_m$, the sum of the squared distances from A to S is defined as:
$$\mathrm{dist}^2(A,S)=\sum_{i=1}^{m}\mathrm{dist}^2(a_i,S) \qquad (2)$$
Coreset:
The row vectors $a_1, \dots, a_m$ of the m × n matrix A can be viewed as m points in n-dimensional space. A coreset is the set obtained by weighting these row vectors, i.e., $C=\{\omega_1 a_1, \dots, \omega_m a_m\}$, where all weights satisfy $\omega_i \ge 0$. The coreset C is then a weighted subset of the row-vector set of A; when all weights equal 1, C is exactly the set of row vectors of A. In addition, for every k-dimensional subspace S, the distance from A to S must be approximated by the distance from C to S:
$$\left|\mathrm{dist}^2(A,S)-\mathrm{dist}^2(C,S)\right|\le \varepsilon\cdot \mathrm{dist}^2(A,S) \qquad (3)$$
In short, the distance from S to A can be approximated by the distance from S to C.
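The coreset inequality can be checked numerically for any given subspace. The sketch below assumes S is a k-dimensional linear subspace given by an orthonormal basis V (n × k) and, as an assumption about the notation $\{\omega_1 a_1, \dots\}$, treats the weights as multipliers on squared distances, $\mathrm{dist}^2(C,S)=\sum_i \omega_i\,\mathrm{dist}^2(c_i,S)$, the usual convention for coresets. It only measures the relative error for one subspace; building a set C that meets the bound for every k-dimensional subspace needs a dedicated coreset construction that is not shown. The uniform sample is an unbiased estimate, not a guaranteed coreset, and all names are hypothetical.

```python
import numpy as np


def dist2_to_subspace(P, V, weights=None):
    """Weighted sum of squared distances from the rows of P to span(V).

    V has orthonormal columns (n x k), so the projection of a row p is
    p @ V @ V.T and its squared distance is the squared residual norm.
    """
    residual = P - P @ V @ V.T
    d2 = np.sum(residual ** 2, axis=1)
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    return float(w @ d2)


def coreset_relative_error(A, C_rows, C_weights, V):
    """|dist^2(A,S) - dist^2(C,S)| / dist^2(A,S) for the subspace S = span(V)."""
    full = dist2_to_subspace(A, V)
    approx = dist2_to_subspace(C_rows, V, C_weights)
    return abs(full - approx) / full


rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))
# One random 3-dimensional subspace S (orthonormal basis from a QR factorization).
V, _ = np.linalg.qr(rng.normal(size=(20, 3)))
# All rows with weight 1: C equals A, so the error is zero.
trivial_error = coreset_relative_error(A, A, np.ones(len(A)), V)
# 200 rows sampled uniformly and reweighted by m / s.
idx = rng.choice(len(A), size=200, replace=False)
sampled_error = coreset_relative_error(A, A[idx], np.full(200, len(A) / 200), V)
```

A candidate C is acceptable for a given ε only if the relative error stays at or below ε for every k-dimensional subspace, not just the random one tried here.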
Third, the enterprise credit data indicators are screened with a random forest algorithm:
The random forest yields the relative weight of each indicator; indicators with importance below 0.1% are removed and the rest are ranked in descending order of importance. The ranking shows that three indicators, the debt-to-asset ratio, the tax payment credit rating, and the number of years since establishment, have the greatest influence on whether an enterprise continues to operate. In addition, different definitions of the target variable lead to different results.
Fourth, WOE and IV of the labels:
1) the labels output by the third step are evaluated by their WOE (Weight of Evidence) and IV (Information Value) values;
2) WOE value calculation formula:
$$\mathrm{WOE}_i=\ln\frac{P_{good,i}}{P_{bad,i}}$$
where $P_{good,i}$ is the proportion of good samples that take the i-th value of the label, and $P_{bad,i}$ is the proportion of bad samples that take the i-th value of the label;
3) IV value calculation formula:
$$\mathrm{IV}=\sum_{i=1}^{N}\left(P_{good,i}-P_{bad,i}\right)\cdot\mathrm{WOE}_i$$
where N is the number of possible values of the label.
The criteria for screening variables are shown in the following table; labels with an IV value greater than 0.03 are selected.
[Table: IV-value criteria for variable screening (image not reproduced in this text)]
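A minimal sketch of computing WOE and IV per label and keeping labels with IV above 0.03, assuming the standard definitions $\mathrm{WOE}_i=\ln(P_{good,i}/P_{bad,i})$ and $\mathrm{IV}=\sum_i (P_{good,i}-P_{bad,i})\,\mathrm{WOE}_i$. The `smoothing` term is an added assumption, not from the source: it keeps a value seen only among good or only among bad samples from producing ln(0). Function names are hypothetical.

```python
import math
from collections import Counter


def woe_iv(values, is_bad, smoothing=0.5):
    """Return ({label value: WOE}, IV) for one label column."""
    good = Counter(v for v, b in zip(values, is_bad) if not b)
    bad = Counter(v for v, b in zip(values, is_bad) if b)
    levels = sorted(set(values), key=str)
    # Smoothing is added to every level's count and to the totals.
    n_good = sum(good.values()) + smoothing * len(levels)
    n_bad = sum(bad.values()) + smoothing * len(levels)
    woe, iv = {}, 0.0
    for lv in levels:
        p_good = (good[lv] + smoothing) / n_good  # share of good samples at this value
        p_bad = (bad[lv] + smoothing) / n_bad     # share of bad samples at this value
        woe[lv] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[lv]
    return woe, iv


def screen_by_iv(label_columns, is_bad, min_iv=0.03):
    """Keep labels whose IV exceeds min_iv (0.03 in the embodiment)."""
    ivs = {name: woe_iv(col, is_bad)[1] for name, col in label_columns.items()}
    return {name: iv for name, iv in ivs.items() if iv > min_iv}
```

With `smoothing=0.0` the function reproduces the textbook formulas exactly; a small positive value trades a slight bias for robustness on rare label values.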
Fifth, logistic regression model:
The traditional credit risk scoring model is built around logistic regression. Logistic regression is well suited to data with a binary dependent variable, makes only weak assumptions about the data distribution, and still performs well when the data are not normally distributed; it is therefore currently the most widely used method among financial institutions and credit reporting agencies in China and abroad. Depending on how the target variable of the training set is defined, the variables retained in each dimension of the model change, yielding a logistic regression model specific to that target variable.
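A minimal sketch of fitting this model on WOE-encoded variables, assuming scikit-learn's `LogisticRegression` (with its default L2 regularization, C=1.0) in place of whatever solver the source used. Each column holds the WOE of an enterprise's value for one screened label; the returned coefficients play the role of the per-dimension weights. The function name is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_woe_logit(woe_matrix, is_bad):
    """Fit a logistic regression predicting the bad class from WOE columns.

    Returns (intercept, coefficient array, fitted model).
    """
    model = LogisticRegression(max_iter=1000)
    model.fit(np.asarray(woe_matrix, dtype=float), np.asarray(is_bad))
    return float(model.intercept_[0]), model.coef_[0].copy(), model
```

With WOE defined as ln(P_good/P_bad), a higher WOE marks a safer value, so an informative variable usually receives a negative coefficient when the model predicts the bad class.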
Sixth, scorecard model:
Single-dimension scoring algorithm:
$$\mathrm{score}=A+B\cdot\ln(\mathrm{odds})$$
$$\mathrm{odds}=\frac{1-P}{P}$$
where P is the probability of a bad sample;
The single-dimension score parameters are determined as follows. PDO (Points to Double the Odds) is the score increase each time the odds double. Substituting the score $p_0$ at odds $\theta_0$ and the score $p_0+\mathrm{PDO}$ at odds $2\theta_0$ into the score formula gives
$$p_0=A+B\ln\theta_0,\qquad p_0+\mathrm{PDO}=A+B\ln(2\theta_0)$$
and therefore
$$B=\frac{\mathrm{PDO}}{\ln 2},\qquad A=p_0-B\ln\theta_0$$
The final score determination is shown in the following table:
[Table: final score determination (image not reproduced in this text)]
In the model, the enterprise's potential score for the year is obtained from the score formula, with a base score of 50 points and a PDO (points to double the odds) of 5 points. Through this process, the scores of all enterprises in the tourism industry are calculated, completing the construction of an industry credit development trend analysis model.
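A minimal sketch of the scaling with the embodiment's base score of 50 and PDO of 5. The base odds θ0 is not stated in the source and is assumed to be 1 here, and the odds are assumed to be the good-to-bad ratio (1 − P)/P for a bad-sample probability P; the function names are hypothetical.

```python
import math


def scorecard_scaling(p0=50.0, pdo=5.0, theta0=1.0):
    """Return (A, B) so that score = A + B * ln(odds) equals p0 at odds theta0
    and p0 + pdo at odds 2 * theta0. theta0 = 1 is an assumption."""
    B = pdo / math.log(2)
    A = p0 - B * math.log(theta0)
    return A, B


def credit_score(p_bad, A, B):
    """Score from the probability of a bad sample, using odds = (1 - P) / P."""
    return A + B * math.log((1 - p_bad) / p_bad)


A, B = scorecard_scaling()
base_score = credit_score(0.5, A, B)   # odds 1: equals p0 = 50
doubled = credit_score(1 / 3, A, B)    # odds 2: p0 + PDO, about 55
```

Each halving of the bad-sample odds adds 5 points, so the score scale stays interpretable regardless of which θ0 is finally chosen.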
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. An enterprise credit scoring model generation algorithm based on credit big data labels, characterized in that it comprises the following steps:
1. labeling raw data with a label classification and quantitative analysis method based on credit big data, to construct an enterprise label matrix;
2. screening by identity tags to construct an enterprise scenario tag library;
3. processing the enterprise label matrix with a (k, ε)-coreset;
4. screening the enterprise credit data indicators with a random forest algorithm;
5. using the IV value as the univariate screening criterion;
6. fitting the screened variables to a logistic regression model;
7. combining the result with a scorecard model to obtain the credit score of the enterprise.
2. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein in step 1, the raw enterprise data obtained from public credit information platforms are labeled with the label classification and quantitative analysis method based on credit big data, to construct the enterprise label matrix.
3. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein in step 2, different identity tags are screened for different analysis scenarios, and different enterprise scenario tag libraries are constructed.
4. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein in step 3, the (k, ε)-coreset-based algorithm is used to compress the sparse matrix and reduce the space and time complexity of the computation.
5. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein constructing the random forest in step 4 comprises:
1) fitting a random forest model with the enterprises blacklisted in the current year as bad samples and all other enterprises as good samples;
2) after the importance of every indicator is obtained, removing the indicators whose share of total importance is below 0.1%, giving a preliminary set of screened data indicators.
6. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein in step 5, the data indicators generated in step 3 are evaluated by their WOE and IV values:
1) WOE value calculation formula:
$$\mathrm{WOE}_i=\ln\frac{P_{good,i}}{P_{bad,i}}$$
where $P_{good,i}$ is the proportion of good samples that take the i-th value of the label, and $P_{bad,i}$ is the proportion of bad samples that take the i-th value of the label;
2) IV value calculation formula:
$$\mathrm{IV}=\sum_{i=1}^{N}\left(P_{good,i}-P_{bad,i}\right)\cdot\mathrm{WOE}_i$$
where N is the number of possible values of the label.
7. The enterprise credit scoring model generation algorithm based on credit big data labels as claimed in claim 1, wherein in step 6, a logistic regression model is fitted on the screened variables and the weight of each dimension is calculated; then, using the scorecard model of step 7, the WOE value and regression coefficient of each dimension are combined to obtain the credit score of the enterprise.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911278580.XA CN111047193A (en) 2019-12-13 2019-12-13 Enterprise credit scoring model generation algorithm based on credit big data label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911278580.XA CN111047193A (en) 2019-12-13 2019-12-13 Enterprise credit scoring model generation algorithm based on credit big data label

Publications (1)

Publication Number Publication Date
CN111047193A (en) 2020-04-21

Family

ID=70236178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911278580.XA Pending CN111047193A (en) 2019-12-13 2019-12-13 Enterprise credit scoring model generation algorithm based on credit big data label

Country Status (1)

Country Link
CN (1) CN111047193A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106779457A (en) * 2016-12-29 2017-05-31 深圳微众税银信息服务有限公司 A kind of rating business credit method and system
CN109784731A (en) * 2019-01-17 2019-05-21 上海三零卫士信息安全有限公司 A kind of private education mechanism credit scoring system and its construction method

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582466A (en) * 2020-05-09 2020-08-25 深圳市卡数科技有限公司 Scoring card configuration method, device, equipment and storage medium for simulation neural network
CN111582466B (en) * 2020-05-09 2023-09-01 深圳市卡数科技有限公司 Score card configuration method, device and equipment for simulating neural network and storage medium
CN112182333A (en) * 2020-09-25 2021-01-05 山东亿云信息技术有限公司 Talent space-time big data processing method and system based on random forest
CN112418987A (en) * 2020-11-20 2021-02-26 厦门大学 Method and system for rating credit of transportation unit, electronic device and storage medium
CN112418987B (en) * 2020-11-20 2022-04-29 厦门大学 Method and system for rating credit of transportation unit, electronic device and storage medium
CN113159709A (en) * 2021-03-24 2021-07-23 深圳闪回科技有限公司 Automatic label system and system
CN114462516A (en) * 2022-01-21 2022-05-10 天元大数据信用管理有限公司 Enterprise credit score sample labeling method and device
CN114462516B (en) * 2022-01-21 2024-04-16 天元大数据信用管理有限公司 Enterprise credit scoring sample labeling method and device

Similar Documents

Publication Publication Date Title
CN111047193A (en) Enterprise credit scoring model generation algorithm based on credit big data label
Brezigar-Masten et al. CART-based selection of bankruptcy predictors for the logit model
CN111368147B (en) Graph feature processing method and device
CN104321794B (en) A kind of system and method that the following commercial viability of an entity is determined using multidimensional grading
CN111401600A (en) Enterprise credit risk evaluation method and system based on incidence relation
Li et al. Multi-factor based stock price prediction using hybrid neural networks with attention mechanism
CN107609771A (en) A kind of supplier's value assessment method
Xu et al. Novel key indicators selection method of financial fraud prediction model based on machine learning hybrid mode
CN112102006A (en) Target customer acquisition method, target customer search method and target customer search device based on big data analysis
Song et al. Enhancing enterprise credit risk assessment with cascaded multi-level graph representation learning
Wang et al. Joint loan risk prediction based on deep learning‐optimized stacking model
Petersone et al. A Data-Driven Framework for Identifying Investment Opportunities in Private Equity
CN113506173A (en) Credit risk assessment method and related equipment thereof
CN117350845A (en) Enterprise credit risk assessment method based on cascade hypergraph neural network
CN117114705A (en) Continuous learning-based e-commerce fraud identification method and system
Khajehpour et al. Does Fundraising Have Meaningful Sequential Patterns? The Case of Fintech Startups
Kipkogei et al. Business success prediction in Rwanda: a comparison of tree-based models and logistic regression classifiers
Wang [Retracted] Correlation Analysis between Tourism and Economic Growth Based on Computable General Equilibrium Model (CGE)
Zhou Loan Default Prediction Based on Machine Learning Methods
Yuan [Retracted] Analysis of Consumer Behavior Data Based on Deep Neural Network Model
CN114943563A (en) Rights and interests pushing method and device, computer equipment and storage medium
Giannopoulos The effectiveness of artificial credit scoring models in predicting NPLs using micro accounting data
Li et al. Influence of Internet-based Social Big Data on Personal Credit Reporting
Kalaivani et al. A Comparative Study of Regression algorithms on House Sales Price Prediction
Zhang et al. Enterprise event risk detection based on supply chain contagion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200421