CN111612627A - Method for evaluating bond risk influence indexes - Google Patents

Publication number
CN111612627A
Authority
CN
China
Prior art keywords
data
importance
bond
features
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010464996.7A
Other languages
Chinese (zh)
Inventor
袁豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Bopu Technology Co ltd
Original Assignee
Shenzhen Bopu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Bopu Technology Co ltd filed Critical Shenzhen Bopu Technology Co ltd
Priority to CN202010464996.7A priority Critical patent/CN111612627A/en
Publication of CN111612627A publication Critical patent/CN111612627A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The embodiment of the invention provides a method for evaluating bond risk influence indexes, comprising the following steps: acquiring a data source of bond samples; constructing data features and a classification target from the data source; calculating the importance of the data features with a random forest algorithm and generating an importance ranking of the data features; adding the data features to a classification model one by one in the order of the importance ranking, calculating the corresponding accuracy at each step, and selecting the feature subset that reaches the highest accuracy; and obtaining the important influence indexes from the feature subset. The method ranks the features of the bond sample data set by importance, uses the accuracy of the classification model to find the optimal feature subset, and removes redundant features from the data set while preserving the classification capability of the feature subset, thereby screening out the important indexes that influence bond risk and reducing the workload of bond information collection.

Description

Method for evaluating bond risk influence indexes
Technical Field
The invention relates to the technical field of big data, in particular to a method for evaluating risk influence indexes of bonds.
Background
Bond defaults have occurred frequently over the last two years, and as policies change, bond default will become a common risk event. Existing bond risk prediction techniques mainly extract useful data features from a wide range of bond data and train on them with machine learning to obtain a classification model capable of predicting bond default. Factors such as credit investigation information, financial data, third-party credit rating reports and research reports may all affect bond risk, and how to collect effective indexes from massive data has become an urgent problem in evaluating bond risk.
In the prior art, a random forest algorithm is mainly used to evaluate and analyze multiple data features, find the features of high importance, and trace those features back to their data sources to determine which obtainable indexes are important for predicting bond default.
However, existing data feature evaluation methods have difficulty drawing an accurate line between effective features and redundant features, so a large amount of index information containing redundant features must still be collected before a classification model can be used for risk prediction, which makes data acquisition time-consuming.
Disclosure of Invention
The invention mainly aims to provide a method for evaluating bond risk influence indexes, which aims to solve the technical problem that the existing index information contains a large number of redundant features.
The invention provides a method for evaluating bond risk influence indexes, which comprises the following steps:
acquiring a data source of the bond sample;
constructing data characteristics and classification targets according to the data sources;
calculating the importance of the data characteristics by adopting a random forest algorithm, and generating importance ranking of the data characteristics;
according to the importance sorting sequence, adding the data features to the classification model one by one, calculating corresponding accuracy, and selecting a feature subset reaching the highest accuracy;
and obtaining an important influence index according to the feature subset.
Preferably, acquiring the data source of the bond sample comprises: acquiring a data source of the bond sample, wherein the data source comprises two or more items of index information and a default record of the bond sample.
Preferably, the constructing the data features and the classification targets according to the data source includes:
constructing the data characteristics according to the index information;
and constructing the classification target according to the default record.
Preferably, the calculating the importance of the data features by using a random forest algorithm, and the generating the importance ranking of the data features includes:
constructing a decision tree according to the data characteristics and the classification target, and generating a random forest;
calculating the importance of the data features through the random forest;
and arranging the data features according to the sequence of the importance from high to low to generate the importance ranking of the data features.
Preferably, the classification model is obtained by training data of the bond sample by adopting an SVM algorithm, a random forest algorithm, a naive Bayes algorithm, a CART algorithm or a Bagging algorithm.
The method provided by the embodiment of the invention ranks the features of the bond sample data set by importance, uses the accuracy of the classification model to find the optimal feature subset, and removes redundant features from the data set while preserving the classification capability of the feature subset, thereby screening out the important indexes that influence bond risk and reducing the workload of bond information collection.
Drawings
FIG. 1 is a flowchart of the steps of embodiment 1 of the method for evaluating bond risk influence indexes of the present invention;
FIG. 2 is a flowchart of the steps of embodiment 2 of the method for evaluating bond risk influence indexes of the present invention;
FIG. 3 is a flowchart of the steps of embodiment 3 of the method for evaluating bond risk influence indexes of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
One of the core ideas of the embodiment of the invention is to provide a method for evaluating bond risk influence indexes, screen out important indexes influencing bond risk and reduce the workload of information acquisition.
Referring to FIG. 1, a flowchart of the steps of embodiment 1 of the method for evaluating bond risk influence indexes of the present invention is shown. The method may specifically include the following steps:
S101, acquiring a data source of the bond sample.
Specifically, the data source is acquired either by using the original data directly or by randomly sampling from the original data.
S102, constructing data features and a classification target according to the data source.
Specifically, a data set for feature importance analysis is constructed from the data source; the data set comprises two or more features and a classification target.
S103, calculating the importance of the data features by adopting a random forest algorithm, and generating the importance ranking of the data features.
Specifically, a random forest algorithm is adopted, the importance of each data feature is calculated using a feature importance measure based on information entropy or Gini impurity, and the importance ranking of the data features is generated.
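The two impurity measures named in S103 can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not code from the patent; the function names and the `impurity_decrease` helper are assumptions introduced here.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_c ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Information entropy in bits: -sum(p_c * log2(p_c))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right, measure=gini_impurity):
    """Weighted impurity reduction achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted

# Hypothetical default labels (1 = default, 0 = no default) before and after a split.
parent = [0, 0, 1, 1]
decrease = impurity_decrease(parent, [0, 0], [1, 1])
```

A feature whose splits produce larger impurity decreases across the trees of the forest would rank higher under this kind of measure.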
S104, adding the data features to the classification model one by one in the order of the importance ranking, calculating the corresponding accuracy, and selecting the feature subset that reaches the highest accuracy.
Specifically, the data features contained in the feature subset are the effective features that guarantee the prediction performance of the classification model, and the features outside the feature subset are redundant features.
S105, obtaining the important influence indexes according to the feature subset.
Specifically, the indexes corresponding to the effective features are located in the data source; these indexes are the important indexes that influence bond risk.
The method of this embodiment ranks the features of the bond sample data set by importance, uses the accuracy of the classification model to find the optimal feature subset, and removes redundant features from the data set while preserving the classification capability of the feature subset, thereby screening out the important indexes that influence bond risk and reducing the workload of bond information collection.
Referring to FIG. 2, a flowchart of the steps of embodiment 2 of the method for evaluating bond risk influence indexes of the present invention is shown. The method may specifically include the following steps:
S201, acquiring a data source of the bond sample, wherein the data source comprises two or more items of index information and a default record of the bond sample.
Specifically, multiple items of index information and the default records of the bond samples are extracted and screened from the data on each bond, such as historical bond equity data, historical data of the corresponding industry index, historical financial statements of the bond's subject, and historical ratings of the individual bond and its subject.
S202, constructing the data features according to the index information.
Specifically, 30 data features are constructed according to the index information of the bond sample, as follows:
C_level (bond rating);
CLD (whether the bond rating was downgraded);
D_level (subject rating);
DLD (whether the subject rating was downgraded);
DTAR (asset-liability ratio);
DTAR_DIFF (difference from the asset-liability ratio in the previous financial report);
PM (gross profit margin);
PM_DIFF (difference from the gross profit margin in the previous financial report);
DC (debt-to-capital ratio);
DC_DIFF (difference from the debt-to-capital ratio in the previous financial report);
ROE (return on net assets);
ROE_DIFF (difference from the return on net assets in the previous financial report);
OCF (net operating cash flow);
OCF_DIFF (difference from the net operating cash flow in the previous financial report);
OCF/D (net operating cash flow / total liabilities);
OCF/D_DIFF (difference from "net operating cash flow / total liabilities" in the previous financial report);
avg_day_diff (median valuation daily average fluctuation);
avg_day_absdiff (median valuation daily average fluctuation, absolute value);
avg_price (mean price of the median valuation within the quarter);
max_diff (maximum rise of the median valuation within the quarter);
min_diff (maximum fall of the median valuation within the quarter);
max_min_diff (difference between the highest and lowest values of the median valuation within the quarter, signed to indicate whether the largest move was a fall or a rise);
diff_rate (overall rise or fall of the median valuation over the quarter, as a proportion);
is_stop (whether an overdue event occurred within the median valuation quarter);
concept_avg_day_diff (industry index daily average fluctuation);
concept_avg_day_absdiff (industry index daily average fluctuation, absolute value);
concept_max_diff (maximum rise of the industry index within the quarter);
concept_min_diff (maximum fall of the industry index within the quarter);
concept_max_min_diff (difference between the highest and lowest values of the industry index within the quarter);
concept_avg_price (average of the industry index within the quarter divided by the index on the first day of the quarter).
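As an illustration of how two of these feature types might be derived, the sketch below computes a `_DIFF`-style feature and `concept_avg_price`. The helper names, the sign convention (current value minus previous value) and all numbers are assumptions made for the example; the source only names the features.

```python
def diff_from_previous(current, previous):
    """_DIFF-style feature: current report value minus previous report value (assumed sign)."""
    return current - previous

def concept_avg_price(quarter_index_values):
    """Average index level within the quarter divided by the index on the quarter's first day."""
    first_day = quarter_index_values[0]
    return sum(quarter_index_values) / len(quarter_index_values) / first_day

# Hypothetical figures for one bond subject and one industry index.
dtar_now, dtar_prev = 0.62, 0.58
features = {
    "DTAR": dtar_now,
    "DTAR_DIFF": diff_from_previous(dtar_now, dtar_prev),
    "concept_avg_price": concept_avg_price([100.0, 102.0, 98.0, 104.0]),
}
```

A value of `concept_avg_price` above 1 would mean the industry index averaged higher over the quarter than where it started.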
S203, constructing the classification target according to the default record.
Specifically, the classification target is constructed according to whether the bond sample has a default record in the corresponding quarter: if a default occurred, the sample is marked 1; if no default occurred, it is marked 0.
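The labeling rule in S203 could be written as below. This is a sketch only: representing a sample as a `(bond_id, quarter)` pair and the name `build_targets` are assumptions, since the source does not describe the data layout.

```python
def build_targets(samples, default_records):
    """Return 1 for each (bond_id, quarter) sample that has a default record, else 0."""
    defaulted = set(default_records)
    return [1 if sample in defaulted else 0 for sample in samples]

# Hypothetical samples and default records.
samples = [("bond_A", "2019Q1"), ("bond_A", "2019Q2"), ("bond_B", "2019Q2")]
default_records = [("bond_A", "2019Q2")]
targets = build_targets(samples, default_records)
```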
S204, constructing decision trees according to the data features and the classification target, and generating a random forest.
Specifically, let the number of bond samples be K, and randomly draw k samples with replacement from the K samples as a training set; let the number of data features be M, and randomly select m of them as the basis for branching (m ≤ M); construct a decision tree from the training set, the branching basis and the classification target, using Gini impurity or information entropy as the splitting measure; repeat these steps to construct multiple decision trees and generate a random forest, and let the number of decision trees in the random forest be N.
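The sampling part of S204 can be sketched as follows, assuming the draw is with replacement so that some samples are left out of each tree's training set. The function name `draw_tree_inputs`, the fixed seed and the toy sizes are hypothetical; tree construction itself is omitted.

```python
import random

def draw_tree_inputs(num_samples, feature_names, k, m, rng):
    """Draw one tree's training rows (with replacement), its out-of-bag rows, and m features."""
    in_bag = [rng.randrange(num_samples) for _ in range(k)]
    # Rows never drawn for this tree form its out-of-bag (OOB) set.
    out_of_bag = sorted(set(range(num_samples)) - set(in_bag))
    branch_features = rng.sample(feature_names, m)  # m distinct features, m <= M
    return in_bag, out_of_bag, branch_features

rng = random.Random(0)  # fixed seed so the sketch is repeatable
feature_names = ["DLD", "C_level", "DTAR", "ROE", "OCF"]
# N = 3 trees over K = 10 samples, each drawing k = 10 rows and m = 2 features.
forest_inputs = [draw_tree_inputs(10, feature_names, k=10, m=2, rng=rng) for _ in range(3)]
```

The out-of-bag rows returned here are the ones a later importance calculation would evaluate each tree on.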
S205, calculating the importance of the data features through the random forest.
Specifically, for each decision tree in the random forest, the prediction error on its out-of-bag (OOB) data is calculated and recorded as errOOB1, where the out-of-bag data are the samples left over after the k samples were drawn for that decision tree; the values of data feature X in all out-of-bag samples are then replaced with random numbers, and the out-of-bag error is calculated again and recorded as errOOB2. The importance of data feature X is Σ(errOOB2 - errOOB1)/N, where the sum runs over the N decision trees.
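A minimal sketch of this out-of-bag importance calculation is given below. It assumes each tree is available as a `(predict, oob_rows)` pair and that the random replacement values are drawn uniformly from the feature's observed range; both are assumptions for illustration, as are the stump predictor and the toy data.

```python
import random

def oob_error(predict, X, y, rows):
    """Fraction of the given rows that the tree misclassifies."""
    if not rows:
        return 0.0
    return sum(predict(X[i]) != y[i] for i in rows) / len(rows)

def feature_importance(trees, X, y, col, rng):
    """Mean over the N trees of (errOOB2 - errOOB1) for the feature in column `col`."""
    lo = min(row[col] for row in X)
    hi = max(row[col] for row in X)
    total = 0.0
    for predict, oob_rows in trees:
        err1 = oob_error(predict, X, y, oob_rows)
        # Copy the data and overwrite the feature with random numbers on the OOB rows only.
        noisy = [list(row) for row in X]
        for i in oob_rows:
            noisy[i][col] = rng.uniform(lo, hi)
        err2 = oob_error(predict, noisy, y, oob_rows)
        total += err2 - err1
    return total / len(trees)

# Toy data: column 0 decides the label, column 1 is irrelevant.
X = [[0.1, 5.0], [0.9, 3.0], [0.2, 4.0], [0.8, 1.0], [0.3, 2.0], [0.7, 6.0]]
y = [0, 1, 0, 1, 0, 1]

def stump(row):
    """Hypothetical one-split tree that looks only at column 0."""
    return int(row[0] > 0.5)

trees = [(stump, [0, 1, 2]), (stump, [3, 4, 5])]
imp_relevant = feature_importance(trees, X, y, 0, random.Random(0))
imp_irrelevant = feature_importance(trees, X, y, 1, random.Random(0))
```

Because the stump never reads column 1, randomizing that column cannot change its predictions, so `imp_irrelevant` is exactly zero, while randomizing column 0 can only leave the error unchanged or raise it.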
S206, arranging the data features in order of importance from high to low, and generating the importance ranking of the data features.
Specifically, on the basis of S205, the importance of all data features is calculated one by one, the data features are arranged in order of importance from high to low, and the importance ranking of the data features is generated, which is, in order: DLD, concept_avg_day_absdiff, concept_max_diff, C_level, concept_max_min_diff, concept_min_diff, CLD, D_level, concept_avg_day_diff, DTAR_DIFF, max_diff, OCF/D_DIFF, is_stop, OCF/D, OCF_DIFF, DC_DIFF, max_min_diff, avg_day_absdiff, concept_avg_price, ROE_DIFF, diff_rate, avg_day_diff, ROE, avg_price, DTAR, min_diff, PM_DIFF, DC, OCF.
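Given per-feature importance scores, the ranking step is a descending sort, sketched below. The scores are made-up placeholders, not values from the patent, and `rank_by_importance` is a hypothetical helper name.

```python
def rank_by_importance(importances):
    """Return feature names sorted from highest to lowest importance."""
    return sorted(importances, key=importances.get, reverse=True)

# Hypothetical importance scores for four of the features.
importances = {"DTAR": 0.01, "DLD": 0.12, "C_level": 0.07, "OCF": 0.002}
ranking = rank_by_importance(importances)
```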
S207, adding the data features to the classification model one by one in the order of the importance ranking, calculating the corresponding accuracy, and selecting the feature subset that reaches the highest accuracy.
Specifically, the data features are added to the classification model one by one in the order of the importance ranking, the corresponding accuracy is calculated at each step, and the feature subset that reaches the highest accuracy is selected. This subset comprises: DLD, concept_avg_day_absdiff, concept_max_diff, C_level, concept_max_min_diff, concept_min_diff, CLD, D_level, concept_avg_day_diff, DTAR_DIFF, max_diff, OCF/D_DIFF, is_stop, OCF/D, OCF_DIFF, DC_DIFF.
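The incremental selection in S207 can be sketched as a loop over prefixes of the ranking. To keep the example self-contained, a simple nearest-centroid classifier stands in for the classification model, and accuracy is measured on a held-out test set; both choices are assumptions, since the source does not say how accuracy is computed. All data below are toy values.

```python
def centroid_accuracy(train, test, cols):
    """Accuracy of a nearest-centroid classifier that sees only the given features."""
    sums, counts = {}, {}
    for row, label in train:
        total = sums.setdefault(label, [0.0] * len(cols))
        for j, c in enumerate(cols):
            total[j] += row[c]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}

    def predict(row):
        def sq_dist(lab):
            return sum((row[c] - centroids[lab][j]) ** 2 for j, c in enumerate(cols))
        return min(centroids, key=sq_dist)

    return sum(predict(row) == label for row, label in test) / len(test)

def select_feature_subset(ranked_cols, train, test, evaluate=centroid_accuracy):
    """Add features in ranked order; return the prefix that reaches the highest accuracy."""
    best_subset, best_acc, history = [], -1.0, []
    for n in range(1, len(ranked_cols) + 1):
        subset = ranked_cols[:n]
        acc = evaluate(train, test, subset)
        history.append(acc)
        if acc > best_acc:  # strict ">" keeps the smallest subset when accuracies tie
            best_subset, best_acc = subset, acc
    return best_subset, best_acc, history

# Toy data: "a" is partly informative, "b" completes it, "noise" misleads on the test set.
train = [({"a": 0, "b": 0, "noise": 0}, 0), ({"a": 2, "b": 0, "noise": 0}, 0),
         ({"a": 2, "b": 4, "noise": 100}, 1), ({"a": 4, "b": 4, "noise": 100}, 1)]
test = [({"a": 0.5, "b": 0, "noise": 100}, 0), ({"a": 2.5, "b": 0, "noise": 100}, 0),
        ({"a": 3.5, "b": 4, "noise": 0}, 1), ({"a": 1.5, "b": 4, "noise": 0}, 1)]
subset, accuracy, history = select_feature_subset(["a", "b", "noise"], train, test)
```

In this toy run the accuracy rises when the second feature is added and collapses when the third is added, so the first two features form the selected subset and the third is treated as redundant.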
S208, obtaining the important influence indexes according to the feature subset.
Specifically, the important influence indexes obtained from the feature subset are: whether the subject rating was downgraded; the industry index daily average fluctuation, absolute value; the maximum rise of the industry index within the quarter; the bond rating; the difference between the highest and lowest values of the industry index within the quarter; the maximum fall of the industry index within the quarter; whether the bond rating was downgraded; the subject rating; the industry index daily average fluctuation; the difference from the asset-liability ratio in the previous financial report; the maximum rise of the median valuation within the quarter; the difference from "net operating cash flow / total liabilities" in the previous financial report; whether an overdue event occurred within the median valuation quarter; net operating cash flow / total liabilities; the difference from the net operating cash flow in the previous financial report; and the difference from the debt-to-capital ratio in the previous financial report.
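Tracing selected features back to the indexes that must be collected could look like the sketch below. The `feature_source` mapping, which groups several features under one source index, is a hypothetical illustration; the source does not give such a table.

```python
def important_indexes(feature_subset, feature_source):
    """Map selected features to their source indexes, keeping order and dropping repeats."""
    seen, result = set(), []
    for feature in feature_subset:
        index = feature_source[feature]
        if index not in seen:
            seen.add(index)
            result.append(index)
    return result

# Hypothetical mapping from feature name to the index it is derived from.
feature_source = {
    "DLD": "subject rating",
    "D_level": "subject rating",
    "C_level": "bond rating",
    "OCF/D": "net operating cash flow / total liabilities",
    "OCF/D_DIFF": "net operating cash flow / total liabilities",
}
indexes = important_indexes(["DLD", "C_level", "D_level", "OCF/D_DIFF", "OCF/D"], feature_source)
```

Under this assumed mapping, five selected features reduce to three indexes to collect, which is the kind of workload reduction the method aims at.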
In this embodiment, the 30 data features in the bond sample data set are ranked by importance, the feature subset that reaches the highest accuracy is determined with the classification model, and a large number of redundant features are removed from the data set. The important indexes that influence bond risk are thereby screened out, collection of unimportant index information is avoided, the information processing workload is reduced, and the efficiency of bond risk evaluation is improved.
Referring to FIG. 3, a flowchart of the steps of embodiment 3 of the method for evaluating bond risk influence indexes of the present invention is shown. The method may specifically include the following steps:
S301, acquiring a data source of the bond sample, wherein the data source comprises two or more items of index information and a default record of the bond sample.
S302, constructing the data characteristics according to the index information.
S303, constructing the classification target according to the default record.
S304, according to the data characteristics and the classification target, a decision tree is constructed, and a random forest is generated.
S305, calculating the importance of the data characteristics through the random forest.
S306, arranging the data features according to the sequence of the importance from high to low, and generating the importance ranking of the data features.
S307, adding the data features to the classification model one by one according to the importance sorting sequence, calculating corresponding accuracy, and selecting a feature subset reaching the highest accuracy; the classification model is obtained by training data of the bond samples by adopting an SVM algorithm, a random forest algorithm, a naive Bayes algorithm, a CART algorithm or a Bagging algorithm.
Specifically, the data features are added to the classification model one by one in the order of the importance ranking, the corresponding accuracy is calculated, and the feature subset that reaches the highest accuracy is selected; here the classification model is obtained by training on the bond sample data with an SVM algorithm. The SVM algorithm is a machine learning method developed on the basis of statistical learning theory; it follows the principle of structural risk minimization and shows particular advantages in small-sample, nonlinear and high-dimensional pattern recognition problems. The SVM classification model can perform classification prediction from the data features of the bond samples.
S308, obtaining the important influence indexes according to the feature subset.
In this embodiment, the feature subset that reaches the highest accuracy is determined with the SVM classification model, so the features that are effective for the SVM classification model can be screened out while the classification capability is preserved. The important indexes that influence bond risk are thereby screened out, and the efficiency of SVM-based bond risk evaluation is improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method for evaluating bond risk influence indexes provided by the invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and its core idea. For a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the invention. In summary, the content of this specification should not be construed as a limitation of the invention.

Claims (5)

1. A method of assessing a bond risk impact indicator, comprising:
acquiring a data source of the bond sample;
constructing data characteristics and classification targets according to the data sources;
calculating the importance of the data characteristics by adopting a random forest algorithm, and generating importance ranking of the data characteristics;
according to the importance sorting sequence, adding the data features to the classification model one by one, calculating corresponding accuracy, and selecting a feature subset reaching the highest accuracy;
and obtaining an important influence index according to the feature subset.
2. The method of claim 1, wherein the acquiring a data source of the bond sample comprises: acquiring a data source of the bond sample, wherein the data source comprises two or more items of index information and a default record of the bond sample.
3. The method of claim 2, wherein constructing data features and classification targets from the data sources comprises:
constructing the data characteristics according to the index information;
and constructing the classification target according to the default record.
4. The method of claim 3, wherein the calculating the importance of the data features using a random forest algorithm, and wherein generating the importance ranking of the data features comprises:
constructing a decision tree according to the data characteristics and the classification target, and generating a random forest;
calculating the importance of the data features through the random forest;
and arranging the data features according to the sequence of the importance from high to low to generate the importance ranking of the data features.
5. The method of claim 1, wherein the classification model is obtained by training data of the bond sample using an SVM algorithm, a random forest algorithm, a naive Bayes algorithm, a CART algorithm, or a Bagging algorithm.
CN202010464996.7A 2020-05-28 2020-05-28 Method for evaluating bond risk influence indexes Pending CN111612627A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010464996.7A CN111612627A (en) 2020-05-28 2020-05-28 Method for evaluating bond risk influence indexes

Publications (1)

Publication Number Publication Date
CN111612627A true CN111612627A (en) 2020-09-01

Family

ID=72201750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010464996.7A Pending CN111612627A (en) 2020-05-28 2020-05-28 Method for evaluating bond risk influence indexes

Country Status (1)

Country Link
CN (1) CN111612627A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112419030A (en) * 2020-11-30 2021-02-26 北京安九信息技术有限公司 Method, system and equipment for evaluating financial fraud risk
CN112419030B (en) * 2020-11-30 2023-06-27 北京安九信息技术有限公司 Method, system and equipment for evaluating financial fraud risk
CN113112370A (en) * 2021-04-19 2021-07-13 上海同态信息科技有限责任公司 Debt credit assessment method based on SVM algorithm model
CN114721835A (en) * 2022-06-10 2022-07-08 湖南工商大学 Method, system, device and medium for predicting energy consumption of edge data center server
CN115409613A (en) * 2022-09-13 2022-11-29 中债金科信息技术有限公司 Bond risk detection model training method and bond risk detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination