WO2015070314A1 - Supervised credit classifier with accounting ratios - Google Patents

Supervised credit classifier with accounting ratios

Info

Publication number
WO2015070314A1
Authority
WO
WIPO (PCT)
Prior art keywords
training data
accounting
distance
vector
classification
Prior art date
Application number
PCT/CA2013/050865
Other languages
French (fr)
Inventor
Yong Liu
Original Assignee
Yong Liu
Priority date
Filing date
Publication date
Application filed by Yong Liu filed Critical Yong Liu
Priority to PCT/CA2013/050865 priority Critical patent/WO2015070314A1/en
Publication of WO2015070314A1 publication Critical patent/WO2015070314A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Definitions

  • Labelled training data generated in both manners can be used in a supervised classification.
  • Referring to FIG. 5, a representative process of classification based on supervised training. The process involves an induction phase and a deduction phase.
  • The objective of induction is to build as much coverage as possible with the training data.
  • The objective of deduction is to match classification targets to training data labels with precision.
  • A typical supervised learning process starts with input of labelled training data.
  • Training data is sent to the input in step 1.
  • Step 2 is supervised training, which may label the training data with rated companies or with defined ratings.
  • A classification model is generated in step 3.
  • The process of induction involves step 1, step 2, and step 3.
  • The process of deduction involves step 3 and step 4. The classification model derived in step 3 is applied to unlabelled target data in step 4.
  • A label is generated for the target in step 5 through matching.
  • Supervised learning enables users to focus on deep understanding of a small group of corporate credits while maintaining coverage of a much larger credit world.
  • the small group can be used as labelled training data.
  • the larger credit world can be approximated with training data.
  • Vector distance provides a necessary measure for the matching process.
  • The Euclidean distance measures multi-dimensional distance through the Pythagorean formula. For two vectors of accounting ratios
  • P = [p1, p2, ..., pn]
  • Q = [q1, q2, ..., qn]
  • the distance is the square root of the sum of squares of differences between corresponding accounting ratios of the vectors involved.
  • Equation 4. The Euclidean Distance:

        d(P, Q) = sqrt( (p1 - q1)² + (p2 - q2)² + ... + (pn - qn)² )
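As a concrete illustration of Equation 4 (a sketch, not part of the original disclosure; the ratio values are invented):

```python
import math

def euclidean_distance(p, q):
    # Eq. 4: square root of the sum of squared differences between
    # corresponding accounting ratios of the two vectors.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical ratio vectors for two companies (illustrative values only).
p = [0.8, 1.2, 0.3]
q = [0.5, 1.0, 0.1]
print(round(euclidean_distance(p, q), 4))  # → 0.4123
```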
  • Example 1 is a table of representative values of Euclidean distances between four pharmaceutical companies.
  • Company a is rated AAA.
  • The Euclidean distances between companies b, c, d and company a indicate that they are also high credit. This is separately confirmed by the fact that all of them are rated above A by rating agencies. Note the difference between an overall measurement and an individual ratio measurement. Individually, the current debt to total liabilities ratio of company a is significantly above the training data average. The training data contains more than 2000 samples.
  • High current debt implies a high likelihood of liquidity problems. From a narrow current debt to total liabilities point of view, company a may not qualify for AAA. However, overall distance
  • Absolute distance is also used to measure distance. Absolute distance is also called the Manhattan distance: the sum of absolute differences between corresponding accounting ratios.
  • Equation 5. The Absolute Distance:

        d(P, Q) = |p1 - q1| + |p2 - q2| + ... + |pn - qn|
  • Example 2 is a table of representative values of absolute distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations.
  • the absolute distances between company d and companies a, b, c are 7.2, 7.15, and 8.4 respectively.
  • median distance between d and other data in the training space is 17.1.
  • The values for absolute distance are greater than those for Euclidean distance for the same reason the sum of two sides of a triangle is greater than the third.
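A minimal sketch of the absolute distance, with the comparison against the Euclidean distance made explicit; the vector values are invented:

```python
import math

def euclidean_distance(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def absolute_distance(p, q):
    # Sum of absolute differences between corresponding ratios.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.8, 1.2, 0.3]
q = [0.5, 1.0, 0.1]
# For the same pair of vectors the absolute distance is never smaller
# than the Euclidean distance (the triangle-inequality argument above).
assert absolute_distance(p, q) >= euclidean_distance(p, q)
print(round(absolute_distance(p, q), 4))  # → 0.7
```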
  • the Euclidean distance and absolute distance measure overall difference between two vectors. By measuring the overall difference, supervised classification presents a new perspective for analyzing accounting data.
  • Both Euclidean distance and absolute distance provide a measurement of the degree to which vector A is identical to vector B. In many cases, people are interested in how the accounting ratios of company A resemble those of company B on an overall basis. The emphasis is the relative pattern. The difference in overall similarity is captured by similarity distance.
  • Equation 6. The Similarity Distance:

        d(P, Q) = 1 - [ Σ (pi - pa)(qi - qa) ] / [ sqrt( Σ (pi - pa)² ) * sqrt( Σ (qi - qa)² ) ]

    where pa and qa are the average values of pi and qi.
  • Similarity is measured in terms of the Pearson coefficient of correlation. Instead of computing the difference between vectors, accounting ratios are computed in terms of variation relative to their mean values. Accounting ratios pi and qi are components of separate vectors. Values pa and qa are the mean values of pi and qi. The distance d is one minus the Pearson coefficient of correlation. Similarity does not guarantee small Euclidean distance.
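Equation 6 can be sketched directly; the example checks the point just made, that a vector is perfectly similar to a scaled copy of itself even though their Euclidean distance is non-zero (values invented):

```python
import math

def similarity_distance(p, q):
    # Eq. 6: one minus the Pearson coefficient of correlation.
    n = len(p)
    pa, qa = sum(p) / n, sum(q) / n
    cov = sum((pi - pa) * (qi - qa) for pi, qi in zip(p, q))
    sp = math.sqrt(sum((pi - pa) ** 2 for pi in p))
    sq = math.sqrt(sum((qi - qa) ** 2 for qi in q))
    return 1.0 - cov / (sp * sq)

p = [0.2, 0.5, 0.9]
doubled = [2 * x for x in p]  # same pattern, different magnitude
# Perfect correlation gives similarity distance 0 (up to rounding).
assert abs(similarity_distance(p, doubled)) < 1e-9
```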
  • Example 3. Representative Values of Similarity Distance:

            a       b       c       d
        a   0       0.02    0.05    0.05
        b   0.02    0       0.09    0.07

  • Example 3 is a table of representative values of similarity distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations.
  • Zone 1 indicates a similarity distance of 0 ≤ d ≤ 0.05. There are 8 high credit and 4 medium credit companies in zone 1.
  • Zone 2 indicates
  • Training data ambiguity also happens in handwriting recognition.
  • The objective of the method detailed here is to provide reasonable classification given ambiguity in labelled training data.
  • Ambiguity in training data can also happen when it is derived from expounded creditworthiness definitions.
  • A subset of accounting ratios is involved in grouping the training data. It is likely that the subset does not cover all the information contained in a vector of accounting ratios. When measured with a distance involving the full set of ratios, discrepancy ensues. Ambiguity in training data is a common phenomenon in supervised training. Both the induction algorithm and the deduction algorithm have to keep this phenomenon in mind.
  • Referring to FIG. 7, a representative example of a process of clustering labelled training data through the K-means algorithm. Given N labelled training data, the algorithm groups them into a specified number K of clusters (K < N) based on a distance measurement. After the input K is given in STP 1, the algorithm works in a loop comprising STP 2, STP 3, STP 4 and STP 5. STP 2 determines centers for each cluster based on existing information. STP 3 computes distances between each training data and the centers.
  • STP 4 groups training data based on minimum distance to the centers.
  • STP 5 compares the clusters generated during the present iteration with those generated during the previous iteration. For the clusters to compare as equal, the number of clusters has to be equal and the constituents of each cluster have to be the same as those of the corresponding cluster in the previous iteration. When the two groups are identical, no training data moves between clusters. If the clusters between the two iterations are not identical, execution goes back to STP 2 to continue the loop. If the clusters are identical, the rating of each cluster is determined by a majority of the constituents in the cluster in STP 6. In STP 7, both the rating and the center of each cluster are presented in the output. The center of a cluster is a representative attribute of the cluster.
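The STP 2 to STP 5 loop can be sketched as follows; the toy data and the fixed seed are assumptions for illustration, not the patented implementation:

```python
import random

def k_means(data, k, max_iters=100):
    # STP 1: K is given. Initial centers are picked from the data.
    random.seed(0)  # deterministic for the illustration
    centers = random.sample(data, k)
    assignment = None
    for _ in range(max_iters):
        # STP 3 / STP 4: assign each vector to the nearest center.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in data
        ]
        # STP 5: stop when no training data moves between clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # STP 2: recompute the center of each cluster.
        for c in range(k):
            members = [v for v, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Two well-separated toy "ratio" clusters.
data = [[0.1, 0.1], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]
centers, assignment = k_means(data, 2)
print(assignment)
```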
  • The gist of the demotion-promotion algorithm is to produce a number of alternative results in order to achieve a majority.
  • The first result is obtained by counting the original vote.
  • The second result is obtained by counting the vote with the top level demoted to the second level.
  • The third result is obtained by counting the vote with the lowest level promoted to the second lowest level. Note that this algorithm does not cover all tie situations mathematically. If all three counts fail to achieve a majority, no class can be chosen from this cluster of training data. This happens when the training data is highly chaotic.
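The three counts above can be sketched as one function; the function name and label set are assumptions, and a count without a strict majority simply yields no result:

```python
from collections import Counter

def demotion_promotion_vote(votes, levels):
    # `levels` orders the classes from highest to lowest credit.
    def strict_majority(vs):
        label, count = Counter(vs).most_common(1)[0]
        return label if 2 * count > len(vs) else None

    # First result: the original vote.
    result = strict_majority(votes)
    if result is not None:
        return result
    # Second result: top level demoted to the second level.
    demoted = [levels[1] if v == levels[0] else v for v in votes]
    result = strict_majority(demoted)
    if result is not None:
        return result
    # Third result: lowest level promoted to the second lowest level.
    promoted = [levels[-2] if v == levels[-1] else v for v in votes]
    return strict_majority(promoted)  # may still be None (chaotic data)

votes = ["high", "medium", "high", "medium", "low"]
print(demotion_promotion_vote(votes, ["high", "medium", "low"]))  # → medium
```

Here the original vote has no majority (2-2-1), but demoting "high" to "medium" yields a 4-of-5 majority.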
  • K is normally an odd integer to prevent a tie. Classification based on the K nearest neighbors is best suited for situations where the training data is clustered and the clusters are ordered according to the label to be classified.
  • Classification based on the K-nearest-neighbor algorithm can be unstable when clusters are not ordered according to the label to be classified.
  • In FIG. 6 there are 22 high credit and 16 medium credit training data. Therefore it is possible that a target classified as high credit based on a majority of K nearest neighbors would be reclassified as medium credit if K+2 were chosen. Both borrowers and investors have large stakes in the result of classification. Sensitivity to a parameter is unacceptable.
  • Step 8-1 computes the distance between the classification target and the instances of training data.
  • K training data are chosen at step 8-4-1. Classification based on K training data is obtained at step 8-5-1. K+2 training data are chosen at step 8-4-2. Classification based on K+2 training data is obtained at step 8-5-2. The results of 8-5-1 and 8-5-2 are compared at step 8-6. If
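The FIG. 8 consistency check can be sketched as follows; returning None for a disagreement is an assumption about how instability is signalled:

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stable_knn(target, training, k):
    # Steps 8-1 / 8-2: compute and sort distances to all training data.
    ranked = sorted((euclidean(target, vec), label) for vec, label in training)
    majority = lambda rows: Counter(lbl for _, lbl in rows).most_common(1)[0][0]
    # Steps 8-4-1 / 8-5-1 and 8-4-2 / 8-5-2: classify with K and with K+2.
    label_k = majority(ranked[:k])
    label_k2 = majority(ranked[:k + 2])
    # Step 8-6: accept the label only when both choices of K agree.
    return label_k if label_k == label_k2 else None

training = [([0.0, 0.0], "high"), ([0.1, 0.0], "high"), ([0.2, 0.1], "high"),
            ([5.0, 5.0], "medium"), ([5.1, 5.0], "medium")]
print(stable_knn([0.05, 0.05], training, k=3))  # → high
```

A target near the small "medium" cluster gets "medium" for K=3 but "high" for K=5, so the function reports instability instead of a label.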
  • The naive Bayes classifier is an inference algorithm developed from Bayes' theorem.
  • The naive Bayes classifier formulates the process of classification given a plurality of features.
  • Classification is determined by choosing the largest posterior. Referring to Equation 7, an expression of the posterior under the naive Bayes condition. The posterior P(Y | x1, x2, ..., xn) is proportional to the product of the likelihood P(x1, x2, ..., xn | Y) and the prior P(Y). Under the naive Bayes condition the likelihood equals the product of the individual P(xi | Y). The denominator P(x1, x2, ..., xn) is a normalizing constant.
  • Equation 7. An Expression of the Posterior Under the Naive Bayes Condition:

        P(Y | x1, ..., xn) = P(Y) * [ P(x1 | Y) * P(x2 | Y) * ... * P(xn | Y) ] / P(x1, ..., xn)

    where P(x1 | Y) * ... * P(xn | Y) is the likelihood under the naive Bayes condition.
  • K is the number of nearest neighbors.
  • K_Y is the number of instances belonging to class Y.
  • The prior P(Y) is computed by dividing K_Y by K.
  • Equation 8. A Mathematical Expression for the Prior:

        P(Y) = K_Y / K
  • The likelihood is evaluated with the probability density function of a normal distribution.
  • An accounting ratio is represented by xi.
  • The probability density function is f(xi).
  • Parameters μ and σ are computed from the sample population of the training data. It is not necessary to compute all accounting ratios of a vector because the number of factors
  • Equation 9. The Probability Density Function of a Normal Distribution of an Accounting Ratio:

        f(xi) = ( 1 / (σ * sqrt(2π)) ) * exp( -(xi - μ)² / (2σ²) )
  • The likelihood product is accumulated iteratively, e.g. prod = prod * P(xi | Y).
  • Step 9-1 computes the distance between the classification target and the instances of training data. Elements of the distance array are mapped to training data and sorted in step 9-2. An odd integer K is specified at step 9-3. A total of K nearest training data is selected in step 9-4. Step 9-5 computes the prior based on the K nearest neighbors. Step 9-6 computes the likelihood product based on the overall training data. Step 9-7 computes the posterior based on the product of the prior and the likelihood product. Step 9-8 determines the maximum of the posterior and the class associated with that maximum. The result is sent to the output at step 9-9.
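The FIG. 9 pipeline can be sketched end-to-end; the per-class Gaussian fit and K-nearest prior follow Equations 7 to 9, while the data, class names, and the sigma guard are invented for illustration:

```python
import math
from collections import Counter

def gauss_pdf(x, mu, sigma):
    # Eq. 9: normal probability density of one accounting ratio.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def naive_bayes_on_neighbors(target, training, k):
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Steps 9-1 / 9-2 / 9-4: sort training data by distance, keep K nearest.
    ranked = sorted(training, key=lambda t: dist(target, t[0]))
    counts = Counter(label for _, label in ranked[:k])   # K_Y per class
    best_label, best_post = None, -1.0
    for label, k_y in counts.items():
        members = [vec for vec, lbl in training if lbl == label]
        posterior = k_y / k                              # Eq. 8: prior P(Y)
        for i in range(len(target)):
            col = [m[i] for m in members]
            mu = sum(col) / len(col)
            sigma = math.sqrt(sum((c - mu) ** 2 for c in col) / len(col)) or 1e-9
            posterior *= gauss_pdf(target[i], mu, sigma)  # Eq. 7 product
        if posterior > best_post:
            best_label, best_post = label, posterior
    # Step 9-8: the class with the maximum posterior.
    return best_label

training = [([0.10], "medium"), ([0.20], "medium"), ([0.15], "medium"),
            ([0.90], "high"), ([1.00], "high"), ([0.95], "high")]
print(naive_bayes_on_neighbors([0.93], training, k=3))  # → high
```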
  • the method disclosed improves reliability of the final classification.
  • The spatial distribution of the classified targets reflects that of the training data. In this way, a much larger target space can be covered.
  • the method disclosed is a tool enabling users to handle a larger corporate credit space with understanding of a small sized training data.
  • Instance based classification can be inefficient.
  • Referring to FIG. 10, a flow chart of a class based classification process.
  • Classes of training data were generated during the training phase based on clustering through the K-means algorithm. This can be done before classification.
  • The classes are loaded at the time of classification at step 10-1.
  • The distances between the target and the centers of the clusters are computed at step 10-3.
  • Each distance is mapped to its class at step 10-4.
  • The minimum of the distances is chosen at step 10-5.
  • The corresponding class is used to classify the target in step 10-6.
  • a hash table type of data structure can be used for mapping purposes.
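The class-based deduction of FIG. 10 can be sketched with a Python dict as the hash table; the cluster names, centers, and ratings are invented stand-ins for the output of the K-means training phase:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Classes produced ahead of time by the training phase; a hash table
# (Python dict) maps each class to its center and rating.
classes = {
    "cluster_0": {"center": [1.8, 0.9], "rating": "high credit"},
    "cluster_1": {"center": [0.4, 0.1], "rating": "medium credit"},
}

def classify(target):
    # Steps 10-3 to 10-6: distance to each center, pick the minimum,
    # and return the rating of the corresponding class.
    nearest = min(classes, key=lambda name: euclidean(target, classes[name]["center"]))
    return classes[nearest]["rating"]

print(classify([1.7, 1.0]))  # → high credit
```

Because only the cluster centers are compared at classification time, the cost per target is proportional to the number of classes rather than the number of training instances.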

Abstract

The method disclosed details classification of creditworthiness of a company based on knowledge of existing classifications. Balance sheet, income statement, and cash flow data are concatenated. A vector of accounting ratios is derived from the concatenation. Distance between a target to be classified and a plurality of training data is computed on a plurality of measurements. The target is classified based on the nearest neighbors.

Description

TITLE OF INVENTION
SUPERVISED CREDIT CLASSIFIER WITH ACCOUNTING RATIOS
TECHNICAL FIELD
[0001] This disclosure relates generally to the field of processing corporate accounting data contained in balance sheet, income statement, and cash flow, and more particularly to using accounting ratios to classify creditworthiness of a corporation based on artificial intelligence.
SUMMARY OF INVENTION
Technical Problem
[0002] In the wake of the Great Depression of the last century, governments around the world mandated disclosure of accounting information for public companies. A goal was to enable the public to evaluate creditworthiness based on the disclosed information. Accounting data and accounting ratios have been used for credit assessment. However, selective use of accounting data has been a long-running deficiency. Attention has been focused on a minority of accounting data such as net income. Most of the accounting data are ignored even when they are disclosed to bring public attention. Furthermore, as credit situations differ from one individual to another, no statically chosen subset of accounting data or ratios truly represents corporate credit in a universal manner. As a result, credit classification by rating agencies remains partially quantitative.
Solution to Problem
[0003] The method disclosed analyzes accounting data as a whole by applying techniques of artificial intelligence. Balance sheet, income statement, and cash flow are concatenated to form a vector of accounting ratios. Different types of distances are used to study unknown vectors based on known vectors through supervised learning. Known vectors are called labelled training data in supervised learning. The method disclosed classifies vectors of accounting ratios in the presence of ambiguity in training data. An overlaid classification approach is adopted for highly clustered training data. A naive Bayes classifier is applied when clustering is weak in training data. A class based deduction approach is adopted for real time processing.
Advantageous Effects of Invention
[0004] A vector of accounting ratios embodies the full accounting data. This approach represents a new perspective of accounting data analysis. Instead of focusing on a narrow set of ratios, this approach focuses on overall aspects such as the pattern of the data. The traditional subset-of-ratios approach discovers a problem when an individual ratio is wrong. This approach aims at discovering when the pattern of the full accounting data is problematic. Classification enables us to make assertions such as that the overall accounting data indicates health even though an individual aspect is problematic.
[0005] The method developed here enables people to independently explore a larger credit world based on solid understanding of a small set of corporations. Through classification, an unknown corporate credit is delineated with the closest approximations among the small set. By leveraging the technique presented, people gain understanding of many more corporate credits without having to investigate individual accounting data in a repetitive manner.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The various embodiments, features and advances of the present invention will be understood more completely hereinafter as a result of a detailed description thereof in which reference will be made to the following drawings:
FIG. 1 is an exemplary representation of a process of concatenating balance sheet, income statement and cash flow to form a vector of accounting data;
FIG. 2 is an exemplary representation of a process of converting a vector of accounting data to a vector of accounting ratios;
FIG. 3 is an exemplary representation of dividing sample range into a plurality of intervals;
FIG. 4 is an exemplary representation of organizing labelled training data in a tree structure;
FIG. 5 is an exemplary representation of a process of classification based on supervised training;
FIG. 6 is an example of commingling of different ratings;
FIG. 7 is an example of a process of clustering labelled training data through the K- means algorithm;
FIG. 8 is an illustration of a process of modified K - nearest neighbor algorithm;
FIG. 9 is an illustration of a process of applying a naive Bayes classifier on nearest neighbors;
FIG. 10 is a flow chart of a class based classification process.
DESCRIPTION OF EMBODIMENTS
[0007] Balance sheet, income statement, and cash flow are separate accounting data because they reflect different aspects of corporate finance. Traditionally, individual data is extracted from each of them for analytical purposes. Much attention has been given to particular accounting data such as total revenue or particular accounting ratios such as the debt ratio. While providing deep insight individually, these individual data are not sufficient to summarize overall credit by themselves. Selective combinations of accounting ratios have been studied. However, as diversity prevails because of different operating environments and credit backgrounds, it is hard to find a universally applicable set to define overall credit.
[0008] Contrary to the prevailing tendency of slicing disclosed information, separate accounting data are combined to form concatenated accounting data in this method. Referring to FIG. 1, an exemplary representation of a process of concatenating balance sheet, income statement and cash flow to form a vector of accounting data. The balance sheet is represented as 11. The income statement is represented as 12. The cash flow is represented as 13. The vector formed by combining 11, 12, and 13 is represented as 14. This vector contains all the information we need to evaluate the credit and operation of a company.
[0009] Vectors generated this way often suffer from a lack of comparability. To address this issue, every element in the vector is further divided by an accounting data element from the vector. Referring to FIG. 2, an exemplary representation of a process of converting a vector of accounting data to a vector of accounting ratios. All elements in an accounting data vector are divided by the total liabilities. The total liabilities is itself an element of the vector. The resulting vector is a single set of accounting ratios. The vector of accounting data is 21. The vector of accounting ratios is 22. Elements of 22 are elements of 21 divided by total liabilities. Every element is normalized on a per total liabilities basis. Alternatively, if every element is divided by the total revenue, they are normalized on a per total revenue basis. Concatenation and division by an accounting data element make application of artificial intelligence techniques possible.
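The FIG. 1 / FIG. 2 steps can be sketched as follows; the field names and amounts are invented placeholders for a real concatenated vector:

```python
# Concatenated accounting-data vector (14): balance sheet, income
# statement, and cash flow fields joined into one mapping.
accounting_vector = {
    "total_assets": 500.0,        # balance sheet (11)
    "total_liabilities": 200.0,   # balance sheet (11), the normalizing element
    "total_revenue": 350.0,       # income statement (12)
    "net_income": 40.0,           # income statement (12)
    "operating_cash_flow": 60.0,  # cash flow (13)
}

# Divide every element by total liabilities to obtain the ratio vector (22).
denom = accounting_vector["total_liabilities"]
ratio_vector = {name: value / denom for name, value in accounting_vector.items()}

print(ratio_vector["total_assets"])       # → 2.5
print(ratio_vector["total_liabilities"])  # → 1.0
```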
[0010] A derived situation in the operations of concatenation and division is to apply weighting in the process. Weighting is sometimes needed to merge accounting ratios for companies from different countries with slightly different reporting standards.
[0011] Equation 1. Representing Accounting Ratios as a Row Vector:

    R = [ r1  r2  ...  rn ]    (Eq. 1)

[0012] Equation 2. Representing Weightings as a Diagonal Matrix:

    W = diag( w1, w2, ..., wn )    (Eq. 2)

[0013] Equation 3. The Product RW is a Row Vector:

    RW = [ r1*w1  r2*w2  ...  rn*wn ]    (Eq. 3)
[0014] Equation 1 is an expression of the accounting ratios in row vector form. Accounting ratios r1, r2, ..., rn derived from the full accounting data are represented as R = [r1, r2, ..., rn]. Equation 2 is an expression of weightings in the form of a diagonal matrix. Weightings w1, w2, ..., wn are the diagonal elements of the matrix. All non-diagonal elements are zero. The matrix product RW yields a row vector, as expressed in Equation 3. In this way, a vector of weighted accounting ratios is expressed as [r1w1, r2w2, ..., rnwn].
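Because W is diagonal, the product RW in Equation 3 reduces to an element-wise product; a small sketch with invented values:

```python
# R: row vector of accounting ratios; w: the diagonal of W.
r = [2.5, 1.0, 1.75]   # e.g. assets, liabilities, revenue per total liabilities
w = [1.0, 0.8, 2.0]    # hypothetical cross-country adjustment weights

# RW = [r1*w1, r2*w2, ..., rn*wn]  (Eq. 3)
rw = [ri * wi for ri, wi in zip(r, w)]
print(rw)  # → [2.5, 0.8, 3.5]
```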
[0015] Vectors generated in this manner form the basis of supervised learning. By definition, supervised learning is to infer from a plurality of labelled training data. A plurality of vectors of accounting ratios serves the purpose of the labelled training data. Classification targets are also vectors of accounting ratios. Labelled training data is derived from the definitions of creditworthiness.
[0016] Definitions of creditworthiness are essentially the same across rating agencies. For example, a rating agency defines the rating scale AA as "very strong capacity for payment of financial commitments. This capacity is not significantly vulnerable to foreseeable events". Note that accounting ratios are not explicitly related in the definition. Under the circumstances, there are two ways to group labelled training data.
[0017] The first way is to reuse accounting ratios of rated companies. This approach accepts accounting ratios of rated companies as the best representation of the definition of ratings. A set of accounting ratios for an AA rated company is thus an instantiation of "very strong capacity for payment of financial commitments. This capacity is not significantly vulnerable to foreseeable events". Labelled training data comprises a plurality of vectors derived from rated companies.
[0018] The second way is to connect the definition of creditworthiness and labelled training data by expounding the definition with accounting ratios. Expounding the definition involves delegating the concept of "capacity for payment of financial commitments" to a number of factors. These factors may involve total equity to total liabilities, free cash flow to total liabilities, current assets to total liabilities, etc.
[0019] The process of delegating a qualitative definition to a plurality of accounting ratios requires human intervention. In order to represent the capacity, the sample range of the corresponding ratios must be divided into a number of intervals. Division of sample data must take into account both the logical range and the histogram of the sample data. Referring to FIG. 3, a representative process of dividing a sample range into a plurality of intervals. A histogram 31 is used to infer divisions across the overall range. The division result 32 is a combination of logical range and population distribution as evidenced from 31.
[0020] Referring to FIG. 3, the histogram of total equity to total liabilities ratios for the sample data indicates that the population with negative equity is low. As a result, zero equity would be an adequate demarcation for situations where there are more than three levels of distinction, but it is not appropriate for the present situation. A combined consideration involving both the population in the histogram and the logical range picks 0.5 as the demarcation between the low equity ratio and the medium equity ratio. The same considerations put the demarcation between the high equity ratio and the medium equity ratio at 1.5. Similar histograms can be produced for other ratios, including the free cash flow to total liabilities ratio. Free cash flow is a measurement of cash generated from business operations.
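The division just described can be sketched as a small helper that maps a ratio to its interval. This is a minimal sketch, not part of the disclosure: the function name is illustrative, and the 0.5 and 1.5 demarcations are taken from the FIG. 3 discussion; in practice they would be chosen from the histogram and logical range of the sample data.

```python
def equity_ratio_level(ratio, low=0.5, high=1.5):
    """Map a total equity to total liabilities ratio to an interval label.

    The 0.5 and 1.5 demarcations follow the FIG. 3 discussion; actual
    thresholds depend on the histogram of the sample data.
    """
    if ratio < low:
        return "low"
    if ratio < high:
        return "medium"
    return "high"
```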
[0021] Ratios representing payment capacity may be organized into a tree structure. Referring to FIG. 4, a representative organization of labelled training data in a tree structure. In FIG. 4, each of the three levels of total equity to total liabilities ratios is further divided by three levels of free cash flow to total liabilities ratios. Companies with a high equity ratio may have a low, medium, or high free cash flow ratio. Note that the choice of three intervals serves an illustrative purpose. The actual number may vary depending on the sample data and the objective of classification. The choice of the equity ratio and the free cash flow ratio is also made for an illustrative purpose. Practical choices of features are not limited to these two types of ratio.
[0022] The leaves of the tree generated in this manner correspond to a plurality of credit situations. The leaf 41 in FIG. 4 corresponds to a high equity ratio and a high free cash flow ratio. The leaf 42 in FIG. 4 corresponds to a high equity ratio and a medium free cash flow ratio. Expounding of the definition of creditworthiness is completed by mapping the combination of ratios in each leaf to a definition. For example, the leaf 41 in FIG. 4 can be mapped to "very strong capacity for payment of financial commitments, and this capacity is not significantly vulnerable to foreseeable events". More than one instance of training data can be placed under a leaf. The human intervention described here can be included in a design. No human intervention is needed once the method is implemented.
[0023] Labelled training data generated in both manners can be used in a supervised classification. Referring to FIG. 5, a representative process of classification based on supervised training. The process involves an induction phase and a deduction phase. The objective of induction is to build as much coverage as possible with the training data. The objective of deduction is to match classification targets to training data labels with precision.
[0024] A typical supervised learning process starts with input of labelled training data. Referring to FIG. 5, training data is sent to the input in step 1. Step 2 is supervised training, which may label the training data with rated companies or with defined ratings. A classification model is generated in step 3. The process of induction involves steps 1, 2, and 3. The process of deduction involves steps 3, 4, and 5. The classification model derived in step 3 is applied to unlabelled target data in step 4. A label is generated for the target in step 5 through matching. Supervised learning enables users to focus on deep understanding of a small group of corporate credits while maintaining coverage of a much larger credit world. The small group can be used as labelled training data. The larger credit world can be approximated with the training data.
[0025] Vector distance provides a necessary measure for the matching process. The Euclidean distance measures multi-dimensional distance through the Pythagorean formula. For vectors of accounting ratios P = [p1, p2, ..., pn] and Q = [q1, q2, ..., qn], the distance is the square root of the sum of squares of differences between corresponding accounting ratios of the vectors involved.

[0026] Equation 4. The Euclidean Distance

d(P, Q) = sqrt( (p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2 )   (Eq. 4)

[0027] Example 1. Representative Values of Euclidean Distance

      a     b     c     d
a     0     0.8   1.23  1.28
b     0.8   0     1.7   1.5
c     1.23  1.7   0     1.56
d     1.28  1.5   1.56  0
[0028] Referring to Example 1, a table of representative values of Euclidean distances between four pharmaceutical companies. Company a is rated AAA. The Euclidean distances between companies b, c, d and company a indicate that they are also high credits. This is separately confirmed by the fact that all of them are rated above A by rating agencies. Note the difference between an overall measurement and an individual ratio measurement. Individually, the current debt to total liabilities ratio of company a is significantly above the training data average. The training data contains more than 2000 samples. High current debt implies a high likelihood of liquidity problems. From a narrow current debt to total liabilities point of view, company a may not qualify for AAA. However, the overall distance measurement indicates that the credit quality of these companies is close. Therefore, company a is a strong credit.
[0029] Absolute distance is also used to measure distance. Absolute distance is also called Manhattan distance. For vectors of accounting ratios P = [p1, p2, ..., pn] and Q = [q1, q2, ..., qn], the distance is the sum of absolute values of differences between corresponding accounting ratios of the vectors involved.

[0030] Equation 5. The Absolute Distance

d(P, Q) = |p1 - q1| + |p2 - q2| + ... + |pn - qn|   (Eq. 5)
[0031] Referring to Example 2, a table of representative values of absolute distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations. The absolute distances between company d and companies a, b, c are 7.2, 7.15, and 8.4 respectively. For perspective, the median distance between d and the other data in the training space is 17.1. The values for absolute distance are greater than for Euclidean distance for the same reason the sum of two sides of a triangle is greater than the third.
[0032] Example 2. Representative Values of Absolute Distance

      a     b     c     d
a     0     4.8   6.8   7.2
b     4.8   0     8.5   7.15
c     6.8   8.5   0     8.4
d     7.2   7.15  8.4   0
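Equation 5 can be sketched just as directly; as before, this is an illustrative snippet with an assumed function name. The second assertion-style property it satisfies is the one noted above: for any pair of vectors the absolute distance is at least as large as the Euclidean distance.

```python
def absolute_distance(p, q):
    """Absolute (Manhattan) distance (Eq. 5) between two ratio vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must contain the same number of ratios")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```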
[0033] The Euclidean distance and the absolute distance measure the overall difference between two vectors. By measuring the overall difference, supervised classification presents a new perspective for analyzing accounting data.

[0034] Both Euclidean distance and absolute distance provide a measurement of the degree to which vector A is identical to vector B. In many cases, people are interested in how the accounting ratios of company A resemble those of company B on an overall basis. The emphasis is the relative pattern. The difference in overall similarity is captured by similarity distance.

[0035] Equation 6. The Similarity Distance

d(P, Q) = 1 - [ Σ (pi - pa)(qi - qa) ] / sqrt( Σ (pi - pa)^2 · Σ (qi - qa)^2 )   (Eq. 6)

where pa, qa are the average values of pi and qi.
[0036] Similarity is measured in terms of the Pearson coefficient of correlation. Instead of computing the difference between vectors, accounting ratios are computed in terms of variation relative to their mean values. Accounting ratios pi and qi are components of separate vectors. Values pa and qa are the mean values of pi and qi. The distance d is one minus the Pearson coefficient of correlation. Similarity does not guarantee a small Euclidean distance.
[0037] Example 3. Representative Values of Similarity Distance

      a     b     c     d
a     0     0.02  0.05  0.05
b     0.02  0     0.09  0.07
c     0.05  0.09  0     0.08
d     0.05  0.07  0.08  0
[0038] Referring to Example 3, a table of representative values of similarity distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations. This is independently confirmed by the small similarity distances. Similarity distance measures overall distance in terms of the correlation of accounting ratios. Judged by individual ratios, some may not qualify for strong capacity. The pharmaceutical sector may not require holding a large sum of cash to cover liabilities. However, the cash to total liabilities ratios of companies c and d are significantly below the sample average of the training data. This example again demonstrates that measuring overall accounting ratios, rather than focusing on an individual number, offers a new perspective.
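Equation 6 can be sketched as one minus the Pearson correlation; the function name is again an assumption. The test case also illustrates the remark in [0036] that similarity does not guarantee a small Euclidean distance: a vector and a ten-fold multiple of it correlate perfectly, so their similarity distance is zero even though their Euclidean distance is large.

```python
import math

def similarity_distance(p, q):
    """Similarity distance (Eq. 6): one minus the Pearson correlation."""
    n = len(p)
    pa, qa = sum(p) / n, sum(q) / n              # mean values pa and qa
    cov = sum((pi - pa) * (qi - qa) for pi, qi in zip(p, q))
    sp = math.sqrt(sum((pi - pa) ** 2 for pi in p))
    sq = math.sqrt(sum((qi - qa) ** 2 for qi in q))
    return 1.0 - cov / (sp * sq)
```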
[0039] With instances of training data and a plurality of distance measurements, supervised learning can proceed. Before proceeding with matching, it is important to understand the distribution of the labelled training data. With accounting ratios of rated companies as training data, it is easy to find commingling among different ratings.
[0040] Referring to FIG. 6, a representative example of commingling of different ratings. The histogram indicates the population of rated credits within two distance zones from a AAA rated company. Similarity distance is chosen for this example. Zone 1 covers similarity distances of 0 < d < 0.05. There are 8 high credit and 4 medium credit companies in zone 1. Zone 2 covers similarity distances of 0.05 < d < 0.1. There are 22 high credit, 16 medium credit, and 4 low credit companies in zone 2. Recalling that similarity distance is one minus the Pearson correlation, this histogram indicates that some companies are rated as medium credits even though their accounting data normalized by total liabilities are very similar to those of an AAA rated company. As a result, a direct application of instance based deduction can create instability in classification. One result can be deduced based on one set of neighboring instances. Another result can be deduced based on another set of neighboring instances. Furthermore, a company can be classified as high credit even though another company that performs better is classified as medium credit. This situation is a reflection of training data ambiguity. Training data ambiguity also happens in handwriting recognition. The objective of the method detailed here is to provide reasonable classification given ambiguity in the labelled training data.
[0041] To a certain extent, ambiguity in training data can also happen when the data are derived from expounding creditworthiness definitions. A subset of accounting ratios is involved in grouping the training data. It is likely that the subset does not cover all information contained in a vector of accounting ratios. When measured with a distance including the full set of ratios, discrepancy ensues. Ambiguity in training data is a common phenomenon in supervised training. Both the induction algorithm and the deduction algorithm have to keep this phenomenon in mind.
[0042] One way to address the ambiguity issue with the training data is to transcend simple grouping of instances during the induction phase. Specifically, this involves the following steps:
(1) discovering natural clusters in the training data;
(2) associating a class to each cluster;
(3) mapping a class to a rating;
(4) representing a cluster with a centroid.
As a result, classes, instead of instances, will be the product of the induction phase under this approach. Matching will be conducted between the target and a training class. Key to the success of this approach is clustering. A proven algorithm for clustering is the K-means algorithm.
[0043] Referring to FIG. 7, a representative example of a process of clustering labelled training data through the K-means algorithm. Given N labelled training data, this algorithm groups them into a specified number of K clusters (K < N) based on a distance measurement. After K is given as input in STP 1, the algorithm works in a loop comprising STP 2, STP 3, STP 4, and STP 5. STP 2 determines the center of each cluster based on existing information. STP 3 computes distances between each training datum and the centers determined in STP 2. STP 4 groups training data based on minimum distance to the centers. STP 5 compares the clusters generated during the present iteration with those generated during the previous iteration. For two sets of clusters to compare as equal, the number of clusters has to be equal and the constituents of each cluster have to be the same as those of the corresponding cluster in the previous iteration. When the two groups are identical, no training data have moved between clusters. If the clusters between the two iterations are not identical, execution goes back to STP 2 to continue the loop. If the clusters are identical, the rating of each cluster is determined by a majority of the constituents of the cluster in STP 6. In STP 7, both the rating and the center of each cluster are presented in the output. The center of a cluster is a representative attribute of the cluster.
[0044] Determining the rating of a cluster by the ratings of the constituents is not a standard ingredient of the K-means algorithm. Referring to FIG. 6, there are 8 high credit, 4 medium credit, and 0 low credit in zone 1. There is a clear majority in this situation because high credit is 66% of the total population. Zone 1 will be rated as high credit. Key to computing a majority is determining the majority in a weak situation. A demotion and promotion algorithm is developed for this purpose.

[0045] Pseudo Code for the Demotion and Promotion Algorithm
input the number of groups G
input votes for each group V = (v1, v2, ..., v(G-1), vG)
produce a demoted array D = (0, v1+v2, v3, ..., v(G-1), vG)
produce a promoted array P = (v1, v2, ..., v(G-1)+vG, 0)
input majority requirement M
initialize variables first, second, third, and undone = false
for i = 1 to G
    if (Vi >= M)
        first = Vi
        output first
    else
        undone = true
if (undone)
    for i = 1 to G
        if (Di >= M)
            second = Di
            output second
            undone = false
if (undone)
    for i = 1 to G
        if (Pi >= M)
            third = Pi
            output third
[0046] The gist of the demotion and promotion algorithm is to produce a number of alternative results in order to achieve a majority. The first result is obtained by counting the original vote. The second result is obtained by counting the vote with the top level demoted to the second level. The third result is obtained by counting the vote with the lowest level promoted to the second lowest level. Note that this algorithm does not cover all tie situations mathematically. If all three attempts fail to achieve a majority, no class can be chosen from this cluster of training data. This happens when the training data is highly chaotic.
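A runnable sketch of the pseudocode above, restructured for clarity rather than copied literally: `votes` is assumed to be ordered from the highest credit group to the lowest, `m` is the majority requirement, and the function name is illustrative.

```python
def demotion_promotion_majority(votes, m):
    """Find a group meeting the majority requirement m (sketch of [0045]).

    Tries the original vote, then the vote with the top level demoted into
    the second level, then the vote with the lowest level promoted into the
    second lowest.  Returns the winning group index, or None when the
    training data is too chaotic for any majority.
    """
    def winner(v):
        for i, count in enumerate(v):
            if count >= m:
                return i
        return None

    original = list(votes)
    demoted = [0, votes[0] + votes[1]] + list(votes[2:])
    promoted = list(votes[:-2]) + [votes[-2] + votes[-1], 0]
    for attempt in (original, demoted, promoted):
        w = winner(attempt)
        if w is not None:
            return w
    return None   # no majority: highly chaotic cluster
```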
[0047] Generating classes from training data transcends traditional instance based induction. The strength of class based induction is that the model produced generates relatively more consistent classifications in deduction. Through classes, ambiguity in training data is partially removed. However, potentially radical changes may be observed as a result. Referring to FIG. 6, zone 2 will be classified as high credit based on the majority count. There are 22 high credit, 16 medium credit, and 4 low credit in the cluster. The consequence is that training data labelled as low credit is now reclassified as high credit based on class association. Another potential issue is the possible dropping of training data when no majority can be achieved in a highly chaotic zone. In comparison, classification based on instances of training data is more tolerant of training data ambiguity, at the cost of transferring the ambiguity to deduction.
[0048] For instance based classification, management of ambiguity in the training data is left to the deduction phase. In high dimensional space, the inference tool of choice is nearest neighbors. The classification begins by computing distances between the target and all instances of the labelled training data. Each distance is an element of a distance array. Each distance is mapped to the corresponding training data instance. The distance array is sorted. The K instances corresponding to the K elements with the smallest distances are extracted. A majority is determined from the rating labels of the training data, and the label is assigned to the target as the classification. K is normally an odd integer to prevent a tie. Classification based on the K nearest neighbors is best suited for situations where the training data is clustered and the clusters are ordered according to the label to be classified.
[0049] Pseudo Code for the K - Nearest Neighbor Algorithm

Initialize K
Initialize the distance array D = {d1, d2, ..., dn}
Input labelled training data L = {l1, l2, ..., ln}
Input classification target T
For i = 1 to n
    Compute di = distance(T, li)
    Map di to li
Sort(D)
Select K elements from L based on sorted D
Determine the majority label from the K elements
Assign the majority label to T as classification
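The pseudocode above can be sketched in Python as follows; a minimal illustration, with an assumed function name, where `training` is a list of (vector, label) pairs and `distance` is any of the distance measures discussed above.

```python
from collections import Counter

def knn_classify(target, training, k, distance):
    """K nearest neighbor classification (sketch of [0049]).

    K should be an odd integer to help prevent ties.
    """
    # compute the distance to every instance and sort the array
    ranked = sorted(training, key=lambda item: distance(target, item[0]))
    # vote among the labels of the K closest instances
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```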
[0050] Classification based on application of the K nearest neighbor algorithm can be unstable when clusters are not ordered according to the label to be classified. Referring to FIG. 6, there are 22 high credit and 16 medium credit training data. Therefore it is possible that a target classified as high credit based on a majority of K nearest neighbors would be reclassified as medium credit if K+2 were chosen. Both borrowers and investors have large stakes in the result of classification. Such sensitivity to a parameter is unacceptable.
[0051] As a result, a modification of the traditional nearest neighbor algorithm is also disclosed in this method. Referring to FIG. 8, an illustration of a modified K nearest neighbor process. Step 8-1 computes the distances between the classification target and the instances of training data. The elements of the distance array are mapped to training data and sorted in step 8-2. An odd integer K is specified at step 8-3. Based on K, parallel computations based on nearest neighbors are performed. K training data are chosen at step 8-4-1. A classification based on the K training data is obtained at step 8-5-1. K+2 training data are chosen at step 8-4-2. A classification based on the K+2 training data is obtained at step 8-5-2. The results of 8-5-1 and 8-5-2 are compared at step 8-6. If the results are identical, the result is sent to the output at step 8-8. If they differ, classification is determined by naive Bayes classification at step 8-7. In this way, a classification based on K nearest neighbors is confirmed by overlaying a K+2 result. If confirmation is not possible, a naive Bayes classification is overlaid to process further information.

[0052] The naive Bayes classifier is an inference algorithm developed from the Bayes theorem. The naive Bayes classifier formulates the process of classification given a plurality of features. In Bayes inference, classification is determined by choosing the largest posterior. Referring to Equation 7, an expression of the posterior under the naive Bayes condition. The posterior P(Y | x1, x2, ..., xn) is proportional to the product of the likelihood P(x1, x2, ..., xn | Y) and the prior P(Y). P(x1, x2, ..., xn | Y) equals the product of the individual P(xi | Y) under the naive Bayes condition. P(x1, x2, ..., xn) is a constant for inferring Y. Therefore, it is only necessary to compare the product of the likelihood and the prior. Computation of the prior is performed with the K nearest neighbors. Computation of the likelihood relies on the population and distribution of the training data.
[0053] Equation 7. An Expression of the Posterior Under the Naive Bayes Condition

P(Y | x1, x2, ..., xn) = P(Y) · Π P(xi | Y) / P(x1, x2, ..., xn)   (Eq. 7)

where:
P(Y | x1, x2, ..., xn) is the posterior
Π P(xi | Y) is the likelihood under the naive Bayes condition
P(Y) is the prior
P(x1, x2, ..., xn) is a constant under the circumstances
[0054] Referring to Equation 8, a mathematical expression for the prior. K is the number of nearest neighbors. Ky is the number of instances belonging to class Y. The prior P(Y) is computed by dividing Ky by K.

[0055] Equation 8. A Mathematical Expression for the Prior.

P(Y) = Ky / K   (Eq. 8)
[0056] For a continuous variable, the likelihood is evaluated with the probability density function of a normal distribution. Referring to Equation 9, an expression for the probability density function of a normal distribution of an accounting ratio. An accounting ratio is represented by xi. The probability density function is f(xi). Parameters μ and σ are computed from the sample population of the training data. It is not necessary to compute all accounting ratios of a vector, because the number of factors involved may lead to a problem of overfitting. Only a number of ratios are needed. In the context of accounting ratios formed by dividing an accounting data vector by the total liabilities, the total equity to total liabilities ratio, the total revenue to total liabilities ratio, and the free cash flow to total liabilities ratio can be used for an initial trial. The purpose of the trial is to check whether the mean for instances with higher credit is actually materially higher than the mean for lower credit. This is necessary because of ambiguity in the training data. Other ratios, such as the cash to total liabilities ratio and the operating income to total liabilities ratio, can be used as replacements in order to ensure proper ordering and separation among the means.
[0057] Equation 9. The Probability Density Function of a Normal Distribution of an Accounting Ratio.

f(xi) = ( 1 / (σ · sqrt(2π)) ) · exp( -(xi - μ)^2 / (2σ^2) )   (Eq. 9)

where:
μ is the mean and σ is the standard deviation computed from the training data sample.
[0058] Pseudo Code for the Naive Bayes Classification

Input the K training data identified as nearest neighbors
Identify the different classes Y
Compute each P(Y) based on Equation 8
Determine N features in the vector of accounting ratios
initialize a variable prod = 1
for i = 1 to N
    compute P(xi | Y) for the accounting ratio based on Equation 9
    prod = prod * P(xi | Y)
determine the product P(Y) * prod for each Y
Choose the class Y corresponding to the maximum of the product.
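The pseudocode above can be sketched as follows; a minimal illustration with an assumed function name. The prior Ky/K (Eq. 8) comes from the selected `neighbors`, while each feature likelihood is a normal density (Eq. 9) whose mean and standard deviation are taken from the full `training` population of each class; both arguments are lists of (vector, label) pairs.

```python
import math
from collections import Counter

def naive_bayes_classify(target, neighbors, training):
    """Naive Bayes classification over K nearest neighbors (sketch of [0058])."""
    def normal_pdf(x, mu, sigma):   # Eq. 9
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    prior = Counter(label for _, label in neighbors)   # Eq. 8: Ky counts
    k = len(neighbors)
    best_label, best_score = None, -1.0
    for label in prior:
        members = [vec for vec, lab in training if lab == label]
        score = prior[label] / k                       # the prior P(Y)
        for i in range(len(target)):                   # product of P(xi | Y)
            col = [vec[i] for vec in members]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            sigma = math.sqrt(var) or 1e-9             # guard against zero spread
            score *= normal_pdf(target[i], mu, sigma)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```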
[0059] The disclosed approach of overlaying classifiers is best when the training data is clustered. When clustering is weak in the training data, the naive Bayes classifier should be used directly. Referring to FIG. 9, an illustration of a process of applying a naive Bayes classifier on nearest neighbors. Step 9-1 computes the distances between the classification target and the instances of training data. The elements of the distance array are mapped to training data and sorted in step 9-2. An odd integer K is specified at step 9-3. A total of K nearest training data is selected in step 9-4. Step 9-5 computes the prior based on the K nearest neighbors. Step 9-6 computes the likelihood product based on the overall training data. Step 9-7 computes the posterior based on the product of the prior and the likelihood product. Step 9-8 determines the maximum of the posterior and the class associated with that maximum. The result is sent to the output at step 9-9.
[0060] With the application of different procedures in response to situations in the training data, the method disclosed improves the reliability of the final classification. The spatial distribution of the classified targets reflects that of the training data. In this way, a much larger target space can be understood with a limited number of training data. As the quantity and diversity of the target population increase, instances of training data will be increased correspondingly. The method disclosed is a tool enabling users to handle a larger corporate credit space with an understanding of a small set of training data. In cases when a very large target space needs to be classified in real time, instance based classification can be inefficient. Training data class based classification commands better efficiency.
[0061] Referring to FIG. 10, a flow chart of a class based classification process. Classes of training data were generated during the training phase based on clustering through the K-means algorithm. This can be done before classification. The classes are loaded at the time of classification at step 10-1. When a target is sent to the input at step 10-2, the distances between the target and the centers of the clusters are computed at step 10-3. The distances are mapped to each class at step 10-4. Based on the mapping, the minimum of the distances is chosen at step 10-5. The corresponding class is used to classify the target in step 10-6. A hash table type of data structure can be used for mapping purposes.
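The FIG. 10 deduction step reduces to a nearest-center lookup. This is an illustrative sketch, assuming `classes` is a dict mapping a rating label to the cluster center produced during training, and `distance` is any of the distance measures discussed above; the dict structure plays the role of the hash table mentioned in [0061].

```python
def classify_by_class(target, classes, distance):
    """Class based classification of FIG. 10: label of the nearest center."""
    return min(classes, key=lambda label: distance(target, classes[label]))
```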
Industrial Applicability
[0062] Through the classification method outlined, a much larger space of targets can be classified. In recent years, the need for understanding a large number of unknowns based on a relatively limited number of classified debts has been growing. An example of classified debt is rated corporate bonds. The unknowns are represented by cross-border bonds, bank loans, and various portfolio management needs. Cross-border bonds have become a global trend in recent years. Investors need to understand a bond from an emerging country. The classification tool is clearly needed here. In addition to bonds, fixed income investment has extended to bank loans. On one hand, what was kept deep in bank assets has become public investment. On the other hand, many of the companies involved do not have a rating. Again, the method described enables the public to understand before they act. Last but not least, portfolio management is a process where one needs to take action based on public disclosures. Creditworthiness is a dynamic concept. Some companies become more creditworthy, others become less. The classifier disclosed provides reliable classification with full accounting data.

Claims

CLAIMS What is claimed:
1. A method of credit classification comprising: concatenating balance sheet, income statement, and cash flow to form a vector of accounting data; dividing elements of the vector by one element from the vector to form a vector of accounting ratios; grouping a plurality of labelled training data in the format of the vector of accounting ratios; computing a type of distance between the labelled training data and a classification target; selecting at least one labelled training data from the nearest neighbors; determining a label from the selected training data; and classifying the target with the label.
2. A method according to claim 1, where at least one instance of training data is selected, and classification is determined by a majority of the labels represented in the selected training data.

3. A method according to claim 1, where more than one instance of training data is selected, and classification is determined by a naive Bayes classifier.

4. A method according to claim 1, where more than one instance of training data is selected, the selected training data is divided into at least two groups, and classification is determined by a combination of K nearest neighbors and a naive Bayes classifier.
5. A method according to claim 1, where the type of distance between the target and the labelled training data is Euclidean distance.

6. A method according to claim 1, where the type of distance between the target and the labelled training data is similarity distance.

7. A method according to claim 1, where the labelled training data is grouped based on a plurality of rated companies.

8. A method according to claim 1, where the labelled training data is grouped based on a plurality of accounting ratios organized in a tree structure.

9. A method according to claim 1, where weighting is applied to the components of the vectors of accounting ratios.
10. A method comprising: concatenating balance sheet, income statement, and cash flow to form a vector of accounting data; dividing elements of the vector by one element from the vector to form a vector of accounting ratios; grouping a plurality of labelled training data in the format of the vector of accounting ratios; computing a type of distance among the labelled training data; clustering the training data through a K-means algorithm; associating ratings to the clusters; computing distances between a classification target and the centers of the clusters; and classifying the classification target by the cluster with the minimum distance to the target.
11. A method according to claim 10, where the type of distance between the target and the labelled training data is Euclidean distance.

12. A method according to claim 10, where the type of distance between the target and the labelled training data is similarity distance.
PCT/CA2013/050865 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios WO2015070314A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Publications (1)

Publication Number Publication Date
WO2015070314A1 true WO2015070314A1 (en) 2015-05-21

Family

ID=53056553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Country Status (1)

Country Link
WO (1) WO2015070314A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
WO2021098618A1 (en) * 2019-11-21 2021-05-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device and readable storage medium
WO2021110763A1 (en) * 2019-12-04 2021-06-10 Neoinstinct Sa Computer-implemented method for allocating an accounting document to a pair of debtor/creditor accounts and the accounting entry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999048036A1 (en) * 1998-03-20 1999-09-23 Iq Financial Systems, Inc. System, method, and computer program product for assessing risk within a predefined market
US20040267647A1 (en) * 2003-06-30 2004-12-30 Brisbois Dorion P. Capital market products including securitized life settlement bonds and methods of issuing, servicing and redeeming same
US7720753B1 (en) * 2007-12-04 2010-05-18 Bank Of America Corporation Quantifying the output of credit research systems
US20120023006A1 (en) * 2010-07-23 2012-01-26 Roser Ryan D Credit Risk Mining

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999048036A1 (en) * 1998-03-20 1999-09-23 Iq Financial Systems, Inc. System, method, and computer program product for assessing risk within a predefined market
US20040267647A1 (en) * 2003-06-30 2004-12-30 Brisbois Dorion P. Capital market products including securitized life settlement bonds and methods of issuing, servicing and redeeming same
US7720753B1 (en) * 2007-12-04 2010-05-18 Bank Of America Corporation Quantifying the output of credit research systems
US20120023006A1 (en) * 2010-07-23 2012-01-26 Roser Ryan D Credit Risk Mining

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
WO2021098618A1 (en) * 2019-11-21 2021-05-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device and readable storage medium
WO2021110763A1 (en) * 2019-12-04 2021-06-10 Neoinstinct Sa Computer-implemented method for allocating an accounting document to a pair of debtor/creditor accounts and the accounting entry

Similar Documents

Publication Publication Date Title
Harris Credit scoring using the clustered support vector machine
WO2019179403A1 (en) Fraud transaction detection method based on sequence width depth learning
CN107563428A (en) Classification of Polarimetric SAR Image method based on generation confrontation network
CN107766418A (en) A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN107819698A (en) A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN109739844A (en) Data classification method based on decaying weight
CN110533116A (en) Based on the adaptive set of Euclidean distance at unbalanced data classification method
Doumpos et al. Model combination for credit risk assessment: A stacked generalization approach
CN108629373A (en) A kind of image classification method, system, equipment and computer readable storage medium
Kumar et al. An optimal churn prediction model using support vector machine with adaboost
WO2015070314A1 (en) Supervised credit classifier with accounting ratios
CN111539451A (en) Sample data optimization method, device, equipment and storage medium
CN113362071A (en) Pompe fraudster identification method and system for Ether house platform
Hajiagha et al. Fuzzy C-means based data envelopment analysis for mitigating the impact of units’ heterogeneity
Daneshmandi et al. A hybrid data mining model to improve customer response modeling in direct marketing
Chi et al. Cluster-based ensemble classification for hyperspectral remote sensing images
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
Zheng Application of silence customer segmentation in securities industry based on fuzzy cluster algorithm
Benchaji et al. Novel learning strategy based on genetic programming for credit card fraud detection in Big Data
KR102266950B1 (en) Method of under-sampling based ensemble for data imbalance problem
CN107992878A (en) A kind of outlier detection method based on ELM-Hierarchical Clustering
Ensemble AdaBoost in classification and regression trees to overcome class imbalance in credit status of bank customers
CN112926989A (en) Financial transaction risk assessment method and device based on multi-view ensemble learning
Caplescu et al. Will they repay their debt? Identification of borrowers likely to be charged off
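The distance-based supervised classification this publication describes (labelled training vectors of accounting ratios, unknown vectors classified by distance to known ones) can be illustrated with a minimal nearest-neighbour sketch. All ratio values, labels, and the choice of k below are hypothetical, not taken from the specification:

```python
import math

# Hypothetical labelled training data: each row pairs a vector of accounting
# ratios (e.g. current ratio, debt-to-equity, operating margin) with a known
# credit label. Values are illustrative only.
TRAINING_DATA = [
    ([2.1, 0.4, 0.15], "good"),
    ([1.8, 0.6, 0.12], "good"),
    ([0.9, 1.8, -0.05], "poor"),
    ([0.7, 2.3, -0.10], "poor"),
]

def euclidean(u, v):
    """Euclidean distance between two accounting-ratio vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(vector, training_data=TRAINING_DATA, k=3):
    """Label an unknown vector by majority vote of its k nearest
    labelled training vectors."""
    neighbours = sorted(training_data,
                        key=lambda row: euclidean(vector, row[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

print(classify([2.0, 0.5, 0.14]))  # prints "good": close to the liquid, low-leverage firms
```

Other distance functions (e.g. Manhattan or Mahalanobis) can be substituted for `euclidean` without changing the voting step.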

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13897588; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13897588; Country of ref document: EP; Kind code of ref document: A1)