WO2015070314A1 - Supervised credit classifier with accounting ratios - Google Patents

Supervised credit classifier with accounting ratios

Info

Publication number
WO2015070314A1
Authority
WO
WIPO (PCT)
Prior art keywords
training data
accounting
distance
vector
classification
Prior art date
Application number
PCT/CA2013/050865
Other languages
French (fr)
Inventor
Yong Liu
Original Assignee
Yong Liu
Priority date
Filing date
Publication date
Application filed by Yong Liu filed Critical Yong Liu
Priority to PCT/CA2013/050865 priority Critical patent/WO2015070314A1/en
Publication of WO2015070314A1 publication Critical patent/WO2015070314A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof

Definitions

  • Labelled training data generated in both manners can be used in a supervised classification.
  • Referring to FIG. 5, a representative process of classification based on supervised training. The process involves an induction phase and a deduction phase.
  • The objective of induction is to build as much coverage as possible with the training data.
  • The objective of deduction is to match classification targets to training data labels with precision.
  • A typical supervised learning process starts with input of labelled training data.
  • Training data is sent to the input in step 1.
  • Step 2 is supervised training, which may label the training data with rated companies or with defined ratings.
  • A classification model is generated in step 3.
  • The process of induction involves step 1, step 2, and step 3.
  • The process of deduction involves step 3 and step 4. The classification model derived in step 3 is applied to unlabelled target data in step 4.
  • A label is generated for the target in step 5 through matching.
  • Supervised learning enables users to focus on deep understanding of a small group of corporate credits while maintaining coverage of a much larger credit world.
  • the small group can be used as labelled training data.
  • the larger credit world can be approximated with training data.
  • Vector distance provides a necessary measure for the matching process.
  • The Euclidean distance measures multi-dimensional distance through the Pythagorean formula. For two vectors of accounting ratios
  • P = [p1, p2, ..., pn]
  • Q = [q1, q2, ..., qn]
  • the distance is the square root of the sum of squares of differences between corresponding accounting ratios of the vectors involved.
  • Equation 4. The Euclidean Distance:

        d(P, Q) = sqrt( (p1 - q1)² + (p2 - q2)² + ... + (pn - qn)² )
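As a concrete illustration of Equation 4 (a sketch, not part of the original disclosure; the ratio values are invented):

```python
import math

def euclidean_distance(p, q):
    # Eq. 4: square root of the sum of squared differences between
    # corresponding accounting ratios of the two vectors.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical ratio vectors for two companies (illustrative values only).
p = [0.8, 1.2, 0.3]
q = [0.5, 1.0, 0.1]
print(round(euclidean_distance(p, q), 4))  # → 0.4123
```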
  • Example 1 is a table of representative values of Euclidean distances between four pharmaceutical companies.
  • Company a is rated AAA.
  • The Euclidean distances between companies b, c, d and company a indicate that they are also high credit. This is separately confirmed by the fact that all of them are rated above A by rating agencies. Note the difference between an overall measurement and an individual ratio measurement. Individually, the current debt to total liabilities ratio of company a is significantly above the training data average. The training data contains more than 2000 samples.
  • High current debt implies a high likelihood of liquidity problems. From a narrow current debt to total liabilities point of view, company a may not qualify for AAA. However, overall distance
  • Absolute distance is also used to measure distance. Absolute distance is also called the Manhattan distance: the sum of absolute differences between corresponding accounting ratios.
  • Equation 5. The Absolute Distance:

        d(P, Q) = |p1 - q1| + |p2 - q2| + ... + |pn - qn|
  • Example 2 is a table of representative values of absolute distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations.
  • the absolute distances between company d and companies a, b, c are 7.2, 7.15, and 8.4 respectively.
  • median distance between d and other data in the training space is 17.1.
  • The values for absolute distance are greater than those for Euclidean distance for the same reason the sum of two sides of a triangle is greater than the third.
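A minimal sketch of the absolute distance, with the comparison against the Euclidean distance made explicit; the vector values are invented:

```python
import math

def euclidean_distance(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def absolute_distance(p, q):
    # Sum of absolute differences between corresponding ratios.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.8, 1.2, 0.3]
q = [0.5, 1.0, 0.1]
# For the same pair of vectors the absolute distance is never smaller
# than the Euclidean distance (the triangle-inequality argument above).
assert absolute_distance(p, q) >= euclidean_distance(p, q)
print(round(absolute_distance(p, q), 4))  # → 0.7
```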
  • the Euclidean distance and absolute distance measure overall difference between two vectors. By measuring the overall difference, supervised classification presents a new perspective for analyzing accounting data.
  • Both Euclidean distance and absolute distance provide a measurement of the degree to which vector A is identical to vector B. In many cases, people are interested in how the accounting ratios of company A resemble those of company B on an overall basis. The emphasis is the relative pattern. The difference in overall similarity is captured by similarity distance.
  • Equation 6. The Similarity Distance:

        d(P, Q) = 1 - [ Σ (pi - pa)(qi - qa) ] / [ sqrt( Σ (pi - pa)² ) * sqrt( Σ (qi - qa)² ) ]

    where pa and qa are the average values of pi and qi.
  • Similarity is measured in terms of the Pearson coefficient of correlation. Instead of computing the difference between vectors, accounting ratios are computed in terms of variation relative to their mean values. Accounting ratios pi and qi are components of separate vectors. Values pa and qa are the mean values of pi and qi. The distance d is one minus the Pearson coefficient of correlation. Similarity does not guarantee small Euclidean distance.
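Equation 6 can be sketched directly; the example checks the point just made, that a vector is perfectly similar to a scaled copy of itself even though their Euclidean distance is non-zero (values invented):

```python
import math

def similarity_distance(p, q):
    # Eq. 6: one minus the Pearson coefficient of correlation.
    n = len(p)
    pa, qa = sum(p) / n, sum(q) / n
    cov = sum((pi - pa) * (qi - qa) for pi, qi in zip(p, q))
    sp = math.sqrt(sum((pi - pa) ** 2 for pi in p))
    sq = math.sqrt(sum((qi - qa) ** 2 for qi in q))
    return 1.0 - cov / (sp * sq)

p = [0.2, 0.5, 0.9]
doubled = [2 * x for x in p]  # same pattern, different magnitude
# Perfect correlation gives similarity distance 0 (up to rounding).
assert abs(similarity_distance(p, doubled)) < 1e-9
```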
  • Example 3. Representative Values of Similarity Distance:

            a       b       c       d
        a   0       0.02    0.05    0.05
        b   0.02    0       0.09    0.07

  • Example 3 is a table of representative values of similarity distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations.
  • Zone 1 indicates a similarity distance of 0 ≤ d ≤ 0.05. There are 8 high credit and 4 medium credit companies in zone 1.
  • Zone 2 indicates
  • Training data ambiguity also happens in handwriting recognition.
  • The objective of the method detailed here is to provide reasonable classification given ambiguity in labelled training data.
  • Ambiguity in training data can also happen when it is derived from expounded creditworthiness definitions.
  • A subset of accounting ratios is involved in grouping the training data. It is likely that the subset does not cover all the information contained in a vector of accounting ratios. When measured with a distance involving the full set of ratios, discrepancy ensues. Ambiguity in training data is a common phenomenon in supervised training. Both the induction algorithm and the deduction algorithm have to keep this phenomenon in mind.
  • Referring to FIG. 7, a representative example of a process of clustering labelled training data through the K-means algorithm. Given N labelled training data, the algorithm groups them into a specified number K of clusters (K < N) based on a distance measurement. After the input K is given in STP 1, the algorithm works in a loop comprising STP 2, STP 3, STP 4 and STP 5. STP 2 determines centers for each cluster based on existing information. STP 3 computes distances between each training data and the centers.
  • STP 4 groups training data based on minimum distance to the centers.
  • STP 5 compares the clusters generated during the present iteration with those generated during the previous iteration. For the clusters to compare as equal, the number of clusters has to be equal and the constituents of each cluster have to be the same as those of the corresponding cluster in the previous iteration. When the two groups are identical, no training data moves between clusters. If the clusters between the two iterations are not identical, execution goes back to STP 2 to continue the loop. If the clusters are identical, the rating of each cluster is determined by a majority of the constituents in the cluster in STP 6. In STP 7, both the rating and the center of each cluster are presented in the output. The center of a cluster is a representative attribute of the cluster.
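The STP 2 to STP 5 loop can be sketched as follows; the toy data and the fixed seed are assumptions for illustration, not the patented implementation:

```python
import random

def k_means(data, k, max_iters=100):
    # STP 1: K is given. Initial centers are picked from the data.
    random.seed(0)  # deterministic for the illustration
    centers = random.sample(data, k)
    assignment = None
    for _ in range(max_iters):
        # STP 3 / STP 4: assign each vector to the nearest center.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in data
        ]
        # STP 5: stop when no training data moves between clusters.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # STP 2: recompute the center of each cluster.
        for c in range(k):
            members = [v for v, a in zip(data, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Two well-separated toy "ratio" clusters.
data = [[0.1, 0.1], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]
centers, assignment = k_means(data, 2)
print(assignment)
```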
  • The gist of the demotion-promotion algorithm is to produce a number of alternative results in order to achieve a majority.
  • The first result is obtained by counting the original vote.
  • The second result is obtained by counting the vote with the top level demoted to the second level.
  • The third result is obtained by counting the vote with the lowest level promoted to the second lowest level. Note that this algorithm does not cover all tie situations mathematically. If all three counts fail to achieve a majority, no class can be chosen from this cluster of training data. This happens when the training data is highly chaotic.
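The three counts above can be sketched as one function; the function name and label set are assumptions, and a count without a strict majority simply yields no result:

```python
from collections import Counter

def demotion_promotion_vote(votes, levels):
    # `levels` orders the classes from highest to lowest credit.
    def strict_majority(vs):
        label, count = Counter(vs).most_common(1)[0]
        return label if 2 * count > len(vs) else None

    # First result: the original vote.
    result = strict_majority(votes)
    if result is not None:
        return result
    # Second result: top level demoted to the second level.
    demoted = [levels[1] if v == levels[0] else v for v in votes]
    result = strict_majority(demoted)
    if result is not None:
        return result
    # Third result: lowest level promoted to the second lowest level.
    promoted = [levels[-2] if v == levels[-1] else v for v in votes]
    return strict_majority(promoted)  # may still be None (chaotic data)

votes = ["high", "medium", "high", "medium", "low"]
print(demotion_promotion_vote(votes, ["high", "medium", "low"]))  # → medium
```

Here the original vote has no majority (2-2-1), but demoting "high" to "medium" yields a 4-of-5 majority.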
  • K is normally an odd integer to prevent a tie. Classification based on the K nearest neighbors is best suited for situations where the training data is clustered and the clusters are ordered according to the label to be classified.
  • Classification based on the K-nearest-neighbor algorithm can be unstable when clusters are not ordered according to the label to be classified.
  • In FIG. 6 there are 22 high credit and 16 medium credit training data. Therefore it is possible that a target classified as high credit based on a majority of K nearest neighbors would be reclassified as medium credit if K+2 were chosen. Both borrowers and investors have large stakes in the result of classification. Sensitivity to a parameter is unacceptable.
  • Step 8-1 computes the distance between the classification target and the instances of training data.
  • K training data are chosen at step 8-4-1. Classification based on K training data is obtained at step 8-5-1. K+2 training data are chosen at step 8-4-2. Classification based on K+2 training data is obtained at step 8-5-2. The results of 8-5-1 and 8-5-2 are compared at step 8-6. If
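The FIG. 8 consistency check can be sketched as follows; returning None for a disagreement is an assumption about how instability is signalled:

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def stable_knn(target, training, k):
    # Steps 8-1 / 8-2: compute and sort distances to all training data.
    ranked = sorted((euclidean(target, vec), label) for vec, label in training)
    majority = lambda rows: Counter(lbl for _, lbl in rows).most_common(1)[0][0]
    # Steps 8-4-1 / 8-5-1 and 8-4-2 / 8-5-2: classify with K and with K+2.
    label_k = majority(ranked[:k])
    label_k2 = majority(ranked[:k + 2])
    # Step 8-6: accept the label only when both choices of K agree.
    return label_k if label_k == label_k2 else None

training = [([0.0, 0.0], "high"), ([0.1, 0.0], "high"), ([0.2, 0.1], "high"),
            ([5.0, 5.0], "medium"), ([5.1, 5.0], "medium")]
print(stable_knn([0.05, 0.05], training, k=3))  # → high
```

A target near the small "medium" cluster gets "medium" for K=3 but "high" for K=5, so the function reports instability instead of a label.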
  • The naive Bayes classifier is an inference algorithm developed from Bayes' theorem.
  • The naive Bayes classifier formulates the process of classification given a plurality of features.
  • Classification is determined by choosing the largest posterior. Referring to Equation 7, an expression of the posterior under the naive Bayes condition. The posterior P(Y | x1, x2, ..., xn) is proportional to the product of the likelihood P(x1, x2, ..., xn | Y) and the prior P(Y). Under the naive Bayes condition the likelihood equals the product of the individual P(xi | Y). The denominator P(x1, x2, ..., xn) is a normalizing constant.
  • Equation 7. An Expression of the Posterior Under the Naive Bayes Condition:

        P(Y | x1, ..., xn) = P(Y) * [ P(x1 | Y) * P(x2 | Y) * ... * P(xn | Y) ] / P(x1, ..., xn)

    where P(x1 | Y) * ... * P(xn | Y) is the likelihood under the naive Bayes condition.
  • K is the number of nearest neighbors.
  • K_Y is the number of instances belonging to class Y.
  • The prior P(Y) is computed by dividing K_Y by K.
  • Equation 8. A Mathematical Expression for the Prior:

        P(Y) = K_Y / K
  • The likelihood is evaluated with the probability density function of a normal distribution.
  • An accounting ratio is represented by xi.
  • The probability density function is f(xi).
  • Parameters μ and σ are computed from the sample population of the training data. It is not necessary to compute all accounting ratios of a vector because the number of factors
  • Equation 9. The Probability Density Function of a Normal Distribution of an Accounting Ratio:

        f(xi) = ( 1 / (σ * sqrt(2π)) ) * exp( -(xi - μ)² / (2σ²) )
  • The likelihood product is accumulated iteratively, e.g. prod = prod * P(xi | Y).
  • Step 9-1 computes the distance between the classification target and the instances of training data. Elements of the distance array are mapped to training data and sorted in step 9-2. An odd integer K is specified at step 9-3. A total of K nearest training data is selected in step 9-4. Step 9-5 computes the prior based on the K nearest neighbors. Step 9-6 computes the likelihood product based on the overall training data. Step 9-7 computes the posterior based on the product of the prior and the likelihood product. Step 9-8 determines the maximum of the posterior and the class associated with that maximum. The result is sent to the output at step 9-9.
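The FIG. 9 pipeline can be sketched end-to-end; the per-class Gaussian fit and K-nearest prior follow Equations 7 to 9, while the data, class names, and the sigma guard are invented for illustration:

```python
import math
from collections import Counter

def gauss_pdf(x, mu, sigma):
    # Eq. 9: normal probability density of one accounting ratio.
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def naive_bayes_on_neighbors(target, training, k):
    dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Steps 9-1 / 9-2 / 9-4: sort training data by distance, keep K nearest.
    ranked = sorted(training, key=lambda t: dist(target, t[0]))
    counts = Counter(label for _, label in ranked[:k])   # K_Y per class
    best_label, best_post = None, -1.0
    for label, k_y in counts.items():
        members = [vec for vec, lbl in training if lbl == label]
        posterior = k_y / k                              # Eq. 8: prior P(Y)
        for i in range(len(target)):
            col = [m[i] for m in members]
            mu = sum(col) / len(col)
            sigma = math.sqrt(sum((c - mu) ** 2 for c in col) / len(col)) or 1e-9
            posterior *= gauss_pdf(target[i], mu, sigma)  # Eq. 7 product
        if posterior > best_post:
            best_label, best_post = label, posterior
    # Step 9-8: the class with the maximum posterior.
    return best_label

training = [([0.10], "medium"), ([0.20], "medium"), ([0.15], "medium"),
            ([0.90], "high"), ([1.00], "high"), ([0.95], "high")]
print(naive_bayes_on_neighbors([0.93], training, k=3))  # → high
```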
  • the method disclosed improves reliability of the final classification.
  • The spatial distribution of the classified targets reflects that of the training data. In this way, a much larger target space can be covered.
  • the method disclosed is a tool enabling users to handle a larger corporate credit space with understanding of a small sized training data.
  • Instance based classification can be inefficient.
  • Referring to FIG. 10, a flow chart of a class based classification process.
  • Classes of training data were generated during the training phase based on clustering through the K-means algorithm. This can be done before classification.
  • The classes are loaded at the time of classification at step 10-1.
  • The distances between the target and the centers of the clusters are computed at step 10-3.
  • Each distance is mapped to its class at step 10-4.
  • The minimum of the distances is chosen at step 10-5.
  • The corresponding class is used to classify the target in step 10-6.
  • a hash table type of data structure can be used for mapping purposes.
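The class-based deduction of FIG. 10 can be sketched with a Python dict as the hash table; the cluster names, centers, and ratings are invented stand-ins for the output of the K-means training phase:

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Classes produced ahead of time by the training phase; a hash table
# (Python dict) maps each class to its center and rating.
classes = {
    "cluster_0": {"center": [1.8, 0.9], "rating": "high credit"},
    "cluster_1": {"center": [0.4, 0.1], "rating": "medium credit"},
}

def classify(target):
    # Steps 10-3 to 10-6: distance to each center, pick the minimum,
    # and return the rating of the corresponding class.
    nearest = min(classes, key=lambda name: euclidean(target, classes[name]["center"]))
    return classes[nearest]["rating"]

print(classify([1.7, 1.0]))  # → high credit
```

Because only the cluster centers are compared at classification time, the cost per target is proportional to the number of classes rather than the number of training instances.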

Abstract

The method disclosed details classification of creditworthiness of a company based on knowledge of existing classifications. Balance sheet, income statement, and cash flow data are concatenated. A vector of accounting ratios is derived from the concatenation. Distance between a target to be classified and a plurality of training data is computed on a plurality of measurements. The target is classified based on the nearest neighbors.

Description

TITLE OF INVENTION
SUPERVISED CREDIT CLASSIFIER WITH ACCOUNTING RATIOS
TECHNICAL FIELD
[0001] This disclosure relates generally to the field of processing corporate accounting data contained in balance sheet, income statement, and cash flow, and more particularly to using accounting ratios to classify creditworthiness of a corporation based on artificial intelligence.
SUMMARY OF INVENTION
Technical Problem
[0002] In the wake of the Great Depression of the last century, governments around the world mandated disclosure of accounting information for public companies. A goal was to enable the public to evaluate creditworthiness based on the disclosed information. Accounting data and accounting ratios have been used for credit assessment. However, selective use of accounting data has been a long-running deficiency. Attention has been focused on a minority of accounting data such as net income. Most of the accounting data are ignored even when they are disclosed to bring public attention. Furthermore, as credit situations differ from one individual to another, no statically chosen subset of accounting data or ratios truly represents corporate credit in a universal manner. As a result, credit classification by rating agencies remains partially quantitative.
Solution to Problem
[0003] The method disclosed analyzes accounting data as a whole by applying techniques of artificial intelligence. Balance sheet, income statement, and cash flow are concatenated to form a vector of accounting ratios. Different types of distances are used to study unknown vectors based on known vectors through supervised learning. Known vectors are called labelled training data in supervised learning. The method disclosed classifies vectors of accounting ratios in the presence of ambiguity in training data. An overlaid classification approach is adopted for highly clustered training data. A naive Bayes classifier is applied when clustering is weak in training data. A class based deduction approach is adopted for real time processing.
Advantageous Effects of Invention
[0004] A vector of accounting ratios embodies the full accounting data. This approach represents a new perspective of accounting data analysis. Instead of focusing on a narrow set of ratios, this approach focuses on overall aspects such as the pattern of the data. The traditional subset-of-ratios approach discovers a problem when an individual ratio is wrong. This approach aims at discovering when the pattern of the full accounting data is problematic. Classification enables us to make assertions such as that the overall accounting data indicates health even though an individual aspect is problematic.
[0005] The method developed here enables people to independently explore a larger credit world based on solid understanding of a small set of corporations. Through classification, an unknown corporate credit is delineated with the closest approximations among the small set. By leveraging the technique presented, people gain understanding of many more corporate credits without having to investigate individual accounting data in a repetitive manner.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] The various embodiments, features and advances of the present invention will be understood more completely hereinafter as a result of a detailed description thereof in which reference will be made to the following drawings:
FIG. 1 is an exemplary representation of a process of concatenating balance sheet, income statement and cash flow to form a vector of accounting data;
FIG. 2 is an exemplary representation of a process of converting a vector of accounting data to a vector of accounting ratios;
FIG. 3 is an exemplary representation of dividing sample range into a plurality of intervals;
FIG. 4 is an exemplary representation of organizing labelled training data in a tree structure;
FIG. 5 is an exemplary representation of a process of classification based on supervised training;
FIG. 6 is an example of commingling of different ratings;
FIG. 7 is an example of a process of clustering labelled training data through the K- means algorithm;
FIG. 8 is an illustration of a process of modified K - nearest neighbor algorithm;
FIG. 9 is an illustration of a process of applying a naive Bayes classifier on nearest neighbors;
FIG. 10 is a flow chart of a class based classification process.
DESCRIPTION OF EMBODIMENTS
[0007] Balance sheet, income statement, and cash flow are separate accounting data because they reflect different aspects of corporate finance. Traditionally, individual data is extracted from each of them for analytical purposes. Much attention has been given to particular accounting data such as total revenue or particular accounting ratios such as the debt ratio. While providing deep insight individually, these individual data are not sufficient to summarize overall credit by themselves. Selective combinations of accounting ratios have been studied. However, as diversity prevails because of different operating environments and credit backgrounds, it is hard to find a universally applicable set to define overall credit.
[0008] Contrary to the prevailing tendency of slicing disclosed information, separate accounting data are combined to form concatenated accounting data in this method. Referring to FIG. 1, an exemplary representation of a process of concatenating balance sheet, income statement and cash flow to form a vector of accounting data. The balance sheet is represented as 11. The income statement is represented as 12. The cash flow is represented as 13. The vector formed by combining 11, 12, and 13 is represented as 14. This vector contains all the information we need to evaluate the credit and operation of a company.
[0009] Vectors generated this way often suffer from a lack of comparability. To address this issue, every element in the vector is further divided by an accounting data element from the vector. Referring to FIG. 2, an exemplary representation of a process of converting a vector of accounting data to a vector of accounting ratios. All elements in an accounting data vector are divided by the total liabilities. The total liabilities is itself an element of the vector. The resulting vector is a single set of accounting ratios. The vector of accounting data is 21. The vector of accounting ratios is 22. Elements of 22 are elements of 21 divided by total liabilities. Every element is normalized on a per total liabilities basis. Alternatively, if every element is divided by the total revenue, they are normalized on a per total revenue basis. Concatenation and division by an accounting data element make application of artificial intelligence techniques possible.
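The FIG. 1 / FIG. 2 steps can be sketched as follows; the field names and amounts are invented placeholders for a real concatenated vector:

```python
# Concatenated accounting-data vector (14): balance sheet, income
# statement, and cash flow fields joined into one mapping.
accounting_vector = {
    "total_assets": 500.0,        # balance sheet (11)
    "total_liabilities": 200.0,   # balance sheet (11), the normalizing element
    "total_revenue": 350.0,       # income statement (12)
    "net_income": 40.0,           # income statement (12)
    "operating_cash_flow": 60.0,  # cash flow (13)
}

# Divide every element by total liabilities to obtain the ratio vector (22).
denom = accounting_vector["total_liabilities"]
ratio_vector = {name: value / denom for name, value in accounting_vector.items()}

print(ratio_vector["total_assets"])       # → 2.5
print(ratio_vector["total_liabilities"])  # → 1.0
```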
[0010] A derived situation in the operations of concatenation and division is to apply weighting in the process. Weighting is sometimes needed to merge accounting ratios for companies from different countries with slightly different reporting standards.
[0011] Equation 1. Representing Accounting Ratios as a Row Vector:

    R = [ r1  r2  ...  rn ]    (Eq. 1)

[0012] Equation 2. Representing Weightings as a Diagonal Matrix:

    W = diag( w1, w2, ..., wn )    (Eq. 2)

[0013] Equation 3. The Product RW is a Row Vector:

    RW = [ r1*w1  r2*w2  ...  rn*wn ]    (Eq. 3)
[0014] Equation 1 is an expression of the accounting ratios in row vector form. Accounting ratios r1, r2, ..., rn derived from the full accounting data are represented as R = [r1, r2, ..., rn]. Equation 2 is an expression of weightings in the form of a diagonal matrix. Weightings w1, w2, ..., wn are the diagonal elements of the matrix. All non-diagonal elements are zero. The matrix product RW yields a row vector, as expressed in Equation 3. In this way, a vector of weighted accounting ratios is expressed as [r1w1, r2w2, ..., rnwn].
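Because W is diagonal, the product RW in Equation 3 reduces to an element-wise product; a small sketch with invented values:

```python
# R: row vector of accounting ratios; w: the diagonal of W.
r = [2.5, 1.0, 1.75]   # e.g. assets, liabilities, revenue per total liabilities
w = [1.0, 0.8, 2.0]    # hypothetical cross-country adjustment weights

# RW = [r1*w1, r2*w2, ..., rn*wn]  (Eq. 3)
rw = [ri * wi for ri, wi in zip(r, w)]
print(rw)  # → [2.5, 0.8, 3.5]
```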
[0015] Vectors generated in this manner form the basis of supervised learning. By definition, supervised learning is to infer from a plurality of labelled training data. A plurality of vectors of accounting ratios serves the purpose of the labelled training data. Classification targets are also vectors of accounting ratios. Labelled training data is derived from the definitions of creditworthiness.
[0016] Definitions of creditworthiness are essentially the same across rating agencies. For example, a rating agency defines the rating scale AA as "very strong capacity for payment of financial commitments. This capacity is not significantly vulnerable to foreseeable events". Note that accounting ratios are not explicitly related in the definition. Under the circumstances, there are two ways to group labelled training data.
[0017] The first way is to reuse accounting ratios of rated companies. This approach accepts accounting ratios of rated companies as the best representation of the definition of ratings. A set of accounting ratios for an AA rated company is thus an instantiation of "very strong capacity for payment of financial commitments. This capacity is not significantly vulnerable to foreseeable events". Labelled training data comprises a plurality of vectors derived from rated companies.
[0018] The second way is to connect the definition of creditworthiness and labelled training data by expounding the definition with accounting ratios. Expounding the definition involves delegating the concept of "capacity for payment of financial commitments" to a number of factors. These factors may involve total equity to total liabilities, free cash flow to total liabilities, current assets to total liabilities, etc.
[0019] The process of delegating a qualitative definition to a plurality of accounting ratios requires human intervention. In order to represent the capacity, the sample range of the corresponding ratios must be divided into a number of intervals. Division of sample data must take into account both the logical range and the histogram of the sample data. Referring to FIG. 3, a representative process of dividing a sample range into a plurality of intervals. A histogram 31 is used to infer divisions across the overall range. The division result 32 is a combination of logical range and population distribution as evidenced from 31.
[0020] Referring to FIG. 3, the histogram of total equity to total liabilities ratios for the sample data indicates that the population with negative equity is low. As a result, zero equity would be an adequate demarcation for situations where there are more than three levels of distinction, but it is not appropriate for the present situation. A combined consideration involving both the population in the histogram and the logical range picks 0.5 as the demarcation between the low equity ratio and the medium equity ratio. The same considerations put the demarcation between the high equity ratio and the medium equity ratio at 1.5. Similar histograms can be produced for other ratios, including the free cash flow to total liabilities ratio. Free cash flow is a measurement of cash generated from business operations.
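The division just described can be sketched as a small helper that maps a ratio to its interval. This is a minimal sketch, not part of the disclosure: the function name is illustrative, and the 0.5 and 1.5 demarcations are taken from the FIG. 3 discussion; in practice they would be chosen from the histogram and logical range of the sample data.

```python
def equity_ratio_level(ratio, low=0.5, high=1.5):
    """Map a total equity to total liabilities ratio to an interval label.

    The 0.5 and 1.5 demarcations follow the FIG. 3 discussion; actual
    thresholds depend on the histogram of the sample data.
    """
    if ratio < low:
        return "low"
    if ratio < high:
        return "medium"
    return "high"
```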
[0021] Ratios representing payment capacity may be organized into a tree structure. Referring to FIG. 4, a representative organization of labelled training data in a tree structure. In FIG. 4, each of the three levels of total equity to total liabilities ratios is further divided by three levels of free cash flow to total liabilities ratios. Companies with a high equity ratio may have a low, medium, or high free cash flow ratio. Note that the choice of three intervals serves an illustrative purpose. The actual number may vary depending on the sample data and the objective of classification. The choice of the equity ratio and the free cash flow ratio is also made for an illustrative purpose. Practical choices of features are not limited to these two types of ratio.
[0022] The leaves of the tree generated in this manner correspond to a plurality of credit situations. The leaf 41 in FIG. 4 corresponds to a high equity ratio and a high free cash flow ratio. The leaf 42 in FIG. 4 corresponds to a high equity ratio and a medium free cash flow ratio. Expounding of the definition of creditworthiness is completed by mapping the combination of ratios in each leaf to a definition. For example, the leaf 41 in FIG. 4 can be mapped to "very strong capacity for payment of financial commitments, and this capacity is not significantly vulnerable to foreseeable events". More than one instance of training data can be placed under a leaf. The human intervention described here can be included in a design. No human intervention is needed once the method is implemented.
[0023] Labelled training data generated in both manners can be used in a supervised classification. Referring to FIG. 5, a representative process of classification based on supervised training. The process involves an induction phase and a deduction phase. The objective of induction is to build as much coverage as possible with the training data. The objective of deduction is to match classification targets to training data labels with precision.
[0024] A typical supervised learning process starts with input of labelled training data. Referring to FIG. 5, training data is sent to the input in step 1. Step 2 is supervised training, which may label the training data with rated companies or with defined ratings. A classification model is generated in step 3. The process of induction involves steps 1, 2, and 3. The process of deduction involves steps 3, 4, and 5. The classification model derived in step 3 is applied to unlabelled target data in step 4. A label is generated for the target in step 5 through matching. Supervised learning enables users to focus on deep understanding of a small group of corporate credits while maintaining coverage of a much larger credit world. The small group can be used as labelled training data. The larger credit world can be approximated with the training data.
[0025] Vector distance provides a necessary measure for the matching process. The Euclidean distance measures multi-dimensional distance through the Pythagorean formula. For vectors of accounting ratios P = [p1, p2, ..., pn] and Q = [q1, q2, ..., qn], the distance is the square root of the sum of squares of differences between corresponding accounting ratios of the vectors involved.

[0026] Equation 4. The Euclidean Distance

d(P, Q) = sqrt( (p1 - q1)^2 + (p2 - q2)^2 + ... + (pn - qn)^2 )   (Eq. 4)

[0027] Example 1. Representative Values of Euclidean Distance

      a     b     c     d
a     0     0.8   1.23  1.28
b     0.8   0     1.7   1.5
c     1.23  1.7   0     1.56
d     1.28  1.5   1.56  0
[0028] Referring to Example 1, a table of representative values of Euclidean distances between four pharmaceutical companies. Company a is rated AAA. The Euclidean distances between companies b, c, d and company a indicate that they are also high credits. This is separately confirmed by the fact that all of them are rated above A by rating agencies. Note the difference between an overall measurement and an individual ratio measurement. Individually, the current debt to total liabilities ratio of company a is significantly above the training data average. The training data contains more than 2000 samples. High current debt implies a high likelihood of liquidity problems. From a narrow current debt to total liabilities point of view, company a may not qualify for AAA. However, the overall distance measurement indicates that the credit quality of these companies is close. Therefore, company a is a strong credit.
[0029] Absolute distance is also used to measure distance. Absolute distance is also called Manhattan distance. For vectors of accounting ratios P = [p1, p2, ..., pn] and Q = [q1, q2, ..., qn], the distance is the sum of absolute values of differences between corresponding accounting ratios of the vectors involved.

[0030] Equation 5. The Absolute Distance

d(P, Q) = |p1 - q1| + |p2 - q2| + ... + |pn - qn|   (Eq. 5)
[0031] Referring to Example 2, a table of representative values of absolute distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations. The absolute distances between company d and companies a, b, c are 7.2, 7.15, and 8.4 respectively. For perspective, the median distance between d and the other data in the training space is 17.1. The values for absolute distance are greater than for Euclidean distance for the same reason the sum of two sides of a triangle is greater than the third.
[0032] Example 2. Representative Values of Absolute Distance

      a     b     c     d
a     0     4.8   6.8   7.2
b     4.8   0     8.5   7.15
c     6.8   8.5   0     8.4
d     7.2   7.15  8.4   0
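Equation 5 can be sketched just as directly; as before, this is an illustrative snippet with an assumed function name. The second assertion-style property it satisfies is the one noted above: for any pair of vectors the absolute distance is at least as large as the Euclidean distance.

```python
def absolute_distance(p, q):
    """Absolute (Manhattan) distance (Eq. 5) between two ratio vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must contain the same number of ratios")
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```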
[0033] The Euclidean distance and the absolute distance measure the overall difference between two vectors. By measuring the overall difference, supervised classification presents a new perspective for analyzing accounting data.

[0034] Both Euclidean distance and absolute distance provide a measurement of the degree to which vector A is identical to vector B. In many cases, people are interested in how the accounting ratios of company A resemble those of company B on an overall basis. The emphasis is the relative pattern. The difference in overall similarity is captured by similarity distance.

[0035] Equation 6. The Similarity Distance

d(P, Q) = 1 - [ Σ (pi - pa)(qi - qa) ] / sqrt( Σ (pi - pa)^2 · Σ (qi - qa)^2 )   (Eq. 6)

where pa, qa are the average values of pi and qi.
[0036] Similarity is measured in terms of the Pearson coefficient of correlation. Instead of computing the difference between vectors, accounting ratios are computed in terms of variation relative to their mean values. Accounting ratios pi and qi are components of separate vectors. Values pa and qa are the mean values of pi and qi. The distance d is one minus the Pearson coefficient of correlation. Similarity does not guarantee a small Euclidean distance.
[0037] Example 3. Representative Values of Similarity Distance

      a     b     c     d
a     0     0.02  0.05  0.05
b     0.02  0     0.09  0.07
c     0.05  0.09  0     0.08
d     0.05  0.07  0.08  0
[0038] Referring to Example 3, a table of representative values of similarity distances between four pharmaceutical companies. All four companies are deemed by rating agencies to have strong capacity to fulfil their financial obligations. This is independently confirmed by the small similarity distances. Similarity distance measures overall distance in terms of the correlation of accounting ratios. Judged by individual ratios, some may not qualify for strong capacity. The pharmaceutical sector may not require holding a large sum of cash to cover liabilities. However, the cash to total liabilities ratios of companies c and d are significantly below the sample average of the training data. This example again demonstrates that measuring overall accounting ratios, rather than focusing on an individual number, offers a new perspective.
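Equation 6 can be sketched as one minus the Pearson correlation; the function name is again an assumption. The test case also illustrates the remark in [0036] that similarity does not guarantee a small Euclidean distance: a vector and a ten-fold multiple of it correlate perfectly, so their similarity distance is zero even though their Euclidean distance is large.

```python
import math

def similarity_distance(p, q):
    """Similarity distance (Eq. 6): one minus the Pearson correlation."""
    n = len(p)
    pa, qa = sum(p) / n, sum(q) / n              # mean values pa and qa
    cov = sum((pi - pa) * (qi - qa) for pi, qi in zip(p, q))
    sp = math.sqrt(sum((pi - pa) ** 2 for pi in p))
    sq = math.sqrt(sum((qi - qa) ** 2 for qi in q))
    return 1.0 - cov / (sp * sq)
```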
[0039] With instances of training data and a plurality of distance measurements, supervised learning can proceed. Before proceeding with matching, it is important to understand the distribution of the labelled training data. With accounting ratios of rated companies as training data, it is easy to find commingling among different ratings.
[0040] Referring to FIG. 6, a representative example of commingling of different ratings. The histogram indicates the population of rated credits within two distance zones from a AAA rated company. Similarity distance is chosen for this example. Zone 1 covers similarity distances of 0 < d < 0.05. There are 8 high credit and 4 medium credit companies in zone 1. Zone 2 covers similarity distances of 0.05 < d < 0.1. There are 22 high credit, 16 medium credit, and 4 low credit companies in zone 2. Recalling that similarity distance is one minus the Pearson correlation, this histogram indicates that some companies are rated as medium credits even though their accounting data normalized by total liabilities are very similar to those of an AAA rated company. As a result, a direct application of instance based deduction can create instability in classification. One result can be deduced based on one set of neighboring instances. Another result can be deduced based on another set of neighboring instances. Furthermore, a company can be classified as high credit even though another company that performs better is classified as medium credit. This situation is a reflection of training data ambiguity. Training data ambiguity also happens in handwriting recognition. The objective of the method detailed here is to provide reasonable classification given ambiguity in the labelled training data.
[0041] To a certain extent, ambiguity in training data can also happen when the data are derived from expounding creditworthiness definitions. A subset of accounting ratios is involved in grouping the training data. It is likely that the subset does not cover all information contained in a vector of accounting ratios. When measured with a distance including the full set of ratios, discrepancy ensues. Ambiguity in training data is a common phenomenon in supervised training. Both the induction algorithm and the deduction algorithm have to keep this phenomenon in mind.
[0042] One way to address the ambiguity issue with the training data is to transcend simple grouping of instances during the induction phase. Specifically, this involves the following steps:
(1) discovering natural clusters in the training data;
(2) associating a class to each cluster;
(3) mapping a class to a rating;
(4) representing a cluster with a centroid.
As a result, classes, instead of instances, will be the product of the induction phase under this approach. Matching will be conducted between the target and a training class. Key to the success of this approach is clustering. A proven algorithm for clustering is the K-means algorithm.
[0043] Referring to FIG. 7, a representative example of a process of clustering labelled training data through the K-means algorithm. Given N labelled training data, this algorithm groups them into a specified number of K clusters (K < N) based on a distance measurement. After K is given as input in STP 1, the algorithm works in a loop comprising STP 2, STP 3, STP 4, and STP 5. STP 2 determines the center of each cluster based on existing information. STP 3 computes distances between each training datum and the centers determined in STP 2. STP 4 groups training data based on minimum distance to the centers. STP 5 compares the clusters generated during the present iteration with those generated during the previous iteration. For two sets of clusters to compare as equal, the number of clusters has to be equal and the constituents of each cluster have to be the same as those of the corresponding cluster in the previous iteration. When the two groups are identical, no training data have moved between clusters. If the clusters between the two iterations are not identical, execution goes back to STP 2 to continue the loop. If the clusters are identical, the rating of each cluster is determined by a majority of the constituents of the cluster in STP 6. In STP 7, both the rating and the center of each cluster are presented in the output. The center of a cluster is a representative attribute of the cluster.
[0044] Determining the rating of a cluster by the ratings of the constituents is not a standard ingredient of the K-means algorithm. Referring to FIG. 6, there are 8 high credit, 4 medium credit, and 0 low credit in zone 1. There is a clear majority in this situation because high credit is 66% of the total population. Zone 1 will be rated as high credit. Key to computing a majority is determining the majority in a weak situation. A demotion and promotion algorithm is developed for this purpose.

[0045] Pseudo Code for the Demotion and Promotion Algorithm
input the number of groups G
input votes for each group V = (v1, v2, ..., v(G-1), vG)
produce a demoted array D = (0, v1+v2, v3, ..., v(G-1), vG)
produce a promoted array P = (v1, v2, ..., v(G-1)+vG, 0)
input majority requirement M
initialize variables first, second, third, and undone = false
for i = 1 to G
    if (Vi >= M)
        first = Vi
        output first
    else
        undone = true
if (undone)
    for i = 1 to G
        if (Di >= M)
            second = Di
            output second
            undone = false
if (undone)
    for i = 1 to G
        if (Pi >= M)
            third = Pi
            output third
[0046] The gist of the demotion and promotion algorithm is to produce a number of alternative results in order to achieve a majority. The first result is obtained by counting the original vote. The second result is obtained by counting the vote with the top level demoted to the second level. The third result is obtained by counting the vote with the lowest level promoted to the second lowest level. Note that this algorithm does not cover all tie situations mathematically. If all three attempts fail to achieve a majority, no class can be chosen from this cluster of training data. This happens when the training data is highly chaotic.
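A runnable sketch of the pseudocode above, restructured for clarity rather than copied literally: `votes` is assumed to be ordered from the highest credit group to the lowest, `m` is the majority requirement, and the function name is illustrative.

```python
def demotion_promotion_majority(votes, m):
    """Find a group meeting the majority requirement m (sketch of [0045]).

    Tries the original vote, then the vote with the top level demoted into
    the second level, then the vote with the lowest level promoted into the
    second lowest.  Returns the winning group index, or None when the
    training data is too chaotic for any majority.
    """
    def winner(v):
        for i, count in enumerate(v):
            if count >= m:
                return i
        return None

    original = list(votes)
    demoted = [0, votes[0] + votes[1]] + list(votes[2:])
    promoted = list(votes[:-2]) + [votes[-2] + votes[-1], 0]
    for attempt in (original, demoted, promoted):
        w = winner(attempt)
        if w is not None:
            return w
    return None   # no majority: highly chaotic cluster
```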
[0047] Generating classes from training data transcends traditional instance based induction. The strength of class based induction is that the model produced generates relatively more consistent classifications in deduction. Through classes, ambiguity in training data is partially removed. However, potentially radical changes may be observed as a result. Referring to FIG. 6, zone 2 will be classified as high credit based on the majority count. There are 22 high credit, 16 medium credit, and 4 low credit in the cluster. The consequence is that training data labelled as low credit is now reclassified as high credit based on class association. Another potential issue is the possible dropping of training data when no majority can be achieved in a highly chaotic zone. In comparison, classification based on instances of training data is more tolerant of training data ambiguity, at the cost of transferring the ambiguity to deduction.
[0048] For instance based classification, management of ambiguity in the training data is left to the deduction phase. In high dimensional space, the inference tool of choice is nearest neighbors. The classification begins by computing distances between the target and all instances of the labelled training data. Each distance is an element of a distance array. Each distance is mapped to the corresponding training data instance. The distance array is sorted. The K instances corresponding to the K elements with the smallest distances are extracted. A majority is determined from the rating labels of the training data, and the label is assigned to the target as the classification. K is normally an odd integer to prevent a tie. Classification based on the K nearest neighbors is best suited for situations where the training data is clustered and the clusters are ordered according to the label to be classified.
[0049] Pseudo Code for the K - Nearest Neighbor Algorithm

Initialize K
Initialize the distance array D = {d1, d2, ..., dn}
Input labelled training data L = {l1, l2, ..., ln}
Input classification target T
For i = 1 to n
    Compute di = distance(T, li)
    Map di to li
Sort(D)
Select K elements from L based on sorted D
Determine the majority label from the K elements
Assign the majority label to T as classification
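The pseudocode above can be sketched in Python as follows; a minimal illustration, with an assumed function name, where `training` is a list of (vector, label) pairs and `distance` is any of the distance measures discussed above.

```python
from collections import Counter

def knn_classify(target, training, k, distance):
    """K nearest neighbor classification (sketch of [0049]).

    K should be an odd integer to help prevent ties.
    """
    # compute the distance to every instance and sort the array
    ranked = sorted(training, key=lambda item: distance(target, item[0]))
    # vote among the labels of the K closest instances
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```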
[0050] Classification based on application of the K nearest neighbor algorithm can be unstable when clusters are not ordered according to the label to be classified. Referring to FIG. 6, there are 22 high credit and 16 medium credit training data. Therefore it is possible that a target classified as high credit based on a majority of K nearest neighbors would be reclassified as medium credit if K+2 were chosen. Both borrowers and investors have large stakes in the result of classification. Such sensitivity to a parameter is unacceptable.
[0051] As a result, a modification of the traditional nearest neighbor algorithm is also disclosed in this method. Referring to FIG. 8, an illustration of a modified K nearest neighbor process. Step 8-1 computes the distances between the classification target and the instances of training data. The elements of the distance array are mapped to training data and sorted in step 8-2. An odd integer K is specified at step 8-3. Based on K, parallel computations based on nearest neighbors are performed. K training data are chosen at step 8-4-1. A classification based on the K training data is obtained at step 8-5-1. K+2 training data are chosen at step 8-4-2. A classification based on the K+2 training data is obtained at step 8-5-2. The results of 8-5-1 and 8-5-2 are compared at step 8-6. If the results are identical, the result is sent to the output at step 8-8. If they differ, classification is determined by naive Bayes classification at step 8-7. In this way, a classification based on K nearest neighbors is confirmed by overlaying a K+2 result. If confirmation is not possible, a naive Bayes classification is overlaid to process further information.

[0052] The naive Bayes classifier is an inference algorithm developed from the Bayes theorem. The naive Bayes classifier formulates the process of classification given a plurality of features. In Bayes inference, classification is determined by choosing the largest posterior. Referring to Equation 7, an expression of the posterior under the naive Bayes condition. The posterior P(Y | x1, x2, ..., xn) is proportional to the product of the likelihood P(x1, x2, ..., xn | Y) and the prior P(Y). P(x1, x2, ..., xn | Y) equals the product of the individual P(xi | Y) under the naive Bayes condition. P(x1, x2, ..., xn) is a constant for inferring Y. Therefore, it is only necessary to compare the product of the likelihood and the prior. Computation of the prior is performed with the K nearest neighbors. Computation of the likelihood relies on the population and distribution of the training data.
[0053] Equation 7. An Expression of the Posterior Under the Naive Bayes Condition

P(Y | x1, x2, ..., xn) = P(Y) · Π P(xi | Y) / P(x1, x2, ..., xn)   (Eq. 7)

where:
P(Y | x1, x2, ..., xn) is the posterior
Π P(xi | Y) is the likelihood under the naive Bayes condition
P(Y) is the prior
P(x1, x2, ..., xn) is a constant under the circumstances
[0054] Referring to Equation 8, a mathematical expression for the prior. K is the number of nearest neighbors. Ky is the number of instances belonging to class Y. The prior P(Y) is computed by dividing Ky by K.

[0055] Equation 8. A Mathematical Expression for the Prior.

P(Y) = Ky / K   (Eq. 8)
[0056] For a continuous variable, the likelihood is evaluated with the probability density function of a normal distribution. Referring to Equation 9, an expression for the probability density function of a normal distribution of an accounting ratio. An accounting ratio is represented by xi. The probability density function is f(xi). Parameters μ and σ are computed from the sample population of the training data. It is not necessary to compute all accounting ratios of a vector, because the number of factors involved may lead to a problem of overfitting. Only a number of ratios are needed. In the context of accounting ratios formed by dividing an accounting data vector by the total liabilities, the total equity to total liabilities ratio, the total revenue to total liabilities ratio, and the free cash flow to total liabilities ratio can be used for an initial trial. The purpose of the trial is to check whether the mean for instances with higher credit is actually materially higher than the mean for lower credit. This is necessary because of ambiguity in the training data. Other ratios, such as the cash to total liabilities ratio and the operating income to total liabilities ratio, can be used as replacements in order to ensure proper ordering and separation among the means.
[0057] Equation 9. The Probability Density Function of a Normal Distribution of an Accounting Ratio.

f(xi) = ( 1 / (σ · sqrt(2π)) ) · exp( -(xi - μ)^2 / (2σ^2) )   (Eq. 9)

where:
μ is the mean and σ is the standard deviation computed from the training data sample.
[0058] Pseudo Code for the Naive Bayes Classification

Input the K training data identified as nearest neighbors
Identify the different classes Y
Compute each P(Y) based on Equation 8
Determine N features in the vector of accounting ratios
initialize a variable prod = 1
for i = 1 to N
    compute P(xi | Y) for the accounting ratio based on Equation 9
    prod = prod * P(xi | Y)
determine the product P(Y) * prod for each Y
Choose the class Y corresponding to the maximum of the product.
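The pseudocode above can be sketched as follows; a minimal illustration with an assumed function name. The prior Ky/K (Eq. 8) comes from the selected `neighbors`, while each feature likelihood is a normal density (Eq. 9) whose mean and standard deviation are taken from the full `training` population of each class; both arguments are lists of (vector, label) pairs.

```python
import math
from collections import Counter

def naive_bayes_classify(target, neighbors, training):
    """Naive Bayes classification over K nearest neighbors (sketch of [0058])."""
    def normal_pdf(x, mu, sigma):   # Eq. 9
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

    prior = Counter(label for _, label in neighbors)   # Eq. 8: Ky counts
    k = len(neighbors)
    best_label, best_score = None, -1.0
    for label in prior:
        members = [vec for vec, lab in training if lab == label]
        score = prior[label] / k                       # the prior P(Y)
        for i in range(len(target)):                   # product of P(xi | Y)
            col = [vec[i] for vec in members]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col)
            sigma = math.sqrt(var) or 1e-9             # guard against zero spread
            score *= normal_pdf(target[i], mu, sigma)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```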
[0059] The disclosed approach of overlaying classifiers is best when the training data is clustered. When clustering is weak in the training data, the naive Bayes classifier should be used directly. Referring to FIG. 9, an illustration of a process of applying a naive Bayes classifier on nearest neighbors. Step 9-1 computes the distances between the classification target and the instances of training data. The elements of the distance array are mapped to training data and sorted in step 9-2. An odd integer K is specified at step 9-3. A total of K nearest training data is selected in step 9-4. Step 9-5 computes the prior based on the K nearest neighbors. Step 9-6 computes the likelihood product based on the overall training data. Step 9-7 computes the posterior based on the product of the prior and the likelihood product. Step 9-8 determines the maximum of the posterior and the class associated with that maximum. The result is sent to the output at step 9-9.
[0060] With the application of different procedures in response to situations in the training data, the method disclosed improves the reliability of the final classification. The spatial distribution of the classified targets reflects that of the training data. In this way, a much larger target space can be understood with a limited number of training data. As the quantity and diversity of the target population increase, instances of training data will be increased correspondingly. The method disclosed is a tool enabling users to handle a larger corporate credit space with an understanding of a small set of training data. In cases when a very large target space needs to be classified in real time, instance based classification can be inefficient. Training data class based classification commands better efficiency.
[0061] Referring to FIG. 10, a flow chart of a class based classification process. Classes of training data were generated during the training phase based on clustering through the K-means algorithm. This can be done before classification. The classes are loaded at the time of classification at step 10-1. When a target is sent to the input at step 10-2, the distances between the target and the centers of the clusters are computed at step 10-3. The distances are mapped to each class at step 10-4. Based on the mapping, the minimum of the distances is chosen at step 10-5. The corresponding class is used to classify the target in step 10-6. A hash table type of data structure can be used for mapping purposes.
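The FIG. 10 deduction step reduces to a nearest-center lookup. This is an illustrative sketch, assuming `classes` is a dict mapping a rating label to the cluster center produced during training, and `distance` is any of the distance measures discussed above; the dict structure plays the role of the hash table mentioned in [0061].

```python
def classify_by_class(target, classes, distance):
    """Class based classification of FIG. 10: label of the nearest center."""
    return min(classes, key=lambda label: distance(target, classes[label]))
```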
Industrial Applicability
[0062] Through the classification method outlined, a much larger space of targets can be classified. In recent years, the need for understanding a large number of unknowns based on a relatively limited number of classified debts has been growing. An example of classified debt is rated corporate bonds. The unknowns are represented by cross-border bonds, bank loans, and various portfolio management needs. Cross-border bonds have become a global trend in recent years. Investors need to understand a bond from an emerging country. The classification tool is clearly needed here. In addition to bonds, fixed income investment has extended to bank loans. On one hand, what was kept deep in bank assets has become public investment. On the other hand, many of the companies involved do not have a rating. Again, the method described enables the public to understand before they act. Last but not least, portfolio management is a process where one needs to take action based on public disclosures. Creditworthiness is a dynamic concept. Some companies become more creditworthy, others become less. The classifier disclosed provides reliable classification with full accounting data.

Claims

CLAIMS What is claimed:
1. A method of credit classification comprising: concatenating balance sheet, income statement, and cash flow to form a vector of accounting data; dividing elements of the vector by one element from the vector to form a vector of accounting ratios; grouping a plurality of labelled training data in the format of the vector of accounting ratios; computing a type of distance between the labelled training data and a classification target; selecting at least one labelled training data from the nearest neighbors; determining a label from the selected training data; and classifying the target with the label.
2. A method according to claim 1, where at least one instance of training data is selected, and classification is determined by a majority of the labels represented in the selected training data.

3. A method according to claim 1, where more than one instance of training data is selected, and classification is determined by a naive Bayes classifier.

4. A method according to claim 1, where more than one instance of training data is selected, the selected training data is divided into at least two groups, and classification is determined by a combination of K nearest neighbors and a naive Bayes classifier.
5. A method according to claim 1, where the type of distance between the target and the labelled training data is Euclidean distance.

6. A method according to claim 1, where the type of distance between the target and the labelled training data is similarity distance.

7. A method according to claim 1, where the labelled training data is grouped based on a plurality of rated companies.

8. A method according to claim 1, where the labelled training data is grouped based on a plurality of accounting ratios organized in a tree structure.

9. A method according to claim 1, where weighting is applied to the components of the vectors of accounting ratios.
10. A method comprising: concatenating balance sheet, income statement, and cash flow to form a vector of accounting data; dividing elements of the vector by one element from the vector to form a vector of accounting ratios; grouping a plurality of labelled training data in the format of the vector of accounting ratios; computing a type of distance among the labelled training data; clustering the training data through a K-means algorithm; associating ratings to the clusters; computing distances between a classification target and the centers of the clusters; and classifying the classification target by the cluster with the minimum distance to the target.
11. A method according to claim 10, where the type of distance between the target and the labelled training data is Euclidean distance.

12. A method according to claim 10, where the type of distance between the target and the labelled training data is similarity distance.
PCT/CA2013/050865 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios WO2015070314A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Publications (1)

Publication Number Publication Date
WO2015070314A1 true WO2015070314A1 (en) 2015-05-21

Family

ID=53056553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2013/050865 WO2015070314A1 (en) 2013-11-13 2013-11-13 Supervised credit classifier with accounting ratios

Country Status (1)

Country Link
WO (1) WO2015070314A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
WO2021098618A1 (en) * 2019-11-21 2021-05-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device and readable storage medium
WO2021110763A1 (en) * 2019-12-04 2021-06-10 Neoinstinct Sa Computer-implemented method for allocating an accounting document to a pair of debtor/creditor accounts and the accounting entry

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999048036A1 (en) * 1998-03-20 1999-09-23 Iq Financial Systems, Inc. System, method, and computer program product for assessing risk within a predefined market
US20040267647A1 (en) * 2003-06-30 2004-12-30 Brisbois Dorion P. Capital market products including securitized life settlement bonds and methods of issuing, servicing and redeeming same
US7720753B1 (en) * 2007-12-04 2010-05-18 Bank Of America Corporation Quantifying the output of credit research systems
US20120023006A1 (en) * 2010-07-23 2012-01-26 Roser Ryan D Credit Risk Mining

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999048036A1 (en) * 1998-03-20 1999-09-23 Iq Financial Systems, Inc. System, method, and computer program product for assessing risk within a predefined market
US20040267647A1 (en) * 2003-06-30 2004-12-30 Brisbois Dorion P. Capital market products including securitized life settlement bonds and methods of issuing, servicing and redeeming same
US7720753B1 (en) * 2007-12-04 2010-05-18 Bank Of America Corporation Quantifying the output of credit research systems
US20120023006A1 (en) * 2010-07-23 2012-01-26 Roser Ryan D Credit Risk Mining

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021059081A1 (en) * 2019-09-25 2021-04-01 International Business Machines Corporation Systems and methods for training a model using a few-shot classification process
WO2021098618A1 (en) * 2019-11-21 2021-05-27 中国科学院深圳先进技术研究院 Data classification method and apparatus, terminal device and readable storage medium
WO2021110763A1 (en) * 2019-12-04 2021-06-10 Neoinstinct Sa Computer-implemented method for allocating an accounting document to a pair of debtor/creditor accounts and the accounting entry

Similar Documents

Publication Publication Date Title
Harris Credit scoring using the clustered support vector machine
WO2019179403A1 (en) Fraud transaction detection method based on sequence width depth learning
CN107563428A (en) Classification of Polarimetric SAR Image method based on generation confrontation network
CN107766418A (en) A kind of credit estimation method based on Fusion Model, electronic equipment and storage medium
CN107819698A (en) A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN109886284B (en) Fraud detection method and system based on hierarchical clustering
CN109739844A (en) Data classification method based on decaying weight
CN110533116A (en) Based on the adaptive set of Euclidean distance at unbalanced data classification method
Doumpos et al. Model combination for credit risk assessment: A stacked generalization approach
CN108629373A (en) A kind of image classification method, system, equipment and computer readable storage medium
Kumar et al. An optimal churn prediction model using support vector machine with adaboost
WO2015070314A1 (en) Supervised credit classifier with accounting ratios
CN111539451A (en) Sample data optimization method, device, equipment and storage medium
CN113362071A (en) Pompe fraudster identification method and system for Ether house platform
Hajiagha et al. Fuzzy C-means based data envelopment analysis for mitigating the impact of units’ heterogeneity
Daneshmandi et al. A hybrid data mining model to improve customer response modeling in direct marketing
Chi et al. Cluster-based ensemble classification for hyperspectral remote sensing images
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
Zheng Application of silence customer segmentation in securities industry based on fuzzy cluster algorithm
Benchaji et al. Novel learning strategy based on genetic programming for credit card fraud detection in Big Data
KR102266950B1 (en) Method of under-sampling based ensemble for data imbalance problem
CN107992878A (en) A kind of outlier detection method based on ELM-Hierarchical Clustering
Ensemble AdaBoost in classification and regression trees to overcome class imbalance in credit status of bank customers
CN112926989A (en) Financial transaction risk assessment method and device based on multi-view ensemble learning
Caplescu et al. Will they repay their debt? Identification of borrowers likely to be charged off
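The distance-based supervised classification this publication describes (labelled training vectors of accounting ratios, unknown vectors classified by distance to known ones) can be illustrated with a minimal nearest-neighbour sketch. All ratio values, labels, and the choice of k below are hypothetical, not taken from the specification:

```python
import math

# Hypothetical labelled training data: each row pairs a vector of accounting
# ratios (e.g. current ratio, debt-to-equity, operating margin) with a known
# credit label. Values are illustrative only.
TRAINING_DATA = [
    ([2.1, 0.4, 0.15], "good"),
    ([1.8, 0.6, 0.12], "good"),
    ([0.9, 1.8, -0.05], "poor"),
    ([0.7, 2.3, -0.10], "poor"),
]

def euclidean(u, v):
    """Euclidean distance between two accounting-ratio vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(vector, training_data=TRAINING_DATA, k=3):
    """Label an unknown vector by majority vote of its k nearest
    labelled training vectors."""
    neighbours = sorted(training_data,
                        key=lambda row: euclidean(vector, row[0]))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)

print(classify([2.0, 0.5, 0.14]))  # prints "good": close to the liquid, low-leverage firms
```

Other distance functions (e.g. Manhattan or Mahalanobis) can be substituted for `euclidean` without changing the voting step.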

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13897588; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13897588; Country of ref document: EP; Kind code of ref document: A1)