KR20170128023A

KR20170128023A - Method for Predicting Corporate Bankruptcy Using Data Depth

Info

Publication number: KR20170128023A
Application number: KR1020160059121A
Authority: KR
Inventors: 배석주; 김성도
Original assignee: 한양대학교 산학협력단
Priority date: 2016-05-13
Filing date: 2016-05-13
Publication date: 2017-11-22

Abstract

A method for predicting corporate bankruptcy using data depth is disclosed.

Description

Method for Predicting Corporate Bankruptcy Using Data Depth {Data Depth Based Support Vector Machines for Predicting Corporate Bankruptcy}

The present invention relates to a method for predicting corporate bankruptcy using data depth.

External uncertainties threatening corporate survival have increased drastically due to unpredictable changes in the global economic environment. In the face of such uncertainties, financial institutions are endeavoring to prepare sophisticated countermeasures against delays or defaults in firms' fulfillment of their liabilities. Financial institutions predict the default risk of debtor companies in order to minimize their own capital exposure to risk and to mitigate their own default risk. Bankruptcy prediction has been extensively studied in the finance and management literature for the last two decades (Lee et al. (1997); Salcedo-Sanz et al. (2005); Li and Sum (2008); Tasi (2009), to name a few). It has become even more important since the Basel Committee on Banking Supervision (Basel II) established borrowers' ratings as the key criterion for the minimum capital requirements of banks. In general, before a firm eventually defaults on its liabilities, there may be symptoms or early-warning signals of its financial crisis.

Early studies of bankruptcy prediction exploited parametric statistical techniques such as multiple discriminant analysis (MDA) (Altman, 1968), the logit model (Ohlson, 1980), and the probit model (Zmijewski, 1984). However, the strict assumptions of these statistical approaches, e.g., linearity, normality, and pre-existing functional forms relating criterion variables to predictor variables, have limited their application in finance. To overcome this hurdle, nonparametric techniques from artificial intelligence (AI) have been applied to corporate bankruptcy or distress prediction since the late 1980s. The AI techniques include decision trees (DT), artificial neural networks (ANN), genetic algorithms (GA), and back-propagation networks (BPN). Odom and Sharda (1990) first introduced the ANN model to predict corporate bankruptcy. Tam and Kiang (1992) compared the performance of a neural network model with those of the linear discriminant model, the logit model, the ID3 algorithm, and the k-nearest neighbor approach, based on insolvency data of commercial banks. They showed that the ANN provides the most accurate and robust prediction results among them. While a number of application studies have reported outstanding performance of the ANN, it has difficulty explaining its prediction results clearly because it lacks explanatory power, and it can generalize poorly because of overfitting. Additionally, it requires significant time and effort to construct the best architecture through multiple layers (Sarle, 1995; Lawrence et al., 1997).

As an alternative to ANNs, the support vector machine (SVM) has recently attracted special attention in financial distress modeling. SVMs have reported better classification results than parametric statistical methods or other nonparametric techniques such as ANNs and BPNs. Moreover, the SVM can overcome the overfitting problem via the concept of structural risk minimization. Härdle et al. (2003) first introduced the SVM to corporate bankruptcy prediction, and compared its performance with ANN, MDA, and a learning vector quantization proposed by Fan and Palaniswami (2000). Min and Lee (2005) showed that, by mapping input variables onto a high-dimensional feature space, the SVM transforms complex problems of corporate bankruptcy prediction into simpler ones to which linear discriminant functions can be applied. Härdle et al. (2009) explored the suitability of smooth SVMs for predicting corporate default risk. They investigated how key factors such as the selection of appropriate accounting ratios, the length of the training period, and the structure of the training sample influence the precision of corporate bankruptcy prediction. While the SVM achieves excellent classification accuracy, a main disadvantage of the method is the difficulty of interpreting the modeling results.

For a decade, a number of researchers have actively developed hybrid approaches to predict corporate bankruptcy. Hybrid approaches combine several classification methods to secure greater accuracy than individual (parametric or nonparametric) models. Min et al. (2006) and Ahn et al. (2006) employed a genetic algorithm to design an SVM-based technique for corporate bankruptcy prediction. The selection of both SVM hyper-parameters and input features is integrated into one learning process in the genetic algorithm. Van Gestel et al. (2006) applied the Bayesian evidence framework (e.g., MacKay, 1992; Gestel et al., 2002) to find hyper-parameters for the least squares support vector machine (LS-SVM).

An object of the present invention is to provide a method capable of predicting corporate bankruptcy using data depth.

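In addition, because many financial statements do not follow a multivariate normal distribution, the present invention aims to provide a method that can raise the probability of correctly predicting corporate bankruptcy by reducing such financial statements to one dimension with a measure called data depth, representing them in two dimensions through a graphical method called the DD-plot, and then classifying them with a Support Vector Machine.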

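According to an embodiment of the present invention, because many financial statements do not follow a multivariate normal distribution, there is provided a method that can raise the probability of correctly predicting corporate bankruptcy by reducing such financial statements to one dimension with a measure called data depth, representing them in two dimensions through a graphical method called the DD-plot, and then classifying them with a Support Vector Machine.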

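By providing the method for predicting corporate bankruptcy using data depth according to an embodiment of the present invention, the probability of correctly predicting corporate bankruptcy can be raised: because many financial statements do not follow a multivariate normal distribution, such financial statements are reduced to one dimension with a measure called data depth, represented in two dimensions through a graphical method called the DD-plot, and then classified with a Support Vector Machine.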

Figure 1 illustrates the DD-plot for simulated multivariate data.
Figure 2 illustrates the plot of the Mahalanobis depth values.
Figure 3 illustrates the DD-plot based on the Mahalanobis depths.
Figure 4 illustrates SVM classification results using the RBF kernel function.

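Existing methods for predicting corporate bankruptcy use a company's financial statements together with a logistic regression model, Multiple Discriminant Analysis, an Artificial Neural Network, a Support Vector Machine (SVM), or the like to predict whether the company will go bankrupt or survive. In the case of Multiple Discriminant Analysis, however, good results can be obtained only when the financial statements follow a multivariate normal distribution, yet many financial statements do not.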

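In the present invention, because many financial statements do not follow a multivariate normal distribution, such financial statements are reduced to one dimension with a measure called data depth, represented in two dimensions through a graphical method called the DD-plot, and then classified with a Support Vector Machine, which has the advantage of raising the bankruptcy prediction probability well above that of existing methods.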

We propose a hybrid method based on data depth (DD) and the SVM to achieve higher accuracy in bankruptcy prediction for firms in Korea. As a nonparametric multivariate technique, data depth estimates a representative value from multivariate data which may possess nonlinear and non-normal characteristics. The method (referred to as DD-SVM hereinafter) calculates the data depth for annual financial ratios, because the ratios are unlikely to follow a multivariate normal distribution, and applies the nonlinear SVM to the data depth plot, which presents the depth values of the combined sample of both failed and non-failed firms, to classify a binary output variable. We will compare its performance with other parametric methods and AI methods, including ANNs, in terms of bankruptcy prediction accuracy.

The remainder of this paper is organized as follows. The basic modeling ideas for predicting corporate bankruptcy are presented in Section 2. The ideas are based mainly on the introduction of the nonlinear SVM to the data depth plot to classify failed or non-failed firms. The research data and pre-analysis results are given in Section 3. Section 4 presents an empirical analysis for corporate bankruptcy prediction to demonstrate the proposed method, along with a comparison with other competing prediction models. Section 5 concludes this study, and discusses directions for future research.

2 The Modeling of Bankruptcy Prediction

2.1 Data Depth Functions

The word "depth" was first used by Tukey (1975) to picture high-dimensional data, and the far-reaching ramifications of depth in ordering and analyzing multivariate data were elaborated by Liu (1990), Donoho and Gasko (1992), Liu et al. (1999) and others. Data depth characterizes the centrality of a high-dimensional data point with respect to a distribution or a multivariate sample. Viewed as a method of dimension reduction, data depth does not rely on link functions, kernel functions, or other refined mappings, unlike related methods such as principal components.

In order to form a general definition of a "depth function", Zuo and Serfling (2000) defined a statistical depth function to be a bounded, non-negative mapping that satisfies four desirable properties: (1) affine invariance; (2) maximality at center; (3) monotonicity relative to deepest point; and (4) vanishing at infinity. Affine invariance means that the relative depth of a point should not depend on the underlying coordinate system or the scales of the underlying measurements. For a distribution having a uniquely defined "center", maximality at center indicates that the depth function should attain its maximum at this center. Monotonicity relative to the deepest point means that as a point moves from the center outward, the corresponding depth should decrease monotonically. Vanishing at infinity requires that the depth of a point should tend to zero when its norm tends to infinity. Among the data depth functions possessing the above properties, Mahalanobis depth, simplicial depth and Tukey's depth are the most popular.

Mahalanobis Depth

Mahalanobis (1936) introduced a distance function based on Hotelling's $T^2$ statistic (“Mahalanobis depth”). Serving as the first data depth concept, it measures how deep a point $x \in \mathbb{R}^p$ is with respect to a given distribution $G$. The depth function is given by

$$MD(G; x) = \left[1 + (x - \mu_G)^{\top} \Sigma_G^{-1} (x - \mu_G)\right]^{-1} \tag{1}$$

where $\mu_G$ and $\Sigma_G$ denote the mean vector and the covariance of the reference distribution $G$, respectively. In general, because $G$ is unknown, the sample version of Mahalanobis depth is obtained by replacing $\mu_G$ and $\Sigma_G$ with their sample estimates $\bar{x}$ and $S$ for the multivariate data set $\{x_1, \ldots, x_n\}$.
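The sample version of the Mahalanobis depth can be illustrated with a short sketch. The code below is not the authors' implementation; it assumes the usual form of the depth, 1 / (1 + squared Mahalanobis distance), uses the unbiased sample covariance, and requires that covariance to be non-singular. The function names (`mean_vector`, `covariance_matrix`, `solve`, `mahalanobis_depth`) are chosen here for illustration only.

```python
def mean_vector(data):
    """Column-wise mean of a list of equal-length points."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def covariance_matrix(data):
    """Unbiased sample covariance matrix (divides by n - 1)."""
    n, mu = len(data), mean_vector(data)
    p = len(mu)
    return [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in data) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, p))
        z[i] = (aug[i][p] - tail) / aug[i][i]
    return z


def mahalanobis_depth(x, data):
    """Sample Mahalanobis depth of point x with respect to the sample `data`."""
    mu, cov = mean_vector(data), covariance_matrix(data)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, diff)                         # z = S^{-1} (x - mean)
    d2 = sum(d * zi for d, zi in zip(diff, z))   # squared Mahalanobis distance
    return 1.0 / (1.0 + d2)


sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center_depth = mahalanobis_depth((1.0, 1.0), sample)   # deepest point: the sample mean
edge_depth = mahalanobis_depth((2.0, 1.0), sample)     # shallower than the mean
```

The depth equals 1 at the sample mean and decreases toward 0 as the point moves away, which matches the "maximality at center" and "vanishing at infinity" properties listed earlier.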

Simplicial Depth

Liu (1990) introduced “simplicial depth”, which is determined by counting simplices derived from $n$ data points. For a reference distribution $G$ on $\mathbb{R}^p$, the simplicial depth of a data point $x$ with respect to $G$ is defined by

$$SD(G; x) = P_G\left\{x \in S[x_1, \ldots, x_{p+1}]\right\} \tag{2}$$

where $x_1, \ldots, x_{p+1}$ are independent observations from $G$ and $S[x_1, \ldots, x_{p+1}]$ is the simplex with vertices $x_1, \ldots, x_{p+1}$. In other words, $S[x_1, \ldots, x_{p+1}]$ is the set of all points in $\mathbb{R}^p$ which are convex combinations of $\{x_1, \ldots, x_{p+1}\}$. The sample version of SD is obtained by replacing $G$ in $SD(G; x)$ by $G_n$, or alternatively, by computing the fraction of the sample random simplices containing the point $x$ as

$$SD(G_n; x) = \binom{n}{p+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{p+1} \le n} I\left(x \in S[x_{i_1}, \ldots, x_{i_{p+1}}]\right) \tag{3}$$

where $I(\cdot)$ is the indicator function. Liu (1990) showed that $SD(G; x)$ is affine invariant, and that if $G$ is absolutely continuous, then $SD(G_n; x)$ converges uniformly and strongly to $SD(G; x)$ as $n \to \infty$. We can confirm that $x$ is contained in the simplex $S[x_{i_1}, \ldots, x_{i_{p+1}}]$ if $x$ can be expressed as a convex combination of $\{x_{i_1}, \ldots, x_{i_{p+1}}\}$.
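For bivariate data the simplices are triangles, so the sample simplicial depth is simply the fraction of triangles formed by three sample points that contain x. The following is a brute-force sketch under that reading, not an efficient algorithm; the helper names are illustrative, and the containment test treats triangles as closed and assumes the sample is in general position (no three points collinear).

```python
from itertools import combinations


def cross(o, a, b):
    """Twice the signed area of the triangle (o, a, b)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = cross(a, b, x), cross(b, c, x), cross(c, a, x)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)   # all orientations agree (or are zero)


def simplicial_depth_2d(x, data):
    """Fraction of the C(n, 3) sample triangles that contain x."""
    triangles = list(combinations(data, 3))
    return sum(in_triangle(x, *t) for t in triangles) / len(triangles)


corners = [(0, 0), (4, 0), (0, 4), (4, 4)]
inner = simplicial_depth_2d((1, 2), corners)     # inside 2 of the 4 triangles
outer = simplicial_depth_2d((10, 10), corners)   # outside every triangle
```

Enumerating all triangles costs O(n^3) evaluations, which is acceptable for illustration but not for large samples.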

Tukey Depth

Tukey (1974) proposed a half-space depth which is now commonly called “Tukey depth”. The half-space depth is the smallest proportion of data points contained on one side of any hyperplane passing through a data point $x$, including points lying on the hyperplane; that is, the Tukey depth is defined as

$$TD(G; x) = \inf_{H}\left\{P_G(H) : H \text{ is a closed half-space in } \mathbb{R}^p \text{ containing } x\right\} \tag{4}$$

with respect to the reference distribution $G$.

For bivariate data, for instance, the Tukey depth is the smallest proportion of data points contained on one side of any line (L) passing through x, including points lying on the line itself. Following the method of Rousseeuw and Ruts (1996), for example, the two-dimensional Tukey depth requires forming the vectors connecting a fixed x to each member of the reference sample x1, . . . , xn and then measuring the angles of these vectors with the positive x-axis. Instead of counting the minimum number of points lying on one side of the line passing through x and a reference sample point, we can count the minimum number of angles between the angle of L and its opposite angle. With that, the empirical formula for Tukey depth of x is

where

Here,

is the angle of

for i = 1, . . . , n. We can assume

and

and so on. See Bae et al. (2015) for the details on calculation of the Tukey depth, along with simple examples.

Besides these three popular data depths, there are several other data depth metrics, e.g., “convex hull peeling depth” by Barnett (1976), “likelihood depth” by Fraiman and Meloche (1999), “regression depth” by Rousseeuw and Hubert (1999), and “lens depth” by Liu and Modarres (2011), to name a few.

2.2 DD-plot

The depth vs. depth plot (called the DD-plot), which was first introduced by Liu et al. (1999), is a useful analysis tool for graphical comparisons of two multivariate distributions or samples based on data depth. For two given multivariate samples, the DD-plot represents the depth values of the combined sample under the two corresponding empirical distributions. It transforms the two multivariate samples in any dimension to a simple two-dimensional scatterplot. Li et al. (2012) addressed some advantages of the DD-plot in classification problems: the best separating curve in the DD-plot is determined automatically by the underlying probabilistic geometry of the data, and the classification outcome can be easily visualized in the two-dimensional DD-plot. This is much simpler than tracking the classification outcome in the original sample space of high-dimensional multivariate data. In particular, the DD-plot is robust against outliers and extreme values.

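Let $F$ and $G$ be two distributions on $\mathbb{R}^p$ and $D$ be an affine-invariant depth. For two random samples drawn from $F$ and $G$, $\{x_1, \ldots, x_n\}\,(= X)$ and $\{y_1, \ldots, y_m\}\,(= Y)$, respectively, the DD-plot is defined as

$$DD(F, G) = \left\{\left(D(F; x),\, D(G; x)\right) : x \in \mathbb{R}^p\right\} \tag{4}$$

when $F$ and $G$ are known. If both $F$ and $G$ are unknown, the sample version of the DD-plot is given by

$$DD(F_n, G_m) = \left\{\left(D(F_n; x),\, D(G_m; x)\right) : x \in X \cup Y\right\} \tag{5}$$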

Note that DD(F, G) as well as DD(F_n, G_m) are always subsets of R² however large the dimension p of the data is. If the two given distributions are identical, that is, F = G, then the resulting DD(F, G) is simply a line segment on the 45° line in the DD-plot.

Deviation patterns from this straight line indicate a specific type of difference between the two underlying distributions.

Figure 1-(a) shows the DD-plot for two samples (n = m = 500) drawn from the standard bivariate normal distribution. It can be observed that the data are scattered around the 45° line in the plot.

Figure 1-(b) presents the DD-plot for two samples (n = m = 500) where one is drawn from the standard bivariate normal distribution while the other is drawn from the bivariate normal with the mean (2, 0)^T. All of the DD-plots are constructed using the Mahalanobis depth. The DD-plot shows quite clearly that the observations from the two different distributions scatter around the 45° line in an almost symmetric manner. The 45° line can be used as the separating line for the two different samples in the DD-plot. Its corresponding classification rule is simple in that we assign x to F if D(F_n; x) > D(G_m; x), and assign x to G otherwise.

Note that the classifier in the DD-plot is the same concept of the maximum depth classifier in Ghosh and Chaudhuri (2005).
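The construction of the sample DD-plot and the 45° line rule can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes bivariate samples, uses the sample Mahalanobis depth in its common form 1 / (1 + squared Mahalanobis distance) with a closed-form 2×2 inverse, and the function names are hypothetical.

```python
def mahalanobis_depth_2d(x, sample):
    """Sample Mahalanobis depth of a bivariate point x w.r.t. `sample`."""
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy ** 2          # must be non-zero
    dx, dy = x[0] - mx, x[1] - my
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


def dd_plot(sample_f, sample_g):
    """DD-plot coordinates (D(F_n; z), D(G_m; z)) for every z in the pooled sample."""
    return [(mahalanobis_depth_2d(z, sample_f), mahalanobis_depth_2d(z, sample_g))
            for z in sample_f + sample_g]


def max_depth_classify(x, sample_f, sample_g):
    """Assign x to 'F' if D(F_n; x) > D(G_m; x), otherwise to 'G'."""
    d_f = mahalanobis_depth_2d(x, sample_f)
    d_g = mahalanobis_depth_2d(x, sample_g)
    return "F" if d_f > d_g else "G"


sample_f = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
sample_g = [(10.0, 0.0), (12.0, 0.0), (10.0, 2.0), (12.0, 2.0)]
points = dd_plot(sample_f, sample_g)      # 8 points in the unit square
label_near_f = max_depth_classify((1.0, 1.0), sample_f, sample_g)
label_near_g = max_depth_classify((11.0, 1.0), sample_f, sample_g)
```

Each pooled observation becomes one point in the unit square, and points above or below the 45° line are assigned to G or F respectively under this rule.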

In general, however, high-dimensional observations do not scatter symmetrically along the 45° line because they have different dispersion structures as well as different locations; hence the linear classifier does not perform well. In this study, we introduce a nonlinear support vector machine (SVM) to classify failed firms and non-failed firms based on the DD-plot.

2.3 The Nonlinear Support Vector Machine

Bankruptcy prediction can be formulated as a two-class classification problem. We will apply the SVM approach to bankruptcy prediction using real-life data from Korean manufacturing companies, and then compare its empirical results with those from other prediction models.

The support vector machine (SVM), which was introduced from statistical learning theory by Vapnik (1995), is a powerful classification method which provides better decision boundaries than could be obtained using traditional neural networks. The SVM uses a linear model to construct nonlinear class boundaries through a nonlinear mapping of input vectors into a high-dimensional feature space. In general, the linear model in the new space represents a nonlinear decision boundary in the original space. The SVM implements the principle of structural risk minimization, which aims to reduce the bound on the misclassification error, by constructing an optimal separating hyperplane in a high-dimensional feature space using quadratic programming to find a unique solution. The application areas of SVMs include text categorization, digital image identification, handwriting recognition, function approximation and regression, and time series forecasting.

Assuming that there are $p$ financial ratios in our classifier, the (bivariate) data-depth data of predictor variables for the $i$th firm in the DD-plot can be represented by the vector $x_i$. The financial status of the $i$th firm is denoted by $y_i \in \{-1, +1\}$, where $+1$ represents a non-failed firm and $-1$ a failed firm. Given a training set $\{(x_i, y_i)\}_{i=1}^{N}$, the decision rule which finds the optimal hyperplane separating the binary decision classes is given by

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i \langle x_i, x \rangle + b\right) \tag{6}$$

for the linearly separable case, where the Lagrange multipliers $\alpha_i$ and the bias $b$ are the parameters that determine the optimal separating hyperplane with respect to an input vector $x$. For the non-linearly separable case, Eq. (6) is given as follows:

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b\right) \tag{7}$$

where K(xi, x) is defined as the kernel function which performs the nonlinear mapping between input space and a (high-dimensional) feature space. Popular kernel functions to construct the decision rules are the radial basis function (RBF) kernel K(xi, x) =

, where σ is a tuning parameter; the linear kernel

the polynomial kernel

with degree d and a tuning parameter

and multilayer perceptron (MLP) kernel K(xi, x) =

. Note that the MLP kernel is not positive semi-definite for all choices of the tuning parameters

and

.
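To make the kernel decision rule concrete, the sketch below evaluates a rule of the form sgn(Σ α_i y_i K(x_i, x) + b) for given multipliers. It is only an illustration: the RBF parameterization exp(−‖u − v‖² / (2σ²)) is an assumption (other parameterizations of the same kernel are in use), the multipliers and support vectors are made-up values rather than a trained model, and a score of exactly zero is mapped to +1 by convention.

```python
import math


def rbf_kernel(u, v, sigma=1.0):
    """RBF kernel exp(-||u - v||^2 / (2 sigma^2)); the parameterization is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, b=0.0, kernel=rbf_kernel):
    """Return +1 (non-failed) or -1 (failed) from sgn(sum_i alpha_i y_i K(x_i, x) + b)."""
    score = b + sum(a * y * kernel(sv, x)
                    for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score >= 0 else -1


# Hypothetical support vectors in a DD-plot (both coordinates are depth values).
support_vectors = [(0.8, 0.2), (0.2, 0.8)]
labels = [+1, -1]
alphas = [1.0, 1.0]

pred_a = svm_decision((0.7, 0.3), support_vectors, labels, alphas)   # near the +1 vector
pred_b = svm_decision((0.1, 0.9), support_vectors, labels, alphas)   # near the -1 vector
```

Swapping in another kernel only requires passing a different `kernel` callable; the decision rule itself is unchanged.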

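The dual problem to find an optimal separating hyperplane, along with the kernel function $K(x_i, x_j)$, is re-written as the following quadratic programming problem:

$$\max_{\alpha}\; \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \;\; i = 1, \ldots, N,$$

for the upper bound $C$ with respect to $\alpha_i$. The two-class classification problem via implementation of the optimal separating hyperplane in the feature space is determined by the nonlinear SVM classifier (7). If the bias term $b$ is implicitly a part of the kernel function, as in the case of the RBF kernel, Eq. (7) is reduced to

$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{N} \alpha_i y_i K(x_i, x)\right).$$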

As one of the big issues in SVM, the selection of appropriate parameter values plays an important role in building a bankruptcy prediction model with high prediction accuracy and stability. However, there are no general rules which guarantee the best parameter values for a given application. Lin (2001) provided a systematic method for selecting the parameters of the SVM by adopting the concept of sampling theory into the Gaussian filter. In corporate bankruptcy prediction, Min and Lee (2005) proposed a grid-search technique using 5-fold cross-validation to determine the optimal values of the parameters in the kernel function of the SVM. Min et al. (2006) and Wu et al. (2007) used the genetic algorithm to optimize the values of the parameters $C$ and $\sigma$ with the RBF kernel. Van Gestel et al. (2006) introduced Bayes' formula to the inference of the RBF kernel parameter $\sigma$ in the least squares SVM.

3 Research Data

The population of firms of interest in this study comprises all of the manufacturing firms in the Korea Composite Stock Price Index 2000 (KOSPI 2000) from 2000 to 2013. The lists of manufacturing firms were extracted from the Korea Listed Companies Association (KLCA). The KLCA also provides financial information related to the firms through the audit reports from external auditors. This research studies 144 firms which have filed a bankruptcy petition in Korea since the beginning of the 21st century, mainly to take corporate management risks into account and to exclude cases of unavoidable liability failures arising from structural risks like abrupt economic crises (e.g., the bailout from the International Monetary Fund (IMF) in 1997). Originally, corporate failure was defined as bankruptcy, default on bonds, the overdrawing of a bank account, or non-payment of a preferred stock dividend (Beaver, 1966). For simplicity, however, we limited corporate failure to corporate bankruptcy in this study. For the same period, we selected 144 'non-failed' firms randomly from all solvent firms. A failed firm was paired with a non-failed firm in a similar industry, dealing with similar products, with similar capitalization, and with similar values of assets. The one-to-one matching ratio may cause oversampling problems (Zmijewski, 1984). However, in order to highlight the effects of key financial ratios on the likelihood of corporate bankruptcy, a matched sample of non-failed firms is selected. We only selected medium- and large-size firms with property amounting to at least $10 billion. Both the failed and non-failed data are arbitrarily split into two subsets: about 80% of the data is used as a training set and 20% as a validation set for k-fold cross validation. The training data is used to build the bankruptcy prediction model using the data depth and SVM. The prediction model is verified on the validation data, which is not used to construct the model.

The KLCA reported 111 financial ratios representing profitability, stability, activity, and productivity with respect to individual firms. Out of them, we selected 44 significant ratios by using a two-sample t-test between failed firms and non-failed firms. The selected financial ratios, which are summarized in Table 1, were used to build a prediction model for classifying failed or non-failed firms. We analyzed the financial data on individual firms for the 10 years preceding bankruptcy (or survival until 2013) to examine the existence of chronological trends in corporate bankruptcy.

Multivariate Normality Test

Table 1: Financial ratios included in a bankruptcy prediction model

Existing parametric distress models have been constructed under the multivariate normality assumption. However, empirical results have shown that most financial ratios violate the normality assumption in practice, justifying the introduction of nonparametric approaches. First, we checked the multivariate normality of the financial ratios using Henze-Zirkler's test statistic (1990). The Henze-Zirkler test statistic for multivariate normality is given by

$$HZ = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} e^{-\frac{\beta^2}{2} D_{ij}} - 2\left(1+\beta^2\right)^{-p/2}\sum_{i=1}^{n} e^{-\frac{\beta^2}{2(1+\beta^2)} D_i} + n\left(1+2\beta^2\right)^{-p/2}$$

where $p$ is the number of variables, $D_i = (x_i - \bar{x})^{\top} S^{-1} (x_i - \bar{x})$ is the Mahalanobis distance of the $i$th observation to the centroid, $D_{ij} = (x_i - x_j)^{\top} S^{-1} (x_i - x_j)$ is the Mahalanobis distance between the $i$th and $j$th observations, and $\beta = \frac{1}{\sqrt{2}}\left(\frac{n(2p+1)}{4}\right)^{1/(p+4)}$ is a smoothing parameter. If the data are (multivariate) normally distributed, the test statistic is approximately log-normally distributed with the mean and the variance, which correspond to, respectively,

$$\mu = 1 - a^{-p/2}\left(1 + \frac{p\beta^2}{a} + \frac{p(p+2)\beta^4}{2a^2}\right),$$

$$\sigma^2 = 2\left(1+4\beta^2\right)^{-p/2} + 2a^{-p}\left(1 + \frac{2p\beta^4}{a^2} + \frac{3p(p+2)\beta^8}{4a^4}\right) - 4w_\beta^{-p/2}\left(1 + \frac{3p\beta^4}{2w_\beta} + \frac{p(p+2)\beta^8}{2w_\beta^2}\right),$$

where $a = 1 + 2\beta^2$ and $w_\beta = (1+\beta^2)(1+3\beta^2)$. By using the log-normal distribution parameters, the Wald test statistic can be applied to test the significance of multivariate normality. The Henze-Zirkler test was implemented with the MVN package in the R software. The multivariate normality test results for corporate financial ratios over 10 years are summarized in Table 2. The periods represent the years prior to bankruptcy for failed firms and survival years for non-failed firms. All of the p-values derived from the Henze-Zirkler test are very close to zero, so we conclude that this data set does not satisfy the multivariate normality assumption.

Table 2: Multivariate normality test for financial ratios of the firms in Korea.

Period	1	2	3	4	5	6	7	8	9	10
Non-failed firms	1.002	1.001	1.001	1.002	1.001	1.001	1.001	1.004	1.002	1.011
Failed firms	1.023	1.055	1.031	1.101	1.011	1.013	1.009	1.095	1.072	1.072

4 Analytical Results

To predict the bankruptcy of manufacturing firms in Korea, this study will be conducted by the following steps:

1) Reducing the number of multi-dimensional financial ratios by the data depths.

2) Plotting the values of data depths into DD-plot.

3) Classifying the data points in DD-plot using the nonlinear SVM.

We compared the performance of the proposed method with other existing bankruptcy prediction models in the literature. The hit ratio of classification was used as an indicator to evaluate the predictive accuracy of each model. In addition, type I error (defined as the probability that a firm predicted not to fail will in fact fail) and type II error (defined as the probability that a firm predicted to fail will not in fact fail) were also included in the evaluation criteria.
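The three evaluation criteria can be computed from predicted and actual labels as in the sketch below. It assumes the coding +1 for non-failed and -1 for failed firms, and it expresses both error types as fractions of all evaluated firms, so that hit ratio, type I error and type II error sum to one; that normalization is an assumption made here for illustration, and the function name is hypothetical.

```python
def evaluate_predictions(actual, predicted):
    """Hit ratio and type I / type II error rates as fractions of all firms.

    Labels: +1 = non-failed, -1 = failed.
    Type I  : predicted non-failed (+1) but actually failed (-1).
    Type II : predicted failed (-1) but actually non-failed (+1).
    """
    n = len(actual)
    pairs = list(zip(actual, predicted))
    hits = sum(a == p for a, p in pairs)
    type_1 = sum(a == -1 and p == +1 for a, p in pairs)
    type_2 = sum(a == +1 and p == -1 for a, p in pairs)
    return {"hit_ratio": hits / n, "type_I": type_1 / n, "type_II": type_2 / n}


actual = [+1, +1, +1, -1, -1, -1, -1, +1]
predicted = [+1, +1, -1, -1, -1, +1, -1, +1]
metrics = evaluate_predictions(actual, predicted)
```

In this toy example six of eight firms are classified correctly, with one error of each type.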

Computing Data Depths

The preliminary analysis shows that the financial ratios of Korean manufacturing firms deviate from the multivariate normality assumption. Instead of parametric approaches, therefore, we introduce nonparametric methods based mainly on data depth functions to predict bankruptcy of the manufacturing firms in Korea. Three types of data depth were applied to reduce the high-dimensional financial ratios: Mahalanobis depth, simplicial depth, and Tukey depth. For every failed firm x_i (i = 1, 2, . . . , 144) and non-failed firm y_i (i = 1, 2, . . . , 144), the 44 financial ratios were condensed into a one-dimensional measure of data depth without any distributional assumption. For example, Figure 2 presents the depth values resulting from the Mahalanobis measure for both failed and non-failed firms for 10 years. Figure 2 does not show any remarkable difference between the two groups in a scatter plot. We found a similar trend with respect to the other two depth measures. In general, because computing the Tukey depth for dimensions higher than three is intractable using the method of Rousseeuw and Ruts (1996), we can use the random approximation algorithm introduced by Cuesta-Albertos and Nieto-Reyes (2008), which is computationally efficient for high-dimensional data sets.
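The idea behind a random approximation of the Tukey depth can be sketched as follows: project the data onto randomly drawn directions, compute the one-dimensional half-space depth along each direction, and keep the minimum. This is a simplified sketch in the spirit of the random approach cited above, not a reproduction of that algorithm; the Gaussian direction sampling, the default number of directions, and the function name are assumptions.

```python
import random


def random_tukey_depth(x, data, n_directions=500, seed=0):
    """Approximate half-space depth of x as the minimum univariate depth
    over randomly drawn projection directions (an upper bound on the exact depth)."""
    rng = random.Random(seed)
    p, n = len(x), len(data)
    depth = 1.0
    for _ in range(n_directions):
        u = [rng.gauss(0.0, 1.0) for _ in range(p)]          # random direction
        proj_x = sum(ui * xi for ui, xi in zip(u, x))
        proj = [sum(ui * zi for ui, zi in zip(u, z)) for z in data]
        below = sum(v <= proj_x for v in proj) / n            # closed half-line below
        above = sum(v >= proj_x for v in proj) / n            # closed half-line above
        depth = min(depth, below, above)
    return depth


square = [(0, 0), (1, 0), (0, 1), (1, 1)]
central = random_tukey_depth((0.5, 0.5), square)      # centre of the square
remote = random_tukey_depth((100.0, 100.0), square)   # far outside the data cloud
```

The cost grows linearly in the dimension and in the number of directions, which is what makes this kind of approximation usable for the 44-dimensional ratio vectors discussed here.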

Plotting via DD -plot

The DD-plot can serve as a simple diagnostic tool for visual comparison of two samples of any dimension. Different distributional differences (e.g., changes of location, scale, skewness or kurtosis) may present different graphical patterns in DD-plots. We drew the DD-plots based on the three measures of data depth. For example, the DD-plots based on the Mahalanobis depth for 10 years are given in Figure 3. In comparing failed and non-failed firms, the scatter plots in Figure 3-(a) and 3-(b) present a separable difference between the two groups. Note that the failed firms scatter at the bottom of the DD-plot at two years before the bankruptcy, representing a location shift from the group of non-failed firms. However, the separation trend decreased gradually at five (Figure 3-(c)) and six years (Figure 3-(d)) before the bankruptcy or survival. At nine (Figure 3-(e)) and ten years (Figure 3-(f)) prior to bankruptcy or survival, the resulting DD(F, G) were scattered on the 45° line in the DD-plot, which means the distributions of the two groups are almost the same.

In summary, it appears feasible to separate firms facing imminent bankruptcy from financially healthy firms using the DD-plot. Based on the values of the data depths in the DD-plot, therefore, we tried to classify failed firms and non-failed firms via the nonlinear SVM in the next step. Because we observed a similar trend when we employed the other depth measures (the simplicial and Tukey depths), we will illustrate classification results from the nonlinear SVM based mainly on the Mahalanobis depth.

Classifying via the Nonlinear SVM

In the context of the corporate bankruptcy classification problem, we introduced the nonlinear SVM to the two-dimensional data set on the DD-plot (called "DD-SVM"). This study employed three kernel functions for the nonlinear SVM: the RBF, polynomial, and MLP kernel functions. The effectiveness of the SVM techniques depends on the proper selection of a kernel function, the parameters of the kernel function chosen, and the soft margin parameter C.

For example, there are two parameters associated with the RBF kernel: a tuning parameter $\sigma$ and $C$. In general, it is not known beforehand which values of $C$ and $\sigma$ are best for a given problem; consequently, some kind of model selection approach must be employed. Following the approach of Erdogan (2013), the best combination of $\sigma$ and $C$ was selected for the training set by a grid-search algorithm over exponentially growing sequences using 5-fold cross-validation, such as $C \in \{2^{-5}, 2^{-3}, \ldots, 2^{13}, 2^{15}\}$ together with a corresponding sequence for $\sigma$. For the first year before bankruptcy, Table 3 shows that the optimal values of the parameters in the RBF kernel (even if they are not unique) give a prediction accuracy of 96.55% for the validation set. In the polynomial kernel function, the best prediction performance for the first year also gave a prediction accuracy of 96.55%. For the MLP kernel function, the best prediction results for the first year were obtained at the parameter values $(2^{-5}, 1, 2^{5})$ and $(2^{-7}, 1, 2^{7})$, giving a prediction accuracy of 91.38%. We can confirm from the analytical results that the prediction performance of the nonlinear SVM is sensitive to the kernel parameters and the upper bound C.
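A grid search over exponentially growing sequences with 5-fold cross-validation can be organized as in the following sketch. Only the C sequence 2^-5, 2^-3, ..., 2^15 comes from the text; the sigma sequence used in the example, the `cv_score` callback (which would train an SVM on the training indices and return its accuracy on the validation indices), and all function names are hypothetical placeholders introduced for illustration.

```python
import math
import random


def exponential_grid(low, high, step=2):
    """Powers of two 2^low, 2^(low + step), ..., 2^high."""
    return [2.0 ** k for k in range(low, high + 1, step)]


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(n, c_grid, sigma_grid, cv_score, k=5):
    """Return (mean score, C, sigma) maximizing the k-fold mean of cv_score."""
    folds = k_fold_indices(n, k)
    best = None
    for c in c_grid:
        for sigma in sigma_grid:
            scores = []
            for i, valid in enumerate(folds):
                train = [j for f in folds[:i] + folds[i + 1:] for j in f]
                scores.append(cv_score(c, sigma, train, valid))
            mean = sum(scores) / k
            if best is None or mean > best[0]:
                best = (mean, c, sigma)
    return best


c_grid = exponential_grid(-5, 15)        # 2^-5, 2^-3, ..., 2^15 as in the text
sigma_grid = exponential_grid(-15, 3)    # hypothetical sequence for sigma


def toy_score(c, sigma, train, valid):
    """Stand-in for 'fit on train, score on valid'; peaks at C = 2^3, sigma = 2^-1."""
    return -(math.log2(c) - 3) ** 2 - (math.log2(sigma) + 1) ** 2


best_score, best_c, best_sigma = grid_search(230, c_grid, sigma_grid, toy_score)
```

Replacing `toy_score` with a function that actually fits an SVM on the DD-plot coordinates of the training folds would turn this into a working model-selection loop.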

Table 3: Grid-search results using the RBF kernel function for first year before bankruptcy.

Table 4 summarizes the prediction results of corporate bankruptcy in Korea for 10 years. We compared the prediction power of the DD-SVM with that of the nonlinear SVM applied to the 44 financial ratios themselves. Overall, the DD-SVM outperforms the nonlinear SVM in terms of prediction accuracy of corporate bankruptcy. As shown in Table 4, the prediction accuracy of the DD-SVMs decreases drastically beyond seven years prior to bankruptcy. The RBF kernel of the DD-SVM produces the best prediction performance over the 10 years. The prediction power of the polynomial kernel is comparable with that of the RBF kernel. However, the polynomial kernel requires more hyperparameters (the polynomial degree d) than the RBF kernel, and under-fitting and over-fitting problems can occur when d is poorly chosen; therefore we selected the RBF kernel for comparison with other traditional approaches in bankruptcy prediction. Figure 4 presents the classification results using the RBF kernel in the nonlinear SVM on the DD-plot, based on the Mahalanobis depth. Note that all of the SVM analyses were conducted using the R kernlab package.

Table 4: Comparisons of prediction accuracy between DD-SVM and SVM.

Kernel		1	2	3	4	5	6	7	8	9	10
RBF	DD-SVM	0.9655	0.9655	0.8966	0.9483	0.8793	0.9138	0.7586	0.6724	0.6034	0.6724
	SVM	0.8444	0.8103	0.7414	0.7759	0.7241	0.6724	0.6897	0.6379	0.6897	0.7069
Polynomial	DD-SVM	0.9655	0.9655	0.8793	0.9483	0.8793	0.9138	0.7586	0.6724	0.5862	0.5517
	SVM	0.8276	0.8103	0.7586	0.7586	0.7759	0.6724	0.6724	0.7069	0.6897	0.7069
MLP	DD-SVM	0.9138	0.9483	0.8621	0.9310	0.8621	0.8966	0.7586	0.6724	0.5690	0.5517
	SVM	0.8103	0.7931	0.7759	0.7586	0.7931	0.6897	0.6552	0.7069	0.7241	0.7414

We compared the prediction power of the DD-SVM with other traditional bankruptcy prediction methods. The bankruptcy classifiers considered in the comparison are logistic regression, multiple discriminant analysis (MDA), and an artificial neural network (ANN). The logit model in the logistic regression was employed to investigate the relationship between the binary response and the financial ratios without the multivariate normality assumption. The regression parameters were estimated by the maximum likelihood estimation method. The final logit models for the 10 years were selected by the stepwise selection method. We introduced MDA to derive a linear combination of the 44 financial ratios that best discriminates between failed and non-failed firms. The ANN model in this study employed a three-layer connected back-propagation network. Following the approach in Min and Lee (2005), after fixing the number of hidden layers as one, we varied the number of hidden nodes (8, 12, 16, 24, 32) at learning epochs of 100, 200, and 300, and then recorded the parameter values which gave the best prediction power for the 10 years. Table 5 reports bankruptcy prediction results for the manufacturing firms for 10 years, along with Type I and Type II errors. The logit and MDA models show poor prediction power from one to five years prior to bankruptcy. As confirmed in the preliminary analysis, the financial ratios in this study deviate from the multivariate normality assumption, whereas MDA performs well only under multivariate normality. The DD-SVM outperforms the ANN model, which is the main comparison target, from one to six years prior to bankruptcy. Overall, the DD-SVM has the best prediction power among the bankruptcy prediction models used as benchmarks in this study. Because of the impact on the domestic economy, Type I error is the more important criterion for diagnosing firms at risk of bankruptcy. Remarkably, the DD-SVM shows almost zero Type I errors from one to five years prior to bankruptcy.

Table 5: The comparison of prediction accuracy of corporate bankruptcy.

		1	2	3	4	5	6	7	8	9	10
Logit	Accuracy	0.6897	0.6724	0.5345	0.6724	0.6724	0.5172	0.5862	0.6379	0.5690	0.5862
	Type I err.	0.1379	0.1552	0.2586	0.1552	0.1552	0.1379	0.2241	0.1379	0.2759	0.1552
	Type II err.	0.1724	0.1724	0.2069	0.1724	0.1724	0.3448	0.1897	0.2241	0.1552	0.2586
MDA	Accuracy	0.7758	0.6552	0.6552	0.7241	0.6552	0.6379	0.5172	0.6207	0.6379	0.6897
	Type I err.	0.1034	0.2069	0.2069	0.1552	0.1897	0.1207	0.2586	0.0862	0.1379	0.0690
	Type II err.	0.1207	0.1379	0.1379	0.1207	0.1552	0.2414	0.2241	0.2931	0.2241	0.2414
ANN	Accuracy	0.8276	0.7759	0.7586	0.7069	0.6207	0.6379	0.6724	0.6724	0.6379	0.6207
	Type I err.	0.0690	0.1034	0.1552	0.1724	0.1379	0.1724	0.1379	0.1207	0.1724	0.2241
	Type II err.	0.1034	0.1207	0.0862	0.1207	0.2414	0.1897	0.1897	0.2069	0.1897	0.1552
DD-SVM	Accuracy	0.9655	0.9655	0.8966	0.9483	0.8793	0.9138	0.7586	0.6724	0.6034	0.6724
	Type I err.	0.0000	0.0000	0.0517	0.0000	0.0000	0.0517	0.1379	0.0517	0.3448	0.1897
	Type II err.	0.0345	0.0345	0.0517	0.0517	0.1207	0.0345	0.1034	0.2759	0.0517	0.1379

5 Discussion

An increasingly complicated business environment and faster business cycles mean that corporate bankruptcies inflict heavy losses on the economy and society, so building a bankruptcy prediction model is important. Nowadays, bankruptcy prediction models using data-mining techniques are widely practiced. In particular, the SVM technique shows results better than or equal to those of other techniques. This research uses data depth, a nonparametric multivariate technique, to estimate a representative value that is not affected by the non-linearity and non-normality of real corporate bankruptcy data. However, data depth alone has a poor prediction rate when the number of variables increases, so this research transforms the data depths into a DD-plot and then classifies the points in it with the SVM technique. This study pioneers the application of the DD-SVM to financial distress prediction, and its primary goal is to apply this new model to increase the predictive accuracy for financial failure. Empirical results reveal that the proposed DD-SVM model is a very promising hybrid SVM model for predicting bankruptcy in terms of both predictive accuracy and generalization ability.

Claims

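A method for predicting corporate bankruptcy in which, because many financial statements do not follow a multivariate normal distribution, such financial statements are reduced to one dimension with a measure called data depth, represented in two dimensions through a graphical method called the DD-plot, and then classified with a Support Vector Machine, the result of the classification being used for the prediction.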