KR20090006489A

KR20090006489A - Toolkit of constructing credit risk model, method of managing credit risk using credit risk model construction and recording medium thereof

Info

Publication number: KR20090006489A
Application number: KR1020070069863A
Authority: KR
Inventors: 박유성; 이동희; 최보승; 김태영
Original assignee: 고려대학교 산학협력단 (Korea University Research and Business Foundation)
Priority date: 2007-07-11
Filing date: 2007-07-11
Publication date: 2009-01-15
Also published as: KR100914307B1

Abstract

A toolkit for constructing a credit risk model, a method for managing credit risk through credit risk model construction, and a recording medium are provided. Although the system is small in scale, it can quickly process and maintain large volumes of credit management data, and it gives users a variety of ways to use a credit management system. A raw data database (110) stores accounting data of an enterprise or individual. A data manager (120) removes missing values from the data selected from the raw data database and manages the data that no longer contain missing values. An analysis database storing unit (130) stores the data of the data manager. A model constructor (140) selects variables for a credit evaluation model from the data in the analysis database storing unit and builds the credit evaluation model from the selected variables by logistic regression analysis. An analysis result storing database (150) stores the analysis results of the constructed credit evaluation model. A model evaluator evaluates the credit level of the enterprise or individual.

Description

Toolkit of constructing credit risk model, Method of managing credit risk using credit risk model construction and Recording medium

The present invention relates to credit risk management. More particularly, it relates to a toolkit for constructing a credit risk model, a method for managing credit risk through credit risk model construction, and a recording medium thereof. The toolkit builds a model for managing the credit risk of an enterprise or individual. The built model can then be applied to credit risk management work and also to the distribution field.

As the financial industry has developed, awareness of risk management has spread. The Bank for International Settlements has raised the need for regulatory capital, chiefly among banks. Financial institutions have learned the importance of risk management through several financial crises and have made considerable progress in the techniques they apply.

In the industrial technology sector, competition among companies has intensified, which has made it harder to predict a company's future creditworthiness. For financial institutions, how efficiently they can manage and transfer such credit risk has become an important factor in their competitiveness. Credit risk evaluation has therefore emerged as a necessary means to that end.

Credit risk is the possibility that a borrower will fail to repay the promised amount on the promised date. It is reflected in the spread over benchmark indicators such as the credit rating and the risk-free interest rate, and in the market price of debt.

In Korea, companies are evaluated for credit risk by a small number of specialized corporate rating agencies. For major financial institutions, building a separate credit risk management system is difficult in terms of both time and cost.

The credit risk management models now in place at financial institutions are mostly built by outsourced specialists rather than by the institutions' own staff. These specialists must combine expertise in financial engineering with statistical knowledge. They develop the models themselves using general-purpose statistical analysis programs and commercial development software, rather than systems specialized for model building.

Developing a specialized credit risk management system therefore inevitably takes considerable time and raises the cost of employing experts.

Even after such a system is built, the system itself is usually quite large. Operating a separate credit management system therefore requires additional hardware and an operating environment. Once the system is in place, keeping it maintained and actively used also incurs considerable costs for additional expert staff and for system operation.

Conventional credit risk management systems thus have several problems:
- Only a small number of experts with knowledge of credit risk can build the model.
- The system is so large that operating it requires substantial capital.
- Maintaining and using the system requires additional experts, at considerable cost.

Accordingly, a first object of the present invention is to provide a toolkit for constructing a credit risk model. The toolkit should be small in scale yet able to quickly process and maintain large volumes of credit management data, and it should give users a variety of ways to use a credit management system.

A second object of the present invention is to provide a method for managing credit risk through credit risk model construction, which performs credit risk management using the above toolkit.

A third object of the present invention is to provide a computer-readable recording medium that stores a program for executing the above credit risk management method on a computer.

To achieve the first object, the present invention provides the following.

A toolkit for constructing a credit risk model, comprising:
- a raw data database that stores financial statement data of an enterprise or individual for credit evaluation;
- a data management unit that selects data from the raw data database, removes missing values from the selected data, and manages the data from which the missing values have been removed;
- an analysis database storage unit that stores the data of the data management unit;
- a model construction unit that selects variables for a credit evaluation model from the data in the analysis database storage unit and builds the credit evaluation model from the selected variables by logistic regression analysis;
- an analysis result storage database unit that stores the analysis results of the credit evaluation model built by the model construction unit; and
- a model evaluation unit that evaluates the creditworthiness of the enterprise or individual using the analysis result storage database unit.

To achieve the second object, the present invention provides the following.

A method for managing credit risk through credit risk model construction, comprising:
- selecting the data to be analyzed from a raw data database that stores data compiled for evaluating the creditworthiness of an enterprise or individual;
- designating, in the selected data, the independent variables (those that affect the credit evaluation) and the response variable (the one that represents the credit evaluation result given the independent variables), and computing basic statistics for the selected data from these variables;
- cleansing the data by detecting missing values within the independent variables of the selected data;
- selecting the independent variables to use for the credit evaluation, by computing basic statistics, correlations among the independent variables, and regression analyses on the cleansed data;
- constructing a prediction table for credit evaluation from the selected independent variables, and building a credit evaluation model from that table by logistic regression analysis; and
- estimating creditworthiness by computing credit evaluation values with the built model.

To achieve the third object, the present invention provides the following.

A computer-readable recording medium that stores a program for executing, on a computer, the above method for managing credit risk through credit risk model construction.

The present invention provides the following effects:
- Data already on hand need not be converted into a special format for credit risk analysis, and various types of document data can be used depending on the user's situation.
- A small-scale credit risk management toolkit can process large volumes of data quickly.
- Complex tasks such as handling outliers and missing values can be carried out easily in a graphical user interface.
- Various credit risk management models are offered, depending on the variable selection method.
- The toolkit is compatible with widely used default prediction models.
- It provides effective rules for classifying companies in order to predict corporate default.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The embodiments illustrated below can be modified in various other forms, and the scope of the present invention is not limited to them. They are provided to explain the present invention more completely to those of ordinary skill in the art.

FIG. 1 is a block diagram of a toolkit for constructing a credit risk model according to the present invention.

Referring to FIG. 1, the toolkit 100 for constructing a credit risk model according to the present invention comprises:
- a raw data database 110;
- a data management unit 120;
- an analysis database storage unit 130;
- a model construction unit 140; and
- an analysis result storage database unit 150.

The raw data database 110 stores financial statement data of an enterprise or individual for credit evaluation.

Because the present invention stores analysis results per user work unit, no separate data integration step is needed. Data sources of different formats can be loaded into a single work unit, and each data source loaded in a work unit can be saved under its own data name.

Because various data formats are supported, commonly used Microsoft Excel files and various text files can all be imported for credit analysis.
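A work unit can be pictured as a container that maps data names to loaded tables. The Python sketch below is a minimal illustration under assumptions and is not the toolkit's actual design. The `Workspace` class and its methods are hypothetical. Text files are parsed with the standard `csv` module, and Excel support assumes pandas (with an Excel engine such as openpyxl) is installed.

```python
import csv
import io
from pathlib import Path


class Workspace:
    """Hypothetical work unit: several data sources, each kept under its own data name."""

    def __init__(self):
        self.datasets = {}  # data name -> list of row dicts

    def load_text(self, name, text, delimiter=","):
        """Parse delimited text whose first row holds the variable names."""
        rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
        self.datasets[name] = rows
        return rows

    def load_file(self, name, path):
        """Dispatch on file extension; Excel needs pandas, text files need only the stdlib."""
        path = Path(path)
        suffix = path.suffix.lower()
        if suffix in (".xlsx", ".xls"):
            import pandas as pd  # optional dependency, imported only when needed

            rows = pd.read_excel(path).to_dict(orient="records")
            self.datasets[name] = rows
            return rows
        delimiter = "\t" if suffix in (".tsv", ".tab") else ","
        return self.load_text(name, path.read_text(encoding="utf-8"), delimiter)
```

For example, `ws.load_text("fs_2006", "firm,debt_ratio\nA,0.52\n")` stores one table under the name `fs_2006`. At this stage, values from text files remain strings.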

The data management unit 120 selects data from the raw data database 110, removes missing values from the selected data, and manages the data from which the missing values have been removed.

To this end, the data management unit 120 can include:
- a data definition module 121;
- a data cleansing module 122;
- a data replacement module 123;
- a basic statistical analysis module 124;
- a correlation analysis module 125; and
- a regression analysis module 126.

In the data selected from the raw data database 110, the data definition module 121 designates two kinds of variable. The independent variables are those that affect the credit evaluation. The response variable represents the credit evaluation result given the independent variables. Using these variables, the module can compute basic statistics for the selected data.

Here, the independent variables can be the risk factors (Risk factor) relevant to the credit evaluation. In addition to independent and response variables, the user can designate a time-series variable (Time), a data size variable (Size), or a data category variable (Category).

The basic statistics computed by the data definition module 121 can include:
- the type of the data selected from the raw data database 110;
- the role of each variable;
- the sample size;
- the number of missing values;
- the mean and the variance; and
- the five-number summary.

Consider data used to build a model for predicting corporate default. The variable that records the default outcome can be defined as the response variable (Response), and ordinary financial statement items can be defined as risk factors (Risk factor). Categorical data, such as firm size classified as large or small/medium enterprise, can be designated as a category (Category). A variable that represents time, such as year, quarter, month, or day, can be designated as a time variable (Time).

A role is specified for each variable. Besides the risk factor (Risk factor) and response (Response) roles described above, a variable can be given the time (Time), size (Size), or category (Category) role.
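Role assignment can be represented as a mapping from column name to role. The sketch below is hypothetical. The role labels mirror those named above, while the `Role` enum, the helper functions, and the rule that unassigned columns default to risk factors are assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    RISK_FACTOR = "Risk Factor"  # independent variable
    RESPONSE = "Response"        # dependent variable, e.g. a 0/1 default flag
    TIME = "Time"
    SIZE = "Size"
    CATEGORY = "Category"


def assign_roles(columns, roles):
    """Map every column to a Role.

    Columns not mentioned in `roles` default to RISK_FACTOR (an assumption of this
    sketch). Exactly one RESPONSE variable is required.
    """
    unknown = set(roles) - set(columns)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    mapping = {col: Role.RISK_FACTOR for col in columns}
    mapping.update(roles)
    n_response = sum(role is Role.RESPONSE for role in mapping.values())
    if n_response != 1:
        raise ValueError("exactly one Response variable is required")
    return mapping


def columns_with(mapping, role):
    """List the columns assigned to `role`, in their original order."""
    return [col for col, r in mapping.items() if r is role]
```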

The sample size is displayed as an Arabic numeral, together with the number of missing values (NMiss).

Data prepared for credit analysis may contain values that are extremely large or extremely small compared with the rest. Such values are treated here as missing values (strictly speaking, outliers). Using them as-is in the credit analysis can undermine the reliability of the results. Therefore, before the actual credit analysis, the toolkit automatically searches the selected variables for values identified as missing and outputs their total count (NMiss).
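The patent does not state the rule for deciding that a value is extreme. The sketch below assumes Tukey's fences: a value is flagged if it lies outside Q1 - k*IQR to Q3 + k*IQR, with k = 1.5. Empty entries (`None` or NaN) are also counted. The names `flag_missing` and `nmiss` are hypothetical.

```python
import math
import statistics


def _is_empty(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def flag_missing(values, k=1.5):
    """Return one boolean per value: True if empty or outside Tukey's fences."""
    observed = [v for v in values if not _is_empty(v)]
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [_is_empty(v) or not (low <= v <= high) for v in values]


def nmiss(values, k=1.5):
    """Total number of values treated as missing (the NMiss statistic)."""
    return sum(flag_missing(values, k))
```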

The basic statistics can also include the mean (Mean), variance (Variance), and standard deviation (Standard Deviation). They can further include the standard error (Standard error), which indicates how precisely the sample mean estimates the population mean.

The basic statistics can also include the five-number summary. This consists of the minimum (Minimum), first quartile (1st Quartile), median (Median), third quartile (3rd Quartile), and maximum (Maximum).

When the data are sorted in ascending order, the first quartile (1st Quartile) is the value below which the lowest 25% of the data fall. The median (Median) is the value in the middle. If the sample size N is odd, the median is the ((N+1)/2)-th value. If N is even, it is the mean of the (N/2)-th and (N/2+1)-th values.

The third quartile (3rd Quartile) is the value above which the highest 25% of the data fall when the data are sorted in ascending order.
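The statistics listed above can be computed with Python's standard library. This sketch rests on several assumptions:
- The variance uses the sample (n - 1) divisor.
- Quartiles use the "inclusive" method of `statistics.quantiles`. Statistical packages differ slightly in their quartile conventions.
- Only `None`/NaN entries are counted in `NMiss`.
- The median is written out explicitly to mirror the odd/even rule in the text.

The function names are hypothetical.

```python
import math
import statistics


def median_of_sorted(xs):
    """Median of an already sorted list using the (N+1)/2 rule described above."""
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # the ((N+1)/2)-th value, 1-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # mean of the N/2-th and (N/2+1)-th values


def basic_statistics(values):
    """N, NMiss, mean, variance, standard deviation, standard error and five-number summary."""
    obs = sorted(v for v in values if v is not None and not math.isnan(v))
    n = len(obs)
    var = statistics.variance(obs)  # sample variance, divisor n - 1
    sd = math.sqrt(var)
    q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
    return {
        "N": n,
        "NMiss": len(values) - n,
        "Mean": statistics.fmean(obs),
        "Variance": var,
        "StdDev": sd,
        "StdErr": sd / math.sqrt(n),  # standard error of the mean
        "Min": obs[0],
        "Q1": q1,
        "Median": median_of_sorted(obs),
        "Q3": q3,
        "Max": obs[-1],
    }
```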

The data cleansing module 122 cleanses the selected data by detecting missing values within the independent variables.

Data prepared for analysis may contain values that are extremely large or extremely small compared with the rest. Such missing values can produce incorrect results if used as-is, so data cleansing (Data Cleansing) is performed before the actual analysis. The data cleansing module 122 lets the user immediately compare each variable's attributes before and after the missing values are removed, and the result can be saved automatically.

For example, when companies are classified as good or bad, data cleansing removes missing values. This stabilizes the data distribution and improves how well the cleansed data discriminate credit risk. The function can be applied to variables selected by the user or to all variables, and the processing results for the modified data are presented to the user.
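The following is a minimal sketch of cleansing with a before/after comparison. It assumes that:
- rows are dictionaries of numeric values;
- values outside Tukey's fences are flagged (an assumed rule; the patent does not specify one);
- any row in which a selected variable is empty or flagged is dropped.

It then reports N, mean, and standard deviation for each selected variable before and after cleansing. The name `cleanse` is hypothetical.

```python
import math
import statistics


def _empty(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def _summary(xs):
    return {"N": len(xs), "Mean": statistics.fmean(xs), "StdDev": statistics.stdev(xs)}


def cleanse(rows, variables=None, k=1.5):
    """Drop rows whose selected variables are empty or outside Tukey's fences.

    Returns the kept rows and a per-variable before/after summary.
    """
    variables = list(variables) if variables else list(rows[0])  # default: all variables
    fences = {}
    for var in variables:
        obs = [r[var] for r in rows if not _empty(r[var])]
        q1, _, q3 = statistics.quantiles(obs, n=4, method="inclusive")
        fences[var] = (q1 - k * (q3 - q1), q3 + k * (q3 - q1))

    def keep(row):
        return all(
            not _empty(row[v]) and fences[v][0] <= row[v] <= fences[v][1]
            for v in variables
        )

    kept = [r for r in rows if keep(r)]
    report = {
        v: {
            "before": _summary([r[v] for r in rows if not _empty(r[v])]),
            "after": _summary([r[v] for r in kept]),
        }
        for v in variables
    }
    return kept, report
```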

The data replacement module 123 replaces the missing values detected by the data cleansing module 122 with substitute values.

When data intended for the credit evaluation model contain missing values, the data replacement module 123 replaces them with values preset by the user. It then adds the data back to the full analysis dataset.

When the credit evaluation model is built, a high proportion of missing values in the data means that discarding them loses information. This matters especially when the missing values arise for structural reasons. In that case the missing values themselves carry meaningful information and should be used in the credit evaluation model.

To use such data in the credit evaluation model, the missing values must be replaced with other values. The present invention supports two methods:
- mean imputation, which computes the mean of the observed values and replaces each missing value with it; and
- median imputation, which computes the median of the observed values and replaces each missing value with it.

The imputed values can be stored as new data in the data cleansing module 122.
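Mean and median imputation as described above can be sketched as follows. The function name is hypothetical. The sketch assumes that values flagged as missing by the cleansing step have already been set to `None`.

```python
import math
import statistics


def impute(values, method="mean"):
    """Replace missing entries (None or NaN) with the mean or median of the observed values.

    Values already flagged as extreme are assumed to have been set to None beforehand.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if method == "mean":
        fill = statistics.fmean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    return [fill if v is None or math.isnan(v) else v for v in values]
```

The median is less sensitive than the mean to any extreme values that remain, which matters for skewed financial ratios.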

The basic statistical analysis module 124 computes basic statistics for each independent variable needed to build the credit risk management model. The user can compute these statistics once the variables have been defined in the data definition module 121, and again after the data have been cleansed in the data cleansing module 122.

The correlation analysis module 125 computes the correlation coefficients between the independent variables needed to build the credit evaluation model. These independent variables are the financial statement items used as risk factors (Risk factor). The module tests the statistical significance of each coefficient and lets the user view the results immediately. This allows the relationships among the financial statement items to be checked and analyzed before the credit evaluation model is actually built.
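The correlation step can be sketched with NumPy as shown below. The patent does not name its significance test. This example substitutes a two-sided permutation test on Pearson's r, which avoids distributional assumptions at the cost of extra computation. The function name and parameters are hypothetical.

```python
import numpy as np


def correlation_with_pvalues(X, n_perm=999, seed=0):
    """Pearson correlation matrix of the columns of X plus permutation p-values.

    p[i, j] estimates P(|r| >= |r_observed|) when column j is randomly shuffled,
    i.e. under the null hypothesis of no association.
    """
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    k = X.shape[1]
    p = np.ones((k, k))
    rng = np.random.default_rng(seed)
    for i in range(k):
        for j in range(i + 1, k):
            observed = abs(r[i, j])
            hits = sum(
                abs(np.corrcoef(X[:, i], rng.permutation(X[:, j]))[0, 1]) >= observed
                for _ in range(n_perm)
            )
            p[i, j] = p[j, i] = (hits + 1) / (n_perm + 1)
    return r, p
```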

The regression analysis module 126 examines how the independent variables explain or predict the response variable. It compares how strongly each risk factor (independent variable) influences the response variable and tests whether that influence on the dependent variable is significant. In finance, this can be applied to the market model and to capital asset pricing models such as the capital market line and security market line models.
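The significance test for each regression coefficient can be sketched with ordinary least squares in NumPy. The sketch assumes a linear model with an intercept. Its two-sided p-values come from the normal approximation to Student's t, which is reasonable only for fairly large samples; exact t-based p-values would need a library such as SciPy. The function name is hypothetical.

```python
import math

import numpy as np


def ols_significance(X, y):
    """OLS fit of y on X (an intercept is added) with standard errors, t-statistics and p-values."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    n, p = design.shape
    sigma2 = resid @ resid / (n - p)                 # unbiased residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    t = beta / se
    # two-sided p-value, normal approximation: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    pval = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])
    return beta, se, t, pval
```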

The analysis database storage unit 130 stores the data that have passed the statistical significance tests of the regression analysis module 126.

From the data stored in the analysis database storage unit 130, the model construction unit 140 selects variables for the credit evaluation model. It then builds the credit evaluation model from the selected variables by logistic regression or inverse regression analysis.

The model construction unit 140 can include:
- a segmentation module 141;
- an independent variable selection module 142;
- a logistic model module 143; and
- a calibration module 144.

The segmentation module 141 provides classification rules based on its own discriminant model. Using the financial statements of companies or individuals, it builds a model suited to assigning each company to predefined groups (segments), and it derives classification rules from that model. These rules determine the group of each company. They also allow classification predictions for new individuals or companies whose group has not yet been determined. With this function, individuals or companies can be classified as good or bad. Financial institutions can use such data to judge whether each borrower is good or bad according to its credit status. Insurance companies can use it to build segmentations that help prevent customer churn or to assign customer grades.
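The patent does not disclose its own discriminant model. As a stand-in, the sketch below uses Fisher's linear discriminant for two groups (for example good = 0 and bad = 1) with equal prior probabilities. The function names are hypothetical. The fitted rule assigns a company to a group by projecting its financial ratios onto the discriminant direction and comparing the projection with a threshold.

```python
import numpy as np


def fit_fisher_rule(X, groups):
    """Fisher's linear discriminant for two groups coded 0 and 1 (equal priors assumed)."""
    X = np.asarray(X, dtype=float)
    g = np.asarray(groups)
    X0, X1 = X[g == 0], X[g == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # pooled within-group covariance matrix
    S = (
        (len(X0) - 1) * np.atleast_2d(np.cov(X0, rowvar=False))
        + (len(X1) - 1) * np.atleast_2d(np.cov(X1, rowvar=False))
    ) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)  # discriminant direction
    threshold = w @ (mu0 + mu1) / 2    # midpoint of the projected group means
    return w, threshold


def classify(X, w, threshold):
    """Assign group 1 when the projection exceeds the threshold, else group 0."""
    return (np.atleast_2d(np.asarray(X, dtype=float)) @ w > threshold).astype(int)
```

A new company whose group is not yet known is classified by calling `classify` on its ratios with the fitted `w` and threshold.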

The independent variable selection module 142 addresses the following problem. A credit risk management model for companies or individuals can sometimes draw on a very large number of factors. Putting too many risk factors, that is, independent variables, into a single model can reduce analytical efficiency and make the resulting model less useful.

To make model building more efficient and the resulting model more useful, the present invention provides criteria for selecting appropriate risk factors. It presents statistics for selecting risk factors, based on the generalized selection models frequently used to build credit evaluation models. The user can review these basic statistics and choose appropriate risk factors, that is, independent variables.
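The patent does not specify which selection statistics are offered. One common choice, assumed here, is forward stepwise selection of logistic regression variables by the Akaike information criterion (AIC). At each step, the candidate that lowers the AIC most is added, and selection stops when no candidate lowers it further. The logistic fit uses Newton-Raphson iterations, and all names are hypothetical. Perfectly separable data would make the fit diverge.

```python
import numpy as np


def fit_logit(design, y, iters=50, tol=1e-8):
    """Newton-Raphson logistic fit; `design` already contains an intercept column.

    Returns the coefficients and the maximized log-likelihood.
    """
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-design @ beta))
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        step = np.linalg.solve(hessian, design.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1 / (1 + np.exp(-design @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def forward_select(X, y, names):
    """Forward stepwise selection by AIC = 2k - 2 log L (k counts the intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    def aic(cols):
        design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        _, loglik = fit_logit(design, y)
        return 2 * design.shape[1] - 2 * loglik

    chosen, best = [], aic([])
    while len(chosen) < X.shape[1]:
        scores = {c: aic(chosen + [c]) for c in range(X.shape[1]) if c not in chosen}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining variable lowers the AIC
        chosen.append(candidate)
        best = scores[candidate]
    return [names[c] for c in chosen], best
```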

When a credit evaluation model is built for an individual or company, the logistic model module 143 identifies the major risk factors that affect credit risk and thus creditworthiness. The user can check the statistical significance and stability of the built model through basic statistics. The module then uses the independent variables for these risk factors to derive a cut-off point (cut off point) for the credit evaluation, from which creditworthiness is predicted.
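The sketch below fits the logistic model and chooses a cut-off point. The patent does not say how the cut-off is chosen. This example assumes Youden's J statistic (sensitivity + specificity - 1), maximized over the observed predicted probabilities. The model is fitted by Newton-Raphson iterations in NumPy, and all function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, iters=50, tol=1e-8):
    """Logistic regression by Newton-Raphson; returns [intercept, coefficients...]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(design.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-design @ beta))
        hessian = design.T @ (design * (p * (1 - p))[:, None])
        step = np.linalg.solve(hessian, design.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict_pd(X, beta):
    """Predicted probability of default for each row of X."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    return 1 / (1 + np.exp(-design @ beta))


def youden_cutoff(pd_hat, y):
    """Cut-off c maximizing sensitivity + specificity - 1 for the rule 'default if pd >= c'."""
    y = np.asarray(y)
    best_c, best_j = 0.5, -1.0
    for c in np.unique(pd_hat):
        flagged = pd_hat >= c
        j = flagged[y == 1].mean() + (~flagged[y == 0]).mean() - 1
        if j > best_j:
            best_c, best_j = float(c), float(j)
    return best_c, best_j
```

In practice the cut-off could instead reflect the relative costs of misclassifying good and bad borrowers. Youden's J weights the two error types equally.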

The logistic model module estimates creditworthiness from the financial statement data of the company or individual. Its estimates may therefore differ systematically from actual creditworthiness. The calibration module 144 re-estimates creditworthiness to correct this difference, so that the estimates match the actual creditworthiness of the company or individual.

The re-estimation can use an inverse regression method based on a nonparametric regression model. In particular, it can use locally weighted robust regression (LOESS). This re-estimation yields credit evaluation values closer to actual creditworthiness.
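Locally weighted robust regression can be sketched in NumPy following Cleveland's LOWESS procedure. Each point gets a local linear fit with tricube weights over the nearest `frac` share of the points. The fit is then repeated with bisquare robustness weights that down-weight large residuals.

For calibration, one plausible setup (an assumption, not stated in the patent) regresses observed outcomes, such as default indicators or observed default rates, on the model's predicted values. The smoothed curve then serves as the re-estimated creditworthiness. The function name and defaults are hypothetical; `statsmodels.nonparametric.smoothers_lowess.lowess` is an established implementation.

```python
import numpy as np


def lowess(x, y, frac=0.5, robust_iters=2):
    """Robust locally weighted linear regression evaluated at each x[i]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = min(n, max(2, int(np.ceil(frac * n))))  # number of neighbours in each local fit
    robustness = np.ones(n)
    fitted = np.zeros(n)
    for _ in range(robust_iters + 1):
        for i in range(n):
            d = np.abs(x - x[i])
            h = max(np.sort(d)[r - 1], 1e-12)  # distance to the r-th nearest point
            w = (1 - np.clip(d / h, 0, 1) ** 3) ** 3 * robustness  # tricube times robustness
            sw = np.sqrt(w)
            A = np.column_stack([np.ones(n), x - x[i]])  # local linear design centred at x[i]
            coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
            fitted[i] = coef[0]
        resid = y - fitted
        s = np.median(np.abs(resid))
        if s == 0:
            break
        u = np.clip(resid / (6 * s), -1, 1)
        robustness = (1 - u ** 2) ** 2  # bisquare weights
    return fitted
```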

The credit evaluation values are stored in the analysis result storage database unit 150 so that the user can use them whenever needed.

The data in the raw data database 110, the analysis database storage unit 130, and the analysis result storage database unit 150 can also be stored as XML documents. Users can therefore readily access them over the web as needed.
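Storing results as XML can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`CreditModel`, `Coefficient`, `Score`) and the function names are hypothetical; the patent does not define a schema.

```python
import xml.etree.ElementTree as ET


def results_to_xml(model_name, coefficients, scores):
    """Serialize model coefficients and per-entity credit scores to an XML string."""
    root = ET.Element("CreditModel", name=model_name)
    coefs = ET.SubElement(root, "Coefficients")
    for variable, value in coefficients.items():
        ET.SubElement(coefs, "Coefficient", variable=variable).text = repr(float(value))
    score_list = ET.SubElement(root, "Scores")
    for entity, score in scores.items():
        ET.SubElement(score_list, "Score", entity=entity).text = repr(float(score))
    return ET.tostring(root, encoding="unicode")


def xml_to_scores(xml_text):
    """Read the per-entity scores back from the XML produced above."""
    root = ET.fromstring(xml_text)
    return {e.get("entity"): float(e.text) for e in root.iter("Score")}
```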

FIG. 2A shows data selected from the raw data database used in the present invention.

Referring to FIG. 2A, the user can select and load data stored in the raw data database, as described above.

In the loaded data sheet, the first row lists the variables, and each column holds the actual data for one variable.

FIG. 2B shows the variable information of the selected data.

Referring to FIG. 2B, the 'Role' field indicates the role of each variable, and a role can be assigned to every variable. As described above, a variable can be designated either as an independent variable or as the dependent variable. The dependent variable is the response variable that represents the outcome corresponding to the independent variables.

In FIG. 2B, independent variables are designated 'Risk Factor' and the dependent (response) variable is designated 'Response'. The user can also assign roles such as time ('Time'), data size ('Size'), and data category ('Category'). Once this variable information is specified, the actual credit rating can be carried out.

FIG. 2C shows the descriptive statistics output for the selected variables.

For the data selected according to the variables specified in FIG. 2B, the output may include the following:
- the data type (TYPE)
- the role of each variable (Role)
- the sample size (N)
- the number of missing values (NMiss)
- the mean (Mean)
- the variance (Variance)
- the standard deviation (Standard)
- the five-number summary of the data

FIG. 3A shows the data cleaning screen used in the present invention. FIG. 3B shows the result of cleaning the data selected in FIG. 3A.

Referring first to FIG. 3A, the user decides for each independent variable in the selected data sheet whether its data should be cleaned. 'Cleaning' is selected for independent variables whose data are to be cleaned, and 'None' for those to be left as they are.

The cleaning result for these selections can then be checked, as shown in FIG. 3B.

Referring to FIG. 3B, the variable had no missing values before cleaning (Uncleaned). After cleaning (Cleaned), 13 values are counted as missing.

In addition, a box plot shows the distribution of the data after cleaning.

In the box plot:
- line 311 indicates the median of the selected data;
- line 312 indicates the first quartile (the 25th percentile, below which the lowest 25% of the data lie);
- line 313 indicates the third quartile (the 75th percentile, above which the highest 25% of the data lie).

The range between line 312 (the first quartile) and line 313 (the third quartile) is called the interquartile range (IQR) 310.

The line located 1.5 × IQR below line 312 (the first quartile) is called the minimum limit line 320. The line located 1.5 × IQR above line 313 (the third quartile) is called the maximum limit line 330.

Values outside the range between the minimum limit line 320 and the maximum limit line 330 are treated as missing values. Such values are extremely large or extremely small compared with the other values (that is, they are outliers), and using them as they are may lead to incorrect results.

Therefore, when 'Cleaning' is selected for an independent variable and such values are detected, they are excluded and the descriptive statistics are recomputed without them, as shown in FIG. 3B. This yields more reliable descriptive statistics.
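The cleaning rule of FIGS. 3A and 3B can be sketched in Python as follows. This is an illustrative sketch, not the toolkit's code. The function name `iqr_clean` is hypothetical, and missing values are represented by `None`. The quartiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`), because the document does not say which quartile definition the toolkit uses.

```python
from statistics import quantiles


def iqr_clean(values, k=1.5):
    """Mark values outside [Q1 - k*IQR, Q3 + k*IQR] as missing (None).

    Entries that are already None stay None and are ignored when the
    quartiles are computed. Returns the cleaned list and the two limits.
    """
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # minimum / maximum limit lines
    cleaned = [v if v is not None and lower <= v <= upper else None for v in values]
    return cleaned, (lower, upper)
```

For example, with `[1, 2, ..., 9, 100]` the quartiles are 3.25 and 7.75. The limits are therefore -3.5 and 14.5, and only 100 is marked as missing.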

FIG. 4A shows the replacement (imputation) of the missing values identified in FIG. 3B.

Referring to FIG. 4A, when a variable contains missing values, as in FIG. 3B, those values must be replaced before the variable can be used. The present invention provides two methods for this:
- mean imputation, which replaces each missing value with the mean of the observed values;
- median imputation, which replaces each missing value with the median of the observed values.

The imputed values (the mean or the median) are stored as new data.

FIG. 4B shows information on the data after the missing values have been replaced as in FIG. 4A.

Referring to FIG. 4B, when the missing values of the independent variable X4 are replaced by mean imputation as in FIG. 4A, all 13 missing values are replaced by the mean and no missing values remain. The variance, which measures the spread of the data, decreases from 146.94 to 127.45. The range of the five-number summary also narrows, indicating that the spread of the values has been reduced.
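Mean and median imputation can be sketched in a few lines of Python. The function name `impute` and the use of `None` for missing entries are illustrative assumptions. The sketch also shows why the variance falls after mean imputation: each imputed value sits exactly at the mean, so it adds to the count without adding any squared deviation.

```python
from statistics import mean, median


def impute(values, method="mean"):
    """Replace missing entries (None) with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]
```

For `[1, None, 3, 8]`, mean imputation gives `[1, 4, 3, 8]`. The sample variance drops from 13 (observed values only) to about 8.67.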

FIG. 5 shows a correlation analysis of the data used in the present invention.

The present invention can use correlation analysis of the data to search for variables suitable for credit risk management.

First, the user selects several independent variables for correlation analysis. Analyzing the correlations among these independent variables makes it possible to choose independent variables suitable for credit risk management.

Together with the correlation table, the descriptive statistics 510 of the cleaned data can be checked. The statistical significance 520 of each correlation coefficient can be tested at the same time.

FIG. 5 shows a case in which five independent variables are used in the correlation analysis. The descriptive statistics of the data for the five independent variables are shown first, followed by the correlation matrix.

A significance level defined in advance by the user can be applied to the correlation matrix. In this embodiment the significance level is set to 0.05, and p-values that are significant at the 0.05 level are displayed in white.

The key item to check in the correlation results is the largest correlation coefficient between any two variables. Referring to FIG. 5, this is the correlation between two variables:
- the ordinary income to total capital ratio, represented by the first variable X1 (510);
- the ordinary income to equity ratio, represented by the second variable X3 (530).

These two variables are very closely related. If both were used as independent variables in the logistic regression model for default prediction, multicollinearity could arise.

In general, multicollinearity may be suspected when the correlation coefficient between independent variables is 0.7 or higher, and strongly suspected when it is 0.9 or higher. When multicollinearity is present, the regression analysis cannot be carried out properly, and inference about individual regression coefficients may lead to incorrect conclusions.

As shown in FIG. 5, the correlation coefficient between X1 (510) and X3 (530) exceeds 0.9. Therefore, when logistic regression is performed to build the actual model, only one of these two variables should be used as an independent variable (risk factor).
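The screening rule above (suspect at |r| ≥ 0.7, severe at |r| ≥ 0.9) can be sketched in Python as follows. The names `pearson` and `collinearity_flags` are hypothetical, and the column data in the test are made up for illustration. The significance test of each coefficient shown in FIG. 5 is not included.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinearity_flags(columns, suspect=0.7, severe=0.9):
    """Return {(name_a, name_b): (r, level)} for every pair with |r| >= suspect."""
    flags = {}
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= severe:
            flags[(a, b)] = (r, "severe")
        elif abs(r) >= suspect:
            flags[(a, b)] = (r, "suspect")
    return flags
```

For a pair flagged "severe", only one of the two variables would be kept as a risk factor, as described for X1 and X3.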

FIG. 6 shows the analysis results used to select independent variables by logistic regression analysis in the present invention.

Referring to FIG. 6, the candidate independent variables for the credit rating are fitted one at a time, and the results for each are output. These results are computed by the logistic regression module.

The logistic regression module uses three statistics as selection criteria for the independent variables:
- the Akaike Information Criterion (AIC);
- the Schwarz Bayesian Criterion (SBC);
- the Accuracy Ratio.

A smaller AIC or SBC, and a larger accuracy ratio, indicate an independent variable better suited to credit analysis.

In FIG. 6, Observation (610) is the total number of records for each independent variable used in the regression module to assess whether each company has defaulted.

Missing (620) is the number of missing values for each independent variable used in the regression module.

Default Count (630) is the number of records that show an actual default in the default indicator, which is the response variable of the default prediction regression model. In the example data used in the analysis, 2 of the 99 companies defaulted.
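The three counts in FIG. 6 can be sketched per candidate variable as follows. This sketch assumes several things not stated in the document:
- records are dictionaries with `None` marking a missing value;
- the response uses 1 for default;
- Default Count is taken over the records actually usable for that variable (non-missing).

The function name is hypothetical.

```python
def variable_summary(records, response, risk_factors):
    """Observation, Missing and Default Count for each candidate risk factor."""
    summary = {}
    for name in risk_factors:
        usable = [r for r in records if r.get(name) is not None]
        summary[name] = {
            "Observation": len(records),                 # all records for the variable
            "Missing": len(records) - len(usable),        # records with no value
            "Default Count": sum(1 for r in usable if r[response] == 1),
        }
    return summary
```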

Estimate (640) gives the estimates of the regression coefficients in Equation 1 below, for the regression on each independent variable (risk factor). In Equation 1, $p$ is the default probability and $\beta_1$ is the coefficient of the risk factor $x$. $\beta_0$ is the intercept, that is, the value that applies when no independent variable (risk factor) is entered.

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x \qquad \text{(Equation 1)}$$

The Akaike Information Criterion (AIC) (650) mentioned above is a measure for comparing the goodness of fit of fitted credit models. The smaller its value, the better the credit model fits, and the more preferable the model is. The AIC may be calculated as in Equation 2 below.

$$\mathrm{AIC} = -2\sum_{j} f_j\left[y_j\log\hat{p}_j + (1-y_j)\log(1-\hat{p}_j)\right] + 2(k+s) \qquad \text{(Equation 2)}$$

In Equation 2:
- $f_j$ is the frequency of the $j$-th observation;
- $y_j$ is its observed response (1 for default, 0 otherwise);
- $\hat{p}_j$ is the estimated default probability of the $j$-th observation;
- $k$ is the number of response categories minus one (here 1);
- $s$ is the number of independent variables (1 in the simple logistic regression model above).

Like the AIC, the Schwarz Bayesian Criterion (SBC) (660) is a measure for comparing the goodness of fit of fitted models. The smaller its value, the better the model fits, and the more preferable the model is. The SBC may be calculated as in Equation 3 below.

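$$\mathrm{SBC} = -2\sum_{j} f_j\left[y_j\log\hat{p}_j + (1-y_j)\log(1-\hat{p}_j)\right] + (k+s)\log\sum_{j} f_j \qquad \text{(Equation 3)}$$

Similarly, in Equation 3:
- $f_j$ is the frequency of the $j$-th observation;
- $y_j$ is its observed response (1 for default, 0 otherwise);
- $\hat{p}_j$ is the estimated default probability of the $j$-th observation;
- $k$ is the number of response categories minus one (here 1);
- $s$ is the number of independent variables (1 in the simple logistic regression model above).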

The accuracy ratio (670) measures the agreement between the value of the response variable (here, whether the company defaulted) and the default probability predicted by the model. The larger the value, the better the model fits.

The accuracy ratio is calculated as in Equation 4 below.

$$\mathrm{AR} = \frac{n_c - n_d}{t} \qquad \text{(Equation 4)}$$

To compute the accuracy ratio, pairs of records with different response values are drawn from the data: if one record is a default, the other must be a non-default. The total number of such possible pairs is denoted $t$. For each pair, the default probabilities predicted by the model are compared:
- If the predicted default probability of the defaulted company is greater than that of the non-defaulted company, the prediction agrees with the actual outcome and the pair is called concordant.
- If the predicted default probability of the defaulted company is smaller than that of the non-defaulted company, the pair is called discordant.

Denoting the total numbers of concordant and discordant pairs by $n_c$ and $n_d$, respectively, the accuracy ratio is given by Equation 4.

FIG. 7A shows the independent variables and the response variable when a credit rating model is built using the logistic regression model of the present invention.

What a credit rating model ultimately seeks to determine is creditworthiness, for example whether a company will default.

Given the financial statement items of a particular company, the goal is to determine with a statistical model whether the company will default (y = 1) or not (y = 0). Logistic regression is a suitable statistical model for data in which the response variable y takes two values in this way.

The logistic regression model can identify the independent variables that significantly affect whether a particular company defaults. It also allows the relative influence of each independent variable on default to be compared.

In addition, the model can estimate the predicted default probability of each company, which is its ultimate purpose. Tables are also provided for evaluating the fitted model and for finding an appropriate cutoff point for classifying defaults.

For example, three variables are selected from the raw data database of individuals or companies, and a default prediction model is fitted using logistic regression.

Once the model has been fitted to the data, all results can be checked on the result screen. These include:
- the goodness of fit of the logistic regression model;
- descriptive statistics related to the model fit;
- a table for checking the model estimation;
- a table for searching for the cutoff point used to determine creditworthiness;
- the estimated default probability of every company used in the analysis.

Referring first to FIG. 7A, the user specifies the analysis settings in the Logistic Regression Analysis window.

First, the user can work with multiple data sets within one project. If several data sets are loaded during data selection, their names are listed, and the user selects the one to be used for model building (Select Data Set) 711. If only one data set has been loaded in the project, no selection is needed.

Second, multiple models can be built in one project. To distinguish them, a name is assigned to each model (Input Name) 712.

Third, the response variable of the logistic regression model is specified (Target Variable) 713. For example, to build a model for predicting corporate default, the user specifies the variable indicating whether the company defaulted. This can also be done by designating the response variable when the data are defined.

Fourth, the independent variables of the logistic regression model are specified (Independent Variable) 714. To build a corporate default prediction model, the financial statement items can be specified as independent variables in the graphical user interface.

Fifth, to output descriptive statistics and correlation analysis results for the independent variables used in the model, the user selects Descriptive Statistics 715.

Sixth, to include an intercept in the logistic regression model, the user selects the intercept term (Intercept) 716. If it is not selected, the model is fitted without an intercept term.

After these six settings are specified, the analysis results are output in five sections.

The first section, Model Information, shows whether an error occurred during model estimation or whether the estimation converged normally. If the estimation succeeded, the message 'Iteration finished' is displayed with no error message. If a problem occurred during estimation, an error message is displayed automatically.

The convergence result of the logistic regression model can be checked as shown in FIG. 7B.

The second section, Parameter Estimate, shows the estimate of each regression coefficient in the fitted logistic regression model and related values, including the significance of each independent variable.

In detail, the response variable Y in the logistic regression model for default probability prediction indicates whether each company has defaulted: y = 1 if the company has defaulted and y = 0 if it is normal. Let $p$ denote the default probability of an individual company given the values of the independent variables $x_1, \ldots, x_s$. The logistic regression model used to build the model is then given by Equation 5 below.

$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_s x_s \qquad \text{(Equation 5)}$$

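The estimates of the regression coefficients $\beta = (\beta_0, \beta_1, \ldots, \beta_s)^\top$ in Equation 5 are obtained by the Newton-Raphson method. The iteration in Equation 6 below is repeated until successive values $\beta^{(r)}$ and $\beta^{(r+1)}$ are nearly identical, and the converged values are the estimates of the regression coefficients.

$$\beta^{(r+1)} = \beta^{(r)} + \left(X^\top W^{(r)} X\right)^{-1} X^\top\left(y - \hat{p}^{(r)}\right) \qquad \text{(Equation 6)}$$

In Equation 6:
- $X$ is the matrix of independent-variable values, with a leading column of ones for the intercept;
- $y$ is the vector of responses;
- $\hat{p}^{(r)}$ is the vector of default probabilities predicted at iteration $r$;
- $W^{(r)}$ is the diagonal matrix with entries $\hat{p}_j^{(r)}\bigl(1-\hat{p}_j^{(r)}\bigr)$.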

The third section, Model Fit Information, provides statistics for checking how well the fitted logistic regression model fits the raw data.

As shown in FIG. 7C, this section gives statistics on the risk factors, that is, the independent variables, of the logistic regression model. Using these statistics, the user can check how well the independent variables fit the model before applying it to default prediction and similar tasks.

This information includes the -2 log likelihood, AIC, SBC, the chi-square statistic, the p-value, and the accuracy ratio.

AIC and SBC are used to compare the relative goodness of fit with models built from other independent variables. The smaller the value, the better the model fits.

The chi-square statistic and p-value are used to test the null hypothesis that all regression coefficients in the fitted logistic regression model are zero. The accuracy ratio measures how well the fitted results explain actual defaults. It takes values between 0 and 1, and the larger the value, the better the model fits.

FIG. 7D shows the model fit from the logistic regression analysis. For the test of the null hypothesis that all regression coefficients are zero, $H_0: \beta_1 = \cdots = \beta_s = 0$, the p-value is less than 0.01, so the null hypothesis can be rejected. In other words, the fitted model adequately explains the actual data. The accuracy ratio in FIG. 7D is 0.9355, close to 1, which also indicates that the model fit is adequate.

The -2 log likelihood statistic is the log-likelihood function of the currently estimated model multiplied by -2. It may be calculated as in Equation 7 below.

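$$-2\log L = -2\sum_{j} f_j\left[y_j\log\hat{p}_j + (1-y_j)\log(1-\hat{p}_j)\right] \qquad \text{(Equation 7)}$$

In Equation 7:
- $f_j$ is the frequency of the $j$-th observation;
- $y_j$ is its observed response (1 for default, 0 otherwise);
- $\hat{p}_j$ is the estimated default probability of the $j$-th observation.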

The chi-square value is the test statistic for testing the population regression coefficient, $H_0: \beta_i = 0$, using the estimated regression coefficient. In logistic regression, individual regression coefficients are tested with the Wald test. The test statistic is computed from the estimate and its standard error as in Equation 8 below. Under the null hypothesis it follows a chi-square distribution with 1 degree of freedom, from which the p-value is calculated.

$$\chi^2_W = \left(\frac{\hat{\beta}_i}{\widehat{\mathrm{SE}}(\hat{\beta}_i)}\right)^2 \qquad \text{(Equation 8)}$$

The fourth section, the Classification Table, gives in tabular form the values used to draw the correct classification plot. The classification table and the correct classification plot therefore complement each other.

The classification table uses the results of the fitted default probability model to search for an appropriate cutoff point on the predicted default probability. It also provides figures for evaluating how accurately the fitted model predicts actual defaults. The correct classification plot presents this table graphically so that it can be understood at a glance.

The classification table according to the present invention is shown in FIG. 7E. Referring to FIG. 7E, cutoff points on the predicted default probability are given in increments of 0.005. For example, with a cutoff point of 0.020, a company whose default probability computed from the fitted model is 0.02 or higher is classified as a default, and otherwise as normal.

The classification results can be summarized as in the classification table of Table 1 below.

According to Table 1, there is 1 case in which an actually defaulted company was predicted to default. This corresponds to a Correct Event in the classification table.

There are 87 cases in which an actually normal company was predicted to be normal. These correspond to Correct Non-Events in the classification table.

There is 1 case in which an actually defaulted company was incorrectly predicted to be normal. This corresponds to an Incorrect Non-Event in the classification table.

There are 6 cases in which an actually normal company was incorrectly predicted to default. These correspond to Incorrect Events in the classification table.
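The four cells of the classification table at a single cutoff can be counted with the Python sketch below. It follows the rule in the text: a predicted probability at or above the cutoff is classified as a default (an "event"). The function name is hypothetical, and the response uses 1 = default.

```python
def classification_table(y, p_hat, cutoff):
    """Counts of the four classification cells at one cutoff (p_hat >= cutoff -> default)."""
    counts = {"Correct Event": 0, "Correct Non-Event": 0,
              "Incorrect Event": 0, "Incorrect Non-Event": 0}
    for yi, p in zip(y, p_hat):
        predicted_default = p >= cutoff
        if yi == 1 and predicted_default:
            counts["Correct Event"] += 1        # default predicted as default
        elif yi == 0 and not predicted_default:
            counts["Correct Non-Event"] += 1    # normal predicted as normal
        elif yi == 0 and predicted_default:
            counts["Incorrect Event"] += 1      # normal predicted as default
        else:
            counts["Incorrect Non-Event"] += 1  # default predicted as normal
    return counts
```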

The following probabilities can then be calculated from the classification table.

The first is the Percentage Correct: the probability of correctly predicting normal companies as normal and defaulted companies as defaulted.

The second is Sensitivity: the probability of predicting an actually defaulted company as a default.

The third is the specificity, the probability that a company that is actually normal is predicted to be normal.

Table 2 below summarizes the percentage correct, sensitivity, and specificity derived from Table 1.
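As a check on the definitions above, the three rates can be computed directly from the four cell counts. The sketch below (function name hypothetical) uses only the counts quoted for Table 1, which total 95 companies; Table 2 itself is not reproduced here.

```python
def classification_rates(correct_event, correct_non_event,
                         incorrect_event, incorrect_non_event):
    """Percentage correct, sensitivity and specificity from the four cells."""
    total = correct_event + correct_non_event + incorrect_event + incorrect_non_event
    actual_defaults = correct_event + incorrect_non_event   # companies that defaulted
    actual_normals = correct_non_event + incorrect_event    # companies that stayed normal
    return {
        "percentage_correct": (correct_event + correct_non_event) / total,
        "sensitivity": correct_event / actual_defaults,
        "specificity": correct_non_event / actual_normals,
    }


# Counts stated in the text for the 0.020 cutoff.
rates = classification_rates(correct_event=1, correct_non_event=87,
                             incorrect_event=6, incorrect_non_event=1)
print({k: round(v, 4) for k, v in rates.items()})
# {'percentage_correct': 0.9263, 'sensitivity': 0.5, 'specificity': 0.9355}
```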

The frequencies underlying Tables 1 and 2, together with the three probabilities derived from them (percentage correct, sensitivity, and specificity), are computed for every cutoff point as the cutoff increases in steps of 0.005, and the three probabilities are presented in a correct classification plot, shown in FIG. 7F. Referring to FIG. 7F, the optimal cutoff point can be determined using the classification table of FIG. 7E.
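The cutoff sweep can be sketched as follows. The text does not state the criterion used to pick the optimal cutoff from FIG. 7F, so `best_cutoff` assumes one common choice: maximizing sensitivity plus specificity (Youden's index). The function names and the five-company example are hypothetical.

```python
def sweep_cutoffs(actual, predicted_prob, step=0.005, upper=1.0):
    """Classification rates at cutoffs step, 2*step, ..., up to `upper`.

    Assumes both outcome classes (1 = default, 0 = normal) are present.
    """
    n_default = sum(1 for y in actual if y == 1)
    n_normal = len(actual) - n_default
    rows = []
    k = 1
    while k * step <= upper + 1e-12:
        cutoff = round(k * step, 10)  # keep the grid free of floating-point drift
        ce = sum(1 for y, p in zip(actual, predicted_prob) if y == 1 and p >= cutoff)
        cn = sum(1 for y, p in zip(actual, predicted_prob) if y == 0 and p < cutoff)
        rows.append({
            "cutoff": cutoff,
            "percentage_correct": (ce + cn) / len(actual),
            "sensitivity": ce / n_default,
            "specificity": cn / n_normal,
        })
        k += 1
    return rows


def best_cutoff(rows):
    """Hypothetical criterion: first cutoff maximizing sensitivity + specificity."""
    return max(rows, key=lambda r: r["sensitivity"] + r["specificity"])["cutoff"]


actual = [1, 1, 0, 0, 0]                  # hypothetical companies
probs = [0.9, 0.3, 0.2, 0.05, 0.01]
print(best_cutoff(sweep_cutoffs(actual, probs)))  # 0.205
```

Other criteria, such as maximizing the percentage correct, would be a one-line change to the `key` function.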

FIG. 7G shows a prediction table generated based on the correct classification plot of FIG. 7F.

Referring to FIG. 7G, the prediction table lists, for every individual company in the data set, the predicted default probability computed from the constructed model together with the company's actual default status.

The table shown covers 20 of the 99 companies used in the analysis, giving each company's actual default status and predicted default probability. In FIG. 7G, the company with identifier (ID) 10 has an actual value of 1, indicating that it defaulted. The default probability predicted for this company by the model (Predict) is 0.8841, so the computed default probability is close to the actual outcome.

By contrast, the company with ID 16 has an actual value of 0, meaning it is normal, yet its predicted default probability is 0.114466, which is relatively large compared with the other predicted values.

As discussed for the classification table, if the cutoff point is set to 0.02, this company's probability of 0.114466 lies well above the cutoff, so the company is incorrectly predicted to default.

The predicted probability for each individual company is computed from the estimated logistic regression coefficients. Let the estimates from the logistic regression model be

$$\hat{\boldsymbol\beta} = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k)$$

The logistic regression model fitted with these estimates is then as given in Equation 6 above. From Equation 6, the predicted default probability p of each individual company is calculated by Equation 9 below, where x_1 through x_k are the company's values of the selected independent variables:

$$p = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k)}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k)} \qquad \text{(Equation 9)}$$
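The predicted default probability of Equation 9 can be evaluated for one company as in the Python sketch below. The coefficients and inputs are hypothetical, not estimates from this analysis. The function applies the inverse logit to the linear predictor, using an algebraically equivalent form that avoids overflow for large positive or negative predictors.

```python
import math


def predicted_default_probability(beta, x):
    """Inverse logit: p = exp(eta) / (1 + exp(eta)), eta = b0 + b1*x1 + ... + bk*xk.

    beta: [b0, b1, ..., bk] estimated coefficients (b0 is the intercept)
    x:    [x1, ..., xk] one company's values of the selected variables
    """
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))  # same value, no overflow for large eta
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical coefficients and financial-ratio values, for illustration only.
print(round(predicted_default_probability([-3.0, 0.8, -1.2], [2.0, 0.5]), 4))  # 0.1192
```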

FIG. 8A shows the screen for specifying the data set on which the calibration module of the present invention performs reverse regression analysis.

As shown in FIG. 8A, the data set and model name for the reverse regression analysis performed by the calibration module are specified on this screen. If several models were built under different conditions during the analysis, each model to be calibrated can be specified individually.

The number of intervals into which the data are divided can also be specified. The overall predicted default probabilities are re-estimated according to these intervals, which yields a more realistic graph. The number of intervals may range from a minimum of 20 to a maximum of 50.

FIG. 8B shows the option screen for reverse regression analysis by the calibration module of the present invention.

Referring to FIG. 8B, the screen shows the options that can be specified for the nonparametric regression method. Either the non-linear regression method or the LOESS method described above can be selected.

Several options are available for the LOESS method: the order of the polynomial, the function used to determine the weights, the number of iterations used in the analysis, and the smoothing parameter. The smoothing parameter can be set anywhere from 0.20 to 0.78, and the number of iterations from 1 to 5.
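A minimal pure-Python LOESS sketch follows. It assumes local linear fits (polynomial order 1), tricube neighborhood weights, and bisquare robustness reweighting in the style of Cleveland's locally weighted robust regression. Here `frac` plays the role of the smoothing parameter (the share of points in each local window), and `robust_iters` is a hypothetical stand-in for the iteration option; the toolkit's exact definitions of these options are not given in the text.

```python
import math
import statistics


def _tricube(u):
    """Tricube kernel: (1 - u^3)^3 for 0 <= u < 1, else 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def _bisquare(u):
    """Bisquare function used for the robustness weights."""
    return (1.0 - u ** 2) ** 2 if u < 1.0 else 0.0


def _weighted_line_at(x, y, w, x0):
    """Fit a weighted least-squares line to (x, y) and evaluate it at x0."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat fit if only one x carries weight
    return my + slope * (x0 - mx)


def loess(x, y, frac=0.5, robust_iters=0):
    """Local linear LOESS; robust_iters = extra bisquare reweighting passes."""
    n = len(x)
    q = min(n, max(3, math.ceil(frac * n)))    # points in each local window
    robustness = [1.0] * n
    fitted = [0.0] * n
    for it in range(robust_iters + 1):
        for i, x0 in enumerate(x):
            dist = [abs(xj - x0) for xj in x]
            h = sorted(dist)[q - 1] or 1e-12    # distance to the q-th nearest point
            local = [_tricube(d / h) for d in dist]
            w = [lw * rw for lw, rw in zip(local, robustness)]
            if sum(w) == 0:                     # every neighbor down-weighted
                w = local
            fitted[i] = _weighted_line_at(x, y, w, x0)
        if it == robust_iters:
            break
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        s = statistics.median(abs(r) for r in resid)
        if s == 0:                              # already an exact fit
            break
        robustness = [_bisquare(abs(r) / (6.0 * s)) for r in resid]
    return fitted
```

In the calibration step, `x` would hold the interval numbers and `y` the observed default rates, for example `loess(list(range(1, 21)), rates, frac=0.5, robust_iters=2)`, where `rates` is a hypothetical list of 20 interval default rates.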

FIG. 8C shows the data used in the reverse regression analysis of the present invention.

Referring to FIG. 8C, if the option selected for the analysis divides the data into 20 intervals, the independent variable X1 takes the values 1 through 20, and the response variable y is obtained by dividing the default probabilities predicted by logistic regression analysis into 20 intervals and taking the actual default rate within each interval.
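The interval data of FIG. 8C might be assembled as in the sketch below. The text does not say how the intervals are formed, so this assumes the companies are sorted by predicted default probability and split into k groups of nearly equal size. The function name is hypothetical. The mean predicted probability per group is an extra column, added so the observed rate can be plotted against the model's prediction.

```python
def calibration_intervals(actual, predicted_prob, k=20):
    """Build (X1, y) rows for calibration.

    Assumption: companies are sorted by predicted default probability and
    split into k groups of nearly equal size. X1 is the group number 1..k,
    y is the observed default rate in that group.
    """
    if not 20 <= k <= 50:  # range of interval counts stated in the text
        raise ValueError("number of intervals must be between 20 and 50")
    order = sorted(range(len(actual)), key=lambda i: predicted_prob[i])
    n = len(order)
    rows = []
    for g in range(k):
        idx = order[g * n // k:(g + 1) * n // k]
        if not idx:
            continue  # fewer companies than intervals
        rows.append({
            "X1": g + 1,
            "y": sum(actual[i] for i in idx) / len(idx),
            "mean_predicted": sum(predicted_prob[i] for i in idx) / len(idx),
        })
    return rows
```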

FIG. 8D is a graph of the calibration result based on FIG. 8C.

Referring to FIG. 8D, the graph shows both the probability predicted by the default prediction model and the probability fitted by LOESS.

Applying the LOESS method re-expresses the predicted probabilities as a smoother curve than that of the original prediction model, namely one that is monotonically increasing overall.

FIG. 9 is a flowchart of the credit risk management method through credit risk model construction according to the present invention.

Referring to FIG. 9, data to be analyzed is first selected from a raw data database that stores data calculated for evaluating the creditworthiness of companies or individuals (step 910).

That is, the user directly selects the data to be evaluated from the database storing the data calculated for corporate or individual credit evaluation. Because the present invention can use data in various formats, commonly used Excel files and files in various text formats can all be imported for credit analysis.

In the selected data, the user designates independent variables, which affect the credit rating, and a response variable, which represents the outcome of the credit rating given the independent variables. Basic statistical information on the imported data is then calculated using these variables (step 920).

In other words, in the data selected from the raw data database, the independent variables that affect the credit rating and the response variable representing the resulting credit rating are designated.

Here, the independent variables may be credit risk variables related to the credit rating, and when designating variables the user may also specify time-series variables, data sizes, or data categories.

The basic statistical information may include the type of the data selected from the raw data database, the role of each variable, the sample size, the number of missing values, the mean, the variance, and the five-number summary of the data.
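A per-variable summary of the kind listed above can be sketched with Python's `statistics` module. The sketch handles numeric variables only, represents missing values as `None`, and uses inclusive quartiles for the five-number summary; statistical packages differ in quartile conventions, so that choice is an assumption. The function name is hypothetical.

```python
import statistics


def basic_statistics(values, role="independent"):
    """Summary of one numeric variable; None marks a missing value.

    Needs at least two observed values (for the variance and quartiles).
    """
    observed = [v for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "type": "numeric",                           # this sketch handles numbers only
        "role": role,                                # independent or response variable
        "n": len(values),
        "missing": len(values) - len(observed),
        "mean": statistics.mean(observed),
        "variance": statistics.variance(observed),   # sample variance (n - 1 divisor)
        "five_number": (min(observed), q1, median, q3, max(observed)),
    }


print(basic_statistics([1.0, 2.0, None, 4.0, 5.0, 3.0])["five_number"])
# (1.0, 2.0, 3.0, 4.0, 5.0)
```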

Next, the selected data is cleaned by detecting missing values in the independent variables (step 930).

When the data selected for distinguishing good companies from bad ones contains missing values, this step replaces them with values specified in advance by the user and adds the completed records to the full analysis data set.

For this purpose, the present invention replaces missing values either with the mean of the observed values of the selected data (mean imputation) or with the median of the observed values (median imputation).
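The two replacement rules can be sketched as follows, assuming missing entries are stored as `None`. The function name and sample values are hypothetical.

```python
import statistics


def impute(values, method="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(observed)
    elif method == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError("method must be 'mean' or 'median'")
    return [fill if v is None else v for v in values]


print(impute([2, None, 4, 9], "mean"))    # [2, 5, 4, 9]
print(impute([2, None, 4, 9], "median"))  # [2, 4, 4, 9]
```

Median imputation is less sensitive to extreme values, which is common in financial ratios.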

Based on the cleaned data, basic statistics, correlations among the independent variables, and regression analysis are computed to select the independent variables used for the credit evaluation (step 940).

Because the basic statistics are computed from the cleaned data, they show how the distribution of the selected data differs before and after cleaning. The correlations among the independent variables needed to build the credit rating model are then analyzed: correlation coefficients are computed between the financial statement items used as independent variables, the statistical significance of each coefficient is tested, and the test results are reported.
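The correlation step for one pair of independent variables can be sketched as below: the Pearson coefficient and the usual t statistic for testing that the population correlation is zero. The function name and sample values are hypothetical. The p-value would come from a t distribution with n - 2 degrees of freedom (for example via a statistics library) and is omitted here.

```python
import math


def pearson_with_t(x, y):
    """Pearson correlation r and the t statistic for H0: rho = 0.

    t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with
    n - 2 degrees of freedom under H0 (assuming bivariate normality).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1:
        return r, math.copysign(math.inf, r)  # perfect correlation
    return r, r * math.sqrt((n - 2) / (1 - r * r))


r, t = pearson_with_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical ratios
print(round(r, 4), round(t, 4))  # 0.7746 2.1213
```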

Next, regression analysis is performed to examine how the response variable is explained by the independent variables. The influence of each risk factor (independent variable) on the response variable is compared, and the significance of that influence is tested. This analysis can also be applied in practical finance to models such as the market model and capital asset pricing models, including the capital market line and security market line models.

Next, a prediction table for the credit rating is constructed using the selected independent variables, and a credit rating model is built using this prediction table (step 950).

To build the credit rating model, logistic regression analysis may be performed alone, or logistic regression analysis may be followed by reverse regression analysis.

The credit rating model can be built with a prediction table that uses the independent variables to search for an appropriate cutoff point for the credit rating and that provides various measures for assessing how accurately the fitted model predicts actual creditworthiness.

The credit rating model may include a logistic evaluation model.

In the logistic evaluation model, the user can specify multiple data sets to be used within one project, calling up several data sets during data selection. A credit model can then be specified on the basis of the called data sets, and the model can be given a name.

Once the response and independent variables of the logistic regression model to be built are specified, the logistic evaluation model provides: model information indicating whether an error occurred during estimation or whether the estimation converged normally; parameter estimates; model fit information, that is, various statistics for checking how well the fitted logistic regression model fits the raw data; and a classification table giving, in tabular form, the values used to draw the correct classification plot.
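The convergence information and parameter estimates described above can be illustrated with a minimal pure-Python Newton-Raphson fit of a logistic regression. This is a generic maximum-likelihood sketch, not the toolkit's own estimation routine. The function names and data are hypothetical, and fit statistics and the classification table are omitted.

```python
import math


def _solve(a, b):
    """Solve the linear system a @ s = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def fit_logistic(rows, y, max_iter=25, tol=1e-8):
    """Newton-Raphson fit of P(y = 1) = 1 / (1 + exp(-(b0 + b . x))).

    Returns (coefficients [b0, b1, ...], iterations used, converged flag).
    """
    X = [[1.0] + list(r) for r in rows]          # prepend the intercept column
    k = len(X[0])
    beta = [0.0] * k
    for it in range(1, max_iter + 1):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]
        score = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(xi[j] * xi[c] * pi * (1.0 - pi) for xi, pi in zip(X, p))
                 for c in range(k)] for j in range(k)]
        step = _solve(info, score)                # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it, True
    return beta, max_iter, False                  # not converged within max_iter


rows = [[1], [2], [3], [4], [5], [6], [7], [8]]  # hypothetical single ratio
y = [0, 0, 1, 0, 1, 0, 1, 1]                     # hypothetical default flags
beta, iterations, converged = fit_logistic(rows, y)
print(converged, iterations, [round(b, 3) for b in beta])
```

The `converged` flag corresponds to the model information described above; with completely separated data the estimates diverge and the flag stays `False`.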

Finally, a credit rating value is calculated with the constructed credit rating model to estimate creditworthiness (step 960).

That is, whether the credit of a company or individual is good or bad can be estimated according to the cutoff point derived from the constructed credit rating model.

The present invention may also re-estimate the credit rating values with a nonparametric regression model and evaluate creditworthiness using the re-estimated values.

When creditworthiness is estimated with the logistic model, the rating is based on the financial statement data of the company or individual, so it may deviate somewhat from the actual creditworthiness of that company or individual. Re-estimation is performed to recalibrate the ratings to match actual creditworthiness.

To re-estimate creditworthiness, a reverse regression method using a nonparametric regression model can be used, in particular the locally weighted robust regression method (LOESS). This re-estimation yields credit rating values close to actual creditworthiness.

The resulting credit rating values are stored in the analysis result storage database so that the user can use them whenever needed.

The present invention can be embodied as computer-readable code on a computer-readable recording medium, where "computer" includes any device having an information processing function.

A computer-readable recording medium includes any kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over networked computer devices so that the computer-readable code is stored and executed in a distributed manner.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be defined by the technical details of the appended claims.

FIG. 1 is a block diagram of the toolkit for credit risk model construction according to the present invention.

FIG. 3A shows a data cleaning screen applied to the present invention.

FIG. 3B shows the cleaning results for the data selected in FIG. 3A.

FIG. 6 shows the analysis results for selecting independent variables by the logistic regression analysis applied to the present invention.

FIG. 7A shows the independent and response variables of the credit rating model using the logistic regression model of the present invention.

FIG. 7B shows the convergence result of the logistic regression model of the present invention.

FIG. 7C shows statistics for the independent variables of the logistic regression model of the present invention.

FIG. 7D shows the model fit of the logistic regression analysis of the present invention.

FIG. 7E shows the classification table applied to the present invention.

FIG. 7F shows the correct classification plot, including the cutoff points, based on FIG. 7E.

FIG. 7G shows the prediction table generated based on the correct classification plot of FIG. 7F.

FIG. 8D is a graph of the calibration result output based on FIG. 8C.

FIG. 9 is a flowchart of the credit risk management method through credit risk model construction according to the present invention.

Claims

A raw data database that stores data on the financial statements of the company or individual for credit rating;

A data management unit for selecting data from the raw data database, removing missing values from the selected data, and managing the data from which the missing values have been removed;

An analysis database storage unit for storing data of the data management unit;

A model construction unit for selecting variables for the credit rating model from the data in the analysis database storage unit and constructing the credit rating model by logistic regression analysis using the selected variables;

An analysis result storage database unit for storing analysis results of the credit rating model constructed from the model building unit; And

A tool kit for building a credit risk model comprising a model evaluator for evaluating the creditworthiness of the company or individual using the analysis result storage database.

The toolkit of claim 1, wherein the data management unit comprises:

A data definition module that designates, in the selected data, independent variables that affect the credit rating and a response variable representing the resulting credit rating, and that calculates basic statistical information on the selected data based on the independent and response variables;

A data refining module for refining the selected data by detecting missing values in the independent variable;

A data replacement module for replacing the missing values detected by the data refining module with replacement data;

A basic statistical analysis module for analyzing basic statistics of the data refined by the data refining module;

A correlation analysis module for extracting correlated independent variables related to the credit rating by analyzing the correlations between the independent variables based on the basic statistics produced by the basic statistical analysis module; and

A regression analysis module for testing the statistical significance, with respect to the response variable, of the correlated independent variables extracted by the correlation analysis module.

The toolkit of claim 2, wherein the model construction unit comprises:

A segmentation module for classifying the companies or individuals into predefined groups using the data stored in the analysis database storage unit; an independent variable selection module for selecting the independent variables required for the credit rating using the data stored in the analysis database storage unit; and a logistic model module for extracting a cutoff point for credit evaluation using the independent variables to predict creditworthiness.

The toolkit of claim 3, wherein the model construction unit further comprises a calibration module for performing reverse regression analysis, using a nonparametric regression model, on the creditworthiness predicted by the logistic model module.

The toolkit of claim 1, wherein the data stored in the raw data database, the analysis database storage unit, and the analysis result storage database unit is stored in XML document format.

Selecting data to be analyzed from a raw data database storing data calculated for evaluating the creditworthiness of companies or individuals;

Designating, in the selected data, independent variables that affect the credit rating and a response variable representing the resulting credit rating, and calculating basic statistical information on the selected data based on the independent and response variables;

Refining the data by detecting missing values in the independent variables of the selected data;

Selecting the independent variables for the credit evaluation by computing basic statistics, correlations between the independent variables, and regression analysis on the refined data;

Constructing a prediction table related to credit rating using the selected independent variable, and constructing a credit rating model using the configured prediction table according to logistic regression analysis; And

Estimating the credit rating by calculating the credit rating value using the constructed credit rating model.

The method of claim 6,

wherein refining the data comprises determining whether a missing value exists in the selected independent variables and, if so, replacing the missing value with replacement data,

wherein the replacement data is the median or the mean of the independent variable containing the missing value.

The method of claim 6,

wherein the credit rating model comprises a logistic model.

The method of claim 6,

wherein estimating the credit rating comprises re-estimating the credit rating using a nonparametric regression model and evaluating creditworthiness using the re-estimated credit rating.

The method of claim 6,

wherein building the credit rating model using the configured prediction table comprises extracting a cutoff point using the configured prediction table and building the credit rating model according to the extracted cutoff point.

The method of claim 6,

wherein the basic statistical information comprises the data type, the role of each variable, the sample size, the number of missing values, the mean, the variance, and the five-number summary of the data.

The method of claim 6,

wherein the credit rating model is a default model for the company or individual, a customer purchase model for a particular financial product, a card delinquency model for a card company, or a policyholder attrition model for an insurer.

The method of claim 6,

wherein building the credit rating model using the configured prediction table further comprises building the credit rating model by regression analysis using the selected independent variables.

A computer-readable recording medium on which is recorded a program for executing, on a computer, the method of any one of claims 6 to 13.