KR102588210B1

KR102588210B1 - Apparatus and method for alternatively rating credit through the application of statistical tests to open data

Info

Publication number: KR102588210B1
Application number: KR1020230071661A
Authority: KR
Inventors: 박재준; 박영준; 허재식; 송창호
Original assignee: 주식회사 앤톡
Priority date: 2023-02-24
Filing date: 2023-06-02
Publication date: 2023-10-12
Also published as: KR102545232B1

Abstract

An apparatus and method for providing corporate credit evaluation information are disclosed. An apparatus for providing corporate credit evaluation information according to an embodiment may include a processor that collects open data including a plurality of items related to a company, generates log-derived data by extracting time-ordered log data from the open data, extracts valid data by verifying the significance of the log-derived data, calculates the company's potential risk occurrence probability based on the log-derived data corresponding to the valid data, and generates a credit evaluation result by determining the company's credit rating based on the potential risk occurrence probability.

Description

APPARATUS AND METHOD FOR ALTERNATIVELY RATING CREDIT THROUGH THE APPLICATION OF STATISTICAL TESTS TO OPEN DATA

Embodiments relate to a method and apparatus for alternative credit rating through the application of statistical tests to open data.

Credit rating information may refer to information that evaluates the creditworthiness of a counterparty in commercial transactions such as financial transactions. In the course of providing credit-based lending services (for example, loans and credit card issuance), a financial institution may use the borrower's credit rating information to evaluate the likelihood of losses caused by the insolvency of a company, and may use that evaluation to decide whether to approve credit and to set limits, interest rates, and the like.

Existing models for generating credit rating information rely mainly on financial data. For companies whose financial data is absent or not meaningful, the reliability of the resulting credit rating information may be low.

Accordingly, there is a need to improve the reliability of credit rating information by predicting credit risk using data other than the borrower's financial data.

An apparatus for providing corporate credit evaluation information according to an embodiment may include a processor that collects open data including a plurality of items related to a company, generates log-derived data by extracting time-ordered log data from the open data, extracts valid data by verifying the significance of the log-derived data, calculates the company's potential risk occurrence probability based on the log-derived data corresponding to the valid data, and generates a credit evaluation result by determining the company's credit rating based on the potential risk occurrence probability.

According to an embodiment, for each piece of log-derived data, the processor may obtain a first risk occurrence probability for companies in which the item corresponding to the log-derived data occurred, obtain a second risk occurrence probability for companies in which the item did not occur, and determine the valid data by applying a statistical test to the difference between the first risk occurrence probability and the second risk occurrence probability.

According to an embodiment, for each piece of log-derived data, the processor may determine the valid data by applying a statistical test to the mean of the log-derived data for the group of companies in which a risk occurred and the mean of the log-derived data for the group of companies in which no risk occurred.

According to an embodiment, the processor may calculate the average risk occurrence probability corresponding to the log-derived data based on the distribution of the average risk occurrence probability over the log-derived data, and may calculate the potential default rate based on a weight for the average default rate.

According to an embodiment, the processor may generate first log-derived data indicating the recency of an item by obtaining, from the log data, the last time the item corresponding to the log data occurred; generate second log-derived data indicating the frequency of the item by obtaining, from the log data, all dates on which the log data occurred; and generate third log-derived data indicating the magnitude of the item by obtaining, from the log data, the number of occurrences of the log data.

According to an embodiment, the processor may calculate, based on the distribution of the average risk occurrence probability over the log-derived data, a first average risk occurrence probability corresponding to the first log-derived data, a second average risk occurrence probability corresponding to the second log-derived data, and a third average risk occurrence probability corresponding to the third log-derived data. The processor may then calculate the potential risk occurrence probability based on the first, second, and third average risk occurrence probabilities together with a first weight, a second weight, and a third weight corresponding to them, respectively.
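The combination of the three average risk occurrence probabilities with their weights can be sketched as follows. The text does not state the exact combining formula, so this sketch assumes a weighted average normalized by the sum of the weights; the function name and the sample numbers are hypothetical.

```python
def potential_risk_probability(avg_probs, weights):
    """Combine per-feature average risk probabilities into one probability.

    Assumption (not stated in the source): a weighted average,
    normalized by the sum of the weights.
    """
    if len(avg_probs) != len(weights):
        raise ValueError("need exactly one weight per probability")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(avg_probs, weights)) / total


# Hypothetical values for the first, second, and third average
# risk occurrence probabilities and their weights.
p = potential_risk_probability([0.10, 0.04, 0.06], [0.5, 0.3, 0.2])
```

Normalizing by the weight sum keeps the result inside the range of the input probabilities, which is convenient when the output is later mapped to a rating grade.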

According to an embodiment, the processor may determine the first weight, the second weight, and the third weight by inputting the first log-derived data, the second log-derived data, and the third log-derived data into a machine learning model.

A method of providing corporate credit evaluation information, performed by an apparatus for providing corporate credit evaluation information according to an embodiment, may include: collecting open data related to a company; generating log-derived data by extracting time-ordered log data from the open data; extracting valid data by verifying the significance of the log-derived data; calculating the company's potential risk occurrence probability based on the log-derived data; and generating a credit evaluation result by determining the company's credit rating based on the potential risk occurrence probability.

FIG. 1 shows a schematic block diagram of a credit evaluation information providing device according to an embodiment.
FIG. 2 shows an example implementation of the credit evaluation information providing device shown in FIG. 1.
FIG. 3 is a diagram for explaining the pulse algorithm used to provide the credit evaluation information shown in FIG. 1.
FIG. 4 is a diagram for explaining the operation of the sample intake unit shown in FIG. 2.
FIG. 5 is a diagram for explaining the operation of the data collector shown in FIG. 2.
FIG. 6 is a diagram for explaining the operation of the log extractor shown in FIG. 2.
FIG. 7 is a diagram for explaining the operation of the significance inspector shown in FIG. 2.
FIG. 8 is a diagram for explaining the operation of the score calculator shown in FIG. 2.
FIG. 9 is a diagram for explaining the operation of the grade designer shown in FIG. 2.
FIG. 10 is a diagram for explaining the risk occurrence probability calculation performed by the performance evaluator shown in FIG. 2.
FIG. 11 shows an example of performance evaluation results from the performance evaluator shown in FIG. 2.
FIG. 12 shows a flowchart of the operation of the credit evaluation information providing device shown in FIG. 1.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of this specification includes changes, equivalents, and substitutes that fall within the technical idea described through the embodiments.

Terms such as "first" and "second" may be used to describe various components, but these terms should be interpreted only as distinguishing one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is described as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In this document, each of the phrases "A or B", "at least one of A and B", "at least one of A or B", "A, B, or C", "at least one of A, B, and C", and "at least one of A, B, or C" may include any one of the items listed together in that phrase, or any possible combination of them. In this specification, terms such as "include" and "have" are intended to indicate that the described features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood not to exclude in advance the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this specification.

The term "module" as used in this document may include a unit implemented in hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be an integrally formed component, or a minimum unit of such a component, or a part thereof, that performs one or more functions. For example, according to an embodiment, a module may be implemented as an application-specific integrated circuit (ASIC).

The term "unit" as used in this document refers to software or a hardware component such as an FPGA or an ASIC, and a "unit" performs certain roles. However, a "unit" is not limited to software or hardware. A "unit" may be configured to reside in an addressable storage medium or may be configured to run on one or more processors. For example, a "unit" may include components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units". In addition, components and "units" may be implemented to run on one or more CPUs in a device or a secure multimedia card. A "unit" may also include one or more processors.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description, identical components are given the same reference numerals regardless of the drawing in which they appear, and duplicate descriptions of them are omitted.

FIG. 1 shows a schematic block diagram of a credit evaluation information providing device according to an embodiment.

Referring to FIG. 1, the credit evaluation information providing device 10 may generate and provide credit evaluation information for a company. A company's credit evaluation information is information obtained by evaluating and quantifying the company's credit risk, and credit risk may include any risk to the company that can arise in financial transactions, such as default, delinquency, or business closure.

The credit evaluation information providing device according to an embodiment may generate and provide credit evaluation information mainly for corporations. However, those skilled in the art will understand that the present invention is not limited to corporations and can be extended to various other entities.

By generating and providing credit evaluation information about a company's credit risk, the credit evaluation information providing device 10 may provide a means of managing that credit risk in financial transactions. The credit evaluation information providing device 10 may generate and provide the credit evaluation information based on the company's loan-related information and open data.

Open data is a concept that includes public data. It is information disclosed to the public and may include any information, such as a company's business details, financial information, employment information, investment information about the company, information about technology held by the company, the company's corporate certification information, and/or the company's promotional materials.

The credit evaluation information providing device 10 may provide a company's credit evaluation information by processing the company's loan-related information and open data. In this process, the credit evaluation information providing device 10 may calculate the company's risk occurrence probability based on the company's open data. The risk occurrence probability may mean the probability that the company's credit risk described above will occur. The default rate may be used as an example, but the present disclosure is not limited to it, and those skilled in the art will understand that any factor related to credit risk, such as the delinquency rate or the business closure rate, may be used.

The credit evaluation information providing device 10 may classify information about a company's risk occurrence probability using an artificial intelligence algorithm or a machine learning model. Artificial intelligence may refer to a computer system equipped with functions such as learning, inference, or judgment. An artificial intelligence algorithm may, for example, be implemented using a neural network, but this is only an example. Those skilled in the art will understand that the artificial intelligence algorithm is not limited to this and may also be implemented with any statistics-based algorithm.

A neural network (or artificial neural network) may include statistical learning algorithms, used in machine learning and cognitive science, that mimic biological neurons. A neural network may refer to any model in which artificial neurons (nodes), forming a network through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning.

A neuron in a neural network may include a combination of weights or biases. A neural network may include one or more layers, each composed of one or more neurons or nodes. By changing the weights of its neurons through learning, a neural network can infer a desired prediction from an arbitrary input.

The neural network may include a deep neural network. The neural network may include a CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), perceptron, multilayer perceptron, FF (Feed Forward), RBF (Radial Basis Function network), DFF (Deep Feed Forward), LSTM (Long Short Term Memory), GRU (Gated Recurrent Unit), AE (Auto Encoder), VAE (Variational Auto Encoder), DAE (Denoising Auto Encoder), SAE (Sparse Auto Encoder), MC (Markov Chain), HN (Hopfield Network), BM (Boltzmann Machine), RBM (Restricted Boltzmann Machine), DBN (Deep Belief Network), DCN (Deep Convolutional Network), DN (Deconvolutional Network), DCIGN (Deep Convolutional Inverse Graphics Network), GAN (Generative Adversarial Network), LSM (Liquid State Machine), ELM (Extreme Learning Machine), ESN (Echo State Network), DRN (Deep Residual Network), DNC (Differentiable Neural Computer), NTM (Neural Turing Machine), CN (Capsule Network), KN (Kohonen Network), transformer, and AN (Attention Network).

The credit evaluation information providing device 10 may be implemented as a printed circuit board (PCB) such as a motherboard, an integrated circuit (IC), or a system on chip (SoC). The credit evaluation information providing device 10 may also be implemented as an application processor.

In addition, the credit evaluation information providing device 10 may be implemented in a personal computer (PC), a data server, or a portable device.

The portable device may be implemented as a laptop computer, mobile phone, smartphone, tablet PC, mobile internet device (MID), personal digital assistant (PDA), enterprise digital assistant (EDA), digital still camera, digital video camera, portable multimedia player (PMP), personal or portable navigation device (PND), handheld game console, e-book, or smart device. The smart device may be implemented as a smart watch, smart band, or smart ring.

The credit evaluation information providing device 10 includes a processor 100 and a memory 200.

The processor 100 may process data stored in the memory 200. The processor 100 may execute computer-readable code (for example, software) stored in the memory 200 and instructions triggered by the processor 100.

The processor 100 may be a hardware-implemented data processing device having circuitry with a physical structure for executing desired operations. For example, the desired operations may include code or instructions contained in a program.

For example, the hardware-implemented data processing device may include a microprocessor, a central processing unit, a processor core, a multi-core processor, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).

The processor 100 may include a receiving interface. The processor 100 may receive data from an external source or from the memory 200.

The processor 100 may collect a company's open data.

For example, from the open data the processor 100 may collect the company's business details, financial information, employment information, investment information about the company, information about technology held by the company, the company's corporate certification information, and/or the company's promotional materials.

Depending on the embodiment, the processor 100 may select only some of the items included in the open data before proceeding with the log data extraction procedure. For example, the processor 100 may select items based on whether they are easy to process, whether their attributes are suitable for the technical verification method, and whether they serve the purpose of credit evaluation, and then perform the procedures below on the selected items. However, those skilled in the art will understand that the specific way of selecting items from the open data is not limited to the embodiment presented.

The processor 100 may generate log-derived data by extracting time-ordered log data from the open data. In the examples below, the company's potential risk occurrence probability is predicted from three kinds of log-derived data: first log-derived data, second log-derived data, and/or third log-derived data. Those skilled in the art will understand, however, that the scope of the embodiments is not limited to this number of log-derived data or to these examples. Log data extracted from the open data may be standardized into the form (occurrence or non-occurrence, time of occurrence, magnitude of occurrence (total)) and then used to generate the log-derived data. Those skilled in the art will understand that the standardization format is not limited to this example.
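As a minimal sketch, the standardized form (occurrence, time of occurrence, magnitude) could be represented as a small record type. The class name `LogRecord`, its field names, and the helper `standardize` are hypothetical and are not taken from the specification; the input shape (a list of date and amount pairs per company and item) is also an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    company_id: str                # hypothetical company identifier
    item: str                      # open-data item, e.g. "patent_filing"
    occurred: bool                 # occurrence or non-occurrence
    occurred_on: Optional[date]    # time of occurrence, None if none
    magnitude: float               # magnitude of occurrence (total)


def standardize(company_id, item, raw_events):
    """Turn raw (date, amount) pairs into standardized, time-ordered records.

    A company with no events for the item gets a single
    non-occurrence record so that it is still represented.
    """
    if not raw_events:
        return [LogRecord(company_id, item, False, None, 0.0)]
    return [
        LogRecord(company_id, item, True, day, float(amount))
        for day, amount in sorted(raw_events)
    ]


records = standardize(
    "C001", "patent_filing",
    [(date(2023, 3, 1), 2), (date(2023, 1, 15), 1)],
)
```

Keeping an explicit non-occurrence record makes it straightforward to split companies later into those in which an item occurred and those in which it did not.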

The processor 100 may extract valid data by verifying the significance of the log-derived data.

For each piece of log-derived data, the processor 100 may obtain a first risk occurrence probability for companies in which the item corresponding to the log-derived data occurred, and a second risk occurrence probability for companies in which the item did not occur. The processor 100 may determine the valid data by applying a statistical test to the difference between the first and second risk occurrence probabilities. For example, the processor 100 may apply a chi-squared test to the difference between the first and second risk occurrence probabilities and determine as valid data the log-derived data corresponding to items for which the difference is significant.
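The chi-squared check described above can be sketched with the standard library alone. This is an illustration under assumptions: it uses the Pearson chi-squared statistic for a 2x2 table without continuity correction, a significance level of 0.05, and hypothetical function names and counts. For one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_squared_2x2(risk_with, total_with, risk_without, total_without):
    """Pearson chi-squared test (1 degree of freedom, no continuity correction).

    risk_with / total_with: risk events and company count where the item occurred.
    risk_without / total_without: the same for companies where it did not occur.
    Returns (chi2 statistic, p-value).
    """
    a, b = risk_with, total_with - risk_with
    c, d = risk_without, total_without - risk_without
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-squared distribution with 1 dof.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def is_valid_item(risk_with, total_with, risk_without, total_without, alpha=0.05):
    """Treat the item as valid data when the difference is significant."""
    _, p_value = chi_squared_2x2(risk_with, total_with, risk_without, total_without)
    return p_value < alpha


# Hypothetical counts: 30 of 100 companies with the item defaulted,
# versus 10 of 100 companies without it.
chi2, p_value = chi_squared_2x2(30, 100, 10, 100)
```

With small cell counts an exact test or a continuity correction may be more appropriate; the source names only the chi-squared test as an example.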

According to another embodiment, the processor 100 may determine whether the difference between the first risk occurrence probability and the second risk occurrence probability is greater than a predetermined threshold. The processor 100 may determine the valid data based on the result of that determination.

According to yet another embodiment, the processor 100 may determine the valid data by applying a statistical test (for example, a t-test) to the mean of the log-derived data of the group of companies in which a risk (for example, default) occurred and the mean of the log-derived data of the group of companies in which the risk did not occur. More specifically, for each piece of log-derived data, when the mean for companies that defaulted and the mean for companies that did not default differ to a statistically significant degree (for example, a t statistic of 1.64 or more and a p-value of less than 0.05), the processor 100 may determine that log-derived data to be valid data.

Hereinafter, a method of predicting the probability of occurrence of a company's potential risk from the log-derived data is described by way of example.

From the log data extracted from the open data, the processor 100 may obtain the time at which the item corresponding to the log data last occurred, thereby generating first log-derived data indicating the recency of the item. The processor 100 may obtain, from the log data, all dates on which the corresponding item occurred during a specific period, thereby generating second log-derived data indicating the frequency of occurrence of the item. The processor 100 may obtain, from the log data, the number of occurrences of the item corresponding to the log data, thereby generating third log-derived data indicating the magnitude of occurrence of the item.

The processor 100 may calculate the probability of occurrence of the company's potential risk based on the log-derived data. Based on the distribution of the average risk occurrence probability over the log-derived data, the processor 100 may calculate a first average risk occurrence probability corresponding to the first log-derived data, a second average risk occurrence probability corresponding to the second log-derived data, and a third average risk occurrence probability corresponding to the third log-derived data.
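The "distribution of the average risk occurrence probability over the log-derived data" could, for example, be built by binning one feature and taking the default rate per bin. The sketch below assumes such binning; the bin edges and function names are hypothetical, and the specification does not say how the distribution is constructed.

```python
from bisect import bisect_right


def default_rate_by_bin(values, defaulted, edges):
    """Average default rate per bin of one log-derived feature.

    edges: ascending cut points (hypothetical). Bin i holds values v with
    edges[i-1] <= v < edges[i]; the first and last bins are open-ended.
    Returns one rate per bin, or None for an empty bin.
    """
    counts = [0] * (len(edges) + 1)
    defaults = [0] * (len(edges) + 1)
    for value, flag in zip(values, defaulted):
        b = bisect_right(edges, value)
        counts[b] += 1
        defaults[b] += int(flag)
    return [defaults[i] / counts[i] if counts[i] else None
            for i in range(len(counts))]


def lookup_rate(value, edges, rates):
    """Average default rate of the bin that a company's feature value falls in."""
    return rates[bisect_right(edges, value)]
```

A company's first, second, and third average risk occurrence probabilities would then be three such lookups, one per feature.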

The processor 100 may calculate the potential risk occurrence probability based on the first average risk occurrence probability, the second average risk occurrence probability, the third average risk occurrence probability, a first weight corresponding to the first average risk occurrence probability, a second weight corresponding to the second average risk occurrence probability, and a third weight corresponding to the third average risk occurrence probability.
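The specification says only that the potential risk occurrence probability is calculated "based on" the three average probabilities and their weights. A normalized weighted average is one plausible reading; the sketch below assumes that form, and the function name is hypothetical.

```python
def potential_risk_probability(avg_rates, weights):
    """Weighted average of per-feature average risk probabilities.

    Assumption: the combination is a normalized weighted sum. The
    specification does not fix the exact formula.
    """
    if len(avg_rates) != len(weights):
        raise ValueError("avg_rates and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, avg_rates)) / total_weight


# Average default rates of 0.3%, 0.9%, and 0.1% (the FIG. 8 example values)
# combined with illustrative weights for recency, frequency, and magnitude.
example = potential_risk_probability([0.003, 0.009, 0.001], [0.5, 0.3, 0.2])
```

Normalizing by the weight total keeps the result a probability even when the weights do not sum to one.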

The processor 100 may also predict the probability of occurrence of the company's potential risk by inputting the first log-derived data, the second log-derived data, and the third log-derived data into an artificial intelligence algorithm. To this end, the artificial intelligence algorithm may be trained in advance to predict the probability of occurrence of a company's potential risk from log-derived data as input. In a further implementation, the processor 100 may input the first log-derived data, the second log-derived data, and the third log-derived data into the artificial intelligence algorithm to determine the first weight, the second weight, and the third weight.
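As an illustration of a model trained on the three log-derived features, the sketch below fits a plain logistic regression by batch gradient descent. Logistic regression is named elsewhere in the specification as one usable technique, but the training loop, learning rate, feature scaling, and function names here are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias on rows of (recency, frequency, magnitude).

    X: list of feature rows, assumed pre-scaled to comparable ranges.
    y: list of 0/1 labels (1 = risk occurred, e.g. default).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Predicted probability that the risk occurs for one company."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

The fitted coefficients also give one possible source for the first, second, and third weights, although the specification leaves that mapping open.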

The processor 100 may train the artificial intelligence algorithm. The processor 100 may determine the weights based on the trained artificial intelligence algorithm and, depending on the embodiment, may determine the weights through a statistics-based algorithm.

The processor 100 may generate a credit evaluation result by determining the company's credit rating based on the potential risk occurrence probability.
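Mapping a probability to a rating can be sketched as a lookup against ordered cut-offs. The grade labels and cut-off values below are purely hypothetical placeholders; this passage of the specification does not give a rating scale.

```python
from bisect import bisect_right

# Hypothetical cut-offs on the potential risk occurrence probability.
GRADE_CUTOFFS = [0.001, 0.003, 0.01, 0.03, 0.1]
# Hypothetical labels, from lowest risk to highest risk.
GRADES = ["AAA", "AA", "A", "BBB", "BB", "B"]


def assign_grade(probability):
    """Return the grade whose probability band contains the given value."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return GRADES[bisect_right(GRADE_CUTOFFS, probability)]
```

With these placeholder bands, a potential risk occurrence probability of 0.44% would fall in the third band.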

The memory 200 may store data for operations or the results of operations. The memory 200 may store instructions (or programs) executable by the processor. For example, the instructions may include instructions for executing the operation of the processor and/or the operation of each component of the processor.

The memory 200 may be implemented as a volatile memory device or a non-volatile memory device.

The volatile memory device may be implemented as dynamic random access memory (DRAM), static random access memory (SRAM), thyristor RAM (T-RAM), zero capacitor RAM (Z-RAM), or twin transistor RAM (TTRAM).

The non-volatile memory device may be implemented as electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic RAM (MRAM), spin-transfer torque MRAM (STT-MRAM), conductive bridging RAM (CBRAM), ferroelectric RAM (FeRAM), phase change RAM (PRAM), resistive RAM (RRAM), nanotube RRAM, polymer RAM (PoRAM), nano floating gate memory (NFGM), holographic memory, a molecular electronic memory device, or insulator resistance change memory.

FIG. 2 illustrates an example implementation of the apparatus for providing credit evaluation information shown in FIG. 1, and FIG. 3 is a diagram for describing the pulse algorithm used to provide the credit evaluation information of FIG. 1.

Referring to FIGS. 2 and 3, the processor (e.g., the processor 100 of FIG. 1) may include a sample inflow unit 110, a data collector 120, a log extractor 130, a significance checker 140, a score calculator 150, a rating designer 160, and a performance evaluator 170.

The memory (e.g., the memory 200 of FIG. 1) may include a database (DB) 210.

The sample inflow unit 110, the data collector 120, the log extractor 130, the significance checker 140, the score calculator 150, the rating designer 160, and the performance evaluator 170 may receive data from the DB 210 or store data in the DB 210.

The operations of the sample inflow unit 110, the data collector 120, the log extractor 130, the significance checker 140, the score calculator 150, the rating designer 160, and the performance evaluator 170 are described in detail with reference to FIGS. 4 to 11.

The processor 100 may predict a company's risk using a pulse algorithm, logistic regression, a decision tree, and/or a random forest. The processor 100 may collect various open data about the company and provide credit evaluation information on the company's credit risk based on the recency 310, frequency 330, and magnitude 350 of the collected open data.

Recency may refer to the time elapsed from the occurrence of an event corresponding to the open data to the present. Frequency may refer to the period (or the total number of dates) on which events corresponding to the open data occurred. Magnitude may refer to the number of times an event corresponding to the open data occurred.

The processor 100 may determine the recency of the open data and classify the event corresponding to the data as a recent event or an old event. The processor 100 may use recency to verify, from the current point of view, the validity of each company's business activities.

The processor 100 may determine the frequency of the open data and distinguish whether the event corresponding to the data occurs frequently or only occasionally. Based on frequency, the processor 100 may quantitatively derive the continuity or permanence of the company's business activities.

The processor 100 may determine the magnitude of the open data and distinguish whether the event corresponding to the data occurred in large or small numbers. Based on magnitude, the processor 100 may ascertain the influence or ripple effect of an individual company's business activities within the market. Although the processor 100 is illustrated here as using the recency, frequency, and magnitude of the open data, those skilled in the art will understand that the present invention may be implemented so as to predict the probability of occurrence of a company's potential risk using any time-series indicator generated by reprocessing the open data.

FIG. 4 is a diagram for describing the operation of the sample inflow unit shown in FIG. 2.

Referring to FIG. 4, the sample inflow unit (e.g., the sample inflow unit 110 of FIG. 2) may collect information about companies from a plurality of institutions. The plurality of institutions may include financial (or lending) institutions.

The sample inflow unit 110 may collect, from the plurality of institutions, information related to company loans to be used as reference data. The loan-related information may include a company sample, an identification number, a reference time, and default status. Those skilled in the art will understand that, besides default status, the loan-related information may include any information related to company loans, such as whether the company is delinquent or has closed its business.

The company sample may represent information on the range of borrowers used as the sample. The borrower sample range may include all or some of the borrowers of lending institutions such as banks, credit card companies, savings banks, and credit rating agencies. The borrowers may include incorporated companies.

The identification number may represent the company's identification information. The identification information may include a corporate registration number or a business registration number that can be used as the key for bringing in the open data.
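Using the registration number as the key for bringing in open data amounts to a join between the loan sample and the open-data records. The sketch below shows that idea only; the dictionary layout and the field name `business_no` are hypothetical.

```python
from collections import defaultdict


def attach_open_data(sample, open_records):
    """Group open-data records under each sampled company's registration number.

    sample: list of dicts, each with a "business_no" key (hypothetical field).
    open_records: list of dicts, each with a "business_no" key.
    Returns {business_no: [records...]}, with an empty list when nothing matched.
    """
    by_id = defaultdict(list)
    for record in open_records:
        by_id[record["business_no"]].append(record)
    return {s["business_no"]: by_id.get(s["business_no"], []) for s in sample}
```

Companies with no matching record keep an empty list, which later counts as non-occurrence of every item.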

The reference time may represent the point in time at which data is measured for managing the company's credit risk. The reference time may include, for each borrower, the time of loan execution or a monitoring time after loan execution at which default status is to be measured.

The default status may indicate whether a sample company has defaulted. The default status may include whether a default actually occurred within a specific period (e.g., one year or two years) from the reference time.

The sample inflow unit 110 may define what constitutes the occurrence of a default. The sample inflow unit 110 may define whether a company has defaulted based on card delinquency, sale of claims, business closure, specific loan loss provisioning, debt rehabilitation, card refinancing, general write-off, card write-off, corporate workout, issuance of asset-backed securities (ABS), and/or bankruptcy filing information.
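A default definition of this kind can be sketched as "any qualifying event within a horizon after the reference time". The event labels, the function name, and the 365-day default horizon below are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Hypothetical labels for the event types that count toward default.
DEFAULT_EVENTS = {
    "card_delinquency", "claim_sale", "business_closure",
    "specific_loan_loss_provision", "debt_rehabilitation",
    "card_refinancing", "general_write_off", "card_write_off",
    "workout", "abs_issuance", "bankruptcy_filing",
}


def is_default(events, reference_date, horizon_days=365):
    """True if any qualifying event falls within the horizon after the reference date.

    events: iterable of (event_type, event_date) pairs for one company.
    """
    end = reference_date + timedelta(days=horizon_days)
    return any(
        kind in DEFAULT_EVENTS and reference_date <= when <= end
        for kind, when in events
    )
```

Applying one shared definition to every institution's data keeps the default labels comparable across sources.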

The sample inflow unit 110 may transmit the defined default criteria to the plurality of financial institutions so that the criteria are used to determine default status.

In the example presented, the sample inflow unit 110 defines whether a company has defaulted from the loan-related information, and this definition is used to calculate the risk occurrence probability for credit risk. This is merely one exemplary implementation of the present invention. Those skilled in the art will understand that, depending on the embodiment, the present invention may be implemented so as to calculate or predict the risk occurrence probability for a company's credit risk not only from default information but also from any loan-related information, such as delinquency or business closure.

FIG. 5 is a diagram for describing the operation of the data collector shown in FIG. 2.

Referring to FIG. 5, the data collector (e.g., the data collector 120 of FIG. 2) may process open data (publicly disclosed data) about companies, collected from various institutions, into a form suitable for risk management and store it in the DB (e.g., the DB 210 of FIG. 2).

By way of example, the data collector 120 may collect, from the open data, a company's identification information 511, organization information 512, certification information 513, geographic information 514, intellectual property information 515, funding information 516, technology development information 517, business information 518, press coverage information 519, financial information 520, and other information 521.

The data collector 120 may perform customized data collection for a company using open application programming interfaces (APIs), scraping, and conversion.

The identification information 511 may include the company's trade name, representative's name, corporate registration number, and business registration number. By way of example, the data collector 120 may collect the identification information 511 from the Supreme Court registry office and the National Tax Service.

The organization information 512 may include information on monthly employment, monthly hires and departures, average wages, and employee turnover. By way of example, the data collector 120 may collect the organization information 512 from the National Pension Service and the Korea Employment Information Service.

The certification information 513 may include certification as a venture company, Innobiz, Mainbiz, small giant company, (prospective) social enterprise, and the like. By way of example, the data collector 120 may collect the certification information 513 from the Ministry of SMEs and Startups and the Social Enterprise Promotion Agency.

The geographic information 514 may include the registered address, officially assessed land price, head office relocation, past addresses, history of officially assessed land prices, and the like. By way of example, the data collector 120 may collect the geographic information 514 from the Ministry of Land, Infrastructure and Transport and the Supreme Court registry office.

The intellectual property information 515 may include application, registration, and publication information for patents, utility models, trademarks, and designs, as well as copyright information and the like. By way of example, the data collector 120 may collect the intellectual property information 515 from the Korean Intellectual Property Office and the Korea Copyright Commission.

The funding information 516 may include, for each company, the amount of investment attracted, investing institutions, investment stage, investment timing, government funding information, and the like. By way of example, the data collector 120 may collect the funding information 516 from the Korea Securities Depository and the Korea Press Foundation.

The technology development information 517 may include information on R&D funding, project period, project category, lead organization, development content, and the like. By way of example, the data collector 120 may collect the technology development information 517 from the Ministry of SMEs and Startups and the Ministry of Science and ICT.

The business information 518 may include information on R&D technology development content, Public Procurement Service bids and awards, website text, and the like. By way of example, the data collector 120 may collect the business information 518 from the Ministry of Science and ICT, the Public Procurement Service, and the like.

The press coverage information 519 may include information on press and video reports, number of reports, time of occurrence, media outlet, report content, and the like. By way of example, the data collector 120 may collect the press coverage information 519 from the Korea Press Foundation and search engine APIs.

The financial information 520 may include detailed line-item information from the income statement and the statement of financial position, and the like. The data collector 120 may collect the financial information 520 from the Financial Supervisory Service, the Ministry of Economy and Finance, and the like.

The other information 521 may include information on large-amount and habitual tax delinquency records, total delinquent amount, principal tax item, payment due date, summary of the delinquency, and the like. The data collector 120 may collect the other information 521 from the National Tax Service, the Korea Customs Service, and the like.

The types and sources of open data described above are not limited to the examples presented, and those skilled in the art will understand that the types of open data collected by the data collector 120 and their sources may be extended.

FIG. 6 is a diagram for describing the operation of the log extractor shown in FIG. 2.

Referring to FIG. 6, the log extractor (e.g., the log extractor 130 of FIG. 2) may generate log-derived data by extracting time-ordered log data from the open data. In the example of FIG. 6, the log extractor 130 may extract log data for each kind of open data (e.g., data on news, employment, patent, or certification items). The log data may include data recording, in time series, the times at which an item occurred.

The log extractor 130 may process the log data to generate log-derived data. For example, the log-derived data may include first log-derived data, second log-derived data, and/or third log-derived data obtained by processing the log data based on the associated times.

The log extractor 130 may obtain, from the log data, the time at which each item corresponding to the log data last occurred, thereby generating first log-derived data indicating the recency of each item. In the example of FIG. 6, the first log-derived data may include the elapsed period of the data.

The log extractor 130 may obtain, from the log data, all dates on which each item corresponding to the log data occurred, thereby generating second log-derived data indicating the frequency of occurrence of the item. In the example of FIG. 6, the second log-derived data may indicate the frequency of occurrence of the item corresponding to each piece of log data. The frequency of occurrence may be calculated based on the number of data occurrences during a predetermined period.

The log extractor 130 may obtain, from the log data, the number of occurrences of the item corresponding to the log data, thereby generating third log-derived data indicating the magnitude of occurrence of the item. In the example of FIG. 6, the third log-derived data may include the total number of occurrences of each piece of log data.

The first log-derived data and second log-derived data presented above are merely examples, and those skilled in the art will understand that log-derived data may be generated from the log data through various forms of data processing.
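The three log-derived values can be sketched from a list of occurrence dates for one item of one company. In this illustration, recency is measured in days since the last occurrence, frequency counts distinct occurrence dates inside a look-back window, and magnitude counts all occurrences. The function name, the 365-day window, and the choice of distinct dates are assumptions.

```python
from datetime import date


def derive_features(event_dates, as_of, window_days=365):
    """Return (recency_days, frequency, magnitude) for one item of one company.

    recency_days: days from the last occurrence to as_of (None if never occurred).
    frequency: distinct occurrence dates within window_days before as_of.
    magnitude: total number of occurrences up to as_of.
    """
    past = [d for d in event_dates if d <= as_of]
    if not past:
        return None, 0, 0
    recency_days = (as_of - max(past)).days
    frequency = len({d for d in past if (as_of - d).days <= window_days})
    magnitude = len(past)
    return recency_days, frequency, magnitude
```

Occurrences dated after `as_of` are ignored so that the features reflect only what was known at the reference time.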

FIG. 7 is a diagram for describing an example of the operation of the significance checker shown in FIG. 2.

Referring to FIG. 7, the significance checker (e.g., the significance checker 140 of FIG. 2) may extract valid data by verifying the significance of the log-derived data.

The significance checker 140 may verify the significance of each piece of log-derived data extracted from the collected open data. For each item corresponding to the log-derived data (for example, export record data, investment attraction data, patent registration data, and so on), the significance checker 140 may separate occurrence companies, in which the item occurred, from non-occurrence companies, in which it did not. The significance checker 140 may calculate the difference in risk occurrence probability (for example, default rate) between the group of occurrence companies and the group of non-occurrence companies, and may extract significant data as valid data based on that difference. The default rate given here as an example may be calculated from the loan-related information obtained through the sample inflow unit described above.

The significance checker 140 may distinguish companies in which the item corresponding to log-derived data extracted from the collected data (e.g., data 6 in FIG. 7) occurred from companies in which it did not. For example, when the open data item in question is patent data, the significance checker 140 may classify companies into those with patent data and those without. The significance checker 140 may determine the log-derived data corresponding to valid data from the difference between the first risk occurrence probability (e.g., first default rate) of companies with patent data and the second risk occurrence probability (e.g., second default rate) of companies without patent data.
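Splitting companies by whether an item occurred and computing the two default rates can be sketched as follows. The dictionary layout with `items` and `defaulted` keys is a hypothetical representation, not one given in the specification.

```python
def group_default_rates(companies, item):
    """Return (first_rate, second_rate) for one item.

    companies: list of dicts with "items" (set of item names that occurred)
               and "defaulted" (bool). The layout is illustrative.
    first_rate: default rate among companies in which the item occurred.
    second_rate: default rate among companies in which it did not occur.
    A rate is None when its group is empty.
    """
    occurred = [c["defaulted"] for c in companies if item in c["items"]]
    not_occurred = [c["defaulted"] for c in companies if item not in c["items"]]

    def rate(flags):
        return sum(flags) / len(flags) if flags else None

    return rate(occurred), rate(not_occurred)
```

The two rates returned here are the inputs to whichever significance check the embodiment uses.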

According to one embodiment, the significance checker 140 may determine the valid data by applying a statistical test to the difference between the first risk occurrence probability and the second risk occurrence probability described above. For example, the significance checker 140 may apply a chi-squared test to the difference between the first risk occurrence probability and the second risk occurrence probability, and may determine, as valid data, the log-derived data found to be significant. The test given above is merely one example of an implementation, and those skilled in the art will understand that significance may be determined through a test using any statistical method.

According to another embodiment, as shown in FIG. 7, the significance checker 140 may determine whether the difference in risk occurrence probability (for example, default rate) is greater than a threshold and may determine the valid data based on the result. For example, when the difference in default rate is greater than the threshold, the significance checker 140 may judge the log-derived data corresponding to the item to be significant and determine it to be valid data; when the difference in default rate is less than or equal to the threshold, the significance checker 140 may judge the log-derived data corresponding to the item to be insignificant and determine that it is not valid data. In the example of FIG. 7, the threshold may be 0.01. In this case, when the difference in default rate exceeds 0.01, the significance checker 140 may judge the log-derived data corresponding to data 6 to be significant and determine it to be valid data.

The types of items included in the open data and the threshold presented above are merely examples, and those skilled in the art will understand that the present invention is not limited thereto.
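The threshold embodiment reduces to a single comparison. The sketch below uses the example threshold of 0.01 and compares the absolute difference; taking the absolute value is an assumption, since the specification refers only to "the difference", and the function name is hypothetical.

```python
def is_valid_by_threshold(first_rate, second_rate, threshold=0.01):
    """True when the default-rate gap strictly exceeds the threshold.

    first_rate: default rate of companies in which the item occurred.
    second_rate: default rate of companies in which it did not occur.
    Assumption: the sign of the gap is ignored.
    """
    return abs(first_rate - second_rate) > threshold
```

A gap equal to the threshold is treated as not valid, matching the "less than or equal to" branch described above.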

According to yet another embodiment, the processor 100 may determine the valid data by applying a statistical test (for example, a t-test) to the mean of the log-derived data of the group of companies in which a risk (a credit risk, for example default) occurred and the mean of the log-derived data of the group of companies in which the risk did not occur. More specifically, for each piece of log-derived data, when the mean for companies that defaulted and the mean for companies that did not default differ to a statistically significant degree (for example, a t statistic of 1.64 or more and a p-value of less than 0.05), the processor 100 may determine that log-derived data to be valid data.
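The t-test embodiment can be sketched with the standard library alone. The version below computes Welch's t statistic and approximates a one-sided p-value with the normal distribution, which is reasonable only for large samples. The choice of Welch's variant, the normal approximation, the use of |t|, and the function names are all assumptions; only the example cut-offs (1.64 and 0.05) come from the text.

```python
import math
from statistics import mean, variance


def welch_t(defaulted_values, non_defaulted_values):
    """Welch's t statistic for the difference between two group means.

    Each group needs at least two values and the pooled standard error
    must be non-zero.
    """
    m1, m2 = mean(defaulted_values), mean(non_defaulted_values)
    v1, v2 = variance(defaulted_values), variance(non_defaulted_values)
    se = math.sqrt(v1 / len(defaulted_values) + v2 / len(non_defaulted_values))
    return (m1 - m2) / se


def is_valid_feature(defaulted_values, non_defaulted_values,
                     t_crit=1.64, alpha=0.05):
    """Valid when |t| >= t_crit and the approximate one-sided p-value < alpha."""
    t = welch_t(defaulted_values, non_defaulted_values)
    # Large-sample normal approximation to the one-sided tail probability.
    p_value = 0.5 * math.erfc(abs(t) / math.sqrt(2.0))
    return abs(t) >= t_crit and p_value < alpha
```

For small groups, a proper Student t distribution with Welch-Satterthwaite degrees of freedom would be the safer choice for the p-value.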

FIG. 8 is a diagram for explaining the operation of the score calculator shown in FIG. 2.

Referring to FIG. 8, the score calculator (e.g., the score calculator 150 of FIG. 2) may calculate the probability of occurrence of a potential risk of a company (e.g., a potential default rate) based on the log-derived data determined to be valid data. For example, based on the distribution of the average risk occurrence probability (e.g., average default rate) according to the log-derived data, the score calculator 150 may calculate a first average risk occurrence probability (e.g., a first average default rate) corresponding to the first log-derived data, a second average risk occurrence probability (e.g., a second average default rate) corresponding to the second log-derived data, and a third average risk occurrence probability (e.g., a third average default rate) corresponding to the third log-derived data.

The score calculator 150 may calculate the potential risk occurrence probability based on the first average risk occurrence probability, the second average risk occurrence probability, the third average risk occurrence probability, a first weight corresponding to the first average risk occurrence probability, a second weight corresponding to the second average risk occurrence probability, and a third weight corresponding to the third average risk occurrence probability.

In the example of FIG. 8, the score calculator 150 may calculate the first average default rate (e.g., 0.3%) based on the distribution of the average default rate with respect to recency. The score calculator 150 may calculate the second average default rate (e.g., 0.9%) based on the distribution of the average default rate with respect to frequency. The score calculator 150 may calculate the third average default rate (e.g., 0.1%) based on the distribution of the average default rate with respect to scale.

The example of FIG. 8 may illustrate a case in which all weights are 1. Depending on the embodiment, a weight may differ from 1 and may take a different value for each average default rate.
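As a rough illustration of how the weights and average default rates might be combined, the sketch below assumes a weighted sum. The description states only that the potential default rate is calculated "based on" the rates and weights, so the combining formula and the name `potential_default_rate` are assumptions.

```python
from typing import Optional, Sequence


def potential_default_rate(avg_rates: Sequence[float],
                           weights: Optional[Sequence[float]] = None) -> float:
    """Combine per-feature average default rates into one potential rate.

    Assumption: the combination is a weighted sum. When no weights are
    given, every weight is 1, as in the case described for FIG. 8.
    """
    if weights is None:
        weights = [1.0] * len(avg_rates)
    if len(weights) != len(avg_rates):
        raise ValueError("one weight is required per average default rate")
    return sum(w * r for w, r in zip(weights, avg_rates))


# Example rates from the description: recency 0.3%, frequency 0.9%, scale 0.1%.
rates = [0.003, 0.009, 0.001]
print(f"{potential_default_rate(rates):.4f}")
print(f"{potential_default_rate(rates, [0.5, 1.0, 2.0]):.4f}")  # hypothetical weights
```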

The score calculator 150 may determine the first weight, the second weight, and the third weight by inputting the first log-derived data, the second log-derived data, and the third log-derived data into an artificial intelligence algorithm.

The score calculator 150 may input the log data or the log-derived data into an artificial intelligence algorithm to identify the distribution of the average default rate, and may calculate the potential default rate based on that distribution.

In the example presented above, the potential default rate is calculated based on the first to third average default rates. This is only one example of the present invention; the number of average default rates calculated, among other details, may vary depending on how the log-derived data is produced.

FIG. 9 is a diagram for explaining the operation of the rating designer shown in FIG. 2.

Referring to FIG. 9, the rating designer (e.g., the rating designer 160 of FIG. 2) may determine a company's credit rating based on the potential risk occurrence probability (e.g., the potential default rate).

The rating designer 160 may calculate the expected risk occurrence probability (e.g., expected default rate) of the entire group of companies in the evaluation sample. The rating designer 160 may divide the credit ratings based on the distribution of the expected risk occurrence probability. The rating designer 160 may divide the management grades using an equal distribution method.

The rating designer 160 may determine a company's credit rating based on the expected risk occurrence probability of the entire group of companies and the calculated potential risk occurrence probability.
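One way to read the equal distribution method is equal-frequency binning: grade boundaries are placed so that each grade holds roughly the same number of sample companies, and a company is then graded by where its potential default rate falls. The sketch below assumes that reading; the function names, the ten-grade default, and the rule that a rate equal to a boundary falls in the lower grade are all illustrative choices.

```python
from bisect import bisect_left


def grade_boundaries(expected_rates: list[float], n_grades: int = 10) -> list[float]:
    """Upper boundaries of grades 1..n_grades-1 under equal-frequency binning.

    Assumption: the sample holds at least n_grades companies.
    """
    if len(expected_rates) < n_grades:
        raise ValueError("sample is smaller than the number of grades")
    ordered = sorted(expected_rates)
    n = len(ordered)
    return [ordered[(k * n) // n_grades - 1] for k in range(1, n_grades)]


def assign_grade(potential_rate: float, boundaries: list[float]) -> int:
    """Grade 1 has the lowest default rate; the last grade has the highest."""
    return bisect_left(boundaries, potential_rate) + 1


# Hypothetical sample: 100 companies with expected default rates 0.1% to 10%.
sample = [i / 1000 for i in range(1, 101)]
bounds = grade_boundaries(sample)
print(assign_grade(0.005, bounds), assign_grade(0.055, bounds), assign_grade(0.095, bounds))
```

Numbering the grades so that a higher grade means a higher default rate follows the description, in which companies in higher grades are classified as having a higher risk occurrence probability.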

The credit rating may consist of a plurality of grades. The example of FIG. 9 shows the credit rating divided into 10 grades, but depending on the embodiment, there may be fewer or more than 10 grades.

The rating designer 160 may classify a company in a higher credit rating quantile as a company with a high risk occurrence probability (e.g., default rate), and may classify a company in a lower credit rating quantile as a company with a low risk occurrence probability.

FIG. 10 is a diagram for explaining the risk occurrence probability calculation operation of the performance evaluator shown in FIG. 2. FIG. 11 shows an example of a credit evaluation result of the performance evaluator shown in FIG. 2.

Referring to FIGS. 10 and 11, the performance evaluator (e.g., the performance evaluator 170 of FIG. 2) may generate a credit evaluation result based on the credit rating. The performance evaluator 170 may evaluate the performance of the credit evaluation result.

The performance evaluator 170 may evaluate performance based on the AR (Accuracy Ratio). For example, the performance evaluator 170 may calculate the AR based on data on whether default actually occurred and the default rate for each grade calculated by the rating designer (e.g., the rating designer 160 of FIG. 2).
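The description does not give a formula for the AR. A common definition, assumed in the sketch below, derives it from the cumulative accuracy profile (CAP): borrowers are sorted from riskiest to safest, the cumulative share of actual defaults captured is traced, and the area gained over a random ranking is divided by the area a perfect ranking would gain. The function name `accuracy_ratio` is hypothetical.

```python
def accuracy_ratio(scores: list[float], defaulted: list[int]) -> float:
    """Accuracy Ratio from the CAP curve (assumed definition).

    scores: higher means riskier (e.g., the default rate of the assigned grade).
    defaulted: 1 if the borrower actually defaulted, otherwise 0.
    Tied scores are taken in input order, which is a simplification.
    """
    n = len(scores)
    n_defaults = sum(defaulted)
    if n_defaults == 0 or n_defaults == n:
        raise ValueError("both defaulted and non-defaulted borrowers are required")

    riskiest_first = sorted(range(n), key=lambda i: scores[i], reverse=True)
    area = 0.0
    captured = 0
    for i in riskiest_first:
        previous_share = captured / n_defaults
        captured += defaulted[i]
        # Trapezoid over one borrower-wide step of the CAP curve.
        area += (previous_share + captured / n_defaults) / (2 * n)

    perfect_area = 1 - n_defaults / (2 * n)
    return (area - 0.5) / (perfect_area - 0.5)


# Hypothetical borrowers: every defaulter is ranked above every non-defaulter.
print(accuracy_ratio([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

Under this definition a perfect ranking yields an AR of 1, a random ranking yields an AR near 0, and a fully inverted ranking yields -1.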

In the example of FIG. 11, it can be seen that the AR increases as the cumulative share of all borrowers (e.g., companies) increases.

FIG. 12 shows a flowchart of the operation of the credit evaluation information providing device shown in FIG. 1.

Referring to FIG. 12, the processor (e.g., the processor 100 of FIG. 1) may collect company-related information (1210). For example, the company-related information may be open data. Open data is a body of information disclosed to the public and may include individual items corresponding to the company's business details, financial information, employment information, information on investment in the company, information on the technology the company holds, the company's corporate certification information, and/or the company's promotional materials. Those skilled in the art will understand that the types of items included in the open data are not limited to the examples given.

The processor 100 may generate log-derived data by extracting log data of the open data over time (1230). For example, the log-derived data may include first log-derived data, second log-derived data, and/or third log-derived data.

The processor 100 may generate first log-derived data indicating the recency of each item by obtaining, from the log data, the time of the last occurrence of each item corresponding to the log data. The processor 100 may generate second log-derived data indicating the frequency of occurrence of an item by obtaining, from the log data, all dates on which the item corresponding to the log data occurred. The processor 100 may generate third log-derived data indicating the scale of occurrence of an item by obtaining, from the log data, the number of occurrences of the item corresponding to the log data.
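The three log-derived features can be sketched for a single item of a single company as follows. The concrete encodings are assumptions chosen for illustration: recency as days elapsed since the last occurrence relative to a reference date, frequency as the number of distinct occurrence dates, and scale as the total number of logged occurrences. The function name `log_derived_features` and the dictionary keys are hypothetical.

```python
from datetime import date
from typing import Optional


def log_derived_features(event_dates: list[date], as_of: date) -> dict[str, Optional[int]]:
    """Derive recency, frequency, and scale from one item's occurrence log."""
    if not event_dates:
        return {"recency_days": None, "frequency": 0, "scale": 0}
    return {
        # First log-derived data: based on the last occurrence time.
        "recency_days": (as_of - max(event_dates)).days,
        # Second log-derived data: based on all dates on which the item occurred.
        "frequency": len(set(event_dates)),
        # Third log-derived data: based on the number of occurrences.
        "scale": len(event_dates),
    }


# Hypothetical log: two records on one day, one record on another.
log = [date(2022, 1, 10), date(2022, 1, 10), date(2022, 6, 1)]
print(log_derived_features(log, as_of=date(2023, 1, 1)))
```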

The processor 100 may extract valid data by verifying the significance of the log-derived data (1250). More specifically, the processor 100 may obtain valid data by verifying the significance of the item corresponding to each piece of log-derived data, as described with reference to FIG. 7.

For example, for each piece of log-derived data, the processor 100 may obtain a first risk occurrence probability (e.g., a first default rate) of companies in which the item corresponding to the log-derived data occurred. The processor 100 may obtain a second risk occurrence probability (e.g., a second default rate) of companies in which the item did not occur. The processor 100 may determine valid data by applying a statistical test to the difference between the first and second risk occurrence probabilities, or based on whether that difference is greater than a predetermined threshold. The specific method of determining valid data may be the same as described above with reference to FIG. 7.
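The first and second default rates described above can be obtained by splitting the sample on whether the item occurred. The sketch below assumes each company is represented as a pair of booleans; that record layout and the name `default_rates_by_item` are hypothetical.

```python
from typing import Optional


def default_rates_by_item(
    companies: list[tuple[bool, bool]],
) -> tuple[Optional[float], Optional[float]]:
    """Return (first_rate, second_rate) for one item.

    companies: (item_occurred, defaulted) per company.
    first_rate: default rate among companies in which the item occurred.
    second_rate: default rate among companies in which it did not occur.
    A rate is None when its group is empty.
    """
    with_item = [defaulted for occurred, defaulted in companies if occurred]
    without_item = [defaulted for occurred, defaulted in companies if not occurred]

    def rate(group: list[bool]) -> Optional[float]:
        return sum(group) / len(group) if group else None

    return rate(with_item), rate(without_item)


# Hypothetical sample of eight companies.
sample = [(True, True), (True, False), (True, False), (True, False),
          (False, True), (False, True), (False, False), (False, False)]
first_rate, second_rate = default_rates_by_item(sample)
print(first_rate, second_rate)
```

The resulting pair is the input to whichever significance check is applied next, whether a statistical test or a comparison with a predetermined threshold.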

The processor 100 may calculate the company's potential risk occurrence probability based on the log-derived data (1270). Based on the distribution of the average risk occurrence probability according to the log-derived data (e.g., the average default rate for each piece of log-derived data), the processor 100 may calculate a first average risk occurrence probability corresponding to the first log-derived data, a second average risk occurrence probability corresponding to the second log-derived data, and a third average risk occurrence probability corresponding to the third log-derived data.

The processor 100 may calculate the potential risk occurrence probability (potential default rate) based on the first average risk occurrence probability, the second average risk occurrence probability, the third average risk occurrence probability, a first weight corresponding to the first average risk occurrence probability, a second weight corresponding to the second average risk occurrence probability, and a third weight corresponding to the third average risk occurrence probability.

The processor 100 may determine the first weight, the second weight, and the third weight by inputting the first log-derived data, the second log-derived data, and the third log-derived data into a neural network. Depending on the embodiment, the processor 100 may also calculate the potential risk occurrence probability directly by inputting the first log-derived data, the second log-derived data, and the third log-derived data into an artificial intelligence algorithm.

The processor 100 may train the artificial intelligence algorithm. Based on the trained artificial intelligence algorithm, the processor 100 may determine the weights or may directly calculate the potential default rate.

The processor 100 may generate a credit evaluation result by determining the company's credit rating based on the potential risk occurrence probability (1290).

The embodiments described above may be implemented with hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may be implemented using a general-purpose or special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will recognize that a processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art can apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described system, structure, device, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims set forth below.

Claims

Collect public data, including intellectual property information, authentication information, and organizational information related to the company;
By extracting log data over time for each of the plurality of items included in the public data, log derived data corresponding to each of the plurality of items is generated,
Extract valid data by verifying the significance of the log-derived data,
Calculate the probability of occurrence of a potential risk of the company based on log-derived data corresponding to the valid data,
A processor that generates credit evaluation results by determining the credit rating of the company based on the probability of occurrence of the potential risk.
Including,
The processor,
For each of the log-derived data, calculate a first average value of the log-derived data for the first group of companies in which a risk occurred and a second average value of the log-derived data for a second group of companies in which a risk did not occur,
Determine whether the data is valid by applying a T-test to the difference between the first average value and the second average value, determine that log-derived data for which the difference between the first average value and the second average value is judged to be significant is the valid data, and determine that log-derived data for which the difference between the first average value and the second average value is judged not to be significant is not the valid data,
The risk corresponding to the first average value and the second average value is calculated based on loan-related information of the first group of companies and the second group of companies,
A corporate credit evaluation information providing device in which the plurality of items correspond to information different from the loan-related information.

According to claim 1,
The processor,
Based on the distribution of the average risk occurrence probability according to the log-derived data, calculate the average risk occurrence probability corresponding to the log-derived data,
Calculating the potential default rate based on a weight for the average default rate,
A device for providing corporate credit rating information.

According to claim 1,
The processor,
Generate first log derived data indicating recency of the item by obtaining the last occurrence time of the item corresponding to the log data from the log data,
Generate second log derived data indicating the frequency of occurrence of the item by obtaining all dates on which the log data occurred from the log data,
Generating third log derived data indicating the magnitude of occurrence of the item by obtaining the number of occurrences of the log data from the log data,
A device for providing corporate credit rating information.

According to claim 3,
The processor,
Based on the distribution of the average risk occurrence probability according to the log-derived data, calculate the first average risk occurrence probability corresponding to the first log-derived data, the second average risk occurrence probability corresponding to the second log-derived data, and the third average risk occurrence probability corresponding to the third log-derived data,
Calculating the probability of occurrence of the potential risk based on the first average risk occurrence probability, the second average risk occurrence probability, the third average risk occurrence probability, a first weight corresponding to the first average risk occurrence probability, a second weight corresponding to the second average risk occurrence probability, and a third weight corresponding to the third average risk occurrence probability,
A device for providing corporate credit rating information.

According to claim 4,
The processor,
Determining the first weight, the second weight, and the third weight by inputting the first log-derived data, the second log-derived data, and the third log-derived data into a machine learning model,
A device for providing corporate credit rating information.

In a method of providing corporate credit evaluation information performed by a corporate credit evaluation information providing device,
collecting public data including intellectual property information, authentication information, and organizational information related to the enterprise;
generating log derived data corresponding to each of the plurality of items by extracting log data over time for each of the plurality of items included in the public data;
extracting valid data by verifying the significance of the log-derived data;
calculating a probability of occurrence of a potential risk of the company based on the log-derived data; and
Generating a credit evaluation result by determining the credit rating of the company based on the probability of occurrence of the potential risk
Including,
The step of extracting the valid data is,
For each of the log-derived data, calculating a first average value of the log-derived data for a first group of companies in which a risk occurred and a second average value of the log-derived data for a second group of companies in which a risk did not occur; and
determining whether the data is valid by applying a T-test to the difference between the first average value and the second average value, determining that log-derived data for which the difference between the first average value and the second average value is judged to be significant is the valid data, and determining that log-derived data for which the difference between the first average value and the second average value is judged not to be significant is not the valid data
Including,
The risk corresponding to the first average value and the second average value is calculated based on loan-related information of the first group of companies and the second group of companies,
A method of providing corporate credit evaluation information, wherein the plurality of items correspond to information different from the information related to the loan.