KR102062560B1 - Method and system for integrated knowledge database construction based on production rules

Info

Publication number: KR102062560B1
Application number: KR1020180066327A
Authority: KR
Inventors: 이승룡; 막불알리
Original assignee: 경희대학교 산학협력단
Priority date: 2018-06-08
Filing date: 2018-06-08
Publication date: 2020-01-06
Also published as: KR20190139632A

Abstract

The present invention relates to a method by which a server builds a rule-based knowledge database. More specifically, the method comprises: step a of selecting a data set from an external data set source; step b of generating a classification model using a decision tree algorithm corresponding to the data set; step c of extracting conditions and results from the classification model; step d of generating an XML model using the conditions and results; step e of extracting detailed information from the XML model; step f of generating and verifying production rules using the detailed information; and a step of repeating steps a to f to store the production rules generated from one or more data sets in the knowledge database.

Description

Rule-based knowledge database construction method and system {METHOD AND SYSTEM FOR INTEGRATED KNOWLEDGE DATABASE CONSTRUCTION BASED ON PRODUCTION RULES}

The present invention relates to a method of building a knowledge database and, more particularly, to a method and system for building a rule-based knowledge database using data sets.

Knowledge engineering is one of the core research fields concerned with building knowledge-based databases to produce better decisions. Research in this field focuses on knowledge representation techniques such as decision trees, production rules, and decision graphs. Among these, production rules are the most widely used because they are concise, easy to understand, predictable, and highly reliable in supporting decision making.

However, as the volume of data grows rapidly, experts may find it difficult to build a knowledge database from their personal knowledge alone. A decision tree algorithm can therefore be used to build a knowledge database from such large volumes of data.

Various systems have been proposed for generating production rules using decision tree algorithms. In most of these systems, only decision trees based on the J48 class can generate an XML model. That is, conventional methods have the drawback of generating production rules from a single decision tree without considering multiple decision trees.

The present invention has been made to solve the above problem, and one object thereof is to build a knowledge database based on production rules.

Another object of the present invention is to generate reliable production rules from data sets.

Another object of the present invention is to generate production rules using one or more decision tree algorithms.

Another object of the present invention is to improve reliability by having the production rules verified by experts.

To achieve these objects, the present invention provides a method by which a server builds a rule-based knowledge database, the method comprising: step a of selecting a data set from an external data set source; step b of generating a classification model using a decision tree algorithm corresponding to the data set; step c of extracting conditions and results from the classification model; step d of generating an XML model using the conditions and results; step e of extracting detailed information from the XML model; step f of generating and verifying production rules using the detailed information; and a step of repeating steps a to f to store the production rules generated from one or more data sets in the knowledge database.

The decision tree algorithm includes at least one of BFTree, J48, RandomTree, REPTree, and SimpleCart.

Further, step c removes header information from the classification model and extracts the conditions and results by applying text trimming, text splitting, and special character replacement to the classification model.

Step d includes converting the classification model into the XML model by performing operator arrangement, indentation, and file conversion on the classification model.

Further, step e includes analyzing the XML model using a DOM parser, and generating the production rules, each including an attribute, an operator, and a value, using the analysis result of the XML model.

Step f includes a step in which an external expert verifies the production rules, and a step of checking whether the production rules operate normally.

Further, the step of checking the production rules includes comparing the production rules with existing production rules previously stored in the knowledge database to check the rules, and, if a rule is duplicated between the production rules and the existing production rules, providing the duplicated rule to the external expert.

The present invention also provides a system for building a rule-based knowledge database, the system comprising: a model generator that selects a data set from an external data set source and selects a decision tree algorithm corresponding to the data set to generate a classification model; a preprocessor that extracts conditions and results from the classification model; a converter that generates an XML model using the conditions and results; a rule generator that extracts and analyzes detailed information of the XML model to generate production rules and verifies the production rules; and a knowledge database that stores the production rules.

Further, the decision tree algorithm includes at least one of BFTree, J48, RandomTree, REPTree, and SimpleCart.

The preprocessor removes header information from the classification model and extracts the conditions and results by applying text trimming, text splitting, and special character replacement to the classification model.

Further, the converter generates the XML model by performing operator arrangement, indentation, and file conversion on the classification model.

The rule generator includes a model analyzer that analyzes the XML model using a DOM parser, and a production rule generator that generates production rules, each including an attribute, an operator, and a value, using the analysis result of the XML model.

Further, the rule generator further includes a production rule verifier comprising a rule verifier through which an external expert verifies the production rules, and a rule checker that checks whether the production rules are correctly composed.

The rule checker checks the rules by comparing the production rules with previously stored existing production rules and, once the rules are checked, provides any rule duplicated between the production rules and the existing production rules to the external expert.

According to the present invention as described above, a knowledge database based on production rules can be built.

The present invention can also generate reliable production rules from data sets.

The present invention can also generate production rules using one or more decision tree algorithms.

The present invention can also improve reliability by having the production rules verified by experts.

FIG. 1 is a diagram illustrating the overall process of building a knowledge database according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of a knowledge database construction system according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a knowledge database construction method according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining the process of extracting the conditions and results required to generate an XML model according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining the process of generating production rules according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining the process of verifying production rules according to an embodiment of the present invention.
FIG. 7 is a diagram for explaining a classification model generated using REPTree according to an embodiment of the present invention.
FIG. 8 is a diagram for explaining an XML model according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining a method of analyzing an XML model according to an embodiment of the present invention.
FIG. 10 is a diagram for explaining production rules according to an embodiment of the present invention.

The above objects, features, and advantages are described in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention.

In the drawings, the same reference numerals denote the same or similar components, and all combinations described in the specification and claims may be combined in any manner. Unless otherwise specified, a reference to the singular may include one or more, and a singular expression may also include the plural.

The terminology used herein is intended only to describe particular exemplary embodiments and is not intended to be limiting. Singular expressions used herein may also be intended to include the plural unless the context clearly indicates otherwise. The term "and/or" includes any one of, and all combinations of, the associated listed items. The terms "comprise", "comprising", "include", "including", "have", "having", and the like are inclusive: they specify the stated features, integers, steps, operations, elements, and/or components, and do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The steps, processes, and operations of the methods described herein should not be construed as necessarily being performed in the particular order discussed or illustrated, unless an order of performance is specifically fixed. It should also be understood that additional or alternative steps may be used.

In addition, each component may be implemented as its own hardware processor, the components may be integrated into a single hardware processor, or the components may be combined and implemented as a plurality of hardware processors.

A decision tree, used here to build the knowledge database, is a method commonly used in data mining analysis that supports decision making by representing an algorithm visually. A decision tree aims to classify data and may consist of the root of the tree, which is the top-level condition; the branches, which are the detailed conditions; and the leaves, which are the results for those conditions.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating the overall process of building a knowledge database according to an embodiment of the present invention.

The server may select a decision tree algorithm corresponding to a data set selected from an external data set source. Through the model generator, the server may apply the decision tree algorithm to the data set to generate a classification model. Once the classification model is generated, the server may, through the preprocessor, perform preprocessing that extracts conditions and results from the classification model, producing a preprocessed model. Once the preprocessed model is generated, the server may, through the converter, generate an XML model using the conditions and results of the preprocessed model. The server may extract attributes, operators, and values from the XML model to generate production rules and request verification from an expert. When the expert completes the verification, the server may store the production rules in the knowledge database. Building the knowledge database is described in more detail below with reference to FIG. 2.

FIG. 2 is a diagram illustrating the configuration of a knowledge database construction system according to an embodiment of the present invention.

The knowledge database construction system may include a model generator 100, a preprocessor 200, a converter 300, a rule generator 400, a storage unit 500, and a knowledge database 600.

The model generator 100 may select a data set from an external data set source and select a decision tree algorithm corresponding to the data set to generate a classification model. The model generator 100 may include a data selector 110, an algorithm selector 130, and a classification model generator 150.

The data selector 110 may select, from an external data set source, a data set from which production rules are to be generated.

The algorithm selector 130 may select a decision tree algorithm suited to the data source selected by the data selector 110. The decision tree algorithm may be one of BFTree, J48, RandomTree, REPTree, and SimpleCart.

The classification model generator 150 may generate a classification model by applying 10-fold cross-validation to the data set and its corresponding decision tree algorithm. Cross-validation uses part of the data set to build the model and the remainder to validate it, which helps avoid overfitting. In 10-fold cross-validation, the data set is divided into 10 groups, 9 of which are used to build the model and 1 to validate it. Because each of the 10 groups is used for validation in turn, the process is repeated 10 times to generate the classification model. A classification model generated by the classification model generator 150 using REPTree is shown in FIG. 7.
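
The fold rotation described above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation; the function name `k_fold_indices` and the contiguous, unshuffled split are assumptions made for the example.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: folds are contiguous and unshuffled for clarity.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]                    # 1 group validates
        train = indices[:start] + indices[start + size:]      # k-1 groups build
        yield train, test
        start += size


folds = list(k_fold_indices(25, k=10))
```

Each sample index appears in exactly one validation group, so every group is used for validation once across the 10 repetitions.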

The classification model generator 150 may store the generated classification model in the classification model storage 510.

The preprocessor 200 may extract conditions and results from the generated classification model. The preprocessor 200 may include a classification model selector 210 and a model preprocessor 230.

The classification model selector 210 may select, from the classification model storage 510, a classification model to be preprocessed.

The model preprocessor 230 may preprocess the classification model using text trimming, text splitting, and special character replacement. For example, the model preprocessor 230 may remove header information contained in the classification model. Referring to Figure 1, the classification model may include header information of the form (a/b)[c/d]. Here, a is the total count of all instances reaching the current leaf, b is the total count of misclassified instances among all instances reaching the current leaf, c is the total count of all inappropriate instances ending at the current leaf, and d is the total count of misclassified instances among all inappropriate instances ending at the current leaf. The model preprocessor 230 may remove this header information, which is unnecessary for generating the XML model. Removing the unnecessary information has the effect of reducing memory usage.
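
One way to strip the (a/b)[c/d] information from a leaf line is a regular expression, sketched below in Python. The sample line layout (`attribute = value : result (a/b) [c/d]`) and the name `strip_leaf_stats` are assumptions for illustration; the exact text format of the classification model in FIG. 7 is not reproduced here.

```python
import re

# A count such as "12" or "12.5".
_NUM = r"\d+(?:\.\d+)?"
# Trailing "(a/b)" optionally followed by "[c/d]" at the end of a line.
LEAF_STATS = re.compile(
    rf"\s*\({_NUM}(?:/{_NUM})?\)(?:\s*\[{_NUM}(?:/{_NUM})?\])?\s*$"
)


def strip_leaf_stats(line):
    """Return the line without its trailing (a/b)[c/d] statistics."""
    return LEAF_STATS.sub("", line)


cleaned = strip_leaf_stats("SituationCategory = Sitting : Stretching (12/1) [5/0]")
```

Lines that carry no trailing statistics pass through unchanged, so the function can be applied to every line of the model text.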

By preprocessing the classification model, the model preprocessor 230 may extract the conditions and results from it. In this specification, the preprocessed classification model is referred to as the preprocessed model. That is, the model preprocessor 230 may generate a preprocessed model containing the conditions and results.

The model preprocessor 230 may store the preprocessed model in the preprocessed model storage 530.

The converter 300 may generate an XML model using the preprocessed model generated by the preprocessor 200. More specifically, the converter 300 may include a preprocessed model selector 310 and an XML model converter 330.

The preprocessed model selector 310 may select, from the preprocessed model storage 530, a preprocessed model to be converted into an XML model.

The XML model converter 330 may generate an XML model by performing operator arrangement, indentation, and file conversion on the preprocessed model. The XML model is expressed in a language standardized for interoperability across decision trees; because it uses platform-independent Unicode, it can be used without constraints imposed by the hard disk, operating system, or programming language. That is, the XML model may have the same structure regardless of the data set and decision tree algorithm.

The XML model generated by the XML model converter 330 is shown in FIG. 8. Referring to FIG. 8, the XML model may consist of at least one of an attribute, an operator, and a value. For example, a condition that the XML model converter 330 extracts from the preprocessed model may be expressed as attribute="SituationCategory", operator="=", value="Sitting". Likewise, a result extracted from the preprocessed model may be expressed as Output decision="Stretching".
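
As a rough illustration of this representation, the sketch below builds one condition and one result with Python's standard `xml.etree.ElementTree`. The element names `DecisionTree` and `Test` and the function `build_xml_model` are hypothetical; only the `attribute`, `operator`, `value`, and `Output decision` names come from the description, and the actual schema of FIG. 8 may differ.

```python
import xml.etree.ElementTree as ET


def build_xml_model(condition, decision):
    """Serialize one (attribute, operator, value) condition and its result.

    Element names "DecisionTree" and "Test" are assumptions for this sketch.
    """
    attribute, operator, value = condition
    root = ET.Element("DecisionTree")
    test = ET.SubElement(
        root, "Test", attribute=attribute, operator=operator, value=value
    )
    # The result of the condition is nested under the test it belongs to.
    ET.SubElement(test, "Output", decision=decision)
    return ET.tostring(root, encoding="unicode")


xml_text = build_xml_model(("SituationCategory", "=", "Sitting"), "Stretching")
```

Nesting the `Output` element inside its `Test` keeps the condition path recoverable when the document is parsed later.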

The XML model converter 330 may store the XML model in the XML model storage 550.

The rule generator 400 may extract and analyze detailed information of the XML model to generate production rules and verify them. More specifically, the rule generator 400 may include an XML model analyzer 410, a production rule generator 430, and a production rule verifier 450.

The XML model analyzer 410 may extract detailed information from the XML model. After selecting an XML model from the XML model storage 550, the XML model analyzer 410 may parse it syntactically using a DOM (Document Object Model) parser. The DOM parser may build an object tree from the XML model and extract instances of the object tree. The XML model analyzer 410 may extract the detailed information of the XML model using an algorithm such as that shown in FIG. 9.

The algorithm of FIG. 9 may extract the detailed information of the XML model by examining node information such as the relationships among the root of the tree, sibling nodes, and parent nodes. That is, the tree may be traversed by extracting all conditions along the path of the object tree that leads to each result. In other words, the XML model analyzer 410 may use the DOM parser to extract attributes, operators, and values from the XML model.
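
A path-collecting traversal of this kind can be sketched with Python's `xml.dom.minidom`. This is not the algorithm of FIG. 9, which is not reproduced here; the element names `Test` and `Output`, the `Duration` attribute, and the `NoAction` decision are hypothetical sample data.

```python
from xml.dom import minidom

# Hypothetical XML model with two root-to-leaf paths.
XML_MODEL = """<DecisionTree>
  <Test attribute="SituationCategory" operator="=" value="Sitting">
    <Test attribute="Duration" operator="&gt;" value="60">
      <Output decision="Stretching"/>
    </Test>
    <Test attribute="Duration" operator="&lt;=" value="60">
      <Output decision="NoAction"/>
    </Test>
  </Test>
</DecisionTree>"""


def extract_paths(node, conditions=()):
    """Depth-first walk returning (conditions, decision) for each Output leaf."""
    paths = []
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip whitespace text nodes between elements
        if child.tagName == "Test":
            cond = (
                child.getAttribute("attribute"),
                child.getAttribute("operator"),
                child.getAttribute("value"),
            )
            # Carry the conditions of all ancestors down the path.
            paths.extend(extract_paths(child, conditions + (cond,)))
        elif child.tagName == "Output":
            paths.append((list(conditions), child.getAttribute("decision")))
    return paths


paths = extract_paths(minidom.parseString(XML_MODEL).documentElement)
```

Each returned pair holds every condition from the root down to one result, which is the information a production rule needs.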

The production rule generator 430 may generate production rules using the attributes, operators, and values extracted by the XML model analyzer 410. For example, the production rule generator 430 may generate production rules such as those shown in FIG. 10 from classification models generated using REPTree and J48. Referring to FIG. 10, a production rule may consist of a condition (IF) and a result (THEN).
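
The IF/THEN form can be illustrated with a small Python formatter. Joining multiple conditions with AND and the function name `to_production_rule` are assumptions for this sketch; the exact textual layout of the rules in FIG. 10 is not reproduced here.

```python
def to_production_rule(conditions, decision):
    """Format (attribute, operator, value) triples and a result as IF ... THEN ...

    Joining conditions with AND is an assumption of this sketch.
    """
    if not conditions:
        raise ValueError("a production rule needs at least one condition")
    clauses = [f"{attr} {op} {val}" for attr, op, val in conditions]
    return f"IF {' AND '.join(clauses)} THEN {decision}"


rule = to_production_rule([("SituationCategory", "=", "Sitting")], "Stretching")
```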

The production rule generator 430 may store the generated production rules in the production rule storage 570.

The production rule verifier 450 may enable an external expert to verify the production rules and may check whether the production rules are correctly composed. More specifically, the production rule verifier 450 may include a rule verifier (not shown) and a rule checker (not shown).

The rule verifier selects production rules stored in the production rule storage 570 so that an external expert can verify them. The expert may verify the production rules and store them in the rule-based knowledge database 600. The expert may confirm that the production rules were generated with a suitable structure.

The rule checker may check the production rules by comparing them with previously stored existing production rules and, if a rule is duplicated between the production rules and the existing production rules, may provide the duplicated rule to the external expert.
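
A duplicate check of this kind might look like the following Python sketch. The rule representation (a tuple of condition triples plus a decision), the name `find_duplicate_rules`, and the choice to ignore condition order are all assumptions; the description does not specify how duplicates are detected.

```python
def find_duplicate_rules(new_rules, existing_rules):
    """Return the new rules that already exist in the knowledge database.

    A rule is (conditions, decision), where conditions is a tuple of
    (attribute, operator, value) triples. Condition order is ignored,
    which is an assumption of this sketch.
    """
    def key(rule):
        conditions, decision = rule
        return (frozenset(conditions), decision)

    existing_keys = {key(rule) for rule in existing_rules}
    return [rule for rule in new_rules if key(rule) in existing_keys]


existing = [((("SituationCategory", "=", "Sitting"),), "Stretching")]
new = [
    ((("SituationCategory", "=", "Sitting"),), "Stretching"),
    ((("SituationCategory", "=", "Standing"),), "Walking"),
]
duplicates = find_duplicate_rules(new, existing)
```

The returned list is what would be handed to the external expert for review; non-duplicated rules proceed unchanged.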

The storage unit 500 may store the models and production rules generated by the model generator 100, the preprocessor 200, the converter 300, and the rule generator 400. More specifically, the storage unit 500 may include a classification model storage 510 that stores the classification model generated by the model generator 100, a preprocessed model storage 530 that stores the preprocessed model generated by the preprocessor 200, an XML model storage 550 that stores the XML model generated by the converter 300, and a production rule storage 570 that stores the production rules generated by the rule generator 400.

The knowledge database 600 may store the production rules verified by the production rule verifier 450.

Hereinafter, a knowledge database construction method according to an embodiment of the present invention is described with reference to FIGS. 3 to 6. In describing the method, detailed embodiments that overlap with the knowledge database construction system described above may be omitted.

도 3은 본 발명의 일 실시 예에 의한 지식 데이터베이스 구축 방법을 설명하기 위한 도면이다.3 is a view for explaining a knowledge database construction method according to an embodiment of the present invention.

도 3을 참조하면, 서버는 외부 데이터 세트 소스로부터 데이터 세트를 선택할 수 있다(S100). Referring to FIG. 3, the server may select a data set from an external data set source (S100).

데이터 세트가 선택되면, 서버는 데이터 세트에 대응되는 의사 결정 트리 알고리즘을 선택할 수 있다(S200). 의사 결정 트리 알고리즘은 BFTree, J48, RandomTree, REPTree 또는 SimpleCart 중 하나일 수 있다.When the data set is selected, the server may select a decision tree algorithm corresponding to the data set (S200). The decision tree algorithm may be one of BFTree, J48, RandomTree, REPTree or SimpleCart.

The server may generate a classification model by applying 10-fold cross-validation to the data set and the decision tree algorithm (S300).
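As a non-limiting illustration of step S300, the following Python sketch shows how 10-fold cross-validation partitions a data set so that each sample is used for testing exactly once. The function names (`k_fold_splits`, `cross_validate`, `fit_majority`, `predict_majority`) are hypothetical, and the majority-class learner is only a stand-in for whichever decision tree algorithm is selected in step S200. The sketch does not shuffle or stratify the data.

```python
from collections import Counter


def k_fold_splits(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(fit, predict, X, y, k=10):
    """Return the fraction of samples classified correctly across all k folds."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Stand-in learner (assumption): always predicts the majority training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(20))
y = ["a"] * 15 + ["b"] * 5
accuracy = cross_validate(fit_majority, predict_majority, X, y, k=10)
```

In practice, `fit` and `predict` would wrap the chosen decision tree learner, and the model trained on the full data set would be the one passed on to the extraction step.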

The server may extract the conditions and results contained in the generated classification model (S400). To do so, the server may apply text trimming, text splitting, and special character substitution to the classification model.

The server may generate an XML model using the conditions and results extracted from the classification model (S500).
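As a non-limiting illustration of step S500, the following Python sketch nests extracted conditions into an XML tree according to their depth in the decision tree. The element name `node`, the attribute names (`attribute`, `operator`, `value`, `result`), and the record layout `(depth, attribute, operator, value, result)` are assumptions made for this example; the specification does not fix an XML schema.

```python
import xml.etree.ElementTree as ET


def build_xml_model(records):
    """Build a nested XML tree from (depth, attribute, operator, value, result) records."""
    root = ET.Element("tree")
    stack = [root]  # stack[d] is the parent element for a record at depth d
    for depth, attr, op, val, result in records:
        del stack[depth + 1:]  # drop branches deeper than the current record
        node = ET.SubElement(stack[depth], "node",
                             attribute=attr, operator=op, value=val)
        if result is not None:
            node.set("result", result)  # leaf: carries the class label
        stack.append(node)
    return root


# Hypothetical records, in the order they appear in the tree's text form.
records = [
    (0, "outlook", "=", "sunny", None),
    (1, "humidity", "<=", "75", "yes"),
    (1, "humidity", ">", "75", "no"),
    (0, "outlook", "=", "overcast", "yes"),
]
root = build_xml_model(records)
xml_text = ET.tostring(root, encoding="unicode")
```

Here the operators are passed in raw form because `ElementTree` escapes `<` and `>` itself when serializing attribute values.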

When the XML model is generated, the server may extract detailed information from the XML model and generate a production rule (S600).

The server may verify and check the generated production rule and, if it is determined to be a valid production rule, store it in the knowledge database (S700).

FIG. 4 is a diagram illustrating a process of extracting the conditions and results required to generate an XML model according to an embodiment of the present invention.

The server may perform text trimming on the classification model to cut out the region of text that contains the conditions and results (S410), and may then split the cut-out text to separate the conditions from the results (S420).

The server may replace the special characters contained in the classification model with forms suitable for the XML model (S430), thereby extracting the conditions and results of the classification model (S440).
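As a non-limiting illustration of steps S410 to S440, the following Python sketch trims the header and footer from a textual decision tree, splits each remaining line into a condition and an optional result, and replaces XML special characters in the operator. The sample text resembles the text output of a J48-style tree, but its exact layout, the regular expression, and the function names are assumptions made for this example.

```python
import re
from xml.sax.saxutils import escape

SAMPLE_MODEL = """J48 pruned tree
------------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)

Number of Leaves  : 3

Size of the tree : 5
"""

# Leading "|" bars give the depth; an optional ": label" marks a leaf.
LINE = re.compile(
    r"^(?P<bars>(?:\|\s+)*)(?P<attr>\S+)\s*(?P<op><=|>=|!=|=|<|>)\s*"
    r"(?P<val>[^:]+?)(?::\s*(?P<result>\S+).*)?$"
)


def trim(model_text):
    """S410: keep only the lines between the header rule and the footer."""
    lines = model_text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.startswith("---")) + 1
    end = next(i for i, l in enumerate(lines) if l.startswith("Number of Leaves"))
    return [l for l in lines[start:end] if l.strip()]


def split_line(line):
    """S420/S430: split one line and escape XML special characters."""
    m = LINE.match(line)
    depth = m.group("bars").count("|")
    return (depth, m.group("attr"), escape(m.group("op")),
            m.group("val").strip(), m.group("result"))


def extract_conditions(model_text):
    """S440: return (depth, attribute, operator, value, result) records."""
    return [split_line(line) for line in trim(model_text)]


records = extract_conditions(SAMPLE_MODEL)
```

A record whose result is `None` is an internal branch; its children follow at the next depth.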

FIG. 5 is a diagram illustrating a process of generating a production rule according to an embodiment of the present invention.

The server may analyze the XML model using a DOM (Document Object Model) parser (S610). The DOM parser converts the XML model into an object tree so that the instances of the object tree can be accessed.

The server may examine the instances of the object tree, which represent the values of the XML model, and extract detailed information including attributes, operators, and values (S620).

The server may generate a production rule using the extracted detailed information (S630).
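As a non-limiting illustration of steps S610 to S630, the following Python sketch parses an XML model with the standard-library DOM parser, walks the resulting object tree to collect attribute, operator, and value triples, and emits one IF-THEN production rule per leaf. The XML schema (element `node` with attributes `attribute`, `operator`, `value`, `result`) and the rule text format are assumptions made for this example.

```python
from xml.dom import minidom, Node

XML_MODEL = """<tree>
  <node attribute="outlook" operator="=" value="sunny">
    <node attribute="humidity" operator="&lt;=" value="75" result="yes"/>
    <node attribute="humidity" operator="&gt;" value="75" result="no"/>
  </node>
  <node attribute="outlook" operator="=" value="overcast" result="yes"/>
</tree>"""


def rules_from_xml(xml_text):
    """Return a list of (conditions, result) pairs, one per leaf of the tree."""
    root = minidom.parseString(xml_text).documentElement  # S610: build object tree
    rules = []

    def walk(element, path):
        for child in element.childNodes:
            if child.nodeType != Node.ELEMENT_NODE:
                continue  # skip whitespace text nodes
            # S620: extract attribute, operator, and value from the instance.
            condition = (child.getAttribute("attribute"),
                         child.getAttribute("operator"),
                         child.getAttribute("value"))
            new_path = path + [condition]
            if child.hasAttribute("result"):
                rules.append((tuple(new_path), child.getAttribute("result")))
            walk(child, new_path)

    walk(root, [])
    return rules


def format_rule(rule):
    """S630: render one rule as 'IF c1 AND c2 ... THEN result'."""
    conditions, result = rule
    lhs = " AND ".join(f"{a} {op} {v}" for a, op, v in conditions)
    return f"IF {lhs} THEN {result}"


rules = rules_from_xml(XML_MODEL)
rule_texts = [format_rule(r) for r in rules]
```

The DOM parser decodes the escaped operators, so the generated rules contain `<=` and `>` rather than their XML entities.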

FIG. 6 is a diagram illustrating a process of verifying a production rule according to an embodiment of the present invention.

When the production rule is generated, the server may request an external expert to verify the production rule (S640). The external expert may determine whether the structure of the production rule is properly written.

When the production rule has been verified, the server may check whether the production rule operates normally (S650). Furthermore, the server may check the production rule by comparing it with the existing production rules stored in the knowledge database. If the server determines that the generated production rule overlaps with an existing production rule, it may notify the external expert of the overlapping rule (S660).
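As a non-limiting illustration of the comparison in steps S650 and S660, the following Python sketch finds newly generated rules that overlap with rules already in the knowledge database. The specification does not define what counts as an overlapping rule; this sketch assumes two rules overlap when they have the same set of conditions (in any order) and the same result. The function names and the rule representation are hypothetical.

```python
def normalize(rule):
    """Make a rule order-independent: (frozenset of conditions, result)."""
    conditions, result = rule
    return (frozenset(conditions), result)


def find_overlapping(new_rules, existing_rules):
    """Return the new rules that already exist in the knowledge database."""
    existing = {normalize(r) for r in existing_rules}
    return [r for r in new_rules if normalize(r) in existing]


existing_rules = [
    ((("outlook", "=", "sunny"), ("humidity", "<=", "75")), "yes"),
]
new_rules = [
    # Same conditions in a different order: treated as overlapping.
    ((("humidity", "<=", "75"), ("outlook", "=", "sunny")), "yes"),
    ((("outlook", "=", "overcast"),), "yes"),
]
overlapping = find_overlapping(new_rules, existing_rules)
```

The returned list is what would be reported to the external expert; rules not in the list could proceed to storage.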

The embodiments of the present invention disclosed in this specification and the drawings are merely specific examples presented to explain the technical content of the invention and to aid understanding of it, and are not intended to limit the scope of the invention. It will be apparent to those of ordinary skill in the art to which the invention pertains that other modifications based on the technical idea of the invention can be practiced in addition to the embodiments disclosed herein.

Claims

A method by which a server builds a rule-based knowledge database, the method comprising:
step a of selecting a data set from an external data set source;
step b of generating a classification model using a decision tree algorithm corresponding to the data set;
step c of extracting conditions and results from the classification model;
step d of generating an XML model using the conditions and results;
step e of extracting detailed information from the XML model;
step f of generating and verifying a production rule using the detailed information; and
repeating steps a through f to store production rules generated from one or more data sets in the knowledge database.

The method of claim 1,
wherein the decision tree algorithm comprises at least one of BFTree, J48, RandomTree, REPTree, or SimpleCart.

The method of claim 1,
wherein step c comprises
removing header information of the classification model and extracting the conditions and results by applying text trimming, text splitting, and special character substitution to the classification model.

The method of claim 1,
wherein step d comprises
converting the classification model into the XML model by performing operator arrangement, indentation, and file conversion on the classification model.

The method of claim 1,
wherein step e comprises:
analyzing the XML model using a DOM parser; and
generating the production rule, including attributes, operators, and values, using the analysis result of the XML model.

The method of claim 1,
wherein step f comprises:
having the production rule verified by an external expert; and
checking whether the production rule operates normally.

The method of claim 6,
wherein checking the production rule comprises:
checking the rule by comparing the production rule with an existing production rule previously stored in the knowledge database; and
providing the overlapping rule to the external expert if the production rule and the existing production rule contain an overlapping rule.

A system for building a rule-based knowledge database, the system comprising:
a model generation unit that selects a data set from an external data set source and generates a classification model by selecting a decision tree algorithm corresponding to the data set;
a preprocessing unit that extracts conditions and results from the classification model;
a conversion unit that generates an XML model using the conditions and results;
a rule generation unit that generates a production rule by extracting and analyzing detailed information of the XML model and verifies the production rule; and
a knowledge database that stores the production rule.

The system of claim 8,
wherein the decision tree algorithm comprises at least one of BFTree, J48, RandomTree, REPTree, or SimpleCart.

The system of claim 8,
wherein the preprocessing unit
removes header information of the classification model and extracts the conditions and results by applying text trimming, text splitting, and special character substitution to the classification model.

The system of claim 8,
wherein the conversion unit
generates the XML model by performing operator arrangement, indentation, and file conversion on the classification model.

The system of claim 8,
wherein the rule generation unit comprises:
an XML model analysis unit that analyzes the XML model using a DOM parser; and
a production rule generation unit that generates a production rule, including attributes, operators, and values, using the analysis result of the XML model.

The system of claim 8,
wherein the rule generation unit comprises a production rule verification unit including:
a rule verification unit that has the production rule verified by an external expert; and
a rule checking unit that checks whether the production rule is correctly constructed.

The system of claim 13,
wherein the rule checking unit
checks the rule by comparing the production rule with a previously stored existing production rule and,
if the production rule and the existing production rule contain an overlapping rule, provides the overlapping rule to the external expert.