KR20110017297A - Method and apparatus for mapping the heterogeneous classification systems

Info

Publication number: KR20110017297A
Application number: KR1020090074891A
Authority: KR
Inventors: 박형근; 김건오
Original assignee: 주식회사 솔트룩스
Priority date: 2009-08-13
Filing date: 2009-08-13
Publication date: 2011-02-21
Also published as: KR101088483B1

Abstract

PURPOSE: A method and an apparatus for mapping heterogeneous classification systems are provided so that specific items can be searched and compared simultaneously across heterogeneous classification systems, and so that useful information can be discovered in past classification systems. CONSTITUTION: At least one keyword is extracted from a unit classification item of a first classification system (610). The extracted keyword is compared with a unit classification item of a second classification system to calculate the similarity between the two unit classification items (620). Based on the calculated similarity, the unit classification item of the first classification system is linked to the corresponding unit classification item of the second classification system and output (630).

Description

Method and apparatus for mapping the heterogeneous classification systems

The present invention relates to a method and apparatus for mapping heterogeneous classification systems, and more particularly to a classification system mapping method and apparatus that detects similarity by comparing heterogeneous classification systems, each including a plurality of classification items, and links the classification systems to one another around the similar elements, thereby discovering associations between the heterogeneous classification systems.

In general, a classification system is a scheme that structures and represents a plurality of target documents or data using classification symbols and corresponding terms. Such systems are widely used in science and technology, library science, and other fields to classify information according to specific categories or criteria so that it can be retrieved easily. A wide range of information that can be classified, such as documents or data, may be the object of such a classification system.

National records can also be the object of such a classification system. The classification criteria table, one kind of classification system, has changed from era to era as the government has been reorganized. For example, the 'records management criteria table' (기록관리기준표), the 'records classification criteria table' (기록분류기준표), and the 'official document classification number table' (공문서분류번호표) have each served as the classification system following such changes. A classification criteria table is called 'current' (현용) when it is used by the present government organization, 'non-current' (비현용) when it is a past table that is no longer used, and 'semi-current' (준현용) when it is in transition between 'current' and 'non-current'. Records newly created by a government agency will naturally be classified according to the current classification criteria table. When similar records from the past must be searched, however, the semi-current and non-current classification criteria tables, whose structure differs from that of the current table, make the search difficult.

When such varied classification systems are widely used, the differences in structure between them can make information retrieval difficult, and the more information accumulates, the greater the demand and need for information compatibility and mapping between heterogeneous classification systems inevitably becomes.

The technical problem to be solved by the present invention is that information is not mutually compatible because heterogeneous classification systems differ in structure. The invention further aims to overcome two limitations: that heterogeneous classification systems cannot be compared and searched simultaneously when the information in all of them must be consulted, and that there is no means of discovering the interrelationships between heterogeneous classification systems.

To solve the above technical problem, a method according to the present invention for mapping different classification systems, each including a plurality of classification items, comprises: extracting at least one keyword from a unit classification item of a first classification system among the classification systems; comparing the extracted keyword with a unit classification item of a second classification system among the classification systems to calculate the similarity between the unit classification item of the first classification system and the unit classification item of the second classification system; and, based on the calculated similarity, linking the unit classification item of the first classification system to the corresponding unit classification item of the second classification system and outputting the result.

In addition, a computer-readable recording medium is provided on which a program for executing the above-described method on a computer is recorded.

To solve the above technical problem, an apparatus according to the present invention for mapping different classification systems, each including a plurality of classification items, comprises: a keyword extraction unit that extracts at least one keyword from a unit classification item of a first classification system among the classification systems; a similarity calculation unit that compares the extracted keyword with a unit classification item of a second classification system among the classification systems to calculate the similarity between the unit classification item of the first classification system and the unit classification item of the second classification system; and an output unit that, based on the calculated similarity, links the unit classification item of the first classification system to the corresponding unit classification item of the second classification system and outputs the result.

When the information in heterogeneous classification systems with different structures must all be consulted, the present invention discovers the interrelationships between those systems and links them. This improves the compatibility of information across heterogeneous classification systems, allows specific items to be compared and searched simultaneously across all of them, and makes it possible to discover and use useful information from past classification systems.

Before the various embodiments of the present invention are described in detail, the basic problem situation that the embodiments address and the idea for solving it are outlined below.

FIG. 1 illustrates a situation in which heterogeneous classification systems are to be mapped, and is described using the classification criteria tables for national records mentioned above. Assume that classification system 1 (10) in FIG. 1 is a current classification system and classification system 2 (20) is a non-current classification system.

A newly created document A will be classified according to the criteria of classification system 1 (10), the current system. If a user wishes to view past records related to document A, classification system 2 (20) must be searched. However, the documents and other data stored under classification system 2 (20) may not be searchable by index terms, and the stored data may be of a type to which index terms cannot be assigned. In such cases, finding records related to document A in classification system 2 (20) can be difficult in practice.

Accordingly, to resolve this difficulty, the embodiments described below select specific items from the heterogeneous classification systems (10, 20), compare them, and link them according to the similarity between the compared items. FIG. 1 shows the unit task (단위과제, 11) of classification system 1 (10) being matched with the unit work (단위업무, 21) of classification system 2 (20). The compared items, the unit task (11) and the unit work (21), must of course be items of a kind that can form a correspondence. In FIG. 1, the unit task (11) and the unit work (21) are each the lowest-level classification item of their respective classification systems (10, 20).

The items to be compared need not be the lowest-level classification items of their classification systems, but they are preferably unit classification items. Here, a unit classification item means a single classification criterion among the plurality of classification criteria that make up a classification system. For example, the unit task (11) in FIG. 1 is both a lowest-level classification item and a unit classification item. The middle function (중기능), by contrast, is not a lowest-level classification item but is still a unit classification item. Thus, unlike what FIG. 1 shows, the middle function of classification system 1 (10) could instead be linked to the middle function of classification system 2 (20). The classification items to be linked can be chosen to suit the situation of the user who wishes to use the heterogeneous classification systems.

FIG. 2 illustrates the result of mapping the heterogeneous classification systems of FIG. 1. As assumed above, it shows a newly mapped classification system (30) formed by linking the unit task (11) of classification system 1 (10) with the unit work (21) of classification system 2 (20). In FIG. 2, classification system 1 (10) and classification system 2 (20) of FIG. 1 are linked around an item called the mapping type (31). The mapping type (31) is identification information assigned so that the relationship between the linked heterogeneous classification systems can be identified. It may be assigned as a number within a specific range according to the linking relationship, or managed more flexibly by assigning an identification code to each range of the degree of association. For example, the degree of association may be given as a natural number from 0 to 10 inclusive, or a label such as 'match', 'similar', or 'related' may be assigned according to the degree of association. In the latter case, letter codes such as 'A', 'B', and 'C' may be assigned for ease of management.
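As a rough illustration of a mapped entry, the sketch below stores one linked pair of unit classification items together with its mapping type. The class name `MappedPair`, the field names, the sample item names, and the pairing of the labels 'match', 'similar', 'related' ('일치', '유사', '관련') with the codes 'A', 'B', 'C' are all assumptions made for this example; the text only states that such labels and codes may be used.

```python
from dataclasses import dataclass

# Assumed pairing of labels to letter codes (hypothetical; not specified in the text).
LABEL_TO_CODE = {"match": "A", "similar": "B", "related": "C"}


@dataclass(frozen=True)
class MappedPair:
    """One entry of a mapped classification system."""
    system1_item: str   # unit classification item from classification system 1
    system2_item: str   # unit classification item from classification system 2
    mapping_type: str   # identification code describing the relationship


def make_pair(system1_item: str, system2_item: str, label: str) -> MappedPair:
    """Link two unit classification items and attach the code for the label."""
    return MappedPair(system1_item, system2_item, LABEL_TO_CODE[label])


# Hypothetical item names, for illustration only.
pair = make_pair("Personnel appointment", "Appointment of staff", "similar")
print(pair)
```

Storing a short code rather than the raw degree of association keeps the mapped table easy to filter, at the cost of losing the underlying number unless it is stored separately.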

Various embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 3 shows an apparatus for mapping different classification systems, each including a plurality of classification items, according to one embodiment of the present invention. The classification systems (10, 20) are given as input to the heterogeneous classification system mapping apparatus (40), which outputs the mapped classification system (30). The heterogeneous classification system mapping apparatus (40) includes an input unit (41), a keyword extraction unit (42), a similarity calculation unit (43), and an output unit (44).

The input unit (41) receives the different classification systems (10, 20), each including a plurality of classification items. Here, 'input' covers both data supplied in electronic form and information read from a recording medium or a mass storage device.

The keyword extraction unit (42) extracts at least one keyword from a unit classification item of the first classification system among the heterogeneous classification systems. As explained with reference to FIG. 1, a unit classification item is a single classification criterion among the plurality of classification items included in classification system 1 (10). The keyword extraction unit (42) analyzes the content of the unit classification item and extracts from it keywords for comparison with classification system 2 (20). The method of analyzing the content of a unit classification item therefore depends on what kind of information is defined as a keyword. In particular, because keywords are extracted from a body of vocabulary, the method is inevitably affected by the language.

Keywords may be defined in various ways depending on the field in which the heterogeneous classification systems are used, but they are preferably morphemes, the basic meaningful units of language. If keywords were compared and analyzed character by character, inappropriate vocabulary could end up among the keywords used to judge similarity. This risk is especially high for Korean because of its structure: compound nouns are common, endings inflect, and particles attach to words without spacing. Extracting keywords from the unit classification item on a morpheme basis therefore improves the performance of the keyword extraction unit (42).

Accordingly, the keyword extraction unit (42) may be implemented as a natural language processor (more precisely, a morphological analyzer) using any of the various algorithms widely used in the natural language processing field of computer science. Morphological analysis is the first stage of natural language analysis: it recognizes each morpheme that makes up a word (an eojeol, in the case of Korean) and restores the base form where irregular conjugation, contraction, or elision has occurred. In particular, because Hangul uses multibyte codes, a morphological analyzer that works differently from one for English, which uses ASCII codes, may be used.

The morpheme-based keyword extraction unit (42) basically performs the following functions. First, it identifies the type of the character string and splits the text into eojeols (space-delimited word units). It then analyzes the morphemes of each eojeol. For this step it may use a 'dictionary method', which stores all morpheme vocabulary and certain rules and looks them up to return a result; a 'rule method', which returns a result through rule operations such as multiple automata without a separate dictionary; or a 'hybrid method', which combines the two as appropriate. If an eojeol is a compound noun, splitting it into single nouns may yield more useful keywords. Through these functions, the keyword extraction unit (42) extracts refined keywords from the unit classification items of classification system 1 (10) supplied to the input unit (41).
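The steps above (split into eojeols, analyze each one, decompose compound nouns) can be sketched with a toy dictionary-based approach. This is not the analyzer used in the patent, which is not specified: the particle list, the noun dictionary, and all function names below are hypothetical, and a real morphological analyzer handles far more cases (for example, nouns that happen to end in a particle-like syllable).

```python
# Toy data: every entry below is a hypothetical example, not a real lexicon.
PARTICLES = ("에서", "으로", "을", "를", "의", "에")   # longer forms listed first
NOUN_DICT = {"기록", "관리", "문서", "분류", "기준"}


def strip_particle(eojeol: str) -> str:
    """Remove one trailing particle, if the eojeol ends with a known one."""
    for particle in PARTICLES:
        if eojeol.endswith(particle) and len(eojeol) > len(particle):
            return eojeol[:-len(particle)]
    return eojeol


def split_compound(word: str) -> list:
    """Greedy longest-match split into dictionary nouns.

    Returns [word] unchanged when the word cannot be fully decomposed.
    """
    parts, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in NOUN_DICT:
                parts.append(word[start:end])
                start = end
                break
        else:
            return [word]
    return parts


def extract_keywords(text: str) -> list:
    """Split text into eojeols, strip particles, and split compound nouns."""
    keywords = []
    for eojeol in text.split():
        for keyword in split_compound(strip_particle(eojeol)):
            if keyword not in keywords:      # keep first occurrence only
                keywords.append(keyword)
    return keywords


print(extract_keywords("기록관리 기준의 분류"))
```

Here the compound '기록관리' is split into '기록' and '관리', and the particle '의' is removed from '기준의', so the comparison step works on single nouns rather than surface strings.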

In terms of physical configuration, the keyword extraction unit (42) may be implemented with a processor and working memory in an ordinary computing environment and, where necessary, may include a recording medium that stores dictionary data. The keyword extraction unit (42) also includes software code that extracts meaningful keywords from the input unit classification item of the first classification system and provides them to the similarity calculation unit (43).

The similarity calculation unit (43) compares the keywords extracted by the keyword extraction unit (42) with a unit classification item of the second classification system among the heterogeneous classification systems, and calculates the similarity between the unit classification item of the first classification system and the unit classification item of the second classification system. Naturally, the classification items that make up the first and second classification systems differ. Equally, the unit classification items of the first and second classification systems are items of a kind that can correspond to each other.

The similarity is a value defined in advance so that the user can be given a criterion for judging how similar the two compared unit classification items are. It is therefore preferably an arithmetic value calculated by comparing the extracted keywords with the unit classification item of the second classification system. The similarity may be calculated by taking into account various factors, such as the number of times or the frequency with which the extracted keywords appear in the unit classification item of the second classification system, and the similarity between character strings. The similarity differs somewhat from the mapping type described with reference to FIG. 2. The similarity is a value that directly expresses, through computation, the degree of similarity to the compared item, whereas the mapping type is a value assigned on the basis of that similarity so that the user can identify the relationship easily. The similarity value itself may therefore serve directly as the mapping type, or an identifier derived by further computation on the similarity value may serve as the mapping type.

In terms of physical configuration, the similarity calculation unit (43) may likewise be implemented with a processor and working memory in an ordinary computing environment, and it includes software code that compares the extracted keywords with the unit classification item of the second classification system and performs the calculation.

The similarity calculation unit (43) may calculate the number of times the extracted keywords are contained in the unit classification item of the second classification system, and calculate the similarity value based on the ratio between that count and the number of extracted keywords. This approach keeps the similarity calculation simple to implement and offers a more objective criterion, which makes it highly reliable. A more specific method of calculating the similarity value from keyword counts is described later with reference to FIG. 5.
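One simple reading of this ratio is the fraction of extracted keywords that occur in the text of the second system's unit classification item. The sketch below implements that reading only; the exact formula is the subject of FIG. 5, which is outside this passage, so the function name `similarity`, the use of plain substring matching, and the sample strings are assumptions.

```python
def similarity(keywords, target_item_text):
    """Fraction of keywords found in the target item text (0.0 to 1.0).

    Assumption: a keyword counts once if it occurs anywhere in the text.
    """
    if not keywords:
        return 0.0
    matched = sum(1 for keyword in keywords if keyword in target_item_text)
    return matched / len(keywords)


# Hypothetical example: two of the three keywords occur in the target item.
score = similarity(["기록", "관리", "기준"], "기록물 관리")
print(round(score, 3))
```

Because the denominator is the number of extracted keywords, the score is not symmetric: swapping the roles of the two classification systems can give a different value.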

The output unit (44) links the unit classification item of the first classification system to the corresponding unit classification item of the second classification system based on the similarity calculated by the similarity calculation unit (43), and outputs the result. That is, it generates the mapped classification system (30) as its output. Here, 'output' should be understood to include not only providing electronic data but also recording the newly mapped classification system (30) on a particular recording device. The output unit (44) may therefore include a kind of database or repository, and may be implemented as software code that carries out instructions to store data readable or storable by a computer system on a hard disk drive (HDD) or other mass data storage means.

According to this embodiment, when the information in heterogeneous classification systems with different structures must all be consulted, the similarity between those systems is calculated and the systems are linked around their unit classification items. This improves the compatibility of information between the heterogeneous classification systems and allows specific items to be compared and searched simultaneously across all of them.

FIG. 4 shows an apparatus according to another embodiment of the present invention that maps different classification systems, each including a plurality of classification items, using similarity intervals and an ontology. It adds a control unit (45) and an ontology storage unit (46) to the configuration of FIG. 3. The description below focuses on the elements that differ from FIG. 3.

The control unit (45) sets one or more similarity intervals in advance so that the degree of similarity between the unit classification item of the first classification system and the unit classification item of the second classification system can be identified, and determines the mapping type by comparing the similarity value calculated by the similarity calculation unit (43) with the preset similarity intervals. As described with reference to FIG. 2, the mapping type is identification information assigned so that the relationship between the linked heterogeneous classification systems can be identified. In this embodiment, similarity intervals are set in advance, and the control unit determines which of those intervals the similarity value from the similarity calculation unit (43) falls into. The mapping type is finally determined by assigning the identifier that represents the interval to which the similarity value belongs. For example, as described with reference to FIG. 2, the mapping type may be determined as 'match', 'similar', or 'related' according to the degree of similarity.
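The interval comparison can be sketched as a lookup over preset lower bounds. The threshold values 0.8, 0.5, and 0.2 are hypothetical and chosen only for illustration; the text does not give concrete interval boundaries, and returning `None` for values below every interval is likewise an assumption.

```python
# Hypothetical similarity intervals as (lower bound, label), highest first.
SIMILARITY_INTERVALS = [
    (0.8, "match"),
    (0.5, "similar"),
    (0.2, "related"),
]


def determine_mapping_type(similarity_value, intervals=SIMILARITY_INTERVALS):
    """Return the label of the first interval whose lower bound is met.

    Returns None when the value falls below every interval, meaning
    the two items are left unlinked in this sketch.
    """
    for lower_bound, label in intervals:
        if similarity_value >= lower_bound:
            return label
    return None


for value in (0.9, 0.6, 0.3, 0.1):
    print(value, determine_mapping_type(value))
```

Keeping the intervals in a separate table means the boundaries can be tuned per domain without changing the similarity calculation itself.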

In terms of physical configuration, the control unit (45) may be implemented with a processor and working memory in an ordinary computing environment, and it includes software code that compares the preset similarity intervals with the calculated similarity value.

The keywords preferably include similar vocabulary expanded from the vocabulary extracted from the unit classification item of the first classification system. As described above, the similarity calculation unit (43) calculates the similarity by comparing the keywords extracted from classification system 1 (10) by the keyword extraction unit (42) with the unit classification item of classification system 2 (20). Fundamentally, therefore, what is compared is the unit classification item of classification system 1 (10) and the unit classification item of classification system 2 (20). If these two unit classification items are semantically similar but written with different wording, the similarity calculation unit (43) is unfortunately likely to produce a low similarity value, because the similarity a human perceives through vocabulary differs from the similarity a machine calculates through arithmetic analysis. For example, the Korean words '수락' (acceptance) and '용인' (acquiescence) are likely to yield a very low similarity value if analyzed by their characters alone, whereas a human reading these two words would likely judge them to be fairly similar.

To solve this problem, this embodiment finds similar vocabulary for the extracted vocabulary and includes it among the keywords, thereby expanding the set of keywords to be compared. It further uses ontology technology to find this expanded similar vocabulary. Before the configuration of the present invention that realizes this function is described, ontology technology is briefly explained below.
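A minimal sketch of keyword expansion follows. A plain dictionary of similar words stands in for the ontology lookup, which is a deliberate simplification; the entries in `SIMILAR_WORDS` (including '승낙') and the function name `expand_keywords` are hypothetical.

```python
# Hypothetical similar-word table standing in for an ontology lookup.
SIMILAR_WORDS = {
    "수락": {"용인", "승낙"},
}


def expand_keywords(keywords, similar_words=SIMILAR_WORDS):
    """Return the keywords followed by any similar words not already present."""
    expanded = list(keywords)
    for keyword in keywords:
        # sorted() only makes the output order deterministic.
        for similar in sorted(similar_words.get(keyword, ())):
            if similar not in expanded:
                expanded.append(similar)
    return expanded


expanded = expand_keywords(["수락", "문서"])
print(expanded)
```

With the expanded list, an item in the second classification system that says '용인' can still match a first-system item that says '수락', which plain string comparison would miss.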

For people and machines to communicate smoothly, machines must be able to understand language at the level at which people understand it. The way people understand language is usually called conceptualization: as people experience the objects and events of the world, they grasp the features these contain and conceptualize them in language. The technology that similarly builds, in a computer, something corresponding to human concepts in the form of a kind of database is called ontology technology. In other words, an ontology is a shared, abstracted model of how people think about things, one that is formalized and in which the types of concepts and the constraints on their use are explicitly defined.

In computer science, an ontology is a model that expresses, in a conceptual and computer-processable form, what people have agreed upon through discussion about what they see, hear, feel, and think about the world. It is generally defined as a set of formal vocabulary that describes the concepts belonging to a specific domain and the relations between those concepts. Because an ontology represents agreed-upon knowledge, it is not confined to any individual but is a concept to which all members of a group (that is, a specific domain) agree. Formalization is required because programs must be able to understand it. In particular, an ontology is used as a tool for semantically connecting knowledge concepts, and it allows a computer to process the concepts that people hold about things in the form of a kind of database.

An ontology, being a set of vocabulary described in a formal language, is used for reasoning (inference). In this connection, semantic web technology has emerged. The semantic web is a framework and technology that, in a distributed environment such as today's Internet, expresses information about resources (web documents, files, services, and so on) and the relation and meaning information (semanteme) between resources in an ontology form that a machine, that is, a computer, can process, and that lets automated machines process it. In other words, an ontology is a tool for implementing the semantic web and for semantically connecting knowledge concepts.

The components of an ontology can be divided into classes, instances, relations, and properties. A class can generally be described as the name we attach to a thing or concept. "Keyboard", "monitor", and "love" are all classes. An instance, by contrast, is the thing or concept itself as it appears in a concrete form such as a specific object or event. Thus "LG Electronics ST-500 Slim Keyboard", "Samsung SyncMaster Wide LCD Monitor", and "the love of Romeo and Juliet" are generally instances. The distinction between classes and instances can vary considerably with the application and the purpose of use. That is, an entity with the same expression can be a class in one case and an instance in another.

Relations are the relationships that exist between classes and instances, and can generally be divided into taxonomic relations and non-taxonomic relations. A taxonomic relation expresses classes and instances hierarchically, dividing them into broader and more specific concepts in order to classify them. An example is the "isA" relation, which expresses inclusion between concepts, as in "a person is an animal". A relation that is not taxonomic is called a non-taxonomic relation. For example, "exercise makes one healthy" is expressed using the "cause" relation (causality).

A property links a class or instance to a specific value in order to express a particular characteristic or tendency of that class or instance. For example, to express that "the Samsung SyncMaster Wide LCD monitor is 24 inches", a property such as hasSize can be defined.
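The four components described above can be pictured with a small data structure. The following is only a minimal sketch: the `Ontology` class, its method names, and its storage layout are hypothetical and are not taken from this document; only the example values ("isA", "cause", hasSize) come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container for classes, instances, relations, and properties (hypothetical)."""
    classes: set = field(default_factory=set)
    instances: dict = field(default_factory=dict)    # instance name -> class name
    relations: set = field(default_factory=set)      # (subject, relation, object) triples
    properties: dict = field(default_factory=dict)   # (subject, property) -> value

    def add_instance(self, name, class_name):
        self.classes.add(class_name)
        self.instances[name] = class_name

    def relate(self, subject, relation, obj):
        self.relations.add((subject, relation, obj))

    def set_property(self, subject, prop, value):
        self.properties[(subject, prop)] = value

    def is_a(self, child, ancestor):
        """Follow taxonomic "isA" links upward from child until ancestor is reached."""
        seen, frontier = set(), [child]
        while frontier:
            node = frontier.pop()
            if node == ancestor:
                return True
            if node in seen:
                continue
            seen.add(node)
            frontier.extend(o for s, r, o in self.relations
                            if s == node and r == "isA")
        return False


onto = Ontology()
onto.classes.update({"person", "animal", "monitor"})
onto.relate("person", "isA", "animal")        # taxonomic relation
onto.relate("exercise", "cause", "health")    # non-taxonomic relation
onto.add_instance("Samsung SyncMaster Wide LCD Monitor", "monitor")
onto.set_property("Samsung SyncMaster Wide LCD Monitor", "hasSize", "24 inches")
```

Storing relations as triples keeps taxonomic and non-taxonomic relations in one place, so a lookup only has to filter on the relation name.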

Returning to FIG. 4, the ontology storage unit 46 stores an ontology defined by using the vocabulary extracted from the unit classification item of the first classification system as objects. That is, each vocabulary item can be defined and stored as a class of the ontology described above. When a number of vocabulary items are stored as an ontology, relations arise between the classes, and owing to the nature of the vocabulary domain, relations such as broader concept, narrower concept, synonym, near-synonym, and antonym are formed. Using these relations, classes that are synonyms or near-synonyms of an extracted keyword can be found. Including the synonyms or near-synonyms found in this way in the keywords enriches and expands the vocabulary of the extracted keywords.
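The expansion step described above can be sketched as a lookup over stored word relations. This is an illustrative sketch only: the `RELATIONS` table, the relation names, and the function `expand_keywords` are assumptions made for the example, and the word pairs are sample data rather than contents of the ontology storage unit 46.

```python
# Hypothetical relation table: (word, relation, word). Entries are illustrative only.
RELATIONS = [
    ("수락", "synonym", "승낙"),
    ("수락", "near_synonym", "용인"),
    ("수락", "antonym", "거절"),
]


def expand_keywords(keywords, relations=RELATIONS,
                    allowed=("synonym", "near_synonym")):
    """Return the keywords plus every word linked to them by an allowed relation.

    Relations are treated as symmetric; antonyms and hierarchy links are skipped.
    """
    expanded = list(keywords)
    for word in keywords:
        for left, relation, right in relations:
            if relation not in allowed:
                continue
            if left == word:
                other = right
            elif right == word:
                other = left
            else:
                continue
            if other not in expanded:
                expanded.append(other)
    return expanded


print(expand_keywords(["수락"]))
```

Only one hop of relations is followed here; whether expansion should be applied transitively is a design choice that the description leaves open.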

The ontology storage unit 46 may be implemented as a kind of database or repository that physically stores digitized ontology information. In this case, the keyword extraction unit 42 includes software code that executes instructions for storing, in the ontology storage unit 46, ontology data that a computer system can read or store.

According to the present embodiment, a similarity value provided as an arithmetic figure is assigned a mapping type, which is a representative identifier, by means of preset similarity intervals, so that mapping information that is easier to identify and more concise can be provided. In addition, in situations where information from heterogeneous classification systems with different structures must all be consulted, finding expanded similar vocabulary and including it in the keywords improves both the compatibility of information between the heterogeneous classification systems and the performance of information retrieval.

FIG. 5 illustrates a mapping method for mapping different classification systems including a plurality of classification items according to an embodiment of the present invention. It proposes a simple method of operating on the heterogeneous classification systems (10, 20) input to the heterogeneous classification system mapping apparatus 50 and outputting a mapped classification system.

First, N keywords are extracted from the unit classification items of classification system 1 (10) (51). Next, a search is performed to determine how many of the N extracted keywords are present in a unit classification item of classification system 2 (20) (52). A simple search for the N keywords returns between 0 and N hits. Accordingly, the similarity can be defined, as in Equation 1 below, as the ratio of the number of keywords found to the total number of keywords extracted.

Accordingly, under the definition of Equation 1, the similarity takes a value between 0 and 1.
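The ratio described for Equation 1 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `keyword_similarity` is hypothetical, a keyword is considered found when it occurs as a substring of the target item, and an empty keyword list is given a similarity of 0.0, a case the description does not address.

```python
def keyword_similarity(keywords, target_item):
    """Ratio of keywords found in target_item to the total number of keywords.

    Assumption: "found" means the keyword occurs as a substring of target_item.
    """
    if not keywords:
        return 0.0  # assumption: the description does not define this case
    found = sum(1 for keyword in keywords if keyword in target_item)
    return found / len(keywords)


# Two of the three keywords occur in the target item, so the ratio is 2/3.
score = keyword_similarity(["record", "management", "policy"],
                           "record management standards")
```

Because the numerator can never exceed the number of keywords, the result always lies between 0 and 1, as the text states.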

Meanwhile, one or more similarity intervals are set in advance so that the degree of similarity between the unit classification item of the first classification system and the unit classification item of the second classification system can be identified. The similarity value calculated through Equation 1 is then compared with the preset similarity intervals, and the mapping type is determined by judging which similarity interval the calculated similarity falls in. The similarity intervals may be defined as in Equation 2 below.
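Determining a mapping type from preset intervals amounts to locating the similarity value among a sorted list of boundaries. The sketch below is illustrative only: the boundary values 0.5 and 1.0, the pairing of each interval with a type name, and the function name `mapping_type` are assumptions, since the interval definitions of Equation 2 are not reproduced in this text.

```python
import bisect

# Hypothetical interval boundaries and labels; not the values of Equation 2.
BOUNDS = [0.5, 1.0]
TYPES = ["related", "similar", "match"]   # [0, 0.5), [0.5, 1.0), exactly 1.0


def mapping_type(similarity, bounds=BOUNDS, types=TYPES):
    """Return the label of the similarity interval that contains the value."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    return types[bisect.bisect_right(bounds, similarity)]
```

Keeping the boundaries in a list makes the intervals configurable, which fits the statement that the intervals are set in advance.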

Meanwhile, the embodiment of FIG. 5 illustrates a method of determining the mapping type without using the similarity value. That is, the mapping type 'match' is assigned when all N keywords are found, 'similar' when at least 2 and at most N-1 keywords are found, and 'related' when 1 or fewer keywords are found.
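The count-based rule of FIG. 5 can be written directly as a chain of comparisons. This is a sketch, not the apparatus's implementation: the function name `mapping_type_by_count` is hypothetical, and checking 'match' first is an assumption made to resolve the overlap when N is 1 (one keyword found is both "all N" and "1 or fewer").

```python
def mapping_type_by_count(found, total):
    """Assign a mapping type from the number of keywords found out of total."""
    if total < 1 or not 0 <= found <= total:
        raise ValueError("require total >= 1 and 0 <= found <= total")
    if found == total:       # all N keywords found (checked first by assumption)
        return "match"
    if found >= 2:           # between 2 and N-1 keywords found
        return "similar"
    return "related"         # 1 or fewer keywords found, including none
```

As stated in the text, finding no keyword at all still yields 'related'; a system that should leave such pairs unlinked would need an additional check.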

The equations introduced above and the mapping method of FIG. 5 are presented only as examples; various other similarity calculation methods and mapping type determination methods may be proposed while maintaining the technical idea of the present invention. In addition, as described above with reference to FIG. 4, the method of including expanded similar vocabulary obtained through an ontology in the keywords may of course also be used.

FIG. 6 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items according to an embodiment of the present invention, and includes the following steps.

In step 610, at least one keyword is extracted from a unit classification item of a first classification system belonging to heterogeneous classification systems that include a plurality of classification items. This step corresponds to the keyword extraction unit 42 of FIG. 3 described above, so a detailed description is omitted.

In step 620, the keywords extracted in step 610 are compared with a unit classification item of a second classification system belonging to the heterogeneous classification systems, and the similarity between the unit classification item of the first classification system and the unit classification item of the second classification system is calculated. This step corresponds to the similarity calculator 43 of FIG. 3 described above, so a detailed description is omitted.

In step 630, based on the similarity calculated in step 620, the unit classification item of the first classification system and the corresponding unit classification item of the second classification system are connected and output. This step corresponds to the output unit 44 of FIG. 3 described above, so a detailed description is omitted.
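Steps 610 to 630 can be strung together as a small pipeline. The following is only a sketch under several assumptions: keywords are taken to be whitespace-separated tokens (the actual extraction performed by the keyword extraction unit 42 is not specified here), the "corresponding" item is taken to be the highest-scoring item with a nonzero similarity, and all function names and sample items are hypothetical.

```python
def extract_keywords(item):
    """Step 610 (stand-in): split the item into whitespace-separated tokens."""
    return item.split()


def similarity(keywords, target_item):
    """Step 620: ratio of keywords that appear as tokens of target_item."""
    if not keywords:
        return 0.0
    tokens = set(target_item.split())
    return sum(1 for keyword in keywords if keyword in tokens) / len(keywords)


def map_systems(system1, system2):
    """Step 630: link each item of system1 to its best-scoring item of system2."""
    links = []
    for item1 in system1:
        keywords = extract_keywords(item1)
        scored = [(similarity(keywords, item2), item2) for item2 in system2]
        if not scored:
            continue
        best_score, best_item = max(scored, key=lambda pair: pair[0])
        if best_score > 0:   # assumption: unrelated items are left unlinked
            links.append((item1, best_item, best_score))
    return links


links = map_systems(
    ["budget planning", "personnel recruitment"],
    ["annual budget planning and review", "recruitment of personnel",
     "facility maintenance"],
)
```

Each tuple in `links` pairs an item of the first system with one item of the second system and the similarity that justified the link.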

FIG. 7 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items by determining a mapping type using similarity intervals according to another embodiment of the present invention. The following description focuses on the steps that differ from the method of FIG. 6.

Steps 621 and 622 detail the similarity calculation step 620 of FIG. 6. In step 621, the number of times the extracted keywords are included in the unit classification item of the second classification system is calculated. In step 622, a similarity value is calculated based on the ratio of the calculated number of times to the number of extracted keywords.

Meanwhile, in step 623, one or more similarity intervals are set in advance so that the degree of similarity between the unit classification item of the first classification system and the unit classification item of the second classification system can be identified. The similarity intervals are preferably determined in consideration of the range of the calculated similarity values so that they are convenient for the user.

Once the similarity intervals have been set, in step 625 the mapping type is determined by comparing the similarity value calculated in step 622 with the similarity intervals set in step 623. This step corresponds to the control unit 45 of FIG. 4 described above, so a detailed description is omitted.

FIG. 8 illustrates a result of mapping heterogeneous classification systems using mapping types according to another embodiment of the present invention. It assumes, as the heterogeneous classification systems, the '기록관리기준표' (Records Management Reference Table; Business Reference Model: BRM), which is the classification system for current national records, and the '기록분류기준표' (Records Classification Reference Table), which is the classification system for non-current national records. The type 81 in FIG. 8 denotes the mapping type, and the mapping keyword 82 denotes the keywords extracted by the keyword extraction unit. In FIG. 8, the classification items to the left of the type 81 and the mapping keyword 82 are those of the current classification system, and the classification items to the right are those of the non-current classification system. The unit classification items being compared are 'BRM_단위과제' (BRM unit task) and '기록분류기준_단위업무명' (record classification criteria unit business name), respectively.

In FIG. 8, the mapping keywords 82 are keywords extracted from '기록분류기준_단위업무명'. The type 81 is determined by checking how many of these keywords are contained in 'BRM_단위과제'. FIG. 8 shows that the types 81 named 'match', 'similar', and 'related' are assigned according to the number of keywords found. It also shows that the heterogeneous classification systems are connected according to these mapping types to form a single mapped classification system.

FIG. 9 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items by extracting expanded keywords using an ontology according to yet another embodiment of the present invention. The following description focuses on the steps that differ from the method of FIG. 6.

In step 615, an ontology is defined by using the vocabulary extracted from the unit classification item of the first classification system as objects.

In step 627, keywords that include similar vocabulary expanded by using the relations between the ontology objects defined in step 615 are compared with the unit classification item of the second classification system. From this comparison, the similarity between the unit classification item of the first classification system and the unit classification item of the second classification system is calculated.
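One way to picture step 627 is to let a keyword count as found when either the keyword itself or one of its similar words occurs in the target item. This is a sketch under assumptions: the description does not say how expanded vocabulary enters the ratio, so counting per original keyword (which keeps the result between 0 and 1) is a choice made here, and the function name, the `similar_words` mapping, and the sample item are hypothetical.

```python
def expanded_similarity(keywords, similar_words, target_item):
    """Similarity in which a keyword matches through any of its similar words.

    similar_words maps a keyword to an iterable of its synonyms or near-synonyms.
    """
    if not keywords:
        return 0.0
    found = 0
    for keyword in keywords:
        variants = {keyword} | set(similar_words.get(keyword, ()))
        if any(variant in target_item for variant in variants):
            found += 1
    return found / len(keywords)


target = "계약 용인 절차"
plain = expanded_similarity(["수락"], {}, target)
expanded = expanded_similarity(["수락"], {"수락": ["용인"]}, target)
```

In this example the plain comparison finds nothing, while the expanded comparison matches through '용인', which is the effect the embodiment aims for.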

Steps 615 and 627 above correspond to the ontology storage unit 46 described with reference to FIG. 4, so a detailed description is omitted.

Meanwhile, the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers in the technical field to which the present invention belongs.

The present invention has been described above with reference to various embodiments. Those of ordinary skill in the art to which the present invention belongs will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the scope equivalent thereto should be construed as being included in the present invention.

FIG. 1 is a diagram for explaining a situation in which heterogeneous classification systems are to be mapped.

FIG. 2 is a diagram illustrating the result generated by mapping the heterogeneous classification systems of FIG. 1.

FIG. 3 is a diagram illustrating an apparatus for mapping different classification systems including a plurality of classification items according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating an apparatus for mapping different classification systems including a plurality of classification items by using similarity intervals and an ontology according to another embodiment of the present invention.

FIG. 5 is a diagram illustrating a mapping method for mapping different classification systems including a plurality of classification items according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items according to an embodiment of the present invention.

FIG. 7 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items by determining a mapping type using similarity intervals according to another embodiment of the present invention.

FIG. 8 is a diagram illustrating a result of mapping heterogeneous classification systems using mapping types according to another embodiment of the present invention.

FIG. 9 is a flowchart illustrating a method of mapping different classification systems including a plurality of classification items by extracting expanded keywords using an ontology according to yet another embodiment of the present invention.

<Description of the main parts of the drawings>

10, 20: heterogeneous classification systems

30: mapped classification system

40, 50: heterogeneous classification system mapping apparatus

41: input unit 42: keyword extraction unit

43: similarity calculator 44: output unit

45: control unit 46: ontology storage unit

Claims

A method of mapping different classification systems including a plurality of classification items, the method comprising:

extracting at least one keyword from a unit classification item of a first classification system belonging to the classification systems;

calculating a similarity between the unit classification item of the first classification system and a unit classification item of a second classification system belonging to the classification systems by comparing the extracted keyword with the unit classification item of the second classification system; and

connecting and outputting the unit classification item of the first classification system and the corresponding unit classification item of the second classification system based on the calculated similarity.

The method of claim 1,

wherein calculating the similarity comprises:

calculating the number of times that the extracted keyword is included in the unit classification item of the second classification system; and

calculating a similarity value based on a ratio of the calculated number of times to the number of extracted keywords.

The method of claim 2, further comprising:

setting at least one similarity interval in advance so that a degree of similarity between the unit classification item of the first classification system and the unit classification item of the second classification system can be identified; and

determining a mapping type by comparing the calculated similarity value with the set similarity interval.

The method of claim 1,

wherein the keyword is a morpheme, which is a unit of language having meaning.

The method of claim 1,

wherein the keyword includes similar vocabulary expanded from vocabulary extracted from the unit classification item of the first classification system.

The method of claim 5, further comprising:

defining an ontology by using the vocabulary extracted from the unit classification item of the first classification system as objects,

wherein the similar vocabulary is vocabulary extracted by using relations between the defined ontology objects.

A non-transitory computer-readable recording medium having recorded thereon a program for executing the method of claim 1.

An apparatus for mapping different classification systems including a plurality of classification items, the apparatus comprising:

a keyword extraction unit configured to extract at least one keyword from a unit classification item of a first classification system belonging to the classification systems;

a similarity calculator configured to compare the extracted keyword with a unit classification item of a second classification system belonging to the classification systems and to calculate a similarity between the unit classification item of the first classification system and the unit classification item of the second classification system; and

an output unit configured to connect and output the unit classification item of the first classification system and the corresponding unit classification item of the second classification system based on the calculated similarity.

The apparatus of claim 8,

wherein the similarity calculator calculates the number of times the extracted keyword is included in the unit classification item of the second classification system, and calculates a similarity value based on a ratio of the calculated number of times to the number of extracted keywords.

The apparatus of claim 9,

further comprising a control unit configured to set one or more similarity intervals in advance so that a degree of similarity between the unit classification item of the first classification system and the unit classification item of the second classification system can be identified, and to determine a mapping type by comparing the calculated similarity value with the set similarity intervals.

The apparatus of claim 8,

wherein the keyword is a morpheme, which is a unit of language having meaning.

The apparatus of claim 8,

The apparatus of claim 12,

further comprising an ontology storage unit configured to store an ontology defined by using the vocabulary extracted from the unit classification item of the first classification system as objects,

wherein the similar vocabulary is vocabulary extracted by using relations between the defined ontology objects.