KR100400222B1

KR100400222B1 - Dynamic semantic cluster method and apparatus for selectional restriction

Info

Publication number: KR100400222B1
Application number: KR10-2001-0020175A
Authority: KR
Inventors: 조정미
Original assignee: 삼성전자주식회사
Priority date: 2001-04-16
Filing date: 2001-04-16
Publication date: 2003-10-01
Also published as: KR20020080553A

Abstract

A dynamic semantic classification method and apparatus for selectional restriction are disclosed. The method comprises: generating a lexical knowledge base by subdividing, attribute by attribute, the definition that expresses the meaning of each headword in a given Korean dictionary, and generating an initial case frame dictionary by analyzing each sentence in a given Korean corpus to find the selectional restriction words for each predicate; and extracting a core semantic attribute, which is an attribute common to the selectional restriction words grouped under each predicate, and generating a Korean case frame dictionary based on the extracted core semantic attribute. As a result, nouns serving as headwords can be classified automatically by meaning using the definitions in a Korean dictionary; selectional restrictions can be expressed through core semantic attributes without overgeneration or undergeneration; and the performance of application systems that use a Korean case frame dictionary can be improved, for example Korean analyzers such as syntactic and semantic analyzers, and the language processing units of speech recognizers and synthesizers.

Description

Dynamic semantic cluster method and apparatus for selectional restriction

The present invention relates to semantic classification in Korean language analysis and, more particularly, to a dynamic semantic classification method and apparatus for the selectional restrictions of predicates in Korean sentences.

In general, a selectional restriction expresses the semantic compatibility that words in a sentence must have in order to enter into a dependency relation with one another. That is, given a predicate, the choice of words that can be semantically compatible with it (hereinafter, 'selectional restriction words') is restricted. For example, given the predicate '감다' (to wind), its selectional restriction words can be '줄' (rope), '끈' (string), or '넥타이' (necktie), but not '뱀' (snake).

One conventional semantic classification method for selectional restriction is the 'semantic class-based method', which uses an existing semantic classification scheme such as WordNet as is. The semantic class method was first proposed in 1969 by linguists such as J. J. Katz, J. A. Fodor, or P. M. Postal, and it is the method most commonly used in natural language processing. However, it cannot provide the fine-grained classification that selectional restrictions require and, because its classification is fixed, it cannot properly handle selectional restrictions that differ from predicate to predicate.

Another conventional semantic classification method for selectional restriction is the 'example-based method', which directly enumerates every word that can satisfy the selectional restriction of each predicate. The example-based method was first introduced into natural language processing in the early 1990s by Professor Nagao of Kyoto University in Japan and, unusually, is used only in Japan. It is disclosed in Japanese Patent Laid-Open Publication No. 1993-342276, entitled "유사 정보 검색 장치" (Similar information retrieval apparatus); in U.S. Patent No. 5,424,947, entitled "Natural language analyzing apparatus and method, and construction of a knowledge base for natural language analysis"; and in the paper "A method of case structure analysis for Japanese sentences based on examples in case frame dictionary" by Sadao Kurohashi and Makoto Nagao, IEICE Transactions on Information and Systems, Vol. E77-D, No. 2, pp. 227-239, 1994. However, these methods cannot enumerate every example, are inefficient, and cannot handle words that are not listed.

Yet another conventional semantic classification method for selectional restriction is the 'probability-based method'. It classifies nouns by their distributional similarity in a corpus and can build semantic classes automatically. It is disclosed in U.S. Patent No. 5,317,507, entitled "Method for document retrieval and for word sense disambiguation using neural networks"; in U.S. Patent No. 5,406,480, entitled "Building and updating of co-occurrence dictionary analyzing of co-occurrence and meaning"; and in the paper "A Corpus-Based Approach for Building Semantic Lexicon" by Ellen Riloff and Jessica Shepherd, Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pp. 117-124, 1997. However, this method cannot overcome the shortcomings of the semantic class-based method.

The technical problem to be solved by the present invention is to provide a dynamic semantic classification method for selectional restriction that supports the fine-grained classification that selectional restrictions require and that can properly handle selectional restrictions that differ from predicate to predicate.

Another technical problem to be solved by the present invention is to provide a dynamic semantic classification apparatus for selectional restriction that performs the above dynamic semantic classification method.

FIG. 1 is a flowchart illustrating the dynamic semantic classification method for selectional restriction according to the present invention.

FIG. 2 is a block diagram of an embodiment of the dynamic semantic classification apparatus for selectional restriction according to the present invention.

FIG. 3 is a flowchart illustrating an embodiment, according to the present invention, of step 10 as performed by the lexical knowledge base generation unit.

FIGS. 4(a) and 4(b) are exemplary diagrams of a Korean dictionary and a lexical knowledge base, respectively.

FIG. 5 is a flowchart illustrating an embodiment, according to the present invention, of step 10 as performed by the initial case frame dictionary generation unit.

FIGS. 6(a) and 6(b) are exemplary diagrams of a Korean corpus and an initial case frame dictionary, respectively.

FIG. 7 is a flowchart illustrating an embodiment of step 12 according to the present invention.

FIGS. 8(a) and 8(b) are exemplary diagrams of an initial case frame dictionary and a Korean case frame dictionary, respectively.

To solve the above problem, the dynamic semantic classification method for selectional restriction according to the present invention preferably comprises: generating a lexical knowledge base by subdividing, by attribute, the definition that expresses the meaning of each headword in a given Korean dictionary, and generating an initial case frame dictionary by analyzing each sentence in a given Korean corpus to find the selectional restriction words for each predicate; and extracting a core semantic attribute, which is the attribute common to the selectional restriction words grouped under each predicate, and generating a Korean case frame dictionary based on the extracted core semantic attribute.

To solve the other problem, the dynamic semantic classification apparatus for selectional restriction according to the present invention preferably comprises: a lexical knowledge base generation unit that subdivides, by attribute, the definition expressing the meaning of each headword in a Korean dictionary received from outside, and outputs a lexical knowledge base representing the attribute-specific meanings of each headword; an initial case frame dictionary generation unit that analyzes each sentence in a Korean corpus received from outside to find the selectional restriction words for each predicate, and outputs an initial case frame dictionary representing the selectional restriction words of each predicate; and a Korean case frame dictionary generation unit that extracts from the lexical knowledge base a core semantic attribute, which is the attribute common to the selectional restriction words grouped under each predicate, and outputs a Korean case frame dictionary representing the core semantic attribute of each predicate.

Hereinafter, the dynamic semantic classification method for selectional restriction according to the present invention, and the configuration and operation of the apparatus according to the present invention that performs the method, are described with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating the dynamic semantic classification method for selectional restriction according to the present invention. The method consists of steps (steps 10 and 12) that generate a Korean case frame dictionary using a lexical knowledge base and an initial case frame dictionary, which are generated from a Korean dictionary and a Korean corpus, respectively.

FIG. 2 is a block diagram of an embodiment of the dynamic semantic classification apparatus for selectional restriction according to the present invention, which performs the method shown in FIG. 1. The apparatus comprises a lexical knowledge base generation unit (20), an initial case frame dictionary generation unit (22), and a Korean case frame dictionary generation unit (24).

In the method shown in FIG. 1, a Korean lexical knowledge base is first generated by subdividing, by attribute, the definition that expresses the meaning of each headword in a given Korean dictionary, while an initial case frame dictionary is generated by analyzing each sentence in a given Korean corpus to find the selectional restriction words for each predicate (step 10). According to the present invention, headwords are limited to nouns. To perform step 10, the lexical knowledge base generation unit (20) and the initial case frame dictionary generation unit (22) are provided, as shown in FIG. 2.

First, the lexical knowledge base generation unit (20) shown in FIG. 2 subdivides, by attribute, the definition expressing the meaning of each headword in the Korean dictionary received from outside through input terminal IN1, generates a lexical knowledge base showing the attribute-specific meanings of each headword, and outputs it to the Korean case frame dictionary generation unit (24). Meanwhile, the initial case frame dictionary generation unit (22) analyzes each sentence in the Korean corpus received from outside through input terminal IN2 to find the selectional restriction words for each predicate, generates an initial case frame dictionary showing the selectional restriction words of each predicate, and outputs it to the Korean case frame dictionary generation unit (24).
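The data flow among the three units can be sketched as follows. This is only a minimal illustration: the function name `run_pipeline`, the type aliases, and the choice to pass each unit in as a callable are assumptions made here, not part of the disclosure.

```python
from typing import Callable, Dict, List

# Assumed shapes for the three data stores (not specified in the source).
KnowledgeBase = Dict[str, Dict[str, str]]   # headword -> {attribute: meaning}
InitialFrames = Dict[str, List[str]]        # predicate -> selectional restriction words
CaseFrames = Dict[str, Dict[str, str]]      # predicate -> {core attribute: meaning}


def run_pipeline(
    dictionary: Dict[str, str],             # IN1: headword -> definition
    corpus: List[str],                      # IN2: sentences
    unit20: Callable[[Dict[str, str]], KnowledgeBase],
    unit22: Callable[[List[str]], InitialFrames],
    unit24: Callable[[KnowledgeBase, InitialFrames], CaseFrames],
) -> CaseFrames:
    """Wire the units together: (20) and (22) feed (24), whose result goes to OUT."""
    knowledge_base = unit20(dictionary)     # step 10, dictionary side
    initial_frames = unit22(corpus)         # step 10, corpus side
    return unit24(knowledge_base, initial_frames)  # step 12
```

Units (20) and (22) do not depend on each other, so they could run in either order or in parallel; only unit (24) needs both results.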

Hereinafter, embodiments of step 10 shown in FIG. 1 according to the present invention are described with reference to the accompanying drawings.

FIG. 3 is a flowchart illustrating an embodiment (10A), according to the present invention, of step 10 shown in FIG. 1 as performed by the lexical knowledge base generation unit (20) shown in FIG. 2. It consists of steps (steps 60 and 62) that separate the definition of each headword into morphemes and extract the attribute-specific meanings of each headword.

FIG. 4(a) is an exemplary diagram of a Korean dictionary, which consists of headwords and the definition of each headword. FIG. 4(b) is an exemplary diagram of the lexical knowledge base generated from the Korean dictionary shown in FIG. 4(a), which consists of headwords and their attribute-specific meanings.

To perform step 10A shown in FIG. 3, the lexical knowledge base generation unit (20) shown in FIG. 2 may be implemented with a first morpheme separator (30) and an attribute-specific meaning extractor (32). The first morpheme separator (30) separates the definition of each headword in the Korean dictionary, received from outside through input terminal IN1, into morphemes and outputs the separated morphemes to the attribute-specific meaning extractor (32) (step 60). To illustrate step 60, suppose the Korean dictionary shown in FIG. 4(a) is input to the first morpheme separator (30). The first morpheme separator (30) then separates into morphemes the definition of each headword in that dictionary, such as '끈' (string), '줄' (rope), '넥타이' (necktie), and '테이프' (tape). For example, it separates the definition of the headword '끈', namely '물건을 묶거나 꿰는데 쓰이는 가늘고 긴 물건' (a thin, long object used to tie or thread things), into the morphemes '물건을', '묶거나', '꿰는데', '쓰이는', '가늘고', '긴', and '물건', and outputs them to the attribute-specific meaning extractor (32).
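Step 60 can be sketched as below. A real system would use a Korean morphological analyzer; here plain whitespace splitting is used as a stand-in, which is an assumption that happens to reproduce the units listed for '끈' in the text. The function name `split_definition` is hypothetical.

```python
from typing import List


def split_definition(definition: str) -> List[str]:
    """Stand-in for the first morpheme separator (30): split on whitespace."""
    return definition.split()


units = split_definition("물건을 묶거나 꿰는데 쓰이는 가늘고 긴 물건")
print(units)
# ['물건을', '묶거나', '꿰는데', '쓰이는', '가늘고', '긴', '물건']
```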

After step 60, the attribute-specific meaning extractor (32) subdivides the separated morphemes of each definition, received from the first morpheme separator (30), by attribute to extract the attribute-specific meanings of that definition, and outputs the lexical knowledge base generated from the extracted meanings to the Korean case frame dictionary generation unit (24) (step 62). The lexical knowledge base output by the extractor (32) shows the attribute-specific meanings of each headword. If the attributes of each definition have been set in advance as H (Hypernym), P (Purpose), F (Feature), and O (Object), then, for example, the extractor (32) takes '물건' (object), '묶다, 꿰다' (to tie, to thread), '가늘고 길다' (thin and long), and '물건' (object) from the separated morphemes of '끈' and assigns them to attributes H, P, F, and O, respectively, thereby extracting the attribute-specific meanings of the definition of '끈' as shown in FIG. 4(b). In the same way, the extractor (32) extracts the attribute-specific meanings of the definitions of '줄', '넥타이', '테이프', and '철사' (wire), as shown in FIG. 4(b). The extractor (32) then outputs the extracted attribute-specific meanings shown in FIG. 4(b), together with their headwords, to the Korean case frame dictionary generation unit (24) as the lexical knowledge base.
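One way to picture step 62 is a lookup from each separated unit to an attribute and a normalized meaning. The source does not say how the extractor (32) decides which unit belongs to which attribute, so the hand-written `ANNOTATIONS` table below is purely an assumption for illustration, as is the function name `extract_attributes`. Note that this sketch stores F as the two items '가늘다' and '길다', whereas the text writes it as the single phrase '가늘고 길다'.

```python
from typing import Dict, List, Tuple

# Hypothetical annotations: surface unit -> (attribute, normalized meaning).
ANNOTATIONS: Dict[str, Tuple[str, str]] = {
    "물건을": ("O", "물건"),
    "묶거나": ("P", "묶다"),
    "꿰는데": ("P", "꿰다"),
    "가늘고": ("F", "가늘다"),
    "긴": ("F", "길다"),
    "물건": ("H", "물건"),
}


def extract_attributes(units: List[str]) -> Dict[str, List[str]]:
    """Group the annotated units of one definition under H, P, F, and O."""
    entry: Dict[str, List[str]] = {"H": [], "P": [], "F": [], "O": []}
    for unit in units:
        if unit in ANNOTATIONS:  # units without an annotation, e.g. '쓰이는', are skipped
            attribute, meaning = ANNOTATIONS[unit]
            entry[attribute].append(meaning)
    return entry


units = ["물건을", "묶거나", "꿰는데", "쓰이는", "가늘고", "긴", "물건"]
knowledge_base = {"끈": extract_attributes(units)}
print(knowledge_base["끈"]["P"])  # ['묶다', '꿰다']
```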

FIG. 5 is a flowchart illustrating an embodiment (10B), according to the present invention, of step 10 shown in FIG. 1 as performed by the initial case frame dictionary generation unit (22) shown in FIG. 2. It consists of steps (steps 70 and 72) that separate each sentence of the Korean corpus into morphemes, parse it, and extract the selectional restriction words of the predicate in each sentence, and a step (step 74) that groups the selectional restriction words extracted from all sentences by predicate.

FIG. 6(a) is an exemplary diagram of a Korean corpus, which consists of a plurality of sentences. FIG. 6(b) shows, as an example, the predicate '감다' (to wind) and its selectional restriction words '끈, 줄, 넥타이' in the initial case frame dictionary generated from the Korean corpus shown in FIG. 6(a).

To perform step 10B shown in FIG. 5, the initial case frame dictionary generation unit (22) shown in FIG. 2 may be implemented with a second morpheme separator (40), a selectional restriction word extractor (42), and a selectional restriction word classifier (44). The second morpheme separator (40) separates each sentence in the Korean corpus received through input terminal IN2 into morphemes and outputs the separated morphemes of each sentence to the selectional restriction word extractor (42) (step 70). To illustrate step 70, suppose the Korean corpus shown in FIG. 6(a) is input to the second morpheme separator (40). The second morpheme separator (40) then separates into morphemes each of the sentences in that corpus, such as '끈으로 감다' (winds with a string), '줄로 감는다' (winds with a rope), and '넥타이로 감았다' (wound with a necktie). For example, it separates the sentence '끈으로 감다' into the morphemes '끈으로' and '감다' and outputs them to the selectional restriction word extractor (42).

After step 70, the selectional restriction word extractor (42) parses each sentence using its separated morphemes received from the second morpheme separator (40), uses the parse result to extract from those morphemes at least one selectional restriction word for the predicate of the sentence, and outputs the extracted word or words, together with the predicate, to the selectional restriction word classifier (44) (step 72). That is, the extractor (42) parses each sentence to find, for example, the subject or object of its predicate. For instance, the extractor (42) parses the sentence '끈으로 감다' and outputs '끈', the selectional restriction word for the predicate '감다', together with the predicate '감다'; parses the sentence '줄로 감는다' and outputs '줄', the selectional restriction word for the predicate form '감는다', together with its predicate '감다'; and parses the sentence '넥타이로 감았다' and outputs '넥타이', the selectional restriction word for the predicate form '감았다', together with its predicate '감다'.
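Steps 70 and 72 can be sketched for the three example sentences as follows. Real parsing would need a morphological analyzer and a syntactic parser; the `PARTICLES` and `LEMMAS` tables and the function `extract_pair` are hypothetical stand-ins that only cover two-unit sentences like the ones in the example.

```python
from typing import Tuple

# Hypothetical lookup tables standing in for real syntactic analysis.
PARTICLES = ("으로", "로")  # particles attached to the noun in the examples
LEMMAS = {"감다": "감다", "감는다": "감다", "감았다": "감다"}  # inflected form -> base form


def extract_pair(sentence: str) -> Tuple[str, str]:
    """Return (selectional restriction word, predicate) for a two-unit sentence."""
    noun_unit, predicate_unit = sentence.split()  # step 70 stand-in
    for particle in PARTICLES:                    # longest particle is checked first
        if noun_unit.endswith(particle):
            noun_unit = noun_unit[: -len(particle)]
            break
    return noun_unit, LEMMAS[predicate_unit]      # step 72 stand-in


pairs = [extract_pair(s) for s in ("끈으로 감다", "줄로 감는다", "넥타이로 감았다")]
print(pairs)
# [('끈', '감다'), ('줄', '감다'), ('넥타이', '감다')]
```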

After step 72, the selectional restriction word classifier (44) groups by predicate the selectional restriction words received from the extractor (42) for all sentences, and outputs the resulting initial case frame dictionary to the Korean case frame dictionary generation unit (24) (step 74). For example, the classifier (44) receives from the extractor (42) three pairings of a selectional restriction word with a predicate, namely '끈-감다', '줄-감다', and '넥타이-감다', and groups the received words '끈', '줄', and '넥타이' within the case frame ([ ]) of the predicate '감다'. Each selectional restriction word in the case frame of a predicate must be semantically compatible with that predicate. As a result, as shown in FIG. 6(b), the case frame in the initial case frame dictionary shows '끈, 줄, 넥타이' as the selectional restriction words for the predicate '감다'.
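Step 74 amounts to grouping (word, predicate) pairs by predicate. The sketch below assumes the pairs are already available as tuples; the function name `build_initial_case_frames` and the decision to drop duplicate words are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_initial_case_frames(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group selectional restriction words under their predicate, keeping first-seen order."""
    frames: Dict[str, List[str]] = defaultdict(list)
    for word, predicate in pairs:
        if word not in frames[predicate]:  # skip words already listed for this predicate
            frames[predicate].append(word)
    return dict(frames)


initial_frames = build_initial_case_frames(
    [("끈", "감다"), ("줄", "감다"), ("넥타이", "감다")]
)
print(initial_frames)
# {'감다': ['끈', '줄', '넥타이']}
```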

After step 10, the Korean case frame dictionary generation unit (24) extracts, from the lexical knowledge base received from the lexical knowledge base generation unit (20), the core semantic attribute, which is the attribute common to the selectional restriction words grouped under each predicate; generates a Korean case frame dictionary based on the core semantic attribute extracted for each predicate; and outputs the generated Korean case frame dictionary through output terminal OUT (step 12). The Korean case frame dictionary shows the core semantic attribute of each predicate.

Hereinafter, an embodiment of step 12 according to the present invention is described with reference to the accompanying drawings.

FIG. 7 is a flowchart illustrating an embodiment (12A), according to the present invention, of step 12 shown in FIG. 1. It consists of steps (steps 80 to 84) that generate the Korean case frame dictionary from the lexical knowledge base and the initial case frame dictionary.

To perform step 12A shown in FIG. 7, the Korean case frame dictionary generation unit (24) shown in FIG. 2 may be implemented with an expression unification unit (50), a similarity investigation unit (52), and a core semantic attribute determination unit (54). The expression unification unit (50) takes the selectional restriction words grouped under each predicate and unifies those that match one another semantically into a single expression (step 80). For example, if, in the initial case frame dictionary received from the initial case frame dictionary generation unit (22), '끈' and '줄' among the selectional restriction words '끈, 줄, 넥타이' of the predicate '감다' shown in FIG. 6(b) match semantically, the expression unification unit (50) unifies the two words '끈' and '줄' into the single expression '끈' or '줄'. The expression unification unit (50) determines whether selectional restriction words match semantically by analyzing the lexical knowledge base received from the lexical knowledge base generation unit (20).
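Step 80 can be sketched as below. The source does not define when two words "match semantically", so this sketch assumes they match when all of their attribute meanings in the lexical knowledge base are identical. The entry for '끈' follows the text; the entries for '줄' and '넥타이' are invented placeholder values, and the function name `unify` is hypothetical.

```python
from typing import Dict, List

KB: Dict[str, Dict[str, str]] = {
    "끈": {"H": "물건", "P": "묶다, 꿰다", "F": "가늘고 길다", "O": "물건"},  # from the text
    "줄": {"H": "물건", "P": "묶다, 꿰다", "F": "가늘고 길다", "O": "물건"},  # placeholder
    "넥타이": {"H": "천", "P": "매다", "F": "가늘고 길다", "O": "목"},       # placeholder
}


def unify(words: List[str], kb: Dict[str, Dict[str, str]]) -> List[str]:
    """Keep one representative per group of words whose attribute meanings are identical."""
    representatives: List[str] = []
    for word in words:
        if not any(kb[word] == kb[rep] for rep in representatives):
            representatives.append(word)
    return representatives


print(unify(["끈", "줄", "넥타이"], KB))
# ['끈', '넥타이']
```

Keeping the first-seen word as the representative is an arbitrary choice; the text allows either '끈' or '줄'.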

After step 80, the similarity investigation unit (52) investigates the similarity among the attribute-specific meanings of the selectional restriction words of each predicate, received from the expression unification unit (50), by analyzing the lexical knowledge base received from the lexical knowledge base generation unit (20), and outputs the result to the core semantic attribute determination unit (54) (step 82). For example, the similarity investigation unit (52) investigates the similarity among the attributes of the selectional restriction words '줄, 넥타이' (or '끈, 넥타이') of the predicate '감다', and outputs the result, namely that "'줄' and '넥타이' have similar meanings for attribute F" (or that "'끈' and '넥타이' have similar meanings for attribute F"), to the core semantic attribute determination unit (54).
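Step 82 can be sketched as a per-attribute comparison. The source does not give a similarity measure, so this sketch assumes the similarity of an attribute is the fraction of word pairs whose meanings for that attribute are exactly equal. The entry for '끈' follows the text, the entry for '넥타이' is an invented placeholder, and the function name `attribute_similarity` is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence

KB: Dict[str, Dict[str, str]] = {
    "끈": {"H": "물건", "P": "묶다, 꿰다", "F": "가늘고 길다", "O": "물건"},  # from the text
    "넥타이": {"H": "천", "P": "매다", "F": "가늘고 길다", "O": "목"},       # placeholder
}


def attribute_similarity(
    words: List[str],
    kb: Dict[str, Dict[str, str]],
    attributes: Sequence[str] = ("H", "P", "F", "O"),
) -> Dict[str, float]:
    """For each attribute, return the share of word pairs with identical meanings."""
    pairs = list(combinations(words, 2))
    if not pairs:
        raise ValueError("at least two words are needed to compare attributes")
    return {
        attribute: sum(kb[a][attribute] == kb[b][attribute] for a, b in pairs) / len(pairs)
        for attribute in attributes
    }


print(attribute_similarity(["끈", "넥타이"], KB))
# {'H': 0.0, 'P': 0.0, 'F': 1.0, 'O': 0.0}
```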

After step 82, the core semantic attribute determination unit (54) determines the core semantic attribute common to the selectional restriction words of each predicate according to the similarity received from the similarity investigation unit (52) and the priority of the attributes (H, P, F, and O), and outputs the Korean case frame dictionary generated from the determined core semantic attribute through output terminal OUT (step 84).

FIG. 8(a) is an exemplary diagram of the initial case frame dictionary generated by the initial case frame dictionary generation unit (22), and FIG. 8(b) is an exemplary diagram of the Korean case frame dictionary generated by the Korean case frame dictionary generation unit (24).

For example, suppose the priority of the attributes decreases in the order O, F, P, H, and the initial case frame dictionary is given as shown in FIG. 8(a). The core semantic attribute determination unit (54) then examines the similarity of '끈' (or '줄') and '넥타이' received from the similarity investigation unit (52), determines attribute F to be the core semantic attribute, and writes the core semantic attribute (F: FEATURE) and its meaning '가늘고 길다' (thin and long) into the case frame ([ ]) of the Korean case frame dictionary, as shown in FIG. 8(b). If the investigation finds that attributes F and H have the same similarity, the core semantic attribute determination unit (54) selects F, which has higher priority than H, as the core semantic attribute. In this way, the Korean case frame dictionary generation unit (24) determines the core semantic attribute of the selectional restriction words of each predicate and generates the Korean case frame dictionary.
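The tie-breaking rule in step 84 can be sketched as follows, assuming the similarity of each attribute is available as a number (the source does not specify its form). The priority order O, F, P, H is taken from the example in the text; the function name `choose_core_attribute` and the numeric scores are assumptions.

```python
from typing import Dict

PRIORITY = ("O", "F", "P", "H")  # highest priority first, as in the example


def choose_core_attribute(similarity: Dict[str, float]) -> str:
    """Pick the attribute with the highest similarity; break ties by priority."""
    best = max(similarity.values())
    for attribute in PRIORITY:
        if similarity.get(attribute) == best:
            return attribute
    raise ValueError("similarity contains no known attribute")


# F and H tie, so F wins because it has higher priority than H.
core = choose_core_attribute({"H": 1.0, "P": 0.0, "F": 1.0, "O": 0.0})
case_frames = {"감다": {core: "가늘고 길다"}}
print(case_frames)
# {'감다': {'F': '가늘고 길다'}}
```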

The dynamic semantic classification method and apparatus for selectional restriction according to the present invention described above may be applied to systems that process Korean sentences, such as an automatic translation system, an automatic search engine, a speech recognizer or a speech synthesizer.

As described above, the dynamic semantic classification method and apparatus for selectional restriction according to the present invention can automatically classify nouns that are headwords according to their meaning by using the definitions in a Korean dictionary, can express selectional restrictions through core semantic attributes without over-generation or under-generation, and can improve the performance of application systems that use the Korean case frame dictionary, for example Korean analyzers such as a Korean syntactic analyzer and a semantic analyzer, and the language processing unit of a speech recognizer or synthesizer.

Claims

A dynamic semantic classification method for selectional restriction, comprising:

(a) generating a dictionary knowledge base by subdividing, by attribute, the definition that expresses the meaning of each headword listed in a given Korean dictionary, and generating an initial case frame dictionary by analyzing each sentence in given Korean language material to find selectional restriction words for each predicate; and

(b) extracting a core semantic attribute, which is the attribute common to the selectional restriction words classified by predicate, and generating a Korean case frame dictionary based on the extracted core semantic attribute.

The method of claim 1, wherein step (a) comprises:

(a1) separating the definition into morphemes; and

(a2) subdividing the separated morphemes by attribute and extracting the meaning of each attribute from the morphemes,

wherein the dictionary knowledge base represents the meanings of the attributes for each headword.

The method of claim 1, wherein step (a) comprises:

(a3) separating each sentence in the Korean language material into morphemes;

(a4) parsing each sentence using the morphemes separated for that sentence and, using the parse result, extracting at least one selectional restriction word for the predicate of each sentence from the morphemes; and

(a5) classifying the selectional restriction words extracted for all sentences by predicate,

wherein the initial case frame dictionary represents the selectional restriction words for each predicate.

The method of claim 1, wherein step (b) comprises:

(b1) unifying into a single expression those words, among the selectional restriction words classified by predicate, that are semantically equivalent to one another;

(b2) examining the similarity between the meanings of each attribute of the selectional restriction words for each predicate; and

(b3) determining the core semantic attribute common to the selectional restriction words for each predicate according to the similarity and the priority of the attributes,

wherein the Korean case frame dictionary represents the core semantic attribute for each predicate.

A dynamic semantic classification apparatus for selectional restriction, comprising:

a dictionary knowledge base generation unit that subdivides, by attribute, the definition expressing the meaning of each headword listed in an externally input Korean dictionary, and outputs a dictionary knowledge base representing the meanings of the attributes for each headword;

an initial case frame dictionary generation unit that analyzes each sentence in externally input Korean language material, finds selectional restriction words for each predicate, and outputs an initial case frame dictionary representing the selectional restriction words for each predicate; and

a Korean case frame dictionary generation unit that extracts, from the dictionary knowledge base, a core semantic attribute that is the attribute common to the selectional restriction words classified by predicate, and outputs a Korean case frame dictionary representing the core semantic attribute for each predicate.

The apparatus of claim 5, wherein the dictionary knowledge base generation unit comprises:

a first morpheme separation unit that separates the definition into morphemes and outputs the separated morphemes; and

an attribute meaning extraction unit that subdivides the morphemes input from the first morpheme separation unit by attribute, extracts the meaning of each attribute for each definition, and outputs the dictionary knowledge base generated based on the extracted meanings of the attributes.

The apparatus of claim 5, wherein the initial case frame dictionary generation unit comprises:

a second morpheme separation unit that separates each sentence in the Korean language material into morphemes and outputs the separated morphemes;

a selectional restriction word extraction unit that parses each sentence using the morphemes input from the second morpheme separation unit and extracts the selectional restriction words for the predicate of each sentence from the parse result; and

a selectional restriction word classification unit that classifies, by predicate, the selectional restriction words for all sentences input from the selectional restriction word extraction unit, and outputs the initial case frame dictionary generated by the classification.

The apparatus of claim 5, wherein the Korean case frame dictionary generation unit comprises:

an expression unification unit that unifies into a single expression those words, among the selectional restriction words classified by predicate, that are semantically equivalent to one another;

a similarity examination unit that examines the similarity between the meanings of each attribute of the selectional restriction words for each predicate input from the expression unification unit, and outputs the examined similarity; and

a core semantic attribute determination unit that determines the core semantic attribute common to the selectional restriction words for each predicate according to the similarity input from the similarity examination unit and the priority of the attributes, and outputs the Korean case frame dictionary generated based on the determined core semantic attribute.