KR101092363B1

KR101092363B1 - Method for generating Korean connectives in Chinese-Korean machine translation and its apparatus

Info

Publication number: KR101092363B1
Application number: KR1020080130777A
Authority: KR
Inventors: 권오욱; 오영순; 김운; 윤창호; 최승권; 이기영; 노윤형; 김창현; 서영애; 양성일; 황금하; 박은진; 김영길; 박상규
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2008-12-22
Filing date: 2008-12-22
Publication date: 2011-12-09
Also published as: KR20100072384A

Abstract

The present invention relates to a method and apparatus for generating Korean connective endings for Chinese-Korean automatic translation. Some Chinese sentences do not express the logical connection between their clauses with an explicit word. When such a sentence is translated, conventional Chinese-Korean automatic translation leaves the logical connection between the translated Korean clauses unexpressed. The invention solves this problem by expressing the logical connection between clauses with Korean connective endings. The meaning of the Chinese source is thereby translated automatically into natural Korean sentences, which improves the performance of Chinese-Korean automatic translation. In addition, the Chinese source text to be translated may have an implicit logical connection between its simple sentences. In that case the invention generates an explicit connective ending from the Korean point of view when producing the Korean sentence. It can therefore generate high-quality Korean that preserves the original meaning of the Chinese source.

Chinese, Korean, Chinese-Korean automatic translation, Korean generation, conversion

Description

METHOD FOR GENERATING KOREAN CONNECTIVES IN CHINESE-KOREAN MACHINE TRANSLATION AND ITS APPARATUS

The present invention relates to a method and apparatus for generating Korean connective endings for Chinese-Korean automatic translation. More particularly, it relates to a method and apparatus that solve the following problem. A Chinese input sentence may be a complex sentence whose simple sentences are not logically connected by any explicit expression. In that case, the connective endings generated to link those sentences in the Korean translation are unnatural.

The present invention is derived from research conducted as part of the IT Growth Engine Technology Development Program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2006-S-037-03; Project title: Development of application-specific Korean-Chinese-English automatic translation technology].

As is well known, a Chinese complex sentence may consist of two or more simple sentences. The logical connection between these simple sentences is sometimes expressed explicitly by a conjunction. In most complex sentences, however, it is not expressed at all.

Given this property, a Chinese-Korean automatic translation method can express the logical connection between simple sentences whenever a Chinese word marks it explicitly: it simply translates that word into Korean. When the connection is implicit, meaning no specific Chinese word expresses it, a suitable Korean connective ending must be generated in addition. Conventional methods cannot determine the nature of the logical connection between the simple sentences. They therefore generally link them with "-go" (~고), the most commonly used connective ending, which covers a wide range of meanings.

In other words, the Korean connective ending "-go" can express a variety of logical connections. Most Chinese complex sentences with an implicit logical connection also have a parallel or sequential connection that "-go" can represent. For these reasons, conventional Chinese-Korean automatic translation methods did not analyze the logical connection between simple sentences and uniformly joined them with "-go".

However, the logical connection between the simple sentences of the Chinese complex sentence is never analyzed. Generating every connection in the Korean translation with "-go" alone therefore produces two kinds of failure, as shown below. Some results are unnatural in Korean, and in others the logical relation is one that "-go" cannot express. In Examples 1, 2, and 3 below, the connective ending "-go" makes the logical connection between the preceding and following simple sentences odd. If "-eoseo" (~어서) is used instead, the semantic logical connection becomes clear from the Korean point of view.

(Example 1)

(Translation 1) (1-1) 당일 유럽주식의 표현이 양호하다, (1-2) 주요 주가지수가 모두 높게 마감하다.
[(1-1) European stocks performed well that day, (1-2) the major stock indices all closed higher.]

(Example 2)

(Translation 2) (2-1) 이 주식은 최근 하락폭이 거대하다, (2-2) 단기적으로 빠르게 반등할 가능성이 있다.
[(2-1) This stock has fallen sharply recently, (2-2) it may rebound quickly in the short term.]

(Example 3)

(Translation 3) (3-1) 동시에 주식 상승 영향을 받다, (3-2) 자금이 더 이상 미국국채로 몰리지 않는다.
[(3-1) At the same time, affected by the stock rally, (3-2) funds no longer flow into U.S. Treasuries.]

As described above, the connective ending matters a great deal when a Chinese-Korean automatic translation apparatus translates a Chinese complex sentence into Korean. An ending that matches the exact meaning of the logical connection between the simple sentences, and is natural in Korean, considerably affects how well the overall meaning is conveyed. To render a Chinese complex sentence with an implicit logical connection as a Korean complex sentence, the apparatus must know exactly what each simple sentence means. It must also determine the logical relation between those meanings.

However, in the prior art described above, identifying what a sentence means and how those meanings are logically connected is a very difficult problem. Solving it requires highly complex linguistic knowledge and processing engines. Existing Chinese-Korean automatic translation apparatuses therefore did not attempt to solve it.

Accordingly, the present invention has been devised to solve the above problems. It concerns translating into Korean a Chinese sentence whose simple sentences are linked by a logical relation that the Chinese does not express explicitly. Its technical object is to provide a method and apparatus for generating Korean connective endings for Chinese-Korean automatic translation that handle this case. When the Chinese source has such an implicit connection, the method and apparatus generate an explicit connective ending from the Korean point of view and thereby produce the Korean sentence.

A method of generating Korean connective endings for Chinese-Korean automatic translation according to one aspect of the present invention comprises the following steps:
(a) segmenting a Chinese input sentence into Chinese words and selecting a morpheme part of speech for each segmented word;
(b) generating a Chinese syntax tree based on the morpheme parts of speech;
(c) converting the Chinese syntax tree into a Korean syntax tree and converting the words of the Chinese syntax tree into Korean target-language equivalents;
(d) when the Korean syntax tree consists of a plurality of simple-sentence syntax trees, generating a connective ending between the simple sentences using a connective ending knowledge DB extracted from a Korean raw corpus;
(e) rearranging the Korean words in Korean word order using the Korean syntax tree in which the connective endings have been generated, and generating Korean through a function-word generation function according to Korean grammar and the modality of the predicates; and
(f) extracting connective ending knowledge from the Korean raw corpus and building it into the connective ending knowledge DB.

A Korean connective ending generation apparatus for Chinese-Korean automatic translation according to another aspect of the present invention comprises the following units:
(a) a Chinese morphological analyzer that segments a Chinese input sentence into Chinese words and analyzes morphemes to select a morpheme part of speech for each segmented word;
(b) a Chinese syntactic analyzer that parses the sentence to generate a Chinese syntax tree based on the morpheme parts of speech;
(c) a Chinese-Korean transfer unit that converts the Chinese syntax tree into a Korean syntax tree and converts the words of the Chinese syntax tree into Korean target-language equivalents;
(d) a connective ending determination unit that, when the Korean syntax tree consists of a plurality of simple-sentence syntax trees, generates a connective ending between the simple sentences using a connective ending knowledge DB extracted from a Korean raw corpus; and
(e) a Korean generator that rearranges the Korean words in Korean word order using the Korean syntax tree in which the connective endings have been generated, and generates Korean through a function-word generation function according to Korean grammar and the modality of the predicates.

Some Chinese sentences do not express the logical connection between clauses with an explicit word. When such a sentence is translated into Korean, the translated Korean clauses lack an explicit logical connection. The present invention solves this problem by expressing the connection with Korean connective endings. The meaning of the Chinese source is thus translated automatically into natural Korean sentences, which improves the performance of Chinese-Korean automatic translation.

Furthermore, the Chinese source text may have an implicit logical connection between its simple sentences. In that case the present invention generates an explicit connective ending from the Korean point of view when producing the Korean sentence. This has the advantage of generating high-quality Korean that preserves the original meaning of the Chinese source.

Hereinafter, the operating principle of the present invention will be described in detail with reference to the accompanying drawings. Detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the content of this entire specification.

FIG. 1 is a block diagram of an apparatus for generating Korean connective endings for Chinese-Korean automatic translation according to an embodiment of the present invention. The apparatus may include the following components.

The Chinese morphological analyzer 101 segments a Chinese input sentence into Chinese words using a Chinese morpheme dictionary. For each segmented word, it selects the morpheme part of speech that best fits the context of the input sentence.

The Chinese syntactic analyzer 102 works from the morpheme parts of speech of the words segmented by the Chinese morphological analyzer 101. It analyzes how the words combine to form the sentence, resolving grammatical structural ambiguity in the way that best fits the current sentence. It then generates a Chinese syntax tree representing the sentence structure.

The Chinese-Korean transfer unit 103 converts the Chinese syntax tree produced by the Chinese syntactic analyzer 102 into a Korean syntax tree that conforms to Korean grammar. It also converts the words of the Chinese syntax tree into Korean target-language equivalents.

The connective ending determination unit 104 acts when the Korean syntax tree generated by the Chinese-Korean transfer unit 103 consists of two or more simple-sentence syntax trees and the logical connection markers between the simple sentences are implicit. In that case it generates the connective endings between the simple sentences using the connective ending knowledge DB 106, which is extracted from a Korean raw corpus.

The Korean generator 105 uses the Korean syntax tree prepared by the Chinese-Korean transfer unit 103 and the connective ending determination unit 104. It rearranges the Korean words in Korean word order. It then generates natural Korean by producing function words such as particles and endings according to Korean grammar and the modality of the predicates.

The connective ending knowledge database (hereinafter, DB) 106 stores the connective ending knowledge extracted from the Korean raw corpus by the connective ending knowledge builder 107.

The connective ending knowledge builder 107 extracts connective ending knowledge from the Korean raw corpus S1 and stores it in the connective ending knowledge DB 106.

Accordingly, the Chinese source text may have an implicit logical connection between simple sentences. In that case the present invention generates an explicit connective ending from the Korean point of view when producing the Korean sentence. It can thus generate high-quality Korean that preserves the original meaning of the Chinese source.

The method of generating Korean connective endings according to the present invention is described in more detail below. In Chinese-Korean automatic translation, the method renders implicit connections between Chinese clauses as natural Korean connective endings.

In performing Chinese-Korean automatic translation, the Chinese morphological analyzer 101 takes a Chinese sentence as input. Chinese is written without spaces between words, so word boundaries are ambiguous. The analyzer segments the sentence as follows:
(1) Using the Chinese morphological analysis dictionary, it enumerates every possible segmentation three words at a time.
(2) It selects the three-word combination with the greatest total length.
(3) If several combinations share that total length, it selects the one whose word lengths have the largest product.
(4) If the products are also equal, it selects the combination containing the one- or two-character Chinese words with the highest frequency.
Repeating this procedure segments the entire input sentence. The analyzer then assigns each segmented word the morpheme parts of speech listed for it in the Chinese morphological analysis dictionary. Part-of-speech tagging selects the best part of speech for each word in the context of the current input sentence, and the result is passed to the Chinese syntactic analyzer 102. The tagging of the segmented Chinese may use a method based on the hidden Markov model (HMM), which is widely used for Korean, English, and many other languages.
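The segmentation rules above resemble a maximum-matching scheme over three-word chunks, similar in spirit to the published MMSEG heuristic. The following Python sketch only illustrates them under several assumptions:
- the dictionary is a hypothetical word-to-frequency map;
- a single character is always allowed as a fallback word;
- the third tie-breaker is taken as the summed frequency of the one- and two-character words in the chunk;
- only the first word of the winning chunk is committed before the procedure repeats.

The HMM part-of-speech tagging step is not shown.

```python
from math import prod


def candidates(text, pos, lexicon, max_len):
    """Dictionary words starting at pos, longest first; a single character is always allowed."""
    words = [text[pos:pos + n] for n in range(max_len, 1, -1)
             if pos + n <= len(text) and text[pos:pos + n] in lexicon]
    return words + [text[pos]]


def chunks(text, pos, lexicon, max_len, depth=3):
    """Every segmentation of up to `depth` words that starts at pos."""
    if depth == 0 or pos >= len(text):
        return [[]]
    out = []
    for w in candidates(text, pos, lexicon, max_len):
        for rest in chunks(text, pos + len(w), lexicon, max_len, depth - 1):
            out.append([w] + rest)
    return out


def segment(text, freq, max_len=4):
    """Segment unspaced Chinese text; freq is a hypothetical word -> frequency dictionary."""
    lexicon = set(freq)
    words, pos = [], 0
    while pos < len(text):
        best = max(chunks(text, pos, lexicon, max_len), key=lambda c: (
            sum(len(w) for w in c),                         # rule 1: largest total length
            prod(len(w) for w in c),                        # rule 2: largest product of word lengths
            sum(freq.get(w, 0) for w in c if len(w) <= 2),  # rule 3: frequent 1-2 character words
        ))
        words.append(best[0])  # commit the first word of the winning chunk (assumption), then repeat
        pos += len(best[0])
    return words
```

For example, with a toy dictionary containing 研究, 研究生, 生命 and 起源, the string 研究生命起源 has two chunks of total length 6 at position 0. Rule 2 prefers 研究/生命/起源 (product 8) over 研究生/命/起源 (product 6).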

The Chinese syntactic analyzer 102 uses the best part of speech of each word segmented by the Chinese morphological analyzer 101. From these it generates a syntax tree describing the syntactic structure of the input Chinese sentence according to the grammatical roles of its words.

First, based on the word segmentation and part-of-speech tagging results, it splits the sentence into clauses (segments). The splitting uses clause delimiters such as periods, commas, and colons, together with other clause-splitting pattern rules.

It then parses each clause separately to find the syntactic relations between words. A clause-level syntax tree is generated by chart parsing with syntax rules written in an extended context-free grammar formalism, and the trees are passed to the Chinese-Korean transfer unit 103. Among the syntax trees generated, the one best suited to the context of the current sentence is selected by heuristic structural rules and scores. These take into account morpheme parts of speech, syntactic relatedness between words, phrase size, and the syntactic attributes of words.

The Chinese syntactic analyzer 102 does not mark syntactic relations between the separated clauses. It only generates independent syntax trees, one per clause.
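Clause splitting can be sketched as below. The sketch assumes the tagger output is a list of (word, tag) pairs and uses only the full-width and ASCII period, comma, and colon as delimiters. It does not reproduce the "other clause-splitting pattern rules" or the per-clause chart parser, and the tag names in any example are hypothetical.

```python
# Delimiters named in the text (period, comma, colon), in full-width and ASCII forms.
CLAUSE_DELIMITERS = {"。", "，", "：", ".", ",", ":"}


def split_clauses(tokens):
    """Split a tagged sentence [(word, tag), ...] into clauses at delimiter tokens.

    Delimiters themselves are dropped; empty clauses are not emitted.
    Each resulting clause would then be parsed independently.
    """
    clauses, current = [], []
    for word, tag in tokens:
        if word in CLAUSE_DELIMITERS:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append((word, tag))
    if current:
        clauses.append(current)
    return clauses
```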

The Chinese-Korean transfer unit 103 converts the clause-level syntax trees generated by the Chinese syntactic analyzer 102 into syntactically and semantically corresponding clause-level Korean syntax trees. It does so using translation knowledge called transfer patterns. A transfer pattern pairs a Chinese syntactic relation with the corresponding Korean syntactic relation. Each relation can be described by relation information such as predicate, object, subject, and complement. It also carries information on the word corresponding to the head word of each syntactic node: its morpheme part of speech, lexical form, semantic information, root, and syntactic attributes.
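A transfer pattern can be modeled as a pair of syntactic relations plus optional constraints on the head word. In this hypothetical Python sketch, a constraint set to None acts as a wildcard. When several patterns match, the one with the most constraints is preferred. That preference rule, the relation labels, and the field names are assumptions, not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    head: str                  # head word of the syntactic node
    pos: str                   # its morpheme part of speech
    sem: Optional[str] = None  # semantic class, if known


@dataclass
class TransferPattern:
    zh_relation: str                # Chinese relation, e.g. "VERB-OBJ" (hypothetical label)
    ko_relation: str                # corresponding Korean relation
    head_pos: Optional[str] = None  # head-word constraints; None acts as a wildcard
    head_word: Optional[str] = None
    head_sem: Optional[str] = None
    ko_word: Optional[str] = None   # Korean word supplied by the pattern, if any

    def matches(self, relation: str, node: Node) -> bool:
        return (relation == self.zh_relation
                and self.head_pos in (None, node.pos)
                and self.head_word in (None, node.head)
                and self.head_sem in (None, node.sem))

    def specificity(self) -> int:
        """Number of non-wildcard head-word constraints."""
        return sum(c is not None for c in (self.head_pos, self.head_word, self.head_sem))


def find_pattern(patterns: List[TransferPattern], relation: str,
                 node: Node) -> Optional[TransferPattern]:
    """Return the most specific pattern matching (relation, node), or None."""
    matching = [p for p in patterns if p.matches(relation, node)]
    return max(matching, key=TransferPattern.specificity) if matching else None
```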

The Chinese-Korean transfer unit 103 traverses each clause-level Chinese syntax tree generated by the Chinese syntactic analyzer 102, starting from the root. Among the subtrees of each syntactic node, it visits the head node first. As it goes, it converts Chinese syntactic nodes into Korean syntactic nodes by transfer pattern matching. When a node is a terminal node, the unit performs target-word selection to convert the Chinese word into a Korean word and records the result in the syntactic information of the Korean node. The result is passed to the connective ending determination unit 104.

Target-word selection proceeds as follows:
(1) If the Chinese word matches the Chinese word in a transfer pattern, it is translated into the corresponding Korean word in that pattern.
(2) Otherwise, if a meaning or attribute matches, it is translated into the Korean equivalent listed with that meaning or attribute in the Chinese-Korean bilingual dictionary.
(3) If none of these conditions holds, it is translated into the representative Korean equivalent listed in the Chinese-Korean bilingual dictionary.
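The three-level fallback used in target-word selection can be sketched as follows. The dictionary layout (`Entry`, `Sense`) and the semantic labels are assumptions made for illustration only. Passing through a word that is missing from the dictionary is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    ko: str                    # Korean equivalent
    sem: Optional[str] = None  # meaning/attribute under which this equivalent applies


@dataclass
class Entry:
    representative: str                                  # representative Korean equivalent
    senses: List[Sense] = field(default_factory=list)


def select_target(zh_word, dictionary, pattern_zh=None, pattern_ko=None, context_sem=None):
    """Pick a Korean equivalent for a terminal node (hypothetical interface)."""
    # 1) The matched transfer pattern names this exact Chinese word: use its Korean word.
    if pattern_zh == zh_word and pattern_ko is not None:
        return pattern_ko
    entry = dictionary.get(zh_word)
    if entry is None:
        return zh_word  # not in the dictionary: pass through unchanged (assumption)
    # 2) A bilingual-dictionary sense whose meaning/attribute matches the context.
    if context_sem is not None:
        for sense in entry.senses:
            if sense.sem == context_sem:
                return sense.ko
    # 3) Otherwise use the representative equivalent.
    return entry.representative
```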

The connective ending determination unit 104 acts when the Chinese-Korean transfer unit 103 has produced two or more clause-level Korean syntax trees. To link the clauses logically, it generates a suitable connective ending on the head predicate of each clause.

If the Chinese input sentence expresses the logical connection between clauses explicitly with a Chinese word, that information is carried in the Korean syntactic node converted from the corresponding Chinese node. The connection can then be expressed as a connective ending or a connective adverb.

If the Chinese input sentence does not express the connection with an explicit Chinese word, the unit uses the connective ending knowledge DB 106 extracted from the Korean raw corpus. It checks whether the attributes of the current clause pair match the extracted connective ending knowledge, and first calculates the generation probability of each connective ending that could express the logical connection between the two clauses. Using these probabilities for all clause pairs, it then determines which combination of connective endings across the whole sentence is most natural and conveys the meaning most clearly. It generates the connective endings between all clauses accordingly and passes the result to the Korean generator 105.

In other words, the connective ending determination unit 104 takes as input the Korean syntax trees that the Chinese-Korean transfer unit 103 produced from the Chinese syntax tree. These are clause-level dependency trees.

If the input Chinese sentence is a simple sentence, there is only one Korean dependency tree and the unit does nothing. If the input Chinese sentence is a complex sentence whose clause connections are implicit, the unit first calculates the generation probability of each connective ending for every clause pair. It then uses these pairwise probabilities to find the combination of connective endings that best fits the whole sentence. The connective endings are thus determined on the basis of the entire sentence rather than independently for each pair of clauses.
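The specification does not state how the pairwise probabilities are combined into a sentence-level score. The sketch below assumes that the score of a combination is the product of its per-junction probabilities. That product can optionally be multiplied by a hypothetical `compatibility` factor between endings at adjacent junctions. The sketch searches all combinations exhaustively, which is only practical for sentences with a few clauses; a dynamic-programming search would scale better.

```python
from itertools import product
from math import prod


def best_combination(junction_probs, compatibility=lambda a, b: 1.0):
    """Choose one ending per clause junction, maximizing a whole-sentence score.

    junction_probs[k] maps each candidate ending to its generation probability
    for the junction between clause k and clause k + 1.
    """
    if not junction_probs:  # a single clause: nothing to decide
        return []
    best, best_score = None, -1.0
    for combo in product(*(sorted(p) for p in junction_probs)):
        score = prod(junction_probs[k][e] for k, e in enumerate(combo))
        # Hypothetical interaction term between endings at adjacent junctions.
        score *= prod(compatibility(a, b) for a, b in zip(combo, combo[1:]))
        if score > best_score:
            best, best_score = list(combo), score
    return best
```

Without a compatibility term, the product is maximized by simply choosing the most probable ending at each junction independently. An interaction term, such as a hypothetical penalty for repeating the same ending at adjacent junctions, is what makes a whole-sentence search differ from pairwise decisions.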

The generation probability of each connective ending between two clauses is calculated as follows. Each pair of clause-level Korean dependency trees joined by an implicit logical connection is considered in turn. The syntax tree of the preceding clause is treated as the subordinate clause, and that of the following clause as the main clause. From them a predicate pattern y is generated in the same form as the predicate patterns in the connective ending knowledge DB 106. The DB 106 is then searched with y, and the generation probability of each connective ending e is calculated by Equation 1:

(Equation 1)
P(e) = 0.8 × P_lex(e) + 0.2 × P_sem(e)

Here P(e) is the generation probability of connective ending e. P_lex(e) is its generation probability under the lexical predicate pattern, and P_sem(e) is its generation probability under the semantic predicate pattern.

(The lexical and semantic predicate patterns are described in Tables 8 and 9 and in the descriptions of their feature values; a feature with no value must be marked NULL.)
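Equation 1 is a fixed linear interpolation of the lexical and semantic estimates. As a worked example with made-up values, suppose P_lex("-eoseo") = 0.5 and P_sem("-eoseo") = 1.0. Then P("-eoseo") = 0.8 × 0.5 + 0.2 × 1.0 = 0.6.

A minimal Python version follows. It assumes each estimate is a dictionary from ending to probability, and endings absent from an estimate count as 0.

```python
LAMBDA_LEX, LAMBDA_SEM = 0.8, 0.2  # interpolation weights from Equation 1


def interpolate(p_lex, p_sem):
    """P(e) = 0.8 * P_lex(e) + 0.2 * P_sem(e) for every ending seen in either estimate."""
    endings = set(p_lex) | set(p_sem)
    return {e: LAMBDA_LEX * p_lex.get(e, 0.0) + LAMBDA_SEM * p_sem.get(e, 0.0)
            for e in endings}
```

Because the weights sum to 1, the result is again a probability distribution whenever both inputs are.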

The first term of Equation 1, the generation probability of connective ending e under the lexical predicate pattern, is calculated in the following steps.

In the first step, if the generated predicate pattern y is in the connective ending knowledge DB 106, the generation probability of e is calculated from the connective ending information of the retrieved predicate pattern.

In the second step, if the first step does not yield a generation probability of e under the lexical predicate pattern, let m be the number of non-NULL subject or object head-word lexical features in y. For i = 1 to m, the third step is repeated until the generation probability of e is greater than 0.

In the third step, new lexical predicate patterns are formed by setting i of the non-NULL subject or object head-word lexical features of y to NULL. If at least one of these patterns is found in the DB 106, the generation probabilities of e in the connective ending information of the retrieved patterns are averaged.
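The lexical back-off in the three steps above can be sketched as follows. The following assumptions are all hypothetical:
- a predicate pattern is a dictionary over the feature names in `FIELDS` (the actual features are defined in Table 8, which is not reproduced here);
- the knowledge DB maps a pattern key to ending frequencies;
- the probability of e within one retrieved pattern is its relative frequency;
- a retrieved pattern that lacks e contributes 0 to the average.

```python
from itertools import combinations

NULL = None
# Hypothetical feature names; the real features are those of Table 8.
FIELDS = ("sub_pred", "sub_subj", "sub_obj", "main_pred", "main_subj", "main_obj")
# Subject/object head-word features that may be backed off to NULL.
BACKOFF_SLOTS = ("sub_subj", "sub_obj", "main_subj", "main_obj")


def make_pattern(**features):
    """Build a predicate pattern; unspecified features are NULL."""
    return {f: features.get(f, NULL) for f in FIELDS}


def key(pattern):
    """Hashable DB key for a pattern."""
    return tuple(pattern[f] for f in FIELDS)


def prob_in(counts, e):
    """Relative frequency of ending e among the endings stored for one pattern."""
    total = sum(counts.values())
    return counts.get(e, 0) / total if total else 0.0


def lexical_prob(y, e, db):
    """P_lex(e | y) with back-off over the non-NULL subject/object head words."""
    # Step 1: exact match of y in the DB.
    if key(y) in db:
        p = prob_in(db[key(y)], e)
        if p > 0:
            return p
    # Step 2: the m non-NULL subject/object head-word features of y.
    slots = [s for s in BACKOFF_SLOTS if y[s] is not NULL]
    for i in range(1, len(slots) + 1):
        # Step 3: set i of the m features to NULL in every possible way.
        found = []
        for nulled in combinations(slots, i):
            variant = dict(y, **{s: NULL for s in nulled})
            if key(variant) in db:
                found.append(prob_in(db[key(variant)], e))
        if found:
            p = sum(found) / len(found)
            if p > 0:
                return p
    return 0.0  # no lexical evidence for e
```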

다음으로, 수학식 1을 참조하면, 연결어미(e)의 생성확률의 계산식에서 의미 용언패턴에 의한 연결어미(e)의 생성확률은 아래의 단계별 과정에 의해 계산할 수 있다. Next, referring to Equation 1, the generation probability of the connection ending e by the semantic verb pattern in the calculation of the generation probability of the connection ending e may be calculated by the following step process.

즉, 첫 단계로서, 생성된 용언패턴(y)에서 NULL이 아닌 주어 또는 목적어 중심어 어휘 자질을 모두 해당 어휘에 대한 의미로 변경한 의미 용언패턴(ys)을 생성하여 연결어미 지식 DB(106)에서 검색될 경우, 이 검색된 용언패턴의 연결어미정보에 있는 연결어미(e)의 생성확률 값을 계산하고, That is, as a first step, in the generated verb pattern y, the semantic verb pattern ys which changes all the non-null subject or object center word vocabulary qualities to the meaning for the corresponding vocabulary is generated in the connected ending knowledge DB 106. If found, calculates the probability of generation of the linking ending (e) in the linking ending information of the found verb pattern,

As the second step, if the generation probability of e cannot be obtained from the semantic predicate pattern in this way, let m be the number of non-NULL subject and object head-word features in the semantic predicate pattern ys. The third step is then repeated for i = 1 to m until the generation probability of e is greater than 0.

As the third step, semantic predicate patterns are created from ys by setting i of its non-NULL subject or object head-word features to NULL. If any of these patterns is found in the connective ending knowledge DB 106, the average of the generation probabilities of e in the connective ending information of the semantic patterns found is computed.
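Only the first step of the semantic back-off differs from the lexical one: it builds ys; steps 2 and 3 then apply the same NULL back-off to ys. A minimal sketch of building ys follows, assuming a clause is a tuple `(predicate, negated, subject_head, object_head)` with `None` for NULL. The semantic dictionary and its class labels are hypothetical stand-ins for the semantic dictionary of the translation system, and keeping words that are missing from the dictionary unchanged is an assumption.

```python
# Hypothetical semantic dictionary (head word -> semantic class); labels invented.
SEM_DICT = {"환율": "RATE", "물가": "PRICE", "수입": "TRADE", "소비": "CONSUMPTION"}


def semantic_pattern(pattern, sem_dict):
    """Build ys from y: replace every non-NULL subject/object head word by its class.

    The predicate lexeme and the negation flag are left unchanged.
    """
    def generalize(word):
        # Assumption: words missing from the dictionary are kept as they are.
        return None if word is None else sem_dict.get(word, word)

    return tuple(
        (pred, neg, generalize(subj), generalize(obj))
        for pred, neg, subj, obj in pattern
    )


y = (("오르", 0, "환율", None), ("상승하", 0, "물가", None))
ys = semantic_pattern(y, SEM_DICT)
# ys == (("오르", 0, "RATE", None), ("상승하", 0, "PRICE", None))
```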

Next, if the predicate pattern y is found in the connective ending knowledge DB 106 neither as a lexical nor as a semantic predicate pattern, so that every generation probability computed above for the connective endings e of y is 0, the default generation probability of each connective ending e is used as its generation probability for y. The default generation probability of e is the sum of the frequencies of e over all patterns in the connective ending knowledge DB 106 divided by the total frequency of all connective endings over all patterns; that is, it is the probability that e appears in the connective ending knowledge DB 106.
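Equation 1 and the default fallback can be sketched together. The 0.8/0.2 weights are those of Equation 1 as stated in the claims; everything else is an assumption for illustration: `p_lex` and `p_sem` are taken to be the already backed-off probabilities (dicts from ending to probability), and per-pattern frequencies are assumed to be available for the default distribution, for example from the feature pattern DB before normalization. Function names are hypothetical.

```python
from collections import defaultdict


def default_probs(freq_db):
    """Default probability of each ending: its total frequency over all patterns
    divided by the total frequency of all endings over all patterns.

    freq_db maps a predicate pattern to {ending: frequency}.
    """
    totals = defaultdict(float)
    for endings in freq_db.values():
        for ending, freq in endings.items():
            totals[ending] += freq
    grand_total = sum(totals.values())
    return {ending: freq / grand_total for ending, freq in totals.items()}


def ending_probs(p_lex, p_sem, default, w_lex=0.8, w_sem=0.2):
    """Equation 1 (w_lex * lexical + w_sem * semantic), falling back to the
    default probabilities when every combined probability is 0."""
    endings = set(p_lex) | set(p_sem) | set(default)
    combined = {e: w_lex * p_lex.get(e, 0.0) + w_sem * p_sem.get(e, 0.0) for e in endings}
    if all(v == 0.0 for v in combined.values()):
        return dict(default)
    return combined


freq = {"pattern A": {"어서": 3, "고": 1}, "pattern B": {"고": 4}}
default = default_probs(freq)            # 어서: 3/8, 고: 5/8
print(ending_probs({}, {}, default))     # nothing found: the default distribution
print(ending_probs({"어서": 0.5}, {"어서": 1.0}, default))
```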

In other words, once the connective ending generation probabilities between each pair of clauses have been computed with Equation 1, the connective ending determination unit 104 chooses the endings for the implicitly connected clauses from the viewpoint of the whole sentence. It builds every combination of connective endings over all clause pairs, excluding combinations in which the same ending appears at immediately adjacent positions unless it is a coordinate connective ending. The product of the generation probabilities of the endings in a combination gives the generation probability of that combination for the whole sentence, and the endings of the combination with the highest probability are selected as the Korean connective endings that make the implicit logical connection between each pair of clauses explicit.
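The sentence-level selection can be sketched as an exhaustive search over combinations, scored by the product of probabilities. The adjacency constraint is ambiguous in the source; this sketch assumes it means that the same non-coordinate ending may not be used at two neighbouring junctions, and it takes the coordinate endings to be the three the text names ("고", "거나", "든지"). Because the constraint only involves neighbouring junctions, a dynamic-programming search would find the same optimum without enumerating every combination; the brute-force form simply mirrors the description.

```python
from itertools import product
from math import prod

COORDINATE = {"고", "거나", "든지"}  # coordinate endings named in the text


def best_combination(junction_probs):
    """junction_probs: one {ending: probability} dict per clause junction, in order.

    Returns (endings, probability) for the allowed combination with the highest
    product of generation probabilities, or (None, 0.0) if none is allowed.
    """
    best, best_p = None, 0.0
    for combo in product(*(list(p.items()) for p in junction_probs)):
        endings = [e for e, _ in combo]
        # Assumed reading of the adjacency rule: the same non-coordinate
        # ending may not be used at two neighbouring junctions.
        if any(a == b and a not in COORDINATE for a, b in zip(endings, endings[1:])):
            continue
        p = prod(q for _, q in combo)
        if best is None or p > best_p:
            best, best_p = endings, p
    return best, best_p


# Two junctions; ("어서", "어서") is excluded, ("고", "고") is allowed.
print(best_combination([{"어서": 0.6, "고": 0.4}, {"어서": 0.7, "고": 0.3}]))
```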

The Korean generation unit 105 takes as input the Korean syntax tree in which the logical connections have been made explicit by the Chinese/Korean conversion unit 103 and the connective ending determination unit 104. Visiting the tree from its root node, it rearranges the terminal syntax nodes into Korean sentence order (subject, object, predicate) and generates function words such as case particles and endings according to Korean grammar and the modality of each predicate, producing natural Korean. While rearranging the terminal syntax nodes, it adjusts the position of adverbs, handles predicates duplicated for emphasis, corrects or deletes classifiers (measure words) that Chinese uses far more heavily than Korean, generates case particles according to case, generates endings for tense and for passive and causative voice, and inflects regular and irregular predicates, so that each terminal syntax node becomes a natural Korean eojeol.

The connective ending knowledge DB 106 stores the connective ending knowledge that the connective ending knowledge construction unit 107 extracts from a raw Korean corpus.

As shown in FIG. 2, the connective ending knowledge construction unit 107 extracts connective ending knowledge from a raw Korean corpus S1 and stores it in the connective ending knowledge DB 106. It may comprise a Korean morphological analysis unit 1071, a Korean syntactic analysis unit 1072, a per-ending feature pattern extraction unit 1073, a connective ending feature pattern DB 1074, and a feature pattern expansion unit 1075.

For each sentence of the raw Korean corpus S1, the Korean morphological analysis unit 1071 splits each eojeol (space-delimited word) into morphemes and tags each morpheme with the Korean part of speech that fits the context. Candidate parts of speech are assigned to each eojeol using a Korean morpheme dictionary, which holds each morpheme together with its part of speech and left/right connectivity information, and a morpheme left/right connectivity table, which records whether two morphemes may be joined within an eojeol according to that connectivity information. Then, using an HMM-based Korean part-of-speech tagging method with the lexical probabilities and transition probabilities learned from a part-of-speech tagged corpus, the unit selects the most suitable tags for the current sentence and passes the tagged analysis to the Korean syntactic analysis unit 1072.
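HMM tagging picks the tag sequence that maximizes the product of transition and lexical probabilities, which is what the Viterbi algorithm computes. The sketch below is a simplified token-level version for illustration only: the tag names and all probabilities are invented toy values, and the dictionary-based morpheme segmentation and the connectivity-table check described above are not modelled. A real tagger would also work with log probabilities to avoid underflow on long sentences.

```python
def viterbi(tokens, tags, start_p, trans_p, lex_p):
    """Most probable tag sequence for `tokens` under a first-order HMM.

    start_p[t]: P(first tag = t); trans_p[s][t]: transition probability s -> t;
    lex_p[t][w]: lexical (emission) probability of token w given tag t.
    """
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (start_p.get(t, 0.0) * lex_p[t].get(tokens[0], 0.0), [t]) for t in tags}
    for w in tokens[1:]:
        step = {}
        for t in tags:
            emit = lex_p[t].get(w, 0.0)
            prob, path = max(
                ((best[s][0] * trans_p[s].get(t, 0.0) * emit, best[s][1]) for s in tags),
                key=lambda cand: cand[0],
            )
            step[t] = (prob, path + [t])
        best = step
    return max(best.values(), key=lambda cand: cand[0])[1]


# Toy model: "도" is ambiguous between a noun and an auxiliary particle.
TAGS = ["noun", "particle", "verb"]
START = {"noun": 0.8, "verb": 0.2}
TRANS = {
    "noun": {"noun": 0.2, "particle": 0.6, "verb": 0.2},
    "particle": {"noun": 0.3, "verb": 0.7},
    "verb": {"noun": 0.5, "particle": 0.5},
}
LEX = {
    "noun": {"수입": 0.5, "도": 0.1},
    "particle": {"도": 0.6},
    "verb": {"감소하": 0.5},
}
print(viterbi(["수입", "도", "감소하"], TAGS, START, TRANS, LEX))
# -> ['noun', 'particle', 'verb']
```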

For example, suppose the Korean sentence extracted from the raw Korean corpus S1 is "금주에 원달러 환율이 급속하게 올라서 국내 소비자 물가가 상승하였고 수입도 감소하였다." ("This week the won-dollar exchange rate rose sharply, so domestic consumer prices rose and imports also fell."). The Korean morphological analysis unit 1071 then passes the analysis with the most suitable tags to the Korean syntactic analysis unit 1072: "금주(noun)+에(adverbial case particle) 원달러(noun) 환율(noun)+이(nominative case particle) 급속하(verb)+게(adverbial ending) 오르(verb)+어서(connective ending) 국내(noun) 소비자(noun) 물가(noun)+가(nominative case particle) 상승하(verb)+였(past-tense pre-final ending)+고(connective ending) 수입(noun)+도(auxiliary particle) 감소하(verb)+였(past-tense pre-final ending)+다(sentence-final ending)+.(terminal punctuation)".

Based on the morphological analysis produced by unit 1071, the Korean syntactic analysis unit 1072 builds a syntax tree describing the syntactic structure given by the grammatical roles of the eojeols in the sentence. Because Korean is head-final and has relatively free word order, the tree is built with a dependency grammar and passed to the per-ending feature pattern extraction unit 1073. The result is a dependency tree that links, for each pair of related eojeols, the governing (head) node and the dependent node.

For example, FIG. 3 shows a dependency tree generated by the Korean syntactic analysis unit 1072. In FIG. 3, node (1) "금주(noun)+에(adverbial case particle)" is a dependent of node (5) "오르(verb)+어서(connective ending)"; conversely, node (5) is the governing node of node (1). The relation between the two nodes is "adverbial", indicating the relation between a predicate and its adverbial.

Based on the Korean syntax tree generated by unit 1072, the per-ending feature pattern extraction unit 1073 finds each predicate syntax node whose eojeol ends in a connective ending, and then the predicate syntax node that governs it (its head node). Using the syntax nodes related to these two predicate nodes, it extracts a feature pattern of the form shown in Table 1.

Table 1
Connective ending : [subordinate clause = predicate lexeme, negation, subject head word, object head word] [governing clause = predicate lexeme, negation, subject head word, object head word]
· Predicate lexeme: the predicate morpheme of the predicate syntax node
· Negation: 1 if a negative morpheme or eojeol such as "못" or "~않" is used with the predicate in the sentence, otherwise 0
· Subject head word: the content morpheme of the syntax node in a subject relation to the predicate node, excluding function words
· Object head word: the content morpheme of the syntax node in an object relation to the predicate node, excluding function words

When extracting feature patterns as in Table 1, the syntax nodes are scanned in input order to find each dependent predicate node that carries a connective ending and then the predicate node that governs it; the features of these two predicate nodes form a feature pattern. If the governing node carries a coordinate connective ending such as "고", "거나", or "든지", the predicate node governing that governing node is also found, and a further feature pattern for the current dependent node's connective ending is extracted from the current dependent node and this higher governing node. The new governing node is again checked for a coordinate connective ending: if it has one, the process repeats; otherwise it ends.

Likewise, if the connective ending of the current dependent node is not a coordinate ending and one of the current node's own dependents is a predicate node with a coordinate connective ending, a new feature pattern for the current node's connective ending is extracted from that dependent predicate node and the governing predicate node. This continues down the chain: if the connective ending of the last dependent predicate node reached is a coordinate ending, the step is repeated; otherwise it ends. The whole procedure is applied to the syntax nodes in input order until all feature patterns have been extracted.
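The first extraction rule (following a coordinate chain upward from the governing node) can be sketched on a simplified tree for the example sentence; the second rule, which walks down through coordinated dependents, is not implemented here. The `Node` layout, the relation labels, and the node ids other than (5), (8), and (10) are hypothetical, and nodes that contribute no features (such as 금주에 or 급속하게) are omitted. Running it yields the three patterns given in Tables 2, 3, and 4.

```python
from dataclasses import dataclass
from typing import Optional

COORDINATE = {"고", "거나", "든지"}


@dataclass
class Node:
    id: int                       # position in input order
    head: Optional[int]           # id of the governing node (None for the root)
    rel: str                      # relation to the head, e.g. "subj", "obj", "adv"
    lemma: str                    # predicate lexeme or head noun, without function words
    ending: Optional[str] = None  # connective ending of a predicate node, if any
    negated: int = 0              # 1 if the predicate is negated ("못", "~않")


def clause(nodes, pred):
    """[predicate lexeme, negation, subject head word, object head word] of a predicate."""
    deps = [n for n in nodes.values() if n.head == pred.id]
    subj = next((n.lemma for n in deps if n.rel == "subj"), None)
    obj = next((n.lemma for n in deps if n.rel == "obj"), None)
    return (pred.lemma, pred.negated, subj, obj)


def extract_patterns(nodes):
    """(ending, subordinate clause, governing clause) patterns, first rule only."""
    patterns = []
    for node in sorted(nodes.values(), key=lambda n: n.id):
        if node.ending is None or node.head is None:
            continue
        gov = nodes[node.head]
        while True:
            patterns.append((node.ending, clause(nodes, node), clause(nodes, gov)))
            # Keep climbing while the governing node is itself coordinated upward.
            if gov.ending in COORDINATE and gov.head is not None:
                gov = nodes[gov.head]
            else:
                break
    return patterns


# Simplified tree for the example sentence (hypothetical ids except 5, 8, 10).
NODES = {n.id: n for n in [
    Node(3, 5, "subj", "환율"),
    Node(5, 8, "adv", "오르", ending="어서"),
    Node(7, 8, "subj", "물가"),
    Node(8, 10, "conj", "상승하", ending="고"),
    Node(9, 10, "subj", "수입"),
    Node(10, None, "root", "감소하"),
]}
for pattern in extract_patterns(NODES):
    print(pattern)
```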

For example, in FIG. 3, scanning the predicates with a connective ending in input order finds node (5), and then node (8), the predicate node that governs node (5). The syntax nodes depending on node (5) are searched to fill the subordinate-clause features above, and the syntax nodes depending on node (8) are searched to fill the governing-clause features. For the example sentence, this yields the features shown in Table 2.

Table 2
어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 상승하, 0, 물가, NULL]

Next, because node (8) carries the coordinate connective ending "고", node (10), which governs node (8), is found, and a feature pattern with node (5) as the subordinate clause and node (10) as the governing clause is extracted, as shown in Table 3.

Table 3
어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 감소하, 0, 수입, NULL]

Extraction with node (5) as the subordinate clause is now complete, so the next node with a connective ending, node (8), is found and a feature pattern is extracted with its governing node (10), as shown in Table 4.

Table 4
고 : [subordinate = 상승하, 0, 물가, NULL] [governing = 감소하, 0, 수입, NULL]

In this way, the per-ending feature pattern extraction unit 1073 extracts feature patterns as described above from every sentence of the raw Korean corpus S1, records their frequencies, and passes them to the connective ending feature pattern DB 1074.

The connective ending feature pattern DB 1074 stores each feature pattern extracted by unit 1073 as a key, with its frequency as the value.

So that features which never occur in the raw Korean corpus S1 can also be handled, the feature pattern expansion unit 1075 uses the feature patterns stored in the DB 1074 to create new feature patterns for each connective ending from combinations of their features. It then converts the expanded feature patterns into the format of the connective ending knowledge DB 106 used by the connective ending determination unit 104 and stores them there.

New feature patterns may be created by the following four methods.

The first method, lexical feature reduction, takes each feature pattern extracted by unit 1073 and, leaving the predicate lexeme and negation features of the subordinate and governing clauses unchanged, replaces the subject and object head words with NULL one at a time to create new patterns. The frequency of an expanded feature pattern is computed with Equation 2:

Equation 2
new feature pattern frequency = original feature pattern frequency × 0.5 × (number of non-NULL subject/object head-word features in the expanded pattern + 1) / (number of non-NULL subject/object head-word features in the original pattern)

For example, expanding the feature pattern of the example above gives the feature patterns and frequencies in Table 5. The rows of type "expanded" are the newly added patterns.

Table 5
Type | Feature pattern | Frequency
original | 어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 상승하, 0, 물가, NULL] | 14
expanded | 어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 상승하, 0, NULL, NULL] | 14 × 0.5 × (1 + 1)/2 = 7
expanded | 어서 : [subordinate = 오르, 0, NULL, NULL] [governing = 상승하, 0, 물가, NULL] | 14 × 0.5 × (1 + 1)/2 = 7
expanded | 어서 : [subordinate = 오르, 0, NULL, NULL] [governing = 상승하, 0, NULL, NULL] | 14 × 0.5 × (0 + 1)/2 = 3.5

The expanded feature patterns are then stored back into the connective ending feature pattern DB 1074. If a newly expanded pattern is already stored there, its frequency is increased by the frequency assigned in this expansion.
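The first method and the store-back step can be sketched as below. The sketch assumes patterns are tuples `(ending, subordinate_clause, governing_clause)` with clauses `(predicate, negated, subject_head, object_head)` and `None` for NULL, and it reads "one at a time" as NULLing every non-empty subset of the non-NULL head words, which is the reading that reproduces all rows of Table 5. The function names are hypothetical.

```python
from itertools import combinations

# (clause index, field index) of the subject/object head words inside (sub, gov).
HEAD_SLOTS = [(0, 2), (0, 3), (1, 2), (1, 3)]


def reduce_expand(pattern, freq):
    """Lexical feature reduction (Equation 2).

    Returns {expanded_pattern: frequency} for every way of NULLing one or more
    non-NULL head words; predicate and negation fields are never touched.
    """
    ending, sub, gov = pattern
    clauses = (sub, gov)
    filled = [s for s in HEAD_SLOTS if clauses[s[0]][s[1]] is not None]
    n = len(filled)
    out = {}
    for k in range(1, n + 1):
        for slots in combinations(filled, k):
            new = [list(c) for c in clauses]
            for ci, fi in slots:
                new[ci][fi] = None
            key = (ending, tuple(new[0]), tuple(new[1]))
            # Equation 2: the expanded pattern keeps n - k non-NULL head words.
            out[key] = freq * 0.5 * ((n - k) + 1) / n
    return out


def merge_into(db, expanded):
    """Store expanded patterns, adding to the frequency if the pattern already exists."""
    for key, freq in expanded.items():
        db[key] = db.get(key, 0.0) + freq


original = ("어서", ("오르", 0, "환율", None), ("상승하", 0, "물가", None))
db = {original: 14.0}
merge_into(db, reduce_expand(original, db[original]))
for key, freq in db.items():
    print(key, freq)  # the original plus the three expanded rows of Table 5
```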

The second method, feature pattern transition, works on the DB 1074 as expanded by the first method. For each connective ending, whenever the governing clause of one feature pattern matches the subordinate clause of another, the two non-matching clauses (the subordinate clause of the first and the governing clause of the second) are combined into a new feature pattern. This is applied for every connective ending, and the frequency of each new pattern is the average frequency of the original patterns multiplied by a fixed weight, as in Equation 3:

Equation 3
new feature pattern frequency = average frequency of the original feature patterns × 0.4

For example, applying this expansion to the two original feature patterns below gives the patterns and frequencies in Table 6.

Table 6
Type | Feature pattern | Frequency
original | 어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 상승하, 0, 물가, NULL] | 14
original | 어서 : [subordinate = 상승하, 0, 물가, NULL] [governing = 위축되, 0, 소비, NULL] | 6
expanded | 어서 : [subordinate = 오르, 0, 환율, NULL] [governing = 위축되, 0, 소비, NULL] | ((14 + 6)/2) × 0.4 = 4

As before, the expanded feature patterns are stored back into the DB 1074; if a newly expanded pattern is already stored there, its frequency is increased by the frequency assigned in this expansion.
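A sketch of the transition method on the two original rows of Table 6, with the same assumed tuple layout `(ending, subordinate_clause, governing_clause)`. Three choices here are assumptions rather than statements of the source: only patterns present before the call are combined (one pass), a chain that would link a clause to itself is skipped, and several derivations of the same new pattern are summed.

```python
def transition_expand(db, weight=0.4):
    """Feature pattern transition (Equation 3).

    For two patterns with the same ending where the governing clause of the
    first equals the subordinate clause of the second, create
    (ending, first.subordinate, second.governing) with frequency
    mean(f1, f2) * weight, then store it back into `db`.
    """
    snapshot = list(db.items())  # combine only patterns present before this pass
    new = {}
    for (e1, sub1, gov1), f1 in snapshot:
        for (e2, sub2, gov2), f2 in snapshot:
            if e1 == e2 and gov1 == sub2 and sub1 != gov2:
                key = (e1, sub1, gov2)
                new[key] = new.get(key, 0.0) + (f1 + f2) / 2 * weight
    for key, freq in new.items():  # re-store, accumulating existing frequencies
        db[key] = db.get(key, 0.0) + freq
    return new


db = {
    ("어서", ("오르", 0, "환율", None), ("상승하", 0, "물가", None)): 14.0,
    ("어서", ("상승하", 0, "물가", None), ("위축되", 0, "소비", None)): 6.0,
}
print(transition_expand(db))  # the "expanded" row of Table 6
```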

The third method, symmetric expansion, works on the DB 1074 after the first and second expansions. For connective endings such as "고", "거나", and "든지", where the two clauses have no subordinate/governing distinction and the meaning is preserved when their order is changed, each feature pattern is expanded by swapping its subordinate and governing clauses. The frequency of the expanded pattern is computed with Equation 4:

Equation 4
new feature pattern frequency = original feature pattern frequency × 0.2

A low weight relative to the original frequency is used because, even with endings such as "고", "거나", and "든지", the clause order quite often cannot be swapped freely, since these endings carry several different meanings.

For example, expanding the original feature pattern below gives the result in Table 7.

Table 7
Type | Feature pattern | Frequency
original | 고 : [subordinate = 상승하, 0, 물가, NULL] [governing = 감소하, 0, 수입, NULL] | 8
expanded | 고 : [subordinate = 감소하, 0, 수입, NULL] [governing = 상승하, 0, 물가, NULL] | 8 × 0.2 = 1.6

As with the first and second methods, the expanded feature patterns are stored back into the DB 1074, and if a newly expanded pattern is already stored there, its frequency is increased by the frequency assigned in this expansion.
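A sketch of the symmetric expansion, assuming the same `(ending, subordinate_clause, governing_clause)` tuples and taking the coordinate endings to be exactly the three the text names. New frequencies are computed from a snapshot before being stored back, so a pattern and its mirror image each gain 0.2 of the other's original frequency.

```python
COORDINATE = {"고", "거나", "든지"}


def symmetric_expand(db, weight=0.2):
    """Symmetric expansion (Equation 4): swap the clauses of coordinate-ending patterns."""
    new = {}
    for (ending, sub, gov), freq in list(db.items()):
        if ending in COORDINATE and sub != gov:
            key = (ending, gov, sub)
            new[key] = new.get(key, 0.0) + freq * weight
    for key, freq in new.items():  # re-store, accumulating existing frequencies
        db[key] = db.get(key, 0.0) + freq
    return new


db = {("고", ("상승하", 0, "물가", None), ("감소하", 0, "수입", None)): 8.0}
print(symmetric_expand(db))  # the "expanded" row of Table 7
```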

Finally, the fourth method, semantic generalization, works on the DB 1074 after the preceding expansions. Using the semantic dictionary employed in Chinese-Korean translation, the subject and object head words of the subordinate and governing clauses are generalized to their semantic classes. The frequency of each resulting semantic feature pattern is the original pattern's frequency multiplied by 0.1. The new semantic feature patterns and their frequencies are also stored in the DB 1074; if a pattern is already stored, the new frequency is added to the stored one.
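A sketch of the semantic generalization, again on `(ending, subordinate_clause, governing_clause)` tuples. The dictionary and its class labels are hypothetical, all head words of a pattern are generalized at once (mirroring how the semantic predicate pattern ys is built at lookup time), and patterns the dictionary leaves unchanged, such as ones whose head words are all NULL, are skipped; these are assumptions of the sketch.

```python
# Hypothetical semantic dictionary; the class labels are invented for illustration.
SEM_DICT = {"환율": "RATE", "물가": "PRICE", "수입": "TRADE"}


def generalize_clause(clause, sem_dict):
    """Replace non-NULL subject/object head words by their semantic class."""
    pred, neg, subj, obj = clause

    def to_class(word):
        return None if word is None else sem_dict.get(word, word)

    return (pred, neg, to_class(subj), to_class(obj))


def semantic_expand(db, sem_dict, weight=0.1):
    """Semantic generalization: new pattern frequency = original frequency x 0.1."""
    new = {}
    for (ending, sub, gov), freq in list(db.items()):
        key = (ending, generalize_clause(sub, sem_dict), generalize_clause(gov, sem_dict))
        if key != (ending, sub, gov):  # skip patterns the dictionary does not change
            new[key] = new.get(key, 0.0) + freq * weight
    for key, freq in new.items():  # re-store, accumulating existing frequencies
        db[key] = db.get(key, 0.0) + freq
    return new


db = {
    ("어서", ("오르", 0, "환율", None), ("상승하", 0, "물가", None)): 14.0,
    ("어서", ("오르", 0, None, None), ("상승하", 0, None, None)): 3.5,
}
print(semantic_expand(db, SEM_DICT))
```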

The feature pattern expansion unit 1075 also converts the connective ending feature pattern DB 1074, including the expanded feature patterns, into the connective ending knowledge DB 106 used by the connective ending determination unit 104. The connective ending knowledge DB 106 may take the form shown in Table 8.

Table 8
Predicate pattern | Connective ending information
[subordinate = predicate lexeme, negation, subject head word, object head word] [governing = predicate lexeme, negation, subject head word, object head word] | (ending 1, probability 1), (ending 2, probability 2), …, (ending p, probability p)

That is, for each feature pattern in the DB 1074, the subordinate and governing clauses (everything except the connective ending) become a predicate pattern of the knowledge DB 106, and the pattern's connective endings and their frequencies become its connective ending information. The generation probability of each connective ending is its frequency for that predicate pattern divided by the total frequency of the predicate pattern, so the generation probabilities in the connective ending information of each predicate pattern sum to 1.0. The knowledge DB 106 stores the predicate pattern as the key and the connective ending information as the value.

For example, Table 9 shows a predicate pattern and its connective ending information as stored in the connective ending knowledge DB 106.

Table 9
Key (predicate pattern) | Value (connective ending information)
[subordinate = 오르, 0, 환율, NULL] [governing = 상승하, 0, 물가, NULL] | (어서 0.5622 [= 14/24.9]), (고 0.1687 [= 4.2/24.9]), (며 0.1406 [= 3.5/24.9]), (면 0.1285 [= 3.2/24.9])

The expressions inside [ ] show how each generation probability is computed.
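The conversion from the feature pattern DB to the knowledge DB can be sketched as a group-and-normalize step. The tuple layout `(ending, subordinate_clause, governing_clause)` and the function name are assumptions of this sketch; the example frequencies are the ones behind Table 9, and the printed probabilities match that table.

```python
from collections import defaultdict


def build_knowledge_db(feature_db):
    """Convert {(ending, sub, gov): frequency} into {(sub, gov): {ending: probability}}.

    Each ending's probability is its frequency for the predicate pattern divided
    by the pattern's total frequency, so the probabilities of a pattern sum to 1.
    """
    grouped = defaultdict(dict)
    for (ending, sub, gov), freq in feature_db.items():
        endings = grouped[(sub, gov)]
        endings[ending] = endings.get(ending, 0.0) + freq
    knowledge_db = {}
    for pattern, endings in grouped.items():
        total = sum(endings.values())
        knowledge_db[pattern] = {e: f / total for e, f in endings.items()}
    return knowledge_db


y = (("오르", 0, "환율", None), ("상승하", 0, "물가", None))
feature_db = {("어서", *y): 14.0, ("고", *y): 4.2, ("며", *y): 3.5, ("면", *y): 3.2}
for ending, prob in build_knowledge_db(feature_db)[y].items():
    print(ending, round(prob, 4))  # 어서 0.5622, 고 0.1687, 며 0.1406, 면 0.1285
```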

Thus, when Chinese-Korean machine translation renders a Chinese sentence whose clauses are logically connected without any explicit connecting word, the present invention solves the problem that the logical connection between the translated Korean clauses is not expressed explicitly. By expressing that connection with a Korean connective ending, the meaning of the Chinese source is rendered as a natural Korean sentence, improving the translation quality of Chinese-Korean machine translation.

While specific embodiments have been described in the detailed description of the present invention, various modifications are of course possible without departing from the scope of the invention. The scope of the present invention is therefore not limited to the described embodiments and should be defined by the appended claims and their equivalents.

FIG. 1 is a block diagram of an apparatus for generating Korean connective endings for Chinese-Korean machine translation according to an embodiment of the present invention;

FIG. 2 is a diagram of the connective ending knowledge construction unit, which extracts connective ending knowledge from a raw Korean corpus and builds it into the connective ending knowledge DB, according to an embodiment of the present invention;

FIG. 3 is a diagram of a dependency tree generated by the Korean syntactic analysis unit according to an embodiment of the present invention.

<Description of reference numerals for the main parts of the drawings>

101: Chinese morphological analysis unit  102: Chinese syntactic analysis unit

103: Chinese/Korean conversion unit  104: connective ending determination unit

105: Korean generation unit  106: connective ending knowledge DB

107: connective ending knowledge construction unit

Claims

Dividing a Chinese input sentence into Chinese words and selecting a morpheme part of speech for each separated word;

Generating a Chinese syntax tree based on the morpheme parts of speech;

Converting the Chinese phrase tree into a Korean phrase tree, and converting a word in the Chinese phrase tree into a Korean band word;

Generating a connection ending between each short sentence using a connection ending knowledge DB extracted from a Korean raw corpus when the Korean syntax tree is a plurality of short syntax trees;

Rearranging Korean words according to Korean word order using the Korean syntax tree in which the connected endings are generated, and generating Korean through a function of generating a functional word according to Korean grammar and verbal modality;

Extracting connection mother knowledge from the Korean primitive corpus and constructing the connection mother knowledge in the DB;

Method of generating Korean connected endings for automatic Korean-Chinese translation.

The method of claim 1,

wherein generating the connective endings comprises:

Obtaining the generation probability of each Korean connective ending by taking the Korean syntax tree as input and using the connective ending knowledge DB automatically extracted from the Korean raw corpus; and

Selecting the combination of inter-clause connective endings best suited to the whole sentence based on the generation probabilities of the connective endings.

The method of claim 2,

wherein the generation probability of a connective ending e is calculated using the following equation:

Equation

Generation probability of connective ending e = 0.8 × generation probability of e by the lexical verb pattern + 0.2 × generation probability of e by the semantic verb pattern
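The weighted interpolation in the equation above can be illustrated with a short Python sketch. Only the weights 0.8 and 0.2 come from the claim. The function name `interpolate`, the romanized ending labels, and the convention that an ending missing from one distribution contributes probability 0 are assumptions made for illustration.

```python
from typing import Dict

LEXICAL_WEIGHT = 0.8   # weight of the lexical verb pattern (from the claim)
SEMANTIC_WEIGHT = 0.2  # weight of the semantic verb pattern (from the claim)


def interpolate(p_lex: Dict[str, float], p_sem: Dict[str, float]) -> Dict[str, float]:
    """Combine per-ending probabilities: P(e) = 0.8 * P_lex(e) + 0.2 * P_sem(e).

    Assumption: an ending absent from one distribution has probability 0 there.
    """
    endings = set(p_lex) | set(p_sem)
    return {
        e: LEXICAL_WEIGHT * p_lex.get(e, 0.0) + SEMANTIC_WEIGHT * p_sem.get(e, 0.0)
        for e in endings
    }


# Hypothetical romanized endings: "-aseo" (cause/sequence), "-go" (coordinate).
scores = interpolate({"-aseo": 0.5}, {"-aseo": 0.9, "-go": 0.5})
print(scores)
```

In this example "-aseo" scores 0.8 × 0.5 + 0.2 × 0.9 = 0.58. "-go" is seen only by the semantic pattern and scores 0.2 × 0.5 = 0.1.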

The method of claim 3, wherein

the generation probability of connective ending e by the lexical verb pattern is obtained by:

calculating, when the verb pattern is found in the connective ending knowledge DB, the generation probability of connective ending e from the connective ending information of the found verb pattern;

when the generation probability of connective ending e by the lexical verb pattern is not obtained by the above calculation, and the verb pattern y has m non-NULL subject or object head-word lexical features, repeating the following step for i = 1 to m until the generation probability of connective ending e is greater than 0; and

if any of the lexical verb patterns created by setting i of the non-NULL subject or object head-word lexical features of verb pattern y to NULL is found in the connective ending knowledge DB, calculating the average of the generation probabilities of connective ending e in the connective ending information of the found verb patterns.
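The NULL back-off described above can be sketched in Python. Several details are illustrative assumptions rather than details given in the patent: the verb-pattern layout (verb, negation, subject head word and object head word for each of the dependent and governing clauses), the use of `None` for NULL, the DB shape (pattern mapped to {ending: probability}), and the choice to back off when the exact lookup yields 0.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical verb-pattern layout; None stands for NULL, "pos" = not negated:
# (dep_verb, dep_neg, dep_subj, dep_obj, gov_verb, gov_neg, gov_subj, gov_obj)
Pattern = Tuple[Optional[str], ...]
# Hypothetical DB layout: verb pattern -> {connective ending: probability}
KnowledgeDB = Dict[Pattern, Dict[str, float]]

SUBJ_OBJ_SLOTS = (2, 3, 6, 7)  # subject/object head-word positions


def lexical_prob(y: Pattern, e: str, db: KnowledgeDB) -> float:
    """Generation probability of ending e for lexical verb pattern y."""
    # Step 1: exact lookup of the verb pattern.
    p = db.get(y, {}).get(e, 0.0)
    if p > 0:
        return p
    # Step 2: for i = 1..m, set i of the m non-NULL subject/object head
    # words to NULL and look up every reduced pattern.
    filled = [k for k in SUBJ_OBJ_SLOTS if y[k] is not None]
    for i in range(1, len(filled) + 1):
        found = []
        for slots in combinations(filled, i):
            reduced = tuple(None if k in slots else v for k, v in enumerate(y))
            if reduced in db:
                found.append(db[reduced].get(e, 0.0))
        # Step 3: average over the reduced patterns found at this level;
        # stop as soon as the average is greater than 0.
        if found and sum(found) > 0:
            return sum(found) / len(found)
    return 0.0


y = ("go", "pos", "I", "school", "eat", "pos", "I", "rice")
db = {
    ("go", "pos", None, "school", "eat", "pos", "I", "rice"): {"-aseo": 0.6, "-go": 0.4},
    ("go", "pos", "I", "school", "eat", "pos", None, "rice"): {"-aseo": 0.2, "-go": 0.8},
}
print(lexical_prob(y, "-aseo", db))  # averages 0.6 and 0.2 (i = 1)
```

Here y itself is absent from the DB. Two of the four single-NULL reductions are present, so the result is their average, about 0.4.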

The method of claim 3, wherein

the generation probability of connective ending e by the semantic verb pattern is obtained by:

generating a semantic verb pattern ys by replacing each non-NULL subject or object head-word lexical feature of the verb pattern with the semantic class of the corresponding word, searching the connective ending knowledge DB for ys, and calculating the generation probability of connective ending e from the connective ending information of the found semantic verb pattern;

when the generation probability of connective ending e by the semantic verb pattern is not obtained by the above calculation, and the semantic verb pattern ys has m non-NULL subject or object head-word features, repeating the following step for i = 1 to m until the generation probability of connective ending e is greater than 0; and

if any of the semantic verb patterns created by setting i of the non-NULL subject or object head-word features of ys to NULL is found in the connective ending knowledge DB, calculating the average of the generation probabilities of connective ending e in the connective ending information of the found semantic verb patterns.
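The first step of this claim, building the semantic verb pattern ys and looking it up, can be sketched in Python. The pattern layout, the stand-in semantic dictionary, its codes (`HUMAN`, `PLACE`, ...) and the rule that unknown words are kept unchanged are hypothetical. The NULL back-off in the last two steps follows the same procedure as for lexical patterns and is omitted.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pattern layout; None stands for NULL, "pos" = not negated:
# (dep_verb, dep_neg, dep_subj, dep_obj, gov_verb, gov_neg, gov_subj, gov_obj)
Pattern = Tuple[Optional[str], ...]
SUBJ_OBJ_SLOTS = (2, 3, 6, 7)  # subject/object head-word positions


def to_semantic_pattern(y: Pattern, sem_dict: Dict[str, str]) -> Pattern:
    """Build ys: replace each non-NULL subject/object head word with its
    semantic code. Verbs and negation are left untouched; words missing
    from the dictionary are kept as-is (an assumption)."""
    return tuple(
        sem_dict.get(v, v) if k in SUBJ_OBJ_SLOTS and v is not None else v
        for k, v in enumerate(y)
    )


def semantic_prob_exact(
    y: Pattern, e: str, db: Dict[Pattern, Dict[str, float]], sem_dict: Dict[str, str]
) -> float:
    """Exact lookup of ys in the knowledge DB (before any NULL back-off)."""
    ys = to_semantic_pattern(y, sem_dict)
    return db.get(ys, {}).get(e, 0.0)


# Hypothetical semantic dictionary standing in for the one used in MT.
sem_dict = {"I": "HUMAN", "school": "PLACE", "rice": "FOOD", "go": "MOTION"}
y = ("go", "pos", "I", "school", "eat", "pos", "I", "rice")
print(to_semantic_pattern(y, sem_dict))
# ('go', 'pos', 'HUMAN', 'PLACE', 'eat', 'pos', 'HUMAN', 'FOOD')
```

The verb "go" is in the dictionary but stays lexical, because the claim generalizes only subject and object head words.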

The method of claim 2,

wherein selecting the combination of inter-clause connective endings comprises:

avoiding the same connective ending on neighboring clauses when the connection is not a coordinate (parallel) connection;

calculating the generation probability of each connective ending combination for the whole sentence by multiplying the generation probabilities of the connective endings in the combination; and

selecting the combination with the highest calculated generation probability.
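The combination search above can be sketched in Python by exhaustive enumeration. The reading of "avoiding neighbors" is an assumption: the same ending may not appear at two adjacent clause boundaries unless it is coordinate. The function name, the input format and the romanized endings are also hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, List, Optional, Set, Tuple


def best_combination(
    candidates: List[Dict[str, float]], coordinate: Set[str]
) -> Tuple[Optional[Tuple[str, ...]], float]:
    """Pick the ending combination with the highest product of probabilities.

    candidates[j] maps each candidate ending at clause boundary j to its
    generation probability; `coordinate` holds the coordinate endings.
    """
    best: Optional[Tuple[str, ...]] = None
    best_p = 0.0
    for combo in product(*(c.keys() for c in candidates)):
        # Assumed neighbor rule: reject a repeated non-coordinate ending.
        if any(a == b and a not in coordinate for a, b in zip(combo, combo[1:])):
            continue
        # Sentence-level probability = product of the per-boundary probabilities.
        p = prod(cand[e] for cand, e in zip(candidates, combo))
        if p > best_p:
            best, best_p = combo, p
    return best, best_p


candidates = [{"-aseo": 0.6, "-go": 0.3}, {"-aseo": 0.7, "-go": 0.2}]
print(best_combination(candidates, coordinate={"-go"}))
```

Without the neighbor rule, ("-aseo", "-aseo") would win with 0.6 × 0.7 = 0.42. With the rule, ("-go", "-aseo") wins with 0.3 × 0.7 = 0.21. Enumeration is exponential in the number of clauses. That is acceptable for a handful of clauses; a Viterbi-style search over adjacent pairs would scale better.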

The method of claim 1,

wherein building the connective ending knowledge DB comprises:

inputting sentences one by one from the Korean raw corpus, separating them into morpheme units, and selecting a part of speech for each separated morpheme;

generating a Korean syntax tree by analyzing the syntactic relationships between words using the selected morpheme parts of speech;

extracting, using the generated Korean syntax tree, a feature pattern for each connective ending from a verb phrase node bearing a connective ending and the syntax node corresponding to the governing node of that verb phrase node; and

combining the feature patterns in the connective ending feature pattern DB by connective ending to generate new feature patterns that were not extracted from the Korean raw corpus, and building them into the connective ending knowledge DB.
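The extraction step can be sketched in Python on a toy dependency tree. Following the feature pattern DB described for the apparatus, each pattern is stored as a key and its frequency as a value. The `ClauseNode` class, its four features, the example sentence and the romanized ending are hypothetical stand-ins for the real parser output.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ClauseNode:
    """Hypothetical verb-phrase node of a Korean dependency tree."""
    verb: str
    neg: bool = False
    subj: Optional[str] = None    # subject head word; None = NULL
    obj: Optional[str] = None     # object head word; None = NULL
    ending: Optional[str] = None  # connective ending linking it to its governor
    deps: List["ClauseNode"] = field(default_factory=list)

    def features(self) -> Tuple[str, bool, Optional[str], Optional[str]]:
        return (self.verb, self.neg, self.subj, self.obj)


def extract_patterns(governor: ClauseNode, db: Counter) -> None:
    """Count (dependent features, governing features, ending) for every
    dependent verb node that carries a connective ending."""
    for dep in governor.deps:
        if dep.ending is not None:
            db[(dep.features(), governor.features(), dep.ending)] += 1
        extract_patterns(dep, db)  # recurse into embedded clauses


# Toy tree for a sentence glossed as "I went to school and then ate rice".
root = ClauseNode("eat", obj="rice",
                  deps=[ClauseNode("go", subj="I", ending="-aseo")])
feature_db: Counter = Counter()
extract_patterns(root, feature_db)
print(feature_db)
```

Running the same function over every parsed sentence accumulates frequencies in `feature_db`. Those frequencies are what the expansion methods later reuse.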

The method of claim 7, wherein

extracting the feature patterns comprises:

extracting a feature pattern using the syntax nodes related to a verb phrase node, based on the Korean syntax tree;

repeating the step of extracting the feature pattern when the syntax node is connected by a coordinate connective ending;

when the syntax node bears a connective ending and one of its dependent syntax nodes is a verb phrase node bearing a coordinate connective ending, extracting a new feature pattern for the connective ending of the dependent syntax node by using the dependent syntax node and the governing syntax node of the dependent syntax node; and

repeating the step of extracting the new feature pattern when the connective ending of the last dependent syntax node is a coordinate connective ending.

The method of claim 7, wherein

expanding the new feature patterns and building them into the connective ending knowledge DB comprises:

expanding the extracted feature patterns by a lexical feature reduction expansion method, which changes the subject and object head-word lexical features of the dependent clause and the governing clause to NULL while keeping the verb and negation features;

generating new feature patterns from the expanded features by a feature pattern transition method: where the governing clause of one feature pattern coincides with the dependent clause of another feature pattern, the remaining non-coinciding dependent clause and governing clause are combined into a new feature pattern;

expanding the features obtained by the lexical feature reduction method and the feature pattern transition method by a symmetric expansion method, which does not distinguish the dependent clause from the governing clause, so that the pattern also holds when the order of the clauses is reversed; and

expanding, by a feature pattern extension method, the subject and object vocabulary of the dependent clause and the governing clause to their corresponding semantic classes using the semantic dictionary used in Chinese-Korean translation.

The method of claim 9,

wherein the frequency of a feature pattern expanded by the lexical feature reduction expansion method is calculated by

Equation

New feature pattern frequency = original feature pattern frequency × 0.5 × (number of non-NULL subject and object head-word lexical features in the expanded feature pattern + 1) / (number of subject and object head-word lexical features in the original feature pattern)
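The lexical feature reduction expansion and its frequency formula can be sketched together in Python. Several details are assumptions: the pattern layout, the use of `None` for NULL, and counting only non-NULL head words of the original pattern as its "number of subject and object head-word lexical features". The claims do not say how frequencies combine when different originals reduce to the same pattern, and this sketch does not handle that case.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical pattern layout; None stands for NULL, "pos" = not negated:
# (dep_verb, dep_neg, dep_subj, dep_obj, gov_verb, gov_neg, gov_subj, gov_obj)
Pattern = Tuple[Optional[str], ...]
SUBJ_OBJ_SLOTS = (2, 3, 6, 7)  # only these are reduced; verb/negation are kept


def reduce_expand(pattern: Pattern, freq: float) -> Dict[Pattern, float]:
    """Lexical feature reduction expansion with frequency
    freq * 0.5 * (non-NULL head words left + 1) / (head words originally)."""
    filled = [k for k in SUBJ_OBJ_SLOTS if pattern[k] is not None]
    n = len(filled)  # assumption: non-NULL head words in the original
    expanded: Dict[Pattern, float] = {}
    for r in range(1, n + 1):  # r = number of head words set to NULL
        for slots in combinations(filled, r):
            new = tuple(None if k in slots else v for k, v in enumerate(pattern))
            expanded[new] = freq * 0.5 * ((n - r) + 1) / n
    return expanded


orig = ("go", "pos", "I", "school", "eat", "pos", "I", "rice")
all_null = ("go", "pos", None, None, "eat", "pos", None, None)
print(reduce_expand(orig, 8.0)[all_null])  # 8 × 0.5 × (0 + 1) / 4 = 1.0
```

With four head words, the pattern yields 2^4 − 1 = 15 reduced variants. Dropping a single head word gives 8 × 0.5 × 4 / 4 = 4.0, so heavily reduced and less specific patterns receive proportionally smaller frequencies.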

The method of claim 9,

wherein the frequency of a feature pattern generated by the feature pattern transition method is calculated by

Equation

New feature pattern frequency = mean frequency of original feature patterns × 0.4

The method of claim 9,

wherein the frequency of a feature pattern generated by the symmetric expansion method is calculated by

Equation

New feature pattern frequency = original feature pattern frequency × 0.2
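The feature pattern transition and symmetric expansion methods, with the 0.4 and 0.2 frequency factors, can be sketched in Python. Several choices are illustrative assumptions: only patterns with the same connective ending are chained, an existing pattern is never overwritten, self-loops are skipped, and the strongest chain is kept when several chains yield the same pattern. The pattern representation is also hypothetical.

```python
from typing import Dict, Tuple

Clause = Tuple[str, ...]              # hypothetical clause-feature tuple
Pattern = Tuple[Clause, Clause, str]  # (dependent, governing, connective ending)

TRANSITION_FACTOR = 0.4  # feature pattern transition method
SYMMETRIC_FACTOR = 0.2   # symmetric expansion method


def transition_expand(db: Dict[Pattern, float]) -> Dict[Pattern, float]:
    """Chain (a, b, e) and (b, c, e) into (a, c, e) with freq = mean * 0.4."""
    new: Dict[Pattern, float] = {}
    for (dep1, gov1, e1), f1 in db.items():
        for (dep2, gov2, e2), f2 in db.items():
            # Assumption: chain only same-ending patterns, and skip (a, a, e).
            if e1 == e2 and gov1 == dep2 and dep1 != gov2:
                key = (dep1, gov2, e1)
                if key not in db:
                    f = (f1 + f2) / 2 * TRANSITION_FACTOR
                    new[key] = max(new.get(key, 0.0), f)  # keep strongest chain
    return new


def symmetric_expand(db: Dict[Pattern, float]) -> Dict[Pattern, float]:
    """Swap dependent and governing clauses with freq = original * 0.2."""
    return {
        (gov, dep, e): f * SYMMETRIC_FACTOR
        for (dep, gov, e), f in db.items()
        if (gov, dep, e) not in db
    }


A, B, C = ("go",), ("eat",), ("sleep",)
db = {(A, B, "-go"): 10.0, (B, C, "-go"): 6.0}
print(transition_expand(db))  # {(('go',), ('sleep',), '-go'): 3.2}
print(symmetric_expand(db))
```

The chained pattern gets (10 + 6) / 2 × 0.4 = 3.2. The swapped patterns get 10 × 0.2 = 2.0 and 6 × 0.2 = 1.2. These factors rank derived patterns below patterns actually observed in the corpus.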

a Chinese morphological analysis unit that separates a Chinese input sentence into Chinese words and analyzes morphemes to select a part of speech for each separated word;

a Chinese syntax analysis unit that parses the sentence to generate a Chinese syntax tree based on the morpheme parts of speech;

a Chinese-Korean conversion unit that converts the Chinese syntax tree into a Korean syntax tree and converts each word of the Chinese syntax tree into its Korean target word;

a connective ending determination unit that generates connective endings between the simple clauses, using the connective ending knowledge DB extracted from a Korean raw corpus, when the Korean syntax tree consists of a plurality of simple-clause syntax trees; and

a Korean generation unit that rearranges the Korean words into Korean word order using the Korean syntax tree in which the connective endings have been generated, and generates Korean through a function-word generation function according to Korean grammar and verbal modality,

An apparatus for generating Korean connective endings for Chinese-Korean automatic translation, comprising the above units.

The apparatus of claim 13,

wherein the Korean connective ending generation apparatus

further comprises a connective ending knowledge construction unit that extracts connective ending knowledge from the Korean raw corpus and builds it into the connective ending knowledge DB.

The apparatus of claim 14,

wherein the connective ending knowledge construction unit comprises:

a Korean morphological analysis unit that takes sentences one by one from the Korean raw corpus, separates them into morpheme units, and selects a part of speech for each separated morpheme;

a Korean syntax analysis unit that generates a Korean syntax tree by analyzing the syntactic relations between words using the selected morpheme parts of speech;

a per-connective-ending feature pattern extraction unit that, using the generated Korean syntax tree, extracts a feature pattern for each connective ending from a verb phrase node bearing a connective ending and the syntax node corresponding to the governing node of that verb phrase node;

a connective ending feature pattern DB that stores each extracted feature pattern as a key and its frequency as a value; and

a feature pattern expansion unit that combines the feature patterns in the connective ending feature pattern DB to generate new feature patterns not extracted from the Korean raw corpus, and builds them into the connective ending knowledge DB.