KR100420474B1

KR100420474B1 - Apparatus and method of long sentence translation using partial sentence frame

Info

Publication number: KR100420474B1
Application number: KR10-2000-0083295A
Authority: KR
Inventors: 노윤형; 박상규; 최승권; 김영길; 서영애
Original assignee: 한국전자통신연구원
Priority date: 2000-12-27
Filing date: 2000-12-27
Publication date: 2004-03-02
Also published as: KR20020054244A

Abstract

1. Technical field of the invention described in the claims

The present invention relates to a long sentence translation apparatus using partial sentence frames, and to a method therefor.

2. Technical problem to be solved by the invention

An object of the present invention is to provide a long sentence translation apparatus that produces high-quality, high-coverage translation results for long sentences by using clause-level partial sentence frames, a method therefor, and a computer-readable recording medium storing a program for implementing the method.

3. Summary of the solution

The present invention provides a long sentence translation method applied to a long sentence translation apparatus using partial sentence frames, the method comprising: a long sentence segmentation step of splitting a long sentence into one or more short sentences according to the result of preprocessing the long sentence; a short sentence translation step of recognizing the split short sentences and translating them through sentence frame matching; a determination step of searching for a whole sentence frame, performing whole sentence frame translation, and then determining whether the whole sentence frame translation succeeded; and an iteration and output step of outputting the translation result if the determination is success and, if it is failure, repeating the process of combining and translating partial sentence frames and the process of searching for and translating the whole sentence frame.

4. Important uses of the invention

The present invention is used in machine translation apparatuses and the like.

Description

Apparatus and method of long sentence translation using partial sentence frame

The present invention relates to a long sentence translation apparatus that, in sentence-frame-based automatic translation, produces high-quality, high-coverage translation results for long sentences by using clause-level partial sentence frames, to a method therefor, and to a computer-readable recording medium storing a program for implementing the method. More specifically, to address the marked drop in sentence frame coverage that occurs for long sentences in sentence-frame-based automatic translation, and the loss of translation quality inherent in long sentence translation, the invention recognizes clause-level partial sentence frames in a long sentence, determines the order in which the partial sentence frames are combined through clause-level structure analysis, and performs sentence frame matching and sentence frame combination iteratively to produce natural translations with high coverage.

First, the terms used in the present invention are defined as follows.

In the sentence-frame-based approach, a protector is a word that reflects the character of a sentence, such as a verb or a conjunction, at which the ambiguity of syntactic analysis begins to grow explosively. Recognizing the protectors therefore reveals the form of the sentence and yields clues for translation.

A sentence frame is a simplified form obtained by processing a sentence. The constituents of a sentence frame are called slots; a slot is either a protector or a syntactic element between protectors.

A syntactic element carries the syntactic character obtained by parsing the part of the sentence between protectors, and is written as, for example, "NP" for a noun phrase or "PP" for a prepositional phrase.

A partial sentence frame is a sentence frame that does not cover the entire input sentence but covers a clause-level portion of it.
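The terms defined above can be pictured with a small data structure. The following is a minimal Python sketch, not the patent's implementation: the `Slot` class, the `frame_key` helper, and the choice of writing a protector as its surface word are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Slot:
    """One constituent of a sentence frame (hypothetical representation)."""
    is_protector: bool  # True for verbs, conjunctions, etc.
    label: str          # protector word, or a syntactic tag such as "NP" / "PP"


def frame_key(slots: List[Slot]) -> str:
    """Flatten a slot sequence into a string usable as a database lookup key."""
    return " ".join(slot.label for slot in slots)


# "I think that he likes apples" reduced to protectors and syntactic tags.
example_frame = [
    Slot(False, "NP"),
    Slot(True, "think"),
    Slot(True, "that"),
    Slot(False, "NP"),
    Slot(True, "likes"),
    Slot(False, "NP"),
]

# A partial sentence frame covers only a clause-level part of the input.
partial_frame = example_frame[3:]

print(frame_key(example_frame))
print(frame_key(partial_frame))
```

Under this representation, a partial sentence frame is simply a contiguous sub-sequence of slots that corresponds to one clause.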

Next, the prior art and its problems are described.

Sentence-frame-based translation was proposed to solve the problems of bottom-up parsing in conventional machine translation, namely the explosion of ambiguity and the unbounded generation of target-language structures.

Conventional sentence-frame-based automatic translation uses the notion of a sentence frame consisting of protectors, which reflect the structure of the sentence, and the syntactic elements between them. This limits the scope of parsing and so prevents ambiguity from growing. By finding a predefined frame for the sentence, it also prevents unbounded generation of target-language structures and greatly improves translation quality.

However, in such conventional sentence-frame-based automatic translation, the number of sentence frames that must be built increases sharply as sentences become longer, and the sentence frame matching success rate falls, resulting in a serious coverage problem.

To solve this problem, long sentences need to be split and processed in smaller units. Existing long sentence segmentation methods, however, assume well-formed sentences and use a limited set of patterns, so they are not practical for the sentences that actually occur. Moreover, when the translations of the short sentences are joined after segmentation to produce the overall translation, the clause-level structure analysis relies on only a handful of rules, which limits the ability to produce a natural translation that connects the whole sentence.

Accordingly, the present invention has been proposed to solve the above problems, and its object is to provide a long sentence translation apparatus that produces high-quality, high-coverage translation results for long sentences by using clause-level partial sentence frames, a method therefor, and a computer-readable recording medium storing a program for implementing the method.

That is, to solve the translation coverage problem that arises as sentences become longer in sentence-frame-based automatic translation, and the unnatural output typical of long sentence translation, the present invention reduces the length of the sentence frames to be processed through sentence segmentation, short sentence translation, and substitution, and performs translation by repeating, step by step, sentence frame matching that reflects the overall clause-level structure and combination of partial sentence frames. Its object is to provide a long sentence translation apparatus that thereby produces semantically natural target sentences while solving the coverage problem of the sentence-frame-based approach, a method therefor, and a computer-readable recording medium storing a program for implementing the method.

FIG. 1 is a block diagram of an embodiment of a long sentence translation apparatus using partial sentence frames according to the present invention.

FIG. 2 is a flowchart of an embodiment of a long sentence translation method using partial sentence frames according to the present invention.

FIG. 3 shows an example of long sentence translation (English-to-Korean) using partial sentence frames according to the present invention.

FIG. 4 is a flowchart of an embodiment of the sentence segmentation process according to the present invention.

FIG. 5 is a detailed flowchart of an embodiment of the sentence segmentation process according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

101: input unit
102: morphological analysis unit
103: part-of-speech determination unit
104: fixed expression recognition unit
105: protector detection unit
106: partial parsing unit
107: source sentence frame generation unit
108: partial sentence frame processing unit
109: target sentence frame selection unit
110: target word generation unit

To achieve the above object, the apparatus of the present invention is a long sentence translation apparatus using partial sentence frames, comprising: preprocessing means for preprocessing an input long sentence to obtain syntactic information; source sentence frame generation means for generating a source sentence frame for the input long sentence using the syntactic information obtained by the preprocessing means; and partial sentence frame processing means for translating the long sentence by splitting the source sentence frame generated by the source sentence frame generation means into partial sentence frames, recognizing and translating short sentences, and then repeatedly performing whole sentence frame search and translation and partial sentence frame combination and translation.

The method of the present invention is a long sentence translation method applied to a long sentence translation apparatus using partial sentence frames, comprising: a long sentence segmentation step of splitting a long sentence into one or more short sentences according to the result of preprocessing the long sentence; a short sentence translation step of recognizing the split short sentences and translating them through sentence frame matching; a determination step of searching for a whole sentence frame, performing whole sentence frame translation, and then determining whether the whole sentence frame translation succeeded; and an iteration and output step of outputting the translation result if the determination is success and, if it is failure, repeating the process of combining and translating partial sentence frames and the process of searching for and translating the whole sentence frame.

The present invention further provides a computer-readable recording medium storing a program that, in order to translate a long sentence using partial sentence frames, causes a long sentence translation apparatus having a processor to realize: a long sentence segmentation function of splitting a long sentence into one or more short sentences according to the result of preprocessing the long sentence; a short sentence translation function of recognizing the split short sentences and translating them through sentence frame matching; a determination function of searching for a whole sentence frame, performing whole sentence frame translation, and then determining whether the whole sentence frame translation succeeded; and an iteration and output function of outputting the translation result if the determination is success and, if it is failure, repeating the process of combining and translating partial sentence frames and the process of searching for and translating the whole sentence frame.

As described, the present invention is characterized by a process that repeats partial sentence frame recognition and clause-level structure analysis, partial sentence frame translation and substitution, whole sentence frame search, and partial sentence frame combination.

That is, in the present invention, partial sentence frame recognition and inter-frame structure analysis consist of: extracting split points; applying split point syntactic patterns; recognizing short sentence starting points; recognizing and translating short sentences through short sentence restoration and frame search; and selecting the correct starting point from among several starting point candidates through starting point to main verb matching and frame search. The translation process that uses the result of the inter-frame structure analysis repeats partial sentence frame translation and substitution, search for the reduced sentence frame, and partial sentence frame combination, step by step, until translation succeeds, so that translation proceeds as a compromise between rule-based and pattern-based approaches.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of an embodiment of a long sentence translation apparatus using partial sentence frames according to the present invention.

First, the source text received through the input unit (101) is passed to the morphological analysis unit (102), which analyzes the morphemes of each word, and the part-of-speech determination unit (103) then determines the part of speech of each word.

The fixed expression recognition unit (104) groups words that are treated as a single word or phrase, such as idioms, compound nouns, and collocations, and attaches a corresponding new part of speech to each group. Recognizing fixed expressions in this way simplifies parsing and the definition of sentence frames.

The protector detection unit (105) finds the parts of speech or words that play an important role in the sentence, for example verbs, conjunctions, relatives, and symbols, and marks them as protectors. No analysis is performed on these.

The partial parsing unit (106) performs partial parsing on the words between protectors and attaches appropriate syntactic tags to them. Because this parsing excludes the protectors, almost no ambiguity arises and analysis time is considerably reduced.

The source sentence frame generation unit (107) builds a source sentence frame for the input sentence from the protectors and syntactic tags determined above. This frame is used to look up an identical frame in the source sentence frame database. If no source frame exactly matches the input sentence, the input frame is split into clause-level partial sentence frames and processed so that the input sentence is covered. This is handled by the partial sentence frame processing unit (108), which is the core of the present invention.

That is, the partial sentence frame processing unit (108) recognizes partial sentence frames in the source frame and translates each of them. Once each partial frame has been translated, the whole sentence is translated by a sentence frame that connects the whole. If no such connecting frame exists, partial frames are grouped and translated according to clause-level structure analysis, each group is replaced by a single node, and the resulting reduced whole sentence frame is searched for and translated; this process is repeated.

Translating a single partial sentence frame consists of searching the source sentence frame database, selecting a target sentence frame, and generating target words. A source frame found in the database has several target sentence frames, one of which is selected according to context. The target sentence frames already incorporate the structural characteristics of the target language and contain only translation structures that are realistically possible.

The target sentence frame selection unit (109) selects the one target sentence frame that fits the current context, using co-occurrence information and syntactic/semantic information. The target word generation unit (110) then structurally converts each slot of the selected target frame to produce one complete sentence. The final target sentence corresponding to the input sentence is output to the printing device (112) or the display device (114) through the printing unit (111) or the display control unit (113).

FIG. 2 is a flowchart of an embodiment of a long sentence translation method using partial sentence frames according to the present invention.

First, in outline: preprocessing, including morphological analysis, part-of-speech determination, fixed expression recognition, protector detection, and partial parsing, is performed on the English sentence (201).

Next, sentence segmentation is performed on the sequence of slots resulting from preprocessing, splitting the long English sentence into one or more short sentences (202).

The split short sentences are then recognized and translated through sentence frame matching (203). It is then determined whether the short sentence translation succeeded (204). If frame matching failed, the short sentence translation is performed again with another segmentation candidate, on the grounds that the segmentation may be wrong; if it succeeded, processing continues to the next step.

When short sentence translation is complete, the portion corresponding to each partial sentence frame is replaced with a syntactic node corresponding to the sentence symbol, and the reduced whole sentence frame is searched for to perform whole sentence frame translation (205). It is then determined whether the whole sentence frame translation succeeded (206); if the frame is found and translation is carried out, the translation result is output and processing ends.

If whole sentence frame translation fails (206), partial sentence frames are combined according to the inter-clause structure analysis rules and translation is attempted (207). It is then checked whether the partial sentence frame translation succeeded (208). If so, frame search and translation (205) are performed again on the reduced whole sentence frame; if not, the partial sentence frame combination and translation process (207) is repeated.
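The control flow of steps 202 to 208 can be sketched as a small driver loop. This is a hedged illustration only: the function `translate_long`, its callback parameters, the `max_rounds` safety limit, and the toy frame database are hypothetical names and values, and the sketch simplifies step 208 by letting the `combine_next` callback return `None` once no further combination is possible.

```python
from typing import Callable, Iterable, List, Optional

Frame = List[str]  # a sentence frame as a list of slot labels (assumption)


def translate_long(
    candidates: Iterable[Frame],
    translate_short: Callable[[Frame], Optional[Frame]],
    translate_whole: Callable[[Frame], Optional[str]],
    combine_next: Callable[[Frame], Optional[Frame]],
    max_rounds: int = 20,
) -> Optional[str]:
    # Steps 202-204: try segmentation candidates until short sentences translate.
    frame = None
    for segmentation in candidates:
        frame = translate_short(segmentation)
        if frame is not None:
            break
    if frame is None:
        return None
    # Steps 205-208: alternate whole-frame search and partial-frame combination.
    for _ in range(max_rounds):
        result = translate_whole(frame)   # steps 205-206
        if result is not None:
            return result
        frame = combine_next(frame)       # steps 207-208
        if frame is None:
            return None                   # nothing left to combine
    return None


# Toy stand-ins for the frame database and the three processing stages.
FRAME_DB = {("s", "C", "s"): "OK"}


def toy_short(segmentation: Frame) -> Optional[Frame]:
    return None if "?" in segmentation else list(segmentation)


def toy_whole(frame: Frame) -> Optional[str]:
    return FRAME_DB.get(tuple(frame))


def toy_combine(frame: Frame) -> Optional[Frame]:
    # Merge the first "s T s" run into a single sentence node "s".
    for i in range(len(frame) - 2):
        if frame[i:i + 3] == ["s", "T", "s"]:
            return frame[:i] + ["s"] + frame[i + 3:]
    return None


result = translate_long(
    [["s", "?", "s"], ["s", "T", "s", "C", "s"]],
    toy_short, toy_whole, toy_combine,
)
print(result)
```

In the toy run, the first segmentation candidate is rejected, the second yields a frame that is not in the database, and one combination step reduces it to a frame that is.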

The above flow is now described in detail.

In the sentence segmentation process (202), the starting points of all subordinate clauses are recognized from the partial parsing result; that is, the split points of the sentence correspond to the starting points of all subordinate clauses. Starting points are recognized using predefined syntactic patterns and pre-built database information, on the premise that the starting point of every clause is recognized. The syntactic patterns for starting point recognition are combinations of punctuation marks, conjunctions, relatives, interrogatives, and the like. Where a starting point is ambiguous because of noun phrase coordination with a comma or a coordinating conjunction, or where a starting point has been missed because a conjunction is omitted, the problem is resolved by applying starting point patterns built in advance. Thus starting point candidates are recognized by the predefined patterns, and starting points are removed or added by the starting point patterns in the database. If a starting point ambiguity still remains unresolved, all candidates are recognized and the choice is made at a later stage.

Once the starting points of all clauses have been recognized, the starting points of short sentences are recognized among them and the short sentences are translated (203), and recognition of the corresponding end points is attempted. A starting point is recognized as the start of a short sentence whenever the next starting point is not that of a "that" clause, a relative clause, or an interrogative clause; the end point of a short sentence is determined through sentence frame search, starting point to main verb matching, and the like. After short sentence recognition, if the subject or object has been omitted through inversion or ellipsis, depending on the type of clause the short sentence belongs to, the short sentence is restored. Restoration adds a noun slot before the verb when the frame, excluding conjunctions and relatives, begins with a verb; and, in clauses introduced by a relative or an interrogative where the object is omitted, it estimates the object position and adds a noun slot there. Such restoration raises the frame coverage of partial sentence frames and allows target sentences to be generated smoothly. If frame matching fails during short sentence translation, the short sentence translation is performed again with another segmentation candidate, on the grounds that the segmentation may be wrong.
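The first restoration case described above (a frame that begins with a verb once leading conjunctions and relatives are set aside) can be sketched as follows. The slot encoding as `(kind, label)` tuples, the kind names, and the function `restore_subject` are assumptions for illustration; the object-position case is not covered because the text does not say how the position is estimated.

```python
from typing import List, Tuple

SlotT = Tuple[str, str]      # (kind, label), a hypothetical encoding
LEADERS = {"conj", "rel"}    # kinds skipped before checking for a verb


def restore_subject(slots: List[SlotT]) -> List[SlotT]:
    """Insert a noun slot before the verb if the clause starts with a verb."""
    i = 0
    while i < len(slots) and slots[i][0] in LEADERS:
        i += 1
    if i < len(slots) and slots[i][0] == "verb":
        return slots[:i] + [("np", "NP")] + slots[i:]
    return list(slots)


# "and went to the store": the subject was dropped after the conjunction.
clause = [("conj", "and"), ("verb", "went"), ("pp", "PP")]
restored = restore_subject(clause)
print(restored)
```

After restoration the clause has the same shape as an ordinary subject-verb frame, which is what lets it match a frame that already exists in the database.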

When short sentence translation is complete in this way, the portion corresponding to each partial sentence frame is replaced with a syntactic node corresponding to the sentence symbol, and the reduced whole sentence frame is searched for to perform whole sentence frame translation (205). If the whole sentence frame is found and translation is carried out, the translation result is output and processing ends. If this fails, partial sentence frame combination and translation (207) is attempted according to the inter-clause structure analysis rules, and if that succeeds, frame search and translation (205) are performed again on the reduced whole sentence frame. If partial sentence frame combination and translation fails, the partial sentence frame combination and translation process (207) and the whole sentence frame search and translation process (205) are repeated.

The inter-clause coordination analysis rules used for combining partial sentence frames are as follows.

1. "that" clause, relative clause, interrogative clause: depends on the immediately preceding clause

2. T "that" clause, (T) and "that" clause/relative clause: parallel with the nearest "that" clause/relative clause

3. T(n)V: attaches to the nearest preceding (n)V, or to a conjunction that has no (n)V

4. (T) and nV: parallel with the most similar of all preceding nV

5. (T) and V: parallel with the nearest preceding V (taking number and tense into account)

Here 'T' denotes punctuation such as a comma, 'n' a noun phrase, and 'V' a verb. The similarity between starting points is computed by (Equation 1) below.

Similarity S = w1*Sim(C-C) + w2*Sim(n-n) + w3*Sim(V-V)    (Equation 1)

The features considered when computing each term are: Sim(C-C): vocabulary; Sim(n-n): type, semantic code, vocabulary; Sim(V-V): type, vocabulary, tense, number-person (TV).
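Equation 1 is a weighted sum of three component similarities, which rule 4 uses to pick the preceding nV to coordinate with. The sketch below is an assumption-laden illustration: the text gives neither the weights nor how each Sim term is computed, so the weights `(0.25, 0.25, 0.5)`, the feature names, and the exact-match `feature_overlap` measure are all hypothetical. The symbol C is not defined in the text; it is treated here as the clause-introducing word.

```python
from typing import Dict, Sequence, Tuple

Features = Dict[str, str]
WEIGHTS: Tuple[float, float, float] = (0.25, 0.25, 0.5)  # w1, w2, w3 (hypothetical)


def feature_overlap(a: Features, b: Features, keys: Sequence[str]) -> float:
    """Fraction of the listed features on which a and b agree exactly."""
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def similarity(x: Dict[str, Features], y: Dict[str, Features],
               weights: Tuple[float, float, float] = WEIGHTS) -> float:
    """S = w1*Sim(C-C) + w2*Sim(n-n) + w3*Sim(V-V), as in Equation 1."""
    w1, w2, w3 = weights
    return (w1 * feature_overlap(x["C"], y["C"], ("word",))
            + w2 * feature_overlap(x["n"], y["n"], ("type", "sem", "word"))
            + w3 * feature_overlap(x["V"], y["V"],
                                   ("type", "word", "tense", "agreement")))


current = {"C": {"word": "and"},
           "n": {"type": "NP", "sem": "human", "word": "she"},
           "V": {"type": "vt", "word": "buy", "tense": "past", "agreement": "3sg"}}
cand_a = {"C": {"word": "because"},
          "n": {"type": "NP", "sem": "human", "word": "he"},
          "V": {"type": "vt", "word": "sell", "tense": "past", "agreement": "3sg"}}
cand_b = {"C": {"word": "when"},
          "n": {"type": "NP", "sem": "thing", "word": "it"},
          "V": {"type": "vi", "word": "rain", "tense": "present", "agreement": "3sg"}}

# Rule 4: coordinate with the most similar preceding nV.
best = max([cand_a, cand_b], key=lambda c: similarity(current, c))
print(best["V"]["word"])
```

With these toy features, the candidate that shares the noun phrase's semantic class and the verb's type, tense, and agreement scores higher and is chosen as the coordination partner.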

상기 분석 규칙에 따라 절 단위 결합을 수행하는 과정은, 먼저 현재 처리해야 할 문틀의 각 슬롯에 해당하는 의존(dependency) 리스트를 구성한다. 의존(dependency) 분석은 각 시작점 및 본동사에서 상기 분석 규칙에 의해 다음과 같이 수행된다.In the process of performing the unit-by-section combining according to the analysis rule, first, a dependency list corresponding to each slot of the door frame to be processed currently is constructed. Dependency analysis is performed as follows by the analysis rule at each starting point and main verb.

1. 분석 규칙에서 규칙 1의 경우에 dependency[i].depend <- 1, dependency[i].link <- 바로 앞절의 시작점을 할당한다.1. In rule 1 of the analysis rule, assign the starting point of dependency [i] .depend <-1, dependency [i] .link <-immediately preceding.

2. 분석 규칙에서 규칙 2-4의 경우에 dependency[i].depend <- 0, dependency[i].link <- 해당 절이 병렬을 이루는 절의 시작점을 할당한다.2. In rule 2-4 of the analysis rule, dependency [i] .depend <-0, dependency [i] .link <-assign the starting point of the clause in which the clause is parallel.

3. 그외의 종속 접속사절인 경우에 dependency[i].depend <- 0, dependency[i].link <- 바로 앞절의 시작점을 할당한다.3. For any other dependent connection clause, assign the starting point of dependency [i] .depend <-0, dependency [i] .link <-immediately preceding.

4. 관계절이 문장 내에 포함되는 경우에 관계절의 끝점을 분석하기 위해 슬롯 내의 부정사나 분사가 아닌 본동사에 대해 하나의 절에 하나의 본동사가 매칭되도록 연결한다. 이때, 이러한 연결 링크는 교차되지 않도록 해야 한다. 이러한 연결 링크가 전체 문장에 대해 구성되는 경우에 각 본동사에 대해dependency[i].depend <- 0, dependency[i].link <- 연결되는 시작점을 할당한다.4. When relation clauses are included in a sentence, connect one main verb to one clause for a main verb that is not infinitive or participle in a slot to analyze the end point of the relation clause. At this time, these connecting links should not be crossed. When such a link is constructed for the whole sentence, assign a starting point for each main verb with dependency [i] .depend <-0, dependency [i] .link <-.

From the dependency list obtained above, the depth list for the slots is computed as follows.

If dependency[i].link exists: depth[i] = depth[dependency[i].link] + dependency[i].depend;

If dependency[i].link does not exist: depth[i] = depth[i - 1];

Based on the depths obtained in this way, clauses are combined in order of decreasing depth, starting from the deepest.
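The depth rule can be illustrated with a minimal Python sketch. The `Dependency` class, the function names, and the convention that `link` holds the index of an earlier slot are assumptions made for illustration; the text specifies only the two assignment rules and the deepest-first ordering.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dependency:
    """One entry of the dependency list (field names follow the text)."""
    depend: int = 0              # 1 if the clause is nested one level deeper
    link: Optional[int] = None   # assumed: index of an earlier slot, or None


def compute_depths(deps: List[Dependency]) -> List[int]:
    """Apply the two depth rules; the first slot starts at depth 0."""
    depths: List[int] = []
    for i, dep in enumerate(deps):
        if dep.link is not None:
            if dep.link >= i:
                raise ValueError("this sketch assumes links point backwards")
            depths.append(depths[dep.link] + dep.depend)
        elif i == 0:
            depths.append(0)
        else:
            depths.append(depths[i - 1])
    return depths


def deepest_first(depths: List[int]) -> List[int]:
    """Slot indices ordered from deepest to shallowest (ties keep position)."""
    return sorted(range(len(depths)), key=lambda i: -depths[i])


# Hypothetical dependency list, not taken from FIG. 3.
deps = [
    Dependency(),
    Dependency(),
    Dependency(depend=1, link=0),
    Dependency(),
    Dependency(depend=1, link=2),
    Dependency(depend=0, link=0),
]
depths = compute_depths(deps)
print(depths)                 # [0, 0, 1, 1, 2, 0]
print(deepest_first(depths))  # [4, 2, 3, 0, 1, 5]
```

In this hypothetical list, slot 4 has the greatest depth, so the clause it belongs to would be the first one considered for combining.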

The above process is explained below using the example sentence of FIG. 3.

FIG. 3 illustrates an example of long sentence translation (English-to-Korean) using partial sentence frames according to the present invention.

First, starting point recognition is performed on the partial parsing result of the source sentence, with each starting point marked by '/', and a dependency list is then obtained for each starting point and main verb. In the figure, the arrows and numbers indicate dependency[i].link and dependency[i].depend for each starting point and main verb. From this dependency list, the depth list shown in FIG. 3 is obtained by setting the depth at the very beginning of the sentence to 0.

Next, among the recognized partial sentence frames, the parts corresponding to short sentences are translated first and replaced with 's' as shown in the figure, and a sentence frame search is performed for the entire sentence frame (301).

If the sentence frame search and translation succeed, the translation result is output and the process ends. If they fail, an attempt is made to combine 'pTs' into a single sentence frame according to the depth list values, and if that succeeds, translation of the entire sentence frame (302) is attempted. Likewise, if this sentence frame search fails, combining of the 'nCs' sentence frame is attempted according to the depth list, and the same operation is repeated.
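The search-then-combine loop can be sketched in Python as follows. This is only an illustration under several assumptions that the text does not state: slots are modelled as (symbol, depth) pairs, the sentence frame databases are plain sets of symbol tuples, a combined span is replaced by a single 's' node that takes the smallest depth in the span, and the example frame is hypothetical rather than the one in FIG. 3.

```python
from typing import List, Optional, Sequence, Set, Tuple

Slot = Tuple[str, int]      # (slot symbol, depth); representation is assumed
Frame = Tuple[str, ...]


def reduce_and_search(
    slots: Sequence[Slot],
    full_frames: Set[Frame],
    partial_frames: Set[Frame],
) -> Optional[List[str]]:
    """Try the whole frame; on failure, combine the deepest partial frame."""
    work = list(slots)
    while True:
        symbols = tuple(sym for sym, _ in work)
        if symbols in full_frames:
            return list(symbols)          # whole-frame search succeeded

        # Look for the deepest span that matches a known partial frame.
        best: Optional[Tuple[int, int, int]] = None
        for start in range(len(work)):
            for end in range(start + 2, len(work) + 1):
                span = work[start:end]
                if tuple(sym for sym, _ in span) in partial_frames:
                    depth = max(d for _, d in span)
                    if best is None or depth > best[0]:
                        best = (depth, start, end)
        if best is None:
            return None                   # nothing left to combine

        _, start, end = best
        new_depth = min(d for _, d in work[start:end])
        work[start:end] = [("s", new_depth)]  # replace the span with one node


# Hypothetical frame: 'pTs' is combined first, then 'nCs'.
slots = [("n", 0), ("V", 0), ("n", 1), ("C", 1), ("p", 2), ("T", 2), ("s", 2)]
result = reduce_and_search(
    slots,
    full_frames={("n", "V", "s")},
    partial_frames={("p", "T", "s"), ("n", "C", "s")},
)
print(result)  # ['n', 'V', 's']
```

Each pass either succeeds on the whole frame or shortens it by at least one slot, so the loop always terminates.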

Thus, by describing patterns at the appropriate stage, the method obtains both the high translation quality of the pattern-based approach and, through stepwise combining of partial sentence frames under the structure analysis rules, high coverage.

FIG. 4 is a flowchart of one embodiment of the sentence division process according to the present invention.

As shown in the figure, to divide a long sentence, the starting points of all short sentences are first extracted from the slot sequence produced by partial parsing (401), and starting point candidates are then extracted using starting point patterns (402).

Next, end point candidates are extracted for the short sentences, corresponding to the starting point candidates, that are not relative clauses (403), and end point candidates for short sentences introduced by a relative pronoun are extracted using end point patterns (404).

Then, short sentence candidates are extracted and restored, and sentence frame matching is performed to extract the short sentences for the entire sentence (405).

FIG. 5 is a detailed flowchart of one embodiment of the sentence division process according to the present invention.

First, starting point candidates for all short sentences are extracted from the slot sequence produced by partial parsing (501, 502), and it is determined whether any starting point candidate exists (503). If none exists, the process returns; otherwise it proceeds to the next step.

Step 501 extracts starting point candidates of short sentences using only syntactic information such as conjunctions, relative pronouns, and punctuation marks; these candidates can be read directly off the input slot sequence. The syntactic information used for starting point extraction includes the following.

- Coordinating conjunctions: and, but, or

- Subordinating conjunctions: if, when, before, until, as, because

- Noun-clause conjunction: that

- Relative pronouns: who, which, that, whose

- Comma and quotation marks (" ")

- The beginning of the sentence
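Step 501 can be sketched in Python as a simple scan for these markers. The sketch works on word tokens rather than on the slot sequence produced by partial parsing, and it records the position of the marker itself as the candidate; both choices, along with the function and constant names, are assumptions made for illustration.

```python
from typing import List

# Marker words and symbols listed in the text.
COORDINATING = {"and", "but", "or"}
SUBORDINATING = {"if", "when", "before", "until", "as", "because"}
NOUN_CLAUSE = {"that"}
RELATIVE = {"who", "which", "that", "whose"}
PUNCTUATION = {",", '"'}

MARKERS = COORDINATING | SUBORDINATING | NOUN_CLAUSE | RELATIVE | PUNCTUATION


def start_point_candidates(tokens: List[str]) -> List[int]:
    """Return token indices that may begin a short sentence."""
    if not tokens:
        return []
    candidates = [0]  # the beginning of the sentence is always a candidate
    for i, token in enumerate(tokens):
        if i > 0 and token.lower() in MARKERS:
            candidates.append(i)
    return candidates


tokens = "I left when he arrived , but she stayed".split()
print(start_point_candidates(tokens))  # [0, 2, 5, 6]
```

The scan deliberately over-generates (for example, a comma inside a list would also be flagged), since the later frame matching is what resolves ambiguous candidates.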

Next, step 502 finds starting point candidates by matching the patterns stored in a pre-built short sentence starting point pattern database against the English sentence to be divided. Matching is attempted against all or part of the slot sequence between the starting points recognized in step 501, and when a match succeeds, the matched position is extracted as a starting point candidate.

The short sentence starting point pattern database is a set of patterns for extracting starting point candidates that step 501 cannot extract, for example because a conjunction has been omitted. It is built by collecting, from a large number of English sentences gathered in advance, patterns for the starting points that step 501 misses. A short sentence starting point pattern describes the syntactic and contextual information around a starting point in terms of slot types and feature information, so that the starting point can be recognized: it lists the slot types and features that appear before and after the starting point, in the order in which they occur in the English sentence. The slot names used in the patterns include n, V, T, C, and p, which denote, respectively, a noun phrase, a verb phrase, a symbol such as a comma, a conjunction (including relative pronouns), and a prepositional phrase. The features of a slot are written inside [ ] after the slot name, giving the feature type, an operator for testing the feature value, and the feature value. '/' marks the position of the starting point. An example of a short sentence starting point pattern is as follows.

- V[etype==[D5, T5]] n p / n V[etype==[T1]] p

The above pattern estimates the starting point in a sentence where the noun-clause conjunction "that" has been omitted, as in "...had assured the U.S. government throughout the day / Russian troops would not cross into Kosovo...". T5, D5, T1, and so on denote verb forms.
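A minimal Python sketch of how such a pattern might be parsed and matched against a slot sequence is given below. The dictionary representation of slots, the function names, the support for the `==` operator only, and the etype values assigned to the example verbs are all assumptions made for illustration, not details taken from the text.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

Element = Tuple[str, Optional[str], Optional[Set[str]]]

# Either the '/' marker, or a slot name with an optional [feature==[v1, v2]].
TOKEN = re.compile(r"/|(\w+)(?:\[(\w+)==\[([^\]]*)\]\])?")


def parse_pattern(pattern: str) -> Tuple[List[Element], int]:
    """Split a pattern into slot elements and the index of the '/' marker."""
    elements: List[Element] = []
    split: Optional[int] = None
    for match in TOKEN.finditer(pattern):
        if match.group(0) == "/":
            split = len(elements)
            continue
        name, feature, values = match.group(1), match.group(2), match.group(3)
        allowed = {v.strip() for v in values.split(",")} if feature else None
        elements.append((name, feature, allowed))
    if split is None:
        raise ValueError("pattern has no '/' starting point marker")
    return elements, split


def slot_matches(slot: Dict[str, str], element: Element) -> bool:
    name, feature, allowed = element
    if slot.get("name") != name:
        return False
    return feature is None or slot.get(feature) in allowed


def find_start_points(slots: List[Dict[str, str]], pattern: str) -> List[int]:
    """Slide the pattern over the slots; report each matched '/' position."""
    elements, split = parse_pattern(pattern)
    hits = []
    for k in range(len(slots) - len(elements) + 1):
        if all(slot_matches(slots[k + j], el) for j, el in enumerate(elements)):
            hits.append(k + split)
    return hits


PATTERN = "V[etype==[D5, T5]] n p / n V[etype==[T1]] p"
# "had assured | the U.S. government | throughout the day |
#  Russian troops | would not cross | into Kosovo" (etype values assumed)
slots = [
    {"name": "V", "etype": "D5"}, {"name": "n"}, {"name": "p"},
    {"name": "n"}, {"name": "V", "etype": "T1"}, {"name": "p"},
]
print(find_start_points(slots, PATTERN))  # [3]
```

The reported index 3 is the slot for "Russian troops", which is where the '/' sits in the example sentence.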

After all starting point candidates have been extracted through the above steps, the end point candidates of the short sentences corresponding to those starting point candidates are extracted (504, 505). Step 504 extracts end point candidates for short sentences that are not introduced by a relative pronoun: for such a short sentence, the starting point of the next short sentence is taken as its end point.
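Step 504 amounts to pairing each non-relative starting point with the next starting point. A short Python sketch follows; the function name, the exclusive end index, and the choice to close the last short sentence at the end of the slot sequence are assumptions, since the text does not say how the final short sentence is terminated.

```python
from typing import Dict, Iterable, Set


def non_relative_end_points(
    starts: Iterable[int],
    relative_starts: Set[int],
    n_slots: int,
) -> Dict[int, int]:
    """Map each non-relative starting point to its (exclusive) end point."""
    ordered = sorted(set(starts))
    ends: Dict[int, int] = {}
    for pos, start in enumerate(ordered):
        if start in relative_starts:
            continue  # relative clauses are handled by end point patterns
        # The next starting point closes this short sentence; the last
        # short sentence is assumed to run to the end of the sequence.
        ends[start] = ordered[pos + 1] if pos + 1 < len(ordered) else n_slots
    return ends


# Starting points at slots 0, 2 and 5; the clause at slot 2 is relative.
print(non_relative_end_points([0, 2, 5], {2}, 8))  # {0: 2, 5: 8}
```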

Step 505 recognizes end point candidates for short sentences introduced by a relative pronoun, extracting them using short sentence end point patterns. A short sentence end point pattern covers end points that do not coincide with the starting point of another short sentence. It describes the syntactic and contextual information around the end point in terms of slot types and feature information so that the end point can be recognized: it lists the slot types and features that appear before and after the end point, in the order in which they occur in the English sentence.

Short sentence end point patterns are written in the same form as short sentence starting point patterns. End point pattern matching applies when a short sentence begins with a relative pronoun and one or more main verbs lie between two starting point candidates: pattern matching is attempted against all or part of the slot sequence between the two starting point candidates, and if a matching end point pattern exists, the matched position is extracted as an end point candidate.

If ambiguity arises in recognizing starting point and end point candidates, all candidates are recognized, and the ambiguity is resolved by the subsequent short sentence frame matching, which selects the correct starting point. Short sentence frame matching serves to resolve the ambiguity in starting point and end point candidate extraction and to determine the correct starting and end points: a match between a short sentence frame and a short sentence candidate means that the candidate has the form of a well-formed short sentence.

To attempt this short sentence frame matching, the span between a starting point candidate and its corresponding end point candidate is first extracted as a short sentence candidate, beginning from the start of the English sentence (506).

Once a short sentence candidate has been extracted, sentence restoration is performed on it if, depending on the type of clause it belongs to, its subject or object has been omitted through inversion or ellipsis (507). Sentence restoration adds a noun slot before the verb when the sentence frame, excluding conjunctions and relative pronouns, begins with a verb. In relative clauses and clauses introduced by an interrogative word, when the object has been omitted, it estimates the object position and adds a noun slot there.
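The subject half of sentence restoration can be sketched in Python as below. The sketch treats a frame as a list of slot names and relies on the slot name C covering both conjunctions and relative pronouns, as stated for the pattern notation. Object restoration is left out because the text does not specify how the object position is estimated; the function name is an assumption.

```python
from typing import List


def restore_subject(frame: List[str]) -> List[str]:
    """Insert a noun slot 'n' if the frame starts with a verb after any 'C'."""
    i = 0
    # Skip leading conjunction / relative pronoun slots.
    while i < len(frame) and frame[i] == "C":
        i += 1
    if i < len(frame) and frame[i] == "V":
        return frame[:i] + ["n"] + frame[i:]
    return list(frame)  # nothing to restore


print(restore_subject(["C", "V", "n"]))  # ['C', 'n', 'V', 'n']
print(restore_subject(["n", "V", "p"]))  # ['n', 'V', 'p']
```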

Then, the restored short sentence candidate is matched against the short sentence frames retrieved from the short sentence frame database (508), and it is determined whether the match succeeded (509). If it succeeded, the candidate is extracted as a short sentence; if it failed, the process returns to the short sentence candidate extraction step (506).

Thereafter, for the remainder of the sentence from which the short sentence was extracted, short sentence candidates are extracted in the same way, sentence restoration is performed, and sentence frame matching is repeated (506 to 510). That is, while part of the English sentence remains to be divided into short sentences, the process repeats from the short sentence candidate extraction step (506) for the short sentence candidates between the other starting point and end point candidates, until the entire English sentence has been divided into short sentences.
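The loop over steps 506 to 510 can be sketched in Python as follows. The sketch assumes that frames are lists of slot names, that the short sentence frame database is a set of tuples, that boundary candidates are tried nearest-first, and that restoration is supplied as a callable; none of these details is fixed by the text, and the function name is hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Frame = Tuple[str, ...]


def split_into_short_sentences(
    slots: List[str],
    boundaries: Iterable[int],
    frame_db: Set[Frame],
    restore: Callable[[List[str]], List[str]] = lambda frame: list(frame),
) -> Optional[List[List[str]]]:
    """Greedily cut the slot sequence into frame-matching short sentences."""
    ends = sorted(set(boundaries) | {len(slots)})
    result: List[List[str]] = []
    start = 0
    while start < len(slots):
        for end in (e for e in ends if e > start):
            candidate = slots[start:end]         # step 506: extract candidate
            restored = restore(candidate)        # step 507: restore sentence
            if tuple(restored) in frame_db:      # steps 508-509: frame match
                result.append(candidate)
                start = end                      # continue with the remainder
                break
        else:
            return None  # no candidate from this starting point matched
    return result


slots = ["n", "V", "n", "C", "n", "V"]
frame_db = {("n", "V", "n"), ("C", "n", "V")}
# Boundary 2 is a spurious candidate; frame matching rejects it.
print(split_into_short_sentences(slots, [0, 2, 3], frame_db))
# [['n', 'V', 'n'], ['C', 'n', 'V']]
```

A fuller version would backtrack over earlier cuts when a later span fails to match; the greedy form is kept here only to mirror the flow of the flowchart.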

The method of the present invention described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.).

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention reduces the length of the sentence frame to be processed through sentence division and substitution, and performs translation by repeating, step by step, sentence frame matching that reflects the overall clause-level structure and combining of partial sentence frames. It thereby produces semantically natural target sentences while solving the coverage problem inherent in sentence-frame-based approaches.

In other words, in sentence-frame-based automatic translation, the present invention achieves high coverage and high-quality long sentence translation through accurate partial sentence frame recognition, clause-level structure analysis, stepwise sentence frame application, and partial sentence frame combining.

Claims

A long sentence translation apparatus using partial sentence frames, comprising:

preprocessing means for preprocessing an input long sentence to obtain syntactic information;

source sentence frame generation means for generating a source sentence frame for the input sentence using the syntactic information obtained by the preprocessing means; and

partial sentence frame processing means for translating the long sentence by dividing the source sentence frame generated by the source sentence frame generation means into partial sentence frames, recognizing and translating short sentences, and then repeatedly performing whole sentence frame search and translation and partial sentence frame combining and translation.

The apparatus of claim 1,

wherein the partial sentence frame processing means

divides the source sentence frame generated by the source sentence frame generation means, recognizes the partial sentence frames, and translates each divided partial sentence frame, and, if no matching whole sentence frame exists, translates the input sentence by repeating a process of combining and translating partial sentence frames through clause-level structure analysis and a process of searching for and translating the reduced whole sentence frame obtained by substituting each partial sentence frame with a single node.

The apparatus of claim 2,

wherein the partial sentence frame processing means

translates each partial sentence frame by performing a sentence frame database search, target sentence frame selection, and target word generation.

The apparatus of any one of claims 1 to 3,

wherein the partial sentence frame processing means

extracts starting points of short sentences using syntactic information and then extracts starting point candidates using starting point patterns,

extracts end point candidates of short sentences, corresponding to the starting point candidates, that are not relative clauses, and extracts end point candidates of short sentences introduced by a relative pronoun using end point patterns, and

extracts and restores short sentence candidates and then performs sentence frame matching to extract the short sentences for the entire sentence.

A long sentence translation method applied to a long sentence translation apparatus using partial sentence frames, comprising:

a long sentence division step of dividing a long sentence into one or more short sentences according to a result of preprocessing the long sentence;

a short sentence translation step of recognizing the divided short sentences and translating them through sentence frame matching;

a determination step of searching for the whole sentence frame, performing whole sentence frame translation, and then determining whether the whole sentence frame translation succeeded; and

a repetition and translation result output step of outputting the translation result if the determination step finds success and, if it finds failure, repeating a process of combining and translating partial sentence frames and a process of searching for and translating the whole sentence frame.

The method of claim 5,

wherein the long sentence division step comprises:

a first starting point candidate extraction process of extracting starting point candidates of short sentences using syntactic information and extracting starting point candidates using starting point patterns;

a first end point candidate extraction process of extracting end point candidates using the extracted starting point candidates and extracting end point candidates using end point patterns; and

a short sentence extraction process of extracting and restoring short sentence candidates and then performing sentence frame matching to extract the short sentences for the entire sentence.

The method of claim 6,

wherein the short sentence extraction process comprises:

a short sentence candidate extraction process of extracting, as a short sentence candidate, the span between a starting point candidate and its corresponding end point candidate;

a sentence restoration process of restoring the extracted short sentence candidate into a grammatically complete sentence;

a process of attempting to match the restored short sentence candidate against short sentence frames and, when the match succeeds, determining the starting point candidate and the end point candidate to be the starting point and the end point; and

a repetition process of repeating from the short sentence candidate extraction process until the entire long sentence has been divided into a set of short sentences.

The method of claim 6 or 7,

wherein the first starting point candidate extraction process comprises:

a second starting point candidate extraction process of extracting starting point candidates of short sentences, such as conjunctions, relative pronouns, and punctuation marks, from the slot sequence resulting from partial parsing of the input long sentence; and

a third starting point candidate extraction process of extracting, through short sentence starting point pattern matching, starting point candidates that the second starting point candidate extraction process does not extract because of, for example, omission of a conjunction.

The method of claim 6 or 7,

wherein the first end point candidate extraction process comprises:

a second end point candidate extraction process of extracting, for a short sentence not introduced by a relative pronoun, the starting point of the next short sentence as the end point of that short sentence; and

a third end point candidate extraction process of extracting the end points of short sentences introduced by a relative pronoun using short sentence end point patterns.

A computer-readable recording medium having recorded thereon a program for realizing, in a long sentence translation apparatus having a processor, in order to translate a long sentence using partial sentence frames:

a long sentence division function of dividing a long sentence into one or more short sentences according to a result of preprocessing the long sentence;

a short sentence translation function of recognizing the divided short sentences and translating them through sentence frame matching;

a determination function of searching for the whole sentence frame, performing whole sentence frame translation, and then determining whether the whole sentence frame translation succeeded; and

a repetition and translation result output function of outputting the translation result if the determination function finds success and, if it finds failure, repeating a process of combining and translating partial sentence frames and a process of searching for and translating the whole sentence frame.