KR102069697B1

KR102069697B1 - Apparatus and method for automatic interpretation

Info

Publication number: KR102069697B1
Application number: KR1020130089649A
Authority: KR
Inventors: 이수종; 김상훈; 김정세
Original assignee: 한국전자통신연구원
Priority date: 2013-07-29
Filing date: 2013-07-29
Publication date: 2020-02-24
Also published as: JP6397641B2; KR20150014235A; JP2015026054A

Abstract

An automatic interpretation apparatus and method are disclosed. The apparatus includes: an interface unit that receives a user command and source-language speech to be interpreted from a user, and outputs source-language text that represents, in the source language, the pronunciation of the target-language text into which the source-language speech was translated; a setting unit that sets information about the source language and the target language in response to the user command; and an interpretation unit that receives the source-language speech through the interface unit, converts it into source-language text by speech recognition, translates the source-language text into target-language text, and, in addition to the translated target-language text, converts the pronunciation of the target-language text into source-language text and transmits it to the interface unit.

Description

Automatic interpretation device and method {APPARATUS AND METHOD FOR AUTOMATIC INTERPRETATION}

The present invention relates to an automatic interpretation apparatus and method, and more particularly to an automatic interpretation apparatus and method capable of displaying the pronunciation of each of the speech-recognized text and the automatically translated text output during automatic interpretation in the other party's language or phonetic notation.

An automatic interpretation apparatus is a device that facilitates communication between people who speak different languages. In general, when a user speaks in the source language, the apparatus recognizes the spoken source language, automatically translates it into the target language used by the other party, and outputs the translated target language as synthesized speech.

That is, in existing automatic interpretation apparatuses, an utterance in the source language is displayed as source-language text by speech recognition and translated into target-language text by automatic translation. The target-language text is then synthesized into target-language speech and output. However, because the synthesized speech vanishes as soon as it is output, it is difficult for the user to reproduce the pronunciation of the other party's language and use it in communication.

In addition, most existing automatic interpretation apparatuses show a sharp drop in speech recognition performance when an infrequently used proper noun is input or when the environment is noisy. When recognition performance degrades in this way, the user needs either to type text directly into the apparatus or to speak the other party's language directly without relying on automatic interpretation. For smooth communication, therefore, as many interfaces as possible should be provided.

An object of the present invention is to provide an automatic interpretation apparatus that recognizes source-language speech, automatically converts it into the target language, and outputs it as speech, while also displaying the target language in the phonetic notation of the source language so that the user can pronounce the translated target language directly.

Another object of the present invention is to provide an automatic interpretation method of the automatic interpretation apparatus for achieving the above object.

To achieve the above object, an automatic interpretation apparatus according to an example of the present invention includes: an interface unit that receives a user command and source-language speech to be interpreted from a user, and outputs source-language text that represents, in the source language, the pronunciation of the target-language text into which the source-language speech was translated; a setting unit that sets information about the source language and the target language in response to the user command; and an interpretation unit that receives the source-language speech through the interface unit, converts it into source-language text by speech recognition, translates the source-language text into the target-language text, and, in addition to the translated target-language text, converts the pronunciation of the target-language text into the source-language text and transmits it to the interface unit.

The interpretation unit includes: an acoustic and language database unit that stores a recognition network integrating a language model, an acoustic model, and a pronunciation dictionary for the source language and the target language; a speech recognition unit that analyzes the source-language speech received from the interface unit based on the recognition network of the acoustic and language database unit and converts it into source-language text; a text translation unit that receives the source-language text from the speech recognition unit and translates it into the target-language text; a speech synthesis unit that receives the target-language text from the text translation unit, synthesizes the corresponding speech, and transmits the synthesized speech to the interface unit; and a pronunciation conversion unit that receives the target-language text, converts its pronunciation into the source-language text, and outputs the result to the interface unit.

The pronunciation conversion unit includes: a first pronunciation conversion unit that receives the source-language text from one of the speech recognition unit and the text translation unit, converts the pronunciation of the received source-language text into target-language text using the recognition network of the language database unit, and outputs the result to the interface unit; and a second pronunciation conversion unit that receives the target-language text from one of the text translation unit and the speech synthesis unit, converts the pronunciation of the target-language text into the source-language text, and outputs the result to the interface unit.

Each of the first and second pronunciation conversion units further includes a preprocessing unit that analyzes and corrects grammatical errors in the source-language text and the target-language text, and converts symbols contained in those texts into text of the corresponding language.

In the acoustic and language database unit, the recognition network includes at least one of a pronunciation variation database, a grapheme-to-phoneme (g2p) conversion table, a pronunciation correspondence database, and an equivalent-word database, depending on the source language and the target language.

To achieve the other object, an automatic interpretation method according to an example of the present invention is a method for an automatic interpretation apparatus having an interface unit, a setting unit, and an interpretation unit, and includes: storing, by the apparatus, automatic interpretation settings in response to a user command applied through the interface unit; determining whether source-language speech is applied through the interface unit; when the source-language speech is applied, generating source-language text by performing speech recognition using a recognition network that is stored in the acoustic and language database unit of the interpretation unit and integrates a language model, an acoustic model, and a pronunciation dictionary for the source language and the target language; translating the source-language text into target-language text using the recognition network; and converting the pronunciation of the translated target-language text into source-language text and outputting it.

The source language is Korean, and the target language is Japanese.

The recognition network includes at least one of a pronunciation variation database, a grapheme-to-phoneme (g2p) conversion table, a pronunciation correspondence database, and an equivalent-word database, depending on the source language and the target language.

The step of converting into the target-language text and outputting it includes: performing, on the source-language text, a pronunciation variation conversion that accounts for the pronunciation variation phenomena characteristic of the source language; sequentially splitting the converted source-language text into word phrases, syllables, and phonemes; converting the separated phonemes into phoneme-level phonetic symbols using the g2p conversion table; combining the converted phonetic symbols into syllables; converting the combined syllables into the corresponding syllables of the target language; restoring the word phrases by combining the converted syllables, thereby generating the target-language text that represents the pronunciation of the source-language text; and outputting the target-language text through the interface unit.
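The splitting and recombining steps above can be illustrated for Korean with a minimal sketch. It assumes the standard Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3) and uses jamo letters as stand-ins for the phoneme-level units; the function names are hypothetical, and the patent does not specify how its g2p table or phonetic symbols are actually encoded.

```python
# Minimal sketch: split Korean text into word phrases, syllables, and jamo,
# then recombine. Names are illustrative, not taken from the patent.
HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"            # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"        # 21 vowels
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
        "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
        "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]      # 28 finals ("" = none)


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, vowel, final)."""
    code = ord(syllable) - HANGUL_BASE
    return (CHO[code // 588], JUNG[(code % 588) // 28], JONG[code % 28])


def compose(cho, jung, jong=""):
    """Inverse of decompose: build a syllable from its three jamo."""
    return chr(HANGUL_BASE + CHO.index(cho) * 588
               + JUNG.index(jung) * 28 + JONG.index(jong))


def split_text(text):
    """Word phrases -> syllables -> jamo; non-Hangul characters are dropped."""
    return [[decompose(ch) for ch in word
             if HANGUL_BASE <= ord(ch) <= HANGUL_LAST]
            for word in text.split()]


def restore(words):
    """Recombine jamo into syllables and syllables into word phrases."""
    return " ".join("".join(compose(*jamo) for jamo in word) for word in words)
```

In a fuller implementation the jamo would be mapped through the g2p table and the syllable correspondence data between `split_text` and `restore`; here the round trip only shows that the three levels of segmentation are recoverable.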

The step of converting into the source-language text and outputting it includes: determining whether the target-language text contains an equivalent word, that is, a word that already has a rendering conventionally used in the source language; applying the pronunciation of the equivalent word to vocabulary for which an equivalent word exists; sequentially splitting vocabulary for which no equivalent word exists into word phrases, syllables, and phonemes; converting the separated phonemes into phoneme-level phonetic symbols using the g2p conversion table; combining the converted phonetic symbols into syllables; converting the combined syllables into the corresponding syllables of the source language; restoring the word phrases by combining the pronunciation of the equivalent words with the converted syllables, thereby generating the source-language text that represents the pronunciation of the target-language text; and outputting the source-language text through the interface unit.
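The lookup-then-fallback structure of this second conversion can be sketched as follows. This is only an illustration under stated assumptions: the input is assumed to be already segmented into words, both tables are tiny hypothetical samples, and the intermediate g2p and phonetic-symbol steps are collapsed into a direct kana-to-Hangul lookup.

```python
# Hypothetical sample data; the patent does not list the database contents.
EQUIVALENT_WORDS = {"東京": "도쿄"}                   # conventional renderings
KANA_TO_HANGUL = {"さ": "사", "く": "쿠", "ら": "라"}  # toy syllable table


def to_source_notation(words):
    """Render pre-segmented Japanese words in Hangul.

    Words with a conventional equivalent use it as-is; all other words are
    converted syllable by syllable (KeyError if a kana is not in the table).
    """
    rendered = []
    for word in words:
        if word in EQUIVALENT_WORDS:
            rendered.append(EQUIVALENT_WORDS[word])
        else:
            rendered.append("".join(KANA_TO_HANGUL[kana] for kana in word))
    return rendered
```

Checking the equivalent-word table first keeps established spellings stable, while the per-syllable path covers everything the table does not list.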

Accordingly, the automatic interpretation apparatus and method of the present invention, like existing apparatuses, receive the user's source-language utterance, recognize it, automatically translate it into the target language, and output it as speech. In addition, they display the pronunciation of the recognized text in the target language and the pronunciation of the translated target language in the phonetic notation of the source language, so that the user can directly pronounce either the recognized text or the translated target language. The user can therefore speak the recognized text or the interpreted target language as the situation requires, which supports smooth communication with the conversation partner. The user can also more easily recognize and imitate the pronunciation of a foreign language that is hard to follow by ear, which can greatly improve foreign language learning. Furthermore, displaying the recognition result for the user's utterance in both the source language and the target language makes it possible to identify and respond to errors of the apparatus quickly and accurately.

FIG. 1 shows the configuration of an automatic interpretation apparatus according to an embodiment of the present invention.
FIG. 2 shows an automatic interpretation method of the automatic interpretation apparatus according to an embodiment of the present invention.
FIG. 3 shows in detail the first pronunciation conversion step of the automatic interpretation method of FIG. 2.
FIG. 4 shows in detail the second pronunciation conversion step of the automatic interpretation method of FIG. 2.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the invention, and to the content described in them.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. The invention may, however, be implemented in many different forms and is not limited to the embodiments described. To describe the invention clearly, parts unrelated to the description are omitted, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of hardware and software.

FIG. 1 shows the configuration of an automatic interpretation apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the automatic interpretation apparatus of the present invention includes an interface unit, a setting unit 200, and an interpretation unit 300.

The interface unit is the input/output section of the automatic interpretation apparatus: it receives the user's commands and outputs the interpretation result to the user. The interface unit includes a voice sensing unit 110, a user input unit 120, a display unit 130, and a voice output unit 140. The voice sensing unit 110 includes a voice sensor such as a microphone, detects the voice signal (in) input by the user, and transmits it to the interpretation unit 300. The user input unit 120 is implemented as a user command input means such as a keyboard, mouse, touch pad, or touch screen, so that the user can enter a user command or text into the apparatus. When a user command is applied, the user input unit 120 transmits it to the setting unit 200; when text to be interpreted is input, it transmits the text to the interpretation unit 300.

The display unit 130 may be implemented as a display means such as a screen or monitor and, in some cases, may be combined with the user input unit 120, as in a touch screen or touch panel. The display unit 130 displays the speech recognition result for the source language spoken by the user, text entered by the user, and the interpreted target-language text. In particular, the display unit 130 of the present invention displays, in the target language, the pronunciation of the source-language recognition result or of the text entered by the user, and also displays, in the source language, the pronunciation of the interpreted target-language text. Unlike speech, which vanishes immediately, this lets the user perceive the pronunciation of the other party's language and pronounce it directly. Moreover, when the interpreted target language is output as speech and its pronunciation is displayed in the source language at the same time, the user can understand the pronunciation of the interpreted language more easily than when it is only spoken, which also benefits foreign language learning.

When the target language interpreted by the interpretation unit 300 is generated as synthesized speech, the voice output unit 140 outputs that synthesized speech. The voice output unit 140 may be implemented as a voice output means such as a speaker.

The setting unit 200 sets and stores source language information, target language information, synthesized speech output settings, and the like in response to a user command applied through the user input unit 120. The source language information indicates which language the user will input, by voice or text, as the language to be interpreted. Similarly, the target language information indicates the language into which the input source language is to be interpreted. For example, the source language information and the target language information may each be set to Korean, English, Japanese, Chinese, and so on.

The interpretation unit 300 translates the source-language speech or text input by the user into target-language text and performs interpretation by generating synthesized speech from the translated target-language text. In particular, the interpretation unit 300 of the present invention displays the pronunciation of the source-language text in the target language and the pronunciation of the translated target-language text in the source language. Because the pronunciation of each language is displayed in the other party's language, users who speak different languages can pronounce each other's language directly.

The interpretation unit 300 includes a speech and language database unit 310, a speech recognition unit 320, a text translation unit 330, a speech synthesis unit 340, a first pronunciation conversion unit 350, and a second pronunciation conversion unit 360.
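The data flow among the sub-units of unit 300 can be summarized in a short sketch. The class, field, and method names below are hypothetical, and each sub-unit is reduced to a plain callable so that only the wiring is shown; this is one possible arrangement, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InterpretationResult:
    source_text: str              # output of speech recognition (320)
    target_text: str              # output of text translation (330)
    source_in_target_script: str  # output of first conversion (350)
    target_in_source_script: str  # output of second conversion (360)
    synthesized_audio: bytes      # output of speech synthesis (340)


@dataclass
class InterpretationUnit:
    recognize: Callable[[bytes], str]
    translate: Callable[[str], str]
    synthesize: Callable[[str], bytes]
    first_conversion: Callable[[str], str]
    second_conversion: Callable[[str], str]

    def interpret(self, voice: bytes) -> InterpretationResult:
        """Run recognition, translation, both conversions, and synthesis."""
        source_text = self.recognize(voice)
        target_text = self.translate(source_text)
        return InterpretationResult(
            source_text=source_text,
            target_text=target_text,
            source_in_target_script=self.first_conversion(source_text),
            target_in_source_script=self.second_conversion(target_text),
            synthesized_audio=self.synthesize(target_text),
        )
```

Passing the sub-units in as callables keeps the sketch independent of any particular recognizer, translator, or synthesizer.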

The speech and language database unit 310 stores data about the source language and the target language. It stores the language model, pronunciation dictionary, and acoustic model that a speech recognition apparatus generally uses for speech recognition. The language model is provided to find and exploit regularities in grammar, syntax, words, and so on within natural language. The acoustic model is provided to segment speech into recognition units, model them, and convert the speech of each recognition unit into the phonemes of that unit. The pronunciation dictionary provides the notation for each language, including its pronunciation notation and pronunciation characteristics. The language model, acoustic model, and pronunciation dictionary may be provided for each of the source language and the target language, and may also be provided for other languages. The speech and language database unit 310 may also form and store a recognition network that integrates the language model, pronunciation dictionary, and acoustic model.

In particular, unlike existing interpretation apparatuses, the automatic interpretation apparatus of the present invention not only converts the input source language into the target language and outputs it, but also displays the pronunciation of the translated target-language text in the source language and the pronunciation of the input source-language text in the target language. Languages often have their own writing and pronunciation systems. Accordingly, the speech and language database unit 310 of the present invention may include at least one of a pronunciation dictionary, a pronunciation variation database, a grapheme-to-phoneme (g2p) conversion table, a pronunciation correspondence database, and an equivalent-word database, depending on the type of language model.

As an example, consider interpreting between Korean and Japanese in both directions. Korean uses a writing system built on both syllables and phonemes, whereas Japanese uses a syllable-based writing system, so the two languages differ in how pronunciation is written.

Consider first the case of displaying Korean pronunciation in Japanese. The pronunciation dictionary stores rules for converting characters into phoneme-level phonetic symbols and forms the basis of the g2p conversion table. That is, when a grapheme is input, it can be converted into the phonetic symbol of the corresponding phoneme and output.

The pronunciation correspondence database stores the phoneme combinations that correspond to Korean syllables, together with the Japanese syllables that correspond to those phoneme combinations, so that Korean syllables can be converted into Japanese syllables.

Korean also exhibits various pronunciation variation phenomena, such as consonant assimilation, palatalization, and contraction. The pronunciation variation database therefore stores pronunciation variation information so that the actual pronunciation of each word can be extracted. For example, when the Korean word "신라" is to be displayed in Japanese, the Japanese corresponding to its actual pronunciation "실라" is displayed, and for "굳이", the Japanese corresponding to its actual pronunciation "구지" is displayed.
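The two examples above ("신라" pronounced "실라", "굳이" pronounced "구지") can be reproduced with a small rule-based sketch. It is an assumption that rules are used at all: the patent only says the variation information is stored in a database, which could equally be a word-level lookup table. The sketch covers just two rules, assimilation of a final ㄴ before an initial ㄹ and palatalization of a final ㄷ or ㅌ before "이", and expects input made up only of precomposed Hangul syllables.

```python
# Rule-based sketch of two Korean pronunciation variations. Function names
# are hypothetical; real coverage would need many more rules or a lexicon.
HANGUL_BASE = 0xAC00
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
        "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
        "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def decompose(syllable):
    """Return [initial, vowel, final] jamo for one Hangul syllable."""
    code = ord(syllable) - HANGUL_BASE
    return [CHO[code // 588], JUNG[(code % 588) // 28], JONG[code % 28]]


def compose(cho, jung, jong):
    """Build a Hangul syllable from its jamo."""
    return chr(HANGUL_BASE + CHO.index(cho) * 588
               + JUNG.index(jung) * 28 + JONG.index(jong))


def apply_variation(word):
    """Rewrite a word (Hangul syllables only) to reflect its pronunciation."""
    syllables = [decompose(ch) for ch in word]
    for cur, nxt in zip(syllables, syllables[1:]):
        if cur[2] == "ㄴ" and nxt[0] == "ㄹ":
            # Assimilation: final ㄴ before initial ㄹ is pronounced ㄹ.
            cur[2] = "ㄹ"
        elif cur[2] in ("ㄷ", "ㅌ") and nxt[0] == "ㅇ" and nxt[1] == "ㅣ":
            # Palatalization: final ㄷ/ㅌ before "이" becomes initial ㅈ/ㅊ.
            nxt[0] = "ㅈ" if cur[2] == "ㄷ" else "ㅊ"
            cur[2] = ""
    return "".join(compose(*s) for s in syllables)
```

Applying such a step before syllable conversion means the target-language notation reflects how the word is actually spoken rather than how it is spelled.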

For the opposite case of displaying Japanese pronunciation in Korean, the g2p conversion table operates in the same way as when Korean pronunciation is displayed in Japanese. The pronunciation correspondence database allows the phonetic symbols of Japanese syllables to be matched to and converted into Korean syllables. Because Japanese has relatively few pronunciation variations compared with Korean, the pronunciation variation database may be omitted. Instead, Japanese contains a considerable number of words whose Korean spelling is already established by convention. The equivalent-word database supplies these conventional equivalents for Japanese so that such vocabulary can be handled.

The speech recognition unit 320 receives the voice signal (in) through the voice sensing unit 110 and converts it into text using the recognition network built from the acoustic model and language model stored in the speech and language database unit 310. The converted text is source-language text.

The text translation unit 330 translates source-language text into target-language text. The source-language text is either received from the speech recognition unit 320, which converted it from the voice signal (in), or entered by the user through the user input unit 120. If the Korean voice signal (in) is recognized as the Korean "안녕히 계세요.", the text translation unit 330 can convert it into the corresponding Japanese "さようなら". The text translation unit 330 translates the source-language text into target-language text based on the language model of the speech and language database unit 310. Various text translation techniques are already known, so a detailed description is omitted here.

The text translation unit 330 may also include a preprocessing unit (not shown). Before translation, the preprocessing unit may convert symbols such as Arabic numerals into text, or check for and correct spelling errors. Because numerals and symbols are often shared across languages, converting them into text may be unnecessary for translation. Their pronunciation, however, differs from language to language in most cases, so the preprocessing unit may instead be included in the first and second phonetic sound conversion units 350 and 360, which convert text according to its utterance sound.
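As a rough illustration of the numeral preprocessing described above, the following Python sketch spells out Arabic digits one at a time in a chosen language. The table `DIGIT_WORDS`, the function name `spell_out_digits`, and the digit-by-digit reading are assumptions made for this example; the description does not specify how the preprocessing unit reads numbers.

```python
import re

# Hypothetical per-language digit readings (digit-by-digit, not positional).
DIGIT_WORDS = {
    "ko": ["영", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"],
    "ja": ["ゼロ", "いち", "に", "さん", "よん", "ご", "ろく", "なな", "はち", "きゅう"],
}


def spell_out_digits(text: str, lang: str) -> str:
    """Replace each ASCII digit in `text` with its reading in `lang`."""
    words = DIGIT_WORDS[lang]
    return re.sub(r"[0-9]", lambda m: words[int(m.group())], text)
```

Because the same digit maps to a different word per language, a sketch like this would sit more naturally in the phonetic sound conversion units than in the translation unit, which matches the reasoning in the paragraph above.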

The speech synthesis unit 340 synthesizes speech from the translated target language-based text and outputs the resulting synthesized sound to the voice output unit 140 so that the user can hear it. Depending on the settings of the setting unit 200, the speech synthesis unit 340 may synthesize the sound in various voices, such as a male or female voice, or an adult's or a child's voice.

The first phonetic sound conversion unit 350 converts the utterance sound of the source language-based text recognized by the speech recognition unit 320 into target-language text, based on the pronunciation dictionary of the speech and language database. That is, when the voice signal (in) is input and converted into source language-based text by the speech recognition unit 320, the utterance sound of that text is written in the target language. As in the example above, when Korean is interpreted into Japanese, the voice signal (in) is input as a Korean-based voice signal whose source language is Korean, and the speech recognition unit 320 converts it into Korean-based text. The first phonetic sound conversion unit 350 then converts the utterance sound of the Korean-based text into Japanese text, Japanese being the target language. If the input signal (in) is recognized as "안녕히 계세요.", the first phonetic sound conversion unit 350 uses the data of the speech and language database unit 310 to convert it into the Japanese katakana notation "アンニョンヒギェセヨ". Unlike "さようなら", the semantic translation produced by the text translation unit 330, this renders the Korean utterance sound itself as "アンニョンヒギェセヨ", so that a Japanese counterpart can easily pronounce the Korean by reading it in his or her own script.

The first phonetic sound conversion unit 350 also transmits to the display unit 130 the source language-based text recognized by the speech recognition unit 320 together with the target-language text that represents its utterance sound, so that the display unit 130 can display the target-language text in addition to the recognized source language-based text. The user can thus check whether the automatic interpretation apparatus recognized the spoken utterance correctly, and can also check the corresponding utterance sound written in the target language.

Although the first phonetic sound conversion unit 350 is described above as receiving the source language-based text from the speech recognition unit 320, it may also receive the source language-based text from the text translation unit 330.

Conversely to the first phonetic sound conversion unit 350, the second phonetic sound conversion unit 360 converts the utterance sound of the target language-based text translated by the text translation unit 330 into source-language text, based on the language database. The second phonetic sound conversion unit 360 receives the translated target-language text applied to the speech synthesis unit 340 and converts its utterance sound into the source language. In the example above, the second phonetic sound conversion unit 360 converts the Japanese "さようなら", translated from the Korean "안녕히 계세요.", into the Korean "사요-나라". Here "-" is a long-vowel mark.

Like the first phonetic sound conversion unit 350, the second phonetic sound conversion unit 360 transmits the Korean text representing the utterance sound of the translated Japanese to the display unit 130 together with the Japanese text, so that both are displayed. The user can thus check the pronunciation of the translated text in Korean, understand it well, and easily speak the translated text directly.

At the same time, the synthesized sound corresponding to the translated target-language text is output through the voice output unit 140, which further aids understanding of the pronunciation and can improve achievement in language learning.

Although the first phonetic sound conversion unit 350 and the second phonetic sound conversion unit 360 are shown separately above for convenience of description, they may be implemented as a single integrated unit. In addition, although FIG. 1 shows the second phonetic sound conversion unit 360 receiving the translated target language-based text from the speech synthesis unit 340, it may instead receive the target language-based text from the text translation unit 330.

Likewise, although FIG. 1 shows the setting unit 200 separately from the interpretation unit 300, the setting unit 200 may be included in the interpretation unit 300.

The automatic interpretation apparatus of FIG. 1 may be implemented as a dedicated device for interpretation. However, because the setting unit 200 and the interpretation unit 300 can be implemented in software, various devices that have an interface unit can serve as the automatic interpretation apparatus. For example, information and communication devices such as smartphones, smart pads, PDAs, and PCs may be used as the automatic interpretation apparatus.

FIG. 2 shows an automatic interpretation method of the automatic interpretation apparatus according to an embodiment of the present invention.

The automatic interpretation method of FIG. 2 is also described using the example of interpreting Korean into Japanese, as in FIG. 1. Referring to FIG. 1, the automatic interpretation apparatus first receives and stores automatic interpretation settings given as a user command through the user input unit 120 (S10). The settings include source language and target language information, output settings for the synthesized sound, and the like. Default values are specified in advance, so interpretation can be performed with the defaults even if the user sets nothing.
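The settings step (S10) can be pictured as a small record with defaults that a user command partially overrides. The Python sketch below is only an illustration: the class `InterpretationSettings`, its field names, and the default values are hypothetical and do not come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterpretationSettings:
    # All field names and defaults are assumptions for illustration.
    source_lang: str = "ko"
    target_lang: str = "ja"
    voice_gender: str = "female"  # synthesized-sound output setting
    voice_age: str = "adult"      # synthesized-sound output setting


def store_settings(user_command: Optional[dict] = None) -> InterpretationSettings:
    """S10: store the user's settings, keeping defaults for anything not set."""
    return InterpretationSettings(**(user_command or {}))
```

Calling `store_settings()` with no argument yields the defaults, which corresponds to interpretation proceeding even when the user has set nothing.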

Once the automatic interpretation settings are specified, the apparatus determines whether source-language speech, that is, the voice signal (in), is input (S20). Source-language speech is speech in the source language used by the user and may be input through the voice detection unit 110. The user may cause it to be input by issuing a voice input command as a user command through the user input unit 120; in some cases, the apparatus may detect voice input automatically. If source-language speech is determined to have been input, the speech recognition unit 320 performs speech recognition using the recognition network stored in the speech and language database unit 310, which integrates the acoustic model, the pronunciation dictionary, and the language model, and generates source language-based text (S30).

If no source-language speech is input, the apparatus determines whether source-language text is input (S40). The user may give the sentence to be interpreted to the apparatus by voice, but in special cases, such as a noisy environment or an environment where speaking aloud is difficult, the user may instead type the sentence directly as text using the user input unit 120. In this case speech recognition is unnecessary, so the speech recognition step (S30) is skipped.

When source language-based text is obtained by speech recognition or by source text input, the first phonetic sound conversion unit 350 converts the pronunciation of the source language-based text into target-language text, and outputs the source language-based text together with the target-language text representing its utterance sound through the display unit 130 (S50).

The source language-based text is then automatically translated into target-language text according to the automatic interpretation settings (S60).

When automatic translation has been performed and the target-language text is obtained, the second phonetic sound conversion unit 360 of the automatic interpretation apparatus converts the pronunciation of the translated target-language text into source-language text, and outputs the target-language text together with the source-language text representing its utterance sound through the display unit 130 (S70).

Meanwhile, the speech synthesis unit 340 synthesizes speech corresponding to the target language-based text to generate a synthesized sound (S80), and the voice output unit 140 receives and outputs the synthesized sound (S90).

After the synthesized sound is output, the apparatus determines, according to a user command applied through the user input unit 120, whether interpretation in the reverse direction, from the target language, is requested (S100). If an interpretation command for the target language is applied, the apparatus switches the interpretation settings by swapping the source language and the target language in the preset automatic interpretation settings (S110).
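The control flow of steps S20 to S70 and the setting switch of S110 can be sketched as follows. This is a minimal Python outline under stated assumptions: the recognizer, translator, and phonetic sound converter are passed in as callables because the description does not define their interfaces, speech synthesis (S80, S90) is left out, and every name here (`Settings`, `Interpretation`, `interpret_once`, `swap_languages`) is hypothetical.

```python
from typing import Callable, NamedTuple, Optional


class Settings(NamedTuple):
    source_lang: str
    target_lang: str


class Interpretation(NamedTuple):
    source_text: str             # recognized or typed source text
    source_sound_in_target: str  # S50 output
    target_text: str             # S60 output
    target_sound_in_source: str  # S70 output


def interpret_once(
    settings: Settings,
    speech: Optional[bytes],
    typed_text: Optional[str],
    recognize: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    write_sound: Callable[[str, str, str], str],
) -> Optional[Interpretation]:
    src, tgt = settings
    if speech is not None:                    # S20: speech was input
        source_text = recognize(speech, src)  # S30: speech recognition
    elif typed_text is not None:              # S40: text was input instead
        source_text = typed_text              # S30 is skipped
    else:
        return None                           # nothing to interpret yet
    source_sound = write_sound(source_text, src, tgt)  # S50
    target_text = translate(source_text, src, tgt)     # S60
    target_sound = write_sound(target_text, tgt, src)  # S70
    return Interpretation(source_text, source_sound, target_text, target_sound)


def swap_languages(settings: Settings) -> Settings:
    """S110: exchange source and target for reverse interpretation."""
    return Settings(settings.target_lang, settings.source_lang)
```

Passing the same `write_sound` callable for both directions mirrors the remark that the first and second phonetic sound conversion units may be integrated.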

As described above, in the automatic interpretation method of the automatic interpretation apparatus according to the present invention, when the source language is input as speech or text, the apparatus outputs target-language text corresponding to the pronunciation of the input and, in addition to the translated target-language text, source-language text that writes out its utterance sound. The user can thus check both the target-language notation of the pronunciation of what he or she said and the source-language notation of the pronunciation of the interpreted result. The user can therefore easily grasp the pronunciation of the interpreted target language and speak it directly, and the method can also assist language learning.

Although the first phonetic sound conversion step (S50), which converts the pronunciation of the source language-based text into target-language text, is described above, the user may not actually need that target-language text. In that case, the first phonetic sound conversion step (S50) may be omitted, and the first phonetic sound conversion unit 350 may also be removed from the automatic interpretation apparatus.

FIG. 3 shows the first phonetic sound conversion step of the automatic interpretation method of FIG. 2 in detail, and FIG. 4 shows the second phonetic sound conversion step of that method in detail.

FIGS. 3 and 4 are also described using interpretation from Korean into Japanese as an example, on the assumption that the preprocessing unit is provided in each of the first and second phonetic sound conversion units 350 and 360 rather than in the text translation unit.

The first phonetic sound conversion step (S50) of FIG. 3 converts the utterance sound of source language-based text into target-language text. First, preprocessing is performed on the source language-based text obtained by speech recognition or by source-language text input (S51). As described above, preprocessing may correct errors, for example through a spelling check, and convert numerals and symbols into source language-based text. Next, pronunciation variation conversion is performed using the pronunciation variation database of the speech and language database unit 310 (S52). As described above, this conversion applies to the source language-based text the various pronunciation variation phenomena characteristic of Korean, such as consonant assimilation, palatalization, and contraction. In other words, the source language-based text is partially converted into pronunciation-based text.
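One simple way to picture the pronunciation variation conversion (S52) is a lookup that rewrites spellings into their pronounced forms. The Python sketch below assumes a tiny word-level table; `PRONUNCIATION_VARIATIONS` and `apply_pronunciation_variation` are hypothetical names, and the actual structure of the pronunciation variation database is not given in the description. The two entries are standard Korean examples: "같이" is pronounced "가치" (palatalization) and "국물" is pronounced "궁물" (consonant assimilation).

```python
# Hypothetical stand-in for the pronunciation variation database:
# written form -> form rewritten as it is pronounced.
PRONUNCIATION_VARIATIONS = {
    "같이": "가치",  # palatalization
    "국물": "궁물",  # consonant (nasal) assimilation
}


def apply_pronunciation_variation(text: str, table=PRONUNCIATION_VARIATIONS) -> str:
    """S52: partially convert written text into pronunciation-based text."""
    for written, spoken in table.items():
        text = text.replace(written, spoken)
    return text
```

Text with no matching entry passes through unchanged, which is consistent with the text being only partially converted.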

After the pronunciation variation conversion, the source language-based text is separated into word units (S53). Word separation ensures that spacing is reflected regardless of the language. Once the words are separated, syllable separation is performed (S54). Once the syllables are separated, each syllable is separated into phonemes: an initial consonant, a medial vowel, and a final consonant (S55). Phoneme separation is performed because Korean is a phoneme-based language; for Japanese or Chinese, in which phonemes cannot be separated, phoneme separation may be omitted.
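For Korean, steps S53 to S55 can be sketched with the arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3), where each syllable encodes one of 19 initials, 21 medials, and 28 finals (including "no final"). The function names below are assumptions for illustration, and the description does not state that the apparatus uses this particular method.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def split_words(text: str) -> list:
    """S53: separate text into space-delimited word units."""
    return text.split()


def split_syllables(word: str) -> list:
    """S54: each precomposed Hangul character is one syllable."""
    return list(word)


def split_phonemes(syllable: str) -> tuple:
    """S55: split one Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        return (syllable, "", "")  # not a Hangul syllable: pass through
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return (INITIALS[initial], MEDIALS[medial], FINALS[final])
```

For Japanese or Chinese input, only the first two functions would be needed, in line with the note that phoneme separation may be omitted for those languages.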

Once phoneme separation has been performed, the separated phonemes are converted into phoneme-level phonetic symbols using the g2p conversion table of the speech and language database unit 310 (S56). Table 1 shows an example of a g2p conversion table for converting Korean into phonetic symbols.
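The g2p step (S56) amounts to a table lookup per phoneme. Since Table 1 is not reproduced here, the Python sketch below uses a small hypothetical table with romanization-like symbols; the keys include the phoneme's position because, for example, "ㅇ" is silent as an initial but sounds like "ng" as a final. The names `G2P_TABLE` and `phonemes_to_symbols` are assumptions.

```python
# Hypothetical excerpt of a Korean g2p table: (position, phoneme) -> symbol.
G2P_TABLE = {
    ("initial", "ㅇ"): "",   # silent in syllable-initial position
    ("initial", "ㄴ"): "n",
    ("initial", "ㅎ"): "h",
    ("medial", "ㅏ"): "a",
    ("medial", "ㅕ"): "yeo",
    ("medial", "ㅣ"): "i",
    ("final", ""): "",       # syllable without a final consonant
    ("final", "ㄴ"): "n",
    ("final", "ㅇ"): "ng",
}


def phonemes_to_symbols(initial: str, medial: str, final: str) -> tuple:
    """S56: convert one syllable's phonemes into phonetic symbols."""
    return (
        G2P_TABLE[("initial", initial)],
        G2P_TABLE[("medial", medial)],
        G2P_TABLE[("final", final)],
    )
```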

Then, using the pronunciation band database of the speech and language database unit 310, the converted phoneme-level phonetic symbols are combined into syllables, and each combined syllable is converted into the corresponding target-language (here, Japanese) syllable (S57). Table 2 shows an example of a pronunciation band database for converting Korean into Japanese syllables.
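The pronunciation band conversion (S57) can likewise be sketched as joining a syllable's phonetic symbols and looking the result up in a syllable table. Table 2 is not reproduced here, so the entries below are hypothetical; they were chosen only so that the syllables of "안녕히 계세요" map to the katakana given earlier in the description, "アンニョンヒギェセヨ".

```python
# Hypothetical excerpt of a Korean-to-Japanese pronunciation band table:
# combined syllable symbols -> katakana syllable.
PRONUNCIATION_BAND_TABLE = {
    "an": "アン",
    "nyeong": "ニョン",
    "hi": "ヒ",
    "gye": "ギェ",
    "se": "セ",
    "yo": "ヨ",
}


def symbols_to_target_syllable(symbols: tuple) -> str:
    """S57: combine phoneme-level symbols and map them to a target syllable."""
    key = "".join(symbols)
    return PRONUNCIATION_BAND_TABLE[key]
```

A real table would need an entry, or a fallback rule, for every syllable the source language can produce; this sketch simply raises `KeyError` for anything missing.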

When all of the separated syllables have been converted into target-language syllables, the converted syllables are recombined to restore the words (S58). The restored words are the utterance sound of the source language-based text converted into target-language text, and the automatic interpretation apparatus displays the target-language text through the display unit 130 (S59). At this time, the source language-based text may be displayed together with the target-language text for its utterance sound.
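Steps S58 and S59 can be sketched as joining converted syllables back into words and pairing the result with the original text for display. In the Python outline below, the function names and the choice to separate restored words with a space and to show the two texts on consecutive lines are assumptions; the description only says that word separation lets spacing be reflected and that both texts may be displayed together.

```python
def restore_words(converted_words: list) -> str:
    """S58: rejoin target-language syllables into words, keeping word spacing."""
    return " ".join("".join(syllables) for syllables in converted_words)


def display_text(source_text: str, target_sound_text: str) -> str:
    """S59: show the source text with its utterance sound in the target script."""
    return source_text + "\n" + target_sound_text
```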

Meanwhile, the second phonetic sound conversion step (S70) of FIG. 4 converts the utterance sound of the translated target language-based text into source-language text. As in the first phonetic sound conversion step (S50), a preprocessing step may be performed first (S71). A band word search step is then performed (S72). As described above, when the utterance sound of Japanese-based text is converted into Korean text, there are many band words, that is, vocabulary whose notation is already established by convention, and these must be reflected. So that such band words are applied to the pronunciation, the band word database included in the speech and language database unit 310 is used to determine whether the target language-based text contains vocabulary for which a band word exists in the source language.

If a band word is determined to exist, the band word stored in the speech and language database unit 310 is retrieved and applied (S74). When a band word is applied, no separate conversion process is needed for that vocabulary item.

For vocabulary without a band word, a syllable separation step (S75), a g2p conversion step (S76), and a pronunciation band conversion step (S77) are performed, similarly to FIG. 3. However, converting Korean pronunciation into Japanese differs from converting Japanese pronunciation into Korean, so different g2p conversion tables and pronunciation band databases may be used for the two directions.

Table 3 shows an example of a g2p conversion table for converting Japanese into phonetic symbols.

Table 4 shows an example of a pronunciation band database for converting Japanese into Korean syllables.

In Tables 1 to 4, the capital letters B and L reflect sounds that, owing to the characteristics of Japanese pronunciation, are pronounced differently at the beginning, middle, or end of a word, and ":" denotes a long sound.

When band words have been applied to the vocabulary for which they exist, and syllable separation, g2p conversion, and pronunciation band conversion have been performed on the vocabulary without band words, the band words and the converted vocabulary are combined back into a sentence (S78).
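The band word branch of the second phonetic sound conversion (S72 to S78) can be sketched as "look the word up first, transliterate only if the lookup fails". In the Python outline below, both tables are hypothetical: `BAND_WORDS` holds one illustrative conventional spelling, and `KANA_TO_HANGUL` collapses the g2p and pronunciation band steps into a direct kana-to-Hangul map just large enough to reproduce the example from the description ("さようなら" to "사요-나라"). The input is assumed to be already segmented into words, and joining the results with spaces is also an assumption.

```python
from typing import List

# Hypothetical band word database: Japanese word -> conventional Korean notation.
BAND_WORDS = {"とうきょう": "도쿄"}

# Hypothetical shortcut for S75 to S77: kana unit -> Korean syllable(s).
# "-" marks a long vowel, as in the description's example.
KANA_TO_HANGUL = {"さ": "사", "よう": "요-", "な": "나", "ら": "라"}


def transliterate(word: str) -> str:
    """S75-S77 (collapsed): greedy longest-match conversion of kana units."""
    out, i = [], 0
    while i < len(word):
        for length in (2, 1):
            chunk = word[i:i + length]
            if chunk in KANA_TO_HANGUL:
                out.append(KANA_TO_HANGUL[chunk])
                i += len(chunk)
                break
        else:
            out.append(word[i])  # unknown character: keep as-is
            i += 1
    return "".join(out)


def convert_sentence(words: List[str]) -> str:
    """S72-S78: apply band words where they exist, transliterate the rest."""
    converted = [BAND_WORDS.get(w) or transliterate(w) for w in words]
    return " ".join(converted)
```

Checking the band word table before any syllable-level processing reflects the statement that vocabulary with a band word needs no separate conversion.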

The combined sentence is the utterance sound of the translated target language-based text converted into source-language text, and the automatic interpretation apparatus displays the source-language text through the display unit 130 (S79). The translated target language-based text may thus be displayed together with the source-language text that writes its utterance sound in the source language.

FIGS. 3 and 4 have been described on the assumption that Korean is interpreted into Japanese. When Japanese is interpreted into Korean, the procedure of FIG. 3 may be performed as the second phonetic sound conversion step and the procedure of FIG. 4 as the first phonetic sound conversion step.

Although interpretation between Korean and Japanese has been described above as an example, the present invention is not limited thereto and can obviously be applied to other languages as well.

The method according to the present invention can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of the recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

Claims

An automatic interpretation apparatus comprising:
an interface unit that receives, from a user, a user command and a source language-based voice to be interpreted, and outputs source language text that represents, in the source language, the utterance sound of the target language-based text into which the source language-based voice is translated;
a setting unit that sets information about the source language and the target language in response to the user command; and
an interpretation unit that receives the source language-based voice through the interface unit, performs speech recognition to convert it into source language-based text, converts the utterance sound of the converted source language-based text into target language text, translates the source language-based text into the target language-based text, and analyzes grammatical errors in the source language-based text and the target language-based text.

The automatic interpretation apparatus of claim 1, wherein the interpretation unit comprises:
a speech and language database unit that stores a recognition network integrating a language model, an acoustic model, and a pronunciation dictionary for the source language and the target language;
a speech recognition unit that analyzes the source language-based voice received from the interface unit based on the recognition network of the speech and language database unit and converts it into the source language-based text;
a text translation unit that receives the source language-based text from the speech recognition unit and translates it into the target language-based text;
a speech synthesis unit that receives the target language-based text from the text translation unit, synthesizes a voice corresponding to the target language-based text to generate a synthesized sound, and transmits the synthesized sound to the interface unit; and
a phonetic sound conversion unit that receives the target language-based text, converts the utterance sound of the target language-based text into the source language text, and outputs the source language text to the interface unit.

The automatic interpretation apparatus of claim 2, wherein the phonetic sound conversion unit comprises:
a first phonetic sound conversion unit that receives the source language-based text from one of the speech recognition unit and the text translation unit, converts the utterance sound of the received source language-based text into the target language text using the recognition network of the speech and language database unit, and outputs the target language text to the interface unit; and
a second phonetic sound conversion unit that receives the target language-based text from one of the text translation unit and the speech synthesis unit, converts the utterance sound of the target language-based text into the source language text, and outputs the source language text to the interface unit.

The automatic interpretation apparatus of claim 3, wherein each of the first and second phonetic sound conversion units comprises
a preprocessing unit that converts symbols included in the source language-based text and the target language-based text into text of the corresponding language.

The automatic interpretation apparatus of claim 3, wherein, in the speech and language database unit,
the recognition network includes at least one of a pronunciation variation database, a grapheme-to-phoneme (g2p) conversion table, a pronunciation band database, and a band word database, according to the types of the source language and the target language.

The automatic interpretation apparatus of claim 5, wherein the source language is Korean and the target language is Japanese.

The automatic interpretation apparatus of claim 6, wherein the first phonetic sound conversion unit
performs pronunciation variation conversion on the source language-based text so that it corresponds to the pronunciation variations characteristic of the source language, sequentially separates the text into words, syllables, and phonemes, converts the separated phonemes into phoneme-level phonetic symbols using the g2p conversion table, combines the converted phoneme-level phonetic symbols into syllables, converts each combined syllable into the corresponding syllable of the target language, and combines the converted syllables to restore the words, thereby generating the target language text that represents the utterance sound of the source language-based text.

The automatic interpretation apparatus of claim 7, wherein the second phonetic sound conversion unit
determines whether the target language-based text contains vocabulary for which a band word, that is, a notation conventionally used in the source language, exists, applies the pronunciation of the band word to vocabulary for which a band word exists, sequentially separates vocabulary for which no band word exists into words, syllables, and phonemes, converts the separated phonemes into phoneme-level phonetic symbols using the g2p conversion table, combines the phoneme-level phonetic symbols into syllables, converts each combined syllable into the corresponding syllable of the source language, and combines the pronunciation of the band words with the converted syllables to restore the words, thereby generating the source language text that represents the utterance sound of the target language-based text.

The automatic interpretation apparatus of claim 2, wherein the interface unit comprises:
a voice detection unit that detects the voice input by the user and transmits it to the interpretation unit;
a user input unit, implemented as a user command input means, that receives the user command or the source language-based text;
a display unit that displays at least one of the source language-based text, the target language-based text into which the source language-based text is translated, the source language text in which the utterance sound of the target language-based text is written in the source language, and the target language text in which the utterance sound of the source language-based text is written in the target language; and
a voice output unit, implemented as a voice output means, that outputs the synthesized sound.

An automatic interpretation method of an automatic interpretation apparatus having an interface unit, a setting unit, and an interpretation unit, the method comprising, by the automatic interpretation apparatus:
storing automatic interpretation settings in response to a user command applied through the interface unit;
determining whether a source language-based voice is applied through the interface unit;
when the source language-based voice is applied, generating source language-based text by performing speech recognition using a recognition network that integrates a language model, an acoustic model, and a pronunciation dictionary for the source language and the target language and is stored in the speech and language database unit of the interpretation unit;
converting the utterance sound of the source language-based text into target language text and outputting the target language text; and
translating the source language-based text into target language-based text using the recognition network,
wherein the generating of the source language-based text and the translating into the target language-based text include
analyzing and correcting grammatical errors in the source language-based text and the target language-based text.

(deleted)

The method of claim 10, wherein the source language is Korean and the target language is Japanese.

The method of claim 12, wherein the recognition network
Comprises a pronunciation variation database, a grapheme-to-phoneme (g2p) conversion table, a pronunciation band database, and a band word database according to the types of the source language and the target language.

The method of claim 13, wherein the converting into the target language text and outputting comprises:
Performing pronunciation variation conversion on the source language-based text according to the characteristics of the source language;
Sequentially separating the source language-based text on which the pronunciation variation conversion has been performed into words, syllables, and phonemes;
Converting the separated phonemes into phoneme-unit phonetic symbols using the g2p conversion table;
Combining the converted phoneme-unit phonetic symbols into syllable units;
Converting the combined syllables into the corresponding syllables of the target language;
Combining the converted syllables to restore the words, thereby generating the target language text indicating the utterance of the source language-based text; and
Outputting the target language text through the interface unit.
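
The conversion chain recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the table contents, the function names (`split_syllable`, `word_to_target`, `to_target_text`), and the use of Latin letters as phonetic symbols are all assumptions made for illustration. The only non-hypothetical part is the standard Unicode arithmetic for decomposing a precomposed Hangul syllable into its jamo, which stands in here for the "separation into phonemes" step.

```python
# Toy sketch of the claimed chain: pronunciation variation -> word/syllable/
# phoneme separation -> g2p -> recombination -> target-language syllables.
# Every table below is a tiny hypothetical stand-in for the patent's databases.

# Hangul jamo in Unicode order (initial, medial, final).
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Hypothetical pronunciation variation entries (written form -> spoken form).
PRONUNCIATION_VARIATION = {"합니다": "함니다"}

# Hypothetical g2p entries: jamo -> phonetic symbol, per syllable position.
G2P_ONSET = {"ㄱ": "g", "ㅅ": "s", "ㅎ": "h", "ㄴ": "n", "ㄷ": "d"}
G2P_VOWEL = {"ㅏ": "a", "ㅣ": "i"}
G2P_CODA = {"": "", "ㅁ": "m"}

# Hypothetical syllable correspondence: source syllable -> katakana.
SYLLABLE_TO_TARGET = {
    "gam": "カム", "sa": "サ", "ham": "ハム", "ni": "ニ", "da": "ダ",
}


def split_syllable(syllable):
    """Separate one precomposed Hangul syllable into its jamo (phonemes)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 11172:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    return CHO[index // 588], JUNG[(index % 588) // 28], JONG[index % 28]


def word_to_target(word):
    """Convert one source-language word into target-language script."""
    # Step 1: pronunciation variation conversion (word-level lookup here).
    word = PRONUNCIATION_VARIATION.get(word, word)
    converted = []
    for syllable in word:  # Step 2: each Hangul character is one syllable.
        onset, vowel, coda = split_syllable(syllable)  # phoneme separation
        # Step 3: g2p lookup, one phonetic symbol per phoneme.
        symbols = (G2P_ONSET[onset], G2P_VOWEL[vowel], G2P_CODA[coda])
        # Step 4: recombine the symbols into one syllable unit.
        combined = "".join(symbols)
        # Step 5: map the syllable to the target-language syllable(s).
        converted.append(SYLLABLE_TO_TARGET[combined])
    return "".join(converted)  # Step 6: restore the word.


def to_target_text(source_text):
    """Apply the chain to every whitespace-separated word."""
    return " ".join(word_to_target(w) for w in source_text.split())


print(to_target_text("감사 합니다"))
```

With these toy tables the call prints `カムサ ハムニダ`. A real system would need complete tables and rule-based rather than word-level pronunciation variation; the sketch only shows how the claimed steps feed one another.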

The method of claim 10, further comprising:
Converting the utterance of the translated target language-based text into source language text and outputting the source language text;
Wherein the converting into the source language text and outputting comprises:
Determining whether the target language-based text includes a band word, which is a vocabulary item commonly used in the source language;
Applying the pronunciation of the band word to each vocabulary item for which a band word exists;
Sequentially separating the vocabulary items having no band word into words, syllables, and phonemes;
Converting the separated phonemes into phoneme-unit phonetic symbols using the g2p conversion table;
Combining the converted phoneme-unit phonetic symbols into syllable units;
Converting the combined syllables into the corresponding syllables of the source language;
Combining the pronunciation of the band word and the converted syllables to restore the words, thereby generating the source language text indicating the utterance of the target language-based text; and
Outputting the source language text through the interface unit.
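
The band word branch recited above can be sketched in Python as follows. This is an illustrative assumption, not the patent's implementation: the dictionary contents, the function names `word_to_source` and `to_source_text`, and the decision to pass in already segmented words are all hypothetical. In this toy version each kana character is treated as one syllable, and the g2p table returns its phoneme symbols directly.

```python
# Toy sketch: words with a band word entry take the stored pronunciation,
# all other words go through the syllable/phoneme/g2p chain.
# All tables are small hypothetical stand-ins for the patent's databases.

# Hypothetical band word database: target-language word -> pronunciation
# already in common use in the source language.
BAND_WORDS = {"東京": "도쿄"}

# Hypothetical g2p table: kana (one syllable each) -> phoneme symbols.
G2P = {
    "あ": ("a",),
    "り": ("r", "i"),
    "が": ("g", "a"),
    "と": ("t", "o"),
    "う": ("u",),
}

# Hypothetical syllable correspondence: combined symbols -> Hangul syllable.
SYLLABLE_TO_SOURCE = {"a": "아", "ri": "리", "ga": "가", "to": "토", "u": "우"}


def word_to_source(word):
    """Render one target-language word in source-language script."""
    # Band word check: reuse the stored pronunciation when one exists.
    if word in BAND_WORDS:
        return BAND_WORDS[word]
    syllables = []
    for kana in word:                     # syllable separation
        phonemes = G2P[kana]              # phoneme symbols via the g2p table
        combined = "".join(phonemes)      # recombine into a syllable unit
        syllables.append(SYLLABLE_TO_SOURCE[combined])
    return "".join(syllables)             # restore the word


def to_source_text(target_words):
    """Convert a list of already segmented target-language words."""
    return " ".join(word_to_source(w) for w in target_words)


print(to_source_text(["東京", "ありがとう"]))
```

With these toy tables the call prints `도쿄 아리가토우`. Word segmentation of the target text is assumed to have happened upstream, since the claim does not say how words are delimited.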

The method of claim 15, wherein the converting into the target language text and outputting and the converting into the source language text and outputting each further comprise
A pre-processing step of, before the performing of the pronunciation variation conversion and the determining of whether a band word exists, analyzing and correcting grammatical errors in the source language-based text and the target language-based text, respectively, and converting symbols included in each text into corresponding language-based text.

The method of claim 10, further comprising:
Before the translating into the target language-based text, directly receiving the source language-based text through the interface unit if the source language-based voice is not applied; and
After the translating into the target language-based text, synthesizing a voice corresponding to the target language-based text to generate a synthesized sound and outputting the synthesized sound through the interface unit.
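
The control flow of this claim (voice input when available, typed text otherwise, then translation and speech synthesis) might be sketched as below. The `Engines` container, the `interpret` function, and the stub engines are hypothetical names introduced only for illustration; the claim does not specify any programming interface, and real recognition, translation, and synthesis engines would replace the stubs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Engines:
    """Hypothetical bundle of the three engines the flow depends on."""
    recognize: Callable[[bytes], str]    # voice -> source language-based text
    translate: Callable[[str], str]      # source text -> target text
    synthesize: Callable[[str], bytes]   # target text -> synthesized sound


def interpret(voice: Optional[bytes], typed_text: Optional[str],
              engines: Engines) -> Tuple[str, str, bytes]:
    """Return (source text, target text, synthesized sound)."""
    if voice is not None:
        # A source language-based voice was applied: recognize it.
        source_text = engines.recognize(voice)
    elif typed_text is not None:
        # No voice: take the source language-based text directly.
        source_text = typed_text
    else:
        raise ValueError("neither voice nor text was supplied")
    target_text = engines.translate(source_text)
    # Synthesis happens only after translation, as the claim orders it.
    sound = engines.synthesize(target_text)
    return source_text, target_text, sound


# Stub engines so the sketch runs without any speech or translation library.
stub = Engines(
    recognize=lambda v: v.decode("utf-8"),
    translate=str.upper,
    synthesize=lambda t: t.encode("utf-8"),
)

print(interpret(None, "hello", stub))
```

Passing the engines in as callables keeps the branching logic testable on its own, independent of whichever recognition or synthesis backend is used.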

18. A recording medium having recorded thereon a computer readable program for performing the automatic interpretation method of the automatic interpretation apparatus according to any one of claims 10 and 12.