KR20150105075A

KR20150105075A - Apparatus and method for automatic interpretation

Info

Publication number: KR20150105075A
Application number: KR1020140027275A
Authority: KR
Inventors: 이수종; 김상훈; 김정세; 박상규
Original assignee: 한국전자통신연구원
Priority date: 2014-03-07
Filing date: 2014-03-07
Publication date: 2015-09-16
Also published as: CN104899192A; CN104899192B

Abstract

Disclosed are an apparatus and a method for automatic interpretation. The automatic interpretation apparatus comprises: an input unit that receives an utterance in a source language; a speech recognition unit that recognizes the utterance and generates a speech-recognized sentence; a translation unit that converts the speech-recognized sentence into a text sentence in a target language; a pronunciation generation unit that uses a pronunciation transliteration database to generate phonetic symbols in the target language for the speech-recognized sentence and in the source language for the target-language text sentence, so that the other party's user can reproduce or visually recognize them; and an output unit that outputs the speech-recognized sentence, the text sentence, and the phonetic symbols in the target and source languages on a screen or as speech.

Description

Apparatus and method for automatic interpretation

The present invention relates to an apparatus and method for automatic interpretation between Korean and Chinese.

Korean-Chinese automatic interpretation helps Koreans and Chinese who do not know each other's language to communicate. It basically consists of a source-language utterance, generation of a speech-recognized sentence, automatic translation into the target language, and output of synthesized target-language speech. The source-language utterance is turned into a source-language sentence by speech recognition, translated into a target-language sentence by automatic translation, and then synthesized into target-language speech and output. However, because an utterance vanishes as soon as it is spoken, it is practically impossible to memorize it or reproduce it directly for use in communication. In addition, speech recognition performance drops sharply for infrequently used proper nouns and in noisy environments. In such cases the user communicates by typing text directly into the automatic interpretation apparatus, and there is also a need to learn simple sentences that are used often, or that are used in the other country, and speak them directly. To cope with these situations, additional interfaces should be provided as far as possible so that the automatic interpretation function can be used actively.

The present invention proposes an automatic interpretation apparatus and method that recognizes a source-language utterance, displays alongside the speech-recognized sentence the pronunciation of that sentence in the characters of the target language, and displays alongside the sentence automatically translated into the target language the pronunciation of that sentence in the characters of the source language.

According to one aspect of the present invention, an automatic interpretation apparatus is disclosed.

An automatic interpretation apparatus according to an embodiment of the present invention includes: an input unit that receives an utterance in a source language; a speech recognition unit that recognizes the utterance and generates a speech-recognized sentence; a translation unit that converts the speech-recognized sentence into a text sentence in a target language; a pronunciation generation unit that, using a pronunciation transliteration database, generates phonetic symbols in the target language for the speech-recognized sentence and in the source language for the target-language text sentence, so that the other party's user can reproduce or visually recognize them; and an output unit that outputs the speech-recognized sentence, the text sentence, and the phonetic symbols in the target and source languages on a screen or as speech.

The input unit receives an utterance in either Korean or Chinese through a microphone, or receives a text sentence in either Korean or Chinese through a text input module.

The pronunciation transliteration database stores Korean transliteration data for writing the pronunciation of a Chinese sentence in Korean phonetic symbols, and Chinese transliteration data for writing the pronunciation of a Korean sentence in Chinese phonetic symbols.

The Korean transliteration data includes information obtained by splitting Hanyu Pinyin into pronunciation units and mapping each Chinese pronunciation unit to a corresponding Korean phonetic symbol, where Hanyu Pinyin is a phonetic notation that writes the pronunciation of Chinese characters in Chinese-style romanization including the four tones.

The Chinese transliteration data includes information obtained by dividing each Korean syllable into an initial consonant, a medial vowel, and a final consonant, and mapping each of them to a Hanyu Pinyin phonetic symbol or a romanized spelling.

The automatic interpretation apparatus is implemented as an application on a mobile terminal.

According to another aspect of the present invention, an automatic interpretation method performed by an automatic interpretation apparatus is disclosed.

An automatic interpretation method according to an embodiment of the present invention includes: receiving an utterance in a source language; recognizing the utterance to generate and output a speech-recognized sentence; using a pronunciation transliteration database to generate and output phonetic symbols in a target language for the speech-recognized sentence, so that the other party's user can reproduce or visually recognize it; converting the speech-recognized sentence into a text sentence in the target language and outputting it; and using the pronunciation transliteration database to generate and output phonetic symbols in the source language for the target-language text sentence, so that the other party's user can reproduce or visually recognize it.

Receiving the source-language utterance includes receiving an utterance in either Korean or Chinese through a microphone, or receiving a text sentence in either Korean or Chinese through a text input module.

The Chinese transliteration data includes information obtained by dividing each Korean syllable into an initial consonant, a medial vowel, and a final consonant, and mapping each of them to a Hanyu Pinyin phonetic symbol or a romanized spelling.

The present invention recognizes a source-language utterance, displays alongside the speech-recognized sentence its pronunciation in target-language characters, and displays alongside the automatically translated target-language sentence its pronunciation in source-language characters, thereby helping users communicate by speaking the other party's language themselves.

FIG. 1 schematically illustrates the configuration of an automatic interpretation apparatus between Korean and Chinese.
FIG. 2 is a flowchart of an automatic interpretation method between Korean and Chinese in the automatic interpretation apparatus of FIG. 1.
FIG. 3 illustrates data built in the pronunciation transliteration database.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and alternatives falling within the spirit and technical scope of the present invention.

In describing the present invention, detailed descriptions of related known technology are omitted where they are judged to unnecessarily obscure the gist of the invention. In addition, the numerals used in this description are merely identifiers for distinguishing one component from another.

Also, in this specification, when one component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but unless specifically stated otherwise, it should be understood that it may also be connected or coupled via yet another component in between.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. To aid overall understanding, the same reference numerals are used for the same elements regardless of the figure number.

FIG. 1 schematically illustrates the configuration of an automatic interpretation apparatus between Korean and Chinese.

Referring to FIG. 1, the automatic interpretation apparatus includes an input unit 10, a speech recognition unit 20, a translation unit 30, a pronunciation generation unit 40, an output unit 50, a pronunciation transliteration database 60, and a translation engine 70.

The input unit 10 receives an utterance in the source language. For example, the input unit 10 may receive an utterance in either Korean or Chinese through a microphone. Alternatively, the input unit 10 may receive a text sentence in either Korean or Chinese through a text input module. In that case, the input unit 10 may pass the input sentence directly to the translation unit 30.

The speech recognition unit 20 receives the source-language utterance, performs speech recognition, and generates a speech-recognized sentence. For example, the speech recognition unit 20 performs signal processing on the input source-language utterance to isolate the speech segment, and then extracts speech features from that segment. Meanwhile, the speech recognition unit 20 builds an acoustic model, a pronunciation dictionary, and a language model based on a speech database and a language database, and forms a recognition network that integrates them. The speech recognition unit 20 then converts the extracted speech features into a speech-recognized text sentence through the recognition network, and may output that sentence to the user's screen through the output unit 50.
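The description does not say how the speech segment is isolated. As one common possibility, and purely as an assumption, a short-time energy threshold can mark where speech begins and ends. In the sketch below, `find_speech_segment`, `frame_len`, and `threshold` are hypothetical names and values that do not come from the patent.

```python
from typing import List, Optional, Tuple


def find_speech_segment(samples: List[float], frame_len: int = 160,
                        threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech segment, or None.

    A frame counts as speech when its mean squared amplitude exceeds
    `threshold`. The segment spans the first to the last such frame.
    This is an illustrative stand-in, not the patent's signal processing.
    """
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    # The end index is exclusive and clipped to the signal length.
    return active[0], min(active[-1] + frame_len, len(samples))
```

Feature extraction would then run only on `samples[start:end]`, which matches the order of operations given in the description.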

For example, when "안녕하세요" ("Hello") is spoken in the source language, the speech recognition unit 20 may generate a speech-recognized sentence so that the recognition result is output to the user's screen.

The translation unit 30 uses the translation engine 70 to convert the source-language speech-recognized sentence, or the input sentence, into a text sentence in the target language.

For example, when the speech-recognized sentence is "안녕하세요", the translation unit 30 may convert it into "你好", the Chinese for "Hello".

Using the pronunciation transliteration database 60, the pronunciation generation unit 40 generates phonetic symbols in the target language for the source-language speech-recognized sentence, and phonetic symbols in the source language for the text sentence obtained by converting the speech-recognized sentence into the target language, so that the other party's user can reproduce or visually recognize them.

For example, when the speech-recognized sentence is "안녕하세요", the pronunciation generation unit 40 may generate the Chinese phonetic symbols "(an-nyeong-ha-se-yo)", and when the text sentence converted into Chinese is "你好", it may generate the Korean phonetic symbols "니ˇ 하오ˇ".

To this end, the pronunciation transliteration database 60 stores Korean transliteration data for writing the pronunciation of a Chinese sentence in Korean phonetic symbols, and Chinese transliteration data for writing the pronunciation of a Korean sentence in Chinese phonetic symbols.

For example, FIG. 3 illustrates data built in the pronunciation transliteration database. The pronunciation transliteration database 60 is described below with reference to FIG. 3.

Chinese consists of syllables that have no final consonant other than "ㄴ, ㄹ, ㅇ", and in particular has tones, which distinguish meaning by the pitch and length of the sound. These characteristics are reflected in the data built in the pronunciation transliteration database 60.

First, the Korean transliteration data for writing the pronunciation of a Chinese sentence in Korean phonetic symbols is described.

Hanyu Pinyin is a phonetic notation that writes the pronunciation of Chinese characters in Chinese-style romanization. For example, as in "你好 [nǐ hǎo]", Hanyu Pinyin may appear with the Chinese characters for "Hello" followed, in [ ], by the Chinese-style romanization including the accent peculiar to Chinese, that is, the four tones. Of the four tones, the first tone (" ￣ " or "1") starts high and stays flat at the same pitch; the second tone (" ´ " or "2") starts at mid pitch and rises to high; the third tone (" ˇ " or "3") starts mid-low, falls to low, and rises again; and the fourth tone (" ｀ " or "4") starts high and falls sharply to the lowest pitch. Some Chinese syllables carry no tone.

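The description gives each tone both a mark and a digit. A minimal sketch of converting the digit form into the mark form is shown below; the function name `split_tone` and the handling of toneless syllables are assumptions made for illustration.

```python
from typing import Tuple

# Tone digit -> tone mark, using the marks listed in the description.
TONE_MARKS = {"1": "￣", "2": "´", "3": "ˇ", "4": "｀"}


def split_tone(unit: str) -> Tuple[str, str]:
    """Split a numbered unit such as 'hao3' into its base and tone mark."""
    if unit and unit[-1] in TONE_MARKS:
        return unit[:-1], TONE_MARKS[unit[-1]]
    # No trailing digit: treated here as a syllable without a tone.
    return unit, ""
```

With this mapping, `split_tone("hao3")` yields the base `hao` and the third-tone mark, which is the pair that the Korean phonetic symbol later carries.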
In an actual implementation, a total of 2,441 Hanyu Pinyin units were extracted from a Chinese pronunciation dictionary and a Hanyu Pinyin table, and a Korean transliteration was built for each of them. Here, the Chinese pronunciation dictionary pairs Chinese words with their phonetic symbols, as in "你好 / ni3 hao3". These are split into pronunciation units, and the Korean phonetic symbol for each Chinese pronunciation unit, such as ni3 / 니ˇ and hao3 / 하오ˇ, is built as the Korean transliteration data. The Korean phonetic symbols also include the Chinese tone marks, which can be consulted for pronunciation.
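The per-unit lookup described here can be sketched as a dictionary keyed by numbered pinyin unit. The table below is a toy excerpt holding only the two entries quoted in the text; the names `KOREAN_FOR_PINYIN` and `transliterate_to_korean`, and the pass-through for unknown units, are assumptions.

```python
# Toy excerpt of the Korean transliteration data. Only the two entries
# quoted in the description are included (the full set has 2,441 units).
KOREAN_FOR_PINYIN = {"ni3": "니ˇ", "hao3": "하오ˇ"}


def transliterate_to_korean(pinyin: str) -> str:
    """Map each space-separated pinyin unit to its Korean phonetic symbol."""
    units = pinyin.split()
    # Units missing from the toy table are passed through unchanged.
    return " ".join(KOREAN_FOR_PINYIN.get(unit, unit) for unit in units)
```

Calling `transliterate_to_korean("ni3 hao3")` returns `니ˇ 하오ˇ`, the Korean phonetic symbols used as the example throughout the description.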

Based on this Chinese-to-Korean transliteration data, when the Chinese "你好" is spoken, the Chinese speech-recognized sentence "你好" is generated through a Chinese g2p (grapheme-to-phoneme) conversion database and the recognition network, and at the same time "ni3 hao3" can be generated from the Chinese pronunciation dictionary. The Chinese-to-Korean transliteration data is then used to generate "니ˇ 하오ˇ", the Korean phonetic symbols for the utterance. Here, the g2p conversion database is a reconstruction of the Hanyu Pinyin in the Chinese pronunciation dictionary, partly extended for speech recognition.

Next, the Chinese transliteration data for writing the pronunciation of a Korean sentence in Chinese phonetic symbols is described.

To build the Korean-to-Chinese transliteration data, Korean syllables are first divided into initial consonant, medial vowel, and final consonant, and these are then converted into a Korean g2p conversion database. For their Chinese transliterations, Hanyu Pinyin phonetic symbols and romanized spellings were used selectively. This is described in more detail below.

There are 3,192 possible Korean syllables in total, formed from 19 initial consonants, 21 medial vowels, and 7 representative final consonants, including the case with no final consonant (19 × 21 × (7 + 1) = 3,192). Examples are "한 / ㅎㅏㄴ" and "로 / ㄹㅗ". Following Korean g2p conversion rules, these were built into a Korean g2p conversion table, as in "한 / ㅎㅏㄴ / h a xn" and "로 / ㄹㅗ / r o". Here, g2p denotes phoneme-level phonetic symbols, the unit used to convert speech into text and the basis for building the pronunciation dictionary. In practice, the Korean g2p units are used in the Korean-to-Chinese transliteration data.
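The patent does not state how a syllable is split into its three parts. One standard way, assumed here, is the arithmetic decomposition of precomposed Hangul syllables in Unicode (block starting at U+AC00). The names `decompose` and `POSSIBLE_SYLLABLES` are hypothetical.

```python
from typing import Tuple

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 medial vowels
# Index 0 means "no final"; Unicode then encodes 27 written finals.
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Count given in the description: 7 representative finals plus "none".
POSSIBLE_SYLLABLES = 19 * 21 * (7 + 1)


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed Hangul syllable into (initial, medial, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    medial, final = divmod(rest, 28)
    return CHOSEONG[initial], JUNGSEONG[medial], JONGSEONG[final]
```

Note that Unicode distinguishes 27 written finals, while the description counts 7 representative (pronounced) finals; reducing a written final to its representative sound would be a further step that this sketch leaves out.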

In building the Chinese transliteration data for Korean syllables, romanized spellings and Hanyu Pinyin were mixed. That is, the 293 Korean syllables that can be expressed in Hanyu Pinyin were written that way, and the remaining 2,899 were written in romanized form. Examples of Hanyu Pinyin matched to Korean g2p are "한 / ㅎㅏㄴ / h a xn / han" and "로 / ㄹㅗ / r o / lo". Where it was difficult to find a matching phonetic symbol in Hanyu Pinyin, romanization was used, as in "국 / ㄱㅜㄱ / g u xg / guk". The resulting Korean-to-Chinese transliteration data takes forms such as "h a xn / han", "r o / lo", and "g u xg / guk".
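The two-stage lookup (syllable to g2p string, then g2p string to a pinyin or romanized symbol) can be sketched with two small tables. Both tables hold only the three examples quoted in the text; the names `G2P`, `CHINESE_SYMBOL`, and `transliterate_to_chinese_symbols` are assumptions, and the hyphen separator simply follows the "(an-nyeong-ha-se-yo)" example.

```python
# Toy excerpts built from the three examples in the description.
G2P = {"한": "h a xn", "로": "r o", "국": "g u xg"}
CHINESE_SYMBOL = {
    "h a xn": "han",   # expressible in Hanyu Pinyin
    "r o": "lo",       # expressible in Hanyu Pinyin
    "g u xg": "guk",   # no pinyin match, so a romanized spelling is used
}

# The pinyin group and the romanized group together cover all syllables.
assert 293 + 2899 == 3192


def transliterate_to_chinese_symbols(text: str) -> str:
    """Map each Korean syllable, via its g2p string, to a Chinese-side symbol.

    Raises KeyError for syllables outside the toy tables.
    """
    return "-".join(CHINESE_SYMBOL[G2P[syllable]] for syllable in text)
```

For instance, `transliterate_to_chinese_symbols("한국")` gives `han-guk`, mixing one pinyin entry with one romanized entry as the description allows.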

The output unit 50 includes a display module or a sound output module, and outputs, on a screen or as speech, the source-language speech-recognized sentence, the target-language phonetic symbols for that sentence, the text sentence obtained by converting the speech-recognized sentence into the target language, and the source-language phonetic symbols for the converted text sentence.

For example, such an automatic interpretation apparatus may be applied to an existing automatic interpreter, or implemented as an application on a mobile terminal such as a smartphone.

FIG. 2 is a flowchart of an automatic interpretation method between Korean and Chinese in the automatic interpretation apparatus of FIG. 1.

In step S210, the automatic interpretation apparatus receives an utterance in the source language. For example, it may receive an utterance in either Korean or Chinese through a microphone. Alternatively, the input unit 10 may receive a text sentence in either Korean or Chinese through a text input module.

In step S220, the automatic interpretation apparatus receives the source-language utterance, performs speech recognition, and generates a speech-recognized sentence. For example, the apparatus performs signal processing on the input source-language utterance to isolate the speech segment, and then extracts speech features from that segment. Meanwhile, the apparatus builds an acoustic model, a pronunciation dictionary, and a language model based on a speech database and a language database, and forms a recognition network that integrates them. The apparatus then converts the extracted speech features into a speech-recognized text sentence through the recognition network, and may output that sentence on the user's screen. If the apparatus received a text sentence in either Korean or Chinese through the text input module, step S220 may be omitted.

In step S230, the automatic interpretation apparatus generates phonetic symbols in the target language for the source-language speech-recognized sentence, so that the other party's user can reproduce or visually recognize it, and outputs them on the user's screen. For example, when the speech-recognized sentence is "안녕하세요", the apparatus may generate the Chinese phonetic symbols "(an-nyeong-ha-se-yo)".

In step S240, the automatic interpretation apparatus converts the source-language speech-recognized sentence, or the input sentence, into a text sentence in the target language and outputs it on the user's screen. For example, when the speech-recognized sentence is "안녕하세요", the apparatus may convert it into "你好", the Chinese for "Hello".

In step S250, the automatic interpretation apparatus generates phonetic symbols in the source language for the text sentence obtained by converting the speech-recognized sentence into the target language, so that the other party's user can reproduce or visually recognize it, and outputs them on the user's screen. For example, when the text sentence converted into Chinese is "你好", the apparatus may generate the Korean phonetic symbols "니ˇ 하오ˇ".
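Steps S210 to S250 can be summarized as a short pipeline. The sketch below is only an outline under stated assumptions: the recognizer, translator, and the two transliteration functions are passed in as callables, and every name (`InterpretationResult`, `interpret`, and the parameter names) is hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InterpretationResult:
    recognized: str            # S220: source-language sentence
    source_pronunciation: str  # S230: its pronunciation in target-language symbols
    translated: str            # S240: target-language sentence
    target_pronunciation: str  # S250: its pronunciation in source-language symbols


def interpret(audio: Optional[bytes], text: Optional[str],
              recognize: Callable[[bytes], str],
              translate: Callable[[str], str],
              to_target_symbols: Callable[[str], str],
              to_source_symbols: Callable[[str], str]) -> InterpretationResult:
    """Run the five steps in order and collect everything shown to the user."""
    # S210/S220: typed text skips speech recognition entirely.
    if text is None:
        if audio is None:
            raise ValueError("either audio or text is required")
        text = recognize(audio)
    source_pron = to_target_symbols(text)        # S230
    translated = translate(text)                 # S240
    target_pron = to_source_symbols(translated)  # S250
    return InterpretationResult(text, source_pron, translated, target_pron)
```

Passing the components in as callables keeps the ordering of the steps visible while leaving the recognition network, the translation engine, and the transliteration database unspecified, as they are in this outline.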

Meanwhile, the automatic interpretation method according to an embodiment of the present invention may be implemented as program instructions executable by various means of electronic information processing and recorded on a storage medium. The storage medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the storage medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the software field. Examples of storage media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, that carries a carrier wave transmitting signals that specify program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed, using an interpreter or the like, by a device that processes information electronically, for example a computer.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that it may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

10: Input unit
20: Speech recognition unit
30: Translation unit
40: Pronunciation generation unit
50: Output unit
60: Pronunciation transliteration database
70: Translation engine

Claims

An automatic interpretation apparatus comprising:
an input unit that receives an utterance in a source language;
a speech recognition unit that recognizes the utterance in the source language and generates a speech-recognized sentence;
a translation unit that converts the speech-recognized sentence into a text sentence in a target language;
a pronunciation generation unit that, using a pronunciation transliteration database, generates phonetic symbols in the target language for the speech-recognized sentence and in the source language for the text sentence in the target language, so that the other party's user can reproduce or visually recognize them; and
an output unit that outputs the speech-recognized sentence, the text sentence, and the phonetic symbols in the target language and the source language on a screen or as speech.

The apparatus of claim 1,
wherein the input unit receives an utterance in either Korean or Chinese through a microphone, or receives a text sentence in either Korean or Chinese through a text input module.

The apparatus of claim 1,
wherein the pronunciation transliteration database stores Korean transliteration data for writing the pronunciation of a Chinese sentence in Korean phonetic symbols, and Chinese transliteration data for writing the pronunciation of a Korean sentence in Chinese phonetic symbols.

The apparatus of claim 3,
wherein the Korean transliteration data includes information obtained by splitting Hanyu Pinyin into pronunciation units and mapping each Chinese pronunciation unit to a corresponding Korean phonetic symbol, and
wherein Hanyu Pinyin is a phonetic notation that writes the pronunciation of Chinese characters in Chinese-style romanization including the four tones.

The apparatus of claim 3,
wherein the Chinese transliteration data includes information obtained by dividing each Korean syllable into an initial consonant, a medial vowel, and a final consonant, and mapping each of them to a Hanyu Pinyin phonetic symbol or a romanized spelling.

The apparatus of claim 1,
wherein the automatic interpretation apparatus is implemented as an application on a mobile terminal.

An automatic interpretation method performed by an automatic interpretation apparatus, the method comprising:
receiving an utterance in a source language;
recognizing the utterance in the source language to generate and output a speech-recognized sentence;
generating and outputting, using a pronunciation transliteration database, phonetic symbols in a target language for the speech-recognized sentence, so that the other party's user can reproduce or visually recognize it;
converting the speech-recognized sentence into a text sentence in the target language and outputting the text sentence; and
generating and outputting, using the pronunciation transliteration database, phonetic symbols in the source language for the text sentence in the target language, so that the other party's user can reproduce or visually recognize it.

The method of claim 7,
wherein receiving the utterance in the source language comprises:
receiving an utterance in either Korean or Chinese through a microphone; or
receiving a text sentence in either Korean or Chinese through a text input module.

The method of claim 7,
wherein the pronunciation transliteration database stores Korean transliteration data for writing the pronunciation of a Chinese sentence in Korean phonetic symbols, and Chinese transliteration data for writing the pronunciation of a Korean sentence in Chinese phonetic symbols.

The method of claim 9,
wherein the Korean transliteration data includes information obtained by splitting Hanyu Pinyin into pronunciation units and mapping each Chinese pronunciation unit to a corresponding Korean phonetic symbol, and
wherein Hanyu Pinyin is a phonetic notation that writes the pronunciation of Chinese characters in Chinese-style romanization including the four tones.

The method of claim 9,
wherein the Chinese transliteration data includes information obtained by dividing each Korean syllable into an initial consonant, a medial vowel, and a final consonant, and mapping each of them to a Hanyu Pinyin phonetic symbol or a romanized spelling.