KR20160081244A

KR20160081244A - Automatic interpretation system and method

Info

Publication number: KR20160081244A
Application number: KR1020140194777A
Authority: KR
Inventors: 최무열; 김상훈; 김영익; 이민규; 김승희
Original assignee: 한국전자통신연구원
Priority date: 2014-12-31
Filing date: 2014-12-31
Publication date: 2016-07-08

Abstract

The present invention relates to an automatic interpretation system and an operating method thereof. The automatic interpretation system comprises: a personalization module that provides a speaker-adapted acoustic model based on input speech and text, and a language model adapted to the input text and to sentences similar to it; an automatic interpretation module that performs automatic speech recognition and translation on the input speech and text, based on the acoustic model and the language model provided by the personalization module; a reliability determination module that outputs a reliability value based on the reliability of the speech recognition result and the translation result output from the automatic interpretation module; and a server that transmits the speech recognition result, the translation result, and the reliability value to an external display device.

Description

Automatic interpretation system and method

The present invention relates to automatic interpretation and, more particularly, to an automatic interpretation system that automatically interprets the speech of a presenter at an international conference, seminar, or the like and provides captions or synthesized speech on a screen or on audience terminals, and to an operating method thereof.

When a presenter at an international conference or seminar speaks in a second language rather than a native language, both presentation quality and information delivery tend to suffer, because the language is unfamiliar to presenter and audience alike. Large international conferences can have simultaneous interpreters interpret in real time, but this is difficult to provide at the majority of smaller conferences, and simultaneous interpretation for multiple languages in particular is not easy to provide.

As an alternative, automatic interpretation technology that applies speech recognition has been introduced and is in use.

As is well known, an automatic interpretation system is intended to let people who speak different languages communicate in their own native languages. It receives a speech signal, performs speech recognition, automatically translates the result into a second language, and then synthesizes the translated result back into speech for output. That is, an automatic interpretation system performs speech recognition, automatic translation, and speech synthesis.

An automatic interpretation system using speech recognition raises recognition performance by using speaker adaptation, which lets it be applied to a variety of speakers, and language model adaptation, which specializes it for a variety of technical fields. This reduces the inefficiency of having presenters prepare a presentation in a foreign language, and it aids comprehension by letting listeners hear the presentation in their native language.

Therefore, an automatic interpretation system using speech recognition together with speaker model and language model adaptation not only surpasses, across various specialized fields, the interpreting ability of general interpreters who are not specialists in those fields, but also, as a multilingual interpretation system, enables efficient interpretation and reduces interpretation costs.

However, speech recognition in an environment with a high degree of spontaneous speech across various specialized fields makes it difficult to guarantee recognition performance, so there is a limit to providing a natural simultaneous interpretation service.

Accordingly, the present invention has been made to solve the above problems of the prior art, and an object of the present invention is to provide an automatic interpretation system, and an operating method thereof, that automatically interprets the speech of a presenter at an international conference, seminar, or the like and provides captions or synthesized speech on a screen or on audience terminals.

To achieve the above object, an automatic interpretation system according to one aspect of the present invention comprises: a personalization module that provides a speaker-adapted acoustic model based on input speech and text, and a language model adapted to the input text and similar sentences; an automatic interpretation module that performs automatic speech recognition and translation on the input speech and text, based on the acoustic model and the language model provided by the personalization module; a reliability determination module that outputs a reliability value based on the reliability of the speech recognition result and the translation result output from the automatic interpretation module; and a server that transmits the speech recognition result, the translation result, and the reliability value to an external display device.

The personalization module identifies the speaker based on the input speech, extracts voice log data from a database, and adapts the acoustic model of the current speech recognizer to the extracted voice log data, thereby providing a speaker-adapted acoustic model.

When the automatic interpretation system is configured to provide a voice recording function, the personalization module provides the voice recording function to the speaker and adapts the acoustic model of the current speech recognizer to the speech recorded through that function, thereby providing a speaker-adapted acoustic model.

The personalization module reflects the input text in the language model and, through a text processing step that searches a database for sentences similar to the input text in order to secure a sufficient amount of text, provides a language model adapted to the input text and the similar sentences.

The automatic interpretation module comprises: an automatic speech recognition engine that recognizes the input speech based on the speaker-adapted acoustic model and outputs a speech recognition result; and a translation engine that translates based on the language model adapted using the input text and similar sentences, and outputs a translation result.

The reliability determination module determines the reliability of the speech recognition result using a probability value calculated for each word.

The reliability determination module determines the reliability of the translation result using OoV (Out of Vocabulary) or PPL (Perplexity).

The external display device includes: a speaker terminal that receives and displays the speech recognition result and the reliability value transmitted from the server; a screen device that receives the translation result transmitted from the server and displays it in English; and a listener terminal that receives the translation result transmitted from the server and displays it as text in a set language.

An operating method of an automatic interpretation system according to another aspect of the present invention comprises: providing an acoustic model and a language model that are speaker-adapted to suit the input speech and text; performing automatic speech recognition and translation on the input speech and text, based on the provided acoustic model and language model, and outputting a speech recognition result and a translation result; determining the reliability of the speech recognition result and of the translation result; outputting a reliability value based on the reliability of the speech recognition result and the reliability of the translation result; and transmitting the speech recognition result, the translation result, and the reliability value to an external display device.

The step of providing the acoustic model and the language model includes identifying the speaker based on the input speech, extracting voice log data from a database, and adapting the acoustic model of the current speech recognizer to the extracted voice log data, thereby providing a speaker-adapted acoustic model.

The step of providing the acoustic model and the language model includes providing a voice recording function to the speaker and adapting the acoustic model of the current speech recognizer to the speech recorded through the voice recording function, thereby providing the acoustic model.

The step of providing the acoustic model and the language model reflects the input text in the language model adaptation and, through a text processing step that searches a database for sentences similar to the input text in order to secure a sufficient amount of text, provides a language model adapted to the input text and the similar sentences.

The step of performing automatic speech recognition and translation and outputting the speech recognition result and the translation result comprises: recognizing the input speech based on the speaker-adapted acoustic model and outputting the speech recognition result; and translating based on the language model adapted using the input text and similar sentences, and outputting the translation result.

The step of determining the reliability of the speech recognition result and of the translation result includes determining the reliability of the speech recognition result using a probability value calculated for each word.

The step of determining the reliability of the speech recognition result and of the translation result includes determining the reliability of the translation result using OoV (Out of Vocabulary) or PPL (Perplexity).

The method further includes, after the step of transmitting to the external display device, a step in which the external display device displays the received content.

The step in which the external display device displays the received content includes: the speaker terminal receiving and displaying the speech recognition result and the reliability value; the screen device receiving the translation result and displaying it in English; and the listener terminal receiving the translation result and displaying it as text in a set language.

According to the present invention as described above, an automatic interpretation system is provided whose reliability and performance are assured by interpreting automatically with a personalization scheme and a reliability verification scheme applied.

It can therefore be used effectively at small international conferences where simultaneous interpretation is not provided, or in situations where it is difficult to hire a professional interpreter for a small number of foreign attendees.

In particular, the automatic interpretation result is displayed on the screen in English, an international common language, and on each listener's terminal as text in the language that listener has set, so effective interpretation can be provided.

FIG. 1 is a configuration diagram of an automatic interpretation system according to an embodiment of the present invention.
FIG. 2 is a view showing a use case to which an automatic interpretation system according to an embodiment of the present invention is applied.
FIG. 3 is a flowchart illustrating the operation sequence of an automatic interpretation system according to an embodiment of the present invention.

The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below with reference to the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in many different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

Hereinafter, an automatic interpretation system and method according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of an automatic interpretation system according to an embodiment of the present invention.

Referring to FIG. 1, the automatic interpretation system 100 according to an embodiment of the present invention provides a speaker-adapted acoustic model based on input speech and text and a language model adapted to the input text and similar sentences, and, based on the acoustic model and the language model, recognizes the speech and translates the text, thereby providing interpretation results whose performance and reliability are assured.

In addition, the automatic interpretation system 100 determines and provides a reliability based on the speech recognition result and the text translation result, which helps the speaker proceed with the presentation.

Specifically, the automatic interpretation system 100 may include a personalization module 110, an automatic interpretation module 120, a reliability determination module 130, and a server 140.

The personalization module 110 provides a speaker-adapted acoustic model based on the input speech and text, and a language model adapted to the input text and similar sentences.

Here, the personalization module 110 identifies the speaker based on the input speech, extracts voice log data from the database, and adapts the acoustic model of the current speech recognizer to the extracted voice log data, thereby providing a speaker-adapted acoustic model.

To this end, the database of the personalization module 110 in the automatic interpretation system 100 stores voice log data for each speaker.

For a speaker using the automatic interpretation system 100 for the first time, however, no voice log data for that speaker is stored in the database, so the automatic interpretation system 100 may be configured to provide the speaker with a voice recording function.

When the automatic interpretation system 100 is configured to provide a voice recording function in this way, the personalization module 110 provides the voice recording function to the speaker.

Thus, after a speaker whose voice log data is not stored in the database records speech using the voice recording function, the personalization module 110 adapts the acoustic model of the current speech recognizer based on the recorded speech, thereby providing a speaker-adapted acoustic model.
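
The two routes to a speaker-adapted acoustic model described above (stored voice logs for a known speaker, a recording step for a first-time speaker) can be sketched as follows. This is only an illustrative outline: the names AcousticModel, adapt, speaker_adapted_model, and record_fn are hypothetical, and adapt is a placeholder rather than a real adaptation algorithm, which the specification does not name.

```python
from dataclasses import dataclass, field


@dataclass
class AcousticModel:
    """Stand-in for a recognizer's acoustic model (hypothetical type)."""
    name: str
    adapted_on: list = field(default_factory=list)


def adapt(base, utterances):
    # Placeholder for a real adaptation algorithm: it only records which
    # utterances the adapted model was derived from.
    return AcousticModel(name=base.name + "+adapted", adapted_on=list(utterances))


def speaker_adapted_model(base, speaker_id, voice_logs, record_fn=None):
    """Return an acoustic model adapted to the identified speaker.

    voice_logs: dict mapping speaker id -> list of logged utterances.
    record_fn:  optional callable standing in for the voice recording
                function offered to a first-time speaker.
    """
    utterances = voice_logs.get(speaker_id)
    if not utterances and record_fn is not None:
        # First-time speaker: collect speech through the recording function.
        utterances = record_fn()
        voice_logs[speaker_id] = list(utterances)
    if not utterances:
        return base  # nothing to adapt on, keep the current model
    return adapt(base, utterances)
```

Keeping the recording function optional mirrors the text, which says the system "may be configured" to provide it.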

In addition, the personalization module 110 reflects the input text in the language model and, through a text processing step that searches the database for sentences similar to the input text in order to secure a sufficient amount of text, provides a language model adapted to the input text and the similar sentences.
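
A minimal sketch of this text processing step, under stated assumptions: the specification does not say how similarity is measured or what kind of language model is used, so Jaccard word overlap and a unigram model are swapped in here purely for illustration. All function names and the threshold value are hypothetical.

```python
from collections import Counter


def tokenize(sentence):
    return sentence.lower().split()


def jaccard(a, b):
    """Word-set overlap between two token lists (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def collect_adaptation_text(input_sentences, corpus, threshold=0.3):
    """Input sentences plus every corpus sentence similar to any of them."""
    inputs = [tokenize(s) for s in input_sentences]
    selected = list(input_sentences)
    for candidate in corpus:
        tokens = tokenize(candidate)
        if any(jaccard(tokens, ref) >= threshold for ref in inputs):
            selected.append(candidate)
    return selected


def unigram_model(sentences):
    """Relative word frequencies over the collected adaptation text."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


slides = ["speech recognition for conference talks"]
database = ["speech recognition for lectures", "cooking pasta at home"]
adapted_lm = unigram_model(collect_adaptation_text(slides, database))
```

In this example only the first database sentence overlaps enough with the slide text to be added, so the adapted model gains in-domain words without absorbing unrelated text.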

The automatic interpretation module 120 outputs an interpretation result for the input speech and text, based on the acoustic model speaker-adapted by the personalization module 110 and the language model adapted to the input text and similar sentences.

Here, the automatic interpretation module 120 recognizes the input speech and outputs a speech recognition result, and also translates the input text and outputs a translation result.

The automatic interpretation module 120 may consist of an automatic speech recognition (ASR: Automatic Speech Recognition) engine 121 and a translation engine 122.

The automatic speech recognition engine 121 recognizes the input speech based on the speaker-adapted acoustic model and outputs a speech recognition result. Here, recognizing the speaker's speech means interpreting the user's speech and identifying its meaning.

Because the automatic speech recognition engine 121 recognizes speech based on the speaker-adapted acoustic model, recognition reliability can be improved. The result recognized by the automatic speech recognition engine 121 is ultimately transmitted to the speaker's terminal.

The automatic speech recognition engine 121 also receives the translation result from the translation engine 122 and outputs it.

The translation engine 122 translates the input text into another language, for example a common language, based on the language model adapted to the input text and similar sentences, and outputs the translation result. Here, the input text is the text of the presentation material, and the translation engine 122 may translate it into a single other language or into several other languages.

Because the translation engine 122 translates text based on the language model adapted to the input text and similar sentences, translation reliability can be improved.

The reliability determination module 130 receives the speech recognition result and the translation result transmitted from the automatic interpretation module 120, and determines the reliability of the speech recognition result and of the translation result.

Here, the reliability determination module 130 determines the reliability of the speech recognition result using a probability value calculated for each word, and determines the reliability of the translation result using OoV (Out of Vocabulary) or PPL (Perplexity).
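
The three quantities named here can be sketched as follows. The specification says only that per-word probabilities, OoV, or PPL are used; aggregating word probabilities with a geometric mean, measuring OoV as a rate, and computing perplexity under a unigram model with a probability floor for unseen words are assumptions made for this illustration, and the function names are hypothetical.

```python
import math


def asr_confidence(word_probs):
    """Geometric mean of per-word probabilities (assumed aggregation)."""
    if not word_probs or min(word_probs) <= 0.0:
        return 0.0
    return math.exp(sum(math.log(p) for p in word_probs) / len(word_probs))


def oov_rate(tokens, vocabulary):
    """Fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def perplexity(tokens, unigram_probs, floor=1e-6):
    """Perplexity under a unigram model; unseen tokens get `floor`."""
    if not tokens:
        return float("inf")
    log_sum = sum(math.log(unigram_probs.get(t, floor)) for t in tokens)
    return math.exp(-log_sum / len(tokens))
```

A low OoV rate and a low perplexity both indicate that the text is well covered by the adapted language model, which is why either can serve as a proxy for translation reliability.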

In addition, the reliability determination module 130 combines the reliability of the speech recognition result and the reliability of the translation result and outputs a final reliability value. The reliability value determined by the reliability determination module 130 is ultimately transmitted to the speaker's terminal.
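
The specification does not state how the two reliabilities are combined. One simple possibility, shown here only as an assumption, is a weighted average of two scores that have already been normalized to the range 0 to 1; the function name and the asr_weight parameter are hypothetical.

```python
def combined_reliability(asr_conf, mt_conf, asr_weight=0.5):
    """Weighted average of recognition and translation reliability.

    Both inputs and the weight are expected in [0, 1]; the weighting
    scheme is an illustrative assumption, not taken from the patent.
    """
    for name, value in (("asr_conf", asr_conf),
                        ("mt_conf", mt_conf),
                        ("asr_weight", asr_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return asr_weight * asr_conf + (1.0 - asr_weight) * mt_conf
```

A more conservative alternative would be to take the minimum of the two scores, so that a weak result in either stage lowers the final value.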

The server 140 receives the speech recognition result and the translation result transmitted from the automatic interpretation module 120 and the reliability value transmitted from the reliability determination module 130, stores the received speech recognition result, translation result, and reliability value, and then transmits them to the external display device 200.

Here, the external display device 200 may include a speaker terminal 210, a presentation screen device 220, and a listener terminal 230.

The server 140 transmits the speech recognition result and the reliability value to the speaker terminal 210, and transmits the translation result to the presentation screen device 220 and the listener terminal 230.

The speaker terminal 210 displays the speech recognition result and the reliability value transmitted from the server 140 so that the speaker can refer to them while presenting.

Here, the speaker terminal 210 may display the reliability value using various identification methods, such as numbers, colors, and patterns.
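
As an illustration of showing one reliability value in several forms, the sketch below maps a value between 0 and 1 to a percentage, a color, and a symbol. The thresholds (0.8 and 0.5), the color names, and the symbols are assumptions chosen for the example; the specification gives no concrete mapping.

```python
def render_reliability(value):
    """Map a reliability value in [0, 1] to number, color, and symbol."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"reliability must be in [0, 1], got {value}")
    if value >= 0.8:
        color, symbol = "green", "O"
    elif value >= 0.5:
        color, symbol = "yellow", "~"
    else:
        color, symbol = "red", "X"
    return {"number": f"{value:.0%}", "color": color, "symbol": symbol}
```

A speaker glancing at the terminal mid-presentation can react to a color change faster than to a number, which is one reason to offer more than one representation.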

The screen device 220 receives the translation result and displays it on the screen in English, an international common language, and the listener terminal 230 receives the translation result and displays it as text in the set language.
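
The routing described above (recognition result and reliability value to the speaker terminal, English translation to the screen, translation in a chosen language to each listener terminal) can be sketched as a small message builder. The message layout, the use of language codes such as "en", and the function names are assumptions for illustration only.

```python
def route_results(recognition, translations, reliability):
    """Build the message each display device should receive.

    translations: dict mapping a language code to translated text;
    the key "en" is assumed to be present for the presentation screen.
    """
    return {
        "speaker_terminal": {"recognition": recognition,
                             "reliability": reliability},
        "screen": {"translation": translations["en"]},
        "listener_terminal": {"translations": dict(translations)},
    }


def listener_view(listener_message, language):
    """Text a listener terminal shows for its set language, if available."""
    return listener_message["translations"].get(language)


messages = route_results(
    "감사합니다",
    {"en": "Thank you", "ja": "ありがとうございます"},
    0.9,
)
```

Here the listener terminal receives every available translation and selects locally; a server could equally send only the language each terminal has set.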

In addition, the external display device 200 may further include a Bluetooth headset 240 linked to the listener terminal 230. The listener can therefore not only read the translation result as captions but also listen to it as synthesized speech.

FIG. 2 is a view showing a use case to which an automatic interpretation system according to an embodiment of the present invention is applied.

As shown in FIG. 2, when a speaker using the automatic interpretation system 100 according to an embodiment of the present invention says "감사합니다" ("Thank you"), the automatic interpretation system 100 provides the speech recognition result and the reliability value to the speaker terminal 210, and provides the translation result to the screen device 220 and the listener terminal 230.

Accordingly, the speaker terminal 210 displays the speech recognition result "감사합니다" together with the reliability value, the screen device 220 displays the translation in English on the screen S, the listener terminal 230 displays it as text in the set language, and the headset 240 linked to the listener terminal 230 outputs speech in the set language.

The configuration and functions of the automatic interpretation system according to an embodiment of the present invention have been described above. The operation of the automatic interpretation system according to an embodiment of the present invention will now be described step by step with reference to the accompanying drawings.

FIG. 3 is a flowchart illustrating the operation sequence of an automatic interpretation system according to an embodiment of the present invention.

Referring to FIG. 3, the automatic interpretation system 100 provides an acoustic model speaker-adapted to suit the input speech and text, and a language model adapted to the input text and similar sentences (S310).

To provide the speaker-adapted acoustic model and the language model adapted to the input text and similar sentences, the database of the personalization module 110 in the automatic interpretation system 100 stores voice log data for each speaker and a large amount of text.

Here, the automatic interpretation system 100 identifies the speaker based on the input speech, extracts voice log data from the database, and adapts the acoustic model of the current speech recognizer to the extracted voice log data, thereby providing a speaker-adapted acoustic model.

The automatic interpretation system 100 also reflects the input text in the language model and, through a text processing step that searches the database for sentences similar to the input text in order to secure a sufficient amount of text, provides a language model adapted to the input text and the similar sentences.

For a speaker using the automatic interpretation system for the first time, however, no voice log data for that speaker is stored in the database, so the automatic interpretation system 100 may be configured to provide the speaker with a voice recording function.

When the automatic interpretation system 100 provides a voice recording function in this way, it adapts the acoustic model of the current speech recognizer based on the speech recorded using that function, thereby providing a speaker-adapted acoustic model.

다음으로, 상기 자동 통역 시스템(100)은 입력되는 음성에 대한 자동 음성 인식을 수행하는 한편 입력되는 텍스트에 대한 자동 번역을 수행한다(S320).Next, the automatic interpretation system 100 performs automatic speech recognition on the input voice and performs automatic translation on the input text (S320).

At this time, the automatic interpretation system 100 performs the automatic speech recognition and translation based on the speaker-adapted acoustic model and the language model adapted to the input text and similar sentences, both provided in step S310.

Next, the automatic interpretation system 100 determines the reliability of the speech recognition result and the reliability of the translation result output in step S320 (S330), and combines the two reliabilities to output a final reliability value (S340).
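
The description does not specify how the two reliabilities are combined in step S340. One possible reading is a weighted average, sketched here; the function name, the `[0, 1]` scale, and the default weight are assumptions made for illustration.

```python
def combine_reliability(
    asr_reliability: float,
    mt_reliability: float,
    asr_weight: float = 0.5,
) -> float:
    """Combine recognition and translation reliabilities into one value.

    A weighted average is an assumption; the source only says the two are combined.
    """
    for name, value in (
        ("asr_reliability", asr_reliability),
        ("mt_reliability", mt_reliability),
        ("asr_weight", asr_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return asr_weight * asr_reliability + (1.0 - asr_weight) * mt_reliability
```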

At this time, when determining the reliability of the speech recognition result, the automatic interpretation system 100 may use a probability value calculated for each word.
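
As an illustration of using per-word probability values, the sketch below aggregates them into one utterance-level score with a geometric mean. The aggregation rule is an assumption; the description only states that per-word probabilities may be used.

```python
import math


def utterance_confidence(word_probs: list[float]) -> float:
    """Geometric mean of per-word probabilities, each expected in [0, 1].

    Returns 0.0 for an empty utterance or when any word has probability 0.
    """
    if not word_probs:
        return 0.0
    if any(p < 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("word probabilities must lie in [0, 1]")
    if any(p == 0.0 for p in word_probs):
        return 0.0
    # Average in the log domain to avoid underflow on long utterances.
    mean_log = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(mean_log)
```

The geometric mean penalizes a single badly recognized word more strongly than an arithmetic mean would, which may or may not match the intended behavior.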

In addition, when determining the reliability of the translation result, the automatic interpretation system 100 may use the OoV (Out of Vocabulary) rate or PPL (Perplexity).
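
For reference, the two measures can be computed as in the sketch below. The perplexity here uses an add-one smoothed unigram model purely to keep the example self-contained; the language model actually used by the system is not specified in the description, and the function names are hypothetical.

```python
import math
from collections import Counter


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in vocabulary) / len(tokens)


def unigram_perplexity(tokens: list[str], counts: Counter) -> float:
    """Perplexity of tokens under an add-one smoothed unigram model.

    The vocabulary size includes one extra slot for unseen words.
    """
    if not tokens:
        raise ValueError("perplexity is undefined for an empty sequence")
    total = sum(counts.values())
    vocab_size = len(counts) + 1
    log_prob = 0.0
    for token in tokens:
        # Counter returns 0 for unseen tokens, so smoothing covers OoV words.
        log_prob += math.log((counts[token] + 1) / (total + vocab_size))
    return math.exp(-log_prob / len(tokens))
```

A higher OoV rate or a higher perplexity would indicate a less reliable translation input; how either value is mapped to a reliability score is left open here.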

Next, the automatic interpretation system 100 transmits the speech recognition result and the translation result output in step S320, together with the reliability value output in step S340, to the external display device 200 (S350).

Here, the external display device 200 may include a speaker terminal 210, a presentation screen device 220, and a listener terminal 230.

At this time, the automatic interpretation system 100 transmits the speech recognition result and the reliability value to the speaker terminal 210, and transmits the translation result to the presentation screen device 220 and the listener terminal 230.
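
The routing described above can be summarized in a short sketch. The data class, the dictionary keys, and the payload field names are hypothetical; only the assignment of results to devices follows the description.

```python
from dataclasses import dataclass


@dataclass
class InterpretationOutput:
    recognition: str    # speech recognition result
    translation: str    # translation result
    reliability: float  # final reliability value


def route(output: InterpretationOutput) -> dict[str, dict]:
    """Build one payload per device of the external display device 200."""
    return {
        # The speaker sees what was recognized and how reliable it is.
        "speaker_terminal_210": {
            "recognition": output.recognition,
            "reliability": output.reliability,
        },
        # The screen and the listeners receive only the translation.
        "screen_device_220": {"translation": output.translation},
        "listener_terminal_230": {"translation": output.translation},
    }
```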

Thereafter, when the speech recognition result, the translation result, and the reliability value are transmitted from the automatic interpretation system 100 to the external display device 200 in step S350, the external display device 200 displays the received content (S360).

At this time, the speaker terminal 210 displays the speech recognition result and the reliability value transmitted from the automatic interpretation system 100, and may display the reliability value using various identification methods such as numbers, colors, and patterns.
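
As one example of a color-based display, the sketch below maps a reliability value to a three-level color. The thresholds, the color names, and the assumption that the value lies in `[0, 1]` are illustrative only and do not come from the description.

```python
def reliability_color(reliability: float, low: float = 0.5, high: float = 0.8) -> str:
    """Map a reliability value to a display color (hypothetical thresholds)."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if reliability >= high:
        return "green"   # recognition is likely correct
    if reliability >= low:
        return "yellow"  # the speaker may want to check the text
    return "red"         # the speaker should consider repeating the utterance
```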

In addition, the screen device 220 receives the translation result and displays it on the screen in English, an international common language, and the listener terminal 230 receives the translation result and displays it as text in the set language.

In addition, the external display device 200 may further include a Bluetooth headset 240 linked with the listener terminal 230, and the headset 240 outputs the translation result as speech in the set language.

Although the automatic interpretation system and its operating method according to the present invention have been described with reference to embodiments, the scope of the present invention is not limited to any specific embodiment, and various alternatives, modifications, and changes may be made within a range that is obvious to those of ordinary skill in the art to which the present invention pertains.

Therefore, the embodiments described in the present invention and the accompanying drawings are intended to illustrate, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments and drawings. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.

100: automatic interpretation system    110: personalization module
120: automatic interpretation module    121: automatic speech recognition engine
122: translation engine    130: reliability determination module
140: server    200: display device
210: speaker terminal    220: screen device
230: listener terminal    240: headset

Claims

1. An automatic interpretation system comprising:
a personalization module that provides, based on input speech and text, a speaker-adapted acoustic model and a language model adapted to the input text and similar sentences;
an automatic interpretation module that performs automatic speech recognition and translation on the input speech and text based on the acoustic model and the language model provided by the personalization module;
a reliability determination module that outputs a reliability value based on the reliability of the speech recognition result and the translation result output from the automatic interpretation module; and
a server that transmits the speech recognition result, the translation result, and the reliability value to an external display device.

2. The automatic interpretation system of claim 1,
wherein the personalization module identifies the speaker based on the input speech, extracts voice log data from a database, adapts the acoustic model of the current speech recognizer to the extracted voice log data, and provides the speaker-adapted acoustic model.

3. The automatic interpretation system of claim 1,
wherein, when the automatic interpretation system is configured to provide a voice recording function, the personalization module provides the voice recording function to the speaker, adapts the acoustic model of the current speech recognizer to the voice recorded through the voice recording function, and provides the speaker-adapted acoustic model.

4. The automatic interpretation system of claim 1,
wherein the personalization module provides the language model adapted to the input text and the similar sentences through a text processing step that reflects the input text in the language model and retrieves sentences similar to the input text from a database.

5. The automatic interpretation system of claim 1,
wherein the automatic interpretation module comprises:
an automatic speech recognition engine that recognizes the input speech based on the speaker-adapted acoustic model and outputs the speech recognition result; and
a translation engine that performs translation based on the language model adapted to the input text and the similar sentences and outputs the translation result.

6. The automatic interpretation system of claim 1,
wherein the reliability determination module determines the reliability of the speech recognition result using a probability value calculated for each word.

7. The automatic interpretation system of claim 1,
wherein the reliability determination module determines the reliability of the translation result using OoV (Out of Vocabulary) or PPL (Perplexity).

8. The automatic interpretation system of claim 1,
wherein the external display device comprises:
a speaker terminal that receives and displays the speech recognition result and the reliability value transmitted from the server;
a screen device that receives the translation result transmitted from the server and displays it in English; and
a listener terminal that receives the translation result transmitted from the server and displays it as text in the set language.

9. A method of operating an automatic interpretation system, the method comprising:
providing, based on input speech and text, a speaker-adapted acoustic model and a language model adapted to the input text and similar sentences;
outputting a speech recognition result and a translation result by performing automatic speech recognition and translation on the input speech and text based on the provided acoustic model and language model;
determining the reliability of the speech recognition result and of the translation result;
outputting a reliability value based on the reliability of the speech recognition result and the reliability of the translation result; and
transmitting the speech recognition result, the translation result, and the reliability value to an external display device.

10. The method of claim 9,
wherein providing the acoustic model and the language model comprises identifying the speaker based on the input speech, extracting voice log data from a database, and adapting the acoustic model of the current speech recognizer to the extracted voice log data to provide the speaker-adapted acoustic model.

11. The method of claim 9,
wherein providing the acoustic model and the language model comprises providing a voice recording function to the speaker and adapting the acoustic model of the current speech recognizer to the voice recorded through the voice recording function to provide the speaker-adapted acoustic model.

12. The method of claim 9,
wherein providing the acoustic model and the language model comprises providing the language model adapted to the input text and the similar sentences through a text processing step that reflects the input text in the language model and retrieves sentences similar to the input text from a database.

13. The method of claim 9,
wherein outputting the speech recognition result and the translation result by performing the automatic speech recognition and translation comprises:
recognizing the input speech based on the speaker-adapted acoustic model and outputting the speech recognition result; and
performing translation based on the language model adapted to the input text and the similar sentences and outputting the translation result.

14. The method of claim 9,
wherein determining the reliability of the speech recognition result and of the translation result comprises determining the reliability of the speech recognition result using a probability value calculated for each word.

15. The method of claim 9,
wherein determining the reliability of the speech recognition result and of the translation result comprises determining the reliability of the translation result using OoV (Out of Vocabulary) or PPL (Perplexity).

16. The method of claim 9,
further comprising, after the transmitting to the external display device, displaying, by the external display device, the received content.

17. The method of claim 16,
wherein displaying the received content by the external display device comprises:
receiving and displaying, by a speaker terminal, the speech recognition result and the reliability value;
receiving, by a screen device, the translation result and displaying it in English; and
receiving, by a listener terminal, the translation result and displaying it as text in the set language.