KR20110032394A

KR20110032394A - Portable interpretation apparatus and method based on user's situation

Info

Publication number: KR20110032394A
Application number: KR1020090089861A
Authority: KR
Inventors: 이재원; 안기서
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2009-09-23
Filing date: 2009-09-23
Publication date: 2011-03-30
Also published as: KR101640024B1

Abstract

PURPOSE: A portable interpretation apparatus and method that use an optimized interpretation model are provided to increase interpretation accuracy based on the user's situation information. CONSTITUTION: A user interface module (212) receives, from a user, the speech to be interpreted. A location recognition module (211) recognizes the user's location information. A communication module (213) receives, from an interpretation server (230), the interpretation result corresponding to that speech. A speech synthesis module synthesizes the received translation result into speech.

Description

PORTABLE INTERPRETATION APPARATUS AND METHOD BASED ON USER'S SITUATION

The following embodiments relate to a portable interpretation apparatus and method based on the user's situation.

An automatic interpretation device receives, as speech, a sentence in a first language that the user wants translated, automatically converts the input speech into text, machine-translates the text, and then synthesizes the resulting sentence in a second language back into speech for output.
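The three-stage flow just described (speech recognition, machine translation, speech synthesis) can be sketched as a simple composition. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `InterpretationPipeline` and the toy stage functions are hypothetical stand-ins for real recognition, translation, and synthesis engines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InterpretationPipeline:
    """Hypothetical interpreter: first-language speech -> text -> second-language text -> speech."""
    recognize: Callable[[bytes], str]   # speech recognition: first-language audio -> text
    translate: Callable[[str], str]     # machine translation: first-language text -> second-language text
    synthesize: Callable[[str], bytes]  # speech synthesis: second-language text -> audio

    def interpret(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stages for illustration only: "audio" is just UTF-8 text and the
# "translation model" is a one-entry phrase table.
toy_pipeline = InterpretationPipeline(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"bonjour": "hello"}.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)

print(toy_pipeline.interpret(b"bonjour"))  # b'hello'
```

Keeping each stage as a replaceable function mirrors the point of the patent: the recognition and translation stages are the parts that get swapped for speaker- or situation-specific models.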

In the speech recognition step, which converts speech into text, recognition performance and coverage depend on speaker-specific characteristics, including gender, and on the type and number of vocabulary words the model contains. Likewise, a small translation model limits the kinds of sentences that can be translated, whereas a large translation model lowers translation speed and accuracy.

Because such an automatic interpretation device relies on a single speech recognition/translation model, the limited resources on the device make it difficult to run the interpretation function smoothly.

A portable interpretation device according to an embodiment of the present invention includes: a user interface module that receives, from a user, speech information for which interpretation is requested; a location recognition module that recognizes the user's location information; a communication module that transmits the recognized location information and the speech information to an interpretation server and receives, from the interpretation server, a translation result corresponding to the speech information; and a speech synthesis module that synthesizes the received translation result into speech. The user interface module provides the synthesized translation result to the user. The interpretation server determines the user's situation from the location information, recognizes the speech information and converts it into text data using a speaker/situation-specific speech recognition/translation model corresponding to the user's situation, translates the converted text data, and transmits it as the translation result corresponding to the speech information.

In addition, a portable interpretation device according to an embodiment of the present invention includes: a location recognition module that recognizes the user's location information; a user interface module that receives speaker information and situation information from the user, receives a request to download the speech recognition/translation models corresponding to that speaker information and situation information, and receives speech information to be interpreted; a communication module that transmits the download request to the interpretation server and receives, from the interpretation server, a speech recognition model corresponding to the speaker information and a translation model corresponding to the situation information; a storage module that stores the received speech recognition model and translation model; a speech recognition module that converts the speech information to be interpreted into text data with reference to the speech recognition model; a translation module that translates the converted text data with reference to the translation model; and a speech synthesis module that synthesizes the translated text data into speech. The user interface module provides the synthesized speech to the user. The interpretation server stores speaker-specific speech recognition models and situation-specific translation models, identifies the speaker and the situation from the download request, and transmits the speech recognition model corresponding to the identified speaker and the translation model corresponding to the identified situation.

In addition, an interpretation method according to an embodiment of the present invention includes: receiving, from a user, speech information for which interpretation is requested; recognizing the user's location information; transmitting the recognized location information and the speech information to an interpretation server; receiving, from the interpretation server, a translation result corresponding to the speech information; synthesizing the received translation result into speech; and providing the synthesized translation result to the user. The interpretation server determines the user's situation from the location information, recognizes the speech information and converts it into text data with reference to a speaker/situation-specific speech recognition/translation model corresponding to the user's situation, and translates the converted text data.

In addition, an interpretation method according to an embodiment of the present invention includes: receiving, from a user, a request to download a speech recognition model corresponding to speaker information and a translation model corresponding to situation information; receiving those models from an interpretation server according to the download request; storing the received speech recognition model and translation model; receiving, from the user, speech information to be interpreted; converting the speech information into text data with reference to the speech recognition model; translating the text data with reference to the translation model; synthesizing the translated text data into speech; and providing the synthesized speech to the user. The interpretation server stores speaker-specific speech recognition models and situation-specific translation models, identifies the speaker and the situation from the download request, and transmits the speech recognition model corresponding to the identified speaker and the translation model corresponding to the identified situation.

An embodiment of the present invention builds, in advance, a variety of speech recognition models suited to speaker characteristics and situations, as well as situation-specific translation models, and can increase interpretation accuracy based on the user's speaker information and information about the situation the user is in.

In addition, an embodiment of the present invention can perform interpretation online or offline by loading an interpretation model optimized for the user's speaker information and situation information.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a diagram illustrating the interworking among a portable interpretation device, a web server, and an interpretation server according to an embodiment of the present invention.

Referring to FIG. 1, the portable interpretation device 110, the web server 120, and the interpretation server 130 interwork with one another over a wired/wireless communication environment.

The portable interpretation device 110 recognizes the user's location. For example, the portable interpretation device 110 may identify the user's location coordinates using GPS or a similar means and recognize the user's location from the identified coordinates.
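One possible way to turn GPS coordinates into a recognized place is to compare them with a table of known places using great-circle (haversine) distance. The sketch below assumes such a lookup table; the place names, approximate coordinates, radii, and function names are illustrative assumptions, not part of the patent.

```python
import math
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical table of known places: name -> (latitude, longitude, match radius in km).
# Coordinates are approximate and for illustration only.
KNOWN_PLACES = {
    "Incheon International Airport": (37.4602, 126.4407, 3.0),
    "COEX Convention Center": (37.5116, 127.0593, 0.5),
}


def recognize_place(lat: float, lon: float) -> Optional[str]:
    """Return the nearest known place whose radius contains the point, or None."""
    best_name, best_dist = None, float("inf")
    for name, (plat, plon, radius_km) in KNOWN_PLACES.items():
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius_km and dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


print(recognize_place(37.4602, 126.4407))  # Incheon International Airport
print(recognize_place(0.0, 0.0))           # None
```

A production device would more likely query a map or geocoding service; the fixed table only shows the shape of the coordinate-to-place step.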

Under a model download service, the portable interpretation device 110 uses its mobile network communication function to work with the web server 120, downloads speaker/situation-specific speech recognition/translation models from the interpretation server 130, and stores the downloaded speaker-specific speech recognition model and situation-specific translation model.

When the portable interpretation device 110 provides an offline interpretation service with the speaker/situation-specific speech recognition/translation models already downloaded and stored, it recognizes the speech signal input by the user using the speaker-specific speech recognition model and converts it into text data, translates the text data using the situation-specific translation model, and synthesizes the translated text data into speech for the user. In this way, by downloading and storing in advance the speech recognition model corresponding to the speaker and the translation model corresponding to the situation, the portable interpretation device 110 can provide the user with an offline interpretation service based on the stored models.

On the other hand, when the portable interpretation device 110 provides a real-time interpretation service without the situation-specific speech recognition/translation models having been downloaded, it requests a real-time interpretation relay service from the interpretation server 130 through the web server 120, receives the text data translated by the interpretation server 130 under that service, and synthesizes the received text data into speech.
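The choice between the two modes can be summarized as: use the stored models when they are present, otherwise fall back to the real-time relay through the web server. This is a minimal sketch assuming a hypothetical on-device model store keyed by speaker and situation; the `"unavailable"` outcome for a device with neither stored models nor a network connection is an added assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DeviceModelStore:
    """Hypothetical on-device storage for downloaded models."""
    asr_models: Dict[str, bytes] = field(default_factory=dict)  # speaker key -> speech recognition model
    mt_models: Dict[str, bytes] = field(default_factory=dict)   # situation key -> translation model

    def has_models_for(self, speaker: str, situation: str) -> bool:
        return speaker in self.asr_models and situation in self.mt_models


def choose_mode(store: DeviceModelStore, speaker: str, situation: str, network_available: bool) -> str:
    if store.has_models_for(speaker, situation):
        return "offline"         # interpret locally with the stored models
    if network_available:
        return "realtime_relay"  # send speech and location through the web server
    return "unavailable"         # assumption: neither path is possible


store = DeviceModelStore()
print(choose_mode(store, "male", "travel", network_available=True))   # realtime_relay
store.asr_models["male"] = b"asr-model-bytes"
store.mt_models["travel"] = b"mt-model-bytes"
print(choose_mode(store, "male", "travel", network_available=False))  # offline
```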

The web server 120 performs communication and session control between the portable interpretation device 110 and the interpretation server 130 and provides a model download service or a real-time interpretation relay service. Under the model download service, when the portable interpretation device 110 requests a download of a speaker/situation-specific speech recognition/translation model, the web server delivers the corresponding model stored in the interpretation server 130 to the portable interpretation device 110. Under the real-time interpretation relay service, when the portable interpretation device 110 requests interpretation, the web server relays the interpretation result produced by the interpretation server 130 to the portable interpretation device 110.
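The web server's role as a router between the two services can be sketched as a small dispatcher. Everything here is hypothetical: the `WebServerRelay` class, the `"service"` field names, and the per-session request counter, which stands in for real session control.

```python
from typing import Callable, Dict


class WebServerRelay:
    """Hypothetical web server that routes device requests to one of two services
    and keeps a per-session request count as a stand-in for session control."""

    def __init__(self, download_model: Callable[[dict], object], interpret: Callable[[dict], object]):
        self._services: Dict[str, Callable[[dict], object]] = {
            "model_download": download_model,      # returns stored models for the device
            "realtime_interpretation": interpret,  # returns the interpretation server's result
        }
        self.sessions: Dict[str, int] = {}

    def handle(self, session_id: str, request: dict) -> object:
        service = self._services.get(request.get("service"))
        if service is None:
            raise ValueError(f"unknown service: {request.get('service')!r}")
        self.sessions[session_id] = self.sessions.get(session_id, 0) + 1
        return service(request)


# Toy back ends standing in for calls to the interpretation server.
relay = WebServerRelay(
    download_model=lambda req: {"asr": "asr-male", "mt": "mt-travel"},
    interpret=lambda req: "hello",
)
print(relay.handle("device-1", {"service": "realtime_interpretation"}))  # hello
```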

The interpretation server 130 builds speaker-specific speech recognition models and situation-specific translation models through an interpretation model training function. When it receives, through the web server 120, a download request for these models from the portable interpretation device 110, it transmits the corresponding speaker-specific speech recognition model and situation-specific translation model to the portable interpretation device 110 through the web server 120.

Meanwhile, to perform the real-time interpretation relay service through the web server 120, the interpretation server 130 decodes the speech signal transmitted from the portable interpretation device 110. The interpretation server 130 converts the decoded speech signal into text data using the speaker-specific speech recognition model, translates the text data using the situation-specific translation model, encodes the translated text data, and transmits it through the web server 120.
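The description does not specify the encoding used on either leg. As one possible sketch, assuming a JSON envelope with Base64-encoded audio (an assumption, not the format described in the patent), the device and server could pack and unpack messages like this; all function names are hypothetical.

```python
import base64
import json
from typing import Tuple


def encode_request(audio: bytes, latitude: float, longitude: float) -> str:
    """Device side: pack raw audio and location into a JSON text message (hypothetical format)."""
    return json.dumps({
        "audio": base64.b64encode(audio).decode("ascii"),
        "location": {"lat": latitude, "lon": longitude},
    })


def decode_request(message: str) -> Tuple[bytes, Tuple[float, float]]:
    """Server side: recover the audio bytes and the (lat, lon) pair."""
    payload = json.loads(message)
    audio = base64.b64decode(payload["audio"])
    loc = payload["location"]
    return audio, (loc["lat"], loc["lon"])


def encode_result(translated_text: str) -> bytes:
    """Server side: encode the translated text as UTF-8 JSON for the relay."""
    return json.dumps({"translation": translated_text}, ensure_ascii=False).encode("utf-8")


def decode_result(data: bytes) -> str:
    """Device side: extract the translated text before speech synthesis."""
    return json.loads(data.decode("utf-8"))["translation"]


audio, location = decode_request(encode_request(b"\x00\x01\x02", 37.5, 127.0))
print(audio, location)  # b'\x00\x01\x02' (37.5, 127.0)
```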

The web server 120 relays the translated text data transmitted from the interpretation server 130 to the portable interpretation device 110 under the real-time interpretation relay service.

The portable interpretation device 110 receives the translated text data from the web server 120 under the real-time interpretation relay service, converts the received text data into speech, and provides it to the user. In this way, the portable interpretation device 110 can use the real-time interpretation relay service of the interpretation server 130 through the web server 120 even when no speaker/situation-specific speech recognition/translation model has been downloaded.

FIG. 2 is a diagram illustrating the configuration of a portable interpretation device, a web server, and an interpretation server for an online interpretation service according to an embodiment of the present invention.

Referring to FIG. 2, the portable interpretation device 210 includes a location recognition module 211, a user interface module 212, a communication module 213, and a speech synthesis module 214.

The location recognition module 211 recognizes the user's location. For example, the location recognition module 211 may identify the user's location coordinates using GPS and recognize the user's location from the identified coordinates.

The user interface module 212 receives, from the user, a speech signal to be interpreted, together with a request for interpretation of that speech signal.

The communication module 213 works with the web server 220 over a wired/wireless communication environment and, in response to the requested interpretation service, transmits the input speech signal and the user's location information to the web server 220. The communication module 213 then receives, from the web server 220, the translation result for the input speech signal under the real-time interpretation relay service 221.

The speech synthesis module 214 synthesizes the received translation result into speech and outputs it through the user interface module 212.

The web server 220 includes a communication and session control module for the real-time interpretation relay service 221 between the portable interpretation device 210 and the interpretation server 230. Through this module, the web server 220 forwards the speech signal received from the portable interpretation device 210 to the interpretation server 230 and returns the translation result for that speech signal from the interpretation server 230 to the portable interpretation device 210.

The interpretation server 230 includes a speaker/situation-specific model building module 231, a storage module 232, a communication module 233, a speaker/situation recognition module 234, a speech recognition module 235, and a translation module 236.

The speaker/situation-specific model building module 231 builds a speaker-specific speech recognition model for each speaker and a situation-specific translation model for each situation. That is, it builds a variety of speaker-specific speech recognition models suited to the characteristics of different users, and situation-specific translation models for the various situations a user may face. For example, the module 231 may build a speech recognition model for male speakers when the speaker is male, and a translation model for business situations when the speaker's situation is business.

The storage module 232 stores the speaker-specific speech recognition models and the situation-specific translation models built by the speaker/situation-specific model building module 231.

The communication module 233 receives the speech signal and the user's location information transmitted from the portable interpretation device 210 through the real-time interpretation relay service 221 of the web server 220.

The speaker/situation recognition module 234 recognizes the speaker and the situation from the received speech signal and the user's location information. That is, it analyzes the received speech signal to recognize the speaker's characteristics, and analyzes the user's location information to recognize the speaker's situation. For example, if the location information, including the user's location coordinates, indicates that the user is at an airport, the module 234 may recognize the user's situation as travel.
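The location-to-situation step in the airport example amounts to a lookup from a recognized place category to a situation label. A minimal sketch, assuming a hypothetical category table: only the airport-to-travel rule comes from the text above, and the other entries and the fallback to a `"general"` situation are added assumptions.

```python
from typing import Optional

# Hypothetical rules. Only "airport" -> "travel" is taken from the description;
# the other entries and the "general" fallback are illustrative assumptions.
PLACE_CATEGORY_TO_SITUATION = {
    "airport": "travel",
    "hotel": "travel",
    "convention_center": "business",
    "office": "business",
}


def infer_situation(place_category: Optional[str], default: str = "general") -> str:
    """Map the category of the user's recognized location to a situation label."""
    if place_category is None:
        return default
    return PLACE_CATEGORY_TO_SITUATION.get(place_category, default)


print(infer_situation("airport"))  # travel
print(infer_situation("beach"))    # general
```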

The speech recognition module 235 converts the received speech signal into text data with reference to the speaker-specific speech recognition model, stored in the storage module 232, that matches the recognized speaker characteristics. For example, when the recognized speaker is male, the speech recognition module 235 may convert the received speech signal into text data with reference to the speech recognition model for male speakers among the speaker-specific models stored in the storage module 232.

The translation module 236 translates the text data with reference to the situation-specific translation model, stored in the storage module 232, that matches the recognized situation. For example, when the recognized situation is business, the translation module 236 may translate text data in a first language into text data in a second language with reference to the translation model for business situations among the situation-specific models stored in the storage module 232.
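The two server-side steps above (pick the recognizer by speaker characteristic, pick the translator by situation, then chain them) can be sketched as follows. This is a hypothetical illustration: the model dictionaries, the toy models that tag their output, and the fallback to `"general"` models when no specific model exists are all assumptions.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]
Translator = Callable[[str], str]


def interpret_on_server(
    audio: bytes,
    speaker_gender: str,
    situation: str,
    recognizers: Dict[str, Recognizer],
    translators: Dict[str, Translator],
) -> str:
    """Pick the speech recognition model for the recognized speaker and the translation
    model for the recognized situation, falling back to "general" models (an assumption)."""
    recognize = recognizers.get(speaker_gender, recognizers["general"])
    translate = translators.get(situation, translators["general"])
    text = recognize(audio)  # speech signal -> first-language text
    return translate(text)   # first-language text -> second-language text


# Toy models that just tag their output so the chosen model is visible.
recognizers = {
    "male": lambda a: "male-asr:" + a.decode("utf-8"),
    "general": lambda a: "general-asr:" + a.decode("utf-8"),
}
translators = {
    "business": lambda t: "[business] " + t,
    "general": lambda t: "[general] " + t,
}

print(interpret_on_server(b"hi", "male", "business", recognizers, translators))
# [business] male-asr:hi
```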

The communication module 233 transmits the translated text data, as the translation result for the speech signal, to the portable interpretation device 210 through the web server 220.

While connected online to the interpretation server 230 through the web server 220, the portable interpretation device 210 receives a speech signal to be interpreted, receives the translation result for that signal from the interpretation server 230, and synthesizes the result into speech, thereby providing the user with a real-time interpretation service.

FIG. 3 is a diagram illustrating the configuration of a portable interpretation device, a web server, and an interpretation server for an offline interpretation service according to an embodiment of the present invention.

Referring to FIG. 3, the portable interpretation device 310 includes a user interface module 311, a location recognition module 312, a communication module 313, a storage module 314, a speech recognition module 315, a translation module 316, and a speech synthesis module 317.

For the offline interpretation service, the portable interpretation device 310 connects in advance to the interpretation server 330 under the model download service 321 of the web server 320 and downloads speaker/situation-specific speech recognition/translation models from the interpretation server 330.

For the offline interpretation service, the user interface module 311 receives from the user, in advance, a request to download the speech recognition/translation models corresponding to the speaker and the situation. In the offline interpretation service, when the user requests interpretation, the portable interpretation device 310 provides it while offline rather than while connected online to the interpretation server 330. For example, when the user is to meet a Japanese person on business, the user interface module 311 may receive from the user a request to download the speech recognition/translation models for a Japanese speaker and a business situation. As another example, when the user is traveling to the United States, the user interface module 311 may receive a request to download the models for an American speaker and a travel situation. The download request includes speaker information and location information. The speaker information may include the speaker's gender and the language to be interpreted, and the location information may include the location coordinates or place information where the interpretation will take place.
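The contents of a download request described above (speaker information with gender and language, location information with coordinates or a place) can be modeled as a small data structure. This is a sketch under assumptions: the class and field names are hypothetical, and requiring at least one of coordinates or place is an added validation rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SpeakerInfo:
    gender: str    # e.g. "male" or "female"
    language: str  # language to be interpreted, e.g. "ja" or "en"


@dataclass(frozen=True)
class LocationInfo:
    coordinates: Optional[Tuple[float, float]] = None  # (latitude, longitude)
    place: Optional[str] = None                        # e.g. "airport", "office"


@dataclass(frozen=True)
class ModelDownloadRequest:
    speaker: SpeakerInfo
    location: LocationInfo

    def __post_init__(self) -> None:
        # Added assumption: the server needs at least one form of location information.
        if self.location.coordinates is None and self.location.place is None:
            raise ValueError("location information needs coordinates or a place")


# Example: a male user preparing for a business meeting with a Japanese speaker.
request = ModelDownloadRequest(SpeakerInfo(gender="male", language="ja"), LocationInfo(place="office"))
print(request.speaker.language, request.location.place)  # ja office
```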

The location recognition module 312 recognizes the location of the portable interpretation device 310. For example, the location recognition module 312 may identify the location coordinates of the portable interpretation device 310 through GPS and recognize from those coordinates that the current location is an airport.

The communication module 313 transmits the download request for the speech recognition/translation models to the web server 320.

The web server 320 includes a communication and session control module and provides the portable interpretation device 310 with the download service 321 for the speaker/situation-specific speech recognition/translation models. Through this module, under the model download service 321, the web server 320 forwards the download request from the portable interpretation device 310 to the interpretation server 330, receives the corresponding speaker/situation-specific speech recognition/translation models from the interpretation server 330, and transmits them to the portable interpretation device 310.

The interpretation server 330 includes a speaker/situation-specific model building module 331, a storage module 332, a communication module 333, a speaker/situation identification module 334, and a speaker/situation-specific model selection module 335.

The speaker/situation-specific model building module 331 builds, in advance, speech recognition/translation models for various situations and speakers and stores them in the storage module 332. That is, the module 331 builds speaker-specific speech recognition models through model training on various speakers, builds situation-specific translation models through model training on various situations, and stores both in the storage module 332. For example, the module 331 may build a Japanese-speaker speech recognition model by training on a variety of Japanese speakers. As another example, it may build a translation model suited to business situations by training on business situations.

When the interpretation server 330 receives a download request for a speaker/situation-specific speech recognition/translation model through the communication module 333, it identifies the speaker and situation associated with the request through the speaker/situation identification module 334. The identification module 334 determines the speaker's characteristics and situation from the speaker information and location information included in the request. For example, it may identify the speaker's gender from the speaker information and the speaker's situation from the location information.

The model selection module 335 selects, from the models stored in the storage module 332, the speech recognition/translation models that correspond to the identified speaker and situation. For example, if the speaker identified from the download request is male and the situation is travel, the module 335 may select the speech recognition model for male speakers and the translation model for the travel situation.
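The identification and selection steps above amount to a lookup keyed by speaker type and by situation. The following is a minimal Python sketch, assuming the stored models can be addressed by simple string keys and that the situation has already been derived from the location information; `DownloadRequest`, `ASR_MODELS`, `MT_MODELS`, `identify`, and `select_models` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass

# Hypothetical contents of storage module 332: model identifiers keyed by
# speaker type (speech recognition) and by situation (translation).
ASR_MODELS = {"male": "asr_male", "female": "asr_female"}
MT_MODELS = {"travel": "mt_travel", "business": "mt_business", "meal": "mt_meal"}


@dataclass
class DownloadRequest:
    """Fields assumed (for illustration) to travel with a download request."""
    gender: str     # from the speaker information entered by the user
    situation: str  # assumed already derived from the user's location


def identify(request: DownloadRequest) -> tuple[str, str]:
    """Normalize the speaker type and situation carried by the request."""
    return request.gender.strip().lower(), request.situation.strip().lower()


def select_models(request: DownloadRequest) -> tuple[str, str]:
    """Return (speech recognition model, translation model) for the request."""
    speaker, situation = identify(request)
    try:
        return ASR_MODELS[speaker], MT_MODELS[situation]
    except KeyError as missing:
        raise LookupError(f"no stored model for {missing.args[0]!r}") from None


print(select_models(DownloadRequest(gender="Male", situation="travel")))
# ('asr_male', 'mt_travel')
```

Keeping the two dictionaries separate mirrors the description's split: the speech recognition model depends only on the speaker, and the translation model only on the situation.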

The communication module 333 transmits the selected speaker/situation-specific speech recognition/translation models to the web server 320. For example, it transmits the speech recognition model for the male speaker and the translation model for the travel situation to the web server 320.

Through the communication and session control module, the web server 320 receives the selected speaker/situation-specific speech recognition/translation models from the interpretation server 330 under the model download service and transmits them to the portable interpretation apparatus 310. For example, it may receive the speech recognition model for the male speaker and the translation model for the travel situation and forward them to the portable interpretation apparatus 310.

The portable interpretation apparatus 310 receives the speech recognition/translation models sent from the interpretation server 330 via the web server 320 and stores them in the storage module 314. For example, if the user is male and has requested a model download for interpretation in a business situation, the portable interpretation apparatus 310 may download a male speech recognition model and a business-situation translation model from the interpretation server 330 through the model download service 321 of the web server 320 and store them in the storage module 314.

The user interface module 311 receives the speech to be interpreted from the user. It may also receive speaker or situation information from the user, for example the user's gender, the translation language pair (first and second language), and the situation the user is in.

The speech recognition module 315 recognizes the input speech and converts it into text data by referring to the speech recognition model stored in the storage module 314.

The translation module 316 translates the converted text data into the language the user wants by referring to the translation model stored in the storage module 314.

The speech synthesis module 317 synthesizes the text data translated into the user's target language into speech. For example, if the target language is English, the speech synthesis module 317 may synthesize the English text data into speech.

The user interface module 311 outputs to the user the speech synthesized from the translated text data. For example, if the target language is English, the user interface module 311 may output the synthesized English speech to the user.
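The on-device flow described above (modules 315, 316, and 317 feeding the user interface module 311) is a three-stage chain. The sketch below is illustrative only: it assumes each stage can be modeled as a plain function, and the toy stand-ins (UTF-8 text bytes as "audio", a phrase table as the "translator") are hypothetical placeholders for real recognition, translation, and synthesis engines.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]   # speech -> first-language text
Translator = Callable[[str], str]     # first-language text -> second-language text
Synthesizer = Callable[[str], bytes]  # second-language text -> speech


def interpret(audio: bytes, recognize: Recognizer, translate: Translator,
              synthesize: Synthesizer) -> bytes:
    """Chain the three on-device stages in the order the modules are described."""
    source_text = recognize(audio)        # speech recognition module 315
    target_text = translate(source_text)  # translation module 316
    return synthesize(target_text)        # speech synthesis module 317


# Hypothetical toy stand-ins: "audio" is UTF-8 text, translation is a lookup.
PHRASES = {"탑승구가 어디예요": "Where is the gate"}
speech_out = interpret(
    "탑승구가 어디예요".encode("utf-8"),
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: PHRASES.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
print(speech_out.decode("utf-8"))  # Where is the gate
```

Passing the stages in as functions keeps the chain independent of which downloaded models back each stage.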

In this way, the portable interpretation apparatus 310 according to an embodiment of the present invention downloads and stores speaker/situation-specific speech recognition/translation models in advance. It then uses these models to recognize input speech and convert it into text data, translate the text data, and synthesize the translated text data into speech, thereby providing an interpretation service to the user.

FIG. 4 is a diagram illustrating a process of generating a situation-specific interpretation model according to an embodiment of the present invention.

Referring to FIG. 4, the interpretation server indexes a corpus 401 (S410) and records the indexed corpus in a corpus index database 402. For example, the interpretation server may build an index over a parallel corpus 401 in which first-language sentences, their second-language counterparts, and first-language speech data are aligned with one another.
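Step S410 can be pictured as building an inverted index over the parallel corpus. This is a minimal sketch, assuming each entry is a (first-language sentence, second-language sentence, audio path) triple and that only the first-language side is indexed; the patent does not specify the index structure, and `build_index` and the sample entries are hypothetical.

```python
import re
from collections import defaultdict

# One parallel-corpus entry (assumed layout): (first-language sentence,
# second-language sentence, path to first-language audio).
Entry = tuple[str, str, str]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def build_index(corpus: list[Entry]) -> dict[str, set[int]]:
    """Inverted index: first-language token -> ids of entries containing it."""
    index = defaultdict(set)
    for entry_id, (source, _target, _audio) in enumerate(corpus):
        for token in tokenize(source):
            index[token].add(entry_id)
    return dict(index)


SAMPLE = [
    ("Where is the gate?", "탑승구가 어디예요?", "a001.wav"),
    ("The menu, please.", "메뉴 주세요.", "a002.wav"),
]
print(sorted(build_index(SAMPLE)["the"]))  # [0, 1]
```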

The interpretation server extracts keywords for each entry of a situation list 403 by referring to a web/ontology database 404 (S420) and classifies the extracted keywords into situation-specific keywords 405. For example, for each of the many situations a user may encounter, the interpretation server may extract a number of related keywords by referring to knowledge resources that include an ontology.
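One simple way to realize step S420 is to walk outward from each situation node in an ontology and keep every concept reached within a few hops. The toy `ONTOLOGY` dictionary and the two-hop default below are assumptions for illustration; a real knowledge resource would be far larger, and the patent does not say how related concepts are chosen.

```python
from collections import deque

# Hypothetical toy ontology: each concept points to narrower or related concepts.
ONTOLOGY = {
    "travel": ["airport", "hotel", "ticket"],
    "airport": ["gate", "boarding", "passport"],
    "hotel": ["reservation", "check-in"],
    "meal": ["restaurant", "menu"],
    "restaurant": ["menu", "order", "bill"],
}


def extract_keywords(situation: str, max_depth: int = 2) -> set[str]:
    """Breadth-first search: concepts reachable from `situation` within max_depth hops."""
    seen = {situation}
    queue = deque([(situation, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_depth:
            continue
        for related in ONTOLOGY.get(concept, []):
            if related not in seen:
                seen.add(related)
                queue.append((related, depth + 1))
    return seen


def keywords_by_situation(situations: list[str]) -> dict[str, set[str]]:
    """Situation-specific keyword sets for every entry of a situation list."""
    return {s: extract_keywords(s) for s in situations}


print(sorted(extract_keywords("travel", max_depth=1)))
# ['airport', 'hotel', 'ticket', 'travel']
```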

The interpretation server searches the corpus index database 402 for the situation-specific keywords 405 (S430) and groups the search results into situation-specific corpora 406. For example, it searches the parallel corpus using the keywords extracted for each situation and collects the results into a corpus for that situation.
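Step S430 then looks each situation keyword up in the corpus index and gathers the matching entries. A sketch, assuming the index maps lowercase tokens to entry ids; the `min_hits` threshold (how many distinct keywords an entry must contain) is a hypothetical knob not described in the source.

```python
from collections import Counter
from typing import Iterable

Entry = tuple[str, str, str]  # (source sentence, target sentence, audio path)


def collect_situation_corpus(index: dict[str, set[int]], corpus: list[Entry],
                             keywords: Iterable[str], min_hits: int = 1) -> list[Entry]:
    """Return entries that contain at least `min_hits` distinct keywords, in corpus order."""
    hits = Counter()
    for keyword in {k.lower() for k in keywords}:
        for entry_id in index.get(keyword, ()):
            hits[entry_id] += 1
    return [corpus[i] for i in sorted(hits) if hits[i] >= min_hits]


def build_situation_corpora(index: dict[str, set[int]], corpus: list[Entry],
                            keywords_by_situation: dict[str, Iterable[str]],
                            min_hits: int = 1) -> dict[str, list[Entry]]:
    """Apply the keyword search to every situation (situation-specific corpora)."""
    return {situation: collect_situation_corpus(index, corpus, kws, min_hits)
            for situation, kws in keywords_by_situation.items()}
```

Raising `min_hits` trades recall for precision: an entry that mentions both "gate" and "boarding" is a stronger travel example than one that mentions only "gate".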

The interpretation server generates speech recognition/translation models from the situation-specific corpora 406 (S440) and provides the resulting situation-specific speech recognition/translation models 407. For example, the interpretation server may train situation-specific speech recognition and translation models on the situation-specific corpora.
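The patent does not state how the situation-specific translation models are trained in step S440. As a deliberately simple stand-in, the sketch below derives a word-translation lexicon from sentence-aligned pairs using the Dice coefficient, Dice(s, t) = 2·C(s, t) / (C(s) + C(t)). It is a toy illustration with hypothetical names and sample data, not the statistical or neural training a production system would use, and it does not cover the speech recognition model.

```python
import re
from collections import Counter
from itertools import product


def tokens(text: str) -> set[str]:
    """Distinct lowercase word tokens of a sentence."""
    return set(re.findall(r"\w+", text.lower()))


def train_lexicon(pairs: list[tuple[str, str]], min_score: float = 0.5) -> dict[str, str]:
    """For each source word, keep the target word with the highest Dice score.

    C(...) counts the sentence pairs in which the word (or word pair) occurs.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for source, target in pairs:
        s_tokens, t_tokens = tokens(source), tokens(target)
        src_count.update(s_tokens)
        tgt_count.update(t_tokens)
        pair_count.update(product(s_tokens, t_tokens))
    best: dict[str, tuple[str, float]] = {}
    # Sorted iteration makes tie-breaking deterministic (the first pair wins).
    for (s, t), together in sorted(pair_count.items()):
        score = 2 * together / (src_count[s] + tgt_count[t])
        if score >= min_score and score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return {s: t for s, (t, _score) in best.items()}


# Hypothetical situation-specific corpus (travel), first language Korean.
travel_pairs = [("게이트 열림", "gate open"), ("게이트 닫힘", "gate closed"), ("문 열림", "door open")]
print(train_lexicon(travel_pairs)["게이트"])  # gate
```

Training the same routine separately on each situation-specific corpus yields one lexicon per situation, which is the property the patent relies on: a smaller, focused model per context.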

FIG. 5 is a flowchart of an interpretation method according to an embodiment of the present invention.

Referring to FIGS. 3 and 5, the interpretation server 330 identifies the user's gender and situation (S510). To request the download of speech recognition/translation models suited to the speaker and situation, the portable interpretation apparatus 310 transmits the speaker information entered by the user and the recognized location information to the web server 320. Through the communication and session control module, the web server 320 forwards this speaker and location information to the interpretation server 330 under the model download service 321. The interpretation server 330 receives the information through the communication module 333, relayed by the web server 320, identifies the speaker's gender from the speaker information, and identifies the speaker's situation from the location information. For example, the interpretation server 330 may determine from the speaker information that the speaker is male and, if the location information indicates an airport, determine that the speaker's situation is travel.

The interpretation server 330 selects speaker/situation-specific speech recognition/translation models according to the identified gender and situation (S520). That is, based on the speaker and situation identified by the speaker/situation identification module 334, the interpretation server 330 selects from the stored models a speech recognition model matching the identified speaker's characteristics and a translation model matching the identified situation. For example, it may select the speech recognition model for male speakers from the speaker-specific speech recognition models and the translation model for the travel situation from the situation-specific translation models. The interpretation server 330 then transmits the selected speech recognition model and translation model through the communication module 333 to the portable interpretation apparatus 310, relayed by the web server 320.

The portable interpretation apparatus 310 loads the speech recognition/translation models sent from the interpretation server 330 (S530). That is, it receives the speech recognition model and translation model through the communication module 313, relayed by the web server 320, and stores them in the storage module 314. For example, the portable interpretation apparatus 310 may store the speech recognition model for the male speaker and the translation model for the travel situation in the storage module 314.

The portable interpretation apparatus 310 receives from the user, as speech, a sentence to be translated (S540). That is, through the user interface module 311, it receives from the user, as first-language speech, a sentence to be translated into the second language.

The portable interpretation apparatus 310 recognizes the input speech and converts it into a sentence by referring to the loaded speech recognition model (S550). For example, it may convert the input first-language speech into a first-language sentence by referring to the speech recognition model for male speakers stored in the storage module 314.

The portable interpretation apparatus 310 translates the converted sentence by referring to the loaded translation model (S560), then synthesizes the translated sentence into speech and outputs it (S570). For example, the portable interpretation apparatus 310 may translate the first-language sentence into a second-language sentence through the translation module 316 by referring to the travel-situation translation model stored in the storage module 314, synthesize the second-language sentence into speech through the speech synthesis module 317, and output the synthesized second-language speech to the user through the user interface module 311.

As described above, in the interpretation method according to an embodiment of the present invention, the portable interpretation apparatus downloads and loads in advance a speech recognition model and a translation model suited to the speaker and situation from the interpretation server. It then converts an input speech signal into text data according to the loaded speech recognition model, translates the text data according to the loaded translation model, and synthesizes the translated text data into speech, thereby providing an interpretation service to the user.

FIG. 6 is a diagram illustrating a process of extracting situation information through location recognition according to an embodiment of the present invention.

Referring to FIGS. 2 and 6, the portable interpretation apparatus 210 obtains GPS location information and identifies the user's location coordinates from it (S610). That is, the portable interpretation apparatus 210 recognizes the user's location information through the location recognition module 211 and identifies the user's location coordinates from the recognized information.

The portable interpretation apparatus 210 transmits the location coordinates over a communication network to the interpretation server 230 via the web server 220 (S620). That is, it sends the location coordinates to the web server 220 through the communication module 213, and the web server 220 relays them to the interpretation server 230.

The interpretation server 230 searches a location-to-place database to identify where the user is, extracting the place information corresponding to the location coordinates (S630). The location-to-place database may record and maintain place information keyed by location coordinates.

The interpretation server 230 searches a place-to-situation database to identify the situation category, extracting the situation information corresponding to the place information (S640). The place-to-situation database may record and maintain situation information corresponding to place information. For example, if the place information is a restaurant, the interpretation server 230 may search the place-to-situation database and extract a dining situation for that restaurant.
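Steps S630 and S640 chain two lookups: coordinates to place, then place to situation. A minimal sketch, assuming the location-to-place database is a list of sample points matched by great-circle (haversine) distance within a hypothetical 1 km radius, and the place-to-situation database is a plain dictionary; `PLACES`, `PLACE_TO_SITUATION`, and all entries are made up for illustration.

```python
import math
from typing import Optional

# Hypothetical location-to-place database: (latitude, longitude, place).
# The coordinates are made-up sample values.
PLACES = [
    (37.4602, 126.4407, "airport"),
    (37.5665, 126.9780, "restaurant"),
]
# Hypothetical place-to-situation database.
PLACE_TO_SITUATION = {"airport": "travel", "hotel": "travel", "restaurant": "meal"}
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def place_for(lat: float, lon: float, max_km: float = 1.0) -> Optional[str]:
    """S630: nearest known place within max_km (an assumed radius), else None."""
    distance, place = min((haversine_km(lat, lon, p_lat, p_lon), name)
                          for p_lat, p_lon, name in PLACES)
    return place if distance <= max_km else None


def situation_for(lat: float, lon: float) -> Optional[str]:
    """S640: map the recognized place to a situation category, if any."""
    place = place_for(lat, lon)
    return PLACE_TO_SITUATION.get(place) if place is not None else None


print(situation_for(37.4605, 126.4410))  # travel
```

Returning `None` when no place is close enough leaves the caller free to fall back to a situation entered by the user, which the description also allows for.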

The interpretation server 230 selects a translation model according to the situation information and performs the interpretation (S650). It interprets the speech signal sent from the portable interpretation apparatus 210 using the speech recognition model selected according to the speaker information and the translation model selected according to the situation information. For example, the interpretation server 230 may convert the first-language speech signal received from the portable interpretation apparatus 210 into first-language text data by referring to the speech recognition model selected for a male speaker, translate that text data into the second language by referring to the translation model selected for the travel situation, and send the result to the portable interpretation apparatus 210. The portable interpretation apparatus 210 receives the second-language text data from the interpretation server 230, synthesizes it into second-language speech, and provides that speech to the user through the user interface module 212 as the interpretation of the first-language speech.

As described above, in the interpretation method according to an embodiment of the present invention, the portable interpretation apparatus transmits the speaker information, the speech signal to be interpreted, and the location information to the interpretation server. The interpretation server converts the speech signal into text data using the speech recognition model for the speaker identified from the speaker information, and translates the text data using the translation model for the situation identified from the location information. The portable interpretation apparatus then receives the translated text data and synthesizes it into speech, thereby providing the interpretation service.

The interpretation methods according to an embodiment of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, although the present invention has been described with reference to limited embodiments and drawings, the present invention is not limited to these embodiments, and those skilled in the art to which the present invention pertains may make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

FIG. 4 is a diagram illustrating a process of generating a situation-specific translation model according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating a process of extracting a situation through location recognition according to an embodiment of the present invention.

Claims

A user interface module configured to receive voice information requested for interpretation from a user;

A location recognition module recognizing location information of a user;

A communication module for transmitting the recognized location information and the requested speech information to an interpretation server and receiving a translation result corresponding to the requested speech information from the interpretation server; And

A speech synthesis module for synthesizing the received translation result into speech

Including,

The user interface module,

Provides the synthesized translation result to the user,

The interpretation server,

Identifies the user's situation from the location information, recognizes the requested voice information by referring to a speaker/situation-specific speech recognition/translation model corresponding to the user's situation, converts it into text data, and translates the converted text data into the translation result corresponding to the requested voice information, portable interpretation apparatus.

The apparatus of claim 1,

The speaker/situation-specific speech recognition/translation model,

Includes a speaker-specific speech recognition model corresponding to the gender of the user and a situation-specific translation model corresponding to the user's situation,

The interpretation server,

Recognizes the transmitted speech information by referring to the speaker-specific speech recognition model, translates the recognized speech information by referring to the situation-specific translation model, and transmits the translation result corresponding to the speech information through a web server, portable interpretation apparatus.

The apparatus of claim 1,

The location recognition module,

Identifies the location coordinates of the user,

The communication module,

Transmits the identified location coordinates of the user to the interpretation server,

The interpretation server,

Extracts place information corresponding to the transmitted location coordinates of the user from a location-to-place database, extracts situation information corresponding to the extracted place information from a place-to-situation database, and translates the text data using the situation-specific translation model corresponding to the extracted situation information, portable interpretation apparatus.

A location recognition module recognizing location information of a user;

A user interface module for receiving speaker information and situation information from a user, requesting to download a voice recognition / translation model corresponding to the speaker information and situation information, and receiving voice information to be interpreted;

A communication module for transmitting a download request for the speech recognition/translation model corresponding to the speaker information and situation information to the interpretation server, and receiving from the interpretation server a speech recognition model corresponding to the speaker information and a translation model corresponding to the situation information;

A storage module for storing the received speech recognition model and translation model;

A speech recognition module for converting the speech information to be interpreted into text data with reference to the speech recognition model;

A translation module for translating the converted text data with reference to the translation model; And

A speech synthesis module for synthesizing the translated text data into speech

Including,

The user interface module,

Provides the synthesized speech to the user,

The interpretation server,

Stores speaker-specific speech recognition models and situation-specific translation models, identifies the speaker and the situation according to the download request, and transmits a speech recognition model corresponding to the identified speaker and a translation model corresponding to the identified situation, portable interpretation apparatus.

The apparatus of claim 1 or 4,

The interpretation server,

Extracts, for each situation in a situation list, a plurality of related keywords by referring to a knowledge resource that includes an ontology, searches the parallel corpus through the index database using the extracted situation-specific keywords, collects the search results into situation-specific corpora, and trains the situation-specific translation model using the situation-specific corpora, portable interpretation apparatus.

Receiving voice information requested for interpretation from a user;

Recognizing location information of a user;

Transmitting the recognized location information and the interpretation requested speech information to an interpretation server;

Receiving a translation result corresponding to the voice information requested for interpretation from the interpretation server;

Synthesizing the received translation result into speech; And

Providing the synthesized translation result to the user

Including,

The interpretation server,

Identifies the user's situation from the location information, recognizes the voice information requested for interpretation and converts it into text data by referring to a speaker/situation-specific speech recognition/translation model corresponding to the user's situation, and translates the converted text data, interpretation method.

The method of claim 6,

The speaker/situation-specific speech recognition/translation model,

Includes a speaker-specific speech recognition model corresponding to the characteristics of the speaker and a situation-specific translation model corresponding to the situation of the speaker,

The interpretation server,

Converts the speech information into text data by referring to the speaker-specific speech recognition model, translates the converted text data by referring to the situation-specific translation model, and transmits the translated text data as the translation result corresponding to the speech information, interpretation method.

The method of claim 6,

Recognizing the location information of the user,

Identifies the location coordinates of the user,

Transmitting to the interpretation server transmits the identified location coordinates of the user,

The interpretation server,

Extracts place information corresponding to the transmitted location coordinates of the user from a location-to-place database, extracts situation information corresponding to the extracted place information from a place-to-situation database, and translates the text data by referring to the situation-specific translation model corresponding to the extracted situation information, interpretation method.

Receiving a download request for a voice recognition model corresponding to the speaker information and a translation model corresponding to the situation information from a user;

Receiving a speech recognition model corresponding to the speaker information and a translation model corresponding to context information from an interpreter server according to the download request;

Storing the received speech recognition model and translation model;

Receiving voice information to be interpreted from the user;

Converting the speech information into text data by referring to the speech recognition model;

Translating the text data with reference to the translation model;

Synthesizing the translated text data into voice; And

Providing the synthesized voice to the user

Including,

The interpretation server,

Stores speaker-specific speech recognition models and situation-specific translation models, identifies the speaker and the situation according to the download request, and transmits a speech recognition model corresponding to the identified speaker and a translation model corresponding to the identified situation, interpretation method.

The method of claim 6 or 9,

The interpretation server,

Extracts, for each situation in a situation list, a plurality of related keywords by referring to a knowledge resource that includes an ontology, searches the parallel corpus through the index database using the extracted situation-specific keywords, collects the search results into situation-specific corpora, and trains the situation-specific translation model using the situation-specific corpora, interpretation method.

A computer-readable recording medium having recorded thereon a program for performing the method of any one of claims 6 to 9.