KR20190029237A

KR20190029237A - Apparatus for interpreting and method thereof

Info

Publication number: KR20190029237A
Application number: KR1020170116567A
Authority: KR
Inventors: 김상철; 박기범
Original assignee: (주)한컴인터프리
Priority date: 2017-09-12
Filing date: 2017-09-12
Publication date: 2019-03-20
Also published as: KR102056330B1

Abstract

Disclosed is an interpretation apparatus that performs two-way simultaneous interpretation as a portable server. The apparatus includes a built-in translation module that performs two-way translation and a storage module that stores a speech recognition DB and a translation DB, and it functions as an input device and/or an output device for at least one speaker among the parties to the simultaneous interpretation. Because the apparatus uses a portable server, two-way simultaneous interpretation is possible even in an environment without an Internet connection.

Description

APPARATUS FOR INTERPRETING AND METHOD THEREOF

The present invention relates to an interpretation apparatus and a method thereof, and more particularly to an interpretation apparatus and method capable of interpreting offline, without a connection to the Internet.

With the development of transportation and communication, exchanges of people and goods between countries have become increasingly active. Despite this growth in exchange, the different languages spoken in different countries remain a barrier to communication.

Translation refers to conversion between written text in different languages, performed to ease the inconvenience caused by the language difference. A spoken language translation system, in turn, refers to conversion between speech in different languages; interpretation of broadcast news is one example.

At international conferences in particular, the inconvenience that different languages cause between speakers is resolved through simultaneous interpretation. Automatic interpretation, specifically, converts speech in a first language into speech in a second language, in both directions.

Simultaneous interpretation, once the exclusive domain of human simultaneous interpreters, is now being performed automatically by machines thanks to advances in speech recognition, machine translation, and speech synthesis.

Automatic interpretation refers to the process and technology of converting an utterance in a first language into a second language through speech recognition, machine translation, and related steps, and then either outputting the result as subtitles or playing it through a speaker after speech synthesis.

FIG. 1 is an exemplary view of an interpretation system (10) according to the prior art that uses a translation server.

Referring to FIG. 1, in the prior art both the simultaneous interpretation system (10) and an interpretation device (11) in the form of a user terminal rely, for speech recognition and translation, on a server (12) and a database (130) connected to a wired or wireless communication network (14), for example a cellular radiotelephone network. Consequently, in an environment without network access, such as the cabin of an aircraft in flight or a tourist site abroad, a communication device that is not connected to the network cannot perform the interpretation function.

In addition, when a single interpretation device in the form of a user terminal is used according to the prior art, in most cases only one of the speakers needs interpretation and the other party has no user terminal. Both speakers must therefore share the one interpretation device as their input/output tool, which is inconvenient.

FIG. 2 is an exemplary view of an interpretation situation according to the prior art using a single user terminal.

Referring to FIG. 2, the exchange between the two speakers proceeds in numbered order: ① utterance, ② interpretation, ③ utterance, ④ interpretation. The two speakers must take turns inputting speech into the single interpretation device and outputting the result, which is cumbersome. Simultaneous conversation is impossible or difficult to process, and speech must be input to the interpretation device at certain time intervals.

First, Prior Art 1, Korean Patent No. 10-1626109 (May 25, 2016), discloses a technique relating to an interpretation apparatus and method.

Prior Art 1 includes a voice input unit, a control unit, a communication unit, a display unit, and a user input unit that receives the user's manipulation input for previously translated sentences.

Prior Art 2, Korean Patent No. 10-1747874 (June 9, 2017), discloses an automatic interpretation system.

Prior Art 2 relates to an automatic interpretation system that communicates with a PC or with a portable device such as a mobile phone, smartphone, PDA, or laptop, or that is used directly in an automatic interpretation terminal. The system includes: a wearable automatic interpretation input/output device that transmits a speaker's speech recognition microphone signal, bone conduction microphone signal, and gesture signal over a network and outputs an interpretation result signal received over the network; and a server that uses the bone conduction microphone signal or the gesture signal transmitted over the network from the wearable device to detect a speech data segment in the speech recognition microphone signal, performs speech recognition and interpretation on the speech data in the detected segment, and then transmits the interpretation result signal over the network to the wearable device.

Prior Art 3, Korean Patent No. 10-1589433 (January 22, 2016), discloses a simultaneous interpretation system.

Prior Art 3 uses a simultaneous interpretation system that includes at least two headsets for inputting and outputting speech and a portable terminal that receives the speech to be interpreted from one headset and outputs the interpreted speech to another designated headset. By performing simultaneous interpretation over short-range communication between users through a single portable terminal, it enables more efficient and freer conversation.

However, Prior Art 1 includes a communication unit that transmits the first-language sentence to be translated to a translation server and receives the translated second-language sentence from the translation server, which shows that a server is responsible for translation.

Likewise, Prior Art 2 includes a communication module that receives the user voice signal provided by the voice microphone, the user bone conduction signal provided by the bone conduction microphone, and the user behavior (gesture) detection signal provided by the motion sensor and transmits them to the first server as signals for speech recognition, and that transmits the speech recognition result information received from the first server to the plurality of second to n-th servers or controls output of the interpretation result information received from the second to n-th servers. This shows that servers perform the speech recognition and interpretation.

Further, the control unit of Prior Art 3 obtains the second-language speech converted from the first-language speech by using an interpretation server that converts speech in the first language into speech in the second language, which shows that it relies on an interpretation server.

The interpretation apparatus and method according to an embodiment of the present invention relate to a technique that enables simultaneous interpretation without using an interpretation server on a network in situations where no Internet environment is available, such as in the cabin of an aircraft in flight or at a travel destination abroad. The technique is distinct from the prior art discussed above and is intended to solve the problems described.

The present invention was devised to solve the above problems, and an object thereof is to provide an interpretation apparatus and method that use a portable server.

Another object is to provide an interpretation apparatus and method capable of interpreting in an environment where no remote network environment is available.

Another object is to provide an interpretation apparatus and method that reduce inconvenience by providing an input device and/or an output device to a speaker's conversation partner who has no user terminal.

Another object is to provide an interpretation apparatus and method that can quickly output highly reliable results by using a portable server that stores relatively small-volume speech recognition and translation data in a high-capacity memory.

An interpretation apparatus according to an embodiment of the present invention is a portable server that includes a built-in translation module that performs two-way translation and a storage module that stores a speech recognition DB and a translation DB. It performs two-way simultaneous interpretation and functions as an input device and/or an output device for at least one speaker among the parties to the simultaneous interpretation.

The interpretation apparatus further includes a speech recognition module that records the speaker's speech and performs two-way speech recognition using this recorded data and/or recorded data received from a user terminal.

The speech recognition DB and the translation DB are databases determined by narrowing the recognition range for each language, starting from the entries with the lowest frequency of occurrence, and the speech recognition module and the translation module include engines that use the resulting compact speech recognition DB and translation DB.
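
As a rough illustration of the database reduction described above, the following sketch counts entries in a toy corpus and keeps only the most frequent ones, so that the lowest-frequency entries are dropped first. The function name `prune_vocabulary`, the `max_entries` parameter, and the corpus are assumptions made for illustration; the specification does not state how the reduction is actually carried out.

```python
from collections import Counter


def prune_vocabulary(tokens, max_entries):
    """Keep the `max_entries` most frequent tokens; the rarest are dropped first.

    Hypothetical helper: the text only says that the recognition range is
    narrowed starting from low-frequency entries.
    """
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:max_entries])


corpus = "where is the gate where is the hotel the taxi please".split()
small_db = prune_vocabulary(corpus, max_entries=3)
```

A smaller table of this kind trades coverage of rare expressions for a footprint that fits on a portable device.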

The interpretation apparatus further includes an input button for determining the voice input of the speaker and/or the other party.

The interpretation apparatus further includes a communication unit that communicates with a user terminal by wire or wirelessly; when the user terminal functions as an input device and/or an output device for one speaker, the interpretation apparatus functions as an input device and/or an output device for the other party.

The interpretation apparatus further includes a control module that generates a confirmation message, to be transmitted to the user terminal, for confirming a completed operation.

The playback module plays back TTS data transmitted from the user terminal and outputs the result through the speaker.

The interpretation apparatus further includes the user terminal, which is provided with a control unit that controls the portable server.

The control unit controls the operation of the portable server, voice recording, transmission of the recorded data, and transmission of the confirmation message.

The user terminal further includes a storage unit. The storage unit stores a server program installed to link the control unit with the control module, and the storage module stores a client program installed to link the control unit with the control module.

The portable server switches among Standby, Ready, and Run states in relation to the user terminal, and further includes a power button for switching between a Power On state and a Power Off state.

In the Standby state of the portable server, the communication module is On and the control module is in a sleep state; in the Ready state, the control module is switched On.
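
The states named above can be pictured as a small state machine. The sketch below is only one possible reading: the module status for Standby and Ready follows the text, while the event names (`power_button`, `terminal_connected`, and so on), the transition table, and the status assumed for Power Off and Run are hypothetical.

```python
from enum import Enum


class State(Enum):
    POWER_OFF = "Power Off"
    STANDBY = "Standby"
    READY = "Ready"
    RUN = "Run"


# Assumed transition table: the text names the states but not every edge.
TRANSITIONS = {
    (State.POWER_OFF, "power_button"): State.STANDBY,
    (State.STANDBY, "terminal_connected"): State.READY,
    (State.READY, "start_interpreting"): State.RUN,
    (State.RUN, "stop_interpreting"): State.READY,
    (State.READY, "terminal_disconnected"): State.STANDBY,
    (State.STANDBY, "power_button"): State.POWER_OFF,
}


def module_status(state):
    """Module status per state; Standby and Ready follow the text, the rest is assumed."""
    if state is State.POWER_OFF:
        return {"communication": "off", "control": "off"}
    if state is State.STANDBY:
        return {"communication": "on", "control": "sleep"}
    return {"communication": "on", "control": "on"}


def next_state(state, event):
    """Return the next state, or stay in the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping only the communication module awake in Standby would let the server notice a connecting terminal while the control module sleeps.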

The control module determines timbre using the sound spectrum of each speaker's voice and, according to the determined timbre, distinguishes the speakers of simultaneously uttered speech in different languages.
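
A minimal sketch of spectrum-based speaker separation follows. It uses the spectral centroid (the magnitude-weighted mean frequency) as a stand-in for timbre and assigns a signal to the stored speaker profile with the nearest centroid. This feature, the synthetic sine-wave "voices", and all names are assumptions; the specification does not say which spectral feature is used.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, computed with a naive DFT."""
    n = len(samples)
    weighted = total = 0.0
    for k in range(n // 2):  # non-negative frequencies below Nyquist
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def tone(freq, sample_rate=8000, n=256):
    """Synthetic 'voice': a pure sine wave (frequencies here fall on exact DFT bins)."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


def closest_speaker(samples, profiles, sample_rate=8000):
    """Return the profile name whose stored centroid is nearest to the signal's."""
    centroid = spectral_centroid(samples, sample_rate)
    return min(profiles, key=lambda name: abs(profiles[name] - centroid))


# Hypothetical enrolled profiles for two speakers.
PROFILES = {
    "speaker_a": spectral_centroid(tone(250.0), 8000),
    "speaker_b": spectral_centroid(tone(1000.0), 8000),
}
```

Real voices would need windowing and richer features than a single centroid; the sketch only shows the nearest-profile idea.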

For simultaneously uttered speech in different languages, the control module distinguishes the languages by using scores (scoring) based on the translation results of sample speech.
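
One way to picture score-based language discrimination is to run a sample through each candidate language's resources and keep the language with the best score. In the sketch below the score is simply the fraction of sample tokens that a toy translation lexicon can handle; this scoring rule, the lexicons, and the function names are assumptions, since the specification does not define how the score is computed.

```python
def coverage_score(tokens, lexicon):
    """Fraction of tokens that the language's toy translation lexicon can handle."""
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in lexicon) / len(tokens)


def identify_language(sample_tokens, lexicons):
    """Score the sample against every candidate language and return the best one."""
    scores = {lang: coverage_score(sample_tokens, lex) for lang, lex in lexicons.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical miniature lexicons (source word -> target word).
LEXICONS = {
    "en": {"where": "어디", "is": "입니까", "the": "그", "station": "역"},
    "ko": {"역": "station", "어디": "where", "입니까": "is"},
}
```

For instance, `identify_language(["where", "is", "the", "station"], LEXICONS)` scores 1.0 for `"en"` and 0.0 for `"ko"`.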

An interpretation method according to an embodiment of the present invention is performed by a portable server that functions as an input device and/or an output device for at least one speaker among the parties to simultaneous interpretation. To carry out the simultaneous interpretation, the method includes: recognizing speech using a speech recognition module and a speech recognition DB contained in the portable server; and translating the recognized text using a translation module and a translation DB contained in the portable server.

In the recognizing step, the speech of the at least one speaker is received through a microphone, the speech is recorded by the speech recognition module, and speech recognition is performed using the recorded data.

The interpretation method further includes outputting, through a playback module provided with a speaker, synthesized speech corresponding to the speaker's speech according to the translation by the translation module.
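
The three steps above (recognize, translate, output synthesized speech) can be sketched as a pipeline that runs entirely on local data. Every function here is a stand-in: recognition is a table lookup keyed by a clip identifier, translation is word-by-word substitution, and synthesis returns a tagged string instead of audio. The names and the toy DBs are assumptions for illustration only.

```python
def recognize(audio_id, asr_db):
    """Stand-in for speech recognition: look the clip up in a local table."""
    return asr_db[audio_id]


def translate(text, translation_db):
    """Stand-in for translation: word-by-word substitution from a local table."""
    return " ".join(translation_db.get(word, word) for word in text.split())


def synthesize(text):
    """Stand-in for speech synthesis: return a tagged string instead of audio."""
    return f"<speech:{text}>"


def interpret(audio_id, asr_db, translation_db):
    """Run recognition, translation, and synthesis without any network call."""
    recognized = recognize(audio_id, asr_db)
    translated = translate(recognized, translation_db)
    return recognized, translated, synthesize(translated)


# Hypothetical local DBs held by the portable server.
ASR_DB = {"clip-001": "thanks friend"}
TRANSLATION_DB = {"thanks": "고마워", "friend": "친구"}
```

Because both tables live on the device, the whole chain works offline, which is the point of the portable server.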

In the recognizing step, an input button is used to determine when the user and/or the other party inputs speech.

The interpretation method further includes the portable server and a user terminal communicating by wire or wirelessly; when the user terminal functions as an input device and/or an output device for one speaker, the portable server functions as an input device and/or an output device for the other party.

The interpretation method further includes the control module generating a confirmation message, to be transmitted to the user terminal, for confirming a completed operation.

In the step of outputting the synthesized speech, TTS data transmitted from the user terminal is played back and the result is output through the speaker.

The interpretation method further includes a user terminal provided with a control unit controlling the portable server.

The controlling step controls the operation of the portable server, voice recording, transmission of the recorded data to the portable server, and display of the confirmation message.

The user terminal further includes a storage unit. The storage unit stores a server program installed to link the control unit with the control module, and the storage unit stores a client program installed to link the control unit with the control module.

The portable server switches among Standby, Ready, and Run states in relation to the user terminal, and switches between a Power On state and a Power Off state by means of a power button.

The recognizing step further includes determining timbre using the sound spectrum of each speaker's voice and, according to the determined timbre, distinguishing the speakers of simultaneously uttered speech in different languages.

The recognizing step further includes, for simultaneously uttered speech in different languages, distinguishing the languages of the speech by using scores (scoring) based on the translation results of sample speech.

According to the present invention, using a portable server makes two-way simultaneous interpretation possible even in an environment that is not connected to the Internet.

In addition, a terminal capable of voice input and/or output for interpretation can be provided to a conversation partner who does not carry a terminal.

In addition, highly reliable results can be output quickly by using a portable server that includes a high-capacity memory storing the DBs for speech recognition and translation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary view of an interpretation system according to the prior art that uses a translation server.
FIG. 2 is an exemplary view of an interpretation situation according to the prior art using a single user terminal.
FIG. 3 is a block diagram of an interpretation apparatus according to an embodiment of the present invention.
FIG. 4 is a block diagram of a storage module according to an embodiment of the present invention.
FIG. 5 is a front view of an interpretation apparatus according to an embodiment of the present invention.
FIG. 6 is an exemplary view showing state transitions of an interpretation apparatus according to an embodiment of the present invention.
FIG. 7 is an exemplary view showing communication between a portable server and a user terminal according to an embodiment of the present invention.
FIG. 8 is a block diagram of a user terminal according to an embodiment of the present invention.
FIG. 9 is a flowchart of an interpretation method according to an embodiment of the present invention.
FIG. 10 is a flowchart of step S130 in the interpretation method according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, preferred embodiments of the interpretation apparatus and method of the present invention are described in detail with reference to the accompanying drawings. Like reference numerals in the drawings denote like elements. The specific structural and functional descriptions of the embodiments are given only to illustrate embodiments according to the present invention. Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

An interpretation apparatus according to an embodiment of the present invention is described below.

FIG. 3 is a block diagram of an interpretation apparatus according to an embodiment of the present invention.

The interpretation apparatus (100) according to an embodiment of the present invention, having the components shown in FIG. 3, is characterized in that it corresponds to a portable server. To distinguish an interpretation apparatus (100) that does not include the user terminal (200), described later, from one that does, the interpretation apparatus (100) that does not include the user terminal (200) is defined as the portable server (100).

Referring to FIG. 3, the interpretation apparatus (100) according to an embodiment of the present invention includes a speech recognition module (110), a translation module (120), a playback module (130), a storage module (140), an input module (150), an output module (160), a power module (170), a communication module (180), and a control module (190). The interpretation apparatus (100) may additionally include the user terminal (200). That is, the interpretation apparatus (100) may operate and perform its functions in a stand-alone form, not coupled to the user terminal (200) by wire or wirelessly, or may operate and perform its functions while coupled to the user terminal (200) by wire or wirelessly. To operate in stand-alone form, however, it needs to contain a TTS engine internally.

The interpretation apparatus (100) performs interpretation as a portable server that includes the storage module (140) storing the speech recognition DB and the translation DB, and functions as an input device and/or an output device for at least one speaker among the parties to the simultaneous interpretation.

Here, "functions as an input device and/or an output device" means that, whether the simultaneous interpretation takes place between two parties or there are several people on each of the source language side and the target language side, the speaker or speakers on one side, or the speakers on both sides, can input their own speech and listen to the translated speech output of the other side.

The speech recognition module (110) records the speaker's speech and performs speech recognition using the recorded data. It automatically recognizes the speech signal produced by the speaker and converts it into a character string. The speech recognition module is also known as ASR (Automatic Speech Recognition), Voice Recognition, or STT (Speech-to-Text).

The speech recognition module (110) may be based on probabilistic statistical methods. That is, it uses models based on probability and statistics as the acoustic model (AM) and the language model (LM) used in the speech recognition process. The core algorithm, the HMM (Hidden Markov Model), may likewise be based on probability and statistics. These models are examples and are not intended to limit the present invention.

A GMM (Gaussian Mixture Model) may be used as the acoustic model and an N-gram as the language model. Furthermore, a DNN (Deep Neural Network), one of the deep learning architectures, is preferably used in place of the GMM. To improve speech recognition performance, high-quality speech and language models are set up, and the models can be trained by a deep learning algorithm. The training DB preferably includes speech and language DBs in colloquial, conversational style.
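
To make the N-gram language model mentioned above concrete, the sketch below trains a bigram model (N = 2) with add-one smoothing on two toy sentences and uses it to score word sequences. The training sentences, the smoothing choice, and the function names are assumptions; the specification does not give the order of the N-gram or how it is estimated.

```python
from collections import Counter


def train_bigram(sentences):
    """Count history words and word pairs, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams


def bigram_probability(sentence, histories, bigrams):
    """Probability of a sentence under the bigram model with add-one smoothing."""
    vocab_size = len({cur for _, cur in bigrams})  # distinct next-word types
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        prob *= (bigrams[(prev, cur)] + 1) / (histories[prev] + vocab_size)
    return prob


histories, bigrams = train_bigram(["where is the gate", "where is the hotel"])
likely = bigram_probability("where is the gate", histories, bigrams)
unlikely = bigram_probability("gate the is where", histories, bigrams)
```

In a recognizer, such a score would be combined with the acoustic model's score so that word sequences resembling the training data are preferred; here `likely` comes out larger than `unlikely`.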

번역 모듈(120)은 음성인식 모듈(110)에 의해 인식된 출발어(Source Language)로 발화된 발화자의 음성이 텍스트로 출력되면, 출력된 텍스트를 도착어(Target Language)의 문자로 번역한다. 본 발명의 일 실시 예에 따른 통역장치(100)는 음성인식 모듈(110)과 함께 번역 모듈(120)도 자체 포함하고 있는 것을 특징으로 한다.The translation module 120 translates the output text into a character of a target language when the voice of a speaking person uttered by the source language recognized by the voice recognition module 110 is outputted as text. The interpretation apparatus 100 according to an embodiment of the present invention may include the translation module 120 in addition to the speech recognition module 110. [

번역 모듈(120)이 수행하는 번역의 방식은 규칙에 기반한 방법, 말뭉치에 기반한 방법 및 인공신경망번역(Neural Machine Translation, NMT) 중에서 적어도 하나를 포함한다. 규칙에 기반한 방법은 분석 깊이에 따라 다시 직접 번역방식이나 간접 변환방식, 중간 언어방식으로 나뉜다. 말뭉치에 기반한 방법으로 예제 기반 방법과 통계기반 방법이 있다.The translation method performed by the translation module 120 includes at least one of a rule-based method, a corpus-based method, and a neural machine translation (NMT). Rules-based methods are divided into direct translation, indirect conversion, and intermediate language depending on the depth of analysis. A corpus-based method is an example-based method and a statistical-based method.

통계 기반 자동번역(Stochastic Machine Translation, SMT) 기술은 통계적 분석을 통해 이중언어 말뭉치로부터 모델 파라미터를 학습하여 문장을 번역하는 기술이다. 문법이나 의미표상을 개발할 때 수작업으로 하지 않고 번역하고자 하는 언어 쌍에 대한 말뭉치로부터 번역에 필요한 모델을 만든다. 그래서 말뭉치만 확보할 수 있다면 비교적 용이하게 언어 확장을 할 수 있다.Stochastic Machine Translation (SMT) technology is a technique of translating sentences by learning model parameters from bilingual corpus through statistical analysis. When developing a grammar or semantic representation, we make a model for translation from the corpus of language pairs that we want to translate rather than by hand. Therefore, if we can secure only a corpus, we can expand the language relatively easily.

The drawbacks of statistical machine translation are that it requires a large bilingual corpus and that it has no common semantic representation linking multiple languages.

Neural machine translation (NMT) is a technique that compensates for these drawbacks.

SMT splits a sentence into words, or into phrases of several words, and then translates it on the basis of a statistical model. The key is to model statistical translation rules from a vast amount of training data.

In NMT, by contrast, artificial intelligence (AI) translates the sentence as a whole. Sentence-level translation is possible because the artificial neural network converts sentence information into a vector (coordinate values) that denotes a specific point in a virtual space.

For example, the word 'person' is recognized in the form '[a, b, c, …, x, z]'. Because the vector contains information such as words, phrases, and word order, sentence-level translation that takes context into account becomes possible. The artificial neural network places sentences with similar meanings close to one another in this space.

NMT uses high-dimensional vectors. The artificial neural network is trained on data consisting of source-language sentences paired with target-language sentences, and the trained network then recognizes sentence information as vectors.
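The idea that sentences with similar meanings lie close together in the vector space may be illustrated with cosine similarity. The four-dimensional vectors below are invented for this sketch; an actual NMT encoder would produce vectors with far more dimensions, and the disclosure does not specify a similarity measure.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; values near 1.0 mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical sentence vectors (not produced by any real model)
greeting_a = [0.9, 0.1, 0.0, 0.2]   # e.g. "Hello, nice to meet you"
greeting_b = [0.8, 0.2, 0.1, 0.2]   # e.g. "Hi, pleased to meet you"
directions = [0.0, 0.9, 0.7, 0.1]   # e.g. "Where is the station?"

similar = cosine_similarity(greeting_a, greeting_b)
dissimilar = cosine_similarity(greeting_a, directions)
```

Under these assumed values, the two greetings score much closer to 1.0 than the greeting and the request for directions.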

The playback module 130 outputs synthesized speech corresponding to the speaker's voice according to the translation produced by the translation module 120. To this end, the user terminal 200 transmits TTS data to the interpretation apparatus 100; in this case, the user terminal 200 includes the speech synthesis module. The playback module 130 plays back the TTS data transmitted from the user terminal 200, and the result is output through the speaker.

Conversely, when the interpretation apparatus 100 includes the speech synthesis module, the interpretation apparatus 100 generates the TTS data itself and plays it back.

Speech synthesis is also called text-to-speech (TTS) or voice synthesis. A unit concatenation method may be used for speech synthesis. In this method, the sentence is analyzed, the speech units indicated by the analysis are extracted from a speech unit DB, and the units are joined together. Several candidate synthesized utterances are generated, and the most suitable one is selected in consideration of prosody and smoothness. In addition, the speaker's timbre may be determined from the sound spectrum of the speaker's voice, and the synthesized speech may be post-processed to match it, so that synthesized speech close to the original speaker's timbre is output. The speaker's emotion may also be recognized and conveyed in the synthesized speech.
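Selecting the most suitable candidate in view of prosody and smoothness may be sketched as minimizing a weighted cost. The `Candidate` fields, the weights, and the cost values are assumptions for this illustration; the disclosure does not state how the two criteria are combined.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    prosody_cost: float   # mismatch against the target prosody (lower is better)
    join_cost: float      # discontinuity where units are joined (lower is smoother)

def pick_best(candidates, prosody_weight=0.5, join_weight=0.5):
    """Return the candidate with the lowest weighted sum of the two costs."""
    return min(
        candidates,
        key=lambda c: prosody_weight * c.prosody_cost + join_weight * c.join_cost,
    )

# Hypothetical candidates produced by concatenating different unit sequences
candidates = [
    Candidate("A", prosody_cost=0.2, join_cost=0.8),
    Candidate("B", prosody_cost=0.3, join_cost=0.3),
    Candidate("C", prosody_cost=0.1, join_cost=0.9),
]
best = pick_best(candidates)
```

Changing the weights shifts the choice: with equal weights the balanced candidate wins, whereas weighting prosody alone favors the candidate with the best prosody even if its joins are rough.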

FIG. 4 is a block diagram of a storage module according to an embodiment of the present invention.

Referring to FIG. 4, the storage module 140 basically stores the speech recognition DB and the translation DB and may additionally store a TTS engine. It also stores a server program corresponding to the client program stored in the user terminal 200, so that the interpretation apparatus 100 and the user terminal 200 can interoperate.

The speech recognition DB 141 according to an embodiment of the present invention is characterized in that it is built by using a deep learning algorithm to learn speech from a variety of utterances and by narrowing or widening the recognition range according to how frequently each utterance occurs. That is, the amount of data is relatively increased for recognizing frequent utterances and greatly reduced for recognizing infrequent ones.

A larger speech recognition DB 141 favors a higher recognition rate but brings latency and overload problems. With the above method, the total amount of data is reduced, so a low-capacity DB can be built.
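The frequency-based reduction described above may be pictured, in a much simplified form, as keeping only utterances that occur often enough. The threshold `min_count` and the sample log are assumptions for this sketch; the disclosure does not give a concrete criterion, and the actual DB is built by deep learning rather than by simple counting.

```python
from collections import Counter

def build_compact_db(utterances, min_count=2):
    """Keep only utterances seen at least min_count times, with their counts."""
    counts = Counter(utterances)
    return {text: n for text, n in counts.items() if n >= min_count}

# Hypothetical log of utterances collected for training
log = [
    "hello", "hello", "where is the station", "hello",
    "thank you", "thank you", "a rarely used phrase",
]
compact_db = build_compact_db(log)
```

Here the two frequent utterances are retained and the two that occur only once are dropped, which reduces the stored data at the cost of coverage for rare utterances.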

Likewise, the translation DB 142 may be built by using a deep learning algorithm to learn a variety of translation examples and, according to the frequency of those examples, expanding colloquial expressions and reducing literary (written-style) expressions.

Accordingly, the speech recognition module 110 and the translation module 120 according to the present invention can use a speech recognition DB or translation DB of lower capacity than a DB built without regard to frequency.

Here, the storage module 140 includes volatile RAM, non-volatile ROM, and flash memory, and stores various digital files according to function.

The input module 150 includes a microphone 151, a voice input button 152, and a power button 153.

FIG. 5 is a front view of the interpretation apparatus 100 according to an embodiment of the present invention.

Referring to FIG. 5, the microphone 151 receives the speaker's voice and converts it into an electrical signal, which may then be converted to digital form by an A/D converter. The positions of the speaker 161 and the microphone 151 are not limited to those shown in FIG. 5; they may also be placed close to the USB jack at the bottom.

The voice input button 152 may be used to determine when the speaker's voice is being input. The speech recognition module 110 must provide not only a transcription of the input speech but also information about sentence boundaries, and the voice input button 152 may be used for this purpose.

That is, the speaker can make the interpretation apparatus 100, specifically the speech recognition module 110, recognize the start and end of the input speech either by speaking while pressing or touching the voice input button 152, or by pressing or touching the button once at the start and once at the end of the utterance.
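The two ways of marking an utterance with the button may be sketched as follows. The event format, a time-ordered list of `(timestamp, kind)` pairs with `kind` equal to `"down"` or `"up"`, and the mode names `"hold"` and `"toggle"` are assumptions for this illustration.

```python
def utterance_spans(events, mode="hold"):
    """Return (start, end) times of utterances from button events.

    hold:   an utterance lasts from a press ("down") to the next release ("up").
    toggle: an utterance lasts from one press to the following press;
            releases are ignored.
    """
    spans, start = [], None
    for timestamp, kind in events:
        if mode == "hold":
            if kind == "down" and start is None:
                start = timestamp
            elif kind == "up" and start is not None:
                spans.append((start, timestamp))
                start = None
        else:
            if kind != "down":
                continue
            if start is None:
                start = timestamp
            else:
                spans.append((start, timestamp))
                start = None
    return spans

held = utterance_spans([(0.0, "down"), (2.5, "up"), (4.0, "down"), (5.0, "up")])
toggled = utterance_spans(
    [(0.0, "down"), (0.1, "up"), (3.0, "down"), (3.1, "up")], mode="toggle"
)
```

Each resulting span delimits one stretch of audio that can be handed to the speech recognition module as a single sentence.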

Translation performance may degrade when text that is not segmented into sentences is processed. To prevent this, the voice input button 152 may be used so that speech is input sentence by sentence at the stage preceding speech recognition.

The voice input button 152 may share a single button with the power button 153 described below. That is, with one button, a touch toggles voice input on and off, and a long or short press toggles the power on and off. In this case, a capacitor is preferably built into the touch surface to sense whether the button is being touched.

The power button 153 is pressed briefly or at length to apply power to the interpretation apparatus 100 and start it, or to shut it down. After Power On, the interpretation apparatus 100 may be in the Ready or Standby state, and its power consumption differs by state. The details are described below.

Referring again to FIG. 5, the output module 160 includes a speaker 161 and LED indicators 162, 163, and 164. The LED indicators comprise a battery status LED 162, a power on/off status LED 163, and an LED 164 indicating the communication connection with the user terminal 200.

The power module 170 includes a power source and a charging/discharging device for charging and/or discharging the power source. A battery may be used as the power source. The battery is charged and discharged by the charging/discharging device, and the discharged power serves as the energy source for driving the interpretation apparatus 100.

Power is applied to the interpretation apparatus 100 by means of the power button. State transitions of the interpretation apparatus 100 driven by the power button are described below.

FIG. 6 is an exemplary diagram showing state transitions of the interpretation apparatus according to an embodiment of the present invention.

Referring to FIG. 6, the interpretation apparatus 100, which is a portable server, can be in the Power On, Ready, Standby, Run, or Off state, and it switches between these states according to the operation of the power button and the function being performed.

In the Standby state, the communication module 180 is on and the control module 190 is asleep. The Ready state is characterized in that the control module 190 is switched from off to on.

When the power button is pressed in the Power Off state, the interpretation apparatus 100 switches to the Ready state. After waiting for a predetermined time, the communication module 180 turns on and the apparatus switches to the Standby state.

Next, when the control module 190 of the interpretation apparatus 100 and the control unit 270 of the user terminal 200 are linked, that is, when the control module is turned on, the interpretation apparatus 100 switches to the Ready state.

From the Ready state, the interpretation apparatus then enters the Run state, in which it performs a function according to a command. In the Ready state as well, the interpretation apparatus 100 may switch to the Standby state after waiting for a predetermined time.

During operation the apparatus is mostly in the Ready or Run state; when the power button is pressed and held in either of these states, the interpretation apparatus 100 switches to the Power Off state.
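The state transitions described with reference to FIG. 6 may be summarized as a small transition table. The event names are hypothetical labels chosen for this sketch, and the transition from Run back to Ready on completion of a command is an assumption that the text does not state.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("OFF", "power_press"): "READY",
    ("READY", "idle_timeout"): "STANDBY",
    ("STANDBY", "terminal_linked"): "READY",
    ("READY", "command"): "RUN",
    ("RUN", "command_done"): "READY",        # assumption: not stated in the text
    ("READY", "power_long_press"): "OFF",
    ("RUN", "power_long_press"): "OFF",
}

def next_state(state, event):
    """Return the new state; unknown (state, event) pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def run_events(events, state="OFF"):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = next_state(state, event)
    return state

final = run_events(["power_press", "idle_timeout", "terminal_linked", "command"])
```

Keeping the transitions in a table makes it easy to check that, for example, a command arriving in the Standby state has no effect until the terminal has been linked.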

The communication module 180 communicates with the user terminal 200 by wire or wirelessly. When the user terminal 200 functions as an input device and/or output device for the speaker on one side of the conversation, the interpretation apparatus 100 functions as an input device and/or output device for the other party. That is, the other party can input his or her voice into the dedicated interpretation apparatus 100 and can listen to the translated speech of the speaker through its output.

FIG. 7 is an exemplary diagram showing communication between the portable server 100 and the user terminal 200 according to an embodiment of the present invention.

Referring to FIG. 7, the interpretation apparatus 100, which corresponds to the portable server, and the user terminal 200 are paired over Bluetooth. In the embodiment shown in FIG. 7, the UI of the client program on the user terminal 200 may be used to set the source language to Korean and the target language to English. Because the speaker touches the microphone button labeled Korean or English before speaking, the interpretation apparatus 100 is relieved of the burden of identifying the language.

The control module 190 generates a confirmation message to let the user terminal 200 confirm that an operation performed by the interpretation apparatus 100, for example speech recognition or translation, has been completed. The generated confirmation message is transmitted to the user terminal 200 through the communication unit.

Overall, the control module is characterized in that it controls the driving of the portable server that is the interpretation apparatus 100, voice recording, transmission of the recorded data, and transmission of the confirmation message.

There is no guarantee that one party will finish speaking before the other begins. Utterances may therefore overlap, and the different voices input at the same time need to be distinguished. To this end, the control module 190 determines timbre features from the sound spectrum of each speaker's voice and uses those features to filter the simultaneously uttered speech in different languages. The speakers are thereby distinguished, and the voices in the different languages can be separated from one another by the filtering.
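Distinguishing speakers by timbre may be pictured, in simplified form, as assigning each short audio frame to the nearest stored timbre profile. The sketch assumes that every frame has already been reduced to a short spectral feature vector; the profile values are invented, and assigning whole frames is only a stand-in for the filtering of truly overlapping speech described above.

```python
import math

def nearest_speaker(frame, profiles):
    """Return the speaker whose timbre profile is closest (Euclidean distance) to the frame."""
    return min(profiles, key=lambda name: math.dist(frame, profiles[name]))

def split_by_speaker(frames, profiles):
    """Group frame indices by the closest speaker profile."""
    groups = {name: [] for name in profiles}
    for index, frame in enumerate(frames):
        groups[nearest_speaker(frame, profiles)].append(index)
    return groups

# Hypothetical three-band spectral profiles for two speakers
profiles = {"speaker_1": [0.8, 0.2, 0.1], "speaker_2": [0.1, 0.3, 0.9]}
frames = [[0.7, 0.2, 0.2], [0.2, 0.3, 0.8], [0.9, 0.1, 0.0]]
groups = split_by_speaker(frames, profiles)
```

The grouped frames can then be processed per speaker, so that each speaker's audio is recognized separately.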

Furthermore, the control module 190 is characterized in that, for simultaneously uttered speech in different languages, it determines which language each voice belongs to by using scores assigned to the translation results of sample speech.

Specifically, when English and Korean speech are input mixed together, the speech is first filtered into two signals according to the timbre features of the English speaker and the Korean speaker. Translation is then attempted in English and Korean for one signal and in Korean and English for the other, the results are converted into scores, and the translation with the highest score is adopted, which determines the language of each signal.
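The score-based decision may be sketched as follows. Because the disclosure does not define how the score is computed, the sketch substitutes a toy scorer that measures the share of tokens found in a small word list, and it operates on text rather than on audio; both are assumptions for illustration.

```python
def identify_language(sample, scorers):
    """Score the sample under every candidate language and return (best language, scores)."""
    scores = {lang: score(sample) for lang, score in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores

def vocabulary_scorer(vocabulary):
    """Toy stand-in for a translation-quality score: share of tokens found in a word list."""
    def score(sample):
        tokens = sample.lower().split()
        return sum(t in vocabulary for t in tokens) / len(tokens) if tokens else 0.0
    return score

# Hypothetical word lists standing in for the English and Korean models
scorers = {
    "en": vocabulary_scorer({"where", "is", "the", "station"}),
    "ko": vocabulary_scorer({"역이", "어디에", "있습니까"}),
}
language, scores = identify_language("Where is the station", scorers)
```

In the apparatus, each separated signal would be scored in this way under both candidate languages, and the higher-scoring result would fix the language of that signal.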

FIG. 8 is a block diagram of a user terminal according to an embodiment of the present invention.

Referring to FIG. 8, the user terminal 200 according to an embodiment of the present invention includes a communication unit 210, a display 220, a storage unit 230, an input unit 240, an output unit 250, a power supply unit 260, and a control unit 270.

Various embodiments of the user terminal 200 may include, but are not limited to, a cellular phone, a smartphone with wireless communication capability, a personal digital assistant (PDA) with wireless communication capability, a wireless modem, a portable computer with wireless communication capability, an imaging device such as a digital camera with wireless communication capability, a gaming device with wireless communication capability, a music storage and playback appliance with wireless communication capability, an Internet appliance capable of wireless Internet access and browsing, and portable units or terminals that integrate combinations of such functions.

The communication unit 210 may include communication modules corresponding to the various networks of the communication network 14, for example a Bluetooth module, a WiFi module, an Ethernet module, a USB module, and a cellular wireless communication module. In the embodiments of the present invention, it most preferably includes a wired communication unit such as a USB module and short-range communication modules such as a Bluetooth module, a Zigbee module, and an NFC module.

The display 220 is a device that shows a screen made up of pixels, such as an LCD or LED display.

The storage unit 230 stores a client program for linking the control unit 270 with the control module 190. The storage unit 230 includes volatile RAM, non-volatile ROM, and flash memory, and stores various digital files according to function. In particular, the storage unit 230 may store the TTS engine, so that the engine resides on the user terminal rather than on the portable server 100.

The input unit 240 includes a keyboard, a touch screen, and a mouse for setting various parameters.

The output unit 250 includes a speaker, a headset, and an earset. In particular, a headset or earset formed integrally with a microphone is useful for hands-free simultaneous interpretation.

The power supply unit 260 includes a power source and a charging/discharging device for charging and/or discharging the power source. A battery may be used as the power source. The battery is charged and discharged by the charging/discharging device, and the discharged power serves as the energy source for driving the user terminal 200.

In hardware terms, the control unit 270 may be implemented as a central processing unit (CPU). More specifically, it is a concept that combines the client program loaded in the storage unit 230 for interworking with the control module 190 of the interpretation apparatus 100 and the CPU that performs computation in that state.

The control unit 270 can operate the control module 190 so that the various functions of the interpretation apparatus 100 are performed.

According to an embodiment of the present invention, the languages to be handled by the interpretation apparatus 100 and the user terminal 200 may be set automatically by means of the client program. That is, the control unit 270, together with the control module 190, can refer to the language setting of the user terminal 200 and automatically set the source language, for example to Korean.

In addition, when user terminals 200 set to different languages are within a certain distance of each other, the source and target languages may be set automatically by referring to the language setting of each terminal.
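Deriving the language pair from the settings of two terminals may be sketched as follows. The locale string format (for example `ko_KR` or `en-US`) and the function name are assumptions for this illustration; the disclosure does not specify how the language setting is read or represented.

```python
def auto_language_pair(own_locale, peer_locale):
    """Map two locale strings such as 'ko_KR' and 'en-US' to (source, target) language codes."""
    source = own_locale.replace("-", "_").split("_")[0].lower()
    target = peer_locale.replace("-", "_").split("_")[0].lower()
    if source == target:
        # Both terminals use the same language, so there is nothing to interpret
        raise ValueError("both terminals are set to the same language")
    return source, target

pair = auto_language_pair("ko_KR", "en-US")
```

The resulting pair gives the source and target languages for one direction; swapping the two values gives the pair for the reverse direction of a two-way conversation.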

The control unit 270 also scores the quality of translation using a sample of the speech uttered through the interpretation apparatus 100 and automatically sets the language with the highest score as the target language.

An interpretation method according to an embodiment of the present invention is described below.

FIG. 9 is a flowchart of an interpretation method according to an embodiment of the present invention.

Referring to FIG. 9, the interpretation method according to an embodiment of the present invention is characterized in that it is performed by the interpretation apparatus 100, which corresponds to a portable server that functions as an input device and/or output device for at least one speaker among the parties to simultaneous interpretation.

The interpretation method includes: a step (S110) in which the portable server and the user terminal communicate by wire or wirelessly; a step (S120) in which the user terminal 200, which has a control unit, controls the portable server; a step (S130) of recognizing speech using the speech recognition module and speech recognition DB contained in the apparatus itself; a step (S140) of translating the recognized text using the translation module and translation DB contained in the apparatus itself; a step (S150) of outputting, through the playback module, synthesized speech corresponding to the voice of the one speaker according to the translation by the translation module; and a step (S160) in which the control module generates a confirmation message for confirming the operation completed by the interpretation apparatus 100 and transmits it to the user terminal 200.
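The order of steps S130 to S160 may be sketched as a simple pipeline. The four callables passed in are hypothetical stand-ins for the speech recognition module, the translation module, the speech synthesis and playback path, and the sending of the confirmation message; the message fields are likewise invented for this illustration.

```python
def interpret(audio, recognize, translate, synthesize, notify):
    """Run one utterance through recognition, translation, synthesis, and confirmation."""
    text = recognize(audio)              # S130: speech recognition
    translated = translate(text)         # S140: translation of the recognized text
    speech = synthesize(translated)      # S150: synthesized speech for playback
    notify({"status": "done", "text": text, "translation": translated})  # S160
    return speech

# Stub components used only to show the data flow
sent_messages = []
output = interpret(
    b"raw-audio",
    recognize=lambda audio: "안녕하세요",
    translate=lambda text: "Hello",
    synthesize=lambda text: f"<tts:{text}>",
    notify=sent_messages.append,
)
```

Because every stage is passed in as a callable, the same flow applies whether the speech synthesis runs on the interpretation apparatus or on the user terminal.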

Here, the step of recognizing speech (S130) is characterized in that the voice of the at least one speaker is received through the microphone, recorded through the speech recognition module, and recognized using the recorded data.

Here, the step of recognizing speech is characterized in that a voice input button is used to determine the time at which the user and/or the other party inputs speech.

Further, when the user terminal functions as an input device and/or output device for the one speaker, the portable server functions as an input device and/or output device for the other party.

Here, the step of outputting synthesized speech (S150) is characterized in that the TTS data transmitted from the user terminal 200 is played back and the synthesized speech is output.

Here, the controlling step is characterized by controlling the driving of the portable server 100, voice recording, transmission of the recorded data, and transmission of the confirmation message.

Here, the step of recognizing speech further includes determining timbre from the sound spectrum of the voice and distinguishing, by the determined timbre, the speakers of simultaneously uttered speech in different languages.

Here, the step of recognizing speech includes distinguishing the languages of simultaneously uttered speech in different languages by using scores assigned to the translation results of sample speech.

FIG. 10 is a flowchart of step S130 in the interpretation method according to an embodiment of the present invention.

Referring to FIG. 10, the step of recognizing speech includes: determining timbre from the sound spectrum of each speaker's voice and distinguishing, by the determined timbre, the speakers of simultaneously uttered speech in different languages; and distinguishing the languages of the simultaneously uttered speech by using scores assigned to the translation results of sample speech.

Sounds input to the interpretation apparatus 100 and to the user terminal 200, for example the speakers' voices, are preferably kept distinct from one another.

However, a first speaker using Korean and a second speaker using English may speak at the same time. In addition, the voice of the first speaker, which is input through the microphone of the user terminal 200 near the first speaker, may also be picked up by the interpretation apparatus 100, the portable server located near the second speaker, because of room reverberation or a high sound level.

In that case, speech in which the voices of the first and second speakers are mixed is input both through the interpretation apparatus 100 and through the user terminal 200. The interpretation apparatus 100, which must recognize the speech, therefore needs to distinguish and separate the mixed English and Korean speech before performing recognition.

To separate mixed speech, it is first necessary to determine which partial utterance within the mixture belongs to which speaker, that is, to distinguish the speakers. If the languages can also be distinguished, each identified speaker can then be matched with an identified language.

In summary, the interpretation apparatus and method according to an embodiment of the present invention perform interpretation using a portable server where no wireless communication environment is available, provide an input/output device for the other party, and improve speech recognition quality by taking the source and target languages as separate inputs. As a safeguard, they can also distinguish the speakers of simultaneously uttered different languages by timbre, using the sound spectrum, and distinguish the languages themselves by using scores for interpretation satisfaction, which corresponds to interpretation quality.

The present invention has been described above with reference to the embodiments shown in the drawings, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible. Accordingly, the technical scope of protection of the present invention should be determined by the following claims.

11, 200: user terminal 12, 300: interpretation server
13: database 14: communication network
100: interpretation apparatus 110: speech recognition module
120: translation module 130: playback module
140: storage module 141: speech recognition DB
142: translation DB 150: input module
151: microphone 152: voice input button
153: power button 160: output module
161: speaker 162: battery status LED
163: power on/off status LED
164: Bluetooth LED 170: power module
200: user terminal 210: communication unit
220: display unit 230: storage unit
240: input unit 250: output unit
260: power supply unit 270: control unit

Claims

An interpretation apparatus that performs bidirectional simultaneous interpretation as a portable server, comprising:
a translation module that is included in the apparatus itself and performs bidirectional translation; and
a storage module that stores a voice recognition DB and a translation DB,
wherein the portable server functions as an input device and/or an output device for at least one speaker among the parties to the simultaneous interpretation.

The interpretation apparatus according to claim 1,
further comprising a voice recognition module that records the voice of a speaker and performs bidirectional voice recognition using the recorded data and/or recorded data received from a user terminal.

The interpretation apparatus according to claim 1 or 2,
wherein the voice recognition DB is a database constructed by learning speech from various utterances using a deep learning algorithm and narrowing or widening the recognition range according to the frequency of the utterance content,
the translation DB is a database constructed by learning various translation examples using a deep learning algorithm and expanding colloquial expressions according to the frequency of the translation examples, and
the voice recognition engine or translation engine uses a voice recognition DB or translation DB of lower capacity than a DB built without taking frequency into account.

The interpretation apparatus according to claim 2,
further comprising a reproduction module that outputs a synthesized voice corresponding to a speaker's voice in accordance with the translation by the translation module.

The interpretation apparatus according to claim 2,
further comprising a voice input button for determining the voice input of the speaker and/or the other party.

The interpretation apparatus according to claim 5,
further comprising a communication module for wired or wireless communication with the user terminal,
wherein, when the user terminal functions as an input device and/or an output device for one of the speakers, the portable server functions as an input device and/or an output device for the other party.

The interpretation apparatus according to claim 6,
further comprising a control module that generates a confirmation message for confirming a completed operation and transmits the confirmation message to the user terminal.

The interpretation apparatus according to claim 6,
wherein the reproduction module reproduces TTS data transmitted from the user terminal and outputs the result to the speaker.

The interpretation apparatus according to claim 6,
further comprising a user terminal having a control unit for controlling the portable server.

The interpretation apparatus according to claim 9,
wherein the control module controls driving of the portable server, voice recording, transmission of recorded data, and transmission of the confirmation message.

The interpretation apparatus according to claim 10,
wherein the user terminal further includes a storage unit,
the storage unit stores a client program for linking the control unit and the control module, and
the storage module stores a server program corresponding to the client program.

The interpretation apparatus according to claim 6,
wherein the portable server further comprises a power button for switching among Standby, Ready, and Run states in relation to the user terminal and for switching between a Power On state and a Power Off state.

The interpretation apparatus according to claim 12,
wherein, in the Standby state of the portable server, the communication module is in an On state and the control module is in a Sleep state, and
in the Ready state, the control module is switched to the On state.

The interpretation apparatus according to claim 7,
wherein the control module determines a timbre using the sound spectrum of a speaker's voice and distinguishes the speakers of simultaneously uttered different-language speech according to the determined timbre.

The interpretation apparatus according to claim 14,
wherein the control module distinguishes the type of language of the simultaneously uttered different-language speech using scoring based on the translation result of sample speech.

An interpretation method performed by a portable server functioning as an input device and/or an output device for at least one speaker among the parties to simultaneous interpretation, the method comprising:
recognizing speech using a voice recognition module and a voice recognition DB included in the portable server itself; and
translating the recognized text using a translation module and a translation DB included in the portable server itself.

The method of claim 16,
wherein recognizing the speech comprises receiving the voice of the at least one speaker through a microphone, and
the voice recognition module records the voice and performs voice recognition using the recorded data.

The method of claim 17 or 18,
wherein the voice recognition DB is a database constructed by learning speech from various utterances using a deep learning algorithm and narrowing or widening the recognition range according to the frequency of the utterance content,
the translation DB is a database constructed by learning various translation examples using a deep learning algorithm and expanding colloquial expressions according to the frequency of the translation examples, and
the voice recognition engine or translation engine uses a voice recognition DB or translation DB of lower capacity than a DB built without taking frequency into account.

The method of claim 18,
further comprising outputting, via a reproduction module, a synthesized voice corresponding to the voice of one of the speakers in accordance with the translation by the translation module.

The method of claim 18,
wherein recognizing the speech comprises determining the voice input point of the user and/or the other party using a voice input button.

The method of claim 20,
further comprising performing wired or wireless communication between the portable server and a user terminal,
wherein, when the user terminal functions as an input device and/or an output device for one of the speakers,
the portable server functions as an input device and/or an output device for the other party.

The method of claim 21,
further comprising generating, by a control module, a confirmation message for confirming a completed operation and transmitting the confirmation message to the user terminal.

The method of claim 21,
wherein outputting the synthesized voice comprises reproducing TTS data transmitted from the user terminal and outputting the synthesized voice.

The method of claim 21,
further comprising controlling the portable server by the user terminal, which has a control unit.

The method of claim 24,
wherein the controlling comprises controlling the operation of the portable server, voice recording, transmission of recorded data, and transmission of the confirmation message.

The method of claim 25,
wherein the user terminal further includes a storage unit,
the storage unit stores a client program for linking the control unit and the control module, and
the storage module stores a server program corresponding to the client program.

The method of claim 21,
wherein the portable server switches among Standby, Ready, and Run states in relation to the user terminal, and switches between a Power On state and a Power Off state, by means of a power button.

The method of claim 25,
wherein, in the Standby state of the portable server, the communication module is in an On state and the control module is in a Sleep state, and
in the Ready state, the control module is switched to the On state.

The method of claim 22,
wherein recognizing the speech comprises determining a timbre using the sound spectrum of a speaker's voice and distinguishing the speakers of simultaneously uttered different-language speech according to the determined timbre.

The method of claim 29,
wherein recognizing the speech further comprises distinguishing the type of language of the simultaneously uttered different-language speech using scoring based on the translation result of sample speech.