KR20210121818A

KR20210121818A - Method for Provide Real-Time Simultaneous Interpretation Service between Conversators

Info

Publication number: KR20210121818A
Application number: KR1020200039150A
Authority: KR
Inventors: 최우열
Original assignee: 조선대학교산학협력단
Priority date: 2020-03-31
Filing date: 2020-03-31
Publication date: 2021-10-08
Also published as: KR102344645B1

Abstract

The present invention relates to a method for providing a real-time simultaneous interpretation service between interlocutors. The method comprises the steps of: receiving an audio signal including speech data; converting the received audio signal into a first text using a speech-to-text (STT) model; translating the converted first text into a preset language to generate a second text; converting the generated second text into a final speech signal using a text-to-speech (TTS) model; and transmitting at least one of the converted final speech signal and the received audio signal to at least one external device.

Description

Method for Provide Real-Time Simultaneous Interpretation Service between Conversators

The present invention relates to a method for providing a real-time simultaneous interpretation service, and more particularly, to a method for providing a real-time simultaneous interpretation service between interlocutors.

Recent advances in information and communication technology have made information services in many fields available over data communication networks. These advances tie the world together: information is exposed worldwide in real time, and reactions to it are likewise shared in real time. In particular, smartphone application development has become active, and a wide variety of smartphone applications are being released.

These applications allow a single smartphone terminal to provide functions such as games, Internet banking, messaging, Internet search, and navigation, so that one smartphone covers functions that were previously provided by separate devices.

Meanwhile, smartphone applications with a translation function have recently been released. However, conventional translation applications for smartphones are mainly Internet-based services built around a server-side database, because the translation database is large and storing it on the smartphone would take up too much memory.

An Internet-based translation service can be used domestically without much concern for data usage, but when it is used abroad, data charges become a heavy burden. Since translation and interpretation services are used mainly abroad, for example on overseas business trips, this inevitably limits their free use.

[Patent Document 1] Korean Patent Registration No. 10-1695396, registered on 2017.01.05.

The present invention was devised to solve the problems described above, and its object is to provide a method for providing a real-time simultaneous interpretation service between interlocutors.

The objects of the present invention are not limited to the object mentioned above, and other objects not mentioned will be clearly understood from the description below.

To achieve the above objects, a method for providing a real-time simultaneous interpretation service between interlocutors according to an embodiment of the present invention is disclosed. The method may include: receiving an audio signal including speech data; converting the received audio signal into a first text using a speech-to-text (STT) model; translating the converted first text into a preset language to generate a second text; converting the generated second text into a final speech signal using a text-to-speech (TTS) model; and transmitting at least one of the converted final speech signal and the received audio signal to at least one external device.

In addition, according to an embodiment of the present invention, receiving the audio signal may include filtering the received audio signal using an adaptive digital filter having preset digital filter coefficients.

In addition, according to an embodiment of the present invention, the method may further include, after receiving the audio signal, generating a cancellation voice signal having a 180-degree phase difference from the waveform of the received audio signal, and transmitting the generated cancellation voice signal to the external device.

In addition, according to an embodiment of the present invention, generating the cancellation voice signal may include amplifying the received audio signal and analyzing the waveform of the audio signal.

In addition, according to an embodiment of the present invention, the cancellation voice signal may reach the external device before the received audio signal.

In addition, according to an embodiment of the present invention, the method may further include, after receiving the audio signal, extracting voice feature information from the voice included in the received audio signal.

In addition, according to an embodiment of the present invention, extracting the voice feature information may be a step of extracting at least one of timbre information, pitch information, vocal intensity information, speech rate (speed) information, and vocal tract feature information of the voice included in the received audio signal.

In addition, according to an embodiment of the present invention, converting into the final speech signal may include generating an output voice corresponding to the extracted feature information, and converting the generated output voice into the final speech signal through the text-to-speech model.

In addition, according to an embodiment of the present invention, transmitting to the external device may be a step of transmitting the converted final speech signal and the cancellation voice signal to at least one external device.

In addition, according to an embodiment of the present invention, generating the second text may include: analyzing morphemes of the first text according to a pre-built morpheme dictionary; analyzing the syntax of the first text according to a pre-built grammar dictionary; and translating the first text into the preset language based on the analyzed morphemes and syntax to generate the second text.

In addition, according to an embodiment of the present invention, the method may further include, after generating the second text, correcting the generated second text using artificial intelligence translation software.

In addition, according to an embodiment of the present invention, in transmitting to the external device, when the received audio signal is transmitted, the cancellation voice signal may be transmitted to the external device simultaneously with the received audio signal.

Specific details for achieving the above objects will become clear with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be embodied in various different forms. The embodiments are provided so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the present invention pertains (hereinafter, "a person skilled in the art").

According to an embodiment of the present invention, by generating and transmitting the cancellation voice signal, the speaker's original voice is removed and the other party hears only the translated voice, which prevents the confusion caused by hearing the original voice and the translated voice at the same time.

In addition, according to an embodiment of the present invention, by extracting the speaker's voice features, synthesizing them into the translated voice, and transmitting the result to the other party, the sense of unfamiliarity between speaker and listener can be minimized.

In addition, according to an embodiment of the present invention, translation accuracy can be improved by using artificial intelligence to supplement the flow and relevance of the conversation.

The effects of the present invention are not limited to those described above, and potential effects expected from the technical features of the present invention will be clearly understood from the description below.

So that the above-mentioned features of the present invention may be understood in detail through a more specific description with reference to the following embodiments, some of the embodiments are illustrated in the accompanying drawings. Like reference numerals in the drawings are intended to refer to the same or similar functions throughout the various aspects. It should be noted, however, that the accompanying drawings illustrate only certain typical embodiments of the present invention and are not to be considered as limiting its scope; other embodiments having the same effect may well be recognized.
FIG. 1 is a flowchart of a method for providing a real-time simultaneous interpretation service between interlocutors according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a signal transmission condition for removing the original voice according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a real-time simultaneous interpretation system between interlocutors according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

The various features of the invention disclosed in the claims may be better understood in view of the drawings and the detailed description. The apparatuses, methods, processes, and various embodiments disclosed in the specification are provided for illustration. The disclosed structural and functional features are intended to enable a person skilled in the art to practice the various embodiments, not to limit the scope of the invention. The terms and sentences used are intended to explain the various features of the disclosed invention in an easily understood way, not to limit the scope of the invention.

In describing the present invention, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the present invention.

Hereinafter, a method for providing a real-time simultaneous interpretation service between interlocutors according to an embodiment of the present invention will be described.

FIG. 1 is a flowchart of a method for providing a real-time simultaneous interpretation service between interlocutors according to an embodiment of the present invention. FIG. 2 is a diagram illustrating a signal transmission condition for removing the original voice according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the simultaneous interpretation service providing method (S100) may include: receiving an audio signal including speech data (S101); converting the received audio signal into a first text using a speech-to-text (STT) model (S103); translating the converted first text into a preset language to generate a second text (S105); converting the generated second text into a final speech signal using a text-to-speech (TTS) model (S107); and transmitting at least one of the converted final speech signal and the received audio signal to at least one external device (S109).
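The order of steps S103 to S107 can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the names `interpret` and `InterpretationResult` are hypothetical, and the STT, translation, and TTS models are passed in as plain callables because the specification does not name any particular engine.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]  # assumption: audio is represented as a list of PCM samples


@dataclass
class InterpretationResult:
    first_text: str      # output of S103 (STT)
    second_text: str     # output of S105 (translation)
    final_speech: Audio  # output of S107 (TTS)


def interpret(
    audio: Audio,
    stt: Callable[[Audio], str],
    translate: Callable[[str, str], str],
    tts: Callable[[str], Audio],
    target_language: str,
) -> InterpretationResult:
    """Run S103 -> S105 -> S107 on audio that was received in S101."""
    first_text = stt(audio)                               # S103
    second_text = translate(first_text, target_language)  # S105
    final_speech = tts(second_text)                       # S107
    return InterpretationResult(first_text, second_text, final_speech)
```

Transmission (S109) is left out of the sketch; the caller would send `final_speech`, optionally together with the original audio, to the external device.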

In an embodiment, receiving the audio signal (S101) may be a step of receiving an audio signal including speech data.

More specifically, receiving the audio signal (S101) may further include filtering the audio signal including speech data using an adaptive digital filter having preset digital filter coefficients. That is, when an audio signal including speech data is input in step S101, the received audio signal may be filtered with the adaptive digital filter, whose coefficients are set in advance, to remove noise contained in the signal.
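The specification does not say which adaptive algorithm is used. As one possible reading, the sketch below uses a least-mean-squares (LMS) noise canceller whose weights start from the preset coefficients. The function name `lms_denoise`, the step size `mu`, and the availability of a separate noise reference channel are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def lms_denoise(
    primary: Sequence[float],
    reference: Sequence[float],
    preset_coeffs: Sequence[float],
    mu: float = 0.05,
) -> Tuple[List[float], List[float]]:
    """LMS noise canceller.

    primary       -- speech plus noise (the received audio signal)
    reference     -- noise-only reference channel (an assumption of this sketch)
    preset_coeffs -- initial filter weights, i.e. the preset coefficients
    Returns (cleaned samples, final weights).
    """
    if len(primary) != len(reference):
        raise ValueError("primary and reference must have the same length")
    if not preset_coeffs:
        raise ValueError("at least one preset coefficient is required")

    weights = list(preset_coeffs)
    history = [0.0] * len(weights)  # recent reference samples, newest first
    cleaned: List[float] = []

    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error is the noise-reduced output sample
        # LMS update: move each weight along the error gradient.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)

    return cleaned, weights
```

Starting from preset coefficients rather than zeros lets the filter begin near a known-good state and adapt from there as the noise changes.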

In addition, the audio signal may be input through an input device of the interpreter device that performs the method. The input device may include wireless or wired earphones.

In addition, the simultaneous interpretation service providing method (S100) may further include, after receiving the audio signal, generating a cancellation voice signal having a 180-degree phase difference from the waveform of the received audio signal, and transmitting the generated cancellation voice signal to the external device.

In addition, generating the cancellation voice signal may be a step of amplifying the audio signal received in step S101, analyzing the audio signal through waveform analysis of the amplified signal, and generating a signal corresponding to the analyzed waveform.

In addition, the cancellation voice signal may be a signal having a 180-degree phase difference from the waveform of the audio signal received in step S101. Accordingly, when the cancellation voice signal is output through the output device of the external device, the audio signal containing the original speech data and the cancellation voice signal cancel each other, so that the original audio signal is effectively blocked for the user of the external device.
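For sampled audio, a 180-degree phase difference at every frequency is the same as inverting the polarity of each sample. The following minimal sketch shows that the sum of a signal and its inverted copy is silence, and that the cancellation degrades once the two are misaligned in time. The function names and the 200 Hz test tone are illustrative assumptions, not part of the specification.

```python
import math
from typing import List, Sequence


def make_cancellation_signal(samples: Sequence[float]) -> List[float]:
    """Invert polarity, which shifts every frequency component by 180 degrees."""
    return [-s for s in samples]


def mix(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Sample-wise sum, as heard when both signals are played together."""
    return [x + y for x, y in zip(a, b)]


def delay(samples: Sequence[float], n: int) -> List[float]:
    """Delay a signal by n samples, padding the start with silence."""
    return [0.0] * n + list(samples[: len(samples) - n])


# A 200 Hz tone sampled at 8 kHz stands in for the speaker's original voice.
original = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(80)]
cancellation = make_cancellation_signal(original)

aligned = mix(original, cancellation)               # every sample is zero
misaligned = mix(original, delay(cancellation, 5))  # residual sound remains
```

The misaligned case shows why the arrival timing of the cancellation voice signal matters for the cancellation to work.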

In addition, in transmitting the generated cancellation voice signal to the external device, the cancellation voice signal may reach the external device before the received audio signal.

In addition, the simultaneous interpretation service providing method (S100) may further include, after receiving the audio signal, analyzing the speech data included in the received audio signal and extracting feature information of the analyzed speaker's voice.

In addition, extracting the voice feature information may be a step of analyzing the speaker's voice included in the received audio signal and extracting at least one of timbre information, pitch information, vocal intensity information, speech rate (speed) information, and vocal tract feature information of that voice. That is, extracting the voice feature information may be a step of extracting the features of the voice in order to clone the speaker's voice.
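Two of the listed features, pitch and vocal intensity, can be estimated with very little code. The sketch below uses root-mean-square amplitude for intensity and an autocorrelation peak search for pitch. These are common textbook estimators chosen for illustration; the specification does not state which estimators are used, and the function names and the 50 to 500 Hz search range are assumptions.

```python
import math
from typing import Sequence


def rms_intensity(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude as a simple intensity measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(
    samples: Sequence[float],
    sample_rate: int,
    fmin: float = 50.0,
    fmax: float = 500.0,
) -> float:
    """Estimate the fundamental frequency from the autocorrelation peak.

    Returns 0.0 when no positive correlation peak is found in the search range.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

Timbre, speech rate, and vocal tract features would need spectral analysis and are outside the scope of this sketch.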

In an embodiment, converting into the first text (S103) may be a step of converting the received audio signal into the first text using a speech-to-text (STT) model.

Here, a speech-to-text (STT) model refers to processing in which a computer interprets spoken language and converts its content into text data. With an STT model, text can be entered by voice instead of through an input device such as a keyboard.

More specifically, converting into the first text (S103) may be a step of converting the speech data contained in the received audio signal, that is, the speaker's voice, into text using the STT model.

In an embodiment, generating the second text (S105) may be a step of translating the first text converted in step S103 into the preset language to generate the second text.

More specifically, generating the second text (S105) may include: analyzing morphemes of the first text based on a pre-built morpheme dictionary; performing syntactic analysis based on a pre-built grammar dictionary; and translating the first text into the preset language according to the result of the syntactic analysis to generate the second text.
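The three sub-steps can be pictured with a toy dictionary-driven example. Everything here is hypothetical: the two dictionaries hold a single Korean sentence pattern, whole space-separated words are looked up instead of true morphemes, and a real system would need a far richer analysis. The point is only to show the order of dictionary lookup, syntactic analysis, and generation.

```python
from typing import Dict, List, Tuple

# Hypothetical pre-built morpheme dictionary: source word -> (target word, role).
MORPHEME_DICT: Dict[str, Tuple[str, str]] = {
    "나는": ("I", "SUBJ"),
    "밥을": ("rice", "OBJ"),
    "먹는다": ("eat", "VERB"),
}

# Hypothetical pre-built grammar dictionary: source role order -> target role order.
GRAMMAR_DICT: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("SUBJ", "OBJ", "VERB"): ("SUBJ", "VERB", "OBJ"),
}


def analyze_morphemes(first_text: str) -> List[Tuple[str, str]]:
    """Look up each word of the first text in the morpheme dictionary."""
    result = []
    for token in first_text.split():
        if token not in MORPHEME_DICT:
            raise KeyError(f"unknown word: {token}")
        result.append(MORPHEME_DICT[token])
    return result


def analyze_syntax(morphemes: List[Tuple[str, str]]) -> Tuple[str, ...]:
    """Map the source role order to the target-language role order."""
    roles = tuple(role for _, role in morphemes)
    return GRAMMAR_DICT.get(roles, roles)  # keep the source order if unknown


def generate_second_text(first_text: str) -> str:
    """Translate by reordering dictionary translations per the grammar rule."""
    morphemes = analyze_morphemes(first_text)
    target_order = analyze_syntax(morphemes)
    by_role = {role: word for word, role in morphemes}
    return " ".join(by_role[role] for role in target_order)
```

With these toy dictionaries, `generate_second_text("나는 밥을 먹는다")` reorders subject-object-verb into subject-verb-object and yields "I eat rice".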

In addition, the method may further include, after generating the second text (S105), correcting the generated second text using artificial intelligence translation software. Correcting the second text generated in step S105 improves its quality. That is, the translated second text may be supplemented using artificial intelligence technology so that it matches the original conversation content and situational information, giving the impression of conversing in the same language in real time.

In an embodiment, converting into the final speech signal (S107) may be a step of converting the generated second text into the final speech signal using a text-to-speech (TTS) model.

More specifically, the simultaneous interpretation service providing method (S100) may further include, after the audio signal is received in step S101, analyzing the speaker's voice included in the received audio signal and extracting feature information of the analyzed voice.

In addition, the voice feature information may include at least one of timbre information, pitch information, vocal intensity information, speech rate (speed) information, and vocal tract feature information obtained by analyzing the speaker's voice included in the received audio signal. That is, the voice feature information contains the features needed to clone the speaker's voice.

In addition, converting into the final speech signal (S107) may include generating an output voice corresponding to the extracted feature information of the speaker's voice, and converting the second text into the final speech signal using a TTS model to which the generated output voice is applied. That is, when the interlocutors use different languages, the speaker's voice is cloned and the translated text is played to the listener in the cloned voice, which encourages natural conversation.

In an embodiment, transmitting to the external device (S109) may be a step of transmitting at least one of the final speech signal converted in step S107 and the received audio signal to at least one external device. Transmitting to the external device (S109) may also be a step of transmitting the converted final speech signal and the cancellation voice signal to at least one external device.

More specifically, in step S109, the final speech signal converted in step S107 or the audio signal received in step S101 may be transmitted to the external device. Alternatively, the final speech signal converted in step S107 and the audio signal received in step S101 may be transmitted to the external device simultaneously.

In addition, in step S109, when the audio signal received in step S101 is transmitted to the external device, the cancellation voice signal corresponding to the received audio signal may be transmitted to the external device simultaneously with it. That is, when the received audio signal is transmitted to the external device, the cancellation voice signal is sent along with it to cancel the original audio signal, so that the listener cannot hear the original and hears only the final speech signal.

Here, the cancellation voice signal may have a 180-degree phase difference from the waveform of the received audio signal. The audio signal and the cancellation voice signal therefore cancel each other through destructive interference, erasing the signal that the listener does not need to hear.

In addition, the generated cancellation voice signal may reach the external device no later than the received audio signal. Because the cancellation voice signal is generated and transmitted in order to remove the originally received audio signal, which the listener does not need to hear, it is made to arrive at the external device at least as early as that audio signal.
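The arrival condition can be written as a simple inequality on send time plus link latency. The sketch below checks that condition and computes how long the original audio would have to be held back to satisfy it. The function names and the millisecond parameters are hypothetical, and treating equal arrival times as acceptable is an assumption of this sketch.

```python
def arrives_in_time(
    cancel_sent_ms: float,
    cancel_latency_ms: float,
    original_sent_ms: float,
    original_latency_ms: float,
) -> bool:
    """True if the cancellation signal arrives no later than the original audio."""
    cancel_arrival = cancel_sent_ms + cancel_latency_ms
    original_arrival = original_sent_ms + original_latency_ms
    return cancel_arrival <= original_arrival


def min_hold_ms(
    cancel_ready_ms: float,
    cancel_latency_ms: float,
    original_ready_ms: float,
    original_latency_ms: float,
) -> float:
    """Smallest extra delay for the original audio so the condition holds."""
    cancel_arrival = cancel_ready_ms + cancel_latency_ms
    original_arrival = original_ready_ms + original_latency_ms
    return max(0.0, cancel_arrival - original_arrival)
```

For example, if generating the cancellation signal takes 5 ms and both links have 20 ms latency, the original audio would need to be held for at least 5 ms.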

FIG. 3 is a schematic diagram of a real-time simultaneous interpretation system between interlocutors according to an embodiment of the present invention.

Referring to FIG. 3, the simultaneous interpretation service providing apparatus may include a memory, a communication unit, a processor, and an output unit.

In an embodiment, the memory may store programs for the processing and control performed by the processor, and may also store data input to or output from the simultaneous interpretation service providing apparatus.

The memory may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.

일 실시예에서, 통신부는 동시통역 서비스제공 장치가 다른 장치 및 서버와 통신을 하게 하는 하나 이상의 구성요소를 포함할 수 있다. 다른 장치는 동시통역 서비스제공 장치와 같은 컴퓨팅 장치이거나, 센싱 장치일 수 있으나, 이에 제한되지 않는다. 예를 들어, 통신부는, 근거리 통신부, 이동 통신부를 포함할 수 있다. In one embodiment, the communication unit may include one or more components that allow the simultaneous interpretation service providing apparatus to communicate with other apparatuses and servers. The other device may be a computing device, such as a simultaneous interpretation service providing device, or a sensing device, but is not limited thereto. For example, the communication unit may include a short-distance communication unit and a mobile communication unit.

The short-range wireless communication unit may include, but is not limited to, a Bluetooth communication unit, a Bluetooth Low Energy (BLE) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an Infrared Data Association (IrDA) communication unit, a Wi-Fi Direct (WFD) communication unit, and an ultra-wideband (UWB) communication unit. The mobile communication unit transmits and receives radio signals to and from at least one of a base station, an external terminal, and a server on a mobile communication network.

In an embodiment, the processor typically controls the overall operation of the server. For example, the processor may control the DB, the communication unit, and the like as a whole by executing programs stored in the DB of the server. In addition, by executing the programs stored in the DB, the processor may perform some of the operations of the simultaneous interpretation service providing apparatus described with reference to FIGS. 1 to 6.

In an embodiment, the output unit includes an audio output for presenting information processed by the simultaneous interpretation service providing apparatus. The signals associated with the user's voice, such as the received audio signal, the cancellation voice signal, and the final voice signal, may be listened to through a listening means such as audio equipment or earphones.

The above description merely illustrates the technical idea of the present invention, and those skilled in the art may make various changes and modifications without departing from the essential characteristics of the present invention.

Accordingly, the embodiments disclosed in this specification are intended to explain, not to limit, the technical idea of the present invention, and the scope of the present invention is not limited by these embodiments.

The scope of protection of the present invention should be interpreted according to the claims, and all technical ideas within a scope equivalent thereto should be understood as falling within the scope of rights of the present invention.

Claims

A method for providing a simultaneous interpretation service, the method comprising:
receiving an audio signal including voice data;
converting the received audio signal into a first text using a speech-to-text (STT) model;
generating a second text by translating the converted first text into a preset language;
converting the generated second text into a final voice signal using a text-to-speech (TTS) model; and
transmitting at least one of the converted final voice signal or the received audio signal to at least one external device.
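
As an illustration only, the sequence of steps recited above can be sketched as a small pipeline. The function names `interpret` and `select_payloads`, and the three stand-in models `fake_stt`, `fake_translate`, and `fake_tts`, are hypothetical placeholders introduced for this sketch; the publication does not name any particular STT, translation, or TTS engine.

```python
from typing import Callable, List, Optional


def interpret(
    audio: bytes,
    stt: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    tts: Callable[[str], bytes],
    target_language: str,
) -> bytes:
    """Run audio -> first text -> second text -> final voice signal."""
    first_text = stt(audio)                               # speech-to-text
    second_text = translate(first_text, target_language)  # preset language
    return tts(second_text)                               # text-to-speech


def select_payloads(
    final_speech: Optional[bytes], original_audio: Optional[bytes]
) -> List[bytes]:
    """Return the signals to transmit; at least one must be present."""
    payloads = [p for p in (final_speech, original_audio) if p is not None]
    if not payloads:
        raise ValueError("at least one signal must be transmitted")
    return payloads


# Stand-in models (assumptions for the sketch, not real engines).
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the models in as callables keeps the sketch independent of any specific engine; a real system would replace the stand-ins with actual model calls and a network sender.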

The method of claim 1,
wherein receiving the audio signal comprises filtering the received audio signal using an adaptive digital filter having preset digital filter coefficients.
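
The publication does not specify which adaptive algorithm is used. Purely as an assumed example, the sketch below uses a least-mean-squares (LMS) FIR filter that starts from preset coefficients and updates them against a reference ("desired") signal. The class name `LmsFilter`, the step size, and the existence of a reference signal are all assumptions of this sketch.

```python
from typing import List, Sequence


class LmsFilter:
    """FIR filter that starts from preset coefficients and adapts by LMS."""

    def __init__(self, preset_coefficients: Sequence[float], step_size: float = 0.1):
        self.weights: List[float] = list(preset_coefficients)
        self.step_size = step_size
        # Most recent input sample first.
        self.history: List[float] = [0.0] * len(self.weights)

    def process(self, sample: float, desired: float) -> float:
        """Filter one sample, then nudge the weights toward the desired output."""
        self.history = [sample] + self.history[:-1]
        output = sum(w * x for w, x in zip(self.weights, self.history))
        error = desired - output
        # LMS update: w <- w + mu * error * x
        self.weights = [
            w + self.step_size * error * x
            for w, x in zip(self.weights, self.history)
        ]
        return output
```

With `step_size=0.0` the filter behaves as a fixed FIR filter using only the preset coefficients; a positive step size lets the coefficients drift toward whatever minimizes the error against the reference.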

The method of claim 1, further comprising, after receiving the audio signal:
generating a cancellation voice signal having a phase difference of 180 degrees from a waveform of the received audio signal; and
transmitting the generated cancellation voice signal to the external device.
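
For a sampled waveform, a 180-degree phase difference corresponds to inverting the sign of every sample, so that the sum of the original and the inverted signal is zero. The following minimal sketch shows that relationship; the function names and the 440 Hz, 8 kHz test tone are illustrative assumptions, not values from the publication.

```python
import math
from typing import List, Sequence


def sine_wave(freq_hz: float, sample_rate: int, count: int, phase: float = 0.0) -> List[float]:
    """Generate `count` samples of a unit-amplitude sine tone."""
    return [
        math.sin(2.0 * math.pi * freq_hz * n / sample_rate + phase)
        for n in range(count)
    ]


def make_cancellation_signal(samples: Sequence[float]) -> List[float]:
    """Invert each sample, i.e. shift every frequency component by 180 degrees."""
    return [-s for s in samples]


def mix(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Sum two equally long signals sample by sample."""
    return [x + y for x, y in zip(a, b)]
```

In practice the cancellation only works to the extent that the two signals are time-aligned at the listener, which is why the arrival timing of the cancellation signal matters.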

The method of claim 3,
wherein generating the cancellation voice signal comprises amplifying the received audio signal to analyze the waveform of the audio signal.

The method of claim 3,
wherein the cancellation voice signal arrives at the external device earlier than the received audio signal.

The method of claim 1, further comprising, after receiving the audio signal,
extracting characteristic information of a voice from the voice included in the received audio signal.

The method of claim 6,
wherein extracting the characteristic information of the voice comprises extracting at least one of tone information, pitch information, vocal intensity information, speech rate information, and vocal tract characteristic information of the voice included in the received audio signal.
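
The publication does not describe how each characteristic is computed. As a rough sketch under that caveat, two of the listed characteristics can be estimated with textbook techniques: vocal intensity as the root-mean-square (RMS) amplitude and pitch via the autocorrelation peak. The function names and the 50 to 500 Hz search range are assumptions; tone, speech rate, and vocal tract characteristics are not covered here.

```python
import math
from typing import Sequence


def rms_intensity(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude as a simple intensity measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(
    samples: Sequence[float],
    sample_rate: int,
    min_hz: float = 50.0,
    max_hz: float = 500.0,
) -> float:
    """Estimate pitch as sample_rate / lag of the strongest autocorrelation peak."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0
```

Real speech is not a pure tone, so a production system would typically window the signal, normalize the autocorrelation, and handle unvoiced frames; this sketch omits those steps.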

The method of claim 6,
wherein converting into the final voice signal comprises:
generating an output voice corresponding to the extracted characteristic information; and
converting the generated output voice into the final voice signal through the text-to-speech model.

The method of claim 1,
wherein transmitting to the external device comprises transmitting the converted final voice signal and the cancellation voice signal to at least one external device.

The method of claim 1,
wherein generating the second text comprises:
analyzing morphemes of the first text according to the condensed morpheme dictionary;
analyzing the syntax of the first text according to the constructed grammar dictionary; and
generating the second text by translating the first text into the preset language based on the analyzed morphemes and syntax.

The method of claim 10, further comprising, after generating the second text,
correcting the generated second text using artificial intelligence translation software.

The method of claim 3,
wherein, in transmitting to the external device, when the received audio signal is transmitted, the cancellation voice signal is transmitted to the external device at the same time as the received audio signal.