KR101165906B1

KR101165906B1 - Voice-text converting relay apparatus and control method thereof

Info

Publication number: KR101165906B1
Application number: KR1020100096893A
Authority: KR
Inventors: 윤용진; 양장모
Original assignee: 주식회사 엘지유플러스 (LG Uplus Corp.)
Priority date: 2010-10-05
Filing date: 2010-10-05
Publication date: 2012-07-13
Also published as: KR20120035400A

Abstract

The present invention relates to a voice-to-text conversion relay apparatus and a control method thereof. A control method of the voice-to-text conversion relay apparatus according to the present invention comprises: receiving, from a mobile communication terminal, a text conversion request signal including a voice signal; transmitting the voice signal included in the text conversion request signal to a conversion processing server by using a pre-installed API (Application Programming Interface), to request its conversion into text; receiving, from the conversion processing server, the text converted from the voice signal; and transmitting the received text to the mobile communication terminal.

Description

VOICE-TEXT CONVERTING RELAY APPARATUS AND CONTROL METHOD THEREOF

The present invention relates to a voice-to-text conversion relay apparatus and a control method thereof, and more particularly, to a conversion relay apparatus for converting a voice signal transmitted from a mobile communication terminal into text, and a control method thereof.

With the development of mobile communication technology, the methods for entering text on mobile communication terminals have become increasingly diverse.

Conventionally, text was displayed on the screen of a mobile communication terminal when the user directly pressed buttons on the terminal's keypad. More recently, however, techniques have been disclosed that detect the user's voice and display the text corresponding to that voice on the terminal screen.

Such voice-to-text conversion, however, is typically performed on a server rather than on the mobile communication terminal itself. One reason is that the hardware of a mobile communication terminal is not well suited to converting voice into text. Another is that voice-to-text conversion relies on comparative analysis against a vast database that must be updated frequently, and maintaining and frequently updating such a large database is more efficient on a server than on a mobile communication terminal.

However, a conversion server that converts voice signals into text has the drawback that it responds only to requests from particular mobile communication terminals.

That is, the conversion server provides a predetermined API (Application Programming Interface) to mobile communication terminals running a specific platform, and offers a service that converts a voice signal into text in response to request signals made through that API.

Accordingly, to request a conversion server to convert a voice signal into text, the mobile communication terminal must be equipped with the API provided by that conversion server. However, there are many kinds of conversion servers as well as many mobile terminal platforms. Equipping every mobile communication terminal with an API for each conversion server is inefficient, and the API for any given conversion server cannot be made available on every platform. As a result, a mobile communication terminal for which the API is not provided cannot use that conversion server to convert voice signals into text.

The present invention has been devised to solve the above-described conventional problems, and its object is to provide a voice-to-text conversion relay apparatus, and a control method thereof, that enable various types of mobile communication terminals to use the voice-to-text conversion services provided by various types of conversion servers.

To achieve the above object, a voice-to-text conversion relay apparatus according to the present invention comprises: a request receiving unit that receives, from a mobile communication terminal, a text conversion request signal including a voice signal; a conversion request unit that transmits the voice signal included in the text conversion request signal to a conversion processing server by using a pre-installed API (Application Programming Interface), to request its conversion into text; a response receiving unit that receives, from the conversion processing server, the text converted from the voice signal; and a result transmitting unit that transmits the text received by the response receiving unit to the mobile communication terminal.

Also, to achieve the above object, a control method of a voice-to-text conversion relay apparatus according to the present invention comprises: receiving, from a mobile communication terminal, a text conversion request signal including a voice signal; transmitting the voice signal included in the text conversion request signal to a conversion processing server by using a pre-installed API (Application Programming Interface), to request its conversion into text; receiving, from the conversion processing server, the text converted from the voice signal; and transmitting the received text to the mobile communication terminal.

As described above, according to the present invention, even a mobile communication terminal whose platform is not provided with a conversion server's API can use that conversion server's service (the voice-to-text conversion service) through the voice-to-text conversion relay apparatus.

In addition, when the mobile communication terminal makes a text conversion request, the voice-to-text conversion relay apparatus forwards the request to a plurality of conversion servers, receives their texts, compares the received texts to select the optimal one, and provides it to the mobile communication terminal. The mobile communication terminal can therefore receive text converted more accurately from the voice signal. In other words, with a single request, the mobile communication terminal effectively uses the conversion services of multiple conversion servers.

FIG. 1 is a schematic configuration diagram of an overall system including a voice-to-text conversion relay apparatus according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of the voice-to-text conversion relay apparatus of FIG. 1; and
FIG. 3 is a control flowchart of the overall system including the voice-to-text conversion relay apparatus according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 shows a schematic configuration of the overall system including the voice-to-text conversion relay apparatus 100 according to an embodiment of the present invention.

As shown in FIG. 1, the overall system includes at least one conversion processing server 300, the voice-to-text conversion relay apparatus 100, and a mobile communication terminal 200.

The mobile communication terminal 200 may likewise comprise a plurality of mobile communication terminals 200 running different platforms, but only one is shown for convenience of description.

First, the mobile communication terminal 200 receives the user's voice, transmits a text conversion request signal including the voice signal to the voice-to-text conversion relay apparatus 100, and displays the text received from the voice-to-text conversion relay apparatus 100.

To this end, the mobile communication terminal 200 must include a function that generates a voice signal from the user's voice input, includes it in a text conversion request signal, and transmits that request signal to the voice-to-text conversion relay apparatus 100. This function may be performed, for example, by an application downloaded from a particular server. In this case, the network address (for example, the Internet address) of the voice-to-text conversion relay apparatus 100 is preferably coded into the application.
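A minimal client-side sketch of the request such a downloaded application might assemble. The relay address, the request code value, and the audio format label are hypothetical placeholders; the description only states that the relay's network address is preferably coded into the application.

```python
import json
from dataclasses import dataclass
from typing import Tuple

# Hypothetical relay address coded into the client application (placeholder only).
RELAY_ADDRESS = "https://stt-relay.example.com/convert"


@dataclass
class TextConversionRequest:
    """Text conversion request signal: a request code plus the recorded voice signal."""

    audio: bytes
    request_code: str = "TEXT_CONVERSION"  # hypothetical code value
    audio_format: str = "pcm16le/16000"    # hypothetical format label

    def header_json(self) -> str:
        # Metadata the client would send ahead of (or alongside) the audio payload.
        return json.dumps(
            {"code": self.request_code, "format": self.audio_format, "length": len(self.audio)}
        )


def build_request(recorded_pcm: bytes) -> Tuple[str, TextConversionRequest]:
    """Return the destination address and the request the application would send."""
    return RELAY_ADDRESS, TextConversionRequest(audio=recorded_pcm)
```

Because the address is fixed in the application, the terminal never needs to know which conversion servers sit behind the relay.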

The conversion processing server 300 receives a voice signal and generates the text corresponding to it. It may directly process voice-to-text conversion requests from a mobile communication terminal 200 that uses a platform for which the server has provided an API; in particular, it communicates with the voice-to-text conversion relay apparatus 100 and, at the request of the relay apparatus 100, converts a given voice signal into text and returns it.

The conversion processing server 300 may use any of various known methods to determine the text corresponding to a voice signal.

For example, the server may use models of several thousand context-dependent sub-word units. Specifically, it may extract the required feature vectors from the received voice signal and compare them with data stored in a database to extract the text corresponding to the voice signal.

Here, a speech feature vector carries phonetic information and corresponds to statistical data that is insensitive to background noise, differences between speakers, and pronunciation habits.

Various methods can be used to extract speech feature vectors. Examples include Linear Predictive Coding (LPC) extraction, which analyzes the speech signal with equal weight across all frequency bands; Mel-Frequency Cepstral Coefficients (MFCC) extraction, which reflects the fact that human speech perception is not linear but follows the mel scale, which resembles a logarithmic scale; high-frequency emphasis (pre-emphasis), which boosts high-frequency components to distinguish speech clearly from noise; and window-function methods, which minimize the distortion caused by discontinuities when speech is divided into short segments for analysis.
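The pre-emphasis, windowing, and mel-scale steps named above can be sketched in plain Python. This is an illustrative sketch only: the coefficient 0.97, the Hamming window, and the mel formula below are common textbook choices assumed here rather than values given in this document, and a full MFCC pipeline would add an FFT, a mel filterbank, a logarithm, and a DCT.

```python
import math
from typing import List


def pre_emphasis(samples: List[float], alpha: float = 0.97) -> List[float]:
    """High-frequency emphasis: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed default)."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


def hamming(n_points: int) -> List[float]:
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1)) for n in range(n_points)]


def frame_and_window(samples: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into overlapping short frames and taper each frame's edges
    with the window, reducing the distortion caused by cutting the signal."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames


def hz_to_mel(f_hz: float) -> float:
    """One common form of the mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```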

The conversion processing server 300 can obtain a recognition result by comparing the extracted speech feature vectors with the phonetic information in a previously built speech database, extracting text by word-level search or sentence-level search.
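As a simplified illustration of word-level search, the sketch below matches an input feature vector against stored per-word templates by Euclidean distance. The template dictionary and the nearest-template rule are assumptions for illustration; practical recognizers typically use statistical acoustic and language models rather than a single template per word.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_word(feature_vector: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the word whose stored template is closest to the input features.

    `templates` (word -> reference feature vector) is a hypothetical stand-in
    for the server's speech database.
    """
    return min(templates, key=lambda word: euclidean(feature_vector, templates[word]))
```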

As described above, the conversion processing server 300 may use various methods to extract the text corresponding to a voice signal.

The voice-to-text conversion relay apparatus 100 acts as a conversion relay: it receives a text conversion request signal including a voice signal from the mobile communication terminal 200, transmits the voice signal included in the request to the conversion processing server 300 through a pre-installed API (Application Programming Interface) to request its conversion into text, and transmits the text that the conversion processing server 300 returns in response to the mobile communication terminal 200.

FIG. 2 shows an example of the detailed functional blocks of the voice-to-text conversion relay apparatus 100.

As shown in FIG. 2, the voice-to-text conversion relay apparatus 100 includes a request receiving unit 110, a conversion request unit 120, a response receiving unit 130, and a result transmitting unit 140.

The request receiving unit 110 receives a text conversion request signal including a voice signal from the mobile communication terminal 200.

The voice signal forms part of the text conversion request signal, and the request receiving unit 110 may receive the voice signal from the mobile communication terminal 200 by streaming. For example, when the text conversion request signal received from the mobile communication terminal 200 is divided into a request code and a voice signal, the request receiving unit 110 may first receive the code requesting text conversion and then receive the voice signal as a stream.
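One way the request receiving unit 110 could read a request code followed by a streamed voice signal is sketched below. The framing (an ASCII request code terminated by a newline byte, with all later bytes treated as audio) is a hypothetical wire format, not one defined in this document.

```python
from typing import Iterable, Tuple


def receive_request(stream: Iterable[bytes]) -> Tuple[str, bytes]:
    """Read a request code first, then the voice signal streamed in chunks.

    Assumed (hypothetical) framing: the request code is ASCII text ending in a
    newline byte; every byte after that newline belongs to the voice signal.
    """
    chunks = iter(stream)
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        newline = buffer.find(b"\n")
        if newline != -1:
            request_code = buffer[:newline].decode("ascii")
            audio = bytearray(buffer[newline + 1:])
            break
    else:
        raise ValueError("stream ended before the request code was complete")

    # The remaining chunks are the streamed voice signal; append them as they arrive.
    for chunk in chunks:
        audio.extend(chunk)
    return request_code, bytes(audio)
```

Chunk boundaries need not align with the framing: the code and the first audio bytes may arrive in the same chunk.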

The conversion request unit 120 uses a pre-installed API to transmit the voice signal included in the text conversion request signal from the mobile communication terminal 200 to the conversion processing server 300 and request its conversion into text.

That is, at least one API is stored in the conversion request unit 120, each API corresponding to a different conversion processing server 300. The conversion request unit 120 can therefore use the stored APIs to request a plurality of conversion processing servers 300 to convert the voice into text.
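A minimal sketch of a conversion request unit holding one API per server. Each server API is modeled as a plain callable that takes the voice signal and returns text; the class and method names are hypothetical, and real conversion servers would be reached through their own client libraries.

```python
from typing import Callable, Dict, Optional

# A server "API" is modeled as a callable: voice signal in, recognized text out.
ServerApi = Callable[[bytes], str]


class ConversionRequestUnit:
    """Stores one API per conversion processing server (hypothetical structure)."""

    def __init__(self, apis: Optional[Dict[str, ServerApi]] = None) -> None:
        self._apis: Dict[str, ServerApi] = dict(apis or {})

    def register(self, server_name: str, api: ServerApi) -> None:
        # Adding a server touches only the relay, not the terminals.
        self._apis[server_name] = api

    def request_all(self, voice_signal: bytes) -> Dict[str, str]:
        # One terminal request is forwarded through every stored API in turn.
        return {name: api(voice_signal) for name, api in self._apis.items()}
```

Because only the relay holds these adapters, supporting another conversion server means registering one more callable on the relay rather than updating every terminal.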

Storing a plurality of APIs in the conversion request unit 120 in this way has the advantage that each mobile communication terminal 200 need not carry multiple APIs to access multiple conversion processing servers 300.

The response receiving unit 130 receives, from the conversion processing server 300, the text converted from the voice signal. That is, the response receiving unit 130 receives from the conversion processing server the text corresponding to the voice signal transmitted by the conversion request unit 120.

When the conversion request unit 120 has requested conversion from a plurality of conversion processing servers 300, the response receiving unit 130 may receive a text from each of them as its response.

The result transmitting unit 140 transmits the text received by the response receiving unit 130 to the mobile communication terminal 200 that requested the voice-to-text conversion.

When the response receiving unit 130 receives a text from each of a plurality of conversion processing servers 300, so that multiple texts are received in total, the result transmitting unit 140 may compare the texts with one another, select an optimal text according to a preset algorithm, and transmit the selected optimal text to the mobile communication terminal 200.

For example, when three or more texts in total are received from the plurality of conversion processing servers 300, the result transmitting unit 140 may select, as the optimal text, the text that occurs most often among the texts received by the response receiving unit 130, and transmit it to the mobile communication terminal 200.

As a concrete example, suppose the user of the mobile communication terminal 200 says "안녕" ("hello"), and three conversion processing servers 300 return the texts "안녕", "안녕", and "아녕" for that voice signal. Because "안녕" was received twice, more often than any other text, it can be selected as the optimal text.
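The match-count rule can be sketched with Python's `collections.Counter`. Returning `None` when no text repeats is an assumed convention that hands the decision to another criterion, such as those described next.

```python
from collections import Counter
from typing import List, Optional


def select_by_majority(texts: List[str]) -> Optional[str]:
    """Return the text that the largest number of servers agree on.

    Returns None when no text appears more than once (all results differ),
    leaving the choice to another criterion.
    """
    if not texts:
        return None
    text, count = Counter(texts).most_common(1)[0]
    return text if count > 1 else None
```

With the example above, `select_by_majority(["안녕", "안녕", "아녕"])` returns `"안녕"`.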

As another example, when the texts received by the response receiving unit 130 all differ, the result transmitting unit 140 may select the optimal text based on the degree of similarity to pre-stored word dictionary data; for example, the text closest to an entry in the word dictionary can be selected as the optimal text.
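This document does not specify how dictionary similarity is measured. The sketch below substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as the similarity score; the dictionary contents are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def dictionary_score(text: str, dictionary: Iterable[str]) -> float:
    """Similarity of `text` to its closest dictionary entry (1.0 means an exact match)."""
    return max((SequenceMatcher(None, text, word).ratio() for word in dictionary), default=0.0)


def select_by_dictionary(texts: List[str], dictionary: List[str]) -> str:
    """Pick the candidate text closest to any entry of the stored word dictionary."""
    return max(texts, key=lambda t: dictionary_score(t, dictionary))
```

For multi-word results one would score each word separately and combine the scores, a detail left open here.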

As yet another example, when the received texts all differ, the result transmitting unit 140 may select the optimal text based on pre-stored usage frequency data for each word. That is, the most frequently used of the received texts is selected as the optimal text.
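A sketch of frequency-based selection, assuming the stored usage frequency data is a mapping from text to a usage count (a hypothetical layout). For multi-word texts an aggregation rule, such as summing per-word counts, would also be needed.

```python
from typing import Dict, List


def select_by_frequency(texts: List[str], usage_counts: Dict[str, int]) -> str:
    """Pick the candidate used most often according to stored usage statistics.

    Texts missing from `usage_counts` are treated as never used.
    """
    return max(texts, key=lambda t: usage_counts.get(t, 0))
```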

The signal exchange and control flow of the overall system including the voice-to-text conversion relay apparatus 100 according to an embodiment of the present invention will now be described with reference to FIG. 3.

First, in response to a user command, the mobile communication terminal 200 transmits a text conversion relay preparation request signal to the voice-to-text conversion relay apparatus 100 (step S1).

The voice-to-text conversion relay apparatus 100 opens and prepares a port for receiving the voice (step S3) and requests the mobile communication terminal 200 to transmit a voice signal (step S5).

The mobile communication terminal 200 receives the user's voice input (step S7) and transmits the resulting voice signal to the voice-to-text conversion relay apparatus 100 (step S9). The voice signal may be transmitted by streaming.

In this embodiment, the text conversion relay preparation request signal and the voice signal are transmitted to the voice-to-text conversion relay apparatus 100 separately, but they may of course be transmitted from the mobile communication terminal 200 simultaneously. That is, the text conversion relay preparation request signal and the voice signal may both be regarded as the text conversion request signal.

The voice-to-text conversion relay apparatus 100 executes (uses) the pre-stored API of each conversion processing server 300 (step S11), transmitting the received voice signal to each conversion processing server 300 and requesting its conversion into text (step S13).
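Steps S11 through S17 can be sketched as a parallel fan-out using `concurrent.futures` from the Python standard library. Sending the requests concurrently and skipping a server that raises an error are assumed design choices, not requirements stated in this document; server APIs are modeled as callables.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def fan_out(voice_signal: bytes, apis: Dict[str, Callable[[bytes], str]]) -> Dict[str, str]:
    """Send one voice signal to every server in parallel (S11-S13) and collect replies (S17).

    A server whose call raises an exception is left out of the result
    (an assumed policy); timeouts and retries are omitted for brevity.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(apis))) as pool:
        futures = {pool.submit(api, voice_signal): name for name, api in apis.items()}
        for future in as_completed(futures):
            try:
                results[futures[future]] = future.result()
            except Exception:
                continue  # skip the failing server and keep the others' answers
    return results
```

Running the calls concurrently keeps the terminal's wait close to the slowest single server instead of the sum of all servers.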

Here, rather than forwarding the voice signal exactly as received from the mobile communication terminal 200, the voice-to-text conversion relay apparatus 100 may, through the operation of each API, convert it into the format required by each conversion processing server 300 before transmitting it.
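As one example of such format conversion, the sketch below wraps raw 16-bit PCM in a WAV container using Python's standard `wave` module. The assumption that a given server requires WAV at 16 kHz mono is hypothetical; each real server's API defines its own accepted formats.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container for a server that (hypothetically) accepts only WAV."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```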

Each conversion processing server 300 converts the received voice signal into text using a pre-stored database and a preset algorithm (step S15).

The conversion processing server 300 transmits the converted text to the voice-to-text conversion relay apparatus 100 as a response (step S17).

The voice-to-text conversion relay apparatus 100 selects an optimal text based on the text responses received from the plurality of conversion processing servers 300 (step S19).

As described above, the optimal text may be selected by various methods, such as the number of matching texts received, the degree of similarity to word dictionary data, or usage frequency.
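These criteria can be combined into a single selection routine for step S19. The ordering used here (agreement between servers first, then exact dictionary membership, then usage frequency) is an assumed policy; exact membership is a simplified stand-in for a similarity measure, and both the dictionary and the usage counts are hypothetical inputs.

```python
from collections import Counter
from typing import Dict, List


def select_optimal_text(texts: List[str], dictionary: List[str], usage_counts: Dict[str, int]) -> str:
    """Choose one text from the servers' responses (step S19).

    Assumed policy:
    1. If any text was returned by two or more servers, take the most frequent one.
    2. Otherwise (all texts differ), prefer texts found in the word dictionary,
       breaking ties by stored usage frequency.
    """
    if not texts:
        raise ValueError("no text responses to choose from")
    top_text, top_count = Counter(texts).most_common(1)[0]
    if top_count > 1:
        return top_text
    vocabulary = set(dictionary)
    # Tuples compare element-wise: dictionary membership first, then usage count.
    return max(texts, key=lambda t: (t in vocabulary, usage_counts.get(t, 0)))
```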

The voice-to-text conversion relay apparatus 100 transmits the selected text to the mobile communication terminal 200 (step S21), and the mobile communication terminal 200 displays the received text (step S23).

Accordingly, the user of the mobile communication terminal 200 can use a voice-to-text conversion service through the voice-to-text conversion relay apparatus 100 without having a separate API for each conversion processing server 300.

In other words, simply by speaking into the mobile communication terminal 200, the user can have the text corresponding to the voice signal displayed.

Meanwhile, the present invention is not limited to the specific embodiments described above and may be modified and altered in various ways without departing from the gist of the present invention. It will be apparent that such modifications and alterations are included in the present invention if they fall within the scope of the appended claims.

100: voice-to-text conversion relay apparatus
200: mobile communication terminal
300: conversion processing server
110: request receiving unit
120: conversion request unit
130: response receiving unit
140: result transmitting unit

Claims

A control method of a voice-to-text conversion relay apparatus, the method comprising:
(a) receiving, from a mobile communication terminal, a text conversion request signal including a voice signal;
(b) transmitting the voice signal included in the text conversion request signal to a plurality of conversion processing servers by using an API (Application Programming Interface), to request conversion into text;
(c) receiving, from the plurality of conversion processing servers, text converted from the voice signal; and
(d) transmitting the received text to the mobile communication terminal,
wherein, in step (d), when the texts received from the plurality of conversion processing servers all differ, an optimal text is selected based on at least one of a degree of similarity to pre-stored word dictionary data and a frequency of use, and the selected optimal text is transmitted.

The method of claim 1,
wherein step (b) comprises using a plurality of pre-installed APIs to request a plurality of conversion processing servers, each corresponding to one of the APIs, to convert the voice signal into text;
step (c) comprises receiving, from each of the plurality of conversion processing servers, the text converted from the voice signal; and
step (d) comprises comparing the plurality of received texts with one another, selecting an optimal text according to a preset algorithm, and transmitting the selected optimal text to the mobile communication terminal.

The method of claim 2,
wherein, in step (d), when three or more texts are received, the text with the largest number of matches is selected as the optimal text and transmitted to the mobile communication terminal.

delete

A voice-to-text conversion relay apparatus comprising:
a request receiving unit that receives, from a mobile communication terminal, a text conversion request signal including a voice signal;
a conversion request unit that transmits the voice signal included in the text conversion request signal to a plurality of conversion processing servers by using a provided API (Application Programming Interface), to request conversion into text;
a response receiving unit that receives, from the plurality of conversion processing servers, text converted from the voice signal; and
a result transmitting unit that transmits the text received by the response receiving unit to the mobile communication terminal,
wherein, when the texts received from the plurality of conversion processing servers all differ, the result transmitting unit selects an optimal text based on at least one of a degree of similarity to pre-stored word dictionary data and a frequency of use, and transmits the selected optimal text to the mobile communication terminal.

The apparatus of claim 5,
wherein the conversion request unit uses a plurality of provided APIs to request a plurality of conversion processing servers, each corresponding to one of the APIs, to convert the voice signal into text;
the response receiving unit receives, from each of the plurality of conversion processing servers, the text converted from the voice signal; and
the result transmitting unit compares the plurality of texts received by the response receiving unit with one another, selects an optimal text according to a preset algorithm, and transmits the selected optimal text to the mobile communication terminal.

The apparatus of claim 6,
wherein, when three or more texts are received by the response receiving unit, the result transmitting unit selects the text with the largest number of matches as the optimal text and transmits it to the mobile communication terminal.

delete