KR20030039956A

KR20030039956A - A Method for Controlling Data Using Distributed Voice Processing Device and The System Thereof

Info

Publication number: KR20030039956A
Application number: KR1020010071544A
Authority: KR
Inventors: 김현철
Original assignee: (주)시스윌
Priority date: 2001-11-16
Filing date: 2001-11-16
Publication date: 2003-05-22

Abstract

PURPOSE: A method and a system for controlling data by using a distributed voice processing device are provided. They use a voice interface in which speech processing is distributed between a terminal and a server. As a result, a Web page or data file can be downloaded by voice input on a mobile terminal, and a text message entered by voice is received by the other party as text. CONSTITUTION: The wireless Internet is accessed through a mobile terminal(10), a menu for voice recognition is selected, and the selection is transmitted to a Web server(30) (S1, S2). The Web server(30) transmits a VXML (Voice eXtensible Markup Language) tag for the received selection to the mobile terminal(10) (S3). The mobile terminal(10) checks the VXML tag and starts a voice browser to prepare for voice input (S4). Feature extraction and vector quantization are performed on the speech signal input by the user, and the signal is converted into data as a combination of voice vectors. The voice data are then transmitted from the mobile terminal(10) to a natural sound processing server(40) (S5). Using its internal grammar database, the natural sound processing server(40) transmits the voice data received from the mobile terminal(10), together with the grammar, to a voice recognition server(70) (S6). The voice recognition server(70) compares the received voice data with the words in the grammar and notifies the natural sound processing server(40) of the result (S7). According to the result, the natural sound processing server(40) requests a Web page from the Web server(30). If no Web page is needed, it transmits the required data to the mobile terminal(10) instead (S8). The Web server(30) retrieves the corresponding Web page and transmits it to the mobile terminal(10) (S9). Finally, TTS (Text-To-Speech) processed data are transmitted to the mobile terminal(10) (S10).

Description

A method for controlling data using distributed voice processing device and the system thereof

The present invention relates to a method and system for controlling data using a distributed speech processing apparatus. More particularly, it relates to a method and system for controlling data through a voice interface in which speech processing is distributed between a terminal and a server.

With the conventional way of using a short message service on a mobile terminal such as a mobile phone, the user had to press keypad buttons to assemble characters one by one. Correcting input errors was also cumbersome.

In addition, even when accessing a web server with a mobile terminal capable of wireless Internet, the user had to operate everything with the keypad buttons. This was cumbersome, and it also imposed an economic burden, because the user pays for the additional connection time.

Voice recognition systems have therefore been developed recently, and a technology has been disclosed in which the user's voice input replaces character input.

However, in conventional mobile terminals, speech processing was performed by building a voice recognition system into the terminal itself, which handled both recognition and synthesis. The limited memory and size of the terminal restricted the recognition vocabulary. Real-time speech synthesis was also problematic. Together, these imposed technical limits on practical use.

The conventional approach of implementing a voice portal on a mobile terminal therefore has two problems. First, the speech recognition engine and the text-to-speech (TTS) engine are too heavy to embed in the terminal. Second, a database (DB) upgrade concentrates traffic on every terminal, which reduces speed or causes disconnection.

In addition, conventional mobile communication networks have a voice-oriented communication structure. They are built around a gateway (a so-called voice portal gateway) that parses and interprets Voice Extensible Markup Language (VXML) documents and converts them into speech. This structure makes a true voice portal architecture hard to achieve. It also places many constraints on building services, so diverse and useful services could not be implemented.

A method was therefore attempted to overcome the terminal's hardware limitations by moving the voice recognition system out of the mobile terminal and into the network-side system. However, this method merely relieves the terminal's size and capacity limits. In particular, noise in the speech signal received through the terminal and the mobile network, including channel noise, lowers the recognition rate, so many voice commands cannot be executed. In addition, a large number of simultaneous connections overloads the system and causes disconnections.

As a result, voice processing systems and methods according to the related art have many problems. They cause considerable inconvenience not only to mobile carriers but also to terminal manufacturers and mobile terminal users.

Accordingly, an object of the present invention is to eliminate the above drawbacks. It does so by providing a data control method and system using a voice interface in which speech recognition processing is distributed between a terminal and a server.

FIG. 1 is a block diagram showing a data control system using distributed speech processing.

FIG. 2 is a block diagram schematically showing a data processing method using distributed speech processing.

FIG. 3 is a block diagram illustrating the input and transmission of a voice short message (VSMS) using a distributed speech processing system.

* Description of the main parts of the drawings *

10: wireless terminal, mobile terminal
20: switching center, exchange
30: web server
40: natural sound processing server
50: file server
60: database
70: speech recognition server
80: natural sound processor
90: receiving mobile terminal

According to the present invention, a data control method and system using a distributed speech processing apparatus are provided.

Specifically, they are as set forth in the claims below.

The data control method and system using the distributed speech processing apparatus according to the present invention are described below.

FIG. 1 is a block diagram showing a data control system using a distributed speech processing apparatus.

Distributed speech processing consists of distributed speech recognition and distributed speech synthesis. The conventional approach performs speech recognition on a speech signal transmitted over the voice channel. In that approach, low-bit-rate speech coding and channel distortion degrade the recognition rate. The distributed speech recognition of the present invention therefore does not transmit speech-coded data. Instead, it extracts parameters from the speech, vector-quantizes them, and transmits them as packets over a data channel rather than a voice channel.
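A minimal sketch of the terminal-side front end just described (framing, parameter extraction, vector quantization) is shown below. It is illustrative only. The patent does not specify the parameters or the codebook, so the two toy features (log energy and zero-crossing rate), the frame sizes, and the three-entry CODEBOOK are assumptions. A practical distributed speech recognition front end would typically use cepstral features and a codebook trained on speech data.

```python
import math

# Assumed framing: 25 ms frames with a 10 ms shift at an 8 kHz sampling rate.
FRAME_LEN = 200
HOP = 80

# Hypothetical codebook of (log energy, zero-crossing rate) codewords:
# index 0 ~ silence, 1 ~ voiced sound, 2 ~ noise-like sound.
CODEBOOK = [(-23.0, 0.0), (-1.0, 0.05), (-1.0, 0.5)]


def frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a list of samples into overlapping, fixed-length frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def features(frame):
    """Toy two-dimensional parameter vector for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # offset avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return (log_energy, crossings / (len(frame) - 1))


def quantize(vec, codebook=CODEBOOK):
    """Index of the nearest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])))


def front_end(signal, codebook=CODEBOOK):
    """Terminal-side processing: samples -> parameter vectors -> VQ indices."""
    return [quantize(features(f), codebook) for f in frames(signal)]
```

Only the short list of codeword indices (one small integer per frame) then needs to travel over the data channel, instead of speech-coded audio over the voice channel.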

The natural sound processing server 40 controls the flow among all the components needed to implement speech recognition on the user's mobile terminal 10. It also synchronizes the recognized speech data with the web page. In addition, it selects the least-loaded server among multiple servers or databases and sends that server's address to the mobile terminal. The terminal then uses the delivered address to obtain service from the least-loaded server.
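The least-loaded server selection can be pictured as follows. This is a hypothetical sketch. The patent does not say how load is measured, so the ServerInfo fields (an active session count and a capacity) and the addresses are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerInfo:
    address: str          # hypothetical "host:port" string
    active_sessions: int  # assumed load metric
    capacity: int         # assumed maximum number of sessions


def pick_least_loaded(servers: List[ServerInfo]) -> str:
    """Address of the server with the lowest utilization that still has room.
    Ties go to the server listed first."""
    available = [s for s in servers if s.active_sessions < s.capacity]
    if not available:
        raise RuntimeError("all servers are at capacity")
    best = min(available, key=lambda s: s.active_sessions / s.capacity)
    return best.address
```

The server would then send the returned address to the mobile terminal 10, which contacts that server directly.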

The natural sound processing server performs several tasks:
- receiving a speech recognition request from the mobile terminal 10 and loading the corresponding grammar;
- transmitting the voice data, on which the mobile terminal 10 has performed feature extraction and vector quantization, to the speech recognition server 70 together with that grammar;
- executing the data corresponding to the result recognized by the engine in the speech recognition server 70.

Here, feature extraction means extracting samples such as frequency or amplitude from the speech signal that the user inputs through the mobile terminal. That is, only the samples needed to download the required data are extracted. The mobile terminal 10 encompasses mobile phones, computer terminals capable of wireless Internet access (such as laptops), personal digital assistants (PDAs), and the like.

Meanwhile, the database 60 works with the web server 30 to enable voice recognition. A dialog ID is placed at each location in a web page where voice recognition is required. The dialog ID tells the web browser that the voice recognition function is available, and each dialog ID is mapped to a grammar and to web content. The database 60 therefore manages the mapping among the dialog ID assigned to each menu, the corresponding grammar file, and the web page. The database 60 also holds the tables used by the other servers and their management tools.
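The mapping kept by the database 60 amounts to a lookup table keyed by dialog ID. The sketch below assumes a simple in-memory dictionary. The dialog IDs, grammar file names, and page paths are hypothetical placeholders, and a deployed system would keep this table in the database itself.

```python
from typing import NamedTuple, Optional


class DialogEntry(NamedTuple):
    grammar_file: str  # grammar handed to the recognizer for this menu
    web_page: str      # web content associated with the dialog


# Hypothetical table; IDs, file names and paths are placeholders.
DIALOG_TABLE = {
    "MENU_MAIN": DialogEntry("grammars/main_menu.grxml", "/wap/main.html"),
    "MENU_RING": DialogEntry("grammars/ringtones.grxml", "/wap/ringtones.html"),
}


def lookup_dialog(dialog_id: str) -> Optional[DialogEntry]:
    """Grammar and page for a voice-enabled dialog, or None if unmapped."""
    return DIALOG_TABLE.get(dialog_id)
```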

The file server 50 stores all the resources required by the natural sound processing server 40 and the speech recognition server 70. In a simple scenario, these resources can be stored in the natural sound processing server 40 itself. In a high-capacity scenario, they can be placed on redundant file servers 50.

The speech recognition server 70 performs the actual speech recognition. It receives voice data from the natural sound processing server 40; the mobile terminal 10 has already performed feature extraction and vector quantization on this data. The server then compares the data against the transmitted grammar to perform recognition. The recognition result is delivered to the natural sound processing server 40 in the form of a list.

The mobile terminal 10 is equipped with a wireless Internet browser and a speech recognition chip (for example, a DSP) to support the user's voice recognition function. The function is activated when the voice recognition button on the terminal is pressed. The received voice command undergoes feature extraction and vector quantization, and the resulting data is delivered over the data channel to the natural sound processing server 40. The natural sound processing server 40 carries out the recognition process together with the file server 50, the speech recognition server 70, and so on, and the terminal receives the result.

A data control method using distributed speech processing according to the present invention is described below with reference to FIG. 2.

FIG. 2 omits the switching center 20 (see FIG. 1) for ease of explanation. The switching center connects the natural sound processing server 40 and the web server 30 to the mobile terminal 10. The numbers shown horizontally in FIG. 2 indicate the steps of the data control process using distributed speech processing.

The file server 50 and the database 60 may exist independently, as shown in FIG. 1. When their capacity is small, however, they may be included in the natural sound processing server 40. The speech recognition server 70 may likewise be included in the natural sound processing server 40.

However, since the gist of the present invention is data processing (control) using distributed speech processing, the speech recognition server 70 is shown as an independent server.

Depending on the type of data, there may of course be two or more of each server, including the natural sound processing server 40. Only one of each is shown for clarity.

The process begins with the wireless Internet connection step (S1), in which the user accesses the wireless Internet using the keypad buttons of the mobile (wireless) terminal 10.

Next, the user selects a submenu for voice recognition, and the selection is transmitted to the web server 30. This is called the voice service menu selection step (S2). The web server 30 then receives the selected voice recognition submenu and performs the VXML menu transmission step (S3), in which it transmits a Voice Extensible Markup Language (VXML) tag to the mobile terminal 10.

When the mobile terminal 10 receives the VXML tag, it checks the tag and starts the voice browser to prepare for voice input [browser start step (S4)]. The user then selects a voice menu and transmits speech [voice menu selection and voice transmission step (S5)].
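Step S4 can be sketched as the terminal inspecting the received markup and starting the voice browser only when the markup is VoiceXML. The example below uses Python's standard xml.etree.ElementTree. The sample document, its form ID, and its grammar path are hypothetical. The element names follow the W3C VoiceXML 2.0 vocabulary (vxml, form, field, grammar) rather than anything the patent specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical VoiceXML page sent by the web server in step S3.
SAMPLE_VXML = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="MENU_RING">
    <field name="menu">
      <grammar src="grammars/ringtones.grxml" type="application/srgs+xml"/>
    </field>
  </form>
</vxml>"""


def local_name(tag):
    """Drop an XML namespace, e.g. '{http://...}vxml' -> 'vxml'."""
    return tag.rsplit("}", 1)[-1]


def prepare_voice_input(document):
    """Return (form id, grammar src) pairs if the document is VoiceXML,
    meaning the voice browser should start; return None otherwise."""
    root = ET.fromstring(document)
    if local_name(root.tag) != "vxml":
        return None
    pairs = []
    for form in root.iter():
        if local_name(form.tag) == "form":
            for node in form.iter():
                if local_name(node.tag) == "grammar":
                    pairs.append((form.get("id"), node.get("src")))
    return pairs
```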

In the voice menu selection and voice transmission step (S5), the user is connected to the wireless Internet and speaks the name of the desired menu, for example "ringtone download" or "send text message". Among the selectable menus, those that support voice recognition carry a voice recognition tag in the web page. The terminal's browser uses this tag to activate the terminal's voice button function.

Also in step S5, a speech recognition preprocessor in the terminal 10 performs feature extraction and vector quantization on the user's speech. The speech is thus converted into data as a combination of speech vectors, which is then transmitted to the natural sound processing server 40. A web page ID identifying the web page currently displayed on the terminal screen is transmitted along with it. After transmitting this information, the terminal's web browser waits for a web page to be sent from the web server 30.
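The patent does not define a packet format for step S5. Purely as an assumption for illustration, the sketch below uses Python's struct module to pack the web page ID and the codeword indices into a compact binary message. The assumed layout is a 2-byte page ID, a 2-byte count, and one byte per index.

```python
import struct
from typing import List, Tuple

# Assumed layout, network byte order: page ID (2 bytes), index count (2 bytes).
HEADER = struct.Struct("!HH")


def build_packet(page_id: int, indices: List[int]) -> bytes:
    """Pack the web page ID and the VQ codeword indices (one byte each)."""
    if any(not 0 <= i < 256 for i in indices):
        raise ValueError("codeword index does not fit in one byte")
    return HEADER.pack(page_id, len(indices)) + bytes(indices)


def parse_packet(packet: bytes) -> Tuple[int, List[int]]:
    """Inverse of build_packet, as the receiving server would apply it."""
    page_id, count = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + count]
    return page_id, list(body)
```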

Next, the natural sound processing server 40 uses the page ID received from the terminal 10 to find the corresponding grammar in its installed grammar database (Grammar DB). It then transmits the speech vector data received from the terminal 10, together with the grammar, to the speech recognition server 70 [speech recognition request step (S6)]. The grammar database and the user-interface control part of the application program can be implemented in VXML.

The speech recognition server 70 compares the speech vector data with the words in the grammar and finds the most similar word. It then notifies the natural sound processing server 40 of the result [recognition result notification step (S7)].
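The patent leaves the matching method of step S7 open; production recognizers typically use statistical models such as hidden Markov models. The sketch below is purely a toy stand-in. It compares the received codeword sequence with a hypothetical per-word template using Levenshtein edit distance. It then returns the grammar words as a ranked list, matching the list-form result described for the speech recognition server 70.

```python
# Hypothetical grammar: each allowed phrase with a reference codeword sequence.
GRAMMAR = {
    "ringtone download": [1, 1, 2, 2, 0],
    "text message": [2, 2, 1, 0, 0],
}


def edit_distance(a, b):
    """Levenshtein distance between two codeword-index sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # drop x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # match or substitute
        prev = cur
    return prev[-1]


def recognize(indices, grammar=GRAMMAR):
    """Score every phrase in the grammar; best (smallest distance) first."""
    scored = [(word, edit_distance(indices, template))
              for word, template in grammar.items()]
    return sorted(scored, key=lambda pair: pair[1])
```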

The natural sound processing server 40 receives the result in the recognition result notification step (S7) and then follows the application program's logic. If a new web page (or a data file such as a music file) is needed, it requests it from the web server 30. If no web page request is needed, it transmits the requested data directly to the mobile terminal 10 [web page request step (S8)].
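The branch in step S8 is essentially a dispatch on the recognition result. In this hypothetical sketch, each recognized phrase maps to an Action that names either a web page to request or a data file to send directly. The callbacks stand in for the real connections to the web server 30 and the mobile terminal 10.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Action:
    web_page: Optional[str] = None   # page to request from the web server 30
    data_file: Optional[str] = None  # file to send straight to the terminal 10


def dispatch(result: str,
             actions: Dict[str, Action],
             request_page: Callable[[str], None],
             send_to_terminal: Callable[[str], None]) -> str:
    """Decide what the natural sound processing server does with a result."""
    action = actions.get(result)
    if action is None:
        return "no-match"                # nothing mapped to this phrase
    if action.web_page is not None:
        request_page(action.web_page)    # web server then sends the page (S9)
        return "page-requested"
    send_to_terminal(action.data_file)   # no page needed: send data directly
    return "data-sent"
```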

The web server 30, having received the request from the natural sound processing server 40, then transmits the corresponding web page to the mobile terminal 10 [web page transmission step (S9)]. Text-to-speech (TTS) processed data is then transmitted to the mobile terminal 10 (S10). Specifically, the natural sound processing server 40 takes the TTS data processed by the speech recognition server 70, synchronizes its transmission with the menu of the web server 30, and sends it to the mobile terminal 10. The received data is output as speech.

In some cases no web page request is needed in the web page request step (S8), for example when the requested item is a music file. The music file is then transmitted directly to the mobile terminal 10, and a voice announcement that it has been sent is output.

Accordingly, if the data is music, the music is played. If it is a video, the video is decoded continuously and displayed on the screen so that the user can watch it.

For example, suppose the menu name spoken in the voice menu selection and voice transmission step (S5) is the command "나훈아의 고향역" (Na Hoon-a's "Gohyangyeok", a song title). The music file (or web page) for that song is transmitted to the terminal 10 through the above process, and playback starts immediately. A voice prompt is then output: "To save Na Hoon-a's 'Gohyangyeok', press 1; otherwise, press 2." The received music can be stored in memory for later listening. It can also be selected as a ringtone, in which case the downloaded music is subsequently played as the ringtone.

In addition, when the selected menu item is downloaded, the mobile terminal 10 naturally handles it according to its data type. Music is played back, and video (such as a movie) is decoded so that the user can view it.

Optionally, the mobile terminal 10 may lack enough memory to store content that has been downloaded and played as audio or video. In that case, the terminal can store the content's path name instead. Later, when the stored path name is activated, the terminal follows that path and plays the music or video directly.
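The optional path-name fallback can be sketched as below. The sketch assumes that the terminal knows its free memory in bytes and that a fetch callback can retrieve content by path later. The function names and the storage layout are hypothetical.

```python
from typing import Callable, Dict, Tuple

Storage = Dict[str, Tuple[str, object]]


def store_download(name: str, payload: bytes, source_path: str,
                   free_bytes: int, storage: Storage) -> int:
    """Store the payload if it fits; otherwise store only its path name.
    Returns the free memory remaining afterwards."""
    if len(payload) <= free_bytes:
        storage[name] = ("data", payload)
        return free_bytes - len(payload)
    storage[name] = ("path", source_path)
    return free_bytes


def replay(name: str, storage: Storage, fetch: Callable[[str], bytes]) -> bytes:
    """Return stored content, fetching it again via its path if necessary."""
    kind, value = storage[name]
    return value if kind == "data" else fetch(value)
```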

FIG. 3 illustrates another modification of the present invention. It is a block diagram showing the input and transmission of a voice short message (VSMS) using the distributed speech processing system. For clarity, the general transmission method and system structure of the mobile communication network are omitted. In this modification, steps S1 through S6 of FIG. 2 proceed in the same way. The only difference is that the menu selected in the voice service menu selection step (S2) is the voice short message (VSMS) menu.

Compared with FIG. 2, FIG. 3 additionally shows a natural sound processor 80. The natural sound processor 80 is a server or device that analyzes the input speech data and applies Korean grammar rules to it, such as liaison processing, spelling correction, and word-spacing correction. The language is not limited to Korean, however; it also includes English, Spanish, Japanese, and others. For example, a Japanese user may use this modification to send a voice short message from the mobile terminal 10 to a third party's mobile terminal or e-mail address. In that case, the natural sound processor 80 corrects the speech data to fit Japanese text before it is transmitted to the third party's terminal.
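As a rough, English-language stand-in for the correction performed by the natural sound processor 80, the sketch below works in two stages. It first splits run-together tokens with a greedy longest-match over a small lexicon, a crude form of word-spacing correction. It then fixes misspellings with difflib.get_close_matches from the Python standard library. The lexicon and the 0.8 similarity cutoff are assumptions. Real Korean processing would need morphological analysis and liaison rules that this example does not attempt.

```python
import difflib

# Hypothetical lexicon for the example sentence.
LEXICON = ["good", "morning", "see", "you", "at", "the", "station", "tomorrow"]


def segment(token, lexicon):
    """Greedy longest-match split of a run-together token.
    Returns [token] unchanged if the lexicon cannot cover it completely."""
    words, i = [], 0
    while i < len(token):
        for j in range(len(token), i, -1):
            if token[i:j] in lexicon:
                words.append(token[i:j])
                i = j
                break
        else:              # no lexicon word starts at position i
            return [token]
    return words


def correct(text, lexicon=LEXICON):
    """Fix word spacing, then replace each word with its closest lexicon entry."""
    out = []
    for token in text.lower().split():
        for piece in segment(token, lexicon):
            match = difflib.get_close_matches(piece, lexicon, n=1, cutoff=0.8)
            out.append(match[0] if match else piece)
    return " ".join(out)
```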

In particular, this modification can add a confirmation function. Speech-to-text (STT) data is sent back to the originating mobile terminal 10 where the speech was input, so that the user can check whether the corrected text is correct before sending it.

The modification of the present invention is now described with reference to FIGS. 1 to 3. First, the user accesses the wireless Internet with the mobile terminal 10 and selects the voice recognition submenu (steps S1 and S2 above). The selection signal is transmitted via the mobile communication network to the web server 30, which is connected to the switching center 20 of the network. After receiving the selection signal, the web server 30 transmits a VXML tag (step S3 above). Here, the voice recognition submenu includes the voice SMS submenu.

The user then starts the voice browser (step S4 above) and speaks a voice message, for example "How are you?". The voice signal is transmitted from the mobile terminal 10 to the natural sound processing server 40, which is connected to the switching center 20 of the mobile network (step S5 above). The server 40 looks up the grammar in its built-in grammar database (Grammar DB). It then transmits the speech vector data received from the terminal 10, together with the grammar, to the speech recognition server 70 (step S6 above).

The speech recognition server 70 compares the speech vector data with the words in the grammar and finds the most similar words. It then transmits the result to the natural sound processor 80 [result transmission step (S7)]. The natural sound processor 80 analyzes the input data and applies grammatical corrections such as liaison processing, spelling correction, and word-spacing correction. It then transmits the result to the natural sound processing server 40 [natural sound processing step (S8)].

The natural sound processing server 40 then converts the processed data into text. It transmits the text to the receiving mobile terminal 90 designated by the user of the originating mobile terminal 10 [speech-to-text conversion and transmission step (S9)]. In step S9, the text may be transmitted directly to the receiving mobile terminal 90. Alternatively, it may first be sent to the sending mobile terminal 10 so that the user can check the text (that is, the short message). If the text is correct, it is then forwarded to the receiving mobile terminal 90. The receiving mobile terminal 90 encompasses mobile phones, laptops, personal digital assistants, and the like.
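The two delivery options in step S9 can be sketched as a single function: direct delivery, or confirmation by the sender first. The callbacks ask_sender and send are hypothetical placeholders for the round trip to the sending terminal 10 and the delivery to the receiving terminal 90.

```python
from typing import Callable


def deliver_sms(text: str, recipient: str, confirm_first: bool,
                ask_sender: Callable[[str], bool],
                send: Callable[[str, str], None]) -> bool:
    """Deliver recognized text, optionally after the sender approves it.
    Returns True if the message was sent to the recipient."""
    if confirm_first and not ask_sender(text):
        return False           # sender rejected the converted text
    send(recipient, text)      # deliver to the receiving terminal 90
    return True
```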

In some cases, the processed data may be transmitted directly to the mobile terminals 10 and 90, which then convert it into text themselves.

Voice SMS transmission according to this modification uses natural sound processing to convert speech into short message service (SMS) text. The text is then sent to the other party's mobile terminal or e-mail address. The other party's mobile terminal or e-mail address can therefore be specified at any point during the voice SMS transmission process. When reception is complete, a voice notification can be output.

The speech recognition server 70 and the natural sound processor 80 are shown as separate servers or devices. However, it will be obvious to those skilled in the art that both may be included in the natural sound processing server 40, so that this modification can be implemented as a single server or device.

As described above, the present invention allows the user to download a required web page or data file by speaking directly into the mobile terminal.

The present invention also lets the user enter a text message by voice, without tediously pressing keypad buttons, while the other party still receives it as text.

Claims

A data control method using a distributed speech processing device, comprising:

Accessing the wireless Internet with the mobile terminal 10, selecting a menu for voice recognition, and transmitting the selected content to the web server 30;

The web server 30 transmits a voice extension language (VXML) tag to the mobile terminal 10 in response to the received content;

The mobile terminal 10 checking the voice extension language tag and starting a voice browser to prepare for voice input;

Converting the speech signal input by the user into data as a combination of speech vectors through feature extraction and vector quantization, and transmitting the speech data from the mobile terminal 10 to the natural sound processing server 40;

Transmitting, by the natural sound processing server 40, the voice data received from the mobile terminal 10 and the grammar to the voice recognition server 70, using its own grammar database;

Comparing, by the voice recognition server 70, the received voice data with the words in the grammar and notifying the natural sound processing server 40 of the result value;

Requesting, by the natural sound processing server 40, a web page from the web server 30 when the received result value calls for one, or transmitting the requested data directly to the mobile terminal 10 when no web page is needed;

Finding, by the web server 30, the web page and transmitting it to the mobile terminal 10; and

Transmitting text-to-speech (TTS) processed data to the mobile terminal 10.
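In the step above, the mobile terminal 10 performs feature extraction and vector quantization before anything is sent over the network. The patent does not name the features or the codebook. This Python sketch therefore substitutes two toy features, log energy and zero-crossing rate, for whatever the original used (MFCC-style features are common in practice). It assumes a small codebook trained offline, for example with k-means or the LBG algorithm. The codebook values and frame sizes are hypothetical.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into fixed-length frames (frame_len >= 2)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def extract_features(frame):
    """Toy 2-D feature vector: log energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return (log_energy, zcr)


def quantize(vec, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])))


def encode(samples, codebook, frame_len=4, hop=4):
    """Terminal-side front end: frames -> feature vectors -> codeword indices."""
    return [quantize(extract_features(f), codebook)
            for f in frame_signal(samples, frame_len, hop)]


# Hypothetical two-entry codebook: "silence" and "voiced/noisy" frames.
CODEBOOK = [(-23.0, 0.0), (0.0, 1.0)]
print(encode([0, 0, 0, 0, 1, -1, 1, -1], CODEBOOK))
```

The terminal sends only the list of codeword indices, which is far smaller than the raw audio. This is the point of splitting the work between the terminal and the servers.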

When the data to be received by the mobile terminal 10 is large, the mobile terminal 10 stores its path name, and the natural sound processing server 40 transmits the stored data to the mobile terminal 10 upon receiving a request from the connected mobile terminal 10.
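The natural sound processing server 40 chooses among three replies: it requests a web page, it sends data directly, or, for large data, it keeps the data and hands the terminal a path name to request later. A minimal Python sketch of that decision follows. It assumes a lookup table of results that server 40 can answer itself, a byte-size threshold for "large", and a path naming scheme. All three are hypothetical, since the patent gives none of them.

```python
import uuid


class NaturalSoundServer:
    """Hypothetical model of server 40's reply logic after recognition."""

    def __init__(self, fetch_web_page, local_data, threshold=1024):
        self.fetch_web_page = fetch_web_page  # stand-in for a request to web server 30
        self.local_data = local_data          # result value -> data server 40 can serve itself
        self.threshold = threshold            # hypothetical "large data" cutoff in bytes
        self.stored = {}                      # path name -> data held for later download

    def respond(self, result):
        """Return (kind, payload) to send to mobile terminal 10."""
        data = self.local_data.get(result)
        if data is None:
            # No local data for this result: a web page is needed.
            return ("web_page", self.fetch_web_page(result))
        if len(data) > self.threshold:
            # Large data: keep it on the server and give the terminal a path name.
            path = "/stored/" + uuid.uuid4().hex
            self.stored[path] = data
            return ("path", path)
        # Small data: send it directly.
        return ("data", data)

    def download(self, path):
        """Serve previously stored data when the terminal requests its path."""
        return self.stored[path]


server = NaturalSoundServer(lambda r: f"<html>{r}</html>",
                            {"time": b"12:00", "ringtone": b"x" * 2000})
print(server.respond("news"))
print(server.respond("time"))
kind, path = server.respond("ringtone")
print(kind, len(server.download(path)))
```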

A data control method using a distributed speech processing device, comprising:

Converting the voice signal input by the user into a combination of voice vectors through feature extraction and vector quantization, and transmitting the resulting voice data from the mobile terminal 10 to the natural sound processing server 40;

Comparing, by the speech recognition server 70, the speech data with the words in the grammar and transmitting the result value to the natural sound processor 80;

Analyzing, by the natural sound processor 80, the input data, performing natural sound processing on it, and transmitting the result to the natural sound processing server 40; and

Converting, by the natural sound processing server 40, the natural-sound-processed data into text, and either transmitting the text directly to the receiving mobile terminal 90, or first transmitting it to the mobile terminal 10 for confirmation and, upon receiving a confirmation signal from the mobile terminal 10, transmitting it to the receiving mobile terminal 90.
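The message path above ends with an optional confirmation round trip. The server may send the recognized text straight to the receiving terminal 90, or it may first show the text to the sender. A minimal Python sketch follows. The `to_text` helper is a hypothetical stand-in for the natural sound processing and text conversion, and both delivery channels are modeled as plain callables. Neither detail comes from the patent.

```python
def to_text(words):
    """Hypothetical stand-in for natural sound processing: words -> sentence."""
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


def deliver_text_message(text, send_to_receiver, confirm_with_sender=None):
    """Send text to receiving terminal 90, optionally after sender confirmation.

    confirm_with_sender models the round trip to mobile terminal 10: it gets
    the text and returns True if the user accepts it. When it is None, the
    text is forwarded directly.
    """
    if confirm_with_sender is not None and not confirm_with_sender(text):
        return False  # sender rejected the text; nothing is delivered
    send_to_receiver(text)
    return True


inbox = []  # stands in for receiving mobile terminal 90
msg = to_text(["see", "you", "at", "six"])
deliver_text_message(msg, inbox.append, confirm_with_sender=lambda t: True)
print(inbox)
```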

A data control system using a distributed speech processing device, comprising:

A mobile terminal 10 that accesses the web server 30, receives a VXML tag, extracts parameters from the input voice signal, vector-quantizes them, and transmits the preprocessed data to the natural sound processing server 40, which is connected to it over a data channel of the mobile communication network;

A web server 30 that transmits a VXML tag to the mobile terminal 10 in response to a connection request, and transmits a web page to the mobile terminal 10 upon receiving a web page transmission command from the natural sound processing server 40;

A natural sound processing server 40 that receives the preprocessed voice data from the mobile terminal 10 and transmits it to the voice recognition server 70 together with the corresponding grammar; receives the processed value from the voice recognition server 70; transmits or stores, in synchronization with the web server 30, the web pages, data, or paths needed by the mobile terminal 10; and controls voice recognition so that the mobile terminal 10 is provided with the required voice output at the end of each step; and

A voice recognition server 70 that receives the feature-extracted and vector-quantized data together with a grammar from the natural sound processing server 40, performs voice recognition through a comparison process, and transmits the result value to the natural sound processing server 40; wherein the data control system implements the method according to claim 1.
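The voice recognition server 70 is described as comparing the received data with the words in the grammar. The comparison method is not specified. Systems of this era typically used dynamic time warping or HMMs. The Python sketch below uses Levenshtein edit distance over codeword-index sequences as a simple, clearly substituted stand-in. It assumes each grammar word has a reference codeword template, and these templates are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two codeword-index sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def recognize(codes, grammar, reject_above=None):
    """Pick the grammar word whose template is closest to the received codes.

    grammar maps each word to a reference codeword sequence (hypothetical
    templates). Returns (word, distance), or (None, distance) when the best
    match is worse than reject_above. grammar must not be empty.
    """
    word = min(grammar, key=lambda w: edit_distance(codes, grammar[w]))
    dist = edit_distance(codes, grammar[word])
    if reject_above is not None and dist > reject_above:
        return (None, dist)
    return (word, dist)


GRAMMAR = {"news": [0, 1, 1, 2], "weather": [3, 3, 4, 5, 5]}
print(recognize([0, 1, 2], GRAMMAR))
```

The returned pair is what the sketch treats as the "result value" passed back to the natural sound processing server 40. The optional rejection threshold lets the server re-prompt instead of acting on a poor match.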

A data control system using a distributed speech processing device, comprising:

A web server 30 that transmits a VXML tag to the mobile terminal 10 in response to a connection request;

A natural sound processing server 40 that receives the preprocessed voice data from the mobile terminal 10, transmits it to the voice recognition server 70 together with the corresponding grammar, receives the natural-sound-processed result value from the natural sound processor 80, converts it into text data, and transmits the text data to the mobile terminal 10 or to the receiving mobile terminal 90;

A voice recognition server 70 that receives the feature-extracted and vector-quantized data together with a grammar from the natural sound processing server 40, performs voice recognition through a comparison process, and transmits the result value to the natural sound processing server 40; and

A natural sound processor 80 that performs natural sound processing on the voice data received from the voice recognition server 70 and transmits the result to the natural sound processing server 40; wherein the data control system implements the method according to claim 3.
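In both system claims, the web server 30 answers a connection request with a VXML tag, but the document content is not given. As an illustration only, this Python sketch builds a minimal VoiceXML 2.0 document with `xml.etree.ElementTree`. It contains one form, one field with a prompt, and an inline SRGS grammar listing the allowed words. The form id, field name, prompt text, and word list are all hypothetical.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_menu(prompt_text, choices):
    """Return a minimal VoiceXML 2.0 document asking the user to say one choice."""
    # Writing xmlns as a plain attribute makes it the default namespace on output.
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "voice_menu"})
    field = ET.SubElement(form, "field", {"name": "choice"})
    ET.SubElement(field, "prompt").text = prompt_text
    # Inline SRGS grammar: the recognizer may only return one of these words.
    grammar = ET.SubElement(field, "grammar",
                            {"version": "1.0", "root": "choice", "mode": "voice"})
    rule = ET.SubElement(grammar, "rule", {"id": "choice", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")
    for word in choices:
        ET.SubElement(one_of, "item").text = word
    return ET.tostring(vxml, encoding="unicode")


doc = build_vxml_menu("Say news or weather.", ["news", "weather"])
print(doc)
```

The word list in the inline grammar is what ties this document to the grammar-based comparison in the voice recognition server 70. The sketch does not show how the original system synchronized the two.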