KR20040032523A

KR20040032523A - Method and Apparatus for interfacing internet site of mobile telecommunication terminal using voice recognition

Info

Publication number: KR20040032523A
Application number: KR1020020061710A
Authority: KR
Inventors: 전윤호
Original assignee: 와이더덴닷컴 주식회사
Priority date: 2002-10-10
Filing date: 2002-10-10
Publication date: 2004-04-17
Also published as: KR100486030B1

Abstract

PURPOSE: An Internet site access apparatus for a mobile communication terminal using voice recognition, and a method therefor, are provided so that a voice recognition server recognizes the voice input by a mobile terminal user and the terminal moves to the desired Internet site. CONSTITUTION: A mobile communication terminal (100) accesses the Internet through a web browser and transmits voice data and the current URL (Uniform Resource Locator). A multimodal server (200) maps a voice recognition grammar by using the URL and generates the target URL of the new page to be accessed. A voice recognition server (300) recognizes the voice by using the voice data and the voice recognition grammar transmitted from the multimodal server (200). A gateway (400) and a CP (Content Provider) web server (500) allow the mobile communication terminal (100) to access the Internet.

Description

Method and Apparatus for interfacing internet site of mobile telecommunication terminal using voice recognition

The present invention relates to an Internet search apparatus and method for a mobile communication terminal using voice recognition, and more particularly to an apparatus and method in which a voice recognition server recognizes the voice input through the mobile communication terminal and the terminal moves to the desired Internet site.

As mobile communication terminals have become widespread, the need to use information on the Internet from these terminals has grown. The hardware performance, network speed, screen size, and input devices of such terminals differ markedly from those of a PC, so Internet browsers and content designed for conventional PCs and wired Internet networks are difficult to use as they are. These problems have been addressed through new content formats and input methods such as WAP (for example, navigation using numeric buttons), but there is a limit to how many menus and links can be shown at once on the small screen of a mobile communication terminal. Content is organized as a multi-level tree hierarchy, so to move from the menu of the initially configured web page to the web page holding the desired material, the user must press specific keys repeatedly along the chain of links before reaching the final content. Consequently, entering text such as URLs and clicking special function keys to make successive selections down the tree increases the time required for access.

Attempts have been made to resolve this inconvenience using voice recognition technology. As shown in FIG. 1, one such attempt uses a mobile communication terminal with a built-in voice recognizer, comprising a button input unit (10) that receives commands from the user, a memory (11) that stores data, a display unit (12) that displays predetermined data to the user, a data transceiver (13) that handles data transmission and reception with other mobile communication terminals, a CPU (14) that processes the data, and a voice controller (15) that handles voice input and output through a microphone (16) that receives sound and a speaker (17) that outputs sound. However, a terminal capable of voice recognition must include a separate voice recognition module, and it can process only voice data identical to the voice commands stored in the memory (11). It therefore consumes the terminal's storage resources, and the separate module takes up space inside the terminal.

FIG. 2 is a diagram showing a configuration for accessing the Internet using a conventional mobile communication terminal, consisting of a mobile communication terminal (20), a base station (21), a WAP gateway (22), and a web server (23). When the Internet access button of the button input unit (10) of the terminal (20) is pressed, the terminal (20) connects through the base station (21) to the WAP gateway (22), which enables connection to the Internet web server (23). Data for the initially configured web page is received from the web server (23) through the data transceiver (13) and presented to the user through the display unit (12). Voice XML and voice recognition technology can be added to this configuration to provide services through commands or voice, free of the constraints of the terminal's narrow screen and limited key input. However, this approach is not compatible with existing content or web browsers, so separate content must be produced.

Accordingly, to solve the above problems, an object of the present invention is to provide an Internet site access apparatus for a mobile communication terminal using voice recognition, which allows Internet sites to be accessed and searched using the user's voice without changing the hardware of the mobile communication terminal or the format of wireless Internet content.

Another object of the present invention is to provide an Internet site access method for a mobile communication terminal using voice recognition, which allows Internet sites to be accessed and searched using the voice of the mobile communication terminal user.

FIG. 1 is a block diagram showing the configuration of a conventional general mobile communication terminal.

FIG. 2 is a diagram showing a configuration for accessing the Internet using a conventional mobile communication terminal.

FIG. 3 is a diagram showing the configuration of an Internet site access apparatus for a mobile communication terminal using voice recognition according to the present invention.

FIG. 4 is a block diagram showing the configuration of the mobile communication terminal of FIG. 3.

FIG. 5 is a block diagram showing the configuration of the servers for voice recognition of FIG. 3.

FIG. 6 is a flowchart showing the process by which a mobile communication terminal accesses an Internet site using voice recognition according to the present invention.

FIG. 7 is a flowchart showing the process by which the mobile communication terminal of FIG. 6 accesses an Internet site from an offline state.

FIG. 8 is a flowchart showing the process by which the mobile communication terminal of FIG. 6 accesses an Internet site from an online state.

FIG. 9 is a flowchart showing the voice recognition grammar mapping process for the voice recognition of FIG. 6.

(Explanation of reference numerals for the main parts of the drawings)

100: mobile communication terminal
110: microphone
120: EVRC encoder
130: multimodal module
140: browser
150: wireless module
200: multimodal server
210: network connection unit
220: voice data conversion unit
230: voice recognition grammar mapping unit
240: global grammar mapping unit
250: target URL generation unit
260: database
300: voice recognition server
400: WAP gateway
500: CP web server

To achieve the above object, the present invention comprises: a mobile communication terminal having a browser that accesses the Internet through a wireless network, an encoder that converts voice data input through a microphone, and multimodal means that transmits, to a multimodal server, the URL information of the site currently being accessed as obtained from the browser, the voice data converted by the encoder, and information about the mobile communication terminal; a multimodal server that determines the grammar needed for voice recognition from the current URL information sent by the terminal, sends that voice recognition grammar together with the voice data to a voice recognition server, and generates the target URL to be accessed from the result recognized by the voice recognition server and sends it to the terminal; a voice recognition server that recognizes the voice using the voice data and the voice recognition grammar sent from the multimodal server and sends the recognized result to the multimodal server; and a web server that the terminal accesses using the target URL sent from the multimodal server.

The present invention also provides a method for use in an Internet access apparatus for a voice recognition mobile communication terminal, the apparatus consisting of a mobile communication terminal, a multimodal server that maps a voice recognition grammar using the voice data and URL sent from the terminal and generates a target URL, a voice recognition server that performs voice recognition using the data sent from the multimodal server, and a CP web server that the terminal accesses. The method comprises the steps of: determining whether the web browser of the terminal has been launched and, if so, transmitting the voice data input to the terminal, the URL information of the current page, and information about the terminal to the multimodal server; generating the voice recognition grammar needed for recognition using the URL of the current page sent from the terminal, and sending the generated grammar together with the voice data to the voice recognition server; sending the result of voice recognition, performed using the voice data and grammar sent from the multimodal server, back to the multimodal server; generating from the recognition result the target URL of the new page to move to and sending it to the terminal; sending the target URL received from the multimodal server to the CP server to request the new page; and outputting the new page sent from the CP server on the terminal.

Hereinafter, a preferred embodiment of the present invention is described with reference to the accompanying drawings.

FIG. 3 is a diagram showing the configuration of the Internet site access apparatus for a mobile communication terminal using voice recognition according to the present invention. In FIG. 3, the apparatus consists of a mobile communication terminal (100) that accesses the Internet through a web browser and transmits voice data and the current URL (uniform resource locator); a multimodal server (200) that maps a voice recognition grammar using the current URL sent from the terminal (100) and generates the target URL of the new page to move to; a voice recognition server (300) that recognizes the voice using the voice data and the voice recognition grammar sent from the multimodal server (200); and a gateway (400) and a CP web server (Content Provider Web Server, 500) that allow the terminal (100) to access the Internet.

FIG. 4 is a block diagram showing the configuration of the mobile communication terminal of FIG. 3. The terminal (100) consists of a call unit (not shown), a microphone (110), an encoder (120), a multimodal module (130), a browser (140), and a wireless module (150).

The encoder (120) compresses and converts the voice input through the microphone (110). The encoder (120) is an 8 kbps EVRC or 13 kbps QCELP encoder, and a different type of encoder may be used depending on the type of terminal. Preferably, the encoder (120) is an 8 kbps EVRC encoder.

The multimodal module (130) transmits the voice data compressed by the encoder (120), the URL information of the current page obtained from the browser (140), and terminal information (for example, the browser type and the mobile phone number) through the wireless module (150) to the voice recognition system (not shown).
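As an illustration only, the three items the multimodal module (130) sends can be modeled as a single request record. The sketch below is not part of the disclosed embodiment: the class name, the field names, and the empty-utterance check are assumptions introduced for the example, and the document does not specify a wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionRequest:
    """Hypothetical record of what the multimodal module (130) transmits."""
    voice_data: bytes   # compressed output of the encoder (120), e.g. EVRC frames
    current_url: str    # URL of the page currently shown in the browser (140)
    browser_type: str   # terminal information: browser type
    phone_number: str   # terminal information: mobile phone number


def build_request(voice_data: bytes, current_url: str,
                  browser_type: str, phone_number: str) -> RecognitionRequest:
    # Rejecting an empty utterance is an assumption of this sketch,
    # not a behavior stated in the description.
    if not voice_data:
        raise ValueError("no voice data to send")
    return RecognitionRequest(voice_data, current_url, browser_type, phone_number)
```

Keeping the record immutable reflects that the server side only reads these values; any real implementation would also need a serialization step that is outside the scope of this sketch.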

FIG. 5 is a block diagram showing the configuration of the servers for voice recognition of FIG. 3. In FIG. 5, the servers for voice recognition consist of a multimodal server (200), which produces the unique voice recognition grammar needed for recognition and generates the target URL to move to according to the recognition result, and a voice recognition server (300), which recognizes the voice using the voice data and the voice recognition grammar sent from the multimodal server (200) and sends the recognition result back to the multimodal server (200).

The multimodal server (200) consists of a network connection unit (210), a voice data conversion unit (220), a voice recognition grammar mapping unit (230), a global grammar mapping unit (240), a target URL generation unit (250), and a database (260).

The network connection unit (210) connects to the mobile communication terminal (not shown) to send and receive data, preferably using the TCP/IP protocol.

The voice data conversion unit (220) is connected to the network connection unit (210) and converts the voice data passed on by the network connection unit (210) into a form that the voice recognition engine can process, preferably PCM. If the voice recognition engine can directly process the compressed format sent by the terminal (for example, the EVRC format), this conversion may be omitted.
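The rule that conversion is skipped when the engine already accepts the terminal's format can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the format labels, and the decoder registry are hypothetical, and no real EVRC decoder is implied; the caller supplies whatever decoder it has.

```python
from typing import Callable, Dict, Set, Tuple


def prepare_for_engine(
    voice_data: bytes,
    source_format: str,
    engine_formats: Set[str],
    decoders: Dict[str, Callable[[bytes], bytes]],
) -> Tuple[bytes, str]:
    """Return (data, format) in a form the recognition engine can process."""
    # If the engine handles the compressed format directly, omit conversion.
    if source_format in engine_formats:
        return voice_data, source_format
    # Otherwise decode to PCM using a caller-supplied (hypothetical) decoder.
    decode_to_pcm = decoders[source_format]
    return decode_to_pcm(voice_data), "PCM"
```

Passing the decoders in as a table keeps the sketch independent of any particular codec library.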

The voice recognition grammar mapping unit (230) finds, from the URL passed on by the network connection unit (210), the unique voice recognition grammar valid for that page and maps it to the URL, so that the voice commands to be accepted on the current page are associated with that page. In other words, a voice recognition grammar specifies the voice commands to be accepted for a given URL, and the list of voice commands that can be input at that URL constitutes the grammar. For example, on a page that displays stock quotes, the list of the names of all listed companies serves as the voice recognition grammar, so that speaking a company's name returns that company's current share price, and this grammar is mapped to the input URL.
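Because the grammar is described as the list of voice commands valid at a given URL, the mapping can be pictured as a table lookup. The sketch below is illustrative only: the table contents, the page URL used as a key, and the function name are assumptions, and the description does not say how an unknown URL is handled (an empty list is returned here).

```python
from __future__ import annotations

# Hypothetical grammar table: page URL -> voice commands valid on that page.
# The second company name is a placeholder added for the example.
PAGE_GRAMMARS: dict[str, list[str]] = {
    "http://stock.nate.com": ["Samsung Electronics", "Example Motors"],
}


def map_page_grammar(current_url: str) -> list[str]:
    """Return the page-specific command list for the given URL."""
    # Copy the list so callers cannot modify the stored grammar.
    return list(PAGE_GRAMMARS.get(current_url, []))
```

For instance, `map_page_grammar("http://stock.nate.com")` yields the company-name list, which the recognizer would then use to constrain what it listens for.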

The global grammar mapping unit (240) combines the unique voice recognition grammar determined by the voice recognition grammar mapping unit (230) with a voice recognition grammar representing the voice commands that are valid in common regardless of the current page of the browser (not shown), for example help and bookmark.
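Combining the page-specific grammar with the global one amounts to a union of two command lists. The following is a minimal sketch, assuming grammars are plain lists of command strings; the ordering (page commands first) and the removal of duplicates are choices made for the example rather than requirements stated in the description.

```python
from typing import Iterable, List

# Commands valid on every page; "help" and "bookmark" are the examples
# given in the description.
GLOBAL_COMMANDS = ["help", "bookmark"]


def combine_grammars(page_commands: Iterable[str],
                     global_commands: Iterable[str] = GLOBAL_COMMANDS) -> List[str]:
    """Merge page-specific and global commands, keeping first occurrences."""
    combined: List[str] = []
    for command in list(page_commands) + list(global_commands):
        if command not in combined:
            combined.append(command)
    return combined
```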

The target URL generation unit (250) is connected to the network connection unit (210) and the voice recognition server (300). Using the received terminal information and the command given by the voice recognition result, it generates the URL designated by that command as the target URL to move to and passes it to the network connection unit (210). For example, when "Samsung Electronics" is recognized on the "stock quotes" site, the URL among the sub-pages of the stock quote site at which the current share price of Samsung Electronics can be viewed is set as the target URL.

The voice recognition server (300) is connected to the multimodal server (200) and recognizes the voice using the voice data and the voice recognition grammar sent from the multimodal server (200). The present invention uses a known voice recognition system.

FIG. 6 is a flowchart showing the process by which a mobile communication terminal accesses an Internet site using voice recognition according to the present invention. Referring to FIGS. 3 to 6, the controller (not shown) of the terminal (100) determines whether the web browser has been launched (S10). If the web browser (140) has not been launched, then as shown in FIG. 7 the controller detects whether a specific key has been pressed (S11). If the key has been pressed, the voice input through the microphone is converted by the encoder and stored, and the stored data is sent to the multimodal server together with the URL of the initially configured page (for example, the URL of "Nate", http://wap.nate.com) (S13). If the key has not been pressed in step S11, the terminal performs its ordinary functions (S12).

If the web browser (140) has been launched in step S10, the terminal performs step S20, in which the voice data input to the terminal (100), the URL information of the current page, and information about the terminal are sent to the multimodal server (200).
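The branch between the offline path (S11 to S13) and the online path (S20) mainly decides which URL accompanies the voice data. The sketch below illustrates that decision only; the function and parameter names are hypothetical, and returning `None` to mean "no voice request, ordinary terminal operation" is a convention of the example.

```python
from typing import Optional

# Example start page quoted in the description for the offline case.
DEFAULT_START_URL = "http://wap.nate.com"


def url_for_voice_request(browser_running: bool,
                          hot_key_pressed: bool,
                          current_url: Optional[str] = None,
                          start_url: str = DEFAULT_START_URL) -> Optional[str]:
    """Pick the URL to send with the voice data, or None if nothing is sent."""
    if browser_running:
        # S10 -> S20: the browser is running, so use the current page URL.
        return current_url
    if hot_key_pressed:
        # S11 -> S13: browser not running, specific key pressed; use the start page.
        return start_url
    # S12: ordinary terminal functions, no voice request is made.
    return None
```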

As shown in FIG. 8, step S20 proceeds as follows. The terminal determines whether input has occurred at the microphone (110) (S21); if no voice input has occurred, ordinary web browsing continues (S22). If voice input has occurred in step S21, the encoder (120) converts the voice input from the microphone (110) and passes it to the multimodal module (130) (S23). Once the converted voice data has been passed on in step S23, the multimodal module (130) requests browser information and the URL information of the current page from the running web browser (140) (S24), and sends the voice data converted in step S23, the current-page URL information obtained in step S24, and terminal information (for example, the browser type and the mobile phone number) through the wireless module (150) to the multimodal server (200) (S25). Step S21 may also be performed after the web browser (140) has been launched and a specific key has been pressed.

After step S20, the voice recognition grammar needed for recognition is generated using the URL of the current page sent from the terminal (100), and the generated grammar is sent together with the voice data to the voice recognition server (300) (S30).

In step S30, as shown in FIG. 9, the multimodal server connects with the terminal (100) requesting voice recognition and receives the voice data, the current URL, and the terminal information sent from it (S31); maps the current URL information received in step S31 to the unique voice recognition grammar corresponding to that URL (S33); and combines the unique grammar mapped in step S33 with the global voice recognition grammar that is valid on any page (S34).

The mapping in step S33 uses one of the following three methods: (1) mapping with the URLs stored in the database (260) of the multimodal server (200) and the voice recognition grammars corresponding to those URLs, which are also stored in the database (260); (2) storing only the URLs in the multimodal server (200) and the unique grammar corresponding to each URL on the external CP server (500), and, when needed, mapping the grammar held on the external CP server (500) designated by the URL; or (3) designating the URL of the grammar valid for a page, either through a minimal extension to the syntax of the display script language or, without any extension, through a predetermined marker in a meta tag or a script comment, then obtaining the display script from the CP's web server, detecting the grammar URL in it, and using that URL to map the voice recognition grammar stored in the database (260) of the multimodal server (200). Preferably, all data is mapped on the multimodal server.
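The meta-tag variant of the third method can be illustrated with a small parser that scans a fetched page for the grammar URL. This is a sketch under stated assumptions: the meta name `voice-grammar` and the use of the `content` attribute are hypothetical conventions chosen for the example, since the description only says that a predetermined marker is placed in a meta tag or script comment. Python's standard `html.parser` module is used for brevity, although real WAP pages may be written in a different markup language.

```python
from html.parser import HTMLParser
from typing import Optional


class GrammarUrlFinder(HTMLParser):
    """Collects the first <meta name=... content=...> carrying a grammar URL."""

    def __init__(self, meta_name: str = "voice-grammar") -> None:
        super().__init__()
        self.meta_name = meta_name
        self.grammar_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching meta tag is used.
        if tag != "meta" or self.grammar_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == self.meta_name:
            self.grammar_url = attributes.get("content")


def find_grammar_url(page_source: str,
                     meta_name: str = "voice-grammar") -> Optional[str]:
    """Return the grammar URL declared in the page, or None if absent."""
    finder = GrammarUrlFinder(meta_name)
    finder.feed(page_source)
    finder.close()
    return finder.grammar_url
```

Because the marker lives in a meta tag, a browser that knows nothing about voice recognition simply ignores it, which matches the stated aim of leaving existing content formats unchanged.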

Step S30 further includes converting the voice data sent from the terminal (100) to PCM so that the voice recognition engine can process it.

After step S30, the voice recognition server (300) recognizes the voice using the received voice data and voice recognition grammar, and sends the recognized result to the multimodal server (200) (S40). The voice recognition method used in step S40 may be any known method, and is preferably a speaker-independent method.

Using the voice recognition result sent in step S40, a target URL for the new page to move to is generated and sent to the terminal (100) (S50). In step S50 the target URL is basically specified in the voice recognition grammar, but rather than being a fixed URL it may contain variables to be substituted. For example, if the command recognized on the stock quote page is "Samsung Electronics", the target URL is http://stock.nate.com?name=${x}, with the recognized words "Samsung Electronics" inserted in place of ${x}. When a multi-slot voice recognition engine is used, which can extract more than one keyword from a single sentence, slot names can be mapped to the names of query variables in the target URL. For example, when the utterance "Wake-up call at 6:30 tomorrow, please" is input, the engine recognizes it and, in the generated target URL http://pim.nate.com?action=newitem&time=${t}&item=${item}&, "6:30 tomorrow" is inserted for ${t} and "wake-up call" for ${item}.
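The substitution of recognized words into `${...}` placeholders can be sketched with a regular expression. This is an illustrative sketch, not the disclosed implementation: the function name is hypothetical, and percent-encoding the inserted values is an assumption added so that spaces and non-ASCII text produce a well-formed URL; the description does not mention encoding.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches placeholders of the form ${name}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def fill_target_url(template: str, slots: Mapping[str, str]) -> str:
    """Replace each ${name} in the template with the recognized slot value."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        # A missing slot raises KeyError rather than leaving the placeholder in.
        return quote(slots[name], safe="")
    return _PLACEHOLDER.sub(replace, template)
```

With the single-slot template from the description, `fill_target_url("http://stock.nate.com?name=${x}", {"x": "Samsung"})` gives `http://stock.nate.com?name=Samsung`. A multi-slot result is handled the same way, with one dictionary entry per slot name.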

After step S50, the generated target URL is sent to the CP server (500) to request the new page (S60). The CP server (500) retrieves the information for the new page and sends the URL information of the new page to the terminal (100), where it is output through the terminal's display device (not shown) (S70).

As described above, the present invention has the advantage that a wireless Internet user with a mobile communication terminal can conveniently reach the desired information using either the keypad or voice, and the further advantage that it can be used without modifying the hardware of mobile communication terminals already in use or the format of wireless Internet content.

The present invention has been illustrated and described above with respect to a specific preferred embodiment. However, the present invention is not limited to the embodiment described, and a person of ordinary skill in the art to which the invention pertains may make various modifications without departing from the technical idea of the invention as set out in the claims below.

Claims

An Internet access apparatus for a mobile communication terminal using voice recognition, comprising:

a mobile communication terminal having a browser that accesses the Internet through a wireless network, an encoder that converts voice data input through a microphone, and multimodal means for transmitting, to a multimodal server, the URL information of the current site detected from the browser, the voice data converted by the encoder, and information about the mobile communication terminal;

a multimodal server that determines the grammar required for voice recognition from the current URL information transmitted from the mobile communication terminal, transmits the voice recognition grammar together with the voice data to a voice recognition server, and generates a target URL to be accessed from the result recognized by the voice recognition server and transmits it to the mobile communication terminal;

a voice recognition server that recognizes the voice using the voice data and the voice recognition grammar transmitted from the multimodal server, and transmits the recognized result to the multimodal server; and

a web server that the mobile communication terminal accesses using the target URL transmitted from the multimodal server.

The apparatus of claim 1, wherein the multimodal server comprises: a voice recognition grammar mapping unit that searches a database, using the current URL, for the unique voice recognition grammar valid on that page and maps it to the current URL;

a global grammar mapping unit that combines the unique voice recognition grammar mapped by the voice recognition grammar mapping unit with the global voice recognition grammar valid at all URLs; and

a target URL generator that uses the recognition result to generate the target URL to which to move from the URL of the current page.
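The three units recited above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a grammar is a dictionary from a spoken phrase to a destination URL; the claim does not specify a grammar format. The function names, the example URLs, and the rule that page-specific phrases override global ones are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical database of page-specific grammars, keyed by page URL.
PAGE_GRAMMARS: Dict[str, Dict[str, str]] = {
    "http://example.com/menu": {
        "news": "http://example.com/news",
        "weather": "http://example.com/weather",
    },
}

# Hypothetical global grammar: commands valid at every URL.
GLOBAL_GRAMMAR: Dict[str, str] = {
    "home": "http://example.com/menu",
}


def map_page_grammar(current_url: str) -> Dict[str, str]:
    """Grammar mapping unit: look up the grammar valid on the current page."""
    return dict(PAGE_GRAMMARS.get(current_url, {}))


def merge_global_grammar(page_grammar: Dict[str, str]) -> Dict[str, str]:
    """Global grammar mapping unit: add the commands valid at all URLs."""
    merged = dict(GLOBAL_GRAMMAR)
    # Assumption: a page-specific phrase wins over a global one on conflict.
    merged.update(page_grammar)
    return merged


def generate_target_url(grammar: Dict[str, str], phrase: str) -> Optional[str]:
    """Target URL generator: turn the recognized phrase into a URL."""
    return grammar.get(phrase)
```

Under these assumptions, a phrase recognized on an unknown page still resolves if it belongs to the global grammar, which is the behavior the global grammar mapping unit is meant to provide.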

The Internet access apparatus of claim 1, wherein the multi-modal server further comprises a voice data converter for converting the voice data encoded by the mobile communication terminal into a predetermined format.

In an Internet access apparatus of a mobile communication terminal using voice recognition, the apparatus consisting of a mobile communication terminal, a multimodal server that maps a voice recognition grammar using the voice data and the URL transmitted from the mobile communication terminal and generates a target URL, a voice recognition server that performs voice recognition using the data transmitted from the multimodal server, and a CP web server to which the mobile communication terminal connects, an Internet access method comprising:

determining whether a web browser of the mobile communication terminal is running and, when the web browser is running, transmitting the voice data input to the mobile communication terminal, the URL information of the current page, and information of the mobile communication terminal to the multimodal server;

generating the voice recognition grammar required for voice recognition using the URL of the current page transmitted from the terminal, and transmitting the generated voice recognition grammar to the voice recognition server together with the voice data;

transmitting, to the multimodal server, the result of voice recognition performed using the voice data and the voice recognition grammar transmitted from the multimodal server;

generating, using the recognition result, the target URL of the new page to move to, and transmitting it to the mobile communication terminal;

requesting the new page to move to by transmitting the target URL received from the multimodal server to the CP server; and

outputting, on the terminal, the new page transmitted from the CP server.
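The sequence of steps in this method can be traced with a small in-process simulation. This is only a sketch under stated assumptions: the voice recognition server is replaced by a stub that treats the voice bytes as text, the CP web server is a dictionary of pages, and every name here (`VoiceRequest`, `stub_recognizer`, `navigate_by_voice`, and so on) is hypothetical rather than taken from the patent. Redisplaying the current page when nothing is recognized is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Recognizer = Callable[[bytes, List[str]], Optional[str]]


@dataclass
class VoiceRequest:
    """What the terminal sends to the multimodal server."""
    voice_data: bytes
    current_url: str
    terminal_info: str


def stub_recognizer(voice_data: bytes, phrases: List[str]) -> Optional[str]:
    """Stand-in for the voice recognition server (not real recognition).

    It decodes the bytes as text and accepts the result only if the
    grammar supplied by the multimodal server allows that phrase.
    """
    text = voice_data.decode("utf-8", errors="ignore").strip().lower()
    return text if text in phrases else None


def multimodal_server(request: VoiceRequest,
                      grammars: Dict[str, Dict[str, str]],
                      recognizer: Recognizer) -> Optional[str]:
    """Select the grammar by URL, call the recognizer, build the target URL."""
    grammar = grammars.get(request.current_url, {})
    phrase = recognizer(request.voice_data, list(grammar))
    return grammar.get(phrase) if phrase is not None else None


def cp_web_server(url: str, pages: Dict[str, str]) -> str:
    """Stand-in for the CP web server: return the page stored at the URL."""
    return pages.get(url, "404 Not Found")


def navigate_by_voice(request: VoiceRequest,
                      grammars: Dict[str, Dict[str, str]],
                      pages: Dict[str, str],
                      recognizer: Recognizer = stub_recognizer) -> str:
    """Terminal side: obtain the target URL, request the page, return it."""
    target = multimodal_server(request, grammars, recognizer)
    if target is None:
        # Assumption: with no recognized phrase, stay on the current page.
        target = request.current_url
    return cp_web_server(target, pages)
```

Passing the recognizer as a parameter keeps the multimodal server independent of the recognition engine, which mirrors the separation between the two servers in the claim.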

The method of claim 4, wherein the determining of whether the web browser is running comprises: when the web browser is not running, detecting whether a specific key input has occurred; when the specific key input has occurred, converting the voice input through the microphone in the encoder, storing it, and transmitting the stored voice data to the multimodal server together with the URL of the initially set page; and

when the specific key input has not occurred, performing the general functions of the mobile communication terminal.

The method of claim 4, wherein the transmitting of the voice data and the current URL information to the multimodal server comprises: determining whether a voice input has occurred, performing general web surfing when no voice input has occurred, and, when a voice input has occurred, converting the input voice in the encoder and forwarding it;

requesting browser information and the URL information of the current page from the running web browser; and

transmitting the converted voice data, the URL information of the current page, and the mobile communication terminal information.
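The terminal-side branching in this claim, together with the browser-not-running branch of the preceding claim, can be sketched as a single decision function. The sketch assumes a request is a plain dictionary and that the encoder is a pass-through placeholder; `TerminalState`, `encode`, `build_request`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class TerminalState:
    browser_running: bool
    current_url: str      # page shown by the running browser, if any
    home_url: str         # the initially set page
    terminal_info: str


def encode(raw_voice: bytes) -> bytes:
    """Placeholder for the terminal's encoder (codec not specified here)."""
    return raw_voice


def build_request(state: TerminalState,
                  voice: Optional[bytes],
                  hot_key_pressed: bool = False
                  ) -> Optional[Dict[str, Union[bytes, str]]]:
    """Return the message for the multimodal server, or None.

    None means no voice navigation is requested: the terminal carries on
    with general web surfing or its general functions.
    """
    if state.browser_running:
        if not voice:
            return None                 # no voice input: general web surfing
        url = state.current_url         # ask the running browser for its URL
    else:
        if not hot_key_pressed or not voice:
            return None                 # general terminal functions
        url = state.home_url            # start from the initially set page
    return {
        "voice_data": encode(voice),
        "current_url": url,
        "terminal_info": state.terminal_info,
    }
```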

The method of claim 4, wherein the generating of the voice recognition grammar required for voice recognition using the URL of the current page comprises: connecting with the mobile communication terminal requesting voice recognition and receiving the voice data, the current URL, and the terminal information;

mapping the received current URL information to the unique voice recognition grammar corresponding to each URL; and

combining the mapped unique voice recognition grammar with the voice recognition commands usable anywhere.

The method of claim 7, wherein the mapping of the received current URL information to the unique voice recognition grammar corresponding to each URL comprises comparing the current URL with the URLs stored in the multimodal server and mapping it to the unique voice recognition grammar that corresponds to the matching URL in the multimodal server database.

The method of claim 7, wherein the mapping of the received current URL information to the unique voice recognition grammar corresponding to each URL comprises mapping the unique voice recognition grammar on the basis of the current URL, the URLs stored in the multimodal server, and the URLs stored in an external CP web server.

The method of claim 7, wherein the mapping of the received current URL information to the unique voice recognition grammar corresponding to each URL comprises: placing a predetermined marking in the screen display script to designate the URL of the voice recognition grammar valid for the page; obtaining the screen display script from the web server of the CP and detecting the URL of the voice recognition grammar; and mapping the voice recognition grammar stored in the multimodal server using that URL.
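Detecting a grammar URL from a marking in the screen display script might look like the sketch below. The claim only says the marking is "predetermined", so the `<meta name="voice-grammar" content="...">` convention used here is an invented example, as are the class and function names. The parser itself is Python's standard `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class GrammarMarkerParser(HTMLParser):
    """Finds the first <meta name="voice-grammar" content="..."> marker.

    The marker format is a hypothetical convention for this sketch.
    """

    def __init__(self) -> None:
        super().__init__()
        self.grammar_url: Optional[str] = None

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "meta" or self.grammar_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "voice-grammar":
            self.grammar_url = attributes.get("content")


def find_grammar_url(page_source: str) -> Optional[str]:
    """Return the grammar URL marked in the page source, or None."""
    parser = GrammarMarkerParser()
    parser.feed(page_source)
    parser.close()
    return parser.grammar_url
```

With this approach the CP designates the grammar inside the page itself, so the multimodal server does not need a separate table entry for every page URL; it only needs the grammar stored under the URL that the marker names.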

The method of claim 4, wherein the generating of the voice recognition grammar required for voice recognition using the URL of the current page further comprises converting the voice data transmitted from the terminal into PCM.