KR20050035784A

KR20050035784A - Voice supporting web browser through conversion of html contents and method for supporting voice reproduction

Info

Publication number: KR20050035784A
Application number: KR1020030071567A
Authority: KR
Inventors: 이동우; 조수선; 신희숙; 최은정; 한동원
Original assignee: 한국전자통신연구원
Priority date: 2003-10-14
Filing date: 2003-10-14
Publication date: 2005-04-19

Abstract

The present invention relates to a voice-supporting web browser and a voice support method based on HTML content conversion. When a user browses web documents on a mobile terminal or similar device, the invention makes various user interfaces available, such as voice in addition to the visual display, so that the user can browse the web while walking or otherwise on the move without being tied to the screen.

In particular, the web browser of the present invention comprises: a page loader that downloads a document; an HTML parser that, when the document downloaded by the page loader is an HTML (HyperText Markup Language) document, parses the document to generate an HTML DOM (Document Object Model); a content converter that extracts from the DOM the portion requiring voice support, generates a speech recognition grammar according to the document's relationships, and generates a VoiceXML (Voice eXtensible Markup Language) document; a VoiceXML interpreter that parses the VoiceXML document, outputs its content as speech through TTS (Text to Speech), passes the grammar required for speech recognition and the voice command input by the user to a speech recognizer, and receives and processes the word recognized by the speech recognizer; a TTS that outputs as speech the sentences received from the VoiceXML interpreter; and a speech recognizer that delivers the word recognized using the grammar to the VoiceXML interpreter.

Description

Voice Supporting Web Browser Through Conversion of HTML Contents and Method for Supporting Voice Reproduction

The present invention relates to a voice-supporting web browser and a voice support method based on HTML content conversion. More specifically, it provides browser technology that supports voice by inserting VoiceXML into HTML through a content converter, without modifying existing web documents written for PCs (personal computers). As a result, when web documents are browsed on a mobile terminal or similar device, various user interfaces such as voice are available in addition to the visual display, and the user can browse the web while walking or on the move without being tied to the screen.

With recent advances in technology, Internet access is no longer limited to PCs. As a variety of mobile terminals have become widespread, the desire to use the Internet on them has grown.

However, the display resolution of a mobile terminal is limited, and it is not easy to browse the web and read content while on the move.

In addition, mobile terminal users want a multimodal browser that supports a variety of input and output methods, but HTML, the authoring language for web documents, is a markup language designed mainly for visual presentation and does not support multiple modalities.

To address this problem, attempts are being made to add voice support to HTML documents.

For example, voice services using VoiceXML (Voice eXtensible Markup Language) are offered in many places, and most of them are used as a replacement for ARS (Automatic Response System) servers.

This is because VoiceXML makes modification and maintenance simple and easy.

However, such ARS services differ from existing HTML web services. Likewise, a service that applies VoiceXML to a web mail server so that users can check their mail by voice has the drawback that HTML files must be processed and stored as VoiceXML files and a new server must be set up.

Attempts to combine HTML and VoiceXML to offer voice services from a web server include IBM's XHTML+VoiceXML (X+V), while Microsoft has its own specification, SALT (Speech Application Language Tags). In both cases, however, web documents must be rewritten by adding new code to existing HTML documents, and only browsers that support the new specification can receive the service.

Accordingly, the present invention is intended to solve the problems of the prior art described above. Its object is to provide a voice-supporting web browser and a voice support method based on HTML content conversion: a multimodal web browser that adds VoiceXML to existing HTML documents in real time, so that web documents supporting only a visual interface can be output as speech. On devices such as mobile terminals, the content of news articles or bulletin boards on portal sites can then be delivered by voice, allowing the user to browse the web both visually and aurally.

To achieve the above object, a voice-supporting web browser based on HTML (HyperText Markup Language) content conversion according to the present invention comprises: a page loader that downloads a document; an HTML parser that, when the document downloaded by the page loader is an HTML document, parses the document to generate an HTML DOM (Document Object Model); a content converter that extracts from the DOM the portion requiring voice support, generates a speech recognition grammar according to the document's relationships, and generates a VoiceXML document; a VoiceXML interpreter that parses the VoiceXML document, outputs its content as speech through TTS (Text to Speech), passes the grammar required for speech recognition and the voice command input by the user to a speech recognizer, and receives and processes the word recognized by the speech recognizer; a TTS that outputs as speech the sentences received from the VoiceXML interpreter; and a speech recognizer that delivers the word recognized using the grammar to the VoiceXML interpreter.

To achieve the above object, a voice support method based on HTML content conversion according to the present invention comprises the steps of: downloading a document with the page loader of a browser and, when the downloaded document is an HTML document, parsing it to generate an HTML DOM (Document Object Model); extracting from the DOM the portion to be supported by voice and generating a grammar for speech recognition by determining the contextual relationships in the extracted document; generating a VoiceXML document using the speech recognition grammar; and sending the VoiceXML document to a VoiceXML interpreter, which parses it and outputs its content as speech through TTS.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a voice-supporting web browser based on HTML content conversion according to the present invention.

As shown in FIG. 1, the page loader (10) of the voice-supporting web browser downloads a document and passes it to the HTML parser (20). When the document received from the page loader (10) is an HTML document, the HTML parser (20) parses it to generate an HTML DOM (Document Object Model). When the document is a VoiceXML document, it is passed directly to the VoiceXML interpreter (50) and output as speech.

The HTML DOM generated by the HTML parser (20) is passed to the content converter (30), which extracts from the HTML DOM (31) the portion requiring voice support, generates a speech recognition grammar according to the document's relationships, and generates a VoiceXML document (32).

The HTML DOM (31) is also passed to the renderer (40) so that the content of the document can be displayed on the user's terminal.

The VoiceXML document (32) generated by the content converter (30), or a VoiceXML document passed from the HTML parser (20) to the VoiceXML interpreter (50), is parsed, and its content is output as speech through the TTS (Text to Speech) (70).

At this time, the VoiceXML interpreter (50) passes to the speech recognizer (60) the grammar required for speech recognition, taken from the VoiceXML document received from the content converter (30), together with the voice command input by the user, and receives and processes the word recognized by the speech recognizer (60).

That is, the web browser is installed on the client, converts the HTML documents served by web pages into VoiceXML documents in real time, and outputs them as speech.
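The routing between the components of FIG. 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`detect_kind`, `route`), the way the document type is detected, and the callables standing in for the HTML parser (20), content converter (30), renderer (40), and VoiceXML interpreter (50) are all hypothetical and are not taken from the patent.

```python
def detect_kind(content_type: str, body: str) -> str:
    """Classify a downloaded document as 'voicexml', 'html', or 'other'.

    Assumption: the type is guessed from the Content-Type value and the
    first characters of the body; the patent does not say how it is detected.
    """
    ct = content_type.lower()
    head = body.lstrip()[:200].lower()
    if "voicexml" in ct or "<vxml" in head:
        return "voicexml"
    if "html" in ct or "<html" in head:
        return "html"
    return "other"


def route(content_type, body, parse_html, convert, interpret, render):
    """Send a downloaded document through the components of FIG. 1."""
    kind = detect_kind(content_type, body)
    if kind == "voicexml":
        # A VoiceXML document goes straight to the interpreter (50).
        interpret(body)
    elif kind == "html":
        dom = parse_html(body)      # HTML parser (20) builds the DOM (31)
        render(dom)                 # renderer (40) displays the page
        interpret(convert(dom))     # converter (30) output goes to interpreter (50)
    return kind
```

Passing the components in as callables keeps the sketch independent of any particular parser or speech engine.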

FIG. 2 shows the interface of a voice-supporting web browser according to an embodiment of the present invention.

As shown in FIG. 2, most portal sites that offer news articles, bulletin boards, web mail, and the like present titles that carry links. On the title about which the user wants information, the user selects the link with a stylus pen or the right mouse button, depending on the type of user terminal.

A menu window then appears over the linked title, offering the items 'Read with VoiceXML', 'Go to link', and 'Go to link and read with VoiceXML'. The user selects the desired item and receives the linked information as speech or as text.

When the user selects 'Read with VoiceXML', only the core part of the information linked from the headline is delivered to the user as speech, without following the link. When the user selects 'Go to link', the linked content is presented as a document, as with an ordinary web page. When the user selects 'Go to link and read with VoiceXML', the core part of the information linked from the selected title is delivered to the user's terminal as a document and as speech at the same time.
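The three menu items reduce to two independent decisions: whether the browser navigates to the linked page and whether the linked content is spoken. A minimal sketch of that mapping follows; the enum and function names are hypothetical and only illustrate the behavior described above.

```python
from enum import Enum, auto


class MenuChoice(Enum):
    """The three items of the link menu (names are illustrative)."""
    READ_WITH_VOICEXML = auto()
    GO_TO_LINK = auto()
    GO_TO_LINK_AND_READ = auto()


def plan_actions(choice: MenuChoice) -> tuple[bool, bool]:
    """Return (navigate, speak) for a menu choice.

    navigate: display the linked page as a document.
    speak:    read the core part of the linked content aloud.
    """
    navigate = choice in (MenuChoice.GO_TO_LINK, MenuChoice.GO_TO_LINK_AND_READ)
    speak = choice in (MenuChoice.READ_WITH_VOICEXML, MenuChoice.GO_TO_LINK_AND_READ)
    return navigate, speak
```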

FIG. 3 shows the portion of the web browser document of FIG. 2 that is extracted into a VoiceXML document. As described for FIG. 2, only the core content linked from a headline of the web page is provided to the user as speech or as a document.

That is, most portal sites that offer news articles, bulletin boards, web mail, and the like present titles that carry links, and clicking a title leads to a web page with the details. The destination page often contains miscellaneous information in addition to the main article or content. In that case, when the content converter of the voice-supporting web browser generates the VoiceXML document, it discards the miscellaneous content and, in the step of extracting the portion to be supported by voice, extracts the main core content (such as the boxed portion in the figure) to generate the VoiceXML document. In most cases, the block that contains the selected link title is the main block.

Removing the unnecessary parts in this way simplifies the user interface and also improves execution speed.
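One way to picture the main-block extraction is the sketch below. It assumes a simple heuristic that the text only hints at: among block-level elements, pick the smallest one whose text contains the selected link title. The set of block tags, the tiny tree builder, and the function name `main_block_text` are illustrative assumptions, not the patent's algorithm.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"div", "td", "table", "article", "section"}  # assumed block containers
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}   # elements with no end tag


class Node:
    def __init__(self, tag, parent=None):
        self.tag, self.parent, self.children = tag, parent, []

    def text(self):
        """Concatenate the text of this node and all of its descendants."""
        parts = [c if isinstance(c, str) else c.text() for c in self.children]
        return " ".join(p.strip() for p in parts if p.strip())


class TreeBuilder(HTMLParser):
    """Builds a very small DOM-like tree; not a full HTML5 parser."""

    def __init__(self):
        super().__init__()
        self.root = self.cur = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID_TAGS:
            self.cur = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.cur = node.parent

    def handle_data(self, data):
        if data.strip():
            self.cur.children.append(data)


def main_block_text(html, title):
    """Return the text of the smallest block element containing the title."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = None
    stack = [builder.root]
    while stack:
        node = stack.pop()
        if node.tag in BLOCK_TAGS and title in node.text():
            if best is None or len(node.text()) < len(best.text()):
                best = node
        stack.extend(c for c in node.children if isinstance(c, Node))
    return best.text() if best else None
```

Navigation bars and advertisement blocks fall away because they do not contain the title, which is the effect the description attributes to discarding miscellaneous content.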

FIG. 4 is a flowchart of the voice support method based on HTML content conversion according to the present invention. The voice support operation of the web browser is described in more detail below.

The web browser downloads a document with the page loader (10) (S100). When the downloaded document is an HTML document, the browser parses it (S110) and generates an HTML DOM (S120).

If the document downloaded by the page loader (10) is a VoiceXML document, it is sent to the VoiceXML interpreter (50), output as speech by the TTS (70), and provided to the user.

Next, the portion to be supported by voice is extracted from the generated DOM (S130), and a grammar for speech recognition is generated by determining how the extracted content, which is to be converted into a VoiceXML document, relates to the preceding and following documents (S140).

A VoiceXML document is generated using the generated speech recognition grammar (S150).

In generating the VoiceXML document, the parts of the original HTML document that correspond to links, that is, the sentences or words enclosed in 'a' tags, are added to the grammar. When the sentences for speech synthesis are composed, a special character is inserted at each link so that the TTS (70) can signal that the word is a voice command.
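A rough sketch of steps S140 and S150 is shown below: link texts are collected as grammar items, and each link word in the prompt text is prefixed with a marker. The marker character `^`, the class and function names, and the simplified VoiceXML layout (a single form with one field, without XML namespaces) are assumptions made for illustration; the patent states only that a special character is inserted.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

LINK_MARK = "^"  # hypothetical marker; the source only says "a special character"


class LinkCollector(HTMLParser):
    """Collects prompt text and link words from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.words = []    # link texts, used as grammar items
        self.pieces = []   # running text for the TTS prompt

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        if self.in_link:
            self.words.append(text)
            text = LINK_MARK + text  # mark the word as a voice command
        self.pieces.append(text)


def build_voicexml(html_fragment):
    """Return (VoiceXML text, grammar words) for an HTML fragment."""
    collector = LinkCollector()
    collector.feed(html_fragment)
    collector.close()
    vxml = ET.Element("vxml", version="2.0")
    field = ET.SubElement(ET.SubElement(vxml, "form"), "field", name="command")
    ET.SubElement(field, "prompt").text = " ".join(collector.pieces)
    grammar = ET.SubElement(field, "grammar", root="command")
    rule = ET.SubElement(grammar, "rule", id="command")
    one_of = ET.SubElement(rule, "one-of")
    for word in collector.words:
        ET.SubElement(one_of, "item").text = word
    return ET.tostring(vxml, encoding="unicode"), collector.words
```

A printable marker is used here because most control characters are not allowed in XML 1.0 content.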

The VoiceXML document generated in this way is passed to the VoiceXML interpreter (50). The HTML DOM, as described with reference to FIG. 3, is passed to the renderer (40) to be presented as a document according to the user's menu selection, and is displayed as a document on the screen of the user's terminal.

The VoiceXML document and grammar passed to the VoiceXML interpreter (50) are forwarded by the interpreter to the speech recognizer (60). Using the grammar received from the VoiceXML interpreter (50), the speech recognizer (60) returns the recognized word to the VoiceXML interpreter (50) (S160).

With current technology it is difficult to embed the speech recognizer (60) in the client, so a speech recognition server dedicated to speech recognition may be used instead. In that case, the VoiceXML interpreter (50) sends the grammar and the speech captured on the client to the speech recognition server, and the server returns the word recognized using the grammar to the client's VoiceXML interpreter (50).
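The exchange with a recognition server can be pictured as below. The sketch assumes that the recognizer produces a ranked list of text hypotheses and that the grammar simply restricts which of them may be returned; the class `RemoteRecognizer`, its methods, and the `transcribe` callable are hypothetical stand-ins, and no real network protocol or speech engine is involved.

```python
from typing import Callable, Optional, Sequence


def recognize(hypotheses: Sequence[str], grammar: Sequence[str]) -> Optional[str]:
    """Return the first hypothesis allowed by the grammar, or None."""
    allowed = {word.casefold(): word for word in grammar}
    for hypothesis in hypotheses:
        hit = allowed.get(hypothesis.strip().casefold())
        if hit is not None:
            return hit
    return None


class RemoteRecognizer:
    """Stand-in for a speech recognition server.

    The VoiceXML interpreter first sends the grammar, then the captured
    audio; the server answers with the recognized word.
    """

    def __init__(self, transcribe: Callable[[bytes], Sequence[str]]):
        self.transcribe = transcribe  # hypothetical audio -> hypotheses function
        self.grammar: list[str] = []

    def load_grammar(self, grammar: Sequence[str]) -> None:
        self.grammar = list(grammar)

    def recognize(self, audio: bytes) -> Optional[str]:
        return recognize(self.transcribe(audio), self.grammar)
```

Restricting results to the grammar keeps the recognizer's answers within the link words the current page actually offers.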

The word recognized by the speech recognizer (60) is parsed by the VoiceXML interpreter (50) together with the VoiceXML document, and the content of the document is output as speech through the TTS (70) and provided to the user (S170).

In addition, when the speech recognition grammar is generated, the words corresponding to the links of the web page are inserted together with 'previous' and 'next', which reflect the order of the pages, so that the user can navigate between pages by voice. For words that can trigger link navigation, the user is notified by a specific word or sound.

For example, a specific ASCII character is inserted into the sentence, and when the TTS encounters that character it emits a beep that the user can recognize.
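The two mechanisms just described can be sketched together: turning the marker character into a beep event during speech output, and resolving a recognized word (a link word, 'previous', or 'next') to a destination. The marker `^`, the event tuples, and the function names are assumptions for illustration only.

```python
import re

LINK_MARK = "^"  # hypothetical marker character inserted before link words


def tts_events(prompt):
    """Split prompt text into ('speak', text) and ('beep', '') events."""
    events = []
    for part in re.split("(" + re.escape(LINK_MARK) + ")", prompt):
        if part == LINK_MARK:
            events.append(("beep", ""))   # announce that a link word follows
        elif part.strip():
            events.append(("speak", part.strip()))
    return events


def resolve_command(word, links, pages, index):
    """Map a recognized word to the URL to load next.

    links: link word -> URL for the current page
    pages: ordered list of page URLs; index is the current position
    """
    if word == "previous":
        return pages[max(index - 1, 0)]
    if word == "next":
        return pages[min(index + 1, len(pages) - 1)]
    return links.get(word)
```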

Although preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to those embodiments. Anyone with ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications fall within the scope of the claims.

As described above, according to the present invention, various user interfaces such as voice are available in addition to the visual display when web documents are browsed on a mobile terminal or similar device, and the user can browse the web while walking or on the move without being tied to the screen.

Furthermore, unlike existing approaches that rewrite documents or set up an ARS server or the like to provide speech output of HTML web documents, the present invention converts documents directly on the client in real time and converts only the parts the user needs into VoiceXML, a standardized document format. It therefore has a clear cost advantage and is easy to apply and commercialize.

FIG. 1 is a block diagram of a voice-supporting web browser based on HTML content conversion according to the present invention;

FIG. 2 shows the interface of a voice-supporting web browser according to an embodiment of the present invention;

FIG. 3 shows the portion of the web browser document of FIG. 2 that is extracted into a VoiceXML document; and

FIG. 4 is a flowchart of the voice support method based on HTML content conversion according to the present invention.

<Description of reference numerals for the main parts of the drawings>

10: Page loader
20: HTML parser
30: Content converter
31: HTML DOM
32: VoiceXML document
40: Renderer
50: VoiceXML interpreter
60: Speech recognizer
70: TTS (Text to Speech)

Claims

A voice-supporting web browser based on HTML (HyperText Markup Language) content conversion, comprising:

a page loader that downloads a document;

an HTML parser that, when the document downloaded by the page loader is an HTML document, parses the document to generate an HTML DOM (Document Object Model);

a content converter that extracts from the DOM a portion requiring voice support, generates a speech recognition grammar according to the document's relationships, and generates a VoiceXML document;

a VoiceXML interpreter that parses the VoiceXML document, outputs the content of the document as speech through TTS (Text to Speech), passes the grammar required for speech recognition and a voice command input by the user to a speech recognizer, and receives and processes the word recognized by the speech recognizer; and

a TTS that outputs as speech the sentences received from the VoiceXML interpreter, and a speech recognizer that delivers the word recognized using the grammar to the VoiceXML interpreter.

The web browser of claim 1, wherein the web browser is installed on a client and, in real time, converts an HTML document provided by a web page into a VoiceXML document and outputs it as speech.

The web browser of claim 1, wherein the content converter extracts only the core part of the downloaded HTML document, converts it into VoiceXML, and outputs it without following the link.

The web browser of claim 1, wherein, when generating the speech recognition grammar, the content converter inserts the words corresponding to the links of the web page together with 'previous' and 'next', which reflect the order of the pages.

A voice support method based on HTML content conversion, comprising:

a first step in which a web browser downloads a document with a page loader and, when the downloaded document is an HTML document, parses the document to generate an HTML DOM;

a second step of extracting from the DOM a portion to be supported by voice and generating a grammar for speech recognition by determining the contextual relationships in the extracted document;

a third step of generating a VoiceXML document using the speech recognition grammar; and

a fourth step of sending the VoiceXML document to a VoiceXML interpreter, which parses the document and outputs its content as speech through TTS.

The method of claim 5, wherein the first step further comprises passing the generated HTML DOM to a renderer.

The method of claim 5, wherein the fourth step comprises forwarding the grammar passed to the VoiceXML interpreter to a speech recognizer, receiving the recognized result back, and outputting the content of the document as speech.

The method of claim 5, wherein the fourth step comprises notifying the user, by a specific word or sound, of words that can trigger link navigation.