KR20090058360A

KR20090058360A - Method, system and computer-readable recording medium for extracting text from web page, converting same text into audio data file, and providing resultant audio data file

Info

Publication number: KR20090058360A
Application number: KR1020070125105A
Authority: KR
Inventors: 이윤현; 김규일; 박진수
Original assignee: 엔에이치엔(주)
Priority date: 2007-12-04
Filing date: 2007-12-04
Publication date: 2009-06-09
Also published as: KR100923942B1

Abstract

Provided are a method, a system, and a computer-readable recording medium for extracting text from a web page, converting the text into a voice data file, and providing the file. The reusability of the voice data file is markedly increased, so that a voice data file, once generated, can easily be reused or shared with a third party without causing a storage capacity problem on the user terminal device. A text recognition unit (230) recognizes text extracted at a terminal device. A text-to-speech conversion unit (240) converts the extracted text into speech. A voice data file generation unit (250) compresses the converted speech to generate a compressed voice data file. A communication unit (260) transmits an identifier indicating the location of the compressed voice data file to the terminal device. A database stores the compressed voice data file. The converted speech is stored in WAV format. The compressed voice data file is an MP3 file. The identifier indicating the location of the compressed voice data file is a URL (Uniform Resource Locator).

Description

METHOD, SYSTEM AND COMPUTER-READABLE RECORDING MEDIUM FOR EXTRACTING TEXT FROM WEB PAGE, CONVERTING SAME TEXT INTO AUDIO DATA FILE, AND PROVIDING RESULTANT AUDIO DATA FILE

The present invention relates to a method, a system, and a computer-readable recording medium for extracting text from a web page at a user terminal device, converting it into a voice data file, and providing that file. More specifically, it relates to a method, a system, and a computer-readable recording medium that convert text on a web page into a voice data file and provide it in a way that lets the user reuse the file efficiently.

In recent years, as Internet use has become widespread, it has become possible to obtain a wide variety of information over the Internet. Companies that provide Internet services through web sites offer many kinds of services to meet the increasingly diverse needs of Internet users, and the range of such services is growing day by day.

Users encounter these services in various forms and, in particular, seek to obtain Internet content such as news, dictionary, specialist, local, and shopping information through web sites.

To obtain the content they want, such users run searches through a web site, and once they find that content on a particular web page (or web document), they generally read it by eye, since it consists mostly of text. From the user's point of view, however, relying only on text-based content can be unappealing in today's multimedia era. In practice, as web pages carry more and more information, the user cannot look away from a display such as the monitor of the user terminal device until all of the text has been read. Some users also want to multitask, obtaining information from the content while doing other work, and this need is hard to meet when only text-based content is available.

Meanwhile, computer telephony integration (CTI) technologies such as Voice over IP (VoIP), speech recognition, voice conversion, speech synthesis, and automatic response systems have recently attracted considerable attention. These technologies are expected to let users in an Internet environment enjoy more advanced services in which they give instructions by voice, receive information by voice, and communicate by voice.

Accordingly, text-to-speech (TTS) technology has been developed both to solve the problems of text-oriented content and for broad use in CTI. TTS is a human interface technology that converts various text information into speech, and it can be used more widely than speech recognition. On web pages, TTS is typically realized by extracting text from the page, converting it into speech, and providing the speech to the user. For example, text at the position of the mouse pointer may be extracted and converted into speech in response to a mouse-over event that fires when the user holds the mouse still at a position on the page for a certain time, or the user may drag over a portion of the text on the page to have it converted into speech.

Users sometimes need to play back or store a voice data file converted from text repeatedly, especially when the text is educational or otherwise needs to be memorized. If the user must locate the text each time and generate, play, or store a voice data file from it, it becomes hard to find and reuse that file when needed, and the user terminal device may run short of storage, among other inconveniences.

An object of the present invention is to solve the problems of the prior art described above.

Another object of the present invention is to increase the reusability of voice data files that a user obtains through a web page.

A further object of the present invention is to allow the user to receive a voice data file from an external computing device whenever it is needed, without having to sort and store such files on the user terminal device.

Yet another object of the present invention is to broaden the range of applications of text-to-speech technology and thereby raise users' interest in it.

The configuration of the present invention for achieving the above objects is as follows.

According to one aspect of the present invention, there is provided a method of extracting text from a web page, converting it into speech, and providing the result, the method comprising: extracting text at a terminal device; transmitting the extracted text from the terminal device to a computing system capable of communicating with the terminal device; converting the extracted text into speech in the computing system to generate a voice data file; compressing the voice data file in the computing system; storing the compressed voice data file in the computing system; and transmitting, from the computing system to the terminal device, an identifier indicating the location of the compressed and stored voice data file.
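The server-side steps of this method can be sketched as follows. This is a minimal illustration of the control flow only, not the implementation described in the specification: the function name `provide_audio`, the base URL, and the choice of deriving the storage key from a hash of the text are all assumptions, and the actual text-to-speech and compression steps are passed in as callables rather than implemented.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical base URL of the storage location; not taken from the patent.
BASE_URL = "https://audio.example.com/files/"


def provide_audio(
    text: str,
    synthesize: Callable[[str], bytes],
    compress: Callable[[bytes], bytes],
    store: Dict[str, bytes],
) -> str:
    """Run the server-side steps and return the identifier (here, a URL)."""
    raw_audio = synthesize(text)       # convert the extracted text to speech
    compressed = compress(raw_audio)   # compress the voice data
    # Assumption: key the file by a hash of the text, so that the same text
    # always maps to the same stored file and the same URL.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    store[key] = compressed            # store the compressed file
    return BASE_URL + key + ".mp3"     # identifier sent back to the terminal
```

Because only the returned URL travels back to the terminal device, the terminal does not need to keep the audio itself, which is the point of the storage and identifier steps.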

According to another aspect of the present invention, there is provided a system that communicates with a terminal device and extracts text from a web page, converts it into speech, and provides the result, the system comprising: a text recognition unit for recognizing text extracted at the terminal device; a text-to-speech conversion unit for converting the extracted text into speech; a voice data file generation unit for compressing the speech converted by the text-to-speech conversion unit to generate a compressed voice data file; and a communication unit for transmitting an identifier indicating the location of the compressed voice data file to the terminal device.

In addition, the present invention further provides other methods and systems for extracting text from a web page, converting it into speech, and providing the result, as well as a computer-readable recording medium on which a computer program for executing the methods is recorded.

With the present invention as described above, a user of text-to-speech technology on the Internet can easily find, use, and store a voice data file converted from text.

Furthermore, according to the present invention, the reusability of a voice data file converted from text is markedly increased, so that a voice data file, once generated, can easily be reused or shared with a third party without causing a storage capacity problem on the user terminal device.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it.

Configuration of the entire system

FIG. 1 is a schematic diagram of an entire system for converting text into speech and providing it in the form of a voice data file according to an embodiment of the present invention.

As shown in FIG. 1, the entire system according to an embodiment of the present invention, which converts text on a web page into speech and provides it in the form of a voice data file, may comprise the Internet (500), a user terminal device (100) capable of accessing the Internet (500), a web application server (WAS) (200) that requests text-to-speech conversion and compresses the converted speech to generate a compressed voice data file, a text-to-speech conversion server (300) that performs the text-to-speech conversion, and a voice data file server (400) that stores and manages voice data files.

First, the user terminal device (100) included in the entire system is a digital device for accessing the WAS (200) through the Internet (500). Personal computers (for example, desktop and notebook computers), PDAs (Personal Digital Assistants), mobile phones, and the like may be used without limitation.

Using the user terminal device (100), the user runs a web browser such as Internet Explorer™, Netscape™, or Lynx™, or another known program that enables information retrieval, connects to the WAS (200), and then designates the text on a web page that is to be extracted. (Alternatively, where needed, the user may type at least part of the text to be converted directly on the web page.) To designate the text, the user may mouse over or drag across a range of text. The text extracted in this way may be provided to the WAS (200). Preferably, before designating text on a web page, the user may also locate the web page containing the text to be converted by entering and submitting a search query.

According to an embodiment of the present invention, the user terminal device (100) may include a program module (not shown) that extracts the text to be converted into speech and transmits it to the WAS (200). This program module may be downloaded from the WAS (200) and installed on the user terminal device (100). The user terminal device (100) may further include a player for playing back voice data files.

The WAS (200) is a server that, according to an embodiment of the present invention, generates a voice data file based on text transmitted from the user terminal device (100) and provides it to the user terminal device (100). Although FIG. 1 shows the WAS (200) as separate from the text-to-speech conversion server (300) and the voice data file server (400), it may, depending on the needs of those implementing the invention, be combined with or include at least one of them.

The text-to-speech conversion server (300) is the server that actually performs text-to-speech conversion. According to an embodiment of the present invention, it converts text received from the WAS (200) into speech and transmits the resulting voice data file back to the WAS (200). A text-to-speech conversion processing unit (not shown) within the server may perform the conversion. The text-to-speech conversion server (300) may also include a voice conversion database (not shown) for text-to-speech conversion.

According to an embodiment of the present invention, information on voice data corresponding to specific text may be stored in advance in the voice conversion database (not shown) of the text-to-speech conversion server (300). For techniques that obtain a text-based voice data file using such a database, reference may be made to the applicant's related Korean Patent Application Nos. 10-2007-0119406 (filed November 21, 2007) and 10-2007-0122819 (filed November 29, 2007).

In addition, according to known text-to-speech conversion techniques, syntactic structure analysis is performed on text in units of syllables, words, paragraphs, and/or sentences; the text is then converted into a phoneme sequence through a reading conversion process; and speech is generated from the text based on the phoneme sequence and the syntactic structure information, with reference to reading rules and prosody information stored in the voice conversion database (not shown) of the text-to-speech conversion server (300).

Because text-to-speech conversion techniques such as those outlined above are already known, it is evident that the text-to-speech conversion server (300) may perform the conversion using various other techniques as well.

Meanwhile, the voice data file server (400) stores and manages voice data files transmitted from the WAS (200). It may be provided with a separate, dedicated database for this purpose. For repeated use, a voice data file held on the voice data file server (400) may be identified by an identifier that indicates the file's location, such as a URL, and users may reuse the file or share it with other users based on that identifier.
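As a rough illustration of how a location identifier makes a stored file reusable and shareable, the sketch below models the storage role with an in-memory dictionary. The class name `VoiceFileStore`, its methods, the random key scheme, and the example base URL are assumptions made for illustration; the specification does not describe the server's internals.

```python
import uuid
from typing import Dict
from urllib.parse import urlparse


class VoiceFileStore:
    """In-memory stand-in for a voice data file server (illustrative only)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._files: Dict[str, bytes] = {}

    def save(self, data: bytes) -> str:
        """Store a compressed voice data file and return its URL."""
        key = uuid.uuid4().hex          # assumed key scheme: random identifier
        self._files[key] = data
        return f"{self.base_url}/{key}.mp3"

    def load(self, url: str) -> bytes:
        """Resolve a previously issued URL back to the stored file."""
        name = urlparse(url).path.rsplit("/", 1)[-1]
        key = name[: -len(".mp3")] if name.endswith(".mp3") else name
        return self._files[key]
```

Anyone holding the URL can call `load` any number of times, which corresponds to the reuse and sharing described above; the file itself stays on the server.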

The overall system configuration according to an embodiment of the present invention has been described above, but it should be understood that this is purely illustrative. For example, in a modified embodiment, some of the components of the WAS (200) may in effect be included in the text-to-speech conversion server (300) and/or the voice data file server (400). Each of the servers denoted by reference numerals 200 to 400 may also be a group of two or more computers.

The following describes the internal configuration of the WAS (200), which performs the functions most important to implementing the present invention, and the function of each of its components.

Internal configuration of the WAS

FIG. 2 shows in detail the internal configuration of the WAS (200) according to an embodiment of the present invention. Referring to FIG. 2, the WAS (200) includes a control unit (210), an interface unit (220), a text recognition unit (230), a text-to-speech conversion unit (240), a voice data file generation unit (250), a communication unit (260), and the like. According to an embodiment of the present invention, at least some of these units may be program modules that are included in the WAS (200) or that communicate with it. Such program modules may be included in the WAS (200) in the form of an operating system, application program modules, and other program modules, and may be physically stored on various known storage devices. They may also be stored on a remote storage device capable of communicating with the WAS (200). These program modules encompass, but are not limited to, routines, subroutines, programs, objects, components, and data structures that perform the particular tasks described below or implement particular abstract data types in accordance with the present invention.

First, the control unit (210) controls the flow of data among the interface unit (220), the text recognition unit (230), the text-to-speech conversion unit (240), the voice data file generation unit (250), and the communication unit (260). That is, by controlling the flow of data from outside the WAS (200) and between its components, the control unit (210) causes the interface unit (220), the text recognition unit (230), the text-to-speech conversion unit (240), and the voice data file generation unit (250) each to perform its own function.

Meanwhile, the WAS (200) may provide a user interface that lets the user search for a web page containing the text to be converted into speech. It may store a query sent by the user in a query buffer (not shown) so that a search engine (not shown) included in the WAS (200) can process the search, and it may store the search results composed by the search engine in a result buffer (not shown) and then provide them for the user to browse. To perform this processing, the WAS (200) may include the interface unit (220).

In addition, according to an embodiment of the present invention, the WAS (200) recognizes, in the text recognition unit (230), the specific text received from the user terminal device (100). This recognition may simply treat the received text as-is as the target of speech conversion, or it may include checking whether the received text contains special characters or the like that cannot be converted into speech and removing them. In the latter case, the text recognition unit (230) may communicate with the text-to-speech conversion server (300) as needed.
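One way the removal of unconvertible special characters might look is sketched below. The specification does not say which characters count as unconvertible, so the rule used here (keep letters, digits, and combining marks in any script, whitespace, and a small set of punctuation) and the name `clean_for_tts` are assumptions for illustration.

```python
import unicodedata

# Assumed set of punctuation that a TTS engine can use for pausing.
KEEP_PUNCTUATION = set(".,!?;:'\"()-")


def clean_for_tts(text: str) -> str:
    """Drop characters assumed to be unpronounceable and tidy whitespace."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category[0] in "LNM":       # letters, numbers, combining marks
            kept.append(ch)
        elif ch.isspace():
            kept.append(" ")           # normalise any whitespace to a space
        elif ch in KEEP_PUNCTUATION:
            kept.append(ch)
        # everything else (symbols, control characters) is dropped
    # collapse runs of spaces left behind by removed characters
    return " ".join("".join(kept).split())
```

For example, `clean_for_tts("Hello ★ world!\n")` yields `"Hello world!"`, and Hangul text passes through unchanged because its characters fall in the letter category.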

Meanwhile, the WAS (200) may include a text-to-speech conversion unit (240) for text-to-speech conversion. According to an embodiment of the present invention, the text-to-speech conversion unit (240) transmits the recognized text to the text-to-speech conversion server (300) for conversion into speech and receives the voice data file produced by that server. According to an embodiment of the present invention, the voice data file generated by the text-to-speech conversion server (300) may be a WAV file, which can be played by the standard media player of Windows™, the most widely used operating system for user computers.

Next, the WAS (200) may include a voice data file generation unit (250). WAV files, as typically generated by the text-to-speech conversion server (300), are the most widely used format, but in many cases they become excessively large. According to an embodiment of the present invention, it is therefore necessary to generate a voice data file by compressing the WAV file output by the text-to-speech conversion server (300) to the extent that sound quality is not excessively degraded, and the most suitable technique for this is the widely known MP3 standard compression technique. A compressed voice data file produced this way is commonly called an MP3 file. MP3 is short for 'MPEG-1 Audio Layer-3' and refers to the audio compression portion of MPEG, a video compression standard, specified separately. Compressing voice data with MP3 achieves a high compression ratio: the output file can be reduced to about 1/10 the size of the original voice data file while CD-level sound quality is maintained. The inventors noted that, although MP3 files can show audible degradation for high-density audio data, minor degradation is not a problem at all when, as in the present invention, the audio is speech converted from text.
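The "about 1/10" figure can be checked with simple arithmetic. The sketch below assumes uncompressed CD-quality PCM audio (44,100 Hz, 16-bit, stereo) for the WAV file and a constant 128 kbps bitrate for the MP3 file; neither parameter is stated in the specification, and file headers are ignored.

```python
def pcm_bytes(seconds: int, sample_rate: int = 44100,
              bits: int = 16, channels: int = 2) -> int:
    """Size of uncompressed PCM audio, ignoring the WAV header."""
    return seconds * sample_rate * (bits // 8) * channels


def mp3_bytes(seconds: int, bitrate_kbps: int = 128) -> int:
    """Size of constant-bitrate MP3 audio, ignoring tags and headers."""
    return seconds * bitrate_kbps * 1000 // 8


one_minute_wav = pcm_bytes(60)   # 10,584,000 bytes
one_minute_mp3 = mp3_bytes(60)   # 960,000 bytes
ratio = one_minute_wav / one_minute_mp3
```

Under these assumptions one minute of audio shrinks from roughly 10.6 MB to under 1 MB, a ratio of about 11 to 1, which is consistent with the figure given above.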

Meanwhile, the compressed voice data file (for example, an MP3 file) generated by the voice data file generation unit (250) may be stored and managed on the voice data file server (400). The WAS (200) stores an identifier indicating where the file is stored on the voice data file server (400) and provides that identifier to the user terminal device (100), so that users can use the file over and over again. As noted above, the identifier indicating the file location may be a URL.

Finally, the communication unit (260) may receive from the user terminal device (100) the text to be converted into speech, transmit the received text to the text-to-speech conversion server (300), receive the voice data file generated by the text-to-speech conversion server (300), and, after the received voice data file has been compressed and stored on the voice data file server (400), provide the user terminal device (100) with the file location identifier that serves as its key value. In other words, the communication unit (260) enables data to be transmitted to and received from the WAS (200).

The entire system of the present invention and the internal configuration of the WAS (200) have been described in detail above. The following describes an example in which, according to an embodiment of the present invention, a user receives a file produced by converting the text of a web page into speech and can then reuse it or share it with other users.

Application example of the invention

FIGS. 3A and 3B illustrate screens displayed when a user converts text in a web page into an MP3 file and downloads it, according to an embodiment of the present invention.

For example, suppose that a user connected to a web site with a search engine searches with the query 'how to study for TOEIC', selects a particular web document from the results, and wants to download to his or her own computer an MP3 file converted from the text in that document.

According to a preferred embodiment of the present invention, the user may drag the mouse across the desired range of text in the web document. If the user then right-clicks with the selection still active and chooses 'Download selection in MP3 format' from the menu that appears, the text may be extracted at the user's computer and a text-to-speech conversion request may be issued to the web site.

Next, the web site that receives this request asks a text-to-speech conversion server, which may be included in its operating server or located externally, to perform text-to-speech conversion. The voice data file generated in response may then be compressed into MP3 format and stored at the web site's operating server. The resulting MP3 file is placed in a given database of the operating server, and the operating server returns to the user's computer a URL indicating the location of the MP3 file so that the user can download it. At this point, a confirmation window such as the one shown in FIG. 3B may be displayed to the user. By clicking the save button there, the user can download the MP3 file corresponding to the text originally dragged with the mouse.

The embodiments of the present invention described above may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in the computer software field. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the invention. The invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains may devise various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the embodiments described above. Not only the claims set out below but also everything modified equally or equivalently to those claims falls within the scope of the spirit of the invention.

FIG. 2 is a diagram illustrating in detail the internal configuration of the WAS according to an embodiment of the present invention.

<Description of reference numerals for the main parts of the drawings>

100: user terminal device

200: WAS

210: control unit

220: interface unit

230: text recognition unit

240: text-to-speech conversion unit

250: voice data file generation unit

260: communication unit

300: text-to-speech server

400: voice data file server

500: Internet

Claims

A method of extracting text from a web page, converting the text into speech, and providing the speech, the method comprising:

extracting text at a terminal device;

transmitting the extracted text from the terminal device to a computing system capable of communicating with the terminal device;

generating a voice data file by converting the extracted text into speech in the computing system;

compressing the voice data file in the computing system;

storing the compressed voice data file in the computing system; and

transmitting an identifier indicating a location of the compressed and stored voice data file from the computing system to the terminal device.

The method of claim 1,

wherein the computing system comprises at least one of a web application server, a text-to-speech server, and a voice data file server.

The method of claim 1,

wherein the extracting of the text at the terminal device further comprises recognizing a mouse pointer on the text.

The method of claim 1,

wherein the voice data file converted from the text is a WAV file.

The method of claim 1,

wherein the compressed voice data file is an MP3 file.

The method of claim 1,

wherein the identifier indicating the location of the compressed and stored voice data file is a URL.

A system that communicates with a terminal device, extracts text from a web page, converts the text into speech, and provides the speech, the system comprising:

a text recognition unit that recognizes text extracted at the terminal device;

a text-to-speech conversion unit that converts the extracted text into speech;

a voice data file generation unit that compresses the speech converted by the text-to-speech conversion unit to generate a compressed voice data file; and

a communication unit that transmits an identifier indicating a location of the compressed voice data file to the terminal device.

The system of claim 7,

further comprising a database for storing the compressed voice data file.

The system of claim 7 or 8,

wherein the converted speech is stored in WAV format.

The system of claim 7 or 8,

wherein the compressed voice data file is an MP3 file.

The system of claim 7 or 8,

wherein the identifier indicating the location of the compressed voice data file is a URL.

A computer-readable recording medium having recorded thereon a computer program for executing the method according to any one of claims 1 to 6.