WO2013061718A1

WO2013061718A1 - Apparatus for providing text data with synthesized voice information and method for providing text data

Info

Publication number: WO2013061718A1
Application number: PCT/JP2012/074370
Authority: WO
Inventors: 五十嵐　信夫; 佳史亀島; 田中　公司
Original assignee: 日立公共システムエンジニアリング株式会社
Priority date: 2011-10-28
Filing date: 2012-09-24
Publication date: 2013-05-02
Also published as: JP2013097033A; CN103827961A

Abstract

[Problem] To provide an apparatus which optionally enables an operation for reading aloud arbitrary text of a user's choosing and provides text segments which can be read by speech synthesis in a manner convenient to the user by an operation matching the intention of the user. [Solution] Text data selected from a database which stores a plurality of pieces of text data is loaded in response to an instruction signal from a user terminal, and the text segments constituting the text data are determined. A phonetic symbol sequence is generated by a phonetic symbol sequence generation program per text segment, and a generated phonetic symbol sequence is added to each text segment. The text data having the phonetic symbol sequences added to the text segments thereof is transmitted from a transmission means to a user terminal together with a text-to-speech program.

Description

Text data providing apparatus with speech synthesis information and text data providing method

The present invention relates to a text data providing apparatus with speech synthesis information and a text data providing method.

Certain types of text data including Internet homepages are described in HTML (Hyper Text Markup Language).

There are speech synthesis systems that convert document data and other text data into speech data.

Patent Document 1 describes a speech synthesizer for speech synthesis of text data.

Patent Document 2 describes that a homepage document is newly created, or an existing document is loaded as a homepage document, a read-aloud range is designated, a read-aloud range tag is inserted, and a read-aloud text identifier is specified. It further describes that the read-aloud range is passed to the read-aloud text detection program identified by that identifier.

Patent Document 3 describes that a read-out part is specified in acquired content, the specified read-out part is converted into sound data, and the sound data is returned to the user as the response.

Patent Document 1: JP 2003-140673 A
Patent Document 2: JP 2001-109612 A
Patent Document 3: JP 2003-99079 A

Conventionally, voice data is generated on a Web server or other server by applying a voice synthesis program to selected text data, for example homepage text data, and is transmitted to a user terminal together with the text data. The voice read-aloud operation was performed when the text data was read. In the conventional example, a tool for reading out the voice data is built into the user terminal.

However, in order to install such a tool on a user terminal, administrator authority is required, and the act of installation itself is troublesome and has been avoided by users.

In Patent Document 2, a reading range designation tag is inserted into a voice reading range, and a reading text identifier for identifying a reading text detection program is designated as an attribute of the reading range designation tag. However, when reading aloud, it is necessary to install a tool for reading aloud into the user terminal.

A user wants to have a sentence at an arbitrary position in the text data read aloud, so an operation that follows the user's intention is required. Installing a read-aloud tool on the user terminal is troublesome and is avoided by users. When the text is instead converted into voice data on the server side, as in the conventional example, the user cannot have an arbitrary position, that is, an arbitrary sentence, read aloud at will. The entire voice data converted in a batch on the server side is read out, and the operation does not follow the user's intention. Further, even when a speech synthesis tool is already installed on an existing user terminal, an operation that follows the user's intention is required without using the installed tool.

In view of the above, an object of the present invention is to enable an operation that reads aloud an arbitrary sentence requested by the user, and to provide sentence text that can be read aloud in a manner convenient for the user by an operation that follows the user's intention.

The present invention provides a text data providing apparatus with speech synthesis information that includes speech synthesis information converting means for converting text data described on a Web page into speech synthesis information, and that provides the speech synthesis information together with the text data to a user terminal via a network, the apparatus comprising:
a database that stores a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into speech data;
text data generating means with sentence-by-sentence phonetic symbol string that, based on an instruction signal from the user terminal, reads the selected text data from a database storing a plurality of text data, generates by the phonetic symbol string generation program a phonetic symbol string consisting of the reading order and the reading of each sentence text of the text data, and attaches the sentence-by-sentence phonetic symbol string to each sentence text; and
transmission means for transmitting the text data with the sentence-by-sentence phonetic symbol strings and the voice conversion synthesis program to the user terminal via the network.

The present invention also provides the text data providing apparatus with speech synthesis information in which the voice conversion synthesis program is a program that, when any sentence text of the text data is designated as a location to be read out as voice data, converts the phonetic symbol string attached to that sentence text into voice data.

The present invention also provides the text data providing apparatus with speech synthesis information in which the phonetic symbol string generation program divides each sentence text in units of division symbols and generates a phonetic symbol string including the reading order.

The present invention also provides a method for providing text data with speech synthesis information by a homepage text data providing apparatus that includes speech synthesis information converting means for converting text data described on a Web page into speech synthesis information, and that provides the speech synthesis information together with the text data to a user terminal via a network, wherein:
a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into speech data are stored in a database;
text data generating means with sentence-by-sentence phonetic symbol string reads, based on an instruction signal from the user terminal, the text data selected from a database that stores a plurality of text data, generates a phonetic symbol string for each sentence text by the phonetic symbol string generation program, and attaches the generated phonetic symbol string to each sentence text; and
transmission means transmits the text data, in which the sentence-by-sentence phonetic symbol string is attached to each sentence text, and the voice conversion synthesis program to the user terminal via the network.

The present invention also provides the method for providing text data with speech synthesis information in which the voice conversion synthesis program is a program that, when any sentence text in the text data is designated as a location to be read out as voice data, converts the phonetic symbol string attached to that sentence text into voice data.

The present invention also provides the method for providing text data with speech synthesis information in which the phonetic symbol string generation program divides each sentence text in units of division symbols and generates a phonetic symbol string including the reading order.

When the present invention is applied to homepage text data, the above text data shall be read as homepage text data.

As described above, the present invention uses the phonetic symbol string generation program to generate a phonetic symbol string for each sentence text of text data, for example homepage text data, and uses the voice conversion synthesis program to convert the phonetic symbol string of a designated sentence text into voice data on the user terminal. It is therefore not necessary to install a tool for converting voice data on the user terminal as in the prior art. Moreover, since the sentence texts with their sentence-by-sentence phonetic symbol strings are transmitted to the user terminal together with the voice conversion synthesis program, the user can designate any sentence text and have it read aloud, so that text data such as a homepage is provided in accordance with the user's intention.

FIG. 1 is a block diagram explaining an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the homepage text data providing apparatus with speech synthesis information.
FIG. 3 is a diagram showing the screen of a homepage text.
FIG. 4 is an image diagram showing the state transition of sentence text in HTML format.
FIG. 5 is a diagram showing the image of FIG. 4 more concretely.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram illustrating an embodiment of the present invention.
In FIG. 1, a homepage text data providing apparatus 100 with speech synthesis information according to an embodiment of the present invention includes a server 1 (sometimes referred to as a server system). The server 1 is connected to a Web server 2 and a user terminal 3 via networks 4 and 5.
The present embodiment is applicable to providing text data for various contents including a home page, but a home page as a typical example will be described.
Although the server 1 and the Web server 2 may be configured as an integral unit, they will be described as separate components here.

The server 1 stores a phonetic symbol string generation program 11 and a speech conversion synthesis program 12 in the database as will be described later.

The scroll control is a tool that is displayed in the form of a control panel on the window screen of the user terminal, and controls the screen when the user clicks (i.e., touches) the control items that make up the panel.

In such a configuration, the user sends a homepage acquisition request in HTML document format from the user terminal 3 to the homepage text data providing apparatus 100 via the network 5. The homepage text data providing apparatus 100 makes an acquisition request to the Web server 2 via the network 4. The Web server 2 stores a large number of home pages in a database.

The Web server 2 selects the corresponding home page based on the acquisition request. The home page includes various items of text information. Hereinafter, this text information is referred to as homepage text data, or simply as text data. The homepage text data is composed of a plurality of sentence texts (HTML documents). The homepage text data is usually formed in units of blocks, and can therefore be extracted in units of blocks.

The Web server 2 transmits the selected homepage text data to the homepage text data providing apparatus 100 via the network 4. These data are stored in the server 1.

The homepage text data providing apparatus 100 analyzes the HTML document of the sent homepage text based on the data stored in the server 1 and creates a phonetic symbol string (language analysis data) that is the original data of the voice data.

The homepage text data providing apparatus 100 transmits the sentence text of the homepage text data, with the phonetic symbol strings attached, and the voice conversion synthesis program to the user terminal 3 via the network 5.

The user terminal 3 uses the voice conversion synthesis program to read out each sentence text of the homepage text data as voice data from the transmitted phonetic symbol string. That is, voice data is created from the phonetic symbol string and reproduced. No tool for reproducing the voice data needs to be installed in the user terminal, because the phonetic symbol strings and the voice conversion synthesis program are transmitted from the homepage text data providing apparatus 100. This does not preclude a voice data tool from already being installed in the user terminal 3.

Thus, the homepage text data providing system 200 including the homepage text data providing apparatus 100 with speech synthesis information is configured.

This embodiment will be described in further detail with reference to FIG. 2.
FIG. 2 is a block diagram showing the configuration of the homepage text data providing apparatus 100 with speech synthesis information.

In FIG. 2, the homepage text data providing apparatus 100 with speech synthesis information includes input means 21, homepage text data generation means 22 with sentence-by-sentence phonetic symbol string, transmission means 24, a database 25 stored in the server 1, and image display means 26. These means are connected to each other by a communication circuit 27 to exchange data.

As described above, the homepage text data providing apparatus 100 with speech synthesis information is connected to the Web server 2 via the network 4 and is connected to the user terminal 3 via the network 5.

As described above, the Web server 2 receives a homepage acquisition request from the homepage text data providing apparatus 100, selects the corresponding homepage, and transmits the homepage text data to the input means 21.

The input means 21 inputs the selected homepage text data.

The homepage text data is formed as a collection of sentence texts in the HTML document format, and each HTML document that is one sentence text, that is, one document text, contains division symbols.

The database 25 stores the phonetic symbol string generation program and the voice conversion synthesis program, and also stores the homepage text with phonetic symbol strings generated by the homepage text data generation means 22 with sentence-by-sentence phonetic symbol string.

The homepage text data generation means 22 with sentence-by-sentence phonetic symbol string divides the document text into several parts using the division symbols of each sentence text.

For each sentence text, a phonetic symbol string is generated using the phonetic symbol string generation program and attached to the homepage text data.

The transmission means 24 transmits the homepage text data, to which a phonetic symbol string is attached for each sentence text, and the voice conversion synthesis program to the user terminal 3.

The homepage text data to which the generated phonetic symbol string for each sentence text is attached is displayed on the display screen of the image display means 26.

As described above, the database 25 stores a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into voice data.

The voice conversion synthesis program is a program that can generate and reproduce voice data from the language analysis data generated for each sentence text. It only resides in the browser and is not installed on the user terminal 3, and it ends when the browser is closed. In other words, this program runs on the browser without being installed in the user terminal 3.

The homepage text data generation means 22 with sentence-by-sentence phonetic symbol string reads, based on the instruction signal from the user terminal 3, the homepage text data selected from the database storing a plurality of text data. It reads the division symbols of each sentence text constituting the homepage text data, generates a phonetic symbol string for each sentence text by the phonetic symbol string generation program, and attaches the generated sentence-by-sentence phonetic symbol string to each sentence text.

The voice conversion synthesis program is a program that, when a sentence text of the homepage text data is designated as a location to be read out as voice data, uses the designated sentence text as an identifier and converts the sentence-by-sentence phonetic symbol string attached to that sentence text into voice data.

FIG. 3 shows a display example of the displayed home page.
The user operates the scroll control displayed on the screen to specify a sentence text and have it read out. “Rewinding” and “delaying” can also be performed.

The user terminal 3 receives the homepage text data with sentence-by-sentence phonetic symbol strings and the voice conversion synthesis program, and accepts from the scroll control the designation of the sentence text to be read out as voice data. A designation field can also be formed at the beginning of the document text. This serves as an identifier indicating that the sentence text itself is to be read out as voice data. The transmitted voice conversion synthesis program operates as a program on the browser and reads out, as voice data, the sentence text that serves as the identifier. A plurality of sentence texts with identifiers can be designated.

Thus, voice data is created at the user terminal 3 based on the transmitted data described above.

By generating the voice data at the user terminal 3, the voice data can be generated for the necessary portion (that is, the designated sentence text) when it is needed, so that the read-aloud operation and the reading itself follow the user's intention.

FIG. 4 is an image diagram showing the state transition of the text text in the HTML format.

In FIG. 4, it is assumed that a sentence text that is part of the homepage text data, “Today, take a walk because the weather is good”, is transmitted from the Web server 2 to the homepage text data providing apparatus 100.

The homepage text data generation means 22 with sentence-by-sentence phonetic symbol string of the homepage text data providing apparatus 100 operates by a program stored in the database, and divides the sentence text into clauses in units of division symbols such as “,” and “.”.

The division symbol is defined by the following eight symbols.
・「、」
・ "."
・ "?"
・「?」
・ "!"
・「!」
・ "　" (Full-width space)
・ " " (Half-width space)

Next, a phonetic symbol string consisting of a number (attribute) indicating the reading order and the reading is added to each divided unit.
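
The division and numbering described above can be sketched as follows. This is a minimal illustration, not the program of the embodiment: the exact characters in `DIVISION_SYMBOLS` (in particular the ideographic full stop and the full-width and half-width forms of the question and exclamation marks) are assumptions, the function names are hypothetical, and the sample sentence is illustrative only.

```python
import re

# Assumed set of the eight division symbols (ideographic comma and full stop,
# half-width and full-width "?" and "!", full-width space, half-width space).
DIVISION_SYMBOLS = "、。?？!！\u3000 "

# The capturing group makes re.split keep each division symbol in the result.
_SPLIT = re.compile("([" + re.escape(DIVISION_SYMBOLS) + "])")


def split_segments(sentence: str) -> list[str]:
    """Split a sentence text into units, each ending at a division symbol."""
    parts = _SPLIT.split(sentence)  # text, symbol, text, symbol, ..., text
    segments = []
    for i in range(0, len(parts), 2):
        symbol = parts[i + 1] if i + 1 < len(parts) else ""
        piece = (parts[i] + symbol).strip()  # drop space-only division symbols
        if piece:
            segments.append(piece)
    return segments


def number_segments(sentence: str) -> list[tuple[int, str]]:
    """Attach a 1-based reading-order number to every unit."""
    return list(enumerate(split_segments(sentence), start=1))


if __name__ == "__main__":
    for order, unit in number_segments("今日は、天気が良いので、散歩する。"):
        print(order, unit)
```

Keeping the division symbol at the end of its unit preserves the original text when the units are concatenated again, which matters if the units are later written back into the HTML document.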

The generated phonetic symbol string is attached to the homepage text data and transmitted from the transmission means to the user terminal 3 together with the voice conversion synthesis program.

The user terminal 3 creates voice data from the phonetic symbol string by the voice conversion synthesis program and reads it aloud.

FIG. 5 shows the image shown in FIG. 4 more specifically.
A number (attribute) indicating the reading order and a phonetic symbol string giving the reading are added to each divided unit of the sentence “the weather is nice today so I will take a walk”. In this way, a sentence text to which a phonetic symbol string is added is constructed.
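
One way to picture a sentence text with attached phonetic symbol strings is as HTML elements carrying the reading order and the reading as attributes. The sketch below is only an assumption about the markup: the text does not give the actual tag or attribute names, so `data-order` and `data-reading` are hypothetical, and the romanized readings stand in for whatever phonetic notation the generation program emits.

```python
from html import escape


def annotate_sentence(units: list[tuple[str, str]]) -> str:
    """Wrap each (text, reading) unit in a span carrying its reading order.

    The attribute names are hypothetical; they are not taken from the patent.
    """
    spans = []
    for order, (text, reading) in enumerate(units, start=1):
        spans.append(
            f'<span data-order="{order}" '
            f'data-reading="{escape(reading, quote=True)}">'
            f"{escape(text)}</span>"
        )
    return "".join(spans)


if __name__ == "__main__":
    sample = [
        ("今日は、", "kyoowa"),
        ("天気が良いので、", "tenkiga yoinode"),
        ("散歩する。", "sanposuru"),
    ]
    print(annotate_sentence(sample))
```

Because the reading travels inside the HTML document itself, the visible text and its speech synthesis information stay in one file, which matches the advantage of the HTML format listed later in the description.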

When the browser is started on the user terminal and the server is accessed for the first time, the speech conversion synthesis program is downloaded to the user terminal together with the homepage text data. Since the downloaded voice conversion program is resident (operably held) in the browser of the user terminal, only the homepage text data is downloaded to the user terminal for the second and subsequent accesses. Since the voice conversion synthesis program resides only in the browser and is not installed in the user terminal, it disappears when the browser is closed. When the browser is restarted and the server is accessed, the voice conversion program is downloaded to the user terminal again together with the homepage text data.
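
The download behavior described above can be modeled with a small simulation. This is a sketch under stated assumptions: the classes `Server`, `Response`, and `BrowserSession` are hypothetical, and how the server learns that the browser already holds the program is not specified in the text, so a boolean flag is used here purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    page: str                      # homepage text data with phonetic strings
    program: Optional[str] = None  # sent only when the browser lacks it


class Server:
    PROGRAM = "voice-conversion-synthesis-program"

    def fetch(self, url: str, client_has_program: bool) -> Response:
        page = f"annotated text for {url}"
        return Response(page, None if client_has_program else self.PROGRAM)


@dataclass
class BrowserSession:
    server: Server
    program: Optional[str] = None  # resident in the browser, never installed
    downloads: int = 0

    def open_page(self, url: str) -> str:
        resp = self.server.fetch(url, self.program is not None)
        if resp.program is not None:
            self.program = resp.program
            self.downloads += 1
        return resp.page

    def close(self) -> None:
        # Closing the browser discards the resident program.
        self.program = None


if __name__ == "__main__":
    session = BrowserSession(Server())
    session.open_page("/first")   # program and text are downloaded
    session.open_page("/second")  # only text is downloaded
    session.close()
    session.open_page("/first")   # program is downloaded again
    print(session.downloads)
```

The program is downloaded once per browser session, so repeated page views within a session transfer only the annotated text.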

In FIGS. 4 and 5, the advantages of implementing the sentence text of the homepage text data as an HTML document are as follows.

- Due to the structure of the description format, data processing such as associating character strings with their voice data is easy.
- Since the description format is a global standard, voice data can be added to (almost) all contents.
- The HTML display program (browser) and its accompanying tools (plug-ins) are highly functional, so HTML is easier to process than other text data.
- Data of different kinds, such as images and external links, can coexist with text data in one file.

In FIGS. 4 and 5, the database stores in advance a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into voice data. A homepage acquisition instruction is issued from the user terminal 3 to the Web server 2 via the homepage text data providing apparatus 100, and the Web server 2 selects a homepage based on this instruction and sends it to the server that provides speech synthesis information to the user terminal. The homepage text data providing apparatus 100 reads the sentence text of the selected homepage. The division symbols of each sentence text are identified by the phonetic symbol string generation program.

The phonetic symbol string is generated by the phonetic symbol string generation program, so that a phonetic symbol string is created for each sentence. The generated phonetic symbol string is attached to the sentence text. Each sentence text is a specific sentence text, and the specific sentence text itself constitutes an identifier.

The voice conversion synthesis program is attached to the homepage text data consisting of the sentence-by-sentence phonetic symbol strings of the specified sentence text.

The homepage text data, in which a phonetic symbol string has been generated for each sentence text, and the voice conversion synthesis program are transmitted to the user terminal. As a result, the user acquires on the user terminal the phonetic symbol strings generated for the intended homepage text data, together with the voice conversion synthesis program. These phonetic symbol strings and the voice conversion synthesis program are data and a program that operate on the browser, and are not installed in the user terminal.

This example makes it possible to read out speech with improved functions. This will improve accessibility for older people, people with weak vision, and people who are not good at color identification. This makes it easier to use the home page.

The user designates an arbitrary sentence text by operating the user terminal and instructs that it be read out as voice data. Since the sentence text functions as an identifier, it is converted and synthesized into voice data by the phonetic symbol string attached to it and by the function of the voice conversion synthesis program, and is read out. The voice data can be read aloud repeatedly by designating the sentence text again, and any sentence text can be designated, which allows rewinding and advancing.
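
The designation, repetition, rewinding, and advancing described above can be sketched as a small controller. This is a toy model, not the program of the embodiment: `ReadAloudPlayer` and its method names are hypothetical, and the `synthesize` callable merely stands in for the voice conversion synthesis program, whose interface the text does not specify.

```python
from typing import Callable, Sequence


class ReadAloudPlayer:
    """Toy model of the browser-side read-aloud control."""

    def __init__(self, phonetic_strings: Sequence[str],
                 synthesize: Callable[[str], str]) -> None:
        if not phonetic_strings:
            raise ValueError("at least one sentence text is required")
        self._phonetic = list(phonetic_strings)
        self._synthesize = synthesize  # stand-in for the synthesis program
        self._current = 0

    def read(self, index: int) -> str:
        """Convert only the designated sentence text into voice data."""
        if not 0 <= index < len(self._phonetic):
            raise IndexError("no such sentence text")
        self._current = index
        return self._synthesize(self._phonetic[index])

    def repeat(self) -> str:
        """Read the current sentence text again."""
        return self.read(self._current)

    def rewind(self) -> str:
        """Read the previous sentence text, stopping at the first one."""
        return self.read(max(self._current - 1, 0))

    def advance(self) -> str:
        """Read the next sentence text, stopping at the last one."""
        return self.read(min(self._current + 1, len(self._phonetic) - 1))


if __name__ == "__main__":
    player = ReadAloudPlayer(["p0", "p1", "p2"], lambda p: "audio:" + p)
    print(player.read(1), player.rewind(), player.advance())
```

Voice data is produced only for the sentence text that is designated, which is the point of moving synthesis to the user terminal.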

As described above, the method for providing homepage text data with speech synthesis information, when homepage text is used as the text data, includes the following steps.

・A step of storing, in the database, a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing phonetic symbol strings into voice data.
・A step of reading the selected homepage text data from the database that stores the text data, generating a phonetic symbol string for each sentence text by the phonetic symbol string generation program, and attaching the generated phonetic symbol string to each sentence text.
・A step in which the transmission means transmits the homepage text data, in which the sentence-by-sentence phonetic symbol string is attached to each sentence text, and the voice conversion synthesis program to the user terminal via the network.
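
The steps of the method can be sketched as a single server-side function. This is a minimal sketch under stated assumptions: `build_payload` and the dictionary layout are hypothetical, the database is modeled as a plain mapping from a page identifier to its sentence texts, and `generate_phonetic` is a stand-in for the phonetic symbol string generation program.

```python
from typing import Callable, Mapping, Sequence


def build_payload(
    database: Mapping[str, Sequence[str]],
    page_id: str,
    generate_phonetic: Callable[[str], str],
    synthesis_program: str,
) -> dict:
    """Read the selected page, annotate each sentence text, and bundle the
    result with the synthesis program for transmission to the user terminal."""
    sentence_texts = database[page_id]  # selected by the user's instruction
    annotated = [
        {"text": text, "phonetic": generate_phonetic(text)}
        for text in sentence_texts
    ]
    return {"sentences": annotated, "program": synthesis_program}


if __name__ == "__main__":
    db = {"home": ["Hello.", "Goodbye."]}
    # str.lower is only a placeholder for real phonetic symbol generation.
    payload = build_payload(db, "home", str.lower, "synthesis-program")
    print(payload["sentences"])
```

Passing the generator in as a parameter keeps the sketch independent of any particular phonetic notation.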

Thus, the problem is solved by creating the voice data on the user terminal side, not on the home page (server) side. By creating voice data on the user terminal, voice data can be created only for the necessary portion and only when it is needed, so that a simple read-aloud operation and reading in accordance with the user's intention become possible.

DESCRIPTION OF SYMBOLS 1 ... Server, 2 ... Web server, 3 ... User terminal, 4, 5 ... Network, 11 ... Phonetic symbol string (language analysis data) generation program, 12 ... Voice conversion synthesis program, 21 ... Input means, 22 ... Homepage text data generation means with sentence-by-sentence phonetic symbol string, 24 ... Transmission means, 25 ... Database, 100 ... Homepage text data providing apparatus with speech synthesis information, 200 ... Homepage text data providing system.

Claims

1. A text data providing apparatus with speech synthesis information that includes speech synthesis information converting means for converting text data described in a Web page into speech synthesis information, and that provides the speech synthesis information together with the text data to a user terminal via a network, the apparatus comprising:
a database that stores a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into speech data;
text data generating means with sentence-by-sentence phonetic symbol string that reads text data selected, based on an instruction signal from the user terminal, from a plurality of stored text data, generates by the phonetic symbol string generation program a phonetic symbol string consisting of a reading order and a reading for each sentence text of the text data, and attaches the sentence-by-sentence phonetic symbol string to each sentence text; and
transmission means for transmitting the text data with the sentence-by-sentence phonetic symbol strings and the voice conversion synthesis program to the user terminal via the network.
2. The text data providing apparatus with speech synthesis information according to claim 1, wherein the voice conversion synthesis program is a program that, when any sentence text of the text data is designated as a portion to be read out as voice data, converts the phonetic symbol string attached to that sentence text into voice data.
3. The text data providing apparatus with speech synthesis information according to claim 1, wherein the voice conversion synthesis program is not installed in the user terminal, resides in the browser, and disappears when the browser is closed.
4. The text data providing apparatus with speech synthesis information according to claim 2, wherein the phonetic symbol string generation program divides each sentence text in units of division symbols and generates a phonetic symbol string including a reading order.
5. A method for providing text data with speech synthesis information by a text data providing apparatus with speech synthesis information that includes speech synthesis information converting means for converting text data described in a Web page into speech synthesis information, and that provides the speech synthesis information together with the text data to a user terminal via a network, wherein:
a phonetic symbol string generation program and a voice conversion synthesis program for converting and synthesizing a phonetic symbol string into speech data are stored in a database;
text data generating means with sentence-by-sentence phonetic symbol string reads text data selected, based on an instruction signal from the user terminal, from a plurality of stored text data, generates a phonetic symbol string for each sentence text by the phonetic symbol string generation program, and attaches the generated phonetic symbol string to each sentence text; and
transmission means transmits the text data, in which the sentence-by-sentence phonetic symbol string is attached to each sentence text, and the voice conversion synthesis program to the user terminal via the network.
6. The method for providing text data with speech synthesis information according to claim 5, wherein the voice conversion synthesis program is a program that, when any sentence text in the text data is designated as a portion to be read out as voice data, converts the phonetic symbol string attached to that sentence text into voice data.
7. The method for providing text data with speech synthesis information according to claim 5, wherein the voice conversion synthesis program is not installed in the user terminal, resides in the browser, and disappears when the browser is closed.
8. The method for providing text data with speech synthesis information according to claim 5, wherein the phonetic symbol string generation program divides each sentence text in units of division symbols and generates a phonetic symbol string including a reading order.