JP2000293187A

JP2000293187A - Device and method for synthesizing data voice

Info

Publication number: JP2000293187A
Application number: JP11103207A
Authority: JP
Inventors: Tatsuji Yahashi; 達司矢橋
Original assignee: NEC Solution Innovators Ltd
Current assignee: NEC Solution Innovators Ltd
Priority date: 1999-04-09
Filing date: 1999-04-09
Publication date: 2000-10-20

Abstract

PROBLEM TO BE SOLVED: To provide a data voice synthesizing device and a data voice synthesizing method in which a user is able to listen to the synthesized voice of a text without delay and a voice file is provided in accordance with the desire of the user. SOLUTION: The device is provided with a text input section 111 which extracts text data that are the voice synthesis object from a text database 12, a text dividing process section 112 which divides the data of the voice synthesis object extracted from the database 12 in accordance with punctuation marks, and a voice synthesis processing section 113 which conducts voice synthesis for every text data of the divided voice synthesis object.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a data speech synthesis apparatus and a data speech synthesis method, and more particularly to a data speech synthesis apparatus and a data speech synthesis method suited to quickly synthesizing long text into speech so that it can be read aloud.

[0002]

[Description of the Related Art] Techniques have conventionally been developed for synthesizing speech from text data, converting it into a voice file, and outputting it as speech. In the conventional approach, the text data is passed to the speech synthesis engine as is and is either converted into a voice file or converted into voice data in memory.

[0003] However, as the amount of information grows and the text data becomes larger, the time required for speech synthesis becomes very long. Moreover, because of the resulting delay, the system could not operate while reflecting the user's intentions.

[0004] As a conventional example of such speech synthesis, the technique described in Japanese Patent Application Laid-Open No. 9-307658 has been proposed. That publication aims to let e-mail be read aloud freely with simple operations, and discloses an information processing method that uses a message data file storing response messages written in a predetermined format, receives an output instruction for a stored response message, and, in response to that instruction, outputs the corresponding response message as synthesized speech.

[0005] As another conventional example of such speech synthesis, the technique described in Japanese Patent Application Laid-Open No. 10-149273 has been proposed. That publication aims to produce speech in which the content and meaning of an e-mail are easy to follow, and discloses an information processing method that stores e-mail, analyzes the text of the stored e-mail, determines utterance attributes according to the analysis result, and synthesizes speech from the text based on the determined utterance attributes.

[0006]

[Problems to be Solved by the Invention] However, the conventional examples described above have the following problems.

[0007] The first problem is that speech synthesis takes a very long time for long text data, because the text data to be synthesized is large.

[0008] The second problem is that the user's instructions cannot be reflected during speech synthesis or playback. Either processing cannot be interrupted until playback of the text has finished, or, even if it can be interrupted, the next text data is not ready for playback because its speech synthesis has not yet finished.

[0009] An object of the present invention is to provide a data speech synthesis apparatus and a data speech synthesis method that let a user listen to the synthesized speech of a text without delay and that can provide voice files in accordance with the user's intention.

[0010]

[Means for Solving the Problems] The present invention is a data speech synthesis apparatus for synthesizing speech from data, comprising: storage means storing a plurality of data; extraction means for extracting speech synthesis target data from the storage means; dividing means for dividing the speech synthesis target data extracted from the storage means by the extraction means based on a predetermined division condition; and speech synthesis means for performing speech synthesis on each piece of the speech synthesis target data divided by the dividing means.

[0011] The present invention is also a data speech synthesis method for synthesizing speech from data, comprising: extracting specific speech synthesis target data from storage means storing a plurality of speech synthesis target data; dividing the speech synthesis target data extracted from the storage means in the extracting step based on a predetermined division condition; and performing speech synthesis on each piece of the speech synthesis target data divided in the dividing step.

[0012] Described with reference to FIG. 1, the data speech synthesis apparatus of the present invention is a data speech synthesis apparatus for synthesizing speech from data, comprising: storage means (12 in FIG. 1) storing a plurality of data; extraction means (111 in FIG. 1) for extracting speech synthesis target data from the storage means; dividing means (112 in FIG. 1) for dividing the speech synthesis target data extracted from the storage means by the extraction means based on a predetermined division condition; and speech synthesis means (113 in FIG. 1) for performing speech synthesis on each piece of the speech synthesis target data divided by the dividing means.

[0013] [Operation] The data speech synthesis apparatus of the present invention divides the text before synthesizing speech. Each divided text can therefore be synthesized in a short time, and the user can listen to the synthesized speech of the text without delay. In addition, the synthesized data is controlled so as to follow a speech synthesis schedule that anticipates the user's instructions. The apparatus can therefore provide the user with the voice file that matches the user's intention, for example in response to "next e-mail". Furthermore, the speech synthesis target data is not limited to text data, and a wide range of data can be handled. The user therefore gains the advantage that the data is not restricted.

[0014] In addition, the data speech synthesis apparatus of the present invention is characterized in that it divides the speech synthesis target data extracted from the storage means into several pieces based on a division condition such as periods and commas, and synthesizes and outputs speech for each piece. In this respect it differs from the conventional speech synthesis methods described above.

[0015]

[Description of the Preferred Embodiments] [First Embodiment] Next, a first embodiment of the present invention will be described in detail with reference to the drawings.

[0016] (1) Description of Configuration

FIG. 1 is a block diagram showing a configuration example of the text-to-speech synthesis processing system according to the first embodiment of the present invention. In FIG. 1, the text-to-speech synthesis processing system of the first embodiment comprises a speech synthesis processing device 11, a text database 12, an instruction device 13, and a voice output device 14. The speech synthesis processing device 11 in turn comprises a text input unit 111, a text division processing unit 112, and a speech synthesis processing unit 113.

[0017] In more detail, the speech synthesis processing device 11 divides the text extracted from the text database 12, performs speech synthesis on each divided text, saves the synthesized speech as a file in storage means (not shown), and plays it back as appropriate when it is to be read aloud. Specifically, the speech synthesis processing device 11 is a device built around a speech synthesis engine that performs speech synthesis in software on a computer. The text input unit 111 of the speech synthesis processing device 11 extracts the text to be synthesized from the text database 12 based on a speech synthesis instruction from the instruction device 13. The text division processing unit 112 of the speech synthesis processing device 11 divides the text to be synthesized into several texts, within a predetermined size, using commas, periods, and the like as break points. In actual use, the predetermined size is held as an initial value of, for example, 500 kbytes, and each division runs up to the comma or period found once 500 kbytes has been exceeded. The speech synthesis processing unit 113 of the speech synthesis processing device 11 performs speech synthesis on each divided text.
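The division rule of the text division processing unit 112 can be sketched as follows. This is a minimal illustration rather than the implementation of the specification: the function name `split_text`, the choice of UTF-8 bytes as the unit of size, the reading of 500 kbytes as 500 * 1024 bytes, and the exact set of delimiter characters are all assumptions made for the example.

```python
def split_text(text: str, threshold_bytes: int = 500 * 1024,
               delimiters: str = "。、.,") -> list[str]:
    """Cut the text at the first delimiter reached once the current piece
    has grown to at least threshold_bytes.

    Sizes are measured in UTF-8 bytes, which is an assumption; the
    specification does not name an encoding.
    """
    pieces: list[str] = []
    start = 0  # index where the current piece begins
    size = 0   # UTF-8 size of the current piece so far
    for i, ch in enumerate(text):
        size += len(ch.encode("utf-8"))
        if size >= threshold_bytes and ch in delimiters:
            pieces.append(text[start:i + 1])  # keep the delimiter with its piece
            start = i + 1
            size = 0
    if start < len(text):
        pieces.append(text[start:])  # remainder with no closing delimiter
    return pieces
```

With a deliberately small threshold, `split_text("aaa, bbb. ccc, ddd.", threshold_bytes=5)` returns the three pieces `"aaa, bbb."`, `" ccc,"`, and `" ddd."`; concatenating the pieces always reproduces the input.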

[0018] Specifically, the text database 12 is provided in an e-mail server and stores texts (e-mails). The instruction device 13 is, specifically, the numeric buttons of a telephone or the keyboard of a computer, and is used to enter a user ID, a password, and speech synthesis instructions. The voice output device 14 is, specifically, the receiver of a telephone or a speaker connected to a computer, and outputs as sound the voice files synthesized by the speech synthesis processing device 11.

[0019] (2) Description of Operation

Next, the operation of the first embodiment of the present invention will be described in detail with reference to FIGS. 1 to 4. FIG. 2 is a flowchart showing the flow of the speech synthesis processing of the first embodiment, FIG. 3 is an explanatory diagram showing the speech synthesis schedule of the first embodiment, and FIG. 4 is an explanatory diagram showing a specific example of the speech synthesis processing of the first embodiment.

[0020] First, the overall operation of the first embodiment of the present invention will be described in detail with reference to the flowchart of FIG. 2.

[0021] First, the text input unit 111 of the speech synthesis processing device 11 retrieves the text to be synthesized based on the user ID, password, and speech synthesis instruction that the user entered from the instruction device 13. The text is looked up in the text database 12 (step S21). Next, the text input unit 111 determines whether a text is present (step S22). In this case a text has been retrieved from the text database 12, so the text division processing unit 112 divides the text and passes the divided texts to the speech synthesis processing unit 113 (the speech synthesis engine) (step S23). The speech synthesis processing unit 113 then performs speech synthesis on each divided text and creates a voice file (step S24).

[0022] The speech synthesis processing unit 113 outputs the created voice file as sound through the voice output device 14, so that the user hears the finished voice file (step S25). The text input unit 111 then determines whether the user's speech synthesis instructions from the instruction device 13 have ended (step S26). If there is a further user instruction, the process returns to step S21 and the above series of steps is repeated. If there is no further user instruction, the process ends.
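The flow of steps S21 to S26 can be sketched as a simple loop. This is only an illustration under stated assumptions: the function `run_session` and its callable parameters are hypothetical stand-ins for the text input unit, the text division processing unit, the speech synthesis engine, and the voice output device, and skipping to the next instruction when no text is found at step S22 is an assumption, since the text does not state what that branch does.

```python
from typing import Callable, Iterable, Optional


def run_session(instructions: Iterable[str],
                fetch_text: Callable[[str], Optional[str]],
                split: Callable[[str], list[str]],
                synthesize: Callable[[str], bytes],
                play: Callable[[bytes], None]) -> int:
    """Process user instructions one by one and return the number of
    voice files played. All callables are hypothetical placeholders."""
    played = 0
    for instruction in instructions:      # S26: repeat while instructions remain
        text = fetch_text(instruction)    # S21: retrieve the text from the database
        if not text:                      # S22: is a text present? (skip is assumed)
            continue
        for piece in split(text):         # S23: divide the text
            audio = synthesize(piece)     # S24: synthesize one voice file per piece
            play(audio)                   # S25: output the voice file as sound
            played += 1
    return played
```

Passing the components as callables keeps the sketch independent of any particular database or speech synthesis engine.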

[0023] Next, a specific example of the first embodiment of the present invention will be described with reference to FIG. 4. As shown in FIG. 1, the user calls the speech synthesis processing device 11 from the instruction device 13 (a telephone in this example) and requests that a text (e-mail) be read out. When the user enters a user ID and password from the instruction device 13, the text input unit 111 of the speech synthesis processing device 11 acquires the text (e-mail) from the text database 12 of the e-mail server. The text division processing unit 112 divides the text (e-mail) acquired by the text input unit 111 at periods and commas, and the speech synthesis processing unit 113 performs speech synthesis on each text divided by the text division processing unit 112. The synthesized data follows the speech synthesis schedule.

[0024] The speech synthesis schedule, as shown in FIG. 3, arranges the divided texts in the order in which the user will next need them, so that playback follows the user's instructions. It is created and stored in advance by the speech synthesis processing unit 113 of the speech synthesis processing device 11 based on input from the instruction device 13. The rules of the speech synthesis schedule are as follows.

[0025] Text 1-division 1 is synthesized first. What is needed immediately when the user next instructs "next text" is text 2-division 1, so this is synthesized second. After that, if the two synthesis jobs for text 3-division 1 and text 1-division 2 can both finish within the playback time of text 1-division 1, then text 3-division 1 comes next. If the two jobs cannot both finish, text 1-division 2 is synthesized first. The time available for synthesis here must also allow for the playback time being shortened when the user performs "fast forward" or "skip". By anticipating the user's behavior and synthesizing speech according to the speech synthesis schedule in this way, processing without delay becomes possible.
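The ordering rule for the jobs that follow the first two can be sketched as below. This is a hedged illustration, not the scheduler of the specification: the names `Piece` and `order_following_jobs`, the per-piece synthesis time estimate, and the `safety_factor` used to model playback being cut short by "fast forward" or "skip" are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    text_no: int          # which text (e-mail) the piece belongs to
    part_no: int          # which division of that text
    synth_seconds: float  # estimated synthesis time (assumed to be known)


def order_following_jobs(next_part_of_current: Piece,
                         first_part_of_later_text: Piece,
                         playback_seconds: float,
                         safety_factor: float = 0.5) -> list[Piece]:
    """Order the two jobs that follow text 1-division 1 and text 2-division 1.

    playback_seconds is the playback time of the piece now being played.
    safety_factor (hypothetical) shrinks that time to allow for the user
    shortening playback with "fast forward" or "skip".
    """
    budget = playback_seconds * safety_factor
    total = next_part_of_current.synth_seconds + first_part_of_later_text.synth_seconds
    if total <= budget:
        # Both jobs fit: prepare the later text first, then the next division.
        return [first_part_of_later_text, next_part_of_current]
    # Not enough time for both: the next division of the current text comes first.
    return [next_part_of_current, first_part_of_later_text]
```

For example, with two jobs estimated at 2 seconds each, a 10-second playback leaves a 5-second budget and text 3-division 1 is scheduled first, while a 6-second playback leaves only 3 seconds and text 1-division 2 is scheduled first.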

[0026] The created voice files are played back to the user in order. In the meantime, the user can send instructions such as "next mail", "fast forward", and "rewind" to the speech synthesis processing device 11, and the voice file can be played back immediately in accordance with those instructions.
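How such instructions might select the next voice file can be sketched as a small cursor over (text, division) positions. This is purely illustrative: the class `PlaybackCursor`, the command strings, the zero-based indices, and in particular the mapping of "fast forward" and "rewind" to moving one division forward or back are assumptions, since the specification does not define what these instructions do at the level of divided texts.

```python
class PlaybackCursor:
    """Tracks which pre-synthesized voice file should be played next."""

    def __init__(self, parts_per_text: list[int]) -> None:
        self.parts_per_text = parts_per_text  # number of divisions in each text
        self.text_no = 0                      # zero-based index of the current text
        self.part_no = 0                      # zero-based index of the current division

    def position(self) -> tuple[int, int]:
        return (self.text_no, self.part_no)

    def handle(self, command: str) -> tuple[int, int]:
        """Apply a user command and return the new position.

        Commands that cannot be applied (no later text, no earlier
        division, unknown command) leave the position unchanged.
        """
        if command == "next_text":
            if self.text_no + 1 < len(self.parts_per_text):
                self.text_no += 1
                self.part_no = 0
        elif command == "fast_forward":
            if self.part_no + 1 < self.parts_per_text[self.text_no]:
                self.part_no += 1
        elif command == "rewind":
            if self.part_no > 0:
                self.part_no -= 1
        return self.position()
```

Because each position corresponds to a short, separately synthesized voice file, moving the cursor only requires that file to be ready, which is what the speech synthesis schedule is meant to ensure.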

[0027] As described above, when the instruction device 13, which conveys the user's instructions, gives the text input unit 111 of the speech synthesis processing device 11 an instruction to perform speech synthesis, the long text extracted from the text database 12 by the text input unit 111 is divided by the text division processing unit 112 into several divided texts, within a predetermined size, using commas, periods, and the like as break points. The divided texts are passed one after another to the speech synthesis processing unit 113. Each small divided text is synthesized in a short time, converted into a voice file, and output as sound to the voice output device 14.

[0028] Adopting the speech synthesis schedule shown in FIG. 3 makes reading aloud without delay possible. Even when the user instructs "skip", "fast forward", or "read the next text", reading can begin immediately.

[0029] As described above, according to the first embodiment of the present invention, the text is divided before speech synthesis, so each divided text can be synthesized in a short time and the user can listen to the synthesized speech of the text without delay. In addition, because the synthesized data follows a speech synthesis schedule that anticipates the user's instructions, the user can be provided with the voice file that matches the user's intention, for example in response to "next e-mail".

[0030] [Second Embodiment] Next, a second embodiment of the present invention will be described in detail with reference to the drawings.

[0031] (1) Description of Configuration

FIG. 5 is a block diagram showing a configuration example of the text-to-speech synthesis processing system according to the second embodiment of the present invention. In FIG. 5, the text-to-speech synthesis processing system of the second embodiment comprises a speech synthesis processing device 51, a general-purpose database 52, an instruction device 53, and a voice output device 54. The speech synthesis processing device 51 comprises a text input unit 511, a text division processing unit 512, and a speech synthesis processing unit 513. The text input unit 511 in turn consists of a data search unit 511A and a text generation unit 511B.

[0032] The points on which the configuration of the second embodiment differs from that of the first embodiment are as follows. The general-purpose database 52 stores ordinary data that is not in text form. The data search unit 511A of the speech synthesis processing device 51 searches the general-purpose database 52 for the data to be synthesized based on a speech synthesis instruction from the instruction device 53. The text generation unit 511B of the speech synthesis processing device 51 converts the data found by the data search unit 511A into text by performing processing such as merging. The rest of the configuration is the same as in the first embodiment, and its description is omitted.

[0033] (2) Description of Operation

Next, the operation of the second embodiment of the present invention will be described in detail with reference to FIGS. 5 to 7. FIG. 6 is a flowchart showing the flow of the speech synthesis processing of the second embodiment, and FIG. 7 is an explanatory diagram showing a specific example of the speech synthesis processing of the second embodiment.

[0034] First, the overall operation of the second embodiment of the present invention will be described in detail with reference to the flowchart of FIG. 6.

[0035] First, the data search unit 511A of the text input unit 511 of the speech synthesis processing device 51 searches the general-purpose database 52 for the speech synthesis target data based on the user ID, password, and speech synthesis instruction that the user entered from the instruction device 53 (step S61). Next, the text generation unit 511B of the text input unit 511 converts the data found by the data search unit 511A into text (step S62). Next, the text generation unit 511B of the text input unit 511 determines whether a text is present (step S63). In this case the data from the general-purpose database 52 has been converted into text, so the text division processing unit 512 divides the text and passes the divided texts to the speech synthesis processing unit 513 (the speech synthesis engine) (step S64). The speech synthesis processing unit 513 then performs speech synthesis on each divided text and creates a voice file (step S65).

[0036] The speech synthesis processing unit 513 outputs the created voice file as sound through the voice output device 54, so that the user hears the finished voice file (step S66). The text input unit 511 then determines whether the user's speech synthesis instructions from the instruction device 53 have ended (step S67). If there is a further user instruction, the process returns to step S61 and the above series of steps is repeated. If there is no further user instruction, the process ends.

[0037] Next, the differences between the second embodiment and the first embodiment will be described. In the text-to-speech synthesis processing system of the first embodiment shown in FIG. 1, speech synthesis cannot be performed unless the source text has been prepared in advance.

[0038] In contrast, in the text-to-speech synthesis processing system of the second embodiment, the data search unit 511A of the text input unit 511 of the speech synthesis processing device 51 retrieves data from the general-purpose database 52 (step S61 in FIG. 6), and the text generation unit 511B converts the data into text (step S62 in FIG. 6), which makes it possible to perform speech synthesis and play back the result.

[0039] Next, a specific example of the second embodiment of the present invention will be described with reference to FIG. 7. Suppose the data in the general-purpose database 52 has the following structure: [key: numeric string], [setting value 1: numeric string], [setting value 2: numeric string], [reference DB key: key of another DB]. In FIG. 7, when the data is searched using the input numeric string "5489" as the key, the retrieved data is "3, 4, 3321". If the speech synthesis processing unit 513 synthesized this as is, the result would not be voice information the user could understand. Therefore, the data search unit 511A of the text input unit 511 searches the general-purpose database 52, also looks up the [reference DB key], and applies the actual data, so that the result can be provided to the user as useful information.

[0040] If the data retrieved from the other DB with the key "3321" is the text data "Title: Effective use of databases...", the data the user needs becomes text data such as "Setting value 1 is 3, setting value 2 is 4, Title: Effective use of databases...", and useful information can be provided to the user.
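The search and text generation of this example can be sketched as follows. This is a minimal illustration assuming in-memory dictionaries in place of the general-purpose database 52 and the other DB; the names `GENERAL_DB`, `REFERENCE_DB`, `search`, and `generate_text`, as well as the field names, are hypothetical and chosen only to mirror the record structure described above.

```python
from typing import Optional

# Hypothetical stand-ins for the general-purpose database and the other DB.
GENERAL_DB = {
    "5489": {"setting1": "3", "setting2": "4", "ref_key": "3321"},
}
REFERENCE_DB = {
    "3321": "Title: Effective use of databases...",
}


def search(key: str) -> Optional[dict]:
    """Role of the data search unit: look up a record by its key."""
    return GENERAL_DB.get(key)


def generate_text(record: dict) -> str:
    """Role of the text generation unit: merge the record with the data it
    references in the other DB into a sentence suitable for synthesis."""
    parts = [
        f"Setting value 1 is {record['setting1']}",
        f"setting value 2 is {record['setting2']}",
    ]
    referenced = REFERENCE_DB.get(record["ref_key"])
    if referenced:
        parts.append(referenced)  # replace the bare key with the actual data
    return ", ".join(parts)
```

Here `generate_text(search("5489"))` produces "Setting value 1 is 3, setting value 2 is 4, Title: Effective use of databases...", which can then be divided and synthesized in the same way as ordinary text.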

[0041] As described above, even if the data in the general-purpose database 52 is not in text form, the speech synthesis processing unit 513 can synthesize speech from it, provided that the data search unit 511A and the text generation unit 511B of the text input unit 511 convert it into text through processing such as searching and merging before it reaches the speech synthesis processing unit 513 of the speech synthesis processing device 51. Instructions from the instruction device 53 can also include database operations. This makes it possible to synthesize speech not only from fixed text but also from dynamically generated text.

[0042] As described above, according to the second embodiment of the present invention, the speech synthesis target data is not limited to the text data of the text database 12 as in the first embodiment; a wide range of data in the general-purpose database 52 can be handled, so the user gains the advantage that the data is not restricted.

[0043]

[Effects of the Invention] As described above, according to the present invention, the text is divided before speech synthesis, so each divided text can be synthesized in a short time and the user can listen to the synthesized speech of the text without delay. In addition, because the synthesized data follows a speech synthesis schedule that anticipates the user's instructions, the user can be provided with the voice file that matches the user's intention, for example in response to "next e-mail". Furthermore, because the speech synthesis target data is not limited to text data and a wide range of data can be handled, the user gains the advantage that the data is not restricted.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of the text-to-speech synthesis processing system according to the first embodiment of the present invention.

FIG. 2 is a flowchart showing the flow of the speech synthesis processing according to the first embodiment of the present invention.

FIG. 3 is an explanatory diagram showing an example of the speech synthesis schedule according to the first embodiment of the present invention.

FIG. 4 is an explanatory diagram showing a specific example of the speech synthesis processing according to the first embodiment of the present invention.

FIG. 5 is a block diagram showing a configuration example of the text-to-speech synthesis processing system according to the second embodiment of the present invention.

FIG. 6 is a flowchart showing the flow of the speech synthesis processing according to the second embodiment of the present invention.

FIG. 7 is an explanatory diagram showing a specific example of the speech synthesis processing according to the second embodiment of the present invention.

[Explanation of symbols]

11, 51: speech synthesis processing device
12: text database
13, 53: instruction device
14, 54: voice output device
52: general-purpose database
111, 511: text input unit
112, 512: text division processing unit
113, 513: speech synthesis processing unit
511A: data search unit
511B: text generation unit

Continued from the front page
(51) Int.Cl.7, identification symbol, FI, theme code (reference): H04L 12/58; H04L 11/20 101B; H04M 3/50
F-terms (reference): 5B089 GA21 GB04 JA31 JB05 KB04 KH14 LB13; 5D045 AA20 AB02; 5K015 AA00 GA00 GA12; 5K030 GA18 HA06 KA04 KA20 LB16 LD17 LE14; 9A001 BZ03 FF03 HH18 JJ14 JZ19

Claims

[Claims]

1. A data speech synthesizer that performs speech synthesis of data, comprising: storage means that stores a plurality of speech synthesis target data; extraction means that extracts speech synthesis target data from the storage means; dividing means that divides the speech synthesis target data extracted from the storage means by the extraction means based on a predetermined division condition; and speech synthesis means that performs speech synthesis for each piece of the speech synthesis target data divided by the dividing means.

2. The data speech synthesizer according to claim 1, further comprising: instruction means for instructing speech synthesis of the speech synthesis target data; and speech output means for outputting a speech synthesis result as speech, wherein the extraction means extracts the speech synthesis target data from the storage means, the dividing means divides the speech synthesis target data extracted from the storage means by the extraction means at breaks, punctuation marks, and the like, and the speech synthesis means performs speech synthesis for each piece of the speech synthesis target data divided by the dividing means and causes the speech synthesis result to be output as speech from the speech output means.

3. The data speech synthesizer according to claim 2, wherein the speech synthesis means causes the speech output means to output the speech synthesis result according to a speech synthesis schedule.

4. The data speech synthesizer according to claim 3, wherein the speech synthesis schedule arranges the speech synthesis target data divided by the dividing means in the order in which the user will need it next.

5. The data speech synthesizer according to claim 1 or 2, wherein the speech synthesis target data is text data such as electronic mail, and the storage means stores a plurality of the text data.

6. The data speech synthesizer according to claim 1, wherein the speech synthesis target data is data that is not in a text format, the storage means stores a plurality of the data that is not in a text format, and the extraction means further comprises: data search means that searches the storage means for the data that is not in a text format; and text generation means that converts the non-text data found by the data search means into text.

7. A data speech synthesis method for performing speech synthesis of data, comprising: an extraction step of extracting specific speech synthesis target data from storage means that stores a plurality of speech synthesis target data; a dividing step of dividing the extracted speech synthesis target data based on a predetermined division condition; and a speech synthesis step of performing speech synthesis for each piece of the speech synthesis target data divided in the dividing step.
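As an illustration only, the speech synthesis schedule referred to in claims 3 and 4 might be sketched in Python as below. The ordering policy is an assumption and is not taken from this publication: the opening piece of the following message is scheduled before the remainder of the current message, so that a "next e-mail" request could be answered at once. The name `build_schedule` and the `(message id, piece index)` representation are hypothetical.

```python
from typing import List, Set, Tuple

Piece = Tuple[str, int]  # (message id, index of the piece within that message)


def build_schedule(piece_counts: List[Tuple[str, int]]) -> List[Piece]:
    """Return the order in which divided pieces would be synthesized.

    piece_counts lists (message id, number of pieces) in reading order;
    message ids are assumed to be unique.
    Assumed policy: for each message, schedule its opening piece, then the
    opening piece of the following message, then the rest of the message.
    """
    schedule: List[Piece] = []
    seen: Set[Piece] = set()

    def add(piece: Piece) -> None:
        # Each piece is scheduled once, even if the look-ahead named it earlier.
        if piece not in seen:
            seen.add(piece)
            schedule.append(piece)

    for pos, (msg, count) in enumerate(piece_counts):
        if count > 0:
            add((msg, 0))
        # Look ahead one message so a "next" request has audio ready.
        if pos + 1 < len(piece_counts) and piece_counts[pos + 1][1] > 0:
            add((piece_counts[pos + 1][0], 0))
        for index in range(1, count):
            add((msg, index))
    return schedule


if __name__ == "__main__":
    for piece in build_schedule([("mail1", 3), ("mail2", 2), ("mail3", 1)]):
        print(piece)
```

Under this assumed policy, the opening piece of `mail2` is scheduled directly after the opening piece of `mail1`, ahead of the remaining pieces of `mail1`.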