JP2004271979A - Voice synthesizer

Info

Publication number: JP2004271979A
Application number: JP2003063565A
Authority: JP
Inventors: Natsuki Saito (齋藤 夏樹); Takahiro Kamai (釜井 孝浩)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-03-10
Filing date: 2003-03-10
Publication date: 2004-09-30

Abstract

PROBLEM TO BE SOLVED: To provide fast-forward playback of synthesized speech that remains highly audible and lets the user easily understand the content of the entire text.

SOLUTION: Text information stored in a text storage device 100 is input by a text acquisition device 101 to a morphological analysis device 102 and a syntax analysis device 103, converted into reading information by a reading analysis device 104, and input to a synthesized sound waveform generation device 105. The synthesized sound waveform generation device 105 generates a synthesized sound waveform from the reading information supplied by the reading analysis device 104, the morphological analysis information supplied by the morphological analysis device 102, and the voice information stored in a voice information database 107, and outputs the waveform to a waveform buffer device 108. In doing so, the synthesized sound waveform generation device 105 thins out the synthesized sound waveform in units of morphemes. The output waveform is reproduced as speech by a speaker device 109.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明はテキスト情報からそれに対応した音声情報を生成する音声合成装置に関する。
【０００２】
【従来の技術】
従来の音声再生装置や音声合成装置は、出力する音声を短い時間でユーザに聞かせようとした場合、次のような方法を典型的に用いる。
（１）出力する音声の速度を上げ、再生時間を短縮する（特許文献１）。
（２）声の成分に変化が少ない定常部分を短縮してやることで、再生時間を短縮しつつ可聴性を確保する（特許文献２）。
（３）再生する音声をある時間間隔で区切り、各時間間隔の中の一部のみを再生することで、再生時間を短縮しつつ可聴性を確保する（特許文献３）。
（４）章や段落の先頭のみを再生することによって、再生時間を大幅に短縮する（特許文献４）。
【０００３】
【特許文献１】
特開平１１−５２９８５号公報
【特許文献２】
特開昭６１−５５６９８号公報
【特許文献３】
特許第２０４８７６２号公報
【特許文献４】
特開平２−４５８６８号公報
【０００４】
【発明が解決しようとする課題】
従来の技術では、可聴性を確保し、再生時間を大幅に短縮し、かつ再生される音声全体の内容がユーザにとって分かり易いようにするという要求を満たすことができない。即ち、上に示された従来の技術では、それぞれ次のような問題点がある。
（１）出力する音声の速度を非常に大きくすると、可聴性が損なわれてしまう。
（２）再生される音声に含まれる定常部分の長さしか、音声の再生時間を短縮できない。
（３）一般的に、音声の時間的な区切りと言語的な区切りは一致しないため、可聴性は高くとも内容の理解し辛い音声になってしまう。
（４）ユーザにとって有益な情報が必ずしも章や段落の先頭にあるとは限らないため、この再生方法で文章全体の理解を行わせることは困難である。
【０００５】
【課題を解決するための手段】
本発明は以上の問題を鑑みてなされたものであり、音声合成装置に入力されたテキストを言語的に解析することで重要度の高い部分を見つけ、重要度の高い部分のみを繋げて再生するものである。これにより、ユーザが文章全体の内容を理解しやすく、かつ可聴性の高い、合成音の早送り再生を行うことができる。
【０００６】
また、この方法を入力されたテキストの各文に適用し、文単位でテキストを逆向きに再生することで、ユーザにとって理解しやすい、合成音の高速巻き戻し再生が実現できる。
【０００７】
本発明による第１の音声合成装置は、テキスト情報を保持するテキスト記憶手段と、前記テキスト記憶手段に保持されているテキスト情報を取得するテキスト取得手段と、前記テキスト取得手段によって取得されたテキスト情報について形態素解析処理を行う形態素解析手段と、前記テキスト取得手段によって取得されたテキスト情報と前記形態素解析手段による形態素解析処理の結果とに基づいて、前記テキスト情報を形態素単位で間引いたテキスト情報に対応する音声波形を生成する合成音波形生成手段と、前記合成音波形生成手段によって生成された音声波形を可聴音波に変換して出力するスピーカ手段とを備える。
【０００８】
上記音声合成装置は、音声合成の対象となるテキスト情報の形態素解析結果を用いて、合成音波形生成手段の生成する合成音波形に対し、形態素単位の間引きを行うことを特徴とする。これにより、合成音の元となったテキスト情報を、少なくとも形態素の単位では欠落無くユーザに聞かせられることを保証しつつ、合成音の再生速度を上げることができる。
【０００９】
本発明による第２の音声合成装置は、テキスト情報を保持するテキスト記憶手段と、前記テキスト記憶手段に保持されているテキスト情報を取得するテキスト取得手段と、前記テキスト取得手段によって取得されたテキスト情報について構文解析処理を行う構文解析手段と、前記テキスト取得手段によって取得されたテキスト情報と前記構文解析手段による構文解析処理の結果とに基づいて、前記テキスト情報を構文的な重要度に応じて間引いたテキスト情報に対応する音声波形を生成する合成音波形生成手段と、前記合成音波形生成手段によって生成された音声波形を可聴音波に変換して出力するスピーカ手段とを備える。
【００１０】
上記音声合成装置は、音声合成の対象となるテキスト情報の構文解析結果を用いて、合成音波形生成手段の生成する合成音波形に対し、構文的な重要度が低い部分の間引きを行うことを特徴とする。これにより、合成音の元となったテキスト情報から構文的な重要度の高い部分を残しつつ、重要度の低い部分を削って合成音の再生速度を上げることができる。
【００１１】
本発明による第３の音声合成装置は、テキスト情報を保持するテキスト記憶手段と、前記テキスト記憶手段に保持されているテキスト情報を要約するテキスト要約手段と、前記テキスト要約手段によって要約されたテキスト情報に対応する音声波形を生成する合成音波形生成手段と、前記合成音波形生成手段によって生成された音声波形を可聴音波に変換して出力するスピーカ手段とを備える。
【００１２】
上記音声合成装置は、音声合成の対象となるテキスト情報をまず要約してから音声合成を行うことを特徴とする。これにより、通常の音声合成装置に対する比較的容易な構成変更によって、テキスト情報の単純な間引きによる方法よりも、元のテキスト情報の内容を正確に表す早送り操作が可能になる。
【００１３】
【発明の実施の形態】
以下、本発明の実施の形態について、図１から図１５を用いて説明する。
【００１４】
（第１の実施形態）
第１の実施形態による音声合成装置の全体構成を図１に示す。この音声合成装置は、形態素解析の結果を用いて、形態素単位の間引きにより早送り再生を行う。
【００１５】
テキスト記憶装置１００に蓄積されたテキスト情報はテキスト取得装置１０１により形態素解析装置１０２及び構文解析装置１０３へと入力され、読み解析装置１０４を経て読み情報となり、合成音波形生成装置１０５に入力される。このとき、形態素解析装置１０２、構文解析装置１０３、読み解析装置１０４によるテキスト情報の解析には言語情報データベース１０６に蓄積された言語情報が用いられる。
【００１６】
合成音波形生成装置１０５は、読み解析装置１０４から入力される読み情報、形態素解析装置１０２から入力される形態素解析情報及び音声情報データベース１０７に蓄積された音声情報を用いて合成音波形を生成し、波形バッファ装置１０８へと出力する。出力された合成音波形は、スピーカ装置１０９から音声として再生される。
【００１７】
さらにこの装置は、ユーザが操作する音声合成制御スイッチ１１０からの入力を受け取ることができる音声合成制御装置１１１を有し、これによってテキスト取得装置１０１が取得するテキストの位置と合成音波形生成装置１０５による合成音波形生成処理の早送り速度を制御することができる。
【００１８】
図２に、テキスト記憶装置１００からテキスト取得装置１０１が受け取るテキスト文２００が合成音波形生成装置１０５により合成音２０４に変換されるまでの処理の流れを示す。
【００１９】
まず、形態素解析装置１０２が言語情報データベース１０６に格納された情報を元に入力のテキスト文２００を解析し、テキスト文２００を形態素単位に分割した情報を得る。以下、これを形態素解析情報２０１と呼ぶ。
【００２０】
さらに、形態素解析情報２０１とテキスト文２００を用いて、構文解析装置１０３が言語情報データベース１０６に格納された情報を元にテキスト文２００を構文解析し、形態素単位に分けられたテキスト文２００がどういう構造をしているのか調べる。以下、ここで得られる情報を構文解析情報２０２と呼ぶ。
【００２１】
次に読み解析装置１０４が、言語情報データベース１０６、形態素解析情報２０１、構文解析情報２０２を用いてテキスト文２００を解析し、テキスト文２００の読み上げに対応する音素列やアクセント位置などを合成音波形生成装置１０５に知らせるための読み情報２０３を生成する。
【００２２】
最後に、合成音波形生成装置１０５が読み情報２０３と音声情報データベース１０７を用いて、テキスト文２００の読み上げに対応する合成音波形２０４を出力する。
【００２３】
ここで、合成音波形生成装置１０５は、読み情報２０３のどの部分が形態素解析情報２０１のどの部分に対応するか知っているため、生成する合成音波形を形態素単位で間引くことが可能である。図３に、生成する合成音波形を形態素単位の間引きによって短縮する処理の例を示す。ここで、テキスト文２００の全体に対して合成音波形を生成すると合成音波形２０４ができるが、形態素単位の間引きによって合成音波形３００を生成することができる。この図の処理では、まず形態素解析された元のテキスト文２００から句読点や助詞・助動詞を取り除いた上で、残った形態素を２つおきに取り出して合成音波形３００を生成する。なお、合成音波形３００は、必要な部分の波形のみを生成して作っても良いし、まず合成音波形２０４を生成し、そこから必要な部分の波形を抜き出す方法で作っても良い。
【００２４】
この装置の動作フローを、図４〜図９に示す。
【００２５】
図４は、本実施例の装置における音声合成制御装置１１１の動作を示したものである。音声合成制御装置１１１は、概略以下のように動作する。
・音声合成制御スイッチ１１０からの入力があれば、対応する信号をテキスト取得装置１０１及び合成音波形生成装置１０５に送信する。
・テキスト取得装置１０１が次のテキストを取得できる状態であれば、次のテキストを取得させる信号をテキスト取得装置１０１に送信する。
・テキスト記憶装置１００に蓄積されたテキスト情報の終端の再生が完了したら、テキスト取得装置１０１及び合成音波形生成装置１０５に停止信号を送信する。
【００２６】
図４の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ４００：装置の動作が開始する。
Ｓ４０１：音声合成制御装置１１１内部で使用する、各種変数のクリアを行う。
Ｓ４０２：音声合成制御スイッチ１１０から、合成音の再生停止を指示する入力があったかどうかを判定する。
Ｓ４０３：音声合成制御スイッチ１１０から、合成音の再生速度を１段階増加させることを指示する入力があったかどうかを判定する。
Ｓ４０４：音声合成制御スイッチ１１０から、合成音の再生速度を１段階減少させることを指示する入力があったかどうかを判定する。
Ｓ４０５：音声合成制御スイッチ１１０から、合成音の再生速度を初期値に戻すことを指示する入力があったかどうかを判定する。
Ｓ４０６：テキスト取得装置１０１が次のテキスト情報を取得できる状態であるかどうかを判定する。なお、テキスト取得装置１０１が次のテキスト情報を取得できるのは、テキスト取得装置１０１がまだテキスト記憶装置１００からテキスト情報を取得していないか、テキスト取得装置１０１が最後に取得したテキスト情報を既に形態素解析装置１０２及び構文解析装置１０３の両方に渡し終わったときである。
Ｓ４０７：合成音波形生成装置１０５に、合成音の再生速度を１段階増加させることを指示する信号を送信する。
Ｓ４０８：合成音波形生成装置１０５に、合成音の再生速度を１段階減少させることを指示する信号を送信する。
Ｓ４０９：合成音波形生成装置１０５に、合成音の再生速度を初期値に戻すことを指示する信号を送信する。
Ｓ４１０：テキスト取得装置１０１が最後に取得したテキスト情報が、テキスト記憶装置１００に蓄積されたテキスト情報の終端にあるものかどうかを判定する。
Ｓ４１１：テキスト取得装置１０１に、最後に取得したテキスト情報の直後にあるテキスト情報を取得することを指示する信号を送信する。
Ｓ４１２：スピーカ装置１０９による、波形バッファ装置１０８からの合成音の再生が完了したかどうかを判定する。
Ｓ４１３：テキスト取得装置１０１及び合成音波形生成装置１０５に、動作の停止を指示する信号を送信する。
Ｓ４１４：装置の動作が停止する。
【００２７】
図５は、本実施例の装置におけるテキスト取得装置１０１の動作を示したものである。テキスト取得装置１０１は、概略以下のように動作する。
・音声合成制御装置１１１から、テキスト記憶装置１００内の次のテキスト情報の取得を指示する信号を受信したら、その指示に従う。
・テキスト記憶装置１００からテキスト情報を取得したら、そのテキスト情報をまず形態素解析装置１０２に、次に構文解析装置１０３に送信する。
・音声合成制御装置１１１から、動作の停止を指示する信号を受信したら、その指示に従う。
【００２８】
図５の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ５００：装置の動作が開始する。
Ｓ５０１：テキスト取得装置１０１内部で使用する、各種変数のクリアを行う。
Ｓ５０２：音声合成制御装置１１１より、動作の停止を指示する信号を受信したかどうかを判定する。
Ｓ５０３：音声合成制御装置１１１より、次のテキスト情報の取得を指示する信号を受信したかどうかを判定する。
Ｓ５０４：テキスト記憶装置１００内で、最後に取得したテキスト情報の直後に保持されているテキスト情報の取得を行う。動作開始後、まだ１度もテキスト情報の取得を行っていない場合はテキスト記憶装置１００内の先頭位置に保持されているテキスト情報の取得を行う。
Ｓ５０５：形態素解析装置１０２が次のテキスト情報を取得できる状態であるかどうかを判定する。なお、形態素解析装置１０２が次のテキスト情報を取得できるのは、形態素解析装置１０２がまだテキスト取得装置１０１からテキスト情報を取得していないか、形態素解析装置１０２が最後に取得したテキスト情報の形態素解析処理を既に完了し、構文解析装置１０３、読み解析装置１０４及び合成音波形生成装置１０５の全てに形態素解析結果を渡し終わったときである。
Ｓ５０６：現在保持しているテキスト情報を、形態素解析装置１０２に送信する。
Ｓ５０７：構文解析装置１０３が次のテキスト情報を取得できる状態であるかどうかを判定する。なお、構文解析装置１０３が次のテキスト情報を取得できるのは、構文解析装置１０３がまだテキスト取得装置１０１及び形態素解析装置１０２から何の情報も取得していないか、構文解析装置１０３が最後に行った構文解析処理の結果を、既に読み解析装置１０４に渡し終わったときである。
Ｓ５０８：現在保持しているテキスト情報を、構文解析装置１０３に送信する。
Ｓ５０９：装置の動作が停止する。
【００２９】
図６は、本実施例の装置における形態素解析装置１０２の動作を示したものである。形態素解析装置１０２は、概略以下のように動作する。
・テキスト取得装置１０１からテキスト情報を受け取り、形態素解析処理を行って、形態素解析情報を構文解析装置１０３、読み解析装置１０４、合成音波形生成装置１０５に送信する。形態素解析処理に当たっては、言語情報データベース１０６に蓄積された言語情報を使用する。
【００３０】
図６の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ６００：装置の動作が開始する。
Ｓ６０１：形態素解析装置１０２内部で使用する、各種変数のクリアを行う。
Ｓ６０２：テキスト取得装置１０１が、動作を停止したかどうかを判定する。
Ｓ６０３：テキスト取得装置１０１が、形態素解析装置１０２に対し次のテキスト情報を送信可能な状態であるかどうかを判定する。
Ｓ６０４：テキスト取得装置１０１から送信されるテキスト情報の受信を行う。
Ｓ６０５：言語情報データベース１０６を参照しつつ、Ｓ６０４で受信したテキスト情報の形態素解析処理を行う。
Ｓ６０６：構文解析装置１０３が次の形態素解析情報を取得できる状態であるかどうかを判定する。なお、構文解析装置１０３が次の形態素解析情報を取得できるのは、構文解析装置１０３がまだ形態素解析装置１０２から１度も形態素解析情報を取得していないか、形態素解析装置１０２から最後に取得した形態素解析情報を用いて行った構文解析処理の結果を、既に読み解析装置１０４に渡し終わったときである。
Ｓ６０７：Ｓ６０５で生成した形態素解析情報を、構文解析装置１０３に送信する。
Ｓ６０８：読み解析装置１０４が次の形態素解析情報を取得できる状態であるかどうかを判定する。なお、読み解析装置１０４が次の形態素解析情報を取得できるのは、読み解析装置１０４がまだ形態素解析装置１０２から１度も形態素解析情報を取得していないか、形態素解析装置１０２から最後に取得した形態素解析情報を用いて行った読み解析処理の結果を、既に合成音波形生成装置１０５に渡し終わったときである。
Ｓ６０９：Ｓ６０５で生成した形態素解析情報を、読み解析装置１０４に送信する。
Ｓ６１０：合成音波形生成装置１０５が次の形態素解析情報を取得できる状態であるかどうかを判定する。なお、合成音波形生成装置１０５が次の形態素解析情報を取得できるのは、合成音波形生成装置１０５がまだ形態素解析装置１０２から１度も形態素解析情報を取得していないか、形態素解析装置１０２から最後に取得した形態素解析情報を利用した合成音波形の生成が既に完了しているときである。
Ｓ６１１：Ｓ６０５で生成した形態素解析情報を、合成音波形生成装置１０５に送信する。
Ｓ６１２：装置の動作が停止する。
【００３１】
図７は、本実施例の装置における構文解析装置１０３の動作を示したものである。構文解析装置１０３は、概略以下のように動作する。
・テキスト取得装置１０１からテキスト情報を、形態素解析装置１０２から形態素解析情報を受け取り、構文解析処理を行って、構文解析情報を読み解析装置１０４に送信する。構文解析処理に当たっては、言語情報データベース１０６に蓄積された言語情報を使用する。
【００３２】
図７の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ７００：装置の動作が開始する。
Ｓ７０１：構文解析装置１０３内部で使用する、各種変数のクリアを行う。
Ｓ７０２：テキスト取得装置１０１が、動作を停止したかどうかを判定する。
Ｓ７０３：テキスト取得装置１０１が、構文解析装置１０３に対し次のテキスト情報を送信可能な状態であるかどうかを判定する。
Ｓ７０４：テキスト取得装置１０１から送信されるテキスト情報の受信を行う。
Ｓ７０５：形態素解析装置１０２が、構文解析装置１０３に対し次の形態素解析情報を送信可能な状態であるかどうかを判定する。
Ｓ７０６：形態素解析装置１０２から送信される形態素解析情報の受信を行う。
Ｓ７０７：言語情報データベース１０６を参照しつつ、Ｓ７０６で形態素解析装置１０２から受信した形態素解析情報を用いて、Ｓ７０４で受信したテキスト情報の構文解析処理を行う。
Ｓ７０８：読み解析装置１０４が次の構文解析情報を取得できる状態であるかどうかを判定する。なお、読み解析装置１０４が次の構文解析情報を取得できるのは、読み解析装置１０４がまだ構文解析装置１０３から１度も構文解析情報を取得していないか、構文解析装置１０３から最後に取得した構文解析情報を用いて行った読み解析処理の結果を、既に合成音波形生成装置１０５に渡し終わったときである。
Ｓ７０９：Ｓ７０７で生成した構文解析情報を、読み解析装置１０４に送信する。
Ｓ７１０：装置の動作が停止する。
【００３３】
図８は、本実施例の装置における読み解析装置１０４の動作を示したものである。読み解析装置１０４は、概略以下のように動作する。
・テキスト取得装置１０１からテキスト情報を、形態素解析装置１０２から形態素解析情報を、構文解析装置１０３から構文解析情報を受け取り、言語情報データベース１０６を参照して生成した読み情報を合成音波形生成装置１０５に送信する。
【００３４】
図８の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ８００：装置の動作が開始する。
Ｓ８０１：読み解析装置１０４内部で使用する、各種変数のクリアを行う。
Ｓ８０２：テキスト取得装置１０１が、動作を停止したかどうかを判定する。
Ｓ８０３：テキスト取得装置１０１が、読み解析装置１０４に対し次のテキスト情報を送信可能な状態であるかどうかを判定する。
Ｓ８０４：テキスト取得装置１０１から送信されるテキスト情報の受信を行う。
Ｓ８０５：形態素解析装置１０２が、読み解析装置１０４に対し次の形態素解析情報を送信可能な状態であるかどうかを判定する。
Ｓ８０６：形態素解析装置１０２から送信される形態素解析情報の受信を行う。
Ｓ８０７：構文解析装置１０３が、読み解析装置１０４に対し次の構文解析情報を送信可能な状態であるかどうかを判定する。
Ｓ８０８：構文解析装置１０３から送信される構文解析情報の受信を行う。
Ｓ８０９：言語情報データベース１０６を参照しつつ、Ｓ８０４でテキスト取得装置１０１から受信したテキスト情報と、Ｓ８０６で形態素解析装置１０２から受信した形態素解析情報と、Ｓ８０８で構文解析装置１０３から受信した構文解析情報を用いて、元のテキスト情報の読み解析処理を行い、読み情報を生成する。
Ｓ８１０：合成音波形生成装置１０５が次の読み情報を取得できる状態であるかどうかを判定する。なお、合成音波形生成装置１０５が次の読み情報を取得できるのは、合成音波形生成装置１０５がまだ読み解析装置１０４から１度も読み情報を取得していないか、読み解析装置１０４から最後に取得した読み情報を利用した合成音波形の生成が既に完了しているときである。
Ｓ８１１：Ｓ８０９で生成した読み情報を、合成音波形生成装置１０５に送信する。
Ｓ８１２：装置の動作が停止する。
【００３５】
図９は、本実施例の装置における合成音波形生成装置１０５の動作を示したものである。合成音波形生成装置１０５は、概略以下のように動作する。
・形態素解析装置１０２から形態素解析情報を、読み解析装置１０４から読み情報を受け取り、音声情報データベース１０７を参照して生成した合成音波形を波形バッファ装置１０８に書き込む。
・音声合成制御装置１１１から送られる信号を受信し、それに従って動作の停止や合成音の早送り速度の制御を行う。
【００３６】
図９の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ９００：装置の動作が開始する。
Ｓ９０１：合成音波形生成装置１０５内部で使用する、各種変数のクリアを行う。このとき、合成音の再生速度は０に初期化する。
Ｓ９０２：音声合成制御装置１１１から、動作の停止を指示する信号を受信したかどうかを判定する。
Ｓ９０３：音声合成制御装置１１１から、合成音の再生速度を１段階増加させることを指示する信号を受信したかどうかを判定する。
Ｓ９０４：合成音の再生速度を１増やす。ただし、合成音の再生速度は５を超えないようにする。
Ｓ９０５：音声合成制御装置１１１から、合成音の再生速度を１段階減少させることを指示する入力があったかどうかを判定する。
Ｓ９０６：合成音の再生速度を１減らす。ただし、合成音の再生速度は０を下回らないようにする。
Ｓ９０７：形態素解析装置１０２が、合成音波形生成装置１０５に対し次の形態素解析情報を送信可能な状態であるかどうかを判定する。
Ｓ９０８：形態素解析装置１０２から送信される形態素解析情報の受信を行う。
Ｓ９０９：読み解析装置１０４が、合成音波形生成装置１０５に対し次の読み情報を送信可能な状態であるかどうかを判定する。
Ｓ９１０：読み解析装置１０４から送信される読み情報の受信を行う。
Ｓ９１１：音声情報データベース１０７を参照しつつ、Ｓ９０８で形態素解析装置１０２から受信した形態素解析情報と、Ｓ９１０で読み解析装置１０４から受信した読み情報を用いて、合成音波形を生成する。このとき、合成音の再生速度が０であれば渡された読み情報の全てに対して合成音波形を生成するが、合成音の再生速度が１以上であれば、図３に示す方法で、句読点や助詞・助動詞を取り除いた上で、さらに形態素の間引きを行い、残った形態素に対応する部分の合成音波形のみを出力する。例えば、合成音の再生速度が１であれば、句読点や助詞・助動詞を取り除いた後の形態素を１つおきに取り出した部分の合成音を生成し、合成音の再生速度が２であれば、同じく形態素を２つおきに取り出す、というようにする。
Ｓ９１２：波形バッファ装置１０８が、空になっているかどうかを判定する。なお、波形バッファ装置１０８が空であるとは、装置の動作開始後まだ１度も波形バッファ装置に合成音波形が書き込まれていないか、最後に書き込まれた合成音波形の再生が既に完了した状態である。
Ｓ９１３：Ｓ９１１で生成した合成音波形を、波形バッファ装置１０８に書き込む。
Ｓ９１４：装置の動作が停止する。
【００３７】
スピーカ装置１０９は、波形バッファ装置１０８に合成音波形が書き込まれたらそれを再生し、再生が終了すると再び次の合成音波形が波形バッファ装置１０８に書き込まれるのを待つことを繰り返す。
【００３８】
以上の構成及び手順によって、テキスト記憶装置１００に蓄積されたテキスト情報を、形態素単位の間引きによる早送りを行いつつ合成音波形として再生できる音声合成装置が実現できる。
【００３９】
この方法では、合成音波形の生成にあたって行われる形態素解析処理の結果を用いて、ごく単純なアルゴリズムで合成音の間引きを行うことにより、早送りを行うために新たに必要となる計算処理量を極力抑えながら、可聴性が高く理解し易い合成音の早送りが可能となる。この方法は、携帯電話でのメール読み上げやカーナビゲーションシステムでの道路情報読み上げなど、音声合成に使用できるリソースの限られた環境で早聞きを行うという用途に適する。
【００４０】
なお、本実施例では形態素の間引きの制御を、例えば「２つおきに１つの形態素を間引く」というような方法で行ったが、これを例えば「現在の早送り速度で合成音の再生を開始してから再生した合成音の時間長の総和が、早送りせずに通常の再生を行った場合の半分を超えたところで形態素を間引く」というように、合成音の再生時間を用いて形態素の間引きの制御をすることも可能である。
【００４１】
（第２の実施形態）
第２の実施形態による音声合成装置の全体構成を図１０に示す。この音声合成装置は、構文解析の結果（構文解析情報）を利用して形態素単位の間引きを行うことにより早送り再生を行う。この音声合成装置におけるデータの流れを図１１に示す。図１に示した装置構成および図２に示したデータの流れとの相違点は、形態素解析装置１０２から合成音波形生成装置１０５へは形態素解析情報を直接送らず、代わりに構文解析装置１０３から合成音波形生成装置１０５へ構文解析情報を直接送る点である。
【００４２】
図１２に、本実施の形態における、構文解析結果を用いた形態素単位の間引きによる合成音波形の短縮処理の例を示す。ここで、テキスト文２００の全体に対して合成音波形を生成すると合成音波形２０４ができるが、構文解析情報を考慮して、形態素単位で重要度の低い部分を優先的に間引くことによって、合成音波形１２００を生成することができる。このとき、合成音波形１２００は、必要な部分の波形のみを生成して作っても良いし、まず合成音波形２０４を生成し、そこから必要な部分の波形を抜き出す方法で作っても良い。
【００４３】
この図の処理では、まず形態素解析及び構文解析をされた元のテキスト文２００から、文章の構成に必須と思われる主語・述語の文節を抜き出した上で、さらに文節間の係り受けの情報を見て、重要度が高いと思われる文節を取り出して合成音波形１２００を生成する。重要度が高いと思われる文節とは、例えば文章全体の述語を修飾し、で・が・を等の助詞で終わる文節や、文章全体の主語や述語を修飾する、文章の先頭から最初に現れる文節などである。
【００４４】
本実施の形態における、各構成要素のフローチャートについて、上記第１の実施形態のフローチャートからの差分の形で説明する。
【００４５】
音声合成制御装置１１１の動作は、第１の実施形態の図４に示されるものと同様である。
【００４６】
テキスト取得装置１０１の動作は、第１の実施形態の図５に示されるものと同様である。
【００４７】
本実施の形態における形態素解析装置１０２は、第１の実施形態のものと比較すると、合成音波形生成装置１０５に形態素解析情報を送信する必要が無いこと以外は全く同じ動作をする。よって、図６に示される動作フローから、Ｓ６１０とＳ６１１を省き、Ｓ６０９の終了後Ｓ６０２へと移行するようにしたものが本実施の形態における形態素解析装置１０２の動作フローである。
【００４８】
本実施の形態における構文解析装置１０３は、第１の実施形態のものと比較すると、合成音波形生成装置１０５に構文解析情報を送信する必要があること以外は全く同じ動作をする。よって、図７に示される動作フローのＳ７０９の直後に、さらに合成音波形生成装置１０５が構文解析情報を受信可能かどうか判定するためのステップと、合成音波形生成装置１０５に構文解析情報を送信するステップを追加したものが本実施の形態における構文解析装置１０３の動作フローである。
【００４９】
読み解析装置１０４の動作は、第１の実施形態の図８に示されるものと同様である。
【００５０】
本実施の形態における合成音波形生成装置１０５は、第１の実施形態のものと比較すると、形態素解析装置１０２から形態素解析情報を受け取る代わりに、構文解析装置１０３から構文解析情報を受け取ること以外は全く同じフローになる。よって、図９に示される動作フローで、形態素解析装置１０２から形態素解析情報を受信できるかどうかを判定するＳ９０７と、形態素解析情報を受信するＳ９０８を、それぞれ構文解析装置１０３から構文解析情報を受信できるかどうか判定するステップと、構文解析情報を受信するステップに変更したものが本実施の形態における合成音波形生成装置１０５の動作フローである。合成音波形生成装置１０５は、受信した構文解析情報を、Ｓ９１１で用いて合成音波形の生成を行う。このとき、合成音の再生速度が０であれば渡された読み情報の全てに対して合成音波形を生成するが、合成音の再生速度が１以上であれば、図１２に示されるような方法で、かつ合成音の再生速度が大きくなるにつれ間引きを行う形態素数が多くなるように合成音波形１２００の生成を行う。
【００５１】
以上の構成及び手順によって、テキスト記憶装置１００に蓄積されたテキスト情報を、構文解析情報を元にした形態素単位の間引きによる早送りを行いつつ合成音波形として再生できる音声合成装置が実現できる。
【００５２】
この方法では、形態素解析処理の結果を用いて得られる構文解析処理の結果を利用することで、第１の実施形態の方法よりも各形態素の重要性を反映した早送り処理ができるようになる。
【００５３】
（第３の実施形態）
第３の実施形態による音声合成装置の全体構成を図１３に示す。この装置は、音声合成処理の対象となるテキスト情報をあらかじめ要約しておくことによって、早送り再生を行う。この音声合成装置におけるデータの流れを図１４に示す。図１に示した装置の構成および図２に示したデータの流れとの相違点は、この音声合成装置では、合成音波形生成装置１０５が形態素解析情報２０１を直接使わず、読み情報２０３に含まれる情報のみを用いて合成音波形２０４の生成を行うが、代わりにテキスト要約装置１３００を備え、これによってテキスト情報２００をまずテキスト情報１４００に要約してから音声合成処理を行うことによって、合成音波形の短縮を行うようになっていることである。
【００５４】
この構成では、まず入力のテキスト情報をテキスト要約装置１３００によって要約することで出力される合成音波形の長さを短縮するため、テキスト取得装置１０１以降の処理では、合成音波形の短縮について考えなくても良いという利点を有する。
【００５５】
図１５に、テキスト要約装置１３００の動作フローを示す。テキスト要約装置１３００は、概略以下のように動作する。
・テキスト記憶装置１００からテキスト情報を受け取り、要約処理を行って長さを原文以下に縮め、テキスト取得装置１０１に送信する。
・音声合成制御装置１１１から送られる信号を受信し、それに従って動作の停止や合成音の早送り速度の制御を行う。
・音声合成制御装置１１１から指示される、合成音の早送り速度が上がるにつれ、入力されるテキスト情報に対する要約後のテキスト情報の長さは単調に減少する。
【００５６】
図１５の動作フローの、各ステップの動作内容は以下の通りである。
Ｓ１５００：装置の動作が開始する。
Ｓ１５０１：テキスト要約装置１３００内部で使用する、各種変数のクリアを行う。このとき、合成音の再生速度は０に初期化する。
Ｓ１５０２：音声合成制御装置１１１から、動作の停止を指示する信号を受信したかどうかを判定する。
Ｓ１５０３：音声合成制御装置１１１から、合成音の再生速度を１段階増加させることを指示する信号を受信したかどうかを判定する。
Ｓ１５０４：合成音の再生速度を１増やす。ただし、合成音の再生速度は５を超えないようにする。
Ｓ１５０５：音声合成制御装置１１１から、合成音の再生速度を１段階減少させることを指示する入力があったかどうかを判定する。
Ｓ１５０６：合成音の再生速度を１減らす。ただし、合成音の再生速度は０を下回らないようにする。
Ｓ１５０７：音声合成制御装置１１１より、次のテキスト情報の取得を指示する信号を受信したかどうかを判定する。
Ｓ１５０８：テキスト記憶装置１００内で、最後に取得したテキスト情報の直後に保持されているテキスト情報の取得を行う。動作開始後、まだ１度もテキスト情報の取得を行っていない場合はテキスト記憶装置１００内の先頭位置に保持されているテキスト情報の取得を行う。
Ｓ１５０９：Ｓ１５０８で取得したテキスト情報の要約処理を行う。
Ｓ１５１０：テキスト取得装置１０１が、Ｓ１５０９で生成した、要約済みのテキスト情報を取得可能な状態であるか判定する。なお、テキスト取得装置１０１がテキスト情報を取得できるのは、テキスト取得装置１０１がまだテキスト要約装置１３００から１度もテキスト情報を取得していないか、テキスト要約装置１３００から最後に取得したテキスト情報を、既に形態素解析装置１０２、構文解析装置１０３及び読み解析装置１０４に渡し終わっているときである。
Ｓ１５１１：Ｓ１５０９で生成した、要約済みのテキスト情報をテキスト取得装置１０１に送信する。
Ｓ１５１２：装置の動作が停止する。
【００５７】
以下、テキスト要約装置１３００以外の各装置の動作フローについて、上記第１の実施形態のフローチャートからの差分の形で説明する。
【００５８】
音声合成制御装置１１１は、図４に示される動作フローのＳ４０６、Ｓ４１０、Ｓ４１１、Ｓ４１２、Ｓ４１３においてテキスト取得装置１０１に対して行っていた処理を、本実施の形態ではテキスト要約装置１３００に対して行う。また、Ｓ４０７、Ｓ４０８、Ｓ４０９において合成音波形生成装置１０５に対して行っていた処理を、本実施の形態では同じくテキスト要約装置１３００に対して行う。
【００５９】
テキスト取得装置１０１は、図５に示される動作フローのＳ５０３において音声合成制御装置１１１からの信号を待ち受けていた動作を、本実施の形態ではテキスト要約装置１３００からのテキスト情報の送信を待ち受ける動作にする。また、Ｓ５０４においてテキスト記憶装置１００からテキスト情報を取得していたところを、テキスト要約装置１３００から取得するようにする。
【００６０】
形態素解析装置１０２は、上記第２の実施形態のものと同じ動作をする。
【００６１】
構文解析装置１０３の動作は、第１の実施形態の図７に示されるものと同様である。
【００６２】
読み解析装置１０４の動作は、第１の実施形態の図８に示されるものと同様である。
【００６３】
本実施の形態における合成音波形生成装置１０５の動作は、図９に示される動作フローのＳ９０３、Ｓ９０４、Ｓ９０５、Ｓ９０６、Ｓ９０７及びＳ９０８の処理を必要としない。本実施の形態では、Ｓ９０２の処理の後、Ｓ９０９の判定処理を行い、判定の結果、読み解析装置１０４が次の読み情報を送信可能であればＳ９１０へ、可能でなければＳ９０２へ分岐するという処理を行う。
【００６４】
以上の構成及び手順によって、テキスト記憶装置１００に蓄積されたテキスト情報を、ブラックボックス化されたテキスト要約装置１３００による短縮処理を行いつつ合成音波形として再生できる音声合成装置が実現できる。
【００６５】
本実施の形態は、別途用意するテキスト要約装置１３００にテキスト情報の短縮を任せることで、テキスト情報から音声合成処理を行う部分の動作を簡単化できるという利点を有する。また、テキスト情報の要約処理は形態素単位での単純な間引き処理よりも多くのリソースを要求するが、テキスト要約装置１３００によって元のテキスト情報の意味を解析して適切な要約処理を行うことにより、形態素の単純な間引き処理を行うよりも意味的に正確で、文章としての体裁が整った合成音の早送り処理が可能となる。
【００６６】
なお、第１〜第３の実施形態に記載した音声合成装置のいずれか２つまたは３つを組み合わせた音声合成装置を実現することも可能である。
【００６７】
【発明の効果】
以上のように本発明によれば、入力となるテキスト情報の言語的な区切りを単位とする間引きや、テキスト情報の意味を考慮した要約を行うことにより、合成音の可聴性と理解のしやすさを確保した、合成音の早送り処理が可能となる。これにより、合成音によりテキストの読み上げを行わせる際の「早聞き」や、テキスト記憶手段からある目的のテキスト情報を探すための「聞き飛ばし」などの操作が容易にできるようになる。
【図面の簡単な説明】
【図１】第１の実施形態による音声合成装置の構成を示すブロック図である。
【図２】図１に示した音声合成装置におけるデータの流れを示す図である。
【図３】合成音の間引き処理の例を示す概略図である。
【図４】図１に示した音声合成制御装置の動作を示すフローチャートである。
【図５】図１に示したテキスト取得装置の動作を示すフローチャートである。
【図６】図１に示した形態素解析装置の動作を示すフローチャートである。
【図７】図１に示した構文解析装置の動作を示すフローチャートである。
【図８】図１に示した読み解析装置の動作を示すフローチャートである。
【図９】図１に示した合成音波形生成装置の動作を示すフローチャートである。
【図１０】第２の実施形態による音声合成装置の構成を示すブロック図である。
【図１１】図１０に示した音声合成装置におけるデータの流れを示す図である。
【図１２】図１０に示した音声合成装置における合成音の間引き処理の例を示す概略図である。
【図１３】第３の実施形態による音声合成装置の構成を示すブロック図である。
【図１４】図１３に示した音声合成装置におけるデータの流れを示す図である。
【図１５】図１３に示したテキスト要約装置の動作を示すフローチャートである。
[0001]
[Technical field of the invention]
The present invention relates to a speech synthesizer that generates speech information corresponding to given text information.
[0002]
[Prior art]
When a conventional audio playback device or speech synthesizer must let the user hear its output in a short time, it typically uses one of the following methods.
(1) Increase the speed of the output speech to shorten the playback time (Patent Document 1).
(2) Shorten the steady-state portions, where the voice components change little, so that playback time is reduced while audibility is preserved (Patent Document 2).
(3) Divide the speech into fixed time intervals and play back only part of each interval, so that playback time is reduced while audibility is preserved (Patent Document 3).
(4) Play back only the beginning of each chapter or paragraph, which greatly shortens the playback time (Patent Document 4).
[0003]
[Patent Document 1]
JP-A-11-52985
[Patent Document 2]
JP-A-61-55698
[Patent Document 3]
Japanese Patent No. 2048762
[Patent Document 4]
JP-A-2-45868
[0004]
[Problems to be solved by the invention]
The prior art cannot simultaneously preserve audibility, greatly shorten the playback time, and keep the overall content of the played speech easy for the user to understand. Specifically, the conventional techniques listed above have the following problems.
(1) If the speed of the output speech is raised too far, audibility is lost.
(2) The playback time can be shortened only by the length of the steady-state portions contained in the speech.
(3) Temporal boundaries in speech generally do not coincide with linguistic boundaries, so the result may be audible yet hard to understand.
(4) Information useful to the user is not necessarily located at the beginning of a chapter or paragraph, so it is difficult to convey the whole text with this playback method.
[0005]
[Means for Solving the Problems]
The present invention has been made in view of the above problems. It linguistically analyzes the text input to the speech synthesizer to find the portions of high importance, then joins and plays back only those portions. This enables fast-forward playback of synthesized speech that is highly audible and lets the user easily understand the content of the entire text.
[0006]
Furthermore, by applying this method to each sentence of the input text and playing the text back in reverse order sentence by sentence, fast rewind playback of synthesized speech that is easy for the user to understand can be realized.
[0007]
A first speech synthesizer according to the present invention comprises: text storage means for holding text information; text acquisition means for acquiring the text information held in the text storage means; morphological analysis means for performing morphological analysis on the text information acquired by the text acquisition means; synthesized sound waveform generation means for generating, based on the text information acquired by the text acquisition means and the result of the morphological analysis by the morphological analysis means, a speech waveform corresponding to text information obtained by thinning out the text information in units of morphemes; and speaker means for converting the speech waveform generated by the synthesized sound waveform generation means into audible sound waves and outputting them.
[0008]
This speech synthesizer is characterized in that it uses the morphological analysis result of the text information to be synthesized to thin out, in units of morphemes, the synthesized sound waveform generated by the synthesized sound waveform generation means. This raises the playback speed of the synthesized speech while guaranteeing that the underlying text information reaches the user without loss at least at the level of individual morphemes.
[0009]
A second speech synthesizer according to the present invention comprises: text storage means for holding text information; text acquisition means for acquiring the text information held in the text storage means; syntax analysis means for performing syntax analysis on the text information acquired by the text acquisition means; synthesized sound waveform generation means for generating, based on the text information acquired by the text acquisition means and the result of the syntax analysis by the syntax analysis means, a speech waveform corresponding to text information obtained by thinning out the text information according to syntactic importance; and speaker means for converting the speech waveform generated by the synthesized sound waveform generation means into audible sound waves and outputting them.
[0010]
This speech synthesizer is characterized in that it uses the syntax analysis result of the text information to be synthesized to thin out the portions of low syntactic importance from the synthesized sound waveform generated by the synthesized sound waveform generation means. This raises the playback speed of the synthesized speech by removing the less important portions of the underlying text information while retaining the portions of high syntactic importance.
[0011]
A third speech synthesizer according to the present invention comprises: text storage means for holding text information; text summarization means for summarizing the text information held in the text storage means; synthesized sound waveform generation means for generating a speech waveform corresponding to the text information summarized by the text summarization means; and speaker means for converting the speech waveform generated by the synthesized sound waveform generation means into audible sound waves and outputting them.
[0012]
This speech synthesizer is characterized in that it first summarizes the text information to be synthesized and then performs speech synthesis. As a result, a relatively simple change to the configuration of an ordinary speech synthesizer enables a fast-forward operation that represents the content of the original text information more accurately than simple thinning of the text information does.
[0013]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to FIGS. 1 to 15.
[0014]
(First embodiment)
FIG. 1 shows the overall configuration of the speech synthesizer according to the first embodiment. This speech synthesizer uses the result of morphological analysis to perform fast-forward playback by thinning in units of morphemes.
[0015]
The text information stored in the text storage device 100 is input by the text acquisition device 101 to the morphological analysis device 102 and the syntax analysis device 103, is converted into reading information by the reading analysis device 104, and is input to the synthesized sound waveform generation device 105. The morphological analysis device 102, the syntax analysis device 103, and the reading analysis device 104 analyze the text information using the language information stored in the language information database 106.
[0016]
The synthesized sound waveform generation device 105 generates a synthesized sound waveform using the reading information input from the reading analysis device 104, the morphological analysis information input from the morphological analysis device 102, and the voice information stored in the voice information database 107, and outputs it to the waveform buffer device 108. The output synthesized sound waveform is reproduced as speech by the speaker device 109.
[0017]
The apparatus further has a speech synthesis control device 111 that receives input from a speech synthesis control switch 110 operated by the user. It controls the position of the text acquired by the text acquisition device 101 and the fast-forward speed of the synthesized sound waveform generation performed by the synthesized sound waveform generation device 105.
[0018]
FIG. 2 shows the flow of processing by which a text sentence 200, received by the text acquisition device 101 from the text storage device 100, is converted into the synthesized sound 204 by the synthesized sound waveform generation device 105.
[0019]
First, the morphological analysis device 102 analyzes the input text sentence 200 based on the information stored in the language information database 106 and obtains information that divides the text sentence 200 into morphemes. This is hereinafter called morphological analysis information 201.
[0020]
Next, using the morphological analysis information 201 and the text sentence 200, the syntax analysis device 103 parses the text sentence 200 based on the information stored in the language information database 106 and determines what structure the text sentence 200, now divided into morphemes, has. The information obtained here is hereinafter called syntax analysis information 202.
[0021]
The reading analysis device 104 then analyzes the text sentence 200 using the language information database 106, the morphological analysis information 201, and the syntax analysis information 202, and generates reading information 203, which tells the synthesized sound waveform generation device 105 the phoneme sequence, accent positions, and so on that correspond to reading the text sentence 200 aloud.
[0022]
Finally, the synthesized sound waveform generation device 105 uses the reading information 203 and the voice information database 107 to output the synthesized sound waveform 204 corresponding to a reading of the text sentence 200.
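The data flow of paragraphs [0019] to [0022] can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the patent's implementation: the toy `LEXICON` stands in for the language information database 106, greedy longest-match segmentation stands in for the morphological analysis device 102, syntax analysis is omitted, and the "waveform" is a list of labelled segments, not audio samples. Each reading entry keeps the index of the morpheme it came from, so a later stage can tell which morpheme a segment belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str   # written form
    pos: str       # part of speech (hypothetical tag names)
    reading: str   # romanized reading; empty for unspoken symbols


# Hypothetical toy lexicon standing in for the language information database 106.
LEXICON = {
    "今日": ("noun", "kyou"),
    "は": ("particle", "wa"),
    "晴れ": ("noun", "hare"),
    "です": ("auxiliary", "desu"),
    "。": ("punctuation", ""),
}


def morphological_analysis(text, lexicon=LEXICON):
    """Greedy longest-match segmentation (stand-in for device 102)."""
    morphemes, i = [], 0
    max_len = max(len(key) for key in lexicon)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in lexicon:
                pos, reading = lexicon[piece]
                morphemes.append(Morpheme(piece, pos, reading))
                i += n
                break
        else:
            # Unknown character: keep it as a one-character morpheme.
            morphemes.append(Morpheme(text[i], "unknown", text[i]))
            i += 1
    return morphemes


def reading_analysis(morphemes):
    """Stand-in for device 104: (morpheme index, reading) per spoken morpheme."""
    return [(idx, m.reading) for idx, m in enumerate(morphemes) if m.reading]


def generate_waveform(reading_info):
    """Stand-in for device 105: labelled segments instead of audio samples."""
    return [f"<{reading}>" for _, reading in reading_info]
```

For the sentence 「今日は晴れです。」 this yields five morphemes, four reading entries (the full stop has no reading), and four waveform segments.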
[0023]
Because the synthesized sound waveform generation device 105 knows which part of the reading information 203 corresponds to which part of the morphological analysis information 201, it can thin out the generated synthesized sound waveform in units of morphemes. FIG. 3 shows an example of shortening the synthesized sound waveform by such thinning. Generating a waveform for the entire text sentence 200 yields the synthesized sound waveform 204, whereas thinning in units of morphemes yields the synthesized sound waveform 300. In the processing shown in the figure, punctuation marks, particles, and auxiliary verbs are first removed from the morphologically analyzed text sentence 200, and the remaining morphemes are then picked out at intervals of two to generate the synthesized sound waveform 300. The synthesized sound waveform 300 may be produced either by generating only the waveform of the required portions, or by first generating the synthesized sound waveform 204 and then extracting the required portions from it.
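The thinning described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's implementation. The part-of-speech tag names are hypothetical, and morphemes are represented as plain `(surface, pos)` tuples. Step S911 of the original text says that speed 1 takes morphemes "１つおき" and speed 2 takes them "２つおき"; the sketch reads "N つおき" as "keep one, skip N" (a stride of N + 1), which is an interpretation. The 0 to 5 speed range follows steps S904 and S906.

```python
# Hypothetical tag names for the morphemes that are always dropped when thinning.
DROPPED_POS = {"punctuation", "particle", "auxiliary"}


def thin_morphemes(morphemes, speed):
    """Return the morphemes to synthesize at the given fast-forward speed.

    morphemes: list of (surface, pos) tuples in sentence order.
    speed: 0 means normal playback; values are clamped to 0..5.
    """
    speed = max(0, min(5, speed))
    if speed == 0:
        return list(morphemes)          # synthesize everything
    content = [m for m in morphemes if m[1] not in DROPPED_POS]
    stride = speed + 1                  # assumption: "N つおき" = keep 1, skip N
    return content[::stride]


sentence = [
    ("私", "pronoun"), ("は", "particle"), ("昨日", "noun"),
    ("友人", "noun"), ("と", "particle"), ("映画", "noun"),
    ("を", "particle"), ("見", "verb"), ("まし", "auxiliary"),
    ("た", "auxiliary"), ("。", "punctuation"),
]
```

With this input, `thin_morphemes(sentence, 1)` keeps 私, 友人, 見 and `thin_morphemes(sentence, 2)` keeps 私, 映画. Filtering before synthesis corresponds to the first of the two options in the paragraph above (generating only the required portions of the waveform).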
[0024]
The operation flow of this apparatus is shown in FIGS. 4 to 9.
[0025]
FIG. 4 shows the operation of the speech synthesis control device 111 in the apparatus of this embodiment. In outline, the speech synthesis control device 111 operates as follows.
・When there is input from the speech synthesis control switch 110, it sends the corresponding signal to the text acquisition device 101 and the synthesized sound waveform generation device 105.
・When the text acquisition device 101 is able to acquire the next text, it sends the text acquisition device 101 a signal to acquire it.
・When playback of the end of the text information stored in the text storage device 100 has completed, it sends a stop signal to the text acquisition device 101 and the synthesized sound waveform generation device 105.
[0026]
The content of each step in the operation flow of FIG. 4 is as follows.
S400: Operation of the device starts.
S401: The variables used inside the speech synthesis control device 111 are cleared.
S402: Determine whether the speech synthesis control switch 110 has input an instruction to stop playback of the synthesized speech.
S403: Determine whether the speech synthesis control switch 110 has input an instruction to raise the playback speed of the synthesized speech by one step.
S404: Determine whether the speech synthesis control switch 110 has input an instruction to lower the playback speed of the synthesized speech by one step.
S405: Determine whether the speech synthesis control switch 110 has input an instruction to return the playback speed of the synthesized speech to its initial value.
S406: Determine whether the text acquisition device 101 is able to acquire the next text information. It is able to do so when it has not yet acquired any text information from the text storage device 100, or when it has already passed the text information it last acquired to both the morphological analysis device 102 and the syntax analysis device 103.
S407: Send the synthesized sound waveform generation device 105 a signal instructing it to raise the playback speed of the synthesized speech by one step.
S408: Send the synthesized sound waveform generation device 105 a signal instructing it to lower the playback speed of the synthesized speech by one step.
S409: Send the synthesized sound waveform generation device 105 a signal instructing it to return the playback speed of the synthesized speech to its initial value.
S410: Determine whether the text information last acquired by the text acquisition device 101 is at the end of the text information stored in the text storage device 100.
S411: Send the text acquisition device 101 a signal instructing it to acquire the text information immediately following the text information it last acquired.
S412: Determine whether the speaker device 109 has finished playing the synthesized speech from the waveform buffer device 108.
S413: Send the text acquisition device 101 and the synthesized sound waveform generation device 105 a signal instructing them to stop operating.
S414: Operation of the device stops.
[0027]
FIG. 5 shows the operation of the text acquisition device 101 in the device of the present embodiment. The text acquisition device 101 operates as follows.
When a signal instructing acquisition of the next text information in the text storage device 100 is received from the speech synthesis control device 111, the device follows the instruction.
When text information is obtained from the text storage device 100, it is transmitted to the morphological analysis device 102 and then to the syntax analysis device 103.
When a signal instructing it to stop operating is received from the voice synthesis control device 111, the device follows the instruction.
[0028]
The operation content of each step in the operation flow of FIG. 5 is as follows.
S500: Operation of the device starts.
S501: Various variables used inside the text acquisition device 101 are cleared.
S502: It is determined whether or not a signal for instructing to stop the operation has been received from the speech synthesis control device 111.
S503: It is determined whether a signal instructing acquisition of the next text information has been received from the speech synthesis control device 111.
S504: In the text storage device 100, the text information held immediately after the text information obtained last is obtained. After starting the operation, if the text information has not been obtained at least once, the text information held at the head position in the text storage device 100 is obtained.
S505: It is determined whether the morphological analyzer 102 is in a state where the next text information can be acquired. The morphological analyzer 102 can acquire the next text information either when it has not yet acquired any text information from the text acquisition device 101, or when the morphological analysis of the text information it acquired last has already been completed and the morphological analysis results have been passed to all of the syntax analysis device 103, the reading analysis device 104, and the synthesized sound waveform generation device 105.
S506: The currently held text information is transmitted to the morphological analyzer 102.
S507: It is determined whether or not the syntax analysis device 103 is in a state where the next text information can be acquired. The syntax analysis device 103 can acquire the next text information either when it has not yet acquired any information from the text acquisition device 101 and the morphological analysis device 102, or when the result of the syntax analysis processing it performed last has already been passed to the reading analysis device 104.
S508: The currently held text information is transmitted to the syntax analysis device 103.
S509: The operation of the device stops.
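The readiness checks in S505 and S507 follow one pattern: a downstream device can take new input either when it has received nothing yet, or when its last result has been delivered to every device that consumes it. A minimal Python sketch of that predicate follows. The class name `Stage` and its attributes are hypothetical and do not appear in the specification.

```python
class Stage:
    """One processing device with a single-slot input (illustrative only)."""

    def __init__(self, name, consumers):
        self.name = name
        self.consumers = tuple(consumers)  # devices that must receive each result
        self.has_input = False             # True once any input has been acquired
        self.undelivered = set()           # consumers still waiting for the last result

    def can_accept(self):
        # Ready if nothing was ever acquired, or the last result reached everyone.
        return (not self.has_input) or (not self.undelivered)

    def accept(self, item):
        if not self.can_accept():
            raise RuntimeError(f"{self.name} is not ready for new input")
        self.has_input = True
        self.undelivered = set(self.consumers)
        return item

    def mark_delivered(self, consumer):
        # Record that one consumer has received the current result.
        self.undelivered.discard(consumer)
```

Under this reading, the text acquisition device would poll `can_accept()` on the morphological analyzer before S506 and on the syntax analyzer before S508.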
[0029]
FIG. 6 shows the operation of the morphological analysis device 102 in the device of the present embodiment. The morphological analyzer 102 operates as follows.
It receives text information from the text acquisition device 101, performs morphological analysis processing, and transmits the morphological analysis information to the syntax analysis device 103, the reading analysis device 104, and the synthetic sound waveform generation device 105. The morphological analysis processing uses the linguistic information stored in the linguistic information database 106.
[0030]
The operation content of each step in the operation flow of FIG. 6 is as follows.
S600: Operation of the device starts.
S601: Various variables used inside the morphological analyzer 102 are cleared.
S602: It is determined whether the text acquisition device 101 has stopped operating.
S603: It is determined whether or not the text acquisition device 101 is in a state where it can transmit the next text information to the morphological analysis device 102.
S604: The text information transmitted from the text acquisition device 101 is received.
S605: The morphological analysis processing of the text information received in S604 is performed with reference to the language information database 106.
S606: It is determined whether or not the syntax analysis device 103 is in a state where the next morphological analysis information can be acquired. The syntax analysis device 103 can acquire the next morphological analysis information either when it has not yet acquired any morphological analysis information from the morphological analysis device 102, or when the result of the syntax analysis performed using the morphological analysis information it acquired last has already been passed to the reading analysis device 104.
S607: The morphological analysis information generated in S605 is transmitted to the syntax analysis device 103.
S608: It is determined whether the reading analysis device 104 is in a state where the next morphological analysis information can be acquired. The reading analysis device 104 can acquire the next morphological analysis information either when it has not yet acquired any morphological analysis information from the morphological analysis device 102, or when the result of the reading analysis performed using the morphological analysis information it acquired last has already been passed to the synthetic sound waveform generator 105.
S609: The morphological analysis information generated in S605 is transmitted to the reading analysis device 104.
S610: It is determined whether or not the synthesized sound waveform generating apparatus 105 is in a state where the next morphological analysis information can be acquired. The synthesized sound waveform generating apparatus 105 can acquire the next morphological analysis information either when it has not yet acquired any morphological analysis information from the morphological analysis device 102, or when the generation of the synthetic sound waveform using the morphological analysis information it acquired last has already been completed.
S611: The morphological analysis information generated in S605 is transmitted to the synthetic sound waveform generator 105.
S612: The operation of the device stops.
[0031]
FIG. 7 shows the operation of the syntax analysis device 103 in the device of the present embodiment. The syntax analyzer 103 operates as follows.
It receives text information from the text acquisition device 101 and morphological analysis information from the morphological analysis device 102, performs syntax analysis processing, and transmits the syntax analysis information to the reading analysis device 104. The syntax analysis processing uses the linguistic information stored in the linguistic information database 106.
[0032]
The operation content of each step in the operation flow of FIG. 7 is as follows.
S700: The operation of the device starts.
S701: Various variables used inside the syntax analysis device 103 are cleared.
S702: It is determined whether the text acquisition device 101 has stopped operating.
S703: It is determined whether the text acquisition device 101 is in a state where it can transmit the next text information to the syntax analysis device 103.
S704: The text information transmitted from the text acquisition device 101 is received.
S705: It is determined whether or not the morphological analysis device 102 is in a state where it can transmit the next morphological analysis information to the syntax analysis device 103.
S706: The morphological analysis information transmitted from the morphological analyzer 102 is received.
S707: While referring to the language information database 106, a syntax analysis process of the text information received in S704 is performed using the morphological analysis information received from the morphological analysis device 102 in S706.
S708: It is determined whether the reading analysis device 104 is in a state where the next syntax analysis information can be acquired. The reading analysis device 104 can acquire the next syntax analysis information either when it has not yet acquired any syntax analysis information from the syntax analysis device 103, or when the result of the reading analysis performed using the syntax analysis information it acquired last has already been passed to the synthetic sound waveform generating apparatus 105.
S709: The syntax analysis information generated in S707 is transmitted to the reading analysis device 104.
S710: The operation of the device stops.
[0033]
FIG. 8 shows the operation of the reading analysis device 104 in the device of the present embodiment. The reading analysis device 104 operates roughly as follows.
It receives text information from the text acquisition device 101, morphological analysis information from the morphological analysis device 102, and syntax analysis information from the syntax analysis device 103, and transmits the reading information generated with reference to the language information database 106 to the synthetic sound waveform generating device 105.
[0034]
The operation content of each step in the operation flow of FIG. 8 is as follows.
S800: Operation of the device starts.
S801: Various variables used inside the reading analysis device 104 are cleared.
S802: It is determined whether the text acquisition device 101 has stopped operating.
S803: It is determined whether or not the text acquisition device 101 is in a state where it can transmit the next text information to the reading analysis device 104.
S804: The text information transmitted from the text acquisition device 101 is received.
S805: It is determined whether or not the morphological analysis device 102 is in a state where it can transmit the next morphological analysis information to the reading analysis device 104.
S806: The morphological analysis information transmitted from the morphological analyzer 102 is received.
S807: It is determined whether or not the syntax analysis device 103 is in a state where it can transmit the next syntax analysis information to the reading analysis device 104.
S808: The syntax analysis information transmitted from the syntax analysis device 103 is received.
S809: With reference to the language information database 106, reading analysis processing of the original text information is performed using the text information received from the text acquisition device 101 in S804, the morphological analysis information received from the morphological analysis device 102 in S806, and the syntax analysis information received from the syntax analysis device 103 in S808, and reading information is generated.
S810: It is determined whether the synthesized sound waveform generating apparatus 105 is in a state where the next reading information can be acquired. The synthesized sound waveform generating apparatus 105 can acquire the next reading information either when it has not yet acquired any reading information from the reading analysis device 104, or when the generation of the synthetic sound waveform using the reading information it acquired last has already been completed.
S811: The reading information generated in S809 is transmitted to the synthetic sound waveform generating device 105.
S812: The operation of the device stops.
[0035]
FIG. 9 shows the operation of the synthetic sound waveform generating device 105 in the device of the present embodiment. The synthetic sound waveform generator 105 operates roughly as follows.
It receives morphological analysis information from the morphological analysis device 102 and reading information from the reading analysis device 104, and writes the synthesized sound waveform generated with reference to the voice information database 107 to the waveform buffer device 108.
It receives signals sent from the voice synthesis control device 111 and, according to the signal, stops operating or changes the fast-forward speed of the synthesized sound.
[0036]
The operation content of each step in the operation flow of FIG. 9 is as follows.
S900: Operation of the device starts.
S901: Various variables used inside the synthetic sound waveform generating apparatus 105 are cleared. At this time, the reproduction speed of the synthesized sound is initialized to zero.
S902: It is determined whether or not a signal instructing the device to stop operating has been received from the speech synthesis control device 111.
S903: It is determined whether or not a signal instructing to increase the reproduction speed of the synthesized sound by one step from the voice synthesis control device 111 is received.
S904: Increase the reproduction speed of the synthesized sound by one. However, the reproduction speed of the synthesized sound should not exceed 5.
S905: It is determined whether or not there is an input from the speech synthesis control device 111 to instruct to decrease the reproduction speed of the synthesized sound by one step.
S906: Decrease the synthetic sound reproduction speed by one. However, the reproduction speed of the synthesized sound should not fall below 0.
S907: It is determined whether or not the morphological analyzer 102 is in a state where it can transmit the next morphological analysis information to the synthetic sound waveform generator 105.
S908: The morphological analysis information transmitted from the morphological analyzer 102 is received.
S909: It is determined whether or not the reading analysis device 104 is in a state where it can transmit the next reading information to the synthetic sound waveform generating device 105.
S910: Reading information transmitted from the reading analysis device 104 is received.
S911: A synthetic sound waveform is generated using the morphological analysis information received from the morphological analyzer 102 in S908 and the reading information received from the reading analyzer 104 in S910, with reference to the voice information database 107. If the reproduction speed of the synthetic sound is 0, a synthetic sound waveform is generated for all of the reading information passed in. If the reproduction speed is 1 or more, the thinning method illustrated in the drawings is applied: punctuation marks and particles/auxiliary verbs are removed, the remaining morphemes are further thinned out, and only the synthesized sound waveform corresponding to the morphemes that remain is output. For example, if the reproduction speed is 1, the synthetic sound is generated by removing every other morpheme after punctuation marks and particles/auxiliary verbs have been removed; if the reproduction speed is 2, every third morpheme is likewise taken out.
S912: It is determined whether the waveform buffer device 108 is empty. The waveform buffer device 108 is empty either when no synthesized waveform has been written to it since the device started operating, or when reproduction of the synthesized waveform written last has already been completed.
S913: The synthesized sound waveform generated in S911 is written in the waveform buffer device 108.
S914: The operation of the device stops.
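The speed limits of S904 and S906 and the thinning of S911 can be sketched as follows. This is only an illustration under stated assumptions: the part-of-speech labels, the `Morpheme` type, and both function names are hypothetical, and the stride rule (keep one morpheme in every `speed + 1`) is one reading of "every other" and "every third"; the specification gives no rule for speeds 3 to 5.

```python
from dataclasses import dataclass

MIN_SPEED = 0  # lower bound stated in S906
MAX_SPEED = 5  # upper bound stated in S904

# Assumed labels; the source names only the categories that are removed.
DROPPED_POS = {"punctuation", "particle", "auxiliary_verb"}


@dataclass
class Morpheme:
    surface: str
    pos: str


def step_speed(speed, delta):
    """Apply a +1 / -1 step while keeping the speed within [0, 5]."""
    return max(MIN_SPEED, min(MAX_SPEED, speed + delta))


def thin_morphemes(morphemes, speed):
    """Return the morphemes for which a waveform would be generated."""
    if speed == 0:
        return list(morphemes)  # normal playback: nothing is removed
    content = [m for m in morphemes if m.pos not in DROPPED_POS]
    # Assumption: keep the first of every (speed + 1) remaining morphemes.
    return content[::speed + 1]
```

With five content morphemes left after filtering, speed 1 keeps the first, third, and fifth, and speed 2 keeps the first and fourth.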
[0037]
When a synthesized sound waveform is written to the waveform buffer device 108, the speaker device 109 reproduces it. When reproduction is completed, the speaker device 109 again waits for the next synthesized sound waveform to be written to the waveform buffer device 108, and repeats this cycle.
[0038]
With the above configuration and procedure, it is possible to realize a speech synthesizer that can reproduce text information stored in the text storage device 100 as a synthesized sound waveform while performing fast-forwarding by thinning out morpheme units.
[0039]
In this method, the result of the morphological analysis performed when generating the synthesized sound waveform is reused, so the synthesized sound is thinned out by a very simple algorithm. This makes it possible to fast-forward a synthetic sound that is highly audible and easy to understand while minimizing the additional computation required for fast-forwarding. The method is suited to applications that perform early listening in environments where the resources available for speech synthesis are limited, such as reading out mail on a mobile phone or reading out road information on a car navigation system.
[0040]
In this embodiment, morpheme thinning is controlled by a rule such as "thin out one morpheme in every three". It is also possible to control it in other ways. For example, a morpheme may be thinned out whenever the total duration of the synthesized sound reproduced since playback began at the current fast-forward speed exceeds half the duration that normal playback without fast-forwarding would have taken.
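One possible reading of the time-based control mentioned above is a running budget: a morpheme is skipped if playing it would push the total played duration beyond a fixed fraction of what normal playback would have taken up to that point. The sketch below assumes per-morpheme durations are known in advance. The function name and the `ratio` parameter are hypothetical; the source mentions only the "half" case.

```python
def thin_by_time_budget(durations, ratio=0.5):
    """Return indices of morphemes to play under a cumulative time budget.

    durations: per-morpheme waveform lengths at normal speed, in seconds.
    ratio: allowed fraction of normal playback time (0.5 = "half").
    """
    kept = []
    played = 0.0  # total duration actually played so far
    normal = 0.0  # duration normal playback would have used so far
    for index, duration in enumerate(durations):
        normal += duration
        if played + duration > ratio * normal:
            continue  # over budget: thin out this morpheme
        played += duration
        kept.append(index)
    return kept
```

With four morphemes of equal length and `ratio=0.5`, this keeps the second and fourth, which matches the "every other morpheme" behaviour of speed 1 apart from the starting offset.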
[0041]
(Second embodiment)
FIG. 10 shows the overall configuration of the speech synthesizer according to the second embodiment. This speech synthesizer performs fast-forward playback by thinning out morpheme units using the result of syntax analysis (syntax analysis information). FIG. 11 shows the data flow in the speech synthesizer. The differences from the apparatus configuration shown in FIG. 1 and the data flow shown in FIG. 2 are that morphological analysis information is not sent directly from the morphological analyzer 102 to the synthetic sound waveform generator 105, and that syntax analysis information is sent directly to the synthetic sound waveform generating apparatus 105.
[0042]
FIG. 12 shows an example of shortening a synthesized sound waveform by thinning out morpheme units using the result of syntax analysis in the present embodiment. Here, generating a synthetic sound waveform for the entire text sentence 200 yields the synthetic sound waveform 204. By taking the syntax analysis information into account and preferentially thinning out portions of low importance in units of morphemes, the synthetic sound waveform 1200 can be generated. The synthetic sound waveform 1200 may be generated either by generating only the waveform of the necessary portions, or by first generating the synthetic sound waveform 204 and extracting the waveform of the necessary portions from it.
[0043]
In the processing of this figure, the subject and predicate clauses, which are considered essential to the structure of the sentence, are first extracted from the original text sentence 200 that has undergone morphological analysis and syntax analysis. Then, by examining the dependency relations between clauses, phrases considered to be of high importance are additionally extracted, and the synthetic sound waveform 1200 is generated. A phrase considered to be of high importance is, for example, a phrase that modifies the subject or predicate of the entire sentence, or a phrase that ends with a particular particle.
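The two-stage selection described above (core clauses first, then clauses that depend directly on them) can be sketched as below. This is an illustrative assumption, not the patented procedure: the `Clause` type, the role labels, and the `head` index encoding of dependencies are all hypothetical, and the particle-based criterion is omitted because the source does not say which particles qualify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    text: str
    role: str            # assumed labels: "subject", "predicate", or "other"
    head: Optional[int]  # index of the clause this one modifies, or None


def select_clauses(clauses, include_modifiers):
    """Keep subject/predicate clauses, optionally with their direct modifiers."""
    core = {i for i, c in enumerate(clauses) if c.role in ("subject", "predicate")}
    keep = set(core)
    if include_modifiers:
        # A clause counts as important if it directly modifies a core clause.
        keep |= {i for i, c in enumerate(clauses) if c.head in core}
    # Preserve the original sentence order in the output.
    return [clauses[i] for i in sorted(keep)]
```

Clauses that only modify another modifier are dropped in both modes, which gives a simple two-level notion of importance.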
[0044]
The flowchart of each component in the present embodiment will be described in the form of a difference from the flowchart of the first embodiment.
[0045]
The operation of the speech synthesis control device 111 is the same as that shown in FIG. 4 of the first embodiment.
[0046]
The operation of the text acquisition device 101 is the same as that shown in FIG. 5 of the first embodiment.
[0047]
The morphological analyzer 102 in the present embodiment operates exactly as in the first embodiment, except that it does not need to transmit the morphological analysis information to the synthetic sound waveform generator 105. Its operation flow in the present embodiment is therefore the flow shown in FIG. 6 with S610 and S611 omitted, so that the process moves to S602 after S609 ends.
[0048]
The syntax analysis device 103 in the present embodiment operates exactly as in the first embodiment, except that it must also transmit the syntax analysis information to the synthetic sound waveform generating device 105. Its operation flow in the present embodiment is therefore obtained by adding, immediately after S709 of the flow shown in FIG. 7, a step that determines whether the synthesized sound waveform generating apparatus 105 can receive the syntax analysis information and a step that transmits the syntax analysis information to the synthesized sound waveform generating apparatus 105.
[0049]
The operation of the reading analysis device 104 is the same as that shown in FIG. 8 of the first embodiment.
[0050]
The synthetic sound waveform generating apparatus 105 in the present embodiment follows exactly the same flow as in the first embodiment, except that it receives syntax analysis information from the syntax analysis device 103 instead of morphological analysis information from the morphological analyzer 102. Accordingly, in the operation flow shown in FIG. 9, S907 (determining whether morphological analysis information can be received from the morphological analysis device 102) and S908 (receiving the morphological analysis information) are replaced, respectively, by a step of determining whether syntax analysis information can be received from the syntax analysis device 103 and a step of receiving the syntax analysis information. In S911, the synthesized sound waveform generating apparatus 105 generates the synthesized sound waveform using the received syntax analysis information. If the reproduction speed of the synthesized sound is 0, a synthesized sound waveform is generated for all of the reading information passed in. If the reproduction speed is 1 or more, the synthetic sound waveform 1200 is generated as illustrated in the drawings, such that the number of morphemes thinned out increases as the reproduction speed increases.
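The source states only that more morphemes are thinned out as the reproduction speed rises and that low-importance portions go first. One way to realize that, sketched here under assumptions, is to attach a numeric importance to each morpheme and drop the lowest-scoring share. The scores, the function name, and the linear schedule `len * speed // 6` are all hypothetical.

```python
MAX_SPEED = 5  # upper bound of the reproduction speed (S904)


def thin_by_importance(scored, speed):
    """scored: list of (surface, importance) pairs in sentence order."""
    if speed == 0:
        return [surface for surface, _ in scored]
    # Assumed schedule: each speed step drops a further 1/6 of the morphemes.
    n_drop = len(scored) * speed // (MAX_SPEED + 1)
    # Lowest importance is dropped first; ties are broken by position.
    order = sorted(range(len(scored)), key=lambda i: (scored[i][1], i))
    dropped = set(order[:n_drop])
    return [surface for i, (surface, _) in enumerate(scored) if i not in dropped]
```

Because `n_drop` never decreases as `speed` grows, the output length is non-increasing in the speed, which is the only property the text requires.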
[0051]
With the above configuration and procedure, it is possible to realize a speech synthesizer that can reproduce text information stored in the text storage device 100 as a synthesized sound waveform while performing fast-forwarding by thinning out morpheme units based on syntax analysis information.
[0052]
In this method, by using the result of the syntax analysis processing obtained by using the result of the morphological analysis processing, it becomes possible to perform the fast-forward processing reflecting the importance of each morpheme as compared with the method of the first embodiment.
[0053]
(Third embodiment)
FIG. 13 shows the overall configuration of the speech synthesizer according to the third embodiment. This device performs fast-forward playback by summarizing, in advance, the text information that is to undergo speech synthesis. FIG. 14 shows the data flow in this speech synthesizer. The differences from the configuration shown in FIG. 1 and the data flow shown in FIG. 2 are as follows. In this speech synthesis apparatus, the synthetic sound waveform generating apparatus 105 does not use the morphological analysis information 201 directly, and generates the synthetic sound waveform 204 using only the information included in the reading information 203. Instead, a text summarization device 1300 is provided: the text information 200 is first summarized into the text information 1400, and speech synthesis is then performed on it, so that the generated synthetic sound waveform is shortened.
[0054]
In this configuration, the input text information is first summarized by the text summarization device 1300, which shortens the synthesized sound waveform that is output. This has the advantage that the processing from the text acquisition device 101 onward need not take shortening of the synthesized sound waveform into account.
[0055]
FIG. 15 shows an operation flow of the text summarizing apparatus 1300. The text summarizing apparatus 1300 operates as follows.
It receives text information from the text storage device 100, summarizes it to a length shorter than the original text, and transmits the result to the text acquisition device 101.
It receives signals sent from the voice synthesis control device 111 and, according to the signal, stops operating or changes the fast-forward speed of the synthesized sound.
As the fast-forward speed of the synthesized sound specified by the voice synthesis control device 111 increases, the length of the summarized text relative to the input text information decreases monotonically.
[0056]
The operation content of each step in the operation flow of FIG. 15 is as follows.
S1500: Operation of the device starts.
S1501: Various variables used inside the text summarization apparatus 1300 are cleared. At this time, the reproduction speed of the synthesized sound is initialized to zero.
S1502: It is determined whether or not a signal instructing the device to stop operating has been received from the speech synthesis control device 111.
S1503: It is determined whether or not a signal instructing to increase the reproduction speed of the synthesized sound by one step from the voice synthesis control device 111 is received.
S1504: The reproduction speed of the synthesized sound is increased by one. However, the reproduction speed of the synthesized sound should not exceed 5.
S1505: It is determined whether or not there is an input from the speech synthesis control device 111 to instruct to reduce the reproduction speed of the synthesized sound by one step.
S1506: Decrease the reproduction speed of the synthesized sound by one. However, the reproduction speed of the synthesized sound should not fall below 0.
S1507: It is determined whether or not a signal for instructing acquisition of the next text information has been received from the speech synthesis control device 111.
S1508: In the text storage device 100, the text information held immediately after the text information obtained last is obtained. After starting the operation, if the text information has not been obtained at least once, the text information held at the head position in the text storage device 100 is obtained.
S1509: Summarize the text information acquired in S1508.
S1510: It is determined whether the text acquisition device 101 is in a state where it can acquire the summarized text information generated in S1509. The text acquisition device 101 can acquire text information either when it has not yet acquired any text information from the text summarization device 1300, or when the text information it acquired last from the text summarization device 1300 has already been passed to the morphological analysis device 102, the syntax analysis device 103, and the reading analysis device 104.
S1511: The summarized text information generated in S1509 is transmitted to the text acquisition device 101.
S1512: Operation of the device stops.
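The summarization in S1509 is left unspecified; the text requires only that the summary becomes shorter as the fast-forward speed increases. A sketch of a target-length schedule that satisfies this is shown below. The function name and the linear schedule are assumptions, as is the choice to leave the text at full length at speed 0.

```python
MAX_SPEED = 5  # upper bound of the reproduction speed (S1504)


def target_length(original_length, speed):
    """Assumed schedule: each speed step removes an equal share of the text."""
    if not 0 <= speed <= MAX_SPEED:
        raise ValueError("speed must be between 0 and MAX_SPEED")
    scaled = original_length * (MAX_SPEED + 1 - speed) // (MAX_SPEED + 1)
    # Never ask for an empty summary of non-empty text, and never exceed the input.
    return min(original_length, max(1, scaled))
```

A summarizer could then be asked to produce at most `target_length(len(text), speed)` characters; any schedule that decreases with speed would satisfy the stated property equally well.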
[0057]
Hereinafter, the operation flow of each device other than the text summarization device 1300 will be described in the form of a difference from the flowchart of the first embodiment.
[0058]
In the present embodiment, the speech synthesis control device 111 performs on the text summarizing apparatus 1300 the processing that was performed on the text acquisition device 101 in S406, S410, S411, S412, and S413 of the operation flow of the first embodiment. Likewise, the processing that was performed on the synthesized sound waveform generating apparatus 105 in S407, S408, and S409 is performed on the text summarizing apparatus 1300.
[0059]
In the present embodiment, the text acquisition device 101 waits for text information to be transmitted from the text summarization device 1300, in place of waiting for a signal from the speech synthesis control device 111 in S503 of the operation flow illustrated in FIG. 5. Also, where the text information was obtained from the text storage device 100 in S504, it is instead obtained from the text summarizing apparatus 1300.
[0060]
The morphological analyzer 102 performs the same operation as that of the second embodiment.
[0061]
The operation of the syntax analyzer 103 is the same as that shown in FIG. 7 of the first embodiment.
[0062]
The operation of the reading analysis device 104 is the same as that shown in FIG. 8 of the first embodiment.
[0063]
The synthetic sound waveform generation apparatus 105 in the present embodiment does not require the processing of S903, S904, S905, S906, S907, and S908 of the operation flow of the first embodiment. In the present embodiment, the determination of S909 is performed after S902. If the reading analysis device 104 can transmit the next reading information, the process branches to S910; otherwise it branches to S902.
[0064]
With the above configuration and procedure, it is possible to realize a speech synthesizer that can reproduce text information stored in the text storage device 100 as a synthetic sound waveform while shortening it by means of the text summarization device 1300, which is treated as a black box.
[0065]
This embodiment has the advantage that the part that performs speech synthesis from text information can be simplified by leaving the shortening of the text information to the separately prepared text summarizing apparatus 1300. Text summarization requires more resources than simple thinning in morpheme units. However, because the text summarization device 1300 analyzes the meaning of the original text information and summarizes it appropriately, the fast-forwarded synthesized sound can be semantically more accurate, and better formed as text, than with simple morpheme thinning.
[0066]
It should be noted that it is also possible to realize a speech synthesizer obtained by combining any two or three of the speech synthesizers described in the first to third embodiments.
[0067]
[Effect of the Invention]
As described above, according to the present invention, the input text information is thinned out in units of linguistic delimiters or summarized in consideration of its meaning, so that fast-forward processing of the synthesized sound can be performed while preserving its audibility and ease of understanding. This makes it easy to perform operations such as "early listening" when text is read aloud by synthetic sound, and "skipping" to search for target text information in the text storage means.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of a speech synthesis device according to a first embodiment.
FIG. 2 is a diagram showing a data flow in the speech synthesizer shown in FIG.
FIG. 3 is a schematic diagram illustrating an example of a synthetic sound thinning process.
FIG. 4 is a flowchart showing the operation of the speech synthesis control device shown in FIG.
FIG. 5 is a flowchart illustrating an operation of the text acquisition device illustrated in FIG. 1.
FIG. 6 is a flowchart showing an operation of the morphological analyzer shown in FIG. 1.
FIG. 7 is a flowchart showing an operation of the syntax analysis device shown in FIG. 1.
FIG. 8 is a flowchart showing an operation of the reading analysis apparatus shown in FIG. 1.
FIG. 9 is a flowchart showing an operation of the synthetic sound waveform generating device shown in FIG. 1.
FIG. 10 is a block diagram illustrating a configuration of a speech synthesis device according to a second embodiment.
FIG. 11 is a diagram showing a data flow in the speech synthesizer shown in FIG.
FIG. 12 is a schematic diagram illustrating an example of a thinning-out process of a synthesized sound in the speech synthesizer illustrated in FIG. 10.
FIG. 13 is a block diagram illustrating a configuration of a speech synthesis device according to a third embodiment.
FIG. 14 is a diagram showing a data flow in the speech synthesizer shown in FIG.
FIG. 15 is a flowchart showing the operation of the text summarizing apparatus shown in FIG.

Claims

Text storage means for holding text information;
Text acquisition means for acquiring the text information held in the text storage means;
Morphological analysis means for performing morphological analysis processing on the text information acquired by the text acquisition means;
Synthetic sound waveform generating means for generating, based on the text information acquired by the text acquisition means and the result of the morphological analysis processing by the morphological analysis means, a speech waveform corresponding to text information obtained by thinning out the text information in morpheme units; and
A speaker for converting the speech waveform generated by the synthetic sound waveform generating means into an audible sound wave and outputting the audible sound.

Text storage means for holding text information;
Text acquisition means for acquiring the text information held in the text storage means;
Syntax analysis means for performing syntax analysis processing on the text information acquired by the text acquisition means;
Synthetic sound waveform generating means for generating, based on the text information acquired by the text acquisition means and the result of the syntax analysis processing by the syntax analysis means, a speech waveform corresponding to text information obtained by thinning out the text information according to syntactic importance; and
A speaker for converting the speech waveform generated by the synthetic sound waveform generating means into an audible sound wave and outputting the audible sound.

Text storage means for holding text information;
Text summarization means for summarizing the text information held in the text storage means;
Synthetic sound waveform generating means for generating a speech waveform corresponding to the text information summarized by the text summarization means; and
A speaker for converting the speech waveform generated by the synthetic sound waveform generating means into an audible sound wave and outputting the audible sound.