JP3603008B2 - Speech synthesis processor

Info

Publication number: JP3603008B2
Application number: JP2000163460A
Authority: JP
Inventors: 章寛隈田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2000-05-31
Filing date: 2000-05-31
Publication date: 2004-12-15
Anticipated expiration: 2020-05-31
Also published as: JP2001343991A

Description

【０００１】
【発明の属する技術分野】
本発明は、読み取った文章を発声させる音声合成処理装置に関する。
【０００２】
【従来の技術】
特開平８−８３２７０号公報には、音声合成装置においてテキストデータに話調に関するデータを指定することで、その複合データから自動的に話調を変更して音声を出力できる構成が開示されている。これは起伏のない朗読調になりがちな音声合成処理において、使用者がテキストデータに対して感情的な話調データを指定することで、擬似的に起伏のある感情のこもっているような音声を自動的に発声させることを可能にしている。
【０００３】
【発明が解決しようとする課題】
しかしながら、上述した従来の方式では、頻繁に話調が変化するような文章では話調変更に関する表示データが多く煩雑であり、見づらくなることがある。
【０００４】
本発明では、話調変更規則を改善することにより、より見やすく扱いが容易にすることを目的とする。
【０００５】
【課題を解決するための手段】
本発明は、発声する音声合成文および発声する声質の指定、切替を行う声質切替情報を含む入力文章を順に読み込んで音声合成文および声質切替情報を抽出し、抽出した音声合成文を、声質切替情報に基づいて音声合成処理を行い、指定された声質で発声する音声合成処理装置において、
抽出した声質切替情報を順に格納する声質切替履歴記憶手段を有し、読み出した音声合成文は、最も新しく格納された声質切替情報に基づいて発声し、
前記声質切替情報は、声質の指定を示す声質切替情報を解除する声質解除情報を含み、声質解除情報を読み出したとき、最も新しく格納された声質切替情報を解除し、以降の文章は、前記最も新しく格納された最新声質切替情報よりも１回前に格納されている前回声質切替情報に基づいて発声することを特徴とする音声合成処理装置である。
【０００６】
また本発明の前記声質解除情報は、改行コードを含むことを特徴とする。
また本発明の前記声質解除情報は、句点コードまたは読点コードを含むことを特徴とする。
【０００７】
本発明に従えば、入力文章は音声合成文と声質切替情報とを含み、入力文章を順に読み取って、音声合成文と声質切替情報を読み出し、抽出した声質切替情報を声質切替履歴記憶手段に格納しておく。そして、読み出した音声合成文を、最も新しく格納された声質切替情報に基づいて発声する。このようにして、入力文章に挿入された声質切替情報に応じて声質を変更して発声する。
本発明では、声質切替情報は声質解除情報を含み、声質解除情報を読み出したとき、声質切替履歴記憶手段に最も新しく格納された声質切替情報が打ち消されて解除される。これによって、次に、声質切替履歴記憶手段に格納されている前回の声質切替情報に基づいて発声される。したがって、たとえば最初に標準声質設定を行った場合には、声質を変える部分の文頭に声質切替履歴記憶手段に最も新しく格納された最新声質切替情報を設定し、文末に、声質解除情報を設定することで、以降の文章は、最も新しく格納された声質切替情報よりも１回前に声質切替履歴記憶手段に格納されている前回声質切替情報であるもとの標準声質にもどすことができる。
【０００８】
また、本発明では改行コードや句読点など、入力文章にもともと挿入されるコードを声質解除情報に設定することによって、前述した従来技術のように、話調変更に関する表示データが煩雑に表示されることが防がれ、見ずらくなるといったことが防がれる。
【０００９】
また本発明の前記声質切替情報は、疑問符または感嘆符を含むことを特徴とする。
【００１０】
また本発明は、声質切替情報である前記疑問符、または感嘆符を読み出したとき、その直前の音声合成文の語句を、声質切替情報に対応付けられた声質で発声させることを特徴とする。
【００１１】
本発明に従えば、疑問符や感嘆符を声質切替情報とし、たとえば疑問符を読み出したとき、その直前の語句の語尾が上がるように設定したり、感嘆符の直前の語句は、驚いた口調で発生するように設定することによって、特別な話調変更データを挿入しなくとも、自然な話調で発声することができる。
【００１２】
【発明の実施の形態】
以下、添付した図面を参照して本発明の音声合成処理装置の実施の一形態について詳細に説明する。本実施形態の音声合成処理装置は、たとえばパーソナル・コンピュータ、または携帯情報端末などの情報処理装置によって実現される。
【００１３】
図１は本実施形態の音声合成装置１の概略構成を示すブロック図である。本装置はＣＰＵ（ｃｅｎｔｒａｌｐｒｏｃｅｓｓｉｎｇｕｎｉｔ）１１、ＲＯＭ（ｒｅａｄｏｎｌｙｍｅｍｏｒｙ）１２、ＲＡＭ（ｒａｎｄｏｍａｃｃｅｓｓｍｅｍｏｒｙ）１３、記憶装置１４、辞書１５、入力部１６、表示部１７、音響処理部１８、およびスピーカ１９から構成される。
つぎに、本装置の概要について説明する。記憶装置１４には、音声合成を行う文章が格納されており、使用者は入力部１６、表示部１７により前記文章に音質切替情報を挿入して文章の編集を行い、音声合成文および声質切替情報を含む入力文章を作成する。なお、声質切替情報の具体的な挿入方法については、図８〜図１４で詳細に説明する。
【００１４】
音声合成処理のプログラムはＲＯＭ１２に格納されており、辞書１５には、漢字の読みやアクセント情報がデータとして登録されている。ＣＰＵ１１は、ＲＯＭ１２に格納されるプログラムにしたがって記憶装置１４から前記入力文章を読み出し、辞書１５に記憶されたデータをもとに音響処理部１８で抑揚とともに、指定された声質で音声合成を行い、スピーカ１９から発声する。
【００１５】
図２は音声合成装置１の音声合成処理の構成を示すブロック図である。処理部２２は、前記記憶装置１４から声質切替情報が含まれた入力文章２１を順に読み出し、音声合成処理を行い、声質切換を伴った音声２８を発声させる。
【００１６】
処理部２２は、フォント音質対応記憶手段２３、音声合成文一時記憶手段２４、声質切替履歴記憶手段２６、音声合成処理部２７とを有する。これらは、実行時にＲＡＭ１３に生成される。
【００１７】
フォント声質対応記憶手段２３は、図３に一例として示すように、フォントと声質切替情報とを対応させて記憶している。たとえば、フォント欄に示されているロボットの顔に似せた絵文字は、“ロボットの声にする”という声質切替情報に対応づけられている。また、句点“。”は、声質解除情報として対応づけられおり、“！”、“？”は、それぞれのフォントが出現する以前の文章を、驚いた声、疑問の声で発声する声質切替情報と対応づけられている。
【００１８】
また、通常は表示されないが、テキストデータに含まれる改行コードや読点“、”を声質解除情報として設定してもよい。
【００１９】
つぎに、音声合成処理方法について説明する。入力文章２１は先頭から一文字ずつ読み出して処理が行われる。読み出したフォントが、フォント声質対応記憶手段２３で対応づけされていない場合は、音声合成文となるテキストデータとして、音声合成文一時記憶手段２４に一時的に記憶する。また、対応づけされた声質切替フォントは対応する声質切替情報２５に変換し、この声質切替情報２５に従って、声質情報を声質切替履歴記憶手段２６に記憶する。
【００２０】
このようにして、一連の入力文章を読み込み、音声合成文一時記憶手段２４に記憶された音声合成文と共に声質切替情報を音声合成処理部２７に送り、音声２８として発声する。
【００２１】
図４〜６は声質切替履歴記憶手段２６の記憶形式及び動作について示したものである。声質切替履歴記憶手段はスタックのような動作を行い、読みだした順に声質切替情報がスタックにプッシュされ、声質解除情報によりスタックからポップされるものとする。
【００２２】
図４を参照して音質切替履歴記憶手段２６の動作について説明する。音質切替履歴記憶手段２６には、下から“宇宙人の声”、“相撲取りの声”、“お婆さんの声”の順に音質切替情報が積み上げられて格納されており、音声合成文一時記憶手段２４に記憶されている情報はリセットされ、何も記憶されていないものとする。そして、入力文章を順に読み出し、音質切替情報が現れるまで、音声合成文一時記憶手段２４に音声合成文がテキストデータとして蓄積される。
【００２３】
図４の４１の状態は、“ロボットの声にする”という声質切替情報が現れたときの状態を示す。以下の説明では、声質切替情報を、声質情報と略称することがある。声質切替情報が現れると、声質切替情報履歴記憶手段２６に最後に積まれた情報である“お婆さんの声にする”という声質情報と共に音声合成文一時記憶手段２４に記憶された音声合成文を音声合成処理部２７へ送り、音声合成文をお婆さんの声で発声させる。その上で、“ロボットの声にする”という声質情報を音質切替履歴記憶手段２６の最後に積むことにする。そうすることで、声質切替情報以後の文章を、ロボットの声質で発声させることができる。
【００２４】
図５は声質切替情報として声質解除情報が与えられた場合の処理を示す。５１の状態から声質解除情報が与えられた時は、それまでに最後に積まれた最新の情報である“ロボットの声にする”という声質情報と共に音声合成文一時記憶手段２４に記憶された文章を音声合成処理部へ送り、ロボットの声で発声させる。その上で、スタックの最上部に積まれた“ロボットの声にする”という声質情報をスタックから削除する（５２）。そうすることにより、次に発声される文章を、元の声質情報（すなわち前記最新の情報“ロボットの声にする”よりも１回前に声質切替履歴記憶手段２６に格納されている前回声質切替情報）である“お婆さんの声”に戻すことが可能となる。
【００２５】
また、図３で示したように、句点“。”を声質解除情報と設定することで、一文ずつ、音質切替履歴記憶手段２６に積まれた声質切替情報を取り出し、一文ごとに、声質切替履歴記憶手段２６に格納される声質切替情報で順に発声することができる。
【００２６】
また、改行コードを声質解除情報として設定した場合は、一段落を一まとまりの音声合成文として発声することができ、読点“、”を声質解除情報として設定した場合は、読点で区切られた文章を一まとまりの音声合成文として発声することができる。
また、図４の４２で“ロボットの声にする”という声質情報をスタックに積むとき、複数、たとえば２個積むことにより、その直後の声質解除情報を無効にし、２文を指定された声質で発声させることも可能である。
【００２７】
図６は直前文声質切替情報が与えられた場合の処理を示している。図３で示したように、“！”および“？”には、直前文声質切替情報が対応付けられており、図６の６１の状態において、“！”に対応づけられた“驚いた声にする”という直前文声質切替情報が与えられた時は、最後に積まれた情報である“お婆さんの声”という声質情報に“驚いた声”という声質情報を加えた上に、音声合成文一時記憶手段２４に記憶された文章と共に音声合成処理部へ送り、お婆さんの驚いた声で発声させる。この“驚いた声”の声質情報は声質切替履歴手段２６には積まず、声質切替履歴手段２６はそのままの状態を保持する（６２）。
【００２８】
図７は本発明の動作を示すフローチャートである。前述したように、声質切替情報を含んだ入力文章を１文字づつ読み出し、図７に示すフローチャートに従って一文字ずつ処理する。
【００２９】
まず、読み出した文字が声質切替情報であるかを判定し（ステップＳ７０１）、声質切替情報の場合は図４で示したように声質切替履歴記憶手段２４の最上部に積まれた声質情報で発声させる（ステップＳ７０２）。その後、音声合成文一時記憶手段２４に記憶される音声合成文を削除した上で、入力切替情報を声質切替履歴手段２６に積んで元の処理に戻る（ステップＳ７０３）。
【００３０】
声質切替情報でない場合は、声質解除情報であるかを判定し、（ステップＳ７０４）、声質解除情報の場合は図５で示したように声質切替履歴記憶手段２４の最上部に積まれた最新の声質情報で発声させる（ステップＳ７０５）。音声合成文一時記憶手段２４の音声合成文を削除した上で、声質切替履歴記憶手段２４の最上部に積まれた声質情報を削除し、元の処理に戻り、すなわち前記最新声質切替情報よりも１回前に格納されている前記声質切替情報で発声する（ステップＳ７０６）。
【００３１】
声質解除情報でない場合は、直前文声質切替情報であるかを判定し、（ステップＳ７０７）、直前文声質切替情報の場合は図６で示したように声質切替履歴記憶手段２４の最上部に積まれた声質情報に直前文声質切替情報を加えた声質で発声させ、音声合成文一時記憶手段２４の情報を削除した上で、元の処理に戻る（ステップＳ７０８）。
【００３２】
直前文声質切替情報でない場合は、通常のテキストデータとして、音声合成文一時記憶手段２４に一時記憶（ステップ７０９）し、その後、全文が終了したかどうかを判定し（ステップ７１０）、終了していない場合は元の処理に戻る。終了した場合は、音声合成文一時記憶手段２４に記憶される音声合成文を声質切替履歴記憶手段２４の最上部に積まれた声質情報で発声して処理を終了する（Ｓ７１１）。
【００３３】
図８〜１４は音声合成処理すべき文章に声質切替情報を挿入して入力文章を作成するときの表示例である。文章全文を指定された声質で発声される場合は、まず、図８に示すように、文章の先頭にカーソル１００を配置する。つぎに、図９に示すように、その場所でメニュー表示を表示させる、そこで希望の声質を選択する。こうすることで、フォント声質対応記憶手段２３に記憶されている対応付けされたフォント１０１が、図１０のようにカーソル位置に挿入される。このように文章の先頭にのみ声質切替情報が挿入された入力文章は、全文が指定された声質で発声される。
【００３４】
その他の設定状態として、文章全体に標準声質設定が指定されており、句点コード“。”が声質解除情報に対応づけられており、上記したように、文章の先頭のみに声質切替情報が挿入される場合は、最初の文章の“突然ですが、本日５時に集まることになりました。”のみが声質切替情報で指定された声質で発声され、その後は標準声質設定の声質で発声されることになる。このような標準声質の設定は、たとえば声質解除情報によって解除されないように設定されて声質切替履歴記憶手段２６に格納するようにしてもよい。
【００３５】
図１１からは使用者が指定する区間のみの声質を切り替える時の手順を示している。カーソル１００を声質切替えしたい区間の先頭に配置し（図１１）、シフトキーを押しながらカーソルキーを押すなどによる既存のテキスト文書の区間指定手段にしたがって、終点を指定する（図１２）。区間が指定された状態のまま、メニュー表示を開いて希望の声質を選択する（図１３）。そうすることで、声質切替えをする先頭に声質切替情報に対応づけられたフォント１４１が挿入され、終点には声質解除情報に対応づけされたフォント１４２が挿入される。
この場合、文章の先頭の“突然ですが、〜連絡しておきます。”までが標準声質設定の声質で発声され、その次の“ご注意！〜”の手前に、声質切替えフォント１４１があるため、この“ご注意！〜電話で確認して下さい。”までを対応する声質情報に切り替えて発声させる。その次には、声質解除情報に対応づけされたフォント１４２があるので、前記“ご注意”の前にある声質切替えフォント１４１の設定を解除し、以降の文章は、対応する声質切替え情報１４１以前の声質である標準声質設定で発声されることになる。
【００３６】
その他の条件として例えば声質切替えフォント１４１以前に“ロボットの声質に切り替える”声質切替えフォントが指定されていた場合は、声質解除フォント１４２以降がロボットの声質で発声される。
【００３７】
【発明の効果】
本発明によれば、声質解除情報を設定することで、元の声質に戻して発声することができる。この声質解除情報として、テキストデータにもともと挿入される改行コードや句読点などのコードを対応づけることで、表示が煩雑にならず見やすくなる。
また、疑問符や感嘆符が付されている直前の単語の声質を変えることで、内容を確実に伝えることができ、聞く場合に、注意して聞くところを促すことができる。
【図面の簡単な説明】
【図１】本発明の実施の一形態の音声合成処理装置１を示す概略図である。
【図２】音声合成処理を示すブロック図である。
【図３】フォント声質対応記憶手段２３の記憶形式の一例である。
【図４】声質切替履歴記憶手段２６の声質切替情報が与えられた時の動作の一例を示す概略図である。
【図５】声質切替履歴記憶手段２６の声質解除情報が与えられた時の動作の一例を示す概略図である。
【図６】声質切替履歴記憶手段２６の直前文声質切替情報が与えられた時の動作の一例を示す概略図である。
【図７】本発明の動作説明のためのフローチャートである。
【図８】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図９】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図１０】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図１１】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図１２】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図１３】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【図１４】声質切替情報を挿入するときの表示内容の一例を示す概略図である。
【符号の説明】
１音声合成処理装置
１１ＣＰＵ
１２ＲＯＭ
１３ＲＡＭ
１４記憶装置
１５辞書
１６入力部
１７表示部
１８音響処理部
１９スピーカ
２１入力文章
２２処理部
２３フォント声質対応記憶手段
２４音声合成文一時記憶手段
２５声質切替情報
２６声質切替履歴記憶手段
２７音声合成処理部
２８音声
１００カーソル
１０１，１０２声質切替えに対応づけられたフォント
１４２声質解除に対応づけられたフォント

[0001]
[Technical field of the invention]
The present invention relates to a speech synthesis processing device that reads in text and utters it.
[0002]
[Prior art]
Japanese Patent Laying-Open No. 8-83270 discloses a configuration in which speaking-tone data is specified for text data in a speech synthesizer, so that the speaking tone is changed automatically on the basis of the combined data and speech is output. Speech synthesis tends to produce a flat, read-aloud tone; by letting the user specify emotional speaking-tone data for the text data, this configuration makes it possible to automatically utter speech that sounds inflected and emotionally expressive.
[0003]
[Problems to be solved by the invention]
However, in the conventional method described above, text whose speaking tone changes frequently carries a large amount of display data for tone changes, which clutters the text and can make it hard to read.
[0004]
An object of the present invention is to improve the tone-change rules so that the text is easier to read and handle.
[0005]
[Means for Solving the Problems]
The present invention is a speech synthesis processing device that sequentially reads an input sentence including a speech synthesis sentence to be uttered and voice quality switching information for designating and switching the voice quality to be uttered, extracts the speech synthesis sentence and the voice quality switching information, performs speech synthesis processing on the extracted speech synthesis sentence based on the voice quality switching information, and utters it in the designated voice quality, wherein
the device has voice quality switching history storage means for sequentially storing the extracted voice quality switching information, and the speech synthesis sentence that has been read is uttered based on the most recently stored voice quality switching information, and
the voice quality switching information includes voice quality release information for releasing voice quality switching information that designates a voice quality, and when the voice quality release information is read, the most recently stored voice quality switching information is released and subsequent text is uttered based on the previous voice quality switching information, that is, the information stored one entry before the most recently stored (latest) voice quality switching information.
[0006]
Further, in the present invention, the voice quality release information includes a line feed code.
Further, in the present invention, the voice quality release information includes a period code or a comma code.
[0007]
According to the present invention, the input sentence includes a speech synthesis sentence and voice quality switching information. The input sentence is read in order, the speech synthesis sentence and the voice quality switching information are read out, and the extracted voice quality switching information is stored in the voice quality switching history storage means. The speech synthesis sentence that has been read out is then uttered based on the most recently stored voice quality switching information. In this way, the voice quality is changed for utterance according to the voice quality switching information inserted in the input sentence.
In the present invention, the voice quality switching information includes voice quality release information. When voice quality release information is read out, the voice quality switching information most recently stored in the voice quality switching history storage means is cancelled and released. Subsequent text is therefore uttered based on the previous voice quality switching information stored in the voice quality switching history storage means. For example, when a standard voice quality setting is made first, voice quality switching information (which becomes the latest entry in the voice quality switching history storage means) is placed at the beginning of the part whose voice quality is to be changed, and voice quality release information is placed at its end. Subsequent text then returns to the original standard voice quality, which is the previous voice quality switching information stored one entry before the most recently stored voice quality switching information.
[0008]
Further, in the present invention, codes that are already present in the input sentence, such as line feed codes and punctuation marks, are set as voice quality release information. This prevents the display data for tone changes from cluttering the display, as it does in the related art described above, and keeps the text from becoming hard to read.
[0009]
Further, the voice quality switching information of the present invention includes a question mark or an exclamation mark.
[0010]
Further, the present invention is characterized in that, when the question mark or the exclamation mark serving as voice quality switching information is read, the word or phrase of the speech synthesis sentence immediately preceding it is uttered in the voice quality associated with that voice quality switching information.
[0011]
According to the present invention, a question mark or an exclamation mark serves as voice quality switching information. For example, when a question mark is read, the ending of the phrase immediately before it can be set to rise, and the phrase immediately before an exclamation mark can be set to be uttered in a surprised tone. Speech can thus be uttered in a natural speaking tone without inserting special tone-change data.
[0012]
[Embodiments of the invention]
Hereinafter, an embodiment of a speech synthesis processing device according to the present invention will be described in detail with reference to the accompanying drawings. The speech synthesis processing device of the present embodiment is realized by an information processing device such as a personal computer or a portable information terminal.
[0013]
FIG. 1 is a block diagram showing a schematic configuration of the speech synthesis device 1 of the present embodiment. The device consists of a CPU (central processing unit) 11, a ROM (read only memory) 12, a RAM (random access memory) 13, a storage device 14, a dictionary 15, an input unit 16, a display unit 17, a sound processing unit 18, and a speaker 19.
Next, an outline of the device will be described. The storage device 14 stores the text to be synthesized. Using the input unit 16 and the display unit 17, the user edits the text by inserting voice quality switching information into it, creating an input sentence that includes a speech synthesis sentence and voice quality switching information. The specific method of inserting the voice quality switching information will be described in detail with reference to FIGS. 8 to 14.
[0014]
The speech synthesis program is stored in the ROM 12, and kanji readings and accent information are registered as data in the dictionary 15. In accordance with the program stored in the ROM 12, the CPU 11 reads the input sentence from the storage device 14, has the sound processing unit 18 synthesize speech with intonation and in the specified voice quality based on the data stored in the dictionary 15, and utters it from the speaker 19.
[0015]
FIG. 2 is a block diagram showing the configuration of the speech synthesis process of the speech synthesis device 1. The processing unit 22 sequentially reads out the input sentences 21 including the voice quality switching information from the storage device 14, performs voice synthesis processing, and utters a voice 28 with voice quality switching.
[0016]
The processing unit 22 includes a font voice quality correspondence storage means 23, a speech synthesis sentence temporary storage means 24, a voice quality switching history storage means 26, and a speech synthesis processing unit 27. These are created in the RAM 13 at run time.
[0017]
The font voice quality correspondence storage means 23 stores fonts and voice quality switching information in association with each other, as shown by way of example in FIG. 3. For example, the pictogram resembling a robot's face shown in the font column is associated with the voice quality switching information "make a robot voice". The period "。" is associated with voice quality release information, and "！" and "？" are associated with voice quality switching information that causes the text preceding the mark to be uttered in a surprised voice and a questioning voice, respectively.
[0018]
A line feed code, which is not normally displayed, or a comma "、" included in the text data may also be set as voice quality release information.
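The correspondence table of FIG. 3 can be pictured as a simple lookup from a character to an action. The following is a minimal sketch only, not the patent's implementation: the names `Kind`, `FONT_TABLE`, and `classify` are hypothetical, and the robot emoji merely stands in for the robot-face pictogram font.

```python
from enum import Enum


class Kind(Enum):
    SWITCH = "switch"        # stack a new voice quality
    RELEASE = "release"      # remove the newest voice quality
    PRECEDING = "preceding"  # affects only the text just before the mark


# Hypothetical table in the spirit of FIG. 3 (character -> (kind, voice)).
FONT_TABLE = {
    "🤖": (Kind.SWITCH, "robot"),
    "。": (Kind.RELEASE, None),
    "！": (Kind.PRECEDING, "surprised"),
    "？": (Kind.PRECEDING, "questioning"),
}

# Optional settings described in paragraph [0018]: line feed and comma as release marks.
OPTIONAL_RELEASE_MARKS = {"\n": (Kind.RELEASE, None), "、": (Kind.RELEASE, None)}


def classify(ch, table=FONT_TABLE):
    """Return (kind, voice) for a registered character, or None for plain text."""
    return table.get(ch)
```

A character that is not in the table is ordinary text to be synthesized; enabling the optional marks is a matter of merging `OPTIONAL_RELEASE_MARKS` into the table.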
[0019]
Next, the speech synthesis processing method will be described. The input sentence 21 is read and processed one character at a time from the beginning. When the font that has been read is not registered in the font voice quality correspondence storage means 23, it is temporarily stored in the speech synthesis sentence temporary storage means 24 as text data that forms the speech synthesis sentence. A registered voice quality switching font is converted into the corresponding voice quality switching information 25, and voice quality information is stored in the voice quality switching history storage means 26 according to this voice quality switching information 25.
[0020]
In this way, a series of input sentences is read, and voice quality switching information is sent to the speech synthesis processing unit 27 together with the speech synthesis sentences stored in the speech synthesis sentence temporary storage means 24, and is uttered as speech 28.
[0021]
FIGS. 4 to 6 show the storage format and operation of the voice quality switching history storage means 26. The voice quality switching history storage means operates like a stack: voice quality switching information is pushed onto the stack in the order in which it is read, and is popped off the stack by voice quality release information.
[0022]
The operation of the voice quality switching history storage means 26 will be described with reference to FIG. 4. Assume that the voice quality switching history storage means 26 holds voice quality switching information stacked, from the bottom, in the order "voice of an alien", "voice of a sumo wrestler", and "voice of a grandmother", and that the speech synthesis sentence temporary storage means 24 has been reset and holds nothing. The input sentence is then read in order, and the speech synthesis sentence is accumulated as text data in the speech synthesis sentence temporary storage means 24 until voice quality switching information appears.
[0023]
State 41 in FIG. 4 shows the state when the voice quality switching information "make a robot voice" appears. In the following description, voice quality switching information may be abbreviated as voice quality information. When voice quality switching information appears, the speech synthesis sentence stored in the speech synthesis sentence temporary storage means 24 is sent to the speech synthesis processing unit 27 together with the voice quality information "make a grandmother's voice", which is the entry most recently stacked in the voice quality switching history storage means 26, and the speech synthesis sentence is uttered in the grandmother's voice. The voice quality information "make a robot voice" is then stacked on top of the voice quality switching history storage means 26. As a result, the text following the voice quality switching information is uttered in the robot voice.
[0024]
FIG. 5 shows the processing when voice quality release information is given as voice quality switching information. When voice quality release information is given in state 51, the text stored in the speech synthesis sentence temporary storage means 24 is sent to the speech synthesis processing unit together with the voice quality information "make a robot voice", which is the latest entry stacked so far, and is uttered in the robot voice. The voice quality information "make a robot voice" at the top of the stack is then deleted from the stack (52). As a result, the text uttered next returns to the original voice quality information "grandmother's voice", that is, the previous voice quality switching information stored in the voice quality switching history storage means 26 one entry before the latest information "make a robot voice".
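The push of FIG. 4 followed by the pop of FIG. 5 can be traced with an ordinary list used as a stack. This is an illustrative sketch under the assumption that a voice quality is represented by a plain string; the variable names and the placeholder texts are hypothetical.

```python
# Stack contents from bottom to top, as assumed for FIG. 4.
history = ["alien", "sumo wrestler", "grandmother"]
uttered = []  # (voice, text) pairs handed to the synthesizer, in order

# FIG. 4: the "make a robot voice" mark is read.
buffered_text = "text before the robot mark"
uttered.append((history[-1], buffered_text))  # spoken with the top entry
history.append("robot")                       # the new voice becomes the top entry

# FIG. 5: a release mark is read.
buffered_text = "text after the robot mark"
uttered.append((history[-1], buffered_text))  # still spoken as the robot
history.pop()                                 # the robot entry is removed

voice_for_following_text = history[-1]
```

After both steps the stack is back to its initial three entries, so the text that follows is spoken in the grandmother's voice.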
[0025]
Also, as shown in FIG. 3, when the period "。" is set as voice quality release information, one entry of the voice quality switching information stacked in the voice quality switching history storage means 26 is removed at each sentence, so that successive sentences are uttered in turn with the voice quality switching information stored in the voice quality switching history storage means 26.
[0026]
When the line feed code is set as voice quality release information, one paragraph can be uttered as a single unit of speech synthesis text. When the comma "、" is set as voice quality release information, the text delimited by commas can be uttered as a single unit of speech synthesis text.
In addition, when the voice quality information "make a robot voice" is stacked at 42 in FIG. 4, stacking it more than once, for example twice, nullifies the voice quality release information that immediately follows, so that two sentences are uttered in the specified voice quality.
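The effect of stacking the same voice quality twice can be checked with a small sketch in which every sentence ends in a period that acts as a release mark. The function name `utter_sentences` is hypothetical, and keeping the bottom "standard" entry from being released is an assumption made here for the illustration.

```python
def utter_sentences(sentences, history):
    """Utter each sentence with the top voice, then treat its period as a release."""
    uttered = []
    for sentence in sentences:
        uttered.append((history[-1], sentence))
        if len(history) > 1:  # assumption: the bottom "standard" entry is never released
            history.pop()
    return uttered


sentences = ["One.", "Two.", "Three."]
single = utter_sentences(sentences, ["standard", "robot"])
double = utter_sentences(sentences, ["standard", "robot", "robot"])
```

With one "robot" entry only the first sentence is spoken as the robot; with two entries the first release removes the duplicate, so the second sentence is still spoken as the robot.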
[0027]
FIG. 6 shows the processing when immediately preceding sentence voice quality switching information is given. As shown in FIG. 3, "！" and "？" are associated with immediately preceding sentence voice quality switching information. When, in state 61 of FIG. 6, the immediately preceding sentence voice quality switching information "make a surprised voice" associated with "！" is given, the voice quality information "surprised voice" is added to the voice quality information "grandmother's voice", which is the most recently stacked entry, and the result is sent to the speech synthesis processing unit together with the text stored in the speech synthesis sentence temporary storage means 24, so that the text is uttered in the grandmother's surprised voice. This "surprised voice" voice quality information is not stacked in the voice quality switching history storage means 26, which remains unchanged (62).
[0028]
FIG. 7 is a flowchart showing the operation of the present invention. As described above, the input sentence including the voice quality switching information is read out one character at a time, and is processed one character at a time according to the flowchart shown in FIG.
[0029]
First, it is determined whether the character that has been read is voice quality switching information (step S701). If it is, the accumulated text is uttered with the voice quality information at the top of the voice quality switching history storage means 26, as shown in FIG. 4 (step S702). The speech synthesis sentence stored in the speech synthesis sentence temporary storage means 24 is then deleted, the voice quality switching information that was read is stacked in the voice quality switching history storage means 26, and the process returns to the original processing (step S703).
[0030]
If it is not voice quality switching information, it is determined whether it is voice quality release information (step S704). If it is, the accumulated text is uttered with the latest voice quality information at the top of the voice quality switching history storage means 26, as shown in FIG. 5 (step S705). The speech synthesis sentence in the speech synthesis sentence temporary storage means 24 is then deleted, the voice quality information at the top of the voice quality switching history storage means 26 is deleted, and the process returns to the original processing; subsequent text is thus uttered with the voice quality switching information stored one entry before the latest voice quality switching information (step S706).
[0031]
If it is not voice quality release information, it is determined whether it is immediately preceding sentence voice quality switching information (step S707). If it is, the accumulated text is uttered in a voice quality obtained by adding the immediately preceding sentence voice quality switching information to the voice quality information at the top of the voice quality switching history storage means 26, as shown in FIG. 6; the information in the speech synthesis sentence temporary storage means 24 is then deleted, and the process returns to the original processing (step S708).
[0032]
If it is not immediately preceding sentence voice quality switching information, the character is temporarily stored as ordinary text data in the speech synthesis sentence temporary storage means 24 (step S709). It is then determined whether the entire text has been processed (step S710); if not, the process returns to the original processing. If the entire text has been processed, the speech synthesis sentence stored in the speech synthesis sentence temporary storage means 24 is uttered with the voice quality information at the top of the voice quality switching history storage means 26, and the process ends (step S711).
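The flow of FIG. 7 can be summarized as a single loop over the characters of the input. The sketch below is one possible reading of steps S701 to S711, not the patented implementation: the function `synthesize`, the string tags "switch", "release", and "preceding", the marks "🤖" and "◇", and the rule that the bottom "standard" entry is never popped are all assumptions made for illustration. Instead of producing audio, it returns the (voice, modifier, text) triples that would be handed to the synthesizer.

```python
def synthesize(text, table, standard="standard"):
    """Return the (voice, modifier, text) utterances for an annotated input text."""
    history = [standard]  # voice quality switching history (stack)
    buffer = []           # temporary store for text awaiting utterance
    uttered = []

    def flush(modifier=None):
        # Utter the buffered text with the voice at the top of the stack.
        if buffer:
            uttered.append((history[-1], modifier, "".join(buffer)))
            buffer.clear()

    for ch in text:
        entry = table.get(ch)
        if entry is None:          # S709: ordinary text
            buffer.append(ch)
            continue
        kind, voice = entry
        if kind == "switch":       # S701 to S703: utter, then push
            flush()
            history.append(voice)
        elif kind == "release":    # S704 to S706: utter, then pop
            flush()
            if len(history) > 1:   # assumption: keep the standard voice
                history.pop()
        elif kind == "preceding":  # S707 to S708: modify, stack unchanged
            flush(modifier=voice)
    flush()                        # S711: utter whatever remains at the end
    return uttered


TABLE = {
    "🤖": ("switch", "robot"),
    "◇": ("release", None),
    "！": ("preceding", "surprised"),
}
result = synthesize("Hello.🤖Watch out！Call me◇Bye", TABLE)
```

In this run "Hello." is spoken in the standard voice, "Watch out" in a surprised robot voice, "Call me" in the plain robot voice, and "Bye" in the standard voice again. The marks themselves are not passed to the synthesizer in this sketch, which is a simplification.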
[0033]
FIGS. 8 to 14 show display examples when an input sentence is created by inserting voice quality switching information into text to be synthesized. To have the entire text uttered in a specified voice quality, the cursor 100 is first placed at the beginning of the text, as shown in FIG. 8. Next, as shown in FIG. 9, a menu is displayed at that position and the desired voice quality is selected. The associated font 101 stored in the font voice quality correspondence storage means 23 is then inserted at the cursor position, as shown in FIG. 10. An input sentence in which voice quality switching information is inserted only at the beginning of the text in this way is uttered in the specified voice quality in its entirety.
[0034]
As another setting, suppose that the standard voice quality setting is specified for the entire text and the period code "。" is associated with voice quality release information. If voice quality switching information is inserted only at the beginning of the text as described above, only the first sentence, "Suddenly, we will gather at 5 o'clock today.", is uttered in the voice quality specified by the voice quality switching information, and the text after it is uttered in the standard voice quality. Such a standard voice quality setting may, for example, be stored in the voice quality switching history storage means 26 in such a way that it is not released by voice quality release information.
[0035]
FIG. 11 onward show the procedure for switching the voice quality only in a section designated by the user. The cursor 100 is placed at the beginning of the section whose voice quality is to be switched (FIG. 11), and the end point is specified using the existing section designation means of the text document, such as pressing a cursor key while holding down the shift key (FIG. 12). With the section still selected, the menu is opened and the desired voice quality is selected (FIG. 13). As a result, the font 141 associated with voice quality switching information is inserted at the beginning of the section, and the font 142 associated with voice quality release information is inserted at the end point.
In this case, the text from the beginning, "Suddenly, ... I will let you know.", is uttered in the standard voice quality. Because the voice quality switching font 141 precedes the next part, "Caution! ...", the text from "Caution!" through "... please confirm by telephone." is uttered in the corresponding voice quality. Next comes the font 142 associated with voice quality release information, so the setting made by the voice quality switching font 141 before "Caution" is released, and subsequent text is uttered in the standard voice quality, which was the voice quality in effect before the voice quality switching font 141.
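The section-based editing step, in which a switch font is inserted at the start of the selection and a release font at its end, amounts to a small string operation. The helper below is a hypothetical sketch: `wrap_selection`, its parameters, and the marks "🤖" and "◇" are illustrative names, not part of the patent.

```python
def wrap_selection(text, start, end, switch_mark, release_mark):
    """Insert switch_mark before text[start:end] and release_mark after it."""
    if not 0 <= start <= end <= len(text):
        raise ValueError("selection is outside the text")
    return text[:start] + switch_mark + text[start:end] + release_mark + text[end:]


original = "Intro. Caution! Call first. Outro."
start = original.index("Caution")
end = original.index(" Outro")
annotated = wrap_selection(original, start, end, "🤖", "◇")
```

Here `annotated` becomes "Intro. 🤖Caution! Call first.◇ Outro.", so only the selected section is wrapped and the surrounding text keeps whatever voice quality was already in effect.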
[0036]
As another case, if a voice quality switching font for "switch to the robot voice" has been specified before the voice quality switching font 141, the text after the voice quality release font 142 is uttered in the robot voice.
[0037]
[Effects of the invention]
According to the present invention, setting voice quality release information makes it possible to return to the original voice quality for utterance. By associating codes that are already present in text data, such as line feed codes and punctuation marks, with this voice quality release information, the display stays uncluttered and easy to read.
Also, changing the voice quality of the word immediately before a question mark or an exclamation mark conveys the content reliably and draws the listener's attention to the parts that should be listened to carefully.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing a speech synthesis processing device 1 according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a speech synthesis process.
FIG. 3 is an example of the storage format of the font voice quality correspondence storage means 23.
FIG. 4 is a schematic diagram showing an example of an operation of the voice quality switching history storage means 26 when voice quality switching information is given.
FIG. 5 is a schematic diagram showing an example of an operation of the voice quality switching history storage means 26 when voice quality release information is given.
FIG. 6 is a schematic diagram showing an example of an operation of the voice quality switching history storage means 26 when immediately preceding sentence voice quality switching information is given.
FIG. 7 is a flowchart for explaining the operation of the present invention.
FIG. 8 is a schematic diagram showing an example of display content when voice quality switching information is inserted.
FIG. 9 is a schematic diagram showing an example of display contents when voice quality switching information is inserted.
FIG. 10 is a schematic diagram showing an example of display contents when voice quality switching information is inserted.
FIG. 11 is a schematic diagram showing an example of display contents when voice quality switching information is inserted.
FIG. 12 is a schematic diagram showing an example of display content when voice quality switching information is inserted.
FIG. 13 is a schematic diagram showing an example of display contents when voice quality switching information is inserted.
FIG. 14 is a schematic diagram showing an example of display content when voice quality switching information is inserted.
[Explanation of symbols]
1 Speech synthesis processing device
11 CPU
12 ROM
13 RAM
14 Storage device
15 Dictionary
16 Input unit
17 Display unit
18 Sound processing unit
19 Speaker
21 Input sentence
22 Processing unit
23 Font voice quality correspondence storage means
24 Speech synthesis sentence temporary storage means
25 Voice quality switching information
26 Voice quality switching history storage means
27 Speech synthesis processing unit
28 Voice
100 Cursor
101, 102 Fonts associated with voice quality switching
142 Font associated with voice quality release

Claims

1. A speech synthesis processing device that sequentially reads an input sentence including a speech synthesis sentence to be uttered and voice quality switching information for designating and switching the voice quality to be uttered, extracts the speech synthesis sentence and the voice quality switching information, performs speech synthesis processing on the extracted speech synthesis sentence based on the voice quality switching information, and utters it in the designated voice quality, wherein
the device has voice quality switching history storage means for sequentially storing the extracted voice quality switching information, and the speech synthesis sentence that has been read is uttered based on the most recently stored voice quality switching information, and
the voice quality switching information includes voice quality release information for releasing voice quality switching information that designates a voice quality, and when the voice quality release information is read, the most recently stored voice quality switching information is released and subsequent text is uttered based on the previous voice quality switching information stored one entry before the most recently stored latest voice quality switching information.

2. The speech synthesis processing device according to claim 1, wherein the voice quality release information includes a line feed code.

3. The speech synthesis processing device according to claim 1, wherein the voice quality release information includes a period code or a comma code.

4. The speech synthesis processing device according to claim 1, wherein the voice quality switching information includes a question mark or an exclamation mark.

5. The speech synthesis processing device according to claim 4, wherein when the question mark or the exclamation mark, which is voice quality switching information, is read, the word or phrase of the speech synthesis sentence immediately preceding it is uttered in the voice quality associated with the voice quality switching information.