JP4039620B2

JP4039620B2 - Speech synthesis apparatus and speech synthesis program

Info

Publication number: JP4039620B2
Application number: JP2002280430A
Authority: JP
Inventors: 寛之世木; 徹都木
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2002-09-26
Filing date: 2002-09-26
Publication date: 2008-01-30
Anticipated expiration: 2022-09-26
Also published as: JP2004117778A

Description

【０００１】
【発明の属する技術分野】
本発明は、テキストデータ、特にデータ放送、文字放送によって送信されるテキストデータから音声合成する音声合成装置および音声合成プログラムに関する。
【０００２】
【従来の技術】
従来、データ放送、文字放送によって送信側から送信されるテキストデータを、受信側で受信して高品質な音声合成データを生成する方法として、例えば、音声合成方式（特許文献１）が利用できる。この方式（方法、この方法による装置）は、送信側の装置において、当該装置に入力されたテキストデータから音声合成用データベース（送信側装置に包有）を参照して、テキストデータに含まれている音素の継続時間やピッチ等の付加情報を生成し、この付加情報をテキストデータと共に受信側に送信することにより、受信側の装置において、受信した付加情報に基づいて、音声合成用データベース（受信側装置に包有）を参照して、テキストデータの音声合成を行って、高品質な合成音声データを生成するものである。
【０００３】
この方法を使用することにより、受信側の装置（音声合成装置）で、音声合成する際に負荷の高い計算（高負荷計算）が必要とされても、送信側（放送局側）で生成した付加情報によって当該高負荷計算の負荷が軽減され、受信側の装置（音声合成装置）で素早く（処理速度の速い）高品質な音声合成を実現することができる。
【０００４】
【特許文献１】
特開平５−２１０３９５号公報
【０００５】
【発明が解決しようとする課題】
しかしながら、従来の方法「音声合成方式」では、送信側から受信側に同じテキストデータが繰り返し送信される場合が多く（例えば、データ放送におけるデータカルーセル方式による送信）、受信側の受信装置でテキストデータを受信する度にその都度、音声合成する必要が生じ処理効率が悪く、ひいては高品質な音声合成を維持することが困難になるという問題がある。また、従来の「音声合成方式」では、送信側の装置と受信側の装置間で、同じ音声合成用データベースを備える必要があった。このため、送信側と受信側間で同じデータベースを保持できるように絶えずメンテナンスする必要があるという問題がある。
【０００６】
そこで、本発明の目的は前記した従来の技術が有する課題を解消し、送信側と受信側で同じ音声合成用データベースを保持する必要がなく、高品質な合成音声データの生成をする（維持する）ことができる音声合成装置および音声合成プログラムを提供することにある。
【０００７】
【課題を解決するための手段】
本発明は、前記した目的を達成するため、以下に示す構成とした。
請求項１に記載の音声合成装置は、テキストデータを音声合成する音声合成装置であって、前記テキストデータを入力するテキストデータ入力手段と、前記テキストデータを記憶テキストデータとして記憶するテキストデータ記憶手段と、前記テキストデータ入力手段で単位毎に入力されたテキストデータを入力テキストデータとして、前記記憶テキストデータと比較し、前記入力テキストデータが前記記憶テキストデータと一致しない場合に、前記入力テキストデータを新たに入力された新規テキストデータとして判別し、記憶テキストデータとして前記テキストデータ記憶手段に記憶する新規テキストデータ判別手段と、前記新規テキストデータを音声合成する際に供され、音声合成単位ごとに語句と特徴量のデータが少なくとも含まれる音声合成用データを記憶する音声合成用データ記憶手段と、この音声合成用データ記憶手段に記憶された音声合成用データを使用して、前記新規テキストデータを音声合成し、合成音声データとする音声合成手段と、前記テキストデータ入力手段で入力された単位毎の入力テキストデータを選択するテキストデータ選択手段と、前記音声合成手段で音声合成した合成音声データを記憶する合成音声データ記憶手段と、前記テキストデータ選択手段で選択された入力テキストデータに対応する合成音声データを出力する合成音声データ出力手段と、前記音声合成手段で音声合成する際に使用した音声合成用データの語句と、それぞれの語句に対応する特徴量の前記音声合成用データ記憶手段における当該音声合成用データの記憶媒体上の記憶位置とを関連付けた参照情報を生成し、前記音声合成手段で音声合成する際に、前記合成音声データ記憶手段に出力する参照情報生成出力手段と、を備え、前記合成音声データ出力手段が、前記テキストデータ選択手段で選択された入力テキストデータに含まれる語句に対応する特徴量を、前記参照情報を使用して前記音声合成用データ記憶手段から読み出して、この読み出された特徴量を用いて生成された合成音声データをスピーカに出力することを特徴とする。
【０００８】
かかる構成によれば、テキストデータ入力手段で、テキストデータが入力される。新規テキストデータ判別手段で、テキストデータを記憶した記憶テキストデータと、新たに入力された入力テキストデータとが比較され、入力テキストデータが記憶テキストデータと一致しない場合に、入力テキストデータが新たに入力された新規テキストデータとして判別され、記憶テキストデータとしてテキストデータ記憶手段に記憶される。音声合成用データ記憶手段で、新規テキストデータを音声合成する際に供され、音声合成単位ごとに語句と特徴量のデータが少なくとも含まれる音声合成用データが記憶される。新規テキストデータ判別手段で新規テキストデータと判別された場合には、音声合成手段で、音声合成用データ記憶手段に記憶された音声合成用データが使用されて、新規テキストデータが音声合成され、音声合成データとされる。そして、テキストデータ選択手段で入力テキストデータが選択されるまで、音声合成データが合成音声データ記憶手段で保持されて、入力テキストデータが選択されると、この入力テキストデータに対応する合成音声データが合成音声データ出力手段で出力される。なお、入力テキストデータが記憶テキストデータと一致しない場合とは、句読点の間の一部でも一致していなければ一致していないとみなす場合を指すものであり、つまり、新規テキストデータ判別手段では、句読点の間の入力テキストデータが記憶テキストデータに完全一致していない限り、新規テキストデータとみなされる。また、参照情報生成出力手段で、音声合成手段で音声合成する際に使用された音声合成用データの語句と、それぞれの語句に対応する特徴量の音声合成用データ記憶手段における記憶媒体上の記憶位置とが関連付けられた参照情報が生成され、合成音声データ記憶手段に出力される。そして、この参照情報が使用され、合成音声データ出力手段により、テキストデータ選択手段で選択された入力テキストデータに含まれる語句に対応する特徴量が音声合成用データ記憶手段から読み出され、読み出された特徴量を用いて生成された合成音声データがスピーカに出力される。記憶媒体上の記憶位置は、例えば、記憶媒体上に付されている時間情報に対応しているものである。参照情報は、音声合成用データの単語または音素と、記憶媒体上の記憶位置とが関連付けられたものである。
【０００９】
請求項２に記載の音声合成装置は、請求項１に記載の音声合成装置において、前記テキストデータを、データ放送、文字放送の少なくとも一方の放送により受信するテキストデータ受信手段を備えたことを特徴とする。
【００１０】
かかる構成によれば、テキストデータ受信手段で、データ放送、文字放送の少なくとも一方によって、テキストデータが入力される。つまり、このテキストデータ受信手段が備えられることで、音声合成装置は、音声合成機能が付属したデータ受信機であるといえ、通常のデータ放送によって放送されており、受信側の表示装置に表示されるテロップ等の文字情報や、文字ニュース等のテキストデータが合成音声データに変換されて（合成されて）、出力される。
【００１１】
請求項３に記載の音声合成装置は、請求項１または請求項２に記載の音声合成装置において、前記合成音声データ記憶手段で前記合成音声データを記憶する際に、当該合成音声データのデータ量を圧縮した圧縮合成音声データとする合成音声データ圧縮手段と、前記テキストデータ選択手段で選択された入力テキストデータが前記圧縮合成音声データに対応する際に、当該圧縮合成音声データを解凍する圧縮合成音声データ解凍手段とを備えたことを特徴とする。
【００１２】
かかる構成によれば、合成音声データ圧縮手段で、合成音声データが合成音声データ記憶手段に記憶される場合に、データ量が少なくなるように圧縮される。なお、この合成音声データ圧縮手段における合成音声データの圧縮方式は、ＭＰＥＧ−２方式の他、任意の圧縮方式でよい。この合成音声データ圧縮手段で圧縮された圧縮合成音声データは、テキストデータ選択手段で対応する入力テキストデータが選択された場合に、圧縮合成音声データ解凍手段で解凍される。
【００１３】
請求項４に記載の音声合成プログラムは、テキストデータを音声合成するために、コンピュータを、前記テキストデータを入力するテキストデータ入力手段、前記テキストデータを記憶テキストデータとして記憶するテキストデータ記憶手段、前記テキストデータ入力手段で単位毎に入力されたテキストデータを入力テキストデータとして、前記記憶テキストデータと比較し、前記入力テキストデータが前記記憶テキストデータと一致しない場合に、前記入力テキストデータを新たに入力された新規テキストデータとして判別し、記憶テキストデータとして前記テキストデータ記憶手段に記憶する新規テキストデータ判別手段、前記新規テキストデータを音声合成する際に供され、音声合成単位ごとに語句と特徴量のデータが少なくとも含まれる音声合成用データを記憶する音声合成用データ記憶手段、この音声合成用データ記憶手段に記憶された音声合成用データを使用して、前記新規テキストデータを音声合成し、合成音声データとする音声合成手段、前記テキストデータ入力手段で入力された単位毎の入力テキストデータを選択するテキストデータ選択手段、前記音声合成手段で音声合成した合成音声データを記憶する合成音声データ記憶手段、前記テキストデータ選択手段で選択された入力テキストデータに対応する合成音声データを出力する合成音声データ出力手段、前記音声合成手段で音声合成する際に使用した音声合成用データの語句と、それぞれの語句に対応する特徴量の前記音声合成用データ記憶手段における当該音声合成用データの記憶媒体上の記憶位置とを関連付けた参照情報を生成し、前記音声合成手段で音声合成する際に、前記合成音声データ記憶手段に出力する参照情報生成出力手段、として機能させ、前記合成音声データ出力手段が、前記テキストデータ選択手段で選択された入力テキストデータに含まれる語句に対応する特徴量を、前記参照情報を使用して前記音声合成用データ記憶手段から読み出して、この読み出された特徴量を用いて生成された合成音声データをスピーカに出力することを特徴とする。
【００１４】
かかる構成によれば、テキストデータ入力手段で、テキストデータが入力される。新規テキストデータ判別手段で、テキストデータを記憶した記憶テキストデータと、新たに入力された入力テキストデータとが比較され、入力テキストデータが記憶テキストデータと一致しない場合に、入力テキストデータが新たに入力された新規テキストデータとして判別され、記憶テキストデータとしてテキストデータ記憶手段に記憶される。音声合成用データ記憶手段で、新規テキストデータを音声合成する際に供され、音声合成単位ごとに語句と特徴量のデータが少なくとも含まれる音声合成用データが記憶される。新規テキストデータ判別手段で新規テキストデータと判別された場合には、音声合成手段で、音声合成用データ記憶手段に記憶された音声合成用データが使用されて、新規テキストデータが音声合成され、音声合成データとされる。そして、テキストデータ選択手段で入力テキストデータが選択されるまで、音声合成データが合成音声データ記憶手段で保持されて、入力テキストデータが選択されると、この入力テキストデータに対応する合成音声データが合成音声データ出力手段で出力される。参照情報生成出力手段で、音声合成手段で音声合成する際に使用された音声合成用データの語句と、それぞれの語句に対応する特徴量の音声合成用データ記憶手段における記憶媒体上の記憶位置とが関連付けられた参照情報が生成され、合成音声データ記憶手段に出力される。そして、この参照情報が使用され、合成音声データ出力手段により、テキストデータ選択手段で選択された入力テキストデータに含まれる語句に対応する特徴量が音声合成用データ記憶手段から読み出され、読み出された特徴量を用いて生成された合成音声データがスピーカに出力される。
【００２１】
【発明の実施の形態】
以下、本発明の一実施の形態について、図面を参照して詳細に説明する。
（音声合成装置の構成）
図１は音声合成装置のブロック図である。この図１に示すように、音声合成装置１は、テキストデータ入力部３と、テキストデータ受信部５と、新規テキストデータ判別部７と、テキストデータ記憶部９と、音声合成部１１と、音声合成用データベース１３と、合成音声データ記憶出力部１５と、テキストデータ選択部１７とを備えている。また、この音声合成装置１は、音声合成した合成音声データを出力するスピーカ１９に接続されており、当該音声合成装置１を操作するリモコン２１が備えられている。
【００２２】
この音声合成装置１は、データ放送、文字放送等によって送信されるテキストデータや、キーボード（図示せず）等を介して入力されたテキストデータ（単位毎に入力されたテキストデータ）を、音声合成して出力するものである。また、この音声合成装置１では、従来の音声合成装置（音声合成機能付属データ受信機）のように、送信側からテキストデータと付加情報とが送信されなくても、音声合成用データベース１３中の音声合成用データを使用して、テキストデータを音声合成する際に、参照情報（詳しくは後記する）を生成することで処理速度を向上させることができると共に、一度、音声合成した合成音声データを貯えておいて、再利用することで、当該装置１の処理能力が低下せず、高品質の合成音声を出力できる（維持できる）ものである。また、単位毎に入力されたテキストデータとは、当該装置１に一度に入力された入力単位や、一纏まりにまとめることができる単位、例えば、語句単位毎、文章単位毎のことを指すものである。
【００２３】
テキストデータ入力部３は、キーボードやマウス等によって構成され、テキストデータを入力するものである。なお、このテキストデータ入力部３は、既存の手書き文章からテキストデータを取得することができるＯＣＲで構成してもいいし、テキストデータが記憶されているディスクを取り扱うディスクドライブで構成してもいいし、外部から通信回線網等を介して入力される入力端子で構成してもいい。このテキストデータ入力部３が特許請求の範囲の請求項に記載したテキストデータ入力手段に相当するものである。
【００２４】
テキストデータ受信部５は、データ放送、文字放送等を受信可能なアンテナ（パラボラアンテナ）、検波回路等によって構成されるもので、放送局から送出される現行のアナログテレビ放送やハイビジョン衛星放送のデータチャンネル（データ放送、文字放送等）で伝送される各種のデジタルデータに含まれるテキストデータを検出するものである。このテキストデータ受信部５が特許請求の範囲の請求項に記載したテキストデータ受信手段に相当するものである。
【００２５】
なお、この実施の形態では、テキストデータ入力部３およびテキストデータ受信部５には、予め、入力単位を設定する機能（入力単位設定機能）が備えられており、当該装置１に一度に入力された入力単位（受信時刻に隔てのある文章単位）や、一纏まりにまとめることができる単位、例えば、語句単位毎、文章単位を設定することができる。
【００２６】
新規テキストデータ判別部７は、テキストデータ入力部３とテキストデータ受信部５とから入力されたテキストデータをテキストデータ記憶部９に記憶テキストデータとして記憶すると共に、このテキストデータ記憶部９に記憶した記憶テキストデータと続いて入力されたテキストデータ（入力テキストデータ）とを比較し、入力テキストデータが記憶テキストデータと一致しない場合に、新規テキストデータと判別して音声合成部１１に出力するものである。なお、新規テキストデータと判別された場合には、記憶テキストデータ（新たな記憶テキストデータ）としてテキストデータ記憶部９に記憶される。
【００２７】
また、入力テキストデータと記憶テキストデータとが一致しない場合とは、入力テキストデータを句読点で分割して、これら句読点間の文章の中で異なる部分が少しでもあれば、一致しないとみなすことであり、この場合、一致していない部分のみが音声合成部１１に出力される。
【００２８】
例えば、記憶テキストデータが「今日はいい天気で、過ごしやすい日になるでしょう。」であり、入力テキストデータが「今日はいい天気で、気温は２８度になる見込みです。」であり、記憶テキストデータは「今日はいい天気で、」、「過ごしやすい日になるでしょう。」と分解されてテキストデータ記憶部９に記憶されており、入力テキストデータ「今日はいい天気で、」、「気温は２８度になる見込みです。」と分解され、これらを比較した場合、記憶テキストデータと入力テキストデータとで、異なる部分（新しい部分）、すなわち、「気温は２８度になる見込みです。」が新規テキストデータとして判別され、音声合成部１１に出力され、共通する「今日はいい天気で」という単語は音声合成部１１に出力されない。
【００２９】
テキストデータ記憶部９は、半導体メモリやハードディスク等によって構成されるもので、テキストデータ入力部３とテキストデータ受信部５とで得られたテキストデータを、新規テキストデータ判別部７で判別された結果に基づいて記憶するものである。なお、新規テキストデータ判別部７およびテキストデータ記憶部９が特許請求の範囲の請求項に記載した新規テキストデータ判別手段に相当するものである。
【００３０】
音声合成部１１は、新規テキストデータ判別部７で判別された新規テキストデータを、音声合成用データベース１３を探索して音声合成し、合成音声データを生成するものである。なお、音声合成部１１における音声合成の方法（手段）は、どのようなタイプのものであってもよく、例えば、特開平１０−４９１９３号公報に開示されている手段を利用してもよい。
【００３１】
この音声合成部１１には、新規テキストデータを音声合成する際に使用した音声合成用データと、音声合成用データベース１３の記憶媒体上の記憶位置と関連付けて、合成音声データと共に、合成音声データ記憶出力部１５に出力する参照情報生成出力手段（図示せず）が備えられている。
【００３２】
参照情報は、例えば、「米倉」という単語に対し、「米倉＿名詞＿ファイル２３＿３０ｍｓ〜７０ｍｓ＿ＸＸＸ」といったように記述されるもので、“名詞”は、単語または音素の品詞等に関する情報の一種であり、“ファイル２３”は音声合成用データベース１３中において、「米倉」という名詞が含まれている文章の番号を示すものであり、“３０ｍｓ〜７０ｍｓ”がファイル２３（文章）中で発声されている時間を示すものであり、“ＸＸＸ”が「米倉」という単語の特徴量を示すものである。
【００３３】
音声合成部１１において、前記した特開平１０−４９１９３号公報に開示されている手段を使用した場合、音声合成に時間がかかるのは、音声合成用データベース１３の探索時間であるので、この参照情報生成出力手段（図示せず）によって生成した参照情報を使用すれば、合成音声データ記憶出力部１５には、必ずしも、合成音声データを記憶しておく必要がなくなり、記憶容量を小さくすることができる。また、この音声合成部１１が特許請求の範囲の請求項に記載した音声合成手段に相当するものである。
【００３４】
音声合成用データベース１３は、大容量のハードディスク等によって構成され、音声合成部１１で新規テキストデータを音声合成する際に使用されるもので、単語または音素によって構成される音声合成単位（単語分割候補）毎に、発声時間、特徴量等がまとめられた音声合成用データが記憶されたものである。なお、この音声合成用データベース１３が特許請求の範囲の請求項に記載した音声合成用データ記憶手段に相当するものである。
【００３５】
合成音声データ記憶出力部１５は、いわゆるデータバッファに該当するものであり、音声合成部１１で音声合成された合成音声データを記憶して、テキストデータ選択部１７から出力された選択データに基づいて、記憶した合成音声データを出力するものであり、合成音声データ出力手段１５ａと、合成音声データ記憶手段１５ｂとを備えている。
【００３６】
合成音声データ出力手段１５ａは、合成音声データおよび参照情報の記憶および出力の制御を司るもので、音声合成部１１で音声合成された合成音声データを合成音声データ記憶手段１５ｂに記憶させると共に、テキストデータ選択部１７から出力された選択データに基づいて、この選択データに対応する合成音声データをスピーカ１９に出力するものである。
【００３７】
合成音声データ記憶手段１５ｂは、音声合成部１１で音声合成された合成音声データと、参照情報とを合成音声データ出力手段１５ａの制御に従って、記憶するものである。
また、この合成音声データ記憶出力部１５には、図示を省略した合成音声データ圧縮手段および圧縮合成音声データ解凍手段が備えられている。
【００３８】
合成音声データ圧縮手段（図示せず）は、音声合成部１１で音声合成された合成音声データを合成音声データ記憶手段１５ｂに記憶する際に、当該合成音声データのデータ量を圧縮し、圧縮合成音声データを生成するものである。圧縮合成音声データ解凍手段（図示せず）は、合成音声データ記憶手段１５ｂに記憶した圧縮合成音声データを、テキストデータ選択部１７から出力される選択データに基づいて、出力する（読み出す）際に解凍するものである。
【００３９】
これら合成音声データ圧縮手段（図示せず）および圧縮合成音声データ解凍手段（図示せず）によって、合成音声データ記憶手段１５ｂの記憶容量を少量に抑えることができる。
【００４０】
テキストデータ選択部１７は、音声合成装置１の使用者が操作したリモコン２１から出力された赤外線信号（制御信号）に基づいて、当該音声合成装置１から出力させる音声（合成音声データ）に対応するテキストデータを選択するものである。
【００４１】
この音声合成装置１によれば、テキストデータ入力部３で、テキストデータが入力される。新規テキストデータ判別部７で、テキストデータをテキストデータ記憶部９に記憶した記憶テキストデータと、新たに入力された入力テキストデータとが比較され、入力テキストデータが記憶テキストデータと一致しない場合に、入力テキストデータが新たに入力された新規テキストデータとして判別される。この新規テキストデータ判別部７で新規テキストデータと判別された場合には、音声合成部１１で、音声合成用データベース１３に記憶される音声合成用データが使用されて、新規テキストデータが音声合成され、音声合成データとされる。そして、テキストデータ選択部１７でテキストデータが選択されるまで、音声合成データが合成音声データ記憶出力部１５で保持されて、テキストデータが選択されると、このテキストデータに対応する音声合成データが合成音声データ記憶出力部１５で出力される。このため、一旦、音声合成部１１で音声合成された合成音声データが合成音声データ記憶出力部１５に記憶されており、新規テキストデータ判別部７で判別された新規テキストデータのみが音声合成部１１で音声合成されるので、音声合成する際の無駄な処理（余分な音声合成）が低減され、当該装置１の音声合成処理能力を高水準に維持することができ、高品質な合成音声データを生成することができる。
【００４２】
また、音声合成装置１のテキストデータ受信部５によって、データ放送、文字放送の少なくとも一方によるテキストデータが入力される。つまり、このテキストデータ受信部５が備えられることで、音声合成装置１は、音声合成機能が付属したデータ受信機であるといえ、通常のデータ放送によって放送されており、受信側の表示装置に表示されるテロップ等の文字情報や、文字ニュース等のテキストデータが合成音声データに変換されて（合成されて）、出力される。
【００４３】
すなわち、通常のデータ放送や文字放送等によるテキストデータは、音声合成装置（音声合成機能付属データ受信機）１を使用者（視聴者）が使用している最中に頻繁に入れ替わる可能性が少ない。このため、この音声合成装置１の音声合成部１１で合成済みでない新規テキストデータをテキストデータ受信部５で受信するとすぐに音声合成した合成音声データを生成し、合成音声データ記憶出力部１５の合成音声データ記憶手段１５ｂに記憶しておき（貯えておき）、テキストデータ選択部１７でテキストデータが選択されると、このテキストデータに対応する音声合成済みの合成音声データを合成音声データ記憶出力部１５から出力し、スピーカ１９で読み上げることができる（発声させることができる）。
【００４４】
（音声合成装置の動作）
次に、図２に示すフローチャートを参照して、音声合成装置１の動作を説明する。
まず、この音声合成装置１が起動すると、テキストデータ入力部３、テキストデータ受信部５の少なくとも一方でテキストデータが入力されたかどうかが判断され（Ｓ１）、入力されるまで待機され（Ｓ１、Ｎｏ）、入力された場合、テキストデータ記憶部９に記憶テキストデータとして記憶される（Ｓ２）。
【００４５】
そして、このテキストデータ記憶部９に記憶した記憶テキストデータと、新たに入力された入力テキストデータとが新規テキストデータ判別部７で比較判別され、この比較判別結果に基づいて、新規テキストデータかどうかが判断される（Ｓ３）。新規テキストデータ判別部７で入力テキストデータが新規テキストデータであると判断された場合（Ｓ３、Ｙｅｓ）、音声合成部１１に新規テキストデータが出力される。
【００４６】
すると、音声合成部１１で、音声合成用データベース１３を探索して、新規テキストデータが音声合成され、合成音声データとされる（Ｓ４）。この合成音声データおよび音声合成する際に生成した参照情報が合成音声データ記憶出力部１５に出力される。この合成音声データ記憶出力部１５では、合成音声データおよび参照情報が入力されると、合成音声データ記憶手段１５ｂに当該合成音声データおよび参照情報を記憶する（Ｓ５）。
【００４７】
その後、テキストデータ選択部１７にリモコン２１から（音声合成装置１の使用者から）の音声（合成音声データ）の出力要求（制御信号）があるかどうかが判断される（Ｓ６）。リモコン２１からの音声（合成音声データ）の出力要求（制御信号）があると判断されるまで待機され（Ｓ６、Ｎｏ）、リモコン２１からの音声（合成音声データ）の出力要求（制御信号）があると判断された場合には、合成音声データ記憶出力部１５の合成音声データ出力手段１５ａによって、出力要求（制御信号）に従ったテキストデータに対応する音声（合成音声データ）がスピーカ１９に出力される（Ｓ７）。
【００４８】
（データ放送によるテキストデータを音声合成する具体例について）
音声合成装置１のテキストデータ受信部５でデータ放送によるテキストデータを受信して、音声合成する具体例について説明する（適宜、図１を参照）。テキストデータ受信部５で受信したテキストデータに「ニュース」、「気象情報」、「スポーツ」、「円と株」、「道路交通情報」、「福祉」、「おすすめ情報」が含まれており、このテキストデータが図示を省略した表示装置の表示画面に“メニュー画面”として表示されている。
【００４９】
予め、「ニュース」、「気象情報」、「スポーツ」、「円と株」、「道路交通情報」、「福祉」、「おすすめ情報」が送信側の放送局から送信されてきていたとすると、これら「ニュース」、「気象情報」、「スポーツ」、「円と株」、「道路交通情報」、「福祉」、「おすすめ情報」が記憶テキストデータとして、テキストデータ記憶部９に記憶されている。続いて送信された「ニュース」、「気象情報」、「台風情報」、「スポーツ」、「円と株」、「道路交通情報」、「福祉」、「おすすめ情報」とすると、「台風情報」が新規テキストデータとして新規テキストデータ判別部７で判別され、音声合成部１１で音声合成される。音声合成部１１で音声合成が終了した合成音声データから順に合成音声データ記憶出力部１５に出力され、合成音声データ記憶手段１５ｂに記憶される。
【００５０】
そして、音声合成装置１の使用者がリモコン２１で「円と株」を選択したとすると、この「円と株」を選択した選択データがテキストデータ選択部１７から合成音声データ記憶出力部１５の合成音声データ出力手段１５ａに出力され、この合成音声データ出力手段１５ａで、合成音声データ記憶手段１５ｂに記憶されている「円と株」の合成音声データが読み出され、スピーカ１９から出力される。
【００５１】
以上、一実施形態に基づいて本発明を説明したが、本発明はこれに限定されるものではない。
例えば、音声合成装置１の各構成の処理を一つずつの工程ととらえた音声合成方法とみなすことや、各構成の処理を汎用のコンピュータ言語で記述した音声合成プログラムとみなすことは可能である。これらの場合、音声合成装置１と同様な効果を得ることができる。
【００５２】
【発明の効果】
請求項１、４に記載の発明によれば、テキストデータが入力され、このテキストデータが新たに入力されたものであれば、音声合成され、合成音声データとして記憶される。そして、テキストデータが選択されると、このテキストデータに対応する合成音声データが出力される。新規のテキストデータのみが音声合成されるので、音声合成処理能力が低下することなく、テキストデータを送信した送信側とテキストデータを受信した受信側とで同じ音声合成用データベースを保持する必要がなく、高品質な合成音声データを生成することができる。また、音声合成する際に使用された音声合成用データと、音声合成用データが記憶される記憶媒体上の記憶位置とが関連付けられた参照情報が生成されるので、この参照情報に基づいて、合成音声データを生成することができ、合成音声データを記憶しておく記憶媒体の記憶容量を少量に抑えることができると共に、音声合成する際の処理を軽減することができる。
【００５３】
請求項２に記載の発明によれば、データ放送、文字放送の少なくとも一方によるテキストデータが入力される。つまり、通常のデータ放送によって放送されており、受信側の表示装置に表示されるテロップ等の文字情報や、文字ニュース等のテキストデータを合成音声データとして出力することができる。
【００５４】
請求項３に記載の発明によれば、合成音声データが記憶される場合に、データ量が少なくなるように圧縮され、圧縮された圧縮合成音声データが、読み出される際に解凍されるので、合成音声データを記憶しておく記憶媒体の記憶容量を少量に抑えることができる。
【００５５】
請求項５記載の発明によれば、音声合成する際に使用された音声合成用データと、音声合成用データが記憶される記憶媒体上の記憶位置とが関連付けられた参照情報が生成されるので、この参照情報に基づいて、音声合成用データを生成することができ、合成音声データを記憶しておく記憶媒体の記憶容量を少量に抑えることができると共に、音声合成する際の処理を軽減することができる。
【図面の簡単な説明】
【図１】本発明による一実施の形態である音声合成装置のブロック図である。
【図２】図１に示した音声合成装置の動作を説明したフローチャートである。
【符号の説明】
１音声合成装置
３テキストデータ入力部
５テキストデータ受信部
７新規テキストデータ判別部
９テキストデータ記憶部
１１音声合成部
１３音声合成用データベース
１５合成音声データ記憶出力部
１５ａ合成音声データ出力手段
１５ｂ合成音声データ記憶手段
１７テキストデータ選択部
１９スピーカ
２１リモコン
[0001]
[Technical field of the invention]
The present invention relates to a speech synthesizer and a speech synthesis program that synthesize speech from text data, in particular text data transmitted by data broadcasting or text broadcasting.
[0002]
[Prior art]
Conventionally, as a method of receiving, on the receiving side, text data transmitted from the transmitting side by data broadcasting or text broadcasting and generating high-quality synthesized speech data, a speech synthesis method (Patent Document 1), for example, can be used. In this method (and in a device based on it), the transmitting-side device refers to a speech synthesis database (held in the transmitting-side device) to generate, from the text data input to that device, additional information such as the duration and pitch of the phonemes contained in the text data, and transmits this additional information to the receiving side together with the text data. The receiving-side device then synthesizes speech from the text data by referring to its own speech synthesis database (held in the receiving-side device) on the basis of the received additional information, thereby generating high-quality synthesized speech data.
[0003]
With this method, even if the receiving-side device (speech synthesizer) would otherwise need computationally heavy processing to synthesize speech, the additional information generated on the transmitting side (the broadcasting station side) reduces that load, so the receiving-side device (speech synthesizer) can perform high-quality speech synthesis quickly.
[0004]
[Patent Document 1]
Japanese Patent Application Laid-Open No. 5-210395
[0005]
[Problems to be solved by the invention]
However, in the conventional “speech synthesis method”, the same text data is often transmitted repeatedly from the transmitting side to the receiving side (for example, transmission by the data carousel method in data broadcasting), so the receiving-side device must synthesize speech every time it receives the text data. This makes processing inefficient and, in turn, makes it difficult to maintain high-quality speech synthesis. In addition, the conventional “speech synthesis method” requires the transmitting-side device and the receiving-side device to have the same speech synthesis database, so constant maintenance is needed to keep the databases on both sides identical.
[0006]
Therefore, an object of the present invention is to solve the above problems of the conventional technique and to provide a speech synthesizer and a speech synthesis program that can generate (maintain) high-quality synthesized speech data without requiring the transmitting side and the receiving side to hold the same speech synthesis database.
[0007]
[Means for Solving the Problems]
  In order to achieve the above-described object, the present invention has the following configuration.
The speech synthesizer according to claim 1 is a speech synthesizer that synthesizes speech from text data, comprising: text data input means for inputting the text data; text data storage means for storing the text data as stored text data; new text data discriminating means for comparing the text data input unit by unit by the text data input means, as input text data, with the stored text data and, if the input text data does not match the stored text data, discriminating the input text data as newly input new text data and storing it in the text data storage means as stored text data; speech synthesis data storage means for storing speech synthesis data that is used when the new text data is synthesized and that includes at least word and feature amount data for each speech synthesis unit; speech synthesis means for synthesizing speech from the new text data, using the speech synthesis data stored in the speech synthesis data storage means, to produce synthesized speech data; text data selection means for selecting the per-unit input text data input by the text data input means; synthesized speech data storage means for storing the synthesized speech data synthesized by the speech synthesis means; synthesized speech data output means for outputting the synthesized speech data corresponding to the input text data selected by the text data selection means; and reference information generation and output means for generating reference information that associates the words of the speech synthesis data used in synthesis by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to each word, and for outputting the reference information to the synthesized speech data storage means when the speech synthesis means performs synthesis, wherein the synthesized speech data output means uses the reference information to read from the speech synthesis data storage means the feature amounts corresponding to the words contained in the input text data selected by the text data selection means, and outputs to a speaker the synthesized speech data generated using the read feature amounts.
[0008]
According to this configuration, text data is input by the text data input means. The new text data discriminating means compares the stored text data with the newly input text data and, if the input text data does not match the stored text data, discriminates the input text data as newly input new text data and stores it in the text data storage means as stored text data. The speech synthesis data storage means stores the speech synthesis data, which is used when the new text data is synthesized and includes at least word and feature amount data for each speech synthesis unit. When the new text data discriminating means discriminates text as new text data, the speech synthesis means synthesizes speech from the new text data using the speech synthesis data stored in the speech synthesis data storage means, producing synthesized speech data. The synthesized speech data is held in the synthesized speech data storage means until input text data is selected by the text data selection means; when input text data is selected, the synthesized speech data corresponding to it is output by the synthesized speech data output means. Here, input text data is regarded as not matching the stored text data if any part of the text between punctuation marks differs; that is, the new text data discriminating means regards input text data between punctuation marks as new text data unless it matches the stored text data exactly. In addition, the reference information generation and output means generates reference information that associates the words of the speech synthesis data used in synthesis by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to each word, and outputs it to the synthesized speech data storage means. Using this reference information, the synthesized speech data output means reads from the speech synthesis data storage means the feature amounts corresponding to the words contained in the input text data selected by the text data selection means, and the synthesized speech data generated using the read feature amounts is output to the speaker. The storage position on the storage medium corresponds, for example, to time information attached on the storage medium. The reference information associates a word or phoneme of the speech synthesis data with a storage position on the storage medium.
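The determine, synthesize, store, and select flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `SpeechCache`, the function `fake_synthesize`, and the byte-string "audio" are hypothetical stand-ins that do not appear in the patent, and an exact string match per unit is used as the matching rule.

```python
from typing import Callable, Dict, List


class SpeechCache:
    """Hypothetical sketch: synthesize each new text unit once, then reuse it."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize        # stand-in for the speech synthesis means
        self._audio: Dict[str, bytes] = {}   # stored text unit -> synthesized speech data
        self.synth_calls: List[str] = []     # units that actually triggered synthesis

    def receive(self, unit: str) -> bool:
        """Handle one input unit; return True if it was new and was synthesized."""
        if unit in self._audio:              # matches stored text data: skip synthesis
            return False
        self.synth_calls.append(unit)
        self._audio[unit] = self._synthesize(unit)
        return True

    def select(self, unit: str) -> bytes:
        """Return the stored synthesized speech data for the selected unit."""
        return self._audio[unit]


def fake_synthesize(text: str) -> bytes:
    # Placeholder "audio"; a real synthesizer would return waveform data.
    return ("<audio:" + text + ">").encode("utf-8")


cache = SpeechCache(fake_synthesize)
# The same units arrive repeatedly, as they would with carousel-style transmission.
for unit in ["News", "Weather", "News", "Weather", "Typhoon information"]:
    cache.receive(unit)
```

In this sketch the repeated "News" and "Weather" units never reach the synthesizer a second time; only the three distinct units are synthesized, and `select` simply returns what was stored.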
[0009]
The speech synthesizer according to claim 2 is the speech synthesizer according to claim 1, further comprising text data receiving means for receiving the text data via at least one of data broadcasting and text broadcasting.
[0010]
According to such a configuration, text data is input by the text data receiving means via at least one of data broadcasting and text broadcasting. In other words, with this text data receiving means the speech synthesizer can be regarded as a data receiver with a speech synthesis function: character information such as telops that is broadcast by ordinary data broadcasting and displayed on the receiving-side display device, and text data such as text news, are converted (synthesized) into synthesized speech data and output.
[0011]
The speech synthesizer according to claim 3 is the speech synthesizer according to claim 1 or claim 2, further comprising: synthesized speech data compression means for compressing the data amount of the synthesized speech data into compressed synthesized speech data when the synthesized speech data is stored in the synthesized speech data storage means; and compressed synthesized speech data decompression means for decompressing the compressed synthesized speech data when the input text data selected by the text data selection means corresponds to that compressed synthesized speech data.
[0012]
According to such a configuration, when the synthesized speech data is stored in the synthesized speech data storage means, the synthesized speech data compression means compresses it to reduce the amount of data. The compression method used by the synthesized speech data compression means may be the MPEG-2 method or any other compression method. The compressed synthesized speech data is decompressed by the compressed synthesized speech data decompression means when the corresponding input text data is selected by the text data selection means.
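A minimal sketch of compressing on storage and decompressing on selection is shown below. It assumes Python's standard `zlib` module as the compressor purely for illustration; the patent names MPEG-2 or any other method, and `zlib` is a general-purpose lossless codec rather than an audio codec. The class `CompressedSpeechStore` and the sample data are hypothetical.

```python
import zlib
from typing import Dict


class CompressedSpeechStore:
    """Hypothetical store that compresses on write and decompresses on read."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, text: str, audio: bytes) -> None:
        # Compress before storing (stand-in for the compression means).
        self._blobs[text] = zlib.compress(audio)

    def get(self, text: str) -> bytes:
        # Decompress only when the corresponding text is selected.
        return zlib.decompress(self._blobs[text])

    def stored_size(self, text: str) -> int:
        # Number of bytes actually held in the store for this text.
        return len(self._blobs[text])


store = CompressedSpeechStore()
samples = bytes(16000)  # 16000 zero bytes standing in for audio samples
store.put("Weather", samples)
restored = store.get("Weather")
```

Because the placeholder data is highly repetitive, the stored blob is far smaller than the input; real speech data would compress less, and a lossy audio codec would trade exact recovery for a smaller size.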
[0013]
The speech synthesis program according to claim 4 causes a computer, in order to synthesize speech from text data, to function as: text data input means for inputting the text data; text data storage means for storing the text data as stored text data; new text data discriminating means for comparing the text data input unit by unit by the text data input means, as input text data, with the stored text data and, if the input text data does not match the stored text data, discriminating the input text data as newly input new text data and storing it in the text data storage means as stored text data; speech synthesis data storage means for storing speech synthesis data that is used when the new text data is synthesized and that includes at least word and feature amount data for each speech synthesis unit; speech synthesis means for synthesizing speech from the new text data, using the speech synthesis data stored in the speech synthesis data storage means, to produce synthesized speech data; text data selection means for selecting the per-unit input text data input by the text data input means; synthesized speech data storage means for storing the synthesized speech data synthesized by the speech synthesis means; synthesized speech data output means for outputting the synthesized speech data corresponding to the input text data selected by the text data selection means; and reference information generation and output means for generating reference information that associates the words of the speech synthesis data used in synthesis by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to each word, and for outputting the reference information to the synthesized speech data storage means when the speech synthesis means performs synthesis, wherein the synthesized speech data output means uses the reference information to read from the speech synthesis data storage means the feature amounts corresponding to the words contained in the input text data selected by the text data selection means, and outputs to a speaker the synthesized speech data generated using the read feature amounts.
[0014]
According to this configuration, text data is input by the text data input means. The new text data discriminating means compares the stored text data with the newly input text data and, if the input text data does not match the stored text data, discriminates the input text data as newly input new text data and stores it in the text data storage means as stored text data. The speech synthesis data storage means stores the speech synthesis data, which is used when the new text data is synthesized and includes at least word and feature amount data for each speech synthesis unit. When the new text data discriminating means discriminates text as new text data, the speech synthesis means synthesizes speech from the new text data using the speech synthesis data stored in the speech synthesis data storage means, producing synthesized speech data. The synthesized speech data is held in the synthesized speech data storage means until input text data is selected by the text data selection means; when input text data is selected, the synthesized speech data corresponding to it is output by the synthesized speech data output means. The reference information generation and output means generates reference information that associates the words of the speech synthesis data used in synthesis by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to each word, and outputs it to the synthesized speech data storage means. Using this reference information, the synthesized speech data output means reads from the speech synthesis data storage means the feature amounts corresponding to the words contained in the input text data selected by the text data selection means, and the synthesized speech data generated using the read feature amounts is output to the speaker.
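The role of the reference information can be illustrated with the following sketch: the database is searched once at synthesis time, the storage position of each word is recorded, and at output time the feature amounts are read directly from those positions. The list-based database, the function names, and the placeholder feature strings are assumptions made for illustration; the patent does not specify a data layout.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (word, feature amount); both are placeholders


def search(database: List[Entry], word: str) -> int:
    """Slow path: scan the database and return the word's storage position."""
    for position, (candidate, _feature) in enumerate(database):
        if candidate == word:
            return position
    raise KeyError(word)


def build_reference(database: List[Entry], words: List[str]) -> Dict[str, int]:
    """At synthesis time: record where each used word is stored."""
    return {word: search(database, word) for word in words}


def read_features(database: List[Entry], reference: Dict[str, int],
                  words: List[str]) -> List[str]:
    """At output time: read feature amounts directly from recorded positions."""
    return [database[reference[word]][1] for word in words]


database = [("weather", "F0"), ("typhoon", "F1"), ("information", "F2")]
words = ["typhoon", "information"]
reference = build_reference(database, words)
features = read_features(database, reference, words)
```

Here `reference` plays the part of the reference information: once it exists, producing output for the same words needs no further search of the database.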
[0021]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
(Configuration of speech synthesizer)
FIG. 1 is a block diagram of the speech synthesizer. As shown in FIG. 1, the speech synthesizer 1 includes a text data input unit 3, a text data receiving unit 5, a new text data determination unit 7, a text data storage unit 9, a speech synthesis unit 11, a speech synthesis database 13, a synthesized speech data storage/output unit 15, and a text data selection unit 17. The speech synthesizer 1 is connected to a speaker 19 that outputs the synthesized speech data, and is provided with a remote controller 21 for operating the speech synthesizer 1.
[0022]
The speech synthesizer 1 synthesizes speech from text data transmitted by data broadcasting, text broadcasting, or the like, and from text data input via a keyboard (not shown) or the like (text data input unit by unit), and outputs the result. Unlike a conventional speech synthesizer (a data receiver with a speech synthesis function), the speech synthesizer 1 does not need additional information to be transmitted from the transmitting side together with the text data: when it synthesizes speech from text data using the speech synthesis data in the speech synthesis database 13, it generates reference information (described in detail later), which improves processing speed. It also stores synthesized speech data once it has been synthesized and reuses it, so that high-quality synthesized speech can be output (maintained) without a drop in the processing capability of the device 1. Text data input unit by unit refers to a unit input to the device 1 at one time, or a unit that can be grouped together, for example a phrase or a sentence.
[0023]
The text data input unit 3 is composed of a keyboard, a mouse, or the like, and is used to input text data. The text data input unit 3 may instead be an OCR device that can acquire text data from existing handwritten text, a disk drive that handles disks on which text data is stored, or an input terminal that receives text data from outside via a communication network or the like. The text data input unit 3 corresponds to the text data input means described in the claims.
[0024]
The text data receiving unit 5 is composed of an antenna (parabolic antenna) capable of receiving data broadcasting, text broadcasting, and the like, a detection circuit, and so on. It detects the text data contained in the various digital data carried on the data channels (data broadcasting, text broadcasting, etc.) of current analog television broadcasting and high-definition satellite broadcasting transmitted from a broadcasting station. The text data receiving unit 5 corresponds to the text data receiving means described in the claims.
[0025]
In this embodiment, the text data input unit 3 and the text data receiving unit 5 are provided in advance with a function for setting the input unit (input unit setting function). This function can set, as the unit, what is input to the device 1 at one time (sentences separated by reception time) or a unit that can be grouped together, for example a phrase or a sentence.
[0026]
The new text data determination unit 7 stores the text data input from the text data input unit 3 and the text data receiving unit 5 in the text data storage unit 9 as stored text data, and compares the stored text data in the text data storage unit 9 with subsequently input text data (input text data). If the input text data does not match the stored text data, it determines the input text data to be new text data and outputs it to the speech synthesis unit 11. Text determined to be new text data is also stored in the text data storage unit 9 as stored text data (new stored text data).
[0027]
Whether input text data matches the stored text data is judged as follows: the input text data is divided at punctuation marks, and if the text between punctuation marks differs in any way, it is regarded as not matching. In this case, only the non-matching part is output to the speech synthesis unit 11.
[0028]
For example, suppose the stored text data is “Today is good weather, it will be a comfortable day.” and the input text data is “Today is good weather, the temperature is expected to be 28 degrees.” The stored text data is decomposed and held in the text data storage unit 9 as “Today is good weather” and “it will be a comfortable day”, and the input text data is decomposed into “Today is good weather” and “the temperature is expected to be 28 degrees”. When these are compared, the part that differs between the stored text data and the input text data (the new part), namely “the temperature is expected to be 28 degrees”, is determined to be new text data and output to the speech synthesis unit 11, while the common passage “Today is good weather” is not output to the speech synthesis unit 11.
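The comparison described in paragraphs [0027] and [0028] can be sketched roughly as follows. This is a minimal illustration and not the implementation of the embodiment: the function names `split_clauses` and `find_new_clauses`, the set of delimiter characters, and the use of a Python set as the stored text data are all assumptions made for the example.

```python
import re

# Characters treated as punctuation marks; the exact set is an assumption.
_PUNCTUATION = r"[,.、。]"


def split_clauses(text: str) -> list[str]:
    """Split text at punctuation marks and drop empty fragments."""
    return [part.strip() for part in re.split(_PUNCTUATION, text) if part.strip()]


def find_new_clauses(input_text: str, stored: set[str]) -> list[str]:
    """Return the clauses of input_text that are not yet stored, and store them."""
    new_clauses = []
    for clause in split_clauses(input_text):
        if clause not in stored:
            new_clauses.append(clause)  # would be passed on for speech synthesis
            stored.add(clause)          # becomes stored text data
    return new_clauses


stored_text: set[str] = set()
first = find_new_clauses(
    "Today is good weather, it will be a comfortable day.", stored_text
)
second = find_new_clauses(
    "Today is good weather, the temperature is expected to be 28 degrees.", stored_text
)
print(first)   # both clauses are new on the first input
print(second)  # only the temperature clause is new on the second input
```

On the second call only the clause about the temperature is returned, which mirrors the behaviour described for the common passage “Today is good weather”.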
[0029]
The text data storage unit 9 is constituted by a semiconductor memory, a hard disk, or the like, and stores the text data obtained by the text data input unit 3 and the text data receiving unit 5 based on the determination made by the new text data determination unit 7. The new text data determination unit 7 and the text data storage unit 9 correspond to the new text data determination means described in the claims.
[0030]
The speech synthesis unit 11 searches the speech synthesis database 13 in order to synthesize the new text data determined by the new text data determination unit 7, and generates synthesized speech data. The speech synthesis method (means) used in the speech synthesis unit 11 may be of any type; for example, the means disclosed in Japanese Patent Laid-Open No. 10-49193 may be used.
[0031]
The speech synthesis unit 11 includes reference information generating/outputting means (not shown). This means generates reference information that associates the speech synthesis data used when the new text data is synthesized with the storage position of that data on the storage medium of the speech synthesis database 13, and outputs the reference information together with the synthesized speech data to the synthesized speech data storage/output unit 15.
[0032]
The reference information is, for example, “Yonekura_noun_file 23_30 ms to 70 ms_XXX” for the word “Yonekura”. Here, “noun” is a kind of information on the part of speech or the like of a word or phoneme, “file 23” indicates the number of the sentence containing the noun “Yonekura” in the speech synthesis database 13, “30 ms to 70 ms” indicates the time at which the word is uttered within file 23 (the sentence), and “XXX” indicates the feature amount of the word “Yonekura”.
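As a rough illustration of how such a reference record could be held in a program, the sketch below parses one record into named fields. The exact textual layout used here (`word_pos_fileN_START-ENDms_feature`), the class name `ReferenceInfo`, and the function `parse_reference` are assumptions for the example; the description gives only the example record and the meaning of its fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceInfo:
    word: str            # word or phrase, e.g. "Yonekura"
    part_of_speech: str  # e.g. "noun"
    file_number: int     # number of the sentence (file) in the database
    start_ms: int        # start of the utterance within that file
    end_ms: int          # end of the utterance within that file
    feature: str         # feature amount, written "XXX" in the description


def parse_reference(record: str) -> ReferenceInfo:
    """Parse a record of the assumed form 'word_pos_fileN_START-ENDms_feature'."""
    word, pos, file_field, span, feature = record.split("_")
    start, end = span.removesuffix("ms").split("-")
    return ReferenceInfo(
        word, pos, int(file_field.removeprefix("file")), int(start), int(end), feature
    )


info = parse_reference("Yonekura_noun_file23_30-70ms_XXX")
print(info.file_number, info.start_ms, info.end_ms)
```

Keeping the file number and time span as separate integer fields makes the record directly usable as a storage position when the feature amount is read back later.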
[0033]
When the means disclosed in Japanese Patent Laid-Open No. 10-49193 is used in the speech synthesis unit 11, searching the speech synthesis database 13 takes a long time during speech synthesis. If the reference information generated by the reference information generating/outputting means (not shown) is used, however, the synthesized speech data storage/output unit 15 does not necessarily have to store the synthesized speech data itself, and the storage capacity can be reduced. The speech synthesis unit 11 corresponds to the speech synthesis means described in the claims.
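The trade-off in this paragraph, keeping reference information instead of the synthesized waveform, can be sketched as a direct lookup by storage position. Everything in the example is hypothetical: the dictionary `database` merely stands in for the speech synthesis database 13, the keys `(file number, start ms, end ms)` are an assumed form of the storage position, and the feature values are placeholders.

```python
# Hypothetical stand-in for the speech synthesis database 13:
# maps a storage position (file number, start ms, end ms) to a feature amount.
database = {
    (23, 30, 70): "feat-yonekura",
    (41, 120, 180): "feat-news",
}

# Reference information kept per text item in place of the waveform itself.
references = {
    "Yonekura news": [("Yonekura", (23, 30, 70)), ("news", (41, 120, 180))],
}


def features_for(text: str) -> list[str]:
    """Read feature amounts directly by storage position, without searching."""
    return [database[position] for _word, position in references[text]]


print(features_for("Yonekura news"))
```

Only the small reference list is stored per text item; the feature amounts are read from the database at output time, so the costly search is done once, at the first synthesis.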
[0034]
The speech synthesis database 13 is composed of a large-capacity hard disk or the like and is used when the speech synthesis unit 11 synthesizes new text data. It stores speech synthesis data in which the utterance time, feature amount, and the like are collected for each speech synthesis unit (a word or a phoneme). The speech synthesis database 13 corresponds to the speech synthesis data storage means described in the claims.
[0035]
The synthesized speech data storage/output unit 15 corresponds to a so-called data buffer. It stores the synthesized speech data synthesized by the speech synthesis unit 11 and outputs the stored synthesized speech data based on the selection data output from the text data selection unit 17. It comprises synthesized speech data output means 15a and synthesized speech data storage means 15b.
[0036]
The synthesized speech data output means 15a is responsible for storage and output control of the synthesized speech data and the reference information. It stores the synthesized speech data synthesized by the speech synthesis unit 11 in the synthesized speech data storage means 15b and, based on the selection data output from the text data selection unit 17, outputs the synthesized speech data corresponding to the selection data to the speaker 19.
[0037]
The synthesized speech data storage means 15b stores the synthesized speech data synthesized by the speech synthesis unit 11 and the reference information under the control of the synthesized speech data output means 15a.
The synthesized speech data storage/output unit 15 also includes synthesized speech data compression means and compressed synthesized speech data decompression means (neither shown).
[0038]
When the synthesized speech data synthesized by the speech synthesis unit 11 is stored in the synthesized speech data storage means 15b, the synthesized speech data compression means (not shown) compresses the data amount of the synthesized speech data to generate compressed synthesized speech data. When the compressed synthesized speech data stored in the synthesized speech data storage means 15b is output (read out) based on the selection data output from the text data selection unit 17, the compressed synthesized speech data decompression means (not shown) decompresses it.
[0039]
By these synthesized voice data compression means (not shown) and compressed synthesized voice data decompression means (not shown), the storage capacity of the synthesized voice data storage means 15b can be suppressed to a small amount.
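A compress-on-store, decompress-on-read buffer of this kind might look like the sketch below. The description does not name a compression scheme; Python's standard `zlib` module is used here purely as a stand-in, and the class name `CompressedAudioStore` and the placeholder audio are assumptions.

```python
import zlib


class CompressedAudioStore:
    """Keeps synthesized speech data compressed and decompresses it on read."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def put(self, text: str, audio: bytes) -> None:
        # Compress when storing (role of the compression means).
        self._store[text] = zlib.compress(audio)

    def get(self, text: str) -> bytes:
        # Decompress when the text is selected (role of the decompression means).
        return zlib.decompress(self._store[text])

    def stored_size(self, text: str) -> int:
        return len(self._store[text])


store = CompressedAudioStore()
pcm = bytes(16000)  # placeholder audio: 16000 zero bytes
store.put("yen and stock", pcm)
restored = store.get("yen and stock")
print(store.stored_size("yen and stock"), "bytes stored for", len(pcm), "bytes of audio")
```

A lossless general-purpose codec is chosen only so that the round trip is easy to verify; a real receiver would more plausibly use an audio codec.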
[0040]
The text data selection unit 17 selects the text data corresponding to the speech (synthesized speech data) to be output from the speech synthesizer 1, based on the infrared signal (control signal) output from the remote controller 21 operated by the user of the speech synthesizer 1.
[0041]
According to the speech synthesizer 1, text data is input by the text data input unit 3. The new text data discrimination unit 7 compares the stored text data held in the text data storage unit 9 with the newly input text data, and when the input text data does not match the stored text data, the input text data is determined to be newly input new text data. When the new text data discrimination unit 7 determines that the text data is new, the speech synthesis unit 11 synthesizes the new text data into synthesized speech data using the speech synthesis data stored in the speech synthesis database 13. The synthesized speech data is held in the synthesized speech data storage/output unit 15 until the text data selection unit 17 selects text data, and when text data is selected, the synthesized speech data corresponding to that text data is output from the synthesized speech data storage/output unit 15. Because the synthesized speech data is stored in the synthesized speech data storage/output unit 15 and only the new text data determined by the new text data discrimination unit 7 is synthesized by the speech synthesis unit 11, wasteful processing (redundant speech synthesis) is reduced, the speech synthesis processing capability of the device 1 can be kept high, and high-quality synthesized speech data can be generated.
[0042]
The text data receiving unit 5 of the speech synthesizer 1 receives text data by at least one of data broadcasting and text broadcasting. In other words, by providing the text data receiving unit 5, the speech synthesizer 1 can be regarded as a data receiver with a speech synthesis function: text information such as telops and text data such as text news, which are broadcast by ordinary data broadcasting and displayed on the display device on the receiving side, are converted (synthesized) into synthesized speech data and output.
[0043]
That is, text data delivered by ordinary data broadcasting, text broadcasting, or the like is unlikely to be replaced frequently while the user (viewer) is using the speech synthesizer (data receiver with speech synthesis function) 1. Therefore, as soon as new text data that has not yet been synthesized by the speech synthesis unit 11 is received by the text data receiving unit 5, synthesized speech data is generated and held (stored) in the synthesized speech data storage means 15b of the synthesized speech data storage/output unit 15. When the text data is then selected by the text data selection unit 17, the synthesized speech data corresponding to that text data is output from the synthesized speech data storage/output unit 15 and can be read aloud (uttered) through the speaker 19.
[0044]
(Operation of speech synthesizer)
Next, the operation of the speech synthesizer 1 will be described with reference to the flowchart shown in FIG. 2.
First, when the speech synthesizer 1 is activated, it determines whether text data has been input by at least one of the text data input unit 3 and the text data receiving unit 5 (S1), and waits until text data is input (S1, No). When text data is input, it is stored as stored text data in the text data storage unit 9 (S2).
[0045]
Then, the new text data discriminating unit 7 compares the stored text data held in the text data storage unit 9 with the newly input text data, and determines from the comparison result whether the input text data is new text data (S3). When the new text data discriminating unit 7 determines that the input text data is new text data (S3, Yes), the new text data is output to the speech synthesis unit 11.
[0046]
Then, the speech synthesizer 11 searches the speech synthesis database 13 and synthesizes new text data as speech data (S4). The synthesized voice data and the reference information generated at the time of voice synthesis are output to the synthesized voice data storage output unit 15. When the synthesized voice data and the reference information are input, the synthesized voice data storage / output unit 15 stores the synthesized voice data and the reference information in the synthesized voice data storage unit 15b (S5).
[0047]
Thereafter, the text data selection unit 17 determines whether there is an output request (control signal) for speech (synthesized speech data) from the remote controller 21 (that is, from the user of the speech synthesizer 1) (S6). It waits until it determines that such an output request has been made (S6, No). When it determines that an output request has been made, the synthesized speech data output means 15a of the synthesized speech data storage/output unit 15 outputs the speech (synthesized speech data) corresponding to the text data specified by the output request (control signal) to the speaker 19 (S7).
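The flow of steps S1 to S7 can be summarized in a short sketch. This is a simplified illustration under several assumptions: the class name `SpeechCache`, its methods, and the placeholder synthesizer `fake_synthesize` are hypothetical, whole text units are compared rather than punctuation-delimited parts, and the comparison (S3) is performed before the unit is stored so that the example stays compact.

```python
from typing import Callable


class SpeechCache:
    """Synthesize each text unit once (S3-S5) and replay it on request (S6-S7)."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        self._stored_text: set[str] = set()  # role of text data storage unit 9
        self._audio: dict[str, bytes] = {}   # role of storage means 15b
        self.synthesis_count = 0

    def on_text_input(self, text: str) -> bool:
        """Handle one input unit; return True if it was new and was synthesized."""
        if text in self._stored_text:        # S3, No: already synthesized
            return False
        self._stored_text.add(text)          # keep as stored text data
        self._audio[text] = self._synthesize(text)  # S4 and S5
        self.synthesis_count += 1
        return True

    def on_select(self, text: str) -> bytes:
        """S6-S7: return the stored speech data for the selected text."""
        return self._audio[text]


def fake_synthesize(text: str) -> bytes:
    # Placeholder for a real synthesizer: just encodes the text.
    return text.encode("utf-8")


cache = SpeechCache(fake_synthesize)
for item in ["news", "weather information", "news", "typhoon information", "news"]:
    cache.on_text_input(item)
print(cache.synthesis_count)  # 3: repeated "news" items are not re-synthesized
print(cache.on_select("typhoon information"))
```

Repeated transmissions of the same item, as in carousel-style data broadcasting, cost only a set lookup after the first synthesis.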
[0048]
(Specific examples of speech synthesis of text data by data broadcasting)
A specific example in which the text data receiving unit 5 of the speech synthesizer 1 receives text data by data broadcasting and the text data is synthesized into speech will now be described (see FIG. 1 as appropriate). The text data received by the text data receiving unit 5 consists of “news”, “weather information”, “sports”, “yen and stock”, “road traffic information”, “welfare”, and “recommended information”, and this text data is displayed as a “menu screen” on the display screen of a display device (not shown).
[0049]
Assume that “news”, “weather information”, “sports”, “yen and stock”, “road traffic information”, “welfare”, and “recommended information” are first transmitted from the broadcasting station on the sending side. These items are stored in the text data storage unit 9 as stored text data. If “news”, “weather information”, “typhoon information”, “sports”, “yen and stock”, “road traffic information”, “welfare”, and “recommended information” are transmitted subsequently, “typhoon information” is determined to be new text data by the new text data determination unit 7 and is synthesized by the speech synthesis unit 11. The synthesized speech data produced by the speech synthesis unit 11 is output in order to the synthesized speech data storage/output unit 15 and stored in the synthesized speech data storage means 15b.
[0050]
If the user of the speech synthesizer 1 selects “yen and stock” with the remote controller 21, selection data for selecting “yen and stock” is output from the text data selection unit 17 to the synthesized speech data output means 15a of the synthesized speech data storage/output unit 15. The synthesized speech data output means 15a reads the synthesized speech data for “yen and stock” stored in the synthesized speech data storage means 15b and outputs it from the speaker 19.
[0051]
The present invention has been described above based on one embodiment, but the present invention is not limited to this embodiment.
For example, the processing of each component of the speech synthesizer 1 may be regarded as a speech synthesis method in which each process is one step, or as a speech synthesis program written in a general-purpose computer language. In these cases, the same effects as those of the speech synthesizer 1 can be obtained.
[0052]
[Effect of the invention]
According to the invention described in claims 1 and 4, text data is input, and if this text data is newly input, it is synthesized and stored as synthesized speech data. When text data is selected, the synthesized speech data corresponding to that text data is output. Because only new text data is synthesized, high-quality synthesized speech data can be generated without reducing the speech synthesis processing capability and without the need to maintain the same speech synthesis database on the sender that transmitted the text data and the receiver that received it. In addition, reference information is generated that associates the speech synthesis data used for speech synthesis with the storage position on the storage medium in which that data is stored. Synthesized speech data can be generated based on this reference information, so the storage capacity of the storage medium for storing the synthesized speech data can be kept small and the processing required for speech synthesis can be reduced.
[0053]
According to the invention described in claim 2, text data by at least one of data broadcasting and text broadcasting is input. That is, text information such as telops and text data such as text news, which are broadcast by ordinary data broadcasting and displayed on the display device on the receiving side, can be output as synthesized speech data.
[0054]
According to the invention described in claim 3, the synthesized speech data is compressed to reduce its data amount when it is stored, and the compressed synthesized speech data is decompressed when it is read out, so the storage capacity of the storage medium that holds the synthesized speech data can be kept small.
[0055]
According to the fifth aspect of the present invention, reference information is generated that associates the speech synthesis data used for speech synthesis with the storage position on the storage medium in which that data is stored. Based on this reference information, synthesized speech data can be generated, the storage capacity of the storage medium for storing the synthesized speech data can be kept small, and the processing required at the time of speech synthesis can be reduced.
[Brief description of the drawings]
FIG. 1 is a block diagram of a speech synthesizer according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the speech synthesizer shown in FIG. 1;
[Explanation of symbols]
1 Speech synthesizer
3 Text data input section
5 Text data receiver
7 New text data discriminator
9 Text data storage
11 Speech synthesis unit
13 Database for speech synthesis
15 Synthetic voice data storage / output unit
15a Synthetic voice data output means
15b Synthetic voice data storage means
17 Text data selection part
19 Speaker
21 Remote control

Claims

A speech synthesizer for speech synthesis of text data,
Text data input means for inputting the text data;
Text data storage means for storing the text data as stored text data;
New text data determining means for comparing the text data input for each unit by the text data input means, as input text data, with the stored text data and, when the input text data does not match the stored text data, determining the input text data to be newly input new text data and storing it in the text data storage means as stored text data;
Speech synthesis data storage means for storing speech synthesis data that is used when the new text data is synthesized and that includes at least word and feature amount data for each speech synthesis unit;
Speech synthesis means for synthesizing the new text data into synthesized speech data using the speech synthesis data stored in the speech synthesis data storage means;
Text data selection means for selecting input text data for each unit input by the text data input means;
Synthesized voice data storage means for storing synthesized voice data synthesized by the voice synthesis means;
Synthesized speech data output means for outputting synthesized speech data corresponding to the input text data selected by the text data selection means;
Reference information generating/outputting means for generating reference information that associates the words of the speech synthesis data used when speech synthesis is performed by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to the respective words, and for outputting the reference information to the synthesized speech data storage means when speech synthesis is performed by the speech synthesis means,
A speech synthesizer characterized in that the synthesized speech data output means reads the feature amounts corresponding to the words contained in the input text data selected by the text data selection means from the speech synthesis data storage means using the reference information, and outputs synthesized speech data generated using the read feature amounts to a speaker.

The speech synthesizer according to claim 1, further comprising text data receiving means for receiving the text data by at least one of data broadcasting and text broadcasting.

Synthesized speech data compression means for compressing the data amount of the synthesized speech data into compressed synthesized speech data when the synthesized speech data is stored in the synthesized speech data storage means; and
Compressed synthesized speech data decompression means for decompressing the compressed synthesized speech data when the input text data corresponding to the compressed synthesized speech data is selected by the text data selection means; the speech synthesizer according to claim 1 or 2, characterized by comprising these means.

In order to synthesize text data,
Text data input means for inputting the text data;
Text data storage means for storing the text data as stored text data;
New text data determining means for comparing the text data input for each unit by the text data input means, as input text data, with the stored text data and, when the input text data does not match the stored text data, determining the input text data to be newly input new text data and storing it in the text data storage means as stored text data;
Speech synthesis data storage means for storing speech synthesis data that is used when the new text data is synthesized and that includes at least word and feature amount data for each speech synthesis unit;
Speech synthesis means for synthesizing the new text data into synthesized speech data using the speech synthesis data stored in the speech synthesis data storage means;
Text data selection means for selecting input text data for each unit input by the text data input means;
Synthesized voice data storage means for storing synthesized voice data synthesized by the voice synthesis means;
Synthesized voice data output means for outputting synthesized voice data corresponding to the input text data selected by the text data selection means;
Reference information generating/outputting means for generating reference information that associates the words of the speech synthesis data used when speech synthesis is performed by the speech synthesis means with the storage positions, on the storage medium of the speech synthesis data storage means, of the feature amounts corresponding to the respective words, and for outputting the reference information to the synthesized speech data storage means when speech synthesis is performed by the speech synthesis means,
A speech synthesis program characterized in that the synthesized speech data output means reads the feature amounts corresponding to the words contained in the input text data selected by the text data selection means from the speech synthesis data storage means using the reference information, and outputs synthesized speech data generated using the read feature amounts to a speaker.