JP2004037605A

JP2004037605A - Data reduction method for voice synthesis, data reduction apparatus for voice synthesis, and data reduction program for voice synthesis

Info

Publication number: JP2004037605A
Application number: JP2002191819A
Authority: JP
Inventors: Hiroyuki Segi; 世木　寛之
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2002-07-01
Filing date: 2002-07-01
Publication date: 2004-02-05
Anticipated expiration: 2022-07-01
Also published as: JP4206230B2

Abstract

【課題】音声データベースが保持している単位音声を明確にすると共に、音声合成する際の処理速度を向上させるために音声データベースを小容量に保持することができる音声合成用データ削減方法、装置およびプログラムを提供する。
【解決手段】音声合成を行う際に使用される、音素および単語の発話時間が記録された音声データベース中の音声合成用データを削減する音声合成用データ削減装置１であって、音声合成を行う際に使用した音声合成用データの使用頻度を記録する合成用データ使用頻度記録部５と、予め設定した使用頻度閾値よりも使用頻度が低い音声合成用データを削減する合成用データ削減部７と、を備えた。
【選択図】　図１
A speech synthesis data reduction method, apparatus, and program are provided that clarify which unit speech a speech database holds and keep the speech database small in order to improve processing speed during speech synthesis.
A speech synthesis data reduction device 1 reduces speech synthesis data in a speech database, used for speech synthesis, in which the utterance times of phonemes and words are recorded. The device includes a synthesis data usage frequency recording unit 5 that records the usage frequency of the speech synthesis data used when speech synthesis is performed, and a synthesis data reduction unit 7 that reduces speech synthesis data whose usage frequency is lower than a preset usage frequency threshold.
[Selection diagram] Fig. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、音声合成に使用される音声合成用データの削減に関する。
【０００２】
【従来の技術】
従来、音声合成の方法（装置）には、例えば、次に示すものがある。
（１）音声合成方法（特開平２−４７７００号公報）
この公報で公開されている音声合成装置には、単位音声（音素または単語）の発話時間が記録された音声データからなる音声データベースが備えられており、この音声合成装置は、当該装置に入力された文章を単位音声に分解した後、分解した単位音声毎に音声データベースの探索を実行し、得られた音声合成用データに対し、音韻および音律の補正を実行して音声合成を行うものである。
【０００３】
（２）自然発話音声波形信号接続型音声合成装置（特開平１０−４９１９３号公報）
この公報で公開されている音声合成装置には、音素の発話時間が記録された音声データベースが備えられており、この音声合成装置は、当該装置に入力された文章を音素列に分解した後、分解した音素列の音素単位で音声データベースの探索を実行し音声合成を行うものである。
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の音声合成装置では、どのような単位音声（音素または単語）を保持している音声データベースなのかが明示されていなかった。また、いずれの音声合成装置も音声データベースのデータ量が多くなると、音声合成の候補となる音声合成用データ数が増大し、探索時間が膨大となり、処理速度が低下してしまうという問題がある。
【０００５】
そこで、本発明の目的は前記した従来の技術が有する課題を解消し、音声データベースが保持している単位音声を明確にすると共に、音声合成する際の処理速度を向上させるために音声データベースを小容量に保持することができる音声合成用データ削減方法、音声合成用データ削減装置および音声合成用データ削減プログラムを提供することにある。
【０００６】
【課題を解決するための手段】
本発明は、前記した目的を達成するため、以下に示す構成とした。
請求項１記載の音声合成用データ削減方法は、音声合成を行う際に使用される、音素および単語の発話時間が記録された音声データベース中における使用頻度の低い、音素および単語からなる音声合成用データを削減する音声合成用データ削減方法であって、音声合成を行う際に使用した音声合成用データの使用頻度を記録する使用頻度記録ステップと、予め設定した使用頻度閾値よりも使用頻度が低い音声合成用データを削減する音声合成用データ削減ステップと、を含むことを特徴とする。
【０００７】
この方法によれば、使用頻度記録ステップにおいて、音声合成装置等で音声合成が行われる度に、当該装置等に内在している音声データベース中の音声合成用データの使用頻度が記録される。そして、音声合成用データ削減ステップにおいて、予め設定した使用頻度閾値よりも使用頻度の低い音声合成用データが削減される。なお、合成音声用データは、音素および単語からなるもので、音声合成する際の最小単位である。また、予め設定した使用頻度閾値は、任意に設定可能な数値であり、例えば、この使用頻度閾値を音声合成装置の使用回数に基づいて想定すると、「使用回数５０回　使用頻度閾値１」、つまり、５０回音声合成を実行しても一度も使用しない音声合成用データを削減の対象に設定することができる。また、この音声合成用データ削減方法は、音声合成装置における音声合成する方法を問わず、当該装置等に少なくとも音声データベースが存在していれば適用可能なものである。
【０００８】
請求項２記載の音声合成用データ削減方法は、請求項１に記載の音声合成用データ削減方法において、前記音声データベースが、当該音声データベース中に含まれる音声合成用データからなる文章の集合として構成されており、前記使用頻度記録ステップにおいて、前記音声データベース中の文章をそれ以外の全ての文章に含まれる音声合成用データに基づいて音声合成を実行した際に、使用した音声合成用データの使用頻度を記録することを特徴とする。
【０００９】
この方法によれば、音声データベースには音声合成用データからなる複数の文章が記憶されており、これらの文章それぞれについて、使用頻度記録ステップにて、音声データベース中に記憶されているそれ以外の文章に含まれる音声合成用データを使用して音声合成した場合の音声合成用データの使用頻度が記録され、音声合成用データ削減ステップにて、予め設定した使用頻度閾値より使用頻度が低い音声合成用データが削減される。
【００１０】
請求項３記載の音声合成用データ削減装置は、音声合成を行う際に使用される、音素および単語の発話時間が記録された音声データベース中における使用頻度の低い、音素および単語からなる音声合成用データを削減する音声合成用データ削減装置であって、音声合成を行う際に使用した音声合成用データの使用頻度を記録する使用頻度記録手段と、予め設定した使用頻度閾値よりも使用頻度が低い音声合成用データを削減する音声合成用データ削減手段と、を備えることを特徴とする。
【００１１】
かかる構成によれば、使用頻度記録手段で、音声合成装置等で音声合成が行われる度に、当該装置等に内在している音声データベース中の音声合成用データの使用頻度が記録される。そして、音声合成用データ削減手段で、予め設定した使用頻度閾値よりも使用頻度の低い音声合成用データが削減される。
【００１２】
請求項４記載の音声合成用データ削減装置は、請求項３に記載の音声合成用データ削減装置において、前記音声データベースが、当該音声データベース中に含まれる音声合成用データからなる文章の集合として構成されており、前記使用頻度記録手段で、前記音声データベース中の文章をそれ以外の全ての文章に含まれる音声合成用データに基づいて音声合成を実行した際に、使用した音声合成用データの使用頻度を記録することを特徴とする。
【００１３】
かかる構成によれば、音声データベースには音声合成用データからなる複数の文章が記憶されており、これらの文章それぞれについて、使用頻度記録手段で、音声データベース中に記憶されているそれ以外の文章に含まれる音声合成用データを使用して音声合成した場合の音声合成用データの使用頻度が記録され、音声合成用データ削減手段で、予め設定した使用頻度閾値より使用頻度が低い音声合成用データが削減される。
【００１４】
請求項５記載の音声合成用データ削減プログラムは、音声合成を行う際に使用される、音素および単語の発話時間が記録された音声データベース中における使用頻度の低い、音素および単語からなる音声合成用データを削減する装置を、以下に示す手段として機能させることを特徴とする。当該装置を機能させる手段は、音声合成を行う際に使用した音声合成用データの使用頻度を記録する使用頻度記録手段、予め設定した使用頻度閾値よりも使用頻度が低い音声合成用データを削減する音声合成用データ削減手段、である。
【００１５】
かかる構成によれば、使用頻度記録手段で、音声合成装置等で音声合成が行われる度に、当該装置等に内在している音声データベース中の音声合成用データの使用頻度が記録される。そして、音声合成用データ削減手段で、予め設定した使用頻度閾値よりも使用頻度の低い音声合成用データが削減される。
【００１６】
請求項６記載の音声合成用データ削減プログラムは、請求項５に記載の音声合成用データ削減プログラムにおいて、前記音声データベースが、当該音声データベース中に含まれる音声合成用データからなる文章の集合として構成されており、前記使用頻度記録手段で、前記音声データベース中の文章をそれ以外の全ての文章に含まれる音声合成用データに基づいて音声合成を実行した際に、使用した音声合成用データの使用頻度を記録することを特徴とする。
【００１７】
かかる構成によれば、音声データベースには音声合成用データからなる複数の文章が記憶されており、これらの文章それぞれについて、使用頻度記録手段で、音声データベース中に記憶されているそれ以外の文章に含まれる音声合成用データを使用して音声合成した場合の音声合成用データの使用頻度が記録され、音声合成用データ削減手段で、予め設定した使用頻度閾値より使用頻度が低い音声合成用データが削減される。
【００１８】
【発明の実施の形態】
以下、本発明の一実施の形態について、図面を参照して詳細に説明する。
（音声合成用データ削減装置の構成）
図１は、音声合成用データ削減システムのブロック図である。この図１に示すように、音声合成用データ削減システムは、音声合成用データ削減装置１と音声合成装置２とからなり、音声合成用データ削減装置１は、入出力部３と、合成用データ使用頻度記録部５と、合成用データ削減部７と、記録部９とを備えている。
【００１９】
音声合成装置２は、音声データベース４を備えており、入力されたテキストデータから音声データ列（音声合成結果）を出力するものである。この音声合成装置２の音声データベース４で保持されている単位音声（音声合成用データ）は、「単語」を基盤としており、この実施の形態では、複数の単語からなる「文章」がデータベースの構成単位となっている。そして、各文章には「文番号」が付されており、各単語の発話時間が記憶されている。なお、当該音声合成装置２の他の構成について、例えば、音声合成の仕方については、本発明と直接関係がないので、図示及び説明を省略する。
【００２０】
音声合成用データ削減装置１は、音声合成装置２において音声合成される度に利用される音声データベース４のデータ量、すなわち、音声合成用データ（単語）を削減するためのもので、この音声合成用データ削減装置１によって音声データベース４のデータ量を減少させることで、音声合成する際の音声合成候補（単語の候補）の探索時間を大幅に短縮させることができるものである。
【００２１】
入出力部３は、音声合成装置２と情報（後記する合成用データ使用頻度情報、合成用データ削減情報）を送受信するためのものである。なお、この入出力部３は、インターネット等の通信回線網（図示を省略）を介して情報の送受信が行えるように構成されている。
【００２２】
合成用データ使用頻度記録部５は、音声合成装置２の音声データベース４で音声合成する度に使用された単位音声（音声合成用データ）の使用頻度に関する情報である合成用データ使用頻度情報を記録部９に記録するものである。つまり、この合成用データ使用頻度記録部５は、音声合成装置２で音声合成された場合に、合成用データ使用頻度情報を音声合成装置２から取得するものであるといえる。
【００２３】
この合成用データ使用頻度記録部５では、例えば、音声合成装置２において音声データベース削減用のテストセット（頻繁に音声合成されるテキストデータ）を用意しておき、このテストセットの文章（テキストデータ）が入力された際の音声合成した結果（合成用データ使用頻度情報）が記録される。この実施の形態では、音声合成装置２の音声データベース４の構成単位が文章であるので、この合成用データ使用頻度情報は、どの文章のどの単語が使用されたかが記録されている。
【００２４】
また、この合成用データ使用頻度記録部５は、音声合成装置２の音声データベース４中の音声合成用データが文章単位で記録されている場合には、一つの文章をその文章以外の他の文章で音声合成した場合の音声合成用データの使用頻度に関する情報である合成用データ使用頻度情報が記録部９に記録される。この合成用データ使用頻度記録部５が請求項に記載した音声合成用データ使用頻度記録手段に相当するものである。
【００２５】
合成用データ削減部７は、合成用データ使用頻度記録部５で記録部９に記録された合成用データ使用頻度情報を使用頻度閾値（合計使用頻度閾値）と比較して、この使用頻度閾値（合計使用頻度閾値）よりも小さい場合に、この合成用データ使用頻度情報に含まれている音声合成用データを削減するための情報である合成用データ削減情報を、入出力部３を介して音声合成装置２に出力するものである。この合成用データ削減情報を受信した音声合成装置２では、音声データベース４中の該当する音声合成用データが削減される。この合成用データ削減部７が請求項に記載した音声合成用データ削減手段に相当するものである。
【００２６】
なお、使用頻度閾値は、予め任意に設定可能な数値であり、例えば、この使用頻度閾値を音声合成装置の使用回数に基づいて想定すると、「使用回数５０　使用頻度閾値１」、つまり、５０回音声合成を実行しても一度も使用しない音声合成用データを削減の対象に設定することができる。
【００２７】
また、合計使用頻度閾値は、音声合成装置２の音声データベース４中の合成用データが文章単位で記録されている場合に、一つの文章をその文章以外の他の文章に含まれる音声合成用データで音声合成した場合の音声合成用データの使用頻度に関する情報である合成用データ使用頻度情報と比較するためのものである。
【００２８】
記録部９は、一般的なハードディスク等によって構成されており、合成用データ使用頻度情報と、使用頻度閾値（合計使用頻度閾値）とを記録するものである。なお、合成用データ使用頻度情報は、この記録部９において、各音声合成用データに設定されているデータ使用頻度に数値を加算していく形式（１回使用されれば＋１）で更新される。
【００２９】
この音声合成用データ削減装置１によれば、合成用データ使用頻度記録部５で、音声合成装置２で音声合成が行われる度に、当該装置２に内在している音声データベース４中の音声合成用データの使用頻度が記録される。そして、合成用データ削減部７で、予め設定した使用頻度閾値よりも使用頻度の低い音声合成用データが削減される。このため、音声合成装置２の音声データベース４で保持されている単位音声（音声合成用データ）が単語である場合に、音声合成する際に利用される音声データベース４中で、使用頻度の低い音声合成用データ（単語）を適宜削除することで、音声データベース４をコンパクトに（小容量に）維持することができ、音声合成装置２で音声合成する際の処理速度を向上させることができる。
【００３０】
音声合成装置２の音声データベース４には音声合成用データ（単語）からなる複数の文章が記憶されており、これらの文章それぞれについて、合成用データ使用頻度記録部５で、音声データベース４中に記憶されているそれ以外の文章を使用して音声合成した場合の音声合成用データの使用頻度が記録され、合成用データ削減部７で、予め設定した合計使用頻度閾値より使用頻度が低い音声合成用データが削減される。このため、音声データベース４の一つの文章中の音声合成用データ（単語）が他の文章を使用して音声合成することで使用頻度が記録され、使用頻度が低い場合には削除されるので、自動的に（自己学習的に）音声データベース４のデータ量を小容量に維持することができ、音声合成装置２で音声合成する際の処理速度を向上させることができる。
【００３１】
（音声合成用データ削減装置の動作）
次に、図２に示すフローチャートを参照して音声合成用データ削減装置１の動作を説明する（適宜図１参照）。
まず、入出力部３で合成用データ使用頻度情報が入力される（Ｓ１）。この音声合成用データ削減装置１の動作の説明では、一定期間、音声合成装置２において音声合成が実行されて、音声合成装置２で保持され合成用データ使用頻度情報が定期的に音声合成用データ削減装置１に入力されるものとしている。そして、入出力部３から合成用データ使用頻度記録部５に合成用データ使用頻度情報が出力されると、合成用データ使用頻度記録部５で、合成用データ使用頻度情報が記録部９に記録される（Ｓ２）。
【００３２】
そして、合成用データ削減部７で、合成用データ使用頻度情報と使用頻度閾値とが比較され、合成用データ使用頻度情報（図２中、使用頻度）が使用頻度閾値未満であるかどうかが判断される（Ｓ３）。合成用データ使用頻度情報が使用頻度閾値未満であると判断された場合（Ｓ３、Ｙｅｓ）には、入出力部３に合成用データ削減情報が出力され、入出力部３から音声合成装置２に出力される（Ｓ４）。その後、音声合成用データ削減装置１の動作が終了される。合成用データ使用頻度情報が使用頻度閾値未満であると判断されない場合（Ｓ３、Ｎｏ）は、そのまま音声合成用データ削減装置１の動作が終了される。
【００３３】
（音声合成用データ削減装置の具体的な音声合成用データ削減例）
次に、具体的に、音声合成用データ削減システムで、音声合成用データの削減を実行（運用）した場合について説明する。
実際の運用例では、音声合成装置２に、テキストデータの文章として「音声データベース削減用テストセット」を入力し、音声合成を行う。この「音声データベース削減用テストセット」は、報道番組等で頻繁に音声合成されるテキストデータの中から無作為抽出した数十種類の文章（テキストデータ）である。つまり、この「音声データベース削減用テストセット」が入力された際に使用される音声データベース４中の音声合成用データ（単語）は、通常の日本語を音声合成する際にも、頻繁に使用される可能性が高いものである。
【００３４】
まず、音声合成装置２で「音声データベース削減用テストセット」が入力されると、音声合成が実行され、音声合成用データ削減装置１の合成用データ使用頻度記録部５で、音声合成装置２の音声データベース４中のどの文章のどの単語が使用されたかが、合成用データ使用頻度情報によって得られ、記録部９に記録される。
【００３５】
例えば、「音声データベース削減用テストセット」として、「〈文頭〉次のニュースです〈文末〉」が入力されたときに、
「〈文頭〉次の」・・・文番号５０の０ｍｓ〜１０００ｍｓ
「ニュース」・・・文番号８の２１２５ｍｓ〜２８４０ｍｓ
「です〈文末〉」・・・文番号３２の１５００ｍｓ〜２０００ｍｓ
が使用されたとする（合成用データ使用頻度情報に含まれたデータとする）。
【００３６】
この時に、合成用データ使用頻度記録部５では、“文番号５０の「〈文頭〉」、「次」”、“文番号８の「ニュース」”、“文番号３２の「です〈文末〉」”のデータ使用頻度（記録部９）にそれぞれ１をプラスする。以降、次のテストセット（文章）が入力される毎に、合成用データ使用頻度記録部５では、使用された単語のデータ使用頻度に＋１していく。全てのテストセットが終了したらデータ使用頻度が低いもの、すなわち、使用頻度閾値に満たないものが合成用データ削減部７によって削減される。
【００３７】
この具体的な例によれば、「音声データベース削減用テストセット」によって生成される合成用データ使用頻度情報に基づいて、音声合成用データ削減装置１で音声合成用データが削減された音声データベース４は、音声データベースのバリエーション、すなわち、合成される日本語のバリエーションの幅が狭まるが、データ量がコンパクト（小容量）に抑えられると共に、音声合成する際の処理速度を向上させることができる。
【００３８】
以上、一実施形態に基づいて本発明を説明したが、本発明はこれに限定されるものではない。
例えば、音声合成用データ削減装置１の各構成の処理を一つずつの工程と捉えた音声合成用データ削減方法とみなすことや、各構成の処理を汎用的なコンピュータ言語で記述した音声合成用データ削減プログラムとみなすことができる。これらの場合、音声合成用データ削減装置１と同様の効果を得ることができる。
【００３９】
【発明の効果】
請求項１、３、５記載の発明によれば、音声合成装置等で音声合成が行われる度に、当該装置等に内在している音声データベース中の音声合成用データの使用頻度が記録され、予め設定した使用頻度閾値よりも使用頻度の低い音声合成用データが削減される。このため、音声合成装置等の音声データベースで保持されている単位音声が単語である場合に、音声合成する際に利用される音声データベース中で、使用頻度の低い音声合成用データを適宜削除することで、音声データベースを小容量に維持することができ、音声合成装置で音声合成する際の処理速度を向上させることができる。
【００４０】
請求項２、４、６記載の発明によれば、音声データベースには音声合成用データ（単語）からなる複数の文章が記憶されている場合、これらの文章それぞれについて、音声データベース中に記憶されているそれ以外の文章を使用して音声合成した場合の音声合成用データの使用頻度が記録され、予め設定した合計使用頻度閾値より使用頻度が低い音声合成用データが削減される。このため、音声データベースの一つの文章中の音声合成用データが他の文章を使用して音声合成することで使用頻度が記録され、使用頻度が低い場合には削除されるので、自動的に（自己学習的に）音声データベースのデータ量を小容量に維持することができ、音声合成装置で音声合成する際の処理速度を向上させることができる。
【図面の簡単な説明】
【図１】本発明による一実施の形態である音声合成用データ削減システムを図示したブロック図である。
【図２】図１に示した音声合成用データ削減装置の動作を説明したフローチャートである。
【符号の説明】
１　音声合成用データ削減装置
２　音声合成装置
３　入出力部
４　音声データベース
５　合成用データ使用頻度記録部
７　合成用データ削減部
９　記録部

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to reduction of data for speech synthesis used for speech synthesis.
[0002]
[Prior art]
Conventional speech synthesis methods (devices) include, for example, the following.
(1) Speech synthesis method (JP-A-2-47700)
The speech synthesizer disclosed in this publication has a speech database consisting of speech data in which the utterance times of unit speech (phonemes or words) are recorded. It decomposes an input sentence into unit speech, searches the speech database for each decomposed unit, and applies phonological and prosodic correction to the retrieved speech synthesis data to synthesize speech.
[0003]
(2) Spontaneous speech waveform signal connection type speech synthesizer (JP-A-10-49193)
The speech synthesizer disclosed in this publication has a speech database in which the utterance times of phonemes are recorded. It decomposes an input sentence into a phoneme sequence and then searches the speech database for each phoneme of that sequence to synthesize speech.
[0004]
[Problems to be solved by the invention]
However, the conventional speech synthesizer does not specify what unit speech (phoneme or word) the speech database holds. In addition, in any of the speech synthesizers, when the data amount of the speech database is large, the number of speech synthesis data that is candidates for speech synthesis increases, the search time becomes enormous, and the processing speed decreases.
[0005]
Therefore, an object of the present invention is to solve the above problems of the conventional technology by providing a speech synthesis data reduction method, a speech synthesis data reduction device, and a speech synthesis data reduction program that clarify the unit speech held by the speech database and keep the speech database small in order to improve the processing speed of speech synthesis.
[0006]
[Means for Solving the Problems]
The present invention has the following configuration to achieve the above object.
The speech synthesis data reduction method according to claim 1 reduces infrequently used speech synthesis data, consisting of phonemes and words, in a speech database that is used for speech synthesis and in which the utterance times of phonemes and words are recorded. The method comprises a usage frequency recording step of recording the usage frequency of the speech synthesis data used when speech synthesis is performed, and a speech synthesis data reduction step of reducing the speech synthesis data whose usage frequency is lower than a preset usage frequency threshold.
[0007]
According to this method, in the usage frequency recording step, each time speech synthesis is performed by a speech synthesizer or the like, the usage frequency of the speech synthesis data in the speech database contained in that device is recorded. Then, in the speech synthesis data reduction step, the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold is reduced. The speech synthesis data consists of phonemes and words and is the minimum unit for speech synthesis. The preset usage frequency threshold is a numerical value that can be set arbitrarily. For example, if the threshold is chosen with the number of uses of the speech synthesizer in mind, setting "number of uses: 50, usage frequency threshold: 1" makes speech synthesis data that is never used in 50 synthesis runs a target for reduction. This speech synthesis data reduction method is applicable regardless of how the speech synthesizer synthesizes speech, as long as the device has at least a speech database.
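The two steps described above can be illustrated with a minimal Python sketch. The function names, the dictionary-based database, and the clip labels are assumptions made for illustration; the patent does not prescribe any particular data layout.

```python
from collections import Counter

def record_usage(counts, used_units):
    """Usage frequency recording step: add 1 for each unit used in one synthesis run."""
    counts.update(used_units)

def reduce_database(database, counts, threshold):
    """Reduction step: keep only units whose usage count reaches the threshold."""
    # Counter returns 0 for units that were never used.
    return {unit: data for unit, data in database.items() if counts[unit] >= threshold}

# Hypothetical database: unit name -> stored speech data (placeholder labels).
database = {"next": "clip-a", "news": "clip-b", "weather": "clip-c"}

counts = Counter()
runs = [["next", "news"], ["news"], ["next", "news"]]  # units used per synthesis run
for used in runs:
    record_usage(counts, used)

# With threshold 1, units that were never used ("weather") are removed.
reduced = reduce_database(database, counts, threshold=1)
```

With a threshold of 1, a unit is dropped only if it was never used during the observed runs, which mirrors the "50 uses, threshold 1" setting in the text.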
[0008]
The speech synthesis data reduction method according to claim 2 is the method according to claim 1, wherein the speech database is configured as a set of sentences composed of the speech synthesis data contained in the speech database, and wherein, in the usage frequency recording step, the usage frequency of the speech synthesis data used is recorded when each sentence in the speech database is synthesized from the speech synthesis data contained in all the other sentences.
[0009]
According to this method, the speech database stores a plurality of sentences composed of speech synthesis data. For each of these sentences, the usage frequency recording step records the usage frequency of the speech synthesis data when the sentence is synthesized using the speech synthesis data contained in the other sentences stored in the speech database, and the speech synthesis data reduction step reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold.
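The sentence-by-sentence procedure can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the synthesizer chooses among candidate words, so the rule used here (take the first match in the lowest-numbered other sentence) is hypothetical, as are the function names and the toy sentences.

```python
from collections import Counter

def leave_one_out_counts(sentences):
    """Count how often each (sentence number, word index) unit is used when
    every sentence is rebuilt from words found in the other sentences."""
    counts = Counter()
    for target_no, target_words in sentences.items():
        for word in target_words:
            # Hypothetical selection rule: first match in the lowest-numbered other sentence.
            for other_no in sorted(sentences):
                if other_no == target_no:
                    continue
                if word in sentences[other_no]:
                    counts[(other_no, sentences[other_no].index(word))] += 1
                    break
    return counts

def reduce_sentences(sentences, counts, total_threshold):
    """Drop every word whose usage count is below the total usage frequency threshold."""
    return {
        no: [w for i, w in enumerate(words) if counts[(no, i)] >= total_threshold]
        for no, words in sentences.items()
    }

# Toy database: sentence number -> words of that sentence.
sentences = {1: ["next", "news"], 2: ["news", "ends"], 3: ["next", "weather"]}
counts = leave_one_out_counts(sentences)
reduced = reduce_sentences(sentences, counts, total_threshold=1)
```

Words that no other sentence ever needs ("ends" and "weather" in this toy data) receive a count of zero and are removed.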
[0010]
The speech synthesis data reduction device according to claim 3 reduces infrequently used speech synthesis data, consisting of phonemes and words, in a speech database that is used for speech synthesis and in which the utterance times of phonemes and words are recorded. The device comprises usage frequency recording means for recording the usage frequency of the speech synthesis data used when speech synthesis is performed, and speech synthesis data reduction means for reducing the speech synthesis data whose usage frequency is lower than a preset usage frequency threshold.
[0011]
According to this configuration, the usage frequency recording means records the usage frequency of the speech synthesis data in the speech database contained in a speech synthesizer or the like each time that device performs speech synthesis. The speech synthesis data reduction means then reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold.
[0012]
The speech synthesis data reduction device according to claim 4 is the device according to claim 3, wherein the speech database is configured as a set of sentences composed of the speech synthesis data contained in the speech database, and wherein the usage frequency recording means records the usage frequency of the speech synthesis data used when each sentence in the speech database is synthesized from the speech synthesis data contained in all the other sentences.
[0013]
According to this configuration, the speech database stores a plurality of sentences composed of speech synthesis data. For each of these sentences, the usage frequency recording means records the usage frequency of the speech synthesis data when the sentence is synthesized using the speech synthesis data contained in the other sentences stored in the speech database, and the speech synthesis data reduction means reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold.
[0014]
The speech synthesis data reduction program according to claim 5 causes a device that reduces infrequently used speech synthesis data, consisting of phonemes and words, in a speech database that is used for speech synthesis and in which the utterance times of phonemes and words are recorded, to function as the following means: usage frequency recording means for recording the usage frequency of the speech synthesis data used when speech synthesis is performed, and speech synthesis data reduction means for reducing the speech synthesis data whose usage frequency is lower than a preset usage frequency threshold.
[0015]
According to this configuration, the usage frequency recording means records the usage frequency of the speech synthesis data in the speech database contained in a speech synthesizer or the like each time that device performs speech synthesis. The speech synthesis data reduction means then reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold.
[0016]
The speech synthesis data reduction program according to claim 6 is the program according to claim 5, wherein the speech database is configured as a set of sentences composed of the speech synthesis data contained in the speech database, and wherein the usage frequency recording means records the usage frequency of the speech synthesis data used when each sentence in the speech database is synthesized from the speech synthesis data contained in all the other sentences.
[0017]
According to this configuration, the speech database stores a plurality of sentences composed of speech synthesis data. For each of these sentences, the usage frequency recording means records the usage frequency of the speech synthesis data when the sentence is synthesized using the speech synthesis data contained in the other sentences stored in the speech database, and the speech synthesis data reduction means reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold.
[0018]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
(Configuration of data reduction device for speech synthesis)
FIG. 1 is a block diagram of a speech synthesis data reduction system. As shown in FIG. 1, the system consists of a speech synthesis data reduction device 1 and a speech synthesizer 2. The speech synthesis data reduction device 1 includes an input/output unit 3, a synthesis data usage frequency recording unit 5, a synthesis data reduction unit 7, and a recording unit 9.
[0019]
The speech synthesizer 2 includes a speech database 4 and outputs a speech data sequence (speech synthesis result) from input text data. The unit speech (speech synthesis data) held in the speech database 4 is based on "words"; in this embodiment, a "sentence" consisting of a plurality of words is the structural unit of the database. Each sentence is assigned a "sentence number", and the utterance time of each word is stored. Other aspects of the speech synthesizer 2, such as the way speech is synthesized, are not directly related to the present invention, so their illustration and description are omitted.
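One possible in-memory layout for such a database is sketched below. The class name, field names, and dictionary structure are assumptions made for illustration; the sentence numbers and time ranges are borrowed from the worked example given later in the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordUnit:
    """One word of a recorded sentence, located by its utterance time."""
    sentence_no: int  # "sentence number" assigned to the recorded sentence
    word: str
    start_ms: int     # where the word starts within the recording
    end_ms: int       # where the word ends within the recording

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms

# Hypothetical database: sentence number -> words found in that sentence.
database = {
    8: [WordUnit(8, "news", 2125, 2840)],
    32: [WordUnit(32, "is <sentence end>", 1500, 2000)],
}

news = database[8][0]
```

Keying each unit by sentence number and time range makes it possible to say exactly which word of which sentence was used, which is what the usage frequency information needs to record.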
[0020]
The speech synthesis data reduction device 1 is used to reduce the data amount of the speech database 4 used each time speech synthesis is performed in the speech synthesis device 2, that is, speech synthesis data (words). By reducing the data amount of the speech database 4 by the data reduction device 1, the search time for speech synthesis candidates (word candidates) at the time of speech synthesis can be significantly reduced.
[0021]
The input / output unit 3 is for transmitting and receiving information (synthesis data use frequency information and synthesis data reduction information described later) to and from the speech synthesizer 2. The input / output unit 3 is configured to transmit and receive information via a communication network such as the Internet (not shown).
[0022]
The synthesis data usage frequency recording unit 5 records, in the recording unit 9, synthesis data usage frequency information, which is information on the usage frequency of the unit speech (speech synthesis data) used each time speech is synthesized from the speech database 4 of the speech synthesizer 2. In other words, the synthesis data usage frequency recording unit 5 acquires the synthesis data usage frequency information from the speech synthesizer 2 whenever the speech synthesizer 2 synthesizes speech.
[0023]
For example, a test set for speech database reduction (text data that is frequently synthesized) is prepared in the speech synthesizer 2, and the synthesis data usage frequency recording unit 5 records the result of speech synthesis (synthesis data usage frequency information) obtained when the sentences (text data) of this test set are input. In this embodiment, since the structural unit of the speech database 4 is a sentence, the synthesis data usage frequency information records which word of which sentence was used.
[0024]
When the speech synthesis data in the speech database 4 of the speech synthesizer 2 is recorded in units of sentences, the synthesis data usage frequency recording unit 5 also records, in the recording unit 9, synthesis data usage frequency information on the usage frequency of the speech synthesis data when one sentence is synthesized from the sentences other than that sentence. The synthesis data usage frequency recording unit 5 corresponds to the speech synthesis data usage frequency recording means described in the claims.
[0025]
The synthesis data reduction unit 7 compares the synthesis data usage frequency information recorded in the recording unit 9 by the synthesis data usage frequency recording unit 5 with the usage frequency threshold (total usage frequency threshold). If the usage frequency is smaller than the usage frequency threshold (total usage frequency threshold), the unit outputs synthesis data reduction information, which is information for reducing the speech synthesis data covered by that synthesis data usage frequency information, to the speech synthesizer 2 via the input/output unit 3. On receiving the synthesis data reduction information, the speech synthesizer 2 reduces the corresponding speech synthesis data in the speech database 4. The synthesis data reduction unit 7 corresponds to the speech synthesis data reduction means described in the claims.
[0026]
The usage frequency threshold is a numerical value that can be set arbitrarily in advance. For example, if the threshold is chosen with the number of uses of the speech synthesizer in mind, setting "number of uses: 50, usage frequency threshold: 1" makes speech synthesis data that is never used in 50 synthesis runs a target for reduction.
[0027]
The total usage frequency threshold is used when the synthesis data in the speech database 4 of the speech synthesizer 2 is recorded in units of sentences. It is compared with the synthesis data usage frequency information obtained when one sentence is synthesized from the speech synthesis data contained in the sentences other than that sentence.
[0028]
The recording unit 9 consists of a general hard disk or the like and records the synthesis data usage frequency information and the usage frequency threshold (total usage frequency threshold). The synthesis data usage frequency information is updated in the recording unit 9 by adding to the data usage frequency set for each piece of speech synthesis data (+1 for each use).
[0029]
According to the speech synthesis data reduction device 1, the synthesis data usage frequency recording unit 5 records the usage frequency of the speech synthesis data in the speech database 4 contained in the speech synthesizer 2 each time the speech synthesizer 2 performs speech synthesis. The synthesis data reduction unit 7 then reduces the speech synthesis data whose usage frequency is lower than the preset usage frequency threshold. Therefore, when the unit speech (speech synthesis data) held in the speech database 4 is a word, appropriately deleting infrequently used speech synthesis data (words) from the speech database 4 used for speech synthesis keeps the speech database 4 compact (small in capacity) and improves the processing speed of speech synthesis in the speech synthesizer 2.
[0030]
The speech database 4 of the speech synthesizer 2 stores a plurality of sentences composed of speech synthesis data (words). For each of these sentences, the synthesis data usage frequency recording unit 5 records the usage frequency of the speech synthesis data when the sentence is synthesized using the other sentences stored in the speech database 4, and the synthesis data reduction unit 7 reduces the speech synthesis data whose usage frequency is lower than the preset total usage frequency threshold. Thus the usage frequency of the speech synthesis data (words) in the speech database 4 is recorded by synthesizing each sentence from the other sentences, and data with a low usage frequency is deleted, so the data amount of the speech database 4 can be kept small automatically (in a self-learning manner) and the processing speed of speech synthesis in the speech synthesizer 2 can be improved.
[0031]
(Operation of the data reduction device for speech synthesis)
Next, the operation of the data reduction device 1 for speech synthesis will be described with reference to the flowchart shown in FIG. 2 (see FIG. 1 as appropriate).
First, synthesis data usage frequency information is input to the input/output unit 3 (S1). This description assumes that speech synthesis is executed in the speech synthesizer 2 for a certain period and that the synthesis data usage frequency information held in the speech synthesizer 2 is periodically input to the speech synthesis data reduction device 1. When the synthesis data usage frequency information is output from the input/output unit 3 to the synthesis data usage frequency recording unit 5, the synthesis data usage frequency recording unit 5 records it in the recording unit 9 (S2).
[0032]
Then, the synthesis data reduction unit 7 compares the synthesis data usage frequency information with the usage frequency threshold and determines whether the synthesis data usage frequency information ("usage frequency" in FIG. 2) is less than the usage frequency threshold (S3). If it is (S3, Yes), synthesis data reduction information is output to the input/output unit 3 and from there to the speech synthesizer 2 (S4), after which the operation of the speech synthesis data reduction device 1 ends. If it is not (S3, No), the operation of the speech synthesis data reduction device 1 ends without further action.
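Steps S1 to S4 of the flowchart might be sketched as a single function, as below. The function name, the dictionary standing in for the recording unit 9, and the assumption that the synthesizer reports every unit (including units with a count of zero) are all hypothetical.

```python
def run_reduction_cycle(incoming, store, threshold):
    """One pass through S1 to S4 for a batch of usage frequency information.

    incoming:  unit -> number of uses reported by the synthesizer (S1)
    store:     unit -> accumulated usage count, standing in for recording unit 9
    threshold: usage frequency threshold
    Returns the reduction information: the units to delete (empty if none).
    """
    # S2: record the reported counts by adding them to the stored counts.
    for unit, uses in incoming.items():
        store[unit] = store.get(unit, 0) + uses
    # S3: compare each stored count with the threshold.
    below = sorted(unit for unit, uses in store.items() if uses < threshold)
    # S4: the caller forwards this list to the synthesizer; an empty list means "S3, No".
    return below

store = {}
report = {(50, "next"): 3, (8, "news"): 5, (12, "rain"): 0}  # (sentence number, word)
reduction_info = run_reduction_cycle(report, store, threshold=1)
```

Returning the list of units rather than deleting them directly reflects the division of roles in the text: the reduction device only outputs reduction information, and the speech synthesizer performs the deletion in its own database.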
[0033]
(Specific example of speech synthesis data reduction by speech synthesis data reduction device)
Next, a case in which the data for speech synthesis is reduced (operated) by the data reduction system for speech synthesis will be specifically described.
In an actual operation example, a "speech database reduction test set" is input to the speech synthesizer 2 as text data, and speech synthesis is performed. This test set consists of several dozen kinds of sentences (text data) randomly extracted from text data that is frequently synthesized in news programs and the like. In other words, the speech synthesis data (words) in the speech database 4 that is used when the test set is input is also likely to be used frequently when ordinary Japanese is synthesized.
[0034]
First, when the "speech database reduction test set" is input to the speech synthesizer 2, speech synthesis is executed, and the synthesis data usage frequency recording unit 5 of the speech synthesis data reduction device 1 obtains, from the synthesis data usage frequency information, which word of which sentence in the speech database 4 of the speech synthesizer 2 was used and records it in the recording unit 9.
[0035]
For example, suppose that "<sentence start> Next is the news <sentence end>" (「〈文頭〉次のニュースです〈文末〉」) is input as part of the "speech database reduction test set" and that the following segments are used (that is, they are included in the synthesis data usage frequency information):
"<sentence start> next" ... 0 ms to 1000 ms of sentence number 50
"news" ... 2125 ms to 2840 ms of sentence number 8
"is <sentence end>" ... 1500 ms to 2000 ms of sentence number 32
[0036]
At this time, the synthesis data usage frequency recording unit 5 adds 1 to the data usage frequency (in the recording unit 9) of each of "<sentence start>" and "next" in sentence number 50, "news" in sentence number 8, and "is <sentence end>" in sentence number 32. Thereafter, each time the next test set sentence is input, the synthesis data usage frequency recording unit 5 adds 1 to the data usage frequency of the words used. When all the test sets have been processed, the data with a low usage frequency, that is, the data that falls short of the usage frequency threshold, is reduced by the synthesis data reduction unit 7.
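The tally for this example can be traced with a short sketch. The tuple layout and the function name are assumptions; the sentence numbers, words, and time ranges are those of the example, with the words given in English.

```python
from collections import Counter

def record_selected(usage, selected):
    """Add 1 to the data usage frequency of every word in the selected segments."""
    for sentence_no, words, _start_ms, _end_ms in selected:
        for word in words:
            usage[(sentence_no, word)] += 1

# Segments chosen for the test sentence: (sentence number, words, start ms, end ms).
selected = [
    (50, ["<sentence start>", "next"], 0, 1000),
    (8, ["news"], 2125, 2840),
    (32, ["is <sentence end>"], 1500, 2000),
]

usage = Counter()
record_selected(usage, selected)
```

After this one test sentence, four entries each hold a count of 1. Repeating the call for every sentence in the test set accumulates the counts that are finally compared with the usage frequency threshold.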
[0037]
According to this specific example, in the speech database 4 whose speech synthesis data has been reduced by the speech synthesis data reduction device 1 based on the synthesis data use frequency information generated from the "speech database reduction test set", the range of variation of the speech database, that is, the variation of Japanese that can be synthesized, is narrowed. In exchange, the data amount can be kept small and the processing speed of speech synthesis can be improved.
[0038]
As described above, the present invention has been described based on one embodiment, but the present invention is not limited to this.
For example, the processing of each component of the speech synthesis data reduction device 1 may be regarded as a single process and treated as a speech synthesis data reduction method, or the processing of each component may be described in a general-purpose computer language and treated as a speech synthesis data reduction program. In these cases, the same effect as that of the speech synthesis data reduction device 1 can be obtained.
[0039]
[Effect of the invention]
According to the first, third, and fifth aspects of the present invention, each time speech synthesis is performed by a speech synthesizer or the like, the use frequency of the speech synthesis data in the speech database built into the speech synthesizer or the like is recorded, and speech synthesis data whose use frequency is lower than a preset use frequency threshold is reduced. Consequently, when the unit speech held in the speech database is a word, infrequently used speech synthesis data is appropriately deleted from the speech database used for speech synthesis, so that the speech database can be kept small and the processing speed of speech synthesis by the speech synthesizer can be improved.
[0040]
According to the second, fourth, and sixth aspects of the present invention, when a plurality of sentences composed of speech synthesis data (words) are stored in the speech database, each of these sentences is synthesized using the other sentences, the use frequency of the speech synthesis data used in doing so is recorded, and speech synthesis data whose use frequency is lower than a preset total use frequency threshold is reduced. That is, a sentence in the speech database is synthesized from the other sentences, the use frequency of the data used is recorded, and data with a low use frequency is deleted. The data amount of the speech database can thus be kept small through self-learning, and the processing speed of speech synthesis by the speech synthesizer can be improved.
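The self-learning procedure of the second, fourth, and sixth aspects can be sketched as follows. This is a rough sketch under stated assumptions: the database layout (sentence number mapped to a list of word units with start and end times), the function names, and the threshold value are hypothetical, and `select_unit` is a simple first-match stand-in for the unit search of the speech synthesizer, whose actual search is not reproduced here.

```python
from collections import Counter

# Hypothetical database: sentence number -> list of (word, start_ms, end_ms).
database = {
    1: [("next", 0, 400), ("news", 400, 900)],
    2: [("next", 0, 380), ("weather", 380, 1000)],
    3: [("news", 0, 500), ("sports", 500, 1100)],
}


def select_unit(word, database, exclude):
    """Stand-in for the synthesizer's search: first match outside `exclude`."""
    for sid in sorted(database):
        if sid == exclude:
            continue
        for w, start, end in database[sid]:
            if w == word:
                return (sid, start, end)
    return None  # the word cannot be synthesized from the other sentences


def leave_one_out_counts(database):
    """Synthesize each sentence from all the others and count the units used."""
    counts = Counter()
    for sid, units in database.items():
        for word, _, _ in units:
            unit = select_unit(word, database, exclude=sid)
            if unit is not None:
                counts[unit] += 1
    return counts


def reduce_database(database, counts, threshold):
    """Drop units whose use frequency is lower than the threshold."""
    return {
        sid: [(w, s, e) for w, s, e in units if counts[(sid, s, e)] >= threshold]
        for sid, units in database.items()
    }


counts = leave_one_out_counts(database)
reduced = reduce_database(database, counts, threshold=1)
```

In this toy database, "weather" and "sports" each occur in only one sentence, so they are never used when the other sentences are synthesized and are removed at a threshold of 1.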
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a data reduction system for speech synthesis according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the operation of the speech synthesis data reduction device shown in FIG. 1.
[Explanation of symbols]
1 speech synthesis data reduction device
2 speech synthesizer
3 input/output unit
4 speech database
5 synthesis data usage frequency recording unit
7 synthesis data reduction unit
9 recording unit

Claims

A speech synthesis data reduction method for reducing infrequently used speech synthesis data, composed of phonemes and words, in a speech database in which the speech times of phonemes and words used for speech synthesis are recorded, the method comprising:
a use frequency recording step of recording the use frequency of the speech synthesis data used when performing speech synthesis; and
a speech synthesis data reduction step of reducing speech synthesis data whose use frequency is lower than a preset use frequency threshold.

The speech synthesis data reduction method according to claim 1, wherein
the speech database is configured as a set of sentences composed of the speech synthesis data included in the speech database, and
in the use frequency recording step, when one sentence in the speech database is subjected to speech synthesis based on the speech synthesis data included in all the other sentences, the use frequency of the speech synthesis data used is recorded.

A speech synthesis data reduction device for reducing infrequently used speech synthesis data, composed of phonemes and words, in a speech database in which the speech times of phonemes and words used for speech synthesis are recorded, the device comprising:
use frequency recording means for recording the use frequency of the speech synthesis data used when performing speech synthesis; and
speech synthesis data reduction means for reducing speech synthesis data whose use frequency is lower than a preset use frequency threshold.

The speech synthesis data reduction device according to claim 3, wherein
the speech database is configured as a set of sentences composed of the speech synthesis data included in the speech database, and
the use frequency recording means records, when one sentence in the speech database is subjected to speech synthesis based on the speech synthesis data included in all the other sentences, the use frequency of the speech synthesis data used.

A speech synthesis data reduction program for reducing infrequently used speech synthesis data, composed of phonemes and words, in a speech database in which the speech times of phonemes and words used for speech synthesis are recorded, the program causing a computer to function as:
use frequency recording means for recording the use frequency of the speech synthesis data used when performing speech synthesis; and
speech synthesis data reduction means for reducing speech synthesis data whose use frequency is lower than a preset use frequency threshold.

The speech synthesis data reduction program according to claim 5, wherein
the speech database is configured as a set of sentences composed of the speech synthesis data included in the speech database, and
the use frequency recording means records, when one sentence in the speech database is subjected to speech synthesis based on the speech synthesis data included in all the other sentences, the use frequency of the speech synthesis data used.