JP5763414B2

JP5763414B2 - Feature parameter generation device, feature parameter generation method, and feature parameter generation program

Info

Publication number: JP5763414B2
Application number: JP2011114192A
Authority: JP
Inventors: 信行西澤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2011-05-20
Filing date: 2011-05-20
Publication date: 2015-08-12
Anticipated expiration: 2031-05-20
Also published as: JP2012242693A

Description

The present invention relates to a feature parameter generation device, a feature parameter generation method, and a feature parameter generation program that reproduce the time-series distribution of feature parameters corresponding to symbol sequence information.

Text-to-speech (TTS) conversion is a typical application of speech synthesis technology. Hereinafter, a device that takes as input symbols describing phoneme types and prosodic features, obtained for example from text analysis, and generates a speech waveform is called a speech synthesizer. A speech synthesizer is a component of a text-to-speech system.

The symbols input to the speech synthesizer are hereinafter called speech synthesis symbols. Speech synthesis symbols can take various forms. Here we consider a notation that describes both the phonological information making up an utterance and the prosodic information, which is expressed mainly as pauses and voice pitch. One example is JEITA (Japan Electronics and Information Technology Industries Association) standard IT-4002, "Symbols for Japanese Text-to-Speech Synthesis" (see Non-Patent Document 1). The speech synthesizer generates the speech waveform corresponding to such symbols. However, a speech waveform is strongly influenced not only by the phoneme being synthesized but also by the types of the neighboring phonemes and by prosodic features, so the correspondence between symbols and speech waveforms is generally complex.

A speech synthesizer can generate a speech waveform in various ways. The main background art uses the short-time spectral features of speech, voiced/unvoiced information, and the fundamental frequency (F0) directly as parameters and generates the waveform from them. A representative method is speech synthesis based on the source-filter model, in which the waveform is synthesized by signal processing: an articulation filter that shapes the timbre of the speech is driven by a suitable excitation source.

When a relatively simple excitation source such as an impulse train or a white noise source is used, switching between the impulse train and the white noise source is controlled by the voiced/unvoiced information, and the fundamental frequency of the impulse train is controlled by the F0 parameter. Spectral features are represented by parameters such as MFCCs (Mel-Frequency Cepstral Coefficients) or linear prediction coefficients. The articulation filter may be an AR (autoregressive) filter or, particularly when MFCCs are used, an MLSA (Mel Log Spectrum Approximation) filter, which takes the MFCCs directly as its parameters (see Non-Patent Document 2).
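The excitation switching described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `make_excitation`, the 16 kHz sample rate, the 5 ms frame period, and the power normalization of the impulses are all assumptions introduced here.

```python
import math
import random


def make_excitation(f0_hz, voiced, sample_rate=16000, frame_period_s=0.005, seed=0):
    """Return excitation samples: an impulse train for voiced frames, white noise otherwise.

    f0_hz and voiced hold one value per frame (hypothetical interface).
    """
    rng = random.Random(seed)
    samples_per_frame = int(round(sample_rate * frame_period_s))
    excitation = []
    phase = 0.0  # fraction of the current pitch period that has elapsed
    for f0, is_voiced in zip(f0_hz, voiced):
        for _ in range(samples_per_frame):
            if is_voiced and f0 > 0.0:
                phase += f0 / sample_rate
                if phase >= 1.0:
                    phase -= 1.0
                    # Scale each impulse so the train has roughly unit power.
                    excitation.append(math.sqrt(sample_rate / f0))
                else:
                    excitation.append(0.0)
            else:
                phase = 0.0
                excitation.append(rng.gauss(0.0, 1.0))  # unit-variance white noise
    return excitation
```

In a full synthesizer this signal would drive the articulation filter (for example an AR or MLSA filter), which is outside the scope of this sketch.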

To synthesize sounds such as consonants, the speech synthesis parameters must change over time. In this method it is therefore common to update the parameters at a fixed period, for example about 5 ms, and to synthesize speech while their characteristics change. One such period is generally called a frame. Consequently, synthesizing speech generally requires creating frame-period time-series data of the speech synthesis parameters from the speech synthesis symbols.

The simplest approach is to prepare in advance, for each required phoneme, frame-period time-series data covering the length of that phoneme, and to concatenate these speech synthesis parameter time series (time-series data of feature parameters) according to the phoneme sequence of the target speech, yielding the parameter time series for one utterance. As noted above, however, even the same phoneme can differ greatly depending on the types of the neighboring phonemes, the speaking rate, the voice pitch, and the temporal distance from the preceding or following pause. Handling such cases requires a complex phoneme classification that accounts for neighboring phonemes and prosodic features. With such a classification the number of phoneme types becomes enormous, and it is difficult to create and store the full set of required parameter time series in advance.

In practice, therefore, the temporal variation of the speech synthesis parameter time series is represented by a suitable model. The model parameters are first predicted from the speech synthesis symbols, and the parameter time series is then generated from the resulting model, which makes it possible to synthesize arbitrary speech. This model is hereinafter called the speech generation model.

For example, suppose the speech synthesis parameters of a phoneme are divided in time into three states. Let d1, d2, and d3 be the statistical distribution parameter vectors of the number of frames in each state, in order from the first state, and let d be the single vector formed by concatenating their elements. Let v1, v2, and v3 be the statistical distribution parameter vectors of the speech synthesis parameters in each state, again in order from the first state. The characteristics of the speech synthesis parameters needed to synthesize the phoneme can then be expressed by the four vectors d, v1, v2, and v3, which constitute the parameters of the speech generation model. Furthermore, by building in advance a predictor that generates these parameter vectors from speech synthesis symbols and using it at synthesis time, speech can be synthesized from a relatively small amount of data.
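A minimal sketch of how such a three-state model could be expanded into per-frame statistics is shown below. It is an illustration under simplifying assumptions: each state's duration distribution is reduced to a single already-chosen frame count, each state carries one scalar mean and variance, and the names `StateModel` and `expand_to_frames` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StateModel:
    duration_frames: int  # stand-in for a duration chosen from the state's entry in d
    mean: float           # mean taken from v_k
    variance: float       # variance taken from v_k


def expand_to_frames(states: List[StateModel]) -> Tuple[List[float], List[float]]:
    """Repeat each state's mean and variance for the number of frames it lasts."""
    means: List[float] = []
    variances: List[float] = []
    for state in states:
        means.extend([state.mean] * state.duration_frames)
        variances.extend([state.variance] * state.duration_frames)
    return means, variances
```

The expanded mean sequence is piecewise constant, which is the staircase shape that motivates the use of dynamic features.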

A representative method of this kind is HMM-based speech synthesis, which assumes a speech generation model based on the HMM (hidden Markov model). The vectors that make up the parameters of the speech generation model are each determined from the speech synthesis symbols using decision trees, in the same way as in the state-tying HMMs used in speech recognition (see Non-Patent Document 3). The decision trees are built (trained) from training speech prepared in advance and the corresponding speech synthesis symbols.

When synthesizing the speech of one utterance, the speech generation models of the individual speech units are first concatenated to form a speech generation model for the whole utterance. The speech synthesis parameter time series that maximizes the likelihood of this model is then obtained and used to generate the speech waveform. The likelihood of the speech generation model for a speech synthesis parameter time series is expressed, for example, as follows.

Specifically, suppose the value x_i of a speech synthesis parameter x in frame i is statistically independent of the other kinds of speech synthesis parameters and follows a normal distribution with mean μ_i and variance σ_i^2. If the speech is n frames long in total, the log-likelihood of the speech generation model for the time series x_i of the parameter x over one utterance is given by Equation (1).
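The equation itself did not survive in this text. Under the stated assumptions (a normal distribution per frame with mean μ_i and variance σ_i^2), and additionally assuming that the frames are treated as independent of one another, Equation (1) would take the standard form below. This is a reconstruction from the surrounding definitions, not the patent's original typesetting.

```latex
\log p(x_1, \dots, x_n)
  = -\frac{1}{2} \sum_{i=1}^{n}
    \left[ \log\!\left( 2\pi\sigma_i^{2} \right)
         + \frac{(x_i - \mu_i)^{2}}{\sigma_i^{2}} \right]
\tag{1}
```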

However, when the frame-period speech synthesis parameters are modeled directly with a few normal distributions, the maximum-likelihood parameter sequence simply outputs the mean of the normal distribution continuously within each state, and the value becomes discontinuous when the state switches. The result is a staircase-shaped parameter time series. Because this differs from the characteristics of real speech, the feature vector is formed by combining the speech synthesis parameters themselves (hereinafter called static features) with dynamic features such as the first-order difference (delta) and second-order difference (delta-delta) of the parameter time series, so that continuous changes in the parameters are also modeled (see Non-Patent Document 4). The delta Δx_i and delta-delta Δ^2 x_i of the value x_i of a speech synthesis parameter x in the i-th frame are given, for example, by Equation (2) and Equation (3), respectively.
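Equations (2) and (3) are missing from this text. A commonly used definition in HMM-based synthesis is shown below as an assumed example; the window coefficients actually used in the patent cannot be recovered from the surviving text and may differ.

```latex
\Delta x_i = \tfrac{1}{2}\left( x_{i+1} - x_{i-1} \right) \tag{2}
```

```latex
\Delta^{2} x_i = x_{i-1} - 2x_i + x_{i+1} \tag{3}
```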

The method for calculating the time-series data of the speech synthesis parameters is described below. For the purposes of explanation, let o_i denote the feature vector in frame i. In the equations, uppercase letters and bold lowercase letters denote vectors (the same applies hereinafter).

The speech is assumed to be n frames long, and the following matrices are defined. A superscript T denotes the transpose and a superscript -1 denotes the inverse (the same applies hereinafter).
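The definitions themselves are missing from this text. Based on the surrounding description (a per-frame feature vector o_i combining static and dynamic features, a static sequence X, and a feature sequence O with 3n elements), they presumably have the following form. The layout is an assumption, and the original equation numbers are not reproduced.

```latex
\mathbf{o}_i = \left[ x_i,\; \Delta x_i,\; \Delta^{2} x_i \right]^{T},
\qquad
X = \left[ x_1, x_2, \dots, x_n \right]^{T},
\qquad
O = \left[ \mathbf{o}_1^{T}, \mathbf{o}_2^{T}, \dots, \mathbf{o}_n^{T} \right]^{T}
```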

Furthermore, let W denote the transformation matrix that, following the definitions in Equations (2) and (3), maps the static feature time series X to the feature vector time series O, which includes the dynamic features. That is, the relationship O = WX holds, where W is a matrix with 3n rows and n columns.
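A small sketch of how such a 3n-by-n matrix can be built and applied is given below. The delta and delta-delta window coefficients and the zero treatment of frames outside the utterance are assumptions made for illustration, and `build_w` and `matvec` are hypothetical helper names.

```python
def build_w(n):
    """Build a 3n x n matrix that maps static values X to stacked [x_i, delta, delta-delta]."""
    windows = [
        {0: 1.0},                     # static feature
        {-1: -0.5, 1: 0.5},           # delta window (assumed coefficients)
        {-1: 1.0, 0: -2.0, 1: 1.0},   # delta-delta window (assumed coefficients)
    ]
    w = [[0.0] * n for _ in range(3 * n)]
    for i in range(n):
        for k, window in enumerate(windows):
            for offset, coeff in window.items():
                j = i + offset
                if 0 <= j < n:  # frames outside the utterance are treated as zero
                    w[3 * i + k][j] = coeff
    return w


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]
```

For example, `matvec(build_w(4), [1.0, 2.0, 4.0, 7.0])` returns the 12-element sequence O, whose entries 3 to 5 are the static, delta, and delta-delta values of the second frame.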

When the parameter distribution follows a normal distribution, the log-likelihood p(X) of X is given by the following equation, where μ is the mean vector of the distribution of O and U is the variance-covariance matrix of the distribution of O. The elements of μ and U are determined from the speech synthesis symbols by decision trees trained in advance.
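The equation is missing from this text. For a normal distribution over the 3n-dimensional vector O = WX with mean μ and covariance U, the log-likelihood has the standard form below, which is presumably what Equation (8) expresses. The constant terms are written out here as an assumption about how the original was presented.

```latex
p(X) = -\frac{1}{2} (WX - \mu)^{T} U^{-1} (WX - \mu)
       - \frac{1}{2} \log \lvert U \rvert
       - \frac{3n}{2} \log 2\pi
\tag{8}
```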

The X that maximizes the log-likelihood p(X) satisfies the following relationship.

Solving Equation (8) and Equation (9) for X yields the following equation.
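Equations (9) and (10) are missing from this text. Assuming the Gaussian log-likelihood p(X) over O = WX with mean μ and covariance U, the stationarity condition and its closed-form solution would be as follows. This is a reconstruction consistent with the surrounding definitions rather than a copy of the original.

```latex
\frac{\partial p(X)}{\partial X}
  = -\,W^{T} U^{-1} (WX - \mu) = 0
\tag{9}
```

```latex
X = \left( W^{T} U^{-1} W \right)^{-1} W^{T} U^{-1} \mu
\tag{10}
```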

That is, calculating Equation (10) yields a parameter time series that takes the dynamic features into account under the maximum-likelihood criterion. The same applies when the speech synthesis parameter x is extended to a multidimensional vector. The configuration of such a processing device is shown, for example, in FIG. 3.
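The calculation can be sketched in plain Python as below, assuming that Equation (10) amounts to solving the normal equations (W^T U^-1 W) X = W^T U^-1 μ with a diagonal U. The window coefficients, the dense Gaussian elimination, and the names `build_w`, `solve`, and `mlpg` are assumptions chosen for readability; a practical implementation would exploit the band structure of the matrix.

```python
def build_w(n):
    """3n x n matrix mapping static values to [x_i, delta, delta-delta] (assumed windows)."""
    windows = [{0: 1.0}, {-1: -0.5, 1: 0.5}, {-1: 1.0, 0: -2.0, 1: 1.0}]
    w = [[0.0] * n for _ in range(3 * n)]
    for i in range(n):
        for k, window in enumerate(windows):
            for offset, coeff in window.items():
                if 0 <= i + offset < n:
                    w[3 * i + k][i + offset] = coeff
    return w


def solve(a, b):
    """Solve the dense linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mlpg(mu, var):
    """Return the static sequence X maximizing the likelihood for means mu and diagonal variances var."""
    n = len(mu) // 3
    w = build_w(n)
    rows = range(3 * n)
    a = [[sum(w[r][i] * w[r][j] / var[r] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(w[r][i] * mu[r] / var[r] for r in rows) for i in range(n)]
    return solve(a, b)
```

Here `mu` and `var` are length-3n lists ordered frame by frame as static, delta, delta-delta. When the dynamic-feature variances are made very large, the result approaches the static means, that is, the staircase sequence.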

However, a parameter time series obtained in this way often clusters closer to the mean of the whole sequence than the data from which the distribution information was derived, and tends to have a small dynamic range. One cause is that the generation process for the feature parameter time series contains no constraint on the distribution of values over the sequence as a whole. As a result, a large discrepancy in the sequence-level distribution can arise between the data used to estimate the distributions and the generated data.

A known method therefore obtains the maximum-likelihood parameter time series while constraining the variance of the whole sequence, by taking into account the distribution of that variance (see Non-Patent Document 5). In this method, the log-likelihood given by the following equation is maximized instead of Equation (8). Here, the distribution of the variance means the distribution of variance values for a given condition.

Here p(X) is the same as in Equation (8), and pv(X) in Equation (11) is the likelihood, for X, of the distribution of the variance of the elements of X, hereinafter called the variance likelihood. ω is a weighting coefficient that adjusts the relative contribution of the conventional likelihood, computed from the distribution of X at each time i, and the likelihood computed from the distribution of the variance of the elements of X. This yields more desirable parameter time-series data that reproduces not only the feature distribution at each time but also the dynamic range of the whole sequence.
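Equation (11) is not reproduced in this text. The sketch below assumes the weighted form ω p(X) + pv(X), assumes a normal distribution over the sequence variance, and, for brevity, evaluates p(X) on static features only. All function and parameter names are hypothetical.

```python
import math


def sequence_variance(x):
    """Variance of the values of one parameter over the whole sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def log_gauss(value, mean, var):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def combined_log_likelihood(x, frame_means, frame_vars, gv_mean, gv_var, omega):
    """Assumed objective: omega * p(X) + pv(X), with static features only in p(X)."""
    p = sum(log_gauss(v, m, s) for v, m, s in zip(x, frame_means, frame_vars))
    pv = log_gauss(sequence_variance(x), gv_mean, gv_var)
    return omega * p + pv
```

A sequence that is flattened toward its mean lowers `sequence_variance(x)` and is penalized by the `pv` term when `gv_mean` reflects the wider spread of the training data.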

Although speech synthesis has been used as the example above, this parameter time-series generation technique is applicable to other uses. For example, by modeling feature parameters of mouth shape in the same way instead of acoustic features and generating the parameter time series with this technique, a smooth moving image of mouth movement can be generated from a small amount of data that is compressed along the time axis as well, just as with speech (see Non-Patent Document 6). Alternatively, by using discrete symbol information about motion (for example, "extend the index finger from a closed hand") as input instead of phonemes and modeling it together with motion features, natural motion data for animated characters, robots, and the like can be generated from a small amount of data (see Non-Patent Document 7).

Non-Patent Document 1: "Symbols for Japanese Text-to-Speech Synthesis," JEITA Standard IT-4002, March 2005.
Non-Patent Document 2: Satoshi Imai, Kazuo Sumita, Chieko Furuichi, "Mel Log Spectrum Approximation (MLSA) Filter for Speech Synthesis," IEICE Transactions (A), J66-A, 2, Feb. 1983, pp. 122-129.
Non-Patent Document 3: Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, Tadashi Kitamura, "Simultaneous Modeling of Spectrum, Pitch, and Duration in HMM-Based Speech Synthesis," IEICE Transactions (D-II), J83-D-II, 11, Nov. 2000, pp. 2099-2107.
Non-Patent Document 4: Takashi Masuko, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai, "HMM-Based Speech Synthesis Using Dynamic Features," IEICE Transactions (D-II), J79-D-II, 12, Dec. 1996, pp. 2184-2190.
Non-Patent Document 5: Tomoki Toda, Keiichi Tokuda, "Speech Parameter Generation Algorithm Considering Intra-Sequence Variation for HMM-Based Speech Synthesis," IEICE Technical Report, SP2005-52, Aug. 2005, pp. 1-6.
Non-Patent Document 6: 増淵淳, Masatsune Tamura, 宮川公成, Takashi Masuko, Takao Kobayashi, Keiichi Tokuda, "Simultaneous Generation of Speech and Lip Images from Text Based on HMM," Proceedings of the Acoustical Society of Japan 1998 Spring Meeting, 2-P-6, vol. 1, Mar. 1998, pp. 305-306.
Non-Patent Document 7: 森健史, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura, "Fingerspelling Video Generation Based on Hidden Markov Models," FIT2005 (4th Forum on Information Technology), K-092, Aug. 2005, pp. 569-570.

When a constraint is placed on the distribution of the element values of the feature parameter time-series data to be generated, the maximum-likelihood parameter time-series data cannot be calculated directly, unlike the case without such a constraint. It must therefore be obtained approximately by an iterative method such as the Newton-Raphson method. In such an iterative method, the likelihood of the parameter time-series data changes if even a single data point in the generated parameter time series changes. Consequently, in each iteration of, for example, the Newton-Raphson method, the off-diagonal elements of the Hessian matrix of the variance likelihood function (the matrix whose elements are the second partial derivatives of the objective function) are not zero. That is, the Hessian matrix is not diagonal. Each iteration nevertheless requires the inverse of the Hessian matrix, and inverting a non-diagonal matrix increases the computational cost.
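The non-zero off-diagonal elements can be seen from a short derivation. As an illustrative assumption, take a single scalar parameter with sequence variance v(X) and a normal distribution over v with mean μ_v and variance σ_v^2; the symbols μ_v, σ_v^2, and δ_ij (the Kronecker delta) are introduced here and do not appear in the original text.

```latex
\begin{aligned}
v(X) &= \frac{1}{n} \sum_{k=1}^{n} (x_k - \bar{x})^{2},
\qquad \bar{x} = \frac{1}{n} \sum_{k=1}^{n} x_k, \\
\frac{\partial v}{\partial x_i} &= \frac{2}{n} (x_i - \bar{x}),
\qquad
\frac{\partial^{2} v}{\partial x_i \, \partial x_j}
  = \frac{2}{n} \left( \delta_{ij} - \frac{1}{n} \right), \\
\log p_v(X) &= -\frac{(v - \mu_v)^{2}}{2\sigma_v^{2}} + \text{const}, \\
\frac{\partial^{2} \log p_v}{\partial x_i \, \partial x_j}
  &= -\frac{1}{\sigma_v^{2}}
     \left[ \frac{\partial v}{\partial x_i} \frac{\partial v}{\partial x_j}
          + (v - \mu_v) \frac{\partial^{2} v}{\partial x_i \, \partial x_j} \right].
\end{aligned}
```

For i ≠ j the bracket equals (4/n^2)(x_i - x̄)(x_j - x̄) - (2/n^2)(v - μ_v), which is generally non-zero for every pair of frames, so this Hessian is dense rather than diagonal or banded.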

Quasi-Newton methods, based for example on the BFGS formula, are known as a way to avoid the matrix inversion. However, in a quasi-Newton method the numbers of rows and columns of the matrix being computed are each proportional to the number of frames of the speech to be synthesized, so memory proportional to the square of the frame count is required. In environments with limited memory and CPU performance, such as mobile terminals, the processing therefore cannot be performed at high speed.
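The quadratic growth can be made concrete with simple arithmetic. The sketch below assumes a 5 ms frame period, 8-byte floating-point values, and one dense n-by-n matrix per parameter dimension; `dense_matrix_bytes` and `band_matrix_bytes` are hypothetical helpers, and the band case is included only for comparison.

```python
def dense_matrix_bytes(duration_s, frame_period_s=0.005, bytes_per_value=8):
    """Return (frame count, bytes needed for a dense n x n matrix)."""
    n = int(round(duration_s / frame_period_s))
    return n, n * n * bytes_per_value


def band_matrix_bytes(n, half_bandwidth=2, bytes_per_value=8):
    """Bytes needed to store only the lower band of a symmetric band matrix."""
    return n * (half_bandwidth + 1) * bytes_per_value
```

For a 10-second utterance this gives 2000 frames and 32,000,000 bytes for the dense matrix, against 48,000 bytes for a band with two sub-diagonals.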

Another way to simplify the matrix inversion, attempted in Non-Patent Document 5, is to set all off-diagonal elements of the Hessian matrix in the Newton-Raphson method to zero. This reduces memory usage and makes the inversion easy, but setting all off-diagonal elements to zero amounts to assuming that the data at each time are mutually independent. Dynamic features, however, are determined by data at several times and are not independent at each time, so this method cannot correct the solution appropriately. As a result, the solution converges very slowly and the processing cannot be sped up.

These methods perform processing that is entirely different from processing that ignores the variance of the time-series data to be obtained, so new circuits or software are required. In practice, the result obtained when the variance is taken into account is fairly similar to the result obtained when it is ignored. Convergence can therefore be accelerated by using the time-series data generated by the conventional method as the initial value of the iterative method. In that case, however, circuits or software for both the conventional method and the iterative method are required, as shown in FIG. 4, and the approach cannot be adopted in devices with limited processing power or capacity, such as mobile terminals and embedded devices.

The present invention has been made in view of these circumstances. Its object is to provide a feature parameter generation device, a feature parameter generation method, and a feature parameter generation program that also take the variance of the feature parameter time series into account while limiting the growth in circuit and software scale.

(1) To achieve the above object, the feature parameter generation device of the present invention is a device that reproduces the time-series distribution of feature parameters corresponding to symbol sequence information. It generates time-series data of the feature parameters by correcting initial-value sequence data so that the likelihood of the feature parameter distribution is maximized, and it repeatedly corrects the generated time-series data so that the likelihood of the feature parameter distribution and of the variance of the time-series data increases. The device is characterized in that the correction of the initial values and the repeated correction are performed by the same component.

Because the feature parameter generation device of the present invention takes even the variance of the feature parameter time-series data into account, it reproduces the feature parameters well. Because the repeated correction starts from the parameter time-series data that maximizes the likelihood of the feature parameter distribution, convergence is faster. And because the initial-value correction and the repeated correction are performed by the same component, circuits and software need not be duplicated, which reduces circuit and software scale. The device is therefore particularly advantageous for mobile terminals, embedded devices, and the like.

(2) The feature parameter generation device of the present invention comprises: a first intermediate data generation unit that generates an initial intermediate data sequence from a symbol information sequence so that the likelihood of the feature parameter distribution is maximized; a correction data generation unit that generates correction sequence data from the initial intermediate data sequence; a data correction unit that corrects the time-series data of the feature parameters corresponding to the symbol information sequence using the correction sequence data; and a second intermediate data generation unit that generates an update intermediate data sequence from the corrected time-series data so that the likelihood of the feature parameter distribution and of the variance of the time-series data increases. The device repeats a series of operations in which the correction data generation unit generates correction sequence data from the update intermediate data sequence and the data correction unit corrects the already corrected time-series data using that correction sequence data.
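One possible reading of this arrangement is sketched below. It is an interpretation, not the patent's disclosed implementation: the intermediate data is assumed to be a gradient-like vector, the correction data is assumed to come from solving a system with the matrix W^T U^-1 W (with diagonal U), the initial series is assumed to be all zeros, only static and delta features are used, and all names are hypothetical. The point illustrated is that the single method `correction` serves both the initial pass and every later iteration.

```python
def build_w(n):
    """2n x n matrix: rows 2i and 2i+1 give x_i and an assumed delta 0.5 * (x[i+1] - x[i-1])."""
    w = [[0.0] * n for _ in range(2 * n)]
    for i in range(n):
        w[2 * i][i] = 1.0
        if i - 1 >= 0:
            w[2 * i + 1][i - 1] = -0.5
        if i + 1 < n:
            w[2 * i + 1][i + 1] = 0.5
    return w


def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (kept simple for the sketch)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class FeatureParameterGenerator:
    def __init__(self, mu, var, gv_mean, gv_var, omega):
        self.n = len(mu) // 2
        self.mu, self.var = mu, var
        self.gv_mean, self.gv_var, self.omega = gv_mean, gv_var, omega
        self.w = build_w(self.n)
        rows = range(2 * self.n)
        self.a = [[sum(self.w[r][i] * self.w[r][j] / var[r] for r in rows)
                   for j in range(self.n)] for i in range(self.n)]

    def _wt_uinv(self, vec):
        """Compute W^T U^-1 vec for diagonal U."""
        return [sum(self.w[r][i] * vec[r] / self.var[r] for r in range(2 * self.n))
                for i in range(self.n)]

    def first_intermediate(self):
        """Initial intermediate data: gradient of p(X) at the all-zero series."""
        return self._wt_uinv(self.mu)

    def second_intermediate(self, x):
        """Update intermediate data: gradient of p(X) plus the variance term scaled by 1/omega."""
        o = [sum(c * v for c, v in zip(row, x)) for row in self.w]
        grad_p = self._wt_uinv([m - oi for m, oi in zip(self.mu, o)])
        mean = sum(x) / self.n
        v = sum((xi - mean) ** 2 for xi in x) / self.n
        scale = -(v - self.gv_mean) / self.gv_var * 2.0 / self.n
        return [g + scale * (xi - mean) / self.omega for g, xi in zip(grad_p, x)]

    def correction(self, intermediate):
        """Shared correction data generation, used for the initial and the repeated correction."""
        return solve(self.a, intermediate)

    def generate(self, iterations=10, tol=1e-8):
        x = [0.0] * self.n                      # assumed initial-value series
        r = self.first_intermediate()
        for _ in range(iterations + 1):
            delta = self.correction(r)
            x = [xi + di for xi, di in zip(x, delta)]   # data correction
            if max(abs(d) for d in delta) < tol:
                break
            r = self.second_intermediate(x)
        return x
```

In this sketch the first pass through `correction` yields the conventional maximum-likelihood series, and later passes reuse the same solver with the variance term folded into the right-hand side. No step-size control is included, so convergence is not guaranteed for arbitrary settings.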

As a result, the correction data generation unit operates both for the initial correction and for the repeated correction of the feature parameter time-series data, so circuits and software need not be duplicated and circuit and software scale can be reduced.

(3) The feature parameter generation device of the present invention further comprises a processing control unit that causes the corrected time-series data of the feature parameters to be corrected repeatedly until a predetermined condition is satisfied. This allows the time-series data of the feature parameters to converge to a given level of reproduction.

(4) In the feature parameter generation device of the present invention, the time-series distribution of the feature parameters includes the distribution of the dynamic features of the feature parameter time series. The first intermediate data generation unit generates an intermediate data sequence from the symbol information sequence so that the likelihood of the distribution of the feature parameters including the dynamic features is maximized, and the second intermediate data generation unit generates an intermediate data sequence from the symbol information sequence so that the likelihood of the distribution of the feature parameters including the dynamic features and of the variance of the time-series data increases. Taking the dynamic features into account in this way improves the reproduction of the feature parameter time-series data.

(5) In the feature parameter generation device of the present invention, the second intermediate data generation unit generates the update intermediate data sequence using a weighted sum of the log-likelihood of the feature parameter distribution and the log-likelihood of the variance of the feature parameter time-series data. The weight thus adjusts how strongly the variance of the time-series data is reproduced.

(6) In the feature parameter generation device of the present invention, the second intermediate data generation unit generates the update intermediate data sequence under the assumption that the likelihood of the variance of the feature parameter time-series data does not change before and after the correction. This simplifies the calculation for correcting the time-series data and reduces the processing load.

(7) In the feature parameter generation device of the present invention, the correction data generation unit generates the correction sequence data by an operation equivalent to calculating the product of the inverse of a symmetric band matrix and a vector. This simplifies the calculation for correcting the time-series data and reduces the processing load.
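The product of the inverse of a symmetric band matrix and a vector is normally obtained by solving a linear system rather than forming the inverse. The sketch below uses a band Cholesky factorization, which is one standard technique for symmetric positive-definite band matrices and is an assumption here; the text does not say which algorithm the device uses. The storage layout and the name `solve_spd_band` are likewise hypothetical.

```python
import math


def solve_spd_band(band, h, b):
    """Solve A x = b for a symmetric positive-definite band matrix A.

    band[i][k] holds A[i][i - k] for 0 <= k <= h (entries with i - k < 0 are ignored),
    so only n * (h + 1) values are stored.
    """
    n = len(b)
    # Cholesky factor L in the same band layout: l[i][k] = L[i][i - k].
    l = [[0.0] * (h + 1) for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - h), i + 1):
            s = band[i][i - j]
            for m in range(max(0, i - h), j):
                s -= l[i][i - m] * l[j][j - m]
            if j == i:
                l[i][0] = math.sqrt(s)
            else:
                l[i][i - j] = s / l[j][0]
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        s = b[i] - sum(l[i][i - m] * y[m] for m in range(max(0, i - h), i))
        y[i] = s / l[i][0]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = y[i] - sum(l[m][m - i] * x[m] for m in range(i + 1, min(n, i + h + 1)))
        x[i] = s / l[i][0]
    return x
```

With three-tap delta windows and a diagonal covariance, a matrix of the form W^T U^-1 W has a half-bandwidth of 2, so both the work and the memory of this solver grow linearly with the number of frames.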

(8) The feature parameter generation device of the present invention receives, as the symbol information sequence, speech synthesis information describing the types of the unit speech sounds contained in a sequence of unit speech sounds, and generates a synthesized speech waveform as the corrected time-series data of the feature parameters. Symbols describing phoneme types and prosodic features can thus be input and a speech waveform generated with an appropriate dynamic range.

(9) The feature parameter generation method of the present invention is a method, performed by a computer, for reproducing the time-series distribution of feature parameters corresponding to a symbol information series. The method includes: a step of generating an initial intermediate data series from the symbol information series so that the feature parameter distribution attains maximum likelihood; a step of generating correction series data from the initial intermediate data series; a step of correcting the feature parameter time-series data corresponding to the symbol information series using the correction series data; and a step of generating an update intermediate data series from the corrected feature parameter time-series data so that the likelihood of the feature parameter distribution and of the variance of the feature parameter time-series data increases. The method repeats a series of steps in which correction series data is generated from the update intermediate data series and the corrected feature parameter time-series data is corrected again. In this way, the feature parameters can be reproduced with the variance of the feature parameter time series taken into account, while an increase in circuit and software scale is suppressed.

(10) The feature parameter generation program of the present invention is a program, executed by a computer, for reproducing the time-series distribution of feature parameters corresponding to a symbol information series. The program includes: a process of generating an initial intermediate data series from the symbol information series so that the feature parameter distribution attains maximum likelihood; a process of generating correction series data from the initial intermediate data series; a process of correcting the feature parameter time-series data corresponding to the symbol information series using the correction series data; and a process of generating an update intermediate data series from the corrected feature parameter time-series data so that the likelihood of the feature parameter distribution and of the variance of the feature parameter time-series data increases. The program repeats a series of processes in which correction series data is generated from the update intermediate data series and the corrected feature parameter time-series data is corrected again. In this way, the feature parameters can be reproduced with the variance of the feature parameter time series taken into account, while an increase in circuit and software scale is suppressed.

The time-series data of feature parameters can be reproduced with the variance taken into account while the overall circuit and program scale is kept small. In addition, using an already accelerated device or program for the processing in the iterative method makes it easy to speed up the computation.

A block diagram showing the feature parameter generation device according to the present invention.
A flowchart showing the operation of the feature parameter generation device according to the present invention.
A block diagram showing a conventional feature parameter generation device.
A block diagram showing a feature parameter generation device that combines the conventional method with the iterative method.

(Features of the feature parameter generation device)
The feature parameter generation device of the present invention reproduces the time-series distribution of feature parameters corresponding to symbol series information. For example, speech synthesis information describing the types of unit speech contained in a sequence of unit speech is input as the symbol information series, and a synthesized speech waveform is generated as the corrected feature parameter time-series data. Given symbols that describe phoneme types and prosodic features, the device can generate a speech waveform with an appropriate dynamic range.

In addition, feature parameters describing the shape of the mouth can be modeled in the same way in place of acoustic features, so that a parameter time series is generated and a moving image of mouth movement is produced. Feature quantities can likewise be modeled to generate natural motion data for animated characters, robots, and the like.

Such a feature parameter generation device corrects an initial-value series so that the feature parameter distribution attains maximum likelihood, thereby generating the feature parameter time-series data. It then repeatedly corrects the generated time-series data so that the likelihood of the feature parameter distribution and of the variance of the time-series data increases. The initial correction and the repeated corrections are performed by the same component. Here, maximum likelihood refers to maximizing the likelihood function of a distribution, and a likelihood function is a function defined so that a larger value indicates a more plausible distribution. Examples of such functions include the Gaussian function and the Gaussian mixture function, which is defined as a linearly weighted sum of Gaussian functions.

Because the variance of the feature parameter time-series data is also taken into account, the feature parameters are reproduced with high fidelity. Because the repeated corrections start from the maximum-likelihood parameter time-series generation result, convergence is faster. Because the initial correction and the repeated corrections are performed by the same component, circuits and software need not be duplicated, and the circuit and software scale can be reduced. The device is therefore particularly advantageous for portable terminals, embedded devices, and the like.

As one example, the feature parameter generation device outputs the feature parameter time-series data that maximizes a weighted sum of two log likelihoods: that of the feature parameter distribution at each time, and that of the distribution of the variance of the feature parameter time-series data. This yields feature parameter time-series data that balances the two criteria. The present invention is described in detail below with reference to the drawings.

(Configuration of the feature parameter generation device)
FIG. 1 is a block diagram of the feature parameter generation device 100. The feature parameter generation device 100 comprises a first intermediate data generation unit 110, a correction data generation unit 120, a data correction unit 130, a second intermediate data generation unit 140, and a processing control unit 150.

For simplicity, the following description assumes that the data at each time in the feature parameter time-series data is a one-dimensional value. The length of the time-series data is denoted n.

The first intermediate data generation unit 110 generates the initial intermediate data series from the symbol information series so that the feature parameter distribution attains maximum likelihood. That is, from the input feature parameter distribution information at each time, the first intermediate data generation unit 110 performs the calculation that yields the n × n symmetric matrix and the n-dimensional vector shown in equations (12) and (13), and outputs the result as the intermediate data for the first correction.

This processing is the same as in the case where the variance of the feature parameter time series is not considered. The feature parameter distribution information is obtained from the symbol series information and then input. The phrase "generating the initial intermediate data series so as to attain maximum likelihood" means generating the initial intermediate data series by some maximum-likelihood criterion, which includes, for example, calculating it with formulas derived from a maximum-likelihood criterion as described above.

The correction data generation unit 120 generates the time-series data used to correct the feature parameters. Specifically, when an n × n symmetric matrix (denoted S here) and an n-dimensional vector (denoted x) are input, it calculates and outputs the n-dimensional vector shown in equation (14).

Accordingly, for the first correction, the correction series data is generated from the initial intermediate data series; for the updates in the second and subsequent corrections, it is generated from the update intermediate data series. Because the correction data generation unit 120 operates in both the first and the subsequent corrections, circuits and software need not be duplicated, and the circuit and software scale can be reduced.

The data correction unit 130 takes the time-series data of a feature parameter and the correction data as inputs, adds the two values at each time, and outputs the resulting time-series data as the corrected feature parameter time-series data. At this point the correction data may be multiplied by α (0 < α), a multiplier corresponding to the step parameter in the Newton-Raphson method. A large α speeds up the correction but raises the risk that a correction makes the result worse; a small α lowers that risk but slows convergence. It is therefore preferable to set α to a suitable magnitude.

Specifically, the correction data generation unit 120 generates the correction series data by an operation equivalent to calculating the product of the inverse of a symmetric band matrix and a vector. Here "equivalent" means that the operation is not limited to the calculation just described: it also covers methods that compute the product of the inverse of a symmetric matrix and a vector without ever forming the inverse explicitly. For example, it includes decomposing the symmetric band matrix into the product of a lower triangular matrix, a diagonal matrix, and an upper triangular matrix by the method known as modified Cholesky decomposition, and then calculating the product of each factor's inverse and the vector in turn.
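The factor-and-solve procedure described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the function name `solve_symmetric_band`, the `bandwidth` argument, and the dense list-of-lists storage are assumptions made for readability (a practical implementation would store only the band), and the matrix is assumed to be symmetric positive definite.

```python
def solve_symmetric_band(S, x, bandwidth):
    """Return d with S d = x, for a symmetric positive-definite band matrix S.

    S is factored as L * D * L^T (modified Cholesky); the inverse of S is
    never formed. Entries with |i - j| > bandwidth are assumed to be zero.
    """
    n = len(x)
    L = [[0.0] * n for _ in range(n)]  # unit lower triangular factor
    D = [0.0] * n                      # diagonal factor

    for j in range(n):
        lo = max(0, j - bandwidth)
        D[j] = S[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(lo, j))
        L[j][j] = 1.0
        for i in range(j + 1, min(n, j + bandwidth + 1)):
            lo_i = max(0, i - bandwidth)
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(lo_i, j))
            L[i][j] = (S[i][j] - s) / D[j]

    # Forward substitution: L z = x
    z = [0.0] * n
    for i in range(n):
        lo = max(0, i - bandwidth)
        z[i] = x[i] - sum(L[i][k] * z[k] for k in range(lo, i))

    # Diagonal scaling: D y = z
    y = [z[i] / D[i] for i in range(n)]

    # Backward substitution: L^T d = y
    d = [0.0] * n
    for i in range(n - 1, -1, -1):
        hi = min(n, i + bandwidth + 1)
        d[i] = y[i] - sum(L[k][i] * d[k] for k in range(i + 1, hi))
    return d


# Tridiagonal example (bandwidth 1) whose exact solution is all ones.
S_example = [[4.0, 1.0, 0.0],
             [1.0, 4.0, 1.0],
             [0.0, 1.0, 4.0]]
d_example = solve_symmetric_band(S_example, [5.0, 6.0, 5.0], bandwidth=1)
```

Because every inner sum is limited to the band, the work grows linearly with n for a fixed bandwidth, which is the practical reason for keeping the matrix in band form.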

With the functions described above, at the first correction the data correction unit 130 corrects the feature parameter time-series data corresponding to the symbol information series using the correction series data. At each update in the repeated corrections, it further corrects the already corrected feature parameter time-series data using the correction series data.

The second intermediate data generation unit 140 generates the update intermediate data series from the corrected feature parameter time-series data so that the likelihood of the feature parameter distribution and of the variance of the feature parameter time-series data increases. Specifically, when the first predetermined criterion is the maximization of some reference value, the second intermediate data generation unit 140 outputs the Hessian matrix (the n × n symmetric matrix whose elements are the second-order partial derivatives of the reference value with respect to the feature parameters at each time) and the n-dimensional vector whose elements are the first-order partial derivatives of the reference value with respect to the feature parameters at each time. Given these inputs, the correction data generation unit 120 outputs the vector that corrects the feature parameter time-series data in the Newton-Raphson method. Let H be the Hessian matrix and ∂p'(X)/∂X the n-dimensional vector of first-order partial derivatives; the correcting vector is then given by equation (15). Setting −H = S and ∂p'(X)/∂X = x makes equation (15) identical to equation (14), so the correction data generation unit 120 can calculate this vector.
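As a reading aid, the correspondence described above can be written out as follows. This is a reconstruction from the surrounding text only, on the assumption that equation (14) denotes the product of the inverse of S with x and that equation (15) is the usual Newton-Raphson step; the symbol ΔX is introduced here purely for illustration and does not appear in the original.

```latex
\Delta X \;=\; -H^{-1}\,\frac{\partial p'(X)}{\partial X}
\;=\; S^{-1}x
\qquad\text{with}\quad S = -H,\quad x = \frac{\partial p'(X)}{\partial X}.
```

Under this reading, the data correction unit 130 would then apply the update X ← X + αΔX with the step multiplier α described earlier.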

Note, however, that the matrix of equation (12) is normally a symmetric band matrix. Suppose the correction data generation unit 120 is implemented with a device or program that does not take the variance of the feature parameter time-series data into account, and that this device or program presupposes that its matrix input has the symmetric band form of equation (12). In that case the correction data generation unit 120 cannot perform the calculation of equation (15) as is.

Suppose instead that the value of the likelihood for the variance of the feature parameter time-series data does not change before and after the processing in the correction data generation unit 120. Then the first-order partial derivative of the log-likelihood function of the variance is 0. In other words, the Hessian matrix of the log-likelihood function of the variance of the feature parameter time-series data is a diagonal matrix.

This assumption becomes more reasonable as the iteration proceeds, because the corrections to the feature parameters shrink and the variance changes less and less. Hence the Hessian matrix of the overall log-likelihood function, which is one of the outputs of the second intermediate data generation unit 140, is a weighted sum of a symmetric band matrix of the same form as equation (12) and a diagonal matrix. That is, it is again a symmetric band matrix of the same form as equation (12) (a matrix whose elements are always 0 in the rows and columns where the elements of equation (12) are always 0).

In this way, the second intermediate data generation unit 140 generates the update intermediate data series under the assumption that the likelihood of the variance of the feature parameter time-series data does not change before and after the correction. This simplifies the calculation for correcting the feature parameter time-series data and so reduces the processing load.

The second intermediate data generation unit 140 can also operate without making this assumption about the distribution of the feature parameter time-series data. In that case, it takes the Hessian matrix of the log-likelihood function of the variance, replaces some of its elements with 0 so that the result has the same symmetric band form as equation (12), and outputs the resulting band matrix.
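Replacing out-of-band elements with 0, as described above, can be sketched as follows. The function name `truncate_to_band` and the `bandwidth` argument are hypothetical; the sketch assumes the band of equation (12) can be described by a single half-bandwidth, which the text does not state explicitly.

```python
def truncate_to_band(H, bandwidth):
    """Return a copy of the square matrix H with every entry farther than
    `bandwidth` positions from the diagonal replaced by 0."""
    n = len(H)
    return [
        [H[i][j] if abs(i - j) <= bandwidth else 0.0 for j in range(n)]
        for i in range(n)
    ]


# A dense 3 x 3 matrix reduced to tridiagonal form.
dense = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
banded = truncate_to_band(dense, bandwidth=1)
```

Symmetry is preserved because the test `abs(i - j) <= bandwidth` treats entries above and below the diagonal identically.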

As described above, the second intermediate data generation unit 140 generates the update intermediate data series using a weighted sum of the log likelihood of the feature parameter distribution and the log likelihood of the variance of the feature parameter time-series data. The weight thus adjusts how closely the variance of the feature parameter time-series data is reproduced. The correction data generation unit 120 can then perform the processing of equation (13).
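The weighted sum of log likelihoods can be illustrated with a deliberately simplified sketch. It assumes that each frame and the sequence variance are modeled by single one-dimensional Gaussians and that no dynamic features are used; all function and parameter names (`weighted_objective`, `gv_mean`, `gv_var`, `weight`) are hypothetical and not taken from the source.

```python
import math


def gaussian_log_pdf(value, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (value - mean) ** 2 / var)


def sequence_variance(xs):
    """Variance of the whole time series (population form)."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)


def weighted_objective(xs, frame_means, frame_vars, gv_mean, gv_var, weight):
    """Frame-wise log likelihood plus `weight` times the log likelihood
    of the sequence variance."""
    frame_ll = sum(
        gaussian_log_pdf(v, m, s)
        for v, m, s in zip(xs, frame_means, frame_vars)
    )
    variance_ll = gaussian_log_pdf(sequence_variance(xs), gv_mean, gv_var)
    return frame_ll + weight * variance_ll


# A two-frame sequence that matches both models exactly.
value = weighted_objective([0.0, 2.0], [0.0, 2.0], [1.0, 1.0],
                           gv_mean=1.0, gv_var=1.0, weight=0.5)
```

A larger `weight` makes the variance term dominate, so the generated series follows the modeled variance more closely at the cost of fit to the per-frame distributions.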

The processing control unit 150 causes the corrected feature parameter time-series data to be corrected repeatedly until a predetermined condition is satisfied. That is, the processing control unit 150 controls each unit so that the processing in the second intermediate data generation unit 140, the correction data generation unit 120, and the data correction unit 130 is repeated until a predetermined criterion is met. Examples of the criterion are: (1) the mean square of the elements of the feature parameter time-series correction data output by the correction data generation unit 120 falls to or below a threshold; and (2) the processing has been repeated a fixed number of times. In this way, the feature parameter time-series data can be made to converge to a given level of reproduction.
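The two example criteria can be combined in a small check such as the following. The function name `should_stop` and the default values for `threshold` and `max_iterations` are placeholders chosen for illustration; the source gives no concrete values.

```python
def should_stop(correction, iteration, threshold=1e-6, max_iterations=20):
    """Return True when either example criterion is met:
    (1) the mean square of the correction elements is at or below `threshold`;
    (2) `iteration` has reached `max_iterations`."""
    mean_square = sum(c * c for c in correction) / len(correction)
    return mean_square <= threshold or iteration >= max_iterations
```

For instance, `should_stop([0.0, 0.0], iteration=1)` is true because the correction has vanished, while `should_stop([1.0, 1.0], iteration=1)` is false.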

(Operation of the feature parameter generation device)
Next, the operation of the feature parameter generation device 100 configured as described above is explained. First, in a stage preceding the components described above, the feature parameter generation device 100 uses a decision tree or the like to generate, from symbol series information such as an input symbol string, the time-series information of the feature parameter distribution and the distribution information of the variance of the feature parameter time-series data (step S1).

Time-series data whose elements are all 0 is then set as the initial series of the feature parameter time-series data (step S2). The first intermediate data generation unit 110 generates the initial intermediate data series from the above information so that the feature parameter distribution attains maximum likelihood (step S3).

Next, the correction data generation unit 120 generates the correction series data from the initial intermediate data series (step S4). The data correction unit 130 then corrects the feature parameter time-series data corresponding to the symbol information series using the correction series data (step S5).

Next, it is determined whether the stop condition for the iteration is satisfied (step S6), that is, whether the predetermined criterion described above is met. If the stop condition is not satisfied, the second intermediate data generation unit 140 generates the update intermediate data series from the corrected feature parameter time-series data so that the likelihood of the feature parameter distribution and of the variance of the feature parameter time-series data increases (step S7), and the process returns to step S4. If the stop condition is satisfied, the generated feature parameter time-series data is output (step S8) and the operation ends.
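The flow of steps S2 to S8 can be sketched as a single loop in which the same solver serves both the initial and the repeated corrections. Everything below is a toy under stated assumptions: the names `generate`, `first_intermediate`, `second_intermediate`, and `solve_diagonal` are hypothetical, step S1 is replaced by fixed per-frame means and variances, the matrix is diagonal, and the variance term of the objective is omitted so that the example stays short.

```python
def solve_diagonal(S, x):
    """Stand-in for the correction data generation step; valid only
    because the toy matrix S below is diagonal."""
    return [x[i] / S[i][i] for i in range(len(x))]


def generate(first_intermediate, second_intermediate, solve, n,
             alpha=1.0, threshold=1e-10, max_iterations=20):
    X = [0.0] * n                                      # S2: all-zero initial series
    S, x = first_intermediate()                        # S3: initial intermediate data
    for _ in range(max_iterations):                    # criterion (2): fixed count
        d = solve(S, x)                                # S4: correction series data
        X = [xi + alpha * di for xi, di in zip(X, d)]  # S5: apply correction
        if sum(v * v for v in d) / n <= threshold:     # S6: criterion (1)
            break
        S, x = second_intermediate(X)                  # S7: update intermediate data
    return X                                           # S8: output


# Toy model: independent Gaussian frames (assumed values, not from the source).
means = [1.0, 2.0, 3.0]
variances = [0.5, 1.0, 2.0]


def precision_matrix():
    return [[1.0 / variances[i] if i == j else 0.0 for j in range(3)]
            for i in range(3)]


def first_intermediate():
    return precision_matrix(), [m / v for m, v in zip(means, variances)]


def second_intermediate(X):
    # Negative Hessian and gradient of the frame-wise log likelihood.
    gradient = [(m - xi) / v for m, xi, v in zip(means, X, variances)]
    return precision_matrix(), gradient


result = generate(first_intermediate, second_intermediate, solve_diagonal, n=3)
```

In this toy the first pass already lands on the per-frame means, so the second pass produces a zero correction and the loop stops; with a variance term included, later passes would keep adjusting the series.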

In this way, the feature parameter generation device 100 repeats a series of operations in which the correction data generation unit 120 generates correction series data from the update intermediate data series and the data correction unit 130 uses that correction series data to correct the already corrected feature parameter time-series data. Because the correction data generation unit 120 operates in both the initial correction and the repeated corrections of the feature parameter time-series data, circuits and software need not be duplicated, and the circuit and software scale can be reduced. The feature parameter time-series data generated in this way is output as, for example, speech.

The time-series distribution of the feature parameters may include the distribution of dynamic features of the feature parameter time series. In that case, the first intermediate data generation unit 110 generates the intermediate data series from the symbol information series so that the distribution of the feature parameters including the dynamic features attains maximum likelihood, and the second intermediate data generation unit 140 generates the intermediate data series from the symbol information series so that the likelihood of the distribution of the feature parameters including the dynamic features and of the variance of the feature parameter time-series data increases. Taking the dynamic features into account in this way improves the reproduction of the feature parameter time-series data.

The dynamic features are not limited to those defined by equations (2) and (3). It suffices that the inverse of the matrix in equation (12) can be calculated, and the number of dimensions of o_i is arbitrary. One such example is to use the result of a discrete cosine transform of a suitable subsequence of {x_i} as the elements of o_i.

The variance of the feature parameter time-series data also need not be the variance computed over the entire series. For example, for each time, the variance over only a fixed number of preceding and following frames can be calculated, and the average of these values used. In this case, determining the distribution of the variance from values calculated in the same way allows the feature parameter time-series data to be reproduced more accurately.
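The windowed variance described above can be sketched as follows. The function name `local_variance_mean` and the `half_window` argument are hypothetical, and truncating the window at the ends of the series is an assumption; the source does not say how the edges are handled.

```python
def local_variance_mean(xs, half_window):
    """Average, over all times, of the variance computed from the frames
    within `half_window` positions of each time (window clipped at the ends)."""
    n = len(xs)
    variances = []
    for t in range(n):
        window = xs[max(0, t - half_window): min(n, t + half_window + 1)]
        mean = sum(window) / len(window)
        variances.append(sum((v - mean) ** 2 for v in window) / len(window))
    return sum(variances) / n


local_value = local_variance_mean([0.0, 2.0, 0.0, 2.0], half_window=1)
```

The same function would be applied to the training data when estimating the distribution of the variance, so that the modeled quantity and the generated quantity are computed identically.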

The embodiment above generates time-series data of one-dimensional values, but the same approach applies when it is extended to multi-dimensional values (vectors). In the embodiment above, the processing of each unit can also be realized by a program that a computer executes.

100 feature parameter generation device
110 first intermediate data generation unit
120 correction data generation unit
130 data correction unit
140 second intermediate data generation unit
150 processing control unit

Claims

A feature parameter generation device for reproducing time-series data of feature parameters corresponding to a symbol information series, comprising:
a first intermediate data generation unit that generates, from the symbol information series, an initial intermediate data series composed of a first symmetric band matrix and a first vector so that the distribution of the feature parameters attains maximum likelihood;
a correction data generation unit that generates first correction series data from the initial intermediate data series by an operation equivalent to calculating the product of the inverse of the first symmetric band matrix and the first vector;
a data correction unit that corrects the time-series data of the feature parameters corresponding to the symbol information series using the first correction series data; and
a second intermediate data generation unit that generates, from the corrected time-series data of the feature parameters, an update intermediate data series composed of a second symmetric band matrix and a second vector so that the likelihood of the distribution of the feature parameters and of the variance of the time-series data of the feature parameters increases,
wherein the device repeats a series of operations in which the correction data generation unit generates second correction series data from the update intermediate data series by an operation equivalent to calculating the product of the inverse of the second symmetric band matrix and the second vector, and the data correction unit corrects the corrected time-series data of the feature parameters using the second correction series data.

The feature parameter generation device according to claim 1, further comprising a processing control unit that causes the corrected time-series data of the feature parameters to be corrected repeatedly until a predetermined condition is satisfied.

The feature parameter generation device according to claim 1, wherein the time-series distribution of the feature parameters includes a distribution of dynamic features of the feature parameter time series, the first intermediate data generation unit generates the intermediate data series from the symbol information series so that the distribution of the feature parameters including the dynamic features attains maximum likelihood, and the second intermediate data generation unit generates the intermediate data series from the symbol information series so that the likelihood of the distribution of the feature parameters including the dynamic features and of the variance of the time-series data of the feature parameters increases.

The feature parameter generation device according to claim 1, wherein the second intermediate data generation unit generates the update intermediate data series using a weighted sum of the log likelihood of the distribution of the feature parameters and the log likelihood of the variance of the time-series data of the feature parameters.

The feature parameter generation device according to claim 4, wherein the second intermediate data generation unit generates the update intermediate data series under the assumption that the likelihood of the variance of the time-series data of the feature parameters does not change before and after the correction.

The feature parameter generation device according to any one of claims 1 to 5, wherein speech synthesis information describing the types of unit speech contained in a sequence of unit speech is input as the symbol information series, and a synthesized speech waveform is generated as the corrected time-series data of the feature parameters.

A feature parameter generation method, performed by a computer, for reproducing time series data of feature parameters corresponding to a symbol information sequence, the method comprising:
generating, from the symbol information sequence, an initial intermediate data sequence consisting of a first symmetric band matrix and a first vector so that the distribution of the feature parameters has maximum likelihood;
generating first correction series data from the initial intermediate data sequence by an operation equivalent to calculating the product of the inverse of the first symmetric band matrix and the first vector;
correcting the time series data of the feature parameters corresponding to the symbol information sequence using the first correction series data;
generating, from the corrected time series data of the feature parameters, an intermediate data sequence for update consisting of a second symmetric band matrix and a second vector, with respect to the distribution of the feature parameters and the variance of the time series data of the feature parameters; and
generating second correction series data from the intermediate data sequence for update by an operation equivalent to calculating the product of the inverse of the second symmetric band matrix and the second vector, and correcting the corrected time series data of the feature parameters using the second correction series data, wherein this series of steps is repeated.

A feature parameter generation program for causing a computer to reproduce time series data of feature parameters corresponding to a symbol information sequence, the program causing the computer to execute:
a process of generating, from the symbol information sequence, an initial intermediate data sequence consisting of a first symmetric band matrix and a first vector so that the distribution of the feature parameters has maximum likelihood;
a process of generating first correction series data from the initial intermediate data sequence by an operation equivalent to calculating the product of the inverse of the first symmetric band matrix and the first vector;
a process of correcting the time series data of the feature parameters corresponding to the symbol information sequence using the first correction series data;
a process of generating, from the corrected time series data of the feature parameters, an intermediate data sequence for update consisting of a second symmetric band matrix and a second vector, with respect to the distribution of the feature parameters and the variance of the time series data of the feature parameters; and
a process of generating second correction series data from the intermediate data sequence for update by an operation equivalent to calculating the product of the inverse of the second symmetric band matrix and the second vector, and correcting the corrected time series data of the feature parameters using the second correction series data, wherein this series of processes is repeated.
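
The sequence of steps recited in the method and program claims can be illustrated with a small numerical sketch. It is only one possible reading under stated assumptions and is not the implementation described in this publication. The assumptions are: a single one-dimensional feature stream, diagonal precisions, a delta window of ±0.5, a Gaussian model for the variance of the time series, a diagonal Gauss-Newton approximation of the variance term so that the second matrix remains a symmetric band matrix, and a simple step-halving safeguard. All function and parameter names (window_matrix, intermediate_data, gv, objective, generate, gv_mu, gv_prec, w) are hypothetical.

```python
import numpy as np


def window_matrix(T):
    """Stack static and delta windows into a (2T x T) matrix W (assumed windows)."""
    delta = np.zeros((T, T))
    for t in range(T):
        delta[t, min(t + 1, T - 1)] += 0.5   # +0.5 * c[t+1], clamped at the end
        delta[t, max(t - 1, 0)] -= 0.5       # -0.5 * c[t-1], clamped at the start
    return np.vstack([np.eye(T), delta])


def intermediate_data(W, mu, prec):
    """First symmetric band matrix R = W' P W and first vector r = W' P mu."""
    return W.T @ (prec[:, None] * W), W.T @ (prec * mu)


def gv(c):
    """Variance of the trajectory over time."""
    return np.mean((c - c.mean()) ** 2)


def objective(c, W, mu, prec, gv_mu, gv_prec, w):
    """Weighted sum of the two log likelihoods, up to additive constants."""
    e = W @ c - mu
    return -0.5 * w * (e @ (prec * e)) - 0.5 * gv_prec * (gv(c) - gv_mu) ** 2


def generate(mu, prec, gv_mu, gv_prec, w=1.0, max_iter=50, tol=1e-10):
    T = mu.size // 2
    W = window_matrix(T)
    R, r = intermediate_data(W, mu, prec)
    # First correction: solving R d = r gives the same result as inv(R) @ r.
    c = np.zeros(T) + np.linalg.solve(R, r)
    c_ml = c.copy()
    args = (W, mu, prec, gv_mu, gv_prec, w)
    for _ in range(max_iter):
        dev = c - c.mean()
        # Gradient of the variance term and its diagonal Gauss-Newton curvature.
        g_gv = -gv_prec * (gv(c) - gv_mu) * (2.0 / T) * dev
        h_gv = gv_prec * (2.0 / T) ** 2 * dev ** 2
        A = w * R + np.diag(h_gv)         # second symmetric band matrix
        b = w * (r - R @ c) + g_gv        # second vector
        d = np.linalg.solve(A, b)         # second correction series data
        f0, step = objective(c, *args), 1.0
        while objective(c + step * d, *args) <= f0 and step > 1e-8:
            step *= 0.5                   # halve the step until the objective rises
        f1 = objective(c + step * d, *args)
        if f1 <= f0:
            break                         # no improving step found
        c = c + step * d
        if f1 - f0 < tol:
            break                         # the predetermined stopping condition
    return c_ml, c
```

The sketch uses a dense solver for brevity. Because both matrices are symmetric, positive definite, and banded, a banded Cholesky solver such as scipy.linalg.solveh_banded could be substituted so that each correction costs time proportional to the number of frames.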