JP3068250B2

JP3068250B2 - Speech synthesizer

Info

Publication number: JP3068250B2
Application number: JP3180668A
Authority: JP
Inventors: 潤亀谷; 世光友竹
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1991-07-22
Filing date: 1991-07-22
Publication date: 2000-07-24
Anticipated expiration: 2015-07-24
Also published as: JPH0527791A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesizer that uses a rule-based synthesis method to synthesize, frame by frame, a plurality of speech information parameters, including spectrum information analyzed in advance for each frame, and thereby produce speech.

[0002]

[Prior Art] In a conventional speech synthesizer of this type, when speech is synthesized from speech information parameters obtained by analyzing a sentence in frames of a fixed time length, synthesis is performed at each fixed frame interval from parameters such as spectrum information and a residual (pulse). To produce high-speed speech, such a synthesizer classifies the spectrum information of each frame as voiced or unvoiced, or as vowel or consonant, and thins out the frames judged to be voiced or vowel (see, for example, the speech synthesizer described in the specification of Japanese Patent Application No. Hei 2-58609).

[0003]

[Problem to Be Solved by the Invention] Such a conventional speech synthesizer thins out voiced or vowel frames according to a fixed criterion, and can therefore synthesize speech only at one fixed utterance speed.

[0004]

[Means for Solving the Problem] The speech synthesizer of the present invention edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, and comprises: prediction gain calculation means for the spectrum information; selecting means for selecting one threshold from among a plurality of predetermined thresholds according to an externally supplied speed instruction; and control means for comparing the prediction gain with the threshold and causing frame thinning to be performed when the prediction gain is larger than the threshold.

[0005]

[Embodiment] Next, the present invention will be described with reference to the drawings.

[0006] FIG. 1 is a block diagram of one embodiment of the present invention. In this embodiment, the spectrum information is obtained by the partial autocorrelation (PARCOR) method. The average prediction residual signal power (P_e) within a frame is expressed by Equation (1) using the partial autocorrelation coefficients (k_i), which are one representation of speech spectrum information.

[0007]

$$P_e = P_0 \prod_{i=1}^{P} \left(1 - k_i^2\right) \qquad (1)$$

[0008] The prediction gain (P_g) is expressed by Equation (2).

[0009]

$$P_g = P_0 - P_e \qquad (2)$$

Here, P_0 is the average power of the input speech. The order P of the partial autocorrelation coefficients is usually chosen to be about 10.
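As a minimal sketch of Equation (2), the prediction gain of one frame can be computed from its PARCOR coefficients. The sketch assumes that Equation (1) has the standard PARCOR form P_e = P_0 ∏(1 − k_i²); the function names `residual_power` and `prediction_gain` are hypothetical and do not come from the patent.

```python
from math import prod


def residual_power(p0, k):
    """Average prediction residual power P_e.

    Assumption: Equation (1) has the standard PARCOR form
    P_e = P_0 * prod(1 - k_i^2) over the P coefficients.
    """
    return p0 * prod(1.0 - ki * ki for ki in k)


def prediction_gain(p0, k):
    """Prediction gain P_g = P_0 - P_e, as in Equation (2)."""
    return p0 - residual_power(p0, k)
```

Under that assumption, a frame whose coefficients are all zero yields a gain of zero, and a frame with any coefficient of magnitude 1 yields a gain equal to P_0.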

[0010] When the input speech is a periodic wave, such as the steady-state part of a vowel, the partial autocorrelation coefficients k_i approach 1, so, as Equations (1) and (2) show, the prediction gain (P_g) takes a value close to P_0. When the input speech is an aperiodic wave, such as a consonant, the partial autocorrelation coefficients k_i approach 0, so the prediction gain (P_g) takes a very small value.
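The two limiting cases can be written out explicitly. This is a sketch that assumes Equation (1) has the standard PARCOR form P_e = P_0 ∏(1 − k_i²), which is consistent with the symbols P_0, P, and k_i defined in the text:

```latex
\begin{align*}
P_g &= P_0 - P_e
     = P_0\Bigl[\,1 - \prod_{i=1}^{P}\bigl(1 - k_i^2\bigr)\Bigr] \\[4pt]
|k_i| \to 1 \text{ for some } i
    &\;\Rightarrow\; \prod_{i=1}^{P}\bigl(1 - k_i^2\bigr) \to 0
    \;\Rightarrow\; P_g \to P_0
    && \text{(periodic, vowel-like frame)} \\[4pt]
k_i \to 0 \text{ for all } i
    &\;\Rightarrow\; \prod_{i=1}^{P}\bigl(1 - k_i^2\bigr) \to 1
    \;\Rightarrow\; P_g \to 0
    && \text{(aperiodic, consonant-like frame)}
\end{align*}
```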

[0011] Accordingly, vowel frames and consonant frames can be distinguished by comparing the prediction gain (P_g) with a threshold. Moreover, by varying the threshold up to P_0, progressively more stable vowel frames can be detected in a continuous manner. Using this method, a plurality of thresholds corresponding to utterance speeds are learned in advance while the vowel-frame decision threshold is varied, and a threshold table is created from them.
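The text does not say how the thresholds are learned. One hypothetical procedure, sketched below, takes the prediction gains of a set of training frames and, for each target speed factor, picks the threshold that thins roughly the fraction 1 − 1/speed of the frames. The sketch follows the comparison direction stated in paragraph [0004] (a frame is thinned when its gain exceeds the threshold). The function name, the speed-factor convention, and the procedure itself are assumptions.

```python
def learn_threshold_table(gains, speeds):
    """Build a {speed factor: threshold} table from training-frame gains.

    Assumptions (not from the patent): a speed factor s means the output
    should keep about 1/s of the frames, and a frame is thinned when its
    prediction gain is strictly greater than the threshold.
    """
    ranked = sorted(gains, reverse=True)   # highest gain first
    n = len(ranked)
    table = {}
    for s in speeds:
        # Number of frames to thin for this speed, clamped to a valid index.
        n_thin = min(round(n * (1.0 - 1.0 / s)), n - 1)
        # Frames ranked above this entry have a larger gain and are thinned.
        table[s] = ranked[n_thin]
    return table
```

With ten training frames whose gains are the integers 1 to 10, a speed factor of 2.0 gives a threshold of 5, so the five frames with gains 6 to 10 would be thinned.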

[0012] In FIG. 1, the speech data needed for synthesis are first sent from the speech file 1 to the speech memory 2 and stored there. Under timing control by the frame control circuit 10, the speech memory 2 transfers the spectrum information, one frame at a time, to the prediction gain calculator 3 and the buffer memory 6, and transfers the residual to the buffer memory 7.

[0013] The prediction gain calculator 3 calculates the prediction gain, and the decision unit 4 compares it with a threshold from the threshold table 14, which stores thresholds determined in advance by learning. The threshold control circuit 13 decides which threshold to use according to the speed parameter supplied from the host CPU 12.
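The role of the threshold control circuit 13 can be sketched as a table lookup keyed by the speed parameter. The table layout (a mapping from speed factor to threshold), the nearest-entry rule, and the name `select_threshold` are assumptions made for illustration; the patent only states that one of several pre-learned thresholds is chosen according to the speed parameter.

```python
def select_threshold(table, speed):
    """Return the threshold stored for the tabulated speed nearest to `speed`.

    `table` maps a speed factor to a pre-learned threshold (hypothetical layout).
    """
    nearest = min(table, key=lambda s: abs(s - speed))
    return table[nearest]


# Hypothetical table contents, for illustration only.
example_table = {1.0: 10.0, 1.5: 7.0, 2.0: 5.0}
chosen = select_threshold(example_table, 1.4)
```

Choosing the nearest tabulated entry is one simple policy; interpolating between entries would be an alternative that the source neither confirms nor rules out.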

[0014] When the prediction gain is equal to or greater than the threshold, that is, for a frame judged not to be thinned out, the buffer control circuit 11 connected to the decision unit 4 controls the buffer memories 6 and 7 so that the data stored in them are sent to the synthesis filter 8, and the synthesis filter 8 synthesizes the speech and outputs it.

[0015] When the prediction gain is equal to or less than the threshold, the one frame of spectrum information and residual stored in the buffer memories 6 and 7 is discarded, and the data for the next frame are stored in the buffer memories 6 and 7. The data are discarded by temporarily suspending the synthesis filter 8. One frame is thereby thinned out.
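The per-frame decision can be sketched as a filter over a sequence of frames. Paragraph [0004] and the claim state that a frame is thinned when its prediction gain is larger than the threshold, whereas paragraphs [0014] and [0015] as worded describe the opposite direction, so the sketch exposes the direction as a flag and defaults to the claim's. The `Frame` type, the function names, and the assumed form of Equation (1), P_e = P_0 ∏(1 − k_i²), are all hypothetical; the synthesis filter itself is not modeled.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Sequence


@dataclass
class Frame:
    p0: float                  # average power of the input speech in the frame
    parcor: Sequence[float]    # PARCOR coefficients k_1 .. k_P (spectrum information)
    residual: Sequence[float]  # residual (pulse) data


def prediction_gain(frame: Frame) -> float:
    """P_g = P_0 - P_e, with P_e = P_0 * prod(1 - k_i^2) (assumed form)."""
    return frame.p0 - frame.p0 * prod(1.0 - k * k for k in frame.parcor)


def thin_frames(frames: Sequence[Frame], threshold: float,
                thin_above: bool = True) -> List[Frame]:
    """Return the frames that would be passed on to the synthesis filter.

    thin_above=True  : discard frames whose gain exceeds the threshold
                       (direction stated in paragraph [0004] and the claim).
    thin_above=False : discard frames whose gain is below the threshold
                       (direction as worded in paragraphs [0014] and [0015]).
    """
    kept = []
    for frame in frames:
        pg = prediction_gain(frame)
        thin = pg > threshold if thin_above else pg < threshold
        if not thin:
            kept.append(frame)  # frame is sent on for synthesis
    return kept
```

Raising or lowering `threshold` changes how many frames survive, which is how the selected threshold controls the utterance speed.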

[0016]

[Effect of the Invention] As described above, according to the present invention, a plurality of thresholds corresponding to utterance speeds are prepared in advance by learning, and the number of frames thinned out is controlled by changing the threshold according to the utterance speed. This makes possible variable control of the utterance speed, which was not possible with conventional speech synthesizers.

[Brief description of the drawings]

[FIG. 1] A block diagram of an embodiment of the present invention.

[Explanation of symbols]

1 Speech file
2 Speech memory
3 Prediction gain calculator
4 Decision unit
6, 7 Buffer memories
8 Synthesis filter
10 Frame control circuit
11 Buffer control circuit
12 Host CPU
13 Threshold control circuit
14 Threshold table

Continuation of the front page

(56) References: JP-A-64-9500 (JP, A); JP-A-61-122700 (JP, A); JP-A-1-93795 (JP, A); JP-A-63-234299 (JP, A); JP-A-63-199399 (JP, A); JP-A-59-082608 (JP, A); JP-A-62-102300 (JP, A); JP-A-3-206496 (JP, A); Japanese Patent No. 2758688 (JP, B2)

(58) Fields searched (Int. Cl.⁷, database name): G10L 11/00 - 13/08; G10L 19/00 - 21/06; JICST file (JOIS)

Claims

(57) [Claims]

1. A speech synthesizer that edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, the speech synthesizer comprising: prediction gain calculation means for the spectrum information; selecting means for selecting one threshold from among a plurality of predetermined thresholds according to an externally supplied speed instruction; and control means for comparing the prediction gain with the threshold and causing frame thinning to be performed when the prediction gain is larger than the threshold.