JP3264998B2

JP3264998B2 - Speech synthesizer

Info

Publication number: JP3264998B2
Application number: JP26099692A
Authority: JP
Inventors: 潤亀谷; 世光友竹
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1992-09-30
Filing date: 1992-09-30
Publication date: 2002-03-11
Anticipated expiration: 2017-03-11
Also published as: JPH06110496A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a speech synthesizer, and more particularly to a speech synthesizer, such as one using a rule-based synthesis method, that produces speech by synthesizing, frame by frame, a plurality of speech information parameters including spectrum information analyzed in advance for each frame.

[0002]

[Prior Art] Conventionally, when speech is synthesized from speech information parameters obtained by analyzing a sentence in frames of a fixed time length, synthesis is performed every fixed frame period from parameters such as spectrum information and a residual (pulse). When such a speech synthesizer produces high-speed speech, it uses the spectrum information to decide whether each frame is voiced or unvoiced and vowel or consonant, and achieves high-speed utterance by thinning out, at a constant rate per section, the frames judged to be voiced or vowel.
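The conventional thinning described above can be sketched as follows. This is a minimal illustration, not the prior-art circuit itself: the function name `thin_uniformly`, the `keep_every` parameter, and the choice to count vowel frames across the whole utterance are assumptions made for the example.

```python
def thin_uniformly(frames, is_vowel, keep_every=2):
    """Keep one out of every `keep_every` vowel frames; keep all other frames.

    `frames` and `is_vowel` are parallel lists. Both names are hypothetical.
    """
    kept = []
    vowel_count = 0
    for frame, vowel in zip(frames, is_vowel):
        if vowel:
            # Vowel frames are thinned at a constant rate, regardless of
            # whether the vowel is steady or changing.
            if vowel_count % keep_every == 0:
                kept.append(frame)
            vowel_count += 1
        else:
            # Consonant / unvoiced frames are always kept.
            kept.append(frame)
    return kept


frames = list(range(8))
flags = [False, True, True, True, True, False, True, True]
print(thin_uniformly(frames, flags))  # [0, 1, 3, 5, 6]
```

Because the rule looks only at the vowel flag, every frame flagged as a vowel is subject to the same thinning rate.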

[0003] Referring to FIG. 3, the conventional speech synthesizer, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, comprises: a speech memory 2 that stores speech data a needed for synthesis from a speech file 1 and, in response to a control signal d, outputs spectrum information b one frame at a time and outputs a residual c; a prediction gain calculator 3 that calculates the prediction gain of the spectrum information b from the speech memory 2; a determiner 4 that compares the calculated prediction gain from the prediction gain calculator 3 with a prediction gain threshold from a threshold memory 5; a buffer memory 6 that stores the spectrum information b from the speech memory 2; a buffer memory 7 that stores the residual c from the speech memory 2; a buffer control circuit 11 that outputs control signals according to the determination output of the determiner 4 to control the buffer memory 6 and the buffer memory 7; a synthesis filter 8 that synthesizes the output of the buffer memory 6 and the output of the buffer memory 7; and a frame control circuit 10 that outputs the control signal d for frame thinning to control the speech memory 2.

[0004]

[Problem to Be Solved by the Invention] In this conventional speech synthesizer, voiced or vowel frames are thinned out according to a single criterion, so every section judged to be a vowel is thinned uniformly. Depending on the threshold setting or the words being uttered, almost all vowel frames may be thinned out, and sound quality deteriorates.

[0005]

[Means for Solving the Problem] A speech synthesizer according to the present invention, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, has prediction gain calculation means for calculating a prediction gain of the spectrum information and control means for controlling thinning of the frames, and thins out a frame when the prediction gain is smaller than a prediction gain threshold and the inter-frame change in the prediction gain is small.

[0006] Also, a speech synthesizer according to the present invention, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, has prediction gain calculation means for calculating a prediction gain of the spectrum information, pitch period calculation means for calculating a pitch period of the spectrum information, and control means for controlling thinning of the frames, and thins out a frame when the prediction gain is smaller than a prediction gain threshold and the pitch period of the spectrum information is stable.

[0007] Further, a speech synthesizer according to the present invention, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, comprises: a speech memory that stores speech data needed for synthesis from a speech file and, in response to a first control signal, outputs spectrum information frame by frame and outputs a residual; prediction gain calculation means for calculating a prediction gain of the spectrum information from the speech memory; prediction gain change amount calculation means for calculating the amount of change in the prediction gain from the prediction gain calculation means; determination means for comparing the calculated prediction gain from the prediction gain calculation means with a prediction gain threshold and for comparing the amount of change in the prediction gain from the prediction gain change amount calculation means with a prediction gain change amount threshold; a first buffer memory that stores the spectrum information from the speech memory; a second buffer memory that stores the residual from the speech memory; buffer control means that, according to the determination output of the determination means, outputs a second control signal to control the first buffer memory and a third control signal to control the second buffer memory; a synthesis filter that synthesizes the output of the first buffer memory and the output of the second buffer memory; and frame control means that outputs the first control signal for thinning the frames to control the speech memory.

[0008] Still further, a speech synthesizer according to the present invention, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, comprises: a speech memory that stores speech data needed for synthesis from a speech file and, in response to a first control signal, outputs spectrum information frame by frame and outputs a residual; prediction gain calculation means for calculating a prediction gain of the spectrum information from the speech memory; pitch period calculation means for calculating a pitch period of the spectrum information from the speech memory; determination means for comparing the calculated prediction gain from the prediction gain calculation means with a prediction gain threshold and for comparing the amount of change in the pitch period from the pitch period calculation means with a pitch period change amount threshold; a first buffer memory that stores the spectrum information from the speech memory; a second buffer memory that stores the residual from the speech memory; buffer control means that, according to the determination output of the determination means, outputs a second control signal to control the first buffer memory and a third control signal to control the second buffer memory; a synthesis filter that synthesizes the output of the first buffer memory and the output of the second buffer memory; and frame control means that outputs the first control signal for thinning the frames to control the speech memory.

[0009]

[Embodiments] Taking the partial autocorrelation (PARCOR) method as an example of spectrum information, the average prediction residual signal power Pe within a frame is expressed as in equation (1) using the partial autocorrelation coefficients ki, which are one way of representing speech spectrum information. Prediction gain is a common term in the field of speech coding and is defined as follows: "the ratio of the energy of the input signal to the energy of the prediction residual is the prediction gain" (see "Speech Coding" by Takehiro Moriya, published by the Institute of Electronics, Information and Communication Engineers, p. 23). In the present invention, the prediction gain calculated from the spectrum information (here, the partial autocorrelation coefficients ki) is hereinafter written as the "prediction gain of ki", and the "prediction gain of ki" denotes the "normalized prediction gain Pg". Because the partial autocorrelation coefficients ki are the coefficients used to express the prediction gain, the normalized prediction gain Pg can be calculated from the partial autocorrelation coefficients ki, which are one form of spectrum information. Accordingly, the prediction gain of ki denotes the normalized prediction gain Pg calculated from the spectrum information, that is, from the partial autocorrelation coefficients ki, and this normalized prediction gain Pg is defined here as in equation (2).

[0010] Here, P0 denotes the average power of the input speech. The order p of the partial autocorrelation coefficients ki is usually chosen to be about 10.
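Equations (1) and (2) are not reproduced in this text. The following is a minimal sketch under two stated assumptions: that equation (1) is the standard PARCOR relation Pe = P0 * product over i = 1..p of (1 - ki^2), and that the normalized prediction gain is the ratio Pg = Pe / P0. The second assumption is chosen because it gives values near 0 for strongly predictable (steady vowel) frames and near 1 for weakly predictable (consonant) frames, which matches how the description uses Pg. The function name and the sample coefficient sets are hypothetical.

```python
from math import prod


def normalized_prediction_gain(parcor):
    """Return Pg = Pe / P0 = prod(1 - k_i^2) over the PARCOR coefficients.

    Assumed form; the patent's own equation (2) is not available here.
    """
    for k in parcor:
        if not -1.0 < k < 1.0:
            raise ValueError("PARCOR coefficients must lie strictly between -1 and 1")
    return prod(1.0 - k * k for k in parcor)


# Illustrative coefficient sets of order p = 10 (made-up values).
vowel_like = [0.95, -0.80, 0.60, -0.30, 0.20, 0.10, 0.05, 0.0, 0.0, 0.0]
consonant_like = [0.10, -0.05, 0.08, 0.02, -0.03, 0.01, 0.0, 0.0, 0.0, 0.0]

print(normalized_prediction_gain(vowel_like))      # small, close to 0
print(normalized_prediction_gain(consonant_like))  # close to 1
```

Under this form Pg always lies in the interval (0, 1], so a single fixed threshold can be applied to every frame without knowing the input power P0.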

[0011] In this case, when the input speech is a periodic wave such as the steady part of a vowel, the partial autocorrelation coefficients ki generally take comparatively stable values, and the normalized prediction gain Pg, which can be expressed in terms of ki, is likewise stable and takes a value close to 0. When the input speech is a non-periodic wave such as a consonant, the values of the partial autocorrelation coefficients ki vary, and the normalized prediction gain Pg takes a value close to 1.

[0012] From the above, vowel frames can be detected by comparing the value of the normalized prediction gain Pg with a threshold.

[0013] In general, a steady vowel frame has a stable normalized prediction gain Pg. Stable vowel frames can therefore be found by looking for frames in which the normalized prediction gain Pg is at or below the threshold and changes little (that is, where Pg is both small and stable).
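The two conditions above (small Pg, little change in Pg) can be sketched as a per-frame test. This is an illustration under assumptions: the change is measured as the absolute difference from the previous frame, and the function name and threshold values are hypothetical.

```python
def stable_vowel_flags(pg_values, gain_threshold, change_threshold):
    """Flag frames whose Pg is small and close to the previous frame's Pg."""
    flags = []
    previous = None
    for pg in pg_values:
        small = pg <= gain_threshold
        # The first frame has no predecessor, so it is never flagged as stable.
        steady = previous is not None and abs(pg - previous) <= change_threshold
        flags.append(small and steady)
        previous = pg
    return flags


# Made-up Pg track: consonant, transition, steady vowel, transition, consonant.
pg_track = [0.90, 0.40, 0.05, 0.04, 0.05, 0.30, 0.85]
print(stable_vowel_flags(pg_track, gain_threshold=0.10, change_threshold=0.02))
```

In this example the third frame already has a small Pg but is not flagged, because Pg has just dropped sharply; only the frames inside the steady part of the vowel are flagged.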

[0014] The pitch period can be detected by a method that searches for peaks in the spectrum information or the residual and calculates the period from them.
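The description does not specify the peak search. As one possible sketch, the pitch period can be estimated from the residual by picking the lag with the largest autocorrelation; this autocorrelation search is a substituted, commonly used technique, not a method stated in the source, and the function name, lag range, and synthetic residual are assumptions.

```python
def estimate_pitch_period(residual, min_lag, max_lag):
    """Return the lag (in samples) that maximizes the residual's autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the residual with a copy of itself delayed by `lag` samples.
        score = sum(residual[n] * residual[n - lag] for n in range(lag, len(residual)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic pulse-train residual with one pulse every 40 samples.
residual = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]
print(estimate_pitch_period(residual, min_lag=20, max_lag=80))  # 40
```

Restricting the search to a plausible lag range keeps the zero-lag peak out of the search; when a multiple of the true period ties with it, the strict comparison keeps the smaller lag.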

[0015] Next, the present invention will be described with reference to the drawings. Referring to FIG. 1, which shows a first embodiment of the speech synthesizer of the present invention, the speech synthesizer, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, comprises: a speech memory 2 that stores speech data a needed for synthesis from a speech file 1 and, in response to a control signal d, outputs spectrum information b one frame at a time and outputs a residual c; a prediction gain calculator 3 that calculates the prediction gain of the spectrum information b from the speech memory 2; a change amount calculator 12 that calculates the amount of change in the prediction gain from the prediction gain calculator 3; a determiner 4 that compares the calculated prediction gain from the prediction gain calculator 3 with a prediction gain threshold from a threshold memory 5 and compares the amount of change in the prediction gain from the change amount calculator 12 with a prediction gain change amount threshold from the threshold memory 5; a buffer memory 6 that stores the spectrum information b from the speech memory 2; a buffer memory 7 that stores the residual c from the speech memory 2; a buffer control circuit 11 that outputs control signals according to the determination output of the determiner 4 to control the buffer memory 6 and the buffer memory 7; a synthesis filter 8 that synthesizes the output of the buffer memory 6 and the output of the buffer memory 7; and a frame control circuit 10 that outputs the control signal d for frame thinning to control the speech memory 2.

[0016] In more detail, in a residual-driven speech synthesizer that stores spectrum information and sound source information separately and synthesizes speech from them, the speech data a needed for synthesis is first loaded from the speech file 1 into the speech memory 2. Under the control of the control signal d from the frame control circuit 10, the speech memory 2 transfers the spectrum information b one frame at a time to the prediction gain calculator 3 and the buffer memory 6, and transfers the residual c to the buffer memory 7.

[0017] The prediction gain calculator 3 calculates the prediction gain Pg from the spectrum information b and sends the result to the determiner 4 and the change amount calculator 12. The determiner 4 compares the calculated prediction gain Pg with the threshold from the threshold memory 5. At this time, the change amount calculator 12 stores the calculated prediction gain Pg in a ring buffer or the like that can hold the results for several frames.
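A ring buffer holding the last few Pg values can be sketched with a bounded `collections.deque`, which drops the oldest entry automatically once it is full. The buffer size of 4 and the function name are assumptions; the description only says "several frames".

```python
from collections import deque


def recent_gains(pg_stream, size=4):
    """Feed Pg values through a ring buffer and return its final contents."""
    history = deque(maxlen=size)  # oldest value is discarded when full
    for pg in pg_stream:
        history.append(pg)
    return list(history)


print(recent_gains([0.9, 0.5, 0.1, 0.08, 0.07, 0.06]))  # [0.1, 0.08, 0.07, 0.06]
```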

[0018] Here, when the calculated prediction gain Pg is equal to or greater than the preset threshold, that is, for a frame judged not to be thinned out, the buffer control circuit 11 connected to the determiner 4 controls the buffer memory 6 and the buffer memory 7 so that the data accumulated in the buffer memory 7 is sent to the synthesis filter 8. The synthesis filter 8 performs speech synthesis and outputs the result through a speech output terminal 9.

[0019] When the calculated prediction gain Pg is equal to or smaller than the preset threshold, the frame is judged to be a vowel frame and becomes a candidate for thinning. The change amount calculator 12 then calculates the average amount of change in the prediction gain Pg values stored in the ring buffer and sends it to the determiner 4. When the average amount of change in the prediction gain Pg calculated by the change amount calculator 12 is equal to or smaller than a preset amount of change, the determiner 4 causes the one frame of spectrum information b and residual c accumulated in the buffer memory 6 and the buffer memory 7 to be discarded, and the data for the next frame is accumulated in the buffer memory 6 and the buffer memory 7. The residual c is discarded by temporarily suspending speech synthesis in the synthesis filter 8. Frames are thinned out in this way.
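The decision flow of the first embodiment can be sketched in software as below. This is a hedged illustration, not the circuit of FIG. 1: the function name, the tuple layout of a frame, the history size, and the use of the mean absolute frame-to-frame difference as the "average amount of change" are all assumptions. The candidate test uses a strict "smaller than" comparison, following the wording of claim 1.

```python
from collections import deque


def thin_frames(frames, gain_threshold, change_threshold, history_size=3):
    """Return the (spectrum, residual) pairs that would be passed to synthesis.

    `frames` is a list of (pg, spectrum, residual) tuples (hypothetical layout).
    """
    history = deque(maxlen=history_size)  # plays the role of the ring buffer
    kept = []
    for pg, spectrum, residual in frames:
        history.append(pg)
        values = list(history)
        diffs = [abs(b - a) for a, b in zip(values, values[1:])]
        mean_change = sum(diffs) / len(diffs) if diffs else None

        candidate = pg < gain_threshold  # judged to be a vowel frame
        stable = mean_change is not None and mean_change <= change_threshold
        if candidate and stable:
            continue  # discard this frame's spectrum and residual
        kept.append((spectrum, residual))
    return kept


pgs = [0.9, 0.05, 0.05, 0.05, 0.05, 0.8]
frames = [(pg, f"s{i}", f"r{i}") for i, pg in enumerate(pgs)]
print(thin_frames(frames, gain_threshold=0.1, change_threshold=0.02))
```

In this example the first two vowel frames are kept because the history still contains the consonant-to-vowel transition; only the frames in the settled part of the vowel are dropped.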

[0020] Referring to FIG. 2, which shows a second embodiment of the speech synthesizer of the present invention, the speech synthesizer, which edits and synthesizes frame by frame a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, comprises: a speech memory 2 that stores speech data a needed for synthesis from a speech file 1 and, in response to a control signal d, outputs spectrum information b one frame at a time and outputs a residual c; a prediction gain calculator 3 that calculates the prediction gain of the spectrum information b from the speech memory 2; a pitch period calculator 13 that calculates the pitch period of the spectrum information b from the speech memory 2; a determiner 4 that compares the calculated prediction gain from the prediction gain calculator 3 with a prediction gain threshold from a threshold memory 5 and compares the amount of change in the pitch period from the pitch period calculator 13 with a pitch period change amount threshold from the threshold memory 5; a buffer memory 6 that stores the spectrum information b from the speech memory 2; a buffer memory 7 that stores the residual c from the speech memory 2; a buffer control circuit 11 that outputs control signals according to the determination output of the determiner 4 to control the buffer memory 6 and the buffer memory 7; a synthesis filter 8 that synthesizes the output of the buffer memory 6 and the output of the buffer memory 7; and a frame control circuit 10 that outputs the control signal d for frame thinning to control the speech memory 2.

[0021] In the second embodiment, as in the first embodiment, the calculated prediction gain is compared with the threshold to decide whether the frame becomes a candidate for thinning. In this case, the pitch period is stored in a ring buffer or the like in the pitch period calculator 13. If the frame becomes a candidate for thinning, the pitch periods stored in the ring buffer are compared with a preset pitch period to determine whether the pitch period is nearly constant. When a frame is thinned out, the procedure is the same as in the first embodiment of FIG. 1.
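The second embodiment's test can be sketched as follows. The description leaves the exact stability measure open, so this example assumes that the pitch period counts as stable when no frame-to-frame change among the buffered periods exceeds a tolerance; the function names, the tolerance, and the sample values (periods in samples) are hypothetical.

```python
def pitch_is_stable(pitch_periods, max_change):
    """True when consecutive buffered pitch periods differ by at most `max_change`."""
    if len(pitch_periods) < 2:
        return False  # not enough history to judge stability
    changes = [abs(b - a) for a, b in zip(pitch_periods, pitch_periods[1:])]
    return max(changes) <= max_change


def should_thin(pg, gain_threshold, pitch_periods, max_change):
    """Thin a frame only if it is a vowel candidate and its pitch is stable."""
    return pg < gain_threshold and pitch_is_stable(pitch_periods, max_change)


print(should_thin(0.05, 0.1, [80, 81, 80], max_change=2))  # True
print(should_thin(0.05, 0.1, [80, 95, 70], max_change=2))  # False
```

A vowel whose pitch is gliding therefore stays intact even when its prediction gain is small.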

[0022]

[Effects of the Invention] As described above, according to the present invention, a prediction gain is calculated for each frame, frames in which the amount of change in the prediction gain is small or the pitch period is stable are accurately identified, and only the frames of steadily continuing vowels are thinned out. This enables high-speed utterance with little degradation in sound quality.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a speech synthesizer according to a first embodiment of the present invention.

FIG. 2 is a block diagram showing a speech synthesizer according to a second embodiment of the present invention.

FIG. 3 is a block diagram showing a conventional speech synthesizer.

[Explanation of symbols]

1 Speech file
2 Speech memory
3 Prediction gain calculator
4 Determiner
5 Threshold memory
6 Buffer memory
7 Buffer memory
8 Synthesis filter
9 Speech output terminal
10 Frame control circuit
11 Buffer control circuit
12 Change amount calculator
13 Pitch period calculator
a Speech data
b Spectrum information
c Residual
d Control signal
e Speech output

Continuation of the front page

(56) References: JP-A-3-259197 (JP, A); JP-A-61-290499 (JP, A); JP-A-62-102300 (JP, A); JP-A-63-234299 (JP, A); JP-A-4-273300 (JP, A); JP-A-5-27791 (JP, A)

(58) Fields searched (Int. Cl.⁷, DB name): G10L 13/00 - 13/08; G10L 21/04

Claims

(57) [Claims]

1. A speech synthesizer that edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, the speech synthesizer comprising: prediction gain calculation means for calculating a prediction gain of the spectrum information; and control means for controlling thinning of the frames, wherein a frame is thinned out when the prediction gain is smaller than a prediction gain threshold and the inter-frame change in the prediction gain is small.

2. A speech synthesizer that edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, the speech synthesizer comprising: prediction gain calculation means for calculating a prediction gain of the spectrum information; pitch period calculation means for calculating a pitch period of the spectrum information; and control means for controlling thinning of the frames, wherein a frame is thinned out when the prediction gain is smaller than a prediction gain threshold and the pitch period of the spectrum information is stable.

3. A speech synthesizer that edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, the speech synthesizer comprising: a speech memory that stores speech data needed for synthesis from a speech file and, in response to a first control signal, outputs spectrum information frame by frame and outputs a residual; prediction gain calculation means for calculating a prediction gain of the spectrum information from the speech memory; prediction gain change amount calculation means for calculating the amount of change in the prediction gain from the prediction gain calculation means; determination means for comparing the calculated prediction gain from the prediction gain calculation means with a prediction gain threshold and for comparing the amount of change in the prediction gain from the prediction gain change amount calculation means with a prediction gain change amount threshold; a first buffer memory that stores the spectrum information from the speech memory; a second buffer memory that stores the residual from the speech memory; buffer control means that, according to the determination output of the determination means, outputs a second control signal to control the first buffer memory and a third control signal to control the second buffer memory; a synthesis filter that synthesizes the output of the first buffer memory and the output of the second buffer memory; and frame control means that outputs the first control signal for thinning the frames to control the speech memory.

4. A speech synthesizer that edits and synthesizes, frame by frame, a plurality of speech information parameters including spectrum information analyzed for each frame of a fixed time length, the speech synthesizer comprising: a speech memory that stores speech data needed for synthesis from a speech file and, in response to a first control signal, outputs spectrum information frame by frame and outputs a residual; prediction gain calculation means for calculating a prediction gain of the spectrum information from the speech memory; pitch period calculation means for calculating a pitch period of the spectrum information from the speech memory; determination means for comparing the calculated prediction gain from the prediction gain calculation means with a prediction gain threshold and for comparing the amount of change in the pitch period from the pitch period calculation means with a pitch period change amount threshold; a first buffer memory that stores the spectrum information from the speech memory; a second buffer memory that stores the residual from the speech memory; buffer control means that, according to the determination output of the determination means, outputs a second control signal to control the first buffer memory and a third control signal to control the second buffer memory; a synthesis filter that synthesizes the output of the first buffer memory and the output of the second buffer memory; and frame control means that outputs the first control signal for thinning the frames to control the speech memory.