JPH05249999A

JPH05249999A - Learning type voice coding device

Info

Publication number: JPH05249999A
Application number: JP4278301A
Authority: JP
Inventors: Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1991-10-21
Filing date: 1992-10-16
Publication date: 1993-09-28

Abstract

PURPOSE: To provide a learning-type speech coding device capable of synthesizing higher-quality speech at a limited bit rate such as about 8 kbps or less. CONSTITUTION: The learning-type speech coding device comprises an adaptive codebook 110 in which drive signal vectors are stored; a minimum distortion search circuit 115 that searches the adaptive codebook 110 for an optimum drive signal vector with reference to an input speech signal; a synthesis filter 112 that synthesizes a speech signal using the optimum drive signal vector found by the search; a buffer 131 that accumulates information on the optimum drive signal vectors found; a training vector creation unit 132 that cuts the accumulated drive signal vector information into segments of a predetermined length to form training vectors; and a learning unit 133 that successively corrects the drive signal vectors in the codebook using the training vectors.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech coding device, and in particular to a learning-type speech coding device suited to coding a speech signal at a low bit rate of about 8 kbps or less.

[0002]

[Prior Art] Techniques for coding speech signals efficiently at low bit rates are important for making effective use of radio spectrum and reducing communication costs in mobile communications such as car telephones and in corporate in-house communications. The CELP (Code Excited Linear Prediction) method is known as a speech coding method that offers good quality at bit rates of 8 kbps or less.

[0003] Since it was presented by M. R. Schroeder and B. S. Atal of AT&T Bell Laboratories in "Code-Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. ICASSP, 1985, pp. 937-939 (Reference 1), the CELP method has attracted attention as a method capable of synthesizing high-quality speech, and various studies have been made on improving its quality, reducing its computational load, and so on. The distinguishing feature of the CELP method is that the drive signals of an LPC (Linear Predictive Coding) synthesis filter are stored in a codebook as drive signal vectors, and the optimum drive signal vector is searched for in the codebook while the error between the synthesized speech signal and the input speech signal is evaluated.

[0004] FIG. 9 is a block diagram of a speech coding device based on a recent CELP method. In the figure, a sampled speech signal sequence, which is the input signal, is input from an input terminal 600 in units of frames. A frame consists of L signal samples; when the sampling frequency is 8 kHz, L = 160 is generally used. Although not shown in FIG. 9, LPC analysis is performed on the input L-sample speech signal sequence before the drive signal vector search, and LPC prediction parameters {α_i, i = 1, 2, ..., p} are extracted. These LPC prediction parameters α_i are supplied to an LPC synthesis filter 630. Here p is the prediction order, and p = 10 is generally used. The transfer function H(z) of the LPC synthesis filter 630 is given by [Equation 1].

[0005]

[Equation 1]

[0006] Next, the process of searching for the optimum drive signal vector while synthesizing the speech signal is described. First, a subtractor 610 subtracts, from the one-frame speech signal input to the input terminal 600, the influence that the internal state of the synthesis filter 630 in the previous frame has on the current frame. The signal sequence obtained from the subtractor 610 is divided into four subframes, each of which becomes the target signal vector for that subframe.

[0007] The drive signal vector that is the input signal of the LPC synthesis filter 630 is obtained by adding, in an adder 660, the drive signal vector selected from an adaptive codebook 640 and multiplied by a predetermined gain in a multiplier 650, and the noise vector selected from a white noise codebook 710 and multiplied by a predetermined gain in a multiplier 720.

[0008] Here, the adaptive codebook 640 performs the pitch prediction analysis described in Reference 1 by closed-loop operation, or analysis by synthesis; details are given in W. B. Kleijn, D. J. Krasinski and R. H. Ketchum, "Improved Speech Quality and Efficient Vector Quantization in CELP," Proc. ICASSP, 1988, pp. 155-158 (Reference 2). According to Reference 2, the drive signal of the LPC synthesis filter 630 is delayed one sample at a time by a delay circuit 670 over the pitch search range a to b (a and b are sample numbers of the drive signal; typically a = 20 and b = 147), thereby creating drive signal vectors for pitch periods of a to b samples, and these are stored in the adaptive codebook as codewords.
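The construction of adaptive codebook entries from delayed excitation can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Reference 2: the function name `build_adaptive_codebook` is hypothetical, the vector length of 40 assumes a 160-sample frame split into four subframes, and lags shorter than the vector length are filled by repeating the delayed segment periodically, which is one common convention the text does not confirm.

```python
def build_adaptive_codebook(past_excitation, K=40, min_lag=20, max_lag=147):
    """Return one K-sample vector per pitch lag in [min_lag, max_lag].

    past_excitation: most recent excitation samples, oldest first.
    Assumption: for lag < K the delayed segment is repeated periodically.
    """
    if len(past_excitation) < max_lag:
        raise ValueError("need at least max_lag past excitation samples")
    codebook = []
    for lag in range(min_lag, max_lag + 1):
        # The excitation as it was `lag` samples ago.
        segment = list(past_excitation[-lag:])
        # n % lag equals n when lag >= K, so long lags are copied directly.
        codebook.append([segment[n % lag] for n in range(K)])
    return codebook
```

With the typical range a = 20 to b = 147 this yields 147 − 20 + 1 = 128 codewords, so an index into the codebook fits in 7 bits.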

[0009] When searching for the optimum drive signal vector, the drive signal vector codewords corresponding to the respective pitch periods are read from the adaptive codebook 640 one at a time and multiplied by a predetermined gain in the multiplier 650. The LPC synthesis filter 630 then performs the filter operation to generate a synthesized speech signal vector. A subtractor 620 takes the difference between the generated synthesized speech signal vector and the target signal vector. The output of the subtractor 620 passes through a perceptual weighting filter 680 to an error calculation circuit 690, where the mean square error is computed. The mean square error is in turn input to a minimum distortion search circuit 700, which detects its minimum value.

[0010] The above process is carried out for the codewords of all drive signal vectors in the adaptive codebook 640, and the minimum distortion search circuit 700 determines the number of the codeword that gives the minimum mean square error. The gain applied in the multiplier 650 is also determined so as to minimize the mean square error.

[0011] Next, the optimum white noise vector is searched for in the same way. That is, noise vector codewords are read from the white noise codebook 710 one at a time, multiplied by a gain in the multiplier 720, and filtered by the LPC synthesis filter 630; synthesized speech signal vectors are thus generated and their mean square error with respect to the target vector is calculated for all noise vectors. The number and gain of the noise vector giving the minimum mean square error are then determined. The perceptual weighting filter 680 shapes the spectrum of the error signal output from the subtractor 620 and is used to reduce the distortion perceived by human listeners.

[0012] Because the CELP method thus finds the optimum drive signal vector that minimizes the error between the synthesized speech signal and the input speech signal, it can synthesize high-quality speech even at a low bit rate of about 8 kbps. At bit rates of 8 kbps or less, however, it has been confirmed that the number of bits that can be allocated to coding the drive signal becomes insufficient, so that degradation in quality is perceived.

[0013]

[Problems to Be Solved by the Invention] As described above, the conventional CELP method can synthesize high-quality speech at bit rates of about 8 kbps or more, but at lower bit rates the number of bits allocated to coding the drive signal is insufficient and quality degradation is perceived, so the method is inadequate for practical use.

[0014] The present invention has been made in view of the above problem, and its object is to provide a learning-type speech coding device that can synthesize higher-quality speech at a limited bit rate such as about 8 kbps or less.

[0015]

[Means for Solving the Problems] To solve the above problem, the present invention is characterized by comprising: a codebook (adaptive codebook) in which drive signal vectors are stored as codewords; search means for searching the adaptive codebook for an optimum drive signal vector with reference to an input speech signal; a synthesis filter that synthesizes a speech signal using the optimum drive signal vector found by the search means; training vector creation means for creating a training vector using the optimum drive signal vector; and learning means for successively correcting the drive signal vectors in the codebook using the training vector created by the training vector creation means.

[0016]

[Operation] In the present invention, the optimum drive signal vector found in the adaptive codebook, that is, the drive signal vector that drove the synthesis filter and was actually used for coding, is used as a training vector to successively correct the drive signal vectors in the adaptive codebook, specifically those representative vectors among the drive signal vectors that are selected according to a predetermined criterion. This processing is performed in parallel with coding, each time a new drive signal vector is found.

[0017] Through this learning process of successive correction, the drive signal vectors in the adaptive codebook gradually change into vectors that can synthesize the speaker's speech more accurately. As a result, high-quality speech synthesis becomes possible even at a low bit rate of, for example, about 8 kbps or less.

[0018]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram of a learning-type speech coding device according to one embodiment of the present invention.

[0019] In FIG. 1, a speech signal sampled at a predetermined sampling frequency (for example, 8 kHz) is input to an input terminal 100 in units of frames. This input speech signal is first input to a frame buffer 101. The frame buffer 101 cuts the input speech signal sequence into units of L samples (for example, L = 160) and stores each unit as one frame of signal. The one-frame input speech signal from the frame buffer 101 is supplied to an LPC analysis circuit 102 and a weighting filter 106.
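The framing performed by the frame buffer can be sketched as follows. The function names are hypothetical, dropping a trailing partial frame is an assumption (the text does not say how an incomplete frame is handled), and the default of four subframes follows the subframe split described for the drive signal vector search.

```python
def split_into_frames(samples, frame_length=160):
    """Cut a sample sequence into consecutive full frames of frame_length samples.

    Assumption: a trailing partial frame is dropped.
    """
    n_frames = len(samples) // frame_length
    return [list(samples[i * frame_length:(i + 1) * frame_length])
            for i in range(n_frames)]


def split_into_subframes(frame, n_subframes=4):
    """Divide one frame into n_subframes equal-length subframes."""
    if len(frame) % n_subframes != 0:
        raise ValueError("frame length must be a multiple of n_subframes")
    k = len(frame) // n_subframes
    return [frame[i * k:(i + 1) * k] for i in range(n_subframes)]
```

At 8 kHz a 160-sample frame spans 20 ms, and each of the four subframes holds 40 samples (5 ms).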

[0020] The LPC analysis circuit 102 performs LPC (Linear Predictive Coding) analysis on the input speech signal, for example by the autocorrelation method, and extracts p LPC prediction coefficients {α_i, i = 1, 2, ..., p} or reflection coefficients {k_i, i = 1, 2, ..., p}. The extracted prediction coefficients or reflection coefficients are coded with a predetermined number of bits in a coding circuit 103 and then used in the weighting filter 106 and in weighting synthesis filters 107, 112 and 122.
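As an illustration of the autocorrelation method, the following sketch computes prediction and reflection coefficients with the Levinson-Durbin recursion. It is a generic textbook formulation, not the circuit of the patent: the function names are hypothetical, no analysis window is applied, and the sign convention (prediction x̂[n] = Σ a_i x[n−i]) is an assumption, since [Equation 1] is not reproduced in this text.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no windowing applied)."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Returns (prediction coefficients, reflection coefficients, residual energy).
    Assumed convention: x_hat[n] = sum_i a[i] * x[n - i].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    reflection = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Degenerate (e.g. all-zero) input: leave remaining coefficients at 0.
            reflection.append(0.0)
            continue
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        reflection.append(k)
        err *= (1.0 - k * k)
    return a[1:], reflection, err
```

For a frame of L = 160 samples and p = 10, a call would look like `levinson_durbin(autocorrelation(frame, 10), 10)`.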

[0021] The weighting filter 106 weights the input speech signal sequence when the drive signal vector of the synthesis filter is searched for in the adaptive codebook 110 and a noise codebook 120. The transfer function H(z) of the synthesis filter within the weighting synthesis filters 107, 112 and 122 is given by [Equation 1]. The transfer function W(z) of the weighting filter 106 is then expressed by [Equation 2].

[0022]

[Equation 2]

Here, γ is a parameter that controls the strength of the weighting (0 ≦ γ ≦ 1).
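Because [Equation 2] is not reproduced in this text, the following is only a sketch of one form widely used in CELP coders, W(z) = A(z)/A(z/γ), where A(z/γ) is obtained by scaling the i-th prediction coefficient by γ^i. Whether the patent uses exactly this form is an assumption, and the helper name `bandwidth_expand` is hypothetical.

```python
def bandwidth_expand(lpc_coeffs, gamma):
    """Scale the i-th prediction coefficient (i = 1..p) by gamma**i.

    Assumption: this yields the coefficients of A(z/gamma) in a weighting
    filter of the commonly used form W(z) = A(z) / A(z/gamma).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc_coeffs)]
```

Under the assumed form, γ = 1 leaves the coefficients unchanged so that W(z) = 1 (no weighting), while smaller γ makes the weighting stronger.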

[0023] The weighting synthesis filters 107, 112 and 122 are each a cascade connection of the synthesis filter with transfer function H(z) shown in [Equation 1] and the weighting filter with transfer function W(z); their transfer function H_w(z) is given by [Equation 3].

[0024]

[Equation 3]

[0025] Using the weighting filter 106 as in this embodiment makes it possible to reduce perceptual coding distortion. In addition, in this embodiment the weighting filter 106 is placed outside the drive signal vector search loop, which greatly reduces the amount of computation required for the search.

[0026] Furthermore, a weighting synthesis filter 107 having an initial memory is provided so that the weighting synthesis filters 112 and 122 do not affect the drive signal vector search. The weighting synthesis filter 107 has as its initial state the internal state that the weighting synthesis filters 112 and 122 held at the end of the previous frame.

[0027] A zero-input response vector of the weighting synthesis filter 107 is then created, and a subtractor 108 subtracts this zero-input response vector from the output of the weighting filter 106. This allows the initial states of the weighting synthesis filters 112 and 122 to be set to zero, so that the drive signal vector search can be carried out without considering the influence of the previous frame. All of the above processing is performed frame by frame. Next, the drive signal vector search, which is performed subframe by subframe after dividing the frame into M subframes (typically M = 4), is described.
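The zero-input response subtraction described in this paragraph can be sketched as follows. The sketch assumes an all-pole filter of the form y[n] = x[n] + Σ a_i y[n−i]; the actual sign convention and the coefficients of the weighting synthesis filter depend on [Equation 1] to [Equation 3], which are not reproduced here, and the function names are hypothetical.

```python
def zero_input_response(coeffs, past_outputs, length):
    """Output of an all-pole filter when the input is zero.

    coeffs: [a_1, ..., a_p], assuming y[n] = x[n] + sum_i a_i * y[n - i].
    past_outputs: [y[-1], y[-2], ..., y[-p]], the state left by the previous frame.
    """
    history = list(past_outputs)
    response = []
    for _ in range(length):
        y = sum(a * h for a, h in zip(coeffs, history))
        response.append(y)
        history = [y] + history[:-1]  # newest output first
    return response


def remove_previous_frame_influence(weighted_input, coeffs, past_outputs):
    """Subtract the zero-input response from the weighted input signal."""
    zir = zero_input_response(coeffs, past_outputs, len(weighted_input))
    return [w - z for w, z in zip(weighted_input, zir)]
```

Once this ringing has been removed from the target, every candidate vector can be filtered starting from an all-zero state, which is what lets the search ignore the previous frame.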

[0028] The search for the optimum drive signal vector is carried out first in the adaptive codebook 110 and then in the noise codebook 120. The adaptive codebook 110 stores 128 drive signal vectors of dimension K (K = L/M), so as to cover pitch periods from 20 to 147 samples. In the search, the drive signal vectors X_j designated by an index j, described later, are first read in sequence from the adaptive codebook 110; each X_j is multiplied by a predetermined gain β in a multiplier 111 and then supplied to the weighting synthesis filter 112. The weighting synthesis filter 112 applies its filtering operation to the gain-scaled drive signal vector to create a synthesized speech vector.

[0029] Meanwhile, the input speech signal read from the frame buffer 101 is weighted by the weighting filter 106, after which the influence of the previous frame is subtracted in the subtractor 108. With the speech signal vector Y output from the subtractor 108 as the target vector, a subtractor 113 calculates the error vector E_j between Y and the synthesized speech vector from the weighting synthesis filter 112. A squared error calculation circuit 114 then calculates the sum of squared errors ‖E_j‖, and a minimum distortion search circuit 115 detects the minimum value of ‖E_j‖ and the index j that gives it. This index j is supplied to the adaptive codebook 110 and a multiplexer 142.

[0030] Specifically, the error vector E_j is expressed, for example, by [Equation 4]. By partially differentiating ‖E_j‖ with respect to β and setting the result to zero, the minimum value of ‖E_j‖ when β is optimized is given by [Equation 5]. Here, β is the gain applied in the multiplier 111.

[0031]

[Equation 4]

[0032]

[Equation 5]

[0033] Here, ‖X‖ denotes the squared norm and (X, Y) the inner product, and H is the impulse response matrix of the weighting synthesis filter (transfer function H_w(z)), given by [Equation 6].
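The least-squares step behind [Equation 4] and [Equation 5] can be sketched as follows. This is a reconstruction under assumptions rather than the patent's own typesetting: it assumes the error has the usual CELP form E_j = Y − βHX_j, and it writes the squared Euclidean norm explicitly as ‖·‖², which the text abbreviates to ‖·‖.

```latex
\begin{aligned}
E_j &= Y - \beta H X_j \\
\|E_j\|^2 &= \|Y\|^2 - 2\beta\,(Y, H X_j) + \beta^2 \|H X_j\|^2 \\
\frac{\partial \|E_j\|^2}{\partial \beta} = 0
  \;&\Rightarrow\; \beta = \frac{(Y, H X_j)}{\|H X_j\|^2} \\
\min_{\beta} \|E_j\|^2 &= \|Y\|^2 - \frac{(Y, H X_j)^2}{\|H X_j\|^2}
\end{aligned}
```

Under this form the first term, ‖Y‖², does not depend on j, so only the second term varies from one codeword to another.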

[0034]

[Equation 6]

[0035] As is clear from [Equation 5], the drive signal vector search in the adaptive codebook 110 is carried out by calculating the second term on the right-hand side of [Equation 5] for every codeword X_j and detecting the index j that maximizes it.
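The search described here can be sketched as an exhaustive loop over the codebook. The sketch assumes that the term being maximized is (Y, HX_j)² / ‖HX_j‖² and that multiplying by H is equivalent to zero-state filtering with the truncated impulse response of the weighting synthesis filter; both are standard in CELP but are assumptions here because [Equation 5] and [Equation 6] are not reproduced. The function names are hypothetical.

```python
def filter_zero_state(impulse_response, x):
    """Zero-state filtering: out[n] = sum_k h[n - k] * x[k], truncated to len(x)."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n + 1):
            if n - k < len(impulse_response):
                acc += impulse_response[n - k] * x[k]
        out.append(acc)
    return out


def search_codebook(codebook, target, impulse_response):
    """Return (best index j, optimal gain) maximizing (Y, HX_j)^2 / |HX_j|^2."""
    best_index, best_score, best_gain = -1, -1.0, 0.0
    for j, x in enumerate(codebook):
        synth = filter_zero_state(impulse_response, x)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero synthesized vector cannot reduce the error
        corr = sum(t * s for t, s in zip(target, synth))
        score = corr * corr / energy
        if score > best_score:
            best_index, best_score, best_gain = j, score, corr / energy
    return best_index, best_gain
```

The same routine can serve for the noise codebook search by passing the residual target left after the adaptive codebook contribution has been subtracted.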

[0036] When the optimum drive signal vector X_opt has thus been found in the adaptive codebook 110, the subtractor 113 subtracts from the target vector Y the output of the weighting synthesis filter 112 corresponding to X_opt, and the output of the subtractor 113 becomes the target vector for the noise vector search in the noise codebook 120. The noise vector search in the noise codebook 120 can be carried out in exactly the same way as the drive signal vector search in the adaptive codebook 110. Letting N_opt be the code vector obtained by the search in the noise codebook 120, the drive signal vector X of the synthesis filter is expressed as

[0037]

[Equation 7]

where β and g are the gains applied in the multipliers 111 and 121 to the drive signal vector and the noise vector found in the adaptive codebook 110 and the noise codebook 120, respectively.
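Read together with the surrounding description, the combination that [Equation 7] refers to can plausibly be written as below. This is a reconstruction from the text (the gains β and g applied to X_opt and N_opt, followed by addition), not a copy of the original equation.

```latex
X = \beta\, X_{\mathrm{opt}} + g\, N_{\mathrm{opt}}
```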

[0038] The drive signal vector obtained in this way is concatenated with the drive signal vectors obtained in past subframes, delayed one sample at a time by a delay circuit 150 over 20 to 147 samples, and stored in the adaptive codebook 110 in units of K samples. Next, the configuration that is the gist of the present invention, in which the drive signal vectors in the noise codebook 120 are successively corrected by learning, is described. In FIG. 1, a training vector creation unit 162 and a learning unit 163 are provided for this learning.

[0039] When the search for a drive signal vector in the noise codebook 120 is completed for a given subframe, the optimum drive signal vector N_opt is output from the noise codebook 120. The training vector creation unit 162 sets this drive signal vector as the training vector V_t. The learning unit 163 uses the training vector from the training vector creation unit 162 to successively correct, by learning, the drive signal vectors stored in the noise codebook 120. This correction is performed in parallel with the coding process.

[0040] FIG. 2 shows the procedure of this learning. First, the training vector V_t from the training vector creation unit 162 is input (S1). Next, among the plurality of drive signal vectors stored in the noise codebook 120, the vectors to be corrected (updated) are selected (update region setting, S2). The update region is set by including in it the representative vectors that lie within a certain Euclidean distance of the training vector V_t. Here, the drive signal vectors in the noise codebook are referred to as representative vectors. The size of the update region is assumed to shrink with time. Denoting the update region at time i by NE(i), NE(i) is assumed to have the following property.

[0041]

[Equation 8]

[0042] Next, the representative vectors in the update region are updated (corrected) using the training vector V_t. A representative vector V_j(i) contained in the update region at time i is updated according to the following equation.

[0043]

[Equation 9]

Here, α(i) is a variable that controls the magnitude of the correction and has the following property.

[0044]

[Equation 10]

[0045] It is then determined whether the update has converged (S4), and the above updating is continued until it converges. Convergence is judged by whether the following expression is satisfied; if it is, the update is judged to have converged.

[0046]

[Equation 11]

[0047] This learning method is one of the neural network learning methods known as Kohonen's algorithm. Kohonen's algorithm is described in, for example, T. Kohonen, Self-Organization and Associative Memory, Springer-Verlag (1984) (Reference 3), so a detailed description is omitted. The learning method is not limited to this one, and other learning methods may be used.
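One update step of the learning procedure can be sketched as follows. Because [Equation 8] to [Equation 11] are not reproduced in this text, the sketch assumes the standard Kohonen rule V_j(i+1) = V_j(i) + α(i)·(V_t − V_j(i)) applied to every representative vector within a given Euclidean radius of the training vector; the function name `kohonen_update` and the way the radius and α are passed in are hypothetical, and the convergence test is left to the caller.

```python
import math


def kohonen_update(codebook, training_vector, radius, alpha):
    """Move every codeword within `radius` of the training vector toward it.

    Assumed rule: v_new = v + alpha * (training_vector - v).
    Returns (updated codebook, number of codewords that were updated).
    """
    updated = []
    changed = 0
    for v in codebook:
        if math.dist(v, training_vector) <= radius:
            v = [c + alpha * (t - c) for c, t in zip(v, training_vector)]
            changed += 1
        updated.append(list(v))
    return updated, changed
```

In use, the caller would shrink `radius` and `alpha` as time advances, in line with the stated behaviour that the update region becomes smaller with time.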

[0048] Through such learning, the drive signal vectors in the noise codebook 120 come to have properties statistically similar to those of the drive signal vectors used as training vectors. As described above, the drive signal of the synthesis filter is created so that the error between the input speech signal to be coded and the synthesized signal is minimized. Therefore, by learning with this drive signal and correcting the drive signal vectors in the noise codebook 120, a noise codebook is obtained that is suited to generating synthesized speech that differs little from the input speech, that is, has little distortion.

[0049] Moreover, because learning is performed in parallel with the speech coding process, the properties of the drive signal vectors in the noise codebook 120 change in response to changes in the properties of the input speech signal. As a result, high-quality speech can be synthesized even at a low coding bit rate such as 8 kbps or less, where few bits can be allocated to the drive signal.

[0050] In other words, the conventional CELP method always reproduces the speech signal using the same noise codebook regardless of changes in the properties of the input speech signal. In this embodiment, by contrast, the learning operation described above changes the drive signal vectors in the noise codebook so that the error of the synthesized signal with respect to the input speech signal becomes smaller. Consequently, for the same number of bits allocated to the drive signal, higher-quality synthesized speech is obtained.

[0051] The coding parameters obtained in the course of the above processing are multiplexed by the multiplexer 142 and sent from an output terminal 143 to the transmission path as the coded output. That is, the multiplexer 142 multiplexes: the code obtained by coding, in the coding circuit 103, the LPC prediction coefficient information obtained by the LPC analysis circuit 102; the code of the adaptive codebook 110 index obtained by the minimum distortion search circuit 115; the code obtained by coding, in a gain coding circuit 140, the gain applied in the multiplier 111; the code of the noise codebook 120 index obtained by a minimum distortion search circuit 125; and the code obtained by coding, in a gain coding circuit 141, the gain applied in the multiplier 121. Next, the configuration of a speech decoding device corresponding to the speech coding device of FIG. 1 is described with reference to FIG. 3.

[0052] In FIG. 3, the input coding parameters are first separated into individual parameters by a demultiplexer 201 and then decoded by decoders 202, 203 and 204, respectively. A drive signal is then created on the basis of the decoded adaptive codebook index and gain and the decoded noise codebook index and gain. This drive signal is filtered by a synthesis filter 215 to create a synthesized speech signal. The synthesized speech signal has its spectrum shaped by a post filter 216 to suppress perceptible distortion, and is then output from an output terminal 217.
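The decoder-side reconstruction of one subframe can be sketched as follows. The sketch assumes the drive signal is the gain-weighted sum of the two selected codebook vectors and that the synthesis filter is all-pole with y[n] = e[n] + Σ a_i y[n−i]; the function name is hypothetical, and the post filter is omitted.

```python
def synthesize_subframe(adaptive_vec, noise_vec, beta, g, lpc_coeffs, state):
    """Rebuild the drive signal and run it through an all-pole synthesis filter.

    state: [y[-1], y[-2], ..., y[-p]] carried over from the previous subframe.
    Returns (drive signal, synthesized samples, new filter state).
    Assumed filter convention: y[n] = e[n] + sum_i a_i * y[n - i].
    """
    excitation = [beta * a + g * n for a, n in zip(adaptive_vec, noise_vec)]
    history = list(state)
    output = []
    for e in excitation:
        y = e + sum(c * h for c, h in zip(lpc_coeffs, history))
        output.append(y)
        history = [y] + history[:-1]  # newest output first
    return excitation, output, history
```

The returned drive signal is the quantity that both coder and decoder have in common, so it is the natural input for any codebook learning that must stay synchronized without side information.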

[0053] In FIG. 3, a training vector creation unit 262 and a learning unit 263 are provided for learning the drive signal vectors in a noise codebook 212. These have the same functions as the training vector creation unit 162 and the learning unit 163 in the speech coding device shown in FIG. 1 and operate in the same way, so a detailed description is omitted.

[0054] As is clear from this embodiment, in the present invention the signal used for training is one that is available at both the coder and the decoder. Consequently, no auxiliary information needs to be transmitted for codebook learning, and the bit rate does not increase. Next, FIG. 4 shows a block diagram of a learning-type speech coding device according to a second embodiment of the present invention.

【００５５】In the first embodiment, the contents of the noise codebook are updated by learning, but the contents of the adaptive codebook may be updated instead. This embodiment is one configuration example in which the adaptive codebook is learned. In FIG. 4, a buffer 131, a training vector creation unit 132, a learning unit 133, a memory 134, and a delay circuit 135 are provided for this learning.

【００５６】When the search for the drive signal vector from the adaptive codebook 110 and the vector from the noise codebook 120 is completed for a subframe, the adder 130 outputs a new drive signal vector for the synthesis filter. The buffer 131 stores this new drive signal vector together with the drive signal vectors of past subframes. Specifically, as shown in FIG. 5, the buffer 131 is a shift register with a storage length of M_B samples and holds a total of M_B samples of drive signal information, including the drive signal vector newly output from the adder 130. The drive signal information in the buffer 131 is read out by the training vector creation unit 132. As shown in FIG. 5, the training vector creation unit 132 cuts out segments whose length equals the vector dimension K from the buffer 131, shifting by m samples each time, and sends each segment to the learning unit 133 as a training vector. FIG. 5 shows the case m = 1, but values such as m = 2 or 3 may also be used. FIG. 5 also assumes M_B = 2K. For example, when m = 1 and M_B = 2K, K-1 training vectors are created.
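The buffering and cutting-out steps can be illustrated with a short sketch. The names `push_subframe` and `training_vectors` are hypothetical, and the sketch simply takes every full-length window of the buffer. A plain sliding window over 2K samples with m = 1 yields K+1 windows, which differs from the count stated in the text, so the exact start and end positions used in FIG. 5 are presumably different and are not reproduced here.

```python
from collections import deque


def push_subframe(buffer, excitation):
    """Append a new drive signal vector; a deque with maxlen drops the oldest samples."""
    buffer.extend(excitation)


def training_vectors(buffer, K, m=1):
    """Cut out length-K segments from the buffer, shifting by m samples each time."""
    samples = list(buffer)
    return [samples[i:i + K] for i in range(0, len(samples) - K + 1, m)]


K = 4
buf = deque(maxlen=2 * K)            # M_B = 2K samples, as assumed for FIG. 5
push_subframe(buf, [1, 2, 3, 4])     # older subframes (illustrative values)
push_subframe(buf, [5, 6, 7, 8])
push_subframe(buf, [9, 10, 11, 12])  # newest subframe pushes out samples 1..4
vectors = training_vectors(buf, K, m=1)
```

Using `deque(maxlen=...)` gives shift-register behavior without manual index bookkeeping.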

【００５７】The learning unit 133 uses the training vectors from the training vector creation unit 132 to successively correct, by learning, the drive signal vectors stored in the adaptive codebook 110. This correction is performed in parallel with the encoding process.

【００５８】FIG. 6 shows the learning procedure. First, a training vector is input from the training vector creation unit 132 (S1). Next, among the drive signal vectors stored in the memory 134, the vector most similar to the input training vector is searched for (S2). The reciprocal of the Euclidean distance can be used as the similarity measure. As shown in FIG. 7, the drive signal vectors in the memory 134 are stored in a shift register as a signal sequence of length N. Each drive signal vector is generated by cutting out a segment whose length equals the vector dimension K while shifting one sample at a time from the right end of the shift register toward the left. Letting n be the total number of drive signal vectors in the adaptive codebook, the relationship

【００５９】[0059]

【数１２】 [Equation 12] holds. Next, the most similar vector C_j obtained in step S2 is updated using the training vector V_t as follows (S3).

【００６０】[0060]

【数１３】 [Equation 13]

【００６１】Here, α is a coefficient that controls the weights in the weighted average of C_j and V_t; it can be a predetermined constant or a value that changes adaptively according to the similarity described above. The drive signal vectors in the memory 134 are updated by the above equation; in practice, what is updated is the portion of the signal sequence in the shift register from which the drive signal vector C_j was cut out. The above processing is repeated until step S4 determines that no training vectors remain, whereby the drive signal vectors in the memory 134 are learned. When this learning is complete, the signal sequence stored in the shift register of the memory 134 is cut out in segments whose length equals the drive signal vector dimension K, shifted one sample at a time by the delay circuit 135, and stored in the adaptive codebook 110. This completes the learning of the adaptive codebook. Note that the adaptive codebook need not actually be provided separately; the memory 134 can itself serve virtually as the adaptive codebook.
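Steps S1 to S4 can be sketched as below. This is an illustration under stated assumptions, not the equations of the original: the equation images are not reproduced in this text, so the update is written as (1 - α)·C_j + α·V_t, and which term α multiplies is a choice made here. Candidate vectors are indexed from the left of a Python list rather than from the right end of a shift register, and the names `similarity` and `learn` are hypothetical.

```python
def similarity(u, v):
    """Reciprocal of the Euclidean distance; infinite for identical vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return float("inf") if d2 == 0 else 1.0 / d2 ** 0.5


def learn(memory, training, K, alpha=0.5):
    """Update the overlapped signal sequence in place with each training vector."""
    for v_t in training:                                  # S1; loop end acts as S4
        # S2: every length-K window of the sequence is a candidate vector C_j.
        j = max(range(len(memory) - K + 1),
                key=lambda i: similarity(memory[i:i + K], v_t))
        # S3: weighted average written back into the same span of the sequence.
        for k in range(K):
            memory[j + k] = (1 - alpha) * memory[j + k] + alpha * v_t[k]
    return memory


memory = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]   # signal sequence of length N = 6
learn(memory, [[2.0, 2.0]], K=2, alpha=0.5)
```

Because the update writes into a span of the sequence rather than into a separate table of vectors, the sequence length stays fixed and neighboring windows still overlap after learning, which matches the remark that only part of the signal sequence is updated.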

【００６２】Through this learning, the drive signal vectors in the adaptive codebook 110 come to have properties statistically similar to those of the drive signal vectors used as training vectors. Moreover, because learning is performed in parallel with the speech coding process, the properties of the drive signal vectors in the adaptive codebook 110 change in response to changes in the properties of the input speech signal. As a result, high-quality speech can be synthesized even at low bit rates of 8 kbps or less, where few bits can be allocated to coding the drive signal.

【００６３】In the conventional CELP scheme, when the input speech signal changes abruptly from unvoiced to voiced, the adaptive codebook contains only drive signal vectors from the unvoiced section, so the periodic drive signal needed to synthesize voiced speech cannot be generated immediately, and tracking of the change in the input speech signal is delayed. As a result, the clarity of the synthesized speech deteriorates. In this embodiment, by contrast, even when the input speech signal changes abruptly from unvoiced to voiced, the learning operation described above keeps drive signal vectors from past voiced sections in the adaptive codebook, so voiced speech can be synthesized using these vectors and clear synthesized speech is obtained. Furthermore, as is apparent from FIG. 7, the drive signal vectors in this embodiment overlap one another, which reduces the amount of computation required to search the adaptive codebook for the optimum drive signal vector. The conventional adaptive codebook also has a structure in which the vectors overlap, as described in Document 2, so that the optimum drive signal vector is searched for efficiently. In this embodiment, the overlap structure is preserved even when the contents of the adaptive codebook are randomly updated by the learning operation, so an efficient drive signal vector search remains possible. The efficient search method that exploits the overlap structure is described in Document 2 and is therefore omitted here. The coding parameters obtained in the above processing are multiplexed by the multiplexer 142 and sent from the output terminal 143 to the transmission path as the coded output.

【００６４】FIG. 8 shows the configuration of a speech decoding apparatus corresponding to the speech encoding apparatus of FIG. 4. In FIG. 8, a memory 224 and a delay circuit 225 are provided for learning the drive signal vectors in the adaptive codebook 210. These have the same functions as the memory 134 and the delay circuit 135, respectively, of the speech encoding apparatus shown in FIG. 4 and operate in the same way, so a detailed description is omitted.

【００６５】[0065]

[Effects of the Invention] As described above, according to the present invention, the drive signal vectors in the adaptive codebook and the noise codebook come to have statistically the same properties as the drive signals used as training vectors. The drive signal of the synthesis filter, meanwhile, is produced by searching the adaptive codebook and the noise codebook, with reference to the input speech signal to be coded, for the optimum drive signal vector, that is, the drive signal vector that minimizes the error between the input speech signal and the speech signal synthesized by the synthesis filter. Therefore, by using this optimum drive signal vector to successively correct, by learning, the drive signal vectors in the adaptive codebook and the noise codebook, an adaptive codebook and a noise codebook can be obtained that are suited to producing synthesized speech with smaller distortion relative to the input speech signal. In addition, because the learning process can proceed in parallel with the encoding process, the properties of the adaptive codebook and the noise codebook change in response to changes in the properties of the input speech signal.

【００６６】As a result, according to the present invention, high-quality speech can be synthesized even at low bit rates of about 8 kbps or less, where conventional schemes that do not perform such learning had difficulty ensuring quality because of the limited number of bits allocated to the drive signal. Moreover, because the training signal for learning is a drive signal vector that is available in both the encoding and decoding processes, no auxiliary information needs to be transmitted for learning, and the bit rate does not increase.
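The point that no auxiliary information is needed can be illustrated with a toy sketch: if the encoder and the decoder start from the same stored sequence and apply the same deterministic update to the same drive signal vectors, their stored sequences remain identical. The function `update`, the weighting (1 - α)·C_j + α·V_t, and all numeric values are assumptions for illustration, and an error-free transmission path is assumed.

```python
def update(memory, v_t, alpha=0.5):
    """Blend v_t into the closest length-K window of memory (assumed update rule)."""
    K = len(v_t)

    def dist2(i):
        return sum((memory[i + k] - v_t[k]) ** 2 for k in range(K))

    j = min(range(len(memory) - K + 1), key=dist2)
    for k in range(K):
        memory[j + k] = (1 - alpha) * memory[j + k] + alpha * v_t[k]


encoder_memory = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
decoder_memory = list(encoder_memory)      # same initial contents on both sides

# The drive signal vectors are rebuilt at the decoder from the transmitted
# indices and gains, so both sides see the same training data.
excitations = [[0.9, 0.1], [-0.8, 0.2], [0.1, 1.1]]
for exc in excitations:
    update(encoder_memory, exc)
    update(decoder_memory, exc)

in_sync = encoder_memory == decoder_memory
```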

[Brief description of drawings]

FIG. 1 is a block diagram of a learning-type speech encoding apparatus according to a first embodiment of the present invention.

FIG. 2 is a diagram for explaining the procedure for learning drive signal vectors in the same embodiment.

FIG. 3 is a block diagram of a speech decoding apparatus in the same embodiment.

FIG. 4 is a block diagram of a learning-type speech encoding apparatus according to a second embodiment of the present invention.

FIG. 5 is a diagram for explaining the method of creating training vectors in the same embodiment.

FIG. 6 is a diagram for explaining the procedure for learning drive signal vectors in the same embodiment.

FIG. 7 is a diagram showing how drive signal vectors are stored in the memory according to the same embodiment.

FIG. 8 is a block diagram of a speech decoding apparatus in the same embodiment.

FIG. 9 is a block diagram showing the configuration related to the drive signal vector search in a conventional speech encoding apparatus.

[Explanation of symbols]

100 ... speech signal input terminal
102 ... LPC analysis circuit
103 ... encoding circuit
106 ... weighting filter
107 ... weighting synthesis filter
110 ... adaptive codebook
112 ... weighting synthesis filter
114 ... squared error calculation circuit
115 ... minimum distortion search circuit
120 ... noise codebook
122 ... weighting synthesis filter
124 ... squared error calculation circuit
125 ... minimum distortion search circuit
131 ... buffer
132 ... training vector creation unit
133 ... learning unit
134 ... memory
135 ... delay circuit
140 ... gain coding circuit
141 ... gain coding circuit
142 ... multiplexer
143 ... output terminal
150 ... delay circuit
162 ... training vector creation unit
163 ... learning unit

Claims

1. A learning-type speech coding apparatus comprising: a codebook in which drive signal vectors are stored as codewords; search means for searching the codebook for an optimum drive signal vector with reference to an input speech signal; a synthesis filter for synthesizing a speech signal using the optimum drive signal vector found by the search means; training vector creation means for creating a training vector using the optimum drive signal vector; and learning means for successively correcting the drive signal vectors in the codebook using the training vector created by the training vector creation means.

2. A learning-type speech coding apparatus comprising: a plurality of codebooks in which drive signal vectors are stored as codewords; search means for searching the plurality of codebooks for an optimum codeword with reference to an input speech signal; a synthesis filter for synthesizing a speech signal using the optimum codeword found by the search means as a drive signal vector; training vector creation means for creating a training vector using the optimum codeword; and learning means for successively correcting at least one corresponding codeword in the codebook using the training vector created by the training vector creation means.

3. A learning-type speech coding apparatus comprising: a plurality of codebooks in which drive signal vectors are stored as codewords; search means for searching the plurality of codebooks for an optimum codeword with reference to an input speech signal; a synthesis filter for synthesizing a speech signal using the optimum codeword found by the search means as a drive signal vector; training vector creation means for creating a training vector using the drive signal vector obtained from the optimum codeword; and learning means for successively correcting at least one corresponding codeword in the codebook using the training vector created by the training vector creation means.