JP3065638B2

JP3065638B2 - Audio coding method

Info

Publication number: JP3065638B2
Application number: JP2209337A
Authority: JP
Inventors: 公生三関; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-08-09
Filing date: 1990-08-09
Publication date: 2000-07-17
Anticipated expiration: 2015-07-17
Also published as: JPH0497199A

Description

[Detailed Description of the Invention] [Object of the Invention] (Field of Industrial Application) The present invention relates to a speech coding system that compresses speech signals and the like with high efficiency, and more particularly to a speech coding system for low transmission bit rates.

(Prior Art) Multi-mode CELP (Code Excited Linear Prediction) coding is known as an effective method for coding speech signals at low transmission rates, for example about 10 kb/s or less. The details are given in a paper presented at ICASSP 1989 in Glasgow (the first paper): Tomohiko Taniguchi, Shigeyuki Unagami and Robert M. Gray, "Multimode coding: Application to CELP". Its content is briefly described here. FIG. 6 illustrates the principle of multi-mode coding described in that paper, and FIG. 7 is a block diagram showing the processing of a multi-mode CELP coder.

In FIG. 6, the coding side has m encoders 510, 520, 530 (encoder #1 to encoder #m), each set in advance to give a different bit allocation to the drive signal parameters and the spectral parameters.

The encoders process the input speech signal in parallel, frame by frame. The evaluation and optimum-encoder decision unit 550 uses the input speech signal to evaluate the quality of the synthesized speech signal (decoded speech signal) produced by each encoder. The selector 540 then uses the index n of the optimum encoder (n is one of 1, 2, ..., m) to select and transmit the corresponding drive signal parameters and spectral parameters, and the index n itself is also transmitted to the decoding side. On the decoding side, the decoder 560 (decoder #n) corresponding to encoder #n is chosen on the basis of the index n and outputs the synthesized speech signal.
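
The selection rule of FIG. 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the coder of the cited paper: the names `snr_db`, `select_encoder` and `make_quantizer` are hypothetical, plain SNR stands in for the quality measure, and toy scalar quantizers stand in for encoders #1 to #m.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB between a frame and its decoded version."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def select_encoder(frame, encoders):
    """Run every encoder on the frame and return (n, params) for the best one.

    `encoders` is a non-empty list of callables, each returning
    (params, decoded_frame). The index n is 1-based, matching
    encoder #1 ... encoder #m in the text.
    """
    best = None
    for n, encode in enumerate(encoders, start=1):
        params, decoded = encode(frame)
        quality = snr_db(frame, decoded)
        if best is None or quality > best[0]:
            best = (quality, n, params)
    _, n, params = best
    return n, params


def make_quantizer(step):
    """Toy stand-in for an encoder: uniform scalar quantizer (assumption)."""
    def encode(frame):
        codes = [round(x / step) for x in frame]
        return codes, [c * step for c in codes]
    return encode


frame = [0.3, -0.7, 0.55, 0.1]
n, params = select_encoder(frame, [make_quantizer(0.5), make_quantizer(0.1)])
print("selected encoder #%d" % n)
```

Only the parameters of the winning encoder and the index n would be transmitted; the decoder needs n to know which decoder #n to apply.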

The above is an outline of the multi-mode coding presented in that paper. The multi-mode CELP coder shown in FIG. 7 applies this multi-mode coding concept to the CELP system.

The CELP system is a known speech coding technique that performs vector quantization of the drive signal at the level of the synthesized speech. Details of the CELP system are given in M. R. Schroeder and B. S. Atal, "Code-excited linear prediction (CELP): High quality speech at very low bit rates," Proc. ICASSP '85, pp. 937-940.

The multi-mode CELP system of FIG. 7 applies the multi-mode coding described above to CELP in its simplest form, with two modes. Mode A is the conventional, known CELP system: it transmits the drive signal parameters and the spectral parameters (LPC parameters), plus one bit of mode information per frame.

Mode B, by contrast, does not transmit the spectral parameters. It reuses the spectral parameters of the previous frame and thereby increases the number of quantization bits allocated to the drive signal parameters. In each frame, the A/B mode decision is based on a quality evaluation (using SNR or the like) of the synthesized speech signal of each mode, so the allocation of transmitted information is controlled dynamically by switching between the two modes.

In FIG. 7, in mode A the LPC analysis unit 100 extracts the spectral parameters (LPC parameters) from the input speech signal and outputs them to switching terminal A and to the short-term synthesis filter 110. The parameters of the long-term synthesis filter 150, the waveform of the vector selected from the codebook (small) 170 (the index attached to the vector in the codebook, plus its sign) and the gain are determined in a closed loop so as to minimize the power of the weighted error signal. This signal is obtained by passing the error between the input speech and the signal synthesized by the short-term synthesis filter 110 (synthesis filter) through the weighting filter 120.

In mode B, the spectral parameter memory 240 is connected to terminal A and updates its spectral parameters only when mode A has been selected. The spectral parameters stored in the memory 240 are therefore not updated while mode B continues, and the same parameters are reused. The parameters of the long-term synthesis filter 160 and the waveform and gain from the codebook (large) 180 are determined in the same way as in mode A. The mode decision unit 230 receives the minimum error power computed in mode A and in mode B and outputs the mode with the smaller error power as the selected mode.
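
The mode decision and the behaviour of the spectral parameter memory 240 can be sketched as below. This is only an illustration: `decide_mode` and `SpectralParameterMemory` are hypothetical names, and choosing mode A on a tie is an assumption that the text does not specify.

```python
def decide_mode(error_power_a, error_power_b):
    """Return 'A' or 'B', whichever gives the smaller weighted error power.

    Ties go to mode A (an assumption; the text does not say).
    """
    return "A" if error_power_a <= error_power_b else "B"


class SpectralParameterMemory:
    """Holds the spectral parameters most recently used in mode A."""

    def __init__(self, initial):
        self.params = list(initial)

    def update(self, mode, current_params):
        # Only a mode-A frame refreshes the memory; in mode B the stored
        # parameters are left untouched and reused for synthesis.
        if mode == "A":
            self.params = list(current_params)
        return self.params


memory = SpectralParameterMemory([0.0])
for err_a, err_b, lpc in [(1.0, 3.0, [0.9]), (2.0, 1.0, [0.5]), (2.5, 1.5, [0.4])]:
    mode = decide_mode(err_a, err_b)
    print(mode, memory.update(mode, lpc))
```

In the two mode-B frames of this toy run the memory keeps returning the parameters stored by the last mode-A frame, which is the reuse described above.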

The above describes the multi-mode CELP system (conventional system) of FIG. 7.

The first paper also shows that this system improves the segmental SNR by about 2 dB over the conventional CELP system at transmission rates of 4.8 kbit/s and 8 kbit/s.
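
For reference, segmental SNR is usually computed as the average of per-segment SNR values in dB. The sketch below uses that common definition; the function name, the segment length and the rule of skipping segments with zero signal or zero noise are assumptions, and the cited paper may differ in such details.

```python
import math


def segmental_snr_db(reference, decoded, segment_length):
    """Mean of the per-segment SNRs (in dB) over complete segments."""
    snrs = []
    for start in range(0, len(reference) - segment_length + 1, segment_length):
        ref = reference[start:start + segment_length]
        dec = decoded[start:start + segment_length]
        signal = sum(x * x for x in ref)
        noise = sum((x - y) ** 2 for x, y in zip(ref, dec))
        # Skip silent or perfectly reconstructed segments (assumption).
        if signal > 0.0 and noise > 0.0:
            snrs.append(10.0 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no segment with non-zero signal and noise")
    return sum(snrs) / len(snrs)


print(segmental_snr_db([2.0, 2.0, 4.0, 4.0], [1.0, 1.0, 3.0, 3.0], 2))
```

Because the average is taken in the dB domain, quiet segments count as much as loud ones, which is why the measure is often preferred to a single SNR over the whole utterance.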

In this speech coding system, the bit allocation between the drive signal and the spectral parameters varies from frame to frame, because the coder switches between mode A and mode B according to the input signal.

When each frame is transmitted with a fixed number of bits, mode A allocates many bits to the spectral parameters and therefore cannot allocate many to the drive signal parameters; mode A is thus identical to the conventional CELP system. In speech segments where mode B is used, reusing the spectral parameters of the previous frame allows more quantization bits to be allocated to the drive signal parameters. Mode B therefore improves on the speech quality of the CELP system.

On the other hand, mode B is clearly more likely to be selected in speech segments where the spectral parameters of the previous frame can stand in for those of the current frame, that is, in vowel segments whose spectrum changes little over time.

Such segments, however, generally have high redundancy owing to the periodic repetition of the drive signal, so even the ordinary CELP system yields synthesized speech with a high SN ratio. Coding these segments in mode B can be expected to give a still higher SN ratio than the CELP system, but once the SN ratio is already reasonably high, the difference is hard to perceive.

Moreover, mode A (the ordinary CELP system) tends to be selected in speech segments other than vowels, where the spectrum changes greatly, so the perceptual degradation of speech quality in the ordinary CELP system is not improved.

(Problems to be Solved by the Invention) As described above, the conventional speech coding system varies the bit allocation between the drive signal parameters and the spectral parameters frame by frame by switching between two modes: one that uses the spectral parameters of the current frame and one that uses those of the previous frame. In speech segments such as consonants, where the spectrum changes greatly over time, the mode that uses the previous frame's spectral parameters is rarely selected. At low rates, therefore, the degradation of speech quality in non-stationary segments that afflicts the conventional CELP system remains unimproved.

The present invention has been made to solve this problem, and its object is to provide a speech coding system capable of producing high-quality synthesized speech at a low transmission bit rate.

[Structure of the Invention] (Means for Solving the Problems) To achieve the above object, the speech coding system of the present invention obtains a synthesized speech signal by driving a synthesis filter, composed of a pole filter and a zero filter, with a drive signal, and is characterized by having means for storing coefficient information of the zero filter and by obtaining the synthesized speech signal using that coefficient information.

(Operation) The speech coding system of the present invention configured as above has means for storing coefficient information of the zero filter of a synthesis filter composed of a pole filter and a zero filter, and obtains the synthesized speech signal using this coefficient information. Consequently, even in speech segments such as consonants where the spectrum changes greatly, a filter suited to the speech of that segment can be selected, and high-quality, stable synthesized speech can be obtained.

(Embodiment) The coding system of the present invention is described in detail below with reference to the drawings.

FIGS. 1 and 2 are block diagrams for carrying out the speech coding system of the present invention. In FIG. 1, the LPC analysis unit 100 performs linear prediction and pitch extraction on the input speech signal and outputs the resulting filter parameters to the short-term synthesis filter 110 and the long-term synthesis filter 150. The waveform of the vector selected from codebook A 175 (the index attached to the vector in codebook A, plus its sign) is scaled by the gain in the multiplication circuit 190 and input to the long-term synthesis filter 150, which adds pitch periodicity to its input. The result is input to the short-term synthesis filter 110 (hereinafter, the synthesis filter), which generates the synthesized speech signal using the prediction parameters obtained by linear prediction in the LPC analysis unit 100 (the coefficient information of the synthesis filter (pole filter) 110).

According to the present invention, the synthesis filter is a pole-zero filter and therefore includes a zero filter 115, whose coefficient information is held in codebook B 176. The error signal between the input signal and the synthesized speech signal output from the synthesis filter 113, which consists of the zero filter 115, codebook B 176 and the pole filter, is weighted by the weighting filter 120. The power of this weighted error signal is evaluated while the entries of codebook A 175 and codebook B 176 are varied in a closed loop. When the weighted error reaches its minimum, the distortion comparator 210 outputs the index of the codebook A 175 entry and the index of the codebook B 176 entry at that minimum as the coded signal corresponding to the input speech signal.

FIG. 2 is a block diagram of the present invention configured to evaluate distortion by weighting the error between the input signal and the synthesized speech signal. In FIG. 2, the parameters of the long-term synthesis filter 150 are determined in a closed loop, so in this case the LPC analysis unit 100 determines only the parameters of the short-term synthesis filter. When B(Z) in FIG. 2, which corresponds to the zero filter 115 of FIG. 1, satisfies B(Z) = 1, there is no zero filter coefficient information.

For fixed-rate transmission, the number of bits allocated to one frame is fixed, but within that fixed total the bits may be divided among the parameters freely. When B(Z) = 1, the zero filter parameters need not be sent, and more bits can be allocated to the drive signal parameters. Conversely, when B(Z) ≠ 1 the zero filter coefficients must also be transmitted, so the transmission rate is kept constant by allocating fewer bits to the drive signal parameters.
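
A pole-zero synthesis filter of the kind described above can be sketched as a direct-form difference equation. The coefficient conventions below are assumptions chosen for illustration (the text does not give them), and `pole_zero_synthesis` is a hypothetical name. Passing the default zero filter `(1.0,)` corresponds to B(Z) = 1, which reduces the filter to the all-pole case.

```python
def pole_zero_synthesis(excitation, pole_coeffs, zero_coeffs=(1.0,)):
    """Filter `excitation` through H(z) = B(z) / A(z).

    Assumed conventions (not taken from the patent text):
      B(z) = b0 + b1 z^-1 + ... + bq z^-q     (zero filter)
      A(z) = 1 - a1 z^-1 - ... - ap z^-p      (pole filter)
    so that y[n] = sum_j b_j x[n-j] + sum_i a_i y[n-i].
    """
    output = []
    for n in range(len(excitation)):
        acc = 0.0
        # Zero (moving-average) part acts on past and present input samples.
        for j, b in enumerate(zero_coeffs):
            if n - j >= 0:
                acc += b * excitation[n - j]
        # Pole (autoregressive) part acts on past output samples.
        for i, a in enumerate(pole_coeffs, start=1):
            if n - i >= 0:
                acc += a * output[n - i]
        output.append(acc)
    return output


impulse = [1.0, 0.0, 0.0, 0.0]
print(pole_zero_synthesis(impulse, [0.5]))               # all-pole, B(z) = 1
print(pole_zero_synthesis(impulse, [0.5], (1.0, -0.5)))  # pole-zero
```

In the second call the zero at 0.5 cancels the pole at 0.5, a deliberately simple case showing how the zero filter reshapes the response without changing the pole coefficients.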

FIG. 3 is a block diagram of a system that uses several instances of the speech coding system shown in FIG. 1. In FIG. 3, when B(Z) ≠ 1 the zero filter 115 has codebook B 176, so the drive signal parameter codebook 170 is smaller than the drive signal parameter codebook 180 that is used when the zero filter 116 satisfies B(Z) = 1.

FIG. 4 is a block diagram of a coding apparatus to which the coding system according to one embodiment of the present invention is applied.

In FIG. 4, a sequence of A/D-converted input speech samples is supplied to input terminal 10. The frame buffer 11 is a circuit that stores one frame of the input speech signal. Each block in FIG. 4 performs the processing below either per frame or per subframe, a subframe being one of several parts into which a frame is divided.

The prediction parameter calculation circuit 12 calculates the prediction parameters by a known method. When the prediction filter is a cascade of a long-term prediction filter 41 and a short-term prediction filter 42, as shown in FIG. 5, the prediction parameter calculation circuit 12 calculates the pitch period, the pitch prediction coefficient and the linear prediction coefficients (α parameters or K parameters; collectively called LPC parameters) by a known method such as the autocorrelation method or the covariance method. These methods are described, for example, in Sadaoki Furui, "Digital Speech Processing," Tokai University Press, 1985. The calculated prediction parameters are input to the prediction parameter coding circuit 13, which encodes them with a predetermined number of quantization bits and outputs the coded parameters to the multiplexer 25 and also to the gain calculation circuit 15, the synthesis filter 18 and the weighting filter 20.

The gain calculation circuit 15 constructs a pole-zero synthesis filter H(Z) from the zero filter coefficients supplied by the zero filter coefficient codebook 14 (described later), the coefficient update signal output from the coefficient search circuit 24, and the prediction parameters (pole filter coefficient information) from the coding circuit 13. Using the inverse filter 1/H(Z) as a prediction filter, it predicts the input speech signal and produces a prediction residual signal. The gain calculation circuit 15 then computes the average power of the prediction residual signal and outputs it to the coding circuit 16 as the gain; the standard deviation, for example, can be used as this measure. The coding circuit 16 encodes the gain with a predetermined number of quantization bits and outputs the code to the multiplexer 25 and the multiplication circuit 17.

The zero filter coefficient codebook 14 stores filter coefficient information for zero filters of a predetermined order, the number of entries corresponding to the number of quantization bits M. If one of the zero filters B(Z) stored in the zero filter coefficient codebook 14 is set to B(Z) = 1, an all-pole synthesis filter H(Z) that uses no zero filter is obtained automatically with the same structure.
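
The gain computation can be illustrated as follows: the input is passed through the inverse filter 1/H(Z) to obtain the prediction residual, and the gain is taken as the root-mean-square value of that residual, which equals its standard deviation when the residual has zero mean. The coefficient conventions and the names `inverse_filter` and `residual_gain` are assumptions made for this sketch.

```python
import math


def inverse_filter(speech, pole_coeffs, zero_coeffs=(1.0,)):
    """Apply 1/H(z) = A(z)/B(z) to obtain the prediction residual.

    Assumed conventions (not from the patent text):
      A(z) = 1 - a1 z^-1 - ... - ap z^-p,  B(z) = b0 + b1 z^-1 + ..., b0 != 0.
    """
    b0 = zero_coeffs[0]
    residual = []
    for n in range(len(speech)):
        acc = speech[n]
        # A(z): subtract the prediction from past speech samples.
        for i, a in enumerate(pole_coeffs, start=1):
            if n - i >= 0:
                acc -= a * speech[n - i]
        # 1/B(z): recursive part acting on past residual samples.
        for j, b in enumerate(zero_coeffs[1:], start=1):
            if n - j >= 0:
                acc -= b * residual[n - j]
        residual.append(acc / b0)
    return residual


def residual_gain(residual):
    """Root-mean-square value of the residual, used here as the gain."""
    return math.sqrt(sum(e * e for e in residual) / len(residual))


speech = [1.0, 0.5, 0.25, 0.125]
residual = inverse_filter(speech, [0.5])
print(residual, residual_gain(residual))
```

Normalizing the drive signal by this gain is what allows the codebook 21 to hold vectors with a fixed, normalized variance.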

In this embodiment, the zero filter coefficient codebook 14 stores 2^M + 1 kinds of zero filter coefficient information, and the codebook 14 is assumed to have been built in advance so that the zero filter B(Z) formed from its first code vector satisfies B(Z) = 1.

Based on the code update signal input from the coefficient search circuit 24, the zero filter coefficient codebook 14 outputs the zero filter coefficients (code vector) stored in it to the gain calculation circuit 15 and the synthesis filter 18, and outputs to the codebook 21 the information PZ indicating whether the zero filter is B(Z) = 1 or B(Z) ≠ 1.

The codebook 21 has its variance normalized in advance and outputs to the multiplication circuit 17 a restricted number of code vectors, preset according to the information PZ from the codebook 14. The output of code vectors is controlled by the code update signal input from the code search circuit 23. The restriction on the search range of code vectors in the codebook 21 can be determined, for example, as follows.

When the information PZ from the codebook 14 indicates B(Z) = 1, there is no zero filter coefficient information to send, so correspondingly more bits are allocated to the drive signal, and the search range of the code vectors in the codebook 21, which represent the shape of the drive signal, can be widened.

Conversely, when the information PZ indicates B(Z) ≠ 1, the zero filter coefficient information must be transmitted, so correspondingly fewer bits are allocated to the drive signal and the search range of the code vectors in the codebook 21 is narrowed.
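
One simple way to realize such a restriction is to search only the first 2^b vectors of the codebook, where b is the number of index bits available to the drive signal in the current case. The sketch below assumes this prefix scheme; the function name and the bit counts are hypothetical, and the text does not say how the restricted subset is chosen.

```python
def searchable_vectors(codebook, zero_filter_used, bits_without_zero, bits_with_zero):
    """Return the part of the codebook that may be searched.

    `zero_filter_used` plays the role of the information PZ:
    False means B(Z) = 1, True means B(Z) != 1.
    Restricting the search to a prefix of the codebook is an assumption.
    """
    bits = bits_with_zero if zero_filter_used else bits_without_zero
    return codebook[: 2 ** bits]


codebook = list(range(16))  # stand-in for 16 drive signal code vectors
print(len(searchable_vectors(codebook, False, 4, 2)))  # wide search range
print(len(searchable_vectors(codebook, True, 4, 2)))   # narrowed search range
```

With a prefix scheme the decoder can use the same codebook in both cases, since an index found in the narrow range is also a valid index in the wide range.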

The multiplication circuit 17 multiplies the code vector output from the codebook 21 by the coded gain to generate a candidate drive signal vector and inputs it to the synthesis filter 18.

The synthesis filter 18 receives the zero filter coefficient information from the zero filter coefficient codebook 14 and the pole filter coefficient information from the coding circuit 13 (together called the spectral parameters), constructs the synthesis filter H(Z), and outputs a synthesized speech signal with the candidate drive signal vector from the multiplication circuit 17 as its input.

The subtraction circuit 19 receives the input speech signal and the synthesized speech signal and outputs their error signal.

The weighting filter 20 applies to the error signal a weighting derived from the prediction parameters and outputs the result. The weighting filter 20 is the filter whose transfer function is given by equation (1) [equation not reproduced], and it is known to exploit the auditory masking effect to make the coding noise contained in the synthesized speech less audible at decoding. In equation (1), A(Z) denotes the prediction filter formed from the prediction parameters.

The squared error calculation circuit 22 calculates the sum of squares of the weighted error signal for each code vector output from the codebook 21 and outputs the result to the code search circuit 23. It also outputs the sum of squares of the error signal accumulated over one frame to the coefficient search circuit 24.

The code search circuit 23 receives from the coefficient search circuit 24 (described later) the code number of the zero filter currently being searched. For each zero filter code number, it searches the codebook 21 for the code that minimizes the squared error in each subframe and holds that code. When the coefficient search circuit 24 finally decides the zero filter code number, the code search circuit 23 receives this number and outputs to the multiplexer 25 those held drive signal codes that correspond to the selected zero filter code number.

The coefficient search circuit 24 compares the per-frame sums of squared error that the squared error calculation circuit 22 supplies for each zero filter code number, selects the zero filter code number that gives the minimum, and outputs this code number to the multiplexer 25 and the code search circuit 23. If the selected zero filter code number is 1, the zero filter is unused, as described above, and the drive signal code output from the code search circuit 23 is then represented with more bits than when the zero filter is used. The coefficient search circuit 24 also outputs the zero filter used/unused information to the multiplexer 25. Table 1 shows an example of the bit allocation between the drive signal and the spectral parameters in this embodiment.
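
The two-level search performed by the code search circuit 23 and the coefficient search circuit 24 can be sketched as nested loops. This is a simplified illustration under several assumptions: all names are hypothetical, the perceptual weighting is omitted, the gain is given, filter memory is not carried across subframes, and the restricted search range is taken to be a prefix of the drive signal codebook.

```python
def synthesize(excitation, pole_coeffs, zero_coeffs):
    """H(z) = B(z)/A(z) with A(z) = 1 - sum a_i z^-i (assumed convention)."""
    out = []
    for n in range(len(excitation)):
        acc = sum(b * excitation[n - j]
                  for j, b in enumerate(zero_coeffs) if n - j >= 0)
        acc += sum(a * out[n - i]
                   for i, a in enumerate(pole_coeffs, start=1) if n - i >= 0)
        out.append(acc)
    return out


def squared_error(target, candidate):
    return sum((t - c) ** 2 for t, c in zip(target, candidate))


def search_frame(subframes, pole_coeffs, zero_codebook, drive_codebook,
                 gain, full_size, reduced_size):
    """Return (zero_code, drive_codes, frame_error); code numbers are 1-based.

    zero_codebook[0] is expected to be (1.0,), i.e. B(Z) = 1 (zero filter
    unused). In that case `full_size` drive vectors are searched, otherwise
    only `reduced_size` of them.
    """
    best = None
    for k, zero_coeffs in enumerate(zero_codebook, start=1):
        limit = full_size if k == 1 else reduced_size
        codes, frame_error = [], 0.0
        for target in subframes:
            best_code, best_err = None, float("inf")
            for c, vector in enumerate(drive_codebook[:limit], start=1):
                drive = [gain * v for v in vector]
                err = squared_error(
                    target, synthesize(drive, pole_coeffs, zero_coeffs))
                if err < best_err:
                    best_code, best_err = c, err
            codes.append(best_code)      # held per zero filter code number
            frame_error += best_err      # accumulated over the frame
        if best is None or frame_error < best[2]:
            best = (k, codes, frame_error)
    return best


zero_codebook = [(1.0,), (1.0, -0.5)]
drive_codebook = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
subframes = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
print(search_frame(subframes, [0.5], zero_codebook, drive_codebook, 1.0, 4, 2))
```

The inner loops correspond to the per-subframe search over the codebook 21, and the outer comparison of frame errors corresponds to the choice of the zero filter code number.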

In Table 1, the synthesis filter is classed as an all-pole filter or a pole-zero filter according to whether the zero filter is B(Z) = 1 or B(Z) ≠ 1. Let the number of bits per frame be R. With the all-pole filter, the spectral parameters use only the K bits of the pole filter, so the drive signal receives R − K bits. With the pole-zero filter, M bits are additionally assigned to the zero filter as spectral parameter bits, and the remainder, R − K − M bits, goes to the drive signal. In either case the number of bits per frame is always the constant R. The multiplexer 25 multiplexes the incoming code information and outputs it from terminal 26 to the transmission path.
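
The bit accounting described for Table 1 amounts to the small calculation below. The function name and the example values of R, K and M are hypothetical, since the table itself and its numbers are not reproduced here.

```python
def allocate_bits(frame_bits, pole_bits, zero_bits, zero_filter_used):
    """Split R = frame_bits between spectral parameters and drive signal.

    All-pole case  (B(Z) = 1):  spectral = K,      drive = R - K
    Pole-zero case (B(Z) != 1): spectral = K + M,  drive = R - K - M
    """
    spectral = pole_bits + (zero_bits if zero_filter_used else 0)
    drive = frame_bits - spectral
    if drive < 0:
        raise ValueError("spectral parameters exceed the frame budget")
    return {"spectral": spectral, "drive": drive}


# Hypothetical numbers: R = 96, K = 30, M = 6.
print(allocate_bits(96, 30, 6, zero_filter_used=False))
print(allocate_bits(96, 30, 6, zero_filter_used=True))
```

Either way the two fields sum to R, which is what keeps the transmission rate fixed while the split changes from frame to frame.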

As described above, in the speech coding of the present invention the bit allocation between the filter representing the spectral envelope and the drive signal parameters changes frame by frame in response to changes in the character of the input speech signal. In addition, the filter is expressed in pole-zero form, and the quantization of the zero filter coefficients, that is, the selection from the codebook, is performed so as to minimize the perceptually weighted error between the input speech signal and the synthesized speech signal. A filter suited to the segment can therefore be selected even in speech segments whose spectrum changes greatly over time, and the quality of the synthesized speech is improved stably.

The embodiment described here is only one embodiment of the present invention, and various modifications are possible.

(Effects of the Invention) As described in detail above, the speech coding system of the present invention provides high-quality, stable synthesized speech.

[Brief description of the drawings]

FIGS. 1 and 2 are block diagrams for carrying out the speech coding system of the present invention; FIG. 3 is a block diagram in which the speech coding system of the present invention is applied to a plurality of speech coding systems; FIG. 4 is a block diagram showing a configuration in which the speech coding system according to one embodiment of the present invention is applied to a coding apparatus; FIG. 5 is a block diagram showing an example configuration of the prediction filter described in the embodiment of FIG. 4; and FIGS. 6 and 7 are block diagrams showing the configuration of coding apparatus according to the prior art.

110: short-term synthesis filter (pole filter)
113: synthesis filter
115: zero filter
175, 176: codebooks
195: drive signal generator

Continuation of front page

(56) References
Proceedings of IEEE 1988 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "S12.5 A Multipulse Excited Pole-Zero Filtering Approach for Speech Enhancement", pp. 545-548
Proceedings of IEEE 1988 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "S14.10 A Generalized Vocal Tract Model for Pole-Zero Type Linear Prediction", pp. 687-690
Proceedings of IEEE 1989 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "S4.10 Multimode Coding: Application to CELP", pp. 156-159
Proceedings of IEEE 1989 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "E2.3 Discrete Pole-Zero Modeling and Applications", pp. 2162-2165
Proceedings of IEEE 1991 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "S4.8 Adaptive Bit-Allocation Between the Pole-Zero Synthesis Filter and Excitation in CELP", pp. 229-232
Proceedings of IEEE 1992 International Conference on Acoustics, Speech and Signal Processing, Vol. 1, "Pole-Zero Code Excited Linear Prediction Using a Perceptually Weighted Error Criterion", pp. I-637 to I-639
Proceedings of 6th International Conference on Digital Signals in Communications, IEE Conference Publication No. 340, "Pole-Zero Code Excited Linear Prediction", pp. 42-47, 2-6 September 1991
Transactions of the Institute of Electronics, Information and Communication Engineers, Vol. J72-D-II, No. 8, August 1989, "CELP Coding Scheme Applying Multi-Mode Coding: Optimization of the Transmission of Excitation Source Information and Vocal Tract Information", pp. 1159-1165 (issued August 25, 1989)
Proceedings of the 1991 IEICE Spring National Convention, Part 1, "A-216 Low-Rate Speech Coding System with Adaptive Bit Allocation Between the Pole-Zero Synthesis Filter and the Drive Signal", p. 1-216 (issued March 15, 1991)

(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/00 - 13/08; G10L 19/00 - 21/06; H04B 14/04; INSPEC (DIALOG); JICST file (JOIS)

Claims

(57) [Claims]

1. A speech coding system for obtaining a synthesized speech signal by driving a synthesis filter comprising a pole filter and a zero filter with a drive signal, the system comprising means for storing coefficient information of the zero filter, wherein the synthesized speech signal is obtained by using the stored coefficient information.

2. A speech coding system for obtaining a synthesized speech signal by driving a synthesis filter comprising a pole filter and a zero filter with a drive signal, the system comprising means for storing coefficient information of the zero filter, wherein a synthesized speech signal is generated using the coefficient information, and the coefficient information of the zero filter is selected based on the distortion between the synthesized speech signal and an input speech signal.

3. A speech coding system in which, for a plurality of coding schemes that differ in the bit allocation between drive signal parameters and parameters of a synthesis filter comprising a pole filter and a zero filter, the distortion between the synthesized speech signal of each coding scheme and an input speech signal is calculated and one coding scheme is then selected,
wherein at least one of the plurality of coding schemes has means for storing coefficient information of the zero filter, generates a synthesized speech signal using the coefficient information, and selects the coefficient of the zero filter based on the distortion between the synthesized speech signal and the input speech signal.

4. The speech coding system according to claim 2 or 3, wherein the bit allocation between the drive signal parameters and the parameters of the synthesis filter is determined according to whether or not the zero filter is used in the synthesis filter.

5. The speech coding system according to any one of claims 2 to 4, wherein the pole filter in the synthesis filter is common to all the coding schemes.
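
The selection steps recited in claims 2 and 3 can be illustrated with a minimal sketch. It is not the patented implementation: the transfer-function convention H(z) = (1 + Σ b_j z^-j) / (1 + Σ a_i z^-i), the plain squared-error distortion (no perceptual weighting), and every name below (`synthesize`, `select_zero_coeffs`, `select_mode`, the mode dictionary keys) are assumptions introduced only for illustration.

```python
def synthesize(excitation, pole, zero):
    """Drive an assumed pole-zero synthesis filter with a drive signal.

    y[n] = x[n] + sum_j zero[j-1] * x[n-j] - sum_i pole[i-1] * y[n-i]
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for j, b in enumerate(zero, start=1):   # zero (moving-average) part
            if n - j >= 0:
                acc += b * excitation[n - j]
        for i, a in enumerate(pole, start=1):   # pole (autoregressive) part
            if n - i >= 0:
                acc -= a * out[n - i]
        out.append(acc)
    return out


def squared_error(a, b):
    """Distortion measure assumed here: sum of squared sample differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_zero_coeffs(target, excitation, pole, zero_codebook):
    """Claim 2 idea: try each stored zero-filter entry, keep the least distortion."""
    best_index, best_dist = None, float("inf")
    for index, zero in enumerate(zero_codebook):
        dist = squared_error(target, synthesize(excitation, pole, zero))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


def select_mode(target, pole, modes):
    """Claim 3 idea: evaluate every coding scheme, keep the least distortion.

    Each mode is a dict with an "excitation" list and a "zero_codebook" that is
    either a list of coefficient lists or None (all-pole scheme). The pole
    filter is shared by all modes, as in claim 5.
    Returns (mode_index, zero_index_or_None, distortion).
    """
    best = (None, None, float("inf"))
    for mode_index, mode in enumerate(modes):
        if mode["zero_codebook"] is None:
            zero_index = None
            dist = squared_error(target, synthesize(mode["excitation"], pole, []))
        else:
            zero_index, dist = select_zero_coeffs(
                target, mode["excitation"], pole, mode["zero_codebook"])
        if dist < best[2]:
            best = (mode_index, zero_index, dist)
    return best


if __name__ == "__main__":
    drive = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    pole = [-0.9]
    codebook = [[0.0], [0.5], [-0.5, 0.25]]
    target = synthesize(drive, pole, codebook[2])
    print(select_zero_coeffs(target, drive, pole, codebook))
```

In an actual coder the schemes would differ in how many bits go to the drive signal and to the filter parameters, so each mode would carry its own quantized drive signal; the sketch passes one in per mode to keep that distinction visible without modelling the quantizers.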