JP2702157B2

JP2702157B2 - Optimal sound source vector search device

Info

Publication number: JP2702157B2
Application number: JP63153963A
Authority: JP
Inventors: 宏一白木; 邦男中島; 真哉高橋
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1988-06-21
Filing date: 1988-06-21
Publication date: 1998-01-21
Anticipated expiration: 2013-01-21
Also published as: JPH01319799A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial application field]

The present invention relates to an improvement of a speech coding device that compresses the information in a speech signal for digital transmission or storage.

[Conventional technology]

Among speech coding methods that compress information by separating a speech signal into parameters representing a synthesis filter and parameters representing a sound source, there is one called Code-Excited Linear Prediction (CELP). An example of CELP is given in M. R. Schroeder and B. S. Atal, "Code-Excited Linear Prediction (CELP): High-quality speech at very low bit rates," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 937-940 (1985) (hereinafter referred to as Document 1). In the example shown in Document 1, the parameters representing the synthesis filter are obtained by analysis every 10 msec. The parameter representing the sound source, time-aligned with speech segmented every 40 samples (5 msec at a sampling frequency of 8 kHz), is a time series of 40 noise samples generated from random numbers, that is, a 40-dimensional vector (hereinafter referred to as a sound source vector).

One device that performs, in the frequency domain, the processing carried out by the optimal sound source vector search device of Document 1 is shown in FIG. 2. FIG. 2 shows the conventional optimal sound source vector search device described in I. M. Trancoso and B. S. Atal, "Efficient procedures for finding the optimum innovation in stochastic coders," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 2375-2378 (1986) (hereinafter referred to as Document 2).

In the figure, 2 is a DFT sound source vector (mapped sound source vector) obtained by applying a 2·L-point discrete Fourier transform (DFT) (L = 40 in Document 2) to a sound source vector that is a sequence of N sample values (in the example of Document 2, N = 40 at a sampling frequency of 8 kHz); 1 is a codebook (mapped sound source codebook) composed of M DFT sound source vectors mapped into an L-dimensional distortion evaluation space that satisfies the L-dimensional dimension orthogonality condition; 4 is the DFT input speech (mapped speech signal) obtained by applying a 2·L-point DFT to the input speech (input speech signal) 11, which is a sequence of N sample values; and 5 is the evaluation weight filter coefficients, a frequency characteristic obtained by applying a 2·L-point DFT to the impulse response of the synthesis filter coefficients obtained by analyzing the input speech 11.

Further, 12 is a DFT circuit (first L-dimensional mapping means) that applies a DFT to the input speech 11 to map it into the same L-dimensional distortion evaluation space as the DFT sound source vectors 2; 14 is a speech analysis circuit that analyzes the input speech 11 to calculate synthesis filter coefficients; 15 is an impulse response generation circuit that calculates the impulse response of the synthesis filter coefficients output from the speech analysis circuit 14; 16 is speech analysis means consisting of the speech analysis circuit 14 and the impulse response generation circuit 15; 13 is a DFT circuit (second L-dimensional mapping means) that applies a DFT to the impulse response of these synthesis filter coefficients to map it into the same L-dimensional distortion evaluation space as the DFT sound source vectors 2; 6 is a changeover switch that switches among the DFT sound source vectors 2 and feeds them to the sound source vector selection circuit 9; 9 is a sound source vector selection circuit (sound source vector selection means) that, from among the M mapped sound source vectors selected by the changeover switch 6, uses the evaluation weight filter coefficients 5 and the M mapped sound source vectors to select the one optimal sound source vector code that minimizes the distortion relative to the DFT input speech 4; and 10 is the optimal sound source vector code selected by the sound source vector selection circuit 9.

Next, the basic operation of the conventional device is described. First, the changeover switch 6 passes the M DFT sound source vectors 2 in the codebook 1 to the sound source vector selection circuit 9 one at a time. For each of the M DFT sound source vectors 2, the sound source vector selection circuit 9 uses that DFT sound source vector 2, the evaluation weight filter coefficients 5, and the DFT input speech 4 to calculate, in the frequency domain, the distortion of the reproduced speech relative to the input speech. The distortion D(k) when the k-th of the M sound source vectors is used is given by the following equation.
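The equation referred to here is an image in the original publication and did not survive in this text. A form consistent with the symbol definitions that follow (X, H, C, g), and with the usual squared-error criterion of a frequency-domain codebook search, would be the one below. It is a reconstruction under that assumption, not the published formula, and the summation range over the L dimensions is likewise assumed.

$$
D(k) = \sum_{i=1}^{L} \left| X(i) - g(k)\,H(i)\,C(i,k) \right|^{2} \tag{1}
$$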

Here, X(i) is the i-th component of the DFT input speech, H(i) is the i-th component of the evaluation weight filter, C(i,k) is the i-th component of the k-th DFT sound source vector, and g(k) is the gain coefficient that minimizes D(k). Furthermore, according to Document 2 mentioned above, equation (1) is equivalent to the following equation (2), and equation (2) is used in the actual computation.

Here, Y*(i) denotes the complex conjugate of Y(i), and Y(i) is given by the following equation (3).

$$
Y(i) = X(i) \cdot a(i) / H(i) \tag{3}
$$

Here, a(i) is given by the following equation (4).

$$
a(i) = |H(i)| \tag{4}
$$

Of the M values of D(k) obtained in this way, the index of the DFT sound source vector that gives the minimum value is selected as the optimal sound source vector code.
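The exhaustive search described above can be sketched in a few lines of NumPy. This is an illustration only: the function name `full_search`, the array shapes, and the use of a real-valued gain with a plain squared-error distortion are assumptions made for the sketch, not details taken from the publication.

```python
import numpy as np

def full_search(X, H, C):
    """Exhaustive search over all M candidates (hypothetical helper).

    X : (L,) complex array, DFT input speech
    H : (L,) complex array, evaluation weight filter
    C : (M, L) complex array, one DFT sound source vector per row
    Returns the index of the minimum-distortion vector and all M distortions.
    """
    S = C * H                                   # weighted candidates H(i) * C(i, k)
    cross = np.real(S @ np.conj(X))             # Re sum_i conj(X(i)) H(i) C(i, k)
    energy = np.sum(np.abs(S) ** 2, axis=1)     # sum_i |H(i) C(i, k)|^2
    g = cross / energy                          # real gain minimizing the squared error
    D = np.sum(np.abs(X - g[:, None] * S) ** 2, axis=1)
    return int(np.argmin(D)), D
```

Every call evaluates all L dimensions for all M rows of `C`, which is the L·M cost that the invention sets out to reduce.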

[Problems to be solved by the invention]

Because the conventional optimal sound source vector search device is configured as described above, the sound source vector selection circuit 9 must perform the L-dimensional distortion calculation M times. When M is made large to obtain good reproduced speech (for example, M = 1024), the amount of computation required for this distortion calculation becomes enormous, and a device implementing it becomes very large in scale.

The present invention was made to solve the problem described above, and its object is to provide an optimal sound source vector search device that can reduce the amount of computation required for the distortion calculation in the optimal sound source vector search.

[Means for solving the problem]

The optimal sound source vector search device according to the present invention is an optimal sound source vector search device of a speech coding device comprising a mapped sound source codebook (1), speech analysis means (16), first L-dimensional mapping means (12), second L-dimensional mapping means (13), sound source vector preliminary selection means (3), and sound source vector selection means (9), configured as follows.

The mapped sound source codebook (1) holds M mapped sound source vectors (2) mapped into an L-dimensional distortion evaluation space that satisfies the L-dimensional dimension orthogonality condition. The first L-dimensional mapping means (12) maps the input speech signal (11) into the same L-dimensional distortion evaluation space as the mapped sound source vectors (2) and outputs it as the mapped speech signal (4) to the sound source vector preliminary selection means (3) and the sound source vector selection means (9). The speech analysis means (16) analyzes the input speech signal (11) to calculate synthesis filter coefficients, calculates the impulse response of those synthesis filter coefficients, and outputs it to the second L-dimensional mapping means (13). The second L-dimensional mapping means (13) maps the impulse response of the synthesis filter coefficients into the same L-dimensional distortion evaluation space as the mapped sound source vectors (2) and outputs it as the evaluation weight filter coefficients (5) to the sound source vector preliminary selection means (3) and the sound source vector selection means (9).

The sound source vector preliminary selection means (3) selects L1 (L1 < L) dimension components based on the magnitude of the absolute value of each component of the evaluation weight filter coefficients (5) and, over those L1 dimension components, uses the evaluation weight filter coefficients (5) and the M mapped sound source vectors (2) to select M1 (M1 < M) mapped sound source vectors with a small distortion relative to the mapped speech signal (4). The sound source vector selection means (9), over all L dimensions, uses the evaluation weight filter coefficients (5) and the M1 mapped sound source vectors to select, from the M1 mapped sound source vectors selected by the sound source vector preliminary selection means (3), the one optimal mapped sound source vector that minimizes the distortion relative to the mapped speech signal (4).

[Action]

In the present invention, with the configuration described above, the sound source vectors are preselected: mapped sound source vectors with a small distortion relative to the mapped speech signal are selected first, and the optimal mapped sound source vector that minimizes the distortion relative to the mapped speech signal is then selected from among them. As a result, the amount of computation required for the distortion calculation when searching for the optimal sound source vector is reduced.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an optimal sound source vector search device according to an embodiment of the present invention. In the figure, the same reference numerals as in FIG. 2 denote the same or corresponding parts. Reference numeral 3 is a sound source vector preliminary selection circuit (sound source vector preliminary selection means). From among the DFT sound source vectors 2 selected by the changeover switch 6, it selects L1 (L1 < L) dimension components based on the magnitude of the absolute value of each component of the evaluation weight filter coefficients 5 and, over those L1 dimension components, uses the evaluation weight filter coefficients 5 and the M DFT sound source vectors 2 to select M1 (M1 < M) DFT sound source vectors with a small distortion relative to the DFT input speech 4. Reference numeral 6 is a changeover switch that switches among the DFT sound source vectors 2 and outputs them to the sound source vector preliminary selection circuit 3 rather than to the sound source vector selection circuit 9. Reference numeral 8 is a second changeover switch that switches among the DFT sound source vectors 2 according to the designation signal 7 output by the sound source vector preliminary selection circuit 3 and outputs them to the sound source vector selection circuit 9.

Next, the operation is described.

First, the changeover switch 6 passes the DFT sound source vectors 2 in the codebook 1 to the sound source vector preliminary selection circuit 3. The sound source vector preliminary selection circuit 3 regards the dimensions for which a(i)·|Y(i)| in equation (2) is large as the dimensions that contribute most to the distortion, and first selects the L1 dimensions with the largest a(i)·|Y(i)|. Then, for each of the M DFT sound source vectors 2, it uses the DFT sound source vector 2, the evaluation weight filter coefficients 5, and the DFT input speech 4 to calculate, in the frequency domain and over these L1 dimensions only, the distortion of the reproduced speech relative to the input speech. The distortion D1(k) when the k-th of the M sound source vectors is used is given by the following equation.

Here, I(i) is the dimension corresponding to the i-th largest vector component in the vector {|a(j)|, j = 1, L1}. Of the M values of D1(k) obtained in this way, the indices of the M1 sound source vectors that give small D1(k) are sent to the second changeover switch 8 as the sound source vector designation signal 7, and the second changeover switch passes the M1 DFT sound source vectors that give small D1(k) to the sound source vector selection circuit 9 one at a time. The subsequent operation of the sound source vector selection circuit 9 is the same as that of the sound source vector selection circuit 9 in FIG. 2, except that it selects from M1 DFT sound source vectors instead of M, so its description is omitted.
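The two-stage operation described above can be sketched as follows. The names `distortion` and `two_stage_search` are hypothetical, the ranking key a(i)·|Y(i)| follows the operation description, and the choices to re-optimize a real gain on the reduced dimension set and to use a plain squared-error distortion are assumptions made for this sketch rather than details confirmed by the text.

```python
import numpy as np

def distortion(X, H, C, dims):
    """Gain-optimized squared error of every row of C, restricted to `dims`."""
    S = C[:, dims] * H[dims]                    # weighted candidates on the chosen dimensions
    Xd = X[dims]
    g = np.real(S @ np.conj(Xd)) / np.sum(np.abs(S) ** 2, axis=1)
    return np.sum(np.abs(Xd - g[:, None] * S) ** 2, axis=1)

def two_stage_search(X, H, C, L1, M1):
    """Preliminary selection on L1 dimensions, then a full pass on M1 survivors."""
    M, L = C.shape
    a = np.abs(H)                               # a(i) = |H(i)|, equation (4)
    # Y(i) = X(i) a(i) / H(i) only rotates the phase, so |Y(i)| = |X(i)| where H(i) != 0
    contribution = a * np.abs(X)                # a(i) * |Y(i)|
    dims = np.argsort(contribution)[-L1:]       # the L1 dimensions that contribute most
    D1 = distortion(X, H, C, dims)              # preliminary distortion for all M vectors
    survivors = np.argsort(D1)[:M1]             # M1 candidates with the smallest D1(k)
    D = distortion(X, H, C[survivors], np.arange(L))   # main pass over all L dimensions
    return int(survivors[np.argmin(D)]), survivors
```

The first pass touches only `L1` columns of the codebook and the second only `M1` rows, which is where the saving comes from. If the true optimum is not among `survivors`, the result differs from the exhaustive search.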

Next, the amount of computation is discussed. Let F be the amount of computation required to evaluate D(k) or D1(k) for a single dimension. The conventional technique, that is, the method that performs the distortion calculation for selecting the optimal sound source vector without preliminary selection, requires L·M·F. According to this embodiment, the preliminary selection requires L1·M·F and the main selection requires L·M1·F, for a total of (L1·M + L·M1)·F. The amount of computation can therefore be reduced by choosing L1 and M1 so that L1·M + L·M1 < L·M. The smaller M1 and L1 are, the more the computation decreases, but the sound source vector preliminary selection circuit may then fail to preselect the optimal sound source vector, so M1 and L1 must be chosen appropriately. As an experimental example, with M = 1024 and L = 40, setting M1 = 32 and L1 = 5 was confirmed to reduce the computation substantially without the optimal sound source vector being dropped from the preliminary selection result.
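The cost comparison above is simple arithmetic and can be checked directly. The helper name `search_cost` is hypothetical; it counts per-dimension distortion evaluations in units of F and ignores the cost of ranking dimensions and candidates.

```python
def search_cost(L, M, L1=None, M1=None):
    """Number of per-dimension distortion evaluations, in units of F."""
    if L1 is None or M1 is None:
        return L * M                  # conventional exhaustive search
    return L1 * M + L * M1            # preliminary pass plus main pass

full = search_cost(40, 1024)              # 40 * 1024
staged = search_cost(40, 1024, 5, 32)     # 5 * 1024 + 40 * 32
print(full, staged, staged / full)        # 40960 6400 0.15625
```

With the experimental values quoted in the text, the staged search needs 6400·F against 40960·F, a little under one sixth of the exhaustive cost.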

Although the above embodiment describes an example in which the optimal sound source search is implemented in circuitry, it may instead be implemented by software running on a general-purpose processing device such as a microprocessor or a signal processor.

Also, although the above embodiment uses the DFT frequency domain as the distortion evaluation space, any mapping space that satisfies the dimension orthogonality condition may be used instead.

[Effect of the invention]

As described above, according to the present invention, an optimal sound source vector search device of a speech coding device comprising a mapped sound source codebook (1), speech analysis means (16), first L-dimensional mapping means (12), second L-dimensional mapping means (13), sound source vector preliminary selection means (3), and sound source vector selection means (9) is configured as follows.

The mapped sound source codebook (1) holds M mapped sound source vectors (2) mapped into an L-dimensional distortion evaluation space that satisfies the L-dimensional dimension orthogonality condition. The first L-dimensional mapping means (12) maps the input speech signal (11) into the same L-dimensional distortion evaluation space as the mapped sound source vectors (2) and outputs it as the mapped speech signal (4) to the sound source vector preliminary selection means (3) and the sound source vector selection means (9). The speech analysis means (16) analyzes the input speech signal (11) to calculate synthesis filter coefficients, calculates the impulse response of those synthesis filter coefficients, and outputs it to the second L-dimensional mapping means (13). The second L-dimensional mapping means (13) maps the impulse response of the synthesis filter coefficients into the same L-dimensional distortion evaluation space as the mapped sound source vectors (2) and outputs it as the evaluation weight filter coefficients (5) to the sound source vector preliminary selection means (3) and the sound source vector selection means (9). The sound source vector preliminary selection means (3) selects L1 (L1 < L) dimension components based on the magnitude of the absolute value of each component of the evaluation weight filter coefficients (5) and, over those L1 dimension components, uses the evaluation weight filter coefficients (5) and the M mapped sound source vectors (2) to select M1 (M1 < M) mapped sound source vectors with a small distortion relative to the mapped speech signal (4). The sound source vector selection means (9), over all L dimensions, uses the evaluation weight filter coefficients (5) and the M1 mapped sound source vectors to select, from the M1 mapped sound source vectors selected by the sound source vector preliminary selection means (3), the one optimal mapped sound source vector that minimizes the distortion relative to the mapped speech signal (4).

The amount of computation in the distortion calculation for the sound source vectors can therefore be reduced, so that even a small-scale device can search for the optimal sound source vector from among a sufficiently large number M of sound source vectors, and higher-quality reproduced speech can be obtained at the same device scale.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an optimal sound source vector search device according to an embodiment of the present invention, and FIG. 2 is a block diagram showing a conventional optimal sound source vector search device. In the figures, 1 is a codebook composed of M DFT sound source vectors, 2 is a DFT sound source vector, 3 is a sound source vector preliminary selection circuit, 4 is the DFT input speech, 5 is the evaluation weight filter coefficients, 6 is a changeover switch, 7 is a designation signal, 8 is a second changeover switch, 9 is a sound source vector selection circuit, and 10 is the optimal sound source vector code. The same reference numerals in the figures denote the same or corresponding parts.

Continuation of front page

(56) References: JP-A-59-99496 (JP, A); JP-A-62-139089 (JP, A); JP-A-59-77730 (JP, A); JP-A-59-94936 (JP, A)

Claims

(57) [Claims]

1. An optimal sound source vector search device of a speech coding device comprising a mapped sound source codebook, speech analysis means, first L-dimensional mapping means, second L-dimensional mapping means, sound source vector preliminary selection means, and sound source vector selection means, wherein: the mapped sound source codebook holds M mapped sound source vectors mapped into an L-dimensional distortion evaluation space that satisfies the L-dimensional dimension orthogonality condition; the first L-dimensional mapping means maps an input speech signal into the same L-dimensional distortion evaluation space as the mapped sound source vectors and outputs it as a mapped speech signal to the sound source vector preliminary selection means and the sound source vector selection means; the speech analysis means analyzes the input speech signal to calculate synthesis filter coefficients, calculates the impulse response of the synthesis filter coefficients, and outputs it to the second L-dimensional mapping means; the second L-dimensional mapping means maps the impulse response of the synthesis filter coefficients into the same L-dimensional distortion evaluation space as the mapped sound source vectors and outputs it as evaluation weight filter coefficients to the sound source vector preliminary selection means and the sound source vector selection means; the sound source vector preliminary selection means selects L1 (L1 < L) dimension components based on the magnitude of the absolute value of each component of the evaluation weight filter coefficients and, over the L1 dimension components, uses the evaluation weight filter coefficients and the M mapped sound source vectors to select M1 (M1 < M) mapped sound source vectors with a small distortion relative to the mapped speech signal; and the sound source vector selection means, over all L dimensions, uses the evaluation weight filter coefficients and the M1 mapped sound source vectors to select, from the M1 mapped sound source vectors selected by the sound source vector preliminary selection means, the one optimal mapped sound source vector that minimizes the distortion relative to the mapped speech signal.