JPS63179399A

JPS63179399A - Voice encoding system

Info

Publication number: JPS63179399A
Application number: JP62011536A
Authority: JP
Inventors: 中川　富士夫
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-01-21
Filing date: 1987-01-21
Publication date: 1988-07-23

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a multi-pulse excited speech coding system, and more particularly to a method of searching for pulses that give good coded speech quality under a limited amount of computation.

Prior Art

One method for coding speech at bit rates of 8 to 16 kbps is multi-pulse excited speech coding, in which a linear prediction filter is driven by 10 to 30 pulses per frame of about 20 ms.

In this system the pulses must be chosen so that the error between the input signal and the synthesized signal is minimized. As a way of simplifying the computation, a method has been proposed in which a pulse is placed at the position where the absolute value of the cross-correlation between the input signal and the impulse response of the linear prediction filter is largest. For the second and subsequent pulses, the impulse response is subtracted from the cross-correlation at the position where the pulse was placed, the result is taken as the new cross-correlation, and the pulses are found one after another in the same way.

In practice, the cross-correlation is generally not computed directly between the input signal and the impulse response. Instead, to remove the influence of the previous frame, an influence signal is obtained by setting the input of the synthesis filter to zero and is subtracted from the input signal. The result is passed through a perceptual weighting filter so that the noise is masked by exploiting auditory characteristics, and the cross-correlation between this weighted signal and the impulse response is used.

The pulse position is obtained from the position of the sample at which the absolute value of the cross-correlation is largest. Conventionally, this position has been found by comparing absolute values sample by sample. Details of this conventional method are disclosed in "A Study of Multi-Pulse Excited Speech Coding," March 23, 1983, Proceedings of the Institute of Electronics and Communication Engineers of Japan, pp. 115-122. The conventional method has the following drawback.

Suppose the original speech is sampled at 8 kHz and coded in 20 ms frames. Each frame then contains 160 samples of the original speech, and the cross-correlation also has 160 samples. It is also known that searching for pulses over a range that extends beyond the coded frame gives better coded speech quality than searching within the frame only. The cross-correlation used for the pulse search therefore has about 200 samples.

When the position of the maximum absolute value is found by the conventional sample-by-sample comparison, a 200-sample cross-correlation requires 199 comparisons.

This search for the position of the maximum absolute value is needed once per pulse, so finding 30 pulses in a 20 ms frame requires 199 × 30 = 5970 comparisons per 20 ms, which is a considerable obstacle to real-time processing.
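The arithmetic above can be checked with a short sketch. The function and constant names are illustrative and not from the patent; the per-second figure is simply the per-frame count multiplied by the 50 frames in one second.

```python
SAMPLE_RATE_HZ = 8000      # sampling rate assumed in the text
FRAME_MS = 20              # frame length assumed in the text
SEARCH_SAMPLES = 200       # cross-correlation length including out-of-frame samples
PULSES_PER_FRAME = 30      # upper end of the 10 to 30 pulse range


def conventional_comparisons_per_frame(num_samples: int, num_pulses: int) -> int:
    """A linear scan over num_samples values needs num_samples - 1 comparisons,
    and the scan is repeated once for every pulse."""
    return (num_samples - 1) * num_pulses


frame_samples = SAMPLE_RATE_HZ * FRAME_MS // 1000
per_frame = conventional_comparisons_per_frame(SEARCH_SAMPLES, PULSES_PER_FRAME)
per_second = per_frame * (1000 // FRAME_MS)

print(frame_samples, per_frame, per_second)
```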

Object of the Invention

The present invention was made to eliminate the above drawback of the conventional method. Its object is to provide a speech coding system that greatly reduces the number of magnitude comparisons in the maximum-value search.

Structure of the Invention

According to the present invention, there is provided a speech coding system in which an input signal is subjected to linear prediction analysis; the impulse response of the linear prediction filter is determined; the cross-correlation between the input signal and the impulse response is determined; a first pulse having the magnitude of the cross-correlation is placed at the position where the absolute value of the cross-correlation is largest; a new cross-correlation is obtained by subtracting from the cross-correlation, at the position where the pulse was placed, the autocorrelation of the impulse response normalized to the pulse magnitude; a predetermined number of pulses are determined from this cross-correlation in the same way, within a predetermined range of the cross-correlation; and the coefficients of the linear prediction filter and the positions and magnitudes of the predetermined number of pulses are transmitted. The system is characterized in that the cross-correlation is decimated at intervals of two or more samples, the positions of one or more samples of large absolute value are determined in the decimated signal, the position of the sample of maximum value is further determined, and that position is taken as the position of the maximum absolute value of the cross-correlation.

Embodiment

The present invention is described in detail below with reference to the drawings.

FIG. 1 is a block diagram of the multi-pulse excited speech coding system to which an embodiment of the present invention is applied. The input signal X is applied to weighting filter 1, which produces the perceptually weighted signal XW.

The input signal X is also subjected to linear prediction analysis by the LPC analysis unit to obtain the linear prediction coefficients A. The impulse response R of the linear prediction filter whose coefficients are A is obtained by LPC parameter quantization/encoding unit 4. Cross-correlation unit 2 and autocorrelation unit 5 then compute, respectively, the cross-correlation $\phi_{xr}$ between the weighted signal XW and the impulse response R, and the autocorrelation $\phi_{rr}$ of the impulse response R. Pulse search unit 6 performs the pulse search using $\phi_{xr}$ and $\phi_{rr}$, and encoding unit 7 encodes the positions and amplitudes of the resulting excitation pulses together with the linear prediction coefficients A for transmission.
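The two correlations that feed the pulse search can be sketched as follows. The patent does not spell out their definitions, so the formulas in the docstrings are an assumption (the usual finite-sum forms), the function names are hypothetical, and indices are zero-based rather than one-based.

```python
from typing import List


def cross_correlation(xw: List[float], h: List[float]) -> List[float]:
    """Assumed definition: phi_xr[i] = sum over n of xw[i + n] * h[n].

    phi_xr[i] measures how well an impulse response placed at position i
    matches the weighted signal xw.
    """
    out = []
    for i in range(len(xw)):
        acc = 0.0
        for n, hn in enumerate(h):
            if i + n < len(xw):
                acc += xw[i + n] * hn
        out.append(acc)
    return out


def autocorrelation(h: List[float], max_lag: int) -> List[float]:
    """Assumed definition: phi_rr[k] = sum over n of h[n] * h[n + k], k = 0..max_lag."""
    return [
        sum(h[n] * h[n + k] for n in range(len(h) - k))
        for k in range(max_lag + 1)
    ]


# A single pulse of amplitude 2 at position 2, passed through h, gives xw.
h = [1.0, 0.5]
xw = [0.0, 0.0, 2.0, 1.0, 0.0]
phi_xr = cross_correlation(xw, h)
phi_rr = autocorrelation(h, 1)
print(phi_xr, phi_rr)
```

In this toy case the cross-correlation peaks at the pulse position, and the ratio of that peak to the zero-lag autocorrelation recovers the pulse amplitude.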

Next, the pulse search method is described. FIG. 2 shows an example of the internal configuration of pulse search unit 6 of FIG. 1. Pulse search unit 6 contains cross-correlation holding means 61, which holds the cross-correlation $\phi_{xr}(i)$, $i = 1, \ldots, l_x$, of length $l_x$. Maximum absolute value search unit 62 finds the value $\phi_{xr}(i_m)$ whose absolute value is largest over $i = 1, \ldots, l_x$ and places a pulse of magnitude $\phi_{xr}(i_m)$ at position $i_m$.

Position/amplitude correction unit 63 aligns the autocorrelation $\phi_{rr}$, of length $\pm l_r$, to position $i_m$ and scales it to $\phi_{xr}(i_m)$. Subtraction unit 64 then subtracts it from the cross-correlation $\phi_{xr}$ according to the following equation.

$$\phi_{xr}(i_m + i) = \phi_{xr}(i_m + i) - \frac{\phi_{xr}(i_m)}{\phi_{rr}(0)}\,\phi_{rr}(i), \qquad i = -l_r, \ldots, l_r$$

Here the left-hand side is the updated cross-correlation. This maximum search and the subtraction of the autocorrelation from the cross-correlation are performed once for each pulse found. The result of the subtraction is stored back in cross-correlation holding means 61 and is used as the cross-correlation in the next pulse search.
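The search-and-subtract loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the function name is hypothetical, indices are zero-based, `phi_rr` holds lags 0 to l_r and is treated as symmetric, and the maximum is found with a plain full scan. Following the text, the recorded pulse magnitude is the cross-correlation value at the peak.

```python
from typing import List, Tuple


def multipulse_search(
    phi_xr: List[float], phi_rr: List[float], num_pulses: int
) -> List[Tuple[int, float]]:
    """Greedy pulse search: pick the peak, subtract the scaled autocorrelation, repeat."""
    work = list(phi_xr)          # plays the role of cross-correlation holding means 61
    lr = len(phi_rr) - 1         # autocorrelation is available for lags -lr..lr
    pulses = []
    for _ in range(num_pulses):
        # Maximum absolute value search (unit 62), here as an exhaustive scan.
        i_m = max(range(len(work)), key=lambda i: abs(work[i]))
        peak = work[i_m]
        pulses.append((i_m, peak))
        # Scale phi_rr so its zero-lag value equals the peak (unit 63),
        # then subtract it around i_m (unit 64).
        scale = peak / phi_rr[0]
        for lag in range(-lr, lr + 1):
            j = i_m + lag
            if 0 <= j < len(work):
                work[j] -= scale * phi_rr[abs(lag)]
    return pulses


# Cross-correlation produced by two isolated pulses, with phi_rr = [1.25, 0.5].
example_phi_xr = [0.0, 1.0, 2.5, 1.0, 0.0, 0.0, -0.5, -1.25, -0.5]
example_pulses = multipulse_search(example_phi_xr, [1.25, 0.5], 2)
print(example_pulses)
```

Each pass removes the contribution of the pulse just found, so the next pass sees only what the remaining pulses have to explain.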

In the above processing, maximum absolute value search unit 62 operates as shown in FIG. 3. First, 1/N decimation unit 621 decimates the cross-correlation $\phi_{xr}(i)$ to one sample in every N, and detection unit 622 finds the maximum absolute value in the decimated signal. Next, partial detection unit 623 performs a further maximum detection centered on the position of that maximum, over a total of 2 × (N − 1) + 1 samples that include the N − 1 samples on each side that were dropped by the 1/N decimation.

In other words, this method first performs a coarse detection of the maximum by 1/N decimation and then performs a finer detection centered on the maximum position so obtained.
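The coarse and fine stages can be sketched as below. The function name and the zero-based indexing are assumptions made for illustration; the window of N − 1 samples on each side follows the description. The result is the largest value inside the fine window, which can differ from the global maximum if a narrow peak lies far from every retained sample.

```python
from typing import List


def coarse_to_fine_argmax(phi_xr: List[float], n: int) -> int:
    """Return the index of the largest |phi_xr| found by a two-stage search."""
    if n < 2:
        raise ValueError("decimation interval must be at least 2")
    if not phi_xr:
        raise ValueError("phi_xr must not be empty")

    # Coarse stage (units 621 and 622): look only at every n-th sample.
    coarse = max(range(0, len(phi_xr), n), key=lambda i: abs(phi_xr[i]))

    # Fine stage (unit 623): rescan the n - 1 skipped samples on each side,
    # 2 * (n - 1) + 1 samples in total, clipped to the array bounds.
    lo = max(0, coarse - (n - 1))
    hi = min(len(phi_xr) - 1, coarse + (n - 1))
    return max(range(lo, hi + 1), key=lambda i: abs(phi_xr[i]))


sample_phi = [0.1, 0.2, 0.1, 0.3, 0.9, -1.5, 0.8, 0.2, 0.1, 0.0, 0.05, 0.0]
best_index = coarse_to_fine_argmax(sample_phi, 4)
print(best_index, sample_phi[best_index])
```

With N = 4 the coarse stage inspects indices 0, 4 and 8 and selects index 4; the fine stage then scans indices 1 to 7 and settles on the negative peak at index 5.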

If the maximum is detected by magnitude comparisons between samples, the number of comparisons needed to find one pulse is $l_x/N - 1$ for the coarse detection plus $2(N - 1)$ for the fine detection, for a total of $l_x/N + 2(N - 1) - 1$.

The conventional method, which compares all samples without decimation, needs $l_x - 1$ comparisons. The ratio of the two counts is

$$\frac{l_x/N + 2(N - 1) - 1}{l_x - 1} \approx \frac{1}{N} \qquad (l_x \gg N)$$

that is, about 1/N when $l_x$ is sufficiently larger than N.
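To give a feel for these counts, the sketch below evaluates both formulas for the 200-sample example used earlier. The function names are illustrative, and integer division is used on the assumption that N divides the length evenly.

```python
def coarse_fine_comparisons(lx: int, n: int) -> int:
    """lx/N - 1 coarse comparisons plus 2(N - 1) fine comparisons."""
    return lx // n + 2 * (n - 1) - 1


def full_scan_comparisons(lx: int) -> int:
    """Comparisons for the conventional scan over all lx samples."""
    return lx - 1


LX = 200
for n in (2, 4, 5, 8, 10, 20):
    two_stage = coarse_fine_comparisons(LX, n)
    ratio = two_stage / full_scan_comparisons(LX)
    print(f"N={n:2d}: {two_stage:3d} comparisons, ratio {ratio:.3f}, 1/N = {1 / n:.3f}")
```

For a length of 200 the 1/N approximation is close only for small N. The fine-stage cost 2(N − 1) grows with N, so the total $l_x/N + 2N - 3$ is smallest near $N = \sqrt{l_x/2}$, which is N = 10 here, and rises again for larger N.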

In this embodiment, the coarse detection of the cross-correlation maximum finds only one maximum, but two or more values may instead be detected in descending order and used as candidates for the maximum position.

In that case, the fine detection is performed for every candidate maximum position found by the coarse detection.
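A sketch of this multi-candidate variant follows. The function name, the zero-based indexing, and the use of a sort to pick the top coarse samples are illustrative choices, not details given in the patent.

```python
from typing import List


def coarse_to_fine_argmax_multi(phi_xr: List[float], n: int, num_candidates: int) -> int:
    """Two-stage maximum search that refines several coarse candidates."""
    if n < 2 or num_candidates < 1 or not phi_xr:
        raise ValueError("need n >= 2, num_candidates >= 1 and a non-empty input")

    # Coarse stage: keep the num_candidates largest decimated samples.
    candidates = sorted(
        range(0, len(phi_xr), n), key=lambda i: abs(phi_xr[i]), reverse=True
    )[:num_candidates]

    # Fine stage: scan the window around every candidate and keep the overall best.
    best = candidates[0]
    for c in candidates:
        lo = max(0, c - (n - 1))
        hi = min(len(phi_xr) - 1, c + (n - 1))
        for i in range(lo, hi + 1):
            if abs(phi_xr[i]) > abs(phi_xr[best]):
                best = i
    return best


tricky_phi = [0.0, 0.1, 0.2, 0.1, 1.0, 0.2, 0.1, 0.0, 0.9, 3.0, 0.3, 0.0]
one_candidate = coarse_to_fine_argmax_multi(tricky_phi, 4, 1)
two_candidates = coarse_to_fine_argmax_multi(tricky_phi, 4, 2)
print(one_candidate, two_candidates)
```

In this example the true peak at index 9 sits next to the second-largest coarse sample, so a single candidate stops at index 4 while two candidates reach index 9. The price is one additional fine scan per extra candidate.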

Effect of the Invention

As described above, according to the present invention, the maximum-value search of the cross-correlation first performs a coarse search on the cross-correlation decimated to one sample in every N. As a result, when the number of cross-correlation samples is sufficiently larger than N, the number of magnitude comparisons in the maximum-value search can be reduced to about 1/N of the conventional number.

[Brief explanation of the drawing]

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is a block diagram showing the configuration of the pulse search unit of FIG. 1, and FIG. 3 is a block diagram showing the configuration of the maximum absolute value search unit of FIG. 2.

Claims

[Claims]

A speech coding system in which an input signal is subjected to linear prediction analysis; the impulse response of the linear prediction filter is determined; the cross-correlation between the input signal and the impulse response is determined; a first pulse having the magnitude of the cross-correlation is placed at the position where the absolute value of the cross-correlation is largest; a new cross-correlation is obtained by subtracting from the cross-correlation, at the position where the pulse was placed, the autocorrelation of the impulse response normalized to the pulse magnitude; a predetermined number of pulses are determined from this cross-correlation in the same way, within a predetermined range of the cross-correlation; and the coefficients of the linear prediction filter and the positions and magnitudes of the predetermined number of pulses are transmitted, the system being characterized in that the cross-correlation is decimated at intervals of two or more samples, the positions of one or more samples of large absolute value are determined in the decimated signal, the position of the sample of maximum value is further determined, and that position is taken as the position of the maximum absolute value of the cross-correlation.