JP5174651B2 - Low complexity code-excited linear predictive coding

Info

Publication number: JP5174651B2
Application number: JP2008500663A
Authority: JP
Inventors: Anisse Taleb
Original assignee: Telefonaktiebolaget LM Ericsson (publ)
Priority date: 2005-03-09
Filing date: 2005-03-09
Publication date: 2013-04-03
Anticipated expiration: 2025-03-09
Also published as: BRPI0520115A2; WO2006096099A1; TW200639801A; ATE513290T1; KR101235425B1; EP1859441B1; BRPI0520115B1; CN101138022A; EP1859441A1; KR20070116869A; CN101138022B; JP2008533522A

Abstract

Information (km) about the excitation signals of a first signal (sm(n)) encoded by CELP is used to derive a limited set (10') of candidate excitation signals for a second, correlated signal (ss(n)). Preferably, the pulse locations of the excitation signals of the first encoded signal (sm(n)) are used to determine the set (10') of candidate excitation signals. More preferably, the pulse locations of the set of candidate excitation signals are positioned in the vicinity of the pulse locations of the excitation signals of the first encoded signal (sm(n)). The first and second signals (sm(n), ss(n)) may be multi-channel signals of a common speech or audio signal. However, the first and second signals (sm(n), ss(n)) may also be identical, in which case the coding of the second signal (ss(n)) can be used for re-encoding at a lower bit rate.

Description

The present invention relates to audio coding, and in particular to code-excited linear predictive coding.

Existing stereo coding techniques, and more generally multi-channel coding techniques, require very high bit rates. Parametric stereo is often used at very low bit rates. However, these techniques are designed for a wide range of general audio material: music, speech, and mixed content.

Very little has been done in multi-channel speech coding. Most work has focused on inter-channel prediction (ICP) methods. ICP techniques exploit the correlation between the left and right channels. Various ways of reducing this redundancy in stereo signals are described in the literature, e.g. [1][2][3].

ICP methods model the case of a single speaker very well, but they cannot model multiple speakers or diffuse sound sources (for example, diffuse background noise). Encoding the ICP residual may therefore be unavoidable, which requires a very high bit rate.

Most existing speech codecs are monophonic and are based on the code-excited linear predictive (CELP) coding model. Examples include AMR-NB (Adaptive Multi-Rate Narrow Band) and AMR-WB (Adaptive Multi-Rate Wide Band). In the CELP model, the excitation signal at the input of the short-term linear prediction synthesis filter is constructed by adding two excitation vectors, one from an adaptive codebook and one from a fixed (innovation) codebook. Speech is synthesized by feeding the two appropriately chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis (A-b-S) search procedure, in which the error between the original speech and the synthesized speech is minimized according to a perceptually weighted distortion measure.

There are two types of fixed codebooks. The first type is the so-called stochastic codebook. Stochastic codebooks often require a large amount of storage. Given an index into the codebook, the excitation vector is obtained by a conventional table lookup. The size of the codebook is therefore limited by both the bit rate and the complexity.

The second type is the algebraic codebook. In contrast to stochastic codebooks, algebraic codebooks are not random and require no substantial storage. An algebraic codebook is a set of indexed code vectors in which the amplitudes and positions of the pulses of the k-th code vector are derived directly from the corresponding index k. The size of an algebraic codebook is therefore not limited by memory. Algebraic codebooks are also well suited to efficient search procedures.
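As a rough illustration of how a code vector can be derived from an index without any stored table, the following Python sketch uses a simple mixed-radix layout. The layout, the function name `decode_algebraic_index`, and its parameters are assumptions made for this example only; they are not the index format of AMR-NB, AMR-WB, or of this patent.

```python
def decode_algebraic_index(k, num_pulses, frame_len):
    """Build a code vector from index k alone (hypothetical layout, no table).

    Each pulse consumes one sign bit and one position digit of k.
    """
    code = [0] * frame_len
    for _ in range(num_pulses):
        k, sign_bit = divmod(k, 2)          # lowest bit: pulse sign
        k, position = divmod(k, frame_len)  # next digit: pulse position
        code[position] += -1 if sign_bit else 1
    return code


# Index 180 encodes a +1 pulse at position 2 and a -1 pulse at position 5.
print(decode_algebraic_index(180, num_pulses=2, frame_len=8))
```

Because the vector is computed from k, the codebook can be made arbitrarily large without any growth in memory, which is the property the paragraph above relies on.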

It is important to note that a substantial part, and in most cases the majority, of the bits available to a speech codec is allocated to fixed codebook excitation coding. For example, in the AMR-WB standard, between 36% and 76% of the bits are allocated to the fixed codebook procedure. Moreover, it is the fixed codebook excitation search that accounts for most of the encoder complexity.

In [7], a multi-part fixed codebook is used, comprising an individual fixed codebook for each channel and a shared codebook common to all channels. This approach gives a good representation of the inter-channel correlations, but at the cost of increased storage and complexity. Furthermore, the bit rate required to encode the fixed codebook excitation is very high, since a shared codebook index must be transmitted in addition to each channel's codebook index. Similar methods for encoding multi-channel signals are described in [8] and [9], where the coding mode depends on the degree of correlation between the channels. These techniques are well known from Left/Right and Mid/Side coding, where switching between the two coding modes depends on a residual and thus on the correlation.

[10] describes a method for encoding multi-channel signals that generalizes the various elements of a single-channel linear predictive codec. The method has the drawback of requiring a very large amount of computation, which makes it unusable in real-time applications such as conversational applications. Another drawback is the number of bits needed to encode the various decorrelation filters used in the encoding.

[1] H. Fuchs, "Improving joint stereo audio coding by adaptive inter-channel prediction", in Proc. IEEE WASPAA, Mohonk, NY, October 1993
[2] S.A. Ramprashad, "Stereophonic CELP coding using cross channel prediction", in Proc. IEEE Workshop Speech Coding, pp. 136-138, September 2000
[3] T. Liebschen, "Lossless audio coding using adaptive multichannel prediction", in Proc. AES 113th Conv., Los Angeles, CA, October 2002
[4] ITU-R BS.1387
[5] International publication WO96/28810
[6] 3GPP TS 26.190, p. 28, table 7
[7] US Patent Application Publication 2004/0044524
[8] US Patent Application Publication 2004/0109471
[9] US Patent Application Publication 2003/0191635
[10] US Patent 6,393,392

A general problem with prior-art speech coding is the high bit rate required and the high complexity of the encoder.

An object of the present invention is to provide improved methods and devices for speech coding. A further object is to reduce the bit rate and the complexity of CELP encoding methods and devices.

The above objects are achieved by methods and devices according to the enclosed claims. In general, the excitation signals of a first signal encoded by CELP are used to derive a limited set of candidate excitation signals for a second signal. The second signal is preferably correlated with the first signal. In a particular embodiment, the limited set of candidate excitation signals is derived by a rule selected from a predetermined set of rules on the basis of the encoded first signal and/or the second signal. Preferably, the pulse positions of the excitation signals of the first encoded signal are used to determine the set of candidate excitation signals. More preferably, the pulse positions of the set of candidate excitation signals are placed in the vicinity of the pulse positions of the excitation signals of the first encoded signal. The first and second signals may be multi-channel signals of a common speech or audio signal. However, the first and second signals may also be identical, in which case the coding of the second signal can be used for re-encoding at a lower bit rate.

One advantage of the present invention is that the coding complexity is reduced. Furthermore, for multi-channel signals, the bit rate required to transmit the coded signals is reduced. The invention can also be applied efficiently to re-encoding the same signal at a lower rate. Another advantage is that the invention is compatible with mono signals and can be implemented as an extension to existing speech codecs with very few modifications.

A general CELP speech synthesis model is shown in FIG. 1A. A fixed codebook 10 comprises a number of candidate excitation signals 30, each characterized by a respective index k. In the case of an algebraic codebook, the index k alone fully characterizes the corresponding candidate excitation signal 30. Each candidate excitation signal 30 comprises a number of pulses 32, each having a certain position and amplitude. The index k selects a candidate excitation signal 30, which is amplified in an amplifier 11 to give the output excitation signal c_k(n), denoted 12. An adaptive codebook 14, which is not the primary subject of the present invention, provides an adaptive excitation signal v(n) via an amplifier 15. The excitation signal c_k(n) and the adaptive excitation signal v(n) are summed in an adder 17, giving a composite excitation signal u(n). As indicated by the dashed line 13, the composite excitation signal u(n) influences the adaptive codebook for subsequent signals.

The composite excitation signal u(n) is used as the input to the transform 1/A(z) in a linear prediction synthesis section 20, resulting in a "predicted" signal ^s(n), denoted 21. This signal, typically after post-processing 22, is the output of the CELP synthesis procedure.
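The synthesis path of FIG. 1A can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: both excitation inputs are taken to be already scaled by their amplifiers, the filter is assumed to have the form A(z) = 1 + a_1 z^-1 + ... + a_M z^-M, the filter memory from earlier frames is assumed to be zero, and the name `celp_synthesize` is hypothetical.

```python
def celp_synthesize(fixed_exc, adaptive_exc, lpc):
    """Return (u, s_hat) for one frame.

    fixed_exc    -- c_k(n), already scaled by its gain
    adaptive_exc -- v(n), already scaled by its gain
    lpc          -- [a_1, ..., a_M], assuming A(z) = 1 + sum_j a_j z^-j
    """
    # Adder 17: composite excitation u(n) = c_k(n) + v(n)
    u = [c + v for c, v in zip(fixed_exc, adaptive_exc)]
    # Synthesis section 20: all-pole filter 1/A(z), zero initial memory
    s_hat = []
    for n, u_n in enumerate(u):
        feedback = sum(a * s_hat[n - j]
                       for j, a in enumerate(lpc, start=1) if n - j >= 0)
        s_hat.append(u_n - feedback)
    return u, s_hat


u, s_hat = celp_synthesize([1.0, 0, 0], [0, 0, 0], [-0.5])
print(s_hat)  # impulse response of 1 / (1 - 0.5 z^-1)
```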

The CELP speech synthesis model is used for analysis-by-synthesis (A-b-S) coding of the speech signal of interest. A target signal s(n), i.e. the signal to be approximated, is provided. A long-term prediction is made using the adaptive codebook, adjusting previous coding to the present target signal and giving an adaptive excitation signal v(n) = g_p u(n-δ). The remaining error is the target for determining the fixed codebook excitation signal: a codebook index k must be found whose entry c_k minimizes the difference according to an objective function, e.g. a mean-square measure. In general, the algebraic codebook is searched by minimizing the mean square error between the weighted input speech and the weighted synthesized speech. The fixed codebook search aims to find the algebraic codebook entry c_k, corresponding to index k, that maximizes the following expression.
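The expression itself is not reproduced in this text. In the standard ACELP formulation, which is assumed here and which agrees with the definitions of H and y₂ given next, the quantity to be maximized is:

```latex
Q_k = \frac{\left(\mathbf{y}_2^{T}\,\mathbf{H}\,\mathbf{c}_k\right)^{2}}
           {\mathbf{c}_k^{T}\,\mathbf{H}^{T}\mathbf{H}\,\mathbf{c}_k}
```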

The matrix H is a filtering matrix whose elements are derived from the impulse response of the weighting filter, and y₂ is a vector whose components depend on the signal to be encoded.

This fixed codebook procedure is illustrated in FIG. 1B, where an index k selects an entry c_k from the fixed codebook 10 as the excitation signal 12. In a stochastic fixed codebook, the index k typically serves as the input to a table lookup, whereas in an algebraic fixed codebook the excitation signal 12 is derived directly from the index k. In general, a multi-pulse excitation can be written as follows.

Here, p_{i,k} are the pulse positions for index k, b_{i,k} are the individual pulse amplitudes, P is the number of pulses, and δ is the Dirac pulse function defined as follows.
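The two expressions are not reproduced in this text. A reconstruction from the symbol definitions above, with the frame length L as an assumed range for n, would read:

```latex
c_k(n) = \sum_{i=1}^{P} b_{i,k}\,\delta\!\left(n - p_{i,k}\right),
\qquad n = 0, \ldots, L-1

\delta(n) =
\begin{cases}
  1, & n = 0 \\
  0, & n \neq 0
\end{cases}
```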

FIG. 1C illustrates an example of a candidate excitation signal 30 of the fixed codebook 10. The candidate excitation signal 30 is characterized by a number of pulses 32, eight in this example. The pulses 32 are characterized by their positions P(1) to P(8) and their amplitudes. In a typical algebraic fixed codebook, the amplitudes are either +1 or -1.

In a single-channel encoder/decoder system, the CELP model is typically implemented as illustrated in FIG. 2. Parts corresponding to functions of the CELP synthesis model of FIG. 1A are given the same reference numerals, since the parts are characterized mainly by their function, even though they may not be arranged in the same way in an actual implementation. For instance, error weighting filters, which are usually present in an actual implementation of linear prediction A-b-S coding, are not shown.

A signal s(n) to be encoded, denoted 33, is provided to an encoder 40. The encoder comprises a CELP synthesis block 25 according to the principles discussed above (post-processing is omitted to simplify the figure). The output of the CELP synthesis block 25 is compared with the signal s(n) in a comparator 31. The difference 37, weighted by a weighting filter, is provided to a codebook optimization block 35, which is arranged according to conventional principles to find an optimum, or at least reasonably good, excitation signal c_k(n), denoted 12. The codebook optimization block 35 provides the corresponding index k to the fixed codebook 10. When the final excitation signal has been found, the index k and the delay δ of the adaptive codebook 14 are encoded in an index encoder 38, which provides an output signal representing the index k and the delay δ.

The representation of the index k and the delay δ is provided to a decoder 50. The decoder comprises a CELP synthesis block 25 according to the principles discussed above (post-processing is again omitted to simplify the figure). The representation of the index k and the delay δ is decoded in an index decoder 53, and the index k and the delay δ are provided as input parameters to the fixed codebook and the adaptive codebook, respectively. The result is a synthesized signal ^s(n), denoted 21, which should resemble the original signal s(n).

The representation of the index k and the delay δ can be stored, for a shorter or longer time, anywhere between the encoder and the decoder. This allows, for example, audio recordings to be stored with relatively small storage requirements.

The present invention relates to speech coding and, more generally, audio coding. In a typical case, it addresses situations in which a main signal s_M(n) is encoded according to the CELP technique and it is desired to encode another signal s_S(n). The other signal may be the same main signal, s_S(n) = s_M(n), e.g. when re-encoding at a lower bit rate; an encoded version of the main signal, s_S(n) = ^s_M(n); or a signal corresponding to another channel, e.g. in stereo or multi-channel 5.1.

The invention is thus directly applicable to stereo, and more generally multi-channel, coding of speech in teleconferencing applications. Applications of the invention can also include audio coding as part of open-loop or closed-loop content-dependent coding.

For the invention to operate under optimal conditions, there should preferably be a correlation between the main signal and the other signal. However, such a correlation is not a strict requirement for the invention to work properly. In fact, the invention can operate adaptively, depending on the degree of correlation between the main signal and the other signal. In stereo applications, because there is a causal relationship between the left and right channels, the sum signal is often chosen as the main signal s_M(n) and the difference between the left and right channels as s_S(n).

The invention presumes that the main signal s_M(n) is available in a CELP-encoded representation. One basic idea of the invention is to limit the search in the fixed codebook, during encoding of the other signal s_S(n), to a subset of the candidate excitation signals. This subset is selected depending on the CELP encoding of the main signal. In a preferred embodiment, the pulses of the candidate excitation signals in the subset are restricted to a set of pulse positions that depends on the pulse positions of the main signal. This amounts to defining constrained candidate pulse positions. Typically, the set of available pulse positions is set to the pulse positions of the main signal plus neighboring pulse positions.

Reducing the number of candidate pulses dramatically reduces the computational complexity of the encoder.

A general example with two channel signals is given below. It extends readily to more channels. In the multi-channel case, however, the targets differ because each channel is given a different weighting filter, and the targets of the channels are also delayed relative to one another.

The main channel and the side channel are constructed as follows.

Here, s_L(n) and s_R(n) are the inputs of the left and right channels, respectively. Even if the left and right channels are delayed versions of each other, the same is not true of the main and side channels, since each of these generally contains information from both channels.
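The construction itself is not reproduced in this text. Given that the main signal is described as the sum signal and the side signal as the difference of the left and right channels, a conventional mid/side form would be the following; the factor 1/2 is an assumption of this sketch.

```latex
s_M(n) = \frac{s_L(n) + s_R(n)}{2},
\qquad
s_S(n) = \frac{s_L(n) - s_R(n)}{2}
```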

In the following, it is assumed that the main channel is the first encoded channel and that the pulse positions of its fixed codebook excitation are available.

The target for the fixed codebook excitation coding of the side signal is computed as the difference between the side signal and the adaptive codebook excitation.

Here, g_P v(n) is the adaptive codebook excitation and s_C(n) is the target signal for the adaptive codebook search.
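The target expression is not reproduced in this text. From the two definitions above it presumably has the following form, where the left-hand symbol x_S(n) is a placeholder name chosen for this sketch and is not taken from the original.

```latex
x_S(n) = s_C(n) - g_P\,v(n)
```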

In this embodiment, the pulse positions available to the candidate excitation signals are defined depending on the pulse positions of the main signal. Since these positions are only a small fraction of all possible positions, the number of bits needed to encode a side signal whose excitation is drawn from the limited set of candidate excitation signals is greatly reduced compared with the case where all pulse positions may occur.

The choice of candidate pulse positions relative to the main pulse positions is crucial in determining the complexity and the required bit rate.

For example, if the frame length is L and the number of pulses in the main signal encoding is N, roughly N*log2(L) bits are needed to encode the pulse positions. However, if only the main signal pulse positions are retained as candidates and the candidate excitation signals for the side signal have P pulses, only roughly P*log2(N) bits are needed to encode the side signal. For reasonable values of N, P, and L, this is a considerable reduction in the required bit rate.
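The two estimates are easy to compare numerically. The values L = 64, N = 8, and P = 4 below are illustrative assumptions only, and `position_bits` is a hypothetical helper name.

```python
import math


def position_bits(num_pulses, num_positions):
    """Approximate bits needed to code num_pulses positions out of num_positions."""
    return num_pulses * math.log2(num_positions)


L, N, P = 64, 8, 4                 # illustrative values only
main_bits = position_bits(N, L)    # N * log2(L)
side_bits = position_bits(P, N)    # P * log2(N)
print(main_bits, side_bits)        # 48.0 12.0
```

With these values the side signal needs a quarter of the position bits used for the main signal.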

One interesting case is when the pulse positions for the side signal are set equal to the pulse positions of the main signal. No encoding of pulse positions is then needed, only encoding of the pulse amplitudes. For an algebraic codebook with pulses of amplitude +1/-1, only the signs (N bits) need to be encoded.

Let the main signal pulse positions be denoted P_M(i), i = 1, ..., n. The pulse positions of the candidate excitation signals for the side signal are selected on the basis of the main signal pulse positions and possibly additional parameters. The additional parameters may include the time delay between the two channels and/or the difference in adaptive codebook index.

In this embodiment, the set of pulse positions for the side signal candidate excitation signals is constructed as follows.
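The set definition is not reproduced in this text. Based on the surrounding description (each main pulse position P_M(i) is shifted by a delay index J(i,k), with k running up to k_max), it can be reconstructed, as an assumption, as:

```latex
P_S^{*} = \bigcup_{i=1}^{n}
\left\{\, P_M(i) + J(i,k) \;:\; k = 1, \ldots, k_{\max} \,\right\}
```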

Here, J(i,k) denotes a delay index. This means that each main signal pulse position generates a set of pulse positions used for constructing the candidate excitation signals in the side signal pulse search procedure. This is illustrated in FIG. 3(a), where P_M denotes the pulse positions of the excitation signal of the main signal and P*_S denotes the possible pulse positions of the candidate excitation signals in the side signal analysis.
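A minimal Python sketch of this construction follows. It assumes the same offsets J(k) for every main pulse, and it discards positions that fall outside the frame; that clipping is an assumption of the sketch, not something the text specifies. The function name is hypothetical.

```python
def candidate_positions(main_positions, delay_offsets, frame_len):
    """Candidate side-signal pulse positions around the main-signal pulses.

    main_positions -- P_M(i), pulse positions of the main excitation
    delay_offsets  -- J(k), offsets applied to every main pulse position
    """
    candidates = set()
    for p in main_positions:
        for j in delay_offsets:
            q = p + j
            if 0 <= q < frame_len:   # assumption: drop out-of-frame positions
                candidates.add(q)
    return sorted(candidates)


# Offsets {-1, 0, +1}, then {0, +1, +2}
print(candidate_positions([2, 10], (-1, 0, 1), 16))  # [1, 2, 3, 9, 10, 11]
print(candidate_positions([2, 10], (0, 1, 2), 16))   # [2, 3, 4, 10, 11, 12]
```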

This is, of course, optimal for highly correlated signals. For weakly correlated or uncorrelated signals, the opposite strategy is taken: the pulse candidates are all pulse positions that do not belong to the following set.
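The complementary strategy can be sketched in the same way: every position in the frame is a candidate unless it lies in the neighborhood of a main-signal pulse. The function name and the use of a common offset set for all pulses are assumptions of this example.

```python
def complementary_positions(main_positions, delay_offsets, frame_len):
    """Candidate positions for weakly correlated signals (complement set)."""
    excluded = {p + j for p in main_positions for j in delay_offsets}
    return [q for q in range(frame_len) if q not in excluded]


print(complementary_positions([2], (-1, 0, 1), 6))  # [0, 4, 5]
```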

Since this is the complementary case, those skilled in the art will readily see that the two strategies are similar. Only the correlated case is described in more detail.

It is easily seen that the positions and the number of pulse candidates depend on the delay index J(i,k). The delay index may in turn be made dependent on the actual delay between the two channels and/or on the adaptive codebook index. In FIG. 3(a), k_max = 3 and J(i,k) = J(k) ∈ {-1, 0, +1}.

FIG. 3(b) shows a slightly different selection of pulse positions. Here too k_max = 3, but J(i,k) = J(k) ∈ {0, +1, +2}.

Those skilled in the art will recognize that the rule for selecting pulse positions can be formulated in many ways. The rule actually used may be adapted to the actual implementation. The important feature, however, is that the pulse position candidates are selected, according to some rule, depending on the pulse positions resulting from the main signal analysis. This rule may be a single fixed rule, or it may be selected from a set of predetermined rules depending on, for example, the correlation between the two channels and/or the delay between the two channels.

Depending on the rule used, the set of pulse candidates for the side signal is constructed. In general, this set is very small compared with the entire frame length. This allows the objective maximization problem to be reformulated on the basis of a decimated frame.

In the general case, the pulses are searched using, for example, the depth-first algorithm described in [5], or using an exhaustive search if the number of candidate pulses is very small. Even with a small number of candidates, however, a fast search procedure is recommended.

In general, the backward-filtered signal is pre-computed using the following equation.

The matrix Φ is the matrix of correlations of h(n), the impulse response of the weighting filter, and its elements are computed by the following equation.

The objective function can thus be written as follows.
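The three expressions referred to above (the backward-filtered signal, the elements of the correlation matrix, and the objective function) are not reproduced in this text. In the standard ACELP formulation, assumed here, they read as follows, with L the frame length and d the backward-filtered vector:

```latex
\mathbf{d} = \mathbf{H}^{T}\mathbf{y}_2,
\qquad
d(i) = \sum_{n=i}^{L-1} y_2(n)\,h(n-i)

\boldsymbol{\Phi} = \mathbf{H}^{T}\mathbf{H},
\qquad
\phi(i,j) = \sum_{n=\max(i,j)}^{L-1} h(n-i)\,h(n-j)

Q_k = \frac{\left(\mathbf{d}^{T}\mathbf{c}_k\right)^{2}}
           {\mathbf{c}_k^{T}\,\boldsymbol{\Phi}\,\mathbf{c}_k}
```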

Given the set of possible candidate pulse positions for the side signal, only a subset of the indices of the backward-filtered vector d and of the matrix Φ is needed. The set of candidate pulse positions is sorted in ascending order.

Here, P*_S(i) are the candidate pulse positions and p is their number. Note that p is always smaller than the frame length L, and typically much smaller.

The decimated signal is expressed as follows.

The decimated correlation matrix Φ₂ is likewise expressed as follows.

Φ₂ is then symmetric and positive definite, and the objective function can be written directly as follows.
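The decimated quantities are not reproduced in this text. Picking out only the entries of d and Φ at the sorted candidate positions P*_S(i), i = 1, ..., p, gives the following reconstruction; the name d₂ for the decimated signal is an assumption made by analogy with Φ₂.

```latex
d_2(i) = d\!\left(P_S^{*}(i)\right),
\qquad i = 1, \ldots, p

\phi_2(i,j) = \phi\!\left(P_S^{*}(i),\, P_S^{*}(j)\right),
\qquad i, j = 1, \ldots, p

Q_{k'} = \frac{\left(\mathbf{d}_2^{T}\,\mathbf{c}'_{k'}\right)^{2}}
              {\mathbf{c}'^{\,T}_{k'}\,\boldsymbol{\Phi}_2\,\mathbf{c}'_{k'}}
```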

Here, c'_k' is the new algebraic code vector. The index becomes k', which is a new entry in a codebook of reduced size.

These decimation operations are summarized in FIG. 4. The upper part of the figure shows the original-size algebraic codebook 10 being reduced to a reduced-size codebook 10'. The middle part shows the original-size weighting-filter covariance matrix 60 being reduced to a reduced weighting-filter covariance matrix 60'. The lower part shows the original-size backward-filtered target 62 being reduced to a reduced-size backward-filtered target 62'. Those skilled in the art will appreciate that such reductions lower the computational complexity.

Maximizing the objective function on the decimated signals has several advantages. One is the reduced memory requirement, for example for the matrix Φ₂. Another is that, because the pulse positions of the main signal are transmitted to the receiver in any case, the indices of the decimated signals are always available to the decoder. This allows the pulse positions of the other (side) signal to be encoded relative to the pulse positions of the main signal, which consumes very few bits. A further advantage is the reduced computational complexity, since the maximization is performed on the decimated signals.

FIG. 5A shows an embodiment of a system of encoders 40A, 40B and decoders 50A, 50B according to the present invention. Many details are similar to those of FIG. 2, so parts whose function is essentially unchanged are not described in detail again. A main signal s_M(n), denoted 33A, is supplied to a first encoder 40A. The first encoder 40A operates according to any conventional CELP coding model and produces an index k_m into the fixed codebook and a delay measure δ_m for the adaptive codebook. The details of this encoding are not essential to the present invention and are omitted to keep FIG. 5A easy to read. The parameters k_m and δ_m are encoded in a first index encoder 38A, giving representations k^*_m and δ^*_m of the parameters, which are sent to a first decoder 50A. In the first decoder, the representations k^*_m and δ^*_m are decoded into the parameters k_m and δ_m in a first index decoder 53A. The original signal is regenerated from these parameters according to any prior-art CELP decoding model. The details of this decoding are likewise not essential to the present invention and are omitted to keep FIG. 5A easy to read. A regenerated first output signal ^s_m(n), denoted 21A, is output.

A side signal s_S(n), denoted 33B, is supplied as an input signal to a second encoder 40B. Most parts of the second encoder 40B are similar to the encoder of FIG. 2. The signals are here given an index "s" to distinguish them from the signals used for encoding the main signal. The second encoder 40B includes a CELP synthesis block 25'. According to the present invention, the index k_m, or a representation of it, is supplied from the first encoder 40A to an input 45 of the fixed codebook 10 of the second encoder 40B. The index k_m is used by candidate deriving means 47 to extract a reduced fixed codebook 10' according to the principles given above. The synthesis in the CELP synthesis block 25' of the second encoder 40B is thus based on an index k'_S representing an excitation signal c'_{k'_S}(n) from the reduced fixed codebook 10'. The index k'_S is found as the one representing the best choice for the CELP synthesis. The parameters k'_S and δ_S are encoded in a second index encoder 38B, giving representations k'^*_S and δ^*_S of the parameters, which are sent to a second decoder 50B.

In the second decoder 50B, the representations k'^*_S and δ^*_S are decoded into the parameters k'_S and δ_S in a second index decoder 53B. In addition, the index parameter k_m is available from the first decoder 50A and is supplied to an input of the fixed codebook 10 of the second decoder 50B, so that candidate deriving means 57 can extract a reduced fixed codebook 10' identical to the one used in the second encoder 40B. The original side signal is regenerated from the parameters k'_S and δ_S and the reduced fixed codebook 10' according to an ordinary CELP decoding model 25". This decoding is performed essentially as in FIG. 2, but using the reduced fixed codebook 10'. A regenerated output side signal ^s_S(n), denoted 21B, is thereby output.
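The key point of this arrangement is that encoder and decoder derive the same reduced set from the main-signal pulse positions, so only an index into that set has to be sent. The sketch below illustrates this under stated assumptions: it uses the neighbourhood rule from the claims with L = -1 and K = 1, interprets that interval as neighbouring sample positions, and uses hypothetical function names and pulse positions.

```python
def derive_candidates(main_positions, frame_len, lo=-1, hi=1):
    """Candidate positions p_j for j in [i+lo, i+hi] around each main pulse position i."""
    candidates = set()
    for i in main_positions:
        for j in range(i + lo, i + hi + 1):
            if 0 <= j < frame_len:  # stay inside the frame
                candidates.add(j)
    return sorted(candidates)


def encode_side_pulses(side_positions, candidates):
    """Encoder side: express each side pulse as an index into the reduced set."""
    return [candidates.index(p) for p in side_positions]


def decode_side_pulses(reduced_indices, candidates):
    """Decoder side: map reduced indices back to positions in the frame."""
    return [candidates[k] for k in reduced_indices]


main_positions = [5, 20, 41]  # decoded from k_m, hence known to both ends
enc_candidates = derive_candidates(main_positions, frame_len=64)
dec_candidates = derive_candidates(main_positions, frame_len=64)

k_reduced = encode_side_pulses([6, 19, 41], enc_candidates)
recovered = decode_side_pulses(k_reduced, dec_candidates)
```

In this toy case each side pulse is addressed within a set of 9 candidates rather than 64 positions, which is why the side-signal indices need so few bits.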

The rule for constructing the set of candidate pulses, for example the indexing function J(i,k), is advantageously chosen adaptively, depending on additional inter-channel characteristics such as delay parameters and degree of correlation. In that case, i.e., with adaptive rule selection, the encoder preferably transmits to the decoder which rule was selected for deriving the set of candidate pulses used to encode the other signal. The rule selection can, for example, be performed by a closed-loop procedure, in which several rules are tested and the one giving the best result is finally selected.
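A closed-loop selection of this kind can be sketched as follows. Everything here is illustrative: the three rules, the target vector `d`, and the single-pulse score (which assumes an identity correlation matrix) are hypothetical stand-ins for whatever rules and quality measure an actual encoder would use.

```python
def same_positions(main, frame_len):
    return sorted(set(main))


def neighbours(main, frame_len):
    return sorted({j for i in main for j in (i - 1, i, i + 1) if 0 <= j < frame_len})


def shifted_by_two(main, frame_len):
    return sorted({i + 2 for i in main if i + 2 < frame_len})


def select_rule(rules, main, frame_len, score):
    """Closed loop: evaluate every rule and return the index r of the best one."""
    best_r, best_score = None, float("-inf")
    for r, rule in enumerate(rules):
        s = score(rule(main, frame_len))
        if s > best_score:  # ties keep the earlier rule
            best_r, best_score = r, s
    return best_r


# Hypothetical backward-filtered target for a 16-sample frame.
d = [0.0] * 16
d[4], d[5], d[6], d[10] = 0.2, 3.0, 0.1, 1.0


def single_pulse_score(candidates):
    # Best single-pulse objective when the correlation matrix is the identity.
    return max((d[n] ** 2 for n in candidates), default=0.0)


rules = [same_positions, neighbours, shifted_by_two]
r = select_rule(rules, main=[4, 10], frame_len=16, score=single_pulse_score)
```

Only the selected index `r` would need to be transmitted, since the decoder holds the same predetermined set of rules.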

FIG. 5B shows an embodiment using the rule selection approach. The mono signal s_M(n), and preferably also the side signal s_S(n), are in this embodiment additionally provided to a rule selection unit 39. Instead of the mono signal, the parameter k_m representing the mono signal can be used. In the rule selection unit 39, the signals are analyzed, for example with respect to delay parameters or degree of correlation. Depending on the result, a rule, represented for example by an index r, is selected from a predetermined set of rules. The index of the selected rule is provided to the candidate deriving means 47, determining how the candidate set is to be derived. The rule index r is also provided to the second index encoder 38B, which gives a representation r^* of the index. The representation r^* is then sent to the second decoder 50B. The second index decoder 53B decodes the rule index r, which is used to control the operation of the candidate deriving means 57.

In this way, a set of rules suited to different types of signals can be provided. Further flexibility is achieved at the cost of only a single additional rule index in the transmitted data.

The particular rule used and the resulting number of candidate side-signal pulses are the main parameters that determine the bit rate and the complexity of the algorithm.

As mentioned above, exactly the same principles apply equally well to re-encoding of one and the same channel. FIG. 6 shows an embodiment in which different parts of the transmission path allow different bit rates. This is applicable as part of a rate transcoding scheme. A signal s(n) is provided as the input signal 33A to a first encoder 40A. The first encoder 40A produces representations k^* and δ^* of the parameters, which are transmitted at a first bit rate. At some point along the path the available bit rate decreases, and re-encoding for a lower bit rate has to be performed. A first decoder 50A uses the representations k^* and δ^* to produce a regenerated signal ^s(n), denoted 21A. This regenerated signal ^s(n) is provided as the input signal 33B to a second encoder 40B. The index k from the first decoder 50A is also provided to the second encoder 40B. As in the preceding embodiments, the index k is used to extract a reduced fixed codebook 10'. The second encoder 40B encodes the signal ^s(n) for the lower bit rate, giving an index ^k' representing the selected excitation signal c'_{^k'}(n). However, this index ^k' is of little use to a remote decoder, since that decoder lacks the information needed to construct the corresponding reduced fixed codebook. The index ^k' therefore has to be associated with an index ^k referring to the original codebook 10. This is preferably performed in connection with the fixed codebook 10, and is represented in FIG. 6 by an arrow 41 indicating the input of ^k' and an arrow 43 indicating the output of ^k. The encoding of the index ^k is then performed with reference to the full set of candidate excitation signals.
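The association of the reduced-codebook index with a full-codebook index can be illustrated with a deliberately simple toy codebook. The layout assumed here (one signed pulse per code vector, index = 2 × position + sign bit) and both function names are hypothetical; a real algebraic codebook uses a more elaborate index structure.

```python
def reduced_to_full_index(k_reduced, candidates):
    """Map an index into the reduced codebook 10' to the matching full-codebook index."""
    j, sign_bit = divmod(k_reduced, 2)   # position within the candidate set, sign
    return 2 * candidates[j] + sign_bit  # same pulse, addressed in the full codebook


def full_index_to_pulse(k_full):
    """What a remote decoder, knowing only the full codebook, recovers."""
    position, sign_bit = divmod(k_full, 2)
    return position, (-1 if sign_bit else 1)


candidates = [4, 5, 6, 19, 20, 21]  # hypothetical reduced set of positions
k_full = reduced_to_full_index(7, candidates)
pulse = full_index_to_pulse(k_full)
```

The search runs on the small set, while the transmitted index stays decodable by a receiver that has never seen the first encoding.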

In a typical case, the first encoding is performed at a bit rate n and the second encoding at a bit rate m, where n > m.

In certain applications, such as real-time transmission of live content (for example, teleconferencing) over different types of networks with different capacities, the same signal may have to be encoded in real time at several different bit rates to accommodate the different networks, so-called parallel multi-rate encoding. In such cases it is of interest to provide parallel encoding at the different bit rates. FIG. 7 shows a system in which a signal s(n) is supplied to both a first encoder 40A and a second encoder 40B. As in the previous embodiments, the second encoder provides a reduced fixed codebook 10' based on the index k_a representing the first encoding. The second encoding is denoted by an index "b". The second encoder 40B thereby becomes independent of the first decoder 50A. Most other parts are similar to FIG. 6, with the indexing adapted accordingly.

For both of these applications, in which the same signal is re-encoded at a lower rate, the present invention substantially reduces the computational complexity and thereby enables implementation on low-cost hardware.

An embodiment of the algorithm described above has been implemented in conjunction with the AMR-WB speech codec. To encode the side signal, the same adaptive codebook index is used as for encoding the mono excitation. The LTP gain and the innovation vector gain were not quantized.

The algorithm for the algebraic codebook was based on the mono pulse positions. As described in [6], the codebook may be structured in tracks. Except for the lowest mode, the number of tracks is four. For each mode, a certain number of pulse positions is used. For example, for mode 5, i.e., 15.85 kbps, the candidate pulse positions are as follows.

The implemented algorithm retains all mono pulses as the pulse positions of the side signal. That is, the pulse positions are not encoded; only the signs of the pulses are encoded.

Each pulse uses only one bit, which encodes its sign. The number of bits per subframe therefore equals the number of mono pulses. In the example above there are 12 pulses per subframe, so the total bit rate for encoding the innovation vector is 12 bits × 4 × 50 = 2.4 kbps. This is the same number of bits as required by the lowest AMR-WB mode (2 pulses for the 6.6 kbps mode), but in this case with a higher pulse density.
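The bit-rate figure can be checked with a few lines of arithmetic. The sketch assumes, as the factors 4 and 50 in the text imply, four subframes per frame and 50 frames per second; the function name is hypothetical.

```python
def sign_only_bitrate(pulses_per_subframe, subframes_per_frame=4, frames_per_second=50):
    """Bits per second when each pulse costs exactly one sign bit."""
    bits_per_frame = pulses_per_subframe * 1 * subframes_per_frame
    return bits_per_frame * frames_per_second


rate_bps = sign_only_bitrate(12)  # 12 pulses per subframe, as in the example above
```

With 12 pulses per subframe this gives 2400 bits per second, i.e., the 2.4 kbps stated above.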

Note that no additional algorithmic delay is needed to encode the stereo signal.

FIG. 8 shows results obtained with PEAQ [4] for evaluating perceptual quality. PEAQ was chosen because it is well known and is the only tool that provides objective quality measures for stereo signals. The results clearly show that the stereo 100 does indeed improve quality compared with the mono signal 102. The sound items used are quite varied: sound 1, S1, is an extract from a movie with background noise; sound 2, S2, is a one-minute radio recording; sound 3, S3, is a kart racing sporting event; and sound 4, S4, is a real two-microphone recording.

FIG. 9 is a flowchart showing an embodiment of an encoding method according to the present invention. The procedure starts in step 200. In step 210, a representation of a CELP excitation signal for a first audio signal is supplied. Note that the entire first audio signal need not be provided, only the representation of the CELP excitation signal. In step 212, a second audio signal, which is related to the first audio signal, is provided. In step 214, a set of candidate excitation signals is derived based on the first CELP excitation signal. Preferably, the pulse positions of the candidate excitation signals are related to the pulse positions of the CELP excitation signal of the first audio signal. In step 216, CELP encoding is performed on the second audio signal using the reduced set of candidate excitation signals derived in step 214. Finally, the representation of the CELP excitation signal for the second audio signal, typically an index, is encoded with reference to the reduced candidate set. The procedure ends in step 299.

FIG. 10 shows another embodiment of an encoding method according to the present invention. The procedure starts in step 200. In step 211, an audio signal is provided. In step 213, a representation of a first CELP excitation signal for the same audio signal is supplied. In step 215, a set of candidate excitation signals is derived based on the first CELP excitation signal. Preferably, the pulse positions of the candidate excitation signals are related to the pulse positions of the CELP excitation signal of the first audio signal. In step 217, CELP re-encoding is performed on the audio signal using the reduced set of candidate excitation signals derived in step 215. Finally, the representation of the second CELP excitation signal for the audio signal, typically an index, is encoded with reference to the non-reduced candidate set, i.e., the set used for the first CELP encoding. The procedure ends in step 299.

FIG. 11 shows an embodiment of a decoding method according to the present invention. The procedure starts in step 200. In step 210, a representation of a first CELP excitation signal for a first audio signal is supplied. In step 252, a representation of a second CELP excitation signal for a second audio signal is supplied. In step 254, a second excitation signal is derived from the representation of the second CELP excitation signal using knowledge of the first excitation signal. Preferably, a reduced set of candidate excitation signals is derived based on the first CELP excitation signal, and the second excitation signal is selected from that reduced set using the index for the second CELP excitation signal. In step 256, the second audio signal is reconstructed using the second excitation signal. The procedure ends in step 299.

The embodiments described above are to be understood as a few illustrative examples of the present invention. Those skilled in the art will understand that various modifications, combinations and changes may be made to the embodiments without departing from the scope of the present invention. In particular, partial solutions from the different embodiments can be combined in other configurations where technically possible. The scope of the present invention is, however, defined by the appended claims.

The present invention very effectively reduces the complexity (both memory and arithmetic operations) of encoding multiple audio channels using algebraic codebooks and CELP, and also very effectively reduces the bit rate.

A schematic diagram showing a code-excited linear prediction model.
A schematic diagram showing the process of deriving an excitation signal.
A schematic diagram showing an embodiment of an excitation signal for use in a code-excited linear prediction model.
A block diagram showing an embodiment of an encoder and a decoder according to a code-excited linear prediction model.
(a) shows an embodiment of the principle of selecting candidate excitation signals according to the present invention; (b) shows another embodiment of that principle.
A diagram illustrating how an embodiment of the present invention reduces the amount of data required.
A block diagram showing an embodiment of encoders and decoders for two signals according to the present invention.
A block diagram showing another embodiment of encoders and decoders for two signals according to the present invention.
A block diagram showing an embodiment of encoders and decoders for re-encoding a signal according to the present invention.
A block diagram showing an embodiment of encoders and decoders for parallel encoding of a signal at different bit rates according to the present invention.
A diagram showing the perceptual quality achieved by an embodiment of the present invention.
A flowchart showing the main steps of an embodiment of an encoding method according to the present invention.
A flowchart showing the main steps of another embodiment of an encoding method according to the present invention.
A flowchart showing the main steps of an embodiment of a decoding method according to the present invention.

Claims

1. A method for encoding an audio signal, characterized by comprising the steps of:
supplying a representation (k, k_m, k_a) of a first excitation signal of a code-excited linear prediction of a first audio signal (33A);
deriving a set (10') of candidate excitation signals (c'(n)) based on the first excitation signal;
providing a second audio signal (33B) different from the first audio signal (33A); and
performing code-excited linear predictive coding of the second audio signal (33B) using the set (10') of candidate excitation signals (c'(n)).

2. The method according to claim 1, characterized in that the second audio signal (33B) is correlated with the first audio signal (33A).

3. The method according to claim 1 or 2, characterized in that the step of deriving the set (10') of candidate excitation signals (c'(n)) comprises selecting one rule from a predetermined set of rules based on at least one of the first excitation signal and the second audio signal, and deriving the set (10') of candidate excitation signals (c'(n)) according to the selected rule.

4. The method, characterized in that:
the first excitation signal has n pulse positions (P_M) out of a set of N possible pulse positions;
the candidate excitation signals (c'(n)) have pulse positions (P^*_s) only at a subset of the N possible pulse positions; and
the subset of pulse positions (P^*_s) is selected based on the n pulse positions (P_M) of the first excitation signal.

5. The method according to claim 4, characterized in that the pulse positions of the subset of pulse positions (P^*_s) are located at positions p_j, where j is an index in the interval {i + L, i + K}, K and L being integers with K > L, and i is an index of the n pulse positions.

6. The method of claim 5, wherein K = 1 and L = -1.

7. The method according to claim 1, characterized in that the code-excited linear predictive coding of the second audio signal (33B) is performed using a global search within the set (10') of candidate excitation signals.

8. The method according to claim 1, characterized by further comprising the steps of:
encoding the second excitation signal of the code-excited linear prediction of the second audio signal (33B) with reference to the set (10') of candidate excitation signals; and
supplying the encoded second excitation signal together with the representation (k, k_m, k_a) of the first excitation signal.

9. The method according to claims 3 and 8, characterized by further comprising the step of supplying data representing an identification of the selected rule together with the representation (k, k_m, k_a) of the first excitation signal.

10. The method according to any one of claims 1 to 7, characterized by further comprising the step of encoding the second excitation signal of the code-excited linear prediction of the second audio signal (33B) with reference to a set (10) of candidate excitation signals having N possible pulse positions.

11. The method according to any one of claims 1 to 10, characterized in that the second excitation signal has m pulse positions, where m < n.

12. A method for decoding an audio signal (33A, 33B), comprising the steps of:
supplying a representation (k, k_m, k_a) of a first excitation signal of a code-excited linear prediction of a first audio signal (33A); and
supplying a representation (k'_s) of a second excitation signal of a code-excited linear prediction of a second audio signal (33B) different from the first audio signal (33A),
wherein the second excitation signal is one of a set (10') of candidate excitation signals, and
the set (10') of candidate excitation signals is based on the first excitation signal,
the method being characterized by further comprising the steps of:
deriving the second excitation signal (c'_{k's}(n)) from the representation (k'_s) of the second excitation signal based on information about the set (10') of candidate excitation signals; and
reconstructing the second audio signal (^s_S(n)) by prediction filtering the second excitation signal (c'_{k's}(n)).

13. The method according to claim 12, characterized in that the second audio signal (33B) is correlated with the first audio signal (33A).

14. The method according to claim 12 or 13, characterized in that the information about the set (10') of candidate excitation signals includes an identification of one rule out of a predetermined set of rules, and the derivation of the set (10') of candidate excitation signals is determined according to that rule.

15. The method according to any one of claims 12 to 14, characterized in that:
the first excitation signal has n pulse positions (P_M) out of a set of N possible pulse positions;
the candidate excitation signals have pulse positions (P^*_s) only at a subset of the N possible pulse positions; and
the subset of pulse positions (P^*_s) is selected based on the n pulse positions (P_M) of the first excitation signal.

16. The method according to claim 15, characterized in that the pulse positions of the subset of pulse positions (P^*_s) are located at positions p_j, where j is an index in the interval {i + L, i + K}, K and L being integers with K > L, and i is an index of the n pulse positions.

17. The method according to claim 16, wherein K = 1 and L = -1.

18. An encoder (40B) for encoding an audio signal, characterized by comprising:
means (45) for supplying a representation (k, k_m, k_a) of a first excitation signal of a code-excited linear prediction of a first audio signal (33A);
means (47), connected to receive the representation (k, k_m, k_a) of the first excitation signal, for deriving a set (10') of candidate excitation signals based on the first excitation signal;
means for providing a second audio signal (33B) different from the first audio signal (33A); and
means (25') for performing code-excited linear prediction, connected to receive a representation of the second audio signal (33B) and the set (10') of candidate excitation signals, and arranged to perform code-excited linear predictive coding of the second audio signal (33B) using the set (10') of candidate excitation signals.

19. The encoder according to claim 18, characterized in that the second audio signal (33B) is correlated with the first audio signal (33A).

20. The encoder according to claim 18 or 19, characterized in that the means (47) for deriving the set (10') of candidate excitation signals selects one rule from a predetermined set of rules based on at least one of the first excitation signal and the second audio signal, and derives the set (10') of candidate excitation signals (c'(n)) according to the selected rule.

21. The encoder according to any one of claims 18 to 20, characterized in that:
the first excitation signal has n pulse positions (P_M) out of a set of N possible pulse positions;
the candidate excitation signals have pulse positions (P^*_s) only at a subset of the N possible pulse positions; and
the subset of pulse positions (P^*_s) is selected based on the n pulse positions (P_M) of the first excitation signal.

22. The encoder according to claim 21, characterized in that the pulse positions of the subset of pulse positions (P^*_s) are located at positions p_j, where j is an index in the interval {i + L, i + K}, K and L being integers with K > L, and i is an index of the n pulse positions.

23. The encoder according to claim 22, characterized in that K = 1 and L = -1.

24. The encoder according to any one of claims 18 to 23, characterized in that the means (25') for performing code-excited linear prediction of the second audio signal (33B) performs a global search within the set (10') of candidate excitation signals.

25. The encoder according to any one of claims 18 to 24, characterized by further comprising:
means (38B) for encoding the second excitation signal of the code-excited linear prediction of the second audio signal (33B) with reference to the set (10') of candidate excitation signals; and
means for supplying the encoded second excitation signal together with the representation (k, k_m, k_a) of the first excitation signal.

The representation of the first excitation signal _{_{(k, k m, k a}} ) together with claim 2 0 and 2 5, characterized in that further comprising a means for supplying data representative of the identity of the selected rule Encoder.

Means for encoding the second excitation signal of the code excitation linear prediction of the second audio signal (33, 33B) with reference to a set (10) of candidate excitation signals having N possible pulse positions (38B); the encoder according to any one of claims 1 8 to 2 4), characterized in that it further comprises a.

The encoder according to any one of claims 18 to 27, wherein the second excitation signal includes m pulse positions, where m < n.

A decoder (50B) for decoding an audio signal, comprising:
means (55) for supplying a representation (k_m) of a first excitation signal of a code-excited linear prediction of a first audio signal (33A); and
means (53B) for supplying a representation (k′_s) of a second excitation signal of a code-excited linear prediction of a second audio signal (33B) different from the first audio signal (33A),
wherein the second excitation signal is one of a set (10′) of candidate excitation signals,
the set (10′) of candidate excitation signals is based on the first excitation signal, and
the decoder further comprises:
means (57), connected to receive information associated with the representation (k_m) of the first excitation signal and the representation (k′_s) of the second excitation signal, for deriving the second excitation signal (c′_{k′s}(n)) from the representation (k′_s) of the second excitation signal based on information about the set (10′) of candidate excitation signals; and
means (25″) for reconstructing the second audio signal (ŝ_S(n)) by predictive filtering of the second excitation signal (c′_{k′s}(n)).
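The decoder-side derivation can be sketched as the mirror image of the encoder: the decoder rebuilds the same candidate set from the pulse positions of the first excitation signal, uses the received index to pick one candidate, and passes it through a prediction (synthesis) filter. Everything specific below is an assumption made for illustration: unit-amplitude pulses, candidates enumerated in lexicographic order of position combinations, the neighbourhood rule with K = 1 and L = -1, and an all-pole filter with hypothetical coefficients `lpc`.

```python
from itertools import combinations


def neighbour_subset(main_indices, num_positions, L=-1, K=1):
    """Indices within [i + L, i + K] of any main pulse index i, clipped to the frame."""
    return sorted({j for i in main_indices for j in range(i + L, i + K + 1)
                   if 0 <= j < num_positions})


def build_candidates(main_indices, num_positions, pulses):
    """Enumerate candidate excitations: `pulses` unit pulses placed within the subset."""
    candidates = []
    for combo in combinations(neighbour_subset(main_indices, num_positions), pulses):
        c = [0.0] * num_positions
        for j in combo:
            c[j] = 1.0
        candidates.append(c)
    return candidates


def synthesize(excitation, lpc, gain=1.0):
    """All-pole synthesis: s[n] = gain * c[n] - sum_k lpc[k-1] * s[n-k]."""
    s = []
    for n, c in enumerate(excitation):
        acc = gain * c
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * s[n - k]
        s.append(acc)
    return s


def decode_second_signal(main_indices, index, num_positions, pulses, lpc, gain=1.0):
    """Rebuild the candidate set, select candidate `index`, and filter it."""
    candidates = build_candidates(main_indices, num_positions, pulses)
    return synthesize(candidates[index], lpc, gain)


# Hypothetical example: one main pulse at index 2, 6-sample frame, first-order filter.
print(decode_second_signal([2], index=1, num_positions=6, pulses=1, lpc=[-0.5]))
# [0.0, 0.0, 1.0, 0.5, 0.25, 0.125]
```

The index is only meaningful if encoder and decoder enumerate the candidate set identically, which is why the set must be derivable from information the decoder already has.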

The decoder according to claim 29, wherein the second audio signal (33B) is correlated with the first audio signal (33A).

The decoder according to claim 29 or 30, wherein the information about the set (10′) of candidate excitation signals includes identification information of one rule among a set of predetermined rules, and the derivation of the set (10′) of candidate excitation signals is determined according to that rule.

The decoder according to any one of claims 29 to 31, wherein:
the first excitation signal includes n pulse positions (P_M) out of a set of N possible pulse positions;
the candidate excitation signals have pulse positions (P^*_s) only in a subset of the N possible pulse positions; and
the subset of pulse positions (P^*_s) is selected based on the n pulse positions (P_M) of the first excitation signal.

The decoder according to claim 32, wherein the pulse positions of the subset of pulse positions (P^*_s) are located at positions p_j, where j is an index in the interval {i + L, i + K}, K and L are integers with K > L, and i is an index of the n pulse positions.

The decoder according to claim 33, wherein K = 1 and L = -1.