JP2007178684A

JP2007178684A - Multi-channel audio decoding device

Info

Publication number: JP2007178684A
Application number: JP2005376570A
Authority: JP
Inventors: Yoshiaki Takagi; 良明高木; Sen Chon Kok; セン・チョンコク; Takeshi Norimatsu; 武志則松; Shuji Miyasaka; 修二宮阪; Akihisa Kawamura; 明久川村; Koshiro Ono; 耕司郎小野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-12-27
Filing date: 2005-12-27
Publication date: 2007-07-12

Abstract

PROBLEM TO BE SOLVED: Conventional multi-channel audio decoding that uses spatial information processes its analysis and synthesis filter banks with complex coefficients, which incurs high computational cost and a large memory requirement. Using real-valued filter banks reduces the computation, but aliasing then degrades the sound quality. SOLUTION: The sound-quality degradation caused by aliasing in particular subbands, which arises when the complex filter bank of the conventional spatial-information-based multi-channel decoder is replaced with a real-valued one, is addressed in two steps: first, reflection coefficients are used to identify the regions where aliasing has a strong effect; second, an equalizing tool smooths the variation of the scaling or mixing coefficients in those regions. COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention provides a decoding apparatus that can operate with low power consumption and a small memory capacity for a conventional low-bit-rate multi-channel audio codec that uses spatial information (for example, Non-Patent Document 1). Although not limited to the following, the invention is applicable to low-bit-rate applications such as broadcasting, as well as to home theater systems, in-vehicle audio systems, and electronic game systems.

In recent years, a new multi-channel audio coding and decoding technique called the spatial audio codec has been developed (Non-Patent Document 1). It can compress and encode the multi-channel sense of presence with a very small amount of information. For example, the AAC scheme, a multi-channel codec already used as the audio format of Japanese digital television, requires high bit rates such as 512 kbps or 384 kbps for 5.1 channels, whereas a spatial audio codec can compress and encode a 5.1-channel signal at very low bit rates such as 128 kbps, 64 kbps, or even 48 kbps.

FIG. 1 illustrates the basic principle of conventional spatial audio coding and decoding, as typified by Non-Patent Document 1, using a stereo (2-channel) input signal as an example. Here, L denotes the left-channel signal and R the right-channel signal. In the encoder, the input audio signals L and R are processed frame by frame at a predetermined time interval, and the downmix unit (100) generates a downmix signal M, for example as M = (L + R) / 2. The spatial parameter detection module (102) calculates a set of spatial parameters for each spectral band from the L, R, and M signals. The audio encoder (104) encodes the downmix signal M with a coding scheme such as MP3 or AAC to produce a compressed coded sequence. The multiplexer MUX (106) then multiplexes the spatial parameter information with the coded sequence of M to generate the bitstream.
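The downmix step of the encoder can be sketched in a few lines of Python. This is only an illustration of the example formula M = (L + R) / 2 on one frame; the function names `downmix` and `split_frames` and the framing scheme are assumptions, not part of the cited codec.

```python
def split_frames(samples, frame_len):
    """Split a sample list into consecutive frames (the last one may be shorter)."""
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


def downmix(left, right):
    """Return the mono downmix M = (L + R) / 2 for one frame of samples."""
    if len(left) != len(right):
        raise ValueError("L and R frames must have the same length")
    return [(l + r) / 2.0 for l, r in zip(left, right)]


# Hypothetical usage: downmix a short stereo signal frame by frame.
left = [1.0, 3.0, 5.0, 7.0]
right = [3.0, 5.0, 7.0, 9.0]
mono_frames = [downmix(lf, rf)
               for lf, rf in zip(split_frames(left, 2), split_frames(right, 2))]
```

In a real encoder the frame of M would then be passed to the core audio encoder (104), which is outside the scope of this sketch.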

The spatial parameter information detected by the spatial parameter detection module (102) includes the Interchannel Level Difference (hereinafter ILD), which indicates the level or intensity difference between two signal channels, and the Interchannel Correlation (hereinafter ICC), which indicates the similarity (coherence or degree of correlation) between the two channels. In general, the ILD controls the balance and localization of the sound, and the ICC controls its width and diffuseness. Both are spatial parameters that help the listener form an auditory scene. They are normally calculated for each parameter band, where the audio spectrum is partitioned into groups called "parameter bands."
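As a rough illustration of what the two parameters measure, the sketch below computes an ILD and an ICC for the samples of one parameter band. The formulas used here (energy ratio in decibels, and normalized cross-correlation) are commonly used definitions assumed for illustration; the text above does not give the exact expressions, and the function names and the `eps` guard are hypothetical.

```python
import math


def ild_db(left, right, eps=1e-12):
    """Assumed ILD: energy ratio of the two channels in dB for one band."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    # eps guards against division by zero and log of zero for silent bands.
    return 10.0 * math.log10((e_left + eps) / (e_right + eps))


def icc(left, right, eps=1e-12):
    """Assumed ICC: normalized cross-correlation of the two channels."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    cross = sum(l * r for l, r in zip(left, right))
    return cross / math.sqrt((e_left + eps) * (e_right + eps))
```

Under these assumed definitions, a band where the right channel is a scaled copy of the left gives an ICC near 1 (narrow image), while unrelated channels give an ICC near 0 (wide, diffuse image).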

In decoding, the demultiplexer DEMUX (108) first separates the input bitstream into the spatial parameter information and the coded sequence of the downmix signal M. The audio decoder (110) (for example, an AAC or MP3 decoder) decodes the separated coded sequence and restores the downmix signal M. The stereo signal synthesis module (112) then separates the decoded downmix signal M into two channels using the spatial parameters, restoring the original stereo signal (L and R).

In the example above, the encoder extracts one downmix signal and spatial parameters from two input signals, and the decoder uses the spatial parameters to separate the downmix signal into two signals. The same approach applies to audio with more than two channels (for example, the six signals of a 5.1-channel source): the encoder compresses them into a one- or two-channel downmix signal, and the decoder restores the 5.1-channel signal (six channels). FIG. 2 shows the six-channel case. Each channel separation module (200 to 204) separates one intermediate downmix signal into two intermediate downmix signals, and this is repeated until each of the six channels has been separated into a single signal. Here, L_f, R_f, L_s, R_s, C, and LFE correspond to the left front speaker signal, right front speaker signal, left rear speaker signal, right rear speaker signal, front center signal, and low-frequency signal, respectively.

FIG. 3 is a block diagram illustrating the principle of the channel separation module (300) for the 2-channel case. The input downmix signal M is processed by an all-pass filter (301) to generate a decorrelated signal M_rev. Next, in module 303, the two signals M and M_rev are combined using the mixing coefficients H_ij and separated into the two output signals L and R according to the following equation (Equation 1).

The mixing coefficients used here are calculated in block (302) from the spatial parameters ILD and ICC so as to preserve the degree of correlation between the separated signals and their directionality.
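The following Python sketch shows the shape of this channel separation step. It assumes the two outputs are formed as L = H_11 * M + H_12 * M_rev and R = H_21 * M + H_22 * M_rev, which matches the per-module expressions given later in this description. The first-order all-pass filter standing in for the decorrelator (301), its gain `g`, and the numeric mixing coefficients are all hypothetical placeholders; in the decoder the coefficients would come from ILD and ICC in block (302).

```python
def allpass(x, g=0.5):
    """First-order real all-pass filter: y[n] = -g*x[n] + x[n-1] + g*y[n-1].

    Placeholder decorrelator; the actual filter (301) is not specified here.
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = -g * sample + x_prev + g * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


def separate(m, h, g=0.5):
    """Split downmix m into (left, right) using a 2x2 mixing matrix h."""
    m_rev = allpass(m, g)  # decorrelated version of the downmix
    left = [h[0][0] * a + h[0][1] * b for a, b in zip(m, m_rev)]
    right = [h[1][0] * a + h[1][1] * b for a, b in zip(m, m_rev)]
    return left, right


# Hypothetical coefficients: equal level, decorrelated part added with
# opposite signs so that left + right reproduces the downmix.
left, right = separate([1.0, 0.0, 0.0], [[0.5, 0.5], [0.5, -0.5]])
```

With these placeholder coefficients the decorrelated component cancels when the two outputs are summed, which is one simple way to see how M_rev widens the image without changing the downmix.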

FIG. 4 is a block diagram of the main modules of a conventional spatial audio decoder. The coded downmix sequence separated by the demultiplexer is decoded into a time-domain audio signal by the audio decoder (400) and then converted into a plurality of subband signals by the analysis filter bank (401). This analysis filter bank consists, for example, of two stages, a QMF filter bank and a Nyquist filter bank: the QMF filter bank first splits the signal into subbands, and the Nyquist filter bank then further splits the low-frequency subbands to increase their spectral resolution.

Next, the process of generating the channel-separated vector signal y from the output vector signal x of the analysis filter bank (401) is described, using the 5.1-channel case of FIG. 2 as an example. The purpose of the pre-matrix module (402) is to generate intermediate signals that the channel separation modules (200 to 204) of FIG. 2 can use to generate decorrelated signals. From the ILD spatial parameters, the pre-matrix module calculates the vector R₁ of scaling factors that scale the energy level of the input downmix signal M to obtain the composite signals M₁ to M₄.

In this example, M₁, M₂, M₃, and M₄ are, respectively:
M₁ = L_f + R_f + C + LFE
M₂ = L_f + R_f
M₃ = C + LFE
M₄ = L_s + R_s

The decorrelation module (403) applies all-pass filtering to v(n, sb) and generates the decorrelated signal w according to the following equation. Here, M_i,rev is the result of applying decorrelation to M_i.

The post-matrix module (404) calculates a matrix R₂ of mixing coefficients that mix M and M_i,rev to derive the individual signals. Referring to the example of FIG. 2:
L_f = H_11,A * M₂ + H_12,A * M_2,rev
M₂ = H_11,D * M₁ + H_12,D * M_1,rev
M₁ = H_11,E * M + H_12,E * M_rev

Here, H_ij,A denotes the mixing coefficient H_ij of the channel separation module CS_A (200), and likewise for the other modules. The three expressions above can be combined into a single vector multiplication, as in (Equation 4) below.
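The combination into one vector product can be worked out by direct substitution of the three expressions for L_f, M₂, and M₁. The result below is derived only from those three expressions and is offered as a sketch of the form such a product takes; the row-vector name R_{2,Lf} is an assumption by analogy with the other per-channel vectors named in this description, and the published equation may order or group the terms differently.

```latex
\begin{aligned}
L_f &= H_{11,A}\,M_2 + H_{12,A}\,M_{2,rev} \\
    &= H_{11,A}\bigl(H_{11,D}\,M_1 + H_{12,D}\,M_{1,rev}\bigr) + H_{12,A}\,M_{2,rev} \\
    &= H_{11,A}H_{11,D}\bigl(H_{11,E}\,M + H_{12,E}\,M_{rev}\bigr)
       + H_{11,A}H_{12,D}\,M_{1,rev} + H_{12,A}\,M_{2,rev} \\
    &= \underbrace{\begin{bmatrix}
         H_{11,A}H_{11,D}H_{11,E} &
         H_{11,A}H_{11,D}H_{12,E} &
         H_{11,A}H_{12,D} &
         H_{12,A}
       \end{bmatrix}}_{R_{2,Lf}\ \text{(assumed name)}}
       \begin{bmatrix} M \\ M_{rev} \\ M_{1,rev} \\ M_{2,rev} \end{bmatrix}
\end{aligned}
```

Each entry of the row vector is a product of the mixing coefficients along the path from the downmix M to L_f through the modules CS_E, CS_D, and CS_A.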

Similar expressions for R_f, L_s, ..., LFE are obtained by calculating the vectors R_2,Rf, R_2,Ls, ..., R_2,LFE. The vector y can therefore be expressed as in (Equation 5) below.

R₂, a matrix made up of the sets of mixing coefficients from the channel separation modules (200 to 204), can be viewed as linearly combining M, M_rev, M_2,rev, ..., M_4,rev to generate the multi-channel signal.

Each element of R₁ and R₂ is identified by a row index r, a column index c, a time index n, and a subband index sb.

Finally, each separated signal is converted into a time-domain signal by the synthesis filter bank (405) to obtain the multi-channel output signal. When the analysis filter bank consists of a QMF analysis filter bank and a Nyquist analysis filter bank, the synthesis filter bank (405) consists of a QMF synthesis filter bank and a Nyquist synthesis filter bank.

An object of the present invention is to reduce the required memory capacity and power consumption of a spatial audio decoder configured as described above while maintaining high sound quality.
Non-Patent Document 1: J. Herre et al., "The Reference Model Architecture for MPEG Spatial Audio Coding," 118th AES Convention, Barcelona.

However, the spatial audio decoder described in the prior art is implemented with filter banks that use complex coefficients, and the decoding is carried out in the complex domain, so it requires a large amount of computation and memory. Replacing the complex coefficients with a real-coefficient filter bank greatly reduces the computation, but, as explained below, aliasing then degrades the sound quality.

A complex-coefficient analysis filter bank divides the spectrum into a number of subbands and outputs the signal of each subband. Because the frequency responses of the prototype filters used in practical filter banks overlap between subbands, aliasing occurs. The filter bank is designed so that, when the output of the analysis filter bank is left unmodified or when every subband is modified by the same amount, the aliasing components that leak into neighboring subbands cancel across the whole signal spectrum in the synthesis filter bank.

When the signal is modified, however, a complex filter bank mitigates the aliasing problem through the redundancy introduced by its "oversampled" nature. If a real-coefficient filter bank is used to reduce the computational and memory load, the signal changes from oversampled to critically sampled. In other words, when the mixing matrices R₁ and R₂ scale the signal bands independently, the degradation caused by aliasing becomes clearly audible. In practice, the effect of aliasing is especially noticeable around subbands that contain strong tonal components of the signal spectrum.

To solve the above problem, the multi-channel audio decoding apparatus of the present invention comprises: a real-coefficient analysis filter bank that generates a plurality of subband signals from an input time-domain signal sequence; a decorrelation module with a real-coefficient all-pass filter that generates decorrelated signals corresponding to the subband signals; a channel expansion module that converts the subband signals into multi-channel subband signals; a reflection coefficient calculation module that calculates reflection coefficients from the subband signals; an equalizing module that uses the reflection coefficients to identify subbands containing strong tonal components, where aliasing is likely to occur, and that equalizes the output signal of the channel expansion module using the reflection coefficients in order to suppress the aliasing; and a real-coefficient synthesis filter bank.

The present invention provides a multi-channel audio decoding apparatus comprising a real-valued analysis filter bank that converts a time-domain signal into a plurality of subband signals, a decorrelation module with a real-valued all-pass filter that generates decorrelated signals corresponding to the subband signals, a channel expansion module that converts the subband signals into multi-channel subband signals, and a real-valued synthesis filter bank that converts the multi-channel subband signals produced by the channel expansion module into time-domain signals, the apparatus further comprising a reflection coefficient calculation module that calculates a reflection coefficient from each of the subband signals, and an equalizing module that uses the reflection coefficients to identify subbands containing strong tonal components and adjusts the output signal of the channel expansion module using the reflection coefficients in order to suppress the effect of aliasing.

The present invention also provides a multi-channel audio decoding apparatus comprising a real-valued analysis filter bank that converts a time-domain signal into a plurality of subband signals, a decorrelation module with a real-valued all-pass filter that generates decorrelated signals corresponding to the subband signals, a channel expansion module that converts the subband signals into multi-channel subband signals, and a real-valued synthesis filter bank that converts the multi-channel subband signals produced by the channel expansion module into time-domain signals, the apparatus further comprising a reflection coefficient calculation module that calculates a reflection coefficient from each of the subband signals, a tonality calculation module that calculates a tonality from the subband signals, and an equalizing module that uses the reflection coefficients to identify subbands containing strong tonal components and adjusts the output signal of the channel expansion module using the tonality in order to suppress the effect of aliasing.

In one embodiment of the present invention, the reflection coefficient takes a value close to +1 or −1 when a single frequency component lies between two consecutive subbands, is obtained using (Equation 6), and has a sign determined by the parity (even or odd) of the consecutive subbands.

In a further embodiment of the present invention, the sign of the reflection coefficient is obtained by table lookup.

In a further embodiment of the present invention, when the output signal of the channel expansion module is equalized: if the average of the absolute values of the reflection coefficients of adjacent subbands is greater than or equal to a first threshold, the average of the reflection coefficients is used as the output for each of those subbands; if that average is less than a second threshold, each reflection coefficient is used as the output for its own subband; and if that average is less than the first threshold and greater than or equal to the second threshold, both the reflection coefficient and the average of the reflection coefficients are used as the output for each subband.

In another embodiment of the present invention, when the output signal of the channel expansion module is equalized: if the average tonality of adjacent subbands is greater than or equal to a first threshold, the average tonality is used as the output for each of those subbands; if that average is less than a second threshold, each tonality is used as the output for its own subband; and if that average is less than the first threshold and greater than or equal to the second threshold, both the tonality and the average tonality are used as the output for each subband.

In another embodiment of the present invention, the analysis filter bank consists of a real-coefficient QMF analysis filter bank that converts a time-domain signal into a plurality of subband signals and a real-coefficient Nyquist analysis filter bank that increases the resolution of the subband signals, and the synthesis filter bank consists of a real-coefficient Nyquist synthesis filter bank and a real-coefficient QMF synthesis filter bank.

In another embodiment of the present invention, the equalizing process is applied to only part of the frequency band of the audio signal.

In another embodiment of the present invention, subbands that share spatial parameters are grouped into one parameter band, and the equalizing process is performed per parameter band.

The present invention also provides a multi-channel audio decoding method comprising the steps of converting a time-domain signal into a plurality of subband signals by a real-coefficient analysis filter operation, generating decorrelated signals corresponding to the subband signals with a real-valued all-pass filter, converting the subband signals into multi-channel subband signals, and converting the resulting multi-channel subband signals into time-domain signals by a real-coefficient synthesis filter operation, the method further comprising the steps of calculating a reflection coefficient from each of the subband signals, and using the calculated reflection coefficients to identify subbands containing strong tonal components and equalizing the converted multi-channel subband output signals using the reflection coefficients in order to suppress the effect of aliasing.

The present invention also provides a multi-channel audio decoding method comprising the steps of converting a time-domain signal into a plurality of subband signals by a real-coefficient analysis filter operation, generating decorrelated signals corresponding to the subband signals with a real-valued all-pass filter, converting the subband signals into multi-channel subband signals, and converting the resulting multi-channel subband signals into time-domain signals by a real-coefficient synthesis filter operation, the method further comprising the steps of calculating a reflection coefficient from each of the subband signals, calculating a tonality from the subband signals, and using the reflection coefficients to identify subbands containing strong tonal components and equalizing the converted multi-channel subband output signals using the tonality in order to suppress the effect of aliasing.

The present invention also provides a program that causes a computer to execute any of the multi-channel audio decoding methods described above.

The present invention also provides an information recording medium on which the above program is recorded.

The present invention greatly reduces the computation required by conventional spatial audio decoding without changing the structure of the bitstream, and it suppresses the sound-quality degradation caused by aliasing distortion, which has been the drawback of real-coefficient filters. The result is a multi-channel audio decoding apparatus that combines low computational cost with high sound quality.

The following description refers frequently and directly to the spatial audio coding technique of Non-Patent Document 1, but the present invention is not limited to that particular technique.

The present invention is described using the following embodiments and drawings, but is not intended to be limited to them.

(Embodiment 1)
FIG. 5 is a block diagram of a decoder illustrating the first embodiment of the present invention. The analysis filter bank (501) consists of a QMF filter bank and a Nyquist filter bank that convert the downmix signal decoded by the audio decoder (500) into a plurality of subband signals. The QMF filter bank modulation coefficient M used in the prior art is the complex coefficient of (Equation 7) below, whereas the present invention uses the real modulation coefficient M of (Equation 8) below.

The spatial audio decoder of Non-Patent Document 1 has two decoder modes: a high-quality (HQ) mode and a low-complexity (LC) mode. In HQ mode, the seven lowest subbands are split by a Nyquist filter bank (Nyq) into 16, 8, 8, 4, 4, 4, and 4 hybrid subbands, respectively. As shown in the fourth column of Table 1, the Nyquist filters used in LC mode come in two types, a Type A filter with complex coefficients and a Type B filter with real coefficients, and they split the three lowest subbands into 8, 2, and 2 hybrid subbands. The fourth column gives the number of corresponding hybrid subbands, and the fifth column gives the indices of those hybrid subbands.

The Type A Nyquist filter bank used in the prior art uses the complex modulation coefficients of (Equation 9) below, whereas the present invention uses the real modulation coefficients of (Equation 10) below.

The Type B filter uses the real-coefficient Nyquist filter bank of (Equation 11) below.

The number of distinct (non-overlapping) hybrid subbands produced by the real-valued Type A Nyquist filter is half the total number of subbands into which it splits the signal. The sixth column of Table 1 gives the number of hybrid subbands when real coefficients are used, and the seventh column gives the correspondence between the hybrid subband indices for real coefficients and those for complex coefficients.

Next, a method for suppressing the aliasing that arises when the real-coefficient filter banks above are used is described.
The reflection coefficient calculation module (506) identifies hybrid subbands containing strong tonal components using the reflection coefficient ref(sb), calculated by (Equation 12) below.

Here, the reflection coefficient ref(sb) takes a value between −1 and 1.
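To make the idea concrete, the sketch below computes a first-order reflection coefficient from the real samples of one subband as the negated ratio of the lag-1 to the lag-0 autocorrelation. This form is an assumption chosen for illustration, since the expression of (Equation 12) is not reproduced in this text; the sign convention and the function name are likewise hypothetical. With this estimate the result always lies in the range −1 to 1.

```python
def reflection_coefficient(subband):
    """Assumed first-order reflection coefficient: -r(1) / r(0).

    r(0) is the signal energy, r(1) the lag-1 autocorrelation of the
    real subband samples. Returns 0.0 for a silent subband.
    """
    r0 = sum(x * x for x in subband)
    if r0 == 0.0:
        return 0.0
    r1 = sum(a * b for a, b in zip(subband, subband[1:]))
    return -r1 / r0


# A constant subband signal (tone at the lower band edge) and an
# alternating one (tone at the upper band edge) both give a large magnitude.
low_edge = reflection_coefficient([1.0] * 8)
high_edge = reflection_coefficient([1.0, -1.0] * 4)
```

Under this assumption, a tone sitting at a subband boundary appears in the critically sampled subband as a nearly constant or nearly alternating sequence, so the magnitude approaches 1, while a noise-like subband gives a value near 0.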

Table 2 shows the signs sref(sb) of ref(sb) for two adjacent subbands when a tonal component lies between two adjacent hybrid subbands in LC mode. For example, if a tonal component lies between the second and third hybrid subbands (the case sb < 7, sb even in Table 2), Table 2 shows that ref(2) and ref(3) are both positive (+). The closer the tonal component is to the region where the frequency responses of the two hybrid subbands overlap, the larger the absolute value |ref(sb)| becomes and the higher the risk of aliasing. Table 3 shows the same relationship for HQ mode.

In the equalizing process, when the values of ref(sb) and ref(sb+1) show that a tonal component lies between sb and sb+1, R(sb) and R(sb+1) are adjusted toward each other. The stronger the tonal component, the closer the two values are brought together.

FIG. 7 is a flowchart explaining the details of the equalizing process in the equalizing module (507). Here, v(sb) denotes either the scaling factor R₁(sb) or the mixing factor R₂(sb) in hybrid subband sb. In module (507), v(sb) = R₁(sb) for row r, column c, and time n; in module (508), v(sb) = R₂(sb) for row r, column c, and time n.

First, step (700) initializes the subband index sb to 0. Step (701) checks whether all subbands have been processed. Once all subbands have been processed, the equalizing process ends.

In step (702), ref0 and ref1 are calculated for each subband sb. These are the reflection coefficients ref(sb) and ref(sb+1) of subbands sb and sb+1 multiplied by their respective polarities sref(sb) and sref(sb+1) listed in Table 2 or Table 3. The mean of ref0 and ref1 and the mean of v(sb) and v(sb+1) are then calculated and stored as ave_ref and ave_v, respectively.

Multiplying any nonzero real number by its own polarity (+1 or −1) gives a positive value, so when a single tonal component lies between sb and sb+1, ref0 and ref1 are both positive. Step (703) checks whether ref0 and ref1 are both positive. If they are not, step (708) increments the subband index sb, and the same processing is repeated for the next subband.

In step (704), ave_ref for each subband is compared with the second threshold TH2 (where TH2 > TH1). If ave_ref is larger, the average tonality of sb and sb+1 is judged to be very high, the influence of aliasing large, and maximum equalizing necessary. In that case, step (705) outputs the two-subband average ave_v as the spatial parameter.

In step (706), ave_ref is compared with the first threshold TH1 (where TH1 < TH2). If ave_ref is smaller, the tonality of sb and sb+1 is low enough that the influence of aliasing can be ignored, so no equalizing is performed.

For every subband whose ave_ref lies between the first threshold TH1 and the second threshold TH2, step (707) adjusts the spatial parameters of sb and sb+1 to linearly interpolated values. With this interpolation, v(sb) and v(sb+1) stay close to their original values when ave_ref is close to TH1, and approach their mean, 0.5 * (v(sb) + v(sb+1)), when ave_ref is close to TH2.

The processing from step (700) to step (708) is performed for every row r, column c, and time n of R₁ and R₂.
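The flow of steps (700) to (708) can be sketched in Python as follows. This is only an illustrative reading of the flowchart description, not the patented implementation. Several details are assumptions not stated in the text: the function name `equalize`, the threshold values used in the usage example, the exact interpolation weight `w = (ave_ref - TH1) / (TH2 - TH1)` (chosen only because it reproduces the described behavior at both thresholds), and the choice to read inputs from the unmodified `v` while writing to a copy. The polarity table `sref` (Table 2 or Table 3) is passed in as data because its contents are not reproduced here.

```python
from typing import List, Sequence


def equalize(v: Sequence[float], ref: Sequence[float], sref: Sequence[int],
             th1: float, th2: float) -> List[float]:
    """Equalize the spatial parameters v(sb) for one (r, c, n) slice.

    v    : scaling or mixing factors per hybrid subband
    ref  : reflection coefficients ref(sb), each in [-1, 1]
    sref : polarities (+1 or -1) taken from Table 2 or Table 3
    """
    if not th1 < th2:
        raise ValueError("TH1 must be smaller than TH2")
    out = list(v)                       # assumption: inputs are left untouched
    sb = 0                              # step 700
    while sb < len(v) - 1:              # step 701: stop when all pairs are done
        ref0 = ref[sb] * sref[sb]       # step 702
        ref1 = ref[sb + 1] * sref[sb + 1]
        ave_ref = 0.5 * (ref0 + ref1)
        ave_v = 0.5 * (v[sb] + v[sb + 1])
        if ref0 > 0 and ref1 > 0:       # step 703: tonal component between sb, sb+1
            if ave_ref > th2:           # steps 704/705: maximum equalizing
                out[sb] = out[sb + 1] = ave_v
            elif ave_ref >= th1:        # steps 706/707: linear interpolation
                w = (ave_ref - th1) / (th2 - th1)   # assumed weight, 0 at TH1, 1 at TH2
                out[sb] = (1 - w) * v[sb] + w * ave_v
                out[sb + 1] = (1 - w) * v[sb + 1] + w * ave_v
            # ave_ref < TH1: tonality is low, leave the values unchanged
        sb += 1                         # step 708
    return out


# Usage with made-up numbers: a strong tonal component between subbands 0 and 1.
print(equalize([1.0, 3.0, 5.0], [0.9, 0.9, -0.5], [1, 1, 1], th1=0.25, th2=0.75))
```

In a full decoder this function would be called once per row r, column c, and time n of R₁ and R₂.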

To reduce the amount of spatial parameter data, the (n, sb) plane is divided into parameter bands, each of which shares spatial parameters over a range of sb and n. The above embodiment can be sped up by exploiting the fact that consecutive subbands within a parameter band have the same spatial parameters: equalizing is applied only to subbands whose spatial parameters change because the parameter band changes. In this case, for example, the condition in step (703) may be modified as follows.

(ref0 > 0 && ref1 > 0) && (PARAM_BAND(sb) != PARAM_BAND(sb + 1))
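As an illustration, the modified condition can be written as a small Python predicate. The name `needs_equalizing` is hypothetical, and representing PARAM_BAND as a list that maps each hybrid subband index to a parameter band index is an assumption made for this sketch.

```python
from typing import Sequence


def needs_equalizing(ref0: float, ref1: float,
                     param_band: Sequence[int], sb: int) -> bool:
    """Modified step (703): equalize only across a parameter band boundary."""
    tonal_between = ref0 > 0 and ref1 > 0
    band_changes = param_band[sb] != param_band[sb + 1]
    return tonal_between and band_changes


# Usage with a made-up mapping: subbands 0-1 share band 0, subbands 2-3 share band 1.
band_of = [0, 0, 1, 1]
print(needs_equalizing(0.5, 0.5, band_of, 1))
```

Pairs inside one parameter band are skipped because their spatial parameters are already identical, so averaging them would change nothing.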

In the decorrelation module (503), the non-integer delay coefficients are removed, and processing uses the real lattice coefficients of (Equation 14) below in place of the complex lattice coefficients of (Equation 13).

Because removing the non-integer delay coefficients in this way reduces the echo density of the output signal only slightly, the amount of computation and the memory capacity can be cut substantially while high sound quality is maintained.

Finally, in the synthesis filter bank (505), which consists of a Nyquist synthesis filter bank and a QMF synthesis filter bank, synthesis filtering is performed using the real modulation coefficients of (Equation 16) below in place of the complex modulation coefficients of (Equation 15) used in the QMF synthesis filter bank of the synthesis filter bank (405) described for the conventional example, and the decoded time-domain multi-channel signal is output.

(Embodiment 2)
FIG. 6 is a block diagram of a decoder for explaining the second embodiment of the present invention. It differs from FIG. 5, the block diagram of the first embodiment, in that the reflection coefficient calculation module (506) is replaced by a reflection coefficient / tonality calculation module (606) and the equalizing modules (607, 608) operate in a partly different way. The other modules operate as in FIG. 5, so their description is omitted here. The subbands that contain a tonal component are identified by the reflection coefficient described in the first embodiment, and a separate tonality measure is used to evaluate how much equalizing is needed.

For example, the tonality T_g(sb) of subband sb in the g-th frame is calculated as an energy-weighted average of coherence, as shown in (Equation 17) below.

Here, (Equation 18) below denotes the signal power in the two frames g and g−1, and (Equation 19) below denotes the coherence between those frames.

(Equation 20) below takes a value between 0 and 1, where 0 indicates no tonality at all and 1 indicates very high tonality. As shown in (Equation 21) below, the smaller of the tonalities of the two frames concerned is taken as the final tonality.

FIG. 8 is a flowchart explaining the equalizing method that uses tonality. It differs from the reflection-coefficient-based equalizing of the first embodiment (FIG. 7) as follows: after step (803) uses the reflection coefficients to determine the subbands that contain a tonal component, the threshold decisions of steps (805) and (806) are made with ave_T, the average tonality of subbands sb and sb+1 calculated in step (802), and step (807) performs the equalizing using ave_T. The remaining steps are the same as in FIG. 7.

(Embodiment 3)
As a third embodiment of the present invention, the case where the equalizing of the first and second embodiments is applied to only part of the frequency spectrum is described with reference to FIG. 9.

FIG. 9 shows the frequency spectrum divided at subbands SB_START and SB_STOP. For example, the equalizing described in the first and second embodiments is applied only to region B of FIG. 9. Specifically, region 'A', which corresponds to the low-frequency part of the spectrum, is left as complex-valued processing so that almost no aliasing occurs; region 'B' is equalized by the method described in the first and second embodiments; and region 'C' is equalized by another, conventional method. In other words, this embodiment can coexist with other conventional equalizing schemes.

In practice, equalizing over part of the frequency spectrum can be realized, for example in the flowcharts of FIG. 7 and FIG. 8, by initializing sb to the value SB_START in steps (700) and (800) and replacing the termination condition in steps (701) and (801) with sb == SB_STOP − 1.
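A minimal Python sketch of the restricted loop is shown below. It only enumerates the subband pairs that would be visited, under the assumption that the termination test of step (701) or (801) runs before each pair is processed. The function name `subband_pairs` and the guard against an empty or inverted range are additions for illustration.

```python
from typing import List, Tuple


def subband_pairs(sb_start: int, sb_stop: int) -> List[Tuple[int, int]]:
    """List the (sb, sb+1) pairs visited when equalizing only region B."""
    if sb_start > sb_stop - 1:
        raise ValueError("SB_START must not exceed SB_STOP - 1")
    pairs = []
    sb = sb_start                 # steps 700/800: initialize with SB_START
    while sb != sb_stop - 1:      # steps 701/801: modified termination condition
        pairs.append((sb, sb + 1))
        sb += 1                   # steps 708/808
    return pairs


# Usage with made-up limits.
print(subband_pairs(8, 12))
```

With these limits the last pair visited is (SB_STOP − 2, SB_STOP − 1), so no subband at or above SB_STOP is modified.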

This processing is not limited to the case of FIG. 9. If there is a band of the frequency spectrum in which aliasing cannot occur, the decoder may be configured so that no equalizing is performed for that band.

(Embodiment 4)
As a fourth embodiment of the present invention, an equalizing method that takes into account the case where two closely spaced tonal components exist in one QMF subband is described. In Tables 2 and 3, this applies when sb is greater than 7 in LC mode and when sb is greater than 23 in HQ mode.

If sb is even and a tonal component lies between sb and sb+1, then, as is clear from Tables 2 and 3, ref(sb) and ref(sb+1) are both negative. Similarly, if sb is even and a tonal component lies between sb+1 and sb+2, ref(sb+1) and ref(sb+2) are both positive.

However, when two tonal components exist in one QMF subband at the same time, the sign of ref(sb+1) depends on the energies of the tonal components: ref(sb+1) is positive if the higher-frequency tonal component has more energy than the lower-frequency one, and negative if the lower-frequency tonal component has more energy. In this case, therefore, ref(sb+1) cannot be used to determine the amount of equalizing.

The equalizing method of the fourth embodiment is described below with reference to the flowchart of FIG. 10. First, in step (1000), sb is initialized to START_SB. START_SB is 8 in LC mode and 24 in HQ mode.

In step (1002), the reflection coefficients ref(sb), ref(sb+1), and ref(sb+2) of the three consecutive subbands starting at sb are calculated. The sign is negative when sb is even, that is, sign(sb) = −1, and sign(sb) = 1 when sb is odd. Each reflection coefficient is multiplied by sign(sb), and the results are stored as ref0, ref1, and ref2, respectively.

In step (1003), the signs of ref0 and ref2 are compared. If they differ, it is inferred that two tonal components exist in one QMF subband. In this case ref1 is affected by the higher-frequency tonal component, so step (1004) uses ref0 as ave_ref. If the signs of ref0 and ref2 are the same, it is inferred that one tonal component exists in one QMF subband, and the processing from step (1005) onward is the same as that described in the first embodiment.
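The selection of ave_ref in steps (1002) to (1005) can be sketched in Python as follows. This is an interpretation under stated assumptions, not a definitive implementation: the function name `select_ave_ref` is hypothetical, "signs differ" is tested as a negative product (so a zero coefficient counts as "same sign"), and the single-tonal branch is assumed to use the mean of ref0 and ref1 as in the first embodiment.

```python
from typing import Sequence


def select_ave_ref(ref: Sequence[float], sb: int) -> float:
    """Choose ave_ref for subband sb (sb >= START_SB) as in FIG. 10."""
    sign = -1 if sb % 2 == 0 else 1          # step 1002: sign(sb)
    ref0 = sign * ref[sb]
    ref1 = sign * ref[sb + 1]
    ref2 = sign * ref[sb + 2]
    if ref0 * ref2 < 0:                      # step 1003: signs of ref0 and ref2 differ
        return ref0                          # step 1004: two tonal components, ignore ref1
    return 0.5 * (ref0 + ref1)               # assumed: same average as Embodiment 1


# Usage with made-up coefficients for LC mode (START_SB = 8).
coeffs = [0.0] * 8 + [-0.8, 0.3, 0.7]
print(select_ave_ref(coeffs, 8))
```

The returned value would then feed the threshold comparisons against TH1 and TH2 described for the first embodiment.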

Embodiments 1, 2, 3, and 4 of the present invention have been described using a spatial acoustic decoder composed of a pre-matrixing module and a post-matrixing module, as shown in FIG. 4, FIG. 5, and FIG. 6. The decoder can instead be built with an integrated matrixing module that combines the pre-matrixing and post-matrixing modules to generate R₁. FIG. 11 is a block diagram of a spatial acoustic decoder that performs equalizing using reflection coefficients when FIG. 5 is configured with an integrated matrixing module. The equalizing method described in the first or fourth embodiment can be applied as the processing of the equalizing module (1106) that equalizes R₁. FIG. 6 can likewise be configured with an integrated matrixing module, in which case the equalizing can use the same method as described in the second or fourth embodiment. In any of these cases, approximating u(n, sb) = x(n, sb) means the reflection coefficient needs to be calculated only once, which reduces the amount of computation further.

The multi-channel audio decoding device of the present invention can operate with low power consumption and a small memory capacity, and is applicable to low-bit-rate applications such as broadcasting as well as to home theater systems, in-vehicle audio systems, and electronic game systems.

FIG. 1 is a diagram showing the basic principle of spatial acoustic encoding and decoding.
FIG. 2 is a diagram showing a multistage channel separation method that separates a downmix signal into individual signals.
FIG. 3 is a diagram showing the principle of channel separation.
FIG. 4 is a block diagram showing a conventional spatial acoustic decoding technique.
FIG. 5 is a decoder block diagram of Embodiment 1 of the present invention.
FIG. 6 is a decoder block diagram of Embodiment 2 of the present invention.
FIG. 7 is a flowchart of the equalizing process of Embodiment 1 of the present invention.
FIG. 8 is a flowchart of the equalizing process of Embodiment 2 of the present invention.
FIG. 9 is a diagram for explaining the equalizing process of Embodiment 3 of the present invention.
FIG. 10 is a flowchart of the equalizing process of Embodiment 4 of the present invention.
FIG. 11 is a block diagram of a decoder to which integrated matrix processing is applied.

Explanation of symbols

500 Audio decoder
501 Analysis filter bank
503 Decorrelation module
506 Reflection coefficient calculation module
507 Equalizing module

Claims

A real number analysis filter bank for converting a signal in the time domain into a plurality of subband signals, a non-correlation module having a real type all-pass filter for generating a decorrelation signal corresponding to the subband signal, and the subband signal Multi-channel audio comprising a channel expansion module for converting to a multi-channel sub-band signal, and a synthesis filter bank for real number operation for converting the multi-channel sub-band signal converted by the channel expansion module to a signal in the time domain A decoding device,
and further comprising a reflection coefficient calculation module that calculates a reflection coefficient from each subband signal, and an equalizing module that identifies, using the reflection coefficient, a subband in which a strong tonal component exists and adjusts an output signal of the channel expansion module using the reflection coefficient in order to suppress the influence of aliasing.

A real number analysis filter bank for converting a signal in the time domain into a plurality of subband signals, a non-correlation module having a real type all-pass filter for generating a decorrelation signal corresponding to the subband signal, and the subband signal A channel expansion module that converts multi-channel subband signals;
A multi-channel audio decoding device comprising a real-valued synthesis filter bank for converting a multi-channel subband signal converted by the channel expansion module into a time-domain signal,
Further, a reflection coefficient calculation module that calculates a reflection coefficient from each of the subband signals, a tonality calculation module that calculates a tonality from the subband signal, and a subband in which a strong tonal component exists is specified using the reflection coefficient. A multi-channel audio decoding apparatus comprising: an equalizing module that adjusts an output signal of the channel expansion module using the tonality in order to suppress the influence of aliasing.

The reflection coefficient takes a value close to +1 or −1 when a single frequency component exists between two consecutive subbands, and uses the equation (Equation 1), and the sign is the continuous subband. The multi-channel audio decoding device according to claim 1, wherein the multi-channel audio decoding device is a value determined by even-oddness of.

4. The multi-channel audio decoding apparatus according to claim 3, wherein the sign of the reflection coefficient is obtained by referring to a table.

When equalizing the output signal of the channel expansion module, if the average of the absolute values of the reflection coefficients of adjacent subbands is equal to or greater than the first threshold, the average value of the reflection coefficients is output corresponding to each subband. When the average of the absolute values of the reflection coefficients of adjacent subbands is smaller than the second threshold, the reflection coefficient is output corresponding to each subband, and the absolute value of the reflection coefficient of the adjacent subbands The average value of the reflection coefficient and the average value of the reflection coefficient is output as an output corresponding to each subband when the average is smaller than the first threshold and greater than or equal to the second threshold. Multi-channel audio device.

When equalizing the output signal of the channel expansion module, if the average value of the tonalities of adjacent subbands is equal to or greater than the first threshold, the average value of the tonality is set as an output corresponding to each subband and adjacent to each other. If the average value of the tonalities of the subbands is smaller than the second threshold value, the tonality is output corresponding to each subband, and the average value of the tonality of adjacent subbands is smaller than the first threshold value and the second threshold value. The multi-channel audio decoding apparatus according to claim 2, wherein when the threshold value is equal to or greater than the threshold value, the tonality and an average value of the tonality are output corresponding to each subband.

The analysis filter bank includes a real coefficient QMF analysis filter bank that converts a time domain signal into a plurality of subband signals, and a real coefficient Nyquist analysis filter bank that extends the resolution of the subband signal. 7. The multi-channel audio decoding apparatus according to claim 1, comprising a real coefficient Nyquist synthesis filter bank and a real coefficient QMF synthesis filter bank.

The multi-channel audio decoding apparatus according to any one of claims 1 to 7, wherein the equalizing process is applied only to a part of a frequency band of the audio signal.

The multi-channel audio decoding device according to any one of claims 1 to 8, wherein subbands sharing a spatial parameter are grouped as one parameter band, and equalization processing is performed for each parameter band.

Converting a time-domain signal into a plurality of subband signals by an analysis filter operation of a real coefficient; generating an uncorrelated signal corresponding to the subband signal by a real-type all-pass filter; A multi-channel audio decoding method comprising: a step of converting into a channel sub-band signal; and a step of converting the converted multi-channel sub-band signal into a time-domain signal by a synthesis filter operation of a real coefficient. ,
and further comprising a step of calculating a reflection coefficient from each of the subband signals, and a step of identifying, using the calculated reflection coefficient, a subband in which a strong tonal component exists and equalizing the converted multi-channel subband output signal using the reflection coefficient in order to suppress the influence of aliasing.

Converting a time-domain signal into a plurality of subband signals by an analysis filter operation of a real coefficient; generating an uncorrelated signal corresponding to the subband signal by a real-type all-pass filter; A multi-channel audio decoding method comprising: a step of converting into a channel sub-band signal; and a step of converting the converted multi-channel sub-band signal into a time-domain signal by a synthesis filter operation of a real coefficient. ,
and further comprising a step of calculating a reflection coefficient from each subband signal, a step of calculating a tonality from the subband signal, and a step of identifying, using the reflection coefficient, a subband in which a strong tonal component exists and equalizing the converted multi-channel subband output signal using the tonality in order to suppress the influence of aliasing.

The reflection coefficient takes a value close to +1 or −1 when a single frequency component exists between two consecutive subbands, and uses the equation (Equation 2), and the sign is the continuous subband. The multi-channel audio decoding method according to claim 10 or 11, wherein the multi-channel audio decoding method is a value determined by an even / odd number.

13. The multi-channel audio decoding method according to claim 12, wherein the sign of the reflection coefficient is obtained by referring to a table.

When equalizing the converted multi-channel subband output signal, if the average of the absolute values of the reflection coefficients of adjacent subbands is greater than or equal to the first threshold value, the average value of the reflection coefficients is used for each subband. When the average of the absolute values of the reflection coefficients of adjacent subbands is smaller than the second threshold, the reflection coefficient is output corresponding to each subband, and the reflection of adjacent subbands is reflected. The average of the absolute value of the coefficient is smaller than the first threshold value and greater than or equal to the second threshold value, and the reflection coefficient and the average value of the reflection coefficient are output as corresponding to each subband. Item 11. The multichannel audio method according to Item 10.

When equalizing the converted multi-channel subband output signal, if the average value of the tonalities of adjacent subbands is equal to or greater than the first threshold, the average value of the tonality is output corresponding to each subband. If the average value of the tonality of adjacent subbands is smaller than the second threshold, the tonality is output corresponding to each subband, and the average value of the tonality of adjacent subbands is the first threshold. 12. The multi-channel audio decoding method according to claim 11, wherein the tonality and the average value of the tonality are output corresponding to respective subbands when smaller than a second threshold value.

The analysis filter operation is composed of a real coefficient QMF analysis filter operation that converts a signal in the time domain into a plurality of subband signals and a real coefficient Nyquist analysis filter operation that extends the resolution of the subband signal. 16. The multi-channel audio decoding method according to claim 10, comprising: a Nyquist synthesis filter operation for real coefficients and a QMF synthesis filter operation for real coefficients.

The multi-channel audio decoding method according to any one of claims 10 to 16, wherein the equalizing process is applied only to a part of the frequency band of the audio signal.

The multi-channel audio decoding method according to any one of claims 10 to 17, wherein subbands sharing a spatial parameter are grouped as one parameter band, and equalization processing is performed for each parameter band.

A program for causing a computer to execute the multi-channel audio decoding method according to any one of claims 10 to 18.

An information recording medium on which the program according to claim 19 is recorded.