
JP2006323314A - Apparatus for binaural-cue-coding multi-channel voice signal

Info

Publication number: JP2006323314A
Application number: JP2005148771A
Authority: JP
Inventors: Sen Chon Kok; セン・チョンコク; Naoya Tanaka; 直也田中; Hon Neo Sua; ホン・ネオスア
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-05-20
Filing date: 2005-05-20
Publication date: 2006-11-30

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus for binaural cue coding of a multi-channel audio signal that, in an encoding process which extracts binaural cues and downmixes the original signal, can reproduce with high quality the effects of the original signal that are unique to multi-channel audio.

SOLUTION: A desired vector relationship between the downmix channel and the original channels is first derived from the binaural cues, and the exact vector relationship between the downmix signal and a signal that is orthogonal to and uncorrelated with it is then simulated.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to an apparatus that compresses a multi-channel audio signal by extracting binaural cues and generating a downmix signal in the encoding process, and by applying the binaural cues to the downmix signal in the decoding process. The present invention is applicable to home theater systems, car audio systems, electronic game systems, and the like.

The present invention relates to the encoding of multi-channel audio signals. Its main purpose is to encode a digital audio signal while preserving as much of its perceived quality as possible, even when the bit rate is constrained. A lower bit rate is advantageous because it reduces the required transmission bandwidth and storage capacity.

Many methods for achieving such bit rate reduction already exist.

In the “MS stereo” method, the stereo channels L and R are represented by their “sum” (L + R) and “difference” (L − R). When the two channels are highly correlated, the “difference” signal carries less important information that can be quantized more coarsely, with fewer bits, than the “sum” signal. In the extreme case where L = R, no information about the difference signal needs to be transmitted.
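The sum/difference representation can be sketched in a few lines of Python. The function names are illustrative, and halving on reconstruction is one common scaling convention that is assumed here rather than taken from the text.

```python
def ms_encode(left, right):
    """Return the (sum, difference) representation of a stereo pair."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


def ms_decode(total, diff):
    """Invert ms_encode: L = (sum + diff) / 2, R = (sum - diff) / 2."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right


# When L == R the difference signal is all zeros and costs nothing to send.
same = [0.5, -0.25, 1.0]
_, zero_diff = ms_encode(same, same)
```

For identical channels `zero_diff` contains only zeros, which is the extreme case described above.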

The “intensity stereo” method exploits the psychoacoustic properties of the ear: for the high-frequency range, only the “sum” signal is transmitted, together with frequency-dependent scale factors, and the decoder applies those scale factors to the “sum” signal to synthesize the L and R channels.

In the “binaural cue coding” method, binaural cues are generated so that the downmix signal can be shaped in the decoding process. Binaural cues include, for example, the inter-channel level/intensity difference (ILD), the inter-channel phase/delay difference (IPD), and the inter-channel coherence/correlation (ICC). The ILD cue measures relative signal power, the IPD cue measures the difference in the time at which a sound reaches the two ears, and the ICC cue measures similarity. In general, the level/intensity and phase/delay cues control the balance and direction of the sound, while the coherence/correlation cue controls its width and diffuseness. Together, these cues form the spatial parameters that help the listener build an auditory scene in the mind.

FIG. 1 shows a typical codec based on binaural cue coding. In the encoding process, the audio signal is processed frame by frame. Module (100) downmixes the left channel L and the right channel R to produce M = (L + R) / 2. The binaural cue extraction module (102) processes L, R, and M to generate the binaural cues. Module (102) typically includes a time-frequency transform module that converts L, R, and M either into a full spectral representation such as FFT or MDCT, or into a hybrid time-frequency representation such as QMF. Alternatively, M can be generated from L and R after the spectral transform by averaging the spectral representations of L and R. The binaural cues are obtained by comparing L, R, and M, represented in this way, band by band.

The audio encoder (104) encodes the M signal and generates a compressed bitstream. Examples of audio encoders include MP3 and AAC. The binaural cues are quantized in module (106) and then multiplexed with the compressed M to form the complete bitstream. In the decoding process, the demultiplexer (108) separates the bitstream of M from the binaural cue information. The audio decoder (110) decodes the bitstream of M and restores the downmix signal M. The multi-channel synthesis module (112) processes the downmix signal and the dequantized binaural cues to restore the multi-channel signal.

ISO/IEC 14496-3:2001/AMD2, "Parametric Coding for High Quality Audio"
US2003/0219130A1, "Coherence-based Audio Coding and Synthesis"
Kahrs, M., Brandenburg, K., et al., "Applications of Digital Signal Processing to Audio and Acoustics", Kluwer Academic Press
JP2004/248989, "Encoding and Decoding Devices for Audio Signals"

The present invention aims to improve on prior-art methods based on binaural cue coding.

The present invention relates to a binaural cue coding method in which a QMF filter bank converts the L and R channels into a time-frequency (T/F) representation in the encoding process.

In Non-Patent Document 1, the diffuseness of the sound is achieved by mixing the downmix signal with a “reverberation signal”. The reverberation signal is obtained by processing the downmix signal with Schroeder's all-pass links. However, this mixing method does not fully exploit the vector relationship between the downmix signal and the original signals.

In Patent Document 1, the diffuseness of the sound (that is, the surround effect) is achieved by inserting a “random sequence” into the ILD and IPD cues. The random sequence is controlled by the ICC cue.

Embodiment 1 of the present invention proposes a new mixing method that first derives the desired vector relationship between the downmix channel and the original channels from the binaural cues, and then simulates the exact vector relationship between the downmix signal and its orthogonal signal.

Embodiment 2 proposes a method of applying this channel separation method to multiple channels.

According to the present invention, in an encoding process that extracts binaural cues and downmixes the original signal, the effects of the original signal that are unique to multi-channel audio can be reproduced with high quality. This is made possible by applying the binaural cues to the downmix signal in the decoding process.

The embodiments described below merely illustrate the principles of the various inventive aspects of the present invention, and those skilled in the art will readily understand that various modifications can be made to the details described below. The present invention is therefore limited only by the scope of the claims, and not by the specific details given below.

Furthermore, although only two cases are shown here, stereo-mono-stereo (hereinafter the “2-1-2 case”) and 5-channel-mono-5-channel (hereinafter the “5-1-5 case”), the present invention is not limited to these. The approach can be generalized to M original channels and N downmix channels.

FIG. 2 shows the encoding process in the 2-1-2 case. The transform module (200) processes the original channels L(t) and R(t) to obtain their time-frequency representations L(t, f) and R(t, f), where t is the time index and f is the frequency index. The transform module (200) is, for example, a complex QMF filter bank such as the one used in MPEG Audio Extensions 1 and 2. L(t, f) and R(t, f) each consist of a number of contiguous subbands, each representing a narrow frequency band of the original signal. The QMF filter bank can be built from multiple stages so that low-frequency subbands cover narrow frequency bands and high-frequency subbands cover wide ones.

The downmix module (202) processes L(t, f) and R(t, f) to generate the downmix signal M(t, f). This embodiment uses a simple method based on “weighting”.

In the present invention, level adjustment is performed using the ILD cue. To compute the ILD cue, module (204) further processes L(t, f) and R(t, f) to generate ILD(l, b) and Border. As shown in FIG. 3, the time-frequency representation L(t, f) is first divided along the frequency axis into a number of bands (300), each containing several subbands. Reflecting the psychoacoustic properties of the ear, low-frequency bands contain fewer subbands than high-frequency bands. For example, the “Bark scale” or the “critical bands” well known in psychoacoustics can be used to group subbands into bands.

L(t, f) and R(t, f) are further divided along the time axis at the boundaries Border (302) into frequency bands (l, b), for each of which E_L(l, b) and E_R(l, b) are computed. Here, l is the index of the time segment and b is the index of the band. The best place for a Border is a time position at which the ratio of E_L(l, b) to E_R(l, b) changes abruptly. ILD(l, b) is calculated as follows.
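A minimal sketch of the per-band energy and level-difference computation follows. It assumes that ILD is the plain energy ratio E_L / E_R, which is consistent with the claim that the ILD cue is a ratio of the energies of two signals in a frequency band; the function names, the tile layout (a list of rows of complex subband samples), and the small `eps` guard are assumptions made for illustration.

```python
def band_energy(tile):
    """Sum |x|^2 over every time-frequency sample in one (l, b) region."""
    return sum(abs(x) ** 2 for row in tile for x in row)


def ild(left_tile, right_tile, eps=1e-12):
    """Energy ratio E_L / E_R for one region; eps avoids division by zero."""
    e_left = band_energy(left_tile)
    e_right = band_energy(right_tile)
    return e_left / (e_right + eps)


# One region with two complex subband samples per channel.
left_tile = [[2 + 0j, 0 + 2j]]
right_tile = [[1 + 0j, 0 + 1j]]
ratio = ild(left_tile, right_tile)  # left is twice the amplitude of right
```

A Border search along the lines described above could then compare this ratio between neighbouring time slots and place a boundary where it jumps.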

To obtain the inter-channel coherence cue in the encoding process, module (206) processes L(t, f) and R(t, f) and computes ICC(b) using the following equation.

Further, to obtain the high-frequency inter-channel correlation cue for the high-frequency subbands (above 1.5 kHz only), module (208) processes L(t, f) and R(t, f) and computes ICCH(b) using the following equation.

As described later, ICC(l, b) is used together with ILD(l, b) to derive the gain factors that restore the actual signal strengths of L and R relative to M. ICC(l, b) also measures the phase relationship between L and R at low frequencies, which helps to gauge the degree of separation between L and R. At high frequencies, however, the perceived separation depends on the similarity of the L and R waveforms rather than on their phase difference. For example, if L = cos(ωt + θ) and R = cos(ωt), a large ω produces the same stereophonic effect regardless of θ. ICCH(l, b) is better suited to measuring this kind of waveform correlation.
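The difference between a phase-sensitive and a phase-insensitive correlation measure can be illustrated as follows. This is a sketch under assumptions: the phase-sensitive measure is taken to be the real part of the normalized complex cross-correlation and the waveform measure its magnitude. These are common choices, not necessarily the equations used for ICC and ICCH in this document, and the function names are hypothetical.

```python
import cmath
import math


def _cross_and_energies(left, right):
    """Complex cross-correlation and the two energies of one band."""
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    e_left = sum(abs(l) ** 2 for l in left)
    e_right = sum(abs(r) ** 2 for r in right)
    return cross, e_left, e_right


def phase_sensitive_corr(left, right):
    """Real part of the normalized cross-correlation (depends on phase)."""
    cross, e_left, e_right = _cross_and_energies(left, right)
    return cross.real / math.sqrt(e_left * e_right)


def waveform_corr(left, right):
    """Magnitude of the normalized cross-correlation (ignores phase)."""
    cross, e_left, e_right = _cross_and_energies(left, right)
    return abs(cross) / math.sqrt(e_left * e_right)


# L is R shifted by a constant phase theta, as in L = cos(wt + theta).
theta = math.pi / 3
right = [1 + 0j, 0 + 1j, -1 + 0j]
left = [r * cmath.exp(1j * theta) for r in right]
```

With these inputs the phase-sensitive value equals cos(θ), while the waveform value stays at 1 for any θ, mirroring the cosine example in the text.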

All of the above binaural cues become part of the side information in the encoding process. As shown in FIG. 4, the entire binaural cue generation process can be packaged as module (400) with the inputs and outputs described above.

FIG. 5 shows the decoding process that uses the binaural cues generated as described above. The transform module (500) processes the downmix signal M(t) and converts it into the time-frequency representation M(t, f). The transform module in this embodiment is a complex QMF filter bank.

The decorrelator (502) processes M(t, f) and generates two orthogonal signals. FIG. 6 shows two prior-art examples of orthogonal signal generation. Non-Patent Document 1 uses Block (600), in which fractional-delay all-pass filters derive a reverberation signal orthogonal to the downmix signal M(t, f). Block (604) represents all-pass filters connected in series. Other decorrelators can also be used. For example, Non-Patent Document 2 uses Block (602), in which M(t, f) is first processed by a common all-pass filter (606) and the result is then decorrelated by two comb filters (608) and (610) whose delays are mutually prime (mutually-prime orders). The description below assumes decorrelator (600).

In Embodiment 1 of the present invention, module (504) derives, for each band (l, b), the mixing coefficients g_L(l, b), g_R(l, b), θ_L(l, b), and θ_R(l, b) from the binaural cues Border, ILD(l, b), ICC(l, b), and ICCH(l, b). Module (506) then computes the mixing factors g_L1(l, b), g_L2(l, b), g_R1(l, b), and g_R2(l, b) from these mixing coefficients.

To simplify the notation, the index (l, b) is omitted from the equations below.

Based on the downmix performed in the encoder, the relationship between the energies of L, R, and M is derived as follows.

Conventionally, ILD and ICC are defined as follows.

Substituting these definitions of ILD and ICC into the expression for E_M, the gain coefficients needed to amplify M to the levels of the separated channels L' and R' are as follows.
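One way to see how such gains can follow from ILD and ICC is the sketch below. It assumes the simple downmix M = (L + R) / 2 of FIG. 1, ILD = E_L / E_R, and ICC equal to the normalized correlation between L and R, so that E_M = (E_L + E_R + 2·ICC·sqrt(E_L·E_R)) / 4. The weighted downmix of this embodiment may lead to different constants, and the function name is hypothetical.

```python
import math


def separation_gains(ild, icc):
    """Gains g_L, g_R such that g_L^2 * E_M = E_L and g_R^2 * E_M = E_R.

    Assumes M = (L + R) / 2, ILD = E_L / E_R, and ICC as the normalized
    correlation, which gives E_M = E_R * (1 + ILD + 2*ICC*sqrt(ILD)) / 4.
    """
    denom = 1.0 + ild + 2.0 * icc * math.sqrt(ild)
    if denom <= 0.0:
        raise ValueError("downmix energy is zero; gains are undefined")
    g_left = 2.0 * math.sqrt(ild / denom)
    g_right = 2.0 * math.sqrt(1.0 / denom)
    return g_left, g_right


# Identical channels: the downmix already has the right level.
g_same = separation_gains(1.0, 1.0)
# Left 6 dB stronger in energy terms (ILD = 4), channels uncorrelated.
g_l, g_r = separation_gains(4.0, 0.0)
```

Under these assumptions the ratio g_L / g_R always equals sqrt(ILD), so the level difference between the separated channels matches the transmitted cue.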

FIG. 7 shows geometrically, in terms of vector relationships, how L and R are “separated” from M (Patent Document 2). In the figure, θ_L and θ_R indicate the degree of separation. For low frequencies, (θ_L + θ_R) is set to θ = cos⁻¹(ICC); for high frequencies (above 1.5 kHz), (θ_L + θ_R) is set to θ = cos⁻¹(ICCH), for the reason given above. Applying trigonometry to the right triangle shown in FIG. 7 gives:

Similarly,

Because the original L and R are not available in the decoder, module (506) mixes two uncorrelated signals to simulate this separation. As shown in FIG. 8, the two uncorrelated signals have an orthogonal vector relationship.
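The split of the total angle θ into θ_L and θ_R can be sketched as follows. The sketch assumes M = (L + R) / 2 with L and R lying on opposite sides of M, so that the components perpendicular to M cancel: g_L·sin(θ_L) = g_R·sin(θ_R). Together with θ_L + θ_R = θ this gives tan(θ_L) = g_R·sin(θ) / (g_L + g_R·cos(θ)). This is one consistent geometric reading, not necessarily the exact equations of FIG. 7, and the function name is hypothetical.

```python
import math


def separation_angles(g_left, g_right, corr):
    """Split theta = acos(corr) into (theta_L, theta_R).

    corr is ICC at low frequencies or ICCH at high frequencies.
    Assumes the components of L and R perpendicular to M cancel.
    """
    theta = math.acos(max(-1.0, min(1.0, corr)))
    theta_left = math.atan2(g_right * math.sin(theta),
                            g_left + g_right * math.cos(theta))
    theta_right = theta - theta_left
    return theta_left, theta_right


# Equal-level, uncorrelated channels split the right angle evenly.
even = separation_angles(math.sqrt(2.0), math.sqrt(2.0), 0.0)
# Unequal levels: the stronger channel stays closer to M.
t_l, t_r = separation_angles(2.0 * math.sqrt(0.8), 2.0 / math.sqrt(5.0), 0.0)
```

In the second example the left channel is stronger, so θ_L comes out smaller than θ_R.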

In Non-Patent Document 1, the two uncorrelated signals are the downmix signal S1 = M and the decorrelated signal S2 = M_rev derived from M. In the present invention, mixing is performed by scaling M and M_rev with the mixing factors g_L1, g_L2, g_R1, and g_R2 and then adding the results as vectors. The factors g_L1, g_L2, g_R1, and g_R2 can be derived from g_L, g_R, θ_L, and θ_R alone because M_rev is an all-pass-filtered version of M, so that |M| = |M_rev|.

To synthesize the left channel L', the following two requirements must be satisfied.

Solving these two simultaneous equations gives the mixing factors for deriving the left channel.

The mixing factors for deriving the right channel are obtained in the same way.

Finally, the mixing factors are used as follows to synthesize the two channels L' and R'.
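The mixing step can be sketched under the following assumptions: the two requirements are taken to be that L' has magnitude g_L·|M| and makes an angle θ_L with M; M and M_rev are orthogonal with equal magnitude; and R' is rotated to the opposite side of M. With these assumptions the mixing factors reduce to g·cos(θ) and g·sin(θ). The sign convention and the function names are illustrative.

```python
import math


def mixing_factors(gain, angle):
    """Factors (g1, g2) for S' = g1 * M + g2 * M_rev.

    Gives |S'| = gain * |M| and an angle of `angle` between S' and M,
    provided M and M_rev are orthogonal and |M| == |M_rev|.
    """
    return gain * math.cos(angle), gain * math.sin(angle)


def synthesize(m, m_rev, g_left, g_right, theta_left, theta_right):
    """Mix the downmix and its decorrelated version into L' and R'."""
    g_l1, g_l2 = mixing_factors(g_left, theta_left)
    g_r1, g_r2 = mixing_factors(g_right, -theta_right)  # opposite side of M
    left = [g_l1 * a + g_l2 * b for a, b in zip(m, m_rev)]
    right = [g_r1 * a + g_r2 * b for a, b in zip(m, m_rev)]
    return left, right


# Toy vectors standing in for one band of M and M_rev.
m = [1.0, 0.0]
m_rev = [0.0, 1.0]
g = math.sqrt(2.0)
left, right = synthesize(m, m_rev, g, g, math.pi / 4, math.pi / 4)
```

In this toy case L' and R' are orthogonal to each other and their average reproduces M, as expected for uncorrelated, equal-level channels.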

In module (508), the separated channels L'(l, b) and R'(l, b) are inverse-transformed to form the time-domain signals L'(t) and R'(t).

Embodiment 2 of the present invention shows how the channel separation method described above is applied to multiple channels. This embodiment is described using the 5-1-5 case, and the following equation is assumed for the downmix.

In the above equation, L and R denote the two front channels, L_s and R_s the two rear channels, and C the center channel.

In the encoding process for the 5-1-5 case shown in FIG. 9, module (400) of FIG. 4 is run four times, once for each of four channel combinations, to generate four binaural cue sets. For example, to generate the first binaural cue set, the C channel and the intermediate downmix channel (L + 0.707L_s + R + 0.707R_s) are input to block (900), which is the same as module (400) in FIG. 4. Modules (902) to (906) perform the same processing. The four binaural cue sets are used in the multi-stage decoding process to separate the downmix channel M into L, R, L_s, R_s, and C.

FIG. 10 shows this multi-stage decoding process. As in the 2-1-2 case shown in FIG. 5, the downmix channel M undergoes the QMF transform (1000) and decorrelation (1002) to generate M_rev.

Binaural cue set 1 is processed in the mixing coefficient calculation module (1004) to generate two mixing factor sets, (g_L1, g_L2) and (g_R1, g_R2). This step separates M into C and M₁ = (L + 0.707L_s + R + 0.707R_s). From [Equation 15] it is straightforward to obtain M = 0.293C + 0.707M₁, so 0.293 and 0.707 are used as the weights.

Binaural cue set 2 is processed in the mixing coefficient calculation module (1006) to generate two mixing factor sets, (g_L3, g_L4) and (g_R3, g_R4). This step separates M₁ into M₂ = (L + R) / 2 and M₃ = (L_s + R_s) / 2. From [Equation 15] it is straightforward to obtain M₁ = 0.586M₂ + 0.414M₃, so 0.586 and 0.414 are used as the weights.

Binaural cue set 3 is processed in the mixing coefficient calculation module (1008) to generate two mixing factor sets, (g_L5, g_L6) and (g_R5, g_R6). This step separates M₂ into L and R. Since M₂ = 0.5L + 0.5R, 0.5 is used as the weight.

Binaural cue set 4 is processed in the mixing coefficient calculation module (1010) to generate two mixing factor sets, (g_L7, g_L8) and (g_R7, g_R8). This step separates M₃ into L_s and R_s. Since M₃ = 0.5L_s + 0.5R_s, 0.5 is used as the weight.
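The four separation stages form a binary tree, and the stage weights quoted above can be multiplied along each path to see the effective weight of every channel in M. The sketch below takes the stated weights at face value, which means treating M₁ as normalized so that M₁ = 0.586M₂ + 0.414M₃ holds directly; the dictionary layout and function name are illustrative.

```python
# Each node maps to its two children with the weights quoted in the text.
TREE = {
    "M": [(0.293, "C"), (0.707, "M1")],
    "M1": [(0.586, "M2"), (0.414, "M3")],
    "M2": [(0.5, "L"), (0.5, "R")],
    "M3": [(0.5, "Ls"), (0.5, "Rs")],
}


def leaf_weights(node="M", scale=1.0):
    """Multiply the stage weights along every path from node to a leaf."""
    if node not in TREE:  # a leaf is an original channel
        return {node: scale}
    weights = {}
    for weight, child in TREE[node]:
        weights.update(leaf_weights(child, scale * weight))
    return weights


weights = leaf_weights()
```

Here `weights` holds one entry per original channel, and the entries sum to 1, so the chain of stage weights is a consistent decomposition of a normalized downmix.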

The channel mixing modules (1012) to (1020) combine the mixing factors in a series of matrix operations to obtain the overall mixing factors. First, note the following: if

then the corresponding orthogonal signal (rotated by +π/2) is as follows.

Therefore, in matrix form,
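The matrix view can be sketched as follows, assuming that a separated signal has the form S' = g1·S + g2·S_rev and that rotating it by +π/2 in the plane spanned by S and S_rev gives S'_rev = −g2·S + g1·S_rev. Each stage is then a 2×2 matrix acting on the pair (S, S_rev), and successive stages combine by matrix multiplication. The exact matrices of the original equations are not reproduced here, and the helper names are hypothetical.

```python
import math


def stage_matrix(g1, g2):
    """Matrix A with [S', S'_rev] = A @ [S, S_rev] for one separation."""
    return [[g1, g2], [-g2, g1]]


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def compose(stages):
    """Combine stages applied in order; the top row of the result holds
    the overall mixing factors relative to M and M_rev."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    for g1, g2 in stages:
        total = matmul2(stage_matrix(g1, g2), total)
    return total


# Two stages written as (gain*cos(angle), gain*sin(angle)).
stages = [
    (2.0 * math.cos(math.pi / 6), 2.0 * math.sin(math.pi / 6)),
    (0.5 * math.cos(math.pi / 3), 0.5 * math.sin(math.pi / 3)),
]
overall = compose(stages)
```

In this example the gains multiply (2 × 0.5 = 1) and the angles add (π/6 + π/3 = π/2), so the top row of `overall` is approximately (0, 1).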

To obtain L', the mixing factors used in the chain of channel separation steps M -> M₁ -> M₂ are combined. The mixing equation for L' is

The other mixing equations are obtained in the same way in modules (1014) to (1020).

From M -> M₁ -> M₂:

From M -> M₁ -> M₃:

From M -> M₁ -> M₃:

From M:

The inverse QMF modules (1022) to (1030) convert all of the synthesized channels into time-domain signals.

(Other variations)

Although the present invention has been described on the basis of the above embodiments, it is of course not limited to them. The following cases are also included in the present invention.

(1) Each of the above devices is, specifically, a computer system comprising a microprocessor, ROM, RAM, a hard disk unit, a display unit, a keyboard, a mouse, and so on. A computer program is stored in the RAM or the hard disk unit. Each device achieves its functions through the microprocessor operating according to the computer program. Here, the computer program is composed of a combination of instruction codes, each indicating a command to the computer, so as to achieve a predetermined function.

(2) Some or all of the components of each of the above devices may be implemented as a single system LSI (Large Scale Integration). A system LSI is a super-multifunctional LSI manufactured by integrating multiple components on a single chip; specifically, it is a computer system comprising a microprocessor, ROM, RAM, and so on. A computer program is stored in the RAM. The system LSI achieves its functions through the microprocessor operating according to the computer program.

(3) Some or all of the components of each of the above devices may be implemented as an IC card or a stand-alone module that can be attached to and detached from the device. The IC card or the module is a computer system comprising a microprocessor, ROM, RAM, and so on. The IC card or the module may include the super-multifunctional LSI described above. The IC card or the module achieves its functions through the microprocessor operating according to a computer program. The IC card or the module may be tamper resistant.

(4) The present invention may be the methods described above. It may also be a computer program that implements these methods on a computer, or a digital signal consisting of the computer program.

The present invention may also be the computer program or the digital signal recorded on a computer-readable recording medium, such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray Disc), or semiconductor memory. It may also be the digital signal recorded on such a recording medium.

The present invention may also transmit the computer program or the digital signal via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like.

The present invention may also be a computer system comprising a microprocessor and a memory, in which the memory stores the computer program and the microprocessor operates according to the computer program.

The program or the digital signal may also be executed by another, independent computer system, either by recording it on the recording medium and transporting it, or by transferring it via the network or the like.

(5) The above embodiments and the above variations may be combined.

The present invention is applicable to home theater systems, car audio systems, electronic game systems, and the like.

FIG. 1: A typical binaural cue coding system.
FIG. 2: Spatial audio encoding process in the 2-1-2 case.
FIG. 3: Division in time and frequency to form frequency bands.
FIG. 4: Binaural cue extraction block.
FIG. 5: Spatial audio decoding process in the 2-1-2 case.
FIG. 6: Two decorrelators.
FIG. 7: Vector relationship between a pair of audio signals and their downmix signal.
FIG. 8: Signal synthesis by the sum of two orthogonal signal vectors.
FIG. 9: Part of the spatial audio encoding process in the 5-1-5 case.
FIG. 10: Spatial audio decoding process in the 5-1-5 case.

Explanation of symbols

200 Transform module
202 Downmix module
204 ILD module
206 ICC module
208 ICCH module
400 2-1 BCC encoding module
500 QMF filter bank
502 Decorrelator
504 Mixing coefficient calculation module
506 Channel mixing module
508 QMF⁻¹ (inverse QMF) filter bank

Claims

An apparatus for encoding a plurality of audio channels as spatial audio information, the apparatus:
(A) calculating a downmix channel;
(B) converting the plurality of audio channels and the downmix channel into time-frequency representations and dividing them into intermediate frequency bands along the frequency axis;
(C) deriving channel separation steps for separating the downmix channel into the individual audio channels in a multi-stage decoding process;
(D) for each channel separation step, determining boundaries in the time direction that further divide the intermediate frequency bands into frequency bands;
(E) for each channel separation step and each frequency band, calculating an inter-channel level difference cue (ILD);
(F) for each channel separation step and each frequency band, calculating an inter-channel coherence cue (ICC); and
(G) for each channel separation step and each high-frequency band, calculating a high-frequency inter-channel correlation cue (ICCH).

2. The apparatus according to claim 1, wherein, for each channel separation step, one composite downmix signal composed of a plurality of signals is input and is divided into two composite downmix signals, each composed of one or more signals.

3. The apparatus according to claim 1, wherein the boundary is placed at a temporal position where a large change in the ILD appears in the time direction.
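
The boundary placement described in this claim can be illustrated with a minimal sketch. It assumes that an ILD value (here in dB) is already available for each time slot of one intermediate frequency band, and that a "large change" means a slot-to-slot difference above a fixed threshold. The function name `find_borders` and the 6 dB default are hypothetical choices for illustration; the claims do not specify them.

```python
def find_borders(ild_db, threshold_db=6.0):
    """Return the time-slot indices at which a new segment would start.

    ild_db       -- one ILD value (dB) per time slot (assumed input layout)
    threshold_db -- hypothetical threshold defining a "large" ILD change
    """
    borders = []
    for t in range(1, len(ild_db)):
        # A border is placed where the ILD jumps by more than the threshold.
        if abs(ild_db[t] - ild_db[t - 1]) > threshold_db:
            borders.append(t)
    return borders


borders = find_borders([0.0, 1.0, 0.5, 10.0, 9.5, 2.0])
```

With this input, the two jumps (0.5 to 10.0 and 9.5 to 2.0) exceed the assumed threshold, so a border is placed at the slot following each jump.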

4. The apparatus according to claim 1, wherein the ILD cue is the ratio of the energies of the two composite signals in a frequency band.

5. The apparatus according to claim 1, wherein the ICC cue is used to measure the phase correlation between the two composite signals in a frequency band.

6. The apparatus according to claim 1, wherein the ICCH cue is used to measure the waveform correlation, rather than the phase correlation, between the two composite signals in a frequency band.
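
As a rough illustration of the cues named in claims 4 and 5, the sketch below computes an energy ratio and a normalized cross-correlation for two complex subband signals within one frequency band. The energy ratio follows the wording of claim 4. The correlation formula (real part of the normalized cross-correlation) is an assumption standing in for the ICC measure, whose exact definition is not given in the claims. The name `band_cues` and the `eps` guard are hypothetical.

```python
import math


def band_cues(x1, x2, eps=1e-12):
    """Return (ild, corr) for two equal-length subband signals.

    ild  -- ratio of the band energies of x1 and x2 (claim 4 wording)
    corr -- real part of the normalized cross-correlation
            (an assumed stand-in for the coherence cue)
    """
    e1 = sum(abs(a) ** 2 for a in x1)
    e2 = sum(abs(b) ** 2 for b in x2)
    ild = e1 / (e2 + eps)  # eps avoids division by zero in silent bands
    cross = sum(a * complex(b).conjugate() for a, b in zip(x1, x2))
    corr = complex(cross).real / (math.sqrt(e1 * e2) + eps)
    return ild, corr


# Two in-phase signals whose amplitudes differ by a factor of two.
ild, corr = band_cues([2, 2, 2, 2], [1, 1, 1, 1])
```

For identical waveforms differing only in amplitude, the energy ratio is the squared amplitude ratio and the correlation is close to 1. A 90-degree phase shift between the signals drives the correlation toward 0 without changing the energy ratio.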

7. An apparatus for decoding spatial audio information into a plurality of audio channels, the apparatus:
(A) converting the downmix channel into a time-frequency representation and dividing it into intermediate frequency bands along the frequency axis;
(B) processing the downmix channel with a decorrelator to form an uncorrelated channel of the downmix signal;
(C) for each channel separation step, inputting all binaural cue sets, each including Border, ILD, ICC, and ICCH, into a mixing coefficient calculation (MCC) module to derive mixing factors;
(D) in a channel mixing (CM) module, calculating an overall mixing factor for each channel by combining the mixing factors;
(E) in the CM module, mixing the uncorrelated signals using the overall mixing factors to generate the individual signals; and
(F) restoring the multi-channel audio by inversely transforming all the individual signals from the time-frequency representation into the time domain.

8. The apparatus according to claim 7, wherein the uncorrelated signals are orthogonal to each other.

9. The apparatus according to claim 7, wherein the MCC module generates two sets of mixing factors, to be applied respectively to the two composite signals output by the corresponding channel separation step.

10. The apparatus according to claims 7 and 9, wherein the mixing factor is a function of a gain factor and a degree of separation.

11. The apparatus according to claims 7, 9 and 10, wherein the gain factor and the degree of separation are calculated taking into account the vector relationship between the two composite signals as predicted from the ILD, ICC and ICCH.

12. The apparatus according to claim 7, wherein the overall mixing factor is calculated by combining, through a series of matrix operations, the mixing factors derived at the respective channel separation stages.

13. The apparatus according to claim 7, wherein, in the mixing process, a signal that is orthogonal to and uncorrelated with the downmix signal is used in the vector relationship to realize a desired degree of separation between the downmix signal and the separated signal.

14. The apparatus according to claim 7 or 13, wherein, in the mixing process, the downmix signal and the uncorrelated signal are scaled using the mixing factors and added in vector space, thereby generating a separated signal that has a desired degree of separation from the downmix signal and a desired signal strength.
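
The scale-and-add synthesis of claims 10, 13 and 14 can be sketched as a sum of two orthogonal vectors. The sketch assumes that the uncorrelated signal `d` is orthogonal to the downmix `m` and has the same energy, and that the degree of separation is expressed as an angle between the output and the downmix. Under those assumptions the two mixing factors are `gain * cos(angle)` and `gain * sin(angle)`. This parameterization and the function names are hypothetical; the claims state only that the mixing factor is a function of a gain factor and a degree of separation.

```python
import math


def mixing_factors(gain, angle):
    """Mixing factors for the downmix and the uncorrelated signal (assumed form)."""
    return gain * math.cos(angle), gain * math.sin(angle)


def synthesize(m, d, gain, angle):
    """Scale m and d by the mixing factors and add them sample by sample."""
    a, b = mixing_factors(gain, angle)
    return [a * mi + b * di for mi, di in zip(m, d)]


def dot(u, v):
    """Inner product of two real-valued sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Toy signals: d is orthogonal to m and has the same energy.
m = [1.0, 0.0, -1.0, 0.0]
d = [0.0, 1.0, 0.0, -1.0]
y = synthesize(m, d, gain=2.0, angle=math.pi / 3)

strength_ratio = math.sqrt(dot(y, y) / dot(m, m))      # equals the gain
cos_separation = dot(y, m) / math.sqrt(dot(y, y) * dot(m, m))  # equals cos(angle)
```

Because `m` and `d` are orthogonal and equal in energy, the output strength depends only on the gain and its correlation with the downmix depends only on the angle, so the two targets can be set independently.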

15. The apparatus according to claims 12, 13 and 14, wherein, in the matrix operation, the downmix signal and the uncorrelated signal of the current channel separation stage are repeatedly derived as functions of the downmix signal and the uncorrelated signal of the previous channel separation stage, until the input downmix signal and its uncorrelated signal are reached.
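
The recursion in claims 12 and 15 can be illustrated by composing per-stage matrices. The sketch below makes a simplifying assumption: each channel separation stage is modeled as a 2x2 matrix that maps the (downmix, uncorrelated) pair of the previous stage to the pair of the current stage. Multiplying the stage matrices then gives one overall matrix that acts directly on the input downmix and its uncorrelated signal. The matrix values and the helper names are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def overall_matrix(stages):
    """Compose stage matrices; the first list entry is applied first."""
    total = [[1.0, 0.0], [0.0, 1.0]]
    for stage in stages:
        total = matmul2(stage, total)
    return total


def apply(mat, pair):
    """Apply a 2x2 matrix to one (downmix, uncorrelated) sample pair."""
    m, d = pair
    return (mat[0][0] * m + mat[0][1] * d, mat[1][0] * m + mat[1][1] * d)


# Hypothetical mixing matrices for two successive separation stages.
stages = [[[0.8, 0.6], [-0.6, 0.8]],
          [[0.5, 0.5], [0.5, -0.5]]]
sample = (1.0, 2.0)

step_by_step = apply(stages[1], apply(stages[0], sample))
direct = apply(overall_matrix(stages), sample)
```

Both routes give the same result, which is why a decoder can precompute the overall factors once per frequency band and apply them to the input signals in a single mixing step.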