JP2007529021A - Fidelity optimized variable frame length coding

Info

Publication number: JP2007529021A
Application number: JP2006518596A
Authority: JP
Inventors: ステファンブルン; インイェマルヨハンソン; アニセタレブ; ダニエルエンストレム
Original assignee: テレフオンアクチーボラゲットエルエムエリクソン（パブル）
Priority date: 2003-12-19
Filing date: 2004-12-15
Publication date: 2007-10-18
Anticipated expiration: 2024-12-15
Also published as: DE602004008613T2; ATE443317T1; WO2005059899A1; DE602004023240D1; BRPI0410856A; CA2527971C; AU2004298708A1; SE0400417D0; BRPI0410856B8; JP4335917B2; SE527670C2; EP1623411A1; ZA200508980B; EP1623411B1; EP1845519B1; ATE371924T1; RU2005134365A; RU2007121143A; BRPI0419281B1; SE0400417L

Abstract

A method of encoding multi-channel audio signals comprises generating a first output signal (x'_mono), being encoding (38) parameters representing a main signal (x_mono). The main signal (x_mono) is a first linear combination (34) of signals (16A, 16B) of at least a first and a second channel. The method further comprises generating (30) a second output signal (p_side), being encoding parameters representing a side signal (x_side). The side signal (x_side) is a second linear combination (36) of signals (16A, 16B) of at least the first and the second channel within an encoding frame. The method is characterised in that the generating of the second output signal further comprises scaling the side signal (x_side) to an energy contour of the main signal (x_mono). A method of decoding is also presented, as well as an encoder, a decoder and an audio system, all according to the same basic idea.

Description

The present invention relates to the encoding of audio signals, and in particular to the encoding of multi-channel audio signals.

There is strong market demand for transmitting or storing audio signals at low bit rates while maintaining high audio quality. In particular, where transmission resources or storage are limited, low bit rate operation is an essential cost factor. This is typically the case in streaming and messaging applications in mobile communication systems such as GSM, UMTS, or CDMA.

At present, no standardized codec available for mobile communication systems provides high stereo audio quality at economically interesting bit rates. What existing codecs make possible is mono transmission of the audio signal. Stereo transmission is also available to some extent, but bit rate limitations usually force a very limited stereo representation.

The simplest approach to stereo or multi-channel coding of audio signals is to encode the signals of the different channels separately, as individual and independent signals. Another basic approach, used in stereo FM radio transmission and designed to coexist with legacy mono radio receivers, is to transmit a sum signal and a difference signal of the two channels.

Current audio codecs such as MPEG-1/2 Layer III and MPEG-2/4 AAC use so-called joint stereo coding. With this technique, the signals of the different channels are processed jointly rather than separately and individually. The two most commonly used joint stereo coding techniques are known as "Mid/Side" (M/S) stereo coding and intensity stereo coding, and they are usually applied to sub-bands of the stereo or multi-channel signal to be encoded.

M/S stereo coding is similar to the procedure described for stereo FM radio in the sense that it encodes and transmits the sum and difference signals of the channel sub-bands and thereby exploits the redundancy between the channel sub-bands. The structure and operation of an encoder based on M/S stereo coding is described, for example, in US Pat. No. 5,285,498 by J. D. Johnston.

Intensity stereo, on the other hand, is able to exploit stereo irrelevancy. It transmits the joint intensity of the channels (of the different sub-bands) together with some location information indicating how the intensity is distributed among the channels. Intensity stereo conveys only spectral magnitude information of the channels; phase information is not conveyed. For this reason, and because temporal inter-channel information (more specifically, the inter-channel time difference) is of major psychoacoustic relevance particularly at lower frequencies, intensity stereo can only be used at high frequencies, for example above 2 kHz. An intensity stereo coding method is described, for example, in European Patent No. 0497413 by R. Veldhuis et al.

A recently developed stereo coding method is described, for example, in the conference paper "Binaural cue coding applied to stereo and multi-channel audio compression" by C. Faller et al. (112th AES Conference, May 2002, Munich, Germany). This method is a parametric multi-channel audio coding method. The basic principle is that, at the encoding side, the input signals from the N channels c1, c2, ..., cN are combined into a single mono signal m. The mono signal is audio encoded using any conventional mono audio codec. In parallel, parameters that describe the multi-channel image are derived from the channel signals. The parameters are encoded and transmitted to the decoder together with the audio bit stream. The decoder first decodes the mono signal m' and then regenerates the channel signals c1', c2', ..., cN' based on the parametric description of the multi-channel image.

The principle of the binaural cue coding (BCC) method is to transmit the encoded mono signal together with so-called BCC parameters. The BCC parameters comprise coded inter-channel level differences and inter-channel time differences for sub-bands of the original multi-channel input signal. The decoder regenerates the different channel signals by applying sub-band-wise level and phase adjustments to the mono signal based on the BCC parameters. The advantage over, for example, M/S or intensity stereo is that stereo information comprising temporal inter-channel information is transmitted at a much lower bit rate. However, this technique requires a computationally demanding time-frequency transform for each of the channels, at both the encoder and the decoder.

Moreover, BCC does not handle the fact that much of the stereo information, especially at low frequencies, is diffuse, that is, it does not come from any specific direction. Diffuse sound fields exist in both channels of a stereo recording, but they are largely out of phase with respect to each other. If an algorithm such as BCC is applied to a recording with a large amount of diffuse sound field, the regenerated stereo image becomes confused and jumps from left to right, because the BCC algorithm can only pan the signal of a given frequency band to the left or to the right.

One possible way to encode the stereo signal while ensuring a good reproduction of diffuse sound fields is to use an encoding scheme very similar to the technique used in FM stereo broadcasting, namely to encode the mono (left + right) signal and the difference (left - right) signal separately.

The technique described in US Pat. No. 5,434,948 by C. E. Holt et al. uses an approach similar to BCC to encode the mono signal and side information. In this case, the side information consists of prediction filters and, optionally, a residual signal. The prediction filters are estimated by a least-mean-square algorithm and, when applied to the mono signal, allow the multi-channel audio signals to be predicted. This technique can achieve very low bit rate encoding of multi-channel audio sources, but at the cost of reduced quality, as discussed further below.

Finally, for completeness, a technique used in 3D audio should be mentioned. This technique synthesizes the right and left channel signals by filtering sound source signals with so-called head-related filters. However, it requires the different sound source signals to be separated and is therefore generally not applicable to stereo or multi-channel coding.

US Pat. No. 5,285,498
European Patent No. 0497413
C. Faller et al., "Binaural cue coding applied to stereo and multi-channel audio compression", 112th AES Conference, May 2002, Munich, Germany
US Pat. No. 5,434,948

(Overview)
A problem with existing coding schemes based on frame-wise encoding of signals, specifically a main signal and one or more side signals, is that dividing the audio information into frames may introduce unpleasant perceptual artefacts. Dividing the information into frames of relatively long duration generally reduces the average bit rate. This may be beneficial, for example, for music containing a large amount of diffuse sound. For transient-rich music or speech, however, fast temporal variations are smeared out over the frame duration, giving rise to ghost-like sounds or even pre-echo problems. Coding with short frames, conversely, represents the sound more precisely and minimizes the energy, but requires a higher transmission bit rate and more computation. Coding efficiency may therefore decrease when very short frames are used. In addition, the larger number of frame boundaries may introduce discontinuities in the encoding parameters, which may be perceived as artefacts.

A further problem with schemes based on encoding a main signal and one or more side signals is that they tend to require considerable computational resources. In particular, when short frames are used, handling parameter discontinuities from one frame to the next is a complex task. When long frames are used, estimation errors on transient sounds may produce very large side signals, which in turn increases the demand on transmission rate.

An object of the present invention is therefore to provide an encoding method and device that improve the perceived quality of multi-channel audio signals, in particular by avoiding artefacts such as pre-echo, ghost sounds, or frame discontinuities. A further object of the present invention is to provide an encoding method and device that require less processing power and have a more constant transmission bit rate.

The above objects are achieved by methods and devices according to the appended claims. In general, polyphonic signals are used to create a main signal, typically a mono signal, and a side signal. The main signal is encoded according to prior-art encoding principles. A number of encoding schemes are provided for the side signal. Each encoding scheme is characterized by a set of sub-frames of different lengths. The total length of the sub-frames corresponds to the length of the encoding frame of the encoding scheme. A set of sub-frames comprises at least one sub-frame. The encoding scheme used for the side signal is selected depending, at least in part, on the present signal content of the polyphonic signals.

In one embodiment, the selection is made before encoding, based on an analysis of signal characteristics. In another embodiment, the side signal is encoded by each of the encoding schemes, and the best encoding scheme is selected based on a measurement of the encoding quality.

In a preferred embodiment, a side residual signal is created as the difference between the side signal and the main signal scaled by a balance factor. The balance factor is selected to minimize the side residual signal. The optimized side residual signal and the balance factor are encoded and provided as the parameters representing the side signal. At the decoder side, the balance factor, the side residual signal, and the main signal are used to recover the side signal.
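The balance-factor step can be illustrated with a minimal Python sketch. It assumes that "minimizing the side residual" means minimizing its energy in the least-squares sense, which the text does not state explicitly; the function names (`balance_factor`, `encode_side`, `decode_side`) are hypothetical, and quantization of the factor and residual is left out.

```python
import numpy as np

def balance_factor(x_side, x_mono, eps=1e-12):
    """Gain g that minimizes sum((x_side - g * x_mono)**2) (assumed criterion)."""
    return float(np.dot(x_side, x_mono) / (np.dot(x_mono, x_mono) + eps))

def encode_side(x_side, x_mono):
    """Return the balance factor and the side residual (unquantized here)."""
    g = balance_factor(x_side, x_mono)
    residual = x_side - g * x_mono
    return g, residual

def decode_side(g, residual, x_mono):
    """Recover the side signal from balance factor, residual and main signal."""
    return residual + g * x_mono

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_mono = rng.standard_normal(160)
    x_side = 0.3 * x_mono + 0.01 * rng.standard_normal(160)
    g, res = encode_side(x_side, x_mono)
    print("balance factor:", g)
    print("side energy:", np.sum(x_side ** 2), "residual energy:", np.sum(res ** 2))
```

When the side signal is strongly correlated with the main signal, the residual carries far less energy than the side signal itself, which is what makes it cheaper to encode.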

In another preferred embodiment, the encoding of the side signal comprises energy contour scaling in order to avoid pre-echo effects. Furthermore, different encoding schemes may use different encoding procedures in the separate sub-frames.

The main advantage of the present invention is that the perceived quality of the audio signal is improved. Furthermore, the present invention enables multi-channel signal transmission at very low bit rates.

(Detailed description)
FIG. 1 illustrates a typical system 1 according to a preferred embodiment of the present invention. A transmitter 10 comprises an antenna 12, including associated hardware and software, so that it can transmit radio signals 5 to a receiver 20. The transmitter 10 comprises, among other parts, a multi-channel encoder 14, which converts the signals of a number of input channels 16 into output signals suitable for radio transmission. Examples of suitable multi-channel encoders 14 are described in detail further below. The signals of the input channels 16 can be provided from an audio signal storage 18, for example a data file holding a digital representation of a recording, a magnetic tape, or a vinyl disc. The signals of the input channels 16 can also be provided "live", for example from a set of microphones 19. If the audio signals are not already in digital form, they are digitized before entering the multi-channel encoder 14.

At the receiver 20 side, an antenna 22 with associated hardware and software handles the actual reception of the radio signals 5 representing the polyphonic audio signals. Typical functions such as error correction are performed here. A decoder 24 decodes the received radio signals 5 and thereby converts the audio data carried by them into signals of a number of output channels 26. The output signals can be provided, for example, to loudspeakers 29 for immediate playback, or stored in an audio signal storage 28 of any kind.

The system 1 can be, for example, a telephone conference system or a system for providing audio services or other audio applications. In some systems, such as a telephone conference system, the communication has to be duplex, whereas the distribution of music from a service provider to a subscriber, for example, can be essentially one-way. The signals can also be transmitted from the transmitter 10 to the receiver 20 by any other means, for example by different kinds of electromagnetic waves, cables, or fibres, or combinations thereof.

FIG. 2a illustrates an embodiment of an encoder according to the present invention. In this embodiment, the polyphonic signal is a stereo signal comprising two channels a and b, received at inputs 16A and 16B, respectively. The signals of channels a and b are provided to a pre-processing unit 32, where different signal conditioning procedures may be applied. The (possibly modified) signals from the output of the pre-processing unit 32 are summed in an adder 34, which also divides the sum by two. The signal x_mono created in this way is the main signal of the stereo signal, since it basically comprises all the data from both channels. In this embodiment, the main signal thus represents a pure "mono" signal. The main signal x_mono is provided to a main signal encoding unit 38, which encodes the main signal according to any suitable encoding principle. Such principles are available in the prior art and are therefore not described further here. The main signal encoding unit 38 outputs an output signal p_mono, being the encoding parameters representing the main signal.

In a subtractor 36, the difference of the channel signals (divided by two) is provided as a side signal x_side. In this embodiment, the side signal represents the difference between the two channels of the stereo signal. The side signal x_side is provided to a side signal encoding unit 30. Preferred embodiments of the side signal encoding unit 30 are described further below. According to the side signal encoding procedure, which is described in detail below, the side signal x_side is converted into encoding parameters p_side representing the side signal x_side. In some embodiments, this encoding also uses information from the main signal x_mono. Arrow 42 indicates such a provision, in which the original, unencoded main signal x_mono is used. In other embodiments, the main signal information used in the side signal encoding unit 30 can be deduced from the encoding parameters p_mono representing the main signal, as indicated by the broken line 44.

The encoding parameters p_mono representing the main signal x_mono are a first output signal, and the encoding parameters p_side representing the side signal x_side are a second output signal. In the typical case, these two output signals p_mono and p_side, which together represent the full stereo sound, are multiplexed into one transmission signal 52 by a multiplexer 40. In other embodiments, however, the first output signal p_mono and the second output signal p_side may be transmitted separately.

FIG. 2b is a block diagram of an embodiment of a decoder 24 according to the present invention. A received signal 54, comprising encoding parameters representing the main and side signal information, is provided to a demultiplexer 56, which separates it into a first and a second input signal. The first input signal, corresponding to the encoding parameters p_mono of the main signal, is provided to a main signal decoding unit 64. In a conventional manner, the encoding parameters p_mono representing the main signal are used to generate a decoded main signal x"_mono that is as similar as possible to the main signal x_mono (FIG. 2a) of the encoder 14 (FIG. 2a).

Similarly, the second input signal, corresponding to the side signal, is provided to a side signal decoding unit 60. Here, the encoding parameters p_side representing the side signal are used to recover a decoded side signal x"_side. In some embodiments, the decoding procedure uses information about the main signal x"_mono, as indicated by arrow 65.

The decoded main signal x"_mono and the decoded side signal x"_side are provided to an adder 70, which outputs a signal that is a representation of the original signal of channel a. Similarly, the difference provided by a subtractor 68 is an output signal that is a representation of the original signal of channel b. These channel signals may be post-processed in a post-processing unit 74 according to prior-art signal processing procedures. Finally, the channel signals a and b are provided at the decoder outputs 26A and 26B.
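The down-mix performed by adder 34 and subtractor 36, and the reconstruction performed by adder 70 and subtractor 68, can be sketched in a few lines of Python. The sketch leaves out pre-processing, the actual encoding and decoding of the two signals, and post-processing; the function names `downmix` and `upmix` are hypothetical.

```python
import numpy as np

def downmix(a, b):
    """Encoder side: main (mono) and side signals from channels a and b."""
    x_mono = (a + b) / 2.0   # adder 34, including the division by two
    x_side = (a - b) / 2.0   # subtractor 36, including the division by two
    return x_mono, x_side

def upmix(x_mono, x_side):
    """Decoder side: channel a from the sum, channel b from the difference."""
    a = x_mono + x_side      # adder 70
    b = x_mono - x_side      # subtractor 68
    return a, b

if __name__ == "__main__":
    a = np.array([1.0, 2.0, 3.0])
    b = np.array([0.5, -1.0, 3.0])
    x_mono, x_side = downmix(a, b)
    print(x_mono, x_side)
    print(upmix(x_mono, x_side))
```

Without coding loss the round trip is exact; in the real codec the decoded signals only approximate x_mono and x_side, so the output channels approximate a and b.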

As mentioned in the overview, encoding is typically performed one frame at a time. A frame comprises the audio samples within a predetermined time period. The lower part of FIG. 3a shows a frame SF2 of duration L. The audio samples within the unhatched portion are encoded together. The preceding and following samples are encoded in other frames. Dividing the samples into frames inevitably introduces some discontinuities at the frame boundaries. When the sound changes, the encoding parameters change as well, and they do so essentially at every frame boundary. This gives rise to perceptible errors. One way to compensate for this to some extent is to base the encoding not only on the samples to be encoded, but also on samples in the vicinity of the frame boundaries, as indicated by the hatched portions. This makes the transition between frames smoother. Alternatively, or in addition, interpolation techniques are sometimes used to reduce the artefacts caused by frame boundaries. However, all such procedures require a large amount of additional computation, and for some encoding techniques they may be difficult to implement regardless of the resources available.

From this point of view, it is desirable to use frames that are as long as possible, since the number of frame boundaries is then small. Coding efficiency also typically becomes high, and the required transmission bit rate is minimized. Long frames, however, cause problems with pre-echo artefacts and ghost-like sounds.

If shorter frames are used instead, such as SF1 or even SF0, with durations of L/2 and L/4 respectively, those skilled in the art will appreciate that coding efficiency may decrease, the transmission bit rate may increase, and the problems with frame boundary artefacts grow. On the other hand, shorter frames suffer less from other perceptual artefacts, such as ghost-like sounds and pre-echo. To keep the coding error as small as possible, the frame length should be as short as possible.

According to the present invention, perceived audio quality is improved by encoding the side signal with a frame length that depends on the present signal content. Since the effect of different frame lengths on perceived audio quality depends on the nature of the sound being encoded, an improvement can be obtained by letting the nature of the signal itself influence the frame length used. The encoding of the main signal is not the subject of the present invention and is therefore not described in detail. The frame length used for the main signal may or may not be equal to the frame length used for the side signal.

In some cases, temporal variations are small, and it is then beneficial to encode the side signal using relatively long frames. This may be the case for recordings with a large amount of diffuse sound field, such as concert recordings. In other cases, such as stereo speech conversations, short frames are likely to be preferable. Which frame length is preferable can be decided in two basic ways.

FIG. 3b illustrates an embodiment of a side signal encoding unit 30 according to the present invention, in which a closed-loop decision is used. A basic encoding frame of length L is used here. A number of encoding schemes 81 are provided, each characterized by a set 80 of sub-frames 90. Each set 80 of sub-frames 90 comprises one or more sub-frames 90 of equal or differing lengths. The total length of a set 80 of sub-frames 90 is, however, always equal to the basic encoding frame length L. Referring to FIG. 3b, the top encoding scheme is characterized by a set of sub-frames comprising only one sub-frame of length L. The next set comprises two sub-frames of length L/2. The third set comprises two sub-frames of length L/4 followed by one sub-frame of length L/2.

The signal x_side provided to the side signal encoding unit 30 is encoded by all of the encoding schemes 81. In the top encoding scheme, the entire basic encoding frame is encoded in one piece. In the other encoding schemes, the signal x_side is encoded separately in each sub-frame. The result from each encoding scheme is provided to a selector 85. A fidelity measurement means 83 determines a fidelity measure for each of the encoded signals. The fidelity measure is an objective quality value, preferably a signal-to-noise measure or a weighted signal-to-noise ratio. The fidelity measures associated with the encoding schemes are compared, and the result controls a switching means 87, which selects the encoding parameters representing the side signal from the encoding scheme with the best fidelity measure as the output signal p_side of the side signal encoding unit 30.

Preferably, all possible combinations of frame lengths are tested, and the set of subframes that gives the best objective quality, e.g. signal-to-noise ratio, is selected.
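
The closed-loop selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoder of the embodiment: `toy_codec` is a hypothetical stand-in for a real subframe coder, and the names `snr_db` and `closed_loop_select` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple


def toy_codec(block: Sequence[float], levels: int = 8) -> List[float]:
    """Hypothetical subframe coder: uniform quantizer scaled to the block peak."""
    peak = max((abs(v) for v in block), default=0.0)
    if peak == 0.0:
        return [0.0] * len(block)
    step = 2.0 * peak / levels
    return [round(v / step) * step for v in block]


def snr_db(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB, used here as the fidelity measure."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def closed_loop_select(
    frame: Sequence[float],
    schemes: Sequence[Sequence[int]],
    codec: Callable[[Sequence[float]], List[float]] = toy_codec,
) -> Tuple[Tuple[int, ...], float]:
    """Encode the frame with every scheme and keep the one with the best SNR."""
    best_scheme = None
    best_score = -math.inf
    for scheme in schemes:
        if sum(scheme) != len(frame):
            raise ValueError("subframe lengths must add up to the frame length")
        decoded: List[float] = []
        start = 0
        for length in scheme:  # each subframe is coded separately
            decoded.extend(codec(frame[start:start + length]))
            start += length
        score = snr_db(frame, decoded)
        if best_scheme is None or score > best_score:
            best_scheme, best_score = tuple(scheme), score
    if best_scheme is None:
        raise ValueError("no encoding schemes given")
    return best_scheme, best_score


# A quiet first half followed by a loud second half (hypothetical test frame).
frame = [0.01] * 8 + [1.0, -1.0] * 4
schemes = [(16,), (8, 8), (4, 4, 8)]
scheme, score = closed_loop_select(frame, schemes)
```

With this toy coder, the split schemes give the quiet half its own quantizer step, so a split scheme is selected over the single-subframe scheme.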

In the present embodiment, the lengths of the subframes used are selected according to:

l_sf = l_f / 2^n

where l_sf is the subframe length, l_f is the length of the encoding frame, and n is an integer. In the present embodiment, n is selected between 0 and 3. However, any frame lengths can be used, as long as the total length of the set is kept constant.
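
As an illustration of how many candidate sets this rule produces, the following sketch enumerates every ordered set of subframes whose lengths are l_f / 2^n with n between 0 and 3 and whose total length equals l_f. The frame length of 160 samples and the function name `subframe_sets` are assumptions made for the example; the text does not state a frame length.

```python
from typing import Iterator, List, Tuple


def subframe_sets(frame_len: int, max_n: int = 3) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered set of subframe lengths frame_len / 2**n
    (0 <= n <= max_n) whose total length equals frame_len."""
    if frame_len % (2 ** max_n) != 0:
        raise ValueError("frame_len must be divisible by 2**max_n")
    allowed = [frame_len // (2 ** n) for n in range(max_n + 1)]

    def extend(prefix: List[int], remaining: int) -> Iterator[Tuple[int, ...]]:
        if remaining == 0:
            yield tuple(prefix)
            return
        for length in allowed:
            if length <= remaining:
                yield from extend(prefix + [length], remaining - length)

    yield from extend([], frame_len)


# Hypothetical frame length of 160 samples: allowed subframes are 160, 80, 40, 20.
sets = list(subframe_sets(160))
```

Under these assumptions there are 56 ordered sets, which shows why an exhaustive closed-loop search is only practical when each individual encoding is cheap.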

Another embodiment of the side signal encoder unit 30 according to the present invention is shown in FIG. 3c. Here, the frame length decision is an open-loop decision based on the statistics of the signal. That is, the spectral characteristics of the side signal are used as the basis for deciding which encoding scheme is to be used. As before, different encoding schemes characterized by different sets of subframes are available. In this embodiment, however, the selector 85 is placed before the actual encoding. The input side signal x_side enters the selector 85 and a signal analysis unit 84. The result of the analysis becomes the input of a switch 86, so that only one of the encoding schemes 81 is used. The output of that encoding scheme is also the output signal p_side of the side signal encoder unit 30.

The advantage of an open-loop decision is that only one actual encoding has to be performed. The disadvantage is that the analysis of the signal characteristics may be very complicated, and that it may be difficult to predict possible behaviors in advance so that the switch 86 can make an appropriate choice. The signal analysis unit 84 has to perform a large amount of statistical analysis of the audio. Any small change in the encoding schemes may upset the statistical behavior.

With closed-loop selection (FIG. 3b), encoding schemes can be exchanged without changing the subsequent units. On the other hand, if many encoding schemes are to be investigated, the computational requirements become high.

The benefit of such variable frame length coding of the side signal is that one can select between a fine temporal resolution with a coarse frequency resolution on the one hand, and a coarse temporal resolution with a fine frequency resolution on the other. The above embodiments can preserve the stereo image in the best possible manner.

There are also some requirements on the actual encoding used in the different encoding schemes. In particular, when closed-loop selection is used, the computational resources needed to perform a number of more or less simultaneous encodings must be large. The more complicated the encoding process, the more computational power is needed. Furthermore, a low bit rate at transmission is also preferred.

The method disclosed in US 5,434,948 uses a filtered version of the mono (main) signal to approximate the side or difference signal. The filter parameters are optimized and allowed to vary in time. The filter parameters are then transmitted as a representation of the encoded side signal. In one embodiment, a residual side signal is also transmitted. In many cases, such an approach could be used as the side signal encoding method within the scope of the present invention. The approach has, however, some disadvantages. Quantization of the filter coefficients and of any residual side signal often requires a relatively high bit rate for transmission, since the filter order has to be high to provide an accurate side signal estimate. Estimation of the filter itself may be problematic, especially for music rich in transients. Estimation errors give a modified side signal that is sometimes larger in magnitude than the unmodified signal, which leads to higher bit rate demands. Moreover, if a new set of filter coefficients is computed every N samples, the filter coefficients need to be interpolated, as discussed above, to give a smooth transition from one set of filter coefficients to the next. Interpolation of filter coefficients is a complex task, and interpolation errors show up as a large error side signal, so that a higher bit rate is needed for the encoder of the difference error signal.

One way to avoid the need for interpolation is to update the filter coefficients on a sample-by-sample basis and to rely on backward-adaptive analysis. For this to work well, the bit rate of the residual encoder needs to be fairly high. It is therefore not a good alternative for low-bit-rate stereo coding.

There are cases, very common in music for example, where the mono and difference signals are almost uncorrelated. Filter estimation then becomes very difficult, with the added risk of simply making things worse for the encoder of the difference error signal.

The solution according to US 5,434,948 may work quite well when the filter coefficients vary very slowly in time, as in conference telephony systems. For music signals the approach does not work very well, since the filter needs to change very rapidly to track the stereo image. This means that subframe lengths of very different sizes have to be used, so that the number of combinations to test increases rapidly. The requirements for computing all possible encoding schemes thereby become impractically high.

In a preferred embodiment, therefore, the encoding of the side signal is based on the idea of reducing the redundancy between the mono signal and the side signal by using a simple balance factor instead of a complex, bit-rate-consuming predictor filter. The residual of this operation is then encoded. The magnitude of such a residual is relatively low and does not call for a high bit rate for transmission. This idea is well suited to being combined with the variable frame set approach described above, since the computational complexity is low.

Using a balance factor combined with the variable frame length approach removes the need for complex interpolation and the problems that interpolation may cause. Moreover, using a simple balance factor instead of a complex filter gives fewer estimation problems, since possible estimation errors in the balance factor have less impact. The preferred solution can reproduce both panned signals and diffuse sound fields with good quality and with limited bit rate requirements and computational resources.

FIG. 4 shows a preferred embodiment of a stereo encoder according to the present invention. This embodiment is very similar to the one shown in FIG. 2a, but reveals the details of the side signal encoder unit 30. The encoder 14 of this embodiment has no pre-processing unit, and the input signals are supplied directly to the adder 34 and the subtractor 36. The mono signal x_mono is multiplied by a balance factor g_sm in a multiplier 33. In a subtractor 35, the multiplied mono signal is subtracted from the side signal x_side, i.e. essentially the difference between the two channels, to produce a side residual signal. The balance factor g_sm is determined by an optimizer 37 from the content of the mono and side signals so as to minimize the side residual signal according to a quality criterion. The quality criterion is preferably a least-mean-square (LMS) criterion. The side residual signal is encoded in a side residual encoder 39 according to any encoding procedure. The side residual encoder 39 is preferably a low-bit-rate transform encoder or a CELP (code-excited linear prediction) encoder. The encoding parameters p_side representing the side signal then comprise the encoding parameters p_{side residual} representing the side residual signal and the optimized balance factor 49.

In the embodiment of FIG. 4, the mono signal 42 used for synthesizing the side signal is the target signal x_mono of the mono encoder 38. As mentioned above (in connection with FIG. 2a), the local synthesis signal of the mono encoder 38 can also be used. In the latter case, the total encoding delay may increase, and the computational complexity for the side signal may also increase. On the other hand, the quality may be better, since it is then possible to repair coding errors made in the mono encoder.

Mathematically, the basic encoding scheme can be described as follows. Denote the left and right stereo channels as channel signals a and b, respectively. The channel signals are combined into a mono signal by addition and into a side signal by subtraction. In equation form, the operations are:

x_mono(n) = 0.5 (a(n) + b(n))
x_side(n) = 0.5 (a(n) - b(n))

It is beneficial to scale the x_mono and x_side signals by a factor of two. Note that there are other ways of creating x_mono and x_side. One can, for instance, use:

x_mono(n) = γ a(n) + (1 - γ) b(n)
x_side(n) = γ a(n) - (1 - γ) b(n)
0 ≦ γ ≦ 1.0
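
The weighted combination and its inverse can be sketched as follows. The function names `downmix` and `upmix` are illustrative; the inverse follows algebraically from the two equations above and requires 0 < γ < 1.

```python
from typing import List, Sequence, Tuple


def downmix(a: Sequence[float], b: Sequence[float],
            gamma: float = 0.5) -> Tuple[List[float], List[float]]:
    """Form x_mono = gamma*a + (1-gamma)*b and x_side = gamma*a - (1-gamma)*b."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if len(a) != len(b):
        raise ValueError("channels must have equal length")
    mono = [gamma * x + (1.0 - gamma) * y for x, y in zip(a, b)]
    side = [gamma * x - (1.0 - gamma) * y for x, y in zip(a, b)]
    return mono, side


def upmix(mono: Sequence[float], side: Sequence[float],
          gamma: float = 0.5) -> Tuple[List[float], List[float]]:
    """Invert downmix: mono + side = 2*gamma*a, mono - side = 2*(1-gamma)*b."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1 to invert")
    a = [(m + s) / (2.0 * gamma) for m, s in zip(mono, side)]
    b = [(m - s) / (2.0 * (1.0 - gamma)) for m, s in zip(mono, side)]
    return a, b


left = [1.0, 0.5, -0.25]
right = [0.0, 0.5, 0.75]
mono, side = downmix(left, right)       # gamma = 0.5 gives the 0.5 (a ± b) case
left_back, right_back = upmix(mono, side)
```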

On blocks of the input signals, a modified or residual side signal is computed according to:

x_{side residual}(n) = x_side(n) - f(x_mono, x_side) x_mono(n)

where f(x_mono, x_side) is a balance factor function that, based on a block of N samples (i.e. a subframe) of the side and mono signals, strives to remove as much as possible from the side signal. In other words, the balance factor is used to minimize the side residual signal. In the special case where the side residual signal is minimized in the least-mean-square sense, this is equivalent to minimizing the energy of the side residual signal x_{side residual}.

In the special case described above, f(x_mono, x_side) is expressed as follows.

where x_side is the side signal and x_mono is the mono signal. Note that the function is based on a block starting at "frame start" and ending at "frame end".
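
The expression for this special case is not reproduced in this text. As a short worked derivation, assuming that the criterion is the energy of the side residual over the block from "frame start" to "frame end", the minimizing balance factor would take the following least-squares form:

```latex
\begin{aligned}
E(g) &= \sum_{n=\text{frame start}}^{\text{frame end}}
        \bigl(x_{side}(n) - g\,x_{mono}(n)\bigr)^{2} \\
\frac{dE}{dg} &= -2 \sum_{n} x_{mono}(n)\,
        \bigl(x_{side}(n) - g\,x_{mono}(n)\bigr) = 0 \\
\Rightarrow\quad f(x_{mono}, x_{side}) &=
        \frac{\sum_{n} x_{side}(n)\,x_{mono}(n)}
             {\sum_{n} x_{mono}(n)^{2}}
\end{aligned}
```

All sums run over the same block of N samples.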

A weighting in the frequency domain can be added to the computation of the balance factor. This is done by convolving the x_side and x_mono signals with the impulse response of a weighting filter. The estimation error can then be moved to a frequency range where it is less easy to hear. This is referred to as perceptual weighting.

A quantized version of the balance factor value given by the function f(x_mono, x_side) is transmitted to the decoder. It is preferable to take the quantization into account already when the modified side signal is generated. The following expression is then obtained:

Q_g(..) is a quantization function applied to the balance factor given by the function f(x_mono, x_side). The balance factor is transmitted over the transmission channel. For normal left-right panned signals, the balance factor is limited to the interval [-1.0, 1.0]. If, on the other hand, the channels are out of phase with each other, the balance factor may extend beyond these limits.
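
Computing the balance factor, quantizing it, and forming the residual with the quantized value can be sketched as follows. The least-squares form of `balance_factor` assumes the LMS special case. The quantizer `quantize_gain` (uniform step of 0.125, clamped to ±2.0) is a hypothetical stand-in for Q_g, which the text does not specify.

```python
from typing import List, Sequence


def balance_factor(mono: Sequence[float], side: Sequence[float]) -> float:
    """Least-squares gain g minimizing sum((side - g*mono)**2) over the block."""
    energy = sum(m * m for m in mono)
    if energy == 0.0:
        return 0.0  # silent mono block: nothing can be predicted from it
    return sum(s * m for s, m in zip(side, mono)) / energy


def quantize_gain(g: float, step: float = 0.125, limit: float = 2.0) -> float:
    """Hypothetical Q_g: clamp to [-limit, limit], then round to a uniform grid."""
    g = max(-limit, min(limit, g))
    return round(g / step) * step


def side_residual(mono: Sequence[float], side: Sequence[float],
                  g_q: float) -> List[float]:
    """Residual formed with the quantized gain, as the decoder will see it."""
    return [s - g_q * m for s, m in zip(side, mono)]


mono = [1.0, 2.0, -1.0]
side = [0.9, 2.2, -1.1]
g = balance_factor(mono, side)
g_q = quantize_gain(g)
residual = side_residual(mono, side, g_q)
```

Forming the residual with the quantized gain rather than the unquantized one keeps encoder and decoder consistent, which is the point made in the paragraph above.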

As an optional means of stabilizing the stereo image, the balance factor can be limited if the normalized cross-correlation between the mono signal and the side signal is poor, as given by the following equation:

where

These situations occur quite frequently, e.g. with classical music or studio music with a large amount of diffuse sound, where in some cases the a and b channels may almost cancel each other out at times when the mono signal is created. The effect on the balance factor is that it can jump rapidly, causing a confused stereo image. The modification above alleviates this problem.
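
The equations for this limiting step are not reproduced in this text. The sketch below uses the standard definition of the normalized cross-correlation and, purely as an assumption for illustration, attenuates the balance factor by the magnitude of that correlation. The actual limiting rule of the source may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def normalized_xcorr(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard normalized cross-correlation at lag zero, in [-1, 1]."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 0.0 if den == 0.0 else num / den


def limit_balance_factor(g: float, mono: Sequence[float],
                         side: Sequence[float]) -> float:
    """Assumed rule: shrink g toward zero when mono and side correlate poorly."""
    return abs(normalized_xcorr(mono, side)) * g


# Uncorrelated block: the balance factor is suppressed entirely.
g_uncorrelated = limit_balance_factor(0.8, [1.0, 0.0, -1.0, 0.0],
                                      [0.0, 1.0, 0.0, -1.0])
# Fully correlated block: the balance factor passes through unchanged.
g_correlated = limit_balance_factor(0.8, [1.0, 2.0], [2.0, 4.0])
```

Under this assumed rule, a block-to-block jump in the balance factor is damped whenever the correlation that justified it is weak.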

The filter-based approach of US 5,434,948 has similar problems, but in that case the solution is not as simple.

If E_s is the encoding function (e.g. a transform encoder) of the side residual signal and E_m is the encoding function of the mono signal, then the decoded a" and b" signals at the decoder side can be described as follows (assuming here that γ = 0.5):
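
The decoder-side expressions are not reproduced in this text. They can be reconstructed from the definitions given earlier: with γ = 0.5, a = x_mono + x_side and b = x_mono - x_side, and the side signal equals the side residual plus the quantized balance factor g_Q times the mono signal. Writing decoding as the inverse of the encoding functions (a notational assumption), this gives:

```latex
\begin{aligned}
x''_{mono}(n) &= E_m^{-1}\bigl(E_m(x_{mono}(n))\bigr), \qquad
x''_{side\,residual}(n) = E_s^{-1}\bigl(E_s(x_{side\,residual}(n))\bigr) \\
a''(n) &= (1 + g_Q)\,x''_{mono}(n) + x''_{side\,residual}(n) \\
b''(n) &= (1 - g_Q)\,x''_{mono}(n) - x''_{side\,residual}(n)
\end{aligned}
```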

One important benefit of computing the balance factor for each frame is that the use of interpolation is avoided. Instead, as described above, the frame processing is normally performed with overlapping frames.

The encoding principle using balance factors works particularly well for music signals, where rapid changes are typically needed to track the stereo image.

Multi-channel coding has lately become popular. One example is 5.1-channel surround sound in DVD movies. The channels are there arranged as front left, front center, front right, rear left, rear right, and subwoofer. FIG. 5 shows an embodiment of an encoder that encodes the three front channels of such an arrangement, exploiting inter-channel redundancies according to the present invention.

Three channel signals L, C, R are supplied on three inputs 16A-C, and the mono signal x_mono is created as the sum of all three signals. A center signal encoder unit 130 is added, which receives the center signal x_centre. The mono signal 42 is in this embodiment the encoded and decoded mono signal x"_mono, and it is multiplied by a balance factor g_Q in a multiplier 133. In a subtractor 135, the multiplied mono signal is subtracted from the center signal x_centre to produce a center residual signal. The balance factor g_Q is determined by an optimizer 137 from the content of the mono and center signals so as to minimize the center residual signal according to a quality criterion. The center residual signal is encoded in a center residual encoder 139 according to any encoding procedure. The center residual encoder 139 is preferably a low-bit-rate transform encoder or a CELP encoder. The encoding parameters p_centre representing the center signal then comprise the encoding parameters p_{centre residual} representing the center residual signal and the optimized balance factor 149. The center residual signal and the scaled mono signal are added in an adder 235, producing a modified center signal 142 that is compensated for encoding errors.

The side signal x_side, i.e. the difference between the left L and right R channels, is supplied to the side signal encoder unit 30 as in the embodiments above. Here, however, the optimizer 37 also depends on the modified center signal 142 supplied by the center signal encoder unit 130. The side residual signal is therefore produced in the subtractor 35 as an optimum linear combination of the mono signal 42, the modified center signal 142, and the side signal.

The variable frame length concept described above can be applied to either the side signal or the center signal, or to both.

FIG. 6 shows a decoder suitable for receiving encoded audio signals from the encoder of FIG. 5. The received signal 54 is divided into encoding parameters p_mono representing the main signal, encoding parameters p_centre representing the center signal, and encoding parameters p_side representing the side signal. In a decoder 64, the encoding parameters p_mono are used to generate the main signal x"_mono. In a decoder 160, the encoding parameters p_centre are used to generate the center signal x"_centre based on the main signal x"_mono. In a decoder 60, the encoding parameters p_side are decoded to generate the side signal x"_side based on the main signal x"_mono and the center signal x"_centre.

The procedure can be expressed mathematically as follows.

The input signals x_left, x_right, and x_centre are combined into a mono channel according to:

α, β, and χ are set to 1.0 for simplicity in the remainder of this section, but they can be set to arbitrary values. The values of α, β, and χ can either be constant or depend on the signal content, in order to emphasize one or two channels and so achieve the best quality.
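
The combining equation itself is not reproduced in this text. Since the mono signal is described as the sum of the three channel signals and α, β, and χ act as per-channel weights, it presumably has the following form (an assumption, not a quotation of the source):

```latex
x_{mono}(n) = \alpha\,x_{left}(n) + \beta\,x_{right}(n) + \chi\,x_{centre}(n)
```

With α = β = χ = 1.0 this reduces to the plain sum of the three channels.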

The normalized cross-correlation between the mono signal and the center signal is computed as:

where

x_centre is the center signal and x_mono is the mono signal. The mono signal comes from the mono target signal, but it is equally possible to use the local synthesis of the mono encoder.

The center residual signal is encoded as follows:

Q_g(..) is a quantization function applied to the balance factor. The balance factor is transmitted over the transmission channel.

If E_c is the encoding function (e.g. a transform encoder) of the center residual signal and E_m is the encoding function of the mono signal, then the decoded x"_centre signal at the decoder side can be described as:

The side residual signal to be encoded is:

where g_Qsm and g_Qsc are the quantized values of the parameters g_sm and g_sc that minimize the following expression:

η can, for instance, be equal to 2 for least-mean-square minimization of the error. The g_sm and g_sc parameters can be quantized jointly or separately.
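
For η = 2, finding g_sm and g_sc is a two-variable least-squares problem. The sketch below assumes that the side residual has the form x_side - g_sm x_mono - g_sc x_centre, which matches the description of the residual as a linear combination of the mono, modified center, and side signals. It solves the 2x2 normal equations directly and leaves out quantization. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def joint_gains(mono: Sequence[float], centre: Sequence[float],
                side: Sequence[float]) -> Tuple[float, float]:
    """Return (g_sm, g_sc) minimizing sum((side - g_sm*mono - g_sc*centre)**2)."""
    mm = sum(m * m for m in mono)
    cc = sum(c * c for c in centre)
    mc = sum(m * c for m, c in zip(mono, centre))
    sm = sum(s * m for s, m in zip(side, mono))
    sc = sum(s * c for s, c in zip(side, centre))
    det = mm * cc - mc * mc
    if det <= 1e-12 * mm * cc:
        # Mono and centre are (nearly) collinear or silent: use mono only.
        return (sm / mm if mm > 0.0 else 0.0), 0.0
    g_sm = (sm * cc - sc * mc) / det  # Cramer's rule on the normal equations
    g_sc = (sc * mm - sm * mc) / det
    return g_sm, g_sc


def residual_two_gains(mono: Sequence[float], centre: Sequence[float],
                       side: Sequence[float], g_sm: float,
                       g_sc: float) -> List[float]:
    """Side residual after removing the mono and centre contributions."""
    return [s - g_sm * m - g_sc * c for s, m, c in zip(side, mono, centre)]


mono = [1.0, 0.0, 2.0, -1.0]
centre = [0.0, 1.0, 1.0, 1.0]
side = [0.5 * m - 0.25 * c for m, c in zip(mono, centre)]
g_sm, g_sc = joint_gains(mono, centre, side)
residual = residual_two_gains(mono, centre, side, g_sm, g_sc)
```

In this constructed example the side signal is an exact combination of the other two, so the gains come out as 0.5 and -0.25 and the residual vanishes.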

If E_s is the encoding function of the side residual signal, the decoded x"_left and x"_right channel signals are given by:

One of the most annoying perceptual artifacts is the pre-echo effect. FIGS. 7a and 7b illustrate such an artifact. Assume a signal component with a time development as shown by curve 100. At the beginning, starting from t0, no signal component is present in the audio samples. At a time t between t1 and t2, the signal component suddenly appears. When the signal component is encoded using a frame length of t2-t1, the occurrence of the signal component is "smeared out" over the entire frame, as indicated by curve 101. When curve 101 is decoded, the signal component appears a time Δt before its intended appearance, and a "pre-echo" is perceived.

Pre-echo artifacts become more accentuated when long encoding frames are used. Using shorter frames reduces the artifact somewhat. Another way of dealing with the pre-echo problem described above is to exploit the fact that the mono signal is available at both the encoder and the decoder side. This makes it possible to scale the side signal according to the energy contour of the mono signal. At the decoder side, the inverse scaling is performed, and the pre-echo problem can thereby be alleviated somewhat.

The energy contour of the mono signal is computed over the frame as:

where w(n) is a window function. The simplest window function is a rectangular window, but other window types, such as a Hamming window, may be preferable.

The side residual signal is then scaled as:

In a more general form, the above equation can be written as:

where f(...) is a monotonic continuous function. In the decoder, the energy contour is computed on the decoded mono signal and applied to the decoded side signal as:
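
The equations of this passage are not reproduced in this text. The following sketch illustrates the idea under stated assumptions: the energy contour is taken as a windowed sum of squared mono samples centered on each sample, the monotonic function f is taken to be the square root, and a small floor avoids division by zero in silent passages. The function names, the centering of the window, and the floor value are all hypothetical.

```python
import math
from typing import List, Sequence


def energy_contour(mono: Sequence[float], window: Sequence[float]) -> List[float]:
    """Windowed energy of the mono signal around each sample position."""
    half = len(window) // 2
    contour = []
    for m in range(len(mono)):
        acc = 0.0
        for k, w in enumerate(window):
            idx = m + k - half  # window centered on sample m
            if 0 <= idx < len(mono):
                acc += w * mono[idx] ** 2
        contour.append(acc)
    return contour


def scale_by_contour(residual: Sequence[float], contour: Sequence[float],
                     floor: float = 1e-6) -> List[float]:
    """Encoder side: divide by f(E) with f = sqrt (an assumed choice of f)."""
    return [r / math.sqrt(max(e, floor)) for r, e in zip(residual, contour)]


def unscale_by_contour(scaled: Sequence[float], contour: Sequence[float],
                       floor: float = 1e-6) -> List[float]:
    """Decoder side: multiply by the same f(E) to undo the scaling."""
    return [s * math.sqrt(max(e, floor)) for s, e in zip(scaled, contour)]


mono = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]   # onset in the middle of the frame
contour = energy_contour(mono, [1.0, 1.0, 1.0])  # rectangular window
residual = [0.1] * 6
scaled = scale_by_contour(residual, contour)
restored = unscale_by_contour(scaled, contour)
```

In a real codec the decoder would compute the contour from the decoded mono signal, so the inverse scaling would only approximately undo the encoder scaling. Here the same contour is used on both sides to show the round trip.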

Since this energy contour scaling is in some sense an alternative to the use of shorter frame lengths, the concept is particularly well suited to being combined with the variable frame length concept described further above. By having some encoding schemes that apply energy contour scaling, some that do not, and some that apply energy contour scaling only during certain subframes, a more flexible set of encoding schemes can be provided. FIG. 8 shows an embodiment of a side signal encoder unit 30 according to the present invention. Here, the different encoding schemes 81 comprise hatched subframes 91, representing encoding that applies energy contour scaling, and unhatched subframes 92, representing encoding procedures that do not apply energy contour scaling. In this way, not only subframes of differing lengths but also combinations of subframes with differing encoding principles are available. In the present example, the application of energy contour scaling differs between the encoding schemes. In a more general case, any encoding principle can be combined with the variable length concept in an analogous manner.

The set of encoding schemes of FIG. 8 comprises schemes that handle, for example, pre-echo artifacts in different ways. In some schemes, longer subframes with pre-echo minimization according to the energy contour principle are used. In other schemes, shorter subframes without energy contour scaling are used. Depending on the signal content, one of the alternatives may be more advantageous. For very severe pre-echo cases, encoding schemes using short subframes with energy contour scaling may be necessary.

The proposed solution can be used over the full frequency band or in one or more distinct sub-bands. Sub-bands can be applied to both the main and side signals, or to either of them separately. A preferred embodiment splits the side signal into several frequency bands. The reason is simply that it is easier to remove possible redundancy in an isolated frequency band than in the entire frequency band. This is particularly important when encoding music signals with rich spectral content.

One possible use is to encode the frequency band below a predetermined threshold with the above method. The predetermined threshold is preferably 2 kHz, and even more preferably 1 kHz. For the remaining part of the frequency range of interest, one can either encode further frequency bands with the above method or use a completely different method.

One motivation for using the above method preferably at low frequencies is that diffuse sound fields generally have little energy content at high frequencies. The natural reason is that sound absorption typically increases with frequency. Diffuse sound field components also seem to play a less important role for the human auditory system at higher frequencies. It is therefore beneficial to use this solution at low frequencies (below 1 or 2 kHz) and to rely on other, more bit-efficient encoding schemes at higher frequencies. Applying the scheme only at low frequencies gives a large saving in bit rate, since the bit rate required by the proposed method is proportional to the required bandwidth. In most cases, the mono encoder can encode the entire frequency band, while the proposed side signal encoding is suggested to be performed only in the lower part of the frequency band, as schematically illustrated in FIG. 9. Reference numeral 301 refers to an encoding scheme of the side signal according to the present invention, reference numeral 302 refers to any other encoding scheme of the side signal, and reference numeral 303 refers to an encoding scheme of the side signal.

It is also possible to use the proposed method for several separate frequency bands.

FIG. 10 shows the main steps of an embodiment of the encoding method according to the invention as a flowchart. The procedure starts in step 200. In step 210, a main signal derived from the polyphonic signal is encoded. In step 212, encoding schemes comprising subframes of different lengths and/or orders are provided. In step 214, a sub-signal derived from the polyphonic signal is encoded using an encoding scheme selected at least in part on the basis of the actual signal content of the present polyphonic signal. The procedure ends in step 299.
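
The scheme selection of steps 212 and 214 can be sketched as follows, under several assumptions. Subframe lengths are taken to be the frame length divided by a power of two and all orderings are enumerated, as in the dependent claims; the fidelity measure is a plain signal-to-noise ratio. The per-subframe coder `quantize` is a hypothetical stand-in (a uniform quantizer scaled to the subframe peak), not the coder of the specification, and all function names are illustrative.

```python
import math


def partitions(frame_len, max_n):
    """Yield every ordered split of a frame into subframes of length
    frame_len / 2**n, 0 <= n <= max_n (frame_len divisible by 2**max_n)."""
    lengths = [frame_len >> n for n in range(max_n + 1)]

    def rec(remaining):
        if remaining == 0:
            yield ()
            return
        for length in lengths:
            if length <= remaining:
                for rest in rec(remaining - length):
                    yield (length,) + rest

    yield from rec(frame_len)


def quantize(subframe, levels=4):
    """Hypothetical stand-in coder: uniform quantizer scaled to the subframe peak."""
    peak = max(abs(s) for s in subframe)
    if peak == 0.0:
        return list(subframe)
    step = peak / levels
    return [round(s / step) * step for s in subframe]


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB, used here as the fidelity measure."""
    noise = sum((a - b) ** 2 for a, b in zip(original, decoded))
    signal = sum(a * a for a in original)
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def encode_with_scheme(side, scheme):
    """Code each subframe of the scheme separately; return the decoded frame."""
    decoded, start = [], 0
    for length in scheme:
        decoded.extend(quantize(side[start:start + length]))
        start += length
    return decoded


def select_scheme(side, max_n):
    """Try every scheme and keep the one with the best fidelity measure."""
    best = None
    for scheme in partitions(len(side), max_n):
        score = snr_db(side, encode_with_scheme(side, scheme))
        if best is None or score > best[1]:
            best = (scheme, score)
    return best


# Toy sub-signal frame with a transient in its second half.
side = [0.01, -0.02, 0.015, -0.01, 0.9, -0.8, 0.7, -0.6]
schemes = list(partitions(len(side), 2))
best_scheme, best_snr = select_scheme(side, 2)
```

In this toy setting shorter subframes receive their own scale factor at no cost, so finer splits tend to win. A practical coder would presumably compare the schemes at a fixed bit budget, which this sketch does not model.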

FIG. 11 shows the main steps of an embodiment of the decoding method according to the invention as a flowchart. The procedure starts in step 200. In step 220, a received encoded main signal is decoded. In step 222, decoding schemes comprising subframes of different lengths and/or orders are provided. In step 224, a received sub-signal is decoded according to the selected encoding scheme. In step 226, the decoded main and sub-signals are combined into a polyphonic signal. The procedure ends in step 299.
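
Steps 224 and 226 can be illustrated with a minimal sketch. It assumes that the main and sub-signals were formed as (L + R) / 2 and (L - R) / 2, which is one common choice of linear combinations; the text itself only requires linear combinations of the channels. The parameter layout (one `(step, indices)` pair per subframe) and the function names are likewise hypothetical.

```python
def decode_side(scheme, side_params):
    """Step 224: decode the sub-signal subframe by subframe.

    scheme      -- tuple of subframe lengths of the selected scheme
    side_params -- one (step, indices) pair per subframe (assumed layout)
    """
    if len(scheme) != len(side_params):
        raise ValueError("one parameter set is needed per subframe")
    samples = []
    for length, (step, indices) in zip(scheme, side_params):
        if len(indices) != length:
            raise ValueError("subframe length mismatch")
        samples.extend(step * i for i in indices)
    return samples


def combine(main, side):
    """Step 226: rebuild two channels, assuming main = (L + R) / 2
    and side = (L - R) / 2 on the encoder side."""
    if len(main) != len(side):
        raise ValueError("main and sub-signal must cover the same frame")
    left = [m + s for m, s in zip(main, side)]
    right = [m - s for m, s in zip(main, side)]
    return left, right


# Toy frame of four samples split into two subframes of length two.
main = [0.5, 0.5, 0.25, 0.25]
side = decode_side((2, 2), [(0.25, [1, -1]), (0.5, [0, 1])])
left, right = combine(main, side)
```

Because each subframe carries its own parameters, the decoder only needs to know which scheme was selected in order to place the decoded subframes correctly within the frame.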

The embodiments described above are to be understood as a few illustrative examples of the present invention. Those skilled in the art will be able to make various modifications, combinations, and changes to these embodiments without departing from the scope of the invention. In particular, different partial solutions from different embodiments can be combined in other configurations where technically possible. In any case, the scope of the present invention is defined by the appended claims.

A block diagram of a system for transmitting polyphonic signals.
A block diagram of an encoder in a transmitter.
A block diagram of a decoder in a receiver.
A diagram illustrating the encoding of frames of different lengths.
A block diagram of an embodiment of a sub-signal encoding unit according to the invention.
A block diagram of an embodiment of a sub-signal encoding unit according to the invention.
A block diagram of an embodiment of an encoder that uses balance factor encoding of the sub-signal.
A block diagram of an embodiment of an encoder for a multi-signal system.
A block diagram of an embodiment of a decoder suitable for decoding signals from the device of FIG. 5.
A diagram illustrating pre-echo artifacts.
A diagram illustrating pre-echo artifacts.
A block diagram of an embodiment of a sub-signal encoding unit according to the invention that uses different encoding principles for different subframes.
A diagram illustrating the use of different encoding principles for different frequency subbands.
A flowchart of the basic steps of an embodiment of an encoding method according to the invention.
A flowchart of the basic steps of an embodiment of a decoding method according to the invention.

Claims

1. A method for encoding a polyphonic signal, comprising the steps of:
generating a first output signal (p_mono), being encoding parameters representing a main signal, based on signals of at least a first channel and a second channel (a, b; L, R); and
generating a second output signal (p_side), being encoding parameters representing a sub-signal, based on signals of at least the first channel and the second channel (a, b; L, R) within an encoded frame (80),
the method further comprising the step (212) of providing at least two encoding schemes (81), in each of which the encoded frame (80) is constituted by a set of subframes (90) including at least one subframe (90), the sum of the lengths of the subframes (90) being equal to the length of the encoded frame (80),
wherein the step (214) of generating the second output signal (p_side) includes the step of selecting an encoding scheme (81) depending at least in part on the signal content of the current sub-signal (x_side), and
the second output signal (p_side) is encoded separately for each subframe (90) of the selected set of subframes (90).

2. The method according to claim 1, wherein the step (214) of generating the second output signal (p_side) includes:
separately generating, within every subframe (90) of each of at least two sets of subframes (90), encoding parameters representing the sub-signal (x_side), the sub-signal being a first linear combination of signals of at least the first and second channels (a, b; L, R);
calculating a fidelity measure for each of the at least two encoding schemes (81); and
selecting, as the encoding parameters (p_side) representing the sub-signal, the encoded signal from the encoding scheme (81) having the best fidelity measure.

3. The method according to claim 2, wherein the fidelity measure is based on a signal-to-noise measure.

4. The method according to claim 1, wherein the length l_sf of the subframes (90) is given by
l_sf = l_f / 2^n,
where l_f is the length of the encoded frame (80) and n is an integer.

5. The method according to claim 4, wherein n is less than a predetermined value.

6. The method according to claim 5, characterized in that the at least two encoding schemes (81) comprise all permutations of the subframe lengths.

7. The method according to any of the preceding claims, wherein
the step (210) of generating the encoding parameters (p_mono) representing the main signal includes:
generating the main signal (x_mono) as a second linear combination of signals of at least the first and second channels (a, b; L, R); and
encoding the main signal into the encoding parameters (p_mono) representing the main signal,
and the step of encoding the sub-signal includes:
generating a sub-residual signal (x_{side residual}) as the difference between the sub-signal (x_side) and the main signal (x_mono) scaled by a balance factor (g_sm); and
encoding the sub-residual signal and the balance factor (g_sm) into the encoding parameters (p_side) representing the sub-signal,
wherein the balance factor (g_sm) is determined as the factor that minimizes the sub-residual signal according to a quality criterion.

8. The method according to claim 7, wherein the quality criterion is based on a least-mean-square (LMS) measure.

9. The method according to any of the preceding claims, wherein the step of encoding the sub-signal further includes a step of scaling the sub-signal (x_side) to an energy contour of the main signal (x_mono).

10. The method according to claim 9, characterized in that the scaling of the sub-signal (x_side) comprises division by a factor that is a monotonic continuous function of the energy contour of the main signal (x_mono).

11. The method according to claim 10, wherein the monotonic continuous function is a square root function.

12. The method according to claim 10 or 11, characterized in that the energy contour E_c of the main signal (x_mono) is computed over a subframe according to the following equation:

(equation image not reproduced)

where L is an arbitrary factor, n is a summation index, m is the sample within the subframe, and w(n) is a window function.

13. The method according to claim 12, wherein the window function is a rectangular window function.

14. The method according to claim 12, wherein the window function is a Hamming window function.

15. The method according to claim 1, wherein the at least two encoding schemes (81) comprise different encoding principles for the sub-signal (x_side).

16. The method according to claim 15, wherein at least a first encoding scheme of the at least two encoding schemes (81) comprises a first encoding principle for the sub-signal (x_side) in all subframes (90), and at least a second encoding scheme of the at least two encoding schemes (81) comprises a second encoding principle for the sub-signal (x_side) in all subframes (90).

17. The method according to claim 15 or 16, characterized in that at least one of the at least two encoding schemes (81) comprises the first encoding principle for the sub-signal (x_side) in one subframe and the second encoding principle for the sub-signal (x_side) in another subframe.

18. The method according to claim 1, wherein the step (214) of generating the second output signal (p_side) includes:
analyzing a spectral characteristic of the sub-signal (x_side), the sub-signal being a first linear combination of signals of at least the first and second channels (a, b; L, R);
selecting a set of subframes (90) based on the analyzed spectral characteristic; and
encoding the sub-signal (x_side) separately in each of the subframes (90) of the selected set of subframes (90).

19. The method according to any one of claims 1 to 18, characterized in that the step (214) of generating the second output signal (p_side) is applied in a limited frequency band.

20. The method according to claim 19, characterized in that the step (214) of generating the second output signal (p_side) is applied only for frequencies below 2 kHz.

21. The method according to claim 20, wherein the step (214) of generating the second output signal (p_side) is applied only for frequencies below 1 kHz.

22. The method according to any of the preceding claims, wherein the polyphonic signal represents a music signal.

23. A method for decoding a polyphonic signal, comprising the steps of:
decoding (220) encoding parameters (p_mono) representing a main signal;
decoding (224) encoding parameters (p_side) representing a sub-signal within an encoded frame (80); and
combining (226) at least the decoded main signal (x"_mono) and the decoded sub-signal (x"_side) into signals of at least first and second channels (a, b; L, R),
the method further comprising the step (222) of providing at least two encoding schemes (81), in each of which the encoded frame (80) is constituted by a set of subframes (90) including at least one subframe (90), the sum of the lengths of the subframes (90) being equal to the length of the encoded frame (80),
wherein the step (224) of decoding the encoding parameters (p_side) representing the sub-signal includes the step of decoding the encoding parameters (p_side) representing the sub-signal in the subframes (90) of one of the at least two encoding schemes (81).

24. An encoding device (14), comprising:
input means (16; 16A-C) for inputting a polyphonic signal (a, b; L, R, C) having at least first and second channels (a, b; L, R);
means (38) for generating a first output signal (p_mono), being encoding parameters representing a main signal, based on signals of at least the first and second channels (a, b; L, R);
means (30) for generating a second output signal (p_side), being encoding parameters representing a sub-signal, based on signals of at least the first and second channels (a, b; L, R) within an encoded frame (80); and
output means (52),
the encoding device further comprising means for providing at least two encoding schemes (81), in each of which the encoded frame (80) is constituted by a set of subframes (90) including at least one subframe (90), the sum of the lengths of the subframes (90) being equal to the length of the encoded frame (80),
wherein the means (30) for generating the second output signal (p_side) includes means (86; 87) for selecting an encoding scheme (81) depending at least in part on the signal content of the current sub-signal (x_side), and
the encoding device further comprises means for separately encoding the sub-signal (x_side) in each of the subframes (90) of the selected encoding scheme (81).

25. A decoding device (24), comprising:
input means (54) for inputting encoding parameters (p_mono) representing a main signal and encoding parameters (p_side) representing a sub-signal;
means (64) for decoding the encoding parameters (p_mono) representing the main signal;
means (60) for decoding the encoding parameters (p_side) representing the sub-signal within an encoded frame (80);
means (68, 70) for combining at least the decoded main signal (x"_mono) and the decoded sub-signal (x"_side) into signals of at least first and second channels (a, b; L, R); and
output means (26; 26A-C),
wherein the means (60) for decoding the encoding parameters (p_side) representing the sub-signal includes:
means for providing at least two encoding schemes (81), in each of which the encoded frame (80) is constituted by a set of subframes (90) including at least one subframe (90), the sum of the lengths of the subframes (90) being equal to the length of the encoded frame (80); and
means for separately decoding the encoding parameters (p_side) representing the sub-signal in the subframes (90) of one of the at least two encoding schemes (81).

26. An audio system (1) comprising at least one of the encoding device (14) according to claim 24 and the decoding device (24) according to claim 25.