JP2006259291A - Audio encoder

Info

Publication number: JP2006259291A
Application number: JP2005077253A
Authority: JP
Inventors: Shuji Miyasaka; 修二宮阪
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2005-03-17
Filing date: 2005-03-17
Publication date: 2006-09-28

Abstract

PROBLEM TO BE SOLVED: To maintain compatibility with the conventional MPEG-standard AAC system even when the encoded signal of a Spatial Codec becomes large.

SOLUTION: An audio encoder includes a first encoding unit 101 that encodes a downmix signal, a second encoding unit 102 that encodes information for restoring the downmix signal to a multichannel signal, a division unit 103 that divides the resulting channel-expansion-part encoded signal into A (A≥1) partial signals, and a multiplexing unit 104 that stores each partial signal in a fill element, so that the channel-expansion-part encoded signal is ignored by a legacy AAC decoder.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to an audio encoder that encodes a multichannel signal. In particular, it relates to an audio encoder that stores, separately in a bitstream, a downmix-part encoded signal (obtained by encoding a stereo downmix of the input multichannel signal) and a channel-expansion-part encoded signal (obtained by encoding the information needed to restore the original multichannel signal), such that the bitstream remains compatible with an MPEG-standard AAC stereo encoded signal.

Spatial Codec is currently being standardized as part of the MPEG audio standardization activities. Spatial Codec is an encoding scheme that stores, separately in a bitstream, a downmix-part encoded signal obtained by encoding a stereo downmix of the input multichannel signal and a channel-expansion-part encoded signal obtained by encoding the information needed to restore the original multichannel signal.

Meanwhile, the Parametric Coding technique has already been standardized in the MPEG standard as an encoding scheme that stores, separately in a bitstream, an encoded signal obtained by encoding a monaural downmix of a two-channel stereo input and an encoded signal obtained by encoding the information needed to restore the original stereo signal (see, for example, Non-Patent Document 1).
Non-Patent Document 1: ISO/IEC 14496-3:2001/FDAM2 (Parametric Coding for High Quality Audio)

However, the conventional MPEG-standard AAC system is a standard in which, when the input signal has, for example, 5.1 channels, the two front channels, the two rear channels, the center channel, and the LFE channel are compressed and encoded separately. Consequently, a scheme such as Spatial Codec, which stores a downmix-part encoded signal and a channel-expansion-part encoded signal separately in the bitstream, is not compatible with the conventional MPEG-standard AAC system.

Furthermore, to maintain compatibility with the MPEG-standard AAC system, the Parametric Coding technique in the MPEG standard discloses storing, in an AAC fill element, the encoded signal that carries the information for restoring the monaural signal to the original stereo signal. However, if the channel-expansion-part encoded signal of Spatial Codec is stored in a fill element in the same way, the problem described below arises.

FIG. 4 illustrates the problem expected when the channel-expansion-part encoded signal is stored in a fill element. In FIG. 4, the horizontal axis represents the bit rate of the compression encoding and the vertical axis represents sound quality. Neither axis carries specific quantitative values; the figure shows trends only. The curve labeled MP2 shows the relationship between bit rate and sound quality for the MPEG2-Layer2 system: sound quality tends to fall as the bit rate is lowered. The curve labeled MP3 shows the same relationship for the MPEG2-Layer3 system, and the curve labeled AAC shows it for the MPEG2-AAC system.

As FIG. 4 shows, each newly developed encoding scheme has so far offered better sound quality than its predecessor at every bit rate. That is, MP3, developed after MP2, outperforms MP2 at every bit rate, and AAC, developed after MP3, outperforms MP3 at every bit rate. However, for a Spatial Codec that tries to maintain compatibility with the AAC system by storing the channel-expansion-part encoded signal in a fill element, the dashed curve in FIG. 4 shows a different picture: the sound quality can be expected to exceed that of the conventional schemes at low bit rates, but it does not exceed them even when the bit rate is raised.

The reason the sound quality improves at low bit rates is as follows. As the MPEG-4 Parametric Coding scheme has shown, when a two-channel signal is downmixed to a monaural signal and then restored to a stereo signal, high sound quality can be obtained at a low bit rate by using inter-channel gain difference information and the degree of correlation. By using at least such a technique, high sound quality can likewise be obtained at a low bit rate when a stereo downmix of a multichannel signal is restored to the original multichannel signal.

FIG. 5, on the other hand, explains why the sound quality does not improve at high bit rates. The upper part of FIG. 5 outlines the structure of a Spatial Codec encoded signal at 128 kbps. The shaded portion is the downmix-part encoded signal and the white portion is the channel-expansion-part encoded signal; the downmix-part encoded signal is the larger of the two. The lower part of FIG. 5 outlines the structure of a Spatial Codec encoded signal at 320 kbps. The problem here is that the shaded downmix-part encoded signal encodes a two-channel signal, so its sound quality saturates even if the bit rate is raised as far as the lower part of FIG. 5 shows. This is because, with AAC, the sound quality of a two-channel encoded signal saturates at around 128 kbps. One could therefore expect to improve the sound quality by keeping the shaded downmix-part encoded signal small and enlarging the white channel-expansion-part encoded signal, but this is not possible if the channel-expansion-part encoded signal is to be stored in a fill element.
The reason is that the amount of information that can be stored in a fill element is limited to 269 bytes by the AAC standard: the standard defines the fill element with the syntax shown in FIG. 7, which allows at most 269 bytes (ISO/IEC 13818-7). FIG. 6 summarizes the problems that arise when such a channel-expansion-part encoded signal is to be stored in a fill element.
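The 269-byte figure can be reproduced from the fill element's length fields. The following is a minimal sketch, assuming the ISO/IEC 13818-7 syntax in which a 4-bit `count` is extended by an 8-bit `esc_count` when `count` equals 15; the function and constant names are illustrative and do not come from the patent or the standard.

```python
def fill_element_payload_bytes(count: int, esc_count: int = 0) -> int:
    """Payload byte count signalled by one AAC fill element.

    Assumed syntax (ISO/IEC 13818-7): a 4-bit `count`; when `count`
    is 15, an 8-bit `esc_count` follows and (esc_count - 1) is added.
    """
    if not 0 <= count <= 15:
        raise ValueError("count is a 4-bit field")
    if not 0 <= esc_count <= 255:
        raise ValueError("esc_count is an 8-bit field")
    if count == 15:
        return count + esc_count - 1
    return count


# Largest payload one fill element can signal: 15 + 255 - 1.
MAX_FILL_PAYLOAD = fill_element_payload_bytes(15, 255)
```

Under these assumptions `MAX_FILL_PAYLOAD` evaluates to 269, which matches the limit cited above.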

That is, the sound quality of the downmix-part encoded signal (shaded portion) saturates and does not improve even if its bit rate (size) is increased, while the bit rate (size) of the channel-expansion-part encoded signal (white portion) cannot be increased because of the fill element constraint. With Parametric Coding, the only channel-expansion information was that needed to turn monaural into stereo, so the fill element size constraint could be ignored. With Spatial Codec, however, the stereo signal is expanded to multiple channels such as 5 or 7, so the fill element size constraint becomes an obstacle to improving sound quality.

The present invention has been made in view of these conventional problems. Its object is to provide an audio encoder that, when storing the downmix-part encoded signal and the channel-expansion-part encoded signal separately in a bitstream, keeps the downmix-part encoded signal compatible with the conventional MPEG-standard AAC system as a stereo encoded signal, and that maintains this compatibility even when the channel-expansion-part encoded signal is made large in order to improve sound quality.

To solve the above problems, the invention according to claim 1 of the present application comprises: downmix means for downmixing an M-channel (M > 2) multichannel signal to a stereo signal; first encoding means for encoding the downmix signal to generate a downmix-part encoded signal; second encoding means for encoding information for restoring the downmix signal to a multichannel signal to generate a channel-expansion-part encoded signal; dividing means for dividing the channel-expansion-part encoded signal into A (A ≥ 1) partial signals, each of N bytes or less; and multiplexing means for multiplexing the downmix-part encoded signal and the A partial signals.

The invention according to claim 2 of the present application is the invention according to claim 1, wherein the multiplexing means sets A to 2 or more when the value of M is equal to or greater than a predetermined value.

The invention according to claim 3 of the present application is the invention according to claim 1 or 2, wherein the second encoding means encodes gain difference information and the degree of correlation between predetermined channels of the input multichannel signal, and the multiplexing means sets A to 2 or more when the second encoding means encodes the gain difference information and the degree of correlation at a time resolution equal to or finer than a predetermined time resolution.

The invention according to claim 4 of the present application is the invention according to claim 1 or 2, wherein the second encoding means encodes gain difference information and the degree of correlation between predetermined channels of the input multichannel signal, and the multiplexing means sets A to 2 or more when the second encoding means encodes the gain difference information and the degree of correlation at a frequency resolution equal to or finer than a predetermined frequency resolution.

According to the invention of claim 1, when the downmix-part encoded signal and the channel-expansion-part encoded signal are stored separately in a bitstream, the downmix-part encoded signal remains compatible with the conventional MPEG-standard AAC system as a stereo encoded signal. Moreover, this compatibility is maintained even when the channel-expansion-part encoded signal is made very large in order to improve sound quality.

According to the invention of claim 2, compatibility with the conventional MPEG-standard AAC system is maintained even when the input multichannel signal has a large number of channels.

According to the invention of claim 3, compatibility with the conventional MPEG-standard AAC system is maintained even when sound quality is improved by raising the time resolution.

According to the invention of claim 4, compatibility with the conventional MPEG-standard AAC system is maintained even when sound quality is improved by raising the frequency resolution.

(Embodiment 1)
An audio encoder according to Embodiment 1 of the present invention is described below with reference to the drawings.

FIG. 1 shows the configuration of the audio encoder according to Embodiment 1.
As shown in FIG. 1, the audio encoder comprises: a downmix unit 100 that downmixes an M-channel (M > 2) multichannel signal to a stereo signal; a first encoding unit 101 that encodes the downmix signal to generate a downmix-part encoded signal; a second encoding unit 102 that encodes information for restoring the downmix signal to a multichannel signal to generate a channel-expansion-part encoded signal; a division unit 103 that divides the channel-expansion-part encoded signal into A (A ≥ 1) partial signals, each of N bytes or less; and a multiplexing unit 104 that multiplexes the downmix-part encoded signal and the A partial signals.

The operation of the audio encoder configured as described above is explained below.
First, in the present embodiment, the downmix unit 100 receives a four-channel multichannel signal (front left, front right, rear left, rear right) and downmixes it to a stereo signal. A common method is to take front left + rear left as the new left channel and front right + rear right as the new right channel, but the front and rear channels may also be weighted when they are summed.
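The downmix described here can be sketched as follows. This is only an illustration of the summing rule in the text; the function name and the `front_weight` and `rear_weight` parameters are assumptions, and the patent does not specify weight values.

```python
from typing import List, Sequence, Tuple


def downmix_to_stereo(
    front_left: Sequence[float],
    front_right: Sequence[float],
    rear_left: Sequence[float],
    rear_right: Sequence[float],
    front_weight: float = 1.0,  # hypothetical weight for front channels
    rear_weight: float = 1.0,   # hypothetical weight for rear channels
) -> Tuple[List[float], List[float]]:
    """Downmix four channels to stereo by (optionally weighted) summation."""
    lengths = {len(front_left), len(front_right), len(rear_left), len(rear_right)}
    if len(lengths) != 1:
        raise ValueError("all channels must have the same number of samples")
    # New left = front left + rear left; new right = front right + rear right.
    left = [front_weight * f + rear_weight * r for f, r in zip(front_left, rear_left)]
    right = [front_weight * f + rear_weight * r for f, r in zip(front_right, rear_right)]
    return left, right
```

With the default weights this is the plain sum described above; passing, for example, `rear_weight=0.5` gives the weighted variant.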

Next, the first encoding unit 101 encodes the downmix signal to generate the downmix-part encoded signal. In the present application, the first encoding unit 101 encodes the downmix signal as a stereo signal of the MPEG-standard AAC system.

Next, the second encoding unit 102 encodes the information for restoring the downmix signal to a multichannel signal and generates the channel-expansion-part encoded signal. One possible method is to encode the gain difference, the degree of correlation, and similar quantities between the channel signals before downmixing, and to use the result as the channel-expansion-part encoded signal. In recent years, a technique that encodes such gain differences and degrees of correlation as the information for restoring a monaural downmix of an Lch/Rch stereo signal to the original stereo signal has been standardized as the MPEG-4 Parametric Coding scheme. The channel-expansion-part encoded signal in the present application may be generated using such a technique.
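As a rough illustration of the two quantities named above, the sketch below computes an inter-channel gain difference (as an energy ratio in dB) and a normalized correlation for one block of samples. These particular formulas and the `eps` guard are assumptions chosen for illustration; they are not the quantization or coding defined by the MPEG-4 standard or by this patent.

```python
import math
from typing import Sequence


def gain_difference_db(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Energy ratio of channel a to channel b, in decibels."""
    energy_a = sum(x * x for x in a)
    energy_b = sum(y * y for y in b)
    # eps avoids division by zero and log of zero for silent blocks.
    return 10.0 * math.log10((energy_a + eps) / (energy_b + eps))


def correlation(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Normalized cross-correlation at lag zero, roughly in [-1, 1]."""
    cross = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return cross / (norm + eps)
```

An encoder along these lines would compute such values per channel pair (for example front left versus rear left) and then quantize and encode them.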

Next, the division unit 103 divides the channel-expansion-part encoded signal into A (A ≥ 1) partial signals, each of N bytes or less. The signal may simply be cut into A partial signals of N bytes or less. Alternatively, it may be divided according to which channels the encoded data relates to: for example, the encoded signal for restoring the downmix of front left and rear left to the original signals becomes the first partial signal, and the encoded signal for restoring the downmix of front right and rear right to the original signals becomes the second partial signal.

Alternatively, the signal may be divided by type of encoded information: for example, the encoded gain difference information becomes the first partial signal and the encoded degree of correlation becomes the second partial signal. Or, when the second encoding unit 102 uses an encoding scheme that generates the channel-expansion-part encoded signal for each frequency band of the input, the signal may be divided by physical position in the input signal: the encoded signal for the low-frequency bands becomes the first partial signal and the encoded signal for the high-frequency bands becomes the second partial signal. In that case the division may of course be made by position in time rather than by position in frequency.
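The simplest of these division methods, cutting the encoded signal into pieces of at most N bytes, can be sketched as follows. The function name is hypothetical, and treating an empty input as one empty partial signal (so that A ≥ 1 always holds) is an assumption made for this sketch.

```python
from typing import List


def split_into_partial_signals(encoded: bytes, max_bytes: int) -> List[bytes]:
    """Cut `encoded` into A >= 1 consecutive pieces of at most `max_bytes` bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes (N) must be positive")
    parts = [encoded[i:i + max_bytes] for i in range(0, len(encoded), max_bytes)]
    # Keep A >= 1 even when there is nothing to carry.
    return parts or [b""]
```

For a 600-byte channel-expansion-part encoded signal and N = 269, this yields A = 3 partial signals of 269, 269, and 62 bytes.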

Finally, the multiplexing unit 104 multiplexes the downmix-part encoded signal and the A partial signals. The multiplexing unit 104 multiplexes the downmix-part encoded signal as a stereo encoded signal of the MPEG-standard AAC system, and formats each of the A partial signals as a fill_element of the MPEG-standard AAC system before multiplexing it.

Note that when A is 2 or more, information indicating that no partial signal constitutes the channel-expansion-part encoded signal on its own is also multiplexed. In other words, information indicating which part of the channel-expansion-part encoded signal each of the A fill_elements carries is also multiplexed.

For example, when the division unit 103 simply cuts the channel-expansion-part encoded signal into A (A ≥ 1) partial signals of N bytes or less, the multiplexed information indicates that the first fill element is not complete and continues in the second fill element, that the second fill element, if also incomplete, continues in the third, and that a given fill element is the last one when the signal is complete. Such information may be included in each fill element, or it may be multiplexed on its own as a separate fill element.
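One way to carry the "continues" or "ends here" information inside each fill element is a one-byte flag in front of each piece. The layout below is purely hypothetical: the patent does not define a byte format, and the flag values, the function name, and the 269-byte default are assumptions for illustration.

```python
from typing import List

CONTINUES = 0x01  # hypothetical flag: more pieces follow in the next fill element
LAST = 0x00       # hypothetical flag: this fill element completes the signal


def pack_with_continuation_flags(encoded: bytes, max_fill_bytes: int = 269) -> List[bytes]:
    """Split `encoded` into fill-element payloads, each prefixed by a 1-byte flag."""
    chunk = max_fill_bytes - 1  # one byte of every payload is spent on the flag
    if chunk <= 0:
        raise ValueError("max_fill_bytes must leave room for data")
    pieces = [encoded[i:i + chunk] for i in range(0, len(encoded), chunk)] or [b""]
    payloads = []
    for index, piece in enumerate(pieces):
        flag = LAST if index == len(pieces) - 1 else CONTINUES
        payloads.append(bytes([flag]) + piece)
    return payloads
```

With this layout a 512-byte signal becomes two payloads of 269 and 245 bytes, the first flagged as continuing and the second as last.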

Alternatively, when the division unit 103 divides by channel (the encoded signal for restoring the downmix of front left and rear left becomes the first partial signal, and the encoded signal for restoring the downmix of front right and rear right becomes the second), information indicating which channels each fill element covers is also multiplexed. Such information may be included in each fill element, or it may be multiplexed on its own as a separate fill element.

Alternatively, when the division unit 103 divides by type of encoded information (the encoded gain difference information becomes the first partial signal and the encoded degree of correlation becomes the second), information indicating the type of encoded information each fill element carries is also multiplexed. Such information may be included in each fill element, or it may be multiplexed on its own as a separate fill element.

Alternatively, when the second encoding unit 102 uses an encoding scheme that generates the channel-expansion-part encoded signal for each frequency band of the input signal, and the division unit 103 divides by physical position in the input signal (the encoded signal for the low-frequency bands becomes the first partial signal and the encoded signal for the high-frequency bands becomes the second), information indicating the content of the encoded information each fill element carries is also multiplexed. Such information may be included in each fill element, or it may be multiplexed on its own as a separate fill element. In that case, of course, the division may be made by position in time rather than by position in frequency.

The description above has dealt with the case where A is 2 or more, but A may of course be 1. For example, suppose the second encoding unit 102 decomposes the input multichannel signal into multiple subband signals using a polyphase filter bank or the like, groups the subband signals into several frequency bands or into several time slots, and encodes the gain difference information, the degree of correlation, and so on for each group. If the grouping is coarse, the amount of encoded data is small and A need not be 2 or more. As shown in FIG. 2, when neither the frequency-direction grouping nor the time-direction grouping is very fine, A may be 1; that is, a single fill element suffices. By contrast, as shown in FIG. 3, when the frequency-direction or time-direction grouping is finer than in FIG. 2, the amount of encoded data grows, and A may have to be 2 or more.
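The relationship between grouping granularity and A can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a fixed number of bits per parameter and no entropy coding; the function name, the two-parameters-per-group default, and every numeric value in the usage note are illustrative assumptions, not figures from the patent.

```python
import math


def required_fill_elements(
    num_band_groups: int,
    num_time_groups: int,
    bits_per_parameter: int,
    parameters_per_group: int = 2,  # e.g. one gain difference and one correlation
    max_fill_bytes: int = 269,
) -> int:
    """Estimate A, the number of fill elements needed for one frame."""
    total_bits = (num_band_groups * num_time_groups
                  * parameters_per_group * bits_per_parameter)
    total_bytes = math.ceil(total_bits / 8)
    # At least one fill element is always used (A >= 1).
    return max(1, math.ceil(total_bytes / max_fill_bytes))
```

Under these assumptions, a coarse grouping of 8 bands by 2 time groups at 5 bits per parameter needs 20 bytes and A = 1, while a fine grouping of 32 bands by 16 time groups needs 640 bytes and A = 3.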

In the description above, grouping is performed along both the time axis and the frequency axis, but grouping along only one of them is equally possible. Also, FIGS. 2 and 3 use 32 subbands, but this is only an example; any value, such as 16, 64, or 79, may be used.

As described above, the audio encoder of the present embodiment comprises: downmix means for downmixing an M-channel (M > 2) multichannel signal to a stereo signal; first encoding means for encoding the downmix signal to generate a downmix-part encoded signal; second encoding means for encoding information for restoring the downmix signal to a multichannel signal to generate a channel-expansion-part encoded signal; dividing means for dividing the channel-expansion-part encoded signal into A (A ≥ 1) partial signals, each of N bytes or less; and multiplexing means for multiplexing the downmix-part encoded signal and the A partial signals. The first encoding means encodes the downmix signal as a stereo signal of the MPEG-standard AAC system. The multiplexing means formats each of the A partial signals as a fill_element of the MPEG-standard AAC system and multiplexes the A fill_elements; when A is 2 or more, it also multiplexes information indicating that no partial signal constitutes the channel-expansion-part encoded signal on its own. As a result, the downmix signal conforms to the MPEG-standard AAC system, and the channel-expansion-part encoded signal is stored in fill elements. A Spatial Codec decoder, which can restore the downmix signal to a multichannel signal on the basis of the channel-expansion-part encoded signal, can therefore decode the encoded signal generated by the multiplexing means into a multichannel signal. A legacy AAC decoder that is not a Spatial Codec decoder ignores the channel-expansion-part encoded signal because it is carried in fill elements, and thus produces the downmixed two-channel signal.

In particular, because multiple fill elements are used, compatibility with the conventional MPEG-standard AAC system is maintained even when the channel-expansion-part encoded signal becomes very large. Note that when the signal is spread over multiple fill elements, no single fill element forms a valid channel-expansion-part encoded signal on its own; however, the information that relates them is also multiplexed in fill elements, so a Spatial Codec decoder can correctly interpret the encoded signal generated by the multiplexing means and decode it into a multichannel signal.
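The two decoder behaviours can be illustrated with a toy model of one frame. Everything in this sketch is an assumption made for illustration: a frame is modelled as a list of `(element_id, payload)` pairs rather than a real AAC bitstream, and each fill payload is assumed to start with a hypothetical one-byte flag (`0x01` for "continues", `0x00` for "last"), a layout the patent does not define.

```python
from typing import List, Tuple

Element = Tuple[str, bytes]  # toy model: (syntactic element ID, payload bytes)


def legacy_aac_view(frame: List[Element]) -> List[Element]:
    """What a legacy AAC decoder uses: every element except fill elements."""
    return [element for element in frame if element[0] != "FIL"]


def reassemble_channel_expansion(frame: List[Element]) -> bytes:
    """Concatenate fill payloads until the piece flagged as last (0x00)."""
    data = bytearray()
    for element_id, payload in frame:
        if element_id != "FIL" or not payload:
            continue
        data += payload[1:]        # drop the hypothetical 1-byte flag
        if payload[0] == 0x00:     # last piece of the signal
            break
    return bytes(data)
```

For a frame such as `[("CPE", b"stereo"), ("FIL", b"\x01abc"), ("FIL", b"\x00de")]`, the legacy view keeps only the stereo channel pair element, while the reassembly step recovers `b"abcde"`.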

In the present embodiment, the number of channels of the multichannel signal is set to 4 to simplify the explanation, but it need not be 4; it goes without saying that widely used configurations such as 5.1 channels or 7.1 channels may also be used. Indeed, the more channels the input multichannel signal has, the larger the channel expansion encoded signal becomes, so A more often has to be 2 or more, which is precisely the situation the present invention addresses.

The present invention maintains compatibility with conventional MPEG standard AAC stereo encoded signals even when the Spatial Codec encoded signal becomes large. Used in devices that already employ AAC stereo in practice, such as digital broadcast receivers (so-called 1Seg receivers) and portable audio devices, it allows new devices to offer enhanced functionality while causing no inconvenience to users of legacy devices.

A diagram showing the configuration of the audio encoder according to Embodiment 1.
A diagram showing an input time/frequency signal for which a division number A of 1 suffices in the division section.
A diagram showing an input time/frequency signal for which the division number A in the division section is 2 or more.
A diagram showing the problem that arises when the channel expansion encoded signal is stored in a fill_element.
A diagram explaining why sound quality does not improve even at a high bit rate when the channel expansion encoded signal is stored in a fill_element.
A diagram summarizing the issues that arise when attempting to store the channel expansion encoded signal in a fill_element.
A diagram showing the syntax of the fill_element in the MPEG standard AAC format.

Explanation of symbols

100 Downmix section
101 First encoding section
102 Second encoding section
103 Division section
104 Multiplexing section

Claims

An audio encoder comprising:
downmix means for downmixing an M-channel (M > 2) multichannel signal into a stereo signal;
first encoding means for encoding the downmix signal to generate a downmix encoded signal;
second encoding means for encoding information for restoring the downmix signal to a multichannel signal to generate a channel expansion encoded signal;
dividing means for dividing the channel expansion encoded signal into A (A ≥ 1) partial signals, each of N bytes or less; and
multiplexing means for multiplexing the downmix encoded signal and the A partial signals,
wherein the first encoding means encodes the downmix signal as a stereo signal in the MPEG standard AAC format, and the multiplexing means formats each of the A partial signals as a fill_element of the MPEG standard AAC format, multiplexes the A fill_elements, and, when A is 2 or more, also multiplexes information indicating which part of the channel expansion encoded signal is stored in each of the A fill_elements.

The audio encoder according to claim 1, wherein the multiplexing means sets A to 2 or more when the value of M is equal to or greater than a predetermined value.

The audio encoder according to claim 1 or 2, wherein the second encoding means encodes gain difference information and a degree of correlation between predetermined channels of the input multichannel signal, and
the multiplexing means sets A to 2 or more when the second encoding means encodes the gain difference information and the degree of correlation at a time resolution equal to or finer than a predetermined time resolution.

The audio encoder according to claim 1 or 2, wherein the second encoding means encodes gain difference information and a degree of correlation between predetermined channels of the input multichannel signal, and
the multiplexing means sets A to 2 or more when the second encoding means encodes the gain difference information and the degree of correlation at a frequency resolution equal to or finer than a predetermined frequency resolution.