JP2011008258A - High quality multi-channel audio encoding apparatus and decoding apparatus

Info

Publication number: JP2011008258A
Application number: JP2010143218A
Authority: JP
Inventors: Jeong Il Seo; ジョンイルソ; Jae-Hyoun Yoo; チェ−ヒュンユ; Kyeongok Kang; キョンゴクカン
Original assignee: Electronics and Telecommunications Research Institute ETRI
Current assignee: Electronics and Telecommunications Research Institute ETRI
Priority date: 2009-06-23
Filing date: 2010-06-23
Publication date: 2011-01-13
Also published as: JP2013174891A; KR101283783B1; KR20100138716A

Abstract

PROBLEM TO BE SOLVED: To provide a High Quality Multi-channel Audio (HQMA) encoding apparatus and decoding apparatus that provide compatibility with lower channels by varying the coding according to the characteristics of the audio signal. SOLUTION: A High Quality Multi-channel Audio (HQMA) encoding apparatus and a decoding apparatus are disclosed. They provide compatibility with lower channels by performing channel-based audio encoding or channel-based audio decoding in accordance with the characteristics of the input audio signal.

Description

本発明は、高品質マルチチャネルオーディオ符号化および復号化装置に関し、入力されるオーディオ信号の特性に応じてオーディオ信号符号化を異なるように行うオーディオ符号化および復号化装置に関する。 The present invention relates to a high-quality multi-channel audio encoding and decoding apparatus, and more particularly to an audio encoding and decoding apparatus that performs audio signal encoding differently according to the characteristics of an input audio signal.

The present invention was derived from research conducted as part of the IT resource technology development project of the Broadcasting and Communications Commission [project management number: 2008-F-011-01, project title: Next-generation DTV core technology development (standardization-related) - glasses-free personal 3D broadcasting technology development (continued)].

Multi-channel audio signals such as 5.1-channel signals undergo compression, encoding, and decoding so that they can be transmitted efficiently over a broadcast network or stored on optical media such as DVD or Blu-ray.

Such compression, encoding, and decoding techniques are based on perceptual audio coding, which uses a psychoacoustic model and a time/frequency transform. A channel coding technique that exploits the correlation between adjacent signals of the multi-channel audio signal may additionally be used. Examples of channel coding techniques include AC-3 (Dolby Digital), DTS (Digital Theater System), and AAC (Advanced Audio Coding), which is standardized by MPEG. These techniques have been adopted in domestic and international digital broadcasting standards and in storage format standards for optical media such as DVD, DVD-Audio, DVD-HD, and Blu-ray.

In recent years, spatial audio coding, which represents the spatial information (spatial cues) of a multi-channel audio signal as parameters and compresses it, has been studied in order to provide multi-channel audio services in bandwidth-limited environments such as mobile broadcasting or IPTV. Spatial audio coding downmixes a multi-channel audio signal into a mono or stereo signal and encodes the spatial parameters needed to restore the multi-channel audio signal as side information. A representative example is MPEG Surround.

Realistic broadcasting environments such as 3DTV and UHDTV aim to reproduce vivid, highly immersive audio, and rendering it correctly requires loudspeakers for 10 or more channels. Until now, the 5.1-channel format applied to HDTV and DVD has been widely used, while DVD-HD and Blu-ray support up to 7.1 channels. Furthermore, loudspeakers for 100 or more channels are sometimes used to provide the ultimate sense of sound field in a large-scale audio space such as a theater.

しかし、ほとんどの一般家庭で用いるＴＶおよびラジオは２チャネルのラウドスピーカを用いており、ＨＤＴＶおよびＤＶＤが普遍化して５．１チャネルを再生できるようになっている。 However, TVs and radios used in most ordinary homes use two-channel loudspeakers, and HDTV and DVD are universal and can reproduce 5.1 channels.

一例として、図１に示すようなチャネルエンコーダによって１０チャネル以上のマルチチャネルオーディオ信号を圧縮する場合、５．１チャネル再生端末との互換性を維持することが難しい。 As an example, when a multi-channel audio signal of 10 channels or more is compressed by a channel encoder as shown in FIG. 1, it is difficult to maintain compatibility with a 5.1 channel playback terminal.

これによって、１０チャネル以上などのマルチチャネルオーディオ信号を圧縮しながら下位チャネルとの互換性を提供するマルチチャネルオーディオ符号化および復号化技術が求められる。 Accordingly, there is a need for a multi-channel audio encoding and decoding technique that provides compatibility with lower channels while compressing multi-channel audio signals such as 10 channels or more.

本発明は、オーディオ信号の特性に応じて符号化を異なるようにして下位チャネルとの互換性を提供する高品質マルチチャネルオーディオ符号化および復号化装置を提供する。 The present invention provides a high-quality multi-channel audio encoding and decoding apparatus that provides compatibility with lower channels by changing encoding according to the characteristics of an audio signal.

A multi-channel audio encoding apparatus according to an embodiment of the present invention may include a channel-based audio encoding unit that performs channel-based audio encoding on an audio signal based on characteristics of the input audio signal, and an object-based audio encoding unit that performs object-based audio encoding on the audio signal based on the characteristics of the audio signal.

このとき、チャネル基盤オーディオ符号化部は、入力されるオーディオ信号がマルチチャネルオーディオ信号である場合、マルチチャネルオーディオ信号に対してチャネル基盤オーディオ符号化を行ってビットストリームを生成してもよい。 At this time, if the input audio signal is a multi-channel audio signal, the channel-based audio encoding unit may generate a bitstream by performing channel-based audio encoding on the multi-channel audio signal.

また、オブジェクト基盤オーディオ符号化部は、入力されるオーディオ信号がマルチオブジェクトオーディオ信号である場合、マルチオブジェクトオーディオ信号に対してオブジェクト基盤オーディオ符号化を行ってビットストリームを生成してもよい。 In addition, when the input audio signal is a multi-object audio signal, the object-based audio encoding unit may generate a bitstream by performing object-based audio encoding on the multi-object audio signal.

In addition, the channel-based audio encoding unit may downmix the multi-channel audio signal to generate a first downmix signal, and may encode spatial parameters extracted from the multi-channel audio signal to generate a second enhancement layer bitstream.

The channel-based audio encoding unit may further include a channel synthesis unit that downmixes the first downmix signal to generate a second downmix signal and synthesizes the first downmix signal and an additional channel signal.

In addition, the channel-based audio encoding unit may further include a second channel encoder that encodes the synthesized first downmix signal to generate a first enhancement layer bitstream.

また、チャネル基盤オーディオ符号化部は、第２ダウンミックス信号を符号化して基本階層ビットストリームを生成する第１チャネルエンコーダをさらに含んでもよい。 In addition, the channel-based audio encoding unit may further include a first channel encoder that encodes the second downmix signal to generate a base layer bitstream.

In addition, for the case where the input audio signal is a multi-object audio signal, the object-based audio encoding unit may include: a mixing unit that mixes the multi-object audio signal; a bitstream generation unit that encodes the mixed signal to generate a base layer bitstream; and an object encoder that separates the input multi-object audio signal into mono object, stereo object, and multi-channel object audio signals and multiplexes the separated audio signals using preset rendering information to generate an object layer bitstream.

Here, the first and second enhancement layer bitstreams generated by the channel-based audio encoding unit may be included in the additional data area of the base layer bitstream structure.

Likewise, the object layer bitstream generated by the object-based audio encoding unit may be included in the additional data area of the base layer bitstream structure.

A high-quality multi-channel audio decoding apparatus according to an embodiment of the present invention may include a channel-based audio decoding unit that performs initialization for channel-based audio decoding based on an encoding mode received from a high-quality multi-channel audio encoding apparatus, and an object-based audio decoding unit that performs initialization for object-based audio decoding based on the encoding mode.

このとき、前記チャネル基盤オーディオ復号化部は、高品質マルチチャネルオーディオ符号化装置から受信されたフレームに含まれるビットストリーム階層に基づいてチャネル基盤オーディオの復号化を行ってもよい。 At this time, the channel-based audio decoding unit may perform channel-based audio decoding based on a bitstream hierarchy included in a frame received from the high-quality multi-channel audio encoding device.

また、オブジェクト基盤オーディオ復号化部は、ビットストリーム階層に基づいてオブジェクト基盤オーディオの復号化を行ってもよい。 The object-based audio decoding unit may decode object-based audio based on the bitstream hierarchy.

本発明は、高品質マルチチャネルオーディオ符号化および復号化装置によってＡＣ−３のような再生システムと互換性を維持しながら高品質マルチチャネルオーディオ信号を圧縮および復元することができる。 The present invention can compress and decompress a high quality multi-channel audio signal while maintaining compatibility with a playback system such as AC-3 by a high quality multi-channel audio encoding and decoding apparatus.

また、マルチチャネル信号の復元において、ビットストリーム階層に基づいて段階別にチャネルの拡張技法を適用するため、再生端末の環境に適するチャネル信号を復号化の中間ステップで抽出して用いることができる。 In addition, since the channel expansion technique is applied step by step based on the bitstream hierarchy in the reconstruction of the multi-channel signal, a channel signal suitable for the playback terminal environment can be extracted and used in an intermediate decoding step.

また、オブジェクト別に符号化および復号化を行うことによって、マルチチャネル環境において帯域幅を節約することができる。 Further, by performing encoding and decoding for each object, bandwidth can be saved in a multi-channel environment.

In addition, the invention can not only provide a rendered audio signal that is optimal for the environment of the playback terminal, but also give the user the freedom to control the audio object signals.

FIG. 1 is a block diagram showing the configuration of a 7.1-channel encoder.
FIG. 2 is a diagram showing the configuration of a high-quality audio encoding apparatus.
FIG. 3 is a block diagram showing the configuration of a channel-based audio encoding unit.
FIG. 4 is a block diagram showing the configuration of an object-based audio encoding unit.
FIG. 5 is a diagram showing the structure of an HQMAC bitstream.
FIG. 6 is a diagram showing the structure of an HQMAC bitstream.
FIG. 7 is a diagram showing the structure of an HQMAC bitstream.
FIG. 8 is a block diagram showing the configuration of a channel-based audio decoding unit.
FIG. 9 is a block diagram showing the configuration of an object-based audio decoding unit.

以下、添付の図面に記載される内容を参照して本発明に係る実施形態を詳細に説明する。ただし、本発明が実施形態によって制限されたり限定されることはない。各図面に提示される同じ参照符号は同じ部材を示す。 Hereinafter, embodiments of the present invention will be described in detail with reference to the contents described in the accompanying drawings. However, the present invention is not limited or limited by the embodiment. The same reference numerals provided in each drawing denote the same members.

FIG. 2 is a diagram illustrating the configuration of a high-quality audio encoding apparatus. As shown in FIG. 2, the high-quality audio encoding apparatus (High Quality Multichannel Audio Coding, hereinafter HQMAC) may perform, based on the characteristics of the input audio signal, either channel-based audio encoding (High Quality Multichannel Audio Coding - Channel Based, hereinafter HQMAC-CB) or object-based audio encoding (High Quality Multichannel Audio Coding - Object Based, hereinafter HQMAC-OB) on the audio signal.

一例として、入力されるオーディオ信号がマルチチャネル（Ｍチャネル）オーディオ信号である場合、マルチチャネルオーディオ符号化装置は、マルチチャネルオーディオ信号に対してチャネル基盤オーディオ符号化を行ってもよい。また、入力されるオーディオ信号がマルチオブジェクト（Ｐオブジェクト）オーディオ信号である場合、マルチオブジェクトオーディオ符号化装置はマルチオブジェクトオーディオ信号に対してオブジェクト基盤オーディオ符号化を行ってもよい。高品質オーディオ符号化装置は、入力されるオーディオ信号の特性に応じて、ＨＱＭＡＣ−ＣＢおよびＨＱＭＡＣ−ＯＢの過程を行って高品質オーディオビットストリーム（ＨＱＭＡＣｂｉｔｓｔｒｅａｍ）を生成してもよい。 As an example, when the input audio signal is a multi-channel (M-channel) audio signal, the multi-channel audio encoding device may perform channel-based audio encoding on the multi-channel audio signal. When the input audio signal is a multi-object (P object) audio signal, the multi-object audio encoding device may perform object-based audio encoding on the multi-object audio signal. The high quality audio encoding device may generate a high quality audio bitstream by performing the processes of HQMAC-CB and HQMAC-OB according to the characteristics of the input audio signal.

また、入力されるオーディオ信号がマルチチャネルオーディオ信号とマルチオブジェクトオーディオ信号が混在されている場合、ＨＱＭＡＣ−ＣＢおよびＨＱＭＡＣ−ＯＢの過程をすべて行って高品質オーディオビットストリームを生成してもよい。 Further, when the input audio signal is a mixture of a multi-channel audio signal and a multi-object audio signal, a high-quality audio bit stream may be generated by performing all the processes of HQMAC-CB and HQMAC-OB.
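The selection between HQMAC-CB, HQMAC-OB, or both can be sketched as a small dispatch on the makeup of the input. This is only an illustration: the names `EncodingMode` and `select_encoding_mode`, and the idea of describing the input by a channel count and an object count, are assumptions and do not appear in the source.

```python
from enum import Flag, auto


class EncodingMode(Flag):
    """Hypothetical mode flags; the source only names HQMAC-CB and HQMAC-OB."""
    CHANNEL_BASED = auto()  # HQMAC-CB
    OBJECT_BASED = auto()   # HQMAC-OB


def select_encoding_mode(num_channels: int, num_objects: int) -> EncodingMode:
    """Pick the encoding path(s) from the makeup of the input signal.

    num_channels: number of channel signals in the input (M), 0 if none.
    num_objects:  number of object signals in the input (P), 0 if none.
    """
    if num_channels <= 0 and num_objects <= 0:
        raise ValueError("input contains neither channel nor object signals")
    mode = EncodingMode(0)
    if num_channels > 0:
        mode |= EncodingMode.CHANNEL_BASED
    if num_objects > 0:
        mode |= EncodingMode.OBJECT_BASED
    return mode
```

A mixed input sets both flags, which corresponds to running both the HQMAC-CB and HQMAC-OB processes.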

以下は、図３を参照してチャネル基盤オーディオ信号符号化の技術に対して説明することとする。 Hereinafter, the channel-based audio signal encoding technique will be described with reference to FIG.

図３は、チャネル基盤オーディオ符号化部の構成を示すブロック図である。図３を参照すれば、チャネル基盤オーディオ符号化部２００は、高効率チャネルエンコーダ２１０、チャネル合成部２３０、第２チャネルエンコーダ２５０、および第１チャネルエンコーダ２７０を含んでもよい。 FIG. 3 is a block diagram showing the configuration of the channel-based audio encoding unit. Referring to FIG. 3, the channel-based audio encoding unit 200 may include a high efficiency channel encoder 210, a channel synthesis unit 230, a second channel encoder 250, and a first channel encoder 270.

The high efficiency channel encoder (hereinafter, HECE) 210 may downmix the input multi-channel (M-channel) audio signal to N channels (M2N downmixing 211) to generate a first downmix signal. For example, the first downmix signal may be formed by downmixing 22.2 channels (M = 24) to 10.2 channels (N = 12).
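A downmix from M to N channels is commonly expressed as an N-by-M gain matrix applied to each frame of samples. The sketch below assumes that formulation; the source does not specify the downmix coefficients, so the pairwise-averaging matrix is purely a placeholder, and both function names are hypothetical.

```python
def downmix(frame, matrix):
    """Apply an N x M gain matrix to one frame of M samples (one per channel)."""
    if any(len(row) != len(frame) for row in matrix):
        raise ValueError("every matrix row needs one gain per input channel")
    return [sum(gain * sample for gain, sample in zip(row, frame)) for row in matrix]


def pairwise_average_matrix(m):
    """Placeholder M-to-M/2 matrix that averages adjacent channel pairs.

    This is NOT the patent's downmix rule; it only makes the example runnable.
    """
    if m % 2:
        raise ValueError("m must be even")
    return [[0.5 if col // 2 == row else 0.0 for col in range(m)]
            for row in range(m // 2)]


# 22.2 channels (M = 24) reduced to 10.2 channels (N = 12), as in the text.
first_downmix = downmix([1.0] * 24, pairwise_average_matrix(24))
```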

また、高効率チャネルエンコーダ２１０は、マルチチャネルオーディオ信号から空間情報を分析して空間パラメータを抽出２１３してもよい。このとき、空間パラメータは、Ｎチャネルでダウンミックスされた第１ダウンミックス信号がＭ個のマルチチャネルオーディオ信号に復元するため必要なパラメータを含んでもよい。 Further, the high-efficiency channel encoder 210 may extract 213 spatial parameters by analyzing spatial information from the multi-channel audio signal. At this time, the spatial parameter may include a parameter necessary for the first downmix signal downmixed by N channels to be restored to M multichannel audio signals.

In addition, the high efficiency channel encoder 210 may encode the spatial parameters extracted from the multi-channel audio signal to generate a second enhancement layer bitstream (enhancement layer II bitstream).

The channel synthesis unit 230 may downmix the first downmix signal, which has been downmixed to N channels, to L channels (N2L downmixing 231) to generate a second downmix signal. For example, the second downmix signal may be formed by downmixing 10.2 channels (N = 12) to 5.1 channels (L = 6).

Here, the channel synthesis unit 230 may perform, on the first downmix signal, the spatial information analysis 233 needed to restore the N-channel first downmix signal from the second downmix signal downmixed to L channels. As a result, a K-channel additional channel signal may be synthesized from the N-channel first downmix signal. The number of channels (K) of the additional channel signal may be less than or equal to the difference (N - L) between the number of channels (N) of the first downmix signal and the number of channels (L) of the second downmix signal.

The second channel encoder 250 may encode the synthesized K-channel signal to generate a first enhancement layer bitstream (enhancement layer I bitstream). The synthesized K-channel signal, together with the L-channel downmix generated in the N2L downmixing 231, may constitute the first downmix signal. The second channel encoder 250 may generate the first enhancement layer bitstream using a high quality channel encoding (hereinafter, HQCE) technique such as AC-3 or AAC. For example, if the base layer bitstream carries 5.1 channels (L = 6) and the first enhancement layer bitstream carries 5.1 channels (K = 6), the two bitstreams together may form 10.2 channels (N = 12).
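The relationship between the channel counts M, N, L, and K can be checked with a few lines of arithmetic. The function name and the returned dictionary keys below are hypothetical, and the requirement M >= N >= L is an assumption inferred from the two successive downmix steps; only K <= N - L is stated in the text.

```python
def check_layer_channels(m, n, l, k):
    """Validate channel counts for the three-layer channel-based scheme.

    m: input channels, n: first downmix, l: second downmix (base layer),
    k: additional channels carried by the first enhancement layer.
    """
    if not (m >= n >= l > 0):
        raise ValueError("expected m >= n >= l > 0 (assumed from the downmix chain)")
    if not (0 <= k <= n - l):
        raise ValueError("k must satisfy 0 <= k <= n - l")
    return {
        "base_layer_channels": l,
        "enhancement_1_channels": k,
        "base_plus_enhancement_1": l + k,
        "full_restore_channels": m,
    }


# Example values from the text: M = 24, N = 12, L = 6, K = 6.
layout = check_layer_channels(24, 12, 6, 6)
```

With these values the base layer and the first enhancement layer together carry L + K = 12 channels, which equals N.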

第１チャネルエンコーダ２７０は、第２ダウンミックス信号を符号化して基本階層ビットストリームを生成してもよい。ここで、基本階層ビットストリームによって構成されるチャネルは、５．１チャネル（Ｌ＝６）で構成してもよい。 The first channel encoder 270 may generate a base layer bitstream by encoding the second downmix signal. Here, the channel constituted by the base layer bit stream may be constituted by 5.1 channels (L = 6).

このとき、第１チャネルエンコーダ２７０としては、５．１チャネルエンコーダのようなマルチチャネルエンコーダが用いられてもよい。ここで、生成される第１および第２向上階層ビットストリームは基本階層ビットストリームに多重化されてもよい。これによって、基本階層のみを復号化することのできるマルチチャネルデコーダでも１０チャネル以上のオーディオ信号に対して圧縮および符号化によって生成されるビットストリームを処理することができる。 At this time, as the first channel encoder 270, a multi-channel encoder such as a 5.1 channel encoder may be used. Here, the generated first and second enhancement layer bitstreams may be multiplexed into the base layer bitstream. As a result, even a multi-channel decoder capable of decoding only the base layer can process a bit stream generated by compression and encoding on an audio signal of 10 channels or more.

The high-quality multi-channel audio encoding apparatus may then transmit an HQMAC bitstream, composed of the generated first and second enhancement layer bitstreams and the base layer bitstream, to the high-quality multi-channel audio decoding apparatus. The HQMAC bitstream may consist of an HQMAC header and an HQMAC frame.

また、第１および第２向上階層ビットストリームのうちのいずれか１つ、またはすべてが存在しないこともある。また、高品質マルチチャネルオーディオ符号化装置は、第１および第２向上階層ビットストリームそれぞれのチャネル数を決めてもよい。すると、決められたチャネル数はＨＱＭＡＣビットストリームのヘッダに含まれてもよい。 Also, any one or all of the first and second enhancement layer bitstreams may not exist. Further, the high quality multi-channel audio encoding device may determine the number of channels for each of the first and second enhancement layer bitstreams. Then, the determined number of channels may be included in the header of the HQMAC bitstream.

図４は、オブジェクト基盤オーディオ符号化部の構成を示すブロック図である。図４を参照すれば、オブジェクト基盤オーディオ符号化部３００は、ミックス部３１０、ビットストリーム生成部３３０、オブジェクトエンコーダ３５０を含んでもよい。 FIG. 4 is a block diagram illustrating a configuration of the object-based audio encoding unit. Referring to FIG. 4, the object-based audio encoding unit 300 may include a mixing unit 310, a bitstream generation unit 330, and an object encoder 350.

ミックス部３１０は、外部から入力されるミックス情報を用いてＰ個のマルチオブジェクトオーディオ信号をＬチャネルにミックスしてもよい。 The mixing unit 310 may mix the P multi-object audio signals into the L channel using mix information input from the outside.
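Mixing P object signals into L channels can be sketched as a weighted sum, assuming the externally supplied mix information takes the form of one gain per object and output channel. That representation, and the function name `mix_objects`, are assumptions; the source does not define the format of the mix information.

```python
def mix_objects(objects, gains):
    """Mix P mono object signals into L channels.

    objects: list of P equal-length sample lists.
    gains:   list of P rows, each holding L gains (the assumed 'mix information').
    Returns a list of L channel sample lists.
    """
    if not objects or len(objects) != len(gains):
        raise ValueError("need one gain row per object")
    num_samples = len(objects[0])
    num_channels = len(gains[0])
    channels = [[0.0] * num_samples for _ in range(num_channels)]
    for samples, row in zip(objects, gains):
        if len(samples) != num_samples or len(row) != num_channels:
            raise ValueError("inconsistent object length or gain row length")
        for ch, gain in enumerate(row):
            for i, sample in enumerate(samples):
                channels[ch][i] += gain * sample
    return channels


# Two objects mixed into two channels: the first goes fully to channel 0,
# the second is split evenly across both channels.
mixed = mix_objects([[1.0, 2.0], [10.0, 20.0]], [[1.0, 0.0], [0.5, 0.5]])
```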

ビットストリーム生成部３３０は、ミックスされたＬチャネルオーディオ信号を符号化して基本階層ビットストリームを生成してもよい。このとき、ビットストリーム生成部３３０は、５．１チャネルエンコーダのようなマルチャネルエンコーダを用いて基本階層ビットストリームを生成してもよい。 The bit stream generation unit 330 may generate a base layer bit stream by encoding the mixed L channel audio signal. At this time, the bit stream generation unit 330 may generate a base layer bit stream using a multi-channel encoder such as a 5.1 channel encoder.

オブジェクトエンコーダ３５０は、Ｐ個のマルチオブジェクトオーディオ信号をモノ、ステレオ、およびマルチチャネルオブジェクトオーディオ信号にそれぞれ分離し、分離されたオブジェクトそれぞれに対して符号化を行ってもよい。 The object encoder 350 may separate the P multi-object audio signals into mono, stereo, and multi-channel object audio signals, and perform encoding on each separated object.

一例として、モノオブジェクトオーディオ信号はモノチャネルエンコーダ３５１によって符号化され、ステレオオブジェクトオーディオ信号はステレオチャネルエンコーダ３５２によって符号化され、マルチチャネルオブジェクトオーディオ信号はマルチチャネルエンコーダ３５３によって符号化される。このとき、モノチャネルエンコーダ３５１、ステレオチャネルエンコーダ３５２、およびマルチチャネルエンコーダ３５３では、ＡＣ−３、ＡＡＣ、およびＭＰ３等の符号化技術を用いて分離したオブジェクトオーディオ信号を符号化してもよい。 As an example, the mono object audio signal is encoded by the mono channel encoder 351, the stereo object audio signal is encoded by the stereo channel encoder 352, and the multi channel object audio signal is encoded by the multi channel encoder 353. At this time, the mono channel encoder 351, the stereo channel encoder 352, and the multi-channel encoder 353 may encode the separated object audio signal using an encoding technique such as AC-3, AAC, and MP3.
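The separation of objects by channel layout can be sketched as a routing step that sends each object to the mono, stereo, or multi-channel encoder. The function name and the representation of an object as a name with a channel count are hypothetical; the actual encoders (AC-3, AAC, MP3, and so on) are not modeled here.

```python
def route_objects(objects):
    """Group objects by channel count for the matching encoder.

    objects: dict mapping an object name to its number of channels.
    Returns a dict with the keys 'mono', 'stereo', and 'multichannel'.
    """
    routes = {"mono": [], "stereo": [], "multichannel": []}
    for name, channel_count in objects.items():
        if channel_count < 1:
            raise ValueError(f"object {name!r} has no channels")
        if channel_count == 1:
            routes["mono"].append(name)        # mono channel encoder 351
        elif channel_count == 2:
            routes["stereo"].append(name)      # stereo channel encoder 352
        else:
            routes["multichannel"].append(name)  # multi-channel encoder 353
    return routes


routes = route_objects({"dialogue": 1, "music": 2, "ambience": 6})
```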

すると、マルチプレクサ３５４は、符号化されたオブジェクト符号化ビットストリームをレンダリング情報とともに多重化してオブジェクト階層ビットストリームを生成してもよい。ここで、オブジェクト符号化ビットストリームは、符号化されたモノオブジェクトオーディオ信号、ステレオオブジェクトオーディオ信号、およびマルチチャネルオブジェクトオーディオ信号を含んでもよい。 Then, the multiplexer 354 may multiplex the encoded object encoded bitstream together with the rendering information to generate an object hierarchy bitstream. Here, the object encoded bitstream may include an encoded mono object audio signal, a stereo object audio signal, and a multi-channel object audio signal.

このとき、レンダリング情報は、ヘッドホン、ラウドスピーカ、ラウドスピーカの個数、ラウドスピーカの位置のような再生環境に応じて予め設定されてもよい。また、レンダリング情報は、３次元の空間上に仮想的に配置される位置を直接的に表現できる情報を含んでもよい。 At this time, the rendering information may be set in advance according to the reproduction environment such as the headphones, the loudspeakers, the number of loudspeakers, and the position of the loudspeakers. In addition, the rendering information may include information that can directly represent a position virtually arranged in a three-dimensional space.

The high-quality multi-channel audio encoding apparatus may then transmit an HQMAC bitstream, composed of the generated object layer bitstream and the base layer bitstream, to the high-quality multi-channel audio decoding apparatus. The HQMAC bitstream may consist of an HQMAC header and an HQMAC frame. The HQMAC header may include the decoding information needed to initialize the decoder, such as the encoding mode, the number of channels, the quantization bits, the quantization frequency, the additional layer configuration information, and the number of objects.

ここで、エンコーディングモードは、ＨＱＭＡＣで生成されたビットストリームがＨＱＭＡＣ−ＣＢまたはＨＱＭＡＣ−ＯＢに符号化されたかを表す情報を含んでもよい。また、付加階層構成情報は、ＨＱＭＡＣから送信されるビットストリームがオブジェクト階層または第１および第２向上階層ビットストリームを含むか否かを表わしてもよい。 Here, the encoding mode may include information indicating whether the bit stream generated by HQMAC is encoded by HQMAC-CB or HQMAC-OB. Further, the additional layer configuration information may indicate whether the bit stream transmitted from the HQMAC includes an object layer or first and second enhancement layer bit streams.
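The header fields listed above can be illustrated with a small record type and a fixed binary layout. The source names the fields but not their sizes, order, or encoding, so everything below (the 9-byte layout, the flag values, and the names `HqmacHeader`, `pack_header`, and `unpack_header`) is an assumption made only to give a concrete, runnable picture.

```python
import struct
from dataclasses import dataclass, astuple

# Hypothetical flag values; the source does not define any numeric codes.
MODE_CB, MODE_OB = 0x01, 0x02
LAYER_ENH1, LAYER_ENH2, LAYER_OBJECT = 0x01, 0x02, 0x04

# Assumed layout: mode, channels, bits (1 byte each), frequency (4 bytes),
# layer flags, object count (1 byte each), big-endian.
HEADER_FORMAT = ">BBBIBB"


@dataclass(frozen=True)
class HqmacHeader:
    encoding_mode: int       # HQMAC-CB and/or HQMAC-OB
    num_channels: int
    quantization_bits: int
    frequency_hz: int        # the 'quantization frequency' named in the text
    layer_flags: int         # additional layer configuration information
    num_objects: int


def pack_header(header: HqmacHeader) -> bytes:
    return struct.pack(HEADER_FORMAT, *astuple(header))


def unpack_header(data: bytes) -> HqmacHeader:
    size = struct.calcsize(HEADER_FORMAT)
    return HqmacHeader(*struct.unpack(HEADER_FORMAT, data[:size]))


header = HqmacHeader(MODE_CB, 24, 16, 48000, LAYER_ENH1 | LAYER_ENH2, 0)
encoded = pack_header(header)
```

A decoder would read such a header first and use the encoding mode and layer flags to decide which decoding units to initialize.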

Alternatively, a parameter-based multi-object audio encoder such as MPEG SAOC (Spatial Audio Object Coding) may be used as the object encoder 350. In this case, the downmix signal may be generated directly by the object encoder 350, or may be the L-channel object audio signal output from the mixing unit 310. The object-encoded bitstream generated by the object encoder 350 may then include the downmix signal and object side data composed of spatial parameters.

So far, the processes by which the HQMAC-CB encoding unit 200 generates the base layer, first enhancement layer, and second enhancement layer bitstreams, and by which the HQMAC-OB encoding unit 300 generates the base layer and object layer bitstreams, have been described. If the base layer bitstream generated by the HQMAC-CB encoding unit 200 and the HQMAC-OB encoding unit 300 is identical to an ordinary L-channel (for example, 5.1-channel) bitstream, the bitstreams added to the base layer may be placed in the additional data area of the base layer bitstream structure.

That is, as shown in FIG. 5, the HQMAC header and the HQMAC frame constituting the HQMAC bitstream may be located in the additional data areas of the base layer header and the base layer frame, respectively. A 5.1-channel decoder that can decode the base layer bitstream ignores the additional data area, so it can parse the base layer bitstream within the HQMAC bitstream and reproduce a 5.1-channel audio signal. The base layer header may include a legacy L-channel header, and the base layer frame may include a legacy L-channel frame.
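The backward-compatibility idea, where a legacy decoder reads the base layer and skips the additional data area, can be sketched with a toy frame container. The length-prefixed layout below is invented for illustration and is not the AC-3 or AAC frame syntax; the function names are likewise hypothetical.

```python
import struct


def pack_frame(base_payload: bytes, additional_data: bytes = b"") -> bytes:
    """Toy layout: [u16 base length][base][u16 additional length][additional]."""
    return (struct.pack(">H", len(base_payload)) + base_payload
            + struct.pack(">H", len(additional_data)) + additional_data)


def legacy_parse(frame: bytes) -> bytes:
    """A legacy L-channel decoder reads only the base payload and stops."""
    (base_len,) = struct.unpack_from(">H", frame, 0)
    return frame[2:2 + base_len]


def hqmac_parse(frame: bytes):
    """An HQMAC-aware decoder also reads the additional data area."""
    (base_len,) = struct.unpack_from(">H", frame, 0)
    base = frame[2:2 + base_len]
    offset = 2 + base_len
    (extra_len,) = struct.unpack_from(">H", frame, offset)
    additional = frame[offset + 2:offset + 2 + extra_len]
    return base, additional


frame = pack_frame(b"5.1-base-layer", b"enhancement-layers")
```

Both parsers accept the same bytes: the legacy one returns only the base layer, while the HQMAC-aware one also recovers the enhancement or object layer data.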

より詳しくは、図６を参照すれば、ＨＱＭＡＣ−ＣＢ符号化部２００によって生成されるＨＱＭＡＣ−ＣＢビットストリーム６００は、チャネル基盤のヘッダおよびフレーム（以下、ＨＱＭＡＣ−ＣＢヘッダおよびＨＱＭＡＣ−ＣＢフレーム）を含んでもよい。このとき、ＨＱＭＡＣ−ＣＢヘッダ６１０は、基本階層ヘッダ６１１およびＨＱＭＡＣ−ＣＢヘッダ６１２を含んでもよい。 More specifically, referring to FIG. 6, the HQMAC-CB bitstream 600 generated by the HQMAC-CB encoder 200 includes a channel-based header and a frame (hereinafter referred to as an HQMAC-CB header and an HQMAC-CB frame). May be included. At this time, the HQMAC-CB header 610 may include a base layer header 611 and an HQMAC-CB header 612.

また、ＨＱＭＡＣ−ＣＢフレーム６２０は、基本階層フレーム６２１およびＨＱＭＡＣ−ＣＢフレーム６２２を含んでもよい。このとき、基本階層ヘッダ６１１とフレーム６２１は、Ｌチャネル（一例として、５．１チャネル）ビットストリームの構造を有してもよい。すると、Ｌチャネルビットストリーム構造の付加データ領域にＨＱＭＡＣ−ＣＢヘッダ６１２とＨＱＭＡＣ−ＣＢフレーム６２２が位置してもよい。ここで、ＨＱＭＡＣ−ＣＢフレーム６２２は、第１向上階層ビットストリーム６２１−１および第２向上階層ビットストリーム６２１−２を含んでもよい。 Further, the HQMAC-CB frame 620 may include a base layer frame 621 and an HQMAC-CB frame 622. At this time, the base layer header 611 and the frame 621 may have an L channel (5.1 channel, for example) bit stream structure. Then, the HQMAC-CB header 612 and the HQMAC-CB frame 622 may be located in the additional data area of the L channel bit stream structure. Here, the HQMAC-CB frame 622 may include a first enhancement layer bitstream 621-1 and a second enhancement layer bitstream 621-2.

The HQMAC-CB frame 622 may include at least one of the first and second enhancement layer bitstreams, or may include neither of them. That is, the first and second enhancement layer bitstreams may be used selectively according to the characteristics of the input audio signal and the user's selection.

Similarly, referring to FIG. 7, the HQMAC-OB bitstream 700 generated by the HQMAC-OB encoding unit 300 may include an object-based header and frame (hereinafter, HQMAC-OB header and HQMAC-OB frame). As described with reference to FIG. 5, the HQMAC-OB header 710 and the HQMAC-OB frame 720 may be located in the additional data area of the base layer bitstream.

In addition, the HQMAC-OB header 710 may include decoding information for HQMAC-OB decoding and rendering information (RI). The rendering information may be used to render the decoded object audio signals on multi-channel loudspeakers.

The rendering information may also be updated over time. In that case, the updated rendering information 722-2 may be placed after the object layer bitstream 722-1. Because the rendering information does not need to change in every frame, a flag may be used to signal whether an update is present, so that updated rendering information is carried only when a change occurs.
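The flag-based update of rendering information can be sketched as follows: the decoder starts from the rendering information in the header and replaces it only when a frame signals an update. The dictionary keys `ri_updated` and `rendering_info` and the function name are hypothetical; the source does not define the flag's name or position.

```python
def resolve_rendering_info(header_ri, frames):
    """Return the rendering information in effect for each frame.

    header_ri: rendering information carried in the HQMAC-OB header.
    frames:    list of dicts; a frame with 'ri_updated' set to True carries
               new rendering information under 'rendering_info'.
    """
    current = header_ri
    in_effect = []
    for frame in frames:
        if frame.get("ri_updated", False):
            current = frame["rendering_info"]
        in_effect.append(current)
    return in_effect


timeline = resolve_rendering_info(
    "front-left",
    [{}, {"ri_updated": True, "rendering_info": "rear-right"}, {}],
)
```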

When the HQMAC-CB encoding unit and the HQMAC-OB encoding unit are used at the same time, the headers and frames of both HQMAC-CB and HQMAC-OB may all be present.

The high-quality multi-channel audio decoding apparatus is described below. The high-quality multi-channel audio decoding apparatus may include a channel-based audio decoding unit 800 and an object-based audio decoding unit 900.

The high-quality multi-channel audio decoding apparatus may receive, from the high-quality multi-channel audio encoding apparatus, an HQMAC bitstream composed of an HQMAC header and HQMAC frames. The decoding apparatus may then perform channel-based audio decoding or object-based audio decoding on the received HQMAC bitstream based on the encoding mode included in the HQMAC header.
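
As a rough illustration of this mode-based selection, the sketch below dispatches a frame to stub decoders according to an encoding mode read from the header. The enum values, the `encoding_mode` header key, and the stub functions are assumptions made for the example; the document does not specify how the encoding mode is represented. The combined mode reflects the case in which both HQMAC-CB and HQMAC-OB headers and frames are present.

```python
from enum import Enum
from typing import List


class EncodingMode(Enum):
    """Hypothetical encoding-mode values carried in the HQMAC header."""
    CB = "HQMAC-CB"
    OB = "HQMAC-OB"
    CB_AND_OB = "HQMAC-CB+OB"  # both encoders were used at the same time


def decode_channel_based(frame: dict) -> str:
    # Stub standing in for the channel-based audio decoding unit 800.
    return "channel-based output"


def decode_object_based(frame: dict) -> str:
    # Stub standing in for the object-based audio decoding unit 900.
    return "object-based output"


def decode(header: dict, frame: dict) -> List[str]:
    """Run the decoder(s) selected by the encoding mode in the header."""
    mode = header["encoding_mode"]
    outputs = []
    if mode in (EncodingMode.CB, EncodingMode.CB_AND_OB):
        outputs.append(decode_channel_based(frame))
    if mode in (EncodingMode.OB, EncodingMode.CB_AND_OB):
        outputs.append(decode_object_based(frame))
    return outputs
```

For example, `decode({"encoding_mode": EncodingMode.OB}, {})` runs only the object-based stub.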

FIG. 8 is a block diagram showing the configuration of the channel-based audio decoding unit. Referring to FIG. 8, the channel-based audio decoding unit 800 may include a second channel decoder 810, a first channel decoder 820, an upmix unit 830, and a high-efficiency channel decoder 840. The channel-based audio decoding unit 800 may decode the HQMAC bitstream based on the bitstream layers included in the received HQMAC frame. For HQMAC-CB, the bitstream layers may include the base layer bitstream and the first and second enhancement layer bitstreams.

When the encoding mode is HQMAC-CB, the second channel decoder 810 may decode the first enhancement layer bitstream included in the HQMAC frame to restore the synthesized K-channel signal. A general high-quality channel decoder such as AAC or AC-3 may be used as the second channel decoder 810.

As an example, when the HQMAC bitstream transmitted from the high-quality multi-channel audio encoding apparatus has been encoded by the channel-based audio encoding unit 200, the second channel decoder 810 may decode the first enhancement layer bitstream to restore the synthesized K-channel signal. That is, the first downmix signal having N channels may be restored using the K channels obtained with the second channel decoder 810 and the L channels obtained with the first channel decoder 820.

The first channel decoder 820 may decode the base layer bitstream included in the HQMAC frame to restore the L-channel second downmix signal. That is, the first channel decoder 820 restores, from the base layer bitstream, the second downmix signal composed of L channels. A general 5.1-channel decoder may be used as the first channel decoder 820.

The upmix unit 830 may upmix the second downmix signal (L channels) together with the K-channel signal restored by the second channel decoder 810 to restore the N-channel first downmix signal.

The high-efficiency channel decoder 840 may restore the multi-channel (M-channel) audio signal using the first downmix signal and the second enhancement layer bitstream included in the HQMAC frame. The N-channel first downmix signal restored by the upmix unit 830 and the L-channel second downmix signal restored by the first channel decoder 820 may also be output directly. That is, the first downmix signal and the second downmix signal may each serve as an output signal of the channel-based audio decoding unit 800.
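
The layered decoding path described for FIG. 8 can be summarized in a short sketch that reports which output signals are reachable from the layers present in a frame. The `CbFrame` type and the `available_outputs` function are hypothetical, and the sketch assumes the embodiment in which the M-channel signal is rebuilt from the first downmix signal, so the second enhancement layer is only useful when the first enhancement layer is also present.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CbFrame:
    """Hypothetical HQMAC-CB frame with optional enhancement layers."""
    base_layer: bytes
    enh1: Optional[bytes] = None  # first enhancement layer (K-channel signal)
    enh2: Optional[bytes] = None  # second enhancement layer (spatial parameters)


def available_outputs(frame: CbFrame) -> List[str]:
    """List the signals the channel-based decoding unit could output for this frame."""
    # First channel decoder 820: the base layer always yields the second downmix.
    outputs = ["second downmix (L channels)"]
    if frame.enh1 is not None:
        # Second channel decoder 810 plus upmix unit 830 yield the first downmix.
        outputs.append("first downmix (N channels)")
        if frame.enh2 is not None:
            # High-efficiency channel decoder 840 yields the full M-channel signal.
            outputs.append("multi-channel signal (M channels)")
    return outputs
```

A receiver that only understands the base layer would use the first entry, which is how the layered structure keeps compatibility with lower channel counts.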

FIG. 9 is a block diagram showing the configuration of the object-based audio decoding unit. Referring to FIG. 9, the object-based audio decoding unit 900 may include a bitstream processing unit 910, an object decoder 930, and a rendering unit 950. The object-based audio decoding unit 900 may decode the HQMAC bitstream based on the bitstream layers included in the received HQMAC frame. For HQMAC-OB, the bitstream layers may include the base layer bitstream and the object layer bitstream.

The bitstream processing unit 910 may use the base layer bitstream to restore the audio signal that was mixed into L channels by the object-based audio encoding unit 300. As an example, the bitstream processing unit 910 may restore the L-channel mixed audio signal using a 5.1-channel decoder.

The object decoder 930 may decode each of the per-object encoded bitstreams included in the object layer bitstream to restore the multi-object audio signal. That is, the object decoder 930 may restore the multi-object audio signal without using the base layer bitstream. The per-object encoded bitstreams may include encoded mono object, stereo object, and multi-channel object bitstreams.

As an example, the mono channel decoder 931 may decode the encoded mono object bitstream, the stereo channel decoder 933 may decode the encoded stereo object bitstream, and the multi-channel decoder 935 may decode the encoded multi-channel object bitstream.
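
The routing of per-object bitstreams to the matching decoder can be sketched as a simple lookup. The object-kind labels, the `(kind, payload)` tuple layout, and the stub decoders below are illustrative assumptions; the document does not define how object types are signalled within the object layer bitstream.

```python
from typing import Callable, Dict, List, Tuple


def decode_mono(payload: bytes) -> str:
    # Stub standing in for the mono channel decoder 931.
    return f"mono:{len(payload)}"


def decode_stereo(payload: bytes) -> str:
    # Stub standing in for the stereo channel decoder 933.
    return f"stereo:{len(payload)}"


def decode_multichannel(payload: bytes) -> str:
    # Stub standing in for the multi-channel decoder 935.
    return f"multichannel:{len(payload)}"


OBJECT_DECODERS: Dict[str, Callable[[bytes], str]] = {
    "mono": decode_mono,
    "stereo": decode_stereo,
    "multichannel": decode_multichannel,
}


def decode_object_layer(object_layer: List[Tuple[str, bytes]]) -> List[str]:
    """Decode each per-object bitstream with the decoder for its object kind."""
    return [OBJECT_DECODERS[kind](payload) for kind, payload in object_layer]
```

Note that nothing in this path touches the base layer, which matches the statement that the object decoder can restore the multi-object signal without it.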

The rendering unit 950 may use the rendering information to render the decoded mono, stereo, and multi-channel objects into an output signal in a reproducible form. As an example, the rendering unit 950 may generate Q-channel loudspeaker signals as the output signal. The rendering information may be included in the HQMAC bitstream transmitted from the high-quality multi-channel audio encoding apparatus.
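
One common way to render objects to loudspeakers is a gain matrix, sketched below for mono objects only. This is an assumption made for illustration: the document does not state what the rendering information contains, and the `render` function and the `gains[q][i]` layout are hypothetical.

```python
from typing import List


def render(objects: List[List[float]], gains: List[List[float]]) -> List[List[float]]:
    """Mix mono objects into Q loudspeaker feeds.

    objects[i] holds the samples of object i; gains[q][i] is the weight of
    object i in loudspeaker q (assumed form of the rendering information).
    """
    num_samples = len(objects[0])
    return [
        [sum(g * obj[n] for g, obj in zip(row, objects)) for n in range(num_samples)]
        for row in gains
    ]


# Two objects of two samples each, rendered to Q = 2 loudspeakers.
objects = [[1.0, 2.0], [10.0, 20.0]]
gains = [[1.0, 0.0],   # loudspeaker 0 plays only object 0
         [0.5, 0.5]]   # loudspeaker 1 plays an equal mix of both objects
speakers = render(objects, gains)
```

Because the gains are supplied separately from the audio, updated rendering information could change the loudspeaker mix without re-decoding the objects.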

The rendering unit 950 may also selectively use the audio signal restored from the base layer included in the HQMAC frame. That is, the rendering unit 950 may use the L-channel mixed audio signal restored by the bitstream processing unit 910.

When the input high-quality multi-channel audio bitstream contains both an HQMAC-CB bitstream and an HQMAC-OB bitstream, the output signals of the respective decoding processes may be multiplexed and output.

For convenience of explanation, the HQMAC-CB bitstream and the HQMAC-OB bitstream have been described separately above, but both may represent HQMAC bitstreams. That is, the HQMAC-CB bitstream may be an HQMAC bitstream generated by HQMAC-CB encoding, and the HQMAC-OB bitstream may be an HQMAC bitstream generated by HQMAC-OB encoding.

In addition, the channel-based audio encoding unit has been described above with reference to FIG. 3 as performing channel-based audio encoding using the high-efficiency channel encoder and the second channel encoder together with the first channel encoder. This is only one embodiment, and the high-efficiency channel encoder and the second channel encoder may be used selectively.

That is, channel-based audio encoding may use at least one of the high-efficiency channel encoder and the second channel encoder, or may use neither and be performed by the first channel encoder alone.

When the high-efficiency channel encoder and the second channel encoder are used selectively in this way, the channel synthesis unit may apply downmixing selectively. That is, when the high-efficiency channel encoder is not used, the channel synthesis unit may downmix the input multi-channel (M-channel) audio signal to L channels.

Similarly, channel-based audio decoding may use at least one of the high-efficiency channel decoder and the second channel decoder, or may use neither and be performed by the first channel decoder alone. When the high-efficiency channel decoder is not used, the upmix unit may upmix the second downmix signal and the synthesized first downmix signal to M channels.

As described above, the present invention has been described with reference to limited embodiments and drawings, but it is not limited to those embodiments. A person having ordinary knowledge in the field to which the present invention belongs can make various modifications and variations from this description.

Accordingly, the scope of the present invention should not be limited to the described embodiments, and should be determined by the claims set forth below and their equivalents.

200: Channel-based audio encoding unit
300: Object-based audio encoding unit
210: High-efficiency channel encoder
230: Channel synthesis unit
250: Second channel encoder
270: First channel encoder

Claims

A high-quality multi-channel audio encoding device comprising:
a channel-based audio encoding unit that performs channel-based audio encoding on an input audio signal based on characteristics of the audio signal; and
an object-based audio encoding unit that performs object-based audio encoding on the audio signal based on the characteristics of the audio signal.

The high-quality multi-channel audio encoding device according to claim 1, wherein the channel-based audio encoding unit generates a bitstream by performing channel-based audio encoding on the multi-channel audio signal when the input audio signal is a multi-channel audio signal, and
the object-based audio encoding unit generates a bitstream by performing object-based audio encoding on the multi-object audio signal when the input audio signal is a multi-object audio signal.

The high-quality multi-channel audio encoding device according to claim 2, wherein the channel-based audio encoding unit includes a high-efficiency channel encoder that downmixes the multi-channel audio signal to generate a first downmix signal and encodes spatial parameters extracted from the multi-channel audio signal to generate a second enhancement layer bitstream.

The high-quality multi-channel audio encoding device according to claim 3, wherein the channel-based audio encoding unit further includes a channel synthesis unit that downmixes the first downmix signal to generate a second downmix signal and synthesizes the first downmix signal and an additional channel signal.

The high-quality multi-channel audio encoding device according to claim 4, wherein the channel-based audio encoding unit further includes a first channel encoder that encodes the second downmix signal to generate a base layer bitstream.

The high-quality multi-channel audio encoding device according to claim 5, wherein a channel configured through the base layer bitstream, a channel configured through the first enhancement layer, and a channel configured through the second enhancement layer are configured as mutually different multi-channels.

The high-quality multi-channel audio encoding device according to claim 4, wherein the channel-based audio encoding unit further includes a second channel encoder that encodes the synthesized first downmix signal to generate a first enhancement layer bitstream.

The high-quality multi-channel audio encoding device according to claim 1, wherein the object-based audio encoding unit includes:
a mixing unit that mixes the multi-object audio signal when the input audio signal is a multi-object audio signal;
a bitstream generator that encodes the mixed signal to generate a base layer bitstream; and
an object encoder that separates the input multi-object audio signal into mono object, stereo object, and multi-channel object audio signals, and multiplexes the separated audio signals using preset rendering information to generate an object layer bitstream.

The high-quality multi-channel audio encoding apparatus according to claim 8, wherein the mixing unit mixes the multi-object audio signal into 5.1 channels using mix information received from outside.

The high-quality multi-channel audio encoding device according to claim 1, wherein the first and second enhancement layer bitstreams generated by the channel-based audio encoding unit are included in an additional data area of a base layer bitstream structure, and
the object layer bitstream generated by the object-based audio encoding unit is included in the additional data area of the base layer bitstream structure.

The high-quality multi-channel audio encoding device according to claim 10, wherein the channel-based audio encoding unit configures and transmits a channel-based header and frame using the base layer bitstream and the first and second enhancement layer bitstreams, and
the object-based audio encoding unit configures and transmits an object-based header and frame using the base layer bitstream and the object layer bitstream.

The high-quality multi-channel audio encoding device according to claim 11, wherein, when audio encoding is performed on the audio signal using both the channel-based audio encoding unit and the object-based audio encoding unit, the bitstream generated by the audio encoding includes a header and a frame for each of the channel-based audio encoding and the object-based audio encoding, and
the channel-based header or the object-based header includes decoding information used for decoding the bitstream generated by the channel-based audio encoding unit or the object-based audio encoding unit.

A high-quality multi-channel audio decoding device comprising:
a channel-based audio decoding unit that performs initialization for channel-based audio decoding based on an encoding mode received from a high-quality multi-channel audio encoding device; and
an object-based audio decoding unit that performs initialization for object-based audio decoding based on the encoding mode.

The high-quality multi-channel audio decoding device according to claim 13, wherein the channel-based audio decoding unit decodes the channel-based audio based on bitstream layers included in a frame received from the high-quality multi-channel audio encoding device, and
the object-based audio decoding unit decodes the object-based audio based on the bitstream layers.

The high-quality multi-channel audio decoding device according to claim 13, wherein the channel-based audio decoding unit includes a first channel decoder that decodes a base layer bitstream included in a frame transmitted from the high-quality multi-channel audio encoding device to restore a second downmix signal.

The high-quality multi-channel audio decoding device according to claim 13, wherein the channel-based audio decoding unit further includes a second channel decoder that decodes a first enhancement layer bitstream included in the frame to restore a synthesized first downmix signal.

The high-quality multi-channel audio decoding device according to claim 16, wherein the channel-based audio decoding unit further includes an upmix unit that upmixes the synthesized first downmix signal and a second downmix signal restored using a base layer bitstream included in the frame to restore a first downmix signal.

The high-quality multi-channel audio decoding device according to claim 16, further comprising a high-efficiency channel decoder that restores a multi-channel audio signal using the first downmix signal and a second enhancement layer bitstream included in the frame.

The high-quality multi-channel audio decoding device according to claim 18, wherein the object-based audio decoding unit includes:
a bitstream processing unit that restores an audio signal mixed in the second channel using a base layer bitstream included in a frame received from a high-quality multi-channel audio encoding device; and
an object decoder that restores a bitstream of each of a mono object, a stereo object, and a multi-channel object using an object layer bitstream included in the frame.

The high-quality multi-channel audio decoding device according to claim 13, wherein, when both an HQMAC-CB bitstream and an HQMAC-OB bitstream are included in the input high-quality multi-channel audio bitstream, the channel-based audio decoding unit performs channel-based audio decoding on the HQMAC-CB bitstream, the object-based audio decoding unit performs object-based audio decoding on the HQMAC-OB bitstream, and the resulting output signals are multiplexed.