JP2013137546A

JP2013137546A - Apparatus for encoding and decoding audio signal and method thereof

Info

Publication number: JP2013137546A
Application number: JP2013003356A
Authority: JP
Inventors: Hee Suk Pang; Jae-Hyun Lim; Hyen O Oh; Yan-Won Jun; Dong Soo Kim
Original assignee: LG Electronics Inc
Current assignee: LG Electronics Inc
Priority date: 2005-08-30
Filing date: 2013-01-11
Publication date: 2013-07-11
Anticipated expiration: 2026-08-30
Also published as: KR100880644B1; KR20080037106A; KR20080086551A; HK1124681A1; KR100891685B1; KR100880647B1; KR20080049747A; KR20080037104A; KR100880646B1; KR20080037105A; JP5319846B2; KR20080037111A; MX2008002713A; KR100880645B1; KR20080049746A; KR101165641B1; KR20080036232A; KR100891686B1; KR100891687B1

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus for encoding and decoding audio signals and a method thereof. SOLUTION: Spatial information is encoded into a bitstream, and the bitstream includes different syntax related to the time, frequency and spatial domains. The bitstream includes one or more data structures (e.g., frames) that contain ordered sets of slots to which parameters can be applied. A data structure type indicator is inserted into the bitstream. The data structure includes position information that a decoder can use to identify the correct slot to which a given parameter set is applied. The slot position information is encoded with either a fixed number of bits or a variable number of bits, depending on the data structure type indicated by the data structure type indicator. For variable data structure types, the slot position information is encoded with a variable number of bits that depends on the position of the slot in the ordered set of slots.

Description

The present invention relates mainly to audio signal processing.

A new approach to the perceptual coding of multi-channel audio, commonly called Spatial Audio Coding (SAC), is under research and development. Because SAC allows multi-channel audio to be transmitted at low bit rates, it is well suited to a variety of audio applications (for example, Internet streaming and music downloads).

Rather than coding each audio input channel separately, SAC captures the spatial image of a multi-channel audio signal in a compact set of parameters. These parameters are transmitted to a decoder, where they are used to synthesize or reconstruct the spatial properties of the audio signal.

In some SAC applications, the spatial parameters are transmitted to the decoder as part of a bitstream. The bitstream includes a plurality of spatial frames, each containing an ordered set of time slots to which spatial parameters can be applied. The bitstream also includes position information that the decoder can use to identify the correct time slot to which a given parameter set is applied.

Some SAC applications use conceptual elements in the encoding/decoding paths. One such element is commonly called One-To-Two (OTT) and another is commonly called Two-To-Three (TTT); each name indicates the number of input channels and output channels of the corresponding decoder element. An OTT encoder element extracts two spatial parameters and generates a downmix signal and a residual signal. A TTT element downmixes three audio signals into one downmix signal and one residual signal. These elements can be combined to provide various spatial audio configurations (for example, surround sound).

Some SAC applications can operate in a non-guided operation mode, in which no spatial parameters need to be transmitted and only a stereo downmix signal is sent from the encoder to the decoder. The decoder synthesizes spatial parameters from the downmix signal and uses them to generate a multi-channel audio signal.

Spatial information associated with an audio signal is encoded into a bitstream that can be transmitted to a decoder or recorded on a recording medium. The bitstream can include different syntax related to the time, frequency and spatial domains. In some embodiments, the bitstream includes one or more data structures (e.g., frames), each containing an ordered set of slots to which parameters can be applied. The data structures can be fixed or variable. A data structure type indicator can be included in the bitstream so that a decoder can determine the data structure type and invoke the appropriate decoding process. A data structure can include position information that the decoder uses to identify the correct slot to which a given parameter set is applied. The slot position information can be encoded with either a fixed number of bits or a variable number of bits, depending on the data structure type indicated by the data structure type indicator. For variable data structure types, the slot position information can be encoded with a variable number of bits that depends on the position of the slot within the ordered set of slots.

In some embodiments, a method of encoding an audio signal includes: determining a framing type; determining a number of time slots and a number of parameter sets, each parameter set including one or more parameters; encoding the audio signal into a bitstream that includes a frame containing an ordered set of time slots; inserting a framing type indicator into the bitstream; if the framing type indicator indicates variable framing, generating information indicating the position, within the ordered set of time slots, of at least one time slot to which a parameter set is applied; and inserting into the bitstream a variable number of bits indicating the position of that time slot within the ordered set of time slots, wherein the variable number of bits is determined by the time slot position.

In some embodiments, a method of decoding an audio signal includes: receiving a bitstream that represents an audio signal and includes a frame; determining, from the bitstream, a number of time slots and a number of parameter sets, each parameter set including one or more parameters; determining a framing type from the bitstream; if the framing type is variable framing, determining from the bitstream position information indicating the position, within the ordered set of time slots, of a time slot to which a parameter set is applied; and decoding the audio signal based on the number of time slots, the number of parameter sets and the position information, wherein the position information is represented by a variable number of bits that depends on the time slot position.
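The idea of spending a position-dependent number of bits on each time slot position can be illustrated with a minimal Python sketch. The bit-width rule used here is an assumption chosen for illustration, not the syntax defined in this publication: it assumes that the positions within a frame are strictly increasing, so each position only has to be distinguished among the slots that remain available. The names `bits_for`, `encode_positions` and `decode_positions` are hypothetical.

```python
def bits_for(count):
    """Bits needed to tell `count` candidates apart (0 when only one remains)."""
    return (count - 1).bit_length()


def encode_positions(positions, num_slots):
    """Encode strictly increasing slot positions as a string of '0'/'1' bits.

    Assumed rule: the i-th position lies between the slot after the previous
    position and the last slot that still leaves room for the remaining sets.
    """
    out = []
    prev = -1
    for i, pos in enumerate(positions):
        lo = prev + 1                              # first slot still free
        hi = num_slots - (len(positions) - i)      # keep room for later sets
        if not lo <= pos <= hi:
            raise ValueError(f"position {pos} outside [{lo}, {hi}]")
        nbits = bits_for(hi - lo + 1)
        if nbits:
            out.append(format(pos - lo, f"0{nbits}b"))
        prev = pos
    return "".join(out)


def decode_positions(bits, num_sets, num_slots):
    """Inverse of encode_positions: recover the slot positions from the bits."""
    positions = []
    prev, cursor = -1, 0
    for i in range(num_sets):
        lo = prev + 1
        hi = num_slots - (num_sets - i)
        nbits = bits_for(hi - lo + 1)
        value = int(bits[cursor:cursor + nbits], 2) if nbits else 0
        cursor += nbits
        prev = lo + value
        positions.append(prev)
    return positions


# Three parameter sets in an 8-slot frame: 3 + 3 + 2 = 8 bits,
# versus 9 bits if every position used a fixed 3-bit field.
bitstring = encode_positions([1, 4, 7], 8)
print(bitstring, decode_positions(bitstring, 3, 8))
```

Under this assumed rule the decoder can derive each field width from the number of time slots, the number of parameter sets and the positions already decoded, so no extra length information has to be transmitted.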

Other embodiments of time slot position coding for multiple frame types are also disclosed, directed to systems, methods, apparatuses, data structures and computer-readable media.

It is to be understood that both the foregoing general description and the following detailed description of the embodiments are exemplary and are intended to facilitate understanding of the invention as claimed.

A diagram illustrating the principle of generating spatial information according to an embodiment of the present invention.
A block diagram of an encoder that encodes an audio signal according to an embodiment of the present invention.
A block diagram of a decoder that decodes an audio signal according to an embodiment of the present invention.
A block diagram of a channel conversion unit included in the upmixing unit of a decoder according to an embodiment of the present invention.
A block diagram illustrating a method of constructing a bitstream of an audio signal according to an embodiment of the present invention.
A drawing and a time/frequency graph illustrating the relationship among parameter sets, time slots and parameter bands according to an embodiment of the present invention.
A drawing and a time/frequency graph illustrating the relationship among parameter sets, time slots and parameter bands according to an embodiment of the present invention.
A diagram showing the syntax that represents configuration information of a spatial information signal according to an embodiment of the present invention.
A table showing the number of parameter bands of a spatial information signal according to an embodiment of the present invention.
A diagram showing the syntax that represents, with a fixed number of bits, the number of parameter bands applied to an OTT box according to an embodiment of the present invention.
A diagram showing the syntax that represents, with a variable number of bits, the number of parameter bands applied to an OTT box according to an embodiment of the present invention.
A diagram showing the syntax that represents, with a fixed number of bits, the number of parameter bands applied to a TTT box according to an embodiment of the present invention.
A diagram showing the syntax that represents, with a variable number of bits, the number of parameter bands applied to a TTT box according to an embodiment of the present invention.
A diagram showing the syntax of spatial extension configuration information for a spatial extension frame according to an embodiment of the present invention.
A diagram showing the syntax of spatial extension configuration information for a residual signal when the spatial extension frame includes a residual signal, according to an embodiment of the present invention.
A diagram showing the syntax of spatial extension configuration information for a residual signal when the spatial extension frame includes a residual signal, according to an embodiment of the present invention.
A diagram showing the syntax for a method of representing the number of parameter bands for a residual signal according to an embodiment of the present invention.
A is a block diagram of a decoding apparatus using non-guided coding according to an embodiment of the present invention, and B is a diagram showing a method of representing the number of parameter bands as a group according to an embodiment of the present invention.
A is a block diagram of a decoding apparatus using non-guided coding according to an embodiment of the present invention, and B is a diagram showing a method of representing the number of parameter bands as a group according to an embodiment of the present invention.
A diagram showing the syntax of configuration information of a spatial frame according to an embodiment of the present invention.
A diagram showing the syntax of position information of the time slots to which parameter sets are applied according to an embodiment of the present invention.
A diagram showing the syntax for representing, as an absolute value and difference values, the position information of the time slots to which parameter sets are applied according to an embodiment of the present invention.
A diagram showing a plurality of items of position information of the time slots to which parameter sets are applied, represented as a group, according to an embodiment of the present invention.
A flowchart of an encoding method according to an embodiment of the present invention.
A flowchart of a decoding method according to an embodiment of the present invention.
A block diagram showing a device structure for implementing the encoding and decoding processes described with reference to FIGS. 1 to 15.

The accompanying drawings, which are included to facilitate understanding of the present invention, illustrate embodiments of the invention and, together with this description, serve to explain its principles.

FIG. 1 is a diagram illustrating the principle of generating spatial information according to an embodiment of the present invention. The concept behind coding schemes for multi-channel audio signals is that humans perceive audio signals three-dimensionally. The three-dimensional space of an audio signal can be represented using spatial information, which includes, but is not limited to, channel level differences (CLD), inter-channel correlation/coherence (ICC), channel time differences (CTD) and channel prediction coefficients (CPC). CLD denotes the energy (level) difference between two audio channels, ICC denotes the amount of correlation or coherence between two audio channels, and CTD denotes the time difference between two channels.

FIG. 1 illustrates how the CTD and CLD parameters arise. From a distant sound source 101, a first direct sound wave 103 reaches the listener's left ear 107, while a second direct sound wave 102 is diffracted around the listener's head before reaching the right ear 106. The two sound waves 102 and 103 differ from each other in arrival time and energy level. The CTD and CLD parameters are generated from the differences in arrival time and energy level between the sound waves 102 and 103. In addition, reflected sound waves 104 and 105 reach the ears 106 and 107, respectively, and are uncorrelated with each other. The ICC parameter can be generated from the correlation between the sound waves 104 and 105.
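As a rough illustration of what these three parameters measure, the following Python sketch computes them for two short channel signals. The formulas are commonly used textbook definitions and are assumptions here: this publication does not state how CLD, ICC or CTD are computed, and the function names are hypothetical.

```python
import math


def channel_level_difference_db(x, y):
    """CLD as the energy ratio of two channels in decibels (assumed definition)."""
    energy_x = sum(s * s for s in x)
    energy_y = sum(s * s for s in y)
    return 10.0 * math.log10(energy_x / energy_y)


def inter_channel_correlation(x, y):
    """ICC as the normalized cross-correlation at lag 0 (assumed definition)."""
    cross = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return cross / norm


def channel_time_difference(x, y, max_lag):
    """CTD as the lag, in samples, that maximizes the cross-correlation.

    A positive result means that y is delayed relative to x.
    """
    def score(lag):
        return sum(x[n] * y[n + lag]
                   for n in range(len(x)) if 0 <= n + lag < len(y))
    return max(range(-max_lag, max_lag + 1), key=score)


left = [1.0, 2.0, 3.0]
right = [2.0, 4.0, 6.0]      # same waveform at twice the amplitude
print(channel_level_difference_db(left, right))   # about -6.02 dB
print(inter_channel_correlation(left, right))     # fully correlated
print(channel_time_difference([0, 1, 0, 0], [0, 0, 0, 1], 3))
```

A practical coder would evaluate such quantities per time slot and per parameter band rather than over a whole signal; the sketch ignores that to keep the definitions visible.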

In the encoder, spatial information (for example, spatial parameters) is extracted from the multi-channel audio signal and a downmix signal is generated. The downmix signal and the spatial parameters are transferred to the decoder. The downmix signal can have any number of audio channels and includes, but is not limited to, a mono signal, a stereo signal or a multi-channel audio signal. In the decoder, a multi-channel upmix signal is generated from the downmix signal and the spatial parameters.

FIG. 2 is a block diagram of an encoder that encodes an audio signal according to an embodiment of the present invention. The encoder includes a downmixing unit 202, a spatial information generating unit 203, a downmix signal encoding unit 207 and a multiplexing unit 209. Other encoder configurations are possible. The encoder can be implemented in hardware, in software, or in a combination of hardware and software. It can be realized in an integrated circuit chip, a chipset, a system on chip (SoC), a digital signal processor, a general-purpose processor, and various digital and analog devices.

The downmixing unit 202 generates a downmix signal 204 from a multi-channel audio signal 201. In FIG. 2, x_1, ..., x_n denote the input audio channels. As described above, the downmix signal 204 can be a mono signal, a stereo signal or a multi-channel audio signal. In the illustrated example, x'_1, ..., x'_m denote the channels of the downmix signal 204. In some embodiments, the encoder processes an externally supplied downmix signal 205 (for example, a precise downmix) instead of the downmix signal 204.

The spatial information generating unit 203 extracts spatial information from the multi-channel audio signal 201. Here, "spatial information" means information about the audio signal channels that the decoder uses to upmix the downmix signal 204 into a multi-channel audio signal. The downmix signal 204 is generated by downmixing the multi-channel audio signal. The spatial information is encoded to produce an encoded spatial information signal 206.

The downmix signal encoding unit 207 encodes the downmix signal 204 generated by the downmixing unit 202 to produce an encoded downmix signal 208.

The multiplexing unit 209 generates a bitstream 210 that includes the encoded downmix signal 208 and the encoded spatial information signal 206. The bitstream 210 is transferred to a downstream decoder and/or recorded on a recording medium.

FIG. 3 is a block diagram of a decoder that decodes an encoded audio signal according to an embodiment of the present invention. The decoder includes a demultiplexing unit 302, a downmix signal decoding unit 305, a spatial information decoding unit 307 and an upmixing unit 309. The decoder can be implemented in hardware, in software, or in a combination of hardware and software. It can be realized in an integrated circuit chip, a chipset, a system on chip (SoC), a digital signal processor, a general-purpose processor, and various other digital devices.

In some embodiments, the demultiplexing unit 302 receives a bitstream 301 representing an audio signal and separates an encoded downmix signal 303 and an encoded spatial information signal 304 from the bitstream 301. In FIG. 3, x'_1, ..., x'_m denote the channels of the downmix signal 303. The downmix signal decoding unit 305 decodes the encoded downmix signal 303 and outputs a decoded downmix signal 306. If the decoder cannot output a multi-channel audio signal, the downmix signal decoding unit 305 can output the downmix signal 306 directly. In FIG. 3, y'_1, ..., y'_m denote the direct output channels of the downmix signal decoding unit 305.

The spatial information decoding unit 307 extracts configuration information of the spatial information signal from the encoded spatial information signal 304 and decodes the spatial information signal 304 using the extracted configuration information.

The upmixing unit 309 can upmix the downmix signal 306 into a multi-channel audio signal 310 using the extracted spatial information 308. In FIG. 3, y_1, ..., y_n denote the output channels of the upmixing unit 309.

FIG. 4 is a block diagram of a channel conversion module that can be included in the upmixing unit 309 of the decoder shown in FIG. 3. In some embodiments, the upmixing unit 309 can include a plurality of channel conversion modules. A channel conversion module is a conceptual device that uses specific information to make the number of input channels and the number of output channels differ from each other.

In some embodiments, the channel conversion modules include an OTT (One-To-Two) box, which converts one channel into two channels, and a TTT (Two-To-Three) box, which converts two channels into three channels. The OTT boxes and/or TTT boxes can be arranged in a variety of useful configurations. For example, the upmixing unit 309 shown in FIG. 3 can use a 5-1-5 configuration, a 5-2-5 configuration, a 7-2-7 configuration, a 7-5-7 configuration, and so on. In a 5-1-5 configuration, five channels are downmixed into one channel to produce a single-channel downmix signal, which can later be upmixed to five channels. Other configurations using various combinations of OTT boxes and TTT boxes can be built in the same way.

FIG. 4 shows an example of a 5-2-5 configuration of an upmixing unit 400. In the 5-2-5 configuration, a downmix signal 401 having two channels is input to the upmixing unit 400. In the illustrated example, a left channel L and a right channel R are supplied as inputs to the upmixing unit 400. In this embodiment, the upmixing unit 400 includes one TTT box 402 and three OTT boxes 406, 407 and 408. The two-channel downmix signal 401 is supplied as input to the TTT box 402 (TTT0), which processes the downmix signal 401 to produce three output channels 403, 404 and 405. One or more spatial parameters (e.g., CPC, CLD, ICC) can be supplied as input to the TTT box 402 and used to process the downmix signal 401, as described below. Here, the CPC can be described as prediction coefficients for generating three channels from two channels.

The channel 403 output from the TTT box 402 is supplied as input to the OTT box 406, which generates two output channels using one or more spatial parameters. In the illustrated example, the two output channels represent, for example, the front left (FL) and back left (BL) speaker positions in a surround sound environment. The channel 404 is supplied as input to the OTT box 407, which generates two output channels using one or more spatial parameters. In the illustrated example, these two output channels represent the front right (FR) and back right (BR) speaker positions. The channel 405 is supplied as input to the OTT box 408, which generates two output channels. In the illustrated example, these two output channels represent the center (C) speaker position and the low frequency enhancement (LFE) channel. Spatial information (e.g., CLD, ICC) is supplied as input to each of the OTT boxes. In some embodiments, residual signals Res1 and Res2 can be supplied as inputs to the OTT boxes 406 and 407. In this embodiment, a residual signal need not be supplied to the OTT box 408, which outputs the center channel and the LFE channel.
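The wiring of this 5-2-5 example can be summarized as a small routing table. The Python sketch below only models which box feeds which channel, using the reference numerals from the text; it performs no signal processing, and the dictionary layout and the `output_channels` helper are hypothetical.

```python
# TTT box 402 (TTT0): two downmix channels in, three intermediate channels out.
TTT_BOX = {"name": "TTT 402", "inputs": ("L", "R"), "outputs": (403, 404, 405)}

# Each intermediate channel feeds one OTT box that produces two output channels.
# "residual" records whether the text mentions an optional residual input.
OTT_BOXES = {
    403: {"name": "OTT 406", "outputs": ("FL", "BL"), "residual": True},
    404: {"name": "OTT 407", "outputs": ("FR", "BR"), "residual": True},
    405: {"name": "OTT 408", "outputs": ("C", "LFE"), "residual": False},
}


def output_channels():
    """Walk the tree from the TTT box through the OTT boxes, collecting labels."""
    channels = []
    for intermediate in TTT_BOX["outputs"]:
        channels.extend(OTT_BOXES[intermediate]["outputs"])
    return channels


print(output_channels())
```

Walking the table yields six labels: five full-band speaker positions plus the LFE channel.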

The configuration shown in FIG. 4 is one example of a configuration of channel conversion modules. Other configurations that include various combinations of OTT boxes and TTT boxes are also possible. Because each channel conversion module can operate in the frequency domain, the number of parameter bands applied to each module can be defined. A parameter band means at least one frequency band to which one parameter is applicable. The number of parameter bands is described with reference to FIG. 6B.

FIG. 5 is a diagram illustrating a method of constructing a bitstream of an audio signal according to an embodiment of the present invention. FIG. 5(a) shows a bitstream of an audio signal that includes only a spatial information signal, and FIGS. 5(b) and 5(c) show bitstreams of an audio signal that include a downmix signal and a spatial information signal.

Referring to FIG. 5(a), the bitstream of the audio signal can include configuration information 501 and a frame 503. The frame 503 can be repeated in the bitstream and, in some embodiments, includes one spatial frame 502 containing spatial audio information.

In some embodiments, the configuration information 501 includes information indicating the total number of time slots in one spatial frame 502, the total number of parameter bands spanning the frequency range of the audio signal, the number of parameter bands in an OTT box, the number of parameter bands in a TTT box, and the number of parameter bands in a residual signal. The configuration information 501 can include other information as desired.

In some embodiments, the spatial frame 502 includes one or more spatial parameters (e.g., CLD, ICC), a frame type, the number of parameter sets in the frame, and the time slots to which the parameter sets can be applied. The spatial frame 502 can include other information as desired. The meaning and use of the information included in the configuration information 501 and the spatial frame 502 are described with reference to FIGS. 6 to 10.

Referring to FIG. 5(b), the bitstream of the audio signal includes configuration information 504, a downmix signal 505, and a spatial frame 506. In this case, one frame 507 includes the downmix signal 505 and the spatial frame 506, and the frame 507 can be repeated in the bitstream.

Referring to FIG. 5(c), the bitstream of the audio signal includes a downmix signal 508, configuration information 509, and a spatial frame 510. In this case, one frame 511 includes the configuration information 509 and the spatial frame 510, and the frame 511 can be repeated in the bitstream. When the configuration information 509 is inserted into each frame 511, a playback device can start playing the audio signal from an arbitrary position.

Although FIG. 5(c) shows the configuration information 509 inserted into the bitstream for every frame 511, it goes without saying that the configuration information 509 can instead be inserted into the bitstream once per group of frames, repeated periodically or aperiodically.

FIGS. 6A and 6B illustrate the relationship among parameter sets, time slots, and parameter bands according to an embodiment of the present invention. A parameter set is one or more spatial parameters applied to one time slot. The spatial parameters can include spatial information such as CLD, ICC, and CPC. A time slot is a time interval of the audio signal to which spatial parameters can be applied. One spatial frame can include one or more time slots.

Referring to FIG. 6A, a number of parameter sets 1, ..., P can be used in a spatial frame, and each parameter set can include one or more data fields 1, ..., Q-1. One parameter set can be applied to the entire frequency range of the audio signal, and each spatial parameter in the parameter set can be applied to one or more portions of that frequency range. For example, if a parameter set includes 20 spatial parameters, the entire frequency band of the audio signal can be divided into 20 areas (hereinafter "parameter bands"), and the 20 spatial parameters of the parameter set can be applied to these 20 parameter bands. The parameters can be applied to the parameter bands as desired. For example, spatial parameters can be applied densely to low-frequency parameter bands and sparsely to high-frequency parameter bands.

FIG. 6B shows a time/frequency graph illustrating the relationship between parameter sets and time slots. In the illustrated example, three parameter sets (parameter set 1, parameter set 2, and parameter set 3) are applied to an ordered set of 12 time slots in one spatial frame. The entire frequency range of the audio signal is divided into nine parameter bands, so the horizontal axis indicates the time slots and the vertical axis indicates the parameter bands. Each of the three parameter sets is applied to a specific time slot. For example, the first parameter set (parameter set 1) is applied to time slot #1, the second parameter set (parameter set 2) is applied to time slot #5, and the third parameter set (parameter set 3) is applied to time slot #9. The parameter sets can also be applied to the remaining time slots by interpolating and/or copying them to those slots. In general, the number of parameter sets can be less than or equal to the number of time slots, and the number of parameter bands can be less than or equal to the number of frequency bands of the audio signal. Encoding spatial information for only part of the time-frequency area of the audio signal, rather than for its entire time-frequency area, reduces the amount of spatial information sent from the encoder to the decoder. This reduction is possible because, according to known principles of perceptual audio coding, spatial information for part of the time-frequency area is in most cases sufficient for human auditory perception.
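The interpolate-and-copy step described for FIG. 6B can be illustrated with a minimal sketch. The text does not specify the interpolation rule, so linear interpolation between neighboring parameter sets and plain copying outside them are assumptions, and the function name `expand_parameter_sets` is hypothetical.

```python
def expand_parameter_sets(num_slots, anchors):
    """Spread a few parameter sets over all time slots of one spatial frame.

    anchors maps a 1-based time slot index to the list of per-band
    parameter values transmitted for that slot (at least one entry).
    Assumption: linear interpolation between neighboring sets, copying
    before the first set and after the last one.
    """
    anchor_slots = sorted(anchors)
    expanded = []
    for slot in range(1, num_slots + 1):
        prev_slot = max((s for s in anchor_slots if s <= slot), default=None)
        next_slot = min((s for s in anchor_slots if s > slot), default=None)
        if prev_slot is None:
            expanded.append(list(anchors[next_slot]))   # copy first set backwards
        elif next_slot is None:
            expanded.append(list(anchors[prev_slot]))   # copy last set forwards
        else:
            w = (slot - prev_slot) / (next_slot - prev_slot)
            expanded.append([(1 - w) * a + w * b
                             for a, b in zip(anchors[prev_slot], anchors[next_slot])])
    return expanded


# 12 time slots, parameter sets at slots #1, #5 and #9 (one band shown for brevity).
frame = expand_parameter_sets(12, {1: [0.0], 5: [4.0], 9: [8.0]})
```

Only the three transmitted sets carry information; the other nine slots are derived at the decoder, which is where the data reduction comes from.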

An important feature of the disclosed embodiments is that the position of the time slot to which a parameter set is applied is encoded and decoded using either a fixed number of bits or a variable number of bits. The number of parameter bands can likewise be represented with a fixed number of bits or a variable number of bits. The variable coding scheme can also be applied to other information used in spatial audio coding, including but not limited to information related to the time, spatial, and/or frequency domains (for example, the number of frequency subbands output from a filter bank).

FIG. 7A shows a syntax representing configuration information of spatial information according to an embodiment of the present invention. The configuration information includes a plurality of fields 701 to 718, to each of which a number of bits can be allocated.

The "bsSamplingFrequencyIndex" field 701 indicates the sampling frequency obtained from the sampling process of the audio signal. Four bits are allocated to the "bsSamplingFrequencyIndex" field 701 to indicate the sampling frequency. If the value of the "bsSamplingFrequencyIndex" field 701 is 15, i.e., binary "1111", a "bsSamplingFrequency" field 702 is added to indicate the sampling frequency. In this case, 24 bits are allocated to the "bsSamplingFrequency" field 702.

The "bsFrameLength" field 703 indicates the total number of time slots in one spatial frame (hereinafter "numSlots"). The relationship "numSlots = bsFrameLength + 1" holds between "numSlots" and the "bsFrameLength" field 703.
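A minimal sketch of how a decoder might read fields 701 to 703 follows. The `BitReader` class, the MSB-first bit order, the function name `read_config_head`, and the default 7-bit width for "bsFrameLength" are all assumptions made for illustration; only the 4-bit index, the escape value 15, the 24-bit explicit frequency, and numSlots = bsFrameLength + 1 come from the description.

```python
class BitReader:
    """Hypothetical MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_config_head(reader, frame_length_bits=7):
    """Read bsSamplingFrequencyIndex, optional bsSamplingFrequency, bsFrameLength."""
    index = reader.read(4)                            # field 701, 4 bits
    frequency = reader.read(24) if index == 15 else None  # field 702 only on escape
    bs_frame_length = reader.read(frame_length_bits)  # field 703, width assumed
    return {"index": index,
            "frequency": frequency,
            "numSlots": bs_frame_length + 1}


# Index 3 followed by bsFrameLength = 15, padded to two bytes.
config = read_config_head(BitReader(bytes([0x31, 0xE0])))
```

How an index below 15 maps to a sampling frequency is not given in this passage, so the sketch returns the raw index in that case.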

The "bsFreqRes" field 704 indicates the total number of parameter bands spanning the entire frequency range of the audio signal. The "bsFreqRes" field 704 is described with reference to FIG. 7B.

The "bsTreeConfig" field 705 indicates information for a tree configuration that includes a plurality of channel conversion modules, as described with reference to FIG. 4. This information includes the type of each channel conversion module, the number of channel conversion modules, the type of spatial information used in the channel conversion modules, the number of input/output channels of the audio signal, and the like.

Depending on the type of channel conversion module or the number of channels, the tree configuration can be any of a 5-1-5 configuration, a 5-2-5 configuration, a 7-2-7 configuration, a 7-5-7 configuration, and the like. Of these, the 5-2-5 configuration is shown in FIG. 4.

The "bsQuantMode" field 706 indicates quantization mode information for the spatial information.

The "bsOneIcc" field 707 indicates whether a single ICC parameter subset is used for all OTT boxes. Here, a parameter subset means a parameter set applied to a specific time slot and a specific channel conversion module.

The "bsArbitraryDownmix" field 708 indicates whether arbitrary downmix gains are present.

The "bsFixedGainSur" field 709 indicates a gain applied to the surround channels, such as LS (left surround) and RS (right surround).

The "bsFixedGainLFE" field indicates a gain applied to the LFE channel.

The "bsFixedGainDM" field indicates a gain applied to the downmix signal.

The "bsMatrixMode" field 712 indicates whether a matrix-compatible stereo downmix signal is generated by the encoder.

The "bsTempShapeConfig" field 713 indicates the operation mode of temporal shaping in the decoder (e.g., TES (Temporal Envelope Shaping) and/or TP (Temporal Shaping)).

The "bsDecorrConfig" field 714 indicates the operation mode of the decorrelator of the decoder.

Finally, the "bs3DaudioMode" field 715 indicates whether the downmix signal is encoded into a 3D signal and whether inverse HRTF processing is used.

After the information of each field has been determined (in the encoder) or extracted (in the decoder), information on the number of parameter bands applied to the channel conversion modules is determined or extracted in the same way. First, the number of parameter bands applied to the OTT boxes is determined/extracted (716), and then the number of parameter bands applied to the TTT boxes is determined/extracted (717). The number of parameter bands for the OTT boxes and/or TTT boxes is described in detail below with reference to FIGS. 8A to 9B.

If an extension frame exists, the "spatialExtensionConfig" block 718 contains configuration information for the extension frame. The information included in the "spatialExtensionConfig" block 718 is described below with reference to FIGS. 10A to 10D.

FIG. 7B is a table showing the number of parameter bands of a spatial information signal according to an embodiment of the present invention. "numBands" indicates the number of parameter bands for the entire frequency range of the audio signal, and "bsFreqRes" indicates index information for the number of parameter bands. For example, the entire frequency range of the audio signal can be divided into a desired number of parameter bands (e.g., 4, 5, 7, 10, 14, 20, or 28).

In some embodiments, one parameter can be applied to each parameter band. For example, if "numBands" is 28, the entire frequency range of the audio signal is divided into 28 parameter bands, and 28 parameters can be applied, one to each of the 28 parameter bands. In another example, if "numBands" is 4, the entire frequency range of the audio signal is divided into 4 parameter bands, and 4 parameters can be applied, one to each of the 4 parameter bands. In FIG. 7B, "Reserved" means that the number of parameter bands for the entire frequency range of the audio signal has not been determined.

Note that the human auditory system is not very sensitive to the number of parameter bands used in the coding scheme. Consequently, using a small number of parameter bands can give the listener a spatial audio effect similar to that obtained with a larger number of parameter bands.

Unlike "numBands", "numSlots", indicated by the "bsFrameLength" field 703 shown in FIG. 7A, can take any value. However, the value of "numSlots" may be restricted if the number of samples in one spatial frame must be exactly divisible by "numSlots". Accordingly, if the maximum value of "numSlots" that actually needs to be represented is "b", every value of the "bsFrameLength" field 703 can be represented with ceil{log2(b)} bits, where "ceil(x)" denotes the smallest integer greater than or equal to "x". For example, if one spatial frame includes 72 time slots, ceil{log2(72)} = 7 bits can be allocated to the "bsFrameLength" field 703, and the number of parameter bands applied to a channel conversion module can be determined within "numBands".
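The bit count ceil{log2(b)} can be computed with integer arithmetic only, which avoids floating-point rounding near powers of two. The following is a small sketch; the function name `bits_for_max_slots` is hypothetical.

```python
def bits_for_max_slots(b: int) -> int:
    """Return ceil(log2(b)) for b >= 1 without floating point.

    For b >= 1, (b - 1).bit_length() equals ceil(log2(b)).
    Since bsFrameLength = numSlots - 1 ranges over 0 .. b - 1,
    this width is enough to hold every value of the field.
    """
    if b < 1:
        raise ValueError("b must be at least 1")
    return (b - 1).bit_length()


width = bits_for_max_slots(72)  # 7, matching the example in the text
```

With b = 72, the largest stored value is 71, which fits in 7 bits because 71 < 2^7.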

FIG. 8A shows a syntax that represents the number of parameter bands applied to an OTT box with a fixed number of bits, according to an embodiment of the present invention. Referring to FIGS. 7A and 8A, "i" takes values from "0" to "numOttBoxes-1", where "numOttBoxes" is the total number of OTT boxes. That is, each value of "i" identifies one OTT box, and the number of parameter bands applied to each OTT box is signaled for the corresponding value of "i". If an OTT box has the LFE channel mode, the number of bands applied to the LFE channel of the OTT box (hereinafter "bsOttBands") can be represented with a fixed number of bits. In the illustrated example, 5 bits are allocated to the "bsOttBands" field 801. If an OTT box does not have the LFE channel mode, the total number of parameter bands, "numBands", is applied to the channels of the OTT box.

FIG. 8B shows a syntax that represents the number of parameter bands applied to an OTT box with a variable number of bits, according to an embodiment of the present invention. FIG. 8B is almost the same as FIG. 8A, except that the "bsOttBands" field 802 shown in FIG. 8B is represented with a variable number of bits. Specifically, because the "bsOttBands" field 802 has a value less than or equal to "numBands", it can be represented with a variable number of bits determined from "numBands".

If "numBands" satisfies 2^(n-1) ≤ numBands < 2^n, the "bsOttBands" field 802 can be represented with a variable n bits.

For example, (a) if "numBands" is 40, the "bsOttBands" field 802 is represented with 6 bits; (b) if "numBands" is 28 or 20, the "bsOttBands" field 802 is represented with 5 bits; (c) if "numBands" is 14 or 10, the "bsOttBands" field 802 is represented with 4 bits; and (d) if "numBands" is 7, 5, or 4, the "bsOttBands" field 802 is represented with 3 bits.

Alternatively, if "numBands" satisfies 2^(n-1) < numBands ≤ 2^n, the "bsOttBands" field 802 can be represented with a variable n bits.

For example, (a) if "numBands" is 40, the "bsOttBands" field 802 is represented with 6 bits; (b) if "numBands" is 28 or 20, the "bsOttBands" field 802 is represented with 5 bits; (c) if "numBands" is 14 or 10, the "bsOttBands" field 802 is represented with 4 bits; (d) if "numBands" is 7 or 5, the "bsOttBands" field 802 is represented with 3 bits; and (e) if "numBands" is 4, the "bsOttBands" field 802 is represented with 2 bits.
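The two range conventions above differ only when "numBands" is an exact power of two. A short sketch makes the difference concrete; the function names are hypothetical, and the values checked are the ones listed in the two examples.

```python
def bits_lower_inclusive(num_bands: int) -> int:
    """n such that 2^(n-1) <= numBands < 2^n."""
    return num_bands.bit_length()


def bits_upper_inclusive(num_bands: int) -> int:
    """n such that 2^(n-1) < numBands <= 2^n (valid for numBands >= 2)."""
    return (num_bands - 1).bit_length()


for num_bands in (40, 28, 20, 14, 10, 7, 5, 4):
    print(num_bands,
          bits_lower_inclusive(num_bands),
          bits_upper_inclusive(num_bands))
```

Under the first convention numBands = 4 needs 3 bits, while under the second it needs only 2; for every other listed value the two conventions agree.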

The "bsOttBands" field 802 can also be represented with a variable number of bits given by a function that takes "numBands" as its variable and rounds up to the nearest integer (hereinafter the "ceiling function").

Specifically, i) if 0 < bsOttBands < numBands or 0 ≤ bsOttBands < numBands, the "bsOttBands" field 802 is represented with a number of bits equal to ceil(log2(numBands)); or ii) if 0 ≤ bsOttBands ≤ numBands, the "bsOttBands" field 802 can be represented with ceil(log2(numBands + 1)) bits.

If a value less than or equal to "numBands" (hereinafter "numberBands") is arbitrarily determined, the "bsOttBands" field 802 can be represented with a variable number of bits given by the ceiling function taking "numberBands" as its variable.

Specifically, i) if 0 < bsOttBands ≤ numberBands or 0 ≤ bsOttBands < numberBands, the "bsOttBands" field 802 is represented with ceil(log2(numberBands)) bits; or ii) if 0 ≤ bsOttBands ≤ numberBands, the "bsOttBands" field 802 can be represented with ceil(log2(numberBands + 1)) bits.
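The ceiling-function rule follows from counting how many distinct values the field can take. The sketch below is an illustration under that reading; the function name `field_width` and its keyword arguments are hypothetical. It also assumes that, when zero is excluded, an encoder would store the value minus one so that the N possible values fit in ceil(log2(N)) bits, which the text does not state.

```python
def field_width(upper: int, include_zero: bool, include_upper: bool) -> int:
    """Bits needed for a value between 0 and `upper` (numBands or numberBands).

    0 <  v <= upper  -> upper values      -> ceil(log2(upper)) bits
    0 <= v <  upper  -> upper values      -> ceil(log2(upper)) bits
    0 <= v <= upper  -> upper + 1 values  -> ceil(log2(upper + 1)) bits
    """
    count = upper - 1 + int(include_zero) + int(include_upper)
    if count < 1:
        raise ValueError("empty value range")
    return (count - 1).bit_length()  # integer form of ceil(log2(count))


case_i = field_width(4, include_zero=True, include_upper=False)   # 2 bits
case_ii = field_width(4, include_zero=True, include_upper=True)   # 3 bits
```

Allowing both endpoints costs an extra bit only when the upper bound is a power of two, as with 4 here; for an upper bound of 28 both cases need 5 bits.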

If more than one OTT box is used, the combination of the "bsOttBands" values can be expressed by the following [Equation 1]:

Here, bsOttBands_i denotes the i-th "bsOttBands". For example, assume that there are three OTT boxes, so that the "bsOttBands" field 802 takes three values (N = 3), and that "numBands" is 3. The three values of the "bsOttBands" field 802 applied to the three OTT boxes (hereinafter a1, a2, and a3) can each be represented with 2 bits, so a total of 6 bits is needed to represent them separately. If a1, a2, and a3 are instead represented as a group, 27 (= 3 × 3 × 3) combinations are possible, which can be represented with 5 bits, saving 1 bit. If the group value represented with these 5 bits is 15, it can be written as 15 = 1 × 3^2 + 2 × 3^1 + 0 × 3^0. The decoder can therefore apply the inverse of [Equation 1] to the group value 15 and determine the three values a1, a2, and a3 of the "bsOttBands" field 802 to be 1, 2, and 0, respectively.
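The grouping in this example amounts to writing the values as digits of a base-"numBands" number. The sketch below reproduces the worked example only; the function names are hypothetical, and treating a1 as the most significant digit is an assumption taken from the decomposition 15 = 1 × 3^2 + 2 × 3^1 + 0 × 3^0 rather than from the equation itself.

```python
def pack_group(values, base):
    """Combine per-box values (each in 0 .. base - 1) into one group value."""
    group = 0
    for value in values:
        if not 0 <= value < base:
            raise ValueError("value out of range for this base")
        group = group * base + value  # earlier values become higher digits
    return group


def unpack_group(group, base, count):
    """Inverse of pack_group: recover `count` values from the group value."""
    values = []
    for _ in range(count):
        group, digit = divmod(group, base)
        values.append(digit)
    return values[::-1]  # digits were extracted least significant first


group = pack_group([1, 2, 0], base=3)        # 15
a1, a2, a3 = unpack_group(group, base=3, count=3)
grouped_bits = (3 ** 3 - 1).bit_length()     # 5 bits for 27 combinations
separate_bits = 3 * (3 - 1).bit_length()     # 3 values x 2 bits = 6 bits
```

The saving appears whenever the base is not a power of two, because separate fields each round their width up while the grouped value rounds up only once.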

When multiple OTT boxes are used, the combination of the "bsOttBands" values can also be expressed using "numberBands", as any one of [Equation 2] to [Equation 4]. Because the expression of "bsOttBands" using "numberBands" is almost the same as the expression using "numBands" in [Equation 1], a detailed description is omitted and only the equations are given below.

FIG. 9A shows a syntax that represents the number of parameter bands applied to a TTT box with a fixed number of bits, according to an embodiment of the present invention. Referring to FIGS. 7A and 9A, "i" takes values from "0" to "numTttBoxes-1", where "numTttBoxes" is the total number of TTT boxes. That is, each value of "i" identifies one TTT box, and the number of parameter bands applied to each TTT box is signaled for the corresponding value of "i". In some embodiments, a TTT box can be divided into a low-frequency band range and a high-frequency band range, and different processing can be applied to the two ranges. Other divisions are also possible.

The "bsTTTDualMode" field 901 indicates whether a given TTT box operates in different modes for the low-frequency band range and the high-frequency band range (hereinafter "dual mode"). For example, if the value of the "bsTTTDualMode" field 901 is "0", a single mode is used for the entire band range without distinguishing between the low-frequency and high-frequency band ranges. If the value of the "bsTTTDualMode" field 901 is "1", different modes are used for the low-frequency band range and the high-frequency band range.

The "bsTttModeLow" field 902 indicates the operation mode of a given TTT box, which can have various operation modes. For example, a TTT box can have a prediction mode that uses CPC and ICC parameters, an energy-based mode that uses CLD parameters, and so on. If the TTT box has the dual mode, additional information on the high-frequency band range is needed.

The "bsTttModeHigh" field 903 indicates the operation mode of the high-frequency band range when the TTT box has the dual mode.

The "bsTttBandsLow" field 904 indicates the number of parameter bands applied to the TTT box.

The "bsTttBandsHigh" field 905 takes the value "numBands".

If the TTT box has the dual mode, the low band range runs from "0" (inclusive) to "bsTttBandsLow" (exclusive), and the high band range runs from "bsTttBandsLow" (inclusive) to "bsTttBandsHigh" (exclusive).

If the TTT box does not have the dual mode, the parameter bands applied to the TTT box run from "0" (inclusive) to "numBands" (exclusive) (907).
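The band ranges described for dual and single mode can be written as half-open index ranges. This is a sketch only: the function name `ttt_band_ranges` and the dictionary keys are hypothetical, and it assumes "bsTttBandsHigh" equals "numBands" as stated for field 905.

```python
def ttt_band_ranges(dual_mode: bool, bs_ttt_bands_low: int, num_bands: int):
    """Return the parameter band indices handled by each mode of a TTT box."""
    if dual_mode:
        bs_ttt_bands_high = num_bands  # field 905 carries numBands
        return {
            "low": range(0, bs_ttt_bands_low),                   # bsTttModeLow
            "high": range(bs_ttt_bands_low, bs_ttt_bands_high),  # bsTttModeHigh
        }
    # Single mode: one range covering every parameter band.
    return {"all": range(0, num_bands)}


ranges = ttt_band_ranges(dual_mode=True, bs_ttt_bands_low=8, num_bands=20)
```

Because both ranges are half-open, every band index from 0 to numBands - 1 belongs to exactly one of them.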

The "bsTttBandsLow" field 904 can be represented with a fixed number of bits. For example, as shown in FIG. 9A, 5 bits can be allocated to represent the "bsTttBandsLow" field 904.

FIG. 9B shows a syntax that represents the number of parameter bands applied to a TTT box with a variable number of bits, according to an embodiment of the present invention. FIG. 9B is almost the same as FIG. 9A, except that the "bsTttBandsLow" field 907 in FIG. 9B is represented with a variable number of bits, whereas the "bsTttBandsLow" field 904 in FIG. 9A is represented with a fixed number of bits. Specifically, because the "bsTttBandsLow" field 907 has a value less than or equal to "numBands", it can be represented with a variable number of bits determined from "numBands".

Specifically, if "numBands" satisfies 2^(n-1) ≤ numBands < 2^n, the "bsTttBandsLow" field 907 can be represented with n bits.

For example, (i) if "numBands" is 40, the "bsTttBandsLow" field 907 is represented with 6 bits; (ii) if "numBands" is 28 or 20, the "bsTttBandsLow" field 907 is represented with 5 bits; (iii) if "numBands" is 14 or 10, the "bsTttBandsLow" field 907 is represented with 4 bits; and (iv) if "numBands" is 7, 5, or 4, the "bsTttBandsLow" field 907 is represented with 3 bits.

Alternatively, if "numBands" satisfies 2^(n-1) < numBands ≤ 2^n, the "bsTttBandsLow" field 907 can be represented with a variable n bits.

For example, (i) if "numBands" is 40, the "bsTttBandsLow" field 907 is represented with 6 bits; (ii) if "numBands" is 28 or 20, the "bsTttBandsLow" field 907 is represented with 5 bits; (iii) if "numBands" is 14 or 10, the "bsTttBandsLow" field 907 is represented with 4 bits; (iv) if "numBands" is 7 or 5, the "bsTttBandsLow" field 907 is represented with 3 bits; and (v) if "numBands" is 4, the "bsTttBandsLow" field 907 is represented with 2 bits.

The "bsTttBandsLow" field 907 can also be represented with a number of bits determined by the ceiling function, that is, by taking "numBands" as the variable and rounding up to the nearest integer.

For example, i) if 0 < bsTttBandsLow ≤ numBands or 0 ≤ bsTttBandsLow < numBands, the "bsTttBandsLow" field 907 is represented with a number of bits equal to ceil(log2(numBands)); or ii) if 0 ≤ bsTttBandsLow ≤ numBands, the "bsTttBandsLow" field 907 can be represented with ceil(log2(numBands + 1)) bits.

If a value less than or equal to "numBands", i.e., "numberBands", is arbitrarily determined, the "bsTttBandsLow" field 907 can be represented with a variable number of bits using "numberBands".

Specifically, i) if 0 < bsTttBandsLow ≤ numberBands or 0 ≤ bsTttBandsLow < numberBands, the "bsTttBandsLow" field 907 is represented with a number of bits equal to ceil(log2(numberBands)); or ii) if 0 ≤ bsTttBandsLow ≤ numberBands, the "bsTttBandsLow" field 907 can be represented with a number of bits equal to ceil(log2(numberBands + 1)).

When multiple TTT boxes are used, the combination of the "bsTttBandsLow" values can be expressed by the following [Equation 5]:

Here, bsTttBandsLow_i denotes the i-th "bsTttBandsLow". Because the meaning of [Equation 5] is the same as that of [Equation 1], a detailed description of [Equation 5] is omitted. When multiple TTT boxes are used, the combination of the "bsTttBandsLow" values can also be expressed using "numberBands", as in [Equation 6] to [Equation 8]. Because the meaning of [Equation 6] to [Equation 8] is the same as that of [Equation 2] to [Equation 4], a detailed description of them is omitted.

The number of parameter bands applied to a channel conversion module (for example, an OTT box and/or a TTT box) can be expressed as a divided value of “numBands”. In this case, the divided value is either half of “numBands” or the result of dividing “numBands” by a specific value.

Once the number of parameter bands applied to the OTT and/or TTT boxes is determined, the parameter sets applicable to each OTT box and/or each TTT box are determined within the range of that number of parameter bands. Each parameter set can be applied to each OTT box and/or each TTT box in units of time slots. That is, one parameter set can be applied to one time slot.

As described above, one spatial frame can include a plurality of time slots. When the spatial frame is of the fixed frame type, parameter sets can be applied to the time slots at equal intervals. When the spatial frame is of the variable frame type, position information of the time slots to which the parameter sets are applied is required. This is described later with reference to FIGS. 13A to 13C.

図１０Ａは、本発明の一実施形態により空間拡張フレームに関する空間拡張構成情報を示すシンタックスを示している。空間拡張構成情報は、「ｂｓＳａｃＥｘｔＴｙｐｅ」フィールド１００１と、「ｂｓＳａｃＥｘｔＬｅｎ」フィールド１００２と、「ｂｓＳａｃＥｘｔＬｅｎＡｄｄ」フィールド１００３と、「ｂｓＳａｃＥｘｔＬｅｎＡｄｄＡｄｄ」フィールド１００４と、「ｂｓＦｉｌｌＢｉｔｓ」フィールド１００７と、を含む。他のフィールドも採用可能である。 FIG. 10A illustrates a syntax indicating spatial extension configuration information regarding a spatial extension frame according to an embodiment of the present invention. The spatial extension configuration information includes a “bsSacExtType” field 1001, a “bsSacExtLen” field 1002, a “bsSacExtLenAdd” field 1003, a “bsSacExtLenAddAdd” field 1004, and a “bsFillBits” field 1007. Other fields can be employed.

「ｂｓＳａｃＥｘｔＴｙｐｅ」フィールド１００１は、空間拡張フレームのデータタイプを示す。例えば、空間拡張フレームは、「０」、レジデュアル信号データ、任意のダウンミックスレジデュアル信号データまたは任意のツリーデータで詰め込むことができる。 A “bsSacExtType” field 1001 indicates the data type of the spatial extension frame. For example, the spatial extension frame can be packed with “0”, residual signal data, arbitrary downmix residual signal data, or arbitrary tree data.

「ｂｓＳａｃＥｘｔＬｅｎ」フィールド１００２は、空間拡張構成情報のバイト数を示す。 A “bsSacExtLen” field 1002 indicates the number of bytes of the spatial extension configuration information.

「ｂｓＳａｃＥｘｔＬｅｎＡｄｄ」フィールド１００３は、空間拡張構成情報のバイト数が、例えば１５以上である場合、空間拡張構成情報の追加バイト数を示す。 The “bsSacExtLenAdd” field 1003 indicates the number of additional bytes of the spatial extension configuration information when the number of bytes of the spatial extension configuration information is, for example, 15 or more.

The “bsSacExtLenAddAdd” field 1004 indicates a further number of additional bytes of the spatial extension configuration information when the number of bytes of the spatial extension configuration information is, for example, 270 or more.
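The three length fields can be read as an escape-coded length. The Python sketch below shows one way a decoder might accumulate them. The field widths (4, 8 and 16 bits) are assumptions chosen so that the escape thresholds match the example values 15 and 270 in the text; the description here does not state them. The `BitReader` class and the function name are likewise hypothetical.

```python
class BitReader:
    """Reads unsigned integers, MSB first, from a string of '0'/'1' characters."""

    def __init__(self, bits: str) -> None:
        self.bits = bits
        self.pos = 0

    def read(self, width: int) -> int:
        chunk = self.bits[self.pos:self.pos + width]
        if len(chunk) != width:
            raise ValueError("bitstream too short")
        self.pos += width
        return int(chunk, 2)


def read_sac_ext_len(reader: BitReader) -> int:
    """Return the byte count signalled by the three length fields."""
    length = reader.read(4)            # bsSacExtLen (assumed 4 bits)
    if length == 15:                   # escape: the length is 15 or more
        extra = reader.read(8)         # bsSacExtLenAdd (assumed 8 bits)
        length += extra
        if extra == 255:               # escape: the length is 15 + 255 = 270 or more
            length += reader.read(16)  # bsSacExtLenAddAdd (assumed 16 bits)
    return length


print(read_sac_ext_len(BitReader("1111" + "00000011")))
```

Under these assumptions, short extensions cost only 4 bits of length signalling, and the longer fields are read only when the preceding field is saturated.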

エンコーダー／デコーダーにおいて各フィールドが決定／抽出された後、空間拡張フレームに含まれるデータタイプに関する構成情報が決定される（１００５）。 After each field is determined / extracted in the encoder / decoder, configuration information regarding the data type included in the spatial extension frame is determined (1005).

上述したように、空間拡張フレームにはレジデュアル信号データ、任意のダウンミックスレジデュアル信号データ、ツリー構成データなどが含まれ得る。 As described above, the spatial extension frame may include residual signal data, arbitrary downmix residual signal data, tree configuration data, and the like.

続けて、空間拡張構成情報の長さのうち未使用のビットの個数が算出される（１００６）。 Subsequently, the number of unused bits in the length of the spatial extension configuration information is calculated (1006).

The “bsFillBits” field 1007 indicates the number of bits of ignorable data used to fill the unused bits.

図１０Ｂと図１０Ｃは、本発明の一実施形態により、空間拡張フレームにレジデュアル信号が含まれる場合、レジデュアル信号のための空間拡張情報を示すシンタックスを示している。 10B and 10C illustrate a syntax indicating spatial extension information for a residual signal when the residual signal is included in the spatial extension frame according to an embodiment of the present invention.

図１０Ｂを参照すると、「ｂｓＲｅｓｉｄｕａｌＳａｍｐｌｉｎｇＦｒｅｑｕｅｎｃｙＩｎｄｅｘ」フィールド１００８は、レジデュアル信号のサンプリング周波数を示す。 Referring to FIG. 10B, a “bsResidualSamplingFrequencyIndex” field 1008 indicates the sampling frequency of the residual signal.

「ｂｓＲｅｓｉｄｕａｌＦｒａｍｅｓＰｅｒＳｐａｔｉａｌＦｒａｍｅ」フィールド１００９は、空間フレーム当たりのレジデュアルフレームの本数を示す。例えば、１枚の空間フレームに１枚、２枚、３枚または４枚のレジデュアルフレームが含まれうる。 The “bsResidualFramesPerSpatialFrame” field 1009 indicates the number of residual frames per spatial frame. For example, one spatial frame may include one, two, three, or four residual frames.

「ＲｅｓｉｄｕａｌＣｏｎｆｉｇ」フィールド１０１０は、各ＯＴＴ及び／またはＴＴＴボックスに適用されるレジデュアル信号に対するパラメータ帯域の個数を示す。 The “ResidualConfig” field 1010 indicates the number of parameter bands for the residual signal applied to each OTT and / or TTT box.

図１０Ｃを参照すると、「ｂｓＲｅｓｉｄｕａｌＰｒｅｓｅｎｔ」フィールド１０１１は、各ＯＴＴ及び／またはＴＴＴボックスにレジデュアル信号が適用されるか否かを示す。 Referring to FIG. 10C, a “bsResidualPresent” field 1011 indicates whether a residual signal is applied to each OTT and / or TTT box.

The “bsResidualBands” field 1012 indicates, when a residual signal is present in an OTT and/or TTT box, the number of parameter bands of the residual signal present in that OTT and/or TTT box. The number of parameter bands of the residual signal can be represented by a fixed number of bits or a variable number of bits. When it is represented by a fixed number of bits, the number of parameter bands of the residual signal can take any value up to the total number of parameter bands of the audio signal. For this reason, the number of bits needed to represent the total number of parameter bands (for example, 5 bits in FIG. 10C) can be allocated.

FIG. 10D illustrates a syntax that represents the number of parameter bands of a residual signal with a variable number of bits, according to an embodiment of the present invention. The “bsResidualBands” field 1014 can be represented by a variable number of bits using “numBands”.

If “numBands” is greater than or equal to 2^(n-1) and less than 2^n, the “bsResidualBands” field 1014 can be represented by n bits.

For example, (i) if “numBands” is 40, the “bsResidualBands” field 1014 is represented by 6 bits; (ii) if “numBands” is 28 or 20, by 5 bits; (iii) if “numBands” is 14 or 10, by 4 bits; and (iv) if “numBands” is 7, 5 or 4, by 3 bits.

If “numBands” is greater than 2^(n-1) and less than or equal to 2^n, the “bsResidualBands” field 1014 can be represented by n bits.

For example, (i) if “numBands” is 40, the “bsResidualBands” field 1014 is represented by 6 bits; (ii) if “numBands” is 28 or 20, by 5 bits; (iii) if “numBands” is 14 or 10, by 4 bits; (iv) if “numBands” is 7 or 5, by 3 bits; and (v) if “numBands” is 4, by 2 bits.
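The two range conventions above differ only at exact powers of two. A short Python sketch makes this concrete; the function names are hypothetical, and the listed “numBands” values are simply the ones used in the examples.

```python
def bits_lower_inclusive(num_bands: int) -> int:
    """n such that 2**(n-1) <= num_bands < 2**n."""
    return num_bands.bit_length()


def bits_upper_inclusive(num_bands: int) -> int:
    """n such that 2**(n-1) < num_bands <= 2**n."""
    return (num_bands - 1).bit_length()


for num_bands in (40, 28, 20, 14, 10, 7, 5, 4):
    print(num_bands, bits_lower_inclusive(num_bands), bits_upper_inclusive(num_bands))
```

Among the example values, only 4 (a power of two) receives a different width under the two conventions: 3 bits under the first and 2 bits under the second.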

The “bsResidualBands” field 1014 can also be represented by a number of bits determined by a ceiling function (rounding up to the nearest integer) that takes “numBands” as its variable.

Specifically, i) if 0 < bsResidualBands ≤ numBands or 0 ≤ bsResidualBands < numBands, the “bsResidualBands” field 1014 is represented by ceil(log₂(numBands)) bits, or ii) if 0 ≤ bsResidualBands ≤ numBands, the “bsResidualBands” field 1014 can be represented by ceil(log₂(numBands + 1)) bits.

In some embodiments, the “bsResidualBands” field 1014 can be represented using a value “numberBands” that is less than or equal to “numBands”.

Specifically, i) if 0 < bsResidualBands ≤ numberBands or 0 ≤ bsResidualBands < numberBands, the “bsResidualBands” field 1014 is represented by ceil(log₂(numberBands)) bits, or ii) if 0 ≤ bsResidualBands ≤ numberBands, the “bsResidualBands” field 1014 can be represented by ceil(log₂(numberBands + 1)) bits.

When a plurality of residual signals (N) exist, the combination of “bsResidualBands” can be expressed by the following [Equation 9]:

In this case, bsResidualBands_i denotes the i-th “bsResidualBands”. Since the meaning of [Equation 9] is the same as that of [Equation 1], a detailed description of [Equation 9] is omitted.

When a plurality of residual signals exist, the combination of “bsResidualBands” can be expressed by any one of [Equation 10] to [Equation 12], which use “numberBands”. Since representing “bsResidualBands” using “numberBands” is almost the same as in [Equation 2] to [Equation 4], a detailed description is omitted.

レジデュアル信号のパラメータ帯域の個数は「ｎｕｍＢａｎｄｓ」の分割値として表わすことができる。この場合、上述した分割値は、「ｎｕｍＢａｎｄｓ」の半分値または特定の値で「ｎｕｍＢａｎｄｓ」を割った結果値を用いる。 The number of parameter bands of the residual signal can be expressed as a divided value of “numBands”. In this case, as the above-described division value, a half value of “numBands” or a result value obtained by dividing “numBands” by a specific value is used.

レジデュアル信号はダウンミックス信号及び空間情報信号と共にオーディオ信号のビットストリームに含まれてもよく、このようなビットストリームはデコーダーに転送可能である。デコーダーは、このようなビットストリームから前記ダウンミックス信号と、空間情報信号及びレジデュアル信号を抽出することができる。 The residual signal may be included in the audio signal bit stream along with the downmix signal and the spatial information signal, and such bit stream can be transferred to the decoder. The decoder can extract the downmix signal, the spatial information signal, and the residual signal from such a bitstream.

続けて、ダウンミックス信号は空間情報を用いてアップミックスされる。一方、レジデュアル信号はアップミックスの過程においてダウンミックス信号に適用される。具体的に、ダウンミックス信号は空間情報を用いる複数のチャンネル変換モジュールにおいてアップミックスされる。このような過程において、レジデュアル信号がチャンネル変換モジュールに適用される。以上述べたように、チャンネル変換モジュールは複数のパラメータ帯域を有し、パラメータセットはタイムスロット単位でチャンネル変換モジュールに適用される。レジデュアル信号がチャンネル変換モジュールに適用される場合、レジデュアル信号が適用されるオーディオ信号のチャンネル間相関情報を更新するためにはレジデュアル信号が必要となる。このような更新されたチャンネル間相関情報はアップミキシング処理に用いられる。 Subsequently, the downmix signal is upmixed using spatial information. On the other hand, the residual signal is applied to the downmix signal in the upmix process. Specifically, the downmix signal is upmixed in a plurality of channel conversion modules using spatial information. In this process, the residual signal is applied to the channel conversion module. As described above, the channel conversion module has a plurality of parameter bands, and the parameter set is applied to the channel conversion module in units of time slots. When the residual signal is applied to the channel conversion module, the residual signal is required to update the inter-channel correlation information of the audio signal to which the residual signal is applied. Such updated inter-channel correlation information is used for the upmixing process.

FIG. 11A is a block diagram illustrating a decoder for non-guided coding according to an embodiment of the present invention. Non-guided coding means that the bitstream of the audio signal does not contain spatial information.

In some embodiments, the decoder includes an analysis filter bank 1102, an analysis unit 1104, a spatial synthesis unit 1106, and a synthesis filter bank 1108. Although FIG. 11A shows a stereo downmix signal, other types of downmix signals can be used.

In operation, the decoder receives a downmix signal 1101, and the analysis filter bank 1102 converts the received downmix signal 1101 into a frequency-domain signal 1103. The analysis unit 1104 generates spatial information from the converted downmix signal 1103. The analysis unit 1104 performs processing slot by slot and can generate spatial information 1105 once for every plurality of slots. Here, a slot includes a time slot.

空間情報は２ステップで生成可能である。第一に、ダウンミックス信号からダウンミックスパラメータが生成される。第二に、前記ダウンミックスパラメータは空間パラメータなどの空間情報に変換される。一部の実施形態において、ダウンミックスパラメータはダウンミックス信号の行列演算により生成可能である。 Spatial information can be generated in two steps. First, a downmix parameter is generated from the downmix signal. Second, the downmix parameter is converted into spatial information such as a spatial parameter. In some embodiments, the downmix parameter can be generated by matrix operation of the downmix signal.

The spatial synthesis unit 1106 combines the generated spatial information 1105 with the downmix signal 1103 to generate a multi-channel audio signal 1107. The generated multi-channel audio signal 1107 passes through the synthesis filter bank 1108 and is converted into a time-domain audio signal 1109.

空間情報は所定のスロット位置に生成可能である。このような位置間距離は同じであってもよい（すなわち、等距離）。例えば、空間情報は４個のスロットごとに生成可能である。また、空間情報は可変スロット位置に生成可能である。この場合、空間情報が生成される位置情報がビットストリームから抽出可能である。前記位置情報は可変個数のビットで表わすことができる。前記位置は以前のスロット位置情報からの絶対値及び差分値として表わすことができる。 Spatial information can be generated at a predetermined slot position. Such inter-position distances may be the same (ie, equidistant). For example, the spatial information can be generated every four slots. Spatial information can be generated at variable slot positions. In this case, position information where spatial information is generated can be extracted from the bitstream. The position information can be represented by a variable number of bits. The position can be expressed as an absolute value and a difference value from previous slot position information.

When non-guided coding is used, the number of parameter bands for each channel of the audio signal (hereinafter “bsNumguidedBlindBands”) can be represented by a fixed number of bits. “bsNumguidedBlindBands” can also be represented by a variable number of bits using “numBands”. For example, if “numBands” is greater than or equal to 2^(n-1) and less than 2^n, “bsNumguidedBlindBands” can be represented by n bits.

Specifically, (a) if “numBands” is 40, “bsNumguidedBlindBands” is represented by 6 bits; (b) if “numBands” is 28 or 20, by 5 bits; (c) if “numBands” is 14 or 10, by 4 bits; and (d) if “numBands” is 7, 5 or 4, by 3 bits.

If “numBands” is greater than 2^(n-1) and less than or equal to 2^n, “bsNumguidedBlindBands” can be represented by n bits.

For example, (a) if “numBands” is 40, “bsNumguidedBlindBands” is represented by 6 bits; (b) if “numBands” is 28 or 20, by 5 bits; (c) if “numBands” is 14 or 10, by 4 bits; (d) if “numBands” is 7 or 5, by 3 bits; and (e) if “numBands” is 4, by 2 bits.

“bsNumguidedBlindBands” can also be represented by a variable number of bits using a ceiling function that takes “numBands” as its variable.

For example, i) if 0 < bsNumguidedBlindBands ≤ numBands or 0 ≤ bsNumguidedBlindBands < numBands, “bsNumguidedBlindBands” is represented by ceil(log₂(numBands)) bits, or ii) if 0 ≤ bsNumguidedBlindBands ≤ numBands, “bsNumguidedBlindBands” can be represented by ceil(log₂(numBands + 1)) bits.

If a value less than or equal to “numBands”, namely “numberBands”, is chosen arbitrarily, “bsNumguidedBlindBands” can be represented as follows.

Specifically, i) if 0 < bsNumguidedBlindBands ≤ numberBands or 0 ≤ bsNumguidedBlindBands < numberBands, “bsNumguidedBlindBands” is represented by ceil(log₂(numberBands)) bits, or ii) if 0 ≤ bsNumguidedBlindBands ≤ numberBands, “bsNumguidedBlindBands” can be represented by ceil(log₂(numberBands + 1)) bits.

When a plurality of channels (N) are used, the combination of “bsNumguidedBlindBands” can be expressed by the following [Equation 13]:

Here, “bsNumGuidedBlindBands_i” denotes the i-th “bsNumguidedBlindBands”. Since the meaning of [Equation 13] is the same as that of [Equation 1], a detailed description of [Equation 13] is omitted.

When a plurality of channels exist, “bsNumguidedBlindBands” can be expressed by any one of [Equation 14] to [Equation 16], which use “numberBands”. Since the representation of “bsNumguidedBlindBands” using “numberBands” is the same as that of [Equation 2] to [Equation 4], a detailed description of [Equation 14] to [Equation 16] is omitted.

FIG. 11B illustrates a method of representing the numbers of parameter bands as groups according to an embodiment of the present invention. The number of parameter bands includes number information of the parameter bands applied to a channel conversion module, number information of the parameter bands applied to a residual signal, and number information of the parameter bands for each channel of the audio signal when non-guided coding is used. When a plurality of pieces of parameter band number information exist, these pieces (for example, “bsOttBands”, “bsTttBands”, “bsResidualBands” and/or “bsNumguidedBlindBands”) can be represented as at least one group.

Referring to FIG. 11B, when there are (kN + L) pieces of parameter band number information and Q bits are required to represent each piece, the pieces of parameter band number information can be represented in the following groups. Here, “k” and “N” are arbitrary nonzero integers, and “L” is an arbitrary integer satisfying 0 ≤ L < N.

The grouping method includes a step of bundling the pieces of parameter band number information N at a time to generate k groups, and a step of bundling the last L pieces of parameter band number information to generate a final group. Each of the k groups can be represented by M bits, and the final group can be represented by P bits. Here, M is preferably smaller than the N*Q bits that would be used to represent each piece of parameter band number information without grouping. P is preferably less than or equal to the L*Q bits that would be used to represent each piece of parameter band number information without grouping.
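The grouping step can be sketched in Python as follows. The sketch assumes that every piece of number information can take the same count of distinct values (`num_values`, a hypothetical parameter not named in the text), so that M and P are the smallest widths able to index all combinations within a group. The function names are also hypothetical.

```python
def split_into_groups(items: list, n: int) -> tuple:
    """Split kN + L items into k groups of n items and a final group of L items."""
    if n <= 0:
        raise ValueError("n must be positive")
    k = len(items) // n
    groups = [items[i * n:(i + 1) * n] for i in range(k)]
    final_group = items[k * n:]
    return groups, final_group


def bits_for(num_values: int, count: int) -> int:
    """Smallest width that indexes every combination of `count` items."""
    return (num_values ** count - 1).bit_length()


def total_bits(num_items: int, n: int, num_values: int) -> int:
    """k * M + P for num_items = k * n + L pieces of number information."""
    k, last = divmod(num_items, n)
    return k * bits_for(num_values, n) + bits_for(num_values, last)


groups, final_group = split_into_groups(list(range(7)), 3)
print(groups, final_group)
print(total_bits(7, 3, 5), 7 * bits_for(5, 1))  # grouped vs. ungrouped
```

With seven pieces of five values each and N = 3, this gives k = 2 and L = 1, so M = 7 is smaller than N*Q = 9 and P = 3 equals L*Q = 3.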

For example, assume that two pieces of parameter band number information are b1 and b2. If b1 and b2 can each take 5 values, 3 bits are needed to represent each of them. Although 3 bits can represent 8 values, only 5 values are actually needed, so b1 and b2 each leave 3 code values unused. If b1 and b2 are bundled and represented as a group, however, 5 bits are used instead of 6 bits (= 3 bits + 3 bits). Specifically, since b1 and b2 have 25 (= 5*5) combinations in total, the group of b1 and b2 can be represented by 5 bits. Since 5 bits can represent 32 values, the grouped representation leaves 7 code values unused. This redundancy is smaller than the redundancy that results when b1 and b2 are each represented by 3 bits. The method of representing a plurality of pieces of parameter band number information as a group can be implemented in various ways, as follows.
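One way to realize the b1/b2 grouping is to treat the values as digits of a mixed-radix number. This indexing scheme is an assumption made for illustration; the text only requires that each of the 25 combinations receive a distinct 5-bit code. The function names `pack` and `unpack` are hypothetical.

```python
def pack(values: list, radix: int) -> int:
    """Combine values, each in range(radix), into one group code."""
    code = 0
    for value in values:
        if not 0 <= value < radix:
            raise ValueError("value out of range")
        code = code * radix + value
    return code


def unpack(code: int, radix: int, count: int) -> list:
    """Recover `count` values from a group code produced by pack()."""
    values = []
    for _ in range(count):
        code, value = divmod(code, radix)
        values.append(value)
    return values[::-1]


code = pack([3, 4], 5)          # b1 = 3, b2 = 4, five possible values each
print(code, format(code, "05b"), unpack(code, 5, 2))
```

The largest possible code is 24, which fits in 5 bits, so the pair costs 5 bits rather than 6.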

複数のパラメータ帯域の個数情報がそれぞれ４０種類の値を有する場合、Ｎとして２、３、４、５または６を用いてｋ個のグループが生成される。これらのｋ個のグループはそれぞれ１１ビット、１６ビット、２２ビット、２７ビット、３２ビットとして表わすことができる。あるいは、これらのｋ個のグループは各場合を組み合わせて表わされる。 When the number information of the plurality of parameter bands has 40 types of values, k groups are generated by using 2, 3, 4, 5 or 6 as N. These k groups can be represented as 11 bits, 16 bits, 22 bits, 27 bits, and 32 bits, respectively. Alternatively, these k groups are represented by combining each case.

When each piece of number information of the plurality of parameter bands has 28 possible values, k groups are generated using 6 as N. These k groups can be represented by 29 bits.

複数のパラメータ帯域の個数情報がそれぞれ２０種類の値を有する場合、Ｎとして２、３、４、５、６または７を用いてｋ個のグループが生成される。これらのｋ個のグループはそれぞれ９ビット、１３ビット、１８ビット、２２ビット、２６ビット及び３１ビットとして表わすことができる。あるいは、これらのｋ個のグループは各場合を組み合わせて表わすことができる。 When the number information of the plurality of parameter bands has 20 types of values, k groups are generated using 2, 3, 4, 5, 6 or 7 as N. These k groups can be represented as 9 bits, 13 bits, 18 bits, 22 bits, 26 bits and 31 bits, respectively. Alternatively, these k groups can be represented by combining each case.

複数のパラメータ帯域の個数情報がそれぞれ１４種類の値を有する場合、Ｎとして６を用いてｋ個のグループが生成される。これらのｋ個のグループは２３ビットで表わすことができる。 When the number information of the plurality of parameter bands has 14 types of values, k groups are generated using 6 as N. These k groups can be represented by 23 bits.

複数のパラメータ帯域の個数情報がそれぞれ１０種類の値を有する場合、Ｎとして２、３、４、５、６、７、８または９を用いてｋ個のグループが生成される。これらのｋ個のグループはそれぞれ７、１０、１４、１７、２０、２４、２７及び３０ビットで表わすことができる。あるいは、これらのｋ個のグループは各場合を組み合わせて表わすことができる。 When the number information of a plurality of parameter bands has 10 types of values, k groups are generated using 2, 3, 4, 5, 6, 7, 8 or 9 as N. These k groups can be represented by 7, 10, 14, 17, 20, 24, 27 and 30 bits, respectively. Alternatively, these k groups can be represented by combining each case.

複数のパラメータ帯域の個数情報がそれぞれ７種類の値を有する場合、Ｎとして６、７、８、９、１０または１１を用いてｋ個のグループが生成される。これらのｋ個のグループはそれぞれ１７、２０、２３、２６、２９及び３１ビットで表わすことができる。あるいは、これらのｋ個のグループは各場合を組み合わせて表わすことができる。 When the number information of the plurality of parameter bands has 7 types of values, k groups are generated using 6, 7, 8, 9, 10 or 11 as N. These k groups can be represented by 17, 20, 23, 26, 29 and 31 bits, respectively. Alternatively, these k groups can be represented by combining each case.

When each piece of number information of the plurality of parameter bands has 5 possible values, k groups can be generated using 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or 13 as N. These k groups can be represented by 5, 7, 10, 12, 14, 17, 19, 21, 24, 26, 28 and 31 bits, respectively. Alternatively, these k groups can be represented by combining each case.
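The group widths listed in the cases above follow from ceil(log₂(V^N)), where V is the number of values each piece can take and N is the group size. The Python sketch below recomputes several of them with integer arithmetic; the function name and the table layout are illustrative only.

```python
def group_bits(num_values: int, group_size: int) -> int:
    """ceil(log2(num_values ** group_size)), computed without floating point."""
    return (num_values ** group_size - 1).bit_length()


# Number of values per piece -> group sizes N taken from the cases above.
cases = {
    40: range(2, 7),
    28: [6],
    20: range(2, 8),
    14: [6],
    10: range(2, 10),
    7: range(6, 12),
}

for num_values, sizes in cases.items():
    print(num_values, [group_bits(num_values, n) for n in sizes])
```

Every width produced for these cases is at most 32 bits, so a group code fits in a single 32-bit word.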

また、複数のパラメータ帯域の個数情報は、上述したグループとして表わされるように構成可能であるか、または、パラメータ帯域の個数情報のそれぞれを独立したビットシーケンスとして連続して表わされるように構成可能である。 Further, the number information of a plurality of parameter bands can be configured to be expressed as the above-described group, or the number information of parameter bands can be configured to be expressed continuously as independent bit sequences. is there.

FIG. 12 illustrates a syntax indicating configuration information of a spatial frame according to an embodiment of the present invention. The spatial frame includes a “FramingInfo” block 1201, a “bsIndependencyFlag” field 1202, an “OttData” block 1203, a “TttData” block 1204, an “SmgData” block 1205, and a “TempShapeData” block 1206.

「ＦｒａｍｉｎｇＩｎｆｏ」ブロック１２０１は、パラメータセットの個数に関する情報と、各パラメータが適用されるタイムスロットに関する情報と、を含む。「ＦｒａｍｉｎｇＩｎｆｏ」ブロック１２０１については、図１３Ａに基づき詳述する。 The “FramingInfo” block 1201 includes information regarding the number of parameter sets and information regarding time slots to which each parameter is applied. The “FramingInfo” block 1201 will be described in detail with reference to FIG. 13A.

「ｂｓＩｎｄｅｐｅｎｄｅｎｃｙＦｌａｇ」フィールド１２０２は、現在のフレームが以前フレームに関する知識無しにデコーディング可能であるか否かを示す。 The “bsIndependencyFlag” field 1202 indicates whether the current frame can be decoded without knowledge about the previous frame.

The “OttData” block 1203 contains all spatial parameter information for all OTT boxes.

The “TttData” block 1204 contains all spatial parameter information for all TTT boxes.

The “SmgData” block 1205 contains information on the temporal smoothing applied to the dequantized spatial parameters.

The “TempShapeData” block 1206 contains information on the temporal envelope shaping applied to the decorrelated signal.

図１３Ａは、本発明の一実施形態により、パラメータセットが適用されるタイムスロット位置情報を示すシンタックスを示している。「ｂｓＦｒａｍｉｎｇＴｙｐｅ」フィールド１３０１はオーディオ信号の空間フレームが固定フレームタイプであるか、あるいは、可変フレームタイプであるかを示す。固定フレームとは、予め設定されたタイムスロットにパラメータセットが適用されるフレームのことを言う。例えば、等間隔にて予め設定されたタイムスロットにパラメータセットが適用される。可変フレームとは、パラメータセットが適用されるタイムスロットの位置情報を別途に受信するフレームのことを言う。 FIG. 13A shows a syntax indicating time slot position information to which a parameter set is applied according to an embodiment of the present invention. A “bsFramingType” field 1301 indicates whether the spatial frame of the audio signal is a fixed frame type or a variable frame type. A fixed frame refers to a frame in which a parameter set is applied to a preset time slot. For example, the parameter set is applied to time slots set in advance at regular intervals. A variable frame refers to a frame that separately receives time slot position information to which a parameter set is applied.

The “bsNumParamSets” field 1302 indicates the number of parameter sets in one spatial frame (hereinafter “numParamSets”), and the relation “numParamSets = bsNumParamSets + 1” holds between “numParamSets” and “bsNumParamSets”.

For example, if 3 bits are allocated to the “bsNumParamSets” field 1302 of FIG. 13A, up to 8 parameter sets can be provided in one spatial frame. Since the number of allocated bits is not limited, more parameter sets can be provided in a spatial frame.

When the spatial frame is of the fixed frame type, the position information of the time slots to which the parameter sets are applied can be determined by a preset rule, so no additional position information is needed. When the spatial frame is of the variable frame type, however, position information of the time slots to which the parameter sets are applied is required.
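For the fixed frame type, the text says only that slot positions follow a preset rule, for example equal intervals. The Python sketch below shows one possible equal-interval rule. The specific formula is an assumption made for illustration and is not given in this description; the function names are hypothetical.

```python
def num_param_sets(bs_num_param_sets: int) -> int:
    """numParamSets = bsNumParamSets + 1, so a 3-bit field yields 1..8 sets."""
    return bs_num_param_sets + 1


def evenly_spaced_slots(num_slots: int, num_sets: int) -> list:
    """Assumed rule: slot index ceil(num_slots * (ps + 1) / num_sets) - 1."""
    # -(-a // b) is ceiling division for positive integers a and b.
    return [-(-num_slots * (ps + 1) // num_sets) - 1 for ps in range(num_sets)]


print(num_param_sets(0b111))
print(evenly_spaced_slots(32, 4))
print(evenly_spaced_slots(16, 3))
```

Because both encoder and decoder can evaluate such a rule from the slot count and the number of parameter sets alone, no slot positions need to be transmitted for fixed frames.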

「ｂｓＰａｒａｍＳｌｏｔ」フィールド１３０３は、パラメータセットが適用されるタイムスロットの位置情報を示す。「ｂｓＰａｒａｍＳｌｏｔ」フィールド１３０３は、１枚の空間フレーム内におけるタイムスロットの個数、すなわち、「ｎｕｍＳｌｏｔｓ」を用いて可変個数のビットで表わすことができる。具体的に、「ｎｕｍＳｌｏｔｓ」が２＾（ｎ−１）以上２＾（ｎ）未満の範囲に収まると、「ｂｓＰａｒａｍＳｌｏｔ」フィールド１３０３はｎビットで表わすことができる。 The “bsParamSlot” field 1303 indicates time slot position information to which the parameter set is applied. The “bsParamSlot” field 1303 can be represented by a variable number of bits using the number of time slots in one spatial frame, that is, “numSlots”. Specifically, when “numSlots” falls within the range of 2 ^ (n−1) or more and less than 2 ^ (n), the “bsParamSlot” field 1303 can be represented by n bits.

For example, the "bsParamSlot" field 1303 can be represented by (i) 7 bits if "numSlots" is in the range 64 to 127, (ii) 6 bits if "numSlots" is in the range 32 to 63, (iii) 5 bits if "numSlots" is in the range 16 to 31, (iv) 4 bits if "numSlots" is in the range 8 to 15, (v) 3 bits if "numSlots" is in the range 4 to 7, (vi) 2 bits if "numSlots" is 2 or 3, (vii) 1 bit if "numSlots" is 1, and (viii) 0 bits if "numSlots" is 0.
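The rule above (n bits when 2^(n-1) <= numSlots < 2^n) can be sketched as follows. This is an illustrative sketch only; the function name `bs_param_slot_bits` is an assumption and does not appear in the specification.

```python
def bs_param_slot_bits(num_slots: int) -> int:
    """Return n such that 2**(n-1) <= num_slots < 2**n (0 when num_slots is 0).

    Hypothetical helper: Python's int.bit_length() happens to implement
    exactly the range rule stated for the "bsParamSlot" field.
    """
    if num_slots < 0:
        raise ValueError("num_slots must be non-negative")
    return num_slots.bit_length()


if __name__ == "__main__":
    # Print the field width at the boundaries of each range listed in the text.
    for num_slots in (0, 1, 2, 3, 4, 7, 8, 15, 16, 31, 32, 63, 64, 127):
        print(num_slots, bs_param_slot_bits(num_slots))
```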

When there are a plurality of parameter sets (N), the combination of the "bsParamSlot" values can be expressed by [Equation 17].

Here, "bsParamSlot_i" denotes the time slot to which the i-th parameter set is applied. For example, assume that "numSlots" is 3 and that the "bsParamSlot" field 1303 can take ten values. In this case, three pieces of information for the "bsParamSlot" field 1303 (hereinafter c1, c2 and c3) are needed. Since 4 bits are needed to represent each of c1, c2 and c3, a total of 12 bits is required. If c1, c2 and c3 are bundled and represented as a group, 1,000 (= 10*10*10) cases are possible, which can be represented by 10 bits, saving 2 bits. If "numSlots" is 3 and the group value represented by 5 bits is 31, the group value can be written as 31 = 1*(3^2) + 5*(3^1) + 7*(3^0). The decoder can therefore determine c1, c2 and c3 to be 1, 5 and 7, respectively, by applying the inverse of [Equation 9].
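The grouping idea described above can be sketched as a positional (radix) code. This is a minimal sketch under stated assumptions: the names `pack_group` and `unpack_group` are hypothetical, and every value is assumed to be smaller than the radix, which is what makes the inverse unique. It uses the ten-value case from the text (radix 10, three values, 1,000 combinations in 10 bits).

```python
def pack_group(values, radix):
    """Combine several values, each in [0, radix), into one group value."""
    if any(not 0 <= v < radix for v in values):
        raise ValueError("every value must satisfy 0 <= v < radix")
    group = 0
    for v in values:
        group = group * radix + v  # the first value ends up most significant
    return group


def unpack_group(group, radix, count):
    """Invert pack_group: recover `count` values from one group value."""
    values = []
    for _ in range(count):
        group, digit = divmod(group, radix)
        values.append(digit)
    return values[::-1]  # digits were extracted least significant first


if __name__ == "__main__":
    g = pack_group([1, 5, 7], radix=10)
    print(g, unpack_group(g, radix=10, count=3))
    # Largest group value for three ten-valued fields and its bit width.
    print((10 ** 3 - 1).bit_length())
```

Packing three ten-valued fields this way needs 10 bits for the largest group value (999), against 3 * 4 = 12 bits when each field is coded separately.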

FIG. 13B shows a syntax representing, as an absolute value and difference values, the position information of the time slots to which parameter sets are applied, according to an embodiment of the present invention. If the spatial frame is of the variable frame type, the "bsParamSlot" field 1303 of FIG. 13A can be represented as an absolute value and difference values by exploiting the fact that the "bsParamSlot" information increases monotonically.

For example, (i) the position of the time slot to which the first parameter set is applied can be generated as an absolute value, i.e., "bsParamSlot[0]", and (ii) the position of a time slot to which the second or a later parameter set is applied can be generated as a difference value, i.e., the "difference value" or the "difference value - 1" between "bsParamSlot[ps]" and "bsParamSlot[ps-1]" (hereinafter "bsDiffParamSlot[ps]"). Here, "ps" denotes a parameter set.

The "bsParamSlot[0]" field 1304 can be represented by a number of bits calculated from "numSlots" and "numParamSets" (hereinafter "nBitsParamSlot(0)").

The "bsDiffParamSlot[ps]" field 1305 can be represented by a number of bits calculated from "numSlots", "numParamSets" and the position of the time slot to which the previous parameter set is applied (hereinafter "nBitsParamSlot(ps)").

Specifically, in order to represent "bsParamSlot[ps]" with the minimum number of bits, the number of bits representing "bsParamSlot[ps]" can be determined by the following rules: (i) the "bsParamSlot[ps]" values increase in ascending order (bsParamSlot[ps] > bsParamSlot[ps-1]); (ii) the maximum value of "bsParamSlot[0]" is "numSlots - numParamSets"; and (iii) for 0 < ps < numParamSets, "bsParamSlot[ps]" can only take a value between "bsParamSlot[ps-1] + 1" and "numSlots - numParamSets + ps".

For example, if "numSlots" is 10 and "numParamSets" is 3, the maximum value of "bsParamSlot[0]" is "10 - 3 = 7" because the "bsParamSlot[ps]" values increase in ascending order. That is, "bsParamSlot[0]" must be selected from the values 0 to 7. This is because, if "bsParamSlot[0]" had a value greater than 7, there would not be enough time slots left for the remaining parameter sets.

If "bsParamSlot[0]" is 5, the time slot position bsParamSlot[1] for the second parameter set must be selected from the values between "5 + 1 = 6" and "10 - 3 + 1 = 8".

If "bsParamSlot[1]" is 7, "bsParamSlot[2]" can be 8 or 9. If "bsParamSlot[1]" is 8, "bsParamSlot[2]" can only be 9.
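The three range rules above can be checked with a short sketch. The helper name `allowed_slots` is an assumption introduced only for illustration; it reproduces the worked example with numSlots = 10 and numParamSets = 3.

```python
def allowed_slots(ps, prev_slot, num_slots, num_param_sets):
    """Return the range of slot positions allowed for parameter set `ps`.

    Rule (ii): bsParamSlot[0] is at most numSlots - numParamSets.
    Rule (iii): for ps > 0 the value lies between bsParamSlot[ps-1] + 1
    and numSlots - numParamSets + ps (both inclusive).
    `prev_slot` is ignored for ps == 0.
    """
    low = 0 if ps == 0 else prev_slot + 1
    high = num_slots - num_param_sets + ps
    return range(low, high + 1)


if __name__ == "__main__":
    print(list(allowed_slots(0, None, 10, 3)))  # first parameter set
    print(list(allowed_slots(1, 5, 10, 3)))     # second set after slot 5
    print(list(allowed_slots(2, 7, 10, 3)))     # third set after slot 7
    print(list(allowed_slots(2, 8, 10, 3)))     # third set after slot 8
```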

Therefore, instead of being represented by a fixed number of bits, "bsParamSlot[ps]" can be represented by a variable number of bits using the above properties.

When "bsParamSlot[ps]" is written to the bitstream, if "ps" is 0, "bsParamSlot[0]" can be represented as an absolute value using the number of bits given by "nBitsParamSlot(0)". If "ps" is greater than 0, "bsParamSlot[ps]" can be represented as a difference value using the number of bits given by "nBitsParamSlot(ps)". When "bsParamSlot[ps]" constructed in this way is read from the bitstream, the bit length of each data item, i.e., "nBitsParamSlot[ps]", can be expressed using [Equation 10].

Specifically, for ps = 0, nBitsParamSlot[0] = f_b(numSlots - numParamSets + 1). For 0 < ps < numParamSets, nBitsParamSlot[ps] = f_b(numSlots - numParamSets + ps - bsParamSlot[ps-1]). "nBitsParamSlot[ps]" can be determined using [Equation 19], which extends [Equation 18] up to 7 bits.

An example of the function f_b(x) is as follows. If "numSlots" is 15 and "numParamSets" is 3, then nBitsParamSlot[0] = f_b(15 - 3 + 1) = f_b(13), which is 4 bits when f_b(x) is taken to be ceil(log2(x)). If the value of "bsParamSlot[0]" is 7, the function gives nBitsParamSlot[1] = f_b(15 - 3 + 1 - 7) = 3 bits. In this case, the "bsDiffParamSlot[1]" field 1305 can be represented by 3 bits.

If the value represented by these 3 bits is 3, "bsParamSlot[1]" is 7 + 3 = 10 (here the field carries the plain difference value). Consequently, nBitsParamSlot[2] = f_b(15 - 3 + 2 - 10) = 2 bits. In this case, the "bsDiffParamSlot[2]" field 1305 can be represented by 2 bits. If the number of remaining time slots equals the number of remaining parameter sets, 0 bits are allocated to the "bsDiffParamSlot[ps]" field. In other words, no additional information is needed to represent the position of the time slot to which the parameter set is applied.

The number of bits for "bsParamSlot[ps]" can thus be determined variably. The decoder can use the function f_b(x) to determine how many bits to read from the bitstream for "bsParamSlot[ps]". In some embodiments, the function f_b(x) can include the function ceil(log2(x)).
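The bit-width calculation can be sketched as follows, assuming f_b(x) = ceil(log2(x)) as mentioned for some embodiments. The function names are hypothetical; the integer expression `(x - 1).bit_length()` is used in place of a floating-point logarithm and is equal to ceil(log2(x)) for x >= 1.

```python
def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1, computed with integer arithmetic."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (x - 1).bit_length()


def n_bits_param_slot(ps, num_slots, num_param_sets, prev_slot=None):
    """Number of bits used for bsParamSlot[0] or bsDiffParamSlot[ps].

    `prev_slot` is bsParamSlot[ps-1] and is only needed when ps > 0.
    """
    if ps == 0:
        return f_b(num_slots - num_param_sets + 1)
    return f_b(num_slots - num_param_sets + ps - prev_slot)


if __name__ == "__main__":
    # Values from the worked example: numSlots = 15, numParamSets = 3.
    print(n_bits_param_slot(0, 15, 3))                # first parameter set
    print(n_bits_param_slot(1, 15, 3, prev_slot=7))   # after bsParamSlot[0] = 7
    print(n_bits_param_slot(2, 15, 3, prev_slot=10))  # after bsParamSlot[1] = 10
    print(n_bits_param_slot(2, 15, 3, prev_slot=13))  # one slot left for one set
```

The last call illustrates the zero-bit case: when only as many slots remain as parameter sets, the argument of f_b is 1 and no bits are spent.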

When the decoder reads the "bsParamSlot[ps]" information represented as an absolute value and difference values from the bitstream, "bsParamSlot[0]" is read first, and then "bsDiffParamSlot[ps]" is read for 0 < ps < numParamSets. "bsParamSlot[ps]" for 0 <= ps < numParamSets can then be obtained from "bsParamSlot[0]" and "bsDiffParamSlot[ps]". For example, as shown in FIG. 13B, "bsParamSlot[ps]" can be obtained by adding "bsDiffParamSlot[ps] + 1" to "bsParamSlot[ps-1]".
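The write and read procedure can be illustrated with a round-trip sketch. It is not the syntax of FIG. 13B itself: the bitstream is modelled as a Python string of "0" and "1" characters, the names `encode_slots` and `decode_slots` are hypothetical, f_b(x) is assumed to be ceil(log2(x)), and the difference field is assumed to carry "difference value - 1" so that bsParamSlot[ps] = bsParamSlot[ps-1] + bsDiffParamSlot[ps] + 1.

```python
def f_b(x: int) -> int:
    """ceil(log2(x)) for x >= 1 (assumed form of f_b)."""
    return (x - 1).bit_length()


def encode_slots(slots, num_slots):
    """Write bsParamSlot[0] as an absolute value and the rest as differences."""
    n = len(slots)
    bits, prev = "", None
    for ps, slot in enumerate(slots):
        if ps == 0:
            count, value = num_slots - n + 1, slot
        else:
            count, value = num_slots - n + ps - prev, slot - prev - 1
        if not 0 <= value < count:
            raise ValueError("slot positions violate the range rules")
        width = f_b(count)
        if width:  # a zero-width field contributes no bits
            bits += format(value, "0{}b".format(width))
        prev = slot
    return bits


def decode_slots(bits, num_slots, num_param_sets):
    """Read the fields back, deriving each width from the previous slot."""
    pos, slots = 0, []
    for ps in range(num_param_sets):
        if ps == 0:
            width = f_b(num_slots - num_param_sets + 1)
        else:
            width = f_b(num_slots - num_param_sets + ps - slots[-1])
        value = int(bits[pos:pos + width], 2) if width else 0
        pos += width
        slots.append(value if ps == 0 else slots[-1] + value + 1)
    return slots


if __name__ == "__main__":
    stream = encode_slots([5, 7, 9], num_slots=10)
    print(stream, decode_slots(stream, num_slots=10, num_param_sets=3))
```

Because each field width depends only on values already decoded, the decoder needs no explicit length information in the bitstream.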

FIG. 13C shows a syntax representing, as groups, the position information of the time slots to which parameter sets are applied, according to an embodiment of the present invention. When there are a plurality of parameter sets, the plurality of "bsParamSlots" 1307 for those parameter sets can be represented as at least one group.

If the number of "bsParamSlots" 1307 is (kN + L) and Q bits are needed to represent each of the "bsParamSlots" 1307, the "bsParamSlots" 1307 can be represented as the following groups. Here, "k" and "N" are arbitrary non-zero integers, and "L" is an arbitrary integer satisfying 0 <= L < N.

The grouping method includes a step of bundling the "bsParamSlots" 1307 N at a time to generate k groups, and a step of bundling the last L "bsParamSlots" 1307 to generate a final group. Each of the k groups can be represented by M bits, and the final group can be represented by P bits. Here, M is preferably smaller than the N*Q bits used when each of the "bsParamSlots" 1307 is represented without grouping. P is preferably less than or equal to the L*Q bits used when each of the "bsParamSlots" 1307 is represented without grouping.

For example, assume that a pair of "bsParamSlots" 1307 for two parameter sets are d1 and d2, respectively. If d1 and d2 can each take five values, 3 bits are needed to represent each of them. Although 3 bits can represent eight values, only five are actually needed, so each of d1 and d2 leaves three code values unused. If d1 and d2 are bundled and represented as a group, however, 5 bits are used instead of 6 bits (= 3 bits + 3 bits). Specifically, since there are 25 (= 5*5) combinations of d1 and d2, the group of d1 and d2 can be represented by 5 bits. Since 5 bits can represent 32 values, the grouped representation leaves 7 code values unused. This redundancy is smaller than when d1 and d2 are each represented by 3 bits, where 64 - 25 = 39 of the 6-bit combinations are unused.
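The saving from grouping can be computed with a small sketch. The helper names are assumptions; the sketch only compares the number of bits needed for separately coded fields with the number needed for one group covering all combinations.

```python
def bits_for(count: int) -> int:
    """Smallest number of bits that can distinguish `count` different values."""
    return (count - 1).bit_length()


def grouping_cost(num_values: int, group_size: int):
    """Return (bits when coded separately, bits when coded as one group)."""
    separate = group_size * bits_for(num_values)
    grouped = bits_for(num_values ** group_size)
    return separate, grouped


if __name__ == "__main__":
    print(grouping_cost(5, 2))    # d1, d2 with five values each
    print(grouping_cost(10, 3))   # c1, c2, c3 with ten values each
    # Unused code values: grouped pair versus two separate 3-bit fields.
    print(2 ** 5 - 5 * 5, 2 ** 6 - 5 * 5)
```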

When a group is formed, the data for the group can be constructed using "bsParamSlot[0]" for the initial value and the difference values between successive "bsParamSlot[ps]" values for the second and later values.

When a group is formed, if the number of parameter sets is 1, bits can be allocated directly without grouping, and if the number of parameter sets is 2 or more, bits can be allocated after grouping is completed.

FIG. 14 is a flowchart of an encoding method according to an embodiment of the present invention. The audio signal encoding method and the operation of the encoder according to the present invention are described below.

First, the total number of time slots (numSlots) in one spatial frame and the total number of parameter bands (numBands) of the audio signal are determined (S1401).

Then, the number of parameters applied to the channel conversion module and/or the residual signal is determined (S1402).

If the OTT box has the LFE channel mode, the number of parameter bands applied to the OTT box is determined separately.

If the OTT box does not have the LFE channel mode, "numBands" is used as the number of parameter bands applied to the OTT box.

Next, the type of the spatial frame is determined. Spatial frames can be broadly divided into a fixed frame type and a variable frame type.

If the spatial frame is of the variable frame type (S1403), the number of parameter sets used in one spatial frame is determined (S1406). Here, a parameter set can be applied to the channel conversion module on a per-time-slot basis.

Next, the positions of the time slots to which the parameter sets are applied are determined (S1407). These positions can be represented as an absolute value and difference values. For example, the position of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position of a time slot to which the second or a later parameter set is applied can be represented as a difference value from the position of the previous time slot. In this case, the positions of the time slots to which the parameter sets are applied can be represented by a variable number of bits.

Specifically, the position of the time slot to which the first parameter set is applied can be represented by a number of bits calculated from the total number of time slots and the total number of parameter sets. The position of a time slot to which the second or a later parameter set is applied can be represented by a number of bits calculated from the total number of time slots, the total number of parameter sets, and the position of the time slot to which the previous parameter set is applied.

If the spatial frame is of the fixed frame type, the number of parameter sets used in one spatial frame is determined (S1404). In this case, the positions of the time slots to which the parameter sets are applied are determined using a preset rule. For example, the position of the time slot to which a parameter set is applied can be determined so as to be equally spaced from the position of the time slot to which the previous parameter set is applied (S1405).
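The text does not state the preset rule used for fixed frames. Purely as an illustration of equal spacing, the sketch below assumes one possible rule, slot[ps] = ceil(numSlots * (ps + 1) / numParamSets) - 1; both this formula and the name `fixed_frame_slots` are assumptions, not taken from the specification.

```python
def fixed_frame_slots(num_slots: int, num_param_sets: int):
    """Spread parameter sets over the frame at (nearly) equal intervals.

    Hypothetical rule: slot[ps] = ceil(num_slots * (ps + 1) / num_param_sets) - 1,
    so the last parameter set always lands on the last time slot.
    """
    if not 1 <= num_param_sets <= num_slots:
        raise ValueError("need 1 <= num_param_sets <= num_slots")
    slots = []
    for ps in range(num_param_sets):
        numerator = num_slots * (ps + 1)
        ceil_div = -(-numerator // num_param_sets)  # integer ceiling division
        slots.append(ceil_div - 1)
    return slots


if __name__ == "__main__":
    print(fixed_frame_slots(16, 4))
    print(fixed_frame_slots(10, 3))
```

Because encoder and decoder can both evaluate such a rule from numSlots and numParamSets alone, no slot positions need to be transmitted for fixed frames.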

Next, the downmixing unit and the spatial information generating unit generate a downmix signal and spatial information, respectively, using the previously determined total number of time slots, the total number of parameter bands, the number of parameter bands to be applied to the channel conversion unit, the total number of parameter sets in one spatial frame, and the position information of the time slots to which the parameter sets are applied (S1408).

Finally, the multiplexing unit generates a bitstream including the downmix signal and the spatial information and transfers the generated bitstream to the decoder (S1409).

FIG. 15 is a flowchart of a decoding method according to an embodiment of the present invention. The audio signal decoding method and the operation of the decoder according to the present invention are described below.

First, the decoder receives a bitstream of an audio signal (S1501). The demultiplexing unit separates a downmix signal and a spatial information signal from the received bitstream (S1502). Next, the spatial information signal decoding unit extracts, from the configuration information of the spatial information signal, information on the total number of time slots in one spatial frame, the total number of parameter bands, and the number of parameter bands applied to the channel conversion module (S1503).

If the spatial frame is of the variable frame type (S1504), the number of parameter sets in one spatial frame and the position information of the time slots to which the parameter sets are applied are extracted from the spatial frame (S1505). The position information of the time slots can be represented by a fixed number of bits or a variable number of bits. In this case, the position information of the time slot to which the first parameter set is applied can be represented as an absolute value, and the position information of a time slot to which the second or a later parameter set is applied can be represented as a difference value. The actual position of a time slot to which the second or a later parameter set is applied is obtained by adding the difference value to the position of the time slot to which the previous parameter set is applied.

Finally, the downmix signal is converted into a multi-channel audio signal using the extracted information (S1506).

The embodiments described in this specification provide several advantages over conventional audio coding schemes.

First, in the coding of a multi-channel audio signal, the amount of transferred data can be reduced by representing the positions of the time slots to which parameter sets are applied with a variable number of bits.

Second, the amount of transferred data can be reduced by representing the position of the time slot to which the first parameter set is applied as an absolute value and the positions of the time slots to which the second and later parameter sets are applied as difference values.

Third, the amount of transferred data can be reduced by representing the number of parameter bands applied to an OTT box and/or a TTT box with a fixed number of bits or a variable number of bits. In this case, the positions of the time slots to which parameter sets are applied can be represented using the principle described above, where the parameter sets exist within the range of the number of parameter bands.

FIG. 16 is a block diagram showing an example of a device architecture 1600 that implements the audio encoder/decoder described with reference to FIGS. 1 to 15. The device architecture 1600 is applicable to a variety of devices, including but not limited to personal computers, server computers, consumer electronic devices, mobile phones, PDAs, electronic tablets, television systems, television set-top boxes, game consoles, media players, music players, navigation systems, and any other device capable of decoding audio signals. Some of these devices may implement a modified architecture using a combination of hardware and software.

The architecture 1600 includes one or more processors 1602 (e.g., PowerPC (registered trademark), Intel Pentium (registered trademark) 4), one or more display devices 1604 (e.g., CRT, LCD), an audio subsystem 1606 (e.g., audio hardware/software), one or more network interfaces 1608 (e.g., Ethernet (registered trademark), FireWire (registered trademark), USB), input devices 1610 (e.g., keyboard, mouse), and one or more computer-readable media 1612 (e.g., RAM, ROM, SDRAM, hard disk, optical disk, flash memory). These components can communicate and exchange data over one or more buses 1614 (e.g., EISA, PCI, PCI Express).

The term "computer-readable medium" refers to any medium that provides instructions to the processor 1602 for execution, including but not limited to non-volatile media (e.g., optical or magnetic disks), volatile media (e.g., memory) and transmission media. Transmission media include, but are not limited to, coaxial cables, copper wire and optical fiber. Transmission media can also take the form of acoustic, light or radio frequency waves.

The computer-readable medium 1612 further includes an operating system 1616 (e.g., Mac OS (registered trademark), Windows (registered trademark), Linux (registered trademark)), a network communication module 1618, an audio codec 1620, and one or more applications 1622.

The operating system 1616 may be multi-user, multiprocessing, multitasking, multithreading, real-time, and the like. The operating system 1616 performs basic tasks including, but not limited to, recognizing input from the input devices 1610, sending output to the display devices 1604 and the audio subsystem 1606, keeping track of files and directories on the computer-readable media 1612 (e.g., memory or storage devices), controlling peripheral devices (e.g., disk drives, printers), and managing traffic on the one or more buses 1614.

The network communication module 1618 includes various components for establishing and maintaining network connections (e.g., software implementing communication protocols such as TCP/IP, HTTP and Ethernet (registered trademark)). The network communication module 1618 can include a browser that enables an operator of the device architecture 1600 to search a network (e.g., the Internet) for information (e.g., audio content).

The audio codec 1620 is responsible for implementing all or part of the encoding and/or decoding processes described with reference to FIGS. 1 to 15. In some embodiments, the audio codec works in conjunction with hardware (e.g., the processor 1602, the audio subsystem 1606) to process audio signals, including encoding and/or decoding audio signals in accordance with the invention described in this specification.

The applications 1622 can include any software related to audio content, in which the audio content is encoded and/or decoded, including but not limited to media players, music players (e.g., MP3 players), mobile phone applications, PDAs, television systems and set-top boxes. In one embodiment, the audio codec can be used by an application service provider to provide encoding/decoding services over a network (e.g., the Internet).

以上の説明においては、説明の目的から、本発明の完全な理解を提供するための種々の特定の詳細について開示している。しかしながら、当業者であれば、本発明がこれらの特定の詳細なしにも実行可能であるということが理解できるであろう。なお、本発明を不明にすることを防ぐために、構造及び装置はブロック図として示してある。 In the foregoing description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the present invention. However, one skilled in the art will understand that the invention may be practiced without these specific details. In order to avoid obscuring the present invention, the structure and apparatus are shown as a block diagram.

特に、当業者であれば、他の構造及びグラフィック環境が使用可能であり、上述とは異なるグラフィックツールと製品を用いて本発明が実現可能であるという点が理解できるであろう。特に、クライアント／サーバーアプローチは本発明のダッシュボード機能を与える構造の一例に過ぎず、当業者であれば、クライアント／サーバーアプローチではない他のものが使用可能であるという点が理解できるであろう。 In particular, those skilled in the art will appreciate that other structures and graphic environments can be used, and that the present invention can be implemented using different graphic tools and products than those described above. In particular, the client / server approach is only one example of a structure that provides the dashboard functionality of the present invention, and those skilled in the art will understand that other than the client / server approach can be used. .

詳細な説明の一部は、コンピュータメモリ内におけるデータビットに対する演算のアルゴリズムとシンボル表現で実現されている。これらのアルゴリズム説明と表現は、データ処理分野の当業者が他の当業者に自分の作業の本質を最も有効に伝えるための手段である。一般的に、そしてこの明細書において、アルゴリズムは所望の結果に至るステップの一連のシーケンスとして認識される。これらのステップは物理量の操作を必要とする。一般に、必ずしもその限りではないが、このような量は貯蔵されたり、転送されたり、組み合わせられたり、比較されたり、他に操作可能な電気信号または磁気信号の形態をとる。主として、通常の用途の理由から、これらの信号をビット、値、エレメント、シンボル、キャラクター、述語、ナンバーなどと呼んだ方が便利である。 Part of the detailed description is implemented by algorithms and symbolic representations of operations on data bits in computer memory. These algorithmic descriptions and representations are the means by which those skilled in the data processing arts will most effectively convey the substance of their work to others skilled in the art. In general, and in this specification, an algorithm is recognized as a sequence of steps leading to a desired result. These steps require manipulation of physical quantities. In general, though not necessarily, such quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It is more convenient to call these signals bits, values, elements, symbols, characters, predicates, numbers, etc. mainly for reasons of normal use.

However, all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. As is apparent from the discussion, unless specifically stated otherwise, terms such as "processing", "computing", "calculating", "determining", or "displaying" in this specification refer to the actions and processes of a computer system, or a similar electronic computing device, that manipulates data represented as physical (electrical) quantities within the computer system's registers and memories and transforms it into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission, or display devices.

The present invention relates to an apparatus for performing such operations. This apparatus may be specially constructed for the required purposes, or it may include a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program can be stored on a computer storage medium connected to a computer system, including any type of disk such as a floppy (registered trademark) disk, optical disk, CD-ROM, or magneto-optical disk, read-only memory (ROM), random access memory (RAM), EPROM, EEPROM, magnetic or optical cards, or any type of medium suitable for storing electronic instructions.

The algorithms and modules described herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems can be used with programs in accordance with the techniques described in this specification, or it may prove convenient to construct a more specialized apparatus to perform the method steps. The structure required for a variety of such systems will be described later. A variety of programming languages can be used to implement the teachings of the invention as described herein. Also, as will be apparent to those skilled in the art, the modules, features, attributes, methodologies, and other aspects of the present invention can be implemented in software, hardware, firmware, or any combination of the three. Of course, where a component of the present invention is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, or in any other way known now or in the future to those skilled in the art of computer programming. In addition, the present invention is not limited to implementation in any specific operating system or environment.

It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments described herein without departing from the spirit or scope of the invention. Accordingly, the present invention covers all such modifications and variations of the disclosed embodiments, provided that they fall within the scope of the claims and their equivalents.

Claims

Determining a framing type;
Determining the number of time slots and the number of parameter sets including one or more parameters;
Encoding the audio signal into a bitstream that includes frames that include an ordered set of time slots;
Inserting a framing type indicator into the bitstream;
If the framing type indicator indicates variable framing, generating information indicating a position of at least one time slot, within the ordered set of time slots, to which the parameter set is applied;
Inserting, into the bitstream, a variable number of bits indicating the position of the time slot within the ordered set of time slots;
The method of encoding an audio signal, wherein the variable number of bits is determined by the time slot position.

Receiving a bitstream indicative of an audio signal and including frames;
Determining a number of time slots and a number of parameter sets including one or more parameters from the bitstream;
Determining a framing type from the bitstream;
When the framing type is variable framing, determining, from the bitstream, position information indicating a position of a time slot, within an ordered set of time slots, to which a parameter set is applied; and
Decoding the audio signal based on the number of time slots, the number of parameter sets, and the position information;
The audio signal decoding method, wherein the position information is represented by a variable number of bits based on the time slot position.

The method of claim 2, wherein the variable number of bits is determined using the number of time slots.

The audio signal decoding method according to claim 2, wherein, if the decoded number of time slots is the same as the number of parameter sets, the position of the time slot to which the parameter set is applied is not determined.

The audio signal decoding method according to claim 4, wherein, when the number of time slots is 2^(n-1) or more and less than 2^(n), the variable number of bits is determined as n bits.

The audio signal decoding method according to claim 4, wherein, when the number of time slots is greater than 2^(n-1) and less than or equal to 2^(n), the variable number of bits is determined as n bits.
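
The two bit-width rules stated in the preceding claims can be illustrated with a short Python sketch. The function names are hypothetical and do not come from the specification; the first function follows the rule 2^(n-1) <= numSlots < 2^(n), and the second follows the rule 2^(n-1) < numSlots <= 2^(n).

```python
def bits_lower_inclusive(num_slots: int) -> int:
    """Return n such that 2**(n-1) <= num_slots < 2**n.

    Hypothetical helper: for a positive integer this is exactly
    Python's int.bit_length().
    """
    if num_slots < 1:
        raise ValueError("num_slots must be a positive integer")
    return num_slots.bit_length()


def bits_upper_inclusive(num_slots: int) -> int:
    """Return n such that 2**(n-1) < num_slots <= 2**n.

    Hypothetical helper: equivalent to ceil(log2(num_slots)),
    computed with integers only.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be a positive integer")
    return (num_slots - 1).bit_length()


if __name__ == "__main__":
    for slots in (15, 16, 17, 32):
        print(slots, bits_lower_inclusive(slots), bits_upper_inclusive(slots))
```

Under these two readings the results differ only when the number of time slots is an exact power of two, where the second rule needs one bit fewer.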

The audio signal decoding method according to claim 3, wherein the position information is represented as a sum of a previous value and a difference value, the previous value indicating position information of a time slot to which a first parameter set is applied, and the difference value indicating position information of a time slot to which a second parameter set is applied.

The method of claim 7, wherein the previous value is represented by a variable number of bits determined using at least one of the number of time slots and the number of parameter sets.

The audio signal decoding method according to claim 8, wherein the variable number of bits is determined using a difference between the number of time slots and the number of parameter sets.

The audio signal decoding method according to claim 7, wherein the difference value is represented by a variable number of bits determined using any one of the number of time slots, the number of parameter sets, and the position information of the time slot to which the previous parameter set is applied.

The audio signal decoding method according to claim 10, wherein the variable number of bits is determined using a difference between the number of time slots and at least one of the number of parameter sets and the position information of the time slot to which a previous parameter set is applied.
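
As an illustration of the differential, variable-length representation described in the claims above, the following Python sketch encodes and decodes a strictly increasing list of time slot positions. It is one possible reading under stated assumptions, not the scheme defined by the specification: the function names, the zero-based slot indexing, the use of a string of '0'/'1' characters as the bitstream, and the exact bit-count rule (the count of positions still available to the current parameter set) are all hypothetical.

```python
from typing import List


def bits_for(count: int) -> int:
    """Smallest n with 2**n >= count; 0 bits when only one value is possible."""
    return (count - 1).bit_length()


def encode_positions(positions: List[int], num_slots: int) -> str:
    """Encode strictly increasing, zero-based slot positions as a bit string."""
    num_sets = len(positions)
    bits = ""
    prev = -1  # assumed sentinel: no slot has been used yet
    for i, pos in enumerate(positions):
        # Positions still available to set i, leaving room for later sets.
        candidates = num_slots - num_sets + i - prev
        offset = pos - (prev + 1)
        if not 0 <= offset < candidates:
            raise ValueError("positions must increase and leave room for later sets")
        n = bits_for(candidates)
        if n:
            bits += format(offset, f"0{n}b")
        prev = pos
    return bits


def decode_positions(bits: str, num_slots: int, num_sets: int) -> List[int]:
    """Inverse of encode_positions; the two counts are assumed known already."""
    positions = []
    prev = -1
    cursor = 0
    for i in range(num_sets):
        n = bits_for(num_slots - num_sets + i - prev)
        offset = int(bits[cursor:cursor + n], 2) if n else 0
        cursor += n
        prev = prev + 1 + offset  # previous value plus difference
        positions.append(prev)
    return positions


if __name__ == "__main__":
    coded = encode_positions([2, 5, 7], num_slots=8)
    print(coded, decode_positions(coded, num_slots=8, num_sets=3))
```

In this sketch the first position uses a bit count that depends on the difference between the number of time slots and the number of parameter sets, and each later position uses a bit count that also depends on the previous position. When the number of parameter sets equals the number of time slots, no bits are written at all, because every position is already implied.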

The audio signal decoding method according to claim 3, wherein, when the number of parameter sets is N, position information of the time slots to which the parameter sets are applied is represented by a combination using the following equation:

Here, numSlot and bsParamSlot_i indicate the number of time slots and the position information of the time slot to which the i-th parameter set is applied, respectively.

The audio signal decoding method according to claim 3, wherein, when there are a plurality of parameter sets, the plurality of parameter sets are divided into groups, and position information of the time slots to which the parameter sets are applied is represented for each group.

The audio signal decoding method according to claim 12, wherein, when the number of parameter sets is (kN + L), each group is generated by bundling N parameter sets and is represented by M bits, and the last group is generated by bundling L parameter sets and is represented by P bits.

Means for determining a framing type;
Means for determining the number of time slots and the number of parameter sets including one or more parameters;
Means for encoding the audio signal into a bitstream including frames including a set of time slots;
Means for inserting a framing type indicator in the bitstream;
Means for generating information indicating a position of at least one time slot, within the set of time slots, to which the parameter set is applied when the framing type indicator indicates variable framing;
Means for inserting, into the bitstream, a variable number of bits indicating the position of the time slot among the set of time slots;
The audio signal encoding apparatus, wherein the variable number of bits is determined by the time slot position.

Means for receiving a bitstream indicative of an audio signal and comprising a frame;
Means for determining from the bitstream the number of time slots and the number of parameter sets comprising one or more parameters;
Means for determining a framing type from the bitstream;
If the framing type is variable framing, means for determining, from the bitstream, position information indicating a position of a time slot, within an ordered set of time slots, to which a parameter set is applied;
Means for decoding the audio signal based on the number of time slots, the number of parameter sets, and the position information;
The audio signal decoding apparatus, wherein the position information is represented by a variable number of bits based on the time slot position.

In a data structure that includes a bitstream representing an audio signal,
A first field containing the number of time slots;
A second field containing the number of parameter slots;
A third field including a framing type indicator;
A fourth field containing position information that determines the position of the time slot to which the parameter set applies;
The data structure is characterized in that the position information is represented by a variable number of bits based on the time slot position and the framing type indicator.
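
The four fields recited in the data structure claim above could be modeled as follows. This is a minimal sketch for illustration only: the class name, the field names, the zero-based indexing, and the validation rules are assumptions, and the second field is modeled here as a count of parameter sets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FramingInfo:
    """Hypothetical in-memory view of the four fields in the claim."""
    num_slots: int            # first field: number of time slots
    num_param_sets: int       # second field (assumed to count parameter sets)
    variable_framing: bool    # third field: framing type indicator
    # Fourth field: positions of the slots the parameter sets apply to.
    param_slot_positions: List[int] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the fields are mutually inconsistent."""
        if not 1 <= self.num_param_sets <= self.num_slots:
            raise ValueError("need 1 <= num_param_sets <= num_slots")
        if not self.variable_framing:
            return  # assumed: fixed framing needs no explicit positions here
        pos = self.param_slot_positions
        if len(pos) != self.num_param_sets:
            raise ValueError("one position is required per parameter set")
        if any(not 0 <= p < self.num_slots for p in pos):
            raise ValueError("position outside the frame")
        if any(b <= a for a, b in zip(pos, pos[1:])):
            raise ValueError("positions must be strictly increasing")


if __name__ == "__main__":
    info = FramingInfo(num_slots=8, num_param_sets=3,
                       variable_framing=True, param_slot_positions=[2, 5, 7])
    info.validate()
    print(info)
```

Keeping the checks in a separate `validate` method lets a parser populate the fields incrementally before confirming that they are consistent.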

A computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform the following operations,
The operations being:
Receiving a bitstream indicative of an audio signal and including frames;
Determining a number of time slots and a number of parameter sets including one or more parameters from the bitstream;
Determining a framing type from the bitstream;
When the framing type is variable framing, determining, from the bitstream, position information indicating a position of a time slot, within an ordered set of time slots, to which a parameter set is applied; and
Decoding the audio signal based on the number of time slots, the number of parameter sets, and the position information;
The computer-readable medium is characterized in that the position information is represented by a variable number of bits based on the time slot position.

A processor;
A computer-readable medium storing instructions that, when executed by the processor, cause the processor to perform the following operations,
The operations being:
Receiving a bitstream indicative of an audio signal and comprising a frame;
Determining a number of time slots and a number of parameter sets including one or more parameters from the bitstream;
Determining a framing type from the bitstream;
When the framing type is variable framing, determining, from the bitstream, position information indicating a position of a time slot, within an ordered set of time slots, to which a parameter set is applied; and
Decoding the audio signal based on the number of time slots, the number of parameter sets, and the position information;
The system is characterized in that the position information is represented by a variable number of bits based on the time slot position.

Means for receiving a bitstream indicative of an audio signal and comprising a frame;
Means for determining from the bitstream the number of time slots and the number of parameter sets comprising one or more parameters;
Means for determining a framing type from the bitstream;
If the framing type is variable framing, means for determining, from the bitstream, position information indicating a position of a time slot, within an ordered set of time slots, to which a parameter set is applied;
Means for decoding the audio signal based on the number of time slots, the number of parameter sets, and the position information;
The system is characterized in that the position information is represented by a variable number of bits based on the time slot position.

Decoding the audio signal based on the number of time slots and the number of parameter sets if the framing type indicator indicates fixed framing;
The audio signal decoding method according to claim 2, wherein the time slot position is represented by a fixed number of bits.

The method of claim 21, wherein the slots are equally spaced.
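
When the slots to which parameter sets apply are equally spaced, their positions could in principle be derived from the number of time slots and the number of parameter sets alone. The Python sketch below shows one hypothetical spacing rule; the function name, the zero-based indexing, and the ceiling-based formula are assumptions for illustration and are not taken from the claims.

```python
from typing import List


def equally_spaced_positions(num_slots: int, num_sets: int) -> List[int]:
    """Zero-based slot positions for num_sets parameter sets spread over a frame.

    Assumed rule: set ps lands on slot ceil(num_slots * (ps + 1) / num_sets) - 1,
    so the last parameter set always falls on the last slot of the frame.
    """
    if not 1 <= num_sets <= num_slots:
        raise ValueError("need 1 <= num_sets <= num_slots")
    positions = []
    for ps in range(num_sets):
        # Integer ceiling division without floating point.
        ceil_div = -(-(num_slots * (ps + 1)) // num_sets)
        positions.append(ceil_div - 1)
    return positions


if __name__ == "__main__":
    print(equally_spaced_positions(32, 4))
    print(equally_spaced_positions(8, 3))
```

When the number of time slots is not a multiple of the number of parameter sets, the spacing under this rule is only approximately equal.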