JP2022543083A

JP2022543083A - Encoding and Decoding IVAS Bitstreams

Info

Publication number: JP2022543083A
Application number: JP2022506569A
Authority: JP
Inventors: Rishabh Tyagi; Juan Felix Torres
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2019-08-01
Filing date: 2020-07-30
Publication date: 2022-10-07
Also published as: CN114175151A; CL2022000206A1; AU2020320270A1; KR20220042166A; CA3146169A1; MX2022001152A; US20220284910A1; BR112022000230A2; IL289449A; TW202121399A; WO2021022087A1; EP4008000A1

Abstract

Encoding/decoding an immersive voice and audio services (IVAS) bitstream comprises: encoding/decoding a coding mode indicator in a common header (CH) section of the IVAS bitstream; encoding/decoding a mode header or tool header in a tool header (TH) section of the bitstream, the TH section following the CH section; encoding/decoding a metadata payload in a metadata payload (MDP) section of the bitstream, the MDP section following the CH section; encoding/decoding an enhanced voice services (EVS) payload in an EVS payload (EP) section of the bitstream, the EP section following the CH section; and, on the encoder side, storing or streaming the encoded bitstream, or, on the decoder side, controlling an audio decoder based on the coding mode, the tool header, the EVS payload, and the metadata payload, or storing a representation thereof.
[Selection drawing] Fig. 2

Description

[Cross-Reference to Related Applications]
This application claims priority to U.S. Provisional Application No. 62/881,541, filed August 1, 2019; U.S. Provisional Application No. 62/927,894, filed October 30, 2019; U.S. Provisional Application No. 63/037,721, filed June 11, 2020; and U.S. Provisional Application No. 63/057,666, filed July 28, 2020. The entire disclosures of these U.S. provisional applications are hereby incorporated by reference.

This disclosure relates generally to encoding and decoding audio bitstreams.

Standards development for voice and video encoders/decoders ("codecs") has recently focused on developing a codec for immersive voice and audio services (IVAS). IVAS is expected to support a range of audio service capabilities, including but not limited to mono-to-stereo upmixing and fully immersive audio encoding, decoding, and rendering. IVAS is intended to be supported by a wide range of devices, endpoints, and network nodes, including but not limited to mobile phones and smartphones, electronic tablets, personal computers, conference phones, conference rooms, virtual reality (VR) and augmented reality (AR) devices, home theater devices, and other suitable devices. These devices, endpoints, and network nodes can have various acoustic interfaces for sound capture and rendering.

Implementations are disclosed for encoding and decoding IVAS bitstreams.

In some implementations, a method of generating a bitstream for an audio signal comprises: determining, using an immersive voice and audio services (IVAS) encoder, a coding mode indicator or coding tool indicator, the coding mode indicator or coding tool indicator indicating a coding mode or coding tool for the audio signal; encoding, using the IVAS encoder, the coding mode indicator or coding tool indicator in a common header (CH) section of an IVAS bitstream; determining, using the IVAS encoder, a mode header or tool header; encoding, using the IVAS encoder, the mode header or tool header in a tool header (TH) section of the IVAS bitstream, the TH section following the CH section; determining, using the IVAS encoder, a metadata payload including spatial metadata; encoding, using the IVAS encoder, the metadata payload in a metadata payload (MDP) section of the IVAS bitstream, the MDP section following the CH section; determining, using the IVAS encoder, an enhanced voice services (EVS) payload, the EVS payload including EVS coded bits for each channel or downmix channel of the audio signal; and encoding, using the IVAS encoder, the EVS payload in an EVS payload (EP) section of the IVAS bitstream, the EP section following the CH section.

In some implementations, the IVAS bitstream is stored on a non-transitory computer-readable medium. In other implementations, the IVAS bitstream is streamed to a downstream device, where the coding mode indicator or coding tool indicator, the mode header or tool header, the metadata payload, and the EVS payload are extracted and decoded from the CH, TH, MDP, and EP sections of the IVAS bitstream, respectively, for use in reconstructing the audio signal on the downstream device or another device.

In some implementations, a method of decoding a bitstream of an audio signal comprises: extracting and decoding, using an immersive voice and audio services (IVAS) decoder, a coding mode indicator or coding tool indicator in a common header (CH) section of an IVAS bitstream, the coding mode indicator or coding tool indicator indicating a coding mode or coding tool for the audio signal; extracting and decoding, using the IVAS decoder, a mode header or tool header in a tool header (TH) section of the IVAS bitstream, the TH section following the CH section; extracting and decoding, using the IVAS decoder, a metadata payload from a metadata payload (MDP) section of the IVAS bitstream, the MDP section following the CH section and the metadata payload including spatial metadata; and extracting and decoding, using the IVAS decoder, an enhanced voice services (EVS) payload from an EVS payload (EP) section of the IVAS bitstream, the EP section following the CH section and the EVS payload including EVS coded bits for each channel or downmix channel of the audio signal.
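The section layout described above (a CH section first, with the TH, MDP, and EP sections after it) can be sketched as a simple bit packer and parser. This is a minimal illustration, not the codec's actual syntax: the field widths `CH_BITS` and `TH_BITS`, the function names, the TH-MDP-EP ordering, and the explicit `mdp_len` argument are all assumptions made for the example.

```python
# Illustrative sketch only: the field widths below are assumptions,
# not values defined by this document.
CH_BITS = 3  # assumed width of the coding mode indicator (CH section)
TH_BITS = 3  # assumed width of the tool header (TH section)


def pack_frame(mode: int, tool_header: int, mdp_bits: str, ep_bits: str) -> str:
    """Concatenate the sections in the order CH, TH, MDP, EP as a bit string."""
    if not 0 <= mode < (1 << CH_BITS):
        raise ValueError("mode does not fit in the CH section")
    if not 0 <= tool_header < (1 << TH_BITS):
        raise ValueError("tool header does not fit in the TH section")
    ch = format(mode, f"0{CH_BITS}b")
    th = format(tool_header, f"0{TH_BITS}b")
    return ch + th + mdp_bits + ep_bits


def unpack_frame(frame: str, mdp_len: int):
    """Read the sections back in the same order; mdp_len is assumed known."""
    pos = 0
    mode = int(frame[pos:pos + CH_BITS], 2)
    pos += CH_BITS
    tool_header = int(frame[pos:pos + TH_BITS], 2)
    pos += TH_BITS
    mdp_bits = frame[pos:pos + mdp_len]
    pos += mdp_len
    ep_bits = frame[pos:]  # everything after the MDP section is the EP section
    return mode, tool_header, mdp_bits, ep_bits
```

Because the CH section comes first, a parser can read the coding mode before it has to interpret any of the later sections.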

In some implementations, an audio decoder of a downstream device is controlled based on the coding mode indicator or coding tool indicator, the mode header or tool header, the EVS payload, and the metadata payload, for use in reconstructing the audio signal on the downstream device or another device. In other implementations, a representation of the coding mode indicator or coding tool indicator, the mode header or tool header, the EVS payload, and the metadata payload is stored on a non-transitory computer-readable medium.

In some implementations, the bitrate for each EVS coded channel or downmix channel is determined by the total available bits for EVS, a SPAR bitrate distribution control table, and a bitrate distribution algorithm.

In some implementations, the CH is a multi-bit data structure, where one value of the multi-bit data structure corresponds to a spatial reconstruction (SPAR) coding mode and other values of the data structure correspond to other coding modes.

In some implementations, the method further comprises storing in, or reading from, the TH section of the IVAS bitstream, respectively, an index offset for computing a row index of a spatial reconstruction (SPAR) bitrate distribution control table.
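One plausible reading of the index offset is that it selects a row within the group of table rows that share the same IVAS bitrate. The sketch below assumes exactly that. The table contents, the key names, and the rule "first row for the bitrate plus the offset" are hypothetical placeholders, not values taken from this document.

```python
# Hypothetical SPAR bitrate distribution control table. Each row pairs an
# IVAS bitrate (bits/s) with an assumed per-channel EVS bitrate split.
# Rows with the same IVAS bitrate are assumed to be stored contiguously.
SPAR_BR_TABLE = [
    {"ivas_bps": 32000, "evs_bps": [24400]},         # row 0
    {"ivas_bps": 32000, "evs_bps": [16400, 13200]},  # row 1
    {"ivas_bps": 64000, "evs_bps": [48000]},         # row 2
    {"ivas_bps": 64000, "evs_bps": [32000, 24400]},  # row 3
]


def table_row_index(ivas_bps: int, index_offset: int) -> int:
    """Return the first row matching the bitrate, shifted by the TH index offset."""
    rows = [i for i, row in enumerate(SPAR_BR_TABLE) if row["ivas_bps"] == ivas_bps]
    if not rows:
        raise ValueError(f"no table rows for bitrate {ivas_bps}")
    if not 0 <= index_offset < len(rows):
        raise ValueError("index offset outside the rows for this bitrate")
    return rows[0] + index_offset
```

Under this assumption the TH section only needs enough bits to count the rows available at one bitrate, rather than to address the whole table.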

In some implementations, the method further comprises storing in, or reading from, the MDP section of the IVAS bitstream, respectively: a quantization strategy indicator; a bitstream coding strategy indicator; and quantized and coded real and imaginary parts of a set of coefficients.

In some implementations, the set of coefficients includes prediction coefficients, direct coefficients, diagonal real coefficients, and lower triangular complex coefficients.

In some implementations, the prediction coefficients have a variable bit length based on entropy coding, and the direct coefficients, diagonal real coefficients, and lower triangular complex coefficients have a variable bit length based on the downmix configuration and entropy coding.

In some implementations, the quantization strategy indicator is a multi-bit data structure that indicates a quantization strategy.

In some implementations, the bitstream coding strategy indicator is a multi-bit data structure that indicates the number of bands of spatial metadata and a non-differential or time-differential entropy coding scheme.
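The MDP contents listed above can be pictured as a small record: two multi-bit indicators followed by variable-length coded coefficients. In the sketch below, the field names, the two-bit width of each indicator, and the representation of coefficients as already entropy-coded bit strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

QUANT_BITS = 2   # assumed width of the quantization strategy indicator
CODING_BITS = 2  # assumed width of the bitstream coding strategy indicator


@dataclass
class MetadataPayload:
    quant_strategy: int   # selects the quantization strategy (mapping assumed)
    coding_strategy: int  # selects band count and entropy coding scheme (assumed)
    coeff_bits: List[str] = field(default_factory=list)  # coded coefficients

    def to_bits(self) -> str:
        """Serialize the two indicators first, then the coded coefficients."""
        header = (format(self.quant_strategy, f"0{QUANT_BITS}b")
                  + format(self.coding_strategy, f"0{CODING_BITS}b"))
        return header + "".join(self.coeff_bits)
```

Placing the indicators ahead of the coefficients means a reader knows how to dequantize and entropy-decode before it reaches the variable-length part.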

In some implementations, the quantization of the coefficients follows an EVS bitrate distribution control strategy that includes metadata quantization and EVS bitrate distribution.

In some implementations, the method further comprises storing in, or reading from, the EP section of the IVAS bitstream, respectively, an EVS payload for each EVS instance in accordance with 3rd Generation Partnership Project (3GPP) Technical Specification (TS) 26.445.

In some implementations, the method further comprises: determining a bitrate from the IVAS bitstream; reading an index offset from a spatial reconstruction (SPAR) tool header (TH) section of the IVAS bitstream; determining a table row index for the SPAR bitrate distribution control table using the index offset; reading quantization strategy bits and coding strategy bits from a metadata payload (MDP) section of the IVAS bitstream; dequantizing SPAR spatial metadata in the MDP section of the IVAS bitstream based on the quantization strategy bits and the coding strategy bits; determining an enhanced voice services (EVS) bitrate for each channel in the IVAS bitstream using the total available EVS bits and the SPAR bitrate distribution control table; reading EVS coded bits from the EP section of the IVAS bitstream based on the EVS bitrate; decoding the EVS bits; decoding the spatial metadata; and generating first order Ambisonics (FoA) output using the decoded EVS bits and the decoded spatial metadata.
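The step of reading EVS coded bits from the EP section based on the per-channel EVS bitrates can be sketched as a slicing operation. EVS operates on 20 ms frames, so a channel coded at a given bitrate occupies bitrate × 0.02 bits per frame. The function name, and the assumption that the channel payloads are stored back to back in channel order, are illustrative.

```python
FRAME_MS = 20  # EVS frame duration in milliseconds


def split_ep_section(ep_bits: str, evs_bps_per_channel: list) -> list:
    """Slice the EP section into one bit string per EVS-coded channel."""
    chunks = []
    pos = 0
    for bps in evs_bps_per_channel:
        n_bits = bps * FRAME_MS // 1000  # bits per frame at this bitrate
        chunk = ep_bits[pos:pos + n_bits]
        if len(chunk) != n_bits:
            raise ValueError("EP section is shorter than the bitrates imply")
        chunks.append(chunk)
        pos += n_bits
    return chunks
```

Each returned chunk would then be handed to its own EVS decoder instance.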

Other implementations disclosed herein are directed to systems, apparatus, and computer-readable media. The details of the disclosed implementations are set forth in the accompanying drawings and the description below. Other features, objects, and advantages are apparent from the description, drawings, and claims.

Particular implementations disclosed herein provide one or more of the following advantages. The disclosed IVAS bitstream format is an efficient and robust bitstream format that supports a range of audio service capabilities, including but not limited to mono-to-stereo upmixing and fully immersive audio encoding, decoding, and rendering. In some implementations, the IVAS bitstream format supports complex advance coupling (CACPL) for analyzing and downmixing stereo audio signals. In other implementations, the IVAS bitstream format supports spatial reconstruction (SPAR) for analyzing and downmixing first order Ambisonics (FoA) audio signals.

In the drawings, specific arrangements or orderings of schematic elements, such as those representing devices, units, instruction blocks, and data elements, are shown for ease of description. However, it should be understood by those skilled in the art that the specific ordering or arrangement of the schematic elements in the drawings is not meant to imply that a particular order or sequence of processing, or a separation of processes, is required. Further, the inclusion of a schematic element in a drawing is not meant to imply that such an element is required in all embodiments, or that the features represented by such an element cannot be included in or combined with other elements in some implementations.

Further, in the drawings, where connecting elements, such as solid or dashed lines or arrows, are used to illustrate a connection, relationship, or association between two or more other schematic elements, the absence of any such connecting element is not meant to imply that no connection, relationship, or association can exist. In other words, some connections, relationships, or associations between elements are not shown in the drawings so as not to obscure the disclosure. In addition, for ease of illustration, a single connecting element is used to represent multiple connections, relationships, or associations between elements. For example, where a connecting element represents a communication of signals, data, or instructions, it should be understood by those skilled in the art that such an element represents one or more signal paths, as may be needed, to effect the communication.

A diagram illustrating an IVAS system according to one embodiment.

A block diagram of a system for encoding and decoding IVAS bitstreams according to one embodiment.

A block diagram of an FoA coder/decoder ("codec") for encoding and decoding IVAS bitstreams in FoA format according to one embodiment.

A flow diagram of an IVAS encoding process according to one embodiment.

A flow diagram of an IVAS encoding process using an alternative IVAS format according to one embodiment.

A flow diagram of an IVAS decoding process according to one embodiment.

A flow diagram of an IVAS decoding process using an alternative IVAS format according to one embodiment.

A flow diagram of an IVAS SPAR encoding process according to one embodiment.

A flow diagram of an IVAS SPAR decoding process according to one embodiment.

A block diagram of an example device architecture according to one embodiment.

The same reference symbols used in the various drawings indicate like elements.

In the following detailed description, numerous specific details are set forth to provide a thorough understanding of the various described embodiments. It will be apparent to one of ordinary skill in the art that the various described implementations can be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments. Several features are described below that can each be used independently of one another or in any combination with other features.

Nomenclature
As used herein, the term "includes" and its variants are to be read as open-ended terms that mean "includes, but is not limited to." The term "or" is to be read as "and/or" unless the context clearly indicates otherwise. The term "based on" is to be read as "based at least in part on." The terms "one example implementation" and "an example implementation" are to be read as "at least one example implementation." The term "another implementation" is to be read as "at least one other implementation." The terms "determined," "determines," and "determining" are to be read as obtaining, receiving, computing, calculating, estimating, predicting, or deriving. In addition, in the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

IVAS System Overview
FIG. 1 illustrates an IVAS system 100 according to one or more implementations. In some implementations, various devices communicate through call server 102, which is configured to receive audio signals from, for example, public switched telephone network (PSTN) or public land mobile network (PLMN) devices, illustrated by PSTN/OTHER PLMN 104. IVAS system 100 supports legacy devices 106 that render and capture audio in mono only, including but not limited to devices that support enhanced voice services (EVS), adaptive multi-rate wideband (AMR-WB), and adaptive multi-rate narrowband (AMR-NB). IVAS system 100 also supports user equipment (UE) 108, 114 that captures and renders stereo audio signals, or UE 110 that captures mono signals and binaurally renders them as multichannel signals. IVAS system 100 also supports immersive and stereo signals captured and rendered by video conference room systems 116, 118, respectively. IVAS system 100 also supports stereo capture and immersive rendering of stereo audio signals for home theater systems, and mono capture and immersive rendering of audio signals for virtual reality (VR) gear 122 and immersive content ingest 124.

Example IVAS Encoding/Decoding System
FIG. 2 is a block diagram of a system 200 for encoding and decoding IVAS bitstreams according to one or more implementations. For encoding, an IVAS encoder includes spatial analysis and downmix unit 202, which receives audio data 201. The audio data includes, but is not limited to, mono signals, stereo signals, binaural signals, spatial audio signals (e.g., multichannel spatial audio objects), FoA, higher order Ambisonics (HoA), and any other audio data. In some implementations, spatial analysis and downmix unit 202 implements CACPL for analyzing and downmixing stereo audio signals and/or SPAR for analyzing and downmixing FoA audio signals. In other implementations, spatial analysis and downmix unit 202 implements other formats.

The output of spatial analysis and downmix unit 202 includes spatial metadata and 1-4 channels of audio. The spatial metadata is input into quantization and entropy coding unit 203, which quantizes and entropy codes the spatial data. In some implementations, quantization can include fine, moderate, coarse, and extra coarse quantization strategies, and entropy coding can include Huffman or arithmetic coding. Enhanced voice services (EVS) encoding unit 206 encodes the 1-4 channels of audio into one or more EVS bitstreams.
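The idea of selectable quantization strategies can be illustrated with a uniform scalar quantizer whose number of levels depends on the chosen strategy. The level counts, the strategy names used as dictionary keys, and the value range [-1, 1] are hypothetical; this document does not specify them here.

```python
# Hypothetical number of quantization levels for each strategy.
LEVELS = {"fine": 21, "moderate": 15, "coarse": 9, "extra_coarse": 5}


def quantize(value: float, strategy: str, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map a value in [lo, hi] to the index of the nearest uniform level."""
    n = LEVELS[strategy]
    clipped = min(max(value, lo), hi)  # clamp out-of-range inputs
    return round((clipped - lo) / (hi - lo) * (n - 1))


def dequantize(index: int, strategy: str, lo: float = -1.0, hi: float = 1.0) -> float:
    """Map a level index back to its reconstruction value."""
    n = LEVELS[strategy]
    return lo + index * (hi - lo) / (n - 1)
```

Fewer levels produce fewer distinct indices for the entropy coder to represent, which is the trade-off between metadata bitrate and precision that the strategies expose.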

In some implementations, EVS encoding unit 206 complies with 3GPP TS 26.445 and provides a wide range of functionality, such as enhanced quality and coding efficiency for narrowband (EVS-NB) and wideband (EVS-WB) speech services, enhanced quality using super-wideband (EVS-SWB) speech, enhanced quality for mixed content and music in conversational applications, robustness to packet loss and delay jitter, and backward compatibility with the AMR-WB codec. In some implementations, EVS encoding unit 206 includes a pre-processing and mode selection unit that selects between a speech coder for encoding speech signals and a perceptual coder for encoding audio signals at a specified bitrate based on mode/bitrate control 207. In some implementations, the speech encoder is an improved variant of algebraic code-excited linear prediction (ACELP), extended with specialized LP-based modes for different speech classes. In some implementations, the audio encoder is a modified discrete cosine transform (MDCT) encoder with increased efficiency at low delay and low bitrates, designed to provide seamless and reliable switching between the speech and audio encoders.

In some implementations, the IVAS decoder includes quantization and entropy decoding unit 204, which is configured to recover the spatial metadata, and one or more EVS decoders, which are configured to recover the 1-4 channel audio signals. The recovered spatial metadata and audio signals are input into spatial synthesis/rendering unit 209, which uses the spatial metadata to synthesize and render the audio signals for playback on various audio systems 210.

Example IVAS/SPAR Codec
FIG. 3 is a block diagram of FoA codec 300 for encoding and decoding FoA in SPAR format according to some implementations. FoA codec 300 includes SPAR FoA encoder 301, EVS encoder 305, SPAR FoA decoder 306, and EVS decoder 307. FoA codec 300 converts an FoA input signal into a set of downmix channels and parameters that are used to regenerate the input signal at decoders 306, 307. The downmix signals can vary from one to four channels, and the parameters include prediction coefficients (PR), cross-prediction coefficients (C), and decorrelation coefficients (P). Note that SPAR is a process used to reconstruct an audio signal from a downmix version of the audio signal using the PR, C, and P parameters, as described in further detail below.

Note that the example implementation shown in FIG. 3 assumes a passive W channel and depicts a nominal two-channel downmix, in which the W channel is sent unmodified to decoder 306 together with a single predicted channel Y'. In other implementations, W can be an active channel. An active W channel allows some mixing of the X, Y, and Z channels into the W channel, controlled by a constant f and the prediction (PR) coefficients pr_y, pr_x, and pr_z. The constant f (e.g., 0.5) sets how much of the X, Y, and Z channels is mixed into the W channel. With passive W, f = 0, so no mixing of the X, Y, and Z channels into the W channel takes place.
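The active-W mix described here can be written out explicitly. The form below is a reconstruction from the definitions of f and the PR coefficients given in the text, offered as a plausible sketch rather than a quotation of the original equation; in particular, applying the same constant f to all three channels is an assumption.

```latex
W' = W + f \left( pr_y \, Y + pr_z \, Z + pr_x \, X \right)
```

Setting f = 0 reduces this to W' = W, which is the passive-W case.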

As described in further detail below, the C coefficients allow some of the X and Z channels to be reconstructed from Y', and the remaining channels are reconstructed from decorrelated versions of the W channel.

In some implementations, SPAR FoA encoder 301 includes passive/active predictor unit 302, remix unit 303, and extraction/downmix selection unit 304. The passive/active predictor unit receives FoA channels in 4-channel B-format (W, Y, Z, X) and computes predicted channels (W or W', Y', Z', X'). Note that the W channel is an omnidirectional polar pattern containing all sounds in the sphere, arriving from all directions at equal gain and phase; X is a figure-8 bidirectional polar pattern pointing forward; Y is a figure-8 bidirectional polar pattern pointing left; and Z is a figure-8 bidirectional polar pattern pointing up.
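These polar patterns correspond to the standard first-order encoding of a single plane-wave source. The sketch below assumes the common convention of azimuth measured counterclockwise from the front and elevation measured upward, with SN3D normalization as used by AmbiX. The function name and the dictionary output are illustrative.

```python
import math


def encode_foa(sample: float, azimuth_deg: float, elevation_deg: float) -> dict:
    """Encode one mono sample into first-order B-format channels (SN3D)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return {
        "W": sample,                                # omnidirectional
        "X": sample * math.cos(az) * math.cos(el),  # figure-8 pointing forward
        "Y": sample * math.sin(az) * math.cos(el),  # figure-8 pointing left
        "Z": sample * math.sin(el),                 # figure-8 pointing up
    }
```

A source directly in front contributes only to W and X, a source at the left only to W and Y, and a source overhead only to W and Z.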

Extraction/downmix selection unit 304 extracts SPAR FoA metadata from the metadata payload section of the IVAS bitstream, as described in more detail below. Passive/active predictor unit 302 and remix unit 303 use the SPAR FoA metadata to generate remixed FoA channels (W or W', A', B', C'), which are input into EVS encoder 305 to be encoded into an EVS bitstream. The EVS bitstream is encapsulated in the IVAS bitstream that is sent to decoder 306. Note that in this example the Ambisonic B-format channels are arranged in the AmbiX convention. However, other conventions, such as the Furse-Malham (FuMa) convention (W, X, Y, Z), can be used as well.
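Converting between the two channel orderings mentioned here is a simple permutation. The sketch below handles ordering only. It deliberately ignores the gain difference between the conventions (FuMa attenuates W by 1/√2 relative to SN3D), and the helper name is illustrative.

```python
AMBIX_ORDER = ("W", "Y", "Z", "X")  # ACN channel ordering used by AmbiX
FUMA_ORDER = ("W", "X", "Y", "Z")   # Furse-Malham channel ordering


def reorder(channels: list, src: tuple, dst: tuple) -> list:
    """Permute per-channel signals from the src ordering to the dst ordering."""
    if len(channels) != len(src):
        raise ValueError("expected one signal per channel")
    by_name = dict(zip(src, channels))
    return [by_name[name] for name in dst]
```

For example, `reorder(signals, AMBIX_ORDER, FUMA_ORDER)` moves the X signal from the last position to the second.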

Referring to the SPAR FoA decoder 306, the EVS bitstream is decoded by the EVS decoder 307, yielding N (e.g., N=4) downmix channels. In some implementations, the SPAR FoA decoder 306 performs the inverse of the operations performed by the SPAR encoder 301. For example, the remixed FoA channels (W or W', A', B', C') are recovered from the N downmix channels using the SPAR FoA spatial metadata. The remixed SPAR FoA channels are input to the inverse mixer 311 to recover the predicted SPAR FoA channels (W or W', Y', Z', X'). The predicted SPAR FoA channels are then input to the inverse predictor 312 to recover the original unmixed SPAR FoA channels (W, Y, Z, X). Note that in this two-channel example, decorrelator blocks 309a (dec_1) ... 309n (dec_D) are used to generate decorrelated versions of the W channel using a time-domain or frequency-domain decorrelator. The decorrelated channels are used in combination with the SPAR FoA metadata to reconstruct the X and Z channels fully or parametrically.

In some implementations, depending on the number of downmix channels, one of the FoA inputs (the W channel) is sent intact to the SPAR FoA decoder 306, and one to three of the other channels (Y, Z, and X) are sent to the SPAR FoA decoder 306 either as residuals or fully parametrically. The PR coefficients, which remain the same regardless of the number N of downmix channels, are used to minimize the predictable energy in the residual downmix channels. The C coefficients are used to further assist in regenerating the fully parameterized channels from the residuals. The C coefficients are therefore not needed in the one-channel and four-channel downmix cases, where there are no residual channels or no parameterized channels to predict. The P coefficients are used to fill in the remaining energy not accounted for by the PR and C coefficients. The number of P coefficients depends on the number N of downmix channels in each band. In some implementations, the SPAR PR coefficients (passive W only) are computed as follows.

Step 1. Predict all side signals (Y, Z, X) from the main W signal using equation [1].

[1]

Here, as an example, the prediction parameter for the predicted channel Y' is computed using equation [2].

[2]

where R_AB are the elements of the input covariance matrix corresponding to signals A and B. Similarly, the Z' and X' residual channels have corresponding prediction parameters pr_z and pr_x. PR is the vector of prediction coefficients [pr_y, pr_z, pr_x]^T.
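Equations [1] and [2] are not reproduced in this text, so the following is only a rough sketch of the idea behind Step 1, under stated assumptions: the signals are real-valued and zero-mean, the prediction coefficient is taken as the plain least-squares ratio R_YW / R_WW, and the predicted (residual) channel is Y' = Y - pr_y * W. The actual equation [2] may include additional normalization or limiting. The function names and the `eps` guard are hypothetical.

```python
def covariance(a, b):
    """Zero-lag covariance estimate of two equal-length, zero-mean real sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def predict_side(w, side, eps=1e-9):
    """Return (pr, residual), where residual = side - pr * w.

    Assumption: pr is the least-squares coefficient R_side,W / R_WW.
    """
    pr = covariance(side, w) / max(covariance(w, w), eps)
    residual = [s - pr * x for s, x in zip(side, w)]
    return pr, residual


w = [1.0, -2.0, 3.0]
y = [0.5 * x for x in w]  # a side signal that is fully predictable from W
pr_y, y_res = predict_side(w, y)
print(pr_y, y_res)
```

When the side signal is an exact scaled copy of W, as in this toy input, the residual carries no energy, which is the sense in which prediction "removes the predictable energy" from the side channels.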

Step 2. Remix the W signal and the predicted (Y', Z', X') signals from most to least acoustically relevant, in that order. Here, "remix" means reordering or recombining the signals according to some methodology.

[3]

One implementation of remixing reorders the input signals to W, Y', X', Z', on the assumption that audio cues from left and right are more acoustically relevant than front-back cues, and that front-back cues are more acoustically relevant than up-down cues.
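The reordering described above can be sketched as a fixed permutation. This is a minimal illustration of that one remix variant only; the names `REMIX_PERMUTATION`, `remix`, and `inverse_remix` are hypothetical, and the mapping A' = Y', B' = X', C' = Z' is inferred from the stated output order.

```python
# Input order: (W, Y', Z', X'). Output order: (W, A', B', C') = (W, Y', X', Z').
REMIX_PERMUTATION = (0, 1, 3, 2)


def remix(channels):
    """Reorder predicted channels so the most relevant side channel comes first."""
    return [channels[i] for i in REMIX_PERMUTATION]


def inverse_remix(remixed):
    """Undo remix(); this particular permutation is its own inverse."""
    restored = [None] * len(remixed)
    for out_pos, in_pos in enumerate(REMIX_PERMUTATION):
        restored[in_pos] = remixed[out_pos]
    return restored


predicted = ["W", "Y'", "Z'", "X'"]
remixed = remix(predicted)
print(remixed)
```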

Step 3. Compute the covariance of the 4-channel post-prediction and remixing downmix, as shown in equations [4] and [5].

[4]

[5]

Here, d represents the extra downmix channels beyond W (i.e., the 2nd through N_dmx-th channels), and u represents the channels that need to be wholly regenerated (i.e., the (N_dmx+1)-th through 4th channels).

As an example of a WABC downmix with 1 to 4 channels, d and u represent the following channels, shown in Table I.

The quantities of primary interest in computing the SPAR FoA metadata are R_dd, R_ud, and R_uu. From R_dd, R_ud, and R_uu, the system determines whether the remaining portion of the fully parametric channels can be cross-predicted from the residual channels sent to the decoder. In some implementations, the required extra C coefficients are given by:

[6]

Therefore, the C parameters have shape (1×2) for a 3-channel downmix and shape (2×1) for a 2-channel downmix.
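The d/u split and the resulting shape of C can be checked with a small sketch. It follows the definitions given above (d is the 2nd through N_dmx-th channel, u is the (N_dmx+1)-th through 4th channel) and assumes that C maps the d channels onto the u channels, so that its shape is (len(u), len(d)). That reading is an inference that happens to match the shapes stated in the text; the function names are hypothetical.

```python
REMIXED_CHANNELS = ("W", "A'", "B'", "C'")


def split_channels(n_dmx):
    """Return (d, u) for a downmix with n_dmx transmitted channels."""
    if not 1 <= n_dmx <= len(REMIXED_CHANNELS):
        raise ValueError("n_dmx must be between 1 and 4")
    d = REMIXED_CHANNELS[1:n_dmx]   # extra downmix channels beyond W
    u = REMIXED_CHANNELS[n_dmx:]    # channels that must be regenerated
    return d, u


def c_shape(n_dmx):
    """Shape of C as (rows, cols), or None when no C coefficients are needed."""
    d, u = split_channels(n_dmx)
    if not d or not u:
        return None  # 1-channel and 4-channel downmix cases
    return (len(u), len(d))


for n in range(1, 5):
    print(n, split_channels(n), c_shape(n))
```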

Step 4. Compute the remaining energy in the parameterized channels that must be reconstructed by the decorrelators. The residual energy in the upmix channels, Res_uu, is the difference between the actual energy R_uu (post-prediction) and the regenerated cross-prediction energy Reg_uu.

[7]

[8]

[9]

P is also a covariance matrix and is therefore Hermitian symmetric, so only the parameters from the upper or lower triangle need to be sent to the decoder 306. The diagonal entries are real, while the off-diagonal elements may be complex.
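As an illustration of why only one triangle needs to be transmitted, the sketch below packs a Hermitian matrix into its real diagonal plus the upper-triangle entries and rebuilds the full matrix from them. The function names and the row-major packing order are assumptions for illustration; the disclosure does not specify how the triangle is serialized.

```python
def pack_hermitian(p):
    """Split a Hermitian matrix into (real diagonal, upper-triangle entries)."""
    n = len(p)
    diag = [p[i][i].real for i in range(n)]
    upper = [p[i][j] for i in range(n) for j in range(i + 1, n)]
    return diag, upper


def unpack_hermitian(diag, upper):
    """Rebuild the full matrix; the lower triangle is the conjugate of the upper."""
    n = len(diag)
    p = [[0j] * n for _ in range(n)]
    k = 0
    for i in range(n):
        p[i][i] = complex(diag[i])
        for j in range(i + 1, n):
            p[i][j] = upper[k]
            p[j][i] = upper[k].conjugate()
            k += 1
    return p


p = [[2 + 0j, 1 + 1j],
     [1 - 1j, 3 + 0j]]
diag, upper = pack_hermitian(p)
print(diag, upper)
```

For an n×n matrix this sends n real values and n(n-1)/2 complex values instead of n² complex values.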

Example Encoding/Decoding of IVAS Bitstreams
As described with reference to FIGS. 2 and 3, the IVAS bitstream(s) are encoded and decoded by an IVAS codec. In some implementations, the IVAS encoder determines a coding tool indicator and a sampling rate indicator and encodes them in the common header (CH) section of the IVAS bitstream. In some implementations, the coding tool indicator comprises a value corresponding to a coding tool, and the sampling rate indicator comprises a value indicating a sampling rate. The IVAS encoder determines an EVS payload and encodes it in the EVS payload (EP) section of the bitstream. The EP section follows the CH section. The IVAS encoder determines a metadata payload and encodes it in the metadata payload (MDP) section of the bitstream. In some implementations, the MDP section follows the CH section. In other implementations, the MDP section follows the EP section of the bitstream, or the EP section follows the MDP section of the bitstream. In some implementations, the IVAS encoder stores the bitstream on a non-transitory computer-readable medium or streams the bitstream to a downstream device. In other implementations, the IVAS encoder has the device architecture shown in FIG. 8.

In some implementations, the IVAS decoder receives the IVAS bitstream and extracts and decodes the audio data that was encoded in the IVAS format by the IVAS encoder. The IVAS decoder extracts and decodes the coding tool indicator and the sampling rate indicator in the CH section of the IVAS bitstream. The IVAS decoder extracts and decodes the EVS payload in the EP section of the bitstream. The EP section follows the CH section. The IVAS decoder extracts and decodes the metadata payload in the MDP section of the bitstream. The MDP section follows the CH section. In other implementations, the MDP section follows the EP section of the bitstream, or the EP section follows the MDP section of the bitstream. In some implementations, the IVAS system controls an audio decoder based on the coding tool, the sampling rate, the EVS payload, and the metadata payload. In other implementations, the IVAS system stores a representation of the coding tool, the sampling rate, the EVS payload, and the metadata payload on a non-transitory computer-readable medium. In some implementations, the IVAS decoder has the device architecture shown in FIG. 8.

In some implementations, the IVAS coding tool indicator is a multi-bit data structure. In other implementations, the IVAS coding tool indicator is a 3-bit data structure, where a first value of the 3-bit data structure corresponds to a multi-mono coding tool, a second value corresponds to a CACPL coding tool, and a third value corresponds to another coding tool. In other implementations, the IVAS coding tool indicator is a 2-bit data structure indicating one to four IVAS coding tools, or a 1-bit data structure indicating one or two IVAS coding tools. In other implementations, the IVAS coding tool indicator includes three or more bits to indicate different IVAS coding tools.

In some implementations, the input sampling rate indicator is a multi-bit data structure indicating different input sampling rates. In some implementations, the input sampling rate indicator is a 2-bit data structure, where a first value of the 2-bit data structure indicates an 8 kHz sampling rate, a second value indicates a 16 kHz sampling rate, a third value indicates a 32 kHz sampling rate, and a fourth value indicates a 48 kHz sampling rate. In other implementations, the input sampling rate indicator is a 1-bit data structure indicating one or two sampling rates. In other implementations, the input sampling rate indicator includes three or more bits indicating different sampling rates.
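A minimal sketch of a common header built from a 3-bit coding tool indicator and a 2-bit sampling rate indicator is shown below. Several details are assumptions made only for illustration: the numeric codes 0 to 3 for the four sampling rates, the numeric tool identifiers, the placement of the tool indicator in the more significant bits, and the names `pack_ch` and `unpack_ch`. The text only states which values exist, not how they are numbered or ordered.

```python
TOOL_BITS = 3
RATE_BITS = 2
# Assumed code assignment: first value -> 8 kHz, ..., fourth value -> 48 kHz.
SAMPLING_RATES_HZ = (8000, 16000, 32000, 48000)


def pack_ch(tool_id, sampling_rate_hz):
    """Pack the two indicators into one 5-bit integer (tool indicator first)."""
    if not 0 <= tool_id < (1 << TOOL_BITS):
        raise ValueError("tool_id does not fit in 3 bits")
    rate_code = SAMPLING_RATES_HZ.index(sampling_rate_hz)
    return (tool_id << RATE_BITS) | rate_code


def unpack_ch(ch):
    """Recover (tool_id, sampling_rate_hz) from a packed common header."""
    tool_id = (ch >> RATE_BITS) & ((1 << TOOL_BITS) - 1)
    rate_code = ch & ((1 << RATE_BITS) - 1)
    return tool_id, SAMPLING_RATES_HZ[rate_code]


ch = pack_ch(1, 48000)
print(ch, unpack_ch(ch))
```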

In some implementations, the system stores in, or reads from, the EP section of the bitstream the following, in this order: the number of EVS channels (an EVS channel count indicator); a bitrate (BR) extraction mode indicator; EVS BR data; and the EVS payloads for all channels, as described in 3rd Generation Partnership Project (3GPP) Technical Specification (TS) 26.445.

In other implementations, the system stores the EVS channel count indicator in, or reads it from, the EP section of the bitstream.

In other implementations, the system stores the bitrate (BR) extraction mode indicator in, or reads it from, the EP section of the bitstream.

In other implementations, the system stores the EVS BR data in, or reads it from, the EP section of the bitstream.

In other implementations, the system stores the EVS payloads for all channels in, or reads them from, the EP section of the bitstream, as described in 3rd Generation Partnership Project (3GPP) Technical Specification (TS) 26.445, in that order.

In some implementations, the IVAS system stores in, or reads from, the MDP section of the data stream: a coding technique indicator; a number-of-bands indicator; an indicator of the delay configuration of the filterbank; a quantization strategy indicator; an entropy coder indicator; a probability model type indicator; a coefficient real part; a coefficient imaginary part; and one or more coefficients.

In other implementations, the IVAS system stores the coding technique indicator in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the number-of-bands indicator in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the indicator of the delay configuration of the filterbank in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the quantization strategy indicator in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the entropy coder indicator in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the probability model type indicator in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the coefficient real part in, or reads it from, the MDP section of the data stream. In other implementations, the IVAS system stores the coefficient imaginary part in, or reads it from, the MDP section of the data stream.

In other implementations, the IVAS system stores the one or more coefficients in, or reads them from, the MDP section of the data stream.

Some examples of IVAS bitstream formats are given below.

Example IVAS Bitstream Format - 3-Subdivision Format
In some implementations, the IVAS bitstream format includes three subdivisions as follows.

In some implementations, the parameters in each field of each subdivision and their respective bit allocations are shown below.

An advantage of the IVAS bitstream format embodiment described above is that it efficiently and compactly encodes data supporting a wide range of audio service capabilities, including but not limited to mono-to-stereo upmixing and fully immersive audio encoding, decoding, and rendering. It is also supported by a wide range of devices, endpoints, and network nodes, including but not limited to mobile phones and smartphones, electronic tablets, personal computers, conference phones, conference rooms, virtual reality (VR) and augmented reality (AR) devices, home theater devices, and other suitable devices, each of which can have various acoustic interfaces for sound capture and rendering. The IVAS bitstream format is extensible, so it can readily evolve with the IVAS standard and technology.

Example IVAS Bitstream Format - 4-Subdivision Format
The following description of a further embodiment focuses on the differences between it and the previously described embodiment. Features common to both embodiments may therefore be omitted from the following description. Where they are omitted, it should be assumed that the features of the previously described embodiment are, or at least can be, implemented in the further embodiment, unless the following description requires otherwise. In addition, when a feature is taken from an implementation disclosed below and added to a claim, that feature need not be related to, or inextricably linked with, the other features of that implementation.

In other implementations, the IVAS bitstream includes four subdivisions as follows.

In some implementations, the IVAS encoder determines a coding tool indicator and encodes it in the common header (CH) section of the IVAS bitstream. The coding tool indicator comprises a value corresponding to a coding tool. The IVAS encoder determines a row index into the IVAS bitrate distribution control table and encodes it in the common spatial coding tool header (CTH) section of the IVAS bitstream. The CTH section follows the CH section. The IVAS encoder determines an EVS payload and encodes it in the EVS payload (EP) section of the IVAS bitstream. The EP section follows the CH section. The IVAS encoder determines a metadata payload and encodes it in the metadata payload (MDP) section of the IVAS bitstream. The MDP section follows the CH section.

In some implementations, the EP section comes before or after the MDP section depending on one or more parameters. In some implementations, the one or more parameters include a backward-compatibility mode of a mono downmix of a multi-channel input with nominal bitrate modes, as described in 3GPP TS 26.445.

In some implementations, the IVAS system stores the IVAS bitstream on a non-transitory computer-readable medium. In other implementations, the IVAS system streams the bitstream to a downstream device. In some implementations, the IVAS encoder has the device architecture shown in FIG. 8.

In some implementations, the IVAS decoder receives the IVAS bitstream and extracts and decodes the audio data that was encoded in the IVAS format by the IVAS encoder. The IVAS decoder extracts and decodes the coding tool indicator in the CH section of the IVAS bitstream. The IVAS decoder extracts and decodes the index into the IVAS bitrate distribution control table. The IVAS decoder extracts and decodes the EVS payload in the EP section of the IVAS bitstream. The EP section follows the CH section. The IVAS decoder extracts and decodes the metadata payload in the MDP section of the IVAS bitstream. The MDP section follows the CH section.

In some implementations, the EP section comes before or after the MDP section depending on one or more parameters. In some implementations, the one or more parameters include a backward-compatibility mode of a mono downmix of a multi-channel input with nominal bitrate modes, as described in 3GPP TS 26.445.

In some implementations, the IVAS system controls an audio decoder based on the coding tool, the index into the IVAS bitrate distribution control table, the EVS payload, and the metadata payload. In other implementations, the IVAS system stores a representation of the coding tool, the index into the IVAS bitrate distribution control table, the EVS payload, and the metadata payload on a non-transitory computer-readable medium. In some implementations, the IVAS decoder has the device architecture shown in FIG. 8.

Metadata Payload (MDP):
An advantage of the IVAS bitrate distribution control table is that it captures the information about the spatial coding mode, so that this information need not be included in the MDP section.

EVS Payload (EP):
This section of the payload contains the EVS-coded bits for one or more audio downmix channels. In some implementations, the total number of bits in this section can be given by

$$\sum_{i=1}^{N} EVS_{BR(i)} \cdot stride\_secs$$

where N (e.g., N=4) is the number of audio downmix channels to be coded, EVS_BR(i) is the computed EVS bitrate of the i-th audio downmix channel, and stride_secs is the input stride length in seconds.
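As a worked example of the quantity defined above, the sketch below sums bitrate times stride over the downmix channels. The function name, the per-channel rounding to whole bits, and the example bitrates and 20 ms stride are assumptions chosen for illustration, not values taken from the disclosure.

```python
def ep_total_bits(evs_bitrates_bps, stride_secs):
    """Total EP-section bits: sum over channels of bitrate * stride length."""
    # Rounding per channel guards against floating-point noise such as 263.99999.
    return sum(round(br * stride_secs) for br in evs_bitrates_bps)


# Illustrative only: four downmix channels and a 20 ms stride.
bitrates = [13200, 9600, 9600, 7200]
total = ep_total_bits(bitrates, 0.02)
print(total)
```

With these inputs the per-channel contributions are 264, 192, 192, and 144 bits.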

In some implementations, each table entry in the IVAS bitrate distribution control table has enough information to extract the bitrate of each EVS instance from the total bits allocated for EVS. This structure offers the advantage that no additional header information is needed in the EVS payload to extract the bits for each EVS instance.

In some implementations, the parameters in the IVAS bitrate distribution control table have the following values.

An example IVAS bitrate distribution control table is as follows.

Example Decoding of an IVAS Bitstream
In one embodiment, the steps for decoding an IVAS bitstream are as follows.

Step 1: Compute the IVAS operating bitrate based on the length of the bitstream and stride_secs.

Step 2: Read the fixed-length CH section, which indicates the spatial coding tool.

Step 3: Based on the IVAS operating bitrate (computed in Step 1), determine the length of the CTH field by looking up the number of entries for that IVAS operating bitrate in the IVAS bitrate distribution control table.

Step 4: Once the length of the CTH field is known, read the index offset in the CTH field.

Step 5: Determine the actual IVAS bitrate distribution control table index using the index offset and the IVAS operating bitrate.

Step 6: Read all information about the EVS bitrate distribution and mono downmix backward compatibility from the indexed table entry.

Step 7: If the mono downmix backward-compatibility mode is ON, first pass the remaining IVAS bits to the EVS decoder, compute the bit length of each EVS instance based on the EVS bitrate distribution, read the EVS bits of each EVS instance, decode the EVS bits with the corresponding EVS decoder, and then decode the spatial metadata in the MDP section.

Step 8: If the mono downmix backward-compatibility mode is OFF, decode the spatial metadata in the MDP section, compute the bit length of each EVS instance based on the EVS bitrate distribution, and read and decode the EVS bits of each EVS instance from the EP section of the IVAS bitstream.

Step 9: Use the decoded EVS output and the spatial metadata to construct the input audio format, such as stereo (CACPL) or FoA (SPAR).
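The control flow of Steps 1 and 5 through 8 can be sketched as follows. The table contents, field names (`evs_bitrates`, `mono_compat`), and function names are all hypothetical, since the example table is not reproduced in this text; only the ordering logic is taken from the steps above. Bit-level parsing of the CH and CTH fields (Steps 2 to 4) and the actual EVS and metadata decoding are left out.

```python
# Hypothetical table: IVAS operating bitrate (bps) -> list of rows.
# The values are made up for illustration.
BITRATE_DISTRIBUTION_TABLE = {
    32000: [
        {"evs_bitrates": [24400], "mono_compat": True},
        {"evs_bitrates": [13200, 9600], "mono_compat": False},
    ],
}


def operating_bitrate(frame_bits, stride_secs):
    """Step 1: bitrate implied by the frame length and the stride."""
    return round(frame_bits / stride_secs)


def lookup_entry(bitrate, index_offset):
    """Steps 5-6: resolve the table row from the bitrate and the CTH offset."""
    return BITRATE_DISTRIBUTION_TABLE[bitrate][index_offset]


def read_plan(entry, stride_secs):
    """Steps 7-8: section read order and the bit length of each EVS instance."""
    evs_bits = [round(br * stride_secs) for br in entry["evs_bitrates"]]
    order = ["EP", "MDP"] if entry["mono_compat"] else ["MDP", "EP"]
    return order, evs_bits


bitrate = operating_bitrate(640, 0.02)
plan = read_plan(lookup_entry(bitrate, 1), 0.02)
print(bitrate, plan)
```

Because each row carries the per-instance EVS bitrates, the decoder can split the EP section without any extra header inside it, which is the advantage the text attributes to the table.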

Example IVAS SPAR Encoding/Decoding
The following description of a further embodiment focuses on the differences between it and the previously described embodiments. Features common to both embodiments may therefore be omitted from the following description. Where they are omitted, it should be assumed that the features of the previously described embodiments are, or at least can be, implemented in the further embodiment, unless the following description requires otherwise. In addition, when a feature is taken from an implementation disclosed below and added to a claim, that feature need not be related to, or inextricably linked with, the other features of that implementation.

いくつかの実施態様において、ＩＶＡＳＳＰＡＲエンコーダーは、符号化モード／ツールインジケーターを求め、ＩＶＡＳビットストリームの共通ヘッダー（ＣＨ）セクション内に符号化する。符号化モード／ツールインジケーターは、符号化モード／ツールに対応する値を有する。ＩＶＡＳビットストリームは、モードヘッダー／ツールヘッダーを求め、ＩＶＡＳビットストリームのツールヘッダー（ＴＨ）セクション内に符号化する。ここで、ＴＨセクションはＣＨセクションの後に続く。ＩＶＡＳＳＰＡＲエンコーダーは、メタデータペイロードを求め、ＩＶＡＳビットストリームのメタデータペイロード（ＭＤＰ）セクション内に符号化する。ここで、ＭＤＰセクションはＣＨセクションの後に続く。ＩＶＡＳＳＰＡＲエンコーダーは、拡張型音声サービス（ＥＶＳ）ペイロードを求め、ＩＶＡＳビットストリームのＥＶＳペイロード（ＥＰ）セクション内に符号化する。ここで、ＥＰセクションはＣＨセクションの後に続く。いくつかの実施態様において、ＩＶＡＳシステムは、ビットストリームを非一時的コンピューター可読媒体上に記憶する。他の実施態様において、ＩＶＡＳシステムは、ビットストリームを下流デバイスにストリーミングする。いくつかの実施態様において、ＩＶＡＳＳＰＡＲエンコーダーは、図８を参照して説明するデバイスアーキテクチャを有する。 In some implementations, the IVAS SPAR encoder determines the encoding mode/tool indicator and encodes it in the common header (CH) section of the IVAS bitstream. The encoding mode/tool indicator has a value corresponding to the encoding mode/tool. The IVAS bitstream seeks a Mode Header/Tool Header and encodes it in the Tool Header (TH) section of the IVAS bitstream. Here, the TH section follows the CH section. The IVAS SPAR encoder takes a metadata payload and encodes it into the metadata payload (MDP) section of the IVAS bitstream. Here, the MDP section follows the CH section. The IVAS SPAR encoder takes an enhanced voice service (EVS) payload and encodes it into the EVS payload (EP) section of the IVAS bitstream. Here, the EP section follows the CH section. In some implementations, the IVAS system stores the bitstream on a non-transitory computer-readable medium. In other implementations, the IVAS system streams the bitstream to downstream devices. In some implementations, the IVAS SPAR encoder has the device architecture described with reference to FIG.

いくつかの実施態様において、ＥＰセクションはＭＤＰセクションの後に続く。ＥＰセクションをＩＶＡＳビットストリームのＭＤＰセクションの後に続かせることによって、効率的なビットパッキングが確保され、ＭＤＰビットおよびＥＰビットの数が（ビットレート分布アルゴリズムに従って）変化することを可能にすることによって、ＩＶＡＳビットレートバジェットにおける全ての利用可能なビットの利用が確保されることに留意されたい。 In some implementations, the EP section follows the MDP section. Efficient bit packing is ensured by having the EP section follow the MDP section of the IVAS bitstream, and by allowing the number of MDP and EP bits to vary (according to the bitrate distribution algorithm), Note that utilization of all available bits in the IVAS bitrate budget is ensured.

いくつかの実施態様において、ＩＶＡＳＳＰＡＲデコーダーは、ＩＶＡＳＳＰＡＲフォーマットで符号化されたＩＶＡＳビットストリームを抽出して復号化する。ＩＶＡＳＳＰＡＲデコーダーは、ビットストリームのＣＨセクション内の符号化モード／ツールインジケーターを抽出して復号化する。符号化モード／ツールインジケーターは、符号化モード／ツールに対応する値を有する。ＩＶＡＳＳＰＡＲデコーダーは、ビットストリームのツールヘッダー（ＴＨ）セクション内のモードヘッダー／ツールヘッダーを抽出して復号化する。ＴＨセクションはＣＨセクションの後に続く。ＩＶＡＳＳＰＡＲデコーダーは、ビットストリームのＭＤＰセクション内のメタデータペイロードを抽出して復号化する。ＭＤＰセクションはＣＨセクションの後に続く。ＩＶＡＳＳＰＡＲデコーダーは、ビットストリームのＥＰセクション内のＥＶＳペイロードを復号化する。ＥＰセクションはＣＨセクションの後に続く。 In some implementations, an IVAS SPAR decoder extracts and decodes an IVAS bitstream encoded in the IVAS SPAR format. The IVAS SPAR decoder extracts and decodes the coding mode/tool indicator in the CH section of the bitstream. The encoding mode/tool indicator has a value corresponding to the encoding mode/tool. The IVAS SPAR decoder extracts and decodes the mode header/tool header in the tool header (TH) section of the bitstream. A TH section follows a CH section. The IVAS SPAR decoder extracts and decodes the metadata payload in the MDP section of the bitstream. The MDP section follows the CH section. The IVAS SPAR decoder decodes the EVS payload within the EP section of the bitstream. The EP section follows the CH section.

In some implementations, the IVAS system controls an audio decoder based on the encoding mode, tool header, EVS payload, and metadata payload. In other implementations, the IVAS system stores representations of the encoding mode, tool header, EVS payload, and metadata payload on a non-transitory computer-readable medium. In some implementations, the IVAS SPAR decoder has the device architecture described with reference to FIG. 8.

In some implementations, the CH includes a 3-bit data structure, where one of the values of the 3-bit data structure corresponds to the SPAR encoding mode and the remaining values correspond to other encoding modes. The 3-bit data structure is advantageous because it allows a compact code that can indicate up to eight encoding modes. In other implementations, the CH includes fewer than 3 bits. In other implementations, the CH includes more than 3 bits.
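
The 3-bit CH field can be sketched as follows. This is an illustration only: the text does not say which of the eight values denotes SPAR, so `SPAR_MODE` is an assumed value, and the helper names `write_ch` and `read_ch` are hypothetical. Bits are modeled as a string of '0'/'1' characters.

```python
from typing import Tuple

CH_BITS = 3        # width of the CH field in this example
SPAR_MODE = 0b011  # assumed code for SPAR; the actual value is not given in the text


def write_ch(mode: int) -> str:
    """Return the CH section as a bit string of CH_BITS characters."""
    if not 0 <= mode < (1 << CH_BITS):
        raise ValueError("mode does not fit in the CH field")
    return format(mode, f"0{CH_BITS}b")


def read_ch(bits: str) -> Tuple[int, str]:
    """Split the leading CH field off a bit string; return (mode, remaining bits)."""
    if len(bits) < CH_BITS:
        raise ValueError("bit string is shorter than the CH field")
    return int(bits[:CH_BITS], 2), bits[CH_BITS:]
```

A wider or narrower CH, as in the other implementations mentioned, would only change `CH_BITS`.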

In some implementations, the IVAS system stores in, or reads from, the TH section of the IVAS bitstream a row index that points to a row in the SPAR bitrate distribution control table. For example, the number of bits x used for the row index can be computed from the number of rows corresponding to the IVAS operating bitrate as x = ceil(log2(number of rows corresponding to the IVAS bitrate)). The length of the TH section is therefore variable.
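
A short sketch of this length calculation, assuming x is the TH field width in bits. The function name `th_bits` is hypothetical, and the integer expression is used in place of a floating-point logarithm to avoid rounding issues. For positive integers it gives the same result as ceil(log2(n)).

```python
def th_bits(num_rows: int) -> int:
    """Bits needed to index num_rows table rows: ceil(log2(num_rows))."""
    if num_rows < 1:
        raise ValueError("the table needs at least one row for this bitrate")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (num_rows - 1).bit_length()
```

With a single row per bitrate the function returns 0, so no TH bits are written for that bitrate.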

In some implementations, the system stores in, or reads from, the MDP section of the IVAS bitstream: a quantization strategy indicator; an encoding strategy indicator; and quantized and encoded real and imaginary parts of one or more coefficients.

In other implementations, the system stores the quantization strategy indicator in, or reads it from, the MDP section of the IVAS bitstream.

In other implementations, the system stores the encoding strategy indicator in, or reads it from, the MDP section of the IVAS bitstream.

In other implementations, the system stores the quantized and encoded real and imaginary parts of one or more coefficients in, or reads them from, the MDP section of the IVAS bitstream.

In some implementations, the one or more coefficients include, but are not limited to, prediction coefficients, cross-prediction coefficients (or direct coefficients), real (diagonal) decorrelator coefficients, and complex (off-diagonal) decorrelator coefficients.

In some implementations, more or fewer coefficients are stored in and read from the MDP section of the IVAS bitstream.

In some implementations, the IVAS system stores the EVS payloads of all channels, in accordance with 3GPP TS 26.445, in the EP section of the IVAS bitstream, or reads them from the EP section of the IVAS bitstream.

An example IVAS bitstream using SPAR formatting is shown below. The IVAS bitstream contains four subdivisions, as follows.

Common Header (CH):
In some implementations, the IVAS common header (CH) is formatted as follows.

Tool Header (TH):
In some implementations, the SPAR tool header (TH) is an index offset into the SPAR bitrate distribution control table.

An example implementation of the SPAR bitrate distribution control table is shown below. Each IVAS bitrate can support one or more values of bandwidth (BW), downmix configuration (dmx channels, dmx string), active W, complex flag, transition mode value, EVS bitrate setting, metadata quantization level setting, and decorrelator ducking flag. In this example implementation, there is only one entry per bitrate, so the number of bits in the SPAR TH section is 0. The acronyms used in the table below are defined as follows:
PR: prediction coefficients,
C: cross-prediction coefficients (or direct coefficients),
P_r: real (diagonal) decorrelator coefficients,
P_c: complex (off-diagonal) decorrelator coefficients.

An example SPAR bitrate distribution control table is as follows.

Metadata Payload (MDP):
An example metadata payload (MDP) is as follows.

EVS Payload (EP):
In some implementations, quantization of the metadata and calculation of the actual EVS bitrate for each downmix channel are performed using an EVS bitrate distribution control strategy. An example implementation of the EVS bitrate distribution control strategy is described below.

Example EVS Bitrate Distribution Control Strategy
In some implementations, the EVS bitrate distribution control strategy includes two sections: metadata quantization and EVS bitrate distribution.

Metadata quantization. This section has two defined thresholds: a target parameter bitrate threshold (MDtar) and a maximum target bitrate threshold (MDmax).

Step 1: For each frame, the parameters are quantized in a non-time-differential manner and encoded with an entropy coder. In some implementations, an arithmetic coder is used. In other implementations, a Huffman encoder is used. If the parameter bitrate estimate is less than MDtar, any extra available bits are supplied to the audio encoder to increase the bitrate of the audio essence.

Step 2: If step 1 fails (that is, the estimate is not below MDtar), a subset of the parameter values in the frame is quantized and subtracted from the quantized parameter values of the previous frame, and the differentially quantized parameter values are encoded with the entropy coder. If the parameter bitrate estimate is less than MDtar, any extra available bits are supplied to the audio encoder to increase the bitrate of the audio essence.

Step 3: If step 2 fails, the bitrate of the quantized parameters is calculated without entropy coding.

Step 4: The results of step 1, step 2, and step 3 are compared with MDmax. If the minimum of step 1, step 2, and step 3 is within MDmax, the remaining bits are encoded and provided to the audio coder.

Step 5: If step 4 fails, the parameters are quantized more coarsely and the above steps are repeated as a first fallback strategy (fallback 1).

Step 6: If step 5 fails, the parameters are quantized, as a second fallback strategy (fallback 2), using a quantization scheme that is guaranteed to fit within MDmax. After all of the iterations described above, the metadata bitrate is guaranteed to fit within MDmax, and the encoder generates the actual metadata bits, Metadata_actual_bits (MDact).
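
The decision logic of steps 1 to 6 can be sketched as below. This is a simplified illustration under stated assumptions, not the encoder's implementation: the bit estimates are supplied as precomputed inputs (a real encoder would compute step 2 and step 3 only when the earlier step fails), "within MDmax" is read as less than or equal, and all names (`BitEstimates`, `select_metadata_coding`, the strategy labels) are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class BitEstimates(NamedTuple):
    """Estimated metadata bits for one quantization level (illustrative inputs)."""
    non_diff_entropy: int   # step 1: non-time-differential, entropy coded
    time_diff_entropy: int  # step 2: time-differential, entropy coded
    no_entropy: int         # step 3: quantized parameters without entropy coding


def select_metadata_coding(levels: Sequence[BitEstimates], guaranteed_bits: int,
                           md_tar: int, md_max: int) -> Tuple[str, int]:
    """Return (strategy label, MDact) following steps 1 to 6."""
    for name, est in zip(("primary", "fallback1"), levels):
        # Steps 1 and 2: accept the first entropy-coded result below MDtar.
        if est.non_diff_entropy < md_tar:
            return f"{name}:non_diff_entropy", est.non_diff_entropy
        if est.time_diff_entropy < md_tar:
            return f"{name}:time_diff_entropy", est.time_diff_entropy
        # Steps 3 and 4: otherwise take the cheapest of the three if it fits MDmax.
        best = min(est)
        if best <= md_max:
            return f"{name}:{est._fields[est.index(best)]}", best
        # Step 5: fall through to the next, coarser quantization level.
    # Step 6: a scheme whose size is known in advance to fit within MDmax.
    if guaranteed_bits > md_max:
        raise ValueError("fallback 2 must fit within MDmax")
    return "fallback2", guaranteed_bits
```

The returned bit count plays the role of MDact, and whatever remains of the frame budget is then available to the audio encoder.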

EVS bitrate distribution (EVSbd). The following definitions apply to this section:
EVStar: EVS target bits, the desired bits for each EVS instance.
EVSact: EVS actual bits, the total of the actual bits available to all EVS instances.
EVSmin: EVS minimum bits, the minimum bits for each EVS instance. The EVS bitrate must not fall below the value indicated by these bits.
EVSmax: EVS maximum bits, the maximum bits for each EVS instance. The EVS bitrate must not exceed the value indicated by these bits.
EVS W: the EVS instance that encodes the W channel.
EVS Y: the EVS instance that encodes the Y channel.
EVS X: the EVS instance that encodes the X channel.
EVS Z: the EVS instance that encodes the Z channel.
EVSact = IVAS_bits - header_bits - MDact

If EVSact is less than the sum of EVStar over all EVS instances, bits are taken from the EVS instances in the order (Z, X, Y, W). The maximum number of bits that can be taken from any channel is EVStar(ch) - EVSmin(ch).

If EVSact is greater than the sum of EVStar over all EVS instances, all additional bits are assigned to the downmix channels in the order (W, Y, X, Z). The maximum number of additional bits that can be added to any channel is EVSmax(ch) - EVStar(ch).

The EVSbd scheme described above computes the actual EVS bitrates for all channels, namely EWa, EYa, EXa, and EZa for the W, Y, X, and Z channels, respectively. After each channel is encoded by a separate EVS instance at the EWa, EYa, EXa, and EZa bitrates, all EVS bits are concatenated and packed together. The advantage of this configuration is that no additional header is needed to indicate the EVS bitrate of any channel.
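
The EVSbd rules can be sketched as a small allocation routine. This is a hedged illustration: the function name `distribute_evs_bits` and the dictionary-based interface are hypothetical, and the handling of a deficit that cannot be covered (an error) or of surplus bits beyond every channel's maximum (left unassigned) is an assumption, since the text does not specify those cases.

```python
from typing import Dict

REMOVE_ORDER = ("Z", "X", "Y", "W")  # order in which bits are taken away
ADD_ORDER = ("W", "Y", "X", "Z")     # order in which extra bits are handed out


def distribute_evs_bits(evs_act: int, target: Dict[str, int],
                        minimum: Dict[str, int],
                        maximum: Dict[str, int]) -> Dict[str, int]:
    """Return the actual bits per channel, starting from the EVStar targets."""
    actual = dict(target)
    diff = evs_act - sum(target.values())
    if diff < 0:
        deficit = -diff
        for ch in REMOVE_ORDER:
            if ch in actual:
                # A channel can give up at most EVStar(ch) - EVSmin(ch) bits.
                take = min(deficit, target[ch] - minimum[ch])
                actual[ch] -= take
                deficit -= take
        if deficit:
            raise ValueError("EVSact is below the sum of the EVSmin values")
    else:
        surplus = diff
        for ch in ADD_ORDER:
            if ch in actual:
                # A channel can receive at most EVSmax(ch) - EVStar(ch) bits.
                give = min(surplus, maximum[ch] - target[ch])
                actual[ch] += give
                surplus -= give
        # Assumption: any surplus beyond every EVSmax is simply left unassigned.
    return actual
```

Because the decoder can run the same routine on the same inputs, it recovers EWa, EYa, EXa, and EZa without any per-channel bitrate header, which is the advantage noted above.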

In some implementations, the EP section is as follows.

Example SPAR Decoder Bitstream Unpacking
In some implementations, the steps of SPAR decoder bitstream unpacking are as follows.

Step 1: Determine the IVAS bitrate from the length of the received bit buffer.

Step 2: Parse the SPAR TH section based on the number of entries for the IVAS bitrate in the SPAR bitrate distribution control table, and extract the index offset. Here, the index offset is determined by the IVAS operating bitrate.

Step 3: Use the index offset to determine the actual table row index of the SPAR bitrate distribution control table, and read all columns of the SPAR bitrate distribution control table row pointed to by this actual table row index.

Step 4: Read the quantization strategy bits and encoding strategy bits from the MDP section of the IVAS bitstream, and dequantize the SPAR spatial metadata in the MDP section based on the indicated quantization strategy and encoding strategy.

Step 5: Based on the total EVS bitrate (the remaining bits to be read from the IVAS bitstream), determine the actual EVS bitrate of each channel according to the EVS bitrate distribution (EVSbd) described above.

Step 6: Read the encoded EVS bits from the EP section of the IVAS bitstream based on the actual EVS bitrates, and decode each channel of the FoA audio signal with its respective EVS instance.

Step 7: Construct the FoA (SPAR) audio signal using the decoded EVS output and the spatial metadata.
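
The section-splitting part of the steps above (steps 1, 2, 4, and 5) can be sketched as follows. This is an illustration under several assumptions that are not stated in the text: a 3-bit CH, 20 ms frames (50 frames per second) for converting buffer length to bitrate, bits modeled as a '0'/'1' string, and a hypothetical `mdp_length` callback standing in for the metadata parser. The table lookup of step 3 and the EVS decoding and reconstruction of steps 6 and 7 are omitted.

```python
from typing import Callable, Dict

CH_BITS = 3             # assumed CH width (the 3-bit example)
FRAMES_PER_SECOND = 50  # assumed 20 ms frames


def unpack_frame(bits: str, rows_per_bitrate: Dict[int, int],
                 mdp_length: Callable[[str, int], int]) -> dict:
    """Split one received frame into its CH, TH, MDP and EP parts."""
    # Step 1: derive the IVAS bitrate from the received buffer length.
    bitrate = len(bits) * FRAMES_PER_SECOND
    mode = int(bits[:CH_BITS], 2)
    pos = CH_BITS
    # Step 2: the TH width depends on how many table rows share this bitrate.
    th_len = (rows_per_bitrate[bitrate] - 1).bit_length()
    row_offset = int(bits[pos:pos + th_len], 2) if th_len else 0
    pos += th_len
    # Step 4: a hypothetical callback parses the metadata and reports its length.
    md_len = mdp_length(bits, pos)
    mdp = bits[pos:pos + md_len]
    pos += md_len
    # Step 5: every remaining bit belongs to the EVS payload (total EVS budget).
    ep = bits[pos:]
    return {"bitrate": bitrate, "mode": mode, "row_offset": row_offset,
            "mdp": mdp, "ep": ep}
```

The length of the returned `ep` string is the total EVS bit budget that the per-channel distribution would then divide among the W, Y, X, and Z instances.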

An advantage of the IVAS bitstream format embodiments described above is that they efficiently and compactly encode data that supports a variety of audio service capabilities, including but not limited to mono-to-stereo upmixing and fully immersive audio encoding, decoding, and rendering (e.g., FoA encoding). The embodiments are also supported by a wide range of devices, endpoints, and network nodes, including but not limited to mobile phones and smartphones, electronic tablets, personal computers, conference phones, conference rooms, virtual reality (VR) and augmented reality (AR) devices, home theater devices, and other suitable devices, each of which can have various acoustic interfaces for sound capture and rendering. The IVAS bitstream format is extensible so that it can readily evolve with the IVAS standard and technology.

Example Process: IVAS Bitstream in CACPL Format
FIG. 4A is a flow diagram of an IVAS encoding process 400 according to one embodiment. Process 400 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 400 includes using an IVAS encoder to determine an encoding tool indicator and a sampling rate indicator, and encoding the encoding tool indicator and the sampling rate indicator in a common header (CH) section of the IVAS bitstream (401). In some implementations, the tool indicator has a value corresponding to the encoding tool, and the sampling rate indicator has a value indicating the sampling rate.

Process 400 further includes using the IVAS encoder to determine an Enhanced Voice Services (EVS) payload, and encoding the EVS payload in an EVS payload (EP) section of the IVAS bitstream (402). In some implementations, the EP section follows the CH section.

Process 400 further includes using the IVAS encoder to determine a metadata payload, and encoding the metadata payload in a metadata payload (MDP) section of the IVAS bitstream (403). In some implementations, the MDP section follows the CH section. In some implementations, the EP section follows the MDP section of the bitstream.

Process 400 further includes storing the IVAS bitstream on a non-transitory computer-readable medium or streaming the IVAS bitstream to a downstream device (404).

FIG. 4B is a flow diagram of an IVAS encoding process 405 that uses an alternative IVAS format, according to one embodiment. Process 405 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 405 includes using an IVAS encoder to determine an encoding tool indicator, and encoding the encoding tool indicator in a common header (CH) section of the IVAS bitstream (406). In some implementations, the tool indicator has a value corresponding to the encoding tool.

Process 405 further includes using the IVAS encoder to encode a representation of an IVAS bitrate distribution control table in a common spatial coding tool header (CTH) section of the IVAS bitstream (407).

Process 405 further includes using the IVAS encoder to determine a metadata payload, and encoding the metadata payload in a metadata payload (MDP) section of the IVAS bitstream (408). In some implementations, the MDP section follows the CH section of the IVAS bitstream.

Process 405 further includes using the IVAS encoder to determine an Enhanced Voice Services (EVS) payload, and encoding the EVS payload in an EVS payload (EP) section of the IVAS bitstream (409). In some implementations, the EP section follows the CH section of the IVAS bitstream. In some implementations, the MDP section follows the EP section of the IVAS bitstream.

Process 405 further includes storing the IVAS bitstream on a storage device or streaming the IVAS bitstream to a downstream device (410).

FIG. 5A is a flow diagram of an IVAS decoding process 500 according to one embodiment. Process 500 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 500 includes using an IVAS decoder to extract and decode an encoding tool indicator and a sampling rate indicator from a common header (CH) section of the IVAS bitstream (501). In some implementations, the tool indicator has a value corresponding to the encoding tool, and the sampling rate indicator has a value indicating the sampling rate.

Process 500 further includes using the IVAS decoder to extract and decode an Enhanced Voice Services (EVS) payload from an EVS payload (EP) section of the IVAS bitstream (502). In some implementations, the EP section follows the CH section of the IVAS bitstream.

Process 500 further includes using the IVAS decoder to extract and decode a metadata payload from a metadata payload (MDP) section of the bitstream (503). In some implementations, the MDP section follows the CH section of the IVAS bitstream. In some implementations, the EP section follows the MDP section of the IVAS bitstream.

Process 500 further includes controlling an audio decoder based on the encoding tool, sampling rate, EVS payload, and metadata payload, or storing representations of the encoding tool, sampling rate, EVS payload, and metadata payload on a non-transitory computer-readable medium (504).

FIG. 5B is a flow diagram of an IVAS decoding process 505 that uses an alternative format, according to one embodiment. Process 505 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 505 includes using an IVAS decoder to extract and decode an encoding tool indicator in a common header (CH) section of the IVAS bitstream (506). In some implementations, the tool indicator has a value corresponding to the encoding tool.

Process 505 further includes using the IVAS decoder to extract and decode a representation of an IVAS bitrate distribution control table in a common spatial coding tool header (CTH) section of the IVAS bitstream (507).

Process 505 further includes using the IVAS decoder to decode a metadata payload in a metadata payload (MDP) section of the IVAS bitstream (508). In some implementations, the MDP section follows the CH section of the IVAS bitstream.

Process 505 further includes using the IVAS decoder to decode an Enhanced Voice Services (EVS) payload in an EVS payload (EP) section of the IVAS bitstream (509). In some implementations, the EP section follows the CH section of the IVAS bitstream. In some implementations, the MDP section follows the EP section of the IVAS bitstream.

Process 505 further includes controlling an audio decoder based on representations of the encoding tool indicator, the IVAS bitrate distribution control table, the metadata payload, and the EVS payload, or storing representations of the encoding tool indicator, the IVAS bitrate distribution control table, the metadata payload, and the EVS payload on a storage device (510).

Example Process: IVAS Bitstream in SPAR Format
FIG. 6 is a flow diagram of an IVAS SPAR encoding process 600 according to one embodiment. Process 600 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 600 includes using an IVAS encoder to determine an encoding mode/encoding tool indicator, and encoding the encoding mode/encoding tool indicator in a common header (CH) section of the IVAS bitstream (601).

Process 600 further includes using the IVAS encoder to determine a representation of a SPAR bitrate distribution control table and encoding it in a mode header/tool header in a tool header (TH) section of the IVAS bitstream (602), where the TH section follows the CH section.

Process 600 further includes using the IVAS encoder to determine a metadata payload, and encoding the metadata payload in a metadata payload (MDP) section of the IVAS bitstream (603). In some implementations, the MDP section follows the CH section of the IVAS bitstream.

In some implementations, the MDP section includes a quantization strategy indicator; an encoding strategy indicator; and quantized and encoded real and imaginary parts of one or more coefficients. In some implementations, the one or more coefficients include, but are not limited to, prediction coefficients, cross-prediction coefficients (or direct coefficients), real (diagonal) decorrelator coefficients, and complex (off-diagonal) decorrelator coefficients. In some implementations, more or fewer coefficients are stored in and read from the MDP section of the IVAS bitstream.

Process 600 further includes using the IVAS encoder to determine an Enhanced Voice Services (EVS) payload, and encoding the EVS payload in an EVS payload (EP) section of the IVAS bitstream (604). In some implementations, the EP section of the IVAS bitstream contains the EVS payloads of all channels in accordance with 3GPP TS 26.445. In some implementations, the EP section follows the CH section of the IVAS bitstream. In some implementations, the EP section follows the MDP section. Note that placing the EP section after the MDP section of the IVAS bitstream ensures efficient bit packing, and allowing the number of MDP bits and EP bits to vary (according to the bitrate distribution algorithm) ensures that all available bits in the IVAS bitrate budget are used.

Process 600 further includes storing the bitstream on a non-transitory computer-readable medium or streaming the bitstream to a downstream device (605).

FIG. 7 is a flow diagram of an IVAS SPAR decoding process 700 according to one embodiment. Process 700 can be implemented using a device architecture such as that described with reference to FIG. 8.

Process 700 includes using an IVAS decoder to extract and decode an encoding mode indicator in a common header (CH) section of the IVAS bitstream (701).

Process 700 further includes using the IVAS decoder to extract and decode a representation of a SPAR bitrate distribution control table in a mode header/tool header in a tool header (TH) section of the IVAS bitstream (702). In some implementations, the TH section follows the CH section.

Process 700 further includes using the IVAS decoder to extract and decode a metadata payload from a metadata payload (MDP) section of the IVAS bitstream (703). In some implementations, the MDP section follows the CH section of the IVAS bitstream.

Process 700 further includes using the IVAS decoder to extract and decode an Enhanced Voice Services (EVS) payload from an EVS payload (EP) section of the IVAS bitstream (704). In some implementations, the EP section follows the CH section. In some implementations, the EP section follows the MDP section. Note that placing the EP section after the MDP section of the IVAS bitstream ensures efficient bit packing, and allowing the number of MDP bits and EP bits to vary (according to the bitrate distribution algorithm) ensures that all available bits in the IVAS bitrate budget are used.

Process 700 further includes controlling an audio decoder based on representations of the encoding mode indicator, the SPAR bitrate distribution control table, the EVS payload, and the metadata payload, or storing representations of the encoding mode indicator, the SPAR bitrate distribution control table, the EVS payload, and the metadata payload on a non-transitory computer-readable medium (705).

Example System Architecture
FIG. 8 shows a block diagram of an example system 800 suitable for implementing example embodiments of the present disclosure. System 800 includes one or more server computers or any client device, including but not limited to any of the devices shown in FIG. 1, such as call server 102, legacy devices 106, user equipment 108, 114, conference room systems 116, 118, home theater systems, VR gear 122, and immersive content ingest 124. System 800 includes any consumer device, including but not limited to smartphones, tablet computers, wearable computers, vehicle computers, game consoles, surround systems, and kiosks.

As shown, system 800 includes a central processing unit (CPU) 801 capable of performing various processes in accordance with a program stored in, for example, a read-only memory (ROM) 802, or a program loaded from, for example, a storage unit 808 into a random access memory (RAM) 803. The RAM 803 also stores, as required, the data needed when the CPU 801 performs the various processes. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input unit 806, which can include a keyboard, a mouse, or the like; an output unit 807, which can include a display such as a liquid crystal display (LCD) and one or more speakers; a storage unit 808, which includes a hard disk or another suitable storage device; and a communication unit 809, which includes a network interface card such as a network card (e.g., wired or wireless).

In some implementations, the input unit 806 includes one or more microphones at different positions (depending on the host device) that enable the capture of audio signals in various formats (e.g., mono, stereo, spatial, immersive, and other suitable formats).

In some implementations, the output unit 807 includes systems with various numbers of speakers. As shown in FIG. 1, the output unit 807 can render audio signals in various formats (e.g., mono, stereo, immersive, binaural, and other suitable formats), depending on the capabilities of the host device.

The communication unit 809 is configured to communicate with other devices (e.g., via a network). A drive 810 is also connected to the I/O interface 805 as required. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, a flash drive, or another suitable removable medium, is mounted on the drive 810 so that a computer program read from it can be installed into the storage unit 808 as required. A person skilled in the art will understand that, although system 800 is described as including the components above, in real applications some of these components can be added, removed, and/or replaced, and that all such modifications or alterations fall within the scope of the present disclosure.

Other Embodiments
In an embodiment, a method of generating a bitstream for an audio signal comprises: determining, using an IVAS encoder, a coding tool indicator and a sampling rate indicator, the coding tool indicator having a value corresponding to a coding tool and the sampling rate indicator having a value indicating a sampling rate; encoding, using the IVAS encoder, the coding tool indicator and the sampling rate indicator into a common header (CH) section of an IVAS bitstream; determining, using the IVAS encoder, an enhanced voice service (EVS) payload; encoding, using the IVAS encoder, the EVS payload into an EVS payload (EP) section of the IVAS bitstream, where the EP section follows the CH section; determining, using the IVAS encoder, a metadata payload; encoding, using the IVAS encoder, the metadata payload into a metadata payload (MDP) section of the IVAS bitstream, where the MDP section follows the CH section; and storing the IVAS bitstream on a non-transitory computer-readable medium or streaming the IVAS bitstream to a downstream device.

In an embodiment, a method of decoding a bitstream of an audio signal comprises: extracting and decoding, using an IVAS decoder, a coding tool indicator and a sampling rate indicator from a CH section of an IVAS bitstream, the coding tool indicator having a value corresponding to a coding tool and the sampling rate indicator having a value indicating a sampling rate; extracting and decoding, using the IVAS decoder, an EVS payload from an EP section of the bitstream, where the EP section follows the CH section; decoding, using the IVAS decoder, a metadata payload from an MDP section of the bitstream, where the MDP section follows the CH section; and controlling an audio decoder based on the coding tool, the sampling rate, the EVS payload, and the metadata payload, or storing representations of the coding tool, the sampling rate, the EVS payload, and the metadata payload on a non-transitory computer-readable medium.

In an embodiment, the MDP section follows the EP section of the bitstream, or the EP section follows the MDP section of the bitstream.

In an embodiment, the IVAS coding tool indicator is a three-bit data structure, where a first value of the three-bit data structure corresponds to a multi-mono coding tool, a second value corresponds to a complex advanced coupling (CACPL) coding tool, and a third value corresponds to another coding tool.

In an embodiment, the input sampling rate indicator is a two-bit data structure, where a first value of the two-bit data structure indicates an 8 kHz sampling rate, a second value indicates a 16 kHz sampling rate, a third value indicates a 32 kHz sampling rate, and a fourth value indicates a 48 kHz sampling rate.
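As an illustration, the three-bit coding tool indicator and the two-bit sampling rate indicator could be packed into a five-bit common header as sketched below. The names `CodingTool`, `pack_common_header`, and `unpack_common_header` are hypothetical, and so are the concrete numeric codes and the field order (tool bits first, rate bits second): the text only speaks of a "first", "second", "third", and "fourth" value, so mapping them to 0, 1, 2, 3 is an assumption.

```python
from enum import IntEnum


class CodingTool(IntEnum):
    # Assumed numeric codes; the text only orders the values.
    MULTI_MONO = 0
    CACPL = 1
    OTHER = 2


# Assumed mapping of the 2-bit code (0..3) to a sampling rate in Hz.
SAMPLING_RATES_HZ = (8000, 16000, 32000, 48000)


def pack_common_header(tool, sampling_rate_hz):
    """Pack the 3-bit tool code and 2-bit rate code into a 5-bit integer."""
    rate_code = SAMPLING_RATES_HZ.index(sampling_rate_hz)  # ValueError if unsupported
    return (int(tool) << 2) | rate_code


def unpack_common_header(header):
    """Split a 5-bit header back into (CodingTool, sampling rate in Hz)."""
    tool = CodingTool((header >> 2) & 0b111)  # ValueError for unassigned codes
    rate_hz = SAMPLING_RATES_HZ[header & 0b11]
    return tool, rate_hz


header = pack_common_header(CodingTool.CACPL, 32000)
tool, rate_hz = unpack_common_header(header)
```

Three bits leave room for up to eight tool codes, so a decoder written this way rejects the codes that have no assigned tool instead of guessing.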

In an embodiment, the method includes storing in, or reading from, the EP section of the bitstream, respectively: an EVS channel count indicator, a bitrate (BR) extraction mode indicator, EVS BR data, and an EVS payload.

In an embodiment, the method includes storing in, or reading from, the MDP section of the data stream, respectively: a coding technique indicator, a number-of-bands indicator, an indicator of the delay configuration of a filterbank, an indicator of the quantization strategy, an entropy coder indicator, a probability model type indicator, a coefficient real part, a coefficient imaginary part, and one or more coefficients.

In an embodiment, a method of generating a bitstream for an audio signal comprises: determining, using an IVAS encoder, a coding tool indicator, the tool indicator having a value corresponding to a coding tool; encoding, using the IVAS encoder, the coding tool indicator into a common header (CH) section of an IVAS bitstream; determining, using the IVAS encoder, a representation of an index of an IVAS bitrate distribution control table; encoding, using the IVAS encoder, the representation of the index of the IVAS bitrate distribution control table into a common spatial coding tool header (CTH) section of the IVAS bitstream, where the CTH section follows the CH section; determining, using the IVAS encoder, a metadata payload; encoding, using the IVAS encoder, the metadata payload into a metadata payload (MDP) section of the IVAS bitstream, where the MDP section follows the CTH section; determining, using the IVAS encoder, an enhanced voice service (EVS) payload; encoding, using the IVAS encoder, the EVS payload into an EVS payload (EP) section of the IVAS bitstream, where the EP section follows the CTH section; and storing the bitstream on a non-transitory computer-readable medium or streaming the bitstream to a downstream device.

In an embodiment, a method of decoding a bitstream of an audio signal comprises: receiving the bitstream at an IVAS decoder; computing an IVAS operating bitrate based on the length and stride of the bitstream; reading an indicator of a spatial coding tool from a common header (CH) section of the bitstream; determining a length of a common spatial coding tool header (CTH) section of the bitstream based on the IVAS operating bitrate, where the determining includes checking the number of entries corresponding to the IVAS operating bitrate in an IVAS bitrate distribution control table in the CTH section; reading the values in the CTH section once the length of the CTH section and the index of the IVAS bitrate distribution control table have been determined; reading information about an enhanced voice service (EVS) bitrate distribution from the entry of the IVAS bitrate distribution control table that corresponds to the index; and providing the information about the EVS bitrate distribution to an EVS decoder.
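The decoding steps in this embodiment can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: the frame is modeled as a string of "0"/"1" characters, the stride is a frame duration in milliseconds, the CTH carries the table index as a plain binary number, and the CTH length is the smallest number of bits that can address every row available at the operating bitrate. The table contents and the names `BITRATE_TABLE` and `read_cth` are invented for the example.

```python
# Hypothetical table: operating bitrate (bit/s) -> candidate rows.
# Every number below is invented for illustration only.
BITRATE_TABLE = {
    32000: [
        {"evs_bps": (24400,), "mono_compat": True},
        {"evs_bps": (13200, 9600), "mono_compat": False},
        {"evs_bps": (9600, 8000, 8000), "mono_compat": False},
    ],
}


def read_cth(frame_bits, ch_len_bits, stride_ms, table):
    """Return (operating bitrate, selected table row, remaining bits)."""
    # Operating bitrate from frame length and stride (integer arithmetic).
    bitrate = len(frame_bits) * 1000 // stride_ms
    rows = table[bitrate]
    # Assumed rule: enough bits to index every row for this bitrate.
    cth_len = (len(rows) - 1).bit_length()
    start = ch_len_bits  # the CTH section follows the CH section
    index = int(frame_bits[start:start + cth_len], 2) if cth_len else 0
    return bitrate, rows[index], frame_bits[start + cth_len:]


# A 640-bit frame: 5 CH bits, 2 CTH bits selecting row 1, then payload bits.
frame = "00110" + "01" + "0" * 633
bitrate, row, rest = read_cth(frame, ch_len_bits=5, stride_ms=20, table=BITRATE_TABLE)
```

In this sketch the selected row supplies the per-channel EVS bitrates that would then be handed to the EVS decoder, and `rest` holds the bits that follow the CTH section.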

In an embodiment, any of the above methods includes reading an indicator of mono downmix backward compatibility with 3GPP TS 26.445 from the entry of the IVAS bitrate distribution control table.

In an embodiment, the method includes: determining that the mono downmix backward compatibility indicator is in an ON mode; in response to the ON mode, providing the remaining portion of the bitstream to an EVS decoder; then calculating, from the remaining portion of the bitstream, a respective bit length for each EVS instance based on the EVS bitrate distribution; reading the EVS bits of each EVS instance based on the corresponding bit length; providing the EVS bits to the EVS decoder as a first portion; and providing the remaining portion of the bitstream to an MDP decoder to decode the spatial metadata.

In an embodiment, the method includes: determining that the mono downmix backward compatibility indicator is in an OFF mode; in response to the OFF mode, providing the remaining portion of the bitstream to an MDP decoder to decode the spatial metadata; then calculating, from the remaining portion of the bitstream, a respective bit length for each EVS instance based on the EVS bitrate distribution; reading the EVS bits of each EVS instance based on the corresponding bit length; and providing the EVS bits to the EVS decoder as a first portion.
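The difference between the ON and OFF modes can be illustrated with a small payload splitter. This is a sketch under stated assumptions, not the decoder described here: the function name `split_payload` is hypothetical, the per-instance bit length is taken to be the EVS bitrate multiplied by an assumed frame duration, and in the OFF mode the metadata is simply assumed to occupy whatever precedes the EVS bits (a real MDP decoder would more likely determine its own length while parsing).

```python
def split_payload(rest, evs_bps, stride_ms, mono_compat_on):
    """Split the bits after the headers into per-instance EVS bits and MDP bits.

    rest is any sliceable sequence (here a string); evs_bps lists the
    bitrate of each EVS instance taken from the bitrate distribution.
    """
    lengths = [bps * stride_ms // 1000 for bps in evs_bps]
    total_evs = sum(lengths)
    if total_evs > len(rest):
        raise ValueError("EVS bit lengths exceed the remaining bitstream")
    if mono_compat_on:
        # ON mode: EVS bits come first, the metadata payload follows.
        evs_part, mdp_part = rest[:total_evs], rest[total_evs:]
    else:
        # OFF mode: the metadata payload comes first, EVS bits follow.
        split = len(rest) - total_evs
        mdp_part, evs_part = rest[:split], rest[split:]
    instances, pos = [], 0
    for n in lengths:
        instances.append(evs_part[pos:pos + n])
        pos += n
    return instances, mdp_part


# Letters stand in for bits so the sections are easy to tell apart.
on_instances, on_mdp = split_payload("a" * 264 + "b" * 192 + "m" * 50,
                                     (13200, 9600), 20, mono_compat_on=True)
off_instances, off_mdp = split_payload("m" * 50 + "a" * 264 + "b" * 192,
                                       (13200, 9600), 20, mono_compat_on=False)
```

Placing the EVS bits first in the ON mode means the leading bits after the headers can be passed to an EVS decoder without parsing the spatial metadata at all.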

In an embodiment, a system comprises: one or more computer processors; and a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of any one of the above method claims.

In an embodiment, a non-transitory computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to perform the operations of any one of the above method claims.

According to example embodiments of the present disclosure, the processes described above can be implemented as computer software programs or on a computer-readable storage medium. For example, embodiments of the present disclosure include a computer program product that includes a computer program tangibly embodied on a machine-readable medium, the computer program including program code for performing the methods. In such embodiments, the computer program can be downloaded and installed from a network via the communication unit 809, and/or installed from the removable medium 811, as shown in FIG. 8.

In general, the various example embodiments of the present disclosure can be implemented in hardware or special-purpose circuits (e.g., control circuitry), software, logic, or any combination thereof. For example, the units described above can be executed by control circuitry (e.g., a CPU in combination with the other components of FIG. 8), so that the control circuitry can perform the operations described in this disclosure. Some aspects can be implemented in hardware, while other aspects can be implemented in firmware or software that can be executed by a controller, microprocessor, or other computing device (e.g., control circuitry). Although various aspects of the example embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or some other pictorial representation, it will be understood that the blocks, apparatus, systems, techniques, or methods described herein can be implemented in, as non-limiting examples, hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.

In addition, the various blocks shown in the flowcharts can be viewed as method steps, and/or as operations that result from the operation of computer program code, and/or as a plurality of coupled logic circuit elements constructed to carry out the associated function(s). For example, embodiments of the present disclosure include a computer program product that includes a computer program tangibly embodied on a machine-readable medium, the computer program including program code configured to carry out the methods described above.

In the context of this disclosure, a machine-readable medium can be any tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium can be non-transitory and can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination thereof. More specific examples of the machine-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.

Computer program code for carrying out the methods of the present disclosure can be written in any combination of one or more programming languages. The computer program code can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus that has control circuitry, such that the program code, when executed by the processor of the computer or other programmable data processing apparatus, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can execute entirely on a computer, partly on the computer as a stand-alone software package, partly on the computer and partly on a remote computer, or entirely on a remote computer or server, or it can be distributed over one or more remote computers and/or servers.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. The logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps can be provided in, or steps removed from, the described flows, and other components can be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

Claims

1. A method of generating a bitstream for an audio signal, comprising:
determining, using an immersive voice and audio services (IVAS) encoder, a coding mode indicator or a coding tool indicator, the coding mode indicator or the coding tool indicator indicating a coding mode or a coding tool for the audio signal;
encoding, using the IVAS encoder, the coding mode indicator or the coding tool indicator into a common header (CH) section of an IVAS bitstream;
determining, using the IVAS encoder, a mode header or a tool header;
encoding, using the IVAS encoder, the mode header or the tool header into a tool header (TH) section of the IVAS bitstream, the TH section following the CH section;
determining, using the IVAS encoder, a metadata payload that includes spatial metadata;
encoding, using the IVAS encoder, the metadata payload into a metadata payload (MDP) section of the IVAS bitstream, the MDP section following the CH section;
determining, using the IVAS encoder, an enhanced voice service (EVS) payload, the EVS payload including EVS-coded bits for each channel or downmix channel of the audio signal; and
encoding, using the IVAS encoder, the EVS payload into an EVS payload (EP) section of the IVAS bitstream, the EP section following the CH section.

2. The method of claim 1, further comprising storing the IVAS bitstream on a non-transitory computer-readable medium or streaming the IVAS bitstream to a downstream device,
wherein the coding mode indicator or the coding tool indicator, the mode header or the tool header, the metadata payload, and the EVS payload are extracted and decoded from the CH section, the TH section, the MDP section, and the EP section of the IVAS bitstream, respectively, for use in reconstructing the audio signal at the downstream device or another device.

3. A method of decoding a bitstream of an audio signal, comprising:
extracting and decoding, using an immersive voice and audio services (IVAS) decoder, a coding mode indicator or a coding tool indicator in a common header (CH) section of an IVAS bitstream, the coding mode indicator or the coding tool indicator indicating a coding mode or a coding tool for the audio signal;
extracting and decoding, using the IVAS decoder, a mode header or a tool header in a tool header (TH) section of the IVAS bitstream, the TH section following the CH section;
extracting and decoding, using the IVAS decoder, a metadata payload from a metadata payload (MDP) section of the IVAS bitstream, the MDP section following the CH section, the metadata payload including spatial metadata; and
extracting and decoding, using the IVAS decoder, an enhanced voice service (EVS) payload from an EVS payload (EP) section of the IVAS bitstream, the EP section following the CH section, the EVS payload including EVS-coded bits for each channel or each downmix channel of the audio signal.

4. The method of claim 3, further comprising: controlling an audio decoder of a downstream device based on the coding mode indicator or the coding tool indicator, the mode header or the tool header, the EVS payload, and the metadata payload, for use in reconstructing the audio signal at the downstream device or another device; or storing representations of the coding mode indicator or the coding tool indicator, the mode header or the tool header, the EVS payload, and the metadata payload on a non-transitory computer-readable medium.

5. The method according to any one of claims 1 to 4, wherein the CH is a multi-bit data structure, one value of the multi-bit data structure corresponding to a spatial reconstruction (SPAR) coding mode and another value of the data structure corresponding to another coding mode.

6. Storing in or reading from the TH section of the IVAS bitstream respectively an index offset for calculating a row index of a Spatial Reconstruction (SPAR) bitrate distribution control table. A method according to any one of

Storing in, or reading from, the MDP section of the IVAS bitstream, respectively:
a quantization strategy indicator;
a bitstream encoding strategy indicator; and
quantized and encoded real and imaginary parts of the set of coefficients.

8. The method according to any one of claims 1 to 7, wherein the EP section follows the MDP section to ensure efficient bit packing, and the number of bits in the MDP section of the IVAS bitstream and the number of bits in the EP section of the IVAS bitstream vary according to the SPAR bitrate distribution control table and the bitrate distribution algorithm to ensure that all available bits in the IVAS bitrate budget are used.

9. The method according to any one of claims 1 to 8, wherein the bitrate of each EVS-coded channel or each downmix channel is determined by the total available EVS bits, the bitrate distribution control table, and the bitrate distribution algorithm.

10. The method of claim 7, wherein the set of coefficients includes prediction coefficients, direct coefficients, diagonal real coefficients, and lower triangular complex coefficients.

11. The prediction coefficients are variable bit lengths based on entropy coding, and the direct coefficients, the diagonal real coefficients and the lower triangular complex coefficients are variable bit lengths based on downmix construction and entropy coding. The method described in .

12. The method of claim 7, wherein the quantization strategy indicator is a multi-bit data structure that indicates a quantization strategy.

13. The method of claim 7, wherein the bitstream encoding strategy indicator is a multi-bit data structure that indicates the number of spatial metadata bands and a non-differential or time-differential entropy coding scheme.

14. The method of claim 7, wherein the quantization of the coefficients follows an EVS bitrate distribution control strategy that comprises metadata quantization and EVS bitrate distribution.

15. The method of any one of claims 1 to 14, comprising storing in, or reading from, the EP section of the bitstream, respectively, the EVS payloads of the EVS instances in accordance with 3rd Generation Partnership Project (3GPP) Technical Specification (TS) 26.445.

16. The method of any one of claims 3 to 15, further comprising:
determining a bitrate from the IVAS bitstream;
reading an index offset from a spatial reconstruction (SPAR) tool header (TH) section of the IVAS bitstream;
determining a table row index of the SPAR bitrate distribution control table using the index offset;
reading quantization strategy bits and encoding strategy bits from a metadata payload (MDP) section of the IVAS bitstream;
dequantizing SPAR spatial metadata in the MDP section of the IVAS bitstream based on the quantization strategy bits and the encoding strategy bits;
determining an enhanced voice service (EVS) bitrate for each channel in the IVAS bitstream using the total available EVS bits, the SPAR bitrate distribution control table, and a bitrate distribution algorithm;
reading EVS-coded bits from the EP section of the IVAS bitstream based on the EVS bitrate;
decoding the EVS bits;
decoding the spatial metadata; and
generating a first-order Ambisonics (FoA) output using the decoded EVS bits and the decoded spatial metadata.

17. A system comprising:
one or more processors; and
a non-transitory computer-readable medium storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the operations of the method of any one of claims 1 to 16.

A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform the operations of the method of any one of claims 1 to 16.