JP6423009B2

JP6423009B2 - Obtaining symmetry information for higher-order ambisonic audio renderers

Info

Publication number: JP6423009B2
Application number: JP2016569921A
Authority: JP
Inventors: Peters, Nils Günther; Sen, Dipanjan; Morrell, Martin James
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2014-05-30
Filing date: 2015-05-29
Publication date: 2018-11-14
Anticipated expiration: 2035-05-29
Also published as: HUE039048T2; KR101941764B1; WO2015184316A1; CN106465029B; BR112016028212A2; JP2017520174A; KR20170015898A; CN106465029A; CA2950014A1; EP3149972B1; CA2950014C; ES2696930T3; EP3149972A1; BR112016028212B1

Description

Related applications

[0001] This application claims the benefit of U.S. Provisional Application No. 62/023,662, entitled “SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM,” filed July 11, 2014, and U.S. Provisional Application No. 62/005,829, entitled “SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM,” filed May 30, 2014. Each of the foregoing U.S. provisional applications is incorporated by reference as if set forth in its entirety herein.

[0002] This disclosure relates to rendering information and, more specifically, to rendering information for higher-order ambisonic (HOA) audio data.

[0003] During the creation of audio content, a sound engineer may render the audio content using a specific renderer in order to tailor the audio content to the target configuration of speakers used to reproduce it. In other words, the sound engineer may render the audio content and play back the rendered audio content using speakers arranged in the targeted configuration. The sound engineer may then remix various aspects of the audio content, render the remixed audio content, and again play back the rendered, remixed audio content using the speakers arranged in the targeted configuration. The sound engineer may iterate in this manner until the audio content conveys a certain artistic intent. In this way, the sound engineer may produce audio content that conveys a certain artistic intent or that otherwise provides a certain sound field during playback (e.g., to accompany video content shown together with the audio content).

[0004] In general, techniques are described for specifying audio rendering information in a bitstream representative of audio data. In other words, the techniques provide a way to signal the audio rendering information used during audio content creation to a playback device, which may then use the audio rendering information to render the audio content. Providing the rendering information in this manner enables the playback device to render the audio content as the sound engineer intended, potentially ensuring appropriate playback of the audio content so that the listener can potentially perceive the artistic intent. Put differently, the rendering information used by the sound engineer during rendering is provided in accordance with the techniques described in this disclosure so that the audio playback device may use it to render the audio content as the sound engineer intended. This ensures a more consistent experience during both creation and playback of the audio content than systems that do not provide this audio rendering information.

[0005] In one aspect, a device configured to render higher-order ambisonic coefficients comprises one or more processors configured to obtain sparseness information indicative of a sparseness of a matrix used to render the higher-order ambisonic coefficients to a plurality of speaker feeds, and a memory configured to store the sparseness information.

[0006] In another aspect, a method of rendering higher-order ambisonic coefficients comprises obtaining sparseness information indicative of a sparseness of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds.

[0007] In another aspect, a device configured to produce a bitstream comprises a memory configured to store a matrix, and one or more processors configured to obtain sparseness information indicative of a sparseness of the matrix used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds.

[0008] In another aspect, a method of producing a bitstream comprises obtaining sparseness information indicative of a sparseness of a matrix used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds.
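The aspects above refer to sparseness information without defining, at this point in the text, how sparseness is measured. The following is a minimal sketch under one assumed definition: the fraction of matrix entries whose magnitude does not exceed a tolerance. The function name `sparseness`, the `tol` parameter, and the example matrix are hypothetical and are not taken from this disclosure.

```python
def sparseness(matrix, tol=0.0):
    """Return the fraction of entries whose magnitude is at most tol.

    Assumed definition for illustration only; the disclosure does not
    fix a formula for sparseness in this passage.
    """
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix must contain at least one entry")
    zeros = sum(1 for row in matrix for value in row if abs(value) <= tol)
    return zeros / total


# Hypothetical rendering matrix: 3 speaker feeds (rows) by 4 HOA coefficients (columns).
R = [
    [0.5, 0.0, 0.0, 0.3],
    [0.5, 0.0, 0.2, 0.0],
    [0.5, 0.1, 0.0, 0.0],
]

ratio = sparseness(R)
```

Under this assumed definition, a value near 1 would indicate a matrix that is mostly zeros, and a value of 0 a matrix with no zero entries.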

[0009] In another aspect, a device configured to render higher-order ambisonic coefficients comprises one or more processors configured to obtain sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds, and a memory configured to store sparseness information.

[0010] In another aspect, a method of rendering higher-order ambisonic coefficients comprises obtaining sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds.

[0011] In another aspect, a device configured to produce a bitstream comprises a memory configured to store a matrix used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds, and one or more processors configured to encode symmetry information indicative of a sign symmetry of the matrix.

[0012] In another aspect, a method of producing a bitstream comprises obtaining sparseness information indicative of a sparseness of a matrix used to render higher-order ambisonic coefficients so as to generate a plurality of speaker feeds.
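Sign symmetry of the matrix is named in the aspects above but not defined in this passage. As an illustrative sketch only, the code below assumes one plausible reading: two rows of the matrix (for example, the rows for two mirrored loudspeakers) agree in magnitude entry by entry and differ at most in sign. The helper names `rows_sign_symmetric` and `sign_pattern` and the example rows are hypothetical.

```python
def rows_sign_symmetric(row_a, row_b, tol=1e-9):
    """True if the two rows match in magnitude entry by entry (signs may differ)."""
    if len(row_a) != len(row_b):
        return False
    return all(abs(abs(a) - abs(b)) <= tol for a, b in zip(row_a, row_b))


def sign_pattern(row_a, row_b):
    """Per-entry sign relation: +1 same sign, -1 flipped sign, 0 if either entry is zero."""
    pattern = []
    for a, b in zip(row_a, row_b):
        product = a * b
        pattern.append(0 if product == 0 else (1 if product > 0 else -1))
    return pattern


# Hypothetical rows for a left and a right loudspeaker (first-order, 4 coefficients).
left = [0.5, 0.3, -0.2, 0.1]
right = [0.5, -0.3, -0.2, -0.1]

symmetric = rows_sign_symmetric(left, right)
pattern = sign_pattern(left, right)
```

If rows are related in this way, one row plus the sign pattern is enough to describe both, which is one reason such a property might be worth signaling; that motivation is an assumption here, not a statement from this passage.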

[0013] The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.

A diagram illustrating spherical harmonic basis functions of various orders and suborders.
A diagram illustrating a system that may perform various aspects of the techniques described in this disclosure.
A block diagram illustrating, in more detail, one example of the audio encoding device shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure.
A block diagram illustrating the audio decoding device of FIG. 2 in more detail.
A flowchart illustrating exemplary operation of an audio encoding device in performing various aspects of the vector-based synthesis techniques described in this disclosure.
A flowchart illustrating exemplary operation of an audio decoding device in performing various aspects of the techniques described in this disclosure.
A flowchart illustrating example operation of a system, such as one of the systems shown in the example of FIG. 2, in performing various aspects of the techniques described in this disclosure.
A diagram illustrating a bitstream formed in accordance with the techniques described in this disclosure.
A diagram illustrating a bitstream formed in accordance with the techniques described in this disclosure.
A diagram illustrating a bitstream formed in accordance with the techniques described in this disclosure.
A diagram illustrating a bitstream formed in accordance with the techniques described in this disclosure.
A diagram illustrating, in more detail, a portion of the bitstream or side channel information that may specify the compressed spatial components.
A diagram illustrating, in more detail, a portion of the bitstream or side channel information that may specify the compressed spatial components.
A diagram illustrating, in more detail, a portion of the bitstream or side channel information that may specify the compressed spatial components.
A diagram illustrating an example of HOA-order-dependent minimum and maximum gains within a higher-order ambisonic (HOA) rendering matrix.
A diagram illustrating a partially sparse 6th-order HOA rendering matrix for 22 loudspeakers.
A flowchart illustrating the signaling of symmetry properties.

[0026] The evolution of surround sound has made many output formats available for entertainment today. Examples of such consumer surround sound formats are mostly “channel” based in that they implicitly specify feeds to loudspeakers at certain geometric coordinates. Consumer surround sound formats include the popular 5.1 format (which includes the following six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low-frequency effects (LFE)), the growing 7.1 format, and various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard). Non-consumer formats can span any number of speakers (in symmetric and non-symmetric geometries) and are often termed “surround arrays.” One example of such an array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.

[0027] The input to a future MPEG encoder is optionally one of three possible formats: (i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions; (ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects with associated metadata containing their location coordinates (among other information); and (iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called “spherical harmonic coefficients” or SHC, “higher-order ambisonics” or HOA, and “HOA coefficients”). The future MPEG encoder may be described in more detail in the document entitled “Call for Proposals for 3D Audio,” by the International Organization for Standardization/International Electrotechnical Commission (ISO)/(IEC) JTC1/SC29/WG11/N13411, released January 2013 in Geneva, Switzerland, and available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.

[0028] There are various “surround sound” channel-based formats on the market. They range, for example, from the 5.1 home theater system (which has been the most successful in making inroads into living rooms beyond stereo) to the 22.2 system developed by NHK (Nippon Hoso Kyokai, or Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce the soundtrack for a movie once, and not spend effort remixing it for each speaker configuration. Recently, standards developing organizations have been considering ways to provide an encoding into a standardized bitstream and a subsequent decoding that is adaptable and agnostic to the speaker geometry (and number) and to the acoustic conditions at the location of playback (involving a renderer).

[0029] To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. A hierarchical set of elements may refer to a set of elements in which the elements are ordered such that a basic set of lower-order elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed and the resolution increases.

[0030] One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression demonstrates a description or representation of a sound field using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r)\right] e^{j\omega t}$$

[0031] The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC $A_n^m(k)$. Here, $k = \omega/c$, $c$ is the speed of sound (approximately 343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. It can be recognized that the term in square brackets is a frequency-domain representation of the signal (i.e., $S(\omega, r_r, \theta_r, \varphi_r)$), which can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multi-resolution basis functions.

[0032] FIG. 1 is a diagram illustrating spherical harmonic basis functions from the zeroth order (n = 0) to the fourth order (n = 4). As can be seen, for each order there is an expansion of suborders m, which are shown in the example of FIG. 1 but not explicitly labeled, for ease of illustration.

[0033] The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, be derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio, where the SHC may be input to an audio encoder to obtain encoded SHC that may promote more efficient transmission or storage. For example, a fourth-order representation involving (1+4)² (that is, 25) coefficients may be used.
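The coefficient count quoted above follows from each order n contributing 2n + 1 suborders (m running from -n to n), so a representation up to order N has (N + 1)² coefficients. A small sketch, with hypothetical function names, makes the arithmetic concrete:

```python
def num_hoa_coefficients(order):
    """Number of spherical harmonic coefficients for a representation up to `order`."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def order_suborder_pairs(order):
    """Enumerate every (n, m) pair with 0 <= n <= order and -n <= m <= n."""
    return [(n, m) for n in range(order + 1) for m in range(-n, n + 1)]


fourth_order_count = num_hoa_coefficients(4)
pairs = order_suborder_pairs(4)
```

Enumerating the pairs and counting them gives the same result as the closed form, which is a quick sanity check when sizing buffers for a given order.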

[0034] As noted above, the SHC may be derived from a microphone recording using a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., “Three-Dimensional Surround Sound Systems Based on Spherical Harmonics,” J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.

[0035] To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(k r_s)\, Y_n^{m*}(\theta_s, \varphi_s),$$

where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. Knowing the object source energy $g(\omega)$ as a function of frequency (e.g., by using time-frequency analysis techniques, such as performing a fast Fourier transform on the PCM stream) makes it possible to convert each PCM object and its corresponding location into the SHC $A_n^m(k)$. Further, it can be shown (since the above is a linear and orthogonal decomposition) that the $A_n^m(k)$ coefficients for each object are additive. In this manner, a multitude of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates), and the above represents the transformation from individual objects to a representation of the overall sound field in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$. The remaining figures are described below in the context of object-based and SHC-based audio coding.
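The additivity property described above can be sketched as a plain element-wise sum of per-object coefficient vectors. The example assumes each object's coefficients at one frequency have already been computed and stored as a list of complex numbers in a common order; the function name and the sample values are hypothetical.

```python
def sum_object_coefficients(per_object):
    """Element-wise sum of per-object coefficient vectors (one vector per audio object)."""
    if not per_object:
        raise ValueError("at least one coefficient vector is required")
    length = len(per_object[0])
    if any(len(vector) != length for vector in per_object):
        raise ValueError("all coefficient vectors must have the same length")
    # zip(*...) groups the k-th coefficient of every object together.
    return [sum(values) for values in zip(*per_object)]


# Hypothetical first-order coefficient vectors (4 entries each) for two objects.
object_a = [1 + 0j, 0.5j, 0, 0]
object_b = [2 + 0j, -0.5j, 1 + 0j, 0]

scene = sum_object_coefficients([object_a, object_b])
```

The summed vector has the same length as each input, so the size of the scene representation does not grow with the number of objects.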

[0036] FIG. 2 is a diagram illustrating a system 10 that may perform various aspects of the techniques described in this disclosure. As shown in the example of FIG. 2, the system 10 includes a content creator device 12 and a content consumer device 14. While described in the context of the content creator device 12 and the content consumer device 14, the techniques may be implemented in any context in which SHC (which may also be referred to as HOA coefficients) or any other hierarchical representation of a sound field is encoded to form a bitstream representative of audio data. Moreover, the content creator device 12 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, or a desktop computer, to name a few examples. Likewise, the content consumer device 14 may represent any form of computing device capable of implementing the techniques described in this disclosure, including a handset (or cellular phone), a tablet computer, a smartphone, a set-top box, or a desktop computer, to name a few examples.

[0037] The content creator device 12 may be operated by a movie studio or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some examples, the content creator device 12 may be operated by an individual user who would like to compress HOA coefficients 11. Often, the content creator generates audio content in conjunction with video content. The content consumer device 14 may be operated by an individual. The content consumer device 14 may include an audio playback system 16, which may refer to any form of audio playback system capable of rendering SHC for playback as multi-channel audio content.

[0038] The content creator device 12 includes an audio editing system 18. The content creator device 12 obtains live recordings 7 in various formats (including directly as HOA coefficients) and audio objects 9, which the content creator device 12 may edit using the audio editing system 18. A microphone 5 may capture the live recordings 7. During the editing process, the content creator may render HOA coefficients 11 from the audio objects 9 and listen to the rendered speaker feeds in an attempt to identify various aspects of the sound field that require further editing. The content creator device 12 may then edit the HOA coefficients 11 (potentially indirectly, through manipulation of different ones of the audio objects 9 from which the source HOA coefficients may be derived in the manner described above). The content creator device 12 may employ the audio editing system 18 to generate the HOA coefficients 11. The audio editing system 18 represents any system capable of editing audio data and outputting the audio data as one or more source spherical harmonic coefficients.

[0039] When the editing process is complete, the content creator device 12 may generate a bitstream 21 based on the HOA coefficients 11. That is, the content creator device 12 includes an audio encoding device 20, which represents a device configured to encode or otherwise compress the HOA coefficients 11 in accordance with various aspects of the techniques described in this disclosure to generate the bitstream 21. The audio encoding device 20 may generate the bitstream 21 for transmission, as one example, across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like. The bitstream 21 may represent an encoded version of the HOA coefficients 11 and may include a primary bitstream and another side bitstream, which may be referred to as side channel information.

[0040] While shown in FIG. 2 as being transmitted directly to the content consumer device 14, the content creator device 12 may output the bitstream 21 to an intermediate device positioned between the content creator device 12 and the content consumer device 14. The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream. The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder. The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 (possibly in conjunction with transmitting a corresponding video data bitstream) to subscribers, such as the content consumer device 14, that request the bitstream 21.

[0041] Alternatively, the content creator device 12 may store the bitstream 21 on a storage medium, such as a compact disc, a digital video disc, a high-definition video disc, or other storage media, most of which are readable by a computer and may therefore be referred to as computer-readable storage media or non-transitory computer-readable storage media. In this context, the transmission channel may refer to the channels by which content stored on these media is transmitted (and may include retail stores and other store-based delivery mechanisms). In any event, the techniques of this disclosure should not be limited in this respect to the example of FIG. 2.

[0042] As further shown in the example of FIG. 2, the content consumer device 14 includes the audio playback system 16. The audio playback system 16 may represent any audio playback system capable of playing back multi-channel audio data. The audio playback system 16 may include a number of different renderers 22. The renderers 22 may each provide a different form of rendering, where the different forms of rendering may include one or more of the various ways of performing vector-base amplitude panning (VBAP), and/or one or more of the various ways of performing sound field synthesis. As used herein, “A and/or B” means “A or B,” or both “A and B.”

[0043] The audio playback system 16 may further include an audio decoding device 24. The audio decoding device 24 may represent a device configured to decode HOA coefficients 11' from the bitstream 21, where the HOA coefficients 11' may be similar to the HOA coefficients 11 but may differ due to lossy operations (e.g., quantization) and/or transmission over the transmission channel. After decoding the bitstream 21 to obtain the HOA coefficients 11', the audio playback system 16 may render the HOA coefficients 11' to output loudspeaker feeds 25. The loudspeaker feeds 25 may drive one or more loudspeakers (not shown in the example of FIG. 2 for ease of illustration).
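Rendering with a matrix, as referred to in the aspects earlier in this description, can be sketched as a matrix-vector product: each loudspeaker feed is a weighted sum of the decoded HOA coefficients. The example below handles a single time sample and uses a hypothetical 2-speaker, first-order (4-coefficient) matrix; the function name and all numeric values are assumptions for illustration.

```python
def render_sample(rendering_matrix, hoa_sample):
    """Return one value per loudspeaker: each matrix row dotted with the HOA sample."""
    for row in rendering_matrix:
        if len(row) != len(hoa_sample):
            raise ValueError("each matrix row needs one gain per HOA coefficient")
    return [sum(gain * coeff for gain, coeff in zip(row, hoa_sample))
            for row in rendering_matrix]


# Hypothetical rendering matrix: 2 loudspeakers (rows) by 4 HOA coefficients (columns).
R = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, -0.5, 0.0, 0.0],
]
hoa_sample = [1.0, 0.5, 0.25, 0.0]  # one time sample of decoded coefficients

feeds = render_sample(R, hoa_sample)
```

A matrix with L rows and (N + 1)² columns therefore maps an order-N representation to L loudspeaker feeds, independent of how the coefficients were produced.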

[0044] To select the appropriate renderer or, in some instances, to generate an appropriate renderer, the audio playback system 16 may obtain loudspeaker information 13 indicative of the number of loudspeakers and/or the spatial geometry of the loudspeakers. In some instances, the audio playback system 16 may obtain the loudspeaker information 13 using a reference microphone and driving the loudspeakers in such a manner as to determine the loudspeaker information 13 dynamically. In other instances, or in conjunction with the dynamic determination of the loudspeaker information 13, the audio playback system 16 may prompt a user to interface with the audio playback system 16 and input the loudspeaker information 13.

[0045] The audio playback system 16 may then select one of the audio renderers 22 based on the loudspeaker information 13. In some instances, when none of the audio renderers 22 is within some threshold similarity measure (in terms of loudspeaker geometry) of the loudspeaker geometry specified in the loudspeaker information 13, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13. The audio playback system 16 may, in some instances, generate one of the audio renderers 22 based on the loudspeaker information 13 without first attempting to select an existing one of the audio renderers 22. One or more speakers 3 may then play back the rendered loudspeaker feeds 25.
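The select-or-generate decision described above can be sketched as follows. This is only an illustration under stated assumptions: the similarity measure (mean absolute azimuth difference), the 10-degree threshold, and the names `layout_distance` and `choose_renderer` are all hypothetical and are not taken from the disclosure.

```python
import math


def layout_distance(layout_a, layout_b):
    """Hypothetical similarity measure: mean absolute azimuth difference (degrees).

    Layouts are lists of loudspeaker azimuths given in the same channel order.
    Layouts with different loudspeaker counts are treated as infinitely far apart.
    """
    if len(layout_a) != len(layout_b):
        return math.inf
    # Wrap each difference into [-180, 180) before taking its magnitude.
    diffs = [abs((a - b + 180) % 360 - 180) for a, b in zip(layout_a, layout_b)]
    return sum(diffs) / len(diffs)


def choose_renderer(renderers, speaker_azimuths, threshold_deg=10.0):
    """Return ("select", name) if a stored renderer is close enough, else ("generate", None)."""
    best_name, best_dist = None, math.inf
    for name, layout in renderers.items():
        dist = layout_distance(layout, speaker_azimuths)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist <= threshold_deg:
        return ("select", best_name)
    return ("generate", None)


# Stored renderers keyed by the layout they were designed for (illustrative values).
renderers = {"5.0": [0, 30, -30, 110, -110], "stereo": [30, -30]}
near_standard = choose_renderer(renderers, [0, 25, -35, 115, -100])
irregular = choose_renderer(renderers, [0, 90, -90, 170, -170])
```

A real system would likely compare elevation and distance as well; azimuth alone keeps the sketch short.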

[0046] In some instances, the audio playback system 16 may select any one of the audio renderers 22 and may be configured to select one or more of the audio renderers 22 depending on the source from which the bitstream 21 is received (such as a DVD player, a Blu-ray (registered trademark) player, a smartphone, a tablet computer, a gaming system, or a television, to provide a few examples). While any one of the audio renderers 22 may be selected, the audio renderer used when creating the content often provides a better (and possibly the best) form of rendering, because the content was created by the content creator 12 using this one of the audio renderers, i.e., the audio renderer 5 in the example of FIG. 3. Selecting the one of the audio renderers 22 that is the same as, or at least close to (in terms of rendering form), the renderer used during creation may provide a better representation of the sound field and may result in a better surround sound experience for the content consumer 14.

[0047] In accordance with the techniques described in this disclosure, the audio encoding device 20 may generate the bitstream 21 to include audio rendering information 2 ("render info 2"). The audio rendering information 2 may include a signal value identifying the audio renderer used when generating the multi-channel audio content, i.e., the audio renderer 1 in the example of FIG. 3. In some instances, the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0048] In some instances, the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds. In some instances, when the index is used, the signal value further includes two or more bits that define the number of rows of the matrix included in the bitstream and two or more bits that define the number of columns of the matrix included in the bitstream. Using this information, and assuming that each coefficient of the two-dimensional matrix is typically defined by a 32-bit floating point number, the size of the matrix in bits may be computed as a function of the number of rows, the number of columns, and the size of the floating point number defining each coefficient of the matrix, i.e., 32 bits in this example.
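The size computation in the preceding paragraph amounts to a single product. A minimal sketch, assuming the function name `rendering_matrix_size_bits` (hypothetical) and assuming that rows correspond to loudspeaker feeds and columns to HOA coefficients (the text does not fix this orientation):

```python
def rendering_matrix_size_bits(num_rows: int, num_cols: int, bits_per_coeff: int = 32) -> int:
    """Size in bits of a rendering matrix whose coefficients are fixed-width floats."""
    if num_rows <= 0 or num_cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    return num_rows * num_cols * bits_per_coeff


# Example: 22 loudspeaker feeds rendered from fourth-order HOA, i.e. (4 + 1)^2 = 25 coefficients.
size_bits = rendering_matrix_size_bits(22, 25)
size_bytes = size_bits // 8
```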

[0049] In some instances, the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of speaker feeds. The rendering algorithm may include a matrix that is known to both the audio encoding device 20 and the decoding device 24. That is, the rendering algorithm may include application of a matrix in addition to other rendering steps, such as panning (e.g., VBAP, DBAP, or simple panning) or NFC filtering. In some instances, the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of matrices and the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices. Alternatively, the audio encoding device 20 may specify data in the bitstream 31 defining the plurality of matrices and/or the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices.

[0050] In some instances, the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of speaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of rendering algorithms and the order of the plurality of rendering algorithms, such that the index may uniquely identify a particular one of the plurality of matrices. Alternatively, the audio encoding device 20 may specify data in the bitstream 21 defining the plurality of matrices and/or the order of the plurality of matrices, such that the index may uniquely identify a particular one of the plurality of matrices.

[0051] In some instances, the audio encoding device 20 specifies the audio rendering information 2 on a per-audio-frame basis in the bitstream. In other instances, the audio encoding device 20 specifies the audio rendering information 2 a single time in the bitstream.

[0052] The decoding device 24 may then determine the audio rendering information 2 specified in the bitstream. Based on the signal value included in the audio rendering information 2, the audio playback system 16 may render a plurality of speaker feeds 25. As noted above, the signal value may in some instances include a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds. In this case, the audio playback system 16 may configure one of the audio renderers 22 with the matrix and use this one of the audio renderers 22 to render the speaker feeds 25 based on the matrix.

[0053] In some instances, the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render the HOA coefficients 11' to the speaker feeds 25. The decoding device 24 may parse the matrix from the bitstream in response to the index, whereupon the audio playback system 16 may configure one of the audio renderers 22 with the parsed matrix and invoke this one of the renderers 22 to render the speaker feeds 25. When the signal value includes two or more bits that define the number of rows of the matrix included in the bitstream and two or more bits that define the number of columns of the matrix included in the bitstream, the decoding device 24 may parse the matrix from the bitstream in response to the index and based on the two or more bits that define the number of rows and the two or more bits that define the number of columns, as described above.
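To make the parsing step concrete, the following sketch reads an index, a row count, a column count, and then the 32-bit floating point coefficients. The field widths (2, 5, and 5 bits), the index value 0 meaning "a matrix follows", the row-major big-endian coefficient layout, and all names are assumptions chosen for illustration; the disclosure only states that each field uses two or more bits.

```python
import struct

# Hypothetical field widths; the text only says "two or more bits" per field.
INDEX_BITS, ROW_BITS, COL_BITS, COEFF_BITS = 2, 5, 5, 32
MATRIX_IN_BITSTREAM = 0  # hypothetical index value meaning "a matrix follows"


class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit offset

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def parse_render_info(data: bytes):
    """Return (index, matrix); matrix is None when no matrix is embedded."""
    reader = BitReader(data)
    index = reader.read(INDEX_BITS)
    if index != MATRIX_IN_BITSTREAM:
        return index, None
    rows, cols = reader.read(ROW_BITS), reader.read(COL_BITS)
    matrix = [
        [struct.unpack(">f", reader.read(COEFF_BITS).to_bytes(4, "big"))[0]
         for _ in range(cols)]
        for _ in range(rows)
    ]
    return index, matrix


def pack_render_info(matrix) -> bytes:
    """Inverse of parse_render_info, used here only to build an example payload."""
    rows, cols = len(matrix), len(matrix[0])
    bits = f"{MATRIX_IN_BITSTREAM:0{INDEX_BITS}b}{rows:0{ROW_BITS}b}{cols:0{COL_BITS}b}"
    for row in matrix:
        for coeff in row:
            bits += f"{int.from_bytes(struct.pack('>f', coeff), 'big'):032b}"
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


payload = pack_render_info([[0.5, -1.0], [0.25, 2.0]])
index, matrix = parse_render_info(payload)
```

Because the row and column counts precede the coefficients, the decoder knows exactly how many bits to consume before the remaining audio payload begins.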

[0054] In some instances, the signal value specifies a rendering algorithm used to render the HOA coefficients 11' to the speaker feeds 22. In these instances, some or all of the audio renderers 22 may perform these rendering algorithms. The audio playback device 16 may then utilize the specified rendering algorithm, e.g., one of the audio renderers 22, to render the speaker feeds 25 from the HOA coefficients 11'.

[0055] When the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render the HOA coefficients 11' to the speaker feeds 25, some or all of the audio renderers 22 may represent this plurality of matrices. Thus, the audio playback system 16 may render the speaker feeds 25 from the HOA coefficients 11' using the one of the audio renderers 22 associated with the index.

[0056] When the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render the HOA coefficients 11' to the speaker feeds 25, some or all of the audio renderers 34 may represent these rendering algorithms. Thus, the audio playback system 16 may render the speaker feeds 25 from the spherical harmonic coefficients 11' using the one of the audio renderers 22 associated with the index.

[0057] Depending on how frequently this audio rendering information is specified in the bitstream, the decoding device 24 may determine the audio rendering information 2 on a per-audio-frame basis or a single time.

[0058] By specifying the audio rendering information 3 in this manner, the techniques may potentially result in better reproduction of the multi-channel audio content, in accordance with the manner in which the content creator 12 intended the multi-channel audio content to be reproduced. As a result, the techniques may provide a more immersive surround sound or multi-channel audio experience.

[0059] In other words, and as noted above, higher-order ambisonics (HOA) may represent a way of describing directional information of a sound field based on a spatial Fourier transform. Typically, the higher the ambisonics order N, the higher the spatial resolution, the larger the number of spherical harmonic (SH) coefficients, (N+1)², and the larger the bandwidth required to transmit and store the data.
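The growth in coefficient count with order can be tabulated directly. In the sketch below, the uncompressed data rate assumes 48 kHz sampling and 16-bit samples; those two figures and the function names are illustrative assumptions, not values from the disclosure.

```python
def num_sh_coefficients(order: int) -> int:
    """Number of spherical harmonic coefficients for ambisonics order N: (N + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def raw_bitrate_bps(order: int, sample_rate_hz: int = 48_000, bits_per_sample: int = 16) -> int:
    """Uncompressed data rate if every coefficient is carried as one PCM channel."""
    return num_sh_coefficients(order) * sample_rate_hz * bits_per_sample


for order in range(1, 5):
    print(order, num_sh_coefficients(order), raw_bitrate_bps(order))
```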

[0060] A potential advantage of this description is the possibility of reproducing this sound field on most arbitrary loudspeaker setups (e.g., 5.1, 7.1, 22.2, and so on). The conversion from the sound field description into M loudspeaker signals may be done via a static rendering matrix with (N+1)² inputs and M outputs. Consequently, every loudspeaker setup may require a dedicated rendering matrix. Several algorithms may exist for computing the rendering matrix for a desired loudspeaker setup, which may be optimized for certain objective or subjective measures, such as the Gerzon criteria. For irregular loudspeaker setups, the algorithms may become complex due to iterative numerical optimization procedures, such as convex optimization. To compute a rendering matrix for an irregular loudspeaker layout without waiting time, it may be beneficial to have sufficient computational resources available. Irregular loudspeaker setups may be common in domestic living room environments due to architectural constraints and aesthetic preferences. Therefore, for the best sound field reproduction, a rendering matrix optimized for such a scenario may be preferred in that it may enable reproduction of the sound field more accurately.
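Applying a static rendering matrix with (N+1)² inputs and M outputs is a single matrix product per frame. The sketch below shows only the shapes involved; the matrix entries are random placeholders, not a renderer designed for any real loudspeaker layout, and the variable names are illustrative.

```python
import numpy as np

order = 1
num_coeffs = (order + 1) ** 2      # (N + 1)^2 inputs
num_speakers = 5                   # M outputs
num_samples = 8                    # samples in one frame

rng = np.random.default_rng(0)
hoa_frame = rng.standard_normal((num_coeffs, num_samples))          # one row per HOA channel
rendering_matrix = rng.standard_normal((num_speakers, num_coeffs))  # placeholder values only

# Each loudspeaker feed is a fixed linear combination of the HOA channels.
speaker_feeds = rendering_matrix @ hoa_frame
```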

[0061] Because an audio decoder usually does not require many computational resources, the device may not be able to compute an irregular rendering matrix in a consumer-friendly time. Various aspects of the techniques described in this disclosure may provide for the use of a cloud-based computing approach as follows:
1. The audio decoder may send the loudspeaker coordinates (and, in some instances, also SPL measurements obtained with a calibration microphone) to a server via an Internet connection.
2. The cloud-based server may compute the rendering matrix (and possibly a few different versions, so that the consumer may later choose from these different versions).
3. The server may then send the rendering matrix (or the different versions) back to the audio decoder via the Internet connection.
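The three steps above can be sketched as a request/response exchange. Everything specific here is an assumption made for illustration: the JSON message fields, the function names, the in-process call standing in for the Internet round trip, and the server's algorithm, which is a simple first-order pseudo-inverse (mode-matching style) decoder substituted for whatever optimization a real server would run.

```python
import json

import numpy as np


def first_order_basis(azimuth: float, elevation: float) -> np.ndarray:
    """Unnormalized first-order directional basis (omni, left-right, up-down, front-back)."""
    return np.array([
        1.0,
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
        np.cos(azimuth) * np.cos(elevation),
    ])


def server_handle_request(request_json: str) -> str:
    """Step 2: compute a rendering matrix for the reported loudspeaker coordinates."""
    request = json.loads(request_json)
    basis = np.array([first_order_basis(az, el) for az, el in request["speakers_rad"]])
    # Pseudo-inverse decoder: M x 4 matrix mapping 4 HOA channels to M feeds.
    matrix = np.linalg.pinv(basis.T)
    return json.dumps({"rendering_matrix": matrix.tolist()})


def client_request_matrix(speakers_rad) -> np.ndarray:
    """Steps 1 and 3: send coordinates, receive the matrix (network call simulated)."""
    request_json = json.dumps({"speakers_rad": speakers_rad})
    response_json = server_handle_request(request_json)  # stands in for the Internet round trip
    return np.array(json.loads(response_json)["rendering_matrix"])


# Irregular layout given as (azimuth, elevation) pairs in radians; values are illustrative.
speakers = [(0.5, 0.0), (-0.6, 0.1), (0.0, 0.5), (2.0, -0.3), (-2.2, 0.2)]
rendering_matrix = client_request_matrix(speakers)
```

Keeping the computation behind a message boundary is what allows the server-side algorithm to change after the decoder has shipped.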

[0062] This approach may allow manufacturers to keep the manufacturing cost of an audio decoder low (because a powerful processor may not be needed to compute these irregular rendering matrices), while also facilitating more optimal audio reproduction in comparison to rendering matrices usually designed for regular speaker configurations or geometries. The algorithm for computing the rendering matrix may also be optimized after an audio decoder has shipped, potentially reducing the costs of hardware revisions or even recalls. The techniques may also, in some instances, gather a large amount of information about the different loudspeaker setups of consumer products, which may be beneficial for future product development.

[0063] In some instances, as noted above, the system shown in FIG. 3 may not signal the audio rendering information 2 in the bitstream 21, but may instead signal this audio rendering information 2 as metadata separate from the bitstream 21. Alternatively, or in conjunction with that described above, the system shown in FIG. 3 may signal a portion of the audio rendering information 2 in the bitstream 21 as described above and signal a portion of this audio rendering information 3 as metadata separate from the bitstream 21. In some examples, the audio encoding device 20 may output this metadata, which may then be uploaded to a server or other device. The audio decoding device 24 may then download or otherwise retrieve this metadata, which is then used to augment the audio rendering information extracted from the bitstream 21 by the audio encoding device 24. The bitstream 21 formed in accordance with the rendering information aspects of the techniques is described below with respect to the examples of FIGS. 8A-8D.

[0064] FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device 20 shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure. The audio encoding device 20 includes a content analysis unit 26, a vector-based decomposition unit 27, and a directivity-based decomposition unit 28. Although described briefly below, more information regarding the audio encoding device 20 and the various aspects of compressing or otherwise encoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

[0065] The content analysis unit 26 represents a unit configured to analyze the content of the HOA coefficients 11 to identify whether the HOA coefficients 11 represent content generated from a live recording or content generated from an audio object. The content analysis unit 26 may determine whether the HOA coefficients 11 were generated from a recording of an actual sound field or from an artificial audio object. In some instances, when the framed HOA coefficients 11 were generated from a recording, the content analysis unit 26 passes the HOA coefficients 11 to the vector-based decomposition unit 27. In some instances, when the framed HOA coefficients 11 were generated from a synthetic audio object, the content analysis unit 26 passes the HOA coefficients 11 to the directivity-based synthesis unit 28. The directivity-based synthesis unit 28 may represent a unit configured to perform a directivity-based synthesis of the HOA coefficients 11 to generate a directivity-based bitstream 21.

[0066] As shown in the example of FIG. 3, the vector-based decomposition unit 27 may include a linear invertible transform (LIT) unit 30, a parameter calculation unit 32, a reorder unit 34, a foreground selection unit 36, an energy compensation unit 38, a psychoacoustic audio coder unit 40, a bitstream generation unit 42, a sound field analysis unit 44, a coefficient reduction unit 46, a background (BG) selection unit 48, a spatio-temporal interpolation unit 50, and a quantization unit 52.

[0067] The linear invertible transform (LIT) unit 30 receives the HOA coefficients 11 in the form of HOA channels, each channel representing a block or frame of a coefficient associated with a given order and sub-order of the spherical basis functions (which may be denoted as HOA[k], where k may denote the current frame or block of samples). The matrix of HOA coefficients 11 may have dimensions D: M × (N+1)².

[0068] The LIT unit 30 may represent a unit configured to perform a form of analysis referred to as singular value decomposition. While described with respect to SVD, the techniques described in this disclosure may be performed with respect to any similar transformation or decomposition that provides sets of linearly uncorrelated, energy-compacted output. Also, references to "sets" in this disclosure are generally intended to refer to non-zero sets unless specifically stated to the contrary, and are not intended to refer to the classical mathematical definition of sets that includes the so-called "empty set." An alternative transformation may comprise principal component analysis, which is often referred to as "PCA." Depending on the context, PCA may be referred to by a number of different names, such as the discrete Karhunen-Loève transform, the Hotelling transform, proper orthogonal decomposition (POD), and eigenvalue decomposition (EVD), to name a few examples. Properties of such operations that are conducive to the underlying goal of compressing audio data are "energy compaction" and "decorrelation" of the multi-channel audio data.

[0069] In any event, assuming for purposes of example that the LIT unit 30 performs a singular value decomposition (which, again, may be referred to as "SVD"), the LIT unit 30 may transform the HOA coefficients 11 into two or more sets of transformed HOA coefficients. The "sets" of transformed HOA coefficients may include vectors of transformed HOA coefficients. In the example of FIG. 3, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to generate a so-called V matrix, an S matrix, and a U matrix. SVD, in linear algebra, may represent a factorization of a y × z real or complex matrix X (where X may represent multi-channel audio data, such as the HOA coefficients 11) in the following form:
X = USV*
U may represent a y × y real or complex unitary matrix, where the y columns of U are known as the left-singular vectors of the multi-channel audio data. S may represent a y × z rectangular diagonal matrix with non-negative real numbers on the diagonal, where the diagonal values of S are known as the singular values of the multi-channel audio data. V* (which may denote a conjugate transpose of V) may represent a z × z real or complex unitary matrix, where the z columns of V* are known as the right-singular vectors of the multi-channel audio data.
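The factorization X = USV* and the matrix shapes stated above can be checked numerically. The sketch below uses NumPy's `numpy.linalg.svd`, which returns the singular values as a vector and returns V* directly; the frame length of 64 samples, the fourth-order coefficient count, and the random test data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
y, z = 64, 25                      # y samples per frame, z = (N + 1)^2 coefficients for N = 4
X = rng.standard_normal((y, z))    # stands in for one frame of multi-channel audio data

# full_matrices=True yields the y x y and z x z unitary factors described in the text.
U, s, Vh = np.linalg.svd(X, full_matrices=True)

# Build the y x z rectangular diagonal matrix S from the vector of singular values.
S = np.zeros((y, z))
S[:z, :z] = np.diag(s)

X_rebuilt = U @ S @ Vh             # Vh is V*, the conjugate transpose of V
```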

[0070] In some examples, the V* matrix in the SVD mathematical expression referenced above is denoted as the conjugate transpose of the V matrix to reflect that SVD may be applied to matrices comprising complex numbers. When applied to matrices comprising only real numbers, the complex conjugate of the V matrix (or, in other words, the V* matrix) may be considered to be the transpose of the V matrix. Below, it is assumed for ease of illustration that the HOA coefficients 11 comprise real numbers, with the result that the V matrix is output through SVD rather than the V* matrix. Moreover, while denoted as the V matrix in this disclosure, reference to the V matrix should be understood to refer to the transpose of the V matrix where appropriate. While assumed to be the V matrix, the techniques may be applied in a similar fashion to HOA coefficients 11 having complex coefficients, where the output of the SVD is the V* matrix. Accordingly, the techniques should not be limited in this respect to only providing for application of SVD to generate a V matrix, but may include application of SVD to HOA coefficients 11 having complex components to generate a V* matrix.

[0071] In this way, the LIT unit 30 may perform SVD with respect to the HOA coefficients 11 to output US[k] vectors 33 (which may represent a combined version of the S vectors and the U vectors) having dimensions D: M × (N+1)², and V[k] vectors 35 having dimensions D: (N+1)² × (N+1)². Individual vector elements in the US[k] matrix may also be referred to as X_PS(k), while individual vectors of the V[k] matrix may also be referred to as v(k).

[0072] An analysis of the U, S, and V matrices may reveal that the matrices carry or represent spatial and temporal characteristics of the underlying sound field represented above by X. Each of the N vectors in U (of length M samples) may represent normalized separated audio signals as a function of time (for the time period represented by the M samples) that are orthogonal to each other and that have been decoupled from any spatial characteristics (which may also be referred to as directional information). The spatial characteristics, representing spatial shape and position (r, θ, φ), may instead be represented by the individual i-th vectors, v^(i)(k), in the V matrix (each of length (N+1)²). The individual elements of each of the v^(i)(k) vectors may represent an HOA coefficient describing the shape (including width) and position of the sound field for an associated audio object. Both the vectors in the U matrix and the vectors in the V matrix are normalized such that their root-mean-square energies are equal to unity. The energy of the audio signals in U is thus represented by the diagonal elements in S. Multiplying U and S to form US[k] (with individual vector elements X_PS(k)) thus represents the audio signals with their true energies. The ability of the SVD decomposition to decouple the audio time signals (in U), their energies (in S), and their spatial characteristics (in V) may support various aspects of the techniques described in this disclosure. Further, the model of synthesizing the underlying HOA[k] coefficients, X, by a vector multiplication of US[k] and V[k] gives rise to the term "vector-based decomposition," which is used throughout this document.
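The roles of U, S, and V described above can be illustrated numerically for real-valued data. The sketch uses the reduced ("economy") SVD so that US[k] has dimensions M × (N+1)²; the variable names and random test data are assumptions. Note that NumPy normalizes each singular vector to unit Euclidean norm, which is the sense of "normalized" used in this example.

```python
import numpy as np

rng = np.random.default_rng(1)
M, order = 64, 2
num_coeffs = (order + 1) ** 2                 # (N + 1)^2 = 9
X = rng.standard_normal((M, num_coeffs))      # stands in for one HOA[k] frame

# Reduced SVD: U is M x 9, s holds 9 singular values, Vh is 9 x 9.
U, s, Vh = np.linalg.svd(X, full_matrices=False)

US = U * s            # scales column i of U by s[i]: audio signals with their true energies
V = Vh.T              # real data, so V* is simply the transpose of V

X_rebuilt = US @ V.T                          # vector-based synthesis of the frame
signal_energy = np.sum(US ** 2, axis=0)       # energy of each separated signal
```

The energy of each column of `US` equals the square of the corresponding singular value, which is the sense in which S carries the energies while U and V carry the normalized time signals and spatial characteristics.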

[0073] Although described as being performed directly with respect to the HOA coefficients 11, the LIT unit 30 may apply the linear invertible transform to derivatives of the HOA coefficients 11. For example, the LIT unit 30 may apply SVD with respect to a power spectral density matrix derived from the HOA coefficients 11. By performing SVD with respect to the power spectral density (PSD) of the HOA coefficients rather than the coefficients themselves, the LIT unit 30 may potentially reduce the computational complexity of performing the SVD in terms of one or more of processor cycles and storage space, while achieving the same source audio encoding efficiency as if the SVD were applied directly to the HOA coefficients.
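One way to see why the PSD route can be cheaper: if the PSD matrix is taken to be XᵀX (an assumption made for this sketch; the paragraph above does not define it), it has only (N+1)² × (N+1)² entries regardless of the frame length M, and its eigendecomposition yields V and the squared singular values. US[k] then follows as XV. The names and random test data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
num_samples, num_coeffs = 64, 9
X = rng.standard_normal((num_samples, num_coeffs))   # stands in for one HOA[k] frame

# Assumed PSD definition: a small symmetric matrix, independent of the frame length.
psd = X.T @ X

# eigh returns eigenvalues in ascending order; reorder to match SVD's descending order.
eigvals, eigvecs = np.linalg.eigh(psd)
descending = np.argsort(eigvals)[::-1]
s_from_psd = np.sqrt(np.maximum(eigvals[descending], 0.0))  # clip tiny negative round-off
V = eigvecs[:, descending]

# Since X = U S V^T and V^T V = I, multiplying by V recovers US directly.
US = X @ V

s_direct = np.linalg.svd(X, compute_uv=False)  # reference values from a direct SVD
```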

[0074] The parameter calculation unit 32 represents a unit configured to calculate various parameters, such as a correlation parameter (R), directional properties parameters (θ, φ, r), and an energy property (e). Each of the parameters for the current frame may be denoted as R[k], θ[k], φ[k], r[k], and e[k]. The parameter calculation unit 32 may perform an energy analysis and/or correlation (or so-called cross-correlation) with respect to the US[k] vectors 33 to identify the parameters. The parameter calculation unit 32 may also determine the parameters for the previous frame, where the previous frame parameters may be denoted R[k−1], θ[k−1], φ[k−1], r[k−1], and e[k−1], based on the previous frame of US[k−1] vectors and V[k−1] vectors. The parameter calculation unit 32 may output the current parameters 37 and the previous parameters 39 to the reorder unit 34.

[0075] The parameters calculated by the parameter calculation unit 32 may be used by the reorder unit 34 to reorder the audio objects so as to represent their natural evaluation or continuity over time. The reorder unit 34 may compare each of the parameters 37 from the first US[k] vectors 33, turn by turn, against each of the parameters 39 for the second US[k−1] vectors 33. The reorder unit 34 may reorder (using, as one example, the Hungarian algorithm) the various vectors within the US[k] matrix 33 and the V[k] matrix 35 based on the current parameters 37 and the previous parameters 39, so as to output a reordered US[k] matrix 33' (which may be denoted mathematically as [expression not reproduced]) and a reordered V[k] matrix 35' (which may be denoted mathematically as [expression not reproduced]) to a foreground sound (or predominant sound, PS) selection unit 36 ("foreground selection unit 36") and an energy compensation unit 38.
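The reordering step can be illustrated with a small assignment problem. The following is a minimal sketch, not the method of this disclosure: it uses an exhaustive search over permutations as a stand-in for the Hungarian algorithm (both return a minimum-cost assignment, but the exhaustive search is only practical for a handful of vectors), and the two-element parameter tuples and the squared-difference cost are hypothetical.

```python
from itertools import permutations

def reorder_by_parameters(current_params, previous_params):
    """Return a list `order` where order[j] is the index of the current-frame
    vector that best continues previous-frame vector j (minimum total cost)."""
    n = len(previous_params)

    def cost(i, j):
        # Hypothetical cost: squared distance between parameter tuples.
        return sum((a - b) ** 2
                   for a, b in zip(current_params[i], previous_params[j]))

    # Exhaustive search stands in for the Hungarian algorithm here.
    best = min(permutations(range(n)),
               key=lambda p: sum(cost(p[j], j) for j in range(n)))
    return list(best)

# Hypothetical (energy, azimuth) parameters for three vectors per frame.
prev = [(0.9, 10.0), (0.2, -40.0), (0.5, 75.0)]
curr = [(0.21, -38.0), (0.52, 77.0), (0.88, 12.0)]

order = reorder_by_parameters(curr, prev)
reordered = [curr[i] for i in order]  # current vectors in previous-frame order
```

The same index list would be applied to the columns of both the US[k] and V[k] matrices so that each audio object keeps its position from frame to frame.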

[0076] The soundfield analysis unit 44 may represent a unit configured to perform a soundfield analysis with respect to the HOA coefficients 11 so as to potentially achieve a target bitrate 41. The soundfield analysis unit 44 may, based on the analysis and/or on the received target bitrate 41, determine the total number of psychoacoustic coder instantiations (which may be a function of the total number of ambient or background channels (BG_TOT) and the number of foreground channels or, in other words, predominant channels). The total number of psychoacoustic coder instantiations may be denoted as numHOATransportChannels.

[0077] The soundfield analysis unit 44 may also determine, again to potentially achieve the target bitrate 41, the total number of foreground channels (nFG) 45, the minimum order of the background (or, in other words, ambient) soundfield (N_BG or, alternatively, MinAmbHOAorder), the corresponding number of actual channels representative of the minimum order of the background soundfield (nBGa = (MinAmbHOAorder+1)²), and indices (i) of additional BG HOA channels to send (which may collectively be denoted as background channel information 43 in the example of FIG. 3). The background channel information 43 may also be referred to as ambient channel information 43. Each of the channels that remain from numHOATransportChannels − nBGa may be an "additional background/ambient channel", an "active vector-based predominant channel", an "active directional-based predominant signal", or "completely inactive". In one aspect, the channel types may be indicated by a two-bit syntax element (as a "ChannelType"), e.g., 00: directional-based signal; 01: vector-based predominant signal; 10: additional ambient signal; 11: inactive signal. The total number of background or ambient signals, nBGa, may be given by (MinAmbHOAorder+1)² plus the number of times the index 10 (in the above example) appears as a channel type in the bitstream for that frame.

[0078] In any event, the soundfield analysis unit 44 may select the number of background (or, in other words, ambient) channels and the number of foreground (or, in other words, predominant) channels based on the target bitrate 41, selecting more background and/or foreground channels when the target bitrate 41 is relatively higher (e.g., when the target bitrate 41 equals or is greater than 512 Kbps). In one aspect, numHOATransportChannels may be set to 8 while MinAmbHOAorder may be set to 1 in the header section of the bitstream. In this scenario, at every frame, four channels may be dedicated to representing the background or ambient portion of the soundfield, while the other four channels can, on a frame-by-frame basis, vary in channel type, e.g., used either as an additional background/ambient channel or as a foreground/predominant channel. The foreground/predominant signals can be either vector-based or directional-based signals, as described above.
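The channel accounting described in the two preceding paragraphs can be worked through in a few lines. This is an illustrative sketch only: the function names and the example frame are hypothetical, while the ChannelType codes and the values 8 and 1 come from the text.

```python
# ChannelType codes from the example above (2-bit syntax element).
DIRECTION_BASED = 0b00
VECTOR_BASED = 0b01
ADDITIONAL_AMBIENT = 0b10
INACTIVE = 0b11

def split_transport_channels(num_hoa_transport_channels, min_amb_hoa_order):
    """Return (dedicated ambient channels, flexible channels) for one frame."""
    dedicated = (min_amb_hoa_order + 1) ** 2
    if dedicated > num_hoa_transport_channels:
        raise ValueError("too few transport channels for the minimum ambient order")
    return dedicated, num_hoa_transport_channels - dedicated

def count_ambient_signals(min_amb_hoa_order, flexible_channel_types):
    """(MinAmbHOAorder + 1)^2 plus the number of flexible channels typed 10."""
    additional = sum(1 for t in flexible_channel_types if t == ADDITIONAL_AMBIENT)
    return (min_amb_hoa_order + 1) ** 2 + additional

# numHOATransportChannels = 8 and MinAmbHOAorder = 1, as in the scenario above.
dedicated, flexible = split_transport_channels(8, 1)

# Hypothetical ChannelType values for the four flexible channels of one frame.
frame_types = [VECTOR_BASED, ADDITIONAL_AMBIENT, VECTOR_BASED, INACTIVE]
n_bga = count_ambient_signals(1, frame_types)
```

With these inputs, four channels are dedicated to the ambient soundfield, four are flexible, and the frame carries five ambient signals in total.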

[0079] In some instances, the total number of vector-based predominant signals for a frame may be given by the number of times the ChannelType index is 01 in the bitstream of that frame. In the above aspect, for every additional background/ambient channel (e.g., corresponding to a ChannelType of 10), corresponding information indicating which of the possible HOA coefficients (beyond the first four) is represented in that channel may be provided. The information, for fourth-order HOA content, may be an index indicating one of the HOA coefficients 5-25. The first four ambient HOA coefficients 1-4 may be sent all the time when MinAmbHOAorder is set to 1; hence the audio encoding device may only need to indicate one of the additional ambient HOA coefficients having an index of 5-25. The information could thus be sent using a 5-bit syntax element (for fourth-order content), which may be denoted as "CodedAmbCoeffIdx". In any event, the soundfield analysis unit 44 outputs the background channel information 43 and the HOA coefficients 11 to the background (BG) selection unit 48, the background channel information 43 to the coefficient reduction unit 46 and the bitstream generation unit 42, and the nFG 45 to the foreground selection unit 36.
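The 5-bit figure follows from counting the candidate coefficients. The short sketch below reproduces that arithmetic; the function name is hypothetical and the calculation is only a check of the example, not a statement of how the bitstream syntax defines the field width.

```python
def additional_ambient_index_bits(hoa_order, min_amb_hoa_order):
    """Return (number of candidate coefficients, bits needed to index one)."""
    total = (hoa_order + 1) ** 2              # all HOA coefficients
    always_sent = (min_amb_hoa_order + 1) ** 2  # coefficients sent in every frame
    candidates = total - always_sent
    if candidates <= 0:
        raise ValueError("no additional ambient coefficients to index")
    # Smallest bit width that can represent `candidates` distinct values.
    bits = (candidates - 1).bit_length()
    return candidates, bits

# Fourth-order content with MinAmbHOAorder = 1: coefficients 5..25 are candidates.
candidates, bits = additional_ambient_index_bits(4, 1)
```

For fourth-order content this gives 21 candidates (indices 5 through 25), which need 5 bits.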

[0080] The background selection unit 48 may represent a unit configured to determine background or ambient HOA coefficients 47 based on the background channel information (e.g., the background soundfield (N_BG) and the number (nBGa) and the indices (i) of additional BG HOA channels to send). For example, when N_BG equals one, the background selection unit 48 may select the HOA coefficients 11 for each sample of the audio frame having an order equal to or less than one. The background selection unit 48 may, in this example, then select the HOA coefficients 11 having an index identified by one of the indices (i) as additional BG HOA coefficients, where the nBGa is provided to the bitstream generation unit 42 to be specified in the bitstream 21 so as to enable an audio decoding device, such as the audio decoding device 24 shown in the examples of FIGS. 2 and 4, to parse the background HOA coefficients 47 from the bitstream 21. The background selection unit 48 may then output the ambient HOA coefficients 47 to the energy compensation unit 38. The ambient HOA coefficients 47 may have dimensions D: M × [(N_BG+1)² + nBGa]. The ambient HOA coefficients 47 may also be referred to as "ambient HOA channels 47", where each of the ambient HOA coefficients 47 corresponds to a separate ambient HOA channel 47 to be encoded by the psychoacoustic audio coder unit 40.
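The selection can be sketched as a column pick over one frame. The sketch assumes (this is not stated above) that a frame is stored as M samples of (N+1)² coefficients each and that the coefficients are ordered so that the first (N_BG+1)² entries are exactly those of order N_BG or lower; the function name and test data are hypothetical.

```python
def select_background(frame, n_bg, additional_indices):
    """Keep the coefficients of order <= n_bg plus the additionally indexed ones.

    frame: list of M samples, each a list of (N+1)^2 HOA coefficients.
    additional_indices: 1-based coefficient indices, as in the text.
    """
    base = list(range((n_bg + 1) ** 2))          # 0-based columns of order <= n_bg
    extra = [i - 1 for i in additional_indices]   # convert 1-based to 0-based
    keep = base + extra
    return [[sample[c] for c in keep] for sample in frame]

# Hypothetical fourth-order frame of M = 3 samples; each value encodes
# 100 * sample number + 1-based coefficient index for easy inspection.
frame = [[100 * m + c for c in range(1, 26)] for m in range(3)]
ambient = select_background(frame, n_bg=1, additional_indices=[6, 9])
```

Here N_BG = 1 and two additional indices are sent, so each sample keeps (1+1)² + 2 = 6 coefficients, matching the dimension M × [(N_BG+1)² + nBGa] with nBGa read as the number of additional channels.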

[0081] The foreground selection unit 36 may represent a unit configured to select the reordered US[k] matrix 33' and the reordered V[k] matrix 35' that represent foreground or distinct components of the soundfield based on nFG 45 (which may represent one or more indices identifying the foreground vectors). The foreground selection unit 36 may output nFG signals 49 (which may be denoted as a reordered US[k]_{1,...,nFG} 49, FG_{1,...,nFG}[k] 49, or [expression not reproduced]) to the psychoacoustic audio coder unit 40, where the nFG signals 49 may have dimensions D: M × nFG and each represent mono audio objects. The foreground selection unit 36 may also output the reordered V[k] matrix 35' (or v^(1..nFG)(k) 35') corresponding to the foreground components of the soundfield to the spatio-temporal interpolation unit 50, where the subset of the reordered V[k] matrix 35' corresponding to the foreground components may be denoted as the foreground V[k] matrix 51_k (which may be denoted mathematically as [expression not reproduced]) having dimensions D: (N+1)² × nFG.

[0082] The energy compensation unit 38 may represent a unit configured to perform energy compensation with respect to the ambient HOA coefficients 47 to compensate for energy loss due to removal of various ones of the HOA channels by the background selection unit 48. The energy compensation unit 38 may perform an energy analysis with respect to one or more of the reordered US[k] matrix 33', the reordered V[k] matrix 35', the nFG signals 49, the foreground V[k] vectors 51_k, and the ambient HOA coefficients 47, and then perform energy compensation based on the energy analysis to generate energy-compensated ambient HOA coefficients 47'. The energy compensation unit 38 may output the energy-compensated ambient HOA coefficients 47' to the psychoacoustic audio coder unit 40.

[0083] The spatio-temporal interpolation unit 50 may represent a unit configured to receive the foreground V[k] vectors 51_k for the k-th frame and the foreground V[k−1] vectors 51_k−1 for the previous frame (hence the k−1 notation) and to perform spatio-temporal interpolation to generate interpolated foreground V[k] vectors. The spatio-temporal interpolation unit 50 may recombine the nFG signals 49 with the foreground V[k] vectors 51_k to recover reordered foreground HOA coefficients. The spatio-temporal interpolation unit 50 may then divide the reordered foreground HOA coefficients by the interpolated V[k] vectors to generate interpolated nFG signals 49'. The spatio-temporal interpolation unit 50 may also output the foreground V[k] vectors 51_k that were used to generate the interpolated foreground V[k] vectors so that an audio decoding device, such as the audio decoding device 24, may generate the interpolated foreground V[k] vectors and thereby recover the foreground V[k] vectors 51_k. The foreground V[k] vectors 51_k used to generate the interpolated foreground V[k] vectors are denoted as the remaining foreground V[k] vectors 53. To ensure that the same V[k] and V[k−1] are used at the encoder and the decoder (to create the interpolated vectors V[k]), quantized/dequantized versions of the vectors may be used at the encoder and the decoder. The spatio-temporal interpolation unit 50 may output the interpolated nFG signals 49' to the psychoacoustic audio coder unit 40 and the interpolated foreground V[k] vectors 51_k to the coefficient reduction unit 46.
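The recombine-then-divide sequence can be illustrated for a single foreground component. The sketch below rests on several assumptions that the text does not specify: the interpolation is taken to be a per-sample linear cross-fade from V[k−1] to V[k], and "dividing" by the interpolated vector is interpreted as a least-squares projection (the pseudo-inverse of a single vector). All names and values are hypothetical.

```python
def interpolate_v(v_prev, v_curr, num_samples):
    """Per-sample linear cross-fade from v_prev to v_curr (assumed scheme)."""
    out = []
    for m in range(num_samples):
        w = (m + 1) / num_samples  # reaches 1.0 at the last sample of the frame
        out.append([(1 - w) * a + w * b for a, b in zip(v_prev, v_curr)])
    return out

def reextract_signal(fg_hoa, v_interp):
    """Project each sample of foreground HOA coefficients onto its
    interpolated V-vector (least-squares 'division' for one vector)."""
    signal = []
    for x, v in zip(fg_hoa, v_interp):
        signal.append(sum(a * b for a, b in zip(x, v)) / sum(b * b for b in v))
    return signal

# Hypothetical first-order example with one foreground component.
v_prev = [1.0, 0.0, 0.0, 0.0]
v_curr = [0.0, 1.0, 0.0, 0.0]
us = [0.5, -0.25, 1.0, 2.0]                      # one nFG signal, M = 4 samples

# Recombine the signal with V[k] to recover the foreground HOA coefficients.
fg_hoa = [[s * c for c in v_curr] for s in us]

v_interp = interpolate_v(v_prev, v_curr, len(us))
interpolated_signal = reextract_signal(fg_hoa, v_interp)
```

Because the cross-fade ends exactly at V[k], the last sample of the re-extracted signal equals the original sample, while earlier samples are scaled according to how far the interpolated vector is from V[k].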

[0084] The coefficient reduction unit 46 may represent a unit configured to perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43 so as to output reduced foreground V[k] vectors 55 to the quantization unit 52. The reduced foreground V[k] vectors 55 may have dimensions D: [(N+1)² − (N_BG+1)² − BG_TOT] × nFG. The coefficient reduction unit 46 may, in this respect, represent a unit configured to reduce the number of coefficients in the remaining foreground V[k] vectors 53. In other words, the coefficient reduction unit 46 may represent a unit configured to eliminate the coefficients in the foreground V[k] vectors (which form the remaining foreground V[k] vectors 53) that have little to no directional information. In some examples, the coefficients of the distinct or, in other words, foreground V[k] vectors corresponding to first- and zeroth-order basis functions (which may be denoted as N_BG) provide little directional information and can therefore be removed from the foreground V-vectors (through a process that may be referred to as "coefficient reduction"). In this example, greater flexibility may be provided to not only identify the coefficients that correspond to N_BG but also to identify additional HOA channels (which may be denoted by the variable TotalOfAddAmbHOAChan) from the set [(N_BG+1)²+1, (N+1)²].
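A minimal sketch of coefficient reduction for one V-vector follows. It reads the BG_TOT term in the dimension above as the number of additional ambient HOA channels (TotalOfAddAmbHOAChan), which is an interpretation rather than something the text states outright, and it assumes the first (N_BG+1)² entries of the vector are those of order N_BG or lower. The function name and data are hypothetical.

```python
def reduce_v_vector(v, n_bg, additional_ambient_indices):
    """Drop the entries covered by the ambient channels from one V-vector.

    v: list of (N+1)^2 elements.
    additional_ambient_indices: 1-based indices of additional ambient channels.
    """
    drop = set(range((n_bg + 1) ** 2))                    # orders 0..n_bg
    drop |= {i - 1 for i in additional_ambient_indices}   # 1-based to 0-based
    return [c for idx, c in enumerate(v) if idx not in drop]

# Hypothetical fourth-order V-vector whose values equal their 1-based index.
v = list(range(1, 26))
reduced = reduce_v_vector(v, n_bg=1, additional_ambient_indices=[6, 9])
```

For N = 4, N_BG = 1 and two additional ambient channels, the vector shrinks from 25 to 25 − 4 − 2 = 19 elements.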

[0085] The quantization unit 52 may represent a unit configured to perform any form of quantization to compress the reduced foreground V[k] vectors 55 so as to generate coded foreground V[k] vectors 57, and to output the coded foreground V[k] vectors 57 to the bitstream generation unit 42. In operation, the quantization unit 52 may represent a unit configured to compress a spatial component of the soundfield, i.e., one or more of the reduced foreground V[k] vectors 55 in this example. The quantization unit 52 may perform any one of the following 12 quantization modes, as indicated by a quantization mode syntax element denoted "NbitsQ":
NbitsQ value: Type of quantization mode
0-3: Reserved
4: Vector quantization
5: Scalar quantization without Huffman coding
6: 6-bit scalar quantization with Huffman coding
7: 7-bit scalar quantization with Huffman coding
8: 8-bit scalar quantization with Huffman coding
...
16: 16-bit scalar quantization with Huffman coding
The quantization unit 52 may also perform predicted versions of any of the foregoing types of quantization modes, where a difference is determined between an element of (or a weight, when vector quantization is performed, of) the V-vector of a previous frame and the element (or the weight, when vector quantization is performed) of the V-vector of the current frame. The quantization unit 52 may then quantize the difference between the elements or weights of the current frame and those of the previous frame rather than the value of the element of the V-vector of the current frame itself.
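The mode table and the predicted variant can be expressed compactly. The sketch below assumes that the elided rows between 8 and 16 follow the same "n-bit scalar quantization with Huffman coding" pattern as their neighbours; the function names are hypothetical, and the residual function only shows the differencing step, not the quantizer itself.

```python
def describe_nbits_q(nbits_q):
    """Map an NbitsQ value to the quantization mode listed in the table."""
    if 0 <= nbits_q <= 3:
        return "reserved"
    if nbits_q == 4:
        return "vector quantization"
    if nbits_q == 5:
        return "scalar quantization without Huffman coding"
    if 6 <= nbits_q <= 16:
        # Rows 9..15 are elided in the table; the pattern is assumed to hold.
        return f"{nbits_q}-bit scalar quantization with Huffman coding"
    raise ValueError(f"NbitsQ value out of range: {nbits_q}")

def prediction_residual(current, previous):
    """Element-wise difference that a predicted mode would quantize
    instead of the current frame's elements (or weights) themselves."""
    if len(current) != len(previous):
        raise ValueError("frames must have the same number of elements")
    return [c - p for c, p in zip(current, previous)]

mode = describe_nbits_q(6)
residual = prediction_residual([1.0, 2.0, -0.5], [0.5, 2.0, -1.0])
```

When consecutive frames are similar the residual values are small, which is what makes quantizing the difference attractive.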

[0086] The quantization unit 52 may perform multiple forms of quantization with respect to each of the reduced foreground V[k] vectors 55 to obtain multiple coded versions of the reduced foreground V[k] vectors 55. The quantization unit 52 may select one of the coded versions of the reduced foreground V[k] vectors 55 as the coded foreground V[k] vector 57. The quantization unit 52 may, based on any combination of the criteria discussed in this disclosure, select one of the non-predicted vector-quantized V-vector, the predicted vector-quantized V-vector, the non-Huffman-coded scalar-quantized V-vector, and the Huffman-coded scalar-quantized V-vector to use as the output switched-quantized V-vector. In some examples, the quantization unit 52 may select a quantization mode from a set of quantization modes that includes a vector quantization mode and one or more scalar quantization modes, and quantize an input V-vector based on (or according to) the selected mode. The quantization unit 52 may then provide the selected one of the non-predicted vector-quantized V-vector (e.g., in terms of weight values or bits indicative thereof), the predicted vector-quantized V-vector (e.g., in terms of error values or bits indicative thereof), the non-Huffman-coded scalar-quantized V-vector, and the Huffman-coded scalar-quantized V-vector to the bitstream generation unit 42 as the coded foreground V[k] vectors 57. The quantization unit 52 may also provide the syntax elements indicative of the quantization mode (e.g., the NbitsQ syntax element) and any other syntax elements used to dequantize or otherwise reconstruct the V-vector.

[0087] The psychoacoustic audio coder unit 40 included within the audio encoding device 20 may represent multiple instances of a psychoacoustic audio coder, each of which is used to encode a different audio object or HOA channel of each of the energy-compensated ambient HOA coefficients 47' and the interpolated nFG signals 49' so as to generate encoded ambient HOA coefficients 59 and encoded nFG signals 61. The psychoacoustic audio coder unit 40 may output the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to the bitstream generation unit 42.

[0088] The bitstream generation unit 42 included within the audio encoding device 20 represents a unit that formats data to conform to a known format (which may refer to a format known by a decoding device), thereby generating the vector-based bitstream 21. The bitstream 21 may, in other words, represent encoded audio data that has been encoded in the manner described above. The bitstream generation unit 42 may represent a multiplexer in some examples, which may receive the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61, and the background channel information 43. The bitstream generation unit 42 may then generate the bitstream 21 based on the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61, and the background channel information 43. In this way, the bitstream generation unit 42 may specify the vectors 57 in the bitstream 21 to obtain the bitstream 21. The bitstream 21 may include a primary or main bitstream and one or more side channel bitstreams.

[0089] Various aspects of the techniques may also enable the bitstream generation unit 42 to, as described above, specify audio rendering information 2 in the bitstream 21. While the current version of the upcoming 3D audio compression working draft provides for signaling specific downmix matrices within the bitstream 21, the working draft does not provide for specifying, in the bitstream 21, the renderers used in rendering the HOA coefficients 11. For HOA content, the equivalent of such a downmix matrix is the rendering matrix, which converts the HOA representation into the desired loudspeaker feeds. Various aspects of the techniques described in this disclosure propose to further harmonize the feature sets of channel content and HOA by allowing the bitstream generation unit 42 to signal HOA rendering matrices within the bitstream (as, for example, the audio rendering information 2).

[0090] One exemplary signaling solution, based on the coding scheme of downmix matrices and optimized for HOA, is presented below. Similar to the transmission of downmix matrices, HOA rendering matrices may be signaled within mpegh3daConfigExtension(). The techniques may provide a new extension type ID_CONFIG_EXT_HOA_MATRIX, as set forth in the following tables (with italics and bold indicating changes to the existing table).

[0091] The bitfield HOARenderingMatrixSet() may be equal in structure and functionality to DownmixMatrixSet(). Instead of inputCount(audioChannelLayout), HOARenderingMatrixSet() may use the "equivalent" NumOfHoaCoeffs value, which is computed in HOAConfig. Further, because the ordering of the HOA coefficients may be fixed within the HOA decoder (see, e.g., Annex G in the CD), HOARenderingMatrixSet does not need any equivalent to inputConfig(audioChannelLayout).
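The role of NumOfHoaCoeffs as the input count of a rendering matrix can be illustrated with a plain matrix product. This sketch assumes that NumOfHoaCoeffs equals (N+1)² for order-N content and that a rendering matrix has one row per loudspeaker; the function names and the matrix values are hypothetical and do not correspond to any real renderer.

```python
def num_of_hoa_coeffs(hoa_order):
    """Number of HOA coefficients for a given order (assumed (N+1)^2)."""
    return (hoa_order + 1) ** 2

def render(rendering_matrix, hoa_frame):
    """Convert HOA samples to loudspeaker feeds.

    rendering_matrix: L rows (loudspeakers) x NumOfHoaCoeffs columns.
    hoa_frame: list of samples, each a list of NumOfHoaCoeffs coefficients.
    """
    n = len(hoa_frame[0])
    if any(len(row) != n for row in rendering_matrix):
        raise ValueError("matrix column count must equal NumOfHoaCoeffs")
    return [[sum(r * c for r, c in zip(row, sample)) for row in rendering_matrix]
            for sample in hoa_frame]

# Hypothetical 2-loudspeaker matrix for first-order content (4 coefficients).
matrix = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, -0.5, 0.0, 0.0],
]
hoa = [[1.0, 0.2, 0.0, 0.0]]   # one sample
feeds = render(matrix, hoa)
```

The input side of the matrix is fully determined by the HOA order, which is why no separate input layout description is needed.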

[0092] Various aspects of the techniques may also enable the bitstream generation unit 42 to, when compressing the HOA audio data (e.g., the HOA coefficients 11 in the example of FIG. 4) using a first compression scheme (such as the decomposition compression scheme represented by the vector-based decomposition unit 27), specify the bitstream 21 such that bits corresponding to a second compression scheme (e.g., the directional-based, or directionality-based, compression scheme represented by the directional-based decomposition unit) are not included in the bitstream 21. For example, the bitstream generation unit 42 may generate the bitstream 21 so as not to include HOAPredictionInfo syntax elements or fields that may be reserved for use in specifying prediction information between directional signals of the directional-based compression scheme. Examples of the bitstream 21 generated in accordance with various aspects of the techniques described in this disclosure are shown in the examples of FIGS. 8E and 8F.

[0093] In other words, the prediction of directional signals may be a part of the predominant sound synthesis employed by the directional-based decomposition unit 28 and depends on the existence of ChannelType 0 (which may indicate a direction-based signal). When no direction-based signal is present within a frame, no prediction of directional signals may be performed. However, the associated sideband information HOAPredictionInfo() is, even though not used, written to every frame regardless of whether a direction-based signal is present. When no directional signal exists within a frame, the techniques described in this disclosure may enable the bitstream generation unit 42 to reduce the size of the sideband by not signaling HOAPredictionInfo in the sideband, as set forth in the following table (where italics with underlining denote additions):

[0094] In this respect, the techniques may enable a device, such as the audio encoding device 20, to be configured to specify, when compressing higher-order ambisonic audio data using a first compression scheme, a bitstream representative of a compressed version of the higher-order ambisonic audio data that does not include bits corresponding to a second compression scheme also used to compress the higher-order ambisonic audio data.

[0095] In some instances, the first compression scheme comprises a vector-based decomposition compression scheme. In these and other instances, the vector-based decomposition compression scheme comprises a compression scheme that involves application of a singular value decomposition (or equivalents thereof, described in more detail in this disclosure) to the higher-order ambisonic audio data.

[0096] In these and other instances, the audio encoding device 20 may be configured to specify the bitstream such that it does not include bits corresponding to at least one syntax element used for performing the second type of compression scheme. The second compression scheme may, as noted above, comprise a directional-based compression scheme.

[0097] The audio encoding device 20 may also be configured to specify the bitstream 21 such that the bitstream 21 does not include bits corresponding to a HOAPredictionInfo syntax element of the second compression scheme.

[0098] When the second compression scheme comprises a directional-based compression scheme, the audio encoding device 20 may be configured to specify the bitstream 21 such that the bitstream 21 does not include bits corresponding to a HOAPredictionInfo syntax element of the directional-based compression scheme. In other words, the audio encoding device 20 may be configured to specify the bitstream 21 such that the bitstream 21 does not include bits corresponding to at least one syntax element used for performing the second type of compression scheme, the at least one syntax element being indicative of a prediction between two or more directional-based signals. Restated yet again, when the second compression scheme comprises a directional-based compression scheme, the audio encoding device 20 may be configured to specify the bitstream 21 such that the bitstream 21 does not include bits corresponding to a HOAPredictionInfo syntax element of the directional-based compression scheme, where the HOAPredictionInfo syntax element is indicative of a prediction between two or more directional-based signals.
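The conditional signaling described in the preceding paragraphs can be sketched as a per-frame decision. This is an illustration only: the list-of-field-names representation and the function name are hypothetical and do not reproduce the actual frame syntax; the only behavior taken from the text is that HOAPredictionInfo is written solely when a channel with ChannelType 0 is present.

```python
DIRECTION_BASED = 0  # ChannelType 0 indicates a direction-based signal

def frame_side_info_fields(channel_types):
    """Return the names of the side-information fields written for one frame."""
    fields = ["ChannelType"] * len(channel_types)
    # Prediction between directional signals is only possible, and therefore
    # only signaled, when at least one direction-based signal is present.
    if any(t == DIRECTION_BASED for t in channel_types):
        fields.append("HOAPredictionInfo")
    return fields

vector_only_frame = frame_side_info_fields([1, 1, 2, 3])
mixed_frame = frame_side_info_fields([0, 1, 2, 3])
```

A frame that carries only vector-based, ambient, or inactive channels therefore spends no bits on prediction information.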

[0099] Various aspects of the techniques may further enable the bitstream generation unit 46 to specify the bitstream 21 such that, in certain cases, the bitstream 21 does not include gain correction data. The bitstream generation unit 46 may specify the bitstream 21 such that it does not include gain correction data when gain correction is suppressed. Examples of the bitstream 21 generated in accordance with various aspects of the techniques are shown in FIGS. 8E and 8F, as described above.

[0100] In some cases, gain correction is applied when certain types of psychoacoustic encoding are performed, given the relatively smaller dynamic range of those types compared with other types of psychoacoustic encoding. For example, AAC has a relatively smaller dynamic range than unified speech and audio coding (USAC). When the compression scheme (such as a vector-based compression scheme or a directional-based compression scheme) involves USAC, the bitstream generation unit 46 may signal in the bitstream 21 that gain correction has been suppressed (by specifying the syntax element MaxGainCorrAmpExp in HOAConfig with a value of zero) and may then specify the bitstream 21 such that it does not include gain correction data (in the HOAGainCorrectionData() field).

[0101] In other words, the bit field MaxGainCorrAmpExp, which is part of HOAConfig (see Table 71 in the CD), may control the extent to which the automatic gain control module affects the transport channel signals prior to USAC core coding. In some cases, this module was developed for RM0 to improve the non-ideal dynamic range of the available AAC encoder implementation. With the change from the AAC to the USAC core coder during the integration phase, the dynamic range of the core encoder may have improved, and the need for this gain control module may therefore not be as critical as before.

[0102] In some cases, the gain control functionality can be suppressed by setting MaxGainCorrAmpExp to 0. In these cases, the associated side information HOAGainCorrectionData() may not be written to every HOA frame, per the above table showing the “Syntax of HOAFrame”. For configurations in which MaxGainCorrAmpExp is set to 0, the techniques described in this disclosure may not signal HOAGainCorrectionData. Further, in such a scenario, the inverse gain control module may be bypassed, reducing decoder complexity by about 0.05 MOPS per transport channel without any negative side effects.
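The suppression described in [0102] can be sketched as follows. This is a minimal illustration only: the `write_hoa_frame` function, the `HoaFrame` class and the `payload[...]` labels are hypothetical names introduced here, and only MaxGainCorrAmpExp and HOAGainCorrectionData() are taken from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HoaFrame:
    """Hypothetical stand-in for the ordered fields of one HOA frame."""
    fields: List[str] = field(default_factory=list)


def write_hoa_frame(max_gain_corr_amp_exp: int, num_transport_channels: int) -> HoaFrame:
    """List the fields written for one frame, omitting gain data when suppressed."""
    frame = HoaFrame()
    # Assumption: a zero MaxGainCorrAmpExp signals that gain correction is suppressed.
    gain_correction_enabled = max_gain_corr_amp_exp != 0
    for ch in range(num_transport_channels):
        if gain_correction_enabled:
            frame.fields.append(f"HOAGainCorrectionData[{ch}]")
        # Placeholder label for whatever other per-channel data the frame carries.
        frame.fields.append(f"payload[{ch}]")
    return frame
```

With `max_gain_corr_amp_exp` set to 0, no gain correction field appears in the frame at all, which is the bit saving the paragraph describes.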

[0103] In this respect, the techniques may configure the audio encoding device 20 to specify the bitstream 21, which represents a compressed version of higher-order ambisonic audio data, such that the bitstream 21 does not include gain correction data when gain correction is suppressed during compression of the higher-order ambisonic audio data.

[0104] In these and other cases, the audio encoding device 20 may be configured to compress the higher-order ambisonic audio data in accordance with a vector-based decomposition compression scheme to generate the compressed version of the higher-order ambisonic audio data. An example of the decomposition compression scheme may involve applying a singular value decomposition (or an equivalent thereof, as described in more detail above) to the higher-order ambisonic audio data to generate the compressed version of the higher-order ambisonic audio data.

[0105] In these and other cases, the audio encoding device 20 may be configured to specify the MaxGainCorrAmbExp syntax element in the bitstream 21 as zero to indicate that gain correction is suppressed. In some cases, the audio encoding device 20 may be configured to specify the bitstream 21 such that, when gain correction is suppressed, the bitstream 21 does not include the HOAGainCorrection data field that stores the gain correction data. In other words, the audio encoding device 20 may be configured to specify the MaxGainCorrAmbExp syntax element in the bitstream 21 as zero to indicate that gain correction is suppressed and to omit from the bitstream the HOAGainCorrection data field that stores the gain correction data.

[0106] In these and other cases, the audio encoding device 20 may be configured to suppress gain correction when the compression of the higher-order ambisonic audio data includes applying unified speech and audio coding (USAC) to the higher-order ambisonic audio data.

[0107] The foregoing potential optimizations for signaling various information in the bitstream 21 may be adapted or otherwise updated in the manner described in more detail below. The updates may be applied in conjunction with the other updates discussed below, or may be used to update only some of the aspects described above. Accordingly, each potential combination of updates to the optimizations described above is contemplated, including the application of a single update described below, or of any particular combination of the updates described below, to the optimizations described above.

[0108] To specify a matrix in the bitstream, ID_CONFIG_EXT_HOA_MATRIX is specified in the mpegh3daConfigExtension() of the bitstream 21, as shown in bold and highlighted in the table below. The following table shows the syntax for specifying a portion of the mpegh3daConfigExtension() of the bitstream 21:

ID_CONFIG_EXT_HOA_MATRIX in the above table provides a container for specifying a rendering matrix; the container is denoted “HoaRenderingMatrixSet()”.

[0109] The contents of the HoaRenderingMatrixSet() container may be defined in accordance with the syntax set forth in the following table:

As shown directly in the above table, HoaRenderingMatrixSet() includes a number of different syntax elements, including numHoaRenderingMatrices, HoaRenderingMatrixId, CICPspeakerLayoutIdx, HoaMatrixLenBits, and HoaRenderingMatrix.

[0110] The numHoaRenderingMatrices syntax element may specify the number of HoaRenderingMatrixId definitions present in the bitstream element. The HoaRenderingMatrixId syntax element may represent a field that uniquely defines an Id for a default HOA rendering matrix available at the decoder side or for a transmitted HOA rendering matrix. In this respect, HoaRenderingMatrixId may represent an example of a signal value that includes two or more bits defining an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds, or an example of a signal value that includes two or more bits defining an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds. The CICPspeakerLayoutIdx syntax element may represent a value that describes the output loudspeaker layout for the given HOA rendering matrix and may correspond to the ChannelConfiguration element defined in ISO/IEC 23001-8. The HoaMatrixLenBits syntax element (also denoted “HoaRenderingMatrixLenBits”) may specify the length, in bits, of the bitstream element that follows (e.g., the HoaRenderingMatrix() container).
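The relationship between these syntax elements can be pictured with a small data-structure sketch. It is illustrative only: the class names, the `find_by_layout` helper and the absence of any bit widths are assumptions made here, since the syntax table itself is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HoaRenderingMatrixEntry:
    hoa_rendering_matrix_id: int   # HoaRenderingMatrixId
    cicp_speaker_layout_idx: int   # CICPspeakerLayoutIdx
    hoa_matrix_len_bits: int       # HoaMatrixLenBits: length of the payload in bits
    payload: bytes                 # undecoded HoaRenderingMatrix() container


@dataclass
class HoaRenderingMatrixSet:
    entries: List[HoaRenderingMatrixEntry]

    @property
    def num_hoa_rendering_matrices(self) -> int:
        # numHoaRenderingMatrices corresponds to the number of entries carried.
        return len(self.entries)

    def find_by_layout(self, cicp_idx: int) -> Optional[HoaRenderingMatrixEntry]:
        """Return the first entry signaled for the given loudspeaker layout, if any."""
        return next(
            (e for e in self.entries if e.cicp_speaker_layout_idx == cicp_idx), None
        )
```

Because HoaMatrixLenBits gives the payload length up front, a decoder that does not need a given entry could skip its payload without parsing it.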

[0111] The HoaRenderingMatrix() container includes NumOfHoaCoeffs, followed by an outputConfig() container and an outputCount() container. The outputConfig() container may include a channel configuration vector specifying information about each loudspeaker. The bitstream generation unit 42 may assume this loudspeaker information to be known from the channel configuration of the output layout. Each entry, outputConfig[i], represents a data structure with the following members:
AzimuthAngle (which may indicate the absolute value of the speaker azimuth angle);
AzimuthDirection (which may indicate the azimuth direction using, as one example, 0 for left and 1 for right);
ElevationAngle (which may indicate the absolute value of the speaker elevation angle);
ElevationDirection (which may indicate the elevation direction using, as one example, 0 for up and 1 for down); and
isLFE (which may indicate whether the speaker is a low frequency effect (LFE) speaker).
The bitstream generation unit 42 may, in some cases, invoke a helper function denoted “findSymmetricSpeakers”, which may further specify the following:
pairType (which may store a value of SYMMETRIC (meaning, in some examples, a symmetric pair of two speakers), CENTER, or ASYMMETRIC); and
symmetricPair->originalPosition (which may indicate, for SYMMETRIC groups only, the position in the original channel configuration of the second (e.g., right) speaker in the group).
The outputCount() container may specify the number of loudspeakers for which the HOA rendering matrix is defined.
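A helper of the kind described above might be sketched as follows. This is not the function from the disclosure, whose body is not given here; the `Speaker` class, the return format, the rule that a speaker at azimuth 0 or 180 counts as CENTER, and the rule that LFE speakers are ignored unless `has_lfe_rendering` is set are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Speaker:
    azimuth_angle: int        # absolute azimuth in degrees
    azimuth_direction: int    # 0 = left, 1 = right
    elevation_angle: int      # absolute elevation in degrees
    elevation_direction: int  # 0 = up, 1 = down
    is_lfe: bool = False


def _signed_elevation(s: Speaker) -> int:
    return -s.elevation_angle if s.elevation_direction == 1 else s.elevation_angle


def find_symmetric_speakers(
    output_config: List[Speaker], has_lfe_rendering: bool
) -> Tuple[int, List[Tuple[str, int, Optional[int]]]]:
    """Return (numPairs, groups); each group is (pairType, first index, second index)."""
    groups: List[Tuple[str, int, Optional[int]]] = []
    used = set()
    for i, a in enumerate(output_config):
        if i in used or (a.is_lfe and not has_lfe_rendering):
            continue
        used.add(i)
        if a.azimuth_angle in (0, 180):  # assumed to lie on the median plane
            groups.append(("CENTER", i, None))
            continue
        # Look for the mirror image: same angles, opposite azimuth direction.
        partner = next(
            (j for j in range(i + 1, len(output_config))
             if j not in used
             and output_config[j].is_lfe == a.is_lfe
             and output_config[j].azimuth_angle == a.azimuth_angle
             and output_config[j].azimuth_direction != a.azimuth_direction
             and _signed_elevation(output_config[j]) == _signed_elevation(a)),
            None,
        )
        if partner is None:
            groups.append(("ASYMMETRIC", i, None))
        else:
            used.add(partner)
            groups.append(("SYMMETRIC", i, partner))  # partner ~ originalPosition
    num_pairs = sum(1 for g in groups if g[0] == "SYMMETRIC")
    return num_pairs, groups
```

For a layout with a center speaker and left/right front and surround speakers, this sketch reports two symmetric pairs, which is the count that the numPairs syntax element would carry.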

[0112] The bitstream generation unit 42 may specify the HoaRenderingMatrix() container in accordance with the syntax set forth in the following table:

As shown directly in the above table, the numPairs syntax element is set to the value output by invoking the findSymmetricSpeakers helper function with outputCount, outputConfig and hasLfeRendering as inputs. numPairs may therefore indicate the number of symmetric loudspeaker pairs identified in the output loudspeaker setup that can be considered for efficient symmetry coding. The precisionLevel syntax element in the above table may indicate the precision used for uniform quantization of the gains in accordance with the following table:

[0113] The gainLimitPerHoaOrder syntax element shown in the above table describing the syntax of HoaRenderingMatrix() may represent a flag indicating whether maxGain and minGain are specified individually for each order or for the entire HOA rendering matrix. The maxGain[i] syntax element may specify the maximum actual gain in the matrix for the coefficients of HOA order i, expressed, as one example, in decibels (dB). The minGain[i] syntax element may specify the minimum actual gain in the matrix for the coefficients of HOA order i, again expressed, as one example, in dB. The isFullMatrix syntax element may represent a flag indicating whether the HOA rendering matrix is sparse or full. The firstSparseOrder syntax element may specify the first HOA order that is sparsely coded when the HOA rendering matrix is specified as sparse by the isFullMatrix syntax element. The isHoaCoefSparse syntax element may represent a bitmask vector derived from the firstSparseOrder syntax element. The lfeExists syntax element may represent a flag indicating whether one or more LFEs are present in outputConfig. The hasLfeRendering syntax element indicates whether the rendering matrix includes non-zero elements for one or more LFE channels. The zerothOrderAlwaysPositive syntax element may represent a flag indicating whether the zeroth HOA order has only positive values.
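One plausible way to derive the isHoaCoefSparse bitmask from firstSparseOrder is sketched below. The exact derivation is given by the syntax table, which is not reproduced here, so the rule used (coefficients of order n occupy indices n² through (n+1)²−1, and every coefficient of order firstSparseOrder or higher is marked sparse) is an assumption, as is the function name.

```python
from typing import List


def derive_is_hoa_coef_sparse(
    hoa_order: int, is_full_matrix: bool, first_sparse_order: int
) -> List[int]:
    """Return one bit per HOA coefficient: 1 if the coefficient is sparsely coded."""
    num_coeffs = (hoa_order + 1) ** 2
    if is_full_matrix:
        return [0] * num_coeffs
    # Assumption: the coefficients of order n start at linear index n * n.
    first_sparse_index = first_sparse_order ** 2
    return [1 if i >= first_sparse_index else 0 for i in range(num_coeffs)]
```

Under this assumption, a second-order matrix with firstSparseOrder equal to 1 would code only the single zeroth-order coefficient densely.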

[0114] The isAllValueSymmetric syntax element may represent a flag indicating whether all symmetric loudspeaker pairs have equal absolute values in the HOA rendering matrix. The isAnyValueSymmetric syntax element represents a flag indicating (for example, when isAllValueSymmetric is false) whether some of the symmetric loudspeaker pairs have equal absolute values in the HOA rendering matrix. The valueSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs that have value symmetry. The isValueSymmetric syntax element may represent a bitmask derived from the valueSymmetricPairs syntax element in the manner shown in Table 3. The isAllSignSymmetric syntax element may indicate, when there are no value symmetries in the matrix, whether all symmetric loudspeaker pairs have at least number sign symmetries. The isAnySignSymmetric syntax element may represent a flag indicating whether there are at least some symmetric loudspeaker pairs with number sign symmetries. The signSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs that have sign symmetry. The isSignSymmetric variable may represent a bitmask derived from the signSymmetricPairs syntax element in the manner shown above in the table describing the syntax of HoaRenderingMatrix(). The hasVerticalCoef syntax element may represent a flag indicating whether the matrix is a horizontal-only HOA rendering matrix. The bootVal syntax element may represent a variable used in the decoding loop.

[0115] In other words, the bitstream generation unit 42 may analyze the audio renderer 1 to generate any one or more of the above value symmetry information (e.g., any combination of one or more of the isAllValueSymmetric, isAnyValueSymmetric, valueSymmetricPairs and isValueSymmetric syntax elements) or to otherwise obtain the value symmetry information. The bitstream generation unit 42 may specify the audio renderer information 2 in the bitstream 21 in the manner shown above such that the audio renderer information 2 includes the value symmetry information.

[0116] Moreover, the bitstream generation unit 42 may also analyze the audio renderer 1 to generate any one or more of the above sign symmetry information (e.g., any combination of one or more of the isAllSignSymmetric, isAnySignSymmetric, signSymmetricPairs and isSignSymmetric syntax elements) or to otherwise obtain the sign symmetry information. The bitstream generation unit 42 may specify the audio renderer information 2 in the bitstream 21 in the manner shown above such that the audio renderer information 2 includes the sign symmetry information.

[0117] When determining the value symmetry information and the sign symmetry information, the bitstream generation unit 42 may analyze the various values of the audio renderer 1, which may be specified as a matrix. The rendering matrix may be formulated as the pseudo-inverse of a matrix R. In other words, to render the (N+1)² HOA channels (denoted Z below) to L loudspeaker signals (denoted by the column vector p of L loudspeaker signals), the following equation may be given:
$$Z = R\,p$$
To arrive at a rendering matrix that outputs the L loudspeaker signals, the inverse of the R matrix is multiplied by the Z HOA channels, as shown in the following equation:
$$p = R^{-1} Z$$
Unless the number of loudspeakers L is the same as the number of Z HOA channels, the matrix R is not square and an exact inverse cannot be determined. As a result, the pseudo-inverse, defined as follows, may be used instead:
$$\operatorname{pinv}(R) = R^{T}\left(R\,R^{T}\right)^{-1}$$
Here, $R^{T}$ denotes the transpose of the R matrix. Substituting for $R^{-1}$ in the above equation, the solution for the L loudspeaker signals denoted by the column vector p may be expressed mathematically as follows:
$$p = \operatorname{pinv}(R)\,Z = R^{T}\left(R\,R^{T}\right)^{-1} Z$$
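The pseudo-inverse formulation above can be checked numerically on a toy case. The sketch below uses exact fractions and a made-up 2-by-3 matrix R (two HOA channels, three loudspeakers) chosen only so that R·Rᵀ is invertible; it is not a real spherical harmonic matrix, and the helper names are introduced here for illustration.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse of a small square matrix, using exact fractions."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv(r):
    """pinv(R) = R^T (R R^T)^-1, valid when R R^T is invertible."""
    rt = transpose(r)
    return matmul(rt, inverse(matmul(r, rt)))


R = [[1, 1, 1], [1, 0, -1]]   # toy (N+1)^2-by-L matrix, illustrative values only
Z = [[3], [2]]                # toy HOA channel vector
p = matmul(pinv(R), Z)        # loudspeaker signals; equals [[2], [1], [0]]
```

Multiplying R by the resulting p reproduces Z, which confirms that p is a solution of Z = R·p for this example.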

[0118] The entries of the R matrix may be the values of the spherical harmonics at the loudspeaker positions, with (N+1)² rows for the different spherical harmonics and L columns for the loudspeakers. The bitstream generation unit 42 may determine the loudspeaker pairs based on the values for the loudspeakers. By analyzing the values of the spherical harmonics at the loudspeaker positions, the bitstream generation unit 42 may determine, based on the values, which loudspeaker positions form pairs (e.g., because a pair may have similar values, nearly the same values, or the same values with opposite signs).

[0119] After identifying the pairs, the bitstream generation unit 42 may determine, for each pair, whether the pair has the same or nearly the same values. When all of the pairs have the same values, the bitstream generation unit 42 may set the isAllValueSymmetric syntax element to 1. When not all of the pairs have the same values, the bitstream generation unit 42 may set the isAllValueSymmetric syntax element to 0. When one or more, but not all, of the pairs have the same values, the bitstream generation unit 42 may set the isAnyValueSymmetric syntax element to 1. When none of the pairs have the same values, the bitstream generation unit 42 may set the isAnyValueSymmetric syntax element to 0. For pairs with symmetric values, the bitstream generation unit 42 may specify only one value rather than two separate values for the pair of speakers, thereby reducing the number of bits used to represent the audio rendering information 2 (e.g., the matrix in this example) in the bitstream 21.
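The flag logic of [0119] can be sketched as follows. The function name, the tolerance parameter and the reading of "same value" as "equal absolute value for every HOA coefficient" (taken from the description of isAllValueSymmetric) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def value_symmetry_flags(
    matrix: Sequence[Sequence[float]],
    pairs: Sequence[Tuple[int, int]],
    tol: float = 1e-9,
) -> Tuple[int, int, List[int]]:
    """Return (isAllValueSymmetric, isAnyValueSymmetric, valueSymmetricPairs).

    matrix[s][c] is the gain of loudspeaker s for HOA coefficient c.
    """
    # Assumption: a pair is value symmetric when both rows agree in absolute value.
    mask = [
        int(all(abs(abs(a) - abs(b)) <= tol
                for a, b in zip(matrix[first], matrix[second])))
        for first, second in pairs
    ]
    is_all = int(bool(mask) and all(mask))
    # Per the paragraph above: set when some, but not all, pairs are symmetric.
    is_any = int(any(mask) and not is_all)
    return is_all, is_any, mask
```

Each pair flagged in the returned mask needs only one row of gain values in the bitstream instead of two, which is where the saving described in the paragraph comes from.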

[0120] When there is no value symmetry among the pairs, the bitstream generation unit 42 may also determine, for each pair, whether the speaker pair has sign symmetry (meaning that one speaker has a negative value while the other speaker has a positive value). When all of the pairs have sign symmetry, the bitstream generation unit 42 may set the isAllSignSymmetric syntax element to 1. When not all of the pairs have sign symmetry, the bitstream generation unit 42 may set the isAllSignSymmetric syntax element to 0. When one or more, but not all, of the pairs have sign symmetry, the bitstream generation unit 42 may set the isAnySignSymmetric syntax element to 1. When none of the pairs have sign symmetry, the bitstream generation unit 42 may set the isAnySignSymmetric syntax element to 0. For pairs with symmetric signs, the bitstream generation unit 42 may specify only one sign, or no sign, rather than two separate signs for the speaker pair, thereby reducing the number of bits used to represent the audio rendering information (the matrix in this example) in the bitstream 21.
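A simplified reading of the sign symmetry test in [0120] is sketched below. It assumes that a pair counts as sign symmetric when every entry of one speaker has the opposite sign of the corresponding entry of the other speaker (zeros match zeros); the actual criterion may depend on the symmetry of each spherical harmonic, which this sketch does not model. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def sign_symmetry_flags(
    matrix: Sequence[Sequence[float]],
    pairs: Sequence[Tuple[int, int]],
) -> Tuple[int, int, List[int]]:
    """Return (isAllSignSymmetric, isAnySignSymmetric, signSymmetricPairs)."""
    # Assumption: opposite signs in every column make a pair sign symmetric.
    mask = [
        int(all(_sign(a) == -_sign(b)
                for a, b in zip(matrix[first], matrix[second])))
        for first, second in pairs
    ]
    is_all = int(bool(mask) and all(mask))
    # Per the paragraph above: set when some, but not all, pairs are sign symmetric.
    is_any = int(any(mask) and not is_all)
    return is_all, is_any, mask
```

For a pair flagged in the mask, the signs of the second speaker follow from those of the first, so only one set of signs would need to be coded.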

[0121] The bitstream generation unit 42 may specify the DecodeHoaMatrixData() container, shown in the table describing the syntax of HoaRenderingMatrix(), in accordance with the syntax set forth in the following table:

[0122] The hasValue syntax element in the preceding table describing the syntax of DecodeHoaMatrixData may represent a flag indicating whether the matrix element is sparsely coded. The signMatrix syntax element may represent a matrix holding the sign values of the HOA rendering matrix, as one example in linearized vector form. The hoaMatrix syntax element may represent the HOA rendering matrix values, as one example in linearized vector form. The bitstream generation unit 42 may specify the DecodeHoaGainValue() container, shown in the table describing the syntax of DecodeHoaMatrixData, in accordance with the syntax set forth in the following table:

[0123] The bitstream generation unit 42 may specify the readRange() container, shown in the table describing the syntax of DecodeHoaGainValue, in accordance with the syntax specified in the following table:

[0124] Although not shown in the example of FIG. 3, the audio encoding device 20 may also include a bitstream output unit that switches the bitstream output from the audio encoding device 20 (e.g., between the directional-based bitstream 21 and the vector-based bitstream 21) based on whether a current frame is to be encoded using directional-based synthesis or vector-based synthesis. The bitstream output unit may perform the switch based on a syntax element output by the content analysis unit 26 indicating whether directional-based synthesis was performed (as a result of detecting that the HOA coefficients 11 were generated from a synthetic audio object) or vector-based synthesis was performed (as a result of detecting that the HOA coefficients were recorded). The bitstream output unit may specify the correct header syntax to indicate the switch or the current encoding used for the current frame, along with the respective one of the bitstreams 21.

[0125] Moreover, as noted above, the sound field analysis unit 44 may identify BG_TOT ambient HOA coefficients 47, which may change from frame to frame (although BG_TOT may at times remain constant or the same across two or more temporally adjacent frames). A change in BG_TOT may result in changes to the coefficients expressed in the reduced foreground V[k] vectors 55. A change in BG_TOT may result in background HOA coefficients (which may also be referred to as “ambient HOA coefficients”) that change from frame to frame (although, again, BG_TOT may at times remain constant or the same across two or more temporally adjacent frames). Such changes often result in a change of energy for the aspects of the sound field represented by the addition or removal of the additional ambient HOA coefficients and the corresponding removal of coefficients from, or addition of coefficients to, the reduced foreground V[k] vectors 55.

[0126] As a result, the sound field analysis unit 44 may further determine when the ambient HOA coefficients change from frame to frame and generate a flag or other syntax element indicating the change to the ambient HOA coefficient in terms of its use to represent the ambient components of the sound field (where the change may also be referred to as a “transition” of the ambient HOA coefficient). In particular, the coefficient reduction unit 46 may generate the flag (which may be denoted as an AmbCoeffTransition flag or an AmbCoeffIdxTransition flag) and provide the flag to the bitstream generation unit 42 so that the flag may be included in the bitstream 21 (possibly as part of the side channel information).
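Detecting such a transition between two frames can be sketched as a comparison of the sets of ambient HOA coefficient indices in use. The function name, the dictionary layout and the idea of representing each frame by a list of coefficient indices are assumptions made here for illustration; only the AmbCoeffTransition flag name comes from the text.

```python
from typing import Dict, Iterable, List, Union


def detect_ambient_transitions(
    prev_indices: Iterable[int], curr_indices: Iterable[int]
) -> Dict[str, Union[int, List[int]]]:
    """Compare the ambient HOA coefficient indices of two adjacent frames."""
    prev, curr = set(prev_indices), set(curr_indices)
    return {
        # Coefficients newly used to represent the ambient sound field.
        "added": sorted(curr - prev),
        # Coefficients no longer used to represent the ambient sound field.
        "removed": sorted(prev - curr),
        # Hypothetical per-frame flag: 1 if any coefficient is in transition.
        "AmbCoeffTransition": int(prev != curr),
    }
```

When the index sets of adjacent frames match, the flag stays at 0, corresponding to the case in which BG_TOT remains the same across frames.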

[0127] In addition to specifying the ambient coefficient transition flag, the coefficient reduction unit 46 may also modify how the reduced foreground V[k] vectors 55 are generated. In one example, upon determining that one of the ambient HOA coefficients is in transition during the current frame, the coefficient reduction unit 46 may specify a vector coefficient (which may also be referred to as a “vector element” or “element”) for each of the V-vectors of the reduced foreground V[k] vectors 55 that corresponds to the ambient HOA coefficient in transition. Again, the ambient HOA coefficient in transition may add to or remove from the total number BG_TOT of background coefficients. The resulting change in the total number of background coefficients therefore affects whether the ambient HOA coefficient is included in the bitstream, and whether the corresponding element of the V-vector is included for the V-vectors specified in the bitstream in the second and third configuration modes described above. More information regarding how the coefficient reduction unit 46 may specify the reduced foreground V[k] vectors 55 to overcome the changes in energy is provided in U.S. Patent Application No. 14/594,533, entitled “TRANSITIONING OF AMBIENT HIGHER_ORDER AMBISONIC COEFFICIENTS,” filed January 12, 2015.

[0128] FIG. 4 is a block diagram illustrating the audio decoding device 24 of FIG. 2 in more detail. As shown in the example of FIG. 4, the audio decoding device 24 may include an extraction unit 72, a renderer reconstruction unit 81, a directionality-based reconstruction unit 90, and a vector-based reconstruction unit 92. Although described below, more information regarding the audio decoding device 24 and the various aspects of decompressing or otherwise decoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD", filed May 29, 2014.

[0129] The extraction unit 72 may represent a unit configured to receive the bitstream 21 and extract the audio rendering information 2 and the various encoded versions of the HOA coefficients 11 (e.g., a directionality-based encoded version or a vector-based encoded version). In other words, higher-order ambisonic (HOA) rendering matrices may be transmitted by the audio encoding device 20 to enable control over the HOA rendering process at the audio playback system 16. Transmission may be facilitated by means of the mpegh3daConfigExtension of type ID_CONFIG_EXT_HOA_MATRIX shown above. The mpegh3daConfigExtension may contain several HOA rendering matrices for different loudspeaker reproduction configurations. When HOA rendering matrices are transmitted, the audio encoding device 20 signals, for each HOA rendering matrix, the associated target loudspeaker layout, which together with the HoaOrder determines the dimensions of the rendering matrix.
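As an illustration of how the target layout and the HoaOrder together fix the matrix size, the following minimal sketch assumes a full three-dimensional HOA representation with (HoaOrder + 1)^2 coefficients and one matrix entry per coefficient and loudspeaker. The helper names hoaCoefCount and hoaMatrixElementCount are hypothetical and are not part of the bitstream syntax.

```c
#include <stddef.h>

/* Number of HOA coefficients in a full 3D representation of the given order
 * (assumption: all (order + 1)^2 coefficients are present). */
static int hoaCoefCount(int hoaOrder)
{
    return (hoaOrder + 1) * (hoaOrder + 1);
}

/* Number of rendering matrix elements: one per (HOA coefficient, loudspeaker). */
static size_t hoaMatrixElementCount(int hoaOrder, int outputCount)
{
    return (size_t)hoaCoefCount(hoaOrder) * (size_t)outputCount;
}
```

Under these assumptions, a sixth-order matrix for 22 loudspeakers has 49 coefficient rows and 1078 elements in total.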

[0130] The transmission of a unique HoaRenderingMatrixId allows referencing either a default HOA rendering matrix available at the audio playback system 16 or an HOA rendering matrix transmitted from outside the audio bitstream 21. In some cases, every HOA rendering matrix is assumed to be normalized in N3D and to follow the ordering of HOA coefficients defined in the bitstream 21.

[0131] As noted above, the function findSymmetricSpeakers may indicate the number and positions of all loudspeakers within the provided loudspeaker setup that are symmetric with respect to, as one example, the median plane of a listener at the so-called "sweet spot". This helper function may be declared as follows:
int findSymmetricSpeakers(int outputCount, SpeakerInformation* outputConfig, int hasLfeRendering);
The extraction unit 72 may call the function createSymSigns to compute a vector of 1.0 and -1.0 values that may then be used to generate the matrix elements associated with symmetric loudspeakers. The createSymSigns function may be defined as follows:
void createSymSigns(int* symSigns, int hoaOrder)
{
    int n, m, k = 0;
    /* Visit every HOA coefficient (order n, degree m) in bitstream order. */
    for (n = 0; n <= hoaOrder; ++n) {
        for (m = -n; m <= n; ++m)
            symSigns[k++] = ((m >= 0) * 2) - 1;  /* +1 for m >= 0, -1 for m < 0 */
    }
}
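The following usage sketch repeats the listing above so that it is self-contained and adds a small printing helper (printSymSigns, a hypothetical name) to show the sign pattern that createSymSigns produces. It assumes only what the listing states: coefficients are visited order by order, with the degree m running from -n to n.

```c
#include <stdio.h>

void createSymSigns(int* symSigns, int hoaOrder)
{
    int n, m, k = 0;
    for (n = 0; n <= hoaOrder; ++n) {
        for (m = -n; m <= n; ++m)
            symSigns[k++] = ((m >= 0) * 2) - 1;
    }
}

/* Hypothetical helper: print one sign per HOA coefficient. */
void printSymSigns(int hoaOrder)
{
    int signs[64];                       /* enough for orders up to 7 */
    int count = (hoaOrder + 1) * (hoaOrder + 1);
    int k;
    if (count > 64)
        return;                          /* outside the range of this sketch */
    createSymSigns(signs, hoaOrder);
    for (k = 0; k < count; ++k)
        printf("%d ", signs[k]);
    printf("\n");
}
```

For hoaOrder equal to 1, the four coefficients (n, m) = (0, 0), (1, -1), (1, 0), (1, 1) receive the signs 1, -1, 1, 1: only coefficients with negative m change sign when a loudspeaker is mirrored across the median plane.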

[0132] The extraction unit 72 may call the function create2dBitmask to generate a bitmask that identifies the HOA coefficients used only in the horizontal plane. The create2dBitmask function may be defined as follows:
#include <stdlib.h>  /* abs */

void create2dBitmask(int* bitmask, int hoaOrder)
{
    int n, m, k = 0;
    bitmask[k++] = 0;  /* the zeroth-order coefficient is never masked */
    for (n = 1; n <= hoaOrder; ++n) {
        for (m = -n; m <= n; ++m)
            bitmask[k++] = abs(m) != n;  /* 0 for horizontal-only (|m| == n), 1 otherwise */
    }
}
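To make the output of create2dBitmask concrete, the sketch below repeats the listing (with the loop keyword spelled correctly) and adds a hypothetical helper, countVerticalCoefs, that counts the entries set to 1. As written in the listing, an entry of 1 marks a coefficient with |m| != n, that is, one that is not confined to the horizontal plane.

```c
#include <stdlib.h>

void create2dBitmask(int* bitmask, int hoaOrder)
{
    int n, m, k = 0;
    bitmask[k++] = 0;
    for (n = 1; n <= hoaOrder; ++n) {
        for (m = -n; m <= n; ++m)
            bitmask[k++] = abs(m) != n;
    }
}

/* Hypothetical helper: number of coefficients flagged with 1 in the mask. */
int countVerticalCoefs(const int* bitmask, int hoaOrder)
{
    int count = (hoaOrder + 1) * (hoaOrder + 1);
    int k, total = 0;
    for (k = 0; k < count; ++k)
        total += bitmask[k];
    return total;
}
```

For hoaOrder equal to 2 the mask is 0, 0, 1, 0, 0, 1, 1, 1, 0, so four of the nine coefficients are flagged.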

[0133] To decode the HOA rendering matrix coefficients, the extraction unit 72 may first extract the syntax element HoaRenderingMatrixSet(), which, as noted above, may contain one or more HOA rendering matrices that may be applied to achieve HOA rendering to a desired loudspeaker layout. In some cases, a given bitstream may not contain more than one instance of HoaRenderingMatrixSet(). The syntax element HoaRenderingMatrix() may contain the HOA rendering matrix information (which may be denoted as renderer information 2 in the example of FIG. 4). The extraction unit 72 may first read in the configuration information, which may guide the decoding process. The extraction unit 72 may then read the matrix elements accordingly.

[0134] In some cases, the extraction unit 72 first reads the fields precisionLevel and gainLimitPerOrder. When the flag gainLimitPerOrder is set, the extraction unit 72 reads and decodes the maxGain and minGain fields separately for each HOA order. When the flag gainLimitPerOrder is not set, the extraction unit 72 reads and decodes the fields maxGain and minGain once and applies these fields to all HOA orders during the decoding process. In some cases, the minGain value must be between 0 dB and -69 dB. In some cases, the maxGain value must be between 1 dB and 111 dB lower than the minGain value. FIG. 9 is a diagram illustrating an example of HOA-order-dependent minimum and maximum gains within an HOA rendering matrix.

[0135] The extraction unit 72 may next read the flag isFullMatrix, which may signal whether a matrix is defined as full or as partially sparse. When the matrix is defined as partially sparse, the extraction unit 72 reads the next field (e.g., the firstSparseOrder syntax element), which specifies the HOA order from which the HOA rendering matrix is sparsely coded. HOA rendering matrices are often dense for low orders and may become sparse at higher orders, depending on the loudspeaker reproduction setup. FIG. 10 is a diagram illustrating a partially sparse sixth-order HOA rendering matrix for 22 loudspeakers. The sparseness of the matrix shown in FIG. 10 starts at the 26th HOA coefficient (HOA order 5).
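One way to read the relationship between firstSparseOrder and individual coefficients is sketched below. It is an illustration only: the function name markSparseCoefs is hypothetical, and the sketch assumes that every coefficient of order firstSparseOrder or higher is treated as sparsely coded, with coefficients enumerated in the same order as in createSymSigns.

```c
/* Fill isHoaCoefSparse[k] with 1 when coefficient k belongs to an order that is
 * sparsely coded, and with 0 otherwise. (Assumed interpretation of the text.) */
void markSparseCoefs(int* isHoaCoefSparse, int hoaOrder,
                     int isFullMatrix, int firstSparseOrder)
{
    int n, m, k = 0;
    for (n = 0; n <= hoaOrder; ++n) {
        for (m = -n; m <= n; ++m)
            isHoaCoefSparse[k++] = !isFullMatrix && (n >= firstSparseOrder);
    }
}
```

With hoaOrder equal to 6 and firstSparseOrder equal to 5, orders 0 through 4 account for 25 coefficients, so the first coefficient marked sparse is the 26th (index 25), which agrees with the description of FIG. 10.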

[0136] Depending on whether a low frequency effects (LFE) channel exists within the loudspeaker reproduction setup (as indicated by the lfeExists syntax element), the extraction unit 72 may read the field hasLfeRendering. When hasLfeRendering is not set, the extraction unit 72 is configured to assume that the matrix elements related to the LFE channel are digital zeros. The next field read by the extraction unit 72 is the flag zerothOrderAlwaysPositive, which signals whether the matrix elements associated with the zeroth-order coefficient are positive. When zerothOrderAlwaysPositive indicates that the zeroth-order HOA coefficients are positive, the extraction unit 72 determines that number signs are not coded for the rendering matrix coefficients corresponding to the zeroth-order HOA coefficients.

[0137] In the following, properties of the HOA rendering matrix may be signaled for loudspeaker pairs that are symmetric with respect to the median plane. In some cases, there are two symmetry properties: a) value symmetry and b) sign symmetry. In the case of value symmetry, the matrix elements of the left loudspeaker of the symmetric loudspeaker pair are not coded. Instead, the extraction unit 72 derives these elements from the decoded matrix elements of the right loudspeaker by using the output of the helper function createSymSigns, performing the following:
pairIdx = outputConfig[j].symmetricPair->originalPosition;
hoaMatrix[i * outputCount + j] = hoaMatrix[i * outputCount + pairIdx];
signMatrix[i * outputCount + j] = symSigns[i] * signMatrix[i * outputCount + pairIdx];

[0138] When a loudspeaker pair is not value symmetric, the matrix elements may still be symmetric with respect to their number signs. When a loudspeaker pair is sign symmetric, the number signs of the matrix elements of the left loudspeaker of the symmetric loudspeaker pair are not coded, and the extraction unit 72 derives these number signs from the number signs of the matrix elements associated with the right loudspeaker by using the output of the helper function createSymSigns, performing the following:
pairIdx = outputConfig[j].symmetricPair->originalPosition;
signMatrix[i * outputCount + j] = symSigns[i] * signMatrix[i * outputCount + pairIdx];
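The per-element assignments for value symmetry and sign symmetry can be wrapped in a loop over all HOA coefficients, as in the following sketch. Several details are assumptions made for illustration: hoaMatrix is taken to hold magnitudes and signMatrix the corresponding signs, pairIdx is passed in directly rather than read from outputConfig, and the enum PairSymmetry and the function deriveLeftFromRight are hypothetical names.

```c
typedef enum { SYM_NONE = 0, SYM_VALUE = 1, SYM_SIGN = 2 } PairSymmetry;

/* Derive the column of left loudspeaker j from the column of its right-hand
 * partner pairIdx. Both matrices are stored as [coefficient * outputCount + speaker]. */
void deriveLeftFromRight(double* hoaMatrix, int* signMatrix, const int* symSigns,
                         int numCoefs, int outputCount,
                         int j, int pairIdx, PairSymmetry sym)
{
    int i;
    for (i = 0; i < numCoefs; ++i) {
        if (sym == SYM_VALUE)   /* magnitudes are copied only for value symmetry */
            hoaMatrix[i * outputCount + j] = hoaMatrix[i * outputCount + pairIdx];
        if (sym == SYM_VALUE || sym == SYM_SIGN)   /* signs are mirrored in both cases */
            signMatrix[i * outputCount + j] =
                symSigns[i] * signMatrix[i * outputCount + pairIdx];
    }
}
```

With SYM_SIGN only the signs of the left column are overwritten, so its magnitudes must still have been decoded from the bitstream beforehand.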

[0139] FIG. 11 is a diagram illustrating the signaling of the symmetry properties. A loudspeaker pair may be defined as value symmetric and sign symmetric at the same time. The final decoding flag, hasVerticalCoef, specifies whether the matrix elements associated with circular (i.e., 2D) HOA coefficients are coded. When hasVerticalCoef is not set, the matrix elements associated with the HOA coefficients defined by the helper function create2dBitmask are set to digital zero.
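The effect of hasVerticalCoef on the matrix can be sketched as follows. This is an illustration under stated assumptions: the matrix is stored as [coefficient * outputCount + speaker], the mask has the form produced by create2dBitmask (1 for coefficients that are not horizontal-only), and applyVerticalMask is a hypothetical name.

```c
/* When hasVerticalCoef is not set, zero every matrix element whose HOA
 * coefficient is flagged in the bitmask; otherwise leave the matrix untouched. */
void applyVerticalMask(double* hoaMatrix, const int* bitmask,
                       int numCoefs, int outputCount, int hasVerticalCoef)
{
    int i, j;
    if (hasVerticalCoef)
        return;
    for (i = 0; i < numCoefs; ++i) {
        if (!bitmask[i])
            continue;
        for (j = 0; j < outputCount; ++j)
            hoaMatrix[i * outputCount + j] = 0.0;
    }
}
```

For a first-order matrix, only the row of the coefficient (n, m) = (1, 0) is cleared, since it is the only first-order coefficient with |m| != n.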

[0140] That is, the extraction unit 72 may extract the audio rendering information 2 in accordance with the process set forth in FIG. 11. The extraction unit 72 may first read the isAllValueSymmetric syntax element from the bitstream 21 (300). When the isAllValueSymmetric syntax element is set to 1 (or, in other words, Boolean true), the extraction unit 72 may iterate through the value of the numPairs syntax element, setting each entry of the valueSymmetricPairs array syntax element to a value of 1 (effectively indicating that all of the speaker pairs are value symmetric) (302).

[0141] When the isAllValueSymmetric syntax element is set to 0 (or, in other words, Boolean false), the extraction unit 72 may next read the isAnyValueSymmetric syntax element (304). When the isAnyValueSymmetric syntax element is set to 1 (or, in other words, Boolean true), the extraction unit 72 may iterate through the value of the numPairs syntax element, setting each entry of the valueSymmetricPairs array syntax element to a bit read sequentially from the bitstream 21 (306). The extraction unit 72 may also obtain the isAnySignSymmetric syntax element for any of the pairs having a valueSymmetricPairs syntax element set to 0. The extraction unit 72 may then iterate through the number of pairs again and, when valueSymmetricPairs is equal to 0, set the signSymmetricPairs bit to a value read from the bitstream 21.

[0142] When the isAnyValueSymmetric syntax element is set to 0 (or, in other words, Boolean false), the extraction unit 72 may read the isAllSignSymmetric syntax element from the bitstream 21. When the isAllSignSymmetric syntax element is set to a value of 1 (or, in other words, Boolean true), the extraction unit 72 may iterate through the value of the numPairs syntax element, setting each entry of the signSymmetricPairs array syntax element to a value of 1 (effectively indicating that all of the speaker pairs are sign symmetric).

[0143] When the isAllSignSymmetric syntax element is set to 0 (or, in other words, Boolean false), the extraction unit 72 may read the isAnySignSymmetric syntax element from the bitstream 21. The extraction unit 72 may iterate through the value of the numPairs syntax element, setting each entry of the signSymmetricPairs array syntax element to a bit read sequentially from the bitstream 21. The bitstream generation unit 42 may perform a process inverse to that described above with respect to the extraction unit 72 in order to specify the value symmetry information, the sign symmetry information, or a combination of both.
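The flag-reading sequence described in paragraphs [0140] to [0143] can be sketched as a small parser. The sketch makes several assumptions that the text does not confirm: each flag occupies one bit, bits are read most significant bit first, the per-pair sign bits are read only when isAnySignSymmetric is 1, and the caller guarantees that the buffer holds enough bits. The BitReader type and the functions readBit and readSymmetryInfo are hypothetical.

```c
#include <stddef.h>

typedef struct {
    const unsigned char* data;  /* bitstream bytes */
    size_t bitPos;              /* index of the next bit to read */
} BitReader;

static int readBit(BitReader* br)
{
    int bit = (br->data[br->bitPos >> 3] >> (7 - (br->bitPos & 7))) & 1;
    br->bitPos++;
    return bit;
}

void readSymmetryInfo(BitReader* br, int numPairs,
                      int* valueSymmetricPairs, int* signSymmetricPairs)
{
    int p;
    for (p = 0; p < numPairs; ++p)
        valueSymmetricPairs[p] = signSymmetricPairs[p] = 0;

    if (readBit(br)) {                      /* isAllValueSymmetric */
        for (p = 0; p < numPairs; ++p)
            valueSymmetricPairs[p] = 1;
        return;
    }
    if (readBit(br)) {                      /* isAnyValueSymmetric */
        for (p = 0; p < numPairs; ++p)
            valueSymmetricPairs[p] = readBit(br);
        if (readBit(br)) {                  /* isAnySignSymmetric */
            for (p = 0; p < numPairs; ++p)
                if (!valueSymmetricPairs[p])
                    signSymmetricPairs[p] = readBit(br);
        }
    } else if (readBit(br)) {               /* isAllSignSymmetric */
        for (p = 0; p < numPairs; ++p)
            signSymmetricPairs[p] = 1;
    } else if (readBit(br)) {               /* isAnySignSymmetric */
        for (p = 0; p < numPairs; ++p)
            signSymmetricPairs[p] = readBit(br);
    }
}
```

In this reading, a sign symmetry bit is consumed for a pair only when that pair was not already marked value symmetric, which keeps the side information as small as the text suggests.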

[0144] The renderer reconstruction unit 81 may represent a unit configured to reconstruct a renderer based on the audio rendering information 2. That is, using the properties noted above, the renderer reconstruction unit 81 may read a series of matrix element gain values. To read the absolute gain value, the renderer reconstruction unit 81 may call the function DecodeGainValue(). The renderer reconstruction unit 81 may call the function ReadRange() on the alphabet index to uniformly decode the gain values. When the decoded gain value is not digital zero, the renderer reconstruction unit 81 may additionally read the number sign value (per the table below). When the matrix element is associated with an HOA coefficient that was signaled as sparse (via isHoaCoefSparse), the hasValue flag precedes the gainValueIndex (see Table b). When the hasValue flag is 0, the element is set to digital zero, and neither the gainValueIndex nor the sign is signaled.

[0145] Depending on the symmetry properties specified for the loudspeaker pairs, the renderer reconstruction unit 81 may derive the matrix elements associated with the left loudspeaker from those of the right loudspeaker. In this case, the audio rendering information 2 in the bitstream 21 needed to decode the matrix elements for the left loudspeaker is reduced or omitted entirely.

[0146] In this way, the audio decoding device 24 may determine symmetry information in order to reduce the size of the audio rendering information to be specified. In some cases, the audio decoding device 24 may determine the symmetry information to reduce the size of the audio rendering information to be specified and derive at least a portion of an audio renderer based on the symmetry information.

[0147] In these and other cases, the audio decoding device 24 may determine value symmetry information in order to reduce the size of the audio rendering information to be specified. In these and other cases, the audio decoding device 24 may derive at least a portion of the audio renderer based on the value symmetry information.

[0148] In these and other cases, the audio decoding device 24 may determine sign symmetry information in order to reduce the size of the audio rendering information to be specified. In these and other cases, the audio decoding device 24 may derive at least a portion of the audio renderer based on the sign symmetry information.

[0149] In these and other cases, the audio decoding device 24 may determine sparseness information indicating the sparseness of a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0150] In these and other cases, the audio decoding device 24 may determine the speaker layout for which the matrix is to be used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0151] In this respect, the audio decoding device 24 may then determine the audio rendering information 2 specified in the bitstream. Based on the signal value included in the audio rendering information 2, the audio playback system 16 may render a plurality of speaker feeds 25 using one of the audio renderers 22. The speaker feeds may drive the speakers 3. As noted above, the signal value may, in some cases, include a matrix (which is decoded and provided as one of the audio renderers 22) used to render spherical harmonic coefficients to a plurality of speaker feeds. In this case, the audio playback system 16 may configure one of the audio renderers 22 with the matrix and use that one of the audio renderers 22 to render the speaker feeds 25 based on the matrix.
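Rendering with a decoded matrix amounts to a matrix-vector product per audio sample. The sketch below illustrates this for a single sample, reusing the [coefficient * outputCount + speaker] indexing seen in the symmetry listings above. It assumes that signs have already been folded into the matrix values, and renderSpeakerFeeds is a hypothetical name rather than a function of the audio playback system 16.

```c
/* Compute one output sample per loudspeaker from one sample per HOA coefficient:
 * speakerFeeds[j] = sum over i of hoaMatrix[i * outputCount + j] * hoaCoefs[i]. */
void renderSpeakerFeeds(const double* hoaMatrix, const double* hoaCoefs,
                        double* speakerFeeds, int numCoefs, int outputCount)
{
    int i, j;
    for (j = 0; j < outputCount; ++j) {
        double acc = 0.0;
        for (i = 0; i < numCoefs; ++i)
            acc += hoaMatrix[i * outputCount + j] * hoaCoefs[i];
        speakerFeeds[j] = acc;
    }
}
```

A full renderer would repeat this for every sample of a frame; the per-sample form is shown only to keep the indexing visible.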

[0152] To extract and then decode the various encoded versions of the HOA coefficients 11, so that the HOA coefficients 11 are available to be rendered using the obtained audio renderer 22, the extraction unit 72 may determine, from the syntax element noted above, whether the HOA coefficients 11 were encoded via the direction-based version or the vector-based version. When directionality-based encoding was performed, the extraction unit 72 may extract the directionality-based version of the HOA coefficients 11 and the syntax elements associated with the encoded version (denoted as directionality-based information 91 in the example of FIG. 4), and pass the directionality-based information 91 to the directionality-based reconstruction unit 90. The directionality-based reconstruction unit 90 may represent a unit configured to reconstruct the HOA coefficients in the form of HOA coefficients 11' based on the directionality-based information 91.

[0153] When the syntax element indicates that the HOA coefficients 11 were encoded using a vector-based decomposition, the extraction unit 72 may extract the coded foreground V[k] vectors 57 (which may include coded weights 57 and/or indices 63 or scalar-quantized V vectors), the encoded ambient HOA coefficients 59, and the corresponding audio objects 61 (also referred to as the encoded nFG signals 61). The audio objects 61 each correspond to one of the vectors 57. The extraction unit 72 may pass the coded foreground V[k] vectors 57 to the V-vector reconstruction unit 74, and may pass the encoded ambient HOA coefficients 59 together with the encoded nFG signals 61 to the psychoacoustic decoding unit 80.

[0154] The V-vector reconstruction unit 74 may represent a unit configured to reconstruct the V vectors from the encoded foreground V[k] vectors 57. The V-vector reconstruction unit 74 may operate in a manner reciprocal to that of the quantization unit 52.

[0155] The psychoacoustic decoding unit 80 may operate in a manner reciprocal to the psychoacoustic audio coder unit 40 shown in the example of FIG. 3 so as to decode the encoded ambient HOA coefficients 59 and the encoded nFG signals 61, and thereby generate energy-compensated ambient HOA coefficients 47' and interpolated nFG signals 49' (which may also be referred to as interpolated nFG audio objects 49'). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground formulation unit 78.

[0156] The spatio-temporal interpolation unit 76 may operate in a manner similar to that described above with respect to the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 76 may receive the reduced foreground V[k] vectors 55_k and perform spatio-temporal interpolation with respect to the foreground V[k] vectors 55_k and the reduced foreground V[k-1] vectors 55_k-1 to generate interpolated foreground V[k] vectors 55_k''. The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k'' to the fade unit 770.

[0157] The extraction unit 72 may also output, to the fade unit 770, a signal 757 indicating when one of the ambient HOA coefficients is in transition. The fade unit 770 may then determine which of the SCH_BG 47' (where the SCH_BG 47' may also be referred to as "ambient HOA channels 47'" or "ambient HOA coefficients 47'") and the elements of the interpolated foreground V[k] vectors 55_k'' are to be either faded in or faded out. In some examples, the fade unit 770 may operate in opposite ways on the ambient HOA coefficients 47' and on the elements of the interpolated foreground V[k] vectors 55_k''. That is, the fade unit 770 may perform a fade-in or a fade-out, or both a fade-in and a fade-out, with respect to the corresponding one of the ambient HOA coefficients 47', while performing a fade-in or a fade-out, or both a fade-in and a fade-out, with respect to the corresponding one of the elements of the interpolated foreground V[k] vectors 55_k''. The fade unit 770 may output adjusted ambient HOA coefficients 47'' to the HOA coefficient formulation unit 82 and adjusted foreground V[k] vectors 55_k''' to the foreground formulation unit 78. In this respect, the fade unit 770 represents a unit configured to perform a fade operation with respect to various aspects of the HOA coefficients or derivatives thereof, e.g., in the form of the ambient HOA coefficients 47' and the elements of the interpolated foreground V[k] vectors 55_k''.
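The text does not state the shape of the fade applied by the fade unit 770. Purely as an illustration, the following sketch assumes a linear ramp over one frame; the function name fadeFrame and the choice of a linear window are assumptions, not the method of the disclosure.

```c
/* Apply a linear fade over one frame (assumed window shape).
 * fadeIn != 0: gain rises from 0 toward 1; fadeIn == 0: gain falls from 1 toward 0. */
void fadeFrame(double* samples, int frameLength, int fadeIn)
{
    int t;
    for (t = 0; t < frameLength; ++t) {
        double w = (double)t / (double)frameLength;   /* 0 <= w < 1 */
        samples[t] *= fadeIn ? w : 1.0 - w;
    }
}
```

With this window, the fade-in and fade-out gains at each sample sum to one, so fading an ambient HOA coefficient in one direction while fading the corresponding V-vector element in the other keeps the combined weight constant across the frame.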

[0158] The foreground formulation unit 78 may represent a unit configured to perform matrix multiplication with respect to the adjusted foreground V[k] vectors 55_k''' and the interpolated nFG signals 49' to generate the foreground HOA coefficients 65. In this respect, the foreground formulation unit 78 may combine the audio objects 49' (which is another way of denoting the interpolated nFG signals 49') with the vectors 55_k''' to reconstruct the foreground or, in other words, predominant aspects of the HOA coefficients 11'. The foreground formulation unit 78 may perform a matrix multiplication of the interpolated nFG signals 49' by the adjusted foreground V[k] vectors 55_k'''.

[0159] The HOA coefficient formulation unit 82 may represent a unit configured to combine the foreground HOA coefficients 65 with the adjusted ambient HOA coefficients 47'' so as to obtain the HOA coefficients 11'. The prime notation reflects that the HOA coefficients 11' may be similar to, but not the same as, the HOA coefficients 11. The differences between the HOA coefficients 11 and the HOA coefficients 11' may result from loss due to transmission over a lossy transmission medium, quantization, or other lossy operations.
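The matrix multiplication of paragraph [0158] and the combination of paragraph [0159] can be illustrated together. The sketch below assumes row-major storage, one V vector per row, and ambient coefficients already expanded to a full frameLength-by-numCoefs array with zeros where no ambient channel is present. These layout choices and the name formulateHoa are assumptions made for the example.

```c
/* hoa[t][c] = ambient[t][c] + sum over f of nFG[t][f] * V[f][c]
 * nFG:     frameLength x numFg    (interpolated nFG signals)
 * V:       numFg x numCoefs       (adjusted foreground V[k] vectors, one per row)
 * ambient: frameLength x numCoefs (adjusted ambient HOA coefficients)
 * hoa:     frameLength x numCoefs (output) */
void formulateHoa(const double* nFG, const double* V, const double* ambient,
                  double* hoa, int frameLength, int numFg, int numCoefs)
{
    int t, f, c;
    for (t = 0; t < frameLength; ++t) {
        for (c = 0; c < numCoefs; ++c) {
            double acc = ambient[t * numCoefs + c];
            for (f = 0; f < numFg; ++f)
                acc += nFG[t * numFg + f] * V[f * numCoefs + c];
            hoa[t * numCoefs + c] = acc;
        }
    }
}
```

Keeping the foreground product and the ambient addition in one loop avoids a temporary buffer for the foreground HOA coefficients; a decoder that needs them separately would split the two steps.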

[0160] Moreover, the extraction unit 72, and the audio decoding device 24 more generally, may also be configured to operate in accordance with various aspects of the techniques described in this disclosure to obtain a bitstream 21 that is potentially optimized in the ways described above with respect to not including various syntax elements or data fields in certain cases.

[0161] In some cases, the audio decoding device 24 may be configured to, when decompressing higher-order ambisonic audio data compressed using a first compression scheme, obtain a bitstream 21 representing a compressed version of the higher-order ambisonic audio data that does not include bits corresponding to a second compression scheme also used to compress the higher-order ambisonic audio data. The first compression scheme may comprise a vector-based compression scheme, with the resulting vector defined in the spherical harmonic domain and sent via the bitstream 21. The vector-based decomposition compression scheme may, in some examples, comprise a compression scheme that involves application of a singular value decomposition (or equivalents thereof, as described in more detail with respect to the example of FIG. 3) to the higher-order ambisonic audio data.

[0162] The audio decoding device 24 may be configured to obtain a bitstream 21 that does not include bits corresponding to at least one syntax element used to perform the second type of compression scheme. As noted above, the second compression scheme comprises a directionality-based compression scheme. More specifically, the audio decoding device 24 may be configured to obtain a bitstream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of the second compression scheme. In other words, when the second compression scheme comprises a directionality-based compression scheme, the audio decoding device 24 may be configured to obtain a bitstream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme. As noted above, the HOAPredictionInfo syntax element may indicate a prediction between two or more direction-based signals.

[0163] In some cases, as an alternative to or in conjunction with the examples above, the audio decoding device 24 may be configured to obtain a bitstream 21 representing a compressed version of higher-order ambisonic audio data that does not include gain correction data when gain correction is suppressed during compression of the higher-order ambisonic audio data. In these cases, the audio decoding device 24 may be configured to decompress the higher-order ambisonic audio data in accordance with a vector-based synthesis decompression scheme. The compressed version of the higher-order ambisonic audio data is generated through application of a singular value decomposition (or an equivalent thereof, as described in more detail above with respect to the example of FIG. 3) to the higher-order ambisonic audio data. When the SVD, or an equivalent thereof, is applied to the HOA audio data, the audio encoding device 20 specifies at least one of the resulting vectors, or bits indicative thereof, in the bitstream 21, where the vector describes spatial characteristics of the corresponding foreground audio object (for example, the width, position, and volume of the corresponding foreground audio object).

[0164] More specifically, the audio decoding device 24 may be configured to obtain, from the bitstream 21, a MaxGainCorrAmbExp syntax element with a value set to 0 to indicate that gain correction is suppressed. That is, when gain correction is suppressed, the audio decoding device 24 may be configured to obtain a bitstream that does not include the HOAGainCorrection data field that stores the gain correction. The bitstream 21 may thus comprise a MaxGainCorrAmbExp syntax element having a value of 0, indicating that gain correction is suppressed, and may not include an HOAGainCorrection data field storing gain correction data. This suppression of gain correction may occur when the compression of the higher-order ambisonic audio data includes application of unified speech and audio coding (USAC) to the higher-order ambisonic audio data.
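The effect of suppressed gain correction on parsing can be illustrated with a minimal sketch. The `BitReader` helper, the function name, and the 4-bit field width are hypothetical and chosen only for illustration; they are not taken from the bitstream syntax described in this disclosure.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a bytes object (illustrative helper)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_gain_correction(reader, max_gain_corr_amb_exp, field_bits=4):
    """Return the gain correction value, or None when it is suppressed.

    A MaxGainCorrAmbExp value of 0 is taken to mean that the
    HOAGainCorrection field is absent, so no bits are consumed.
    The 4-bit field width is an assumption made for this sketch.
    """
    if max_gain_corr_amb_exp == 0:
        return None
    return reader.read(field_bits)
```

In this sketch the decoder never advances the read position when the syntax element is 0, which is how omitting the field saves bits.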

[0165] FIG. 5 is a flowchart illustrating exemplary operation of an audio encoding device, such as the audio encoding device 20 shown in the example of FIG. 3, in performing various aspects of the vector-based synthesis techniques described in this disclosure. Initially, the audio encoding device 20 receives the HOA coefficients 11 (106). The audio encoding device 20 may invoke the LIT unit 30, which may apply a LIT with respect to the HOA coefficients to output transformed HOA coefficients (for example, in the case of SVD, the transformed HOA coefficients may comprise the US[k] vectors 33 and the V[k] vectors 35) (107).

[0166] The audio encoding device 20 may next invoke the parameter calculation unit 32 to perform the analysis described above with respect to any combination of the US[k] vectors 33, the US[k-1] vectors 33, the V[k] vectors 35 and/or the V[k-1] vectors 35 in order to identify various parameters in the manner described above. That is, the parameter calculation unit 32 may determine at least one parameter based on an analysis of the transformed HOA coefficients 33/35 (108).

[0167] The audio encoding device 20 may then invoke the reordering unit 34, which may reorder the transformed HOA coefficients (which, again in the context of SVD, may refer to the US[k] vectors 33 and the V[k] vectors 35) based on the parameter to generate reordered transformed HOA coefficients 33'/35' (or, in other words, the US[k] vectors 33' and the V[k] vectors 35'), as described above (109). The audio encoding device 20 may also invoke the sound field analysis unit 44 during any of the foregoing or subsequent operations. As described above, the sound field analysis unit 44 may perform a sound field analysis with respect to the HOA coefficients 11 and/or the transformed HOA coefficients 33/35 to determine the total number of foreground channels (nFG) 45, the order of the background sound field (N_BG), and the number (nBGa) and indices (i) of additional BG HOA channels to send (which may collectively be denoted as the background channel information 43 in the example of FIG. 3) (109).

[0168] The audio encoding device 20 may also invoke the background selection unit 48. The background selection unit 48 may determine the background or environmental HOA coefficients 47 based on the background channel information 43 (110). The audio encoding device 20 may further invoke the foreground selection unit 36, which may select, based on nFG 45 (which may represent one or more indices identifying the foreground vectors), the reordered US[k] vectors 33' and the reordered V[k] vectors 35' that represent the foreground or distinct components of the sound field (112).

[0169] The audio encoding device 20 may invoke the energy compensation unit 38. The energy compensation unit 38 may perform energy compensation with respect to the environmental HOA coefficients 47 to compensate for the energy loss caused by the removal of various ones of the HOA coefficients by the background selection unit 48 (114), thereby generating the energy-compensated environmental HOA coefficients 47'.
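The idea of compensating for energy lost when coefficients are removed can be illustrated with a simple gain-matching sketch. This paragraph does not specify how the energy compensation unit 38 computes the compensation, so the single broadband gain below is an assumed stand-in, and the function names are hypothetical.

```python
import math


def energy(channels):
    """Total energy of a list of channels, each a list of samples."""
    return sum(x * x for channel in channels for x in channel)


def compensate(kept, removed):
    """Scale the kept channels so their energy matches kept + removed.

    Assumption: one gain for the whole frame. This is only an
    illustration of the energy-matching idea, not the method used
    by the energy compensation unit.
    """
    e_kept = energy(kept)
    if e_kept == 0.0:
        return [list(channel) for channel in kept]
    gain = math.sqrt((e_kept + energy(removed)) / e_kept)
    return [[x * gain for x in channel] for channel in kept]
```

After scaling, the kept channels carry the same total energy as the original set of channels did before removal.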

[0170] The audio encoding device 20 may also invoke the spatiotemporal interpolation unit 50. The spatiotemporal interpolation unit 50 may perform spatiotemporal interpolation with respect to the reordered transformed HOA coefficients 33'/35' to obtain the interpolated foreground signals 49' (which may also be referred to as the "interpolated nFG signals 49'") and the remaining foreground directivity information 53 (which may also be referred to as the "V[k] vectors 53") (116). The audio encoding device 20 may then invoke the coefficient reduction unit 46. The coefficient reduction unit 46 may perform coefficient reduction with respect to the remaining foreground V[k] vectors 53 based on the background channel information 43 to obtain the reduced foreground directivity information 55 (which may also be referred to as the reduced foreground V[k] vectors 55) (118).

[0171] The audio encoding device 20 may then invoke the quantization unit 52 to compress the reduced foreground V[k] vectors 55 in the manner described above and generate the coded foreground V[k] vectors 57 (120).

[0172] The audio encoding device 20 may also invoke the psychoacoustic audio coder unit 40. The psychoacoustic audio coder unit 40 may psychoacoustically code each vector of the energy-compensated environmental HOA coefficients 47' and the interpolated nFG signals 49' to generate the encoded environmental HOA coefficients 59 and the encoded nFG signals 61. The audio encoding device may then invoke the bitstream generation unit 42. The bitstream generation unit 42 may generate the bitstream 21 based on the coded foreground directivity information 57, the coded environmental HOA coefficients 59, the coded nFG signals 61, and the background channel information 43.

[0173] FIG. 6 is a flowchart illustrating exemplary operation of an audio decoding device, such as the audio decoding device 24 shown in FIG. 4, in performing various aspects of the techniques described in this disclosure. Initially, the audio decoding device 24 may receive the bitstream 21 (130). Upon receiving the bitstream, the audio decoding device 24 may invoke the extraction unit 72. Assuming, for purposes of discussion, that the bitstream 21 indicates that vector-based reconstruction is to be performed, the extraction unit 72 may parse the bitstream to retrieve the information described above and pass that information to the vector-based reconstruction unit 92.

[0174] In other words, the extraction unit 72 may extract, from the bitstream 21 in the manner described above, the coded foreground directivity information 57 (which, again, may also be referred to as the coded foreground V[k] vectors 57), the coded environmental HOA coefficients 59, and the coded foreground signals (which may also be referred to as the coded foreground nFG signals 61 or the coded foreground audio objects 61) (132).

[0175] The audio decoding device 24 may further invoke the inverse quantization unit 74. The inverse quantization unit 74 may entropy decode and inverse quantize the coded foreground directivity information 57 to obtain the reduced foreground directivity information 55_k (136). The audio decoding device 24 may also invoke the psychoacoustic decoding unit 80. The psychoacoustic decoding unit 80 may decode the encoded environmental HOA coefficients 59 and the encoded foreground signals 61 to obtain the energy-compensated environmental HOA coefficients 47' and the interpolated foreground signals 49' (138). The psychoacoustic decoding unit 80 may pass the energy-compensated environmental HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground organization unit 78.

[0176] The audio decoding device 24 may next invoke the spatiotemporal interpolation unit 76. The spatiotemporal interpolation unit 76 may receive the reordered foreground directivity information 55_k' and perform spatiotemporal interpolation with respect to the reduced foreground directivity information 55_k/55_k-1 to generate the interpolated foreground directivity information 55_k'' (140). The spatiotemporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k'' to the fade unit 770.

[0177] The audio decoding device 24 may invoke the fade unit 770. The fade unit 770 may receive or otherwise obtain (for example, from the extraction unit 72) a syntax element indicating when the energy-compensated environmental HOA coefficients 47' are in transition (for example, the AmbCoeffTransition syntax element). Based on the transition syntax element and the maintained transition state information, the fade unit 770 may fade in or fade out the energy-compensated environmental HOA coefficients 47' and output the adjusted environmental HOA coefficients 47'' to the HOA coefficient organization unit 82. Also based on the syntax element and the maintained transition state information, the fade unit 770 may fade out or fade in the corresponding one or more elements of the interpolated foreground V[k] vectors 55_k'' and output the adjusted foreground V[k] vectors 55_k''' to the foreground organization unit 78 (142).
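A fade applied to one frame of a transitioning coefficient might look like the following minimal sketch. The paragraph above does not state the shape of the fade, so the linear ramp, the function name, and the string-valued `direction` argument are assumptions made for illustration.

```python
def fade(samples, direction):
    """Apply a linear fade across one frame of samples.

    direction: "in" ramps the gain from 0 to 1, "out" ramps it from
    1 to 0, and any other value returns the samples unchanged.
    The linear ramp is an assumed shape, chosen for simplicity.
    """
    n = len(samples)
    if direction not in ("in", "out") or n < 2:
        return list(samples)
    faded = []
    for i, sample in enumerate(samples):
        ramp = i / (n - 1)
        gain = ramp if direction == "in" else 1.0 - ramp
        faded.append(sample * gain)
    return faded
```

A decoder following this sketch would choose `direction` per coefficient from the transition syntax element together with the transition state it keeps between frames.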

[0178] The audio decoding device 24 may invoke the foreground organization unit 78. The foreground organization unit 78 may perform a matrix multiplication of the nFG signals 49' by the adjusted foreground directivity information 55_k''' to obtain the foreground HOA coefficients 65 (144). The audio decoding device 24 may also invoke the HOA coefficient organization unit 82. The HOA coefficient organization unit 82 may add the foreground HOA coefficients 65 to the adjusted environmental HOA coefficients 47'' to obtain the HOA coefficients 11' (146).
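The two steps above, a matrix multiplication followed by an element-wise addition, can be sketched as follows. The matrix orientations (V vectors stored as coefficients by foreground signals, signals stored as foreground signals by samples) and the function names are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def reconstruct_hoa(v_vectors, nfg_signals, ambient_hoa):
    """Form HOA coefficients from foreground and ambient parts.

    v_vectors:   (num_coeffs x nFG) adjusted foreground V[k] vectors
    nfg_signals: (nFG x num_samples) interpolated nFG signals
    ambient_hoa: (num_coeffs x num_samples) adjusted ambient coefficients
    """
    foreground = matmul(v_vectors, nfg_signals)        # step 144
    return [[f + a for f, a in zip(f_row, a_row)]      # step 146
            for f_row, a_row in zip(foreground, ambient_hoa)]
```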

[0179] FIG. 7 is a flowchart illustrating exemplary operation of a system, such as the system 10 shown in the example of FIG. 2, in performing various aspects of the techniques described in this disclosure. As discussed above, the content creator device 12 may employ the audio editing system 18 to create or edit captured or generated audio content (which is shown as the HOA coefficients 11 in the example of FIG. 2). The content creator device 12 may then render the HOA coefficients 11 using the audio renderer 1 to generate multi-channel speaker feeds, as discussed in more detail above (200). The content creator device 12 may then play these speaker feeds using an audio playback system and determine whether further adjustment or editing is required to capture, as one example, the desired artistic intent (202). When further adjustment is desired ("YES" 202), the content creator device 12 may remix the HOA coefficients (204), render the HOA coefficients (200), and determine whether further adjustment is necessary (202). When further adjustment is not desired ("NO" 202), the audio encoding device 20 may generate the bitstream 21 in the manner described above with respect to the example of FIG. 5 (206). The audio encoding device 20 may also generate and specify the audio rendering information 2 in the bitstream 21, as described in more detail above (208).

[0180] The content consumer device 14 may then obtain the audio rendering information 2 from the bitstream 21 (210). The decoding device 24 may then decode the bitstream 21 to obtain the audio content (which is shown as the HOA coefficients 11' in the example of FIG. 2) in the manner described above with respect to the example of FIG. 5. The audio playback system 16 may then render the HOA coefficients 11' based on the audio rendering information 2 in the manner described above (212) and play the rendered audio content via the loudspeakers 3 (214).

[0181] The techniques described in this disclosure may therefore enable, as a first example, a device that generates a bitstream representing multi-channel audio content to specify audio rendering information. In this first example, the device may include means for specifying audio rendering information that includes a signal value identifying an audio renderer used when generating the multi-channel audio content.

[0182] The device of the first example, wherein the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0183] In a second example, the device of the first example, wherein the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0184] The device of the second example, wherein the audio rendering information further includes two or more bits that define a number of rows of the matrix included in the bitstream and two or more bits that define a number of columns of the matrix included in the bitstream.
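On the encoding side, the index and the row and column counts could be written as fixed-width fields. The sketch below is an assumption-laden illustration: the 4-bit index width, the 5-bit size width, the field order, and the function names are hypothetical and not taken from this disclosure.

```python
def pack_fields(fields):
    """Pack (value, nbits) pairs MSB-first into a string of '0'/'1' characters."""
    bits = ""
    for value, nbits in fields:
        if not 0 <= value < (1 << nbits):
            raise ValueError("value does not fit in the field")
        bits += format(value, "0{}b".format(nbits))
    return bits


def pack_rendering_header(index, rows, cols, index_bits=4, size_bits=5):
    """Write the index, then the row count, then the column count.

    The field widths and ordering are assumptions for this sketch.
    """
    return pack_fields([(index, index_bits),
                        (rows, size_bits),
                        (cols, size_bits)])
```

For instance, under these assumed widths, `pack_rendering_header(0, 5, 16)` yields the 14-bit string `00000010110000`.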

[0185] The device of the first example, wherein the signal value specifies a rendering algorithm used to render audio objects to a plurality of speaker feeds.

[0186] The device of the first example, wherein the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0187] The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0188] The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render audio objects to a plurality of speaker feeds.

[0189] The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of speaker feeds.

[0190] The device of the first example, wherein the means for specifying the audio rendering information comprises means for specifying the audio rendering information on a per-audio-frame basis in the bitstream.

[0191] The device of the first example, wherein the means for specifying the audio rendering information comprises means for specifying the audio rendering information a single time in the bitstream.

[0192] In a third example, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed, cause one or more processors to specify audio rendering information in a bitstream, wherein the audio rendering information identifies an audio renderer used when generating multi-channel audio content.

[0193] In a fourth example, a device for rendering multi-channel audio content from a bitstream comprises means for determining audio rendering information that includes a signal value identifying an audio renderer used when generating the multi-channel audio content, and means for rendering a plurality of speaker feeds based on the audio rendering information specified in the bitstream.

[0194] The device of the fourth example, wherein the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds based on the matrix.

[0195] In a fifth example, the device of the fourth example, wherein the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds, wherein the device further comprises means for parsing the matrix from the bitstream in response to the index, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds based on the parsed matrix.

[0196] The device of the fifth example, wherein the signal value further includes two or more bits that define a number of rows of the matrix included in the bitstream and two or more bits that define a number of columns of the matrix included in the bitstream, and wherein the means for parsing the matrix from the bitstream comprises means for parsing the matrix from the bitstream in response to the index and based on the two or more bits that define the number of rows and the two or more bits that define the number of columns.

[0197] The device of the fourth example, wherein the signal value specifies a rendering algorithm used to render audio objects to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the audio objects using the specified rendering algorithm.

[0198] The device of the fourth example, wherein the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the specified rendering algorithm.

[0199] The device of the fourth example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the one of the plurality of matrices associated with the index.
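Selecting one of several rendering matrices by index and applying it can be sketched as follows. The table contents, the matrix orientation (speakers by coefficients), and the function name are hypothetical; a real playback system would hold renderer matrices suited to its loudspeaker layout.

```python
def render_with_index(index, sh_frame, matrix_table):
    """Render spherical harmonic coefficients with the matrix chosen by index.

    matrix_table: dict mapping an index to a (speakers x coeffs) matrix
    sh_frame:     (coeffs x samples) spherical harmonic coefficients
    Returns a (speakers x samples) list of speaker feeds.
    """
    if index not in matrix_table:
        raise KeyError("no rendering matrix is associated with this index")
    matrix = matrix_table[index]
    return [[sum(row[k] * sh_frame[k][j] for k in range(len(sh_frame)))
             for j in range(len(sh_frame[0]))]
            for row in matrix]
```

Because both the encoder and the decoder are assumed to share the table, only the few bits of the index need to be carried in the bitstream.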

[0200] The device of the fourth example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render audio objects to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the audio objects using the one of the plurality of rendering algorithms associated with the index.

[0201] The device of the fourth example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of speaker feeds, and wherein the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the one of the plurality of rendering algorithms associated with the index.

[0202] The device of the fourth example, wherein the means for determining the audio rendering information includes means for determining the audio rendering information on a per-audio-frame basis from the bitstream.

[0203] The device of the fourth example, wherein the means for determining the audio rendering information includes means for determining the audio rendering information a single time from the bitstream.

[0204] In a sixth example, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed, cause one or more processors to determine audio rendering information that includes a signal value identifying an audio renderer used when generating multi-channel audio content, and to render a plurality of speaker feeds based on the audio rendering information specified in the bitstream.

[0205] FIGS. 8A-8D are diagrams illustrating bitstreams 21A-21D formed in accordance with the techniques described in this disclosure. In the example of FIG. 8A, the bitstream 21A may represent one example of the bitstream 21 shown in FIGS. 2-4 above. The bitstream 21A includes audio rendering information 2A, which includes one or more bits defining a signal value 554. This signal value 554 may represent any combination of the types of information described below. The bitstream 21A also includes audio content 558, which may represent one example of the audio content 7/9.

[0206] In the example of FIG. 8B, bitstream 21B may be similar to bitstream 21A, where the signal value 554 of the audio rendering information 2B comprises an index 554A, one or more bits defining a row size 554B of the signaled matrix, one or more bits defining a column size 554C of the signaled matrix, and matrix coefficients 554D. The index 554A may be defined using two to five bits, while each of the row size 554B and the column size 554C may be defined using two to sixteen bits.

[0207] The extraction unit 72 may extract the index 554A and determine whether the index signals that a matrix is included in the bitstream 21 (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bitstream 21B). In the example of FIG. 8B, bitstream 21B includes an index 554A signaling that the matrix is explicitly specified in bitstream 21B. As a result, the extraction unit 72 may extract the row size 554B and the column size 554C. The extraction unit 72 may be configured to compute the number of bits to parse that represent the matrix coefficients as a function of the row size 554B, the column size 554C, and a signaled (not shown in FIG. 8A) or implicit bit size of each matrix coefficient. Using the determined number of bits, the extraction unit 72 may extract the matrix coefficients 554D, which the audio playback system 16 may use to configure one of the audio renderers 22 as described above. Although shown as signaling the audio rendering information 2B a single time in bitstream 21B, the audio rendering information 2B may be signaled multiple times in the bitstream, or at least partially or fully in a separate out-of-band channel (as optional data in some instances).
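The parsing steps described for bitstream 21B can be sketched as follows. This is only an illustrative sketch: the field widths (4-bit index, 8-bit sizes, 16-bit coefficients), the escape value `0b0000`, and the names `BitReader`, `RenderingInfo`, `coefficient_bits`, and `parse_rendering_info` are assumptions made for the example and are not defined by the disclosure. Coefficients are returned as raw unsigned quantized values.

```python
from dataclasses import dataclass
from typing import List, Optional


class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current read position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Hypothetical field widths, chosen only for illustration.
INDEX_BITS = 4             # inside the 2-5 bit range given for index 554A
SIZE_BITS = 8              # inside the 2-16 bit range given for sizes 554B/554C
COEFF_BITS = 16            # assumed implicit bit size of each coefficient
EXPLICIT_MATRIX = 0b0000   # assumed index value meaning "matrix follows"


@dataclass
class RenderingInfo:
    index: int
    matrix: Optional[List[List[int]]]  # None when no matrix is signaled


def coefficient_bits(rows: int, cols: int, coeff_bits: int = COEFF_BITS) -> int:
    """Number of bits to parse for the coefficients: rows * cols * bit size."""
    return rows * cols * coeff_bits


def parse_rendering_info(reader: BitReader) -> RenderingInfo:
    index = reader.read(INDEX_BITS)
    if index != EXPLICIT_MATRIX:
        # The index refers to a renderer; no matrix is carried explicitly.
        return RenderingInfo(index, None)
    rows = reader.read(SIZE_BITS)
    cols = reader.read(SIZE_BITS)
    total_bits = coefficient_bits(rows, cols)
    flat = [reader.read(COEFF_BITS) for _ in range(total_bits // COEFF_BITS)]
    matrix = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    return RenderingInfo(index, matrix)
```

With these assumed widths, a 2 x 3 matrix occupies 2 * 3 * 16 = 96 coefficient bits after the 4-bit index and the two 8-bit size fields.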

[0208] In the example of FIG. 8C, bitstream 21C may represent one example of the bitstream 21 shown in FIGS. 2-4 above. Bitstream 21C includes audio rendering information 2C that includes a signal value 554, which in this example specifies an algorithm index 554E. Bitstream 21C also includes audio content 558. As noted above, the algorithm index 554E may be defined using two to five bits, and this algorithm index 554E may identify a rendering algorithm to be used when rendering the audio content 558.

[0209] The extraction unit 72 may extract the algorithm index 554E and determine whether the algorithm index 554E signals that a matrix is included in bitstream 21C (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in the bitstream 21). In the example of FIG. 8C, bitstream 21C includes an algorithm index 554E signaling that a matrix is not explicitly specified in bitstream 21C. As a result, the extraction unit 72 may forward the algorithm index 554E to the audio playback device 16, which selects the corresponding one (if available) of the rendering algorithms (shown as renderers 22 in the examples of FIGS. 2-4). Although FIG. 8C shows the audio rendering information 2C being signaled a single time in bitstream 21C, the audio rendering information 2C may be signaled multiple times in bitstream 21C, or at least partially or fully in a separate out-of-band channel (as optional data in some instances).
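The selection step on the playback side can be illustrated with a small lookup. This is a hedged sketch only: the table contents, the renderer names, the escape value, and the fallback behavior when no matching renderer is available are all assumptions for illustration; the text above says only that the corresponding renderer is selected "if available".

```python
from typing import Dict

EXPLICIT_MATRIX_INDEX = 0b0000  # assumed value meaning "matrix is in the bitstream"

# Hypothetical table of rendering algorithms known to the playback system.
AVAILABLE_RENDERERS: Dict[int, str] = {
    0b0001: "renderer_a",
    0b0010: "renderer_b",
}


def select_renderer(
    algorithm_index: int,
    available: Dict[int, str] = AVAILABLE_RENDERERS,
    fallback: str = "default_renderer",
) -> str:
    """Return the renderer matching the signaled index, if the system has it."""
    if algorithm_index == EXPLICIT_MATRIX_INDEX:
        # In this case the matrix must be parsed from the bitstream instead.
        raise ValueError("index signals an explicitly specified matrix")
    # Falling back to a default renderer is an assumption of this sketch.
    return available.get(algorithm_index, fallback)
```

For example, `select_renderer(0b0001)` returns the hypothetical `"renderer_a"`, while an index with no table entry yields the assumed fallback.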

[0210] In the example of FIG. 8D, bitstream 21D may represent one example of the bitstream 21 shown in FIGS. 2-4 above. Bitstream 21D includes audio rendering information 2D that includes a signal value 554, which in this example specifies a matrix index 554F. Bitstream 21D also includes audio content 558. As noted above, the matrix index 554F may be defined using two to five bits, and this matrix index 554F may identify the rendering algorithm to be used when rendering the audio content 558.

[0211] The extraction unit 72 may extract the matrix index 554F and determine whether the matrix index 554F signals that a matrix is included in bitstream 21D (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bitstream 21C). In the example of FIG. 8D, bitstream 21D includes a matrix index 554F signaling that a matrix is not explicitly specified in bitstream 21D. As a result, the extraction unit 72 may forward the matrix index 554F to the audio playback device, which selects the corresponding one (if available) of the renderers 22. Although shown as signaling the audio rendering information 2D a single time in bitstream 21D, the audio rendering information 2D may be signaled multiple times in bitstream 21D, or at least partially or fully in a separate out-of-band channel (as optional data in some instances).

[0212] FIGS. 8E-8G are diagrams illustrating, in more detail, portions of the bitstream or side channel information that may specify the compressed spatial components. FIG. 8E shows a first example of a frame 249A' of the bitstream 21. In the example of FIG. 8E, the frame 249A' includes ChannelSideInfoData (CSID) fields 154A-154C, HOAGainCorrectionData (HOAGCD) fields, and VVectorData fields 156A and 156B. The CSID field 154A includes unitC 267, bb 266, and ba 265 along with ChannelType 269, each of which is set to the corresponding values 01, 1, 0, and 01 shown in the example of FIG. 8E. The CSID field 154B likewise includes unitC 267, bb 266, and ba 265 along with ChannelType 269, each of which is set to the corresponding values 01, 1, 0, and 01 shown in the example of FIG. 8E. The CSID field 154C includes the ChannelType field 269 having a value of 3. Each of the CSID fields 154A-154C corresponds to a respective one of transport channels 1, 2, and 3. In effect, each CSID field 154A-154C indicates whether the corresponding payload 156A and 156B is a direction-based signal (when the corresponding ChannelType is equal to 0), a vector-based signal (when the corresponding ChannelType is equal to 1), an additional ambient HOA coefficient (when the corresponding ChannelType is equal to 2), or empty (when the ChannelType is equal to 3).

[0213] In the example of FIG. 8E, the frame 249A includes two vector-based signals (given that the ChannelType syntax element 269 is equal to 1 in the CSID fields 154A and 154B) and one empty channel (given that the ChannelType 269 is equal to 3 in the CSID field 154C). Based on the HOAconfig portion described above (not shown for ease of illustration), the audio decoding device 24 may determine that all 16 V-vector elements are encoded. Accordingly, each of VVectorData 156A and 156B includes all 16 vector elements, each of which is uniformly quantized with 8 bits.
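As a worked illustration of the FIG. 8E example, the ChannelType values and the resulting VVectorData size can be tallied as below. The enum and function names are hypothetical and chosen for readability; only the ChannelType values 0-3, the 16 vector elements, and the 8-bit uniform quantization come from the description.

```python
from enum import IntEnum
from typing import List


class ChannelType(IntEnum):
    DIRECTION_BASED = 0         # direction-based signal
    VECTOR_BASED = 1            # vector-based signal
    ADDITIONAL_AMBIENT_HOA = 2  # additional ambient HOA coefficient
    EMPTY = 3                   # empty transport channel


NUM_VVECTOR_ELEMENTS = 16  # all 16 V-vector elements are coded in this example
BITS_PER_ELEMENT = 8       # each element uniformly quantized with 8 bits


def vvector_payload_bits(channel_types: List[int]) -> int:
    """Total VVectorData bits carried by the vector-based channels of a frame."""
    n_vector = sum(
        1 for c in channel_types if ChannelType(c) is ChannelType.VECTOR_BASED
    )
    return n_vector * NUM_VVECTOR_ELEMENTS * BITS_PER_ELEMENT


# Transport channels 1-3 of the FIG. 8E example: two vector-based, one empty.
frame_channel_types = [1, 1, 3]
total_bits = vvector_payload_bits(frame_channel_types)  # 2 * 16 * 8 = 256
```

Each VVectorData field therefore carries 16 * 8 = 128 bits, and the two vector-based channels together carry 256 bits of V-vector data in this example.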

[0214] As further shown in the example of FIG. 8E, the frame 249A' does not include a HOAPredictionInfo field. The HOAPredictionInfo field may represent a field corresponding to a direction-based compression scheme, and may be removed in accordance with the techniques described in this disclosure when the vector-based compression scheme is used to compress the HOA audio data.

[0215] FIG. 8F is a diagram illustrating a frame 249'' that is substantially similar to the frame 249A, except that the HOAGainCorrectionData has been removed from each transport channel stored in field 249''. The HOAGainCorrectionData field may be removed from the frame 249'' when gain correction is suppressed in accordance with various aspects of the techniques described above.

[0216] FIG. 8G is a diagram illustrating a frame 249A''' that may be similar to the frame 249A'', except that the HOAPredictionInfo field is removed. The frame 249A''' represents one example in which both aspects of the techniques may be applied in combination to remove various fields that may not be necessary in certain circumstances.

[0217] The foregoing techniques may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to these example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.

[0218] The movie studios, the music studios, and the gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (e.g., in 2.0, 5.1, and 7.1), such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (e.g., in 2.0 and 5.1), such as by using a DAW. In either case, the coding engines may receive the channel-based audio content and encode it based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems. Another example context in which the techniques may be performed comprises an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, the HOA audio format, on-device rendering, consumer audio, TV and accessories, and car audio systems.

[0219] The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using the HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system, such as the audio playback system 16 (i.e., as opposed to requiring a particular configuration such as 5.1 or 7.1).

[0220] Other examples of contexts in which the techniques may be performed include an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capture, and mobile devices (e.g., smartphones and tablets). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channels.

[0221] In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For example, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired sound field into HOA coefficients for playback by one or more of the playback elements. For example, a user of the mobile device may record a live event (e.g., a meeting, a conference, a play, a concert), that is, acquire its sound field, and code the recording into HOA coefficients.

[0222] The mobile device may also use one or more of the playback elements to play back the HOA-coded sound field. For example, the mobile device may decode the HOA-coded sound field and output a signal to one or more of the playback elements that causes the one or more playback elements to recreate the sound field. As one example, the mobile device may use wired and/or wireless communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars). As another example, the mobile device may use docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or smart homes). As another example, the mobile device may use headphone rendering to output the signal to a set of headphones, for example to create realistic binaural sound.

[0223] In some examples, a particular mobile device may both acquire a 3D sound field and play back the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.

[0224] Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs that may support editing of HOA signals. For instance, the one or more DAWs may include HOA plug-ins and/or tools that may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output the coded audio content to the rendering engines, which may render a sound field for playback by the delivery systems.

[0225] The techniques may also be performed with respect to example audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone, which may include a plurality of microphones that are collectively configured to record a 3D sound field. In some examples, the plurality of microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm. In some examples, the audio encoding device 20 may be integrated into the Eigen microphone so as to output a bitstream 21 directly from the microphone.

[0226] Another example audio acquisition context may include a production truck that may be configured to receive a signal from one or more microphones, such as one or more Eigen microphones. The production truck may also include an audio encoder, such as the audio encoder 20 of FIG. 3.

[0227] The mobile device may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone that may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as the audio encoder 20 of FIG. 3.

[0228] A ruggedized video capture device may further be configured to record a 3D sound field. In some examples, the ruggedized video capture device may be attached to the helmet of a user engaged in an activity. For instance, the ruggedized video capture device may be attached to the helmet of a user who is whitewater rafting. In this way, the ruggedized video capture device may capture a 3D sound field that represents the action all around the user (e.g., water crashing behind the user, another rafter speaking in front of the user).

[0229] The techniques may also be performed with respect to an accessory-enhanced mobile device, which may be configured to record a 3D sound field. In some examples, the mobile device may be similar to the mobile devices discussed above, with the addition of one or more accessories. For instance, an Eigen microphone may be attached to the above-noted mobile device to form an accessory-enhanced mobile device. In this way, the accessory-enhanced mobile device may capture a higher-quality version of the 3D sound field than it could using only the sound capture components integral to the accessory-enhanced mobile device.

[0230] Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. Moreover, in some examples, headphone playback devices may be coupled to the decoder 24 via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be used to render the sound field on any combination of the speakers, the sound bars, and the headphone playback devices.

[0231] A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For instance, a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full-height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with an earbud playback environment may be suitable environments for performing various aspects of the techniques described in this disclosure.

[0232] In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be used to render the sound field in any of the foregoing playback environments. In addition, the techniques of this disclosure enable a renderer to render a sound field from the generic representation for playback in playback environments other than those described above. For instance, if design considerations prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., if it is not possible to place a right surround speaker), the techniques of this disclosure enable the renderer to compensate with the other six speakers such that playback may be achieved in a 6.1 speaker playback environment.
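The idea that one generic representation can be rendered to whatever speakers are present can be sketched with a toy example. The sketch below is an assumption-laden stand-in, not the renderer of this disclosure: it uses horizontal-only, first-order coefficients and a basic projection ("sampling") decoder, the speaker azimuths are hypothetical, the LFE channel is ignored, and the gain normalization (including the factor of 2) is one possible convention among several. A projection decoder is also not optimal for irregular layouts; it is used here only because it is short.

```python
import math
from typing import List, Sequence


def encode_first_order(azimuth_deg: float) -> List[float]:
    """Horizontal first-order coefficients [W, X, Y] of a unit plane wave."""
    a = math.radians(azimuth_deg)
    return [1.0, math.cos(a), math.sin(a)]


def rendering_matrix(speaker_azimuths_deg: Sequence[float]) -> List[List[float]]:
    """One row per speaker; built from the layout that is actually present."""
    n = len(speaker_azimuths_deg)
    rows = []
    for az in speaker_azimuths_deg:
        a = math.radians(az)
        rows.append([1.0 / n, 2.0 * math.cos(a) / n, 2.0 * math.sin(a) / n])
    return rows


def render(matrix: List[List[float]], coeffs: Sequence[float]) -> List[float]:
    """Speaker feeds = rendering matrix times coefficient vector."""
    return [sum(m * c for m, c in zip(row, coeffs)) for row in matrix]


# Hypothetical azimuths (degrees, positive to the left) for seven main speakers.
seven_speakers = [0.0, 30.0, -30.0, 90.0, -90.0, 150.0, -150.0]
# The same room when the right surround speaker (-90) cannot be placed.
six_speakers = [az for az in seven_speakers if az != -90.0]

coeffs = encode_first_order(30.0)  # the same representation for both layouts
feeds_seven = render(rendering_matrix(seven_speakers), coeffs)
feeds_six = render(rendering_matrix(six_speakers), coeffs)
```

The same coefficient vector is used for both layouts; only the rendering matrix changes with the speakers that are actually available, which is the property the paragraph above relies on.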

[0233] Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sports game may be acquired (e.g., one or more Eigen microphones may be placed in and/or around the baseball stadium), HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder, the decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer, and the renderer may obtain an indication of the type of playback environment (e.g., headphones) and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sports game.

[0234] In each of the various instances described above, it should be understood that the audio encoding device 20 may perform a method, or otherwise comprise means to perform each step of the method that the audio encoding device 20 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored on a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio encoding device 20 has been configured to perform.

[0235] In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

[0236] Likewise, in each of the various instances described above, it should be understood that the audio decoding device 24 may perform a method, or otherwise comprise means to perform each step of the method that the audio decoding device 24 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored on a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio decoding device 24 has been configured to perform.

[0237] By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0238] The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or to any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

[0239] The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

[0240] Various aspects of the disclosure have been described. These and other aspects of the techniques are within the scope of the following claims.
The inventions recited in the claims of the present application as originally filed are appended below.
[C1]
A device configured to render higher order ambisonic coefficients, the device comprising:
One or more processors configured to obtain sign symmetry information indicative of sign symmetry of a matrix used to render the higher order ambisonic coefficients to generate a plurality of speaker feeds; and
A memory configured to store the sparseness information.
[C2]
The device of C1, wherein the one or more processors are further configured to determine value symmetry information indicative of value symmetry of the matrix, and to determine, based on the value symmetry information and the sign symmetry information, a reduced number of bits used to represent the matrix.
[C3]
The device of C1, wherein the one or more processors are further configured to determine sparseness information indicative of sparseness of the matrix, and to determine, based on the sparseness information and the sign symmetry information, a reduced number of bits used to represent the matrix.
[C4]
The device of C1, wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.
[C5]
The device of C1, further comprising a speaker configured to reproduce, based on the plurality of speaker feeds, a sound field represented by the higher order ambisonic coefficients.
[C6]
The device of C1, wherein the one or more processors are further configured to obtain audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds, and to render the plurality of speaker feeds based on the audio rendering information.
[C7]
The device of C6, wherein the signal value includes the matrix used to render the higher order ambisonic coefficients to the plurality of speaker feeds, and
Wherein the one or more processors are configured to render the plurality of speaker feeds based on the matrix included in the signal value.
[C8]
A method of rendering higher order ambisonic coefficients, the method comprising:
Obtaining sign symmetry information indicative of sign symmetry of a matrix used to render the higher order ambisonic coefficients to generate a plurality of speaker feeds.
[C9]
The method of C8, further comprising:
Determining value symmetry information indicative of value symmetry of the matrix; and
Determining, based on the value symmetry information and the sign symmetry information, a reduced number of bits used to represent the matrix.
[C10]
The method of C8, further comprising:
Determining sparseness information indicative of sparseness of the matrix; and
Determining, based on the sparseness information and the sign symmetry information, a reduced number of bits used to represent the matrix.
[C11]
The method of C8, further comprising determining a speaker layout for which the matrix is to be used to render the multi-channel audio data from the higher order ambisonic coefficients.
[C12]
The method of C8, further comprising reproducing, based on the plurality of speaker feeds, a sound field represented by the higher order ambisonic coefficients.
[C13]
The method of C8, further comprising:
Obtaining audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds; and
Rendering the plurality of speaker feeds based on the audio rendering information.
[C14]
The method of C13, wherein the signal value includes the matrix used to render the higher order ambisonic coefficients to generate the plurality of speaker feeds, the method further comprising rendering the plurality of speaker feeds based on the matrix included in the signal value.
[C15]
A device configured to create a bitstream, the device comprising:
A memory configured to store a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds; and
One or more processors configured to obtain sign symmetry information indicative of sign symmetry of the matrix.
[C16]
The device of C15, wherein the one or more processors are further configured to determine value symmetry information indicative of value symmetry of the matrix, and to reduce, based on the value symmetry information and the sign symmetry information, the number of bits representing the matrix.
[C17]
The device of C15, wherein the one or more processors are further configured to determine sign symmetry information indicative of sign symmetry of the matrix, and to reduce, based on the sign symmetry information, the number of bits representing the matrix.
[C18]
The device of C15, wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.
[C19]
The device of C15, further comprising a microphone configured to capture a sound field represented by the higher order ambisonic coefficients.
[C20]
A method of creating a bitstream, the method comprising:
Obtaining sparseness information indicative of sparseness of a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds.
[C21]
The method of C20, further comprising:
Determining value symmetry information indicative of value symmetry of the matrix; and
Reducing, based on the value symmetry information and the sign symmetry information, the number of bits representing the matrix.
[C22]
The method of C20, further comprising determining sign symmetry information indicative of sign symmetry of the matrix, and reducing, based on the sign symmetry information, the number of bits representing the matrix.
[C23]
The method of C20, further comprising determining a speaker layout for which the matrix is to be used to render the plurality of multi-channel audio data from the higher order ambisonic coefficients.
[C24]
The method of C20, further comprising capturing a sound field represented by the higher order ambisonic coefficients.
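
As a non-limiting illustration of how value symmetry, sign symmetry, and sparseness can lower the number of bits used to represent a rendering matrix (compare C16, C21, and C3), the following Python sketch contrasts a dense bit count with a reduced one. Everything in it is an assumption made for illustration and is not the bitstream syntax of this disclosure: the function names, the row-per-speaker and column-per-coefficient layout, the `pairs` list of mirrored speaker rows, the `flip` list marking columns whose sign inverts between mirrored speakers, the one-bit zero flag per entry, the 8-bit magnitudes, and the example matrix.

```python
# Toy bit-count model. Illustrative assumption only, not the disclosed syntax.
from typing import List, Sequence, Tuple

Matrix = List[List[float]]          # rows = speakers, columns = HOA coefficients
Pairs = Sequence[Tuple[int, int]]   # hypothetical (base_row, mirrored_row) pairs


def sign(x: float) -> int:
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def has_value_symmetry(m: Matrix, pairs: Pairs) -> bool:
    """True if each mirrored row has the same magnitudes as its base row."""
    return all(abs(abs(x) - abs(y)) < 1e-9
               for a, b in pairs for x, y in zip(m[a], m[b]))


def has_sign_symmetry(m: Matrix, pairs: Pairs, flip: Sequence[bool]) -> bool:
    """True if mirrored-row signs follow from the base row and the flip pattern."""
    return all(sign(m[b][c]) == (-1 if flip[c] else 1) * sign(m[a][c])
               for a, b in pairs for c in range(len(flip)))


def dense_bits(m: Matrix, mag_bits: int = 8) -> int:
    """Bits needed when every entry carries a magnitude and one sign bit."""
    return sum(len(row) for row in m) * (mag_bits + 1)


def reduced_bits(m: Matrix, pairs: Pairs, flip: Sequence[bool],
                 mag_bits: int = 8) -> int:
    """Bits needed when zeros, mirrored magnitudes, and mirrored signs are skipped."""
    value_sym = has_value_symmetry(m, pairs)
    sign_sym = has_sign_symmetry(m, pairs, flip)
    mirrored = {b for _, b in pairs}
    total = 2  # one flag bit each for value symmetry and sign symmetry
    for r, row in enumerate(m):
        for x in row:
            total += 1  # sparseness flag: is this entry zero?
            if x == 0:
                continue
            if not (value_sym and r in mirrored):
                total += mag_bits  # magnitude must be sent
            if not (sign_sym and r in mirrored):
                total += 1  # sign bit must be sent
    return total


# Hypothetical 3-speaker (left, right, center), 4-coefficient matrix.
EXAMPLE_MATRIX = [[0.5, 0.3, 0.0, 0.2],
                  [0.5, -0.3, 0.0, 0.2],
                  [0.4, 0.0, 0.0, 0.6]]
EXAMPLE_PAIRS = [(0, 1)]                     # right row mirrors left row
EXAMPLE_FLIP = [False, True, False, False]   # only column 1 inverts its sign
```

Under these assumptions, `dense_bits(EXAMPLE_MATRIX)` gives 108 bits, while `reduced_bits(EXAMPLE_MATRIX, EXAMPLE_PAIRS, EXAMPLE_FLIP)` gives 59 bits, because the mirrored row costs only its four zero flags.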

Claims

A device configured to render higher order ambisonic coefficients, the device comprising:
One or more processors configured to:
Obtain, from a bitstream that includes an encoded version of the higher order ambisonic coefficients, sparseness information indicative of sparseness of a matrix used to render the higher order ambisonic coefficients to generate a plurality of speaker feeds;
Obtain, from the bitstream, sign symmetry information indicative of sign symmetry of the matrix;
Obtain, from the bitstream, a reduced number of bits used to represent the matrix; and
Reconstruct the matrix based on the sparseness information, the sign symmetry information, and the reduced number of bits; and
A memory coupled to the one or more processors and configured to store the sparseness information.

The device of claim 1, wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.

The device of claim 1, further comprising a speaker configured to reproduce, based on the plurality of speaker feeds, a sound field represented by the higher order ambisonic coefficients.

The device of claim 1, wherein the one or more processors are further configured to obtain, from the bitstream, audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds, and to render the plurality of speaker feeds based on the audio rendering information.

The device of claim 4, wherein the signal value includes an index associated with the matrix used to render the higher order ambisonic coefficients to the plurality of speaker feeds, and
Wherein the one or more processors are configured to render the plurality of speaker feeds based on the matrix associated with the index included in the signal value.

A method of rendering higher order ambisonic coefficients, the method comprising:
Obtaining, from a bitstream that includes an encoded version of the higher order ambisonic coefficients, sparseness information indicative of sparseness of a matrix used to render the higher order ambisonic coefficients to generate a plurality of speaker feeds;
Obtaining, from the bitstream, sign symmetry information indicative of sign symmetry of the matrix;
Obtaining, from the bitstream, a reduced number of bits used to represent the matrix; and
Reconstructing the matrix based on the sparseness information, the sign symmetry information, and the reduced number of bits.

The method of claim 6, further comprising determining a speaker layout for which the matrix is to be used to render multi-channel audio data from the higher order ambisonic coefficients.

The method of claim 6, further comprising reproducing, based on the plurality of speaker feeds, a sound field represented by the higher order ambisonic coefficients.

The method of claim 6, further comprising:
Obtaining, from the bitstream, audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds; and
Rendering the plurality of speaker feeds based on the audio rendering information.

The method of claim 9, wherein the signal value includes an index associated with the matrix used to render the higher order ambisonic coefficients to generate the plurality of speaker feeds, the method further comprising rendering the plurality of speaker feeds based on the matrix associated with the index included in the signal value.

A device configured to create a bitstream, the device comprising:
A memory configured to store a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds; and
One or more processors coupled to the memory and configured to:
Obtain sign symmetry information indicative of sign symmetry of the matrix;
Obtain sparseness information indicative of sparseness of the matrix;
Determine, based on the sign symmetry information and the sparseness information, a reduced number of bits used to represent the matrix; and
Generate the bitstream to include an encoded version of the higher order ambisonic coefficients, the sign symmetry information, the sparseness information, and the reduced number of bits.

The device of claim 11, wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.

The device of claim 11, further comprising a microphone configured to capture a sound field represented by the higher order ambisonic coefficients.

A method of creating a bitstream, the method comprising:
Obtaining sparseness information indicative of sparseness of a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds;
Obtaining sign symmetry information indicative of sign symmetry of the matrix;
Determining, based on the sign symmetry information and the sparseness information, a reduced number of bits used to represent the matrix; and
Generating the bitstream to include an encoded version of the higher order ambisonic coefficients, the sign symmetry information, the sparseness information, and the reduced number of bits.

The method of claim 14, further comprising determining a speaker layout for which the matrix is to be used to render multi-channel audio data from the higher order ambisonic coefficients.
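
As a non-limiting illustration of the bitstream creation and matrix reconstruction recited above, the following Python sketch packs a rendering matrix into a payload that carries sparseness information, a sign symmetry flag, and only those sign bits that cannot be derived, and then rebuilds the matrix from that payload. It is a sketch under stated assumptions, not the disclosed bitstream format: the names `encode_matrix` and `decode_matrix`, the dictionary payload, the `pairs` list of (base row, mirrored row) speaker indices with each base row preceding its mirrored row, and the `flip` list of columns whose sign inverts between mirrored speakers are all hypothetical.

```python
# Toy encoder/decoder round trip. Illustrative assumption only.
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Pairs = Sequence[Tuple[int, int]]  # hypothetical (base_row, mirrored_row), base first


def sign(x: float) -> int:
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def encode_matrix(m: Matrix, pairs: Pairs, flip: Sequence[bool]) -> Dict:
    """Build a payload with sparseness info, a sign symmetry flag, and data."""
    mirrored = {b for _, b in pairs}
    sign_sym = all(sign(m[b][c]) == (-1 if flip[c] else 1) * sign(m[a][c])
                   for a, b in pairs for c in range(len(flip)))
    # Sparseness information: True where an entry is nonzero.
    mask = [[x != 0 for x in row] for row in m]
    # Magnitudes of nonzero entries, in row-major order.
    mags = [abs(x) for row in m for x in row if x != 0]
    # Sign bits are omitted for mirrored rows when sign symmetry holds.
    signs = [x < 0 for r, row in enumerate(m) for x in row
             if x != 0 and not (sign_sym and r in mirrored)]
    return {"mask": mask, "sign_symmetry": sign_sym, "mags": mags, "signs": signs}


def decode_matrix(payload: Dict, pairs: Pairs, flip: Sequence[bool]) -> Matrix:
    """Reconstruct the matrix from the payload produced by encode_matrix."""
    base_of = {b: a for a, b in pairs}
    mags = iter(payload["mags"])
    signs = iter(payload["signs"])
    out: Matrix = []
    for r, mask_row in enumerate(payload["mask"]):
        row = []
        for c, nonzero in enumerate(mask_row):
            if not nonzero:
                row.append(0.0)
                continue
            mag = next(mags)
            if payload["sign_symmetry"] and r in base_of:
                # Derive the sign from the already decoded base row.
                negative = (out[base_of[r]][c] < 0) != flip[c]
            else:
                negative = next(signs)
            row.append(-mag if negative else mag)
        out.append(row)
    return out
```

In this sketch the number of transmitted sign bits plays the role of a reduced bit count: for a matrix with one mirrored speaker pair, the mirrored row contributes no sign bits when the sign symmetry flag is set.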