KR101941764B1

KR101941764B1 - Obtaining symmetry information for higher order ambisonic audio renderers

Info

Publication number: KR101941764B1
Application number: KR1020167033118A
Authority: KR
Inventors: Nils Günther Peters; Dipanjan Sen; Martin James Morrell
Original assignee: Qualcomm Incorporated
Priority date: 2014-05-30
Filing date: 2015-05-29
Publication date: 2019-01-23
Also published as: JP2017520174A; ES2696930T3; WO2015184316A1; KR20170015898A; EP3149972A1; CN106465029B; BR112016028212A2; CA2950014A1; HUE039048T2; EP3149972B1; CA2950014C; JP6423009B2; BR112016028212B1; CN106465029A

Abstract

In general, techniques for obtaining audio rendering information in a bitstream are described. A device configured to render higher-order ambisonic coefficients, comprising a processor and a memory, may perform the techniques. The processor may be configured to obtain sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds. The memory may be configured to store the sparseness information.

Description

OBTAINING SYMMETRY INFORMATION FOR HIGHER ORDER AMBISONIC AUDIO RENDERERS

This application claims the benefit of U.S. Provisional Application No. 62/023,662, filed July 11, 2014, entitled "SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM," and of U.S. Provisional Application No. 62/005,829, filed May 30, 2014, entitled "SIGNALING AUDIO RENDERING INFORMATION IN A BITSTREAM." The entire contents of each of these U.S. provisional applications are hereby incorporated by reference as if set forth in their entirety herein.

This disclosure relates to rendering information and, more specifically, to rendering information for higher-order ambisonic (HOA) audio data.

During production of audio content, a sound engineer may render the audio content with a specific renderer in an attempt to tailor the content to the target speaker configuration that will be used to play it back. In other words, the sound engineer may render the audio content and play the rendered content back through speakers arranged in the targeted configuration. The sound engineer may then remix various aspects of the audio content, render the remixed content, and play the rendered, remixed content back again through the speakers arranged in the targeted configuration. The sound engineer may iterate in this manner until the audio content conveys a certain artistic intent. In this way, the sound engineer may produce audio content that conveys a certain artistic intent, or that otherwise provides a certain sound field, during playback (e.g., to accompany video content played along with the audio content).

In general, techniques are described for specifying audio rendering information in a bitstream representative of audio data. In other words, the techniques may provide a way to signal the audio rendering information used during audio content production to a playback device. The playback device may then use that information to render the audio content. Providing the rendering information in this manner enables the playback device to render the audio content as the sound engineer intended. This potentially ensures appropriate playback of the audio content, so that a listener is more likely to perceive the artistic intent. Because the rendering information the sound engineer used is provided in accordance with the techniques described in this disclosure, the audio playback device may use it to render the audio content as the sound engineer intended. Compared with systems that do not provide this audio rendering information, this gives a more consistent experience across both production and playback of the audio content.

In one aspect, a device configured to render higher-order ambisonic coefficients comprises one or more processors and a memory. The one or more processors are configured to obtain sparseness information indicative of a sparseness of a matrix used to render the higher-order ambisonic coefficients to a plurality of speaker feeds. The memory is configured to store the sparseness information.
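As an illustration only, the following Python sketch shows how a renderer that has obtained sparseness information might rebuild a dense rendering matrix. It makes two hypothetical assumptions. First, the sparseness information is a per-entry boolean mask, with one row per loudspeaker and one column per HOA coefficient. Second, only the non-zero gains are carried, in row-major order. The function name `expand_sparse_matrix` is illustrative and is not taken from any specification.

```python
from typing import List, Sequence


def expand_sparse_matrix(mask: Sequence[Sequence[bool]],
                         nonzero_gains: Sequence[float]) -> List[List[float]]:
    """Rebuild a dense loudspeaker-by-coefficient rendering matrix.

    Hypothetical format: mask[l][k] is True where the gain for loudspeaker l
    and HOA coefficient k is non-zero; nonzero_gains lists those gains in
    row-major order. Masked-out entries are filled with 0.0.
    """
    expected = sum(1 for row in mask for present in row if present)
    if expected != len(nonzero_gains):
        raise ValueError(f"mask marks {expected} non-zero entries, "
                         f"but {len(nonzero_gains)} gains were supplied")
    gains = iter(nonzero_gains)
    # Consume one transmitted gain for every True entry, in row-major order.
    return [[next(gains) if present else 0.0 for present in row] for row in mask]
```

For example, `expand_sparse_matrix([[True, False], [False, True]], [0.7, -0.2])` yields `[[0.7, 0.0], [0.0, -0.2]]`. Carrying a mask plus the non-zero gains is one plausible way such information could shrink a signaled matrix that contains many zeros.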

In another aspect, a method of rendering higher-order ambisonic coefficients comprises obtaining sparseness information indicative of a sparseness of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds.

In another aspect, a device configured to generate a bitstream comprises a memory and one or more processors. The memory is configured to store a matrix. The one or more processors are configured to obtain sparseness information indicative of a sparseness of the matrix, which is used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds.
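On the bitstream-generation side, the sparseness information might be derived directly from the stored matrix. The sketch below assumes, hypothetically, that the information is a boolean mask of non-zero entries plus the list of non-zero gains in row-major order. It also assumes that gains whose magnitude does not exceed a tolerance `tol` count as zero. Both the format and the function name are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def sparseness_info(matrix: Sequence[Sequence[float]],
                    tol: float = 0.0) -> Tuple[List[List[bool]], List[float]]:
    """Split a rendering matrix into a non-zero mask and its non-zero gains.

    Entries whose magnitude is <= tol are treated as zero (hypothetical rule).
    Gains are returned in row-major order (loudspeaker by loudspeaker).
    """
    mask = [[abs(gain) > tol for gain in row] for row in matrix]
    gains = [gain
             for row, row_mask in zip(matrix, mask)
             for gain, present in zip(row, row_mask)
             if present]
    return mask, gains
```

A non-zero `tol` would let an encoder treat negligible gains as zero. Whether that is acceptable would depend on how much deviation from the original renderer can be tolerated.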

In another aspect, a method of generating a bitstream comprises obtaining sparseness information indicative of a sparseness of a matrix used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds.

In another aspect, a device configured to render higher-order ambisonic coefficients comprises one or more processors and a memory. The one or more processors are configured to obtain sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds. The memory is configured to store the sparseness information.
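The sketch below illustrates one way a renderer could use sign symmetry information. It assumes, hypothetically, that the information is conveyed as one sign per HOA coefficient for a pair of loudspeakers mirrored left/right, so that only one row of the pair needs to be transmitted. It also assumes real-valued spherical harmonics in ACN channel order. Under that convention, basis functions with suborder m < 0 depend on sin(|m|φ) and therefore change sign when the azimuth φ is negated, which motivates the `mirror_signs` helper. The channel ordering and all helper names are assumptions, not taken from the source.

```python
import math
from typing import List, Sequence, Tuple


def acn_to_order_suborder(acn: int) -> Tuple[int, int]:
    """Map an ACN channel index to (order n, suborder m)."""
    n = math.isqrt(acn)
    return n, acn - n * n - n


def mirror_signs(order: int) -> List[int]:
    """Expected per-coefficient signs for a left/right mirrored loudspeaker.

    Real spherical harmonics with m < 0 contain sin(|m| * azimuth), which is
    odd in azimuth, so they flip sign; all others are unchanged.
    """
    return [-1 if acn_to_order_suborder(acn)[1] < 0 else 1
            for acn in range((order + 1) ** 2)]


def mirrored_row(row: Sequence[float], signs: Sequence[int]) -> List[float]:
    """Reconstruct the partner loudspeaker's row from one row and its signs."""
    if len(row) != len(signs):
        raise ValueError("row and sign vector must have the same length")
    return [sign * gain for sign, gain in zip(signs, row)]
```

For a first-order matrix, `mirror_signs(1)` is `[1, -1, 1, 1]`. Only the coefficient with n = 1, m = -1 (the left/right dipole) changes sign between the two mirrored rows.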

In another aspect, a method of rendering higher-order ambisonic coefficients comprises obtaining sign symmetry information indicative of a sign symmetry of a matrix used to render the higher-order ambisonic coefficients to generate a plurality of speaker feeds.

In another aspect, a device configured to generate a bitstream comprises a memory and one or more processors. The memory is configured to store a matrix used to render higher-order ambisonic coefficients to generate a plurality of speaker feeds. The one or more processors are configured to obtain sign symmetry information indicative of a sign symmetry of the matrix.
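On the encoding side, sign symmetry between two rows of a stored rendering matrix could be detected as sketched below. The sketch assumes, hypothetically, that two loudspeaker rows are "sign symmetric" when every pair of corresponding gains has equal magnitude, and it records for each HOA coefficient whether the sign is kept (+1) or flipped (-1). The tolerance and the function name are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def sign_symmetry(row_a: Sequence[float], row_b: Sequence[float],
                  tol: float = 1e-9) -> Optional[List[int]]:
    """Return signs s with row_b[c] == s[c] * row_a[c] for every coefficient c.

    Returns None when some pair of gains differs in magnitude, i.e. the two
    rows are not sign symmetric and both would have to be transmitted.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    signs = []
    for a, b in zip(row_a, row_b):
        if abs(a - b) <= tol:
            signs.append(1)    # same gain (this also covers a == b == 0)
        elif abs(a + b) <= tol:
            signs.append(-1)   # same magnitude, opposite sign
        else:
            return None        # magnitudes differ: no sign symmetry
    return signs
```

When the function returns a sign vector, an encoder could in principle send one row plus the signs instead of two full rows. That is the kind of saving sign symmetry information appears intended to enable.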

The details of one or more aspects of the techniques are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the techniques will be apparent from the description and drawings, and from the claims.

FIG. 1 is a diagram illustrating spherical harmonic basis functions of various orders and suborders.
FIG. 2 is a diagram illustrating a system that may perform various aspects of the techniques described in this disclosure.
FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device shown in the example of FIG. 2 that may perform various aspects of the techniques described in this disclosure.
FIG. 4 is a block diagram illustrating the audio decoding device of FIG. 2 in more detail.
FIG. 5 is a flowchart illustrating exemplary operation of an audio encoding device in performing various aspects of the vector-based synthesis techniques described in this disclosure.
FIG. 6 is a flowchart illustrating exemplary operation of an audio decoding device in performing various aspects of the techniques described in this disclosure.
FIG. 7 is a flowchart illustrating exemplary operation of a system, such as the system shown in the example of FIG. 2, in performing various aspects of the techniques described in this disclosure.
FIGS. 8A-8D are diagrams illustrating bitstreams formed in accordance with the techniques described in this disclosure.
FIGS. 8E-8G are diagrams illustrating portions of the bitstream or side channel information that may specify the compressed spatial components in more detail.
FIG. 9 is a diagram illustrating an example of HOA-order-dependent minimum and maximum gains within a higher-order ambisonic (HOA) rendering matrix.
FIG. 10 is a diagram illustrating a partially sparse sixth-order HOA rendering matrix for 22 loudspeakers.
FIG. 11 is a flow diagram illustrating signaling of symmetry properties.

The evolution of surround sound has made many output formats available for entertainment today. Most of these consumer surround sound formats are 'channel' based, in that they implicitly specify feeds to loudspeakers at certain geometrical coordinates. The consumer surround sound formats include the following:
- the popular 5.1 format, which includes six channels: front left (FL), front right (FR), center or front center, back left or surround left, back right or surround right, and low frequency effects (LFE);
- the growing 7.1 format;
- various formats that include height speakers, such as the 7.1.4 format and the 22.2 format (e.g., for use with the Ultra High Definition Television standard).

Non-consumer formats, often termed 'surround arrays', can span any number of speakers in symmetric or non-symmetric geometries. One example of such an array includes 32 loudspeakers positioned at coordinates on the corners of a truncated icosahedron.

The input to a future MPEG encoder is optionally one of three possible formats:
(i) traditional channel-based audio (as discussed above), which is meant to be played through loudspeakers at pre-specified positions;
(ii) object-based audio, which involves discrete pulse-code-modulation (PCM) data for single audio objects, with associated metadata containing their location coordinates (among other information);
(iii) scene-based audio, which involves representing the sound field using coefficients of spherical harmonic basis functions (also called "spherical harmonic coefficients" or SHC, "Higher-order Ambisonics" or HOA, and "HOA coefficients").

The future MPEG encoder may be described in more detail in the document entitled "Call for Proposals for 3D Audio" by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) JTC1/SC29/WG11/N13411. It was released in January 2013 in Geneva, Switzerland, and is available at http://mpeg.chiariglione.org/sites/default/files/files/standards/parts/docs/w13411.zip.

There are various 'surround-sound' channel-based formats on the market. They range, for example, from the 5.1 home theater system (the most successful format at moving beyond stereo into the living room) to the 22.2 system developed by NHK (Nippon Hoso Kyokai, the Japan Broadcasting Corporation). Content creators (e.g., Hollywood studios) would like to produce the soundtrack for a movie once, without spending effort remixing it for each speaker configuration. Recently, standards developing organizations have been considering ways to encode audio into a standardized bitstream. The subsequent decoding would be adaptable and agnostic to the speaker geometry (and number of speakers) and to the acoustic conditions at the playback location, where a renderer is involved.

To provide such flexibility for content creators, a hierarchical set of elements may be used to represent a sound field. A hierarchical set of elements is a set whose elements are ordered so that a basic set of lower-order elements provides a full representation of the modeled sound field. As the set is extended to include higher-order elements, the representation becomes more detailed, increasing resolution.
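Because the elements are ordered by resolution, a lower-resolution representation can be obtained by simply keeping the lower-order elements. The Python sketch below shows this for HOA coefficients. It assumes ACN channel ordering (an assumption; the text does not specify an ordering), in which all coefficients of order n precede those of order n + 1, so truncating to a lower order is a simple prefix. The function names are illustrative.

```python
from typing import List, Sequence


def num_hoa_coeffs(order: int) -> int:
    """Number of spherical harmonic coefficients up to and including `order`."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def truncate_to_order(coeffs: Sequence[float], order: int) -> List[float]:
    """Keep only the coefficients of orders 0..order.

    Assumes ACN ordering, so the lower orders form a prefix of the vector.
    """
    needed = num_hoa_coeffs(order)
    if needed > len(coeffs):
        raise ValueError("input does not contain that many orders")
    return list(coeffs[:needed])
```

For instance, truncating a 25-coefficient fourth-order vector to first order keeps its first four entries.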

One example of a hierarchical set of elements is a set of spherical harmonic coefficients (SHC). The following expression describes, or represents, a sound field using SHC:

$$p_i(t, r_r, \theta_r, \varphi_r) = \sum_{\omega=0}^{\infty}\left[4\pi \sum_{n=0}^{\infty} j_n(k r_r) \sum_{m=-n}^{n} A_n^m(k)\, Y_n^m(\theta_r, \varphi_r)\right] e^{j\omega t}$$

The expression shows that the pressure $p_i$ at any point $\{r_r, \theta_r, \varphi_r\}$ of the sound field, at time $t$, can be represented uniquely by the SHC, $A_n^m(k)$. Here, $k = \omega / c$, $c$ is the speed of sound (~343 m/s), $\{r_r, \theta_r, \varphi_r\}$ is a point of reference (or observation point), $j_n(\cdot)$ is the spherical Bessel function of order $n$, and $Y_n^m(\theta_r, \varphi_r)$ are the spherical harmonic basis functions of order $n$ and suborder $m$. The term in square brackets is a frequency-domain representation of the signal, $S(\omega, r_r, \theta_r, \varphi_r)$. It can be approximated by various time-frequency transformations, such as the discrete Fourier transform (DFT), the discrete cosine transform (DCT), or a wavelet transform. Other examples of hierarchical sets include sets of wavelet transform coefficients and other sets of coefficients of multiresolution basis functions.

FIG. 1 is a diagram illustrating spherical harmonic basis functions from the zero order (n = 0) to the fourth order (n = 4). For each order there is an expansion of suborders m. These suborders are shown in the example of FIG. 1 but, for ease of illustration, are not explicitly labeled.

The SHC $A_n^m(k)$ can either be physically acquired (e.g., recorded) by various microphone array configurations or, alternatively, be derived from channel-based or object-based descriptions of the sound field. The SHC represent scene-based audio. They may be input to an audio encoder to obtain encoded SHC, which may allow more efficient transmission or storage. For example, a fourth-order representation involving (1+4)² = 25 coefficients may be used.
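The coefficient count follows because order $n$ contributes $2n + 1$ suborders ($m = -n, \dots, n$), so summing over orders $0$ through $N$ gives:

```latex
\sum_{n=0}^{N} (2n+1) = (N+1)^2, \qquad N = 4:\quad 1 + 3 + 5 + 7 + 9 = 25 = (1+4)^2
```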

As noted above, the SHC may be derived from a microphone recording made with a microphone array. Various examples of how SHC may be derived from microphone arrays are described in Poletti, M., "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics," J. Audio Eng. Soc., Vol. 53, No. 11, November 2005, pp. 1004-1025.

To illustrate how the SHC may be derived from an object-based description, consider the following equation. The coefficients $A_n^m(k)$ for the sound field corresponding to an individual audio object may be expressed as:

$$A_n^m(k) = g(\omega)\,(-4\pi i k)\, h_n^{(2)}(k r_s)\, Y_n^{m*}(\theta_s, \varphi_s),$$

where $i$ is $\sqrt{-1}$, $h_n^{(2)}(\cdot)$ is the spherical Hankel function (of the second kind) of order $n$, and $\{r_s, \theta_s, \varphi_s\}$ is the location of the object. The object source energy $g(\omega)$ can be obtained as a function of frequency using time-frequency analysis techniques, such as a fast Fourier transform of the PCM stream. Knowing it allows each PCM object and its corresponding location to be converted into the SHC $A_n^m(k)$. Further, because the above is a linear and orthogonal decomposition, the $A_n^m(k)$ coefficients for each object can be shown to be additive. In this manner, a number of PCM objects can be represented by the $A_n^m(k)$ coefficients (e.g., as a sum of the coefficient vectors for the individual objects). Essentially, the coefficients contain information about the sound field (the pressure as a function of 3D coordinates). The above equation represents the transformation from individual objects to a representation of the overall sound field in the vicinity of the observation point $\{r_r, \theta_r, \varphi_r\}$. The remaining figures are described below in the context of object-based and SHC-based audio coding.

FIG. 2 is a diagram illustrating a system 10 that may perform various aspects of the techniques described in this disclosure. As shown in the example of FIG. 2, the system 10 includes a content creator device 12 and a content consumer device 14. Although described in the context of the content creator device 12 and the content consumer device 14, the techniques may be implemented in any context in which SHCs (which may also be referred to as HOA coefficients), or any other hierarchical representation of a sound field, are encoded to form a bitstream representative of the audio data. Moreover, the content creator device 12 may represent any form of computing device capable of implementing the techniques described in this disclosure. Examples include a handset (or cellular phone), a tablet computer, a smartphone, or a desktop computer. Likewise, the content consumer device 14 may represent any form of computing device capable of implementing the techniques described in this disclosure. Examples include a handset (or cellular phone), a tablet computer, a smartphone, a set-top box, or a desktop computer.

The content creator device 12 may be operated by a movie studio or other entity that may generate multi-channel audio content for consumption by operators of content consumer devices, such as the content consumer device 14. In some examples, the content creator device 12 may be operated by an individual user who would like to compress HOA coefficients 11. Often, the content creator generates audio content in conjunction with video content. The content consumer device 14 may be operated by an individual. The content consumer device 14 may include an audio playback system 16, which may refer to any form of audio playback system capable of rendering SHC for playback as multi-channel audio content.

The content creator device 12 includes an audio editing system 18. The content creator device 12 obtains audio objects 9 and live recordings 7 in various formats (including directly as HOA coefficients), which it may edit using the audio editing system 18. A microphone 5 may capture the live recordings 7. During the editing process, the content creator may render HOA coefficients 11 from the audio objects 9 and listen to the rendered speaker feeds to identify aspects of the sound field that require further editing. The content creator device 12 may then edit the HOA coefficients 11, potentially indirectly, by manipulating different ones of the audio objects 9 from which the source HOA coefficients may be derived in the manner described above. The content creator device 12 may employ the audio editing system 18 to generate the HOA coefficients 11. The audio editing system 18 represents any system capable of editing audio data and outputting the audio data as one or more source spherical harmonic coefficients.

When the editing process is complete, the content creator device 12 may generate a bitstream 21 based on the HOA coefficients 11. That is, the content creator device 12 includes an audio encoding device 20, which represents a device configured to encode or otherwise compress the HOA coefficients 11 in accordance with various aspects of the techniques described in this disclosure to generate the bitstream 21. As one example, the audio encoding device 20 may generate the bitstream 21 for transmission across a transmission channel, which may be a wired or wireless channel, a data storage device, or the like. The bitstream 21 may represent an encoded version of the HOA coefficients 11. It may include a primary bitstream or another side bitstream, which may be referred to as side channel information.

Although FIG. 2 shows the bitstream 21 as being transmitted directly to the content consumer device 14, the content creator device 12 may instead output it to an intermediate device positioned between the content creator device 12 and the content consumer device 14. The intermediate device may store the bitstream 21 for later delivery to the content consumer device 14, which may request the bitstream. The intermediate device may comprise a file server, a web server, a desktop computer, a laptop computer, a tablet computer, a mobile phone, a smartphone, or any other device capable of storing the bitstream 21 for later retrieval by an audio decoder. The intermediate device may reside in a content delivery network capable of streaming the bitstream 21 to subscribers that request it, such as the content consumer device 14, possibly together with a corresponding video data bitstream.

Alternatively, the content creator device 12 may store the bitstream 21 to a storage medium, such as a compact disc, a digital video disc, a high-definition video disc, or other storage media. Most of these can be read by a computer and may therefore be referred to as computer-readable storage media or non-transitory computer-readable storage media. In this context, the transmission channel may refer to the channels by which content stored to these media is transmitted, and may include retail stores and other store-based delivery mechanisms. In any event, the techniques of this disclosure should therefore not be limited in this respect to the example of FIG. 2.

As further shown in the example of FIG. 2, the content consumer device 14 includes the audio playback system 16. The audio playback system 16 may represent any audio playback system capable of playing back multi-channel audio data. The audio playback system 16 may include a number of different renderers 22. The renderers 22 may each provide a different form of rendering. These forms may include one or more of the various ways of performing vector-base amplitude panning (VBAP) and/or one or more of the various ways of performing sound field synthesis. As used herein, "A and/or B" means "A or B", or both "A and B".

The audio playback system 16 may further include an audio decoding device 24. The audio decoding device 24 may represent a device configured to decode HOA coefficients 11' from the bitstream 21. The HOA coefficients 11' may be similar to the HOA coefficients 11 but may differ because of lossy operations (e.g., quantization) and/or transmission via the transmission channel. After decoding the bitstream 21 to obtain the HOA coefficients 11', the audio playback system 16 may render the HOA coefficients 11' to output loudspeaker feeds 25. The loudspeaker feeds 25 may drive one or more loudspeakers, which are not shown in the example of FIG. 2 for ease of illustration.
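When the renderer takes the form of a matrix (as with the rendering matrices of FIGS. 9 and 10), producing the loudspeaker feeds amounts to a matrix product. The rendering matrix has one row per loudspeaker and one column per HOA coefficient, and it multiplies a block of HOA samples. The plain-Python sketch below is illustrative only. It assumes a frame layout of coefficient channels by time samples and does not describe how any particular decoder is implemented.

```python
from typing import List, Sequence


def render(matrix: Sequence[Sequence[float]],
           hoa: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compute loudspeaker feeds = matrix @ hoa.

    matrix: L rows (loudspeakers) x K columns (HOA coefficients, K = (N+1)^2).
    hoa:    K rows (coefficient channels) x T samples.
    Returns L rows x T samples of loudspeaker feeds.
    """
    num_coeffs = len(hoa)
    if any(len(row) != num_coeffs for row in matrix):
        raise ValueError("matrix width must equal the number of HOA channels")
    num_samples = len(hoa[0]) if hoa else 0
    # Each feed sample is a gain-weighted sum over all HOA coefficient channels.
    return [[sum(row[k] * hoa[k][t] for k in range(num_coeffs))
             for t in range(num_samples)]
            for row in matrix]
```

In practice, an array library would perform this product. The loop form simply makes explicit that every loudspeaker feed is a weighted combination of all HOA channels, with weights taken from that loudspeaker's row.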

To select an appropriate renderer or, in some instances, to generate one, the audio playback system 16 may obtain loudspeaker information 13 indicative of the number of loudspeakers and/or the spatial geometry of the loudspeakers. In some instances, the audio playback system 16 may obtain the loudspeaker information 13 using a reference microphone, driving the loudspeakers in a way that allows the loudspeaker information to be determined dynamically. In other instances, or in conjunction with this dynamic determination, the audio playback system 16 may prompt a user to interact with the audio playback system 16 and input the loudspeaker information 13.

The audio playback system 16 may then select one of the audio renderers 22 based on the loudspeaker information 13. In some instances, none of the audio renderers 22 falls within some threshold similarity measure (in terms of loudspeaker geometry) of the loudspeaker geometry specified in the loudspeaker information 13. In those instances, the audio playback system 16 may generate one of the audio renderers 22 based on the loudspeaker information 13. In other instances, the audio playback system 16 may generate one of the audio renderers 22 from the loudspeaker information 13 without first attempting to select an existing one. The one or more speakers 3 may then play back the rendered loudspeaker feeds 25.
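The select-or-generate logic above can be sketched as follows. The description does not define the similarity measure, so this sketch assumes one for illustration: the mean great-circle angle between paired loudspeakers, given as (azimuth, elevation) in degrees. Layouts with different speaker counts never match. The names `select_or_generate`, `geometry_mismatch` and the `generate` callback are hypothetical, and pairing speakers by list order is also an assumption.

```python
import math


def angular_distance(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) pairs given in degrees."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard acos against tiny floating-point overshoot.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def geometry_mismatch(layout_a, layout_b):
    """Hypothetical similarity measure: mean angular error of speakers paired by list order."""
    if len(layout_a) != len(layout_b):
        return math.inf  # different speaker counts are treated as incompatible
    return sum(angular_distance(p, q) for p, q in zip(layout_a, layout_b)) / len(layout_a)


def select_or_generate(renderers, measured_layout, threshold_deg, generate):
    """Return the closest existing renderer name, or generate a new renderer if none is close enough."""
    best_name, best_err = None, math.inf
    for name, layout in renderers.items():
        err = geometry_mismatch(layout, measured_layout)
        if err < best_err:
            best_name, best_err = name, err
    if best_err <= threshold_deg:
        return best_name
    return generate(measured_layout)
```

For example, with a stored stereo layout at ±30° and a measured layout at 28° and -33°, the mean error is 2.5°, so a 5° threshold selects the stored renderer. A three-speaker measurement matches nothing and falls through to `generate`.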

In some instances, the audio playback system 16 may select any one of the audio renderers 22. It may also be configured to select one or more of the audio renderers 22 depending on the source from which the bitstream 21 is received, such as a DVD player, a Blu-ray player, a smartphone, a tablet computer, a gaming system, or a television, to provide a few examples. Although any one of the audio renderers 22 may be selected, the audio renderer used when creating the content often provides a better (and possibly the best) form of rendering. This is because the content creator 12 created the content using that renderer, i.e., the audio renderer 5 in the example of FIG. 3. Selecting the audio renderer 22 that is the same as, or at least close to, that renderer (in terms of rendering form) may provide a better representation of the sound field. It may also result in a better surround sound experience for the content consumer 14.

In accordance with the techniques described in this disclosure, the audio encoding device 20 may generate the bitstream 21 to include audio rendering information 2 ("render information 2"). The audio rendering information 2 may include a signal value identifying the audio renderer used when generating the multi-channel audio content, i.e., the audio renderer 1 in the example of FIG. 3. In some instances, the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

In some instances, the signal value includes two or more bits defining an index. This index indicates that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds. In some instances, when the index is used, the signal value further includes two or more bits defining the number of rows of the matrix included in the bitstream and two or more bits defining the number of columns of that matrix. Each coefficient of the two-dimensional matrix is typically defined by a 32-bit floating-point number. Given the row and column counts, the size of the matrix in bits may therefore be computed from three quantities: the number of rows, the number of columns, and the size of the floating-point number defining each coefficient (32 bits in this example).
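The size computation described above is simply rows × columns × 32 bits. The sketch below pairs it with a writer for an assumed byte-aligned layout: a 1-byte index (1 meaning "matrix present"), 16-bit row and column counts, then the coefficients in row-major order as big-endian float32. The field widths and the names `matrix_payload_bits` and `pack_rendering_matrix` are hypothetical. This is not the actual bitstream syntax of any standard.

```python
import struct

FLOAT_BITS = 32  # each matrix coefficient is a 32-bit float, as in the description
HEADER_FMT = ">BHH"  # hypothetical header: index (uint8), rows (uint16), cols (uint16)


def matrix_payload_bits(rows, cols, float_bits=FLOAT_BITS):
    """Size of the matrix coefficients in bits: rows * cols * bits-per-coefficient."""
    return rows * cols * float_bits


def pack_rendering_matrix(matrix):
    """Serialize a rendering matrix (list of rows) with the hypothetical header above."""
    rows, cols = len(matrix), len(matrix[0])
    header = struct.pack(HEADER_FMT, 1, rows, cols)  # index 1 = "matrix follows"
    body = b"".join(struct.pack(">f", v) for row in matrix for v in row)
    return header + body
```

For instance, a 22-loudspeaker matrix for 4th-order content has 22 rows and (4+1)² = 25 columns, giving 22 × 25 × 32 = 17,600 bits of coefficient data before any header.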

In some instances, the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of speaker feeds. The rendering algorithm may include a matrix known to both the audio encoding device 20 and the decoding device 24. That is, the rendering algorithm may include application of a matrix in addition to other rendering steps, such as panning (e.g., VBAP, DBAP, or simple panning) or NFC filtering. In some instances, the signal value includes two or more bits defining an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of matrices and their order, so that the index may uniquely identify a particular one of the plurality of matrices. Alternatively, the audio encoding device 20 may specify data in the bitstream 21 that defines the plurality of matrices and/or their order, so that the index may uniquely identify a particular one of them.

In some instances, the signal value includes two or more bits defining an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of speaker feeds. Again, both the audio encoding device 20 and the decoding device 24 may be configured with information indicating the plurality of rendering algorithms and their order, so that the index may uniquely identify a particular one of the plurality of rendering algorithms. Alternatively, the audio encoding device 20 may specify data in the bitstream 21 that defines the plurality of rendering algorithms and/or their order, so that the index may uniquely identify a particular one of them.
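Signaling a renderer by index works only because the encoder and decoder share the same ordered table. The position in that table is what the index bits carry. The following is a minimal sketch under that assumption. The table contents, its order and the 2-bit index width are all hypothetical.

```python
# Hypothetical shared table; both encoder and decoder are assumed to hold it in this exact order.
RENDERER_TABLE = ["vbap", "dbap", "simple_panning", "matrix_with_nfc"]


def encode_renderer_index(name, table=RENDERER_TABLE, num_bits=2):
    """Map a renderer name to its table position and check that it fits in num_bits."""
    idx = table.index(name)
    if idx >= (1 << num_bits):
        raise ValueError("index does not fit in the signaled bit width")
    return idx


def decode_renderer_index(idx, table=RENDERER_TABLE):
    """Recover the renderer identified by a signaled index."""
    if not 0 <= idx < len(table):
        raise ValueError("index does not identify a known renderer")
    return table[idx]
```

Because only the position is transmitted, reordering the table on one side without the other would silently select the wrong renderer. This is why the description requires the order to be known to both devices or signaled in the bitstream.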

In some instances, the audio encoding device 20 specifies the audio rendering information 2 in the bitstream on a per-audio-frame basis. In other instances, the audio encoding device 20 specifies the audio rendering information 2 a single time in the bitstream.

The decoding device 24 may then determine the audio rendering information 2 specified in the bitstream. Based on the signal value included in the audio rendering information 2, the audio playback system 16 may render a plurality of speaker feeds 25. As noted above, the signal value may in some instances include a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds. In that case, the audio playback system 16 may configure one of the audio renderers 22 with the matrix and use that renderer to render the speaker feeds 25 based on the matrix.
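Once a renderer has been configured with a matrix, rendering one frame is a matrix product. The matrix has one row per loudspeaker and one column per HOA coefficient channel. Multiplying it by the HOA frame (channels × samples) yields one feed per loudspeaker. The plain-Python sketch below illustrates this. The function name `render` and the channels-by-samples frame layout are assumptions made for the example.

```python
def render(matrix, hoa_frame):
    """Speaker feeds = matrix @ hoa_frame.

    matrix:    num_speakers x (N+1)^2 rendering matrix (list of rows)
    hoa_frame: (N+1)^2 x num_samples HOA coefficients (one list per channel)
    returns:   num_speakers x num_samples loudspeaker feeds
    """
    num_coeffs = len(hoa_frame)
    if any(len(row) != num_coeffs for row in matrix):
        raise ValueError("matrix column count must equal the number of HOA channels")
    num_samples = len(hoa_frame[0])
    return [
        [sum(row[c] * hoa_frame[c][t] for c in range(num_coeffs)) for t in range(num_samples)]
        for row in matrix
    ]
```

For example, `render([[1, 0], [0.5, 0.5]], [[1, 2], [3, 4]])` returns `[[1, 2], [2.0, 3.0]]`. The first speaker passes channel 0 straight through, and the second averages both channels.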

In some instances, the signal value includes two or more bits defining an index. This index indicates that the bitstream includes a matrix used to render the HOA coefficients 11' to the speaker feeds 25. In response to the index, the decoding device 24 may parse the matrix from the bitstream. The audio playback system 16 may then configure one of the audio renderers 22 with the parsed matrix and invoke that renderer to render the speaker feeds 25. The signal value may also include two or more bits defining the number of rows of the matrix and two or more bits defining the number of columns of the matrix. In that case, the decoding device 24 may parse the matrix from the bitstream in response to the index, based on those row and column bits, in the manner described above.
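The index-triggered parsing above can be sketched with Python's `struct` module. This example assumes a hypothetical byte-aligned layout, not a standardized syntax. The payload starts with a 1-byte index, where 1 means a matrix follows. Next come a 16-bit row count and a 16-bit column count. The rows × columns coefficients follow as row-major big-endian float32. `parse_rendering_matrix` is an illustrative name.

```python
import struct

HEADER_FMT = ">BHH"  # hypothetical: index (uint8), rows (uint16), cols (uint16)


def parse_rendering_matrix(payload):
    """Return the signaled matrix as a list of rows, or None if the index says no matrix is present."""
    index, rows, cols = struct.unpack_from(HEADER_FMT, payload, 0)
    if index != 1:
        return None  # no matrix in the bitstream; the decoder keeps using a local renderer
    offset = struct.calcsize(HEADER_FMT)
    # Row and column counts determine how many float32 values to read.
    values = struct.unpack_from(f">{rows * cols}f", payload, offset)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
```

The row and column fields are what let the decoder know exactly how many bytes of coefficients to consume, which matches the size reasoning given for the encoder side.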

In some instances, the signal value specifies a rendering algorithm used to render the HOA coefficients 11' to the speaker feeds 25. In these instances, some or all of the audio renderers 22 may perform these rendering algorithms. The audio playback system 16 may then use the specified rendering algorithm, e.g., one of the audio renderers 22, to render the speaker feeds 25 from the HOA coefficients 11'.

The signal value may include two or more bits defining an index associated with one of a plurality of matrices used to render the HOA coefficients 11' to the speaker feeds 25. In that case, some or all of the audio renderers 22 may represent this plurality of matrices. The audio playback system 16 may therefore render the speaker feeds 25 from the HOA coefficients 11' using the audio renderer 22 associated with the index.

The signal value may include two or more bits defining an index associated with one of a plurality of rendering algorithms used to render the HOA coefficients 11' to the speaker feeds 25. In that case, some or all of the audio renderers 22 may represent these rendering algorithms. The audio playback system 16 may therefore render the speaker feeds 25 from the HOA coefficients 11' using the audio renderer 22 associated with the index.

Depending on how often the audio rendering information is specified in the bitstream, the decoding device 24 may determine the audio rendering information 2 either on a per-audio-frame basis or a single time.
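A decoder that supports both signaling frequencies can keep the most recently received rendering information. It then reuses that information for frames that carry none, which covers both the per-frame case and the signal-once case. The class below is a minimal sketch of that idea. `RenderInfoTracker` is a hypothetical name, and the rendering information is treated as an opaque value.

```python
class RenderInfoTracker:
    """Hold the last signaled rendering information; frames without it reuse the previous value."""

    def __init__(self):
        self.current = None

    def update(self, frame_render_info):
        """Call once per decoded frame with that frame's rendering info, or None if absent."""
        if frame_render_info is not None:
            self.current = frame_render_info  # per-frame (or first and only) signaling
        if self.current is None:
            raise ValueError("no rendering information has been received yet")
        return self.current
```

When the information is sent only once, every later frame simply returns the stored value. When it is sent per frame, each new value replaces the old one.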

By specifying the audio rendering information 2 in this manner, the techniques may result in better playback of the multi-channel audio content. Playback may then follow the way in which the content creator 12 intended the multi-channel audio content to be played back. As a result, the techniques may provide a more immersive surround sound or multi-channel audio experience.

In other words, and as noted above, higher-order ambisonics (HOA) may represent a way to describe the directional information of a sound field based on a spatial Fourier transform. Typically, a higher ambisonics order N brings a higher spatial resolution and a larger number of spherical harmonic (SH) coefficients, (N+1)². It also requires more bandwidth to transmit and store the data.
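The growth in coefficient count and bandwidth can be made concrete with a small worked example. The coefficient count (N+1)² comes from the text. The sample rate (48 kHz) and the uncompressed 32-bit sample size are assumptions chosen only for illustration.

```python
def num_hoa_coefficients(order):
    """Number of spherical harmonic coefficients for ambisonics order N: (N+1)^2."""
    return (order + 1) ** 2


def raw_bitrate_bps(order, sample_rate_hz=48000, bits_per_sample=32):
    """Uncompressed bitrate, assuming every coefficient channel is sampled like a PCM channel."""
    return num_hoa_coefficients(order) * sample_rate_hz * bits_per_sample
```

Under these assumptions, first order needs 4 coefficient channels and 4th order needs 25. The raw 4th-order stream comes to 25 × 48,000 × 32 = 38,400,000 bit/s (38.4 Mbit/s), which illustrates why compression matters as N grows.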

A potential advantage of this description is the possibility of reproducing the sound field on almost any loudspeaker setup (e.g., 5.1, 7.1, 22.2, etc.). The conversion from the sound field description into M loudspeaker signals may be done via a static rendering matrix with (N+1)² inputs and M outputs. Consequently, every loudspeaker setup may require a dedicated rendering matrix. Several algorithms may exist for computing the rendering matrix for a desired loudspeaker setup. These may be optimized for certain objective or subjective measures, such as the Gerzon criteria. For irregular loudspeaker setups, the algorithms may become complex because they rely on iterative numerical optimization procedures, such as convex optimization. Computing a rendering matrix for an irregular loudspeaker layout without waiting time may require sufficient computational resources. Irregular loudspeaker setups may be common in home living rooms because of architectural constraints and aesthetic preferences. For the best sound field reproduction, a rendering matrix optimized for such a scenario may therefore be preferred, because it may reproduce the sound field more accurately.
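As a reference point for what a rendering matrix with (N+1)² inputs and M outputs looks like, the sketch below builds a basic first-order "sampling" (projection) decoder. Each row is the real spherical harmonics evaluated at one loudspeaker direction, divided by the number of loudspeakers. The ACN channel order (W, Y, Z, X) and SN3D normalization are assumed conventions. This simple design is only reasonable for regular layouts. It is not the Gerzon-criteria or convex-optimization approach the text mentions for irregular setups.

```python
import math


def foa_sampling_decoder(speaker_dirs_deg):
    """Return an M x 4 first-order rendering matrix (assumed ACN order W, Y, Z, X; SN3D).

    speaker_dirs_deg: list of (azimuth, elevation) pairs in degrees, one per loudspeaker.
    """
    num_speakers = len(speaker_dirs_deg)
    matrix = []
    for az_deg, el_deg in speaker_dirs_deg:
        az, el = math.radians(az_deg), math.radians(el_deg)
        sh = [
            1.0,                          # W (order 0)
            math.sin(az) * math.cos(el),  # Y (order 1, degree -1)
            math.sin(el),                 # Z (order 1, degree 0)
            math.cos(az) * math.cos(el),  # X (order 1, degree +1)
        ]
        matrix.append([value / num_speakers for value in sh])
    return matrix
```

For a square of four speakers at 0°, 90°, 180° and 270°, the front speaker's row is approximately [0.25, 0, 0, 0.25]. The matrix has M = 4 rows and (1+1)² = 4 columns, as the text describes.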

Because an audio decoder usually is not provisioned with many computational resources, the device may be unable to compute an irregular rendering matrix in a consumer-friendly amount of time. Various aspects of the techniques described in this disclosure may provide for a cloud-based computing approach, as follows:

1. The audio decoder may send the loudspeaker coordinates (and, in some instances, also SPL measurements obtained with a calibration microphone) to a server via an Internet connection;

2. The cloud-based server may compute the rendering matrix (and possibly a few different versions, so that the customer may later choose among them); and

3. The server may then send the rendering matrix (or the different versions) back to the audio decoder via the Internet connection.
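The three-step exchange above can be sketched as a JSON request and response. The message shapes are entirely hypothetical, since the description does not define a protocol. The request carries loudspeaker coordinates and optional SPL measurements. The response carries one or more named rendering-matrix versions for the customer to choose from. Transport (HTTP or otherwise) is deliberately left out.

```python
import json


def build_matrix_request(speakers, spl_db=None):
    """Step 1: serialize loudspeaker coordinates (azimuth deg, elevation deg, distance m) and optional SPL data."""
    payload = {
        "speakers": [
            {"azimuth_deg": az, "elevation_deg": el, "distance_m": r} for az, el, r in speakers
        ]
    }
    if spl_db is not None:
        payload["spl_db"] = list(spl_db)  # e.g. per-speaker levels from a calibration microphone
    return json.dumps(payload)


def parse_matrix_response(text):
    """Step 3: map each returned version name to its rendering matrix (list of rows)."""
    versions = json.loads(text)["versions"]
    return {v["name"]: v["matrix"] for v in versions}
```

Returning several named versions keeps the choice on the customer side, matching step 2, while the decoder itself never runs the optimization.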

This approach may allow a manufacturer to keep the manufacturing costs of the audio decoder low, because a powerful processor may not be needed to compute these irregular rendering matrices. At the same time, it may facilitate more optimal audio reproduction than rendering matrices that are usually designed for regular speaker configurations or geometries. The algorithm for computing the rendering matrix may also be optimized after the audio decoder has shipped, potentially reducing the costs of hardware revisions or even recalls. In some instances, the techniques may also gather a great deal of information about the different loudspeaker setups of consumer products, which may benefit future product development.

In some instances, the system shown in FIG. 3 may not signal the audio rendering information 2 in the bitstream 21 as described above, but may instead signal it as metadata separate from the bitstream 21. Alternatively, or in conjunction with the above, the system shown in FIG. 3 may signal part of the audio rendering information 2 in the bitstream 21, as described above, and part of it as metadata separate from the bitstream 21. In some examples, the audio encoding device 20 may output this metadata, which may then be uploaded to a server or other device. The audio decoding device 24 may then download or otherwise retrieve this metadata. The audio decoding device 24 then uses the metadata to augment the audio rendering information it extracted from the bitstream 21. The bitstream 21 formed according to the rendering-information aspects of the techniques is described below with respect to the examples of FIGS. 8A-8D.
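When part of the rendering information arrives in the bitstream and part arrives as separately retrieved metadata, the decoder has to combine the two. The description does not say which source wins on conflict. This sketch assumes that non-empty fields from the bitstream take precedence and that the metadata fills any gaps. The field names and `merge_render_info` are hypothetical.

```python
def merge_render_info(from_bitstream, from_metadata):
    """Augment bitstream rendering info with out-of-band metadata (bitstream fields win when present)."""
    merged = dict(from_metadata or {})
    merged.update({k: v for k, v in (from_bitstream or {}).items() if v is not None})
    return merged
```

For example, a bitstream that signals only a renderer index can be completed by a matrix fetched from a server, while an index present in both sources keeps the bitstream's value.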

FIG. 3 is a block diagram illustrating, in more detail, one example of the audio encoding device 20 shown in the example of FIG. 2, which may perform various aspects of the techniques described in this disclosure. The audio encoding device 20 includes a content analysis unit 26, a vector-based decomposition unit 27, and a directional-based decomposition unit 28. These units are described briefly below. More information about the audio encoding device 20 and the various aspects of compressing or otherwise encoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

The content analysis unit 26 represents a unit configured to analyze the content of the HOA coefficients 11 to identify whether they represent content generated from a live recording or from an audio object. The content analysis unit 26 may determine whether the HOA coefficients 11 were generated from a recording of an actual sound field or from an artificial audio object. In some instances, when the framed HOA coefficients 11 were generated from a recording, the content analysis unit 26 passes the HOA coefficients 11 to the vector-based decomposition unit 27. In some instances, when the framed HOA coefficients 11 were generated from a synthetic audio object, the content analysis unit 26 passes the HOA coefficients 11 to the directional-based synthesis unit 28. The directional-based synthesis unit 28 may represent a unit configured to perform a directional-based synthesis of the HOA coefficients 11 to generate a directional-based bitstream 21.

As shown in the example of FIG. 3, the vector-based decomposition unit 27 may include:

- a linear invertible transform (LIT) unit 30
- a parameter calculation unit 32
- a reorder unit 34
- a foreground selection unit 36
- an energy compensation unit 38
- a psychoacoustic audio coder unit 40
- a bitstream generation unit 42
- a sound field analysis unit 44
- a coefficient reduction unit 46
- a background (BG) selection unit 48
- a spatio-temporal interpolation unit 50
- a quantization unit 52

The linear invertible transform (LIT) unit 30 receives the HOA coefficients 11 in the form of HOA channels. Each channel represents a block or frame of a coefficient associated with a given order and sub-order of the spherical basis functions. Such a block or frame may be denoted HOA[k], where k may denote the current frame or block of samples. The matrix of HOA coefficients 11 may have dimensions D: M x (N+1)².

The LIT unit 30 may represent a unit configured to perform a form of analysis referred to as singular value decomposition (SVD). Although described with respect to SVD, the techniques described in this disclosure may be performed with respect to any similar transformation or decomposition that provides sets of linearly uncorrelated, energy-compacted output. Also, references to "sets" in this disclosure generally refer to non-zero sets unless specifically stated otherwise. They are not intended to follow the classical mathematical definition of sets, which includes the so-called "empty set." An alternative transformation may include principal component analysis, often referred to as "PCA." Depending on the context, PCA may be referred to by a number of different names, such as the discrete Karhunen-Loeve transform, the Hotelling transform, proper orthogonal decomposition (POD), and eigenvalue decomposition (EVD), to name a few examples. The properties of such operations that serve the underlying goal of compressing audio data are the "energy compaction" and "decorrelation" of the multichannel audio data.

In any event, assume for purposes of example that the LIT unit 30 performs a singular value decomposition (which, again, may be referred to as "SVD"). The LIT unit 30 may then transform the HOA coefficients 11 into two or more sets of transformed HOA coefficients. The "sets" of transformed HOA coefficients may include vectors of transformed HOA coefficients. In the example of FIG. 3, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to generate a so-called V matrix, an S matrix, and a U matrix. In linear algebra, SVD may represent a factorization of a y-by-z real or complex matrix X (where X may represent multi-channel audio data, such as the HOA coefficients 11) in the following form:

$$X = U S V^{*}$$

U may represent a y-by-y real or complex unitary matrix, where the y columns of U are known as the left-singular vectors of the multi-channel audio data. S may represent a y-by-z rectangular diagonal matrix with non-negative real numbers on the diagonal, where the diagonal values of S are known as the singular values of the multi-channel audio data. V* (which may denote the conjugate transpose of V) may represent a z-by-z real or complex unitary matrix, where the z columns of V are known as the right-singular vectors of the multi-channel audio data.

In some examples, the V* matrix in the SVD equation referenced above is denoted as the conjugate transpose of the V matrix, to reflect that SVD may be applied to matrices that include complex numbers. When SVD is applied to matrices that include only real numbers, the conjugate transpose of the V matrix (in other words, the V* matrix) may be considered to be the transpose of the V matrix. For ease of illustration, it is assumed below that the HOA coefficients 11 include real numbers, with the result that the SVD outputs the V matrix rather than the V* matrix. Moreover, although this disclosure refers to the V matrix, references to the V matrix should be understood to refer to the transpose of the V matrix where appropriate. The techniques may likewise be applied to HOA coefficients 11 having complex coefficients, in which case the output of the SVD is the V* matrix. Accordingly, the techniques should not be limited in this respect to applying SVD to generate a V matrix. They may include applying SVD to HOA coefficients 11 having complex components to generate a V* matrix.

In this way, the LIT unit 30 may perform the SVD with respect to the HOA coefficients 11 to output two sets of vectors. The first is the US[k] vectors 33, which may represent a combined version of the S vectors and the U vectors and have dimensions D: M x (N+1)². The second is the V[k] vectors 35, which have dimensions D: (N+1)² x (N+1)². Individual vector elements in the US[k] matrix may also be referred to as X_PS(k), while individual vectors in the V[k] matrix may also be referred to as v(k).

Analysis of the U, S and V matrices may reveal that these matrices carry or represent the spatial and temporal characteristics of the underlying sound field, represented above by X. Each of the N vectors in U (of length M samples) may represent normalized, separated audio signals as a function of time (for the time period represented by the M samples). These signals are orthogonal to each other and have been decoupled from any spatial characteristics (which may also be referred to as directional information). The spatial characteristics, representing spatial shape and position (r, theta, phi), may instead be represented by the individual i-th vectors v^(i)(k) in the V matrix (each of length (N+1)²). The individual elements of each v^(i)(k) vector may represent an HOA coefficient describing the shape (including width) and position of the sound field for an associated audio object. The vectors in both the U matrix and the V matrix are normalized such that their root-mean-square energies are equal to unity. The energy of the audio signals in U is therefore represented by the diagonal elements in S. Multiplying U by S to form US[k] (with individual vector elements X_PS(k)) thus represents the audio signals together with their energies. The ability of the SVD to decouple the audio time signals (in U), their energies (in S) and their spatial characteristics (in V) may support various aspects of the techniques described in this disclosure. Further, the model of synthesizing the underlying HOA[k] coefficients X by a vector multiplication of US[k] and V[k] gives rise to the term "vector-based decomposition," which is used throughout this document.
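
The following minimal sketch illustrates the vector-based decomposition. It assumes one frame is held as a NumPy array X of shape M x (N+1)², with M ≥ (N+1)². The helper computes US[k] and V[k] with NumPy's SVD. The name vector_based_decomposition is hypothetical and is not part of any codec API. NumPy returns U and V with orthonormal (unit 2-norm) columns.

```python
import numpy as np


def vector_based_decomposition(hoa_frame):
    """Split one HOA frame X (M x (N+1)^2) into US[k] and V[k] so that X = US[k] V[k]^T.

    Assumes M >= (N+1)^2, so the economy-size SVD yields the dimensions in the
    text: US[k] is M x (N+1)^2 and V[k] is (N+1)^2 x (N+1)^2.
    """
    u, s, vt = np.linalg.svd(hoa_frame, full_matrices=False)
    us = u * s   # scale column i of U by singular value s[i]: audio signals with their energies
    v = vt.T     # column i is the spatial vector paired with column i of US[k]
    return us, v
```

Multiplying US[k] by the transpose of V[k] rebuilds the frame. The columns of US[k] are mutually orthogonal, and their squared norms equal the squared singular values.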

Although described as being performed directly with respect to the HOA coefficients 11, the LIT unit 30 may instead apply the linear invertible transform to quantities derived from the HOA coefficients 11. For example, the LIT unit 30 may apply the SVD with respect to a power spectral density matrix derived from the HOA coefficients 11. Performing the SVD on the power spectral density (PSD) of the HOA coefficients, rather than on the coefficients themselves, may potentially reduce the computational complexity of the SVD in terms of processor cycles, storage space, or both. At the same time, the LIT unit 30 may achieve the same source audio encoding efficiency as if the SVD had been applied directly to the HOA coefficients.
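
The PSD variant can be sketched under one assumption: the PSD matrix is taken to be X^T X. That matrix is only (N+1)² x (N+1)², whatever the frame length M. Its singular vectors are the V[k] vectors, and its eigenvalues are the squared singular values of X. US[k] can then be recovered as X V[k]. The helper name decompose_via_psd is hypothetical.

```python
import numpy as np


def decompose_via_psd(hoa_frame):
    """Obtain US[k], V[k] and the singular values of X from the small matrix X^T X."""
    psd = hoa_frame.T @ hoa_frame        # (N+1)^2 x (N+1)^2, independent of M
    v, eigvals, _ = np.linalg.svd(psd)   # psd is symmetric PSD, so its singular vectors are V[k]
    singular_values = np.sqrt(eigvals)   # eigenvalues of X^T X are the squared singular values of X
    us = hoa_frame @ v                   # US[k] = X V[k], since X = US[k] V[k]^T and V[k] is orthogonal
    return us, v, singular_values
```

The singular values obtained this way match those of a direct SVD of the frame. The expensive decomposition, however, runs on a matrix whose size does not grow with M.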

The parameter calculation unit 32 represents a unit configured to calculate various parameters, such as a correlation parameter (R), directional property parameters (θ, φ, r) and an energy property (e). Each of the parameters for the current frame may be denoted as R[k], θ[k], φ[k], r[k] and e[k]. The parameter calculation unit 32 may perform an energy analysis and/or a correlation (or so-called cross-correlation) with respect to the US[k] vectors 33 to identify these parameters. The parameter calculation unit 32 may also determine the parameters for the previous frame. These previous frame parameters may be denoted R[k-1], θ[k-1], φ[k-1], r[k-1] and e[k-1], and are based on the previous frame of US[k-1] vectors and V[k-1] vectors. The parameter calculation unit 32 may output the current parameters 37 and the previous parameters 39 to the reorder unit 34.
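
Two of these parameters can be sketched as follows, assuming specific definitions. The energy e is taken as the sum of squares of each US[k] vector. The correlation R is taken as the zero-lag normalized cross-correlation between each previous-frame vector and each current-frame vector. The directional parameters (θ, φ, r) are not modeled here. The function name frame_parameters is hypothetical.

```python
import numpy as np


def frame_parameters(us_cur, us_prev):
    """Return e[k] (energy of each US[k] vector) and R (normalized cross-correlation)."""
    energy = np.sum(us_cur ** 2, axis=0)
    cur_unit = us_cur / np.linalg.norm(us_cur, axis=0)
    prev_unit = us_prev / np.linalg.norm(us_prev, axis=0)
    corr = prev_unit.T @ cur_unit   # corr[i, j]: previous vector i against current vector j
    return energy, corr
```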

The parameters calculated by the parameter calculation unit 32 may be used by the reorder unit 34 to reorder the audio objects so that they represent their natural evolution or continuity over time. The reorder unit 34 may compare, in turn, each of the parameters 37 from the first US[k] vectors 33 against each of the parameters 39 for the second US[k-1] vectors 33. Based on the current parameters 37 and the previous parameters 39, the reorder unit 34 may reorder the various vectors within the US[k] matrix 33 and the V[k] matrix 35 (using, as one example, the Hungarian algorithm). It may then output the reordered US[k] matrix 33' and the reordered V[k] matrix 35' to a foreground sound (or predominant sound, PS) selection unit 36 ("foreground selection unit 36") and to an energy compensation unit 38.
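
The text names the Hungarian algorithm as one option. The sketch below instead uses an exhaustive search over permutations. This finds the same optimal assignment but is only practical for small nFG. Two choices here are assumptions made for illustration: the matching score (the sum of absolute normalized cross-correlations) and the sign-alignment step. The function name reorder_to_previous is hypothetical.

```python
import itertools

import numpy as np


def reorder_to_previous(us_prev, us_cur, v_cur):
    """Permute (and sign-align) current US/V columns to best match the previous frame."""
    n = us_prev.shape[1]
    prev_unit = us_prev / np.linalg.norm(us_prev, axis=0)
    cur_unit = us_cur / np.linalg.norm(us_cur, axis=0)
    corr = prev_unit.T @ cur_unit  # corr[i, j]: previous vector i vs current vector j

    # Exhaustive search over assignments; same optimum as the Hungarian algorithm,
    # but cost grows factorially with n.
    best = max(itertools.permutations(range(n)),
               key=lambda p: sum(abs(corr[i, p[i]]) for i in range(n)))
    order = list(best)

    # SVD vectors are only defined up to sign; flip each US/V pair together
    # (their product is unchanged) so the matched correlation is positive.
    signs = np.sign(corr[np.arange(n), order])
    signs[signs == 0] = 1.0
    return us_cur[:, order] * signs, v_cur[:, order] * signs
```

Because each US column and its V column are flipped together, their product, and therefore the reconstructed HOA frame, is unaffected by the sign alignment.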

The sound field analysis unit 44 may represent a unit configured to perform a sound field analysis with respect to the HOA coefficients 11 so as to potentially achieve a target bitrate 41. Based on the analysis and/or on a received target bitrate 41, the sound field analysis unit 44 may determine two things. The first is the total number of psychoacoustic coder instantiations, which may be a function of the total number of ambient or background channels (BG_TOT). The second is the number of foreground channels or, in other words, predominant channels. The total number of psychoacoustic coder instantiations may be denoted as numHOATransportChannels.

Again to potentially achieve the target bitrate 41, the sound field analysis unit 44 may also determine the following (which may collectively be denoted as background channel information 43 in the example of FIG. 3):
- the total number of foreground channels (nFG) 45;
- the minimum order of the background (or, in other words, ambient) sound field (N_BG or, alternatively, MinAmbHOAorder);
- the corresponding number of actual channels representative of the minimum order of the background sound field (nBGa = (MinAmbHOAorder + 1)²);
- the indices (i) of additional BG HOA channels to send.

The background channel information 43 may also be referred to as ambient channel information 43. Each of the numHOATransportChannels - nBGa remaining channels may be an "additional background/ambient channel," an "active vector-based predominant channel," an "active directional-based predominant signal" or "completely inactive." In one aspect, the channel type may be indicated by a two-bit syntax element, "ChannelType" (e.g., 00: directional-based signal; 01: vector-based predominant signal; 10: additional ambient signal; 11: inactive signal). The total number of background or ambient signals, nBGa, may be given by (MinAmbHOAorder + 1)² plus the number of times the index 10 (in the above example) appears as a channel type in the bitstream for that frame.
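
The per-frame counting rule can be sketched using the two-bit ChannelType codes listed above. In these codes, 01 marks a vector-based predominant signal and 10 marks an additional ambient signal. The function name and its argument layout are hypothetical. The input list holds one ChannelType per flexible transport channel.

```python
from collections import Counter

# Two-bit ChannelType codes as listed in the text.
DIRECTIONAL, VECTOR_BASED, ADDITIONAL_AMBIENT, INACTIVE = 0b00, 0b01, 0b10, 0b11


def count_channel_types(min_amb_hoa_order, channel_types):
    """Return (nBGa, number of vector-based predominant signals) for one frame.

    channel_types holds the ChannelType of each of the
    numHOATransportChannels - (MinAmbHOAorder + 1)^2 flexible channels.
    """
    counts = Counter(channel_types)  # missing codes count as 0
    n_bga = (min_amb_hoa_order + 1) ** 2 + counts[ADDITIONAL_AMBIENT]
    n_vector_based = counts[VECTOR_BASED]
    return n_bga, n_vector_based
```

Take MinAmbHOAorder = 1 and numHOATransportChannels = 8. Four channels are then always ambient, and four are flexible. A frame whose flexible channels are typed 01, 10, 01, 11 yields nBGa = 5 and two vector-based signals.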

The sound field analysis unit 44 may select the number of background (or, in other words, ambient) channels and the number of foreground (or, in other words, predominant) channels based on the target bitrate 41. It may select more background and/or foreground channels when the target bitrate 41 is relatively higher (e.g., when the target bitrate 41 is 512 Kbps or higher). In one aspect, MinAmbHOAorder may be set to 1 and numHOATransportChannels may be set to 8 in the header section of the bitstream. In this scenario, four channels may be dedicated in every frame to representing the background or ambient portion of the sound field. The other four channels may vary in channel type on a frame-by-frame basis, being used, for example, either as an additional background/ambient channel or as a foreground/predominant channel. The foreground/predominant signals may be either vector-based or directional-based signals, as described above.

In some instances, the total number of vector-based predominant signals for a frame may be given by the number of times the ChannelType index is 01 in the bitstream of that frame. In the above aspect, each additional background/ambient channel (e.g., one with a ChannelType of 10) may carry information identifying which of the possible HOA coefficients (beyond the first four) is represented in that channel. For fourth-order HOA content, this information may be an index indicating one of HOA coefficients 5 to 25. When minAmbHOAorder is set to 1, the first four ambient HOA coefficients 1 to 4 may be sent at all times. The audio encoding device may therefore only need to indicate one of the additional ambient HOA coefficients having an index from 5 to 25. This information can thus be sent using a 5-bit syntax element (for fourth-order content), which may be denoted as "CodedAmbCoeffIdx." In any event, the sound field analysis unit 44 outputs the background channel information 43 and the HOA coefficients 11 to the background (BG) selection unit 48. It also outputs the background channel information 43 to the coefficient reduction unit 46 and the bitstream generation unit 42, and outputs nFG 45 to the foreground selection unit 36.
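
The 5-bit figure follows from simple counting. For order N = 4 with MinAmbHOAorder = 1, coefficients 5 through 25 are the 21 candidates, and ceil(log2(21)) = 5. The sketch below assumes the index field is sized as the smallest width that can address every candidate. The helper name is hypothetical, and the exact CodedAmbCoeffIdx mapping is not taken from the text.

```python
import math


def additional_ambient_index_range(hoa_order, min_amb_hoa_order):
    """Return (first, last, bits) for 1-based indices of additional ambient HOA coefficients."""
    first = (min_amb_hoa_order + 1) ** 2 + 1   # coefficients 1..(MinAmbHOAorder+1)^2 are always sent
    last = (hoa_order + 1) ** 2
    bits = math.ceil(math.log2(last - first + 1))
    return first, last, bits
```

For fourth-order content this gives (5, 25, 5), which matches the 5-bit syntax element described above.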

The background selection unit 48 may represent a unit configured to determine background or ambient HOA coefficients 47 based on the background channel information. This information includes, for example, the background sound field order (N_BG) and the number (nBGa) and indices (i) of additional BG HOA channels to send. For example, when N_BG equals one, the background selection unit 48 may select the HOA coefficients 11 having an order equal to or less than one for each sample of the audio frame. In this example, the background selection unit 48 may then select, as additional BG HOA coefficients, the HOA coefficients 11 having an index identified by one of the indices (i). nBGa is provided to the bitstream generation unit 42 to be specified in the bitstream 21. This enables an audio decoding device, such as the audio decoding device 24 shown in the examples of FIGS. 2 and 4, to parse the background HOA coefficients 47 from the bitstream 21. The background selection unit 48 may then output the ambient HOA coefficients 47 to the energy compensation unit 38. The ambient HOA coefficients 47 may have dimensions D: M x [(N_BG+1)² + nBGa]. The ambient HOA coefficients 47 may also be referred to as "ambient HOA channels 47," where each of the ambient HOA coefficients 47 corresponds to a separate ambient HOA channel 47 to be encoded by the psychoacoustic audio coder unit 40.
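
Ambient selection can be sketched as column picking. This assumes the coefficient columns of a frame are in ACN order, so that the first (N_BG+1)² columns are exactly the coefficients of order N_BG or lower. It also assumes the additional indices are 1-based, as in "HOA coefficients 5 to 25." The function name is hypothetical.

```python
import numpy as np


def select_ambient_coefficients(hoa_frame, n_bg, additional_indices):
    """Pick ambient HOA channels from an M x (N+1)^2 frame.

    Assumes ACN column order (first (N_BG+1)^2 columns have order <= N_BG)
    and 1-based additional_indices.
    """
    base = list(range((n_bg + 1) ** 2))
    extra = [i - 1 for i in additional_indices]   # convert 1-based indices to 0-based columns
    return hoa_frame[:, base + extra]             # M x [(N_BG+1)^2 + nBGa]
```

With N_BG = 1 and additional indices 6 and 10, a 25-coefficient frame keeps columns 1 to 4, 6 and 10 (1-based). That gives six ambient channels.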

The foreground selection unit 36 may represent a unit configured to select, based on nFG 45, the parts of the reordered US[k] matrix 33' and the reordered V[k] matrix 35' that represent foreground or distinct components of the sound field. nFG 45 may represent one or more indices identifying the foreground vectors. The foreground selection unit 36 may output nFG signals 49 (which may be denoted as reordered US[k]_{1,…,nFG} 49 or FG_{1,…,nFG}[k] 49) to the psychoacoustic audio coder unit 40. The nFG signals 49 may have dimensions D: M x nFG, and each represents a mono audio object. The foreground selection unit 36 may also output the reordered V[k] matrix 35' corresponding to the foreground components of the sound field to the spatio-temporal interpolation unit 50. The subset of the reordered V[k] matrix 35' corresponding to the foreground components may be denoted as a foreground V[k] matrix 51_k having dimensions D: (N+1)² x nFG.
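
Foreground selection can be sketched by assuming that the foreground components occupy the first nFG columns after reordering. This would hold, for example, when the columns stay sorted by singular value. The text itself leaves the identifying indices to nFG 45. The function name select_foreground is hypothetical.

```python
import numpy as np


def select_foreground(us_reordered, v_reordered, n_fg):
    """Return nFG signals (M x nFG) and the foreground V[k] matrix ((N+1)^2 x nFG).

    Assumes the foreground components are the first n_fg columns.
    """
    fg_signals = us_reordered[:, :n_fg]
    v_fg = v_reordered[:, :n_fg]
    return fg_signals, v_fg
```

Multiplying the nFG signals by the transpose of the foreground V[k] matrix gives the foreground part of the sound field. What remains after subtracting it is orthogonal to the selected spatial vectors.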

The energy compensation unit 38 may represent a unit configured to perform energy compensation with respect to the ambient HOA coefficients 47. This compensates for the energy lost when the background selection unit 48 removes various ones of the HOA channels. The energy compensation unit 38 may first perform an energy analysis with respect to one or more of the reordered US[k] matrix 33', the reordered V[k] matrix 35', the nFG signals 49, the foreground V[k] vectors 51_k and the ambient HOA coefficients 47. It may then perform energy compensation based on that analysis to generate energy-compensated ambient HOA coefficients 47'. The energy compensation unit 38 may output the energy-compensated ambient HOA coefficients 47' to the psychoacoustic audio coder unit 40.

The spatio-temporal interpolation unit 50 may represent a unit configured to receive the foreground V[k] vectors 51_k for the k-th frame and the foreground V[k-1] vectors 51_{k-1} for the previous frame (hence the k-1 notation). It performs spatio-temporal interpolation on them to generate interpolated foreground V[k] vectors. The spatio-temporal interpolation unit 50 may recombine the nFG signals 49 with the foreground V[k] vectors 51_k to recover reordered foreground HOA coefficients. It may then divide the reordered foreground HOA coefficients by the interpolated V[k] vectors to generate interpolated nFG signals 49'. The spatio-temporal interpolation unit 50 may also output the foreground V[k] vectors 51_k that were used to generate the interpolated foreground V[k] vectors. This allows an audio decoding device, such as the audio decoding device 24, to generate the interpolated foreground V[k] vectors itself and thereby recover the foreground V[k] vectors 51_k. The foreground V[k] vectors 51_k used to generate the interpolated foreground V[k] vectors are denoted as the remaining foreground V[k] vectors 53. Quantized/dequantized versions of the vectors may be used at both the encoder and the decoder. This ensures that the same V[k] and V[k-1] are used at both ends to create the interpolated vectors V[k]. The spatio-temporal interpolation unit 50 outputs the interpolated nFG signals 49' to the psychoacoustic audio coder unit 40 and the remaining foreground V[k] vectors 53 to the coefficient reduction unit 46.

The coefficient reduction unit 46 may represent a unit configured to perform coefficient reduction with respect to the remaining foreground V[k] vectors 53, based on the background channel information 43. It outputs the resulting reduced foreground V[k] vectors 55 to the quantization unit 52. The reduced foreground V[k] vectors 55 may have dimensions D: [(N+1)² - (N_BG+1)² - BG_TOT] x nFG. In this respect, the coefficient reduction unit 46 may represent a unit configured to reduce the number of coefficients in the remaining foreground V[k] vectors 53. In other words, the coefficient reduction unit 46 may represent a unit configured to eliminate coefficients that carry little to no directional information from the foreground V[k] vectors (which form the remaining foreground V[k] vectors 53). In some examples, the coefficients of the distinct or, in other words, foreground V[k] vectors that correspond to the first- and zero-order basis functions (which may be denoted as N_BG) provide little directional information. They can therefore be removed from the foreground V-vectors, through a process that may be referred to as "coefficient reduction." In this example, greater flexibility may be provided to identify not only the coefficients that correspond to N_BG but also additional HOA channels (which may be denoted by the variable TotalOfAddAmbHOAChan) from the set [(N_BG+1)²+1, (N+1)²].
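
Coefficient reduction can be sketched as row removal from the (N+1)² x nFG foreground V matrix. The sketch drops the rows for orders up to N_BG plus the rows of the additional ambient channels, which yields the [(N+1)² - (N_BG+1)² - BG_TOT] x nFG shape given above. It assumes ACN row order and 1-based additional indices. The function name is hypothetical.

```python
import numpy as np


def reduce_coefficients(v_fg, n_bg, additional_indices):
    """Drop rows of the foreground V[k] matrix that are already carried by ambient channels.

    Assumes ACN row order and 1-based additional_indices drawn from
    [(N_BG+1)^2 + 1, (N+1)^2].
    """
    drop = set(range((n_bg + 1) ** 2)) | {i - 1 for i in additional_indices}
    keep = [r for r in range(v_fg.shape[0]) if r not in drop]
    return v_fg[keep, :]
```

For N = 4, N_BG = 1 and two additional ambient channels, 25 - 4 - 2 = 19 rows remain.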

The quantization unit 52 may represent a unit configured to perform any form of quantization to compress the reduced foreground V[k] vectors 55 into coded foreground V[k] vectors 57, which it outputs to the bitstream generation unit 42. In operation, the quantization unit 52 may represent a unit configured to compress a spatial component of the sound field, i.e., one or more of the reduced foreground V[k] vectors 55 in this example. The quantization unit 52 may perform any of the following quantization modes, as indicated by a quantization mode syntax element denoted "NbitsQ":

NbitsQ value: Type of quantization mode
0-3: Reserved
4: Vector quantization
5: Scalar quantization without Huffman coding
6: 6-bit scalar quantization with Huffman coding
7: 7-bit scalar quantization with Huffman coding
8: 8-bit scalar quantization with Huffman coding
...: ...
16: 16-bit scalar quantization with Huffman coding
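
The table can be read as a simple lookup from NbitsQ to mode, shown in the sketch below. The second function is an assumption: a generic uniform mid-tread scalar quantizer on [-1, 1] with a configurable bit depth. The codec's actual scalar quantizer and its Huffman tables are not reproduced, and the Huffman entropy-coding stage is omitted. Both function names are hypothetical.

```python
import numpy as np


def quantization_mode(nbits_q):
    """Map an NbitsQ value to the quantization mode listed in the table."""
    if 0 <= nbits_q <= 3:
        return "reserved"
    if nbits_q == 4:
        return "vector quantization"
    if nbits_q == 5:
        return "scalar quantization without Huffman coding"
    if 6 <= nbits_q <= 16:
        return f"{nbits_q}-bit scalar quantization with Huffman coding"
    raise ValueError(f"NbitsQ out of range: {nbits_q}")


def uniform_scalar_quantize(v, nbits):
    """Uniform mid-tread quantizer on [-1, 1]; returns integer indices and reconstructed values."""
    levels = 2 ** (nbits - 1) - 1                 # e.g. 127 positive levels for 8 bits
    indices = np.round(np.clip(v, -1.0, 1.0) * levels).astype(int)
    return indices, indices / levels
```

With 8 bits, every reconstructed element lies within half a step (0.5/127) of its input.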

The quantization unit 52 may also perform predicted versions of any of the foregoing quantization modes. In these versions, a difference is determined between an element of the V-vector of the previous frame and the corresponding element of the V-vector of the current frame (or, when vector quantization is performed, between their weights). The quantization unit 52 may then quantize this difference between the elements or weights of the current and previous frames, rather than the value of the element of the current frame's V-vector itself.
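
Predicted quantization can be sketched as follows, assuming a fixed uniform step size. The sketch predicts from the previous frame's reconstructed V-vector. It does not use the original V-vector, so the encoder and decoder form the same prediction, in the spirit of the quantized/dequantized vectors mentioned earlier. The function name and the step parameter are hypothetical.

```python
import numpy as np


def predictive_quantize(v_cur, v_prev_rec, step):
    """Quantize the difference between the current V-vector and the previous reconstruction."""
    residual = v_cur - v_prev_rec                  # prediction error per element
    indices = np.round(residual / step).astype(int)
    v_cur_rec = v_prev_rec + indices * step        # what a decoder would reconstruct
    return indices, v_cur_rec
```

When consecutive V-vectors change little, the indices stay small, which is what makes the difference cheaper to code than the raw element values.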

The quantization unit 52 may perform multiple forms of quantization with respect to each of the reduced foreground V[k] vectors 55 to obtain multiple coded versions of them. It may then select one of these coded versions as the coded foreground V[k] vector 57. In other words, the quantization unit 52 may select, as the output switched-quantized V-vector, one of the following, based on any combination of the criteria discussed in this disclosure:
- a non-predicted vector-quantized V-vector;
- a predicted vector-quantized V-vector;
- a non-Huffman-coded scalar-quantized V-vector;
- a Huffman-coded scalar-quantized V-vector.

In some examples, the quantization unit 52 may select a quantization mode from a set of quantization modes that includes a vector quantization mode and one or more scalar quantization modes. It may then quantize an input V-vector based on (or according to) the selected mode. The quantization unit 52 may then provide the selected version to the bitstream generation unit 42 as the coded foreground V[k] vectors 57. Depending on the mode, the selected version is expressed as follows:
- a non-predicted vector-quantized V-vector, e.g., in terms of weight values or bits indicative thereof;
- a predicted vector-quantized V-vector, e.g., in terms of error values or bits indicative thereof;
- a non-Huffman-coded scalar-quantized V-vector;
- a Huffman-coded scalar-quantized V-vector.

The quantization unit 52 may also provide the syntax elements indicative of the quantization mode (e.g., the NbitsQ syntax element) and any other syntax elements used to dequantize or otherwise reconstruct the V-vector.
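
The text leaves the selection criteria open. One common choice, assumed here purely for illustration, is a rate-distortion cost: bits spent plus a weight times squared reconstruction error. Each candidate is a (mode name, bit count, reconstructed vector) tuple. Both the cost function and the name select_coded_version are hypothetical.

```python
import numpy as np


def select_coded_version(v, candidates, lam):
    """Pick the candidate minimizing bits + lam * squared error.

    candidates: list of (name, bits, reconstructed_vector) tuples.
    """
    def cost(candidate):
        _, bits, rec = candidate
        return bits + lam * float(np.sum((v - rec) ** 2))
    return min(candidates, key=cost)[0]
```

A small weight favors the cheapest coded version, while a large weight favors the most accurate one.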

The psychoacoustic audio coder unit 40 included within the audio encoding device 20 may represent multiple instances of a psychoacoustic audio coder. Each instance encodes a different audio object or HOA channel from the energy-compensated ambient HOA coefficients 47' and the interpolated nFG signals 49', producing encoded ambient HOA coefficients 59 and encoded nFG signals 61. The psychoacoustic audio coder unit 40 may output the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 to the bitstream generation unit 42.

The bitstream generation unit 42 included within the audio encoding device 20 represents a unit that formats data to conform to a known format (which may refer to a format known to a decoding device), thereby generating the vector-based bitstream 21. In other words, the bitstream 21 may represent encoded audio data, encoded in the manner described above. In some examples, the bitstream generation unit 42 may represent a multiplexer. This multiplexer may receive the coded foreground V[k] vectors 57, the encoded ambient HOA coefficients 59, the encoded nFG signals 61 and the background channel information 43, and generate the bitstream 21 based on them. In this way, the bitstream generation unit 42 may specify the vectors 57 in the bitstream 21 to obtain the bitstream 21. The bitstream 21 may include a primary or main bitstream and one or more side channel bitstreams.

Various aspects of the techniques may also enable the bitstream generation unit 42, as described above, to specify the audio rendering information 2 in the bitstream 21. The current version of the upcoming 3D audio compression working draft provides for signaling specific downmix matrices within the bitstream 21. It does not, however, provide for specifying the renderers used to render the HOA coefficients 11 in the bitstream 21. For HOA content, the equivalent of such a downmix matrix is a rendering matrix that converts the HOA representation into the desired loudspeaker feeds. Various aspects of the techniques described in this disclosure propose to further harmonize the feature sets of HOA and channel content. They do so by enabling the bitstream generation unit 42 to signal HOA rendering matrices within the bitstream (e.g., as the audio rendering information 2).
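
Applying a rendering matrix is a single matrix product, as the sketch below shows. It assumes the same M x (N+1)² frame layout as in the earlier sketches and an L x (N+1)² rendering matrix for L loudspeakers. Under those assumptions, the speaker feeds are the M x L product of the frame and the transposed matrix. The function name render_hoa is hypothetical, and the sketch does not reflect how the matrix is coded in the bitstream.

```python
import numpy as np


def render_hoa(rendering_matrix, hoa_frame):
    """Render an M x (N+1)^2 HOA frame to M x L speaker feeds with an L x (N+1)^2 matrix."""
    if rendering_matrix.shape[1] != hoa_frame.shape[1]:
        raise ValueError("rendering matrix width must equal the number of HOA coefficients")
    return hoa_frame @ rendering_matrix.T   # each output column is one loudspeaker feed
```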

One example signaling solution, based on the coding scheme for downmix matrices and optimized for HOA, is presented below. As with downmix matrices, HOA rendering matrices may be signaled within mpegh3daConfigExtension(). The techniques may provide a new extension type, ID_CONFIG_EXT_HOA_MATRIX, as set forth in the following tables (italics and bold indicate changes to the existing tables).

Table - Syntax of mpegh3daConfigExtension() (Table 13 in the CD)

Table - Values of usacConfigExtType (Table 1 in the CD)

The bit field HOARenderingMatrixSet() may be identical in structure and functionality to DownmixMatrixSet(). Instead of inputCount(audioChannelLayout), HOARenderingMatrixSet() may use the "equivalent" NumOfHoaCoeffs value computed in HOAConfig. Further, the ordering of the HOA coefficients may be fixed within the HOA decoder (see, e.g., Annex G in the CD). HOARenderingMatrixSet() therefore does not need any equivalent of inputConfig(audioChannelLayout).

Table 2 - Syntax of HOARenderingMatrixSet() (adapted from Table 15 in the CD)

Various aspects of the techniques may also enable the bitstream generation unit 42 to specify the bitstream 21 in a particular way when HOA audio data (e.g., the HOA coefficients 11 in the example of FIG. 4) are compressed using a first compression scheme. An example of such a first scheme is the decomposition compression scheme represented by the vector-based decomposition unit 27. In that case, the bitstream 21 does not include bits corresponding to a second compression scheme, such as the directional-based (or directionality-based) compression scheme represented by the direction-based decomposition unit 28. For example, the bitstream generation unit 42 may generate the bitstream 21 so that it omits the HOAPredictionInfo syntax elements or field. This field may be reserved for specifying prediction information between the directional signals of the directional-based compression scheme. Examples of the bitstream 21 generated in accordance with various aspects of the techniques described in this disclosure are shown in the examples of FIGS. 8E and 8F.

In other words, prediction of the directional signals may be part of the predominant sound synthesis employed by direction-based decomposition unit 28, and it depends on the presence of ChannelType 0 (which may represent a direction-based signal). When no direction-based signal is present in a frame, no prediction of the directional signals may be performed. However, the associated side information HOAPredictionInfo() may be written to every frame, regardless of whether direction-based signals are present and even when it is not used. When no directional signal is present in a frame, the techniques described in this disclosure may enable bitstream generation unit 42 to reduce the size of the side information by not signaling HOAPredictionInfo in it, as presented in the following table (underlined italics indicate additions):

Table: Syntax of HOAFrame
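As a minimal sketch of the conditional signaling described above, the following Python function decides which side-information fields an encoder would write for one frame. It assumes, as the text states, that ChannelType 0 marks a direction-based signal; the function name and the list-of-fields frame representation are hypothetical, and all other HOAFrame fields are omitted.

```python
# Hypothetical sketch: only the HOAPredictionInfo condition is modeled.
DIRECTIONAL_CHANNEL_TYPE = 0  # ChannelType 0: direction-based signal


def build_hoa_frame_fields(channel_types, prediction_info):
    """Return the ordered side-information fields for one HOA frame.

    HOAPredictionInfo is emitted only when at least one transport channel
    carries a direction-based signal; otherwise its bits are omitted.
    """
    fields = [("ChannelTypes", list(channel_types))]
    if any(ct == DIRECTIONAL_CHANNEL_TYPE for ct in channel_types):
        fields.append(("HOAPredictionInfo", prediction_info))
    return fields


# A frame with only other ChannelType values carries no prediction info.
vector_only = build_hoa_frame_fields([1, 1, 2], prediction_info=b"\x01")
print([name for name, _ in vector_only])
```

The design point is simply that the presence test is made per frame, so frames coded purely with the vector-based scheme pay nothing for the directional-prediction side information.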

In this respect, the techniques may enable a device, such as audio encoding device 20, to specify, when compressing higher order ambisonic audio data using a first compression scheme, a bitstream representing a compressed version of the higher order ambisonic audio data that does not include bits corresponding to a second compression scheme also used to compress the higher order ambisonic audio data.

In some cases, the first compression scheme comprises a vector-based decomposition compression scheme. In these and other cases, the vector-based decomposition compression scheme comprises a compression scheme involving application of a singular value decomposition (or equivalents thereof, described in more detail in this disclosure) to the higher order ambisonic audio data.

In these and other cases, audio encoding device 20 may be configured to specify a bitstream that does not include bits corresponding to at least one syntax element used to perform the second type of compression scheme. The second compression scheme, as noted above, may comprise a directionality-based compression scheme.

Audio encoding device 20 may also be configured to specify bitstream 21 such that bitstream 21 does not include bits corresponding to the HOAPredictionInfo syntax element of the second compression scheme.

When the second compression scheme comprises a directionality-based compression scheme, audio encoding device 20 may be configured to specify bitstream 21 such that bitstream 21 does not include bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme. In other words, audio encoding device 20 may be configured to specify bitstream 21 such that it does not include bits corresponding to at least one syntax element used to perform the second type of compression scheme, where the at least one syntax element indicates prediction between two or more directional-based signals. Restated once more, when the second compression scheme comprises a directionality-based compression scheme, audio encoding device 20 may be configured to specify bitstream 21 such that it does not include bits corresponding to the HOAPredictionInfo syntax element of the directionality-based compression scheme, where the HOAPredictionInfo syntax element indicates prediction between two or more directional-based signals.

Various aspects of the techniques may further enable bitstream generation unit 46 to specify bitstream 21 such that, in certain cases, bitstream 21 does not include gain correction data. Bitstream generation unit 46 may specify bitstream 21 such that, when gain correction is suppressed, bitstream 21 does not include the gain correction data. Examples of bitstream 21 generated in accordance with various aspects of the techniques are shown, as noted above, in the examples of FIGS. 8E and 8F.

In some cases, gain correction is applied when certain types of psychoacoustic encoding are performed, because those types have a relatively smaller dynamic range than other types of psychoacoustic encoding. For example, AAC has a relatively smaller dynamic range than unified speech and audio coding (USAC). When a compression scheme (such as a vector-based synthesis compression scheme or a directionality-based compression scheme) involves USAC, bitstream generation unit 46 may signal in bitstream 21 that gain correction has been suppressed (e.g., by specifying the syntax element MaxGainCorrAmpExp in HOAConfig with a value of 0 in bitstream 21) and may then specify bitstream 21 so that it does not include gain correction data (in the HOAGainCorrectionData() field).

In other words, the bit field MaxGainCorrAmpExp in HOAConfig (see Table 71 in the CD) may control the extent to which the automatic gain control module affects the transport channel signals prior to USAC core coding. In some cases, this module was developed for RM0 to compensate for the non-ideal dynamic range of the available AAC encoder implementation. With the change from AAC to the USAC core coder during the integration phase, the dynamic range of the core encoder improved, so the need for this gain control module may no longer be as important as before.

In some cases, the gain control functionality can be suppressed when MaxGainCorrAmpExp is set to 0. In these cases, the associated side information HOAGainCorrectionData() may not be written to every HOA frame, per the table above illustrating the "Syntax of HOAFrame". For configurations in which MaxGainCorrAmpExp is set to 0, the techniques described in this disclosure may refrain from signaling HOAGainCorrectionData. Additionally, in such a scenario the inverse gain control module may even be bypassed, reducing decoder complexity by approximately 0.05 MOPS per transport channel without any adverse side effects.
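The decoder-side behavior described above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and a single scalar gain per transport channel stands in for the actual gain-correction data, whose format is defined by the HOAGainCorrectionData() syntax.

```python
def read_gain_side_info(max_gain_corr_amp_exp, num_transport_channels, read_gain):
    """Return per-channel gain data, or None when gain control is disabled.

    MaxGainCorrAmpExp == 0 means no HOAGainCorrectionData() is present in
    the HOA frame, so nothing is read from the bitstream.
    """
    if max_gain_corr_amp_exp == 0:
        return None
    return [read_gain(ch) for ch in range(num_transport_channels)]


def apply_inverse_gain_control(signals, gain_data):
    """Undo a (simplified) per-channel encoder gain, or bypass the stage."""
    if gain_data is None:
        return signals  # bypass: transport channels pass through unchanged
    return [[sample / gain for sample in channel]
            for channel, gain in zip(signals, gain_data)]
```

In the bypass path the inverse gain stage does no arithmetic at all, which is where the complexity saving mentioned above would come from.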

In this respect, the techniques may configure audio encoding device 20 to specify bitstream 21, representing a compressed version of higher order ambisonic audio data, such that bitstream 21 does not include gain correction data when gain correction is suppressed during compression of the higher order ambisonic audio data.

In these and other cases, audio encoding device 20 may be configured to compress the higher order ambisonic audio data according to a vector-based decomposition compression scheme to generate the compressed version of the higher order ambisonic audio data. An example decomposition compression scheme may involve applying a singular value decomposition (or equivalents thereof, described in more detail above) to the higher order ambisonic audio data to generate the compressed version of the higher order ambisonic audio data.

In these and other cases, audio encoding device 20 may be configured to specify the MaxGainCorrAmpExp syntax element in bitstream 21 as 0 to indicate that gain correction is suppressed. In some cases, audio encoding device 20 may be configured to specify bitstream 21 such that, when gain correction is suppressed, bitstream 21 does not include the HOAGainCorrectionData field that stores the gain correction data. In other words, audio encoding device 20 may be configured to specify the MaxGainCorrAmpExp syntax element in bitstream 21 as 0 to indicate that gain correction is suppressed, and to omit from the bitstream the HOAGainCorrectionData field that stores the gain correction data.

In these and other cases, audio encoding device 20 may be configured to suppress gain correction when compression of the higher order ambisonic audio data includes application of unified speech and audio coding (USAC) to the higher order ambisonic audio data.
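A possible encoder-side policy for the preceding paragraph is sketched below. The function names are hypothetical, and the non-zero default exponent is an arbitrary placeholder rather than a value taken from any specification.

```python
def choose_max_gain_corr_amp_exp(core_coder, default_exp=3):
    """Return the MaxGainCorrAmpExp value to write into HOAConfig.

    0 signals that gain correction is suppressed; this sketch chooses it
    whenever the core coder is USAC. default_exp is a placeholder value.
    """
    return 0 if core_coder == "USAC" else default_exp


def frame_carries_gain_correction(max_gain_corr_amp_exp):
    """HOAGainCorrectionData() is written into each HOAFrame only if non-zero."""
    return max_gain_corr_amp_exp != 0
```

Because the decision is made once in HOAConfig, every subsequent HOAFrame can omit the gain-correction field without any per-frame signaling.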

The potential optimizations described above for signaling various information in bitstream 21 may be adapted or otherwise updated in the manner described in more detail below. These updates may be applied together with other updates discussed below, or may be used to update only various aspects of the optimizations discussed above. As such, every potential combination of updates to the above-described optimizations is contemplated, including application of a single update described below to the above-described optimizations, or any particular combination of the updates described below.

To specify the matrix in the bitstream, bitstream generation unit 42 may specify ID_CONFIG_EXT_HOA_MATRIX in mpegh3daConfigExtension() of bitstream 21, for example as shown in bold and highlighted in the following table. The following table represents the syntax for specifying the mpegh3daConfigExtension() portion of bitstream 21:

Table - Syntax of mpegh3daConfigExtension()

In the preceding table, ID_CONFIG_EXT_HOA_MATRIX provides a container for specifying the rendering matrix; this container is denoted "HOARenderingMatrixSet()".

The contents of the HOARenderingMatrixSet() container may be defined according to the syntax presented in the following table:

Table - Syntax of HOARenderingMatrixSet()

As shown in the table immediately above, HOARenderingMatrixSet() includes a number of different syntax elements, including numHoaRenderingMatrices, HoaRenderingMatrixId, CICPspeakerLayoutIdx, HoaMatrixLenBits, and HoARenderingMatrix.

The numHoaRenderingMatrices syntax element may specify the number of HoaRenderingMatrixId definitions present in the bitstream element. The HoaRenderingMatrixId syntax element may represent a field that uniquely defines an Id for a default HOA rendering matrix available at the decoder side or for a transmitted HOA rendering matrix. In this respect, HoaRenderingMatrixId may represent an example of a signal value comprising two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds, or an example of a signal value comprising two or more bits that define an index associated with one of a plurality of matrices used to render the spherical harmonic coefficients to a plurality of speaker feeds. The CICPspeakerLayoutIdx syntax element may represent a value describing the output loudspeaker layout for a given HOA rendering matrix, and may correspond to the ChannelConfiguration element defined in ISO/IEC 23001-8. The HoaMatrixLenBits syntax element (which may also be denoted "HoARenderingMatrixLenBits") may specify the length, in bits, of the following bitstream element (e.g., the HoARenderingMatrix() container).
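To illustrate how a decoder might walk the HOARenderingMatrixSet() fields described above, here is a minimal sketch with an MSB-first bit reader. The field widths in WIDTHS are hypothetical placeholders (the actual widths are defined in the syntax table, which is not reproduced here); the point of the sketch is that HoaMatrixLenBits lets a parser skip a HoARenderingMatrix() payload it does not need.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            bit = (self.data[self.pos // 8] >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


class BitWriter:
    """Companion MSB-first writer, used here only to build test input."""

    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        padded = self.bits + [0] * (-len(self.bits) % 8)
        return bytes(int("".join(map(str, padded[i:i + 8])), 2)
                     for i in range(0, len(padded), 8))


# Hypothetical field widths; the real widths come from the syntax table.
WIDTHS = {"numHoaRenderingMatrices": 5, "HoaRenderingMatrixId": 7,
          "CICPspeakerLayoutIdx": 6, "HoaMatrixLenBits": 16}


def parse_hoa_rendering_matrix_set(reader):
    """Read the per-matrix header fields and skip each matrix payload."""
    count = reader.read(WIDTHS["numHoaRenderingMatrices"])
    entries = []
    for _ in range(count):
        matrix_id = reader.read(WIDTHS["HoaRenderingMatrixId"])
        layout_idx = reader.read(WIDTHS["CICPspeakerLayoutIdx"])
        length_bits = reader.read(WIDTHS["HoaMatrixLenBits"])
        start = reader.pos
        reader.pos += length_bits  # skip HoARenderingMatrix() using its length
        entries.append({"id": matrix_id, "layout": layout_idx,
                        "payload_bits": (start, length_bits)})
    return entries
```

A decoder that already holds a default matrix for a given Id could use the recorded (start, length) pair to ignore the transmitted payload entirely.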

The HoARenderingMatrix() container includes NumOfHoaCoeffs, followed by the outputConfig() container and the outputCount() container. The outputConfig() container may include channel configuration vectors that specify information about each loudspeaker. Bitstream generation unit 42 may assume that this loudspeaker information is known from the channel configurations of the output layout. Each entry outputConfig[i] may represent a data structure having the following members:

AzimuthAngle (may indicate the absolute value of the speaker azimuth angle);

AzimuthDirection (may indicate the azimuth direction, using, as one example, 0 for left and 1 for right);

ElevationAngle (may indicate the absolute value of the speaker elevation angle);

ElevationDirection (may indicate the elevation direction, using, as one example, 0 for up and 1 for down); and

isLFE (may indicate whether the speaker is a low-frequency effects (LFE) speaker).

In some cases, bitstream generation unit 42 may invoke a helper function denoted "findSymmetricSpeakers", which may further specify:

pairType (which may store the value SYMMETRIC, which in some examples denotes a symmetric pair of two speakers; CENTER; or ASYMMETRIC); and

symmetricPair->originalPosition (for SYMMETRIC groups only; may indicate the position, in the original channel configuration, of the second (e.g., right) speaker in the group).
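The members of outputConfig[i] and the outputs of findSymmetricSpeakers can be sketched as below. This is a hypothetical reimplementation under simple assumptions (exact angle matching, azimuths of 0 and 180 degrees treated as CENTER, LFE speakers skipped unless LFE rendering is enabled); it is not the normative helper function.

```python
from dataclasses import dataclass


@dataclass
class SpeakerConfig:
    """One outputConfig[i] entry, following the member list above."""
    azimuth_angle: float      # AzimuthAngle: absolute azimuth, degrees
    azimuth_direction: int    # AzimuthDirection: 0 = left, 1 = right
    elevation_angle: float    # ElevationAngle: absolute elevation, degrees
    elevation_direction: int  # ElevationDirection: 0 = up, 1 = down
    is_lfe: bool              # isLFE


def _mirrors(a, b):
    """True if b is the left/right mirror image of a."""
    return (a.is_lfe == b.is_lfe
            and a.azimuth_angle == b.azimuth_angle
            and a.azimuth_direction != b.azimuth_direction
            and a.elevation_angle == b.elevation_angle
            and a.elevation_direction == b.elevation_direction)


def find_symmetric_speakers(output_config, has_lfe_rendering=False):
    """Hypothetical stand-in for findSymmetricSpeakers.

    Returns (num_pairs, groups); each group is (pairType, first, second).
    For SYMMETRIC groups, second is the original position of the right
    speaker (symmetricPair->originalPosition); otherwise it is None.
    """
    groups, used = [], set()
    for i, spk in enumerate(output_config):
        if i in used or (spk.is_lfe and not has_lfe_rendering):
            continue
        used.add(i)
        if spk.azimuth_angle in (0, 180):
            groups.append(("CENTER", i, None))
            continue
        partner = next((j for j in range(i + 1, len(output_config))
                        if j not in used and _mirrors(spk, output_config[j])), None)
        if partner is None:
            groups.append(("ASYMMETRIC", i, None))
            continue
        used.add(partner)
        left, right = (i, partner) if spk.azimuth_direction == 0 else (partner, i)
        groups.append(("SYMMETRIC", left, right))
    num_pairs = sum(1 for g in groups if g[0] == "SYMMETRIC")
    return num_pairs, groups
```

For a 5.0 layout (C, L, R, Ls, Rs), this sketch would report two SYMMETRIC pairs and one CENTER group, so numPairs would be 2.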

The outputCount() container may specify the number of loudspeakers for which the HOA rendering matrix is defined.

Bitstream generation unit 42 may specify the HoARenderingMatrix() container according to the syntax presented in the following table:

Table - Syntax of HoARenderingMatrix()

As shown in the table immediately above, the numPairs syntax element is set to the value output by calling the findSymmetricSpeakers helper function with outputCount, outputConfig, and hasLfeRendering as inputs. numPairs may therefore indicate the number of symmetric loudspeaker pairs identified in the output loudspeaker setup that may be considered for efficient symmetry coding. The precisionLevel syntax element in the above table may indicate the precision used for uniform quantization of the gains, according to the following table:

Table - Uniform quantization step size of hoaGain as a function of precisionLevel
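Because the step-size table itself is not reproduced here, the following sketch takes the step size as a parameter instead of looking it up from precisionLevel. It assumes, for illustration only, that gain magnitudes are quantized in dB within the [minGain, maxGain] range and that signs are carried separately; the function names are hypothetical.

```python
import math


def quantize_gain_db(gain_db, min_gain_db, max_gain_db, step_db):
    """Uniformly quantize a gain in dB to an integer index within [min, max]."""
    clipped = min(max(gain_db, min_gain_db), max_gain_db)
    return round((clipped - min_gain_db) / step_db)


def dequantize_gain_db(index, min_gain_db, step_db):
    """Map a quantization index back to a gain in dB."""
    return min_gain_db + index * step_db


def index_bits(min_gain_db, max_gain_db, step_db):
    """Bits needed to code any index in the range at the given step size."""
    levels = int(math.floor((max_gain_db - min_gain_db) / step_db)) + 1
    return max(1, math.ceil(math.log2(levels)))
```

Signaling minGain and maxGain (per order or for the whole matrix, as gainLimitPerHoaOrder selects) narrows the range and therefore the number of index bits per gain at a given precision.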

The gainLimitPerHoaOrder syntax element shown in the above table presenting the syntax of HoARenderingMatrix() may represent a flag indicating whether maxGain and minGain are specified individually for each order or for the entire HOA rendering matrix. The maxGain[i] syntax elements may specify, in decibels (dB) as one example, the maximum actual gain in the matrix for the coefficients of the indicated HOA order i. The minGain[i] syntax elements may specify, again in dB as one example, the minimum actual gain in the matrix for the coefficients of the indicated HOA order i. The isFullMatrix syntax element may represent a flag indicating whether the HOA rendering matrix is sparse or full. The firstSparseOrder syntax element may specify the first HOA order that is sparsely coded, when the isFullMatrix syntax element specifies that the HOA rendering matrix is sparse. The isHoaCoefSparse syntax element may represent a bitmask vector derived from the firstSparseOrder syntax element. The lfeExists syntax element may represent a flag indicating whether one or more LFEs are present in outputConfig.
The hasLfeRendering syntax element indicates whether the rendering matrix includes non-zero elements for one or more LFE channels. The zerothOrderAlwaysPositive syntax element may represent a flag indicating whether the 0th HOA order has only positive values.
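The relationship between NumOfHoaCoeffs, firstSparseOrder, and the isHoaCoefSparse bitmask can be sketched as follows, assuming ACN coefficient ordering, in which coefficient index k belongs to order n = floor(sqrt(k)). The function names are hypothetical, and the exact derivation used by the syntax table may differ.

```python
import math


def num_hoa_coeffs(hoa_order):
    """NumOfHoaCoeffs for a full HOA representation of order N: (N + 1)^2."""
    return (hoa_order + 1) ** 2


def hoa_order_of(coeff_index):
    """Order n of the coefficient at 0-based ACN index k (k = n*n + n + m)."""
    return math.isqrt(coeff_index)


def is_hoa_coef_sparse(num_coeffs, is_full_matrix, first_sparse_order):
    """Per-coefficient bitmask: True where the coefficient is sparsely coded."""
    if is_full_matrix:
        return [False] * num_coeffs
    return [hoa_order_of(k) >= first_sparse_order for k in range(num_coeffs)]
```

For a first-order matrix with firstSparseOrder = 1, only the omnidirectional coefficient would be coded densely and the three first-order coefficients sparsely.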

The isAllValueSymmetric syntax element may represent a flag indicating whether all symmetric loudspeaker pairs have the same absolute values in the HOA rendering matrix. The isAnyValueSymmetric syntax element represents a flag that, for example when isAllValueSymmetric is false, indicates whether some of the symmetric loudspeaker pairs have the same absolute values in the HOA rendering matrix. The valueSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs with value symmetry. The isValueSymmetric syntax element may represent a bitmask derived from the valueSymmetricPairs syntax element in the manner shown in Table 3. The isAllSignSymmetric syntax element may indicate, when there are no value symmetries in the matrix, whether all symmetric loudspeaker pairs have at least number-sign symmetry. The isAnySignSymmetric syntax element may represent a flag indicating whether at least some symmetric loudspeaker pairs have number-sign symmetry.
The signSymmetricPairs syntax element may represent a bitmask of length numPairs indicating the loudspeaker pairs with sign symmetry. The isSignSymmetric variable may represent a bitmask derived from the signSymmetricPairs syntax element in the manner shown above in the table presenting the syntax of HoARenderingMatrix(). The hasVerticalCoef syntax element may represent a flag indicating whether the matrix is a horizontal-only HOA rendering matrix. The bootVal syntax element may represent a variable used in the decoding loop.
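The flag hierarchy for value symmetry (isAllValueSymmetric overriding isAnyValueSymmetric, which in turn gates the coded valueSymmetricPairs mask) and one way of expanding a per-pair mask into a per-loudspeaker bitmask are sketched below. The expansion rule, which marks the second speaker of each flagged pair, is a hypothetical stand-in for the derivation shown in Table 3; the same pattern would apply to the sign-symmetry elements.

```python
def resolve_pair_mask(is_all, is_any, coded_mask, num_pairs):
    """Per-pair mask implied by the isAll*/isAny* flags and the coded bitmask."""
    if is_all:
        return [True] * num_pairs   # every pair is symmetric; no mask coded
    if not is_any:
        return [False] * num_pairs  # no pair is symmetric; no mask coded
    return list(coded_mask)         # mask of length numPairs is coded


def expand_pair_mask(pair_mask, pairs, num_outputs):
    """Hypothetical per-loudspeaker bitmask: mark the second speaker of each
    flagged pair, whose row can be derived from its partner's row."""
    per_speaker = [False] * num_outputs
    for flagged, (_, second) in zip(pair_mask, pairs):
        if flagged:
            per_speaker[second] = True
    return per_speaker
```

Coding the two summary flags first means the length-numPairs mask is only transmitted in the mixed case.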

In other words, bitstream generation unit 42 may analyze audio renderer 1 to generate any one or more of the above value symmetry information (e.g., any combination of one or more of the isAllValueSymmetric, isAnyValueSymmetric, valueSymmetricPairs, and isValueSymmetric syntax elements), or to otherwise obtain the value symmetry information. Bitstream generation unit 42 may specify audio renderer information 2 in bitstream 21 in the manner shown above, such that audio renderer information 2 includes the value symmetry information.

Furthermore, bitstream generation unit 42 may also analyze audio renderer 1 to generate any one or more of the above sign symmetry information (e.g., any combination of one or more of the isAllSignSymmetric, isAnySignSymmetric, signSymmetricPairs, and isSignSymmetric syntax elements), or to otherwise obtain the sign symmetry information. Bitstream generation unit 42 may specify audio renderer information 2 in bitstream 21 in the manner shown above, such that audio renderer information 2 includes the sign symmetry information.

When determining the value symmetry information and the sign symmetry information, bitstream generation unit 42 may analyze various values of audio renderer 1, which may be specified as a matrix. The rendering matrix may be formulated as the pseudo-inverse of a matrix R. In other words, to render (N+1)² HOA channels (denoted Z below) to L loudspeaker signals (denoted by the column vector p of L loudspeaker signals), the following equation may be given:

$$Z = R\,p.$$

To arrive at a rendering matrix that outputs the L loudspeaker signals, the inverse of the R matrix is multiplied by the Z HOA channels, as shown in the following equation:

$$p = R^{-1} Z.$$

Unless the number L of loudspeaker channels equals the number (N+1)² of HOA channels in Z, the matrix R is not square, and a true inverse cannot be determined. As a result, the pseudo-inverse may be used instead, defined as follows:

$$\operatorname{pinv}(R) = R^{T}\left(R R^{T}\right)^{-1},$$

where $R^{T}$ denotes the transpose of the R matrix. Replacing $R^{-1}$ in the equation above with this pseudo-inverse, the solution for the L loudspeaker signals, denoted by the column vector p, may be expressed mathematically as follows:

$$p = \operatorname{pinv}(R)\,Z = R^{T}\left(R R^{T}\right)^{-1} Z.$$
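The pseudo-inverse formula can be checked numerically with a small pure-Python sketch. It assumes R has full row rank, which requires L ≥ (N+1)², so that R R^T is invertible; the helper names are illustrative, and a production renderer would normally use a numerical library rather than this Gauss-Jordan helper.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (a must be square, nonsingular)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def pinv_wide(r):
    """pinv(R) = R^T (R R^T)^-1 for an (N+1)^2 x L matrix R of full row rank."""
    rt = transpose(r)
    return matmul(rt, inverse(matmul(r, rt)))


def render(r, z):
    """Loudspeaker signals p = pinv(R) Z for one vector Z of HOA samples."""
    d = pinv_wide(r)
    return [sum(d[i][k] * z[k] for k in range(len(z))) for i in range(len(d))]
```

Since R pinv(R) = I for such an R, substituting p back into Z = R p reproduces the original HOA vector.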

The entries of the R matrix are the values of the spherical harmonics at the loudspeaker positions, with (N+1)² rows for the different spherical harmonics and L columns for the speakers. Bitstream generation unit 42 may determine loudspeaker pairs based on the values for the speakers. By analyzing the values of the spherical harmonics at the loudspeaker positions, bitstream generation unit 42 may determine which of the loudspeaker positions form pairs (e.g., because the members of a pair may have similar, nearly identical, or identical values but with opposite signs).
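To illustrate how pairs can be found from the values of R, the sketch below builds the first-order columns of R for three loudspeakers and compares two columns entry by entry. It assumes real first-order spherical harmonics in ACN order (W, Y, Z, X) with SN3D normalization and azimuth measured counterclockwise, so that positive azimuths lie on the left; these conventions and the function names are assumptions, not taken from the text.

```python
import math


def first_order_sh(azimuth_deg, elevation_deg):
    """Real first-order spherical harmonics (ACN order W, Y, Z, X; SN3D)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return [1.0,
            math.sin(az) * math.cos(el),
            math.sin(el),
            math.cos(az) * math.cos(el)]


def build_r(positions):
    """R: one row per spherical harmonic, one column per loudspeaker."""
    columns = [first_order_sh(az, el) for az, el in positions]
    return [list(row) for row in zip(*columns)]


def mirror_relation(r, a, b, tol=1e-9):
    """None if columns a and b differ in magnitude anywhere; otherwise a
    per-row list of +1 (equal entries) or -1 (opposite-sign entries)."""
    relation = []
    for row in r:
        if abs(abs(row[a]) - abs(row[b])) > tol:
            return None
        opposite = abs(row[a] + row[b]) <= tol and abs(row[a]) > tol
        relation.append(-1 if opposite else 1)
    return relation
```

For speakers at +30 and -30 degrees, only the Y row changes sign, so the two columns qualify as a pair; a speaker at 110 degrees does not match either of them.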

After identifying the pairs, bitstream generation unit 42 may determine, for each pair, whether the speakers of the pair have the same or nearly the same value. When all of the pairs have the same value, bitstream generation unit 42 may set the isAllValueSymmetric syntax element to 1. When not all of the pairs have the same value, bitstream generation unit 42 may set the isAllValueSymmetric syntax element to 0. When one or more, but not all, of the pairs have the same value, bitstream generation unit 42 may set the isAnyValueSymmetric syntax element to 1. When none of the pairs has the same value, bitstream generation unit 42 may set the isAnyValueSymmetric syntax element to 0. For pairs having symmetric values, bitstream generation unit 42 may specify only one value, rather than two separate values, for the pair of speakers, thereby reducing the number of bits used to represent audio renderer information 2 (e.g., the matrix in this example) in bitstream 21.
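The value-symmetry decisions above can be sketched as follows, for a rendering matrix stored with one row per loudspeaker and one column per HOA coefficient. Value symmetry is tested as equality of absolute values within a tolerance, following the definition of isAllValueSymmetric; the tolerance, the matrix orientation, and the function names are hypothetical.

```python
def value_symmetry_flags(d, pairs, tol=1e-6):
    """Return (isAllValueSymmetric, isAnyValueSymmetric, valueSymmetricPairs).

    d: rendering matrix, one row per loudspeaker; pairs: (left, right) rows.
    """
    mask = [all(abs(abs(x) - abs(y)) <= tol for x, y in zip(d[a], d[b]))
            for a, b in pairs]
    is_all = bool(mask) and all(mask)
    is_any = any(mask)
    return is_all, is_any, mask


def num_coded_magnitudes(d, mask):
    """Magnitudes left to transmit when one row of each value-symmetric
    pair is inferred from its partner."""
    return len(d) * len(d[0]) - sum(mask) * len(d[0])
```

In the example below, one of two pairs is value-symmetric, so a 4 x 4 matrix needs 12 rather than 16 transmitted magnitudes.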

When there are no value symmetries among the pairs, bitstream generation unit 42 may also determine, for each pair, whether the speaker pair has sign symmetry (meaning that one speaker has a negative value while the other speaker has a positive value). When all of the pairs have sign symmetry, bitstream generation unit 42 may set the isAllSignSymmetric syntax element to 1. When not all of the pairs have sign symmetry, bitstream generation unit 42 may set the isAllSignSymmetric syntax element to 0. When one or more, but not all, of the pairs have sign symmetry, bitstream generation unit 42 may set the isAnySignSymmetric syntax element to 1. When none of the pairs has sign symmetry, bitstream generation unit 42 may set the isAnySignSymmetric syntax element to 0. For pairs having symmetric signs, bitstream generation unit 42 may specify only one sign, or no sign at all, for the speaker pair rather than two separate signs, thereby reducing the number of bits used to represent audio renderer information 2 (e.g., the matrix in this example) in bitstream 21.
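Sign symmetry can be sketched in the same way. Here, as an interpretive assumption, a pair is treated as sign-symmetric when each entry of the right speaker's row has the sign predicted by mirroring the left speaker's row: sine-type harmonics (m < 0 in ACN order) flip sign under left/right mirroring, and all others keep it. Pairs already covered by value symmetry are skipped, following the order of checks described above; the function names are hypothetical.

```python
import math


def lr_parity(num_coeffs):
    """+1/-1 per ACN coefficient under left/right mirroring (azimuth -> -azimuth)."""
    parity = []
    for k in range(num_coeffs):
        n = math.isqrt(k)
        m = k - n * n - n  # ACN: k = n*n + n + m
        parity.append(-1 if m < 0 else 1)
    return parity


def _sign(x, tol):
    return 0 if abs(x) <= tol else (1 if x > 0 else -1)


def sign_symmetry_flags(d, pairs, value_mask, tol=1e-6):
    """Return (isAllSignSymmetric, isAnySignSymmetric, signSymmetricPairs)."""
    par = lr_parity(len(d[0]))
    mask = []
    for (a, b), value_sym in zip(pairs, value_mask):
        if value_sym:
            mask.append(False)  # already coded through value symmetry
            continue
        mask.append(all(_sign(d[b][k], tol) == par[k] * _sign(d[a][k], tol)
                        for k in range(len(par))))
    is_all = bool(mask) and all(mask)
    is_any = any(mask)
    return is_all, is_any, mask
```

For a sign-symmetric pair, only the magnitudes of the second row need to be sent; its signs follow from the first row and the parity pattern.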

The bitstream generation unit 42 may specify the DecodeHoaMatrixData() container, shown in the table presenting the syntax of HoARenderingMatrix(), according to the syntax shown in the following table:

Table - Syntax of DecodeHoaMatrixData

In the foregoing table presenting the syntax of DecodeHoaMatrixData, the hasValue syntax element may represent a flag indicating whether a matrix element is sparsely coded. As one example, the signMatrix syntax element may represent a matrix holding the sign values of the HOA rendering matrix in linearized vector form. As one example, the hoaMatrix syntax element may represent the HOA rendering matrix values in linearized vector form. The bitstream generation unit 42 may specify the DecodeHoaGainValue() container, shown in the table presenting the syntax of DecodeHoaMatrixData, according to the syntax shown in the following table:

Table - Syntax of DecodeHoaGainValue

The bitstream generation unit 42 may specify the readRange() container, shown in the table presenting the syntax of DecodeHoaGainValue, according to the syntax specified in the following table:

Table 7 - Syntax of ReadRange
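Table 7 itself is not reproduced in this text. The sketch below assumes that readRange() decodes an index uniformly over its alphabet using truncated binary coding, a common way to do so when the alphabet size is not a power of two. This is an assumption, not the confirmed syntax. `BitReader` and `write_range` are hypothetical helpers added only to exercise the decoder.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a string of '0'/'1' characters."""

    def __init__(self, bits: str):
        self.bits, self.pos = bits, 0

    def read(self, n: int) -> int:
        if n == 0:
            return 0
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) != n:
            raise EOFError("not enough bits")
        self.pos += n
        return int(chunk, 2)


def read_range(reader: BitReader, alphabet_size: int) -> int:
    """Decode a value in [0, alphabet_size) coded with truncated binary."""
    n_bits = alphabet_size.bit_length() - 1          # floor(log2(alphabet_size))
    n_short = (1 << (n_bits + 1)) - alphabet_size    # values that use only n_bits bits
    value = reader.read(n_bits)
    if value >= n_short:                             # long codeword: one more bit
        value = 2 * value + reader.read(1) - n_short
    return value


def write_range(value: int, alphabet_size: int) -> str:
    """Inverse of read_range, used here only to test the decoder."""
    n_bits = alphabet_size.bit_length() - 1
    n_short = (1 << (n_bits + 1)) - alphabet_size
    if value < n_short:
        return format(value, f"0{n_bits}b") if n_bits else ""
    return format(value + n_short, f"0{n_bits + 1}b")


# Alphabet of 5 symbols: values 0..2 use 2 bits, values 3..4 use 3 bits.
print([write_range(v, 5) for v in range(5)])  # ['00', '01', '10', '110', '111']
```

When the alphabet size is a power of two, every value uses the same number of bits and the extra-bit branch is never taken.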

Although not shown in the example of FIG. 3, the audio encoding device 20 may also include a bitstream output unit. This unit switches the bitstream output from the audio encoding device 20 (e.g., between the directional-based bitstream 21 and the vector-based bitstream 21) based on whether the current frame is to be encoded using directional-based synthesis or vector-based synthesis. The bitstream output unit may perform the switch based on a syntax element output by the content analysis unit 26. That syntax element indicates whether directional-based synthesis was performed (as a result of detecting that the HOA coefficients 11 were generated from a synthetic audio object) or vector-based synthesis was performed (as a result of detecting that the HOA coefficients were recorded). The bitstream output unit may specify, along with the respective one of the bitstreams 21, the correct header syntax to indicate the switch or the current encoding used for the current frame.

Moreover, as noted above, the sound field analysis unit 44 may identify BG_TOT ambient HOA coefficients 47. BG_TOT may change on a frame basis, although at times it may remain constant or the same across two or more temporally adjacent frames. A change in BG_TOT may result in changes to the coefficients expressed in the reduced foreground V[k] vectors 55. A change in BG_TOT may also result in background HOA coefficients (which may also be referred to as "ambient HOA coefficients") that change on a frame basis, again with the caveat that BG_TOT may sometimes stay the same across adjacent frames. The changes often result in a change of energy for the aspects of the sound field. That change arises from adding or removing ambient HOA coefficients together with the corresponding removal of coefficients from, or addition of coefficients to, the reduced foreground V[k] vectors 55.

As a result, the sound field analysis unit 44 may further determine when the ambient HOA coefficients change from frame to frame in terms of which coefficients are used to represent the ambient components of the sound field. It may then generate a flag or other syntax element indicating the change to the ambient HOA coefficient, where the change may also be referred to as a "transition" of the ambient HOA coefficient. In particular, the coefficient reduction unit 46 may generate the flag, which may be denoted as an AmbCoeffTransition flag or an AmbCoeffIdxTransition flag. It then provides the flag to the bitstream generation unit 42 so that the flag may be included in the bitstream 21, possibly as part of side channel information.
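The transition flag can be illustrated by comparing the sets of ambient HOA coefficient indices used in consecutive frames. The function name and the representation of the ambient set as a list of coefficient indices are assumptions. The source does not specify how the sound field analysis unit 44 or the coefficient reduction unit 46 computes the change.

```python
def ambient_transitions(prev_indices, curr_indices):
    """Return (fading_in, fading_out, transition_flag) for two frames' ambient index sets."""
    prev, curr = set(prev_indices), set(curr_indices)
    fading_in = sorted(curr - prev)    # ambient coefficients added in this frame
    fading_out = sorted(prev - curr)   # ambient coefficients removed in this frame
    return fading_in, fading_out, int(bool(fading_in or fading_out))


print(ambient_transitions([0, 1, 2, 3], [0, 1, 2, 3, 5]))  # ([5], [], 1)
```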

In addition to specifying the ambient coefficient transition flag, the coefficient reduction unit 46 may also modify how the reduced foreground V[k] vectors 55 are generated. In one example, the coefficient reduction unit 46 may determine that one of the ambient HOA coefficients is in transition during the current frame. In that case, it may specify a vector coefficient (which may also be referred to as a "vector element" or "element") that corresponds to the ambient HOA coefficient in transition, for each of the V-vectors of the reduced foreground V[k] vectors 55. Again, the ambient HOA coefficient in transition may add to or remove from the BG_TOT total number of background coefficients. The resulting change in the total number of background coefficients therefore affects two things: whether the ambient HOA coefficient is included in the bitstream, and whether the corresponding element of the V-vectors is included for the V-vectors specified in the bitstream in the second and third configuration modes described above. More information on how the coefficient reduction unit 46 may specify the reduced foreground V[k] vectors 55 to overcome the changes in energy is provided in U.S. Application Serial No. 14/594,533, entitled "TRANSITIONING OF AMBIENT HIGHER_ORDER AMBISONIC COEFFICIENTS," filed January 12, 2015.

FIG. 4 is a block diagram illustrating the audio decoding device 24 of FIG. 2 in more detail. As shown in the example of FIG. 4, the audio decoding device 24 may include an extraction unit 72, a renderer reconstruction unit 81, a directional-based reconstruction unit 90, and a vector-based reconstruction unit 92. The audio decoding device 24 is described below. More information on the device and on the various aspects of decompressing or otherwise decoding HOA coefficients is available in International Patent Application Publication No. WO 2014/194099, entitled "INTERPOLATION FOR DECOMPOSED REPRESENTATIONS OF A SOUND FIELD," filed May 29, 2014.

The extraction unit 72 may represent a unit configured to receive the bitstream 21 and extract the audio rendering information 2 and the various encoded versions of the HOA coefficients 11 (e.g., a directional-based encoded version or a vector-based encoded version). In other words, the audio encoding device 20 may transmit higher order ambisonics (HOA) rendering matrices to enable control over the HOA rendering process at the audio playback system 16. The transmission may be facilitated by an mpegh3daConfigExtension of type ID_CONFIG_EXT_HOA_MATRIX, shown above. The mpegh3daConfigExtension may contain several HOA rendering matrices for different loudspeaker playback configurations. When HOA rendering matrices are transmitted, the audio encoding device 20 signals the associated target loudspeaker layout for each signaled HOA rendering matrix. Together with HoaOrder, this layout determines the dimensions of the rendering matrix.

Transmitting a unique HoARenderingMatrixId makes it possible to reference either a default HOA rendering matrix available at the audio playback system 16 or an HOA rendering matrix transmitted from outside the audio bitstream 21. In some cases, every HOA rendering matrix is assumed to be normalized in N3D and to follow the ordering of the HOA coefficients as defined in the bitstream 21.
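The N3D assumption matters because the same sound field has differently scaled coefficients under other normalizations. For example, N3D and SN3D differ per order n by a factor of sqrt(2n+1). The sketch below rescales a matrix designed for SN3D input so that it expects N3D input and still produces identical speaker feeds. It assumes ACN coefficient ordering and a matrix with one row per loudspeaker and one column per coefficient. The function names are hypothetical.

```python
import math

import numpy as np


def n3d_over_sn3d(hoa_order: int) -> np.ndarray:
    """Factor sqrt(2n + 1) per ACN coefficient, where n = floor(sqrt(acn))."""
    return np.array([math.sqrt(2 * math.isqrt(acn) + 1)
                     for acn in range((hoa_order + 1) ** 2)])


def to_n3d_input(D_sn3d: np.ndarray, hoa_order: int) -> np.ndarray:
    """Divide each column so that D_n3d @ a_n3d == D_sn3d @ a_sn3d."""
    return D_sn3d / n3d_over_sn3d(hoa_order)


rng = np.random.default_rng(0)
order = 2
D_sn3d = rng.standard_normal((5, (order + 1) ** 2))   # 5 loudspeakers
a_sn3d = rng.standard_normal((order + 1) ** 2)        # one HOA sample
a_n3d = a_sn3d * n3d_over_sn3d(order)
print(np.allclose(to_n3d_input(D_sn3d, order) @ a_n3d, D_sn3d @ a_sn3d))  # True
```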

As noted above, and as one example, the function findSymmetricSpeakers may indicate the number and positions of all loudspeaker pairs in the provided loudspeaker setup that are symmetric with respect to the median plane of a listener at the so-called "sweet spot." This helper function may be defined as follows:
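The original definition of findSymmetricSpeakers is not reproduced in this text, so the following is only one possible way to find such pairs. It assumes positions are given as (azimuth, elevation) in degrees, with azimuth in (-180, 180] and positive azimuth on the listener's left. Mirroring across the median plane then negates the azimuth and keeps the elevation. Speakers on the median plane (azimuth 0 or 180) stay unpaired. The name and tolerance are illustrative.

```python
def find_symmetric_speakers(positions, tol=0.5):
    """Return (left, right) index pairs mirrored about the median plane.

    positions: list of (azimuth_deg, elevation_deg); positive azimuth = left.
    """
    pairs, used = [], set()
    for i, (az_i, el_i) in enumerate(positions):
        if not (tol < az_i < 180.0 - tol):   # only left-side speakers open a pair
            continue
        for j, (az_j, el_j) in enumerate(positions):
            if (j not in used and abs(az_j + az_i) <= tol
                    and abs(el_j - el_i) <= tol):
                pairs.append((i, j))
                used.add(j)
                break
    return pairs


layout = [(0, 0), (30, 0), (-30, 0), (110, 0), (-110, 0), (45, 35), (-45, 35), (180, 0)]
pairs = find_symmetric_speakers(layout)
print(pairs, len(pairs))  # [(1, 2), (3, 4), (5, 6)] 3
```

Here `len(pairs)` plays the role of numPairs in the parsing process described later in this section.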

The extraction unit 72 may call the function createSymSigns to compute a vector of 1.0 and -1.0 values. This vector may then be used to generate the matrix elements associated with symmetric loudspeakers. The createSymSigns function may be defined as follows:
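The source's definition of createSymSigns is not reproduced here. The following sketch is consistent with real-valued spherical harmonics in ACN order. Mirroring across the median plane maps azimuth phi to -phi. That flips the sign of the sin(|m| phi) harmonics (m < 0) and leaves the cos(m phi) harmonics (m >= 0) unchanged. Both the ACN ordering and this harmonic sign convention are assumptions.

```python
def create_sym_signs(hoa_order: int) -> list:
    """+1.0 for ACN coefficients with m >= 0, -1.0 for m < 0 (median-plane mirror)."""
    return [1.0 if m >= 0 else -1.0
            for n in range(hoa_order + 1)
            for m in range(-n, n + 1)]


print(create_sym_signs(1))  # [1.0, -1.0, 1.0, 1.0]
```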

The extraction unit 72 may call the function create2dBitmask to generate a bitmask identifying the HOA coefficients that are used only in the horizontal plane. The create2dBitmask function may be defined as follows:
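The definition of create2dBitmask is likewise not reproduced. In ACN order, the circular (horizontal-only) harmonics are those with |m| = n, so a 2D mask can be sketched as shown below. The text does not state whether the normative bitmask marks the 2D coefficients or their complement. This sketch therefore returns the 2D set and zeroes the complement, which matches the hasVerticalCoef behavior described later in this section. The names are hypothetical.

```python
import numpy as np


def create_2d_mask(hoa_order: int) -> np.ndarray:
    """True for ACN coefficients with |m| == n (circular harmonics)."""
    return np.array([abs(m) == n
                     for n in range(hoa_order + 1)
                     for m in range(-n, n + 1)])


def zero_vertical_columns(D, hoa_order: int) -> np.ndarray:
    """Set matrix elements of non-circular (vertical) coefficients to zero."""
    out = np.array(D, dtype=float)
    out[:, ~create_2d_mask(hoa_order)] = 0.0
    return out


print(create_2d_mask(2).astype(int))  # [1 1 0 1 1 0 0 0 1]
```

An order-N mask has 2N + 1 circular coefficients out of (N + 1)^2.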

To decode the HOA rendering matrix coefficients, the extraction unit 72 may first extract the syntax element HOARenderingMatrixSet(). As noted above, this syntax element may contain one or more HOA rendering matrices that may be applied to achieve HOA rendering for a desired loudspeaker layout. In some cases, a given bitstream contains no more than one instance of HOARenderingMatrixSet(). The syntax element HoARenderingMatrix() contains the HOA rendering matrix information, which may be denoted as renderer information 2 in the example of FIG. 4. The extraction unit 72 may first read the config information, which may guide the decoding process, and then reads the matrix elements accordingly.

In some cases, the extraction unit 72 first reads the fields precisionLevel and gainLimitPerOrder. When the flag gainLimitPerOrder is set, the extraction unit 72 reads and decodes the maxGain and minGain fields separately for each HOA order. When the flag gainLimitPerOrder is not set, the extraction unit 72 reads and decodes the fields maxGain and minGain once and applies them to all HOA orders during the decoding process. In some cases, the minGain value must lie between 0 dB and -69 dB. In some cases, the maxGain value must lie between 1 dB and 111 dB lower than the minGain value. FIG. 9 is a diagram illustrating an example of HOA-order-dependent minimum and maximum gains within an HOA rendering matrix.

The extraction unit 72 may next read the flag isFullMatrix, which may signal whether the matrix is defined as full or as partially sparse. When the matrix is defined as partially sparse, the extraction unit 72 reads the next field, e.g., the firstSparseOrder syntax element. This field specifies the HOA order from which the HOA rendering matrix is sparsely coded. Depending on the loudspeaker playback setup, HOA rendering matrices are often dense for low orders and may become sparse at higher orders. FIG. 10 is a diagram illustrating a partially sparse 6th-order HOA rendering matrix for 22 loudspeakers. The sparseness of the matrix shown in FIG. 10 starts at the 26th HOA coefficient (HOA order 5).
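HOA order n occupies ACN indices n^2 through (n+1)^2 - 1, so firstSparseOrder maps directly to a per-coefficient sparseness flag. The small sketch below uses a hypothetical helper name and reproduces the FIG. 10 numbers. For a 6th-order matrix whose sparseness starts at order 5, the first sparse coefficient has 0-based index 25, which makes it the 26th coefficient.

```python
import math


def sparse_coef_flags(hoa_order: int, first_sparse_order: int) -> list:
    """isHoaCoefSparse-style flags: True when the coefficient's order >= first_sparse_order."""
    return [math.isqrt(k) >= first_sparse_order
            for k in range((hoa_order + 1) ** 2)]


flags = sparse_coef_flags(6, 5)
print(len(flags), flags.index(True) + 1)  # 49 26
```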

The extraction unit 72 may read the field hasLfeRendering, depending on whether one or more low frequency effects (LFE) channels are present in the loudspeaker playback setup (as indicated by the lfeExists syntax element). When hasLfeRendering is not set, the extraction unit 72 is configured to assume that the matrix elements related to the LFE channels are digital zeros. The next field read by the extraction unit 72 is the flag zerothOrderAlwaysPositive, which signals whether the matrix elements associated with the zeroth-order coefficient are positive. When zerothOrderAlwaysPositive indicates that the zeroth-order HOA coefficients are positive, the extraction unit 72 determines that numerical signs are not coded for the rendering matrix coefficients corresponding to the zeroth-order HOA coefficients.

Next, properties of the HOA rendering matrix may be signaled for loudspeaker pairs that are symmetric with respect to the median plane. In some cases, there are two symmetry properties: a) value symmetry and b) sign symmetry. In the case of value symmetry, the matrix elements of the left loudspeaker of a symmetric loudspeaker pair are not coded. Instead, the extraction unit 72 derives those elements from the decoded matrix elements of the right loudspeaker by employing the helper function createSymSigns, which performs the following:

If a loudspeaker pair is not value symmetric, its matrix elements may still be symmetric with respect to their numerical signs. When a loudspeaker pair is sign symmetric, the numerical signs of the matrix elements of the left loudspeaker of the pair are not coded. The extraction unit 72 derives these numerical signs from the numerical signs of the matrix elements associated with the right loudspeaker by employing the helper function createSymSigns, which performs the following:
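The two derivations can be sketched together under these assumptions:

- The createSymSigns vector has already been computed and is passed in as `sym_signs`.
- Each row holds one loudspeaker's matrix elements in ACN order.
- For sign symmetry, the left loudspeaker's magnitudes are decoded separately, and a zero right-hand element is treated as positive.

The function names are illustrative.

```python
import numpy as np


def left_from_value_symmetry(right_row, sym_signs):
    """Value symmetry: every left element is the mirrored right element."""
    return np.asarray(sym_signs) * np.asarray(right_row, dtype=float)


def left_from_sign_symmetry(left_magnitudes, right_row, sym_signs):
    """Sign symmetry: left magnitudes are coded, left signs come from the right row."""
    right_signs = np.where(np.asarray(right_row) < 0, -1.0, 1.0)
    return np.abs(left_magnitudes) * np.asarray(sym_signs) * right_signs


signs = [1.0, -1.0, 1.0, 1.0]            # first order, ACN (W, Y, Z, X)
right = [0.5, -0.3, 0.0, 0.4]
print(left_from_value_symmetry(right, signs))
print(left_from_sign_symmetry([0.6, 0.2, 0.1, 0.3], right, signs))
```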

FIG. 11 is a diagram illustrating the signaling of the symmetry properties. A loudspeaker pair cannot be defined as both value symmetric and sign symmetric at the same time. The final decoding flag, hasVerticalCoef, specifies whether only the matrix elements associated with circular (i.e., 2D) HOA coefficients are coded. If hasVerticalCoef is not set, the matrix elements associated with the HOA coefficients defined by the helper function create2dBitmask are set to digital zero.

That is, the extraction unit 72 may extract the audio rendering information 2 according to the process shown in FIG. 11. The extraction unit 72 may first read the isAllValueSymmetric syntax element from the bitstream 21 (300). When the isAllValueSymmetric syntax element is set to one (in other words, Boolean true), the extraction unit 72 may iterate over the numPairs pairs and set each entry of the valueSymmetricPairs array syntax element to one, effectively indicating that all of the speaker pairs are value symmetric (302).

When the isAllValueSymmetric syntax element is set to zero (in other words, Boolean false), the extraction unit 72 may next read the isAnyValueSymmetric syntax element (304). When the isAnyValueSymmetric syntax element is set to one (Boolean true), the extraction unit 72 may iterate over the numPairs pairs and set each entry of the valueSymmetricPairs array syntax element to a bit read sequentially from the bitstream 21 (306). The extraction unit 72 may also obtain the isAnySignSymmetric syntax element, which applies to those pairs whose valueSymmetricPairs entry is set to zero (308). The extraction unit 72 may then iterate over the pairs again and, whenever valueSymmetricPairs equals zero, set the signSymmetricPairs bit to a value read from the bitstream 21 (310).

When the isAnyValueSymmetric syntax element is set to zero (Boolean false), the extraction unit 72 may read the isAllSignSymmetric syntax element from the bitstream 21 (312). When the isAllSignSymmetric syntax element is set to one (Boolean true), the extraction unit 72 may iterate over the numPairs pairs and set each entry of the signSymmetricPairs array syntax element to one, effectively indicating that all of the speaker pairs are sign symmetric (316).

When the isAllSignSymmetric syntax element is set to zero (Boolean false), the extraction unit 72 may read the isAnySignSymmetric syntax element from the bitstream 21 (316). The extraction unit 72 may then iterate over the numPairs pairs and set each entry of the signSymmetricPairs array syntax element to a bit read sequentially from the bitstream 21 (318). The bitstream generation unit 42 may perform the reciprocal of the process described above for the extraction unit 72. It does so to specify the value symmetry information, the sign symmetry information, or a combination of both.
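The parsing steps above can be sketched as a function over a sequence of already-extracted bits. As the flag names suggest, the sketch assumes that the per-pair signSymmetricPairs bits are read only when isAnySignSymmetric is one. The function name and the bit-sequence input are illustrative.

```python
def read_symmetry_flags(bits, num_pairs):
    """Return (valueSymmetricPairs, signSymmetricPairs) following the FIG. 11 flow."""
    it = iter(bits)

    def next_bit():
        return next(it)

    value_sym = [0] * num_pairs
    sign_sym = [0] * num_pairs
    if next_bit():                                        # isAllValueSymmetric
        value_sym = [1] * num_pairs
    elif next_bit():                                      # isAnyValueSymmetric
        value_sym = [next_bit() for _ in range(num_pairs)]
        if next_bit():                                    # isAnySignSymmetric
            for i in range(num_pairs):
                if not value_sym[i]:                      # only non-value-symmetric pairs
                    sign_sym[i] = next_bit()
    elif next_bit():                                      # isAllSignSymmetric
        sign_sym = [1] * num_pairs
    elif next_bit():                                      # isAnySignSymmetric
        sign_sym = [next_bit() for _ in range(num_pairs)]
    return value_sym, sign_sym


print(read_symmetry_flags([0, 1, 1, 0, 1, 1], num_pairs=2))  # ([1, 0], [0, 1])
```

Bits remaining in the sequence are left unconsumed. A real parser would continue with the next syntax element.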

The renderer reconstruction unit 81 may represent a unit configured to reconstruct a renderer based on the audio rendering information 2. That is, using the properties mentioned above, the renderer reconstruction unit 81 may read a series of matrix element gain values. To read an absolute gain value, the renderer reconstruction unit 81 may call the function DecodeGainValue(). The renderer reconstruction unit 81 may call the function ReadRange() on the alphabet index to decode the gain value uniformly. When the decoded gain value is not a digital zero, the renderer reconstruction unit 81 may additionally read a numerical sign value (per the tables below). When a matrix element is associated with an HOA coefficient signaled as sparse (via isHoaCoefSparse), a hasValue flag precedes the gainValueIndex (see Table b). When the hasValue flag is zero, the element is set to digital zero, and neither a gainValueIndex nor a sign is signaled.
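The per-element logic just described can be sketched in Python. Tables a and b are not reproduced, and the mapping from gainValueIndex to a gain is not given here. `decode_gain` therefore stands in for DecodeGainValue() as a caller-supplied function, and a sign bit of one is assumed to mean a negative value. The `sign_coded` argument covers cases where the sign is implied, such as zeroth-order elements when zerothOrderAlwaysPositive is set.

```python
def decode_element(next_bit, decode_gain, is_sparse, sign_coded):
    """Decode one matrix element.

    next_bit:    callable returning the next bit of the stream
    decode_gain: hypothetical stand-in for DecodeGainValue(); returns |gain|
    is_sparse:   isHoaCoefSparse for this element's HOA coefficient
    sign_coded:  False when the sign is implied (e.g. zerothOrderAlwaysPositive)
    """
    if is_sparse and not next_bit():   # hasValue == 0: element is digital zero,
        return 0.0                     # and no gainValueIndex or sign follows
    gain = decode_gain()
    if gain != 0.0 and sign_coded and next_bit():   # assumed: 1 = negative
        gain = -gain
    return gain


bits = iter([1, 1, 0])   # hasValue=1, sign=1 (negative); then hasValue=0
gains = iter([0.5])
first = decode_element(lambda: next(bits), lambda: next(gains), True, True)
second = decode_element(lambda: next(bits), lambda: next(gains), True, True)
print(first, second)  # -0.5 0.0
```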

Tables a and b - Examples of bitstream syntax for decoding a matrix element

Depending on the symmetry properties specified for the loudspeaker pairs, the renderer reconstruction unit 81 may derive the matrix elements associated with the left loudspeaker from those of the right loudspeaker. In this case, the audio rendering information 2 that the bitstream 21 carries for decoding the left loudspeaker's matrix elements is reduced accordingly, or potentially omitted entirely.

In this way, the audio decoding device 24 may determine symmetry information that reduces the size of the audio rendering information to be specified. In some cases, the audio decoding device 24 may determine such symmetry information and then derive at least a portion of the audio renderer based on it.

In these and other cases, the audio decoding device 24 may determine value symmetry information that reduces the size of the audio rendering information to be specified. In these and other cases, the audio decoding device 24 may derive at least a portion of the audio renderer based on the value symmetry information.

In these and other cases, the audio decoding device 24 may determine sign symmetry information that reduces the size of the audio rendering information to be specified. In these and other cases, the audio decoding device 24 may derive at least a portion of the audio renderer based on the sign symmetry information.

In these and other cases, the audio decoding device 24 may determine sparseness information indicating the sparseness of a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

In these and other cases, the audio decoding device 24 may determine the speaker layout for which the matrix is to be used to render the spherical harmonic coefficients to a plurality of speaker feeds.

In this respect, the audio decoding device 24 may then determine the audio rendering information 2 specified in the bitstream. Based on the signal value included in the audio rendering information 2, the audio playback system 16 may render a plurality of speaker feeds 25 using one of the audio renderers 22. The speaker feeds may drive the speakers 3. As noted above, the signal value may in some cases include a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds, decoded and provided as one of the audio renderers 22. In this case, the audio playback system 16 may configure one of the audio renderers 22 with the matrix and then use that audio renderer to render the speaker feeds 25 based on the matrix.
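Once a renderer has been configured with the decoded matrix, producing the speaker feeds reduces to a matrix product. The following minimal sketch assumes the matrix has one row per loudspeaker and one column per HOA coefficient. It also assumes a frame of HOA coefficients is stored with one row per coefficient and one column per sample. The function name is hypothetical.

```python
import numpy as np


def render_speaker_feeds(D: np.ndarray, hoa_frame: np.ndarray) -> np.ndarray:
    """(num_speakers x num_coeffs) @ (num_coeffs x num_samples) -> speaker feeds."""
    if D.shape[1] != hoa_frame.shape[0]:
        raise ValueError("matrix columns must match the number of HOA coefficients")
    return D @ hoa_frame


order, num_speakers, num_samples = 1, 4, 1024
D = np.full((num_speakers, (order + 1) ** 2), 0.25)
hoa = np.zeros(((order + 1) ** 2, num_samples))
hoa[0] = 1.0                       # omnidirectional (W) component only
feeds = render_speaker_feeds(D, hoa)
print(feeds.shape)  # (4, 1024)
```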

The extraction unit 72 extracts and then decodes the various encoded versions of the HOA coefficients 11 so that they can be rendered using the obtained audio renderer 22. To do so, it may determine, from the syntax element mentioned above, whether the HOA coefficients 11 were encoded via the various directional-based or vector-based versions. When directional-based encoding was performed, the extraction unit 72 may extract the directional-based version of the HOA coefficients 11 and the syntax elements associated with this encoded version, denoted as directional-based information 91 in the example of FIG. 4. It then passes the directional-based information 91 to the directional-based reconstruction unit 90. The directional-based reconstruction unit 90 may represent a unit configured to reconstruct the HOA coefficients in the form of HOA coefficients 11' based on the directional-based information 91.

When the syntax element indicates that the HOA coefficients 11 were encoded using vector-based decomposition, the extraction unit 72 may extract the following:

- the coded foreground V[k] vectors 57 (which may include coded weights 57 and/or indices 63, or scalar-quantized V-vectors);
- the encoded ambient HOA coefficients 59;
- the corresponding audio objects 61 (which may also be referred to as the encoded nFG signals 61).

Each of the audio objects 61 corresponds to one of the vectors 57. The extraction unit 72 may pass the coded foreground V[k] vectors 57 to the V-vector reconstruction unit 74. It may pass the encoded ambient HOA coefficients 59, along with the encoded nFG signals 61, to the psychoacoustic decoding unit 80.

The V-vector reconstruction unit 74 may represent a unit configured to reconstruct the V-vectors from the encoded foreground V[k] vectors 57. The V-vector reconstruction unit 74 may operate in a manner reciprocal to that of the quantization unit 52.

The psychoacoustic decoding unit 80 may operate in a manner reciprocal to the psychoacoustic audio coder unit 40 shown in the example of FIG. 3. It decodes the encoded ambient HOA coefficients 59 and the encoded nFG signals 61 and thereby generates energy-compensated ambient HOA coefficients 47' and interpolated nFG signals 49' (which may also be referred to as interpolated nFG audio objects 49'). The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground formulation unit 78.

The spatio-temporal interpolation unit 76 may operate in a manner similar to that described above with respect to the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 76 may receive the reduced foreground V[k] vectors 55_k. It may then perform spatio-temporal interpolation with respect to the foreground V[k] vectors 55_k and the reduced foreground V[k-1] vectors 55_{k-1} to generate interpolated foreground V[k] vectors 55_k''. The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k'' to the fade unit 770.
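The following is a rough illustration of the interpolation step. It is a minimal sketch that assumes plain linear interpolation of each V-vector element across the samples of frame k; the text above does not specify the interpolation window. The function name `interpolate_v_vectors` and the per-sample weighting are illustrative assumptions, not the codec's normative procedure.

```python
def interpolate_v_vectors(v_prev, v_curr, num_samples):
    """Linearly interpolate a V-vector from frame k-1 (v_prev) to frame k (v_curr).

    Hypothetical sketch: sample l of the frame uses weight w = (l + 1) / num_samples
    on v_curr and (1 - w) on v_prev, so the final sample equals v_curr exactly.
    """
    if len(v_prev) != len(v_curr):
        raise ValueError("V-vectors must have the same number of elements")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    interpolated = []
    for l in range(num_samples):
        w = (l + 1) / num_samples
        interpolated.append([(1.0 - w) * a + w * b for a, b in zip(v_prev, v_curr)])
    return interpolated


# Usage: move energy from the second element to the first over a 4-sample frame.
frames = interpolate_v_vectors([0.0, 1.0], [1.0, 0.0], 4)
```

A linear ramp is chosen here only because it is the simplest window that avoids a hard switch between V[k-1] and V[k] at the frame boundary.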

The extraction unit 72 may also output, to the fade unit 770, a signal 757 indicating when one of the ambient HOA coefficients is in transition. The fade unit 770 may then determine which of the SHC_BG 47' and the elements of the interpolated foreground V[k] vectors 55_k'' are to be faded in or faded out. (SHC_BG 47' may also be denoted as "ambient HOA channels 47'" or "ambient HOA coefficients 47'".) In some examples, the fade unit 770 may operate inversely with respect to each of the ambient HOA coefficients 47' and the elements of the interpolated foreground V[k] vectors 55_k''. That is, the fade unit 770 may perform a fade-in, a fade-out, or both with respect to a corresponding one of the ambient HOA coefficients 47'. At the same time, it may perform a fade-in, a fade-out, or both with respect to a corresponding one of the elements of the interpolated foreground V[k] vectors 55_k''. The fade unit 770 may output adjusted ambient HOA coefficients 47'' to the HOA coefficient formulation unit 82 and adjusted foreground V[k] vectors 55_k''' to the foreground formulation unit 78. In this respect, the fade unit 770 represents a unit configured to perform a fade operation with respect to various aspects of the HOA coefficients or derivatives thereof. Examples include the ambient HOA coefficients 47' and the elements of the interpolated foreground V[k] vectors 55_k''.
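The sketch below illustrates the "inverse" operation described above. When an ambient HOA channel fades in over a frame, the matching V-vector element fades out, and vice versa. The linear gain ramp and the names `fade_gains` and `complementary_fade` are hypothetical; the actual fade window used by the fade unit 770 is not specified here.

```python
def fade_gains(num_samples, direction):
    """Per-sample gains for a linear fade across one frame (hypothetical window).

    "in" ramps toward 1, "out" ramps toward 0, anything else holds at unity gain.
    """
    if direction == "in":
        return [(n + 1) / num_samples for n in range(num_samples)]
    if direction == "out":
        return [1.0 - (n + 1) / num_samples for n in range(num_samples)]
    return [1.0] * num_samples


def complementary_fade(ambient_channel, v_element_track, ambient_direction):
    """Fade an ambient HOA channel one way and the matching V-vector element the other.

    ambient_channel: samples of one ambient HOA channel in the current frame.
    v_element_track: per-sample values of the corresponding interpolated V-vector element.
    """
    if len(ambient_channel) != len(v_element_track):
        raise ValueError("tracks must cover the same frame")
    opposite = {"in": "out", "out": "in"}.get(ambient_direction, "hold")
    n = len(ambient_channel)
    g_ambient = fade_gains(n, ambient_direction)
    g_vector = fade_gains(n, opposite)
    faded_ambient = [g * x for g, x in zip(g_ambient, ambient_channel)]
    faded_vector = [g * v for g, v in zip(g_vector, v_element_track)]
    return faded_ambient, faded_vector
```

For example, `complementary_fade([1.0] * 4, [2.0] * 4, "in")` ramps the ambient channel up while the V-vector element ramps down, so the two contributions hand over within a single frame.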

The foreground formulation unit 78 may represent a unit configured to perform matrix multiplication with respect to the adjusted foreground V[k] vectors 55_k''' and the interpolated nFG signals 49' to generate the foreground HOA coefficients 65. In this respect, the foreground formulation unit 78 may combine the audio objects 49' with the vectors 55_k'''. ("Audio objects 49'" is another way of denoting the interpolated nFG signals 49'.) This reconstructs the foreground, or in other words the predominant aspects, of the HOA coefficients 11'. The foreground formulation unit 78 may perform a matrix multiplication of the interpolated nFG signals 49' by the adjusted foreground V[k] vectors 55_k'''.
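The matrix multiplication performed by the foreground formulation unit 78 can be sketched as follows. Each foreground signal is spread over the (N+1)^2 HOA channels according to its V-vector. This minimal sketch assumes one fixed V-vector per signal for the whole frame, ignoring per-sample interpolation. The function name and list-of-lists layout are illustrative.

```python
def formulate_foreground(nfg_signals, v_vectors):
    """Foreground HOA coefficients as the product of nFG signals and V-vectors.

    nfg_signals: list of nFG signals, each a list of M samples.
    v_vectors:   list of nFG V-vectors, each with (N+1)**2 elements.
    Returns M rows of (N+1)**2 HOA coefficients: out[m][c] = sum_i s_i[m] * v_i[c].
    """
    if not nfg_signals or len(nfg_signals) != len(v_vectors):
        raise ValueError("need one V-vector per foreground signal")
    num_samples = len(nfg_signals[0])
    num_coeffs = len(v_vectors[0])
    out = [[0.0] * num_coeffs for _ in range(num_samples)]
    for signal, v in zip(nfg_signals, v_vectors):
        for m in range(num_samples):
            for c in range(num_coeffs):
                out[m][c] += signal[m] * v[c]
    return out


# Usage: one first-order foreground object with two samples.
foreground = formulate_foreground([[1.0, 2.0]], [[1.0, 0.0, 0.0, 0.5]])
```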

The HOA coefficient formulation unit 82 may represent a unit configured to combine the foreground HOA coefficients 65 with the adjusted ambient HOA coefficients 47'' to obtain the HOA coefficients 11'. The prime notation reflects that the HOA coefficients 11' may be similar to, but not the same as, the HOA coefficients 11. The differences between the HOA coefficients 11 and 11' may result from loss due to transmission over a lossy transmission medium, quantization, or other lossy operations.
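The combination step and the meaning of the prime notation can be sketched as follows. The sketch is a hypothetical illustration that assumes the adjusted ambient HOA coefficients are already laid out over all (N+1)^2 channels, with zeros for channels that were not transmitted. The second helper simply measures how far 11' deviates from 11.

```python
def formulate_hoa(foreground_hoa, ambient_hoa):
    """Element-wise sum of foreground and adjusted ambient HOA coefficients.

    Both inputs are lists of rows (one row per sample) with (N+1)**2 entries;
    ambient channels that were not transmitted are assumed to hold zeros.
    """
    if len(foreground_hoa) != len(ambient_hoa):
        raise ValueError("sample counts differ")
    result = []
    for fg_row, amb_row in zip(foreground_hoa, ambient_hoa):
        if len(fg_row) != len(amb_row):
            raise ValueError("channel counts differ")
        result.append([f + a for f, a in zip(fg_row, amb_row)])
    return result


def max_abs_difference(original, reconstructed):
    """Largest per-coefficient deviation between HOA coefficients 11 and 11'."""
    return max(abs(o - r)
               for o_row, r_row in zip(original, reconstructed)
               for o, r in zip(o_row, r_row))
```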

Additionally, the extraction unit 72, and the audio decoding device 24 more generally, may be configured to operate in accordance with various aspects of the techniques described in this disclosure. In doing so, they obtain bitstreams 21 that are potentially optimized in the ways described above, namely by omitting, in certain instances, various syntax elements or data fields.

In some instances, the audio decoding device 24 may be configured, when decompressing higher-order ambisonic audio data compressed using a first compression scheme, to obtain a bitstream 21. This bitstream represents a compressed version of the higher-order ambisonic audio data. It does not include bits corresponding to a second compression scheme that is also used to compress higher-order ambisonic audio data. The first compression scheme may comprise a vector-based compression scheme, in which the resulting vectors are defined in the spherical harmonic domain and sent via the bitstream 21. The vector-based decomposition compression scheme may, in some examples, involve applying a singular value decomposition to the higher-order ambisonic audio data, or equivalents thereof as described in more detail with respect to the example of FIG. 3.

The audio decoding device 24 may be configured to obtain a bitstream 21 that does not include bits corresponding to at least one syntax element used to perform the second type of compression scheme. As noted above, the second compression scheme comprises a directional-based compression scheme. More specifically, the audio decoding device 24 may be configured to obtain a bitstream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of the second compression scheme. In other words, when the second compression scheme comprises a directional-based compression scheme, the audio decoding device 24 may obtain a bitstream 21 that does not include bits corresponding to the HOAPredictionInfo syntax element of that directional-based scheme. As noted above, the HOAPredictionInfo syntax element may indicate a prediction between two or more directional-based signals.
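The idea of not spending bits on a syntax element that belongs to the unused scheme can be sketched as a conditional read. This is a hypothetical illustration only. The bitstream is modeled as a string of '0'/'1' characters, and HOAPredictionInfo is treated as a fixed-width field whose 4-bit width is assumed. The actual HOAPredictionInfo syntax is more elaborate and is not reproduced here.

```python
def read_bits(bits, pos, n):
    """Read n bits (MSB first) from a string of '0'/'1' characters starting at pos."""
    if n <= 0 or pos + n > len(bits):
        raise ValueError("invalid read")
    return int(bits[pos:pos + n], 2), pos + n


def parse_prediction_info(bits, pos, is_directional_frame, info_bits=4):
    """Return (fields, new_pos); HOAPredictionInfo is read only for directional frames.

    info_bits is a hypothetical fixed width chosen for illustration.
    """
    if not is_directional_frame:
        return {}, pos  # vector-based frame: no bits are spent on prediction info
    value, pos = read_bits(bits, pos, info_bits)
    return {"HOAPredictionInfo": value}, pos
```

The read position only advances for directional-based frames, which is the bit saving described above.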

In some instances, as an alternative to or in conjunction with the foregoing examples, the audio decoding device 24 may be configured, when gain correction is suppressed during compression of the higher-order ambisonic audio data, to obtain a bitstream 21 representing a compressed version of that data that does not include gain correction data. In these instances, the audio decoding device 24 may be configured to decompress the higher-order ambisonic audio data in accordance with a vector-based synthesis decompression scheme. The compressed version of the higher-order ambisonic data is generated by applying a singular value decomposition to the higher-order ambisonic audio data, or equivalents thereof described in more detail with respect to the example of FIG. 3 above. When SVD or its equivalents are applied to the HOA audio data, the audio encoding device 20 specifies at least one of the resulting vectors, or bits indicative thereof, in the bitstream 21. These vectors describe spatial characteristics of the corresponding foreground audio objects, such as their width, location, and volume.

More specifically, the audio decoding device 24 may be configured to obtain, from the bitstream 21, a MaxGainCorrAmbExp syntax element having a value set to zero to indicate that gain correction is suppressed. That is, when gain correction is suppressed, the audio decoding device 24 may obtain a bitstream that does not include a HOAGainCorrection data field storing the gain correction data. The bitstream 21 may include a MaxGainCorrAmbExp syntax element having a value of zero. This value indicates both that gain correction is suppressed and that the bitstream does not include the HOAGainCorrection data field. Gain correction may be suppressed when the compression of the higher-order ambisonic audio data includes applying unified speech and audio coding (USAC) to the higher-order ambisonic audio data.
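The following minimal sketch shows how a decoder might skip the HOAGainCorrection field when MaxGainCorrAmbExp is zero. Several details are assumptions made for illustration: the 3-bit width of MaxGainCorrAmbExp, the fixed 4-bit per-channel gain values, and the bit-string representation. The real gain correction syntax uses different, variable-length fields.

```python
def read_bits(bits, pos, n):
    """Read n bits (MSB first) from a string of '0'/'1' characters starting at pos."""
    if n <= 0 or pos + n > len(bits):
        raise ValueError("bitstream underrun")
    return int(bits[pos:pos + n], 2), pos + n


def parse_gain_correction(bits, num_channels, exp_bits=3, gain_bits=4):
    """Parse a hypothetical gain-correction header; returns (fields, bits_consumed).

    When MaxGainCorrAmbExp == 0, gain correction is suppressed and the
    HOAGainCorrection field is absent, so no further bits are read.
    """
    pos = 0
    max_exp, pos = read_bits(bits, pos, exp_bits)
    fields = {"MaxGainCorrAmbExp": max_exp, "HOAGainCorrection": None}
    if max_exp == 0:
        return fields, pos
    gains = []
    for _ in range(num_channels):  # one hypothetical fixed-width value per channel
        gain, pos = read_bits(bits, pos, gain_bits)
        gains.append(gain)
    fields["HOAGainCorrection"] = gains
    return fields, pos
```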

FIG. 5 is a flowchart illustrating exemplary operation of an audio encoding device, such as the audio encoding device 20 shown in the example of FIG. 3, in performing various aspects of the vector-based synthesis techniques described in this disclosure. Initially, the audio encoding device 20 receives the HOA coefficients 11 (106). The audio encoding device 20 may invoke the LIT unit 30, which may apply a LIT with respect to the HOA coefficients to output transformed HOA coefficients (107). For example, in the case of SVD, the transformed HOA coefficients may comprise the US[k] vectors 33 and the V[k] vectors 35.

The audio encoding device 20 may next invoke the parameter calculation unit 32. This unit performs the above-described analysis with respect to any combination of the US[k] vectors 33, the US[k-1] vectors 33, the V[k] vectors 35, and/or the V[k-1] vectors 35 to identify various parameters in the manner described above. That is, the parameter calculation unit 32 may determine at least one parameter based on an analysis of the transformed HOA coefficients 33/35 (108).

The audio encoding device 20 may then invoke the reorder unit 34. Based on the parameter, the reorder unit may reorder the transformed HOA coefficients, which again in the context of SVD may refer to the US[k] vectors 33 and the V[k] vectors 35. This generates reordered transformed HOA coefficients 33'/35' (in other words, the US[k] vectors 33' and the V[k] vectors 35'), as described above (109). During any of the foregoing or subsequent operations, the audio encoding device 20 may also invoke the soundfield analysis unit 44. As described above, the soundfield analysis unit 44 may perform a soundfield analysis with respect to the HOA coefficients 11 and/or the transformed HOA coefficients 33/35. From this analysis it determines the total number of foreground channels (nFG) 45, the order of the background soundfield (N_BG), and the number (nBGa) and indices (i) of additional BG HOA channels to send (109). The latter three items may collectively be denoted as background channel information 43 in the example of FIG. 3.

The audio encoding device 20 may also invoke the background selection unit 48. The background selection unit 48 may determine background or ambient HOA coefficients 47 based on the background channel information 43 (110). The audio encoding device 20 may further invoke the foreground selection unit 36. Based on nFG 45 (which may represent one or more indices identifying the foreground vectors), this unit may select the reordered US[k] matrix 33' and the reordered V[k] matrix 35' that represent foreground or distinct components of the soundfield (112).

The audio encoding device 20 may invoke the energy compensation unit 38. The energy compensation unit 38 may perform energy compensation with respect to the ambient HOA coefficients 47. This compensates for the energy lost when the background selection unit 48 removed various ones of the HOA coefficients (114). The result is a set of energy-compensated ambient HOA coefficients 47'.
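One simple way to picture energy compensation is a single broadband gain. The gain restores the total energy of the retained ambient channels to a reference energy measured before channels were removed. This is a minimal sketch under that assumption. How the reference energy is computed, and whether the gain is per channel or per band, are not specified in the text above.

```python
import math


def energy_compensate(ambient_channels, reference_energy):
    """Scale ambient HOA channels so their total energy matches reference_energy.

    ambient_channels: list of channels, each a list of samples.
    reference_energy: hypothetical target energy (e.g. measured before removal).
    """
    current = sum(x * x for channel in ambient_channels for x in channel)
    if current == 0.0:
        return [list(channel) for channel in ambient_channels]  # nothing to scale
    gain = math.sqrt(reference_energy / current)
    return [[gain * x for x in channel] for channel in ambient_channels]
```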

The audio encoding device 20 may also invoke the spatio-temporal interpolation unit 50. The spatio-temporal interpolation unit 50 may perform spatio-temporal interpolation with respect to the reordered transformed HOA coefficients 33'/35' (116). This produces the interpolated foreground signals 49' (which may also be referred to as the "interpolated nFG signals 49'"). It also produces the remaining foreground directional information 53 (which may also be referred to as the "V[k] vectors 53"). The audio encoding device 20 may then invoke the coefficient reduction unit 46. The coefficient reduction unit 46 may perform coefficient reduction with respect to the remaining foreground V[k] vectors 53, based on the background channel information 43 (118). This yields the reduced foreground directional information 55, which may also be referred to as the reduced foreground V[k] vectors 55.
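A minimal sketch of coefficient reduction follows. It assumes that the V-vector elements dropped are those whose HOA channels are already carried as ambient channels. Those are the first (N_BG+1)^2 channels plus any additional background channel indices from the background channel information 43. The function name, argument names, and the 0-based channel numbering are illustrative assumptions.

```python
def reduce_v_vector(v, background_order, extra_bg_indices=()):
    """Drop V-vector elements for HOA channels already sent as ambient channels.

    v: full V-vector with (N+1)**2 elements, channel index c = 0 .. (N+1)**2 - 1.
    background_order: N_BG; channels 0 .. (N_BG+1)**2 - 1 are treated as ambient.
    extra_bg_indices: hypothetical indices of additional ambient channels (nBGa of them).
    Returns (reduced_vector, kept_channel_indices).
    """
    num_bg = (background_order + 1) ** 2
    dropped = set(range(num_bg)) | set(extra_bg_indices)
    kept = [c for c in range(len(v)) if c not in dropped]
    return [v[c] for c in kept], kept


# Usage: a third-order V-vector (16 elements), N_BG = 1, one extra ambient channel.
reduced, kept = reduce_v_vector([float(c) for c in range(16)], 1, (5,))
```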

The audio encoding device 20 may then invoke the quantization unit 52 to compress the reduced foreground V[k] vectors 55 in the manner described above, generating coded foreground V[k] vectors 57 (120).
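As one concrete stand-in for this compression step, the sketch below applies plain uniform scalar quantization to V-vector elements. It assumes the elements lie in [-1, 1]. This is a hypothetical illustration; the quantization unit 52 may use other schemes, such as vector quantization or entropy-coded scalar quantization, that are not modeled here.

```python
def quantize_uniform(values, nbits):
    """Uniformly quantize values in [-1, 1] to signed integers using nbits bits."""
    levels = (1 << (nbits - 1)) - 1  # e.g. 127 for 8 bits
    return [max(-levels, min(levels, round(x * levels))) for x in values]


def dequantize_uniform(indices, nbits):
    """Map quantization indices back to approximate V-vector element values."""
    levels = (1 << (nbits - 1)) - 1
    return [q / levels for q in indices]
```

Round-tripping a vector through both functions keeps each element within half a quantization step (0.5 / 127 for 8 bits) of its original value, which is the kind of loss the prime notation in 11' reflects.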

The audio encoding device 20 may also invoke the psychoacoustic audio coder unit 40. The psychoacoustic audio coder unit 40 may psychoacoustically code each vector of the energy-compensated ambient HOA coefficients 47' and of the interpolated nFG signals 49'. This generates encoded ambient HOA coefficients 59 and encoded nFG signals 61. The audio encoding device may then invoke the bitstream generation unit 42. The bitstream generation unit 42 may generate the bitstream 21 based on the coded foreground directional information 57, the coded ambient HOA coefficients 59, the coded nFG signals 61, and the background channel information 43.

FIG. 6 is a flowchart illustrating exemplary operation of an audio decoding device, such as the audio decoding device 24 shown in FIG. 4, in performing various aspects of the techniques described in this disclosure. Initially, the audio decoding device 24 may receive the bitstream 21 (130). Upon receiving the bitstream, the audio decoding device 24 may invoke the extraction unit 72. Assume, for purposes of discussion, that the bitstream 21 indicates that vector-based reconstruction is to be performed. The extraction unit 72 may then parse the bitstream to retrieve the above-noted information and pass it to the vector-based reconstruction unit 92.

In other words, the extraction unit 72 may extract the following from the bitstream 21 in the manner described above (132):
- the coded foreground directional information 57 (which, again, may also be referred to as the coded foreground V[k] vectors 57);
- the coded ambient HOA coefficients 59;
- the coded foreground signals (which may also be referred to as the coded foreground nFG signals 61 or the coded foreground audio objects 61).

The audio decoding device 24 may further invoke the dequantization unit 74. The dequantization unit 74 may entropy decode and dequantize the coded foreground directional information 57 to obtain reduced foreground directional information 55_k (136). The audio decoding device 24 may also invoke the psychoacoustic decoding unit 80. The psychoacoustic decoding unit 80 may decode the encoded ambient HOA coefficients 59 and the encoded foreground signals 61 (138). This yields energy-compensated ambient HOA coefficients 47' and interpolated foreground signals 49'. The psychoacoustic decoding unit 80 may pass the energy-compensated ambient HOA coefficients 47' to the fade unit 770 and the nFG signals 49' to the foreground formulation unit 78.

The audio decoding device 24 may next invoke the spatio-temporal interpolation unit 76. The spatio-temporal interpolation unit 76 may receive the reordered foreground directional information 55_k'. It then performs spatio-temporal interpolation with respect to the reduced foreground directional information 55_k/55_{k-1} to generate interpolated foreground directional information 55_k'' (140). The spatio-temporal interpolation unit 76 may forward the interpolated foreground V[k] vectors 55_k'' to the fade unit 770.

The audio decoding device 24 may invoke the fade unit 770. The fade unit 770 may receive or otherwise obtain (e.g., from the extraction unit 72) syntax elements indicating when the energy-compensated ambient HOA coefficients 47' are in transition, such as the AmbCoeffTransition syntax element. Based on the transition syntax elements and maintained transition state information, the fade unit 770 may fade in or fade out the energy-compensated ambient HOA coefficients 47'. It outputs the resulting adjusted ambient HOA coefficients 47'' to the HOA coefficient formulation unit 82. Based on the same syntax elements and transition state information, the fade unit 770 may also fade out or fade in the corresponding one or more elements of the interpolated foreground V[k] vectors 55_k''. It outputs the resulting adjusted foreground V[k] vectors 55_k''' to the foreground formulation unit 78 (142).

The audio decoding device 24 may invoke the foreground formulation unit 78. The foreground formulation unit 78 may perform a matrix multiplication of the nFG signals 49' by the adjusted foreground directional information 55_k''' to obtain the foreground HOA coefficients 65 (144). The audio decoding device 24 may also invoke the HOA coefficient formulation unit 82. The HOA coefficient formulation unit 82 may add the foreground HOA coefficients 65 to the adjusted ambient HOA coefficients 47'' to obtain the HOA coefficients 11' (146).

FIG. 7 is a flowchart illustrating exemplary operation of a system, such as the system 10 shown in the example of FIG. 2, in performing various aspects of the techniques described in this disclosure. As discussed above, the content creator device 12 may employ the audio editing system 18 to create or edit captured or generated audio content, shown as the HOA coefficients 11 in the example of FIG. 2. As discussed in more detail above, the content creator device 12 may then use the audio renderer 1 to render the HOA coefficients 11 to multi-channel speaker feeds (200). The content creator device 12 may then play these speaker feeds using an audio playback system. It determines whether further adjustments or editing are required, for example to capture the desired artistic intent (202). When further adjustments are desired ("YES" 202), the content creator device 12 may remix the HOA coefficients 11 (204), render the HOA coefficients 11 (200), and again determine whether further adjustments are required (202). When further adjustments are not desired ("NO" 202), the audio encoding device 20 may encode the audio content to generate the bitstream 21 in the manner described above with respect to the example of FIG. 5 (206). As described in more detail above, the audio encoding device 20 may also generate and specify the audio rendering information 2 in the bitstream 21 (208).

The content consumer device 14 may then obtain the audio rendering information 2 from the bitstream 21 (210). The decoding device 24 may then decode the bitstream 21 to obtain the audio content, shown as the HOA coefficients 11' in the example of FIG. 2, in the manner described above with respect to the example of FIG. 6 (211). The audio playback system 16 may then render the HOA coefficients 11' based on the audio rendering information 2 in the manner described above (212). It plays the rendered audio content via the loudspeakers 3 (214).

The techniques described in this disclosure may therefore enable, as a first example, a device that generates a bitstream representative of multi-channel audio content to specify audio rendering information. In this first example, the device may comprise means for specifying audio rendering information that includes a signal value identifying an audio renderer used when generating the multi-channel audio content.

The device of the first example, wherein the signal value includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.
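Rendering with such a matrix amounts to multiplying an L x (N+1)^2 rendering matrix by the HOA coefficients, so that each loudspeaker feed is a weighted sum of the HOA channels. The sketch below shows only that multiplication. The matrix values in the test are arbitrary placeholders, not a real first-order decoder for any loudspeaker layout.

```python
def render_speaker_feeds(render_matrix, hoa_channels):
    """Apply an L x (N+1)**2 rendering matrix to HOA coefficients.

    render_matrix: L rows, each with (N+1)**2 weights (one row per loudspeaker).
    hoa_channels:  (N+1)**2 channels, each a list of M samples.
    Returns L speaker feeds of M samples: feed[l][m] = sum_c D[l][c] * hoa[c][m].
    """
    num_coeffs = len(hoa_channels)
    if any(len(row) != num_coeffs for row in render_matrix):
        raise ValueError("matrix column count must equal the number of HOA channels")
    num_samples = len(hoa_channels[0])
    feeds = []
    for row in render_matrix:
        feeds.append([sum(row[c] * hoa_channels[c][m] for c in range(num_coeffs))
                      for m in range(num_samples)])
    return feeds
```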

In a second example, the device of the first example, wherein the signal value includes two or more bits that define an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients to a plurality of speaker feeds.

The device of the second example, wherein the audio rendering information further includes two or more bits that define a number of rows of the matrix included in the bitstream, and two or more bits that define a number of columns of the matrix included in the bitstream.

The device of the first example, wherein the signal value specifies a rendering algorithm used to render audio objects to a plurality of speaker feeds.

The device of the first example, wherein the signal value specifies a rendering algorithm used to render spherical harmonic coefficients to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of matrices used to render spherical harmonic coefficients to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render audio objects to a plurality of speaker feeds.

The device of the first example, wherein the signal value includes two or more bits that define an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients to a plurality of speaker feeds.

The device of the first example, wherein the means for specifying the audio rendering information comprises means for specifying the audio rendering information in the bitstream on a per-audio-frame basis.

The device of the first example, wherein the means for specifying the audio rendering information comprises means for specifying the audio rendering information a single time in the bitstream.

In a third example, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed, cause one or more processors to specify audio rendering information in a bitstream. The audio rendering information identifies an audio renderer used when generating multi-channel audio content.

In a fourth example, a device for rendering multi-channel audio content from a bitstream comprises means for determining audio rendering information and means for rendering a plurality of speaker feeds. The audio rendering information includes a signal value identifying an audio renderer used when generating the multi-channel audio content. The speaker feeds are rendered based on the audio rendering information specified in the bitstream.

In the device of the fourth example, the signal value includes a matrix used to render spherical harmonic coefficients into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds based on the matrix.
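As a minimal sketch of what rendering "based on the matrix" amounts to, assuming the signaled matrix has one row per loudspeaker and one column per spherical harmonic (HOA) coefficient, each speaker feed is a weighted sum of the coefficient channels. The function name and the list-of-lists layout below are illustrative assumptions, not part of the disclosure.

```python
from typing import List

Matrix = List[List[float]]


def render_speaker_feeds(renderer: Matrix, hoa: Matrix) -> Matrix:
    """Render HOA coefficient channels into speaker feeds.

    renderer: L x K matrix (L loudspeakers, K HOA coefficients), assumed layout.
    hoa:      K x T matrix (K coefficient channels, T time samples).
    Returns an L x T matrix of speaker feeds.
    """
    num_coeffs = len(hoa)
    if any(len(row) != num_coeffs for row in renderer):
        raise ValueError("renderer column count must equal the number of HOA channels")
    num_samples = len(hoa[0]) if hoa else 0
    feeds: Matrix = []
    for row in renderer:
        # Each speaker feed is a weighted sum of the HOA coefficient channels.
        feeds.append([sum(row[k] * hoa[k][t] for k in range(num_coeffs))
                      for t in range(num_samples)])
    return feeds
```

Plain loops keep the sketch dependency-free; a real decoder would typically perform the same product with an optimized matrix library.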

In a fifth example, in the device of the fourth example, the signal value comprises two or more bits defining an index indicating that the bitstream includes a matrix used to render spherical harmonic coefficients into a plurality of speaker feeds, the device further comprises means for parsing the matrix from the bitstream in response to the index, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds based on the parsed matrix.

In the device of the fifth example, the signal value further comprises two or more bits defining the number of rows of the matrix included in the bitstream and two or more bits defining the number of columns of the matrix included in the bitstream, and the means for parsing the matrix from the bitstream comprises means for parsing the matrix from the bitstream in response to the index and based on the two or more bits defining the number of rows and the two or more bits defining the number of columns.

In the device of the fourth example, the signal value specifies a rendering algorithm used to render audio objects into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the audio objects using the specified rendering algorithm.

In the device of the fourth example, the signal value specifies a rendering algorithm used to render spherical harmonic coefficients into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the specified rendering algorithm.

In the device of the fourth example, the signal value comprises two or more bits defining an index associated with one of a plurality of matrices used to render spherical harmonic coefficients into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the one of the plurality of matrices associated with the index.

In the device of the fourth example, the signal value comprises two or more bits defining an index associated with one of a plurality of rendering algorithms used to render audio objects into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the audio objects using the one of the plurality of rendering algorithms associated with the index.

In the device of the fourth example, the signal value comprises two or more bits defining an index associated with one of a plurality of rendering algorithms used to render spherical harmonic coefficients into a plurality of speaker feeds, and the means for rendering the plurality of speaker feeds comprises means for rendering the plurality of speaker feeds from the spherical harmonic coefficients using the one of the plurality of rendering algorithms associated with the index.

In the device of the fourth example, the means for determining the audio rendering information comprises means for determining the audio rendering information from the bitstream on an audio-frame basis.

In the device of the fourth example, the means for determining the audio rendering information comprises means for determining the audio rendering information from the bitstream a single time.
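Whether the audio rendering information is determined on an audio-frame basis or a single time, a decoder can keep the most recently signaled value and apply it to subsequent frames. The sketch below assumes a hypothetical frame structure (`AudioFrame`) in which the rendering information is reduced to an optional renderer index; neither the structure nor the function name comes from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class AudioFrame:
    # Hypothetical frame layout: None means "no rendering info in this frame".
    rendering_index: Optional[int]
    payload: bytes = b""


def renderer_per_frame(frames: Iterable[AudioFrame]) -> List[int]:
    """Return the renderer index in effect for each frame.

    Handles both cases: rendering info signaled once (first frame only)
    and rendering info signaled per frame. The latest value always wins.
    """
    current: Optional[int] = None
    result: List[int] = []
    for frame in frames:
        if frame.rendering_index is not None:
            current = frame.rendering_index
        if current is None:
            raise ValueError("no audio rendering information signaled yet")
        result.append(current)
    return result
```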

In a sixth example, a non-transitory computer-readable storage medium has stored thereon instructions that, when executed, cause one or more processors to determine audio rendering information that includes a signal value identifying an audio renderer used when generating multi-channel audio content, and to render a plurality of speaker feeds based on the audio rendering information specified in the bitstream.

FIGS. 8A-8D are diagrams illustrating bitstreams 21A-21D formed in accordance with the techniques described in this disclosure. In the example of FIG. 8A, bitstream 21A may represent one example of bitstream 21 shown in FIGS. 2-4 above. Bitstream 21A includes audio rendering information 2A, which includes one or more bits that define a signal value 554. This signal value 554 may represent any combination of the types of information described below. Bitstream 21A also includes audio content 558, which may represent one example of audio content 7/9.

In the example of FIG. 8B, bitstream 21B may be similar to bitstream 21A, where the signal value 554 of audio rendering information 2B comprises an index 554A, one or more bits defining a row size 554B of the signaled matrix, one or more bits defining a column size 554C of the signaled matrix, and matrix coefficients 554D. Index 554A may be defined using between two and five bits, while each of row size 554B and column size 554C may be defined using between two and sixteen bits.
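The layout of audio rendering information 2B can be illustrated with a small serializer. This is a hedged sketch under assumed field widths: a 4-bit index (within the two-to-five-bit range above), 8-bit row and column sizes (within the two-to-sixteen-bit range), and 16-bit two's-complement integer coefficients. The index value 0b1111 is assumed to mean "matrix follows", and `BitWriter` and `write_rendering_info` are hypothetical names.

```python
from typing import List, Sequence

INDEX_BITS = 4                  # assumed; the text allows two to five bits
SIZE_BITS = 8                   # assumed; the text allows two to sixteen bits
COEFF_BITS = 16                 # assumed bit size of each matrix coefficient
EXPLICIT_MATRIX_INDEX = 0b1111  # assumed "matrix is explicitly specified" value


class BitWriter:
    def __init__(self) -> None:
        self.bits: List[int] = []

    def write(self, value: int, width: int) -> None:
        # Append `value` most-significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        # Zero-pad to a whole number of bytes.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def write_rendering_info(matrix: Sequence[Sequence[int]]) -> bytes:
    rows, cols = len(matrix), len(matrix[0])
    if not (0 < rows < (1 << SIZE_BITS) and 0 < cols < (1 << SIZE_BITS)):
        raise ValueError("matrix dimensions do not fit the size fields")
    w = BitWriter()
    w.write(EXPLICIT_MATRIX_INDEX, INDEX_BITS)  # index 554A
    w.write(rows, SIZE_BITS)                    # row size 554B
    w.write(cols, SIZE_BITS)                    # column size 554C
    half = 1 << (COEFF_BITS - 1)
    for row in matrix:                          # matrix coefficients 554D
        for coeff in row:
            if not -half <= coeff < half:
                raise ValueError("coefficient out of range")
            # Two's-complement encoding of a signed coefficient.
            w.write(coeff & ((1 << COEFF_BITS) - 1), COEFF_BITS)
    return w.to_bytes()
```

Under these assumptions, a 1 x 2 matrix costs 4 + 8 + 8 + 2 x 16 = 52 bits, which this sketch pads to 7 bytes.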

Extraction unit 72 may extract index 554A and determine whether the index signals that a matrix is included in bitstream 21B (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bitstream 21B). In the example of FIG. 8B, bitstream 21B includes an index 554A signaling that the matrix is explicitly specified in bitstream 21B. As a result, extraction unit 72 may extract row size 554B and column size 554C. Extraction unit 72 may be configured to compute the number of bits to parse that represent the matrix coefficients as a function of row size 554B, column size 554C, and a signaled (not shown in FIG. 8A) or implied bit size of each matrix coefficient. Using the determined number of bits, extraction unit 72 may extract matrix coefficients 554D, which audio playback system 16 may use to configure one of audio renderers 22 as described above. While shown as signaling audio rendering information 2B a single time in bitstream 21B, audio rendering information 2B may be signaled multiple times in bitstream 21B, or at least partially or entirely in a separate out-of-band channel (in some cases as optional data).
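The parsing step can be sketched as follows, again under assumed field widths (4-bit index, 8-bit row and column sizes, 16-bit signed coefficients) and assuming the index values 0000 and 1111 mean that a matrix is explicitly carried. As described above, the number of coefficient bits to parse is computed as rows x columns x coefficient bit size. `BitReader` and `parse_rendering_info` are hypothetical names.

```python
from typing import List, Optional, Tuple

INDEX_BITS = 4                               # assumed field width
SIZE_BITS = 8                                # assumed field width
COEFF_BITS = 16                              # assumed (implied) coefficient bit size
EXPLICIT_MATRIX_INDICES = {0b0000, 0b1111}   # cf. "0000 or 1111" in the text


class BitReader:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read(self, width: int) -> int:
        # Read `width` bits, most-significant bit first.
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value


def to_signed(value: int, width: int) -> int:
    # Interpret a two's-complement field.
    return value - (1 << width) if value & (1 << (width - 1)) else value


def parse_rendering_info(data: bytes) -> Tuple[int, Optional[List[List[int]]]]:
    """Return (index, matrix); matrix is None if the index names a predefined renderer."""
    r = BitReader(data)
    index = r.read(INDEX_BITS)
    if index not in EXPLICIT_MATRIX_INDICES:
        return index, None
    rows = r.read(SIZE_BITS)
    cols = r.read(SIZE_BITS)
    bits_to_parse = rows * cols * COEFF_BITS
    if r.pos + bits_to_parse > len(data) * 8:
        raise ValueError("truncated matrix coefficients")
    matrix = [[to_signed(r.read(COEFF_BITS), COEFF_BITS) for _ in range(cols)]
              for _ in range(rows)]
    return index, matrix
```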

In the example of FIG. 8C, bitstream 21C may represent one example of bitstream 21 shown in FIGS. 2-4 above. In this example, bitstream 21C includes audio rendering information 2C, which includes a signal value 554 specifying an algorithm index 554E. Bitstream 21C also includes audio content 558. As noted above, algorithm index 554E may be defined using between two and five bits, and may identify the rendering algorithm to be used when rendering audio content 558.

Extraction unit 72 may extract algorithm index 554E and determine whether algorithm index 554E signals that a matrix is included in bitstream 21C (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bitstream 21C). In the example of FIG. 8C, bitstream 21C includes an algorithm index 554E signaling that the matrix is not explicitly specified in bitstream 21C. As a result, extraction unit 72 forwards algorithm index 554E to audio playback system 16, which selects the corresponding one of the rendering algorithms (denoted as renderers 22 in the examples of FIGS. 2-4), if available. While shown as signaling audio rendering information 2C a single time in bitstream 21C, in the example of FIG. 8C, audio rendering information 2C may be signaled multiple times in bitstream 21C, or at least partially or entirely in a separate out-of-band channel (in some cases as optional data).
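The selection performed by audio playback system 16 can be sketched as a table lookup that returns nothing when the signaled algorithm is not available locally. The two renderers in the table are deliberately trivial, hypothetical stand-ins operating on one time sample of first-order HOA coefficients in an assumed ACN channel order (W, Y, Z, X); actual rendering algorithms would be far more involved.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A renderer maps one time sample of HOA coefficients to one sample per speaker.
Renderer = Callable[[Sequence[float]], List[float]]


def omni_to_stereo(hoa: Sequence[float]) -> List[float]:
    # Hypothetical renderer: copy the zeroth-order (W) coefficient to two speakers.
    return [hoa[0], hoa[0]]


def first_order_stereo(hoa: Sequence[float]) -> List[float]:
    # Hypothetical renderer: left/right mix of W and the Y (left-right) dipole.
    w, y = hoa[0], hoa[1]
    return [0.5 * (w + y), 0.5 * (w - y)]


# Hypothetical table of rendering algorithms available at the playback system.
AVAILABLE_RENDERERS: Dict[int, Renderer] = {
    1: omni_to_stereo,
    2: first_order_stereo,
}


def select_renderer(algorithm_index: int) -> Optional[Renderer]:
    """Return the renderer for a signaled algorithm index, or None if unavailable."""
    return AVAILABLE_RENDERERS.get(algorithm_index)
```

Returning `None` leaves the "if available" policy to the caller, for example falling back to a default renderer.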

In the example of FIG. 8D, bitstream 21D may represent one example of bitstream 21 shown in FIGS. 2-4 above. In this example, bitstream 21D includes audio rendering information 2D, which includes a signal value 554 specifying a matrix index 554F. Bitstream 21D also includes audio content 558. As noted above, matrix index 554F may be defined using between two and five bits, and may identify the rendering algorithm to be used when rendering audio content 558.

Extraction unit 72 may extract matrix index 554F and determine whether matrix index 554F signals that a matrix is included in bitstream 21D (where certain index values, such as 0000 or 1111, may signal that the matrix is explicitly specified in bitstream 21D). In the example of FIG. 8D, bitstream 21D includes a matrix index 554F signaling that the matrix is not explicitly specified in bitstream 21D. As a result, extraction unit 72 forwards matrix index 554F to the audio playback device, which selects the corresponding one of renderers 22, if available. While shown as signaling audio rendering information 2D a single time in bitstream 21D, in the example of FIG. 8D, audio rendering information 2D may be signaled multiple times in bitstream 21D, or at least partially or entirely in a separate out-of-band channel (in some cases as optional data).
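Taken together, FIGS. 8B-8D suggest a simple precedence for obtaining the rendering matrix at the decoder. The following hedged sketch assumes that an explicitly signaled matrix (FIG. 8B) takes priority, then a predefined matrix selected by matrix index (FIG. 8D), then a local default when the index is not available. The table contents, names, and fallback policy are illustrative assumptions.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]

# Hypothetical predefined rendering matrices stored at the playback device,
# keyed by matrix index (2 loudspeakers x 4 first-order HOA coefficients).
PREDEFINED_MATRICES: Dict[int, Matrix] = {
    3: [[0.5, 0.5, 0.0, 0.0],
        [0.5, -0.5, 0.0, 0.0]],
}


def resolve_rendering_matrix(matrix_index: int,
                             explicit_matrix: Optional[Matrix],
                             default: Matrix) -> Matrix:
    if explicit_matrix is not None:
        return explicit_matrix                    # matrix carried in the bitstream
    if matrix_index in PREDEFINED_MATRICES:
        return PREDEFINED_MATRICES[matrix_index]  # matrix selected by index
    return default                                # index not available locally
```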

FIGS. 8E-8G are diagrams illustrating portions of the bitstream or side channel information that may specify the compressed spatial components in more detail. FIG. 8E illustrates a first example of a frame 249A' of bitstream 21. In the example of FIG. 8E, frame 249A' includes ChannelSideInfoData (CSID) fields 154A-154C, HOAGainCorrectionData (HOAGCD) fields, and VVectorData fields 156A and 156B. CSID field 154A includes unitC 267, bb 266, and ba 265 along with ChannelType 269, which are set to the corresponding values 01, 1, 0, and 01 shown in the example of FIG. 8E. CSID field 154B likewise includes unitC 267, bb 266, and ba 265 along with ChannelType 269, set to the corresponding values 01, 1, 0, and 01 shown in the example of FIG. 8E. CSID field 154C includes a ChannelType field 269 having a value of 3. Each of CSID fields 154A-154C corresponds to a respective one of transport channels 1, 2, and 3. In effect, each CSID field 154A-154C indicates whether the corresponding payload 156A or 156B is a direction-based signal (when the corresponding ChannelType equals 0), a vector-based signal (when the corresponding ChannelType equals 1), an additional ambient HOA coefficient (when the corresponding ChannelType equals 2), or empty (when the ChannelType equals 3).
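The ChannelType semantics above map naturally onto an enumeration. This sketch uses only the four values given in the text; the helper names are hypothetical. For the frame of FIG. 8E (ChannelType values 1, 1, and 3), it reports two vector-based channels, matching the two VVectorData fields 156A and 156B.

```python
from enum import IntEnum
from typing import List, Sequence


class ChannelType(IntEnum):
    DIRECTIONAL = 0   # direction-based signal
    VECTOR_BASED = 1  # vector-based signal (carries a VVectorData payload)
    ADD_AMBIENT = 2   # additional ambient HOA coefficient
    EMPTY = 3         # empty transport channel


def classify_channels(channel_types: Sequence[int]) -> List[ChannelType]:
    # ChannelType(value) raises ValueError for values outside 0..3.
    return [ChannelType(t) for t in channel_types]


def count_vvector_payloads(channel_types: Sequence[int]) -> int:
    return sum(1 for t in classify_channels(channel_types)
               if t == ChannelType.VECTOR_BASED)
```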

In the example of FIG. 8E, frame 249A includes two vector-based signals (given that ChannelType 269 equals 1 in CSID fields 154A and 154B) and an empty channel (given that ChannelType 269 equals 3 in CSID field 154C). Based on the preceding HOAconfig portion (not shown for ease of illustration), audio decoding device 24 may determine that all 16 V-vector elements are encoded. Accordingly, each of VVectorData 156A and 156B includes all 16 vector elements, each of which is uniformly quantized with 8 bits.
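As a rough illustration of the V-vector payload size, assume each V-vector carries one element per HOA coefficient. Because an order-N representation has (N + 1)^2 coefficients, 16 elements would then correspond to a third-order representation, and 16 elements at 8 bits each give 128 bits per VVectorData field. The quantizer below maps an assumed range of [-1, 1] uniformly onto 256 codes; it is a generic uniform quantizer, not necessarily the one used by the codec.

```python
from typing import List

NUM_BITS = 8
MAX_CODE = (1 << NUM_BITS) - 1  # 255 steps across the assumed range [-1, 1]


def num_hoa_coefficients(order: int) -> int:
    # An order-N HOA representation has (N + 1)^2 coefficients.
    return (order + 1) ** 2


def quantize(value: float) -> int:
    """Uniformly map a value in [-1, 1] (clipped) onto an 8-bit code."""
    clipped = max(-1.0, min(1.0, value))
    return round((clipped + 1.0) / 2.0 * MAX_CODE)


def dequantize(code: int) -> float:
    return code / MAX_CODE * 2.0 - 1.0


def quantize_v_vector(elements: List[float]) -> List[int]:
    return [quantize(e) for e in elements]
```

The reconstruction error of this quantizer is at most half a step, that is, 1/255 for the assumed range.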

As further shown in the example of FIG. 8E, frame 249A' does not include a HOAPredictionInfo field. The HOAPredictionInfo field may represent a field corresponding to a second, direction-based compression scheme, which may be removed in accordance with the techniques described in this disclosure when a vector-based compression scheme is used to compress the HOA audio data.

FIG. 8F is a diagram illustrating a frame 249A'' that is substantially similar to frame 249A, except that HOAGainCorrectionData has been removed from each transport channel stored in frame 249A''. The HOAGainCorrectionData field may be removed from frame 249A'' when gain correction is suppressed in accordance with various aspects of the techniques described above.

FIG. 8G is a diagram illustrating a frame 249A''' that may be similar to frame 249A'', except that the HOAPredictionInfo field has been removed. Frame 249A''' represents one example in which both aspects of the techniques may be applied to control various fields that may not be needed in certain situations.
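The field-omission behavior of FIGS. 8E-8G can be sketched as a function that lists which fields a frame would carry. The flags, field labels, and list ordering are illustrative assumptions; only the ChannelType value 1 for vector-based signals and the conditions for dropping HOAGainCorrectionData and HOAPredictionInfo come from the text.

```python
from typing import List, Sequence

VECTOR_BASED = 1  # ChannelType value for a vector-based signal (FIG. 8E)


def present_frame_fields(channel_types: Sequence[int],
                         gain_correction_enabled: bool,
                         vector_based_compression: bool) -> List[str]:
    """List the fields a frame would carry under the hypothetical flags."""
    fields = [f"ChannelSideInfoData[{i}]" for i in range(len(channel_types))]
    if gain_correction_enabled:
        # Dropped from every transport channel when gain correction is suppressed.
        fields += [f"HOAGainCorrectionData[{i}]" for i in range(len(channel_types))]
    if not vector_based_compression:
        # Only needed by the direction-based compression scheme.
        fields.append("HOAPredictionInfo")
    # One VVectorData payload per vector-based transport channel.
    fields += [f"VVectorData[{i}]" for i, t in enumerate(channel_types)
               if t == VECTOR_BASED]
    return fields
```

With gain correction suppressed and vector-based compression in use, only the CSID and VVectorData fields remain, as in frame 249A'''.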

The techniques described above may be performed with respect to any number of different contexts and audio ecosystems. A number of example contexts are described below, although the techniques should not be limited to those example contexts. One example audio ecosystem may include audio content, movie studios, music studios, gaming audio studios, channel-based audio content, coding engines, game audio stems, game audio coding/rendering engines, and delivery systems.

The movie studios, the music studios, and the gaming audio studios may receive audio content. In some examples, the audio content may represent the output of an acquisition. The movie studios may output channel-based audio content (e.g., in 2.0, 5.1, and 7.1), such as by using a digital audio workstation (DAW). The music studios may output channel-based audio content (e.g., in 2.0 and 5.1), such as by using a DAW. In either case, the coding engines may receive and encode the channel-based audio content based on one or more codecs (e.g., AAC, AC3, Dolby True HD, Dolby Digital Plus, and DTS Master Audio) for output by the delivery systems. The gaming audio studios may output one or more game audio stems, such as by using a DAW. The game audio coding/rendering engines may code and/or render the audio stems into channel-based audio content for output by the delivery systems. Another example context in which the techniques may be performed comprises an audio ecosystem that may include broadcast recording audio objects, professional audio systems, consumer on-device capture, the HOA audio format, on-device rendering, consumer audio, TV and accessories, and car audio systems.

The broadcast recording audio objects, the professional audio systems, and the consumer on-device capture may all code their output using the HOA audio format. In this way, the audio content may be coded using the HOA audio format into a single representation that may be played back using the on-device rendering, the consumer audio, TV and accessories, and the car audio systems. In other words, the single representation of the audio content may be played back at a generic audio playback system, such as audio playback system 16 (i.e., as opposed to requiring a particular configuration such as 5.1, 7.1, etc.).

Other examples of contexts in which the techniques may be performed include an audio ecosystem that may include acquisition elements and playback elements. The acquisition elements may include wired and/or wireless acquisition devices (e.g., Eigen microphones), on-device surround sound capture, and mobile devices (e.g., smartphones and tablets). In some examples, the wired and/or wireless acquisition devices may be coupled to the mobile device via wired and/or wireless communication channel(s).

In accordance with one or more techniques of this disclosure, the mobile device may be used to acquire a sound field. For instance, the mobile device may acquire a sound field via the wired and/or wireless acquisition devices and/or the on-device surround sound capture (e.g., a plurality of microphones integrated into the mobile device). The mobile device may then code the acquired sound field into HOA coefficients for playback by one or more of the playback elements. For instance, a user of the mobile device may record a live event (e.g., a meeting, a conference, a play, a concert, etc.), thereby acquiring the sound field of the live event, and code the recording into HOA coefficients.

The mobile device may also utilize one or more of the playback elements to play back the HOA-coded sound field. For instance, the mobile device may decode the HOA-coded sound field and output a signal to one or more of the playback elements that causes them to recreate the sound field. As one example, the mobile device may utilize wireless and/or wired communication channels to output the signal to one or more speakers (e.g., speaker arrays, sound bars, etc.). As another example, the mobile device may utilize docking solutions to output the signal to one or more docking stations and/or one or more docked speakers (e.g., sound systems in smart cars and/or homes). As another example, the mobile device may utilize headphone rendering to output the signal to a set of headphones, e.g., to create realistic binaural sound.

In some examples, a particular mobile device may both acquire a 3D sound field and play back the same 3D sound field at a later time. In some examples, the mobile device may acquire a 3D sound field, encode the 3D sound field into HOA, and transmit the encoded 3D sound field to one or more other devices (e.g., other mobile devices and/or other non-mobile devices) for playback.

Yet another context in which the techniques may be performed includes an audio ecosystem that may include audio content, game studios, coded audio content, rendering engines, and delivery systems. In some examples, the game studios may include one or more DAWs that may support editing of HOA signals. For instance, the one or more DAWs may include HOA plugins and/or tools that may be configured to operate with (e.g., work with) one or more game audio systems. In some examples, the game studios may output new stem formats that support HOA. In any case, the game studios may output coded audio content to the rendering engines, which may render a sound field for playback by the delivery systems.

The techniques may also be performed with respect to exemplary audio acquisition devices. For example, the techniques may be performed with respect to an Eigen microphone, which may include a plurality of microphones that are collectively configured to record a 3D sound field. In some examples, the plurality of microphones of the Eigen microphone may be located on the surface of a substantially spherical ball with a radius of approximately 4 cm. In some examples, audio encoding device 20 may be integrated into the Eigen microphone so as to output bitstream 21 directly from the microphone.

Another exemplary audio acquisition context may include a production truck that may be configured to receive a signal from one or more microphones, such as one or more Eigen microphones. The production truck may also include an audio encoder, such as audio encoder 20 of FIG. 3.

The mobile device may also, in some instances, include a plurality of microphones that are collectively configured to record a 3D sound field. In other words, the plurality of microphones may have X, Y, Z diversity. In some examples, the mobile device may include a microphone that may be rotated to provide X, Y, Z diversity with respect to one or more other microphones of the mobile device. The mobile device may also include an audio encoder, such as audio encoder 20 of FIG. 3.

A ruggedized video capture device may further be configured to record a 3D sound field. In some examples, the ruggedized video capture device may be attached to the helmet of a user engaged in an activity. For instance, the ruggedized video capture device may be attached to the helmet of a user whitewater rafting. In this way, the ruggedized video capture device may capture a 3D sound field that represents the action all around the user (e.g., water crashing behind the user, another rafter speaking in front of the user, etc.).

The techniques may also be performed with respect to an accessory-enhanced mobile device, which may be configured to record a 3D sound field. In some examples, the mobile device may be similar to the mobile devices discussed above, with the addition of one or more accessories. For instance, an Eigen microphone may be attached to the above-noted mobile device to form an accessory-enhanced mobile device. In this way, the accessory-enhanced mobile device may capture a higher-quality version of the 3D sound field than it could by using only the sound capture components integral to the accessory-enhanced mobile device.

Example audio playback devices that may perform various aspects of the techniques described in this disclosure are further discussed below. In accordance with one or more techniques of this disclosure, speakers and/or sound bars may be arranged in any arbitrary configuration while still playing back a 3D sound field. Moreover, in some examples, headphone playback devices may be coupled to decoder 24 via either a wired or a wireless connection. In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be utilized to render the sound field on any combination of the speakers, the sound bars, and the headphone playback devices.

A number of different example audio playback environments may also be suitable for performing various aspects of the techniques described in this disclosure. For example, the following may be suitable environments for performing various aspects of the techniques described in this disclosure: a 5.1 speaker playback environment, a 2.0 (e.g., stereo) speaker playback environment, a 9.1 speaker playback environment with full-height front loudspeakers, a 22.2 speaker playback environment, a 16.0 speaker playback environment, an automotive speaker playback environment, and a mobile device with an ear bud playback environment.

In accordance with one or more techniques of this disclosure, a single generic representation of a sound field may be used to render the sound field on any of the foregoing playback environments. Additionally, the techniques of this disclosure enable a renderer to render a sound field from a generic representation for playback on playback environments other than those described above. For example, design considerations may prohibit proper placement of speakers according to a 7.1 speaker playback environment (e.g., it may not be possible to place a right surround speaker). In that case, the techniques of this disclosure enable a renderer to compensate with the other six speakers so that playback may be achieved on a 6.1 speaker playback environment.
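The idea that one generic sound-field representation can feed many layouts can be illustrated with a minimal sketch. The sketch assumes the renderer is a matrix with one row per loudspeaker and one column per HOA coefficient in ACN order, so rendering becomes a matrix-vector product per sample. The function `render_frame` and the stereo matrix values are hypothetical illustrations, not the renderer of this disclosure. Under this assumption, moving from 7.1 to 6.1 means substituting a six-row matrix while the HOA coefficients stay unchanged. How that compensating matrix is computed is not shown.

```python
def render_frame(matrix, hoa_frame):
    """Render one frame of HOA coefficients to loudspeaker feeds.

    matrix:    L rows (one per loudspeaker), each with (N+1)**2 entries.
    hoa_frame: (N+1)**2 coefficient signals (ACN order), each a list of samples.
    Returns L speaker feeds, each a list of samples.
    """
    num_coeffs = len(hoa_frame)
    if any(len(row) != num_coeffs for row in matrix):
        raise ValueError("each matrix row needs one entry per HOA coefficient")
    num_samples = len(hoa_frame[0])
    # Each speaker feed is a weighted sum of the HOA coefficient signals.
    return [
        [sum(row[n] * hoa_frame[n][t] for n in range(num_coeffs))
         for t in range(num_samples)]
        for row in matrix
    ]


# Hypothetical first-order (N = 1) frame in ACN order: W, Y, Z, X; two samples.
hoa_frame = [[1.0, 2.0], [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]]

# Hypothetical stereo rendering matrix (simple mid/side style decode).
stereo = [[0.5, 0.5, 0.0, 0.0],   # left speaker
          [0.5, -0.5, 0.0, 0.0]]  # right speaker

print(render_frame(stereo, hoa_frame))  # [[1.0, 0.0], [0.0, 2.0]]
```

Rendering the same `hoa_frame` to another layout only requires a different matrix whose rows match that layout's speakers.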

Moreover, a user may watch a sports game while wearing headphones. In accordance with one or more techniques of this disclosure, the 3D sound field of the sports game may be processed as follows:
- The sound field may be acquired, for example by placing one or more Eigen microphones in and/or around the baseball stadium.
- HOA coefficients corresponding to the 3D sound field may be obtained and transmitted to a decoder.
- The decoder may reconstruct the 3D sound field based on the HOA coefficients and output the reconstructed 3D sound field to a renderer.
- The renderer may obtain an indication of the type of playback environment (e.g., headphones) and render the reconstructed 3D sound field into signals that cause the headphones to output a representation of the 3D sound field of the sports game.

In each of the various instances described above, it should be understood that the audio encoding device 20 may perform a method, or otherwise comprise means to perform each step of the method, that the audio encoding device 20 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored on a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio encoding device 20 has been configured to perform.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to a tangible medium such as data storage media. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

Likewise, in each of the various instances described above, it should be understood that the audio decoding device 24 may perform a method, or otherwise comprise means to perform each step of the method, that the audio decoding device 24 is configured to perform. In some instances, the means may comprise one or more processors. In some instances, the one or more processors may represent a special-purpose processor configured by way of instructions stored on a non-transitory computer-readable storage medium. In other words, various aspects of the techniques in each of the sets of encoding examples may provide for a non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause the one or more processors to perform the method that the audio decoding device 24 has been configured to perform.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors. Examples include one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit. Alternatively, they may be provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various aspects of the techniques have been described. These and other embodiments are within the scope of the following claims.

Claims

1. A device configured to render higher order ambisonic coefficients, the device comprising:
one or more processors configured to:
obtain, from a bitstream that includes an encoded version of the higher order ambisonic coefficients, sparseness information indicative of a sparseness of a matrix used to render the higher order ambisonic coefficients to a plurality of speaker feeds;
obtain, from the bitstream, sign symmetry information indicative of a sign symmetry of the matrix;
obtain, from the bitstream, a reduced number of bits used to represent the matrix; and
reconstruct the matrix based on the sparseness information, the sign symmetry information, and the reduced number of bits; and
a memory coupled to the one or more processors and configured to store the sparseness information.

2. The device of claim 1,
wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.

3. The device of claim 1,
further comprising a speaker configured to reproduce a sound field represented by the higher order ambisonic coefficients based on the plurality of speaker feeds.

4. The device of claim 1,
wherein the one or more processors are further configured to obtain, from the bitstream, audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds, and to render the plurality of speaker feeds based on the audio rendering information.

5. The device of claim 4,
wherein the signal value includes an index associated with the matrix used to render the higher order ambisonic coefficients to the plurality of speaker feeds, and
wherein the one or more processors are configured to render the plurality of speaker feeds based on the matrix associated with the index included in the signal value.

6. A method of rendering higher order ambisonic coefficients, the method comprising:
obtaining, from a bitstream that includes an encoded version of the higher order ambisonic coefficients, sparseness information indicative of a sparseness of a matrix used to render the higher order ambisonic coefficients to generate a plurality of speaker feeds;
obtaining, from the bitstream, sign symmetry information indicative of a sign symmetry of the matrix;
obtaining, from the bitstream, a reduced number of bits used to represent the matrix; and
reconstructing the matrix based on the sparseness information, the sign symmetry information, and the reduced number of bits.

7. The method of claim 6,
further comprising determining a speaker layout for which the matrix is to be used to render multi-channel audio data from the higher order ambisonic coefficients.

8. The method of claim 6,
further comprising reproducing a sound field represented by the higher order ambisonic coefficients based on the plurality of speaker feeds.

9. The method of claim 6, further comprising:
obtaining, from the bitstream, audio rendering information indicative of a signal value identifying an audio renderer used when generating the plurality of speaker feeds; and
rendering the plurality of speaker feeds based on the audio rendering information.

10. The method of claim 9,
wherein the signal value includes an index associated with the matrix used to render the higher order ambisonic coefficients to generate the plurality of speaker feeds, and
wherein the method further comprises rendering the plurality of speaker feeds based on the matrix associated with the index included in the signal value.

11. A device configured to generate a bitstream, the device comprising:
a memory configured to store a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds; and
one or more processors coupled to the memory and configured to:
obtain sign symmetry information indicative of a sign symmetry of the matrix;
obtain sparseness information indicative of a sparseness of the matrix;
determine, based on the sign symmetry information and the sparseness information, a reduced number of bits used to represent the matrix; and
generate the bitstream to include an encoded version of the higher order ambisonic coefficients, the sign symmetry information, the sparseness information, and the reduced number of bits.

12. The device of claim 11,
wherein the one or more processors are further configured to determine a speaker layout for which the matrix is to be used to render the plurality of speaker feeds from the higher order ambisonic coefficients.

13. The device of claim 11,
further comprising a microphone configured to capture a sound field represented by the higher order ambisonic coefficients.

14. A method of generating a bitstream, the method comprising:
obtaining sparseness information indicative of a sparseness of a matrix used to render higher order ambisonic coefficients to generate a plurality of speaker feeds;
obtaining sign symmetry information indicative of a sign symmetry of the matrix;
determining, based on the sign symmetry information and the sparseness information, a reduced number of bits used to represent the matrix; and
generating the bitstream to include an encoded version of the higher order ambisonic coefficients, the sign symmetry information, the sparseness information, and the reduced number of bits.

15. The method of claim 14,
further comprising determining a speaker layout for which the matrix is to be used to render multi-channel audio data from the higher order ambisonic coefficients.

16. (Deleted)
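The claims describe the encoder deriving a reduced number of bits for the rendering matrix from its sparseness and sign symmetry (claims 11 and 14), and the decoder reconstructing the matrix from that information (claims 1 and 6). This excerpt does not give the actual bitstream syntax. The following is therefore only a minimal sketch, and every rule in it is an assumption:
- Coefficients are in ACN order with real-valued spherical harmonics.
- Sparseness is modeled as a per-speaker effective order N_k, with every entry past (N_k+1)**2 equal to zero.
- Sign symmetry is modeled for a hypothetical left/right speaker pair: signs match for degree m >= 0 and flip for m < 0.
- Each transmitted coefficient costs `precision` magnitude bits plus one sign bit.
- Side information such as orders and flags is not counted.

All function names are hypothetical.

```python
import math


def acn_degree(index):
    """Degree m of an ACN-ordered coefficient, where index = l*l + l + m."""
    order = math.isqrt(index)
    return index - order * order - order


def effective_order(row):
    """Assumed sparseness rule: smallest N_k with all entries past (N_k+1)**2 zero."""
    nonzero = [i for i, value in enumerate(row) if value != 0.0]
    return math.isqrt(nonzero[-1]) if nonzero else 0


def is_sign_symmetric(left, right):
    """Assumed mirror rule: signs match for m >= 0 and flip for m < 0."""
    for i, (a, b) in enumerate(zip(left, right)):
        if (a == 0.0) != (b == 0.0):
            return False  # zero patterns differ, so signs cannot be derived
        if a == 0.0:
            continue
        flip = -1.0 if acn_degree(i) < 0 else 1.0
        if (a > 0) != (b * flip > 0):
            return False
    return True


def count_bits(matrix, pairs, precision):
    """Payload bits: magnitudes for the first (N_k+1)**2 entries of each row,
    plus sign bits only for rows whose signs cannot be derived from a partner."""
    derived = {right for left, right in pairs
               if is_sign_symmetric(matrix[left], matrix[right])}
    total = 0
    for k, row in enumerate(matrix):
        n_coeffs = (effective_order(row) + 1) ** 2
        total += n_coeffs * precision
        if k not in derived:
            total += n_coeffs
    return total


def reconstruct_partner_row(left_row, magnitudes):
    """Decoder side: rebuild a sign-symmetric row from its transmitted magnitudes.
    Entries past len(magnitudes) stay zero (sparseness); signs come from left_row."""
    row = [0.0] * len(left_row)
    for i, magnitude in enumerate(magnitudes):
        sign = 1.0 if left_row[i] >= 0 else -1.0
        if acn_degree(i) < 0:
            sign = -sign
        row[i] = sign * magnitude
    return row


# Hypothetical first-order (N = 1) matrix; ACN channels W, Y, Z, X.
matrix = [
    [1.0, 0.5, 0.0, 0.866],   # left speaker
    [1.0, -0.5, 0.0, 0.866],  # right speaker, mirror of left
    [0.5, 0.0, 0.0, 0.0],     # center speaker, order 0 only (sparse)
]
dense_bits = len(matrix) * len(matrix[0]) * (8 + 1)
print(dense_bits, count_bits(matrix, [], 8), count_bits(matrix, [(0, 1)], 8))
# 108 81 77
```

With 8-bit magnitudes, sparseness alone reduces this toy matrix from 108 to 81 bits. Deriving the right speaker's signs from the left speaker's signs saves 4 more bits, for 77. The actual savings depend on the real signaling syntax, which this sketch does not reproduce.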