KR101893410B1

KR101893410B1 - Multi-channel decorrelator, multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a premix of decorrelator input signals

Info

Publication number: KR101893410B1
Application number: KR1020167004501A
Authority: KR
Inventors: Sascha Disch; Harald Fuchs; Oliver Hellmuth; Jürgen Herre; Adrian Murtaza; Jouni Paulus; Falko Ridderbusch; Leon Terentiv
Original assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date: 2013-07-22
Filing date: 2014-07-17
Publication date: 2018-10-04
Also published as: US20160157039A1; CN105580390A; EP3419315B1; EP3419314B1; TW201532034A; JP2016531482A; JP2020120389A; WO2015011014A1; AU2014295206A1; RU2016105468A; US11252523B2; PL3025515T3; MX2018012891A; US20220167102A1; US20160240199A1; PT3025515T; ES2924174T3; JP7000488B2; EP2830333A1; MX362548B

Abstract

A multi-channel decorrelator for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals is configured to premix a first set of N decorrelator input signals into a second set of K decorrelator input signals, where K < N. The multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals. The multi-channel decorrelator is further configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, where N' > K'. The multi-channel decorrelator can be used in a multi-channel audio decoder. A multi-channel audio encoder provides complexity control information for the multi-channel decorrelator.

Description

MULTI-CHANNEL DECORRELATOR, MULTI-CHANNEL AUDIO DECODER, MULTI-CHANNEL AUDIO ENCODER, METHODS AND COMPUTER PROGRAM USING A PREMIX OF DECORRELATOR INPUT SIGNALS

Embodiments according to the invention relate to a multi-channel decorrelator for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals.

Further embodiments according to the invention relate to a multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation.

Further embodiments according to the invention relate to a multi-channel audio encoder for providing an encoded representation on the basis of at least two input audio signals.

Further embodiments according to the invention relate to a method for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals.

Some embodiments according to the invention relate to a method for providing at least two output audio signals on the basis of an encoded representation.

Some embodiments according to the invention relate to a method for providing an encoded representation on the basis of at least two input audio signals.

Some embodiments according to the invention relate to a computer program for performing one of said methods.

Some embodiments according to the invention relate to an encoded audio representation.

Generally speaking, some embodiments according to the invention relate to a decorrelation concept for multi-channel downmix/upmix parametric audio object coding systems.

In recent years, the demand for storage and transmission of audio content has steadily increased. Moreover, the quality requirements for the storage and transmission of audio content have also steadily increased. Accordingly, the concepts for encoding and decoding audio content have been enhanced.

For example, the so-called "Advanced Audio Coding" (AAC) has been developed, which is described in the international standard ISO/IEC 13818-7:2003. Moreover, some spatial extensions have been created, such as the so-called "MPEG Surround" concept, which is described, for example, in the international standard ISO/IEC 23003-1:2007. Additional improvements for the encoding and decoding of spatial information of audio signals are described in the international standard ISO/IEC 23003-2:2010, which relates to the so-called "Spatial Audio Object Coding".

Moreover, a switchable audio encoding/decoding concept, which makes it possible to encode both general audio signals and speech signals with good coding efficiency and to handle multi-channel audio signals, is defined in the international standard ISO/IEC 23003-3:2012, which describes the so-called "Unified Speech and Audio Coding" (USAC) concept.

Further conventional concepts are described in the references mentioned at the end of the present description.

However, there is a desire to provide an even more advanced concept for an efficient encoding and decoding of three-dimensional audio scenes.

An embodiment according to the invention creates a multi-channel decorrelator for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals. The multi-channel decorrelator is configured to premix a first set of N decorrelator input signals into a second set of K decorrelator input signals, where K < N. The multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals. The multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, where N' > K'.

An embodiment according to the invention is based on the idea that the complexity of the decorrelation can be reduced by premixing the first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein the second set comprises fewer signals than the first set. Accordingly, the fundamental decorrelator functionality is performed on only K signals (the K decorrelator input signals of the second set), such that, for example, only K individual decorrelators (or individual decorrelations) are required, and not N decorrelators. Moreover, to provide N' decorrelator output signals, an upmix is performed, in which the first set of K' decorrelator output signals is upmixed into the second set of N' decorrelator output signals. Accordingly, it is possible to obtain a comparatively large number of decorrelated signals (namely, the N' signals of the second set of decorrelator output signals) on the basis of a comparatively large number of decorrelator input signals (namely, the N signals of the first set of decorrelator input signals), while the core decorrelation functionality is performed on the basis of only K signals (for example, using only K individual decorrelators). Thus, a significant gain in decorrelation efficiency is achieved, which helps to save processing power and resources (for example, energy).

In a preferred embodiment, the number K of signals of the second set of decorrelator input signals is equal to the number K' of signals of the first set of decorrelator output signals. Accordingly, there may be, for example, K individual decorrelators, each of which receives one decorrelator input signal (of the second set of decorrelator input signals) from the premixing and provides one decorrelator output signal (of the first set of decorrelator output signals) to the upmixing. Thus, simple individual decorrelators can be used, each of which provides one output signal on the basis of one input signal.

In another preferred embodiment, the number N of signals of the first set of decorrelator input signals may be equal to the number N' of signals of the second set of decorrelator output signals. Thus, the number of signals received by the multi-channel decorrelator is equal to the number of signals it provides, such that the multi-channel decorrelator appears, from the outside, like a bank of N independent decorrelators (although the decorrelation result may comprise some imperfections because only K input signals are used for the decorrelator core). Accordingly, the multi-channel decorrelator can be used as a drop-in replacement for conventional decorrelators having an equal number of input signals and output signals. Moreover, it should be noted that, in such a configuration, the upmixing can be derived from the premixing with moderate effort.

In a preferred embodiment, the number N of signals of the first set of decorrelator input signals may be greater than or equal to 3, and the number N' of signals of the second set of decorrelator output signals may also be greater than or equal to 3. In such a case, the multi-channel decorrelator is particularly efficient.

In a preferred embodiment, the multi-channel decorrelator may be configured to premix the first set of N decorrelator input signals into the second set of K decorrelator input signals using a premixing matrix (i.e., using a linear premixing functionality). In this case, the multi-channel decorrelator may be configured to obtain the first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals (for example, using individual decorrelators). The multi-channel decorrelator may also be configured to upmix the first set of K' decorrelator output signals into the second set of N' decorrelator output signals using a postmixing matrix, i.e., using a linear postmixing functionality. Accordingly, distortions can be kept small. Moreover, the premixing and the postmixing (also designated as upmixing) can be performed in a computationally efficient manner.
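
The chain of premixing, core decorrelation and postmixing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the patent: the helper names (`mix`, `delay_decorrelator`, `multichannel_decorrelate`), the example matrices and the use of a plain delay as a stand-in for an individual decorrelator are all hypothetical choices made for the example.

```python
def mix(matrix, signals):
    """Apply a mixing matrix: each matrix row yields one output signal
    as a weighted sum of the input signals."""
    n_samples = len(signals[0])
    return [
        [sum(w * s[t] for w, s in zip(row, signals)) for t in range(n_samples)]
        for row in matrix
    ]


def delay_decorrelator(delay):
    """Hypothetical stand-in for one individual decorrelator: a pure delay."""
    def run(x):
        return [0.0] * delay + x[:len(x) - delay]
    return run


def multichannel_decorrelate(inputs, premix, decorrelators, postmix):
    """N inputs -> premix to K -> K individual decorrelators -> postmix to N'."""
    if not len(premix) == len(decorrelators) < len(inputs):
        raise ValueError("expected K premix rows, K decorrelators and K < N")
    reduced = mix(premix, inputs)                                 # K signals
    core_out = [d(x) for d, x in zip(decorrelators, reduced)]     # K' = K signals
    return mix(postmix, core_out)                                 # N' signals


# Example with N = N' = 4 and K = K' = 2 (values chosen for illustration only).
inputs = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
premix = [[0.5, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5]]
postmix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
outputs = multichannel_decorrelate(
    inputs, premix, [delay_decorrelator(1), delay_decorrelator(2)], postmix)
```

Only two decorrelators run here although four signals go in and four come out, which is the complexity saving the text refers to.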

In a preferred embodiment, the multi-channel decorrelator may be configured to select the premixing matrix in dependence on the spatial positions with which the channel signals of the first set of N decorrelator input signals are associated. Accordingly, spatial dependencies (or correlations) can be considered in the premixing process, which helps to avoid an excessive degradation due to the premixing process performed in the multi-channel decorrelator.

In a preferred embodiment, the multi-channel decorrelator may be configured to select the premixing matrix in dependence on correlation characteristics or covariance characteristics of the first set of N decorrelator input signals. Such a functionality also helps to avoid an excessive degradation due to the premixing process performed by the multi-channel decorrelator. For example, decorrelator input signals of the first set which are strongly related (i.e., which have a high cross-correlation or a high cross-covariance) may be combined into a single decorrelator input signal of the second set and may consequently be processed, for example, by a common individual decorrelator (of the decorrelator core). Thus, it can be avoided that substantially different decorrelator input signals of the first set are premixed into a single decorrelator input signal of the second set, which is fed into the decorrelator core, since this would typically result in inappropriate decorrelator output signals (which could, for example, disturb the spatial perception when used to bring audio signals to desired cross-correlation or cross-covariance characteristics). Accordingly, the multi-channel decorrelator can decide, in an intelligent manner, which signals should be combined in the premixing (or downmixing) process to allow for a good trade-off between decorrelation efficiency and audio quality.
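
One way to picture a correlation-dependent choice of the premixing matrix is a greedy grouping of the most strongly correlated input signals. The sketch below is an assumption made for illustration, not a procedure taken from the text: the function names, the greedy merge rule (two groups are merged only as far as their weakest pairwise correlation allows) and the equal-weight matrix rows are all hypothetical.

```python
import math


def correlation(a, b):
    """Normalized cross-correlation at lag zero."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def group_by_correlation(signals, k):
    """Greedily merge groups until only k groups remain. The score of a
    candidate merge is the weakest correlation between members of the two
    groups, so substantially different signals are merged last."""
    groups = [[i] for i in range(len(signals))]
    while len(groups) > k:
        best = None
        for gi in range(len(groups)):
            for gj in range(gi + 1, len(groups)):
                score = min(correlation(signals[i], signals[j])
                            for i in groups[gi] for j in groups[gj])
                if best is None or score > best[0]:
                    best = (score, gi, gj)
        _, gi, gj = best
        groups[gi] = sorted(groups[gi] + groups[gj])
        del groups[gj]
    return groups


def premix_matrix(groups, n):
    """One row per group, averaging the members of that group."""
    return [[1.0 / len(g) if i in g else 0.0 for i in range(n)] for g in groups]


# Signals 0 and 1 are nearly proportional, as are signals 2 and 3.
signals = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.1],
           [1.0, -1.0, 1.0, -1.0], [1.0, -1.0, 1.0, -0.9]]
groups = group_by_correlation(signals, k=2)
premix = premix_matrix(groups, n=len(signals))
```

With these example signals the two similar pairs end up in separate groups, so each of the two individual decorrelators receives a mix of closely related signals only.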

In a preferred embodiment, the multi-channel decorrelator is configured to determine the premixing matrix such that the matrix product of the premixing matrix and its Hermitian (conjugate transpose) is well-conditioned with respect to an inversion operation. Accordingly, the premixing matrix can be chosen such that the postmixing matrix can be determined without numerical problems.

In a preferred embodiment, the multi-channel decorrelator is configured to obtain the postmixing matrix on the basis of the premixing matrix using some matrix multiplication and matrix inversion operations. In this way, the postmixing matrix can be obtained efficiently, such that the postmixing matrix is well-adapted to the premixing process.
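
The text does not spell out which multiplications and inversions are used at this point. One natural candidate, assumed here purely for illustration, is the right pseudo-inverse M_post = M_pre^H (M_pre M_pre^H)^(-1), which also shows why the product of the premixing matrix and its Hermitian needs to be well-conditioned. The sketch below restricts itself to real-valued matrices (so the Hermitian is the transpose); all function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or badly conditioned")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def postmix_from_premix(premix):
    """Assumed form: M_post = M_pre^T (M_pre M_pre^T)^-1 for real matrices."""
    gram = matmul(premix, transpose(premix))          # K x K
    return matmul(transpose(premix), invert(gram))    # N x K


premix = [[1.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 1.0]]
postmix = postmix_from_premix(premix)
identity_check = matmul(premix, postmix)   # K x K identity for this choice
```

A premixing matrix with linearly dependent rows would make the K x K product singular, and `invert` would raise instead of returning an unusable postmixing matrix.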

In a preferred embodiment, the multi-channel decorrelator is configured to receive information about a rendering configuration associated with the channel signals of the first set of N decorrelator input signals. In this case, the multi-channel decorrelator is configured to select the premixing matrix in dependence on the information about the rendering configuration. Accordingly, the premixing matrix can be selected in a manner which is well-adapted to the rendering configuration, such that a good audio quality is obtained.

In a preferred embodiment, the multi-channel decorrelator is configured to combine channel signals of the first set of N decorrelator input signals which are associated with spatially adjacent positions of an audio scene when performing the premixing. Thus, the fact that channel signals associated with spatially adjacent positions of an audio scene are typically similar is exploited when setting up the premixing. Consequently, similar audio signals can be combined in the premixing and processed using the same individual decorrelator in the decorrelator core. Accordingly, unacceptable degradations of the audio content can be avoided.

In a preferred embodiment, the multi-channel decorrelator is configured to combine channel signals of the first set of N decorrelator input signals which are associated with vertically spatially adjacent positions of an audio scene when performing the premixing. This concept is based on the fact that audio signals from vertically spatially adjacent positions of the audio scene are typically similar. Moreover, human perception is not particularly sensitive to differences between signals associated with vertically spatially adjacent positions of the audio scene. Accordingly, it has been found that combining audio signals associated with vertically spatially adjacent positions does not result in a substantial degradation of the listening impression obtained on the basis of the decorrelated audio signals.

In a preferred embodiment, the multi-channel decorrelator may be configured to combine channel signals of the first set of N decorrelator input signals which are associated with a horizontal pair of spatial positions comprising a left-side position and a right-side position. It has been found that channel signals associated with such a horizontal pair of spatial positions are typically also somewhat related, since they are typically used together to obtain a spatial impression. Accordingly, it has been found that combining channel signals associated with a horizontal pair of spatial positions is a reasonable solution, for example, if combining channel signals associated with vertically spatially adjacent positions of the audio scene is not sufficient, because such a combination typically does not cause an excessive degradation of the listening impression.

In a preferred embodiment, the multi-channel decorrelator is configured to combine at least four channel signals of the first set of N decorrelator input signals, wherein at least two of said at least four channel signals are associated with spatial positions on the left side of an audio scene, and wherein at least two of said at least four channel signals are associated with spatial positions on the right side of the audio scene. Accordingly, four or more channel signals are combined, such that an efficient decorrelation can be obtained without significantly compromising the listening impression.

In a preferred embodiment, the at least two left-side channel signals to be combined (i.e., channel signals associated with spatial positions on the left side of the audio scene) are associated with spatial positions which are symmetric, with respect to a center plane of the audio scene, to the spatial positions associated with the at least two right-side channel signals to be combined (i.e., channel signals associated with spatial positions on the right side of the audio scene). It has been found that combining channel signals associated with "symmetric" spatial positions typically yields good results, since signals associated with such "symmetric" spatial positions are typically somewhat related, which is advantageous for performing a common (combined) decorrelation.

In a preferred embodiment, the multi-channel decorrelator is configured to receive complexity information describing the number K of decorrelator input signals of the second set of decorrelator input signals. In this case, the multi-channel decorrelator is configured to select the premixing matrix in dependence on the complexity information. Accordingly, the multi-channel decorrelator can be adapted flexibly to different complexity requirements, which makes it possible to vary the trade-off between audio quality and complexity.

In a preferred embodiment, the multi-channel decorrelator is configured to increase, step by step, the number of decorrelator input signals of the first set which are combined to obtain the decorrelator input signals of the second set as the value of the complexity information decreases. Accordingly, if it is desired to reduce the complexity, more decorrelator input signals of the first set can be combined (for example, into a single decorrelator input signal of the second set), which allows the complexity to be varied with little effort.

In a preferred embodiment, the multi-channel decorrelator is configured to combine only channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions of an audio scene when performing the premixing for a first value of the complexity information. However, the multi-channel decorrelator may also be configured to combine at least two channel signals of the first set which are associated with vertically spatially adjacent positions on the left side of the audio scene and at least two channel signals of the first set which are associated with vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a given signal of the second set of decorrelator input signals, when performing the premixing for a second value of the complexity information. In other words, for the first value of the complexity information, no channel signals from different sides of the audio scene are combined, which can result in a particularly good quality of the audio signals (and of the listening impression that can be obtained on the basis of the decorrelated audio signals). In contrast, if a lower complexity is required, a horizontal combination can be performed in addition to the vertical combination. This has been found to be a reasonable concept for a step-wise adjustment of the complexity, with a somewhat higher degradation of the listening impression at reduced complexity.

In a preferred embodiment, the multi-channel decorrelator is configured to combine at least four channel signals of the first set of decorrelator input signals in order to obtain a given signal of the second set of decorrelator input signals when performing the premixing for the second value of the complexity information, wherein at least two of said at least four channel signals are associated with spatial positions on the left side of an audio scene, and wherein at least two of said at least four channel signals are associated with spatial positions on the right side of the audio scene. This concept is based on the finding that a comparatively low computational complexity can be obtained by combining at least two channel signals associated with spatial positions on the left side of the audio scene and at least two channel signals associated with spatial positions on the right side of the audio scene, even if said channel signals are not vertically adjacent (or at least not perfectly vertically adjacent).

In a preferred embodiment, for a first value of the complexity information, the multi-channel decorrelator is configured to combine at least two channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the left side of an audio scene, in order to obtain a first decorrelator input signal of the second set of decorrelator input signals, and to combine at least two channel signals of the first set which are associated with vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a second decorrelator input signal of the second set. Moreover, for a second value of the complexity information, the multi-channel decorrelator is configured to combine said at least two channel signals associated with vertically spatially adjacent positions on the left side of the audio scene and said at least two channel signals associated with vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a decorrelator input signal of the second set. In this case, the number of decorrelator input signals of the second set is larger for the first value of the complexity information than for the second value. In other words, four channel signals which are used to obtain two decorrelator input signals of the second set for the first value of the complexity information may be used to obtain a single decorrelator input signal of the second set for the second value. Thus, signals which serve as input signals for two individual decorrelators for the first value are combined to serve as the input signal for a single individual decorrelator for the second value. Accordingly, an efficient reduction of the number of individual decorrelators (or of the number of decorrelator input signals of the second set) can be obtained for a reduced value of the complexity information.
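
The step-wise grouping for different values of the complexity information can be pictured as a lookup of grouping tables. The sketch below assumes a hypothetical four-channel layout with a lower and an upper loudspeaker on each side; the channel names, the complexity labels and the equal-weight matrix rows are illustrative assumptions and do not come from the text.

```python
# Hypothetical layout: lower and upper loudspeaker on the left and right side.
CHANNELS = ["L", "L_upper", "R", "R_upper"]

# Hypothetical grouping tables, one per value of the complexity information.
GROUPS = {
    "full":   [["L"], ["L_upper"], ["R"], ["R_upper"]],   # no premixing, K = N
    "first":  [["L", "L_upper"], ["R", "R_upper"]],       # vertical pairs only
    "second": [["L", "L_upper", "R", "R_upper"]],         # vertical and horizontal
}


def premix_matrix_for(complexity):
    """Build a K x N premixing matrix that averages each group of channels."""
    return [[1.0 / len(group) if ch in group else 0.0 for ch in CHANNELS]
            for group in GROUPS[complexity]]


first = premix_matrix_for("first")     # K = 2: left pair and right pair
second = premix_matrix_for("second")   # K = 1: all four channels in one signal
```

The two rows used for the first value collapse into a single row for the second value, so the number of individual decorrelators drops from two to one.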

An embodiment according to the invention creates a multi-channel audio decoder for providing at least two output audio signals on the basis of an encoded representation. The multi-channel audio decoder comprises a multi-channel decorrelator as described herein.

This embodiment is based on the finding that the multi-channel decorrelator is well suited for application in a multi-channel audio decoder.

In a preferred embodiment, the multi-channel audio decoder is configured to render a plurality of decoded audio signals, which are obtained on the basis of the encoded representation, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals. The multi-channel audio decoder is configured to derive one or more decorrelated audio signals from the rendered audio signals using the multi-channel decorrelator, wherein the rendered audio signals constitute the first set of decorrelator input signals, and wherein the second set of decorrelator output signals constitutes the decorrelated audio signals. The multi-channel audio decoder is configured to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals (of the second set of decorrelator output signals), to obtain the output audio signals. This embodiment according to the invention is based on the finding that the multi-channel decorrelator described herein is well suited for post-rendering processing, in which a comparatively large number of rendered audio signals is input into the multi-channel decorrelator and a comparatively large number of decorrelated signals is then combined with the rendered audio signals. Moreover, it has been found that the imperfections caused by the use of a comparatively small number of individual decorrelators (the complexity reduction of the multi-channel decorrelator) typically do not result in a severe degradation of the quality of the output audio signals provided by the multi-channel decoder.

In a preferred embodiment, the multi-channel audio decoder is configured to select a premixing matrix for use by the multi-channel decorrelator in dependence on control information included in the encoded representation. Accordingly, the audio encoder can even control the quality of the decorrelation, such that the quality of the decorrelation is well adapted to the specific audio content, which yields a good trade-off between audio quality and decorrelation complexity.

In a preferred embodiment, the multi-channel audio decoder is configured to select a premixing matrix for use by the multi-channel decorrelator in dependence on an output configuration describing an allocation of the output audio signals to spatial positions of the audio scene. Accordingly, the multi-channel decorrelator can be adapted to the specific rendering scenario, which helps to avoid a substantial degradation of the audio quality by the efficient decorrelation.

In a preferred embodiment, the multi-channel audio decoder is configured to select, for a given output configuration, between three or more different premixing matrices (M_pre) for use by the multi-channel decorrelator in dependence on control information included in the encoded representation. In this case, each of the three or more different premixing matrices is associated with a different number of signals of the second set of decorrelator input signals. Thus, the complexity of the decorrelation can be adjusted over a wide range.
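The selection between several premixing matrices can be sketched as a simple table lookup. The configuration name "5ch", the level numbering and all matrix entries below are hypothetical placeholders; the actual coefficients would come from the premixing tables shown in the figures.

```python
# Hypothetical lookup table: output configuration -> control value -> premixing matrix.
# Each matrix has K rows (premixed signals) and N = 5 columns (rendered channels).
PREMIX_TABLES = {
    "5ch": {
        0: [[1, 0, 0, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 0, 1, 1]],    # K = 4
        1: [[1, 1, 0, 0, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 0, 1, 1]],    # K = 3
        2: [[1, 1, 1, 0, 0],
            [0, 0, 0, 1, 1]],    # K = 2
    },
}


def select_premix_matrix(output_config, control_value):
    """Return the premixing matrix for the given output configuration and control value."""
    return PREMIX_TABLES[output_config][control_value]


# Number of premixed signals K obtained for each control value.
k_per_level = [len(select_premix_matrix("5ch", level)) for level in (0, 1, 2)]
```

Each control value maps to a different K, so the encoder-side control information directly sets how many individual decorrelators the decoder has to run.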

In a preferred embodiment, the multi-channel audio decoder is configured to select a premixing matrix (M_pre) for use by the multi-channel decorrelator in dependence on a mixing matrix (Dconv, Drender) which is used by a format converter or renderer that receives the at least two output audio signals.

In another embodiment, the multi-channel audio decoder is configured to select the premixing matrix (M_pre) for use by the multi-channel decorrelator to be equal to a mixing matrix (Dconv, Drender) which is used by a format converter or renderer that receives the at least two output audio signals.

An embodiment according to the invention creates a multi-channel audio encoder for providing an encoded representation (814) on the basis of at least two input audio signals. The multi-channel audio encoder is configured to provide one or more downmix signals on the basis of the at least two input audio signals. The multi-channel audio encoder is also configured to provide one or more parameters describing a relationship between the at least two input audio signals. Moreover, the multi-channel audio encoder is configured to provide a decorrelation complexity parameter describing the complexity of the decorrelation to be used at the side of an audio decoder. Accordingly, the multi-channel audio encoder is able to control the multi-channel audio decoder described above, such that the complexity of the decorrelation can be adjusted to the requirements of the audio content encoded by the multi-channel audio encoder.

Another embodiment according to the invention creates a method for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals. The method comprises premixing a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N. The method also comprises providing a first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals. Moreover, the method comprises upmixing the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K'. This method is based on the same ideas as the multi-channel decorrelator described above.
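The three steps of the method (premixing, decorrelation, upmixing) can be sketched as follows. This is only an illustration under stated assumptions: the plain delay stands in for an individual decorrelator, K' is taken equal to K, and the matrices `M_PRE` and `M_POST` are made-up examples rather than coefficients from this document.

```python
def mix(matrix, signals):
    """Apply a mixing matrix (rows x cols) to a list of `cols` signals (lists of samples)."""
    n_samples = len(signals[0])
    return [[sum(w * s[t] for w, s in zip(row, signals)) for t in range(n_samples)]
            for row in matrix]


def delay_decorrelator(signal, delay):
    """Stand-in for an individual decorrelator: a plain delay (assumption)."""
    return ([0.0] * delay + list(signal))[:len(signal)]


def multichannel_decorrelate(inputs, m_pre, m_post):
    premixed = mix(m_pre, inputs)                          # N  -> K   (K < N)
    decorrelated = [delay_decorrelator(s, k + 1)           # K  -> K'  (here K' == K)
                    for k, s in enumerate(premixed)]
    return mix(m_post, decorrelated)                       # K' -> N'  (N' > K')


# Example with N = 4, K = K' = 2, N' = 4.
M_PRE = [[1, 1, 0, 0],
         [0, 0, 1, 1]]
M_POST = [[1, 0], [1, 0], [0, 1], [0, 1]]   # assumed here: transpose of M_PRE
inputs = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
outputs = multichannel_decorrelate(inputs, M_PRE, M_POST)
```

Only two individual decorrelators run in this example, although four decorrelated output signals are produced.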

Another embodiment according to the invention creates a method for providing at least two output audio signals on the basis of an encoded representation. The method comprises providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals, as described above. This method is based on the same findings as the multi-channel audio decoder described above.

Another embodiment creates a method for providing an encoded representation on the basis of at least two input audio signals. The method comprises providing one or more downmix signals on the basis of the at least two input audio signals. The method also comprises providing one or more parameters describing a relationship between the at least two input audio signals. Furthermore, the method comprises providing a decorrelation complexity parameter describing the complexity of the decorrelation to be used at the side of an audio decoder. This method is based on the same ideas as the audio encoder described above.

Moreover, embodiments according to the invention create a computer program for performing said methods.

Another embodiment according to the invention creates an encoded audio representation. The encoded audio representation comprises an encoded representation of a downmix signal and an encoded representation of one or more parameters describing a relationship between at least two input audio signals. Moreover, the encoded audio representation comprises an encoded decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. Accordingly, the encoded audio representation allows an appropriate decorrelation mode to be signaled and therefore helps to realize the advantages described with respect to the multi-channel audio encoder and the multi-channel audio decoder.

Moreover, it should be noted that the methods described above can be supplemented by any of the features and functionalities described with respect to the apparatuses mentioned above.

Embodiments according to the present invention will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic block diagram of a multi-channel audio decoder, according to an embodiment of the present invention.
Fig. 2 shows a schematic block diagram of a multi-channel audio encoder, according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a method for providing at least two output audio signals on the basis of an encoded representation, according to an embodiment of the present invention.
Fig. 4 shows a flowchart of a method for providing an encoded representation on the basis of at least two input audio signals, according to an embodiment of the present invention.
Fig. 5 shows a schematic representation of an encoded audio representation, according to an embodiment of the present invention.
Fig. 6 shows a schematic block diagram of a multi-channel decorrelator, according to an embodiment of the present invention.
Fig. 7 shows a schematic block diagram of a multi-channel audio decoder, according to an embodiment of the present invention.
Fig. 8 shows a schematic block diagram of a multi-channel audio encoder, according to an embodiment of the present invention.
Fig. 9 shows a flowchart of a method for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals, according to an embodiment of the present invention.
Fig. 10 shows a flowchart of a method for providing at least two output audio signals on the basis of an encoded representation, according to an embodiment of the present invention.
Fig. 11 shows a flowchart of a method for providing an encoded representation on the basis of at least two input audio signals, according to an embodiment of the present invention.
Fig. 12 shows a schematic representation of an encoded representation, according to an embodiment of the present invention.
Fig. 13 shows a schematic representation which provides an overview of a minimum mean square error (MMSE) based parametric downmix/upmix concept, according to an embodiment of the present invention.
Fig. 14 shows a geometric representation of the orthogonality principle in three-dimensional space.
Fig. 15 shows a schematic block diagram of a parametric reconstruction system with decorrelation applied to the rendered output, according to an embodiment of the present invention.
Fig. 16 shows a schematic block diagram of a decorrelation unit.
Fig. 17 shows a schematic block diagram of a reduced-complexity decorrelation unit, according to an embodiment of the present invention.
Fig. 18 shows a table representation of loudspeaker positions, according to an embodiment of the present invention.
Figs. 19a to 19g show table representations of premixing coefficients for N = 22 and K between 5 and 11.
Figs. 20a to 20d show table representations of premixing coefficients for N = 10 and K between 2 and 5.
Figs. 21a to 21c show table representations of premixing coefficients for N = 8 and K between 2 and 4.
Figs. 21d to 21f show table representations of premixing coefficients for N = 7 and K between 2 and 4.
Figs. 22a and 22b show table representations of premixing coefficients for N = 5 and K = 2 or K = 3.
Fig. 23 shows a table representation of premixing coefficients for N = 2 and K = 1.
Fig. 24 shows a table representation of groups of channel signals.
Fig. 25 shows a syntax representation of additional parameters which may be included in the syntax of SAOCSpecificConfig() or, equivalently, SAOC3DSpecificConfig().
Fig. 26 shows a table representation of the different values of the bitstream variable bsDecorrelationMethod.
Fig. 27 shows a table representation of the number of decorrelators for different decorrelation levels, as indicated by the bitstream variable bsDecorrelationLevel, and for different output configurations.
Fig. 28 shows, in the form of a schematic block diagram, an overview of a 3D audio encoder.
Fig. 29 shows, in the form of a schematic block diagram, an overview of a 3D audio decoder.
Fig. 30 shows a schematic block diagram of the structure of a format converter.
Fig. 31 shows a schematic block diagram of a downmix processor, according to an embodiment of the present invention.
Fig. 32 shows a table representing decoding modes for different numbers of spatial audio object coding (SAOC) downmix objects.
Fig. 33 shows a syntax representation of the bitstream element "SAOC3DSpecificConfig".

1. Multi-channel audio decoder according to Fig. 1

Fig. 1 shows a schematic block diagram of a multi-channel audio decoder 100, according to an embodiment of the present invention.

The multi-channel audio decoder 100 is configured to receive an encoded representation 110 and to provide, on the basis thereof, at least two output audio signals 112, 114.

The multi-channel audio decoder 100 preferably comprises a decoder 120 which is configured to provide decoded audio signals 122 on the basis of the encoded representation 110. Moreover, the multi-channel audio decoder 100 comprises a renderer 130 which is configured to render a plurality of decoded audio signals 122, which are obtained on the basis of the encoded representation 110 (for example, by the decoder 120), in dependence on one or more rendering parameters 132, to obtain a plurality of rendered audio signals 134, 136. Moreover, the multi-channel audio decoder 100 comprises a decorrelator 140 which is configured to derive one or more decorrelated audio signals 142, 144 from the rendered audio signals 134, 136. Furthermore, the multi-channel audio decoder 100 comprises a combiner 150 which is configured to combine the rendered audio signals 134, 136, or a scaled version thereof, with the one or more decorrelated audio signals 142, 144, to obtain the output audio signals 112, 114.

However, it should be noted that a different hardware structure of the multi-channel audio decoder 100 is possible, as long as the functionalities described above are provided.

Regarding the functionality of the multi-channel audio decoder 100, it should be noted that the decorrelated audio signals 142, 144 are derived from the rendered audio signals 134, 136, and that the decorrelated audio signals 142, 144 are combined with the rendered audio signals 134, 136 to obtain the output audio signals 112, 114. By deriving the decorrelated audio signals 142, 144 from the rendered audio signals 134, 136, a particularly efficient processing can be achieved, since the number of rendered audio signals 134, 136 is typically independent of the number of decoded audio signals 122 which are input into the renderer 130. Thus, the decorrelation effort is typically independent of the number of decoded audio signals 122, which improves the implementation efficiency. Moreover, applying the decorrelation after the rendering avoids the introduction of artifacts which could be caused by the renderer when combining multiple decorrelated signals in the case that the decorrelation is applied before the rendering. Moreover, characteristics of the rendered audio signals can be considered in the decorrelation performed by the decorrelator 140, which typically results in output audio signals of good quality.
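The order of operations in the multi-channel audio decoder 100 (render, then decorrelate, then combine) can be sketched as below. The rendering matrix, the placeholder decorrelator and the gain values are assumptions for illustration; only the signal flow reflects the description above.

```python
def render(decoded, rendering_matrix):
    """Renderer 130: mix the decoded signals 122 into len(rendering_matrix) rendered signals."""
    n_samples = len(decoded[0])
    return [[sum(g * sig[t] for g, sig in zip(row, decoded)) for t in range(n_samples)]
            for row in rendering_matrix]


def decorrelate(signal):
    """Placeholder for decorrelator 140: one-sample delay with sign flip (assumption)."""
    return [0.0] + [-x for x in signal[:-1]]


def combine(rendered, decorrelated, dry_gain=1.0, wet_gain=0.5):
    """Combiner 150: scaled rendered signals plus scaled decorrelated signals (gains are hypothetical)."""
    return [[dry_gain * r + wet_gain * d for r, d in zip(rs, ds)]
            for rs, ds in zip(rendered, decorrelated)]


def decode_post_render(decoded, rendering_matrix):
    rendered = render(decoded, rendering_matrix)           # signals 134, 136
    decorrelated = [decorrelate(s) for s in rendered]      # signals 142, 144
    return combine(rendered, decorrelated)                 # signals 112, 114


# Three decoded signals are rendered to two channels; only two decorrelations are needed.
R = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
decoded = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out = decode_post_render(decoded, R)
```

The number of calls to `decorrelate` depends only on the number of rows of the rendering matrix, not on how many decoded signals enter the renderer.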

Moreover, it should be noted that the multi-channel audio decoder 100 can be supplemented by any of the features and functionalities described herein. In particular, it should be noted that individual improvements as described herein may be introduced into the multi-channel audio decoder 100 in order to further improve the processing efficiency and/or the quality of the output audio signals.

2. Multi-channel audio encoder according to Fig. 2

Fig. 2 shows a schematic block diagram of a multi-channel audio encoder 200, according to an embodiment of the present invention. The multi-channel audio encoder 200 is configured to receive two or more input audio signals 210, 212 and to provide, on the basis thereof, an encoded representation 214. The multi-channel audio encoder 200 comprises a downmix signal provider 220 which is configured to provide one or more downmix signals 222 on the basis of the at least two input audio signals 210, 212. Moreover, the multi-channel audio encoder 200 comprises a parameter provider 230 which is configured to provide one or more parameters 232 describing a relationship (for example, a correlation, a cross-correlation, a cross-covariance, a level difference, or the like) between the at least two input audio signals 210, 212.

Moreover, the multi-channel audio encoder 200 also comprises a decorrelation method parameter provider 240 which is configured to provide a decorrelation method parameter 242 describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. The one or more downmix signals 222, the one or more parameters 232 and the decorrelation method parameter 242 are included, for example in an encoded form, in the encoded representation 214.

However, it should be noted that the hardware structure of the multi-channel audio encoder 200 may be different, as long as the functionalities described above are fulfilled. In other words, the distribution of the functionalities of the multi-channel audio encoder 200 to individual blocks (for example, to the downmix signal provider 220, to the parameter provider 230 and to the decorrelation method parameter provider 240) should only be considered as an example.

Regarding the functionality of the multi-channel audio encoder 200, it should be noted that the one or more downmix signals 222 and the one or more parameters 232 are provided in a conventional manner, for example as in a spatial audio object coding (SAOC) multi-channel audio encoder or in a USAC multi-channel audio encoder. However, the decorrelation method parameter 242, which is also provided by the multi-channel audio encoder 200 and included in the encoded representation 214, can be used to adapt the decorrelation mode to the input audio signals 210, 212 or to a desired playback quality. Accordingly, the decorrelation mode can be adapted to different types of audio content. For example, different decorrelation modes can be chosen for types of audio content in which the input audio signals 210, 212 are strongly correlated and for types of audio content in which the input audio signals 210, 212 are independent. Moreover, different decorrelation modes can be signaled by the decorrelation method parameter 242, for example, for types of audio content in which spatial perception is particularly important and for types of audio content in which the spatial impression is less important or only of subordinate importance (for example, when compared to the reproduction of individual channels). Accordingly, a multi-channel audio decoder which receives the encoded representation 214 can be controlled by the multi-channel audio encoder 200 and can be brought into the decoding mode that yields the best possible compromise between decoding complexity and reproduction quality.
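The interplay of the three providers of the multi-channel audio encoder 200 can be sketched as follows. The selection rule (a threshold on the normalized cross-correlation) and the mode values 0 and 1 are purely hypothetical; the description only states that different modes may be chosen for strongly correlated and for independent input signals.

```python
import math
from dataclasses import dataclass


@dataclass
class EncodedRepresentation:
    downmix: list               # one downmix signal 222
    parameters: dict            # parameters 232 describing the signal relationship
    decorrelation_method: int   # decorrelation method parameter 242


def normalized_cross_correlation(a, b):
    """Normalized cross-correlation at lag zero; returns 0.0 for silent input."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0


def encode(sig_a, sig_b, threshold=0.8):
    downmix = [0.5 * (x + y) for x, y in zip(sig_a, sig_b)]
    icc = normalized_cross_correlation(sig_a, sig_b)
    # Hypothetical rule: strongly correlated content selects mode 1, otherwise mode 0.
    method = 1 if abs(icc) >= threshold else 0
    return EncodedRepresentation(downmix, {"icc": icc}, method)


correlated = encode([1.0, -1.0, 1.0], [2.0, -2.0, 2.0])
independent = encode([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

A real encoder could equally base the decision on the desired playback quality or on other content features; the threshold rule is just one simple possibility.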

Moreover, it should be noted that the multi-channel audio encoder 200 can be supplemented by any of the features and functionalities described herein. In particular, it should be noted that the possible additional features and improvements described herein may be added to the multi-channel audio encoder 200, individually or in combination, in order to improve (or enhance) the multi-channel audio encoder 200.

3. Method for providing at least two output audio signals according to Fig. 3

Fig. 3 shows a flowchart of a method 300 for providing at least two output audio signals on the basis of an encoded representation. The method comprises rendering 310 a plurality of decoded audio signals, which are obtained on the basis of an encoded representation 312, in dependence on one or more rendering parameters, to obtain a plurality of rendered audio signals. The method 300 also comprises deriving 320 one or more decorrelated audio signals from the rendered audio signals. The method 300 also comprises combining 330 the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals, to obtain the output audio signals 332.

It should be noted that the method 300 is based on the same considerations as the multi-channel audio decoder 100 according to Fig. 1. Moreover, it should be noted that the method 300 can be supplemented by any of the features and functionalities described herein (either individually or in combination). For example, the method 300 can be supplemented by any of the features and functionalities described with respect to the multi-channel audio decoders described herein.

4. Method for providing an encoded representation according to Fig. 4

Fig. 4 shows a flowchart of a method 400 for providing an encoded representation on the basis of at least two input audio signals. The method 400 comprises providing 410 one or more downmix signals on the basis of at least two input audio signals 412. The method 400 further comprises providing 420 one or more parameters describing a relationship between the at least two input audio signals 412, and providing 430 a decorrelation method parameter describing which decorrelation mode out of a plurality of decorrelation modes should be used at the side of an audio decoder. Accordingly, an encoded representation 432 is provided, which preferably includes an encoded representation of the one or more downmix signals, of the one or more parameters describing a relationship between the at least two input audio signals, and of the decorrelation method parameter. It should be noted that the method 400 is based on the same considerations as the multi-channel audio encoder 200 according to Fig. 2, such that the above explanations also apply.

Moreover, it should be noted that the order of the steps 410, 420, 430 can be varied flexibly, and that the steps 410, 420, 430 may also be performed in parallel, as far as this is possible in the execution environment of the method 400. Moreover, it should be noted that the method 400 can be supplemented by any of the features and functionalities described herein, either individually or in combination. For example, the method 400 can be supplemented by any of the features and functionalities described with respect to the multi-channel audio encoders described herein. However, it is also possible to introduce features and functionalities which correspond to the features and functionalities of the multi-channel audio decoders described herein, which receive the encoded representation 432.
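Since the steps 410, 420 and 430 only read the input audio signals, they can be dispatched concurrently. The following sketch uses a thread pool from the Python standard library; the bodies of the three step functions are simplified placeholders and not the actual encoder computations.

```python
from concurrent.futures import ThreadPoolExecutor


def provide_downmix(signals):                 # step 410 (placeholder: per-sample average)
    return [sum(samples) / len(signals) for samples in zip(*signals)]


def provide_parameters(signals):              # step 420 (placeholder: per-signal energies)
    return {"energies": [sum(x * x for x in s) for s in signals]}


def provide_decorrelation_method(signals):    # step 430 (placeholder: fixed mode value)
    return 0


def encode_parallel(signals):
    """Run the three independent steps concurrently and collect the encoded representation."""
    steps = (provide_downmix, provide_parameters, provide_decorrelation_method)
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(step, signals) for step in steps]
        downmix, parameters, method = (f.result() for f in futures)
    return {"downmix": downmix, "parameters": parameters, "decorrelation_method": method}


result = encode_parallel([[1.0, 3.0], [3.0, 1.0]])
```

Because no step depends on the output of another, the result is the same regardless of the order in which the steps finish.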

5. Encoded audio representation according to FIG. 5

FIG. 5 shows a schematic representation of an encoded audio representation 500 according to an embodiment of the present invention.

The encoded audio representation 500 comprises an encoded representation 510 of a downmix signal and an encoded representation 520 of one or more parameters describing a relationship between at least two audio signals. Moreover, the encoded audio representation 500 also comprises an encoded decorrelation method parameter 530 describing which decorrelation mode, out of a plurality of decorrelation modes, should be used at the side of an audio decoder. Accordingly, the encoded audio representation allows a decorrelation mode to be signaled from an audio encoder to an audio decoder. It is thus possible to obtain a decorrelation mode which is well adapted to the characteristics of the audio content (which is described, for example, by the encoded representation 510 of one or more downmix signals and by the encoded representation 520 of one or more parameters describing a relationship between at least two audio signals, for example the at least two audio signals which have been downmixed into the encoded representation 510 of one or more downmix signals). Accordingly, the encoded audio representation 500 allows the audio content it represents to be rendered with a particularly good auditory spatial impression and/or a particularly good tradeoff between auditory spatial impression and decoding complexity.

Moreover, it should be noted that the encoded representation 500 may be supplemented by any of the features and functionalities described with respect to the multi-channel audio encoders and the multi-channel audio decoders, either individually or in combination.
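The three parts of the encoded audio representation 500 can be pictured as a simple container. The following is only a sketch under stated assumptions: the class and field names, the mode names and the use of raw bytes for the downmix payload are hypothetical and are not taken from the embodiment, which does not prescribe a concrete bitstream layout at this point.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class DecorrelationMode(Enum):
    # Hypothetical mode names; the text only states that a plurality of modes exists.
    MODE_A = 0
    MODE_B = 1
    MODE_C = 2


@dataclass
class EncodedAudioRepresentation:
    downmix: List[bytes]                   # encoded representation 510 (payload format assumed)
    relation_params: List[float]           # encoded representation 520 (e.g. correlation values)
    decorrelation_mode: DecorrelationMode  # encoded decorrelation method parameter 530


def select_mode(rep: EncodedAudioRepresentation) -> DecorrelationMode:
    """Decoder side: use the mode signaled by the encoder instead of a fixed one."""
    return rep.decorrelation_mode
```

The point of the sketch is only that the decorrelation mode travels together with the downmix and the parameters, so that a decoder can read it rather than having to infer it.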

6. Multi-channel decorrelator according to FIG. 6

FIG. 6 shows a schematic block diagram of a multi-channel decorrelator 600 according to an embodiment of the present invention.

The multi-channel decorrelator 600 is configured to receive a first set of N decorrelator input signals 610a to 610n and to provide, on the basis thereof, a second set of N' decorrelator output signals 612a to 612n'. In other words, the multi-channel decorrelator 600 is configured to provide a plurality of (at least approximately) decorrelated signals 612a to 612n' on the basis of the decorrelator input signals 610a to 610n.

The multi-channel decorrelator 600 comprises a premixer 620, which is configured to premix the first set of N decorrelator input signals 610a to 610n into a second set of K decorrelator input signals 622a to 622k, wherein K is smaller than N (K and N being integers). The multi-channel decorrelator 600 also comprises a decorrelation (or decorrelator core) 630, which is configured to provide a first set of K' decorrelator output signals 632a to 632k' on the basis of the second set of K decorrelator input signals 622a to 622k. Moreover, the multi-channel decorrelator comprises a postmixer 640, which is configured to upmix the first set of K' decorrelator output signals 632a to 632k' into the second set of N' decorrelator output signals 612a to 612n', wherein N' is larger than K' (N' and K' being integers).

However, it should be noted that the given structure of the multi-channel decorrelator 600 should be considered as an example only, and that it is not necessary to subdivide the multi-channel decorrelator 600 into functional blocks (for example, into the premixer 620, the decorrelation or decorrelator core 630 and the postmixer 640) as long as the functionality described herein is provided.

Regarding the functionality of the multi-channel decorrelator 600, it should be noted that the concept of performing a premixing, to derive the second set of K decorrelator input signals from the first set of N decorrelator input signals, and of performing the decorrelation on the basis of the (premixed or "downmixed") second set of K decorrelator input signals, brings along a reduction of complexity when compared to a concept in which the actual decorrelation is applied, for example, directly to the N decorrelator input signals. Moreover, the second (upmixed) set of N' decorrelator output signals is obtained on the basis of the first (original) set of decorrelator output signals using a postmixing, which may be performed by the upmixer (postmixer) 640. Thus, the multi-channel decorrelator 600 effectively (when seen from the outside) receives N decorrelator input signals and provides, on the basis thereof, N' decorrelator output signals, while the actual decorrelator core 630 only operates on a smaller number of signals (namely the K downmixed decorrelator input signals 622a to 622k of the second set of K decorrelator input signals). Thus, the complexity of the multi-channel decorrelator 600 can be substantially reduced, when compared to conventional decorrelators, by performing a downmixing or "premixing" (which may preferably be a linear premixing without any decorrelation functionality) at the input side of the decorrelation (or decorrelator core) 630, and by performing the upmixing or "postmixing" (for example, a linear upmixing without any additional decorrelation functionality) on the basis of the (original) output signals 632a to 632k' of the decorrelation (or decorrelator core) 630.
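The premix, decorrelate and postmix structure described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the helper names (`mix`, `delay_decorrelator`, `multichannel_decorrelate`), the example matrices and the pure-delay "decorrelator" are assumptions chosen for brevity, and practical decorrelators typically use all-pass or reverberation-like filters instead.

```python
from typing import Callable, List

Signal = List[float]
Matrix = List[List[float]]


def mix(matrix: Matrix, signals: List[Signal]) -> List[Signal]:
    """Linear mixing: out[r][t] = sum over c of matrix[r][c] * signals[c][t]."""
    n_samples = len(signals[0])
    return [
        [sum(row[c] * signals[c][t] for c in range(len(signals)))
         for t in range(n_samples)]
        for row in matrix
    ]


def delay_decorrelator(delay: int) -> Callable[[Signal], Signal]:
    """Toy stand-in for one core decorrelator: a pure delay (assumption)."""
    def run(x: Signal) -> Signal:
        return ([0.0] * delay + x)[:len(x)]
    return run


def multichannel_decorrelate(inputs: List[Signal], premix: Matrix,
                             postmix: Matrix,
                             core: List[Callable[[Signal], Signal]]) -> List[Signal]:
    premixed = mix(premix, inputs)                      # N  -> K  (premixer 620)
    core_out = [f(x) for f, x in zip(core, premixed)]   # K  -> K' (core 630, here K' == K)
    return mix(postmix, core_out)                       # K' -> N' (postmixer 640)


# N = 4 inputs are premixed to K = 2 signals, so only two core decorrelators run.
inputs = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0],
          [0.0, 2.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
premix = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]        # size K x N
postmix = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # size N' x K'
core = [delay_decorrelator(1), delay_decorrelator(2)]
outputs = multichannel_decorrelate(inputs, premix, postmix, core)
```

In this sketch the cost of the core scales with K = 2 instead of N = 4, while the two mixing stages are plain matrix multiplications without any decorrelation functionality.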

Moreover, it should be noted that the multi-channel decorrelator 600 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel decorrelation and also with respect to the multi-channel audio decoders. It should be noted that the features described herein can be added to the multi-channel decorrelator 600 either individually or in combination, to thereby enhance or improve the multi-channel decorrelator 600.

It should be noted that a multi-channel decorrelator without complexity reduction can be derived from the multi-channel decorrelator described above for K=N (and possibly K'=N', or even K=N=K'=N').

7. Multi-channel audio decoder according to FIG. 7

FIG. 7 shows a schematic block diagram of a multi-channel audio decoder 700 according to an embodiment of the present invention.

The multi-channel audio decoder 700 is configured to receive an encoded representation 710 and to provide, on the basis thereof, at least two output signals 712, 714. The multi-channel audio decoder 700 comprises a multi-channel decorrelator 720, which may be substantially identical to the multi-channel decorrelator 600 according to FIG. 6. Moreover, the multi-channel audio decoder 700 may comprise any of the features and functionalities of a multi-channel audio decoder which are known to the person skilled in the art or which are described herein with respect to other multi-channel audio decoders.

Moreover, it should be noted that the multi-channel audio decoder 700 has a particularly high efficiency when compared to conventional multi-channel audio decoders, since the multi-channel audio decoder 700 uses the high-efficiency multi-channel decorrelator 720.

8. Multi-channel audio encoder according to FIG. 8

FIG. 8 shows a schematic block diagram of a multi-channel audio encoder 800 according to an embodiment of the present invention. The multi-channel audio encoder 800 is configured to receive at least two input audio signals 810, 812 and to provide, on the basis thereof, an encoded representation 814 of the audio content represented by the input audio signals 810, 812.

The multi-channel audio encoder 800 comprises a downmix signal provider 820, which is configured to provide one or more downmix signals 822 on the basis of the at least two input audio signals 810, 812. The multi-channel audio encoder 800 also comprises a parameter provider 830, which is configured to provide one or more parameters 832 (for example, cross-correlation parameters or cross-covariance parameters, or inter-object correlation parameters and/or object level difference parameters) on the basis of the input audio signals 810, 812. Moreover, the multi-channel audio encoder 800 comprises a decorrelation complexity parameter provider 840, which is configured to provide a decorrelation complexity parameter 842 describing a complexity of the decorrelation to be used at the side of an audio decoder (which receives the encoded representation 814). The one or more downmix signals 822, the one or more parameters 832 and the decorrelation complexity parameter 842 are included in the encoded representation 814, preferably in an encoded form.

However, it should be noted that the internal structure of the multi-channel audio encoder 800 (for example, the presence of the downmix signal provider 820, of the parameter provider 830 and of the decorrelation complexity parameter provider 840) should be considered as an example only. Different structures are possible as long as the functionality described herein is achieved.

Regarding the functionality of the multi-channel audio encoder 800, it should be noted that the multi-channel audio encoder provides an encoded representation 814, wherein the one or more downmix signals 822 and the one or more parameters 832 may be similar to, or equal to, the downmix signals and parameters provided by conventional audio encoders (like, for example, conventional spatial audio object coding (SAOC) audio encoders or unified speech and audio coding (USAC) audio encoders). However, the multi-channel audio encoder 800 is also configured to provide the decorrelation complexity parameter 842, which allows the decorrelation complexity applied at the side of an audio decoder to be determined. Accordingly, the decorrelation complexity can be adapted to the audio content which is currently being encoded. For example, it is possible to signal a desired decorrelation complexity, which corresponds to an achievable audio quality, in dependence on encoder-side knowledge about the input audio signals. For example, if it is found that spatial characteristics are important for an audio signal, a higher decorrelation complexity can be signaled, using the decorrelation complexity parameter 842, than in a case in which spatial characteristics are not so important. Alternatively, the usage of a high decorrelation complexity can be signaled using the decorrelation complexity parameter 842 if it is found that a passage of the audio content, or the entire audio content, is such that a high-complexity decorrelation is required at the side of an audio decoder for other reasons.
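How such a decorrelation complexity parameter might be produced and consumed can be sketched as follows. Everything in this example is hypothetical: the notion of a "spatial importance" score in the range 0 to 1, the number of levels and the mapping from a level to the number K of core decorrelators are assumptions made for illustration only, and the text above does not define them.

```python
def choose_decorrelation_complexity(spatial_importance: float, levels: int = 4) -> int:
    """Encoder side (hypothetical rule): map a score in [0, 1] to a level 0..levels-1."""
    score = min(max(spatial_importance, 0.0), 1.0)
    return min(int(score * levels), levels - 1)


def decorrelators_for_level(level: int, n_inputs: int, levels: int = 4) -> int:
    """Decoder side (hypothetical rule): translate the level into K, with 1 <= K <= N."""
    k = round(n_inputs * (level + 1) / levels)
    return max(1, min(n_inputs, k))


# Content judged spatially important asks for more core decorrelators.
high = decorrelators_for_level(choose_decorrelation_complexity(0.9), n_inputs=8)
low = decorrelators_for_level(choose_decorrelation_complexity(0.1), n_inputs=8)
```

With these assumed rules, the highest level corresponds to K=N, that is, to a decorrelator without complexity reduction.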

To summarize, the multi-channel audio encoder 800 provides the possibility to control a multi-channel audio decoder such that it uses a decorrelation complexity which is adapted to signal characteristics, or to desired playback characteristics, which can be set by the multi-channel audio encoder 800.

Moreover, it should be noted that the multi-channel audio encoder 800 may be supplemented by any of the features and functionalities described herein with respect to a multi-channel audio encoder, either individually or in combination. For example, some or all of the features described herein with respect to multi-channel audio encoders can be added to the multi-channel audio encoder 800. Moreover, the multi-channel audio encoder 800 may be adapted to cooperate with the multi-channel audio decoders described herein.

9. Method for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals, according to FIG. 9

FIG. 9 shows a flowchart of a method 900 for providing a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals.

The method 900 comprises premixing 910 a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K is smaller than N. The method 900 also comprises providing 920 a first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals. For example, the first set of K' decorrelator output signals may be provided on the basis of the second set of K decorrelator input signals using a decorrelation, which may be performed, for example, using a decorrelator core or a decorrelation algorithm. The method 900 further comprises postmixing 930 the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' is larger than K' (N' and K' being integers). Accordingly, the second set of N' decorrelator output signals, which is the output of the method 900, can be provided on the basis of the first set of N decorrelator input signals, which is the input to the method 900.

It should be noted that the method 900 is based on the same considerations as the multi-channel decorrelator described above. Moreover, it should be noted that the method 900 may be supplemented by any of the features and functionalities described herein with respect to the multi-channel decorrelator (and, if applicable, also with respect to the multi-channel audio encoder), either individually or in combination.

10. Method for providing at least two output audio signals on the basis of an encoded representation, according to FIG. 10

FIG. 10 shows a flowchart of a method 1000 for providing at least two output audio signals on the basis of an encoded representation.

The method 1000 comprises providing 1010 at least two output audio signals 1014, 1016 on the basis of an encoded representation 1012. The method 1000 comprises providing 1020 a plurality of decorrelated signals on the basis of a plurality of decorrelator input signals in accordance with the method 900 according to FIG. 9.

It should be noted that the method 1000 is based on the same considerations as the multi-channel audio decoder 700 according to FIG. 7.

Also, it should be noted that the method 1000 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel audio decoders, either individually or in combination.

11. Method for providing an encoded representation on the basis of at least two input audio signals, according to FIG. 11

FIG. 11 shows a flowchart of a method 1100 for providing an encoded representation on the basis of at least two input audio signals.

The method 1100 comprises providing 1110 one or more downmix signals on the basis of at least two input audio signals 1112, 1114. The method 1100 also comprises providing 1120 one or more parameters describing a relationship between the at least two input audio signals 1112, 1114. Moreover, the method 1100 comprises providing 1130 a decorrelation complexity parameter describing a complexity of the decorrelation to be used at the side of an audio decoder. Accordingly, an encoded representation 1132 is provided on the basis of the at least two input audio signals 1112, 1114, wherein the encoded representation typically comprises the one or more downmix signals, the one or more parameters describing a relationship between the at least two input audio signals, and the decorrelation complexity parameter, in an encoded form.

It should be noted that the steps 1110, 1120, 1130 may be performed in parallel or in a different order in some embodiments according to the invention. Moreover, it should be noted that the method 1100 is based on the same considerations as the multi-channel audio encoder 800 according to FIG. 8, and that the method 1100 can be supplemented by any of the features and functionalities described herein with respect to the multi-channel audio encoder, either individually or in combination. Moreover, it should be noted that the method 1100 can be adapted to match the multi-channel audio decoders and the methods for providing at least two output audio signals described herein.

12. Encoded audio representation according to FIG. 12

FIG. 12 shows a schematic representation of an encoded audio representation according to an embodiment of the present invention. The encoded audio representation 1200 comprises an encoded representation 1210 of a downmix signal, an encoded representation 1220 of one or more parameters describing a relationship between at least two input audio signals, and an encoded decorrelation complexity parameter 1230 describing a complexity of the decorrelation to be used at the side of an audio decoder. Accordingly, the encoded audio representation 1200 allows the decorrelation complexity used by a multi-channel audio decoder to be adjusted, which brings along an improved decoding efficiency, and possibly an improved audio quality, or an improved tradeoff between coding efficiency and audio quality. Moreover, it should be noted that the encoded audio representation 1200 may be provided by a multi-channel audio encoder as described herein, and may be used by a multi-channel audio decoder as described herein. Accordingly, the encoded audio representation 1200 can be supplemented by any of the features and functionalities described with respect to the multi-channel audio encoders and the multi-channel audio decoders.

13. Notation and underlying considerations

Recently, parametric techniques for the bitrate-efficient transmission/storage of audio scenes containing multiple audio objects have been proposed in the field of audio coding (see, for example, references [BCC], [JSC], [SAOC], [SAOC1], [SAOC2]) and informed source separation (see, for example, references [ISS1], [ISS2], [ISS3], [ISS4], [ISS5], [ISS6]). These techniques aim at reconstructing a desired output audio scene or audio source object on the basis of additional side information describing the transmitted/stored audio scene and/or the source objects in the audio scene. This reconstruction takes place in the decoder using a parametric informed source separation scheme. Moreover, reference is also made to the so-called "MPEG Surround" concept, which is described, for example, in the international standard ISO/IEC 23003-1:2007. Moreover, reference is also made to the so-called "Spatial Audio Object Coding", which is described in the international standard ISO/IEC 23003-2:2010. Furthermore, reference is made to the so-called "Unified Speech and Audio Coding" concept, which is described in the international standard ISO/IEC 23003-3:2012. Concepts from these standards can be used in embodiments according to the invention, for example in the multi-channel audio encoders mentioned herein and in the multi-channel audio decoders mentioned herein, wherein some adaptations may be required.

In the following, some background information will be described. In particular, an overview of parametric separation schemes will be provided, using the example of the MPEG Spatial Audio Object Coding (SAOC) technology (see, for example, reference [SAOC]). The mathematical properties of this method are considered.

13.1. Notation and definitions

The following mathematical definitions apply in the present application:

$N_{Objects}$ number of audio object signals

$N_{DmxCh}$ number of downmix (processed) channels

$N_{UpmixCh}$ number of upmix (output) channels

$N_{Samples}$ number of processed data samples

$D$ downmix matrix, size $N_{DmxCh} \times N_{Objects}$

$X$ input audio object signals, size $N_{Objects} \times N_{Samples}$

$E_X$ object covariance matrix, size $N_{Objects} \times N_{Objects}$, defined as $E_X = X X^H$

$Y$ downmix audio signals, size $N_{DmxCh} \times N_{Samples}$, defined as $Y = D X$

$E_Y$ covariance matrix of the downmix signals, size $N_{DmxCh} \times N_{DmxCh}$, defined as $E_Y = Y Y^H$

$G$ parametric source estimation matrix, size $N_{Objects} \times N_{DmxCh}$, which approximates $E_X D^H (D E_X D^H)^{-1}$

$\hat{X}$ parametrically reconstructed object signals, size $N_{Objects} \times N_{Samples}$, which approximate $X$ and are defined as $\hat{X} = G Y$

$R$ rendering matrix (specified at the decoder side), size $N_{UpmixCh} \times N_{Objects}$

$Z$ ideal rendered output scene signals, size $N_{UpmixCh} \times N_{Samples}$, defined as $Z = R X$

$\hat{Z}$ rendered parametric output, size $N_{UpmixCh} \times N_{Samples}$, defined as $\hat{Z} = R \hat{X}$

$C$ covariance matrix of the ideal output, size $N_{UpmixCh} \times N_{UpmixCh}$, defined as $C = R E_X R^H$

$W$ decorrelator outputs, size $N_{UpmixCh} \times N_{Samples}$

$S$ combined signal $S = \begin{bmatrix} \hat{Z} \\ W \end{bmatrix}$, size $2N_{UpmixCh} \times N_{Samples}$

$E_S$ combined signal covariance matrix, size $2N_{UpmixCh} \times 2N_{UpmixCh}$, defined as $E_S = S S^H$

$\tilde{Z}$ final output, size $N_{UpmixCh} \times N_{Samples}$

$(\cdot)^H$ self-adjoint (Hermitian) operator, which represents the complex conjugate transpose of $(\cdot)$; the notation $(\cdot)^*$ may also be used

$F_{decorr}(\cdot)$ decorrelator function

$\varepsilon$ additive constant to avoid division by zero

$H = \mathrm{matdiag}(M)$ matrix containing the elements from the main diagonal of the matrix $M$ on its main diagonal and zero values at the off-diagonal positions

Without loss of generality, and in order to improve the readability of the equations, the indices denoting time and frequency dependency are omitted for all introduced variables in this document.
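To make the definitions concrete, the following sketch evaluates them for a toy case with two real-valued objects, one downmix channel and four samples. The numbers, the helper functions and the restriction to a single downmix channel (so that the inverse reduces to a scalar division) are assumptions made for this example; for real-valued signals the Hermitian operator reduces to the transpose.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


EPS = 1e-9  # additive constant to avoid division by zero

X = [[1.0, 0.0, -1.0, 0.0],   # object 1, N_Objects = 2, N_Samples = 4
     [0.0, 2.0, 0.0, -2.0]]   # object 2
D = [[1.0, 1.0]]              # downmix matrix, N_DmxCh = 1
R = [[1.0, 0.0],              # rendering matrix: each object to its own output
     [0.0, 1.0]]

Y = matmul(D, X)                    # Y = D X
E_X = matmul(X, transpose(X))       # E_X = X X^H
E_Y = matmul(Y, transpose(Y))       # E_Y = Y Y^H

# G approximates E_X D^H (D E_X D^H)^-1; with one downmix channel the
# matrix to invert is 1 x 1, so a scalar division is sufficient here.
denom = matmul(matmul(D, E_X), transpose(D))[0][0] + EPS
G = [[v / denom for v in row] for row in matmul(E_X, transpose(D))]

X_hat = matmul(G, Y)                        # parametric reconstruction
Z = matmul(R, X)                            # ideal rendered output
Z_hat = matmul(R, X_hat)                    # rendered parametric output
C = matmul(matmul(R, E_X), transpose(R))    # covariance of the ideal output
```

In this toy case the reconstruction reproduces the downmix when it is mixed again (D applied to X_hat gives Y up to the effect of EPS), but X_hat differs from X, because two objects cannot be recovered exactly from a single downmix channel.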

13.2. Parametric separation systems

General parametric separation systems aim to estimate a number of audio sources from a signal mixture (downmix) using auxiliary parameter information (like, for example, inter-channel correlation values, inter-channel level difference values, inter-object correlation values and/or object level difference information). A typical solution of this task is based on the application of minimum mean squared error (MMSE) estimation algorithms. The SAOC technology is one example of such parametric audio encoding/decoding systems.
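The connection between the MMSE criterion and the parametric source estimation matrix $G$ of Section 13.1 can be made explicit with a short derivation. This is a standard least-squares argument added for illustration, under the assumption that $D E_X D^H$ is invertible; it is not quoted from the source.

```latex
\begin{aligned}
G &= \arg\min_{G} \; \lVert X - G Y \rVert_F^2
  && \text{(minimize the reconstruction error)} \\
0 &= (X - G Y)\, Y^H
  && \text{(orthogonality condition at the minimum)} \\
G &= X Y^H \left( Y Y^H \right)^{-1} \\
  &= X X^H D^H \left( D X X^H D^H \right)^{-1}
  && \text{(using } Y = D X \text{)} \\
  &= E_X D^H \left( D E_X D^H \right)^{-1}
  && \text{(using } E_X = X X^H \text{)}
\end{aligned}
```

Since only $E_X$ and $D$ appear in the result, a decoder can compute $G$ from transmitted covariance-type side information without access to the object signals $X$ themselves.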

FIG. 13 shows the general principle of the SAOC encoder/decoder architecture. In other words, FIG. 13 shows, in the form of a schematic block diagram, an overview of the MMSE-based parametric downmix/upmix concept.

An encoder 1310 receives a plurality of object signals 1312a, 1312b to 1312n. Moreover, the encoder 1310 also receives mixing parameters D, 1314, which may, for example, be downmix parameters. The encoder 1310 provides, on the basis thereof, one or more downmix signals 1316a, 1316b, and so on. Moreover, the encoder provides side information 1318. The one or more downmix signals and the side information may, for example, be provided in an encoded form.

The encoder 1310 typically comprises a mixer 1320, which is configured to receive the object signals 1312a to 1312n and to combine them, in dependence on the mixing parameters 1314, into the one or more downmix signals 1316a, 1316b. Moreover, the encoder comprises a side information estimator 1330, which is configured to derive the side information 1318 from the object signals 1312a to 1312n. For example, the side information estimator 1330 may be configured to derive the side information 1318 such that the side information describes a relationship between the object signals, for example a cross-correlation between object signals (which may be designated as "inter-object correlation", IOC) and/or information describing level differences between object signals (which may be designated as "object level difference information", OLD).
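A side information estimator of this kind can be sketched as follows. The sketch assumes real-valued signals in a single time/frequency tile and normalizes the object level differences to the most energetic object; these choices, the function names and the small constant `eps` are assumptions for illustration, and quantization and coding of the values are omitted.

```python
import math
from typing import List, Tuple


def cross_energy(a: List[float], b: List[float]) -> float:
    """Inner product of two equally long signals."""
    return sum(x * y for x, y in zip(a, b))


def estimate_side_info(objects: List[List[float]],
                       eps: float = 1e-12) -> Tuple[List[float], List[List[float]]]:
    """Return (OLD, IOC) for a list of object signals."""
    n = len(objects)
    nrg = [[cross_energy(objects[i], objects[j]) for j in range(n)]
           for i in range(n)]
    max_nrg = max(nrg[i][i] for i in range(n))
    # Object level differences: energy relative to the strongest object.
    old = [nrg[i][i] / (max_nrg + eps) for i in range(n)]
    # Inter-object correlations: normalized cross-energies.
    ioc = [[nrg[i][j] / (math.sqrt(nrg[i][i] * nrg[j][j]) + eps)
            for j in range(n)] for i in range(n)]
    return old, ioc


objects = [[1.0, 0.0, -1.0, 0.0],   # object 1
           [2.0, 0.0, -2.0, 0.0],   # object 2: scaled copy of object 1
           [0.0, 1.0, 0.0, -1.0]]   # object 3: orthogonal to object 1
old, ioc = estimate_side_info(objects)
```

Here object 2 is the strongest object, objects 1 and 2 are fully correlated, and objects 1 and 3 are uncorrelated, which is what the resulting OLD and IOC values express.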

The one or more downmix signals 1316a, 1316b and the side information 1318 may be stored and/or transmitted to a decoder 1350, which is indicated at reference numeral 1340.

The decoder 1350 receives the one or more downmix signals 1316a, 1316b and the side information 1318 (e.g., in encoded form) and provides, on this basis, a plurality of output audio signals 1352a to 1352n. The decoder 1350 may also receive user interaction information 1354, which may comprise one or more rendering parameters R (which may define a rendering matrix). The decoder 1350 comprises a parametric object separator 1360, a side information processor 1370 and a renderer 1380. The side information processor 1370 receives the side information 1318 and provides, on this basis, control information 1372 for the parametric object separator 1360. The parametric object separator 1360 provides a plurality of object signals 1362a to 1362n on the basis of the downmix signals 1316a, 1316b and the control information 1372, which is derived from the side information 1318 by the side information processor 1370. For example, the object separator may perform a decoding of the encoded downmix signals and an object separation. The renderer 1380 renders the reconstructed object signals 1362a to 1362n and thereby obtains the output audio signals 1352a to 1352n.

In the following, the functionality of the minimum mean square error (MMSE) based parametric downmix/upmix concept will be described.

The general parametric downmix/upmix processing is carried out in a time/frequency selective way and can be described as a sequence of the following steps:

● The "encoder" 1310 is provided with input "audio objects" X and "mixing parameters" D. The "mixer" 1320 downmixes the "audio objects" X into a number of "downmix signals" Y using the "mixing parameters" D (e.g., downmix gains). The "side information estimator" 1330 extracts the side information 1318, which describes characteristics of the input "audio objects" X (e.g., covariance properties).

● The "downmix signals" Y and the side information are transmitted or stored. The downmix audio signals can be compressed further using audio coders (such as MPEG-1/2 Layer II or III, MPEG-2/4 Advanced Audio Coding (AAC), MPEG Unified Speech and Audio Coding (USAC), etc.). The side information can also be represented and encoded efficiently (e.g., as losslessly coded relations of the object powers and object correlation coefficients).

● The "decoder" 1350 restores the original "audio objects" from the decoded "downmix signals" using the transmitted side information 1318. The "side information processor" 1370 estimates the un-mixing coefficients 1372 to be applied to the "downmix signals" within the "parametric object separator" 1360 in order to obtain the parametric object reconstruction of X. The reconstructed "audio objects" 1362a to 1362n are rendered to a (multi-channel) target scene, represented by the output channels $\hat{Z}$, by applying the "rendering parameters" R, 1354.
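The three steps above can be sketched numerically. The following is only an illustrative sketch, not the implementation described in the source: it assumes real-valued signals, passes the full object covariance as side information (a real codec would transmit parametrised quantities such as OLD and IOC), and uses the textbook MMSE un-mixing rule $G = E_X D^H (D E_X D^H)^{-1}$, which this section does not spell out. The function names `encode` and `decode` are hypothetical.

```python
import numpy as np

def encode(X, D):
    """Downmix objects X (N_obj x T) with mixing matrix D (N_dmx x N_obj)."""
    Y = D @ X                       # downmix signals
    E_X = X @ X.conj().T            # object covariance, used here as side information
    return Y, E_X

def decode(Y, D, E_X, R, eps=1e-9):
    """Parametric object reconstruction followed by rendering (assumed MMSE rule)."""
    E_Y = D @ E_X @ D.conj().T                        # covariance of the downmix
    J = np.linalg.inv(E_Y + eps * np.eye(len(E_Y)))   # lightly regularised inverse
    G = E_X @ D.conj().T @ J                          # un-mixing coefficients
    X_hat = G @ Y                                     # reconstructed objects
    return R @ X_hat                                  # rendered output channels

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 1000))   # 4 objects
D = rng.standard_normal((2, 4))      # 2 downmix channels
R = rng.standard_normal((3, 4))      # rendering to 3 output channels
Y, E_X = encode(X, D)
Z_hat = decode(Y, D, E_X, R)
```

With fewer downmix channels than objects, `Z_hat` is only an approximation of the ideal rendering `R @ X`, which is the inaccuracy the later sections set out to compensate.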

Moreover, it should be noted that the functionalities described with respect to the encoder 1310 and the decoder 1350 may also be used in the other audio encoders and audio decoders described herein.

13.3. Orthogonality principle of minimum mean square error estimation

The orthogonality principle is one major property of MMSE estimators. Consider two Hermitian spaces W and V, with V spanned by a set of vectors $y_i$, and a vector $x \in W$. If one wants to find an estimate $\hat{x} \in V$ that approximates x as a linear combination of the vectors $y_i$ while minimizing the mean square error, then the error vector is orthogonal to the space spanned by the vectors $y_i$:

$$ (x - \hat{x})\, y_i^H = 0 $$

As a consequence, the estimation error and the estimate itself are orthogonal:

$$ (x - \hat{x})\, \hat{x}^H = 0 $$

Geometrically, this can be visualized by the example shown in Fig. 14.

Fig. 14 shows a geometric representation of the orthogonality principle in three-dimensional space. As can be seen, a vector space is spanned by the vectors $y_1$, $y_2$. A vector x is equal to the sum of a vector $\hat{x}$ and a difference vector (or error vector) e. As can be seen, the error vector e is orthogonal to the vector space (or plane) V spanned by the vectors $y_1$, $y_2$. Accordingly, the vector $\hat{x}$ can be considered the best approximation of x within the vector space V.
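The geometric picture can be checked with a small numerical sketch. It assumes real-valued vectors (so the Hermitian transpose reduces to a plain transpose) and uses an ordinary least-squares fit as the minimum-error estimator; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((2, 3))      # rows y_1, y_2 span a plane V in 3-D space
x = rng.standard_normal(3)           # vector to be approximated

# Least-squares coefficients g minimising ||x - g @ Y||^2
g, *_ = np.linalg.lstsq(Y.T, x, rcond=None)
x_hat = g @ Y                        # best approximation of x within V
e = x - x_hat                        # error vector

print(Y @ e)       # inner products of e with y_1 and y_2: numerically zero
print(x_hat @ e)   # the estimate and the error are orthogonal as well
```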

13.4. Parametric reconstruction error

With X defined as a matrix comprising N signals and $X_{Error}$ as the estimation error, the following identities can be formulated. The original signal can be represented as the sum of the parametric reconstruction $\hat{X}$ and the reconstruction error $X_{Error}$:

$$ X = \hat{X} + X_{Error} $$

Because of the orthogonality principle, the cross terms $\hat{X} X_{Error}^H$ and $X_{Error} \hat{X}^H$ vanish, so the covariance matrix of the original signals, $E_X = X X^H$, can be formulated as the sum of the covariance matrix of the reconstructed signals, $\hat{X}\hat{X}^H$, and the covariance matrix of the estimation errors, $X_{Error} X_{Error}^H$:

$$ E_X = X X^H = (\hat{X} + X_{Error})(\hat{X} + X_{Error})^H = \hat{X}\hat{X}^H + X_{Error} X_{Error}^H $$

When the input objects X are not in the space spanned by the downmix channels (e.g., when the number of downmix channels is smaller than the number of input signals) and the input objects cannot be represented as linear combinations of the downmix channels, the MMSE-based algorithms introduce a reconstruction inaccuracy $X_{Error} X_{Error}^H$.
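A brief numerical check of this covariance split is given below. It is a sketch under simplifying assumptions: real-valued random signals, and a least-squares un-mixing matrix computed directly from the signals rather than from transmitted side information.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 500))        # 4 original signals
Y = rng.standard_normal((2, 4)) @ X      # 2 downmix channels (fewer than signals)

# Least-squares (MMSE) reconstruction X_hat = G Y
G = X @ Y.T @ np.linalg.inv(Y @ Y.T)
X_hat = G @ Y
X_err = X - X_hat

E_X = X @ X.T              # covariance of the original signals
E_hat = X_hat @ X_hat.T    # covariance of the reconstruction
E_err = X_err @ X_err.T    # covariance of the reconstruction error

# E_X equals E_hat + E_err, and the reconstruction carries less energy
print(np.allclose(E_X, E_hat + E_err), np.trace(E_hat) < np.trace(E_X))
```

The reduced trace of `E_hat` illustrates the energy loss that the decorrelation-based correction is meant to restore.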

13.5. Inter-object correlation

In the auditory system, the cross-covariance (coherence/correlation) is closely related to the perception of envelopment, of being surrounded by the sound, and to the perceived width of a sound source. For example, in systems based on Spatial Audio Object Coding (SAOC), the inter-object correlation (IOC) parameters are used to characterize this property:

$$ IOC(i,j) = \frac{E_X(i,j)}{\sqrt{E_X(i,i)\, E_X(j,j)}} $$

Consider an example of reproducing a sound source using two audio signals. If the IOC value is close to one, the sound is perceived as a well-localized point source. If the IOC value is close to zero, the perceived width of the sound source increases, and in extreme cases it can even be perceived as two distinct sources [Blauert, Chapter 3].
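As a small illustration, the IOC values can be computed from an object covariance matrix by normalising each entry with the powers of the two objects involved. This normalisation follows the usual SAOC convention and is stated here as an assumption; the function name `ioc` and the guard `eps` are hypothetical.

```python
import numpy as np

def ioc(E_X, eps=1e-12):
    """Inter-object correlation: E_X(i,j) / sqrt(E_X(i,i) * E_X(j,j))."""
    p = np.sqrt(np.real(np.diag(E_X)))        # per-object amplitude (root of power)
    denom = np.maximum(np.outer(p, p), eps)   # guard against silent objects
    return E_X / denom

# Two fully correlated objects versus two uncorrelated objects
correlated = ioc(np.array([[4.0, 2.0], [2.0, 1.0]]))
uncorrelated = ioc(np.diag([4.0, 9.0]))
```

In the first case all IOC values are one (a single point-like source); in the second the off-diagonal values are zero (two distinct sources).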

13.6. Compensation for reconstruction inaccuracy

In the case of imperfect parametric reconstruction, the output signal can exhibit lower energy than the original objects. Errors in the diagonal elements of the covariance matrix may result in audible level differences, and errors in the off-diagonal elements may result in a distorted spatial sound image (compared with the ideal reference output). The proposed method aims to solve this problem.

In MPEG Surround (MPS), for example, this issue is treated only for some specific channel-based processing scenarios, namely for a mono/stereo downmix and a limited set of static output configurations (e.g., mono, stereo, 5.1, 7.1, etc.). In object-oriented technologies such as Spatial Audio Object Coding (SAOC), which also uses a mono/stereo downmix, this problem is treated only by applying the MPEG Surround post-processing rendering for the 5.1 output configuration.

The existing solutions are limited to standard output configurations and a fixed number of input/output channels. Namely, they are realized as the consecutive application of several blocks that implement just "mono-to-stereo" (or "stereo-to-three") channel decorrelation methods.

Therefore, a general solution for compensating the parametric reconstruction inaccuracy (e.g., a method for correcting the energy level and the correlation properties) is desirable, one that can be applied for a flexible number of downmix/upmix channels and for arbitrary output configuration setups.

13.7. Conclusions

To conclude, an overview of the notation has been provided. Moreover, the parametric separation system on which embodiments according to the invention are based has been described. It has also been outlined that the orthogonality principle applies to minimum mean square error estimation. Moreover, an equation for the computation of the covariance matrix $E_X$ has been provided, which applies in the presence of a reconstruction error $X_{Error}$. Also, the relationship between the so-called inter-object correlation values and the elements of the covariance matrix $E_X$ has been described; it may be applied, for example, in embodiments according to the invention to derive desired covariance characteristics (or correlation characteristics) from the inter-object correlation values (which may be included in the parametric side information), and possibly from the object level differences. Moreover, it has been outlined that the characteristics of the reconstructed object signals may differ from the desired characteristics because of the imperfect reconstruction. Finally, it has been outlined that the existing solutions to this problem are limited to some specific output configurations and rely on a specific combination of standard blocks, which makes the conventional solutions inflexible.

14. Embodiment according to Fig. 15

14.1. Concept overview

Embodiments according to the invention extend the MMSE parametric reconstruction methods used in parametric audio separation schemes with a decorrelation solution for an arbitrary number of downmix/upmix channels. Embodiments according to the invention, such as the inventive apparatus and the inventive method, may compensate for the energy loss during parametric reconstruction and restore the correlation properties of the estimated objects.

Fig. 15 provides an overview of the parametric downmix/upmix concept with an integrated decorrelation path. In other words, Fig. 15 shows, in the form of a block schematic diagram, a parametric reconstruction system with decorrelation applied to the rendered output.

The system according to Fig. 15 comprises an encoder 1510, which is substantially identical to the encoder 1310 according to Fig. 13. The encoder 1510 receives a plurality of object signals 1512a to 1512n and provides, on this basis, one or more downmix signals 1516a, 1516b as well as side information 1518. The downmix signals 1516a, 1516b may be substantially identical to the downmix signals 1316a, 1316b and may be designated Y. The side information 1518 may be substantially identical to the side information 1318. However, the side information may, for example, comprise a decorrelation mode parameter, a decorrelation method parameter or a decorrelation complexity parameter. Moreover, the encoder 1510 may receive mixing parameters 1514.

The parametric reconstruction system also comprises a transmission and/or storage of the one or more downmix signals 1516a, 1516b and of the side information 1518, which is designated 1540. The one or more downmix signals 1516a, 1516b and the side information 1518 (which may include parametric side information) may be encoded.

Moreover, the parametric reconstruction system according to Fig. 15 comprises a decoder 1550, which is configured to receive the transmitted or stored one or more (possibly encoded) downmix signals 1516a, 1516b and the transmitted or stored (possibly encoded) side information 1518 and to provide, on this basis, output audio signals 1552a to 1552n. The decoder 1550 (which may be considered a multi-channel audio decoder) comprises a parametric object separator 1560 and a side information processor 1570. Moreover, the decoder 1550 comprises a renderer 1580, a decorrelator 1590 and a mixer 1598.

The parametric object separator 1560 is configured to receive the one or more downmix signals 1516a, 1516b and control information 1572, which is provided by the side information processor 1570 on the basis of the side information 1518, and to provide, on this basis, object signals 1562a to 1562n, which are also designated $\hat{X}$ and which may be considered decoded audio signals. The control information 1572 may, for example, comprise un-mixing coefficients to be applied to the downmix signals (e.g., to decoded downmix signals derived from the encoded downmix signals 1516a, 1516b) within the parametric object separator in order to obtain the reconstructed object signals (e.g., the decoded audio signals 1562a to 1562n). The renderer 1580 renders the decoded audio signals 1562a to 1562n (which may be reconstructed object signals and which may, for example, correspond to the input object signals 1512a to 1512n) and thereby obtains a plurality of rendered audio signals 1582a to 1582n. For example, the renderer 1580 may take into account rendering parameters R, which may be provided by user interaction and which may define a rendering matrix. Alternatively, however, the rendering parameters may be taken from the encoded representation (which may include the encoded downmix signals 1516a, 1516b and the encoded side information 1518).

The decorrelator 1590 is configured to receive the rendered audio signals 1582a to 1582n and to provide, on this basis, decorrelated audio signals 1592a to 1592n, which are also designated W. The mixer 1598 is configured to receive the rendered audio signals 1582a to 1582n and the decorrelated audio signals 1592a to 1592n and to combine them in order to obtain the output audio signals 1552a to 1552n. The mixer 1598 also uses control information 1574, which is derived by the side information processor 1570 from the encoded side information 1518, as will be described below.

14.2. Decorrelator function

In the following, some details regarding the decorrelator 1590 will be described. It should be noted, however, that different decorrelator concepts may be used, some of which will be described below.

In an embodiment, the decorrelator function $w = F_{decorr}(\hat{s})$ provides an output signal w that is orthogonal to the input signal $\hat{s}$. The output signal w has the same spectral and temporal envelope properties as the input signal $\hat{s}$ (or at least similar properties). Moreover, the signal w is perceived similarly and has the same (or a similar) subjective quality as the input signal $\hat{s}$ (see, for example, [SAOC]).

In the case of multiple input signals, it is beneficial if the decorrelation function produces multiple outputs that are mutually orthogonal (e.g., $W_i = F_{decorr}(\hat{s}_i)$, such that $W_i \hat{s}_j^H = 0$ for all i and j, and $W_i W_j^H = 0$ for $i \neq j$).

The exact specification of the decorrelator function implementation is beyond the scope of this description. For example, a bank of several infinite impulse response (IIR) based decorrelators specified in the MPEG Surround standard can be used for decorrelation purposes [MPS].

The generic decorrelators described in this description are assumed to be ideal. This implies that (in addition to the perceptual requirements) the output of each decorrelator is orthogonal to its input and to the outputs of all other decorrelators. Therefore, for a given input $\hat{S}$ with covariance $E_{\hat{S}} = \hat{S}\hat{S}^H$ and output $W = F_{decorr}(\hat{S})$, the following properties of the covariance matrices hold:

$$ E_W(i,i) = E_{\hat{S}}(i,i), \qquad E_W(i,j) = 0 \ \text{for}\ i \neq j, \qquad \hat{S} W^H = W \hat{S}^H = 0 $$

From these relationships, it follows that:

$$ (\hat{S} + W)(\hat{S} + W)^H = E_{\hat{S}} + E_W $$

The decorrelator output W can be used to compensate for the prediction inaccuracy of an MMSE estimator by using the predicted signals as decorrelator inputs (recall that the prediction error is orthogonal to the predicted signals).

It should also be noted that, in the general case, the prediction errors are not orthogonal to each other. Thus, one aim of the inventive concept (e.g., method) is to create a mixture of the "dry" (i.e., decorrelator input) signal (e.g., the rendered audio signals 1582a to 1582n) and the "wet" (i.e., decorrelator output) signal (e.g., the decorrelated audio signals 1592a to 1592n) such that the covariance matrix of the resulting mixture (e.g., the output audio signals 1552a to 1552n) becomes similar to the covariance matrix of the desired output.

Moreover, it should be noted that a complexity reduction may be used for the decorrelation unit. It will be described in detail below and may introduce some imperfections into the decorrelated signal, which may nevertheless be acceptable.

14.3. Output covariance correction using decorrelated signals

In the following, a concept will be described for adjusting the covariance characteristics of the output audio signals 1552a to 1552n in order to obtain a reasonably good hearing impression.

The proposed method for the output covariance error correction composes the output signal $\tilde{Z}$ (e.g., the output audio signals 1552a to 1552n) as a weighted sum of the parametrically reconstructed signal $\hat{Z}$ (e.g., the rendered audio signals 1582a to 1582n) and its decorrelated part W. This sum can be represented as follows:

$$ \tilde{Z} = P \hat{Z} + M W $$

The mixing matrices P, applied to the direct signal $\hat{Z}$, and M, applied to the decorrelated signal W, have the following structure (where $N = N_{UpmixCh}$, and $N_{UpmixCh}$ designates the number of rendered audio signals, which may be equal to the number of output audio signals):

$$ P = \begin{bmatrix} p_{1,1} & \cdots & p_{1,N} \\ \vdots & \ddots & \vdots \\ p_{N,1} & \cdots & p_{N,N} \end{bmatrix}, \qquad M = \begin{bmatrix} m_{1,1} & \cdots & m_{1,N} \\ \vdots & \ddots & \vdots \\ m_{N,1} & \cdots & m_{N,N} \end{bmatrix} $$

Applying the notation $F = [P \;\; M]$ for the combined matrix and $S = \begin{bmatrix} \hat{Z} \\ W \end{bmatrix}$ for the combined signal yields:

$$ \tilde{Z} = F S $$

Using this representation, the covariance matrix $E_{\tilde{Z}}$ of the output signal $\tilde{Z}$ is defined as:

$$ E_{\tilde{Z}} = F E_S F^H $$

The target covariance C of the ideally created rendered output scene is defined as:

$$ C = R E_X R^H $$

The mixing matrix F is computed such that the covariance matrix $E_{\tilde{Z}}$ of the final output approximates, or equals, the target covariance C:

$$ E_{\tilde{Z}} \approx C $$

The mixing matrix F is computed, for example, as a function of known quantities, $F = F(E_S, E_X, R)$, as follows:

$$ F = \left( U \sqrt{T}\, U^H \right) H \left( V \sqrt{Q^{-1}}\, V^H \right) $$

Here, the matrices U, T and V, Q can be determined, for example, using a singular value decomposition (SVD) of the covariance matrices C and $E_S$, which yields:

$$ C = U T U^H, \qquad E_S = V Q V^H $$

The prototype matrix H can be chosen according to the desired weightings for the direct and the decorrelated signal paths.

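For example, a possible prototype matrix H can be determined as follows:

$$ H = \begin{bmatrix} a_{1,1} & 0 & \cdots & 0 & b_{1,1} & 0 & \cdots & 0 \\ 0 & a_{2,2} & \cdots & 0 & 0 & b_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{N,N} & 0 & 0 & \cdots & b_{N,N} \end{bmatrix}, \quad \text{where}\ a_{i,i}^2 + b_{i,i}^2 = 1. $$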

In the following, a mathematical derivation of the general mixing matrix F is provided. In other words, the derivation of the mixing matrix F for the general solution is described below.

The covariance matrices $E_S$ and C can be expressed, for example, using the singular value decomposition (SVD) as:

$$ E_S = V Q V^H, \qquad C = U T U^H $$

Here, T and Q are diagonal matrices containing the singular values of C and $E_S$, respectively, and U and V are unitary matrices containing the corresponding singular vectors.

It should be noted that applying the Schur triangulation or an eigenvalue decomposition (instead of the SVD) leads to similar results (or even to identical results if the diagonal matrices Q and T are restricted to positive values).

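Applying these decompositions to the requirement $E_{\tilde{Z}} \approx C$ yields (at least approximately):

$$ F E_S F^H \approx C $$
$$ F V Q V^H F^H \approx U T U^H $$
$$ \left( F V \sqrt{Q}\, V^H \right) \left( F V \sqrt{Q}\, V^H \right)^H \approx \left( U \sqrt{T}\, U^H \right) \left( U \sqrt{T}\, U^H \right)^H $$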

In order to take care of the dimensionality of the covariance matrices, regularization is needed in some cases. For example, a prototype matrix H of size $N_{UpmixCh} \times 2N_{UpmixCh}$ with the property $H H^H = I_{N_{UpmixCh}}$ can be applied:

$$ \left( F V \sqrt{Q}\, V^H \right) \left( F V \sqrt{Q}\, V^H \right)^H \approx \left( U \sqrt{T}\, U^H \right) H H^H \left( U \sqrt{T}\, U^H \right)^H $$

It follows that the mixing matrix F can be determined as:

$$ F = \left( U \sqrt{T}\, U^H \right) H \left( V \sqrt{Q^{-1}}\, V^H \right) $$

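The prototype matrix H is chosen according to the desired weightings for the direct and the decorrelated signal paths. For example, a possible prototype matrix H can be determined as:

$$ H = \begin{bmatrix} a_{1,1} & 0 & \cdots & 0 & b_{1,1} & 0 & \cdots & 0 \\ 0 & a_{2,2} & \cdots & 0 & 0 & b_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{N,N} & 0 & 0 & \cdots & b_{N,N} \end{bmatrix}, \quad \text{where}\ a_{i,i}^2 + b_{i,i}^2 = 1. $$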

Depending on the condition number of the covariance matrix $E_S$ of the combined signals, the last equation may need to include some regularization; otherwise, it should be numerically stable.
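The computation of the mixing matrix can be sketched as follows. This is a minimal illustration rather than the implementation described in the source: it uses a Hermitian eigenvalue decomposition in place of the SVD (equivalent for positive semidefinite covariance matrices), clamps small eigenvalues as one simple form of regularization, and takes randomly generated covariance matrices as stand-ins for $E_S$ and C. The helper names `sqrt_psd` and `mixing_matrix` are hypothetical.

```python
import numpy as np

def sqrt_psd(A, inverse=False, eps=1e-9):
    """Square root (or inverse square root) of a Hermitian PSD matrix."""
    vals, vecs = np.linalg.eigh(A)
    vals = np.maximum(vals, eps)              # simple regularisation
    d = vals ** (-0.5 if inverse else 0.5)
    return (vecs * d) @ vecs.conj().T         # V diag(d) V^H

def mixing_matrix(E_S, C, H):
    """F = (U sqrt(T) U^H) H (V sqrt(Q^-1) V^H), with C = U T U^H, E_S = V Q V^H."""
    return sqrt_psd(C) @ H @ sqrt_psd(E_S, inverse=True)

rng = np.random.default_rng(4)
N = 3                                         # number of rendered channels
A = rng.standard_normal((2 * N, 50))
E_S = A @ A.T                                 # stand-in covariance of S = [Z_hat; W]
B = rng.standard_normal((N, 50))
C = B @ B.T                                   # stand-in target covariance
H = np.hstack([np.eye(N), np.eye(N)]) / np.sqrt(2.0)   # prototype with H H^H = I

F = mixing_matrix(E_S, C, H)
E_Z = F @ E_S @ F.conj().T                    # covariance of the mixed output
print(np.allclose(E_Z, C))                    # the output covariance matches the target
```

Because $H H^H = I$, the product $F E_S F^H$ collapses to $C$ whenever $E_S$ is well conditioned; when it is not, the eigenvalue clamp trades exactness for numerical stability.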

To conclude, a concept has been described for deriving the output audio signals (represented by the matrix $\tilde{Z}$, or equivalently by a vector $\tilde{z}$) on the basis of the rendered audio signals (represented by the matrix $\hat{Z}$, or equivalently by a vector $\hat{z}$) and the decorrelated audio signals (represented by the matrix W, or equivalently by a vector w). As can be seen, the two mixing matrices P and M of general matrix structure are determined jointly. For example, the combined matrix F, as defined above, may be determined such that the covariance matrix $E_{\tilde{Z}}$ of the output audio signals 1552a to 1552n approximates, or equals, the desired covariance C (also designated the target covariance). The desired covariance matrix C may, for example, be derived on the basis of knowledge of the rendering matrix R (which may, for example, be provided by user interaction) and of the object covariance matrix $E_X$, which may in turn be derived on the basis of the encoded side information 1518. For example, the object covariance matrix $E_X$ may be derived using the inter-object correlation values (IOC), which are described above and which may be included in the encoded side information 1518. Thus, the target covariance matrix C may, for example, be provided by the side information processor 1570 as the information 1574, or as a part of the information 1574.

Alternatively, however, the side information processor 1570 may also provide the mixing matrix F directly to the mixer 1598 as the information 1574.

Moreover, a computation rule for the mixing matrix F, which uses a singular value decomposition, has been described. However, some degrees of freedom remain, since the entries $a_{i,i}$ and $b_{i,i}$ of the prototype matrix H can be chosen. Preferably, the entries of the prototype matrix H are chosen to lie somewhere between 0 and 1. If the values $a_{i,i}$ are chosen close to one, there will be a significant mixing of the rendered audio signals, while the impact of the decorrelated audio signals is comparatively small, which may be desirable in some situations. In some other situations, however, it may be more desirable to have a comparatively large impact of the decorrelated audio signals and only a weak mixing between the rendered audio signals. In that case, the values $b_{i,i}$ are typically chosen to be larger than $a_{i,i}$. Thus, the decoder 1550 can be adapted to the requirements by an appropriate choice of the entries of the prototype matrix H.
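The choice of the prototype entries can be illustrated with a short sketch. It assumes that the direct-path weights $a_{i,i}$ are given as values between 0 and 1 and derives $b_{i,i}$ from the constraint $a_{i,i}^2 + b_{i,i}^2 = 1$; the helper name `prototype_matrix` is hypothetical.

```python
import numpy as np

def prototype_matrix(a):
    """Build H = [diag(a) diag(b)] with b chosen so that a^2 + b^2 = 1.

    a: direct-path weights in [0, 1], one per rendered channel.
    """
    a = np.asarray(a, dtype=float)
    b = np.sqrt(1.0 - a ** 2)                 # decorrelated-path weights
    return np.hstack([np.diag(a), np.diag(b)])

# Channel 0 is mostly "dry", channel 2 is entirely "wet"
H = prototype_matrix([0.9, 0.5, 0.0])
```

Each row of `H` has unit norm and the rows are mutually orthogonal, so `H @ H.T` is the identity matrix, as the derivation of F requires.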

14.4. Simplified methods for output covariance correction

In this section, two alternative structures for the above-mentioned mixing matrix $F$ are described, together with preferred algorithms for determining its values. The two alternatives are designed for different input content (e.g., audio content):

- a covariance adjustment method for highly correlated content (e.g., channel-based input with high correlation between different channel pairs);

- an energy compensation method for independent input signals (e.g., object-based input, which is typically assumed to be independent).

14.4.1. Covariance Adjustment Method (A)

Considering that the signal $\hat{Z}$ (e.g., the rendered audio signals 1582a to 1582n) is already optimal in the minimum mean square error sense, it is generally not advisable to modify the parametric reconstruction $\hat{Z}$ in order to improve the covariance properties of the output $\tilde{Z}$ (e.g., the output audio signals 1552a to 1552n), because this may affect the separation quality.

If only the mixture of the decorrelated signals $W$ is adjusted, the mixing matrix $P$ can be reduced to an identity matrix (or a multiple thereof). Thus, the simplified method can be described by setting $P = I$.

The final output of the system can then be expressed as:

$$\tilde{Z} = \hat{Z} + M W$$

Consequently, the final output covariance of the system can be expressed as:

$$E_{\tilde{Z}} = E_{\hat{Z}} + M E_W M^H$$

The difference $\Delta_E$ between the ideal (or desired) output covariance matrix $C$ and the covariance matrix $E_{\hat{Z}}$ of the rendered parametric reconstruction (e.g., the rendered audio signals) is given by:

$$\Delta_E = C - E_{\hat{Z}}$$

Therefore, the mixing matrix $M$ is determined such that:

$$\Delta_E \approx M E_W M^H$$

The mixing matrix $M$ is computed such that the covariance matrix of the mixed decorrelated signals $MW$ equals or approximates the difference between the desired covariance and the covariance of the dry signals (e.g., the rendered audio signals). Consequently, the covariance of the final output approximates the target covariance ($E_{\tilde{Z}} \approx C$):

$$M = \left(U \sqrt{T}\, U^H\right)\left(V \sqrt{Q^{-1}}\, V^H\right)$$

where the matrices $U$, $T$ and $V$, $Q$ can be determined, for example, using a singular value decomposition (SVD) of the covariance matrices $\Delta_E$ and $E_W$, which yields:

$$\Delta_E = U T U^H, \quad E_W = V Q V^H$$

This approach ensures a good cross-correlation reconstruction while maximizing the use of the dry output (e.g., the rendered audio signals 1582a to 1582n), and it exploits only the degrees of freedom in mixing the decorrelated signals. In other words, no mixing between different rendered audio signals is allowed when the rendered audio signals (or scaled versions thereof) are combined with one or more decorrelated audio signals. However, a given decorrelated signal may be combined, with the same or with different scaling, with a plurality of rendered audio signals (or scaled versions thereof) in order to adjust the cross-correlation or cross-covariance characteristics of the output audio signals. The combination is defined, for example, by the matrix $M$ as defined here.

In the following, a mathematical derivation for the restricted matrix $F$ is provided.

In other words, the derivation of the mixing matrix $M$ for the simplified method "A" is described.

The covariance matrices $\Delta_E$ and $E_W$ can be expressed, for example, using a singular value decomposition (SVD) as:

$$\Delta_E = U T U^H, \quad E_W = V Q V^H$$

where $T$ and $Q$ are diagonal matrices holding the singular values of $\Delta_E$ and $E_W$, respectively, and $U$ and $V$ are unitary matrices containing the corresponding singular vectors.

It should be noted that applying a Schur triangularization or an eigenvalue decomposition (instead of the singular value decomposition) leads to similar results (or even to identical results if the diagonal matrices $Q$ and $T$ are restricted to positive values).

Applying this decomposition to the requirement $E_{\tilde{Z}} \approx C$ yields (at least approximately):

$$M E_W M^H = \Delta_E$$

$$M V Q V^H M^H = U T U^H$$

$$\left(M V \sqrt{Q}\, V^H\right)\left(V \sqrt{Q}\, V^H M^H\right) = \left(U \sqrt{T}\, U^H\right)\left(U \sqrt{T}\, U^H\right)$$

It should be noted that both sides of the last equation represent the square of a matrix; the squaring is dropped and the equation is solved for the full matrix $M$.

The mixing matrix $M$ can then be determined as:

$$M = \left(U \sqrt{T}\, U^H\right)\left(V \sqrt{Q^{-1}}\, V^H\right)$$

This method can be derived from the general method by setting the prototype matrix $H$ as follows:

Depending on the condition of the covariance matrix $E_W$ of the wet (decorrelated) signals, the last equation may need to include some regularization; otherwise, it should be numerically stable.
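The computation of $M$ for method "A" can be sketched in NumPy as follows. This is only an illustrative sketch, not the patent's implementation: it assumes the closed form $M = (U\sqrt{T}U^H)(V\sqrt{Q^{-1}}V^H)$, uses Hermitian eigendecompositions in place of the SVD (the text notes that this leads to similar results), clips negative eigenvalues of $\Delta_E$, and floors the eigenvalues of $E_W$ as one possible form of regularization. The function name and the `eps` parameter are hypothetical.

```python
import numpy as np

def covariance_adjustment_matrix(C, E_Z_hat, E_W, eps=1e-9):
    """Sketch of method A: find M such that M @ E_W @ M^H ~= C - E_Z_hat."""
    delta_E = C - E_Z_hat
    # Hermitian eigendecompositions stand in for the SVD named in the text.
    t, U = np.linalg.eigh(delta_E)
    q, V = np.linalg.eigh(E_W)
    # Assumption: negative eigenvalues of delta_E cannot be matched by adding
    # decorrelated signal energy, so they are clipped to zero.
    t = np.clip(t, 0.0, None)
    # Assumption: a simple regularization that floors small eigenvalues of E_W
    # before they are inverted.
    q = np.maximum(q, eps * max(q.max(), 1.0))
    left = (U * np.sqrt(t)) @ U.conj().T    # U sqrt(T) U^H
    right = (V / np.sqrt(q)) @ V.conj().T   # V sqrt(Q^-1) V^H
    return left @ right
```

For a positive semi-definite $\Delta_E$ and a well-conditioned $E_W$, the product `M @ E_W @ M.conj().T` reproduces $\Delta_E$ up to numerical precision.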

14.4.2. Energy compensation method (B)

Sometimes (depending on the application scenario) it is not desirable to allow mixing of the parametric reconstructions (e.g., the rendered audio signals) or of the decorrelated signals; instead, each parametrically reconstructed signal (e.g., rendered audio signal) is to be mixed individually with its own decorrelated signal only.

To achieve this requirement, an additional constraint has to be introduced into the simplified method "A". The mixing matrix $M$ of the wet signals (decorrelated signals) is now required to be diagonal, i.e., $M(i,j) = 0$ for $i \neq j$.

The main goal of this approach is to use the decorrelated signals to compensate for the loss of energy in the parametric reconstruction (e.g., the rendered audio signals), while changes in the off-diagonal elements of the covariance matrix of the output signal are ignored, i.e., there is no direct handling of the cross-correlations. Therefore, applying the decorrelated signals introduces no cross-leakage between the output objects or channels (e.g., between the rendered audio signals).

As a result, only the main diagonal of the target covariance matrix (or desired covariance matrix) can be reached, and the off-diagonal elements depend on the parametric reconstruction and the added decorrelated signals. This method is best suited to object-only applications, in which the signals can be considered uncorrelated.

The final output of the method (e.g., the output audio signals) is given by $\tilde{Z} = \hat{Z} + M W$, with the diagonal matrix $M$ computed such that the covariance matrix entries corresponding to the energies of the reconstructed signals, $E_{\tilde{Z}}(i,i)$, equal the desired energies:

$$E_{\tilde{Z}}(i,i) = C(i,i)$$

$C$ can be determined as described above for the general case.

For example, the mixing matrix $M$ can be derived directly by dividing the desired energies of the compensation signals (i.e., the differences between the desired energies, which may be described by the diagonal elements of the desired covariance matrix $C$, and the energies of the parametric reconstructions, which may be determined by the audio decoder) by the energies of the decorrelated signals (which may be determined by the audio decoder):

where $\lambda_{Dec}$ is a non-negative threshold used to limit the amount of decorrelated component added to the output signals (e.g., $\lambda_{Dec} = 4$).
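The energy compensation rule can be illustrated with a short NumPy sketch. It is a sketch under stated assumptions rather than the normative formula: the gain is taken as the square root of the per-channel energy ratio (so that $E_{\hat{Z}}(i,i) + M(i,i)^2 E_W(i,i)$ reaches $C(i,i)$), the ratio is clipped to the range $[0, \lambda_{Dec}]$, and a small hypothetical floor `eps` guards against division by zero. The function name is hypothetical as well.

```python
import numpy as np

def energy_compensation_matrix(C, E_Z_hat, E_W, lambda_dec=4.0, eps=1e-9):
    """Sketch of method B: diagonal M that restores the missing energy per channel."""
    # Energy still missing in each parametrically reconstructed channel.
    missing = np.real(np.diag(C) - np.diag(E_Z_hat))
    # Energy available in each decorrelated channel (floored; eps is an assumption).
    available = np.maximum(np.real(np.diag(E_W)), eps)
    # Assumption: the ratio is limited to [0, lambda_dec] before the square root.
    ratio = np.clip(missing / available, 0.0, lambda_dec)
    return np.diag(np.sqrt(ratio))
```

With unit energies in $E_{\hat{Z}}$ and $E_W$ and target energies 2, 1 and 10, the ratios are 1, 0 and 9; the last one is limited to $\lambda_{Dec} = 4$, giving gains 1, 0 and 2.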

It should be noted that the energies can either be reconstructed parametrically (e.g., using the object level differences, the inter-object correlations and the rendering coefficients) or be computed from the actual signals by the decoder (which is typically computationally more expensive).

This method can be derived from the general method by setting the prototype matrix $H$ as follows:

This method explicitly maximizes the use of the dry rendered outputs. The method is equivalent to the simplification "A" when the covariance matrices have no off-diagonal entries.

This method has a reduced computational complexity.

However, it should be noted that the energy compensation method does not necessarily imply that the cross-correlation terms remain unmodified. That holds only if ideal decorrelators are used and no complexity reduction is applied to the decorrelation unit. The idea of the method is to recover the energy and to ignore the modifications of the cross terms (the changes in the cross terms do not substantially modify the correlation properties and do not affect the overall spatial impression).

14.5. Requirements for the mixing matrix $F$

In the following, it is explained that the mixing matrix $F$, whose derivation has been described in sections 14.3 and 14.4, fulfils the requirements needed to avoid degradations.

To avoid degradations of the output, any method for compensating the parametric reconstruction should produce a result with the following property: if the rendering matrix equals the downmix matrix, the output channels should equal (or at least approximate) the downmix channels. The proposed method fulfils this property. If the rendering matrix equals the downmix matrix ($R = D$), the parametric reconstruction is given by:

The desired covariance matrix will be:

$$C = R E_X R^H = D E_X D^H = E_Y$$

Therefore, the equation to be solved for obtaining the mixing matrix $F$ is:

where the zero block is a square matrix of zeros of size $N_{UpmixCh} \times N_{UpmixCh}$. Solving the previous equation for $F$ yields:

This means that the decorrelated signals have zero weight in the summation, and the final output is given by the dry signals, which are identical to the downmix signals:

As a result, the given requirement that the system output equal the downmix signal in this rendering scenario is fulfilled.

14.6. Estimation of the covariance matrix $E_S$

To obtain the mixing matrix $F$, knowledge of the covariance matrix $E_S$ of the combined signals $S$ is required, or at least desirable.

In principle, it is possible to estimate the covariance matrix $E_S$ directly from the available signals (namely, from the parametric reconstruction $\hat{Z}$ and the decorrelator output $W$). Although this approach may lead to more accurate results, it may not be practical because of the associated computational complexity. The proposed methods use parametric approximations of the covariance matrix $E_S$.

The general structure of the covariance matrix $E_S$ can be expressed as:

$$E_S = \begin{bmatrix} E_{\hat{Z}} & E_{\hat{Z}W} \\ E_{\hat{Z}W}^H & E_W \end{bmatrix}$$

where the matrix $E_{\hat{Z}W}$ is the cross-covariance between the direct signals $\hat{Z}$ and the decorrelated signals $W$.

Assuming that the decorrelators are ideal (i.e., energy-preserving, with outputs orthogonal to the inputs and all outputs mutually orthogonal), the covariance matrix $E_S$ can be expressed in the simplified form:

$$E_S = \begin{bmatrix} E_{\hat{Z}} & 0 \\ 0 & E_W \end{bmatrix}$$

The covariance matrix $E_{\hat{Z}}$ of the parametrically reconstructed signal $\hat{Z}$ is determined parametrically as:

The covariance matrix $E_W$ of the decorrelated signals $W$ is assumed to fulfil the mutual orthogonality property and to contain only the diagonal elements of $E_{\hat{Z}}$, i.e., $E_W(i,i) = E_{\hat{Z}}(i,i)$ and $E_W(i,j) = 0$ for $i \neq j$.
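Under the ideal-decorrelator assumption, the parametric approximation of $E_S$ can be assembled as in the following NumPy sketch. The stacking order of the combined signals (dry signals first, decorrelated signals second) and the function name are assumptions made for illustration only.

```python
import numpy as np

def combined_covariance_ideal(E_Z_hat):
    """Sketch: E_S for ideal decorrelators, assuming S stacks Z_hat above W."""
    n = E_Z_hat.shape[0]
    # Energy-preserving, mutually orthogonal decorrelator outputs:
    # E_W keeps only the diagonal (channel energies) of E_Z_hat.
    E_W = np.diag(np.diag(E_Z_hat))
    # Outputs orthogonal to the inputs: the cross-covariance blocks vanish.
    zeros = np.zeros((n, n), dtype=E_Z_hat.dtype)
    return np.block([[E_Z_hat, zeros], [zeros, E_W]])
```

For an $N \times N$ input covariance the result is a $2N \times 2N$ block-diagonal matrix, so only $E_{\hat{Z}}$ has to be known to populate it.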

If the assumption of mutual orthogonality and/or energy preservation is violated (e.g., when the number of available decorrelators is smaller than the number of signals to be decorrelated), the covariance matrix $E_W$ can be estimated as:

15. Complexity reduction for the decorrelation unit

In the following, it is described how the complexity of the decorrelators used in embodiments according to the present invention can be reduced.

It should be noted that the implementation of the decorrelator function is often computationally complex. In some applications (e.g., portable decoder solutions), the number of decorrelators may have to be limited because of restricted computational resources. This section describes means for reducing the complexity of the decorrelator unit by controlling the number of decorrelators (or decorrelations) applied. The decorrelator unit interface is shown in FIGS. 16 and 17.

FIG. 16 shows a schematic block diagram of a simple (conventional) decorrelation unit. The decorrelation unit 1600 according to FIG. 16 is configured to receive N decorrelator input signals 1610a to 1610n, for example the rendered audio signals $\hat{Z}$. Moreover, the decorrelation unit 1600 provides N decorrelator output signals 1612a to 1612n. The decorrelation unit 1600 may comprise, for example, N individual decorrelators (or decorrelation functions) 1620a to 1620n. For example, each of the individual decorrelators 1620a to 1620n may provide one of the decorrelator output signals 1612a to 1612n on the basis of an associated one of the decorrelator input signals 1610a to 1610n. Accordingly, N individual decorrelators, or decorrelation functions, 1620a to 1620n may be required to provide the N decorrelated signals 1612a to 1612n on the basis of the decorrelator input signals 1610a to 1610n.

In contrast, FIG. 17 shows a schematic block diagram of a reduced-complexity decorrelation unit 1700. The reduced-complexity decorrelation unit 1700 is configured to receive N decorrelator input signals 1710a to 1710n and to provide, on the basis thereof, N decorrelator output signals 1712a to 1712n. For example, the N decorrelator input signals 1710a to 1710n may be the rendered audio signals $\hat{Z}$, and the N decorrelator output signals 1712a to 1712n may be the decorrelated audio signals $W$.

The decorrelation unit 1700 comprises a premixer (or, equivalently, a premixing functionality) 1720, which is configured to receive the first set of N decorrelator input signals 1710a to 1710n and to provide, on the basis thereof, a second set of K decorrelator input signals 1722a to 1722k. For example, the premixer 1720 may perform a so-called "premixing" or "downmixing" to derive the second set of K decorrelator input signals 1722a to 1722k from the first set of N decorrelator input signals 1710a to 1710n. For example, the K signals of the second set of decorrelator input signals 1722a to 1722k may be represented by a matrix $\hat{Z}_{mix}$. The decorrelation unit (or, equivalently, multi-channel decorrelator) 1700 also comprises a decorrelator core 1730, which is configured to receive the K signals of the second set of decorrelator input signals 1722a to 1722k and to provide, on the basis thereof, K decorrelator output signals, which constitute a first set of decorrelator output signals 1732a to 1732k. For example, the decorrelator core 1730 may comprise K individual decorrelators (or decorrelation functions), each of which provides one decorrelator output signal of the first set of K decorrelator output signals 1732a to 1732k on the basis of the corresponding decorrelator input signal of the second set of K decorrelator input signals 1722a to 1722k. Alternatively, a given decorrelator, or decorrelation function, may be applied K times, such that each decorrelator output signal of the first set of K decorrelator output signals 1732a to 1732k is based on a single decorrelator input signal of the second set of K decorrelator input signals 1722a to 1722k.

The decorrelation unit 1700 also comprises a postmixer 1740, which is configured to receive the K decorrelator output signals 1732a to 1732k of the first set of decorrelator output signals and to provide, on the basis thereof, the N signals 1712a to 1712n of the second set of decorrelator output signals (which constitute the "external" decorrelator output signals).

It should be noted that the premixer 1720 preferably performs a linear mixing operation, which may be described by a premixing matrix $M_{pre}$. Moreover, the postmixer 1740 preferably performs a linear mixing (or upmixing) operation, which may be described by a postmixing matrix $M_{post}$, to derive the N decorrelator output signals 1712a to 1712n of the second set from the first set of K decorrelator output signals 1732a to 1732k (i.e., from the output signals of the decorrelator core 1730).

The main idea of the proposed method and apparatus is to reduce the number of input signals to the decorrelators (or to the decorrelator core) from N to K by:

● premixing the signals (e.g., the rendered audio signals) to a lower number of channels:

$$\hat{Z}_{mix} = M_{pre} \hat{Z}$$

● applying the decorrelation using the K available decorrelators (e.g., of the decorrelator core):

$$\hat{Z}_{mix}^{dec} = \mathrm{Decorr}(\hat{Z}_{mix})$$

● upmixing the decorrelated signals back to N channels:

$$W = M_{post} \hat{Z}_{mix}^{dec}$$
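The three steps above (premix, decorrelate, upmix) can be sketched in NumPy as follows. The sketch is illustrative only: `toy_decorrelator` is a hypothetical stand-in that merely delays each premixed channel by a different number of samples, whereas a real decorrelator would be a far more elaborate filter; the signal layout (one channel per row, one sample per column) is likewise an assumption.

```python
import numpy as np

def toy_decorrelator(x, delay):
    """Hypothetical stand-in for one decorrelator: a pure delay (delay >= 1)."""
    y = np.zeros_like(x)
    y[delay:] = x[:-delay]
    return y

def reduced_complexity_decorrelation(Z_hat, M_pre, M_post):
    """Sketch: N inputs -> K premixed -> K decorrelated -> N outputs."""
    Z_mix = M_pre @ Z_hat                       # K x T premixed signals
    Z_dec = np.stack([toy_decorrelator(ch, 7 * (k + 1))
                      for k, ch in enumerate(Z_mix)])  # K decorrelator calls only
    return M_post @ Z_dec                       # N x T decorrelated signals W
```

Only K decorrelator instances run regardless of N, which is where the complexity saving comes from; output channels that share a premixed signal receive scaled copies of the same decorrelated signal.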

The premixing matrix $M_{pre}$ can be constructed on the basis of downmix, rendering, correlation or similar information such that the matrix product $M_{pre} M_{pre}^H$ is well-conditioned (with respect to the inversion operation). The postmixing matrix can be computed as:

$$M_{post} = M_{pre}^H \left(M_{pre} M_{pre}^H\right)^{-1}$$
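As a brief illustration, the postmixing matrix can be derived from a given premixing matrix in NumPy. The sketch assumes that $M_{post}$ is the right pseudo-inverse $M_{pre}^H (M_{pre} M_{pre}^H)^{-1}$, which is why the conditioning of $M_{pre} M_{pre}^H$ matters; the function name is hypothetical.

```python
import numpy as np

def postmix_from_premix(M_pre):
    """Sketch: M_post = M_pre^H (M_pre M_pre^H)^-1 for a K x N premixing matrix."""
    gram = M_pre @ M_pre.conj().T          # K x K, must be well-conditioned
    return M_pre.conj().T @ np.linalg.inv(gram)
```

For a premixing matrix that sums disjoint pairs of channels, `gram` is twice the identity matrix, so the postmixing matrix simply hands half of each decorrelated signal back to both channels of its pair, and `M_pre @ M_post` is the identity matrix.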

Even though the covariance matrix of the intermediate decorrelated signals $\hat{Z}_{mix}^{dec}$ is diagonal (assuming ideal decorrelators), the covariance matrix $E_W$ of the final decorrelated signals $W$ will quite likely no longer be diagonal when this kind of processing is used. Therefore, the covariance matrix can be estimated using the mixing matrices as:

$$E_W = M_{post} \left[\mathrm{matdiag}\left(M_{pre} E_{\hat{Z}} M_{pre}^H\right)\right] M_{post}^H$$

where $\mathrm{matdiag}(\cdot)$ keeps the diagonal elements of its argument and sets all other elements to zero.
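The estimate of $E_W$ for the premix/postmix structure can be sketched in NumPy as follows. The sketch assumes ideal (energy-preserving, mutually orthogonal) decorrelators in the core, so that the covariance of the intermediate decorrelated signals is the diagonal of the premixed covariance; the function name is hypothetical.

```python
import numpy as np

def estimate_decorrelated_covariance(E_Z_hat, M_pre, M_post):
    """Sketch: E_W = M_post diag(M_pre E_Z_hat M_pre^H) M_post^H."""
    E_mix = M_pre @ E_Z_hat @ M_pre.conj().T   # covariance of the premixed signals
    E_dec = np.diag(np.diag(E_mix))            # ideal core: only energies survive
    return M_post @ E_dec @ M_post.conj().T    # covariance after upmixing
```

With four uncorrelated unit-energy inputs premixed in two pairs and postmixed with weights of 0.5, the two channels of each pair end up with a covariance of 0.5 between them, which illustrates why $E_W$ is no longer diagonal.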

The number K of decorrelators (or individual decorrelations) used is not fixed and depends on the desired computational complexity and the available decorrelators. Its value can range from N (highest computational complexity) down to 1 (lowest computational complexity).

The number N of input signals to the decorrelator unit is arbitrary, and the proposed method supports any number of input signals, independent of the rendering configuration of the system.

For example, in applications using 3D audio content with a high number of output channels, one possible expression for the premixing matrix $M_{pre}$, which depends on the output configuration, is described below.

In the following, it is described how the premixing performed by the premixer 1720 (and consequently the postmixing performed by the postmixer 1740) is adjusted if the decorrelation unit 1700 is used in a multi-channel audio decoder in which the decorrelator input signals 1710a to 1710n of the first set are associated with different spatial positions of an audio scene.

For this purpose, FIG. 18 shows a table representation of the loudspeaker positions used for different output formats.

In the table 1800 of FIG. 18, a first row 1810 gives the loudspeaker index number. A second row 1820 gives the loudspeaker label. A third row 1830 gives the azimuth position of the respective loudspeaker, and a fourth row 1832 gives the azimuth tolerance of the loudspeaker position. A fifth row 1840 gives the elevation of the position of the respective loudspeaker, and a sixth row 1842 gives the corresponding elevation tolerance. A seventh row 1850 indicates which loudspeakers are used for the output format O-2.0. An eighth row 1860 indicates which loudspeakers are used for the output format O-5.1. A ninth row 1864 indicates which loudspeakers are used for the output format O-7.1. A tenth row 1870 indicates which loudspeakers are used for the output format O-8.1, an eleventh row 1880 indicates which loudspeakers are used for the output format O-10.1, and a twelfth row 1890 indicates which loudspeakers are used for the output format O-22.2. As can be seen, two loudspeakers are used for the output format O-2.0, six for O-5.1, eight for O-7.1, nine for O-8.1, eleven for O-10.1, and twenty-four for O-22.2.

It should be noted, however, that one low-frequency effect loudspeaker is used for the output formats O-5.1, O-7.1, O-8.1 and O-10.1, and that two low-frequency effect loudspeakers (LFE1, LFE2) are used for the output format O-22.2. Moreover, it should be noted that, in a preferred embodiment, one rendered audio signal (e.g., one of the rendered audio signals 1582a to 1582n) is associated with each of the loudspeakers, except for the one or more low-frequency effect loudspeakers. Accordingly, two rendered audio signals are associated with the two loudspeakers used in the O-2.0 format, five rendered audio signals are associated with the five non-low-frequency-effect loudspeakers if the O-5.1 format is used, seven if the O-7.1 format is used, eight if the O-8.1 format is used, ten if the O-10.1 format is used, and twenty-two rendered audio signals are associated with the twenty-two non-low-frequency-effect loudspeakers if the O-22.2 format is used.

However, as mentioned above, it is sometimes desirable to use a smaller number of (individual) decorrelators (of the decorrelator core). In the following, it is described how the number of decorrelators can be reduced flexibly when the O-22.2 output format is used by the multi-channel audio decoder, such that there are 22 rendered audio signals 1582a to 1582n (which may be represented by a matrix, e.g., $\hat{Z}$).

FIGS. 19a to 19g show different options for premixing the rendered audio signals 1582a to 1582n under the assumption that there are N = 22 rendered audio signals. For example, FIG. 19a shows a table representation of the entries of a premixing matrix $M_{pre}$. The columns labeled 1 to 11 in FIG. 19a represent the columns of the premixing matrix $M_{pre}$, and the rows labeled 1 to 22 are associated with the rows of the premixing matrix $M_{pre}$. Moreover, it should be noted that each column of the premixing matrix $M_{pre}$ is associated with one of the K decorrelator input signals 1722a to 1722k of the second set of decorrelator input signals (i.e., with one of the input signals of the decorrelator core). Each row of the premixing matrix $M_{pre}$ is associated with one of the N decorrelator input signals 1710a to 1710n of the first set of decorrelator input signals and consequently with one of the rendered audio signals 1582a to 1582n (since, in an embodiment, the decorrelator input signals 1710a to 1710n of the first set are typically identical to the rendered audio signals 1582a to 1582n). Accordingly, each row of the premixing matrix $M_{pre}$ is associated with a specific loudspeaker and, since loudspeakers are associated with spatial positions, with a specific spatial position. Column 1910 indicates with which loudspeaker (and consequently with which spatial position) the rows of the premixing matrix $M_{pre}$ are associated (the loudspeaker labels are defined in row 1820 of the table 1800).

In the following, the functionality defined by the premixing matrix M_pre of FIG. 19a will be described in more detail. As can be seen, the rendered audio signals associated with the loudspeakers (or, equivalently, loudspeaker positions) "CH_M_000" and "CH_L_000" are combined to obtain a first decorrelator input signal of the second set of decorrelator input signals (i.e., a first downmixed decorrelator input signal), as indicated by the "1" values in the first and second rows of the first column of the premixing matrix M_pre. Similarly, the rendered audio signals associated with the loudspeakers (or, equivalently, loudspeaker positions) "CH_U_000" and "CH_T_000" are combined to obtain a second downmixed decorrelator input signal (i.e., a second decorrelator input signal of the second set of decorrelator input signals). Moreover, it can be seen that the premixing matrix M_pre of FIG. 19a defines eleven combinations of two rendered audio signals each, such that eleven downmixed decorrelator input signals are derived from the 22 rendered audio signals. It can also be seen that the four center signals are combined to obtain two downmixed decorrelator input signals (see rows 1 to 4 and columns 1 and 2 of the premixing matrix). Moreover, each of the remaining downmixed decorrelator input signals is obtained by combining two audio signals associated with the same side of the audio scene. For example, a third downmixed decorrelator input signal, represented by the third column of the premixing matrix, is obtained by combining the rendered audio signals associated with an azimuth position of +135° ("CH_M_L135", "CH_U_L135"). Moreover, a fourth decorrelator input signal (represented by the fourth column of the premixing matrix) is obtained by combining the rendered audio signals associated with an azimuth position of -135° ("CH_M_R135", "CH_U_R135"). Accordingly, each of the downmixed decorrelator input signals is obtained by combining two rendered audio signals associated with the same (or a similar) azimuth position (or, equivalently, horizontal position), where signals associated with different elevations (or, equivalently, vertical positions) are typically combined.
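The pairing described for FIG. 19a can be illustrated with a minimal Python sketch. It covers only the eight channels whose pairing is spelled out in the text (the full table has 22), stores the matrix with one row per rendered signal and one column per downmixed signal to mirror the table layout, and uses unit weights as indicated by the "1" entries. The helper names `build_premix_matrix` and `premix` are hypothetical and not taken from the source.

```python
# Channel labels from the text; only the eight channels whose pairing is
# described for FIG. 19a are used here (assumption: a subset for illustration).
CHANNELS = [
    "CH_M_000", "CH_L_000", "CH_U_000", "CH_T_000",
    "CH_M_L135", "CH_U_L135", "CH_M_R135", "CH_U_R135",
]

# One group per downmixed decorrelator input signal (one table column).
GROUPS = [
    ["CH_M_000", "CH_L_000"],
    ["CH_U_000", "CH_T_000"],
    ["CH_M_L135", "CH_U_L135"],
    ["CH_M_R135", "CH_U_R135"],
]


def build_premix_matrix(channels, groups):
    """Return a len(channels) x len(groups) matrix with 0/1 entries.

    Rows correspond to rendered audio signals, columns to downmixed
    decorrelator input signals (hypothetical helper, mirrors the table).
    """
    index = {name: i for i, name in enumerate(channels)}
    matrix = [[0.0] * len(groups) for _ in channels]
    for col, group in enumerate(groups):
        for name in group:
            matrix[index[name]][col] = 1.0
    return matrix


def premix(matrix, samples):
    """Combine one sample per rendered signal into one sample per column."""
    n_cols = len(matrix[0])
    return [
        sum(matrix[row][col] * samples[row] for row in range(len(samples)))
        for col in range(n_cols)
    ]


M_PRE = build_premix_matrix(CHANNELS, GROUPS)
rendered = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
downmixed = premix(M_PRE, rendered)
print(downmixed)  # [3.0, 7.0, 11.0, 15.0]
```

Each column sums exactly the two rendered signals of its group, so eight rendered signals feed four decorrelators in this reduced example.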

Reference is now made to FIG. 19b, which shows premixing coefficients (entries of the premixing matrix M_pre) for N=22 and K=10. The structure of the table of FIG. 19b is identical to that of the table of FIG. 19a. However, as can be seen, the premixing matrix M_pre according to FIG. 19b differs from the premixing matrix M_pre of FIG. 19a in that its first column describes the combination of four rendered audio signals having the channel IDs (or positions) "CH_M_000", "CH_L_000", "CH_U_000" and "CH_T_000". In other words, four rendered audio signals associated with vertically adjacent positions are combined in the premixing in order to reduce the number of decorrelators needed (ten decorrelators instead of the eleven decorrelators for the matrix according to FIG. 19a).

Referring now to FIG. 19c, which shows premixing coefficients (entries of the premixing matrix M_pre) for N=22 and K=9, it can be seen that the premixing matrix M_pre according to FIG. 19c comprises only nine columns. Moreover, it can be seen from the second column of the premixing matrix M_pre of FIG. 19c that the rendered audio signals associated with the channel IDs (or positions) "CH_M_L135", "CH_U_L135", "CH_M_R135" and "CH_U_R135" are combined (in a premixer configured according to the premixing matrix of FIG. 19c) to obtain a second downmixed decorrelator input signal (a decorrelator input signal of the second set of decorrelator input signals). As can be seen, rendered audio signals which are combined into separate downmixed decorrelator input signals by the premixing matrices according to FIGS. 19a and 19b are downmixed into a common downmixed decorrelator input signal according to FIG. 19c. Moreover, it should be noted that the rendered audio signals having the channel IDs "CH_M_L135" and "CH_U_L135" are associated with identical horizontal positions (or azimuth positions) and spatially adjacent vertical positions (or elevations) on the same side of the audio scene, and that the rendered audio signals having the channel IDs "CH_M_R135" and "CH_U_R135" are associated with identical horizontal positions (or azimuth positions) and spatially adjacent vertical positions (or elevations) on a second side of the audio scene. Moreover, the rendered audio signals having the channel IDs "CH_M_L135", "CH_U_L135", "CH_M_R135" and "CH_U_R135" can be said to be associated with a horizontal pair (or even a horizontal quadruple) of spatial positions comprising a left-side position and a right-side position. In other words, it can be seen from the second column of the premixing matrix M_pre of FIG. 19c that two of the four rendered audio signals which are combined to be decorrelated using a single given decorrelator are associated with spatial positions on the left side of the audio scene, and that the other two are associated with spatial positions on the right side of the audio scene. Moreover, the left-side rendered audio signals (of said four rendered audio signals) are associated with spatial positions which are symmetric, with respect to a center plane of the audio scene, to the spatial positions associated with the right-side rendered audio signals (of said four rendered audio signals), such that a "symmetric" quadruple of rendered audio signals is combined by the premixing to be decorrelated using a single (individual) decorrelator.
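The step from FIG. 19a to FIG. 19b and on to FIG. 19c amounts to merging two premix groups into one each time the number of decorrelators drops by one. The following Python sketch shows this on the eight channels named in the text; the function `merge_groups` is a hypothetical helper, and the remaining channels of the 22.2 layout are omitted.

```python
def merge_groups(groups, first, second):
    """Merge groups[first] and groups[second] into a single premix group.

    Returns a new list; the merged group takes the position of the lower
    index, so the number of groups (and decorrelators) drops by one.
    """
    if first == second:
        raise ValueError("two different groups are needed")
    lo, hi = sorted((first, second))
    merged = groups[lo] + groups[hi]
    return [merged if i == lo else g for i, g in enumerate(groups) if i != hi]


# Subset of the FIG. 19a grouping (first four columns only).
groups_a = [
    ["CH_M_000", "CH_L_000"],
    ["CH_U_000", "CH_T_000"],
    ["CH_M_L135", "CH_U_L135"],
    ["CH_M_R135", "CH_U_R135"],
]

# FIG. 19b: the four center signals share the first column.
groups_b = merge_groups(groups_a, 0, 1)

# FIG. 19c: the symmetric +/-135 degree quadruple shares the second column.
groups_c = merge_groups(groups_b, 1, 2)

print(len(groups_a), len(groups_b), len(groups_c))  # 4 3 2
print(groups_c[1])
```

In this reduced example, four decorrelators become three and then two, matching the one-by-one reduction from K=11 to K=10 to K=9 described for the full layout.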

Referring to FIGS. 19d, 19e, 19f and 19g, it can be seen that more and more rendered audio signals are combined as the number K of (individual) decorrelators decreases. As can be seen in FIGS. 19a to 19g, rendered audio signals which were downmixed into two separate downmixed decorrelator input signals are typically combined when the number of decorrelators is decreased by one. Moreover, it can be seen that rendered audio signals associated with a "symmetric quadruple" of spatial positions are typically combined: for a comparatively high number of decorrelators, only rendered audio signals associated with equal or at least similar horizontal positions (or azimuth positions) are combined, while for a comparatively low number of decorrelators, rendered audio signals associated with spatial positions on opposite sides of the audio scene are also combined.

Referring now to FIGS. 20a to 20d, 21a to 21c, 22a to 22b and 23, it should be noted that similar concepts can be applied for different numbers of rendered audio signals.

For example, FIGS. 20a to 20d describe entries of the premixing matrix M_pre for N=10 and for K between 2 and 5.

Similarly, FIGS. 21a to 21c describe entries of the premixing matrix M_pre for N=8 and for K between 2 and 4.

Similarly, FIGS. 21d to 21f describe entries of the premixing matrix M_pre for N=7 and for K between 2 and 4.

FIGS. 22a and 22b describe entries of the premixing matrix for N=5 and for K=2 and K=3.

FIG. 23 describes entries of the premixing matrix for N=2 and K=1.

To summarize, the premixing matrices according to FIGS. 19 to 23 can be used, for example in a switchable manner, in a multi-channel decorrelator which is part of a multi-channel audio decoder. Switching between the premixing matrices can be performed, for example, in dependence on the desired output configuration (which typically determines the number N of rendered audio signals) and also in dependence on the desired complexity of the decorrelation (which determines the parameter K and which may be adjusted, for example, in dependence on complexity information included in an encoded representation of the audio content).

Referring now to FIG. 24, the complexity reduction for the 22.2 output format will be described in more detail. As already outlined above, one possible solution for constructing the premixing matrix and the postmixing matrix is to use the spatial information of the reproduction layout to select the channels to be mixed together and to compute the mixing coefficients. Based on their positions, geometrically related loudspeakers (and, for example, the rendered audio signals associated with them) are grouped together, taking vertical and horizontal pairs, as described in the table of FIG. 24. In other words, FIG. 24 shows, in the form of a table, a grouping of loudspeaker positions which may be associated with rendered audio signals. For example, a first column 2410 describes a first group of loudspeaker positions, which are located in the center of the audio scene. A second column 2412 represents a second group of loudspeaker positions which are spatially related. The loudspeaker positions "CH_M_L135" and "CH_U_L135" are associated with identical azimuth positions (or, equivalently, horizontal positions) and adjacent elevation positions (or, equivalently, vertically adjacent positions). Similarly, the positions "CH_M_R135" and "CH_U_R135" have an identical azimuth (or, equivalently, an identical horizontal position) and a similar elevation (or, equivalently, vertically adjacent positions). Moreover, the positions "CH_M_L135", "CH_U_L135", "CH_M_R135" and "CH_U_R135" form a quadruple of positions, where the positions "CH_M_L135" and "CH_U_L135" are symmetric to the positions "CH_M_R135" and "CH_U_R135" with respect to a center plane of the audio scene. Moreover, the positions "CH_M_180" and "CH_U_180" also have an identical azimuth position (or, equivalently, an identical horizontal position) and a similar elevation (or, equivalently, adjacent vertical positions).

A third column 2414 represents a third group of positions. It should be noted that the positions "CH_M_L030" and "CH_L_L045" are spatially adjacent positions and have a similar azimuth (or, equivalently, a similar horizontal position) and a similar elevation (or, equivalently, a similar vertical position). The same holds for the positions "CH_M_R030" and "CH_L_R045". Moreover, the positions of the third group form a quadruple of positions, where the positions "CH_M_L030" and "CH_L_L045" are spatially adjacent to the positions "CH_M_R030" and "CH_L_R045" and symmetric to them with respect to a center plane of the audio scene.

A fourth column 2416 represents four additional positions, which have characteristics similar to those of the first four positions of the second column and which form a symmetric quadruple of positions.

A fifth column 2418 represents another quadruple of symmetric positions, "CH_M_L060", "CH_U_L060", "CH_M_R060" and "CH_U_R060".

Moreover, it should be noted that the rendered audio signals associated with the positions of the different groups of positions may be combined to a greater extent as the number of decorrelators decreases. For example, in the presence of eleven individual decorrelators in the multi-channel decorrelator, the rendered audio signals associated with the positions in the first and second rows may be combined for each group. In addition, the rendered audio signals associated with the positions represented in the third and fourth rows may be combined for each group. Moreover, the rendered audio signals associated with the positions shown in the fifth and sixth rows may be combined for the second group. Accordingly, eleven downmixed decorrelator input signals (which can be input into the individual decorrelators) may be obtained. However, if it is desired to have fewer individual decorrelators, the rendered audio signals associated with the positions shown in rows 1 to 4 may be combined for one or more of the groups. Also, if it is desired to reduce the number of individual decorrelators even further, the rendered audio signals associated with all positions of the second group may be combined.

To summarize, the signals fed to the output layout (e.g., to the loudspeakers) have horizontal and vertical dependencies that should be preserved during the decorrelation process. Therefore, the mixing coefficients are computed such that channels corresponding to different loudspeaker groups are not mixed together.

Depending on the number of available decorrelators, or on the desired level of decorrelation, the vertical pairs (between the middle layer and the upper layer, or between the middle layer and the lower layer) are mixed together first in each group. Second, the horizontal pairs (between left and right) or the remaining vertical pairs are mixed together. For example, in group 3, the channels in the left vertical pair ("CH_M_L030" and "CH_L_L045") and in the right vertical pair ("CH_M_R030" and "CH_L_R045") are mixed together first, which reduces the number of decorrelators needed for this group from four to two. If it is desired to reduce the number of decorrelators even further, the obtained horizontal pair is downmixed to a single channel, and the number of decorrelators needed for this group is reduced from four to one.
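The two-stage rule for a single group (vertical pairs first, then the resulting horizontal pair) can be sketched in Python as follows. The function `partition_quadruple` is a hypothetical helper, and the sketch only covers a symmetric quadruple such as group 3; groups with a different number of positions would need their own handling.

```python
def partition_quadruple(left_pair, right_pair, decorrelators):
    """Return the premix groups for one symmetric quadruple of positions.

    4 decorrelators: no premixing, every channel keeps its own decorrelator.
    2 decorrelators: each vertical pair is mixed together.
    1 decorrelator: the two vertical pairs are additionally mixed together.
    """
    left, right = list(left_pair), list(right_pair)
    if decorrelators == 4:
        return [[channel] for channel in left + right]
    if decorrelators == 2:
        return [left, right]
    if decorrelators == 1:
        return [left + right]
    raise ValueError("this sketch supports 4, 2 or 1 decorrelators per quadruple")


# Group 3 as named in the text.
GROUP_3_LEFT = ("CH_M_L030", "CH_L_L045")
GROUP_3_RIGHT = ("CH_M_R030", "CH_L_R045")

for count in (4, 2, 1):
    print(count, partition_quadruple(GROUP_3_LEFT, GROUP_3_RIGHT, count))
```

Because the function only ever combines channels passed in for one group, channels from different loudspeaker groups are never mixed, which is the constraint stated above.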

Based on the presented mixing rules, the tables mentioned above (e.g., those shown in FIGS. 19 to 23) are derived for different levels of desired decorrelation (or for different levels of desired decorrelation complexity).

16. Compatibility with a Secondary External Renderer/Format Converter

When the spatial audio object coding decoder (or, more generally, the multi-channel audio decoder) is used together with an external secondary renderer/format converter, the following changes to the proposed concept (method or apparatus) may be used:

- The internal rendering matrix R (e.g., of the renderer) is set to identity (R = I) when an external renderer is used, or is initialized with mixing coefficients derived from an intermediate rendering configuration when an external format converter is used.

- The number of decorrelators is reduced using the method described in section 15, with the premixing matrix M_pre computed based on feedback information received from the renderer/format converter (e.g., M_pre = D_convert, where D_convert is the downmix matrix used inside the format converter). Channels which will be mixed together outside the spatial audio object coding decoder are premixed and fed to the same decorrelator inside the spatial audio object coding decoder.
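As a rough illustration of the second item, the Python sketch below sets M_pre = D_convert and lists which decoder output channels end up on the same decorrelator. The channel names, the matrix values and the orientation (one row per converter output channel, one column per decoder output channel) are assumptions made for this example and are not taken from the source; both helper functions are hypothetical.

```python
# Hypothetical four-channel decoder output and a hypothetical 4-to-2 downmix
# matrix of an external format converter (values chosen for illustration).
CHANNELS = ["L", "R", "Ls", "Rs"]
D_CONVERT = [
    [1.0, 0.0, 0.7, 0.0],  # converter output 1 mixes L and Ls
    [0.0, 1.0, 0.0, 0.7],  # converter output 2 mixes R and Rs
]


def premix_matrix_from_feedback(d_convert):
    """Set M_pre = D_convert by copying the converter's downmix matrix."""
    return [list(row) for row in d_convert]


def channels_per_decorrelator(m_pre, channels):
    """List, for every row of M_pre, the channels that share a decorrelator."""
    return [
        [channels[col] for col, weight in enumerate(row) if weight != 0.0]
        for row in m_pre
    ]


M_PRE = premix_matrix_from_feedback(D_CONVERT)
shared = channels_per_decorrelator(M_PRE, CHANNELS)
print(len(M_PRE))  # number of decorrelators in this example: 2
print(shared)      # [['L', 'Ls'], ['R', 'Rs']]
```

Channels that the converter would mix anyway are premixed before decorrelation, so no separate decorrelator is spent on signals that are later summed.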

When an external format converter is used, the internal renderer of the spatial audio object coding decoder will pre-render to an intermediate configuration (e.g., the configuration with the highest number of loudspeakers).

To conclude, in some embodiments, information about which of the output audio signals are mixed together in an external renderer or format converter is used to determine the premixing matrix M_pre, such that the premixing matrix defines a combination of those decorrelator input signals (of the first set of decorrelator input signals) which are actually combined in the external renderer. Thus, information received from the external renderer/format converter (which receives the output audio signals of the multi-channel decoder) is used to select or adjust the premixing matrix (for example, when the internal rendering matrix of the multi-channel audio decoder is set to identity or is initialized with mixing coefficients derived from an intermediate rendering configuration), and the external renderer/format converter is connected to receive the output audio signals as mentioned above with respect to the multi-channel audio decoder.

17. Bitstream

In the following, it will be described which additional signaling information can be used in a bitstream (or, equivalently, in an encoded representation of the audio content). In embodiments according to the invention, the decorrelation method may be signaled in the bitstream to ensure a desired quality level. In this way, the user (or the audio encoder) has more flexibility to select the method based on the content. For this purpose, the MPEG spatial audio object coding bitstream syntax can be extended, for example, with two bits for specifying the decorrelation method used and/or two bits for specifying the configuration (or complexity).

FIG. 25 shows a syntax representation of the bitstream elements "bsDecorrelationMethod" and "bsDecorrelationLevel", which may be added, for example, to a bitstream portion "SAOCSpecificConfig()" or "SAOC3DSpecificConfig()". As can be seen in FIG. 25, two bits may be used for the bitstream element "bsDecorrelationMethod", and two bits may be used for the bitstream element "bsDecorrelationLevel".
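Reading the two 2-bit elements could look like the following Python sketch. The bit reader class, the MSB-first bit order, the order of the two fields and their position at the start of the buffer are all assumptions made for illustration; the source only states the element names and their widths.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0  # current position in bits

    def read_uint(self, n_bits: int) -> int:
        """Read n_bits as an unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n_bits):
            byte = self._data[self._pos // 8]
            bit = (byte >> (7 - self._pos % 8)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


def read_decorrelation_fields(reader: BitReader) -> dict:
    """Read the two 2-bit elements in the order they are named in the text."""
    method = reader.read_uint(2)  # bsDecorrelationMethod, 2 bits
    level = reader.read_uint(2)   # bsDecorrelationLevel, 2 bits
    return {"bsDecorrelationMethod": method, "bsDecorrelationLevel": level}


# Example payload: bits 01 (method 1), 10 (level 2), then four padding bits.
fields = read_decorrelation_fields(BitReader(bytes([0b01100000])))
print(fields)  # {'bsDecorrelationMethod': 1, 'bsDecorrelationLevel': 2}
```

In an actual decoder these reads would sit at the position inside the configuration structure that the syntax of FIG. 25 prescribes, which is not reproduced here.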

FIG. 26 shows, in the form of a table, the relationship between the values of the bitstream variable "bsDecorrelationMethod" and the different decorrelation methods. For example, three different decorrelation methods can be signaled by different values of said bitstream variable. An output covariance correction using decorrelators, as described, for example, in section 14.3, can be signaled as one of the options. As another option, a covariance adjustment method, as described, for example, in section 14.4.1, can be signaled. As yet another option, an energy compensation method, as described, for example, in section 14.4.2, can be signaled. Accordingly, three different methods for reconstructing signal characteristics of the output audio signals on the basis of the rendered audio signals and the decorrelated audio signals can be selected in dependence on the bitstream variable.

The energy compensation mode uses the method described in section 14.4.2, the limited covariance adjustment mode uses the method described in section 14.4.1, and the general covariance adjustment mode uses the method described in section 14.3.

Referring now to FIG. 27, which shows, in the form of a table, how different decorrelation levels can be signaled by the bitstream variable "bsDecorrelationLevel", a method for selecting the decorrelation complexity will be described. In other words, said variable can be evaluated by a multi-channel audio decoder comprising the multi-channel decorrelator described above in order to decide which decorrelation complexity is used. For example, said bitstream parameter may signal different decorrelation "levels", which may be designated by the values 0, 1, 2 and 3.

An example of decorrelation configurations (which may, for example, be designated as decorrelation levels) is given in the table of FIG. 27. FIG. 27 shows a table representation of the number of decorrelators for different "levels" (e.g., decorrelation levels) and output configurations. In other words, FIG. 27 shows the number K of decorrelator input signals (of the second set of decorrelator input signals) used by the multi-channel decorrelator. As can be seen in FIG. 27, the number of (individual) decorrelators used in the multi-channel decorrelator is switched between 11, 9, 7 and 5 for a 22.2 output configuration, depending on which decorrelation level is signaled by the bitstream parameter "bsDecorrelationLevel". Depending on the decorrelation level signaled by said bitstream parameter, a selection is made between 10, 5, 3 and 2 individual decorrelators for a 10.1 output configuration, between 8, 4, 3 or 2 individual decorrelators for an 8.1 configuration, and between 7, 4, 3 or 2 individual decorrelators for a 7.1 configuration. For the 5.1 output configuration, there are only three valid options for the number of individual decorrelators, namely 5, 3 or 2. For the 2.1 output configuration, there is only a choice between two individual decorrelators (decorrelation level 0) and one individual decorrelator (decorrelation level 1).
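The decorrelator counts listed above lend themselves to a simple lookup, sketched here in Python. The counts are those given in the text; that they are listed in order of increasing level (level 0 first) is an assumption, which the text confirms explicitly only for the 2.1 configuration. The names `DECORRELATOR_COUNTS` and `decorrelator_count` are hypothetical.

```python
# Number of individual decorrelators (K) per output configuration.
# Assumption: tuple index equals the signaled decorrelation level.
DECORRELATOR_COUNTS = {
    "22.2": (11, 9, 7, 5),
    "10.1": (10, 5, 3, 2),
    "8.1": (8, 4, 3, 2),
    "7.1": (7, 4, 3, 2),
    "5.1": (5, 3, 2),
    "2.1": (2, 1),
}


def decorrelator_count(output_config: str, level: int) -> int:
    """Return K for a given output configuration and decorrelation level."""
    counts = DECORRELATOR_COUNTS[output_config]
    if not 0 <= level < len(counts):
        raise ValueError(
            f"level {level} is not valid for output configuration {output_config}"
        )
    return counts[level]


print(decorrelator_count("22.2", 0))  # 11
print(decorrelator_count("2.1", 1))   # 1
```

A decoder could use the returned K, together with N from the output configuration, to pick the matching premixing matrix from FIGS. 19 to 23.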

To summarize, the decorrelation method can be determined at the decoder side based on the computational power and the available number of decorrelators. Moreover, the selection of the number of decorrelators may be made at the encoder side and signaled using a bitstream parameter.

Accordingly, both the method by which the decorrelated audio signals are applied to obtain the output audio signals and the complexity of providing the decorrelated signals can be controlled from the side of the audio encoder using the bitstream parameters shown in FIG. 25 and defined in more detail in FIGS. 26 and 27.

18. Fields of Application for the Inventive Processing

It should be noted that one of the goals of the introduced methods is to restore audio cues which are of great importance for the human perception of an audio scene. Embodiments according to the invention improve the reconstruction accuracy of the energy level and correlation properties and therefore increase the perceptual audio quality of the final output signal. Embodiments according to the invention can be applied for an arbitrary number of downmix/upmix channels. Moreover, the methods and apparatuses described herein can be combined with existing parametric source separation algorithms. Embodiments according to the invention allow the computational complexity of the system to be controlled by setting restrictions on the number of applied decorrelator functions. Embodiments according to the invention can lead to a simplification of object-based parametric construction algorithms such as spatial audio object coding by removing the MPS transcoding step.

19. Encoding/Decoding Environment

In the following, an audio encoding/decoding environment will be described in which concepts according to the present invention can be applied.

A 3D audio codec system, in which concepts according to the present invention can be used, is based on an MPEG-D USAC codec for the coding of channel and object signals. To increase the efficiency of coding a large number of objects, MPEG spatial audio object coding technology has been adapted. Three types of renderers perform the tasks of rendering objects to channels, rendering channels to headphones, or rendering channels to a different loudspeaker setup. When object signals are explicitly transmitted or parametrically encoded using spatial audio object coding, the corresponding object metadata information is compressed and multiplexed into the 3D audio bitstream.

도 28, 29 및 30은 3차원 오디오 시스템의 상이한 알고리즘 블록들을 도시한다.Figures 28, 29 and 30 illustrate different algorithm blocks of a three-dimensional audio system.

도 28은 그러한 오디오 인코더의 개략적인 블록 다이어그램을 도시하고, 도 29는 그러한 오디오 디코더의 개략적인 블록 다이어그램을 도시한다. 바꾸어 말하면, 도 28과 29는 3차원 오디오 시스템의 상이한 알고리즘 블록들을 도시한다.Fig. 28 shows a schematic block diagram of such an audio encoder, and Fig. 29 shows a schematic block diagram of such an audio decoder. In other words, Figures 28 and 29 show different algorithm blocks of a three-dimensional audio system.

Some details will be described with reference to Fig. 28, which shows a schematic block diagram of a 3D audio encoder 2900. The encoder 2900 comprises an optional pre-renderer/mixer 2910, which receives one or more channel signals 2912 and one or more object signals 2914 and provides, on the basis thereof, one or more channel signals 2916 as well as one or more object signals 2918, 2920. The audio encoder also comprises a USAC encoder 2930 and, optionally, a spatial audio object coding encoder 2940. The spatial audio object coding encoder 2940 is configured to provide one or more spatial audio object coding transport channels 2942 and spatial audio object coding side information 2944 on the basis of the one or more objects 2920 provided to it. Moreover, the USAC encoder 2930 is configured to receive the channel signals 2916, comprising channels and pre-rendered objects, from the pre-renderer/mixer 2910, to receive the one or more object signals 2918 from the pre-renderer/mixer 2910, to receive the one or more spatial audio object coding transport channels 2942 and the spatial audio object coding side information 2944, and to provide, on the basis thereof, an encoded representation 2932. Moreover, the audio encoder 2900 also comprises an object metadata encoder 2950, which is configured to receive object metadata 2952 (which may also be evaluated by the pre-renderer/mixer 2910) and to encode the object metadata to obtain encoded object metadata 2954. The encoded object metadata is also received by the USAC encoder 2930 and used to provide the encoded representation 2932.

Some details regarding the individual components of the audio encoder 2900 will be described below.

With reference to Fig. 29, an audio decoder 3000 will be described. The audio decoder 3000 is configured to receive an encoded representation 3010 and to provide, on the basis thereof, multi-channel loudspeaker signals 3012, headphone signals 3014 and/or loudspeaker signals 3016 in an alternative format (for example, a 5.1 format). The audio decoder 3000 comprises a USAC decoder 3020, which provides, on the basis of the encoded representation 3010, one or more channel signals 3022, one or more pre-rendered object signals 3024, one or more object signals 3026, one or more spatial audio object coding transport channels 3028, spatial audio object coding side information 3030 and compressed object metadata information 3032. The audio decoder 3000 also comprises an object renderer 3040, which is configured to provide one or more rendered object signals 3042 on the basis of the one or more object signals 3026 and object metadata information 3044, wherein the object metadata information 3044 is provided by an object metadata decoder 3050 on the basis of the compressed object metadata information 3032. The audio decoder 3000 also comprises a mixer 3070, which is configured to receive the channel signals 3022, the pre-rendered object signals 3024, the rendered object signals 3042 and the rendered object signals 3062 (provided by the spatial audio object coding decoder 3060 described in section 19.3), and to provide, on the basis thereof, a plurality of mixed channel signals 3072, which may, for example, constitute the multi-channel loudspeaker signals 3012. The audio decoder 3000 may, for example, also comprise a binaural renderer 3080, which is configured to receive the mixed channel signals 3072 and to provide, on the basis thereof, the headphone signals 3014. Moreover, the audio decoder 3000 may comprise a format conversion 3090, which is configured to receive the mixed channel signals 3072 and reproduction layout information 3092 and to provide, on the basis thereof, the loudspeaker signals 3016 for an alternative loudspeaker setup.

In the following, some details regarding the components of the audio encoder 2900 and of the audio decoder 3000 will be described.

19.1. Pre-Renderer/Mixer

The pre-renderer/mixer 2910 can optionally be used to convert a channel-plus-object input scene into a channel scene before encoding. Functionally, it may, for example, be identical to the object renderer/mixer described below.

Pre-rendering of objects may, for example, ensure a deterministic signal entropy at the encoder input that is basically independent of the number of simultaneously active object signals.

With pre-rendering of objects, no object metadata is required.

Discrete object signals are rendered to the channel layout that the encoder is configured to use. The weights of the objects for each channel are obtained from the associated object metadata (OAM, 2952).

19.2. USAC Core Coder

The core codec 2930, 3020 for loudspeaker channel signals, discrete object signals, object downmix signals and pre-rendered signals is based on MPEG-D USAC technology. It handles the coding of the multitude of signals by creating channel and object mapping information based on the geometric and semantic information of the input channel and object assignment. This mapping information describes how input channels and objects are mapped to USAC channel elements (CPEs, SCEs, LFEs), and the corresponding information is transmitted to the decoder.

All additional payloads, such as spatial audio object coding data or object metadata, are passed through extension elements and are considered in the rate control of the encoder. The coding of objects is possible in different ways, depending on the rate/distortion requirements and the interactivity requirements for the renderer. The following object coding variants are possible:

● Pre-rendered objects: Object signals are pre-rendered and mixed to the 22.2 channel signals before encoding. The subsequent coding chain sees 22.2 channel signals.

● Discrete object waveforms: Objects are supplied to the encoder as monophonic waveforms. The encoder uses single channel elements (SCEs) to transmit the objects in addition to the channel signals. The decoded objects are rendered and mixed at the receiver side. Compressed object metadata information is transmitted to the receiver/renderer alongside.

● Parametric object waveforms: Object properties and their relation to each other are described by means of spatial audio object coding parameters. The downmix of the object signals is coded with USAC. The parametric information is transmitted alongside. The number of downmix channels is chosen depending on the number of objects and the overall data rate. Compressed object metadata information is transmitted to the spatial audio object coding renderer.

19.3. Spatial Audio Object Coding

The spatial audio object coding encoder 2940 and the spatial audio object coding decoder 3060 for object signals are based on MPEG spatial audio object coding technology. The system is capable of recreating, modifying and rendering a number of audio objects based on a smaller number of transmitted channels and additional parametric data (object level differences (OLDs), inter-object correlations (IOCs), downmix gains (DMGs)). The additional parametric data exhibits a significantly lower data rate than would be required for transmitting all objects individually, which makes the coding very efficient. The spatial audio object coding encoder takes as input the object/channel signals as monophonic waveforms and outputs the parametric information (which is packed into the 3D audio bitstream 2932, 3010) and the spatial audio object coding transport channels (which are encoded using single channel elements and transmitted). The spatial audio object coding decoder 3060 reconstructs the object/channel signals from the decoded spatial audio object coding transport channels 3028 and the parametric information 3030, and generates the output audio scene based on the reproduction layout, the decompressed object metadata information and, optionally, the user interaction information.

19.4. Object Metadata Codec

For each object, the associated metadata that specifies the geometric position and volume of the object in three-dimensional space is efficiently coded by quantization of the object properties in time and space. The compressed object metadata cOAM 2954, 3032 is transmitted to the receiver as side information.

19.5. Object Renderer/Mixer

The object renderer uses the object metadata (OAM, 3044), obtained by decoding the compressed object metadata, to generate object waveforms according to the given reproduction format. Each object is rendered to certain output channels according to its metadata. The output of this block results from the sum of the partial results.

If both channel-based content and discrete/parametric objects are decoded, the channel-based waveforms and the rendered object waveforms are mixed before the resulting waveforms are output (or before they are fed to a post-processor module such as the binaural renderer or the loudspeaker renderer module).

19.6. Binaural Renderer

The binaural renderer module 3080 produces a binaural downmix of the multi-channel audio material such that each input channel is represented by a virtual sound source. The processing is conducted frame-wise in the quadrature mirror filter (QMF) domain. The binauralization is based on measured binaural room impulse responses.

19.7. Loudspeaker Renderer/Format Conversion

The loudspeaker renderer 3090 converts between the transmitted channel configuration and the desired reproduction format. It is therefore called the "format converter" in the following. The format converter performs conversions to lower numbers of output channels, i.e., it creates downmixes. The system generates optimized downmix matrices for the given combination of input and output formats and applies these matrices in a downmix process. The format converter allows for standard loudspeaker configurations as well as for arbitrary configurations with non-standard loudspeaker positions.
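
The downmix step of such a format converter amounts to multiplying each block of input channels by a static matrix. The following is a minimal sketch in Python/NumPy; the 5.0-to-stereo coefficients, the channel ordering and the function name `apply_downmix` are illustrative assumptions and are not taken from this document, which does not specify how the optimized matrices are derived.

```python
import numpy as np

# Hypothetical 5.0 -> 2.0 downmix matrix.
# Rows: output L, R. Columns: input L, R, C, Ls, Rs (assumed ordering).
g = 1.0 / np.sqrt(2.0)
M_dmx = np.array([[1.0, 0.0, g, g, 0.0],
                  [0.0, 1.0, g, 0.0, g]])

def apply_downmix(m_dmx, x):
    """Apply a static downmix matrix to a block of samples (channels x samples)."""
    return m_dmx @ x

# A signal that is present only in the centre channel ends up equally in L and R.
x = np.zeros((5, 4))
x[2, :] = 1.0
y = apply_downmix(M_dmx, x)
```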

Fig. 30 shows a schematic block diagram of a format converter, i.e., the structure of the format converter.

As can be seen, the format converter 3100 receives mixer output signals 3110, for example the mixed channel signals 3072, and provides loudspeaker signals 3112, for example the loudspeaker signals 3016. The format converter comprises a downmix process 3120 in the QMF domain and a downmix configurator 3130, wherein the downmix configurator provides configuration information for the downmix process 3120 on the basis of mixer output layout information 3032 and reproduction layout information 3034.

19.8. General Remarks

Moreover, it should be noted that the concepts described herein, for example the audio decoder 100, the audio encoder 200, the multi-channel decorrelator 600, the multi-channel audio decoder 700, the audio encoder 800 or the audio decoder 1550, can be used within the audio encoder 2900 and/or within the audio decoder 3000. For example, the audio encoders/decoders mentioned above may be used as part of the spatial audio object coding encoder 2940 and/or as part of the spatial audio object coding decoder 3060. However, the concepts mentioned above may also be used at other positions of the 3D audio decoder 3000 and/or of the audio encoder 2900.

Naturally, the methods mentioned above may also be used in the concepts for encoding and decoding audio information according to Figs. 28 and 29.

20. Additional Embodiments

20.1 Introduction

In the following, another embodiment according to the present invention will be described.

Fig. 31 shows a schematic block diagram of a downmix processor according to an embodiment of the present invention.

The downmix processor 3100 comprises an unmixer 3110, a renderer 3120, a combiner 3130 and a multi-channel decorrelator 3140. The renderer provides rendered audio signals $Y_{dry}$ to the combiner 3130 and to the multi-channel decorrelator 3140. The multi-channel decorrelator comprises a premixer 3150, which receives the rendered audio signals (which may be considered as a first set of decorrelator input signals) and provides, on the basis thereof, a premixed second set of decorrelator input signals to a decorrelator core 3160. The decorrelator core provides a first set of decorrelator output signals on the basis of the second set of decorrelator input signals for use by a postmixer 3170. The postmixer postmixes (or upmixes) the decorrelator output signals provided by the decorrelator core 3160 to obtain a postmixed second set of decorrelator output signals, which is provided to the combiner 3130.

The renderer 3120 may, for example, apply a matrix $R$ for the rendering, the premixer may, for example, apply a matrix $M_{pre}$ for the premixing, the postmixer may, for example, apply a matrix $M_{post}$ for the postmixing, and the combiner may, for example, apply a matrix $P$ for the combining.

It should be noted that the downmix processor 3100, or individual components or functionalities thereof, may be used in the audio decoders described herein. Moreover, it should be noted that the downmix processor may be supplemented by any of the features and functionalities described herein.

20.2 Spatial Audio Object Coding 3D Processing

The hybrid filterbank described in ISO/IEC 23003-1:2007 is applied. The dequantization of the downmix gain (DMG), object level difference (OLD) and inter-object correlation (IOC) parameters follows the same rules as defined in 7.1.2 of ISO/IEC 23003-2:2010.

20.2.1 Signals and Parameters

The audio signals are defined for every time slot $n$ and every hybrid subband $k$. The corresponding spatial audio object coding 3D parameters are defined for each parameter time slot $l$ and processing band $m$. The subsequent mapping between the hybrid and the parameter domain is specified by Table A.31 of ISO/IEC 23003-1:2007. Hence, all calculations are performed with respect to certain time/band indices, and the corresponding dimensionalities are implied for each introduced variable.

The data available at the spatial audio object coding 3D decoder consists of the multi-channel downmix signal $X$, the covariance matrix $E$, the rendering matrix $R$ and the downmix matrix $D$.

20.2.1.1 Object Parameters

The covariance matrix $E$ of size $N \times N$ with elements $e_{i,j}$ represents an approximation of the original signal covariance matrix, $E \approx S S^{*}$, and is obtained from the object level difference and inter-object correlation parameters as

$$e_{i,j} = \sqrt{OLD_i \, OLD_j}\; IOC_{i,j}.$$

Here, the dequantized object parameters are obtained as

$$OLD_i = D_{OLD}(i, l, m), \qquad IOC_{i,j} = D_{IOC}(i, j, l, m).$$
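
How the dequantized parameters populate the covariance matrix can be sketched in a few lines of NumPy. This is a minimal illustration that assumes the usual spatial audio object coding relation $e_{i,j} = \sqrt{OLD_i\,OLD_j}\,IOC_{i,j}$; the function name and the example values are hypothetical.

```python
import numpy as np

def covariance_from_old_ioc(old, ioc):
    """Build E with e[i, j] = sqrt(OLD_i * OLD_j) * IOC[i, j]."""
    old = np.asarray(old, dtype=float)
    ioc = np.asarray(ioc, dtype=float)
    return np.sqrt(np.outer(old, old)) * ioc

# Example with three objects (values chosen for illustration only).
old = np.array([1.0, 0.25, 0.04])
ioc = np.array([[1.0, 0.5, 0.0],
                [0.5, 1.0, 0.2],
                [0.0, 0.2, 1.0]])
E = covariance_from_old_ioc(old, ioc)
```

Because the inter-object correlation of an object with itself is 1, the diagonal of `E` simply carries the object level differences.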

20.2.1.3 Downmix Matrix

The downmix matrix $D$ applied to the input audio signals $S$ determines the downmix signal as $X = D S$. The downmix matrix $D$ of size $N_{dmx} \times N$ is obtained as

$$D = D_{dmx} D_{premix}.$$

The matrices $D_{dmx}$ and $D_{premix}$ have different sizes depending on the processing mode. The matrix $D_{dmx}$ is obtained from the downmix gain parameters $DMG_{i,j}$, where the dequantized downmix parameters are obtained as

$$DMG_{i,j} = D_{DMG}(i, j, l).$$
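
The formula that maps the downmix gains to the entries of $D_{dmx}$ is not reproduced in this text. The sketch below assumes the convention commonly used for gains expressed in dB, namely a linear gain of $10^{0.05\,DMG_{i,j}}$ and a zero entry where no gain is signalled; both the convention and the names `downmix_matrix_from_dmg` and `present` are assumptions made for illustration.

```python
import numpy as np

def downmix_matrix_from_dmg(dmg_db, present):
    """Map downmix gains in dB to linear gains; entries without data become 0."""
    dmg_db = np.asarray(dmg_db, dtype=float)
    present = np.asarray(present, dtype=bool)
    return np.where(present, 10.0 ** (0.05 * dmg_db), 0.0)

# Two downmix channels, three inputs; the third input carries no gain data.
dmg_db = np.array([[0.0, -6.0, 0.0],
                   [-20.0, 0.0, 0.0]])
present = np.array([[True, True, False],
                    [True, True, False]])
D_dmx = downmix_matrix_from_dmg(dmg_db, present)
```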

20.2.1.3.1 Direct Mode

In the direct mode, no premixing is used. The matrix $D_{premix}$ has size $N \times N$ and is given by $D_{premix} = I$. The matrix $D_{dmx}$ has size $N_{dmx} \times N$ and is obtained from the downmix gain parameters according to 20.2.1.3.

20.2.1.3.2 Premixing Mode

In the premixing mode, the matrix $D_{premix}$ has size $(N_{ch} + N_{premix}) \times N$ and is given by

$$D_{premix} = \begin{pmatrix} I & 0 \\ 0 & A \end{pmatrix},$$

where the premixing matrix $A$ of size $N_{premix} \times N_{obj}$ is received from the object renderer as an input to the spatial audio object coding 3D decoder.

The matrix $D_{dmx}$ has size $N_{dmx} \times (N_{ch} + N_{premix})$ and is obtained from the downmix gain parameters according to 20.2.1.3.
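
The matrix sizes in the premixing mode can be checked with a small NumPy sketch. It assumes that $D_{premix}$ passes the $N_{ch}$ channels through unchanged and applies the premixing matrix $A$ to the $N_{obj}$ objects, i.e. a block-diagonal arrangement of $I$ and $A$ with channels ordered before objects; the numerical values are arbitrary.

```python
import numpy as np

n_ch, n_obj, n_premix, n_dmx = 2, 3, 2, 2
n = n_ch + n_obj                               # total number of input signals N

# Premixing matrix A (N_premix x N_obj), illustrative values.
A = np.array([[0.7, 0.7, 0.0],
              [0.0, 0.3, 1.0]])

# Assumed block structure: channels pass through, objects are premixed by A.
D_premix = np.block([[np.eye(n_ch), np.zeros((n_ch, n_obj))],
                     [np.zeros((n_premix, n_ch)), A]])

# D_dmx has size N_dmx x (N_ch + N_premix), illustrative values.
D_dmx = np.array([[1.0, 0.0, 1.0, 0.0],
                  [0.0, 1.0, 0.0, 1.0]])

D = D_dmx @ D_premix                           # overall downmix matrix, N_dmx x N
```

In the direct mode the same product is formed with `D_premix = np.eye(n)`, so that `D` equals `D_dmx`.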

20.2.1.2 Rendering Matrix

The rendering matrix $R$ applied to the input audio signals $S$ determines the target rendered output as $Y = R S$. The rendering matrix $R$ of size $N_{out} \times N$ is given by

$$R = (R_{ch} \;\; R_{obj}),$$

where $R_{ch}$ of size $N_{out} \times N_{ch}$ represents the rendering matrix associated with the input channels and $R_{obj}$ of size $N_{out} \times N_{obj}$ represents the rendering matrix associated with the input objects.

20.2.1.4 Target Output Covariance Matrix

The covariance matrix $C$ of size $N_{out} \times N_{out}$ with elements $c_{i,j}$ represents an approximation of the target output signal covariance matrix, $C \approx Y Y^{*}$, and is obtained from the covariance matrix $E$ and the rendering matrix $R$:

$$C = R E R^{*}.$$
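
The relation $C = R E R^{*}$ follows directly from $Y = R S$ and $E \approx S S^{*}$. The following NumPy sketch checks this numerically with random data; the matrix sizes and the random signals are arbitrary, and the exact (rather than parametrically approximated) covariance is used so that the identity holds to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ch, n_obj, n_out = 2, 3, 4

# Rendering matrix R = (R_ch R_obj) of size N_out x N.
R_ch = rng.standard_normal((n_out, n_ch))
R_obj = rng.standard_normal((n_out, n_obj))
R = np.hstack([R_ch, R_obj])

# Input signals S (N x samples) and their exact covariance E = S S^*.
S = rng.standard_normal((n_ch + n_obj, 256))
E = S @ S.conj().T

# Target output covariance and target rendered output.
C = R @ E @ R.conj().T
Y = R @ S
```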

20.2.2 Decoding

The method for obtaining an output signal using the spatial audio object coding 3D parameters and the rendering information is described. The spatial audio object coding 3D decoder consists, for example, of the spatial audio object coding 3D parameter processor and the spatial audio object coding 3D downmix processor.

20.2.2.1 Downmix Processor

The output signal of the downmix processor (represented in the hybrid QMF domain) is fed into the corresponding synthesis filterbank as described in ISO/IEC 23003-1:2007, which yields the final output of the spatial audio object coding 3D decoder. The detailed structure of the downmix processor is shown in Fig. 31.

The output signal $\hat{Y}$ is computed from the multi-channel downmix signal $X$ and the decorrelated multi-channel signal $X_d$ as

$$\hat{Y} = P_{dry} R U X + P_{wet} M_{post} X_d,$$

where $U$ represents the parametric unmixing matrix, which is defined in 20.2.2.1.1 and 20.2.2.1.2.

The decorrelated multi-channel signal $X_d$ is computed according to 20.2.3:

$$X_d = \mathrm{decorrfunc}(M_{pre} Y_{dry}),$$

where $Y_{dry} = R U X$ denotes the rendered audio signals.

The mixing matrix $P = (P_{dry} \;\; P_{wet})$ is described in 20.2.4. The matrices $M_{pre}$ for the different output configurations are given in Figs. 19 to 23, and the corresponding matrices $M_{post}$ are derived from the matrices $M_{pre}$.

The decoding mode is controlled by the bitstream element bsNumSaocDmxObjects, as shown in Fig. 32.

20.2.2.1.1 Combined Decoding Mode

In the combined decoding mode, the parametric unmixing matrix $U$ is given by

$$U = E D^{*} J.$$

The matrix $J$ of size $N_{dmx} \times N_{dmx}$ is given by $J \approx \Delta^{-1}$, where $\Delta = D E D^{*}$.

20.2.2.1.2 Independent Decoding Mode

In the independent decoding mode, the unmixing matrix $U$ is given by

$$U = \begin{pmatrix} U_{ch} & 0 \\ 0 & U_{obj} \end{pmatrix},$$

where $U_{ch} = E_{ch} D_{ch}^{*} J_{ch}$ and $U_{obj} = E_{obj} D_{obj}^{*} J_{obj}$.

The channel-based covariance matrix $E_{ch}$ of size $N_{ch} \times N_{ch}$ and the object-based covariance matrix $E_{obj}$ of size $N_{obj} \times N_{obj}$ are obtained from the covariance matrix $E$ by selecting only the corresponding diagonal blocks:

$$E = \begin{pmatrix} E_{ch} & E_{ch,obj} \\ E_{obj,ch} & E_{obj} \end{pmatrix},$$

where the matrix $E_{ch,obj} = (E_{obj,ch})^{*}$ represents the cross-covariance matrix between the input channels and the input objects and is not required to be calculated.

The channel-based downmix matrix $D_{ch}$ of size $N_{ch}^{dmx} \times N_{ch}$ and the object-based downmix matrix $D_{obj}$ of size $N_{obj}^{dmx} \times N_{obj}$ are obtained from the downmix matrix $D$ by selecting only the corresponding diagonal blocks:

$$D = \begin{pmatrix} D_{ch} & 0 \\ 0 & D_{obj} \end{pmatrix}.$$

The matrix $J_{ch} \approx (D_{ch} E_{ch} D_{ch}^{*})^{-1}$ of size $N_{ch}^{dmx} \times N_{ch}^{dmx}$ is derived according to 20.2.2.1.4 for $\Delta = D_{ch} E_{ch} D_{ch}^{*}$.

The matrix $J_{obj} \approx (D_{obj} E_{obj} D_{obj}^{*})^{-1}$ of size $N_{obj}^{dmx} \times N_{obj}^{dmx}$ is derived according to 20.2.2.1.4 for $\Delta = D_{obj} E_{obj} D_{obj}^{*}$.

20.2.2.1.4 Calculation of the Matrix J

The matrix $J \approx \Delta^{-1}$ is calculated using the following equation:

$$J = V \Lambda^{inv} V^{*},$$

where the singular vectors $V$ of the matrix $\Delta$ are obtained using the following characteristic equation:

$$V \Lambda V^{*} = \Delta.$$

The regularized inverse $\Lambda^{inv}$ of the diagonal singular value matrix $\Lambda$ is computed as

$$\lambda^{inv}_{i,i} = \begin{cases} 1/\lambda_{i,i}, & \lambda_{i,i} \geq T_{reg}^{\Lambda}, \\ 0, & \text{otherwise}. \end{cases}$$

The relative regularization scalar $T_{reg}^{\Lambda}$ is determined using the absolute threshold $T_{reg}$ and the maximum value of $\Lambda$ as

$$T_{reg}^{\Lambda} = \max_i(\lambda_{i,i}) \, T_{reg}.$$
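
The regularized inversion and its use in the combined decoding mode can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the decomposition is computed with `numpy.linalg.eigh` (valid because $\Delta = D E D^{*}$ is Hermitian), values below the relative threshold are set to zero, and the default `t_reg=1e-2` is a placeholder chosen for the example rather than a value taken from this text.

```python
import numpy as np

def regularized_inverse(delta, t_reg=1e-2):
    """Approximate delta^{-1} as V diag(lam_inv) V^*, zeroing small eigenvalues."""
    lam, V = np.linalg.eigh(delta)            # delta = V diag(lam) V^*
    threshold = t_reg * lam.max()             # relative regularization scalar
    lam_inv = np.zeros_like(lam)
    keep = (lam >= threshold) & (lam > 0.0)   # also guards against division by zero
    lam_inv[keep] = 1.0 / lam[keep]
    return (V * lam_inv) @ V.conj().T

def unmixing_matrix(E, D, t_reg=1e-2):
    """Combined decoding mode: U = E D^* J with J ~ (D E D^*)^{-1}."""
    J = regularized_inverse(D @ E @ D.conj().T, t_reg)
    return E @ D.conj().T @ J

# Three uncorrelated objects, downmixed to two channels.
E = np.diag([1.0, 0.5, 0.25])
D = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
U = unmixing_matrix(E, D)

# An ill-conditioned matrix: the tiny eigenvalue is discarded, not inverted.
J_reg = regularized_inverse(np.diag([1.0, 1e-6]))
```

In this example the first downmix channel is split between the first two objects in proportion to their energies, and re-applying the downmix to the unmixed signals reproduces the downmix itself.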

20.2.3 Decorrelation

The decorrelated signals $X_d$ are created by the decorrelator described in 6.6.2 of ISO/IEC 23003-1:2007, with bsDecorrConfig == 0 and a decorrelator index X according to the tables in Figs. 19 to 24. Here, decorrfunc() denotes the decorrelation process:

$$X_d = \mathrm{decorrfunc}(M_{pre} Y_{dry}).$$
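
The premix, decorrelate, postmix chain can be sketched as follows. This is an illustration under stated assumptions, not the decorrelator of ISO/IEC 23003-1: `toy_decorrelator` is a hypothetical stand-in that merely delays each channel by a different number of samples, and the postmixing matrix is assumed to be the pseudo-inverse of $M_{pre}$, which equals $M_{pre}^{*}(M_{pre} M_{pre}^{*})^{-1}$ when $M_{pre}$ has full row rank.

```python
import numpy as np

def toy_decorrelator(x, delays):
    """Stand-in for decorrfunc(): delay each channel by its own number of samples."""
    out = np.zeros_like(x)
    for ch, d in enumerate(delays):
        out[ch, d:] = x[ch, :x.shape[1] - d]
    return out

def decorrelate_with_premix(y_dry, m_pre, delays):
    """Premix N signals to K, decorrelate K signals, postmix back to N signals."""
    m_post = np.linalg.pinv(m_pre)            # assumed postmixing matrix
    x_d = toy_decorrelator(m_pre @ y_dry, delays)
    return x_d, m_post @ x_d

# Four rendered channels share two decorrelators (K = 2 < N = 4).
rng = np.random.default_rng(1)
y_dry = rng.standard_normal((4, 32))
m_pre = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
x_d, y_wet = decorrelate_with_premix(y_dry, m_pre, delays=[3, 7])
```

Only two decorrelator instances run in this example, which is the complexity saving that the premix is meant to provide; the price is that channels sharing a decorrelator receive the same decorrelated signal.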

20.2.4 Mixing Matrix P

The calculation of the mixing matrix $P = (P_{dry} \;\; P_{wet})$ is controlled by the bitstream element bsDecorrelationMethod. The matrix $P$ has size $N_{out} \times 2N_{out}$, and $P_{dry}$ and $P_{wet}$ both have size $N_{out} \times N_{out}$.

20.2.4.1 Energy Compensation Mode

The energy compensation mode uses decorrelated signals to compensate for the loss of energy in the parametric reconstruction. The mixing matrices $P_{dry}$ and $P_{wet}$ are given by $P_{dry} = I$ and by a matrix $P_{wet}$ whose definition involves $\lambda_{Dec} = 4$, a constant used to limit the amount of decorrelated component added to the output signals.
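
The formula for $P_{wet}$ in this mode is not reproduced in this text. The sketch below shows one plausible form of per-channel energy compensation, offered purely as an assumption: a diagonal gain that fills the gap between the target energy and the energy of the dry signal, normalized by the energy of the wet signal and limited by $\lambda_{Dec}$. The function name, the argument names and the small constant `eps` are hypothetical.

```python
import numpy as np

def energy_compensation_gains(c, e_dry, e_wet, lambda_dec=4.0, eps=1e-9):
    """Diagonal P_wet that fills the per-channel energy gap, capped by lambda_dec."""
    missing = np.maximum(0.0, np.real(np.diag(c) - np.diag(e_dry)))
    ratio = missing / np.maximum(eps, np.real(np.diag(e_wet)))
    return np.diag(np.sqrt(np.minimum(lambda_dec, ratio)))

# Channel 0 lacks energy, channel 1 has too much, channel 2 lacks a lot.
C = np.diag([1.0, 2.0, 1.0])          # target covariance
E_dry = np.diag([0.5, 2.5, 0.0])      # covariance of the dry signals
E_wet = np.diag([0.5, 1.0, 0.1])      # covariance of the decorrelated signals
P_wet = energy_compensation_gains(C, E_dry, E_wet)
```

In the third channel the uncapped energy ratio would be 10, so the limit of 4 applies and the gain becomes 2.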

20.2.4.2 Limited Covariance Adjustment Mode

The limited covariance adjustment mode ensures that the covariance matrix of the mixed decorrelated signals $P_{wet} Y_{wet}$ approximates the difference covariance matrix $\Delta_E$:

$$P_{wet} E_y^{wet} P_{wet}^{*} \approx \Delta_E.$$

The mixing matrices $P_{dry}$ and $P_{wet}$ are defined using the following equations:

$$P_{dry} = I, \qquad P_{wet} = \left(V_1 \sqrt{Q_1}\, V_1^{*}\right)\left(V_2 \sqrt{Q_2^{inv}}\, V_2^{*}\right),$$

where the regularized inverse $Q_2^{inv}$ of the diagonal singular value matrix $Q_2$ is computed as

$$q^{inv}_{2,i,i} = \begin{cases} 1/q_{2,i,i}, & q_{2,i,i} \geq T_{reg}^{Q}, \\ 0, & \text{otherwise}. \end{cases}$$

The relative regularization scalar $T_{reg}^{Q}$ is determined using the absolute threshold $T_{reg}$ and the maximum value of $Q_2$ as

$$T_{reg}^{Q} = \max_i(q_{2,i,i}) \, T_{reg}.$$

The matrix $\Delta_E$ is decomposed using the singular value decomposition as

$$\Delta_E = V_1 Q_1 V_1^{*}.$$

The covariance matrix $E_y^{wet}$ of the decorrelated signals is also expressed using the singular value decomposition:

$$E_y^{wet} = V_2 Q_2 V_2^{*}.$$
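
One construction that meets the stated goal of this mode can be checked numerically. The sketch assumes that $P_{wet}$ is the product of a matrix square root of $\Delta_E$ and a regularized inverse square root of the covariance of the decorrelated signals; it uses `numpy.linalg.eigh`, clips negative eigenvalues of $\Delta_E$ to zero, and uses a placeholder threshold `t_reg`. All of these choices, and the function name, are assumptions made for illustration.

```python
import numpy as np

def limited_covariance_p_wet(delta_e, e_wet, t_reg=1e-2):
    """P_wet = (V1 sqrt(Q1) V1^*) (V2 sqrt(Q2_inv) V2^*) for Hermitian inputs."""
    q1, v1 = np.linalg.eigh(delta_e)
    q1 = np.clip(q1, 0.0, None)               # assumption: drop negative parts
    q2, v2 = np.linalg.eigh(e_wet)
    q2_inv = np.zeros_like(q2)
    keep = (q2 >= t_reg * q2.max()) & (q2 > 0.0)
    q2_inv[keep] = 1.0 / q2[keep]              # regularized inverse of Q2
    left = (v1 * np.sqrt(q1)) @ v1.conj().T
    right = (v2 * np.sqrt(q2_inv)) @ v2.conj().T
    return left @ right

# Positive definite example matrices (values chosen for illustration only).
delta_E = np.array([[0.5, 0.1],
                    [0.1, 0.3]])
E_wet = np.array([[1.0, 0.2],
                  [0.2, 0.8]])
P_wet = limited_covariance_p_wet(delta_E, E_wet)
achieved = P_wet @ E_wet @ P_wet.conj().T      # covariance of the mixed wet signals
```

For well-conditioned inputs such as these, the achieved covariance matches the difference covariance exactly; with regularization active, the match holds only in the retained subspace.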

20.2.4.3 General Covariance Adjustment Mode

The general covariance adjustment mode ensures that the covariance matrix of the final output signals approximates the target covariance matrix:

[equation not preserved in this text]

The mixing matrix P is defined using the following equation:

[equation not preserved in this text]

where the regularized inverse of the diagonal singular value matrix Q_2 is calculated as follows:

[equation not preserved in this text]

The relative regularization scalar is determined using the absolute threshold T_reg and the maximum value of [symbol not preserved], as follows:

[equation not preserved in this text]

The target covariance matrix C is decomposed using singular value decomposition as follows:

[equation not preserved in this text]

The covariance matrix of the combined signals is also expressed using singular value decomposition:

[equation not preserved in this text]

The matrix H represents a prototype weighting matrix of size N_out × 2N_out and is given by the following equation:

[equation not preserved in this text]
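The goal stated for this mode, an output covariance close to the target covariance C, can be checked numerically. The sketch below assumes real-valued signals and the standard covariance-domain relation in which a mixing matrix P applied to signals with covariance E yields the output covariance P E Pᵀ; it does not reproduce the construction of P from this section. All matrices and helper names are hypothetical examples.

```python
# Sketch only: measure how well a mixing matrix P reaches a target covariance C.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def output_covariance(p, e_com):
    """Covariance of P * y when y has covariance e_com (real-valued case)."""
    return matmul(matmul(p, e_com), transpose(p))

def frobenius_error(a, b):
    return sum((x - y) ** 2
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)) ** 0.5

# Combined (dry + wet) signals assumed uncorrelated with unit energy.
e_com = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
p = [[1.0, 0.0, 1.0, 0.0],      # N_out x 2*N_out, hypothetical
     [0.0, 1.0, 0.0, 2.0]]
c_target = [[2.0, 0.0], [0.0, 5.0]]

c_out = output_covariance(p, e_com)
err = frobenius_error(c_out, c_target)
```

For complex-valued subband signals the transpose would be replaced by the conjugate transpose.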

20.2.4.4 Introduced Covariance Matrices

The matrix Δ_E represents the difference between the target output covariance matrix C and the covariance matrix of the parametrically reconstructed signals, and is given by:

[equation not preserved in this text]

The covariance matrix of the parametrically estimated signals is defined using the following equation:

[equation not preserved in this text]

The covariance matrix of the decorrelated signals is defined using the following equation:

[equation not preserved in this text]

Considering a signal Y_com that consists of a combination of the parametrically estimated signals and the decorrelated signals, as follows:

[equation not preserved in this text]

the covariance matrix of Y_com is defined by the following equation:

[equation not preserved in this text]
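The relationships between these covariance matrices can be illustrated with a small numeric sketch. The defining equations are not preserved in this text, so the sketch makes three assumptions: the signals are real-valued, a covariance matrix is estimated as the product of a signal matrix with its own transpose, and Y_com stacks the parametrically estimated ("dry") signals above the decorrelated ("wet") signals. The target matrix and all sample values are hypothetical.

```python
# Sketch only: sample covariance estimates for dry, wet and combined signals.

def covariance(signals):
    """Estimate Y * Y^T for real-valued signals given as channels x samples."""
    n = len(signals)
    return [[sum(a * b for a, b in zip(signals[i], signals[j]))
             for j in range(n)] for i in range(n)]

def matrix_difference(a, b):
    return [[x - y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

y_dry = [[1.0, 0.0], [0.0, 2.0]]    # parametrically estimated signals
y_wet = [[0.0, 1.0], [1.0, 0.0]]    # decorrelated signals
y_com = y_dry + y_wet               # combined signal, 2*N_out channels

e_dry = covariance(y_dry)
e_wet = covariance(y_wet)
e_com = covariance(y_com)           # 2*N_out x 2*N_out

c_target = [[2.0, 0.0], [0.0, 5.0]]            # hypothetical target covariance C
delta_e = matrix_difference(c_target, e_dry)   # energy still missing after reconstruction
```

The upper-left block of `e_com` equals `e_dry` and the lower-right block equals `e_wet`; the off-diagonal blocks hold the cross-covariance between dry and wet signals.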

21 Implementation Alternatives

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item, or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

The encoded audio signals of the invention can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium, or the recorded medium is typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device, or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, that the invention be limited only by the scope of the patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

References

[BCC] C. Faller and F. Baumgarte, "Binaural Cue Coding - Part II: Schemes and applications," IEEE Trans. on Speech and Audio Proc., vol. 11, no. 6, Nov. 2003.

[Blauert] J. Blauert, "Spatial Hearing - The Psychophysics of Human Sound Localization", Revised Edition, The MIT Press, London, 1997.

[JSC] C. Faller, "Parametric Joint-Coding of Audio Sources", 120th AES Convention, Paris, 2006.

[ISS1] M. Parvaix and L. Girin: "Informed Source Separation of underdetermined instantaneous Stereo Mixtures using Source Index Embedding", IEEE ICASSP, 2010.

[ISS2] M. Parvaix, L. Girin, J.-M. Brossier: "A watermarking-based method for informed source separation of audio signals with a single sensor", IEEE Transactions on Audio, Speech and Language Processing, 2010.

[ISS3] A. Liutkus and J. Pinel and R. Badeau and L. Girin and G. Richard: "Informed source separation through spectrogram coding and data embedding", Signal Processing Journal, 2011.

[ISS4] A. Ozerov, A. Liutkus, R. Badeau, G. Richard: "Informed source separation: source coding meets source separation", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2011.

[ISS5] S. Zhang and L. Girin: "An Informed Source Separation System for Speech Signals", INTERSPEECH, 2011.

[ISS6] L. Girin and J. Pinel: "Informed Audio Source Separation from Compressed Linear Stereo Mixtures", AES 42nd International Conference: Semantic Audio, 2011.

[MPS] ISO/IEC, "Information technology - MPEG audio technologies - Part 1: MPEG Surround," ISO/IEC JTC1/SC29/WG11 (MPEG) International Standard 23003-1:2006.

[OCD] J. Vilkamo, T. Bäckström, and A. Kuntz. "Optimized covariance domain framework for time-frequency processing of spatial audio", Journal of the Audio Engineering Society, 2013. in press.

[SAOC1] J. Herre, S. Disch, J. Hilpert, O. Hellmuth: "From SAC To SAOC - Recent Developments in Parametric Coding of Spatial Audio", 22nd Regional UK AES Conference, Cambridge, UK, April 2007.

[SAOC2] J. Engdegård, B. Resch, C. Falch, O. Hellmuth, J. Hilpert, A. Hölzer, L. Terentiev, J. Breebaart, J. Koppens, E. Schuijers and W. Oomen: "Spatial Audio Object Coding (SAOC) - The Upcoming MPEG Standard on Parametric Object Based Audio Coding", 124th AES Convention, Amsterdam 2008.

[SAOC] ISO/IEC, "MPEG audio technologies - Part 2: Spatial Audio Object Coding (SAOC)," ISO/IEC JTC1/SC29/WG11 (MPEG) International Standard 23003-2.

International Patent No. WO/2006/026452, "MULTICHANNEL DECORRELATION IN SPATIAL AUDIO CODING" issued on 9 March 2006.

100: Multi-channel audio decoder
110: Encoded representation
112, 114: Output audio signals
120: Decoder
122: Decoded audio signals
130: Renderer
132: Rendering parameters
134, 136: Rendered audio signals
140: Decorrelator
142, 144: Decorrelated audio signals
150: Combiner
200: Multi-channel audio encoder
210, 212: Input audio signals
214: Encoded representation
220: Downmix signal provider
222: Downmix signal
230: Parameter provider
232: Parameter
240: Decorrelation method parameter provider
242: Decorrelation method parameter
500: Encoded audio representation
510: Encoded representation of the downmix signal
520: Encoded representation of the parameters
530: Encoded decorrelation method parameter
600: Multi-channel decorrelator
610a-610n: N decorrelator input signals
612a-612n': N' decorrelator output signals
620: Premixer
622a-622k: K decorrelator input signals
630: Decorrelator core
640: Postmixer
700: Multi-channel audio decoder
710: Encoded representation
712, 714: Output signals
720: Multi-channel decorrelator
800: Multi-channel audio encoder
810, 812: Input audio signals
814: Encoded representation of the audio content
820: Downmix signal provider
822: Downmix signal
830: Parameter provider
832: Parameter
840: Decorrelation complexity parameter provider
842: Decorrelation complexity parameter
1012: Encoded representation
1014, 1016: Output audio signals
1112, 1114: Input audio signals
1200: Encoded audio representation
1210: Encoded representation of the downmix signal
1220: Encoded representation of the parameters
1230: Encoded decorrelation complexity parameter
1310: Encoder
1312a-1312n: Object signals
1314: Mixing parameters
1316a, 1316b: Downmix signals
1318: Side information
1320: Mixer
1330: Side information estimator
1340: Decoder
1352a-1352n: Output audio signals
1354: User interaction information
1360: Parametric object separator
1360a, 1360b: Downmix signals
1362a-1362n: Object signals
1370: Side information processor
1372: Control information
1380: Renderer
1510: Encoder
1512a-1512n: Object signals
1514: Mixing parameters
1516a, 1516b: Downmix signals
1518: Side information
1550: Decoder
1552a-1552n: Output audio signals
1560: Parametric object separator
1570: Side information processor
1580: Renderer
1582a-1582n: Rendered audio signals
1590: Decorrelator
1592a-1592n: Decorrelated audio signals
1598: Mixer
1600: Decorrelation unit
1610a-1610n: N decorrelator input signals
1612a-1612n: N decorrelator output signals
1620a-1620n: N individual decorrelators
1700: Decorrelation unit
1710a-1710n: N decorrelator input signals
1712a-1712n: N decorrelator output signals
1720: Premixer
1722a-1722k: K decorrelator input signals
1730: Decorrelator core
1732a-1732k: Decorrelator output signals
1740: Postmixer
2900: 3D audio encoder
2910: Pre-renderer/mixer
2912: Channel signals
2914: Object signals
2916: Channel signals
2918, 2920: Object signals
2930: USAC encoder
2932: Encoded representation
2940: Spatial audio object coding (SAOC) encoder
2942: SAOC transport channels
2944: SAOC side information
2950: Object metadata encoder
2952: Object metadata
2954: Encoded object metadata
3000: Audio decoder
3010: Encoded representation
3012: Multi-channel loudspeaker signals
3014: Headphone signals
3016: Loudspeaker signals
3020: USAC decoder
3022: Channel signals
3024: Prerendered object signals
3025: Object signals
3026: Object signals
3028: SAOC transport channels
3030: SAOC side information
3032: Compressed object metadata information
3040: Renderer
3042: Rendered object signals
3044: Object metadata information
3050: Object metadata decoder
3062: Compressed object metadata information
3070: Mixer
3072: Mixed channel signals
3080: Binaural renderer
3092: Reproduction layout information
3090: Format conversion
3100: Downmix processor
3110: Upmixer
3120: Renderer
3130: Combiner
3140: Multi-channel decorrelator
3150: Premixer
3160: Decorrelator core
3170: Postmixer

Claims

1. A multi-channel decorrelator (140; 600; 1590; 1700) for providing a plurality of decorrelated signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n) on the basis of a plurality of decorrelator input signals (134, 136; 610a-610n; 1582a-1582n; 1710a-1710n),
wherein the multi-channel decorrelator is configured to premix a first set of N decorrelator input signals (134, 136; 610a-610n; 1582a-1582n; 1710a-1710n) into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), where K < N;
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) on the basis of the second set of K decorrelator input signals;
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), where N' > K';
wherein the multi-channel decorrelator is configured to premix the first set of N decorrelator input signals into the second set of K decorrelator input signals using a premixing matrix M_pre according to:

[equation not preserved in this text]

wherein the multi-channel decorrelator is configured to obtain the first set of K' decorrelator output signals on the basis of the second set of K decorrelator input signals;
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into the second set (W) of N' decorrelator output signals using a postmixing matrix M_post according to:

[equation not preserved in this text]

and wherein the multi-channel decorrelator is configured to select the premixing matrix M_pre in dependence on spatial positions associated with the channel signals of the first set of N decorrelator input signals.

2. The multi-channel decorrelator of claim 1, wherein K = K'.

3. The multi-channel decorrelator of claim 1, wherein N = N'.

4. The multi-channel decorrelator of claim 1, wherein N >= 3 and N > [remainder truncated in this text]

5. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to select the premixing matrix M_pre in dependence on correlation characteristics or covariance characteristics of the channel signals of the first set of N decorrelator input signals.

6. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to determine the premixing matrix such that the matrix product

[expression not preserved in this text]

is well-conditioned with respect to an inversion operation.

7. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to obtain the postmixing matrix M_post according to:

[equation not preserved in this text]

8. [Opening of the claim truncated in this text] wherein the multi-channel decorrelator is configured to receive information about a rendering configuration associated with the channel signals, and wherein the multi-channel decorrelator is configured to select the premixing matrix M_pre in dependence on the information about the rendering configuration.

9. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to combine, when performing the premixing, channel signals of the first set of N decorrelator input signals which are associated with spatially adjacent positions of an audio scene.

10. The multi-channel decorrelator of claim 9, wherein the multi-channel decorrelator is configured to combine, when performing the premixing, channel signals of the first set of N decorrelator input signals which are associated with vertically spatially adjacent positions of the audio scene.

11. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to combine channel signals of the first set of N decorrelator input signals which are associated with a horizontal pair of spatial positions comprising a left-side position and a right-side position.

12. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to combine, when performing the premixing, channel signals of the first set of N decorrelator input signals which are associated with vertically spatially adjacent positions of the audio scene.

13. The multi-channel decorrelator of claim 12, wherein the at least two left-side channel signals to be combined are associated with spatial positions that are symmetrical to the spatial positions associated with the at least two right-side channel signals to be combined.

14. The multi-channel decorrelator of claim 1, wherein the multi-channel decorrelator is configured to receive complexity information describing the number K of decorrelator input signals of the second set of decorrelator input signals, and wherein the multi-channel decorrelator is configured to select the premixing matrix M_pre in dependence on the complexity information.

15. The multi-channel decorrelator of claim 14, wherein the multi-channel decorrelator is configured to gradually increase, as the value of the complexity information decreases, the number of decorrelator input signals of the first set of decorrelator input signals that are combined to obtain the decorrelator input signals of the second set of decorrelator input signals.

16. The multi-channel decorrelator of claim 14, wherein the multi-channel decorrelator is configured to combine, when performing the premixing for a first value of the complexity information, only channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the left side of the audio scene; and
wherein the multi-channel decorrelator is configured to combine, when performing the premixing for a second value of the complexity information, at least two channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the left side of the audio scene and at least two channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a given signal of the second set of decorrelator input signals.

17. The multi-channel decorrelator of claim 14, wherein the multi-channel decorrelator is configured to combine, when performing the premixing for a second value of the complexity information, at least four channel signals of the first set of decorrelator input signals in order to obtain a given signal of the second set of decorrelator input signals, wherein at least two of the at least four channel signals are associated with spatial positions on the left side of the audio scene, and wherein at least two of the at least four channel signals are associated with spatial positions on the right side of the audio scene.

18. The multi-channel decorrelator of claim 14, wherein the multi-channel decorrelator is configured, for a first value of the complexity information, to combine at least two channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the left side of the audio scene, in order to obtain a first decorrelator input signal of the second set of decorrelator input signals, and to combine at least two channel signals of the first set of decorrelator input signals which are associated with vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a second decorrelator input signal of the second set of decorrelator input signals;
wherein the multi-channel decorrelator is configured, for a second value of the complexity information, to combine the at least two channel signals associated with the vertically spatially adjacent positions on the left side of the audio scene and the at least two channel signals associated with the vertically spatially adjacent positions on the right side of the audio scene, in order to obtain a decorrelator input signal of the second set of decorrelator input signals; and
wherein the number of decorrelator input signals of the second set of decorrelator input signals is larger for the first value of the complexity information than for the second value of the complexity information.

19. A multi-channel audio decoder (100; 1550) for providing at least two output audio signals (112, 114; 1552a-1552n) on the basis of an encoded representation (110; 1516a, 1516b, 1518),
wherein the multi-channel audio decoder comprises a multi-channel decorrelator (140; 600; 1590; 1700) according to any one of claims 1 to 18.

20. The multi-channel audio decoder of claim 19,
wherein the multi-channel audio decoder is configured to render a plurality of decoded audio signals (122; 1562a-1562n), obtained on the basis of the encoded representation, in order to obtain a plurality of rendered audio signals (134, 136; 1582a-1582n);
wherein the multi-channel audio decoder is configured to derive one or more decorrelated audio signals (142, 144; 1592a-1592n) from the rendered audio signals using the multi-channel decorrelator, wherein the second set of decorrelator output signals constitutes the decorrelated audio signals; and
wherein the multi-channel audio decoder is configured to combine the rendered audio signals, or a scaled version thereof, with the one or more decorrelated audio signals in order to obtain the output audio signals.

21. The multi-channel audio decoder of claim 19, wherein the multi-channel audio decoder is configured to select the premixing matrix M_pre for use by the multi-channel decorrelator in dependence on control information included in the encoded representation.

22. The multi-channel audio decoder of claim 19, wherein the multi-channel audio decoder is configured to select the premixing matrix M_pre for use by the multi-channel decorrelator in dependence on an output configuration describing an allocation of the output audio signals to spatial positions of the audio scene.

23. The multi-channel audio decoder of claim 19, wherein the multi-channel audio decoder is configured to select, for a given output configuration, between three or more different premixing matrices M_pre for use by the multi-channel decorrelator in dependence on control information included in the encoded representation, wherein each of the three or more different premixing matrices is associated with a different number of signals of the second set of decorrelator input signals.

24. The multi-channel audio decoder of claim 19, wherein the multi-channel audio decoder is configured to select the premixing matrix M_pre for use by the multi-channel decorrelator in dependence on a mixing matrix (Dconv, Drender) used by a format converter or renderer that receives the at least two output audio signals.

25. The multi-channel audio decoder of claim 24, wherein the multi-channel audio decoder is configured to select the premixing matrix M_pre for use by the multi-channel decorrelator to be equal to the mixing matrix (Dconv, Drender) used by the format converter or renderer that receives the at least two output audio signals.

26. A multi-channel audio encoder (800) for providing an encoded representation (814) on the basis of at least two input audio signals (810, 812),
wherein the multi-channel audio encoder is configured to provide one or more downmix signals (822) on the basis of the at least two input audio signals;
wherein the multi-channel audio encoder is configured to provide one or more parameters (832) describing a relationship between the at least two input audio signals;
wherein the multi-channel audio encoder is configured to provide a decorrelation complexity parameter (842) describing the complexity of the decorrelator to be used at the side of an audio decoder;
wherein the decorrelation complexity parameter determines the number K of decorrelators used in a multi-channel decorrelator that premixes a first set of N decorrelator input signals into a second set of K decorrelator input signals;
wherein the decorrelation complexity parameter determines the selection of a premixing matrix used in the multi-channel decorrelator to premix the first set of N decorrelator input signals into the second set of K decorrelator input signals.

A method (900) for providing a plurality of decorrelated signals based on a plurality of decorrelator input signals, the method comprising:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the first set of N decorrelator input signals is premixed into the second set of K decorrelator input signals using a pre-mixing matrix (M_pre) as follows: [formula not reproduced in the source text],
wherein the first set of K' decorrelator output signals is obtained based on the second set of K decorrelator input signals,
wherein the first set of K' decorrelator output signals is upmixed into the second set (W) of N' decorrelator output signals using a post-mixing matrix (M_post) as follows: [formula not reproduced in the source text], and
wherein the pre-mixing matrix (M_pre) is selected in dependence on spatial positions with which the channel signals of the first set of N decorrelator input signals are associated.
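The three steps of the method (pre-mixing, decorrelating, upmixing) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper names, the example matrices, and in particular the use of a plain delay as a stand-in decorrelator, which the claims do not specify.

```python
from typing import List

Signal = List[float]          # one channel: a list of samples
Matrix = List[List[float]]    # row-major mixing matrix


def mix(matrix: Matrix, signals: List[Signal]) -> List[Signal]:
    """Apply a mixing matrix: out[r][t] = sum over c of matrix[r][c] * signals[c][t]."""
    length = len(signals[0])
    return [
        [sum(row[c] * signals[c][t] for c in range(len(signals))) for t in range(length)]
        for row in matrix
    ]


def delay_decorrelator(signal: Signal, delay: int) -> Signal:
    """Stand-in decorrelator (assumption): delay the signal by `delay` samples."""
    return ([0.0] * delay + signal)[: len(signal)]


def decorrelate(inputs: List[Signal], m_pre: Matrix, m_post: Matrix) -> List[Signal]:
    premixed = mix(m_pre, inputs)  # step 910: N -> K signals, K < N
    # step 920: one decorrelator per premixed signal (here K' = K)
    outputs = [delay_decorrelator(s, k + 1) for k, s in enumerate(premixed)]
    return mix(m_post, outputs)    # step 930: K' -> N' signals, N' > K'


# Hypothetical example: N = 4 inputs, K = K' = 2, N' = 4.
m_pre = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0]]
m_post = [[0.5, 0.0],
          [0.5, 0.0],
          [0.0, 0.5],
          [0.0, 0.5]]
inputs = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
w = decorrelate(inputs, m_pre, m_post)
```

Only two decorrelators run here instead of four, which is the complexity saving that the pre-mixing is meant to provide.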

A method (1000) for providing at least two output audio signals based on an encoded representation,
wherein the method comprises providing (1020) a plurality of decorrelated signals based on a plurality of decorrelator input signals using the method according to claim 27.

A method (1100) for providing an encoded representation based on at least two input audio signals, the method comprising:
providing (1110) one or more downmix signals based on the at least two input audio signals;
providing (1120) one or more parameters describing a relationship between the at least two input audio signals; and
providing (1130) a decorrelation complexity parameter (842) describing the complexity of the decorrelation to be used on the side of an audio decoder,
wherein the decorrelation complexity parameter determines the number K of decorrelators used in a multi-channel decorrelator that premixes a first set of N decorrelator input signals into a second set of K decorrelator input signals,
wherein the decorrelation complexity parameter determines the selection of the pre-mixing matrix used to premix the first set of N decorrelator input signals into the second set of K decorrelator input signals in the multi-channel decorrelator.

28. A computer program stored on a computer-readable recording medium for executing the method of claim 27, 28 or 29 when the computer program runs on a computer.

13. A digital storage medium storing an encoded audio representation (1200), the encoded audio representation comprising:
an encoded representation (1210) of a downmix signal;
an encoded representation (1220) of one or more parameters describing a relationship between at least two input audio signals; and
an encoded decorrelation complexity parameter describing the complexity of the decorrelation to be used on the side of an audio decoder,
wherein the decorrelation complexity parameter determines the number K of decorrelators used in a multi-channel decorrelator that premixes a first set of N decorrelator input signals into a second set of K decorrelator input signals,
wherein the decorrelation complexity parameter determines the selection of the pre-mixing matrix used to premix the first set of N decorrelator input signals into the second set of K decorrelator input signals in the multi-channel decorrelator.

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k),
wherein the multi-channel decorrelator is configured to obtain a first set of K' decorrelator output signals based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals using a post-mixing matrix (M_post), and
wherein the multi-channel decorrelator is configured to select the pre-mixing matrix (M_pre) in dependence on correlation characteristics or covariance characteristics of the channel signals of the first set of N decorrelator input signals.

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k),
wherein the multi-channel decorrelator is configured to obtain a first set of K' decorrelator output signals based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set (W) of N' decorrelator output signals using a post-mixing matrix (M_post), and
wherein the multi-channel decorrelator is configured to obtain the post-mixing matrix (M_post) as follows: [formula not reproduced in the source text].

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), wherein K < N,
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), wherein N' > K', and
wherein the multi-channel decorrelator is configured to receive information about a rendering configuration associated with the channel signals of the first set of N decorrelator input signals, and to select the pre-mixing matrix (M_pre) in dependence on the information about the rendering configuration.

delete

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k),
wherein at least two of the at least four channel signals are associated with spatial positions on a left side of an audio scene, and wherein at least two of the at least four channel signals are associated with spatial positions on a right side of the audio scene.
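A pre-mixing matrix that combines channel signals from both sides of the audio scene can be built from a list of channel groups. The sketch below is illustrative only: the channel labels, the grouping, and the unit weights are assumptions, not values taken from the claims.

```python
from typing import List


def premix_matrix_from_groups(groups: List[List[str]], channels: List[str]) -> List[List[float]]:
    """Build a K x N matrix with one row per group and a 1.0 for each member channel."""
    index = {name: i for i, name in enumerate(channels)}
    matrix = [[0.0] * len(channels) for _ in groups]
    for row, group in enumerate(groups):
        for name in group:
            matrix[row][index[name]] = 1.0
    return matrix


# Hypothetical layout: two left channels (L, Ls), two right channels (R, Rs), one center (C).
channels = ["L", "R", "Ls", "Rs", "C"]
# The first group combines four channel signals: two on the left side and two on the right side.
groups = [["L", "R", "Ls", "Rs"], ["C"]]
m_pre = premix_matrix_from_groups(groups, channels)
```

With this grouping, N = 5 input signals are reduced to K = 2 decorrelator inputs.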

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), wherein K < N,
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), wherein N' > K', and
wherein the multi-channel decorrelator is configured to receive complexity information describing the number K of decorrelator input signals of the second set of decorrelator input signals, and to select the pre-mixing matrix (M_pre) in dependence on the complexity information.

A multi-channel audio decoder (100; 1550) for providing at least two output audio signals (112, 114; 1552a-1552n) based on an encoded representation (110; 1516a, 1516b, 1518),
wherein the multi-channel audio decoder comprises a multi-channel decorrelator (140; 600; 1590; 1700) for providing a plurality of decorrelated signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n) based on a plurality of decorrelator input signals (134, 136; 610a-610n; 1582a-1582n; 1710a-1710n),
wherein the multi-channel decorrelator is configured to premix a first set of N decorrelator input signals (134, 136; 610a-610n; 1582a-1582n; 1710a-1710n) into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), wherein K < N,
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), wherein N' > K', and
wherein the multi-channel audio decoder is configured to select a pre-mixing matrix (M_pre) for use by the multi-channel decorrelator in dependence on an output configuration describing an allocation of the output audio signals to spatial positions of an audio scene.

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), wherein K < N,
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), wherein N' > K', and
wherein the multi-channel audio decoder is configured to select between three or more different pre-mixing matrices (M_pre) for use by the multi-channel decorrelator in dependence on control information included in the encoded representation for a given output configuration, wherein each of the three or more different pre-mixing matrices is associated with a different number of signals of the second set of decorrelator input signals.
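Selecting between three or more pre-mixing matrices on the basis of control information can be pictured as a table lookup. The following is a minimal sketch under stated assumptions: the table contents, the integer `complexity_level` key, and the function name are hypothetical and do not reflect any actual bitstream syntax.

```python
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical table for one output configuration with N = 4 decorrelator inputs.
# Each entry yields a different number K of premixed signals (K = number of rows, K < N).
PREMIX_TABLE: Dict[int, Matrix] = {
    0: [[1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]],   # K = 3, highest complexity
    1: [[1.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0]],   # K = 2
    2: [[1.0, 1.0, 1.0, 1.0]],   # K = 1, lowest complexity
}


def select_premix_matrix(complexity_level: int) -> Matrix:
    """Return the pre-mixing matrix signalled by the (assumed) control information."""
    if complexity_level not in PREMIX_TABLE:
        raise ValueError(f"unknown complexity level: {complexity_level}")
    return PREMIX_TABLE[complexity_level]


# Number of decorrelators needed at each level.
k_values = [len(select_premix_matrix(level)) for level in sorted(PREMIX_TABLE)]
```

A decoder supporting several output configurations would hold one such table per configuration; that outer lookup is omitted here.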

[...] into a second set of K decorrelator input signals (622a-622k; 1722a-1722k), wherein K < N,
wherein the multi-channel decorrelator is configured to provide a first set of K' decorrelator output signals (632a-632k'; 1732a-1732k) based on the second set of K decorrelator input signals,
wherein the multi-channel decorrelator is configured to upmix the first set of K' decorrelator output signals into a second set of N' decorrelator output signals (142, 144; 612a-612n'; 1592a-1592n; 1712a-1712n), wherein N' > K', and
wherein the multi-channel audio decoder is configured to select a pre-mixing matrix (M_pre) for use by the multi-channel decorrelator in dependence on a mixing matrix (Dconv, Drender) used by a format converter or renderer that receives the at least two output audio signals.

A method (900) for providing a plurality of decorrelated signals based on a plurality of decorrelator input signals, the method comprising:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the first set of N decorrelator input signals is premixed into the second set of K decorrelator input signals as follows: [formula not reproduced in the source text],
wherein the first set of K' decorrelator output signals is obtained based on the second set of K decorrelator input signals,
wherein the first set of K' decorrelator output signals is upmixed as follows: [formula not reproduced in the source text], and
wherein the pre-mixing matrix is selected in dependence on correlation characteristics or covariance characteristics of the channel signals of the first set of N decorrelator input signals.

[...] is premixed as follows: [formula not reproduced in the source text],
wherein the first set of K' decorrelator output signals is obtained based on the second set of K decorrelator input signals,
wherein the first set of K' decorrelator output signals is upmixed into a second set (W) of N' decorrelator output signals using a post-mixing matrix (M_post) as follows: [formula not reproduced in the source text], and
wherein the post-mixing matrix (M_post) is obtained as follows: [formula not reproduced in the source text].
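The formulas of this claim are missing from the text. As a hedged illustration only, and not as the claimed formulas, one natural way to obtain a post-mixing matrix from a pre-mixing matrix is the Moore-Penrose pseudo-inverse. The signal symbols $\hat{Z}$, $\hat{Z}_{mix}$ and $\hat{Z}_{mix}^{dec}$ below are assumed names for the first set of N decorrelator input signals, the second set of K decorrelator input signals, and the first set of K' decorrelator output signals.

```latex
\begin{align}
  \hat{Z}_{mix} &= M_{pre}\,\hat{Z}
      && \text{(premix: } N \to K \text{ signals)} \\
  W &= M_{post}\,\hat{Z}_{mix}^{dec}
      && \text{(upmix: } K' \to N' \text{ signals)} \\
  M_{post} &= M_{pre}^{H}\left(M_{pre}\,M_{pre}^{H}\right)^{-1}
      && \text{(assumed pseudo-inverse form)}
\end{align}
```

For example, with $M_{pre} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$ the product $M_{pre} M_{pre}^{H}$ equals $2I$, so this form gives $M_{post} = \tfrac{1}{2} M_{pre}^{H}$.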

A method (900) for providing a plurality of decorrelated signals based on a plurality of decorrelator input signals, the method comprising:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the method comprises receiving information about a rendering configuration associated with the channel signals of the first set of N decorrelator input signals, and wherein the pre-mixing matrix (M_pre) is selected in dependence on the information about the rendering configuration.

delete

[...] wherein at least two of the at least four channel signals are associated with spatial positions on a left side of an audio scene, and wherein at least two of the at least four channel signals are associated with spatial positions on a right side of the audio scene.

A method (900) for providing a plurality of decorrelated signals based on a plurality of decorrelator input signals, the method comprising:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the method comprises receiving complexity information describing the number K of decorrelator input signals of the second set of decorrelator input signals, and wherein the pre-mixing matrix (M_pre) is selected in dependence on the complexity information.

A method (1000) for providing at least two output audio signals based on an encoded representation,
wherein the method comprises providing (1020) a plurality of decorrelated signals based on a plurality of decorrelator input signals,
wherein providing the plurality of decorrelated signals based on the plurality of decorrelator input signals comprises:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the method comprises selecting a pre-mixing matrix (M_pre) for use by a multi-channel decorrelator in dependence on an output configuration describing an allocation of the output audio signals to spatial positions of an audio scene.

A method (1000) for providing at least two output audio signals based on an encoded representation,
wherein the method comprises providing (1020) a plurality of decorrelated signals based on a plurality of decorrelator input signals,
wherein providing the plurality of decorrelated signals based on the plurality of decorrelator input signals comprises:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the method comprises selecting between three or more different pre-mixing matrices (M_pre) for use by a multi-channel decorrelator in dependence on control information included in the encoded representation for a given output configuration, wherein each of the three or more different pre-mixing matrices is associated with a different number of signals of the second set of decorrelator input signals.

A method (1000) for providing at least two output audio signals based on an encoded representation,
wherein the method comprises providing (1020) a plurality of decorrelated signals based on a plurality of decorrelator input signals,
wherein providing the plurality of decorrelated signals based on the plurality of decorrelator input signals comprises:
pre-mixing (910) a first set of N decorrelator input signals into a second set of K decorrelator input signals, wherein K < N;
providing (920) a first set of K' decorrelator output signals based on the second set of K decorrelator input signals; and
upmixing (930) the first set of K' decorrelator output signals into a second set of N' decorrelator output signals, wherein N' > K';
wherein the method comprises selecting a pre-mixing matrix (M_pre) for use by a multi-channel decorrelator in dependence on a mixing matrix (Dconv, Drender) used by a format converter or renderer that receives the at least two output audio signals.

A computer program stored on a computer-readable recording medium for executing the method of any one of claims 42 to 44 and 47 to 51 when the computer program runs on a computer.