KR20070031212A

KR20070031212A - Method and Apparatus for encoding/decoding audio signal

Info

Publication number: KR20070031212A
Application number: KR1020060004048A
Authority: KR
Inventors: 오현오; 방희석; 김동수; 임재현; 정양원
Original assignee: 엘지전자 주식회사
Priority date: 2005-09-14
Filing date: 2006-01-13
Publication date: 2007-03-19

Abstract

The present invention relates to a method and apparatus for encoding/decoding an audio signal that enable efficient processing of audio signals.

The present invention provides a method of decoding an audio signal, comprising: checking a channel configuration identifier; if the channel configuration is not predetermined, extracting information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated; and generating multiple channels using the first channel conversion module.

Therefore, according to the present invention, in multichannel audio coding the signal is compressed into a specific number of channels, such as mono or stereo, and side information expressed as spatial information is transmitted or stored together with it, which effectively reduces the amount of data. In addition, the bit stream containing the spatial information is configured efficiently, so the audio signal can be processed effectively.

Channel conversion, multichannel, spatial parameter

Description

Method and apparatus for encoding/decoding audio signal

FIG. 1 illustrates an embodiment for a conceptual description of an audio signal processing apparatus according to the present invention.

FIG. 2 illustrates an embodiment in which a down-mixed signal is up-mixed using channel conversion modules according to the present invention.

FIG. 3 illustrates an embodiment in which input channels are up-mixed to multiple channels to generate output channels according to the present invention.

FIG. 4 illustrates an embodiment of the spatial parameter configuration syntax according to the present invention.

FIG. 5 illustrates an embodiment of the channel configuration information syntax according to the present invention.

FIG. 6 illustrates an embodiment of the number of bits used to indicate the number of channel conversion boxes as a function of the number of input channels according to the present invention.

FIG. 7 illustrates an embodiment of the number of bits used to represent channel remapping information as a function of the number of input channels according to the present invention.

FIG. 8 illustrates an embodiment of a method of remapping channels using channel remapping information according to the present invention.

FIG. 9 illustrates another embodiment of the channel configuration information syntax according to the present invention.

FIG. 10 is a flowchart illustrating a method of decoding an audio signal according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

10: encoding apparatus
20: decoding apparatus
100: spatial encoder
101: downmix module
102: spatial parameter extraction module
120: audio encoder
130: audio decoder
140: spatial decoder
141: synthesis module
200: TTT module
210, 220, 230, 240, 250: OTT modules
300: channel remapping module

The present invention relates to a method and apparatus for encoding/decoding an audio signal and, more particularly, to an efficient bit stream configuration for multichannel audio coding schemes that use a downmix of the signal together with side information.

Standards for digital video and digital audio specify the compression and reconstruction of each signal. Standards for digital systems specify how compressed video and audio are each divided into packets of a fixed size, multiplexed together with added timing information, stream-related information, and the like, and transmitted; and, conversely, how demultiplexing recovers the timing and stream-related information and separates the compressed video and audio.

Recently, various coding techniques and methods for digital audio signals have been developed, and related products are being manufactured. Coding methods for multichannel audio signals that use a psychoacoustic model are also being developed, and standardization work on them is in progress.

The psychoacoustic model exploits the way humans perceive sound, for example the fact that a quiet sound following a loud sound is not heard and that only sounds at frequencies from 20 Hz to 20,000 Hz are audible. By removing the signal components that are unnecessary in the coding process, it effectively reduces the amount of data required.

Audio standards such as MPEG-1 Audio, MPEG-4 AAC (Advanced Audio Coding), and MPEG-4 HE-AAC (High-Efficiency AAC) have been developed and are in commercial use.

In addition, a coding method for multichannel audio signals called "MPEG Surround" is being developed. MPEG Surround uses a compressed stereo (or mono) audio signal together with a low-bit-rate spatial information channel to greatly improve the transmission efficiency of multichannel audio signals.

However, MPEG Surround spends bits on unnecessary parts when coding the spatial information of a multichannel audio signal, so the encoding, transmission, and decoding of the signal are inefficient.

The present invention has been made to solve the above problem. Its object is to provide encoding and decoding methods that, when coding a multichannel audio signal, represent the spatial information efficiently and thereby reduce the amount of data used for it, improving the compression and transmission efficiency of the multichannel audio signal.

To achieve the above object, the present invention provides a method of decoding an audio signal, comprising: (a) checking a channel configuration identifier; (b) if the channel configuration is not predetermined, extracting information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated; and (c) generating multiple channels using the first channel conversion module.

The present invention also provides an apparatus for decoding an audio signal, comprising: a module that checks a channel configuration identifier; a first number-information extraction unit that, if the channel configuration is not predetermined, extracts information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated; and a first channel conversion module that generates multiple channels using the extracted number information.

The present invention also provides a method of encoding an audio signal, comprising: if the channel configuration is not predefined, generating information on the number of first channel conversion modules, to which a number of bits corresponding to the number of output channels is allocated; and generating configuration information of a second channel conversion module according to the information on the number of first channel conversion modules.

The present invention also provides an apparatus for encoding an audio signal, comprising: a first generation unit that, if the channel configuration is not predetermined, generates information on the number of first channel conversion modules, to which a number of bits corresponding to the number of output channels is allocated; and a second generation unit that generates configuration information of a second channel conversion module according to the information on the number of first channel conversion modules.

The present invention also provides an audio signal that includes a channel configuration identifier and, if the channel configuration is not predetermined, information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated.

Therefore, according to the present invention, in multichannel audio coding the audio signal is compressed into a specific number of channels, such as mono or stereo, and the spatial information of the audio signal is transmitted or stored together with it, which effectively reduces the amount of data. In addition, the bit stream containing the spatial information is configured efficiently, so the audio signal can be processed effectively.

Preferred embodiments of the present invention that can concretely achieve the above object are described below with reference to the accompanying drawings.

Wherever possible, the terms used in the present invention are general terms in wide current use. In certain cases, however, terms have been chosen arbitrarily by the applicant; their meanings are described in detail in the corresponding part of the description. The present invention should therefore be understood through the meanings of the terms rather than their mere names.

In the present invention, "spatial information" means the information the decoding unit needs in order to receive the signal that the encoding unit produced by down-mixing multiple channels and transmitted, and to up-mix it to generate multiple channels. The spatial information is described here in terms of spatial parameters, but the present invention is not limited to them.

The spatial parameters include the CLD (Channel Level Difference), which represents the energy difference between two channels; the ICC (Inter Channel Coherences), which represents the correlation between two channels; and the CPC (Channel Prediction Coefficients), which are the prediction coefficients used when generating three channels from two channels.
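As a rough illustration of the first two parameters, the sketch below computes a level difference and a coherence for two short sample blocks. The text only states that CLD is an energy difference and ICC a correlation; the decibel scale, the normalization, the full-band (rather than per-band) computation, and the function names are assumptions made for this example.

```python
import math


def channel_level_difference(x, y, eps=1e-12):
    """Assumed CLD: energy ratio of channel x to channel y, in dB."""
    energy_x = sum(s * s for s in x)
    energy_y = sum(s * s for s in y)
    return 10.0 * math.log10((energy_x + eps) / (energy_y + eps))


def inter_channel_coherence(x, y, eps=1e-12):
    """Assumed ICC: normalized cross-correlation of the two channels."""
    energy_x = sum(s * s for s in x)
    energy_y = sum(s * s for s in y)
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt(energy_x * energy_y + eps)


# Hypothetical sample blocks: the right channel is the left one at half amplitude.
left = [1.0, 0.0, -1.0, 0.0]
right = [0.5, 0.0, -0.5, 0.0]
cld_db = channel_level_difference(left, right)  # about 6.02 dB
icc = inter_channel_coherence(left, right)      # close to 1.0 (fully coherent)
```

A decoder that receives only a downmix plus such parameters can use them to redistribute energy between the reconstructed channels; the CPC case is not sketched here.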

In the present invention, a "channel configuration identifier" is information indicating the channel configuration of a given signal. The channel configuration identifier determines whether the channel configuration is predetermined or not. The predetermined channel configurations are the 5-1-5 and 5-2-5 cases. "bsTreeConfig" is used as the example channel configuration identifier, but the present invention is not limited to it.

In the present invention, a "channel conversion module" is a module that converts a specific number of input channels into a specific number of output channels different from the number of input channels. One such module is called the first channel conversion module, and another is called the second channel conversion module.

For example, the first channel conversion module is described in terms of a TTT (Two to Three, hereinafter "TTT") module or TTT box, which converts two input channels into three output channels, and the second channel conversion module is described in terms of an OTT (One to Two, hereinafter "OTT") module or OTT box, which converts one input channel into two output channels.

However, the present invention is not limited to TTT and OTT modules; the first and second channel conversion modules apply equally to any number of input and output channels.

In the present invention, "channel configuration information" is the information that describes the channel configuration when the channel configuration identifier indicates that the channel configuration is not predetermined.

For example, in the spatial parameter configuration (Spatial Parameter Configuration), which is spatial information, "TreeDescription()" contains the channel configuration information; the present invention is not limited to this example.

In the present invention, "channel remapping information (Channel Re-mapping Information)" is the information used to reorder the input channels, that is, to remap the input channels in order to generate multiple channels. "bsChannelRemapping[ch]" is used as the example channel remapping information, but the present invention is not limited to it.

In the present invention, a "channel remapping module (Channel Re-mapping Module)" is a module that remaps the input channels according to the channel remapping information.

FIG. 1 illustrates an embodiment for a conceptual description of an audio signal processing apparatus according to the present invention, and in particular shows the encoding apparatus and decoding apparatus for an audio signal in MPEG Surround.

The encoding apparatus 10 comprises a spatial encoder 100, which includes a downmix module 101 and a spatial parameter extraction module (Spatial Parameter Estimation Module) 102, and an audio encoder 120, which encodes the down-mixed audio signal.

When an audio signal is input as N channels, the downmix module 101 down-mixes the input audio signal to a specific number of channels according to predetermined downmix information or an external control command, generating downmix channels. The down-mixed audio signal output on these downmix channels is input to the audio encoder 120.

Here, the down-mixed signal may have one channel or two channels, or a specific number of channels according to the downmix command. The number of down-mixed channels is configurable.

Optionally, a down-mixed audio signal provided directly from outside, that is, an artistic downmix signal (Artistic Downmix Signal), may be used as the down-mixed audio signal.

The audio encoder 120 receives the down-mixed audio signal transmitted on the downmix channels, encodes it, and transmits a compressed audio signal.

The spatial parameter extraction module 102 extracts spatial parameters from the multiple channels and transmits them to the decoding apparatus 20.

The audio decoder 130 of the decoding apparatus 20 receives the compressed audio signal, decodes it, and outputs the down-mixed audio signal on stereo channels.

If the decoding apparatus 20 cannot decode multiple channels, it can decode the compressed audio signal and output it directly as a mono or stereo audio signal. This is needed for compatibility between audio decoding apparatuses.

The synthesis module 141 of the spatial decoder 140 receives the stereo audio signal from the audio decoder 130 and the spatial parameters from the spatial parameter extraction module 102 of the encoding apparatus 10, performs surround synthesis to generate multiple channels, and outputs a multichannel audio signal on the generated channels.

Thus, instead of transmitting the multichannel audio signal directly, down-mixing it to a stereo or mono audio signal and transmitting that together with the spatial parameters of the multichannel audio signal is a very efficient approach in terms of compression and transmission.

The following looks in more detail at one case in which the spatial decoder 140 turns the down-mixed mono or stereo channels into multiple channels: the conversion from 2 channels to 5.1 channels.

The conversion from 2 channels to 5.1 channels is performed in the time/frequency domain, as follows.

First, a 2-channel analysis filterbank converts the decoded stereo audio signal into a 2-channel time/frequency-domain audio signal. This signal is up-mixed to a 6-channel time/frequency-domain audio signal using the spatial information, that is, the spatial parameters. A 6-channel synthesis filterbank then converts the 6-channel time/frequency-domain audio signal into a 5.1-channel audio signal.

Currently, the 5-1-5 and 5-2-5 channel configurations exist as predefined configurations, and configurations can be represented as tree structures composed of OTT modules and TTT modules.

The present invention concerns signalling for arbitrary channel configurations, that is, configurations other than the predefined ones, in an audio system, and in particular for arbitrary channel configurations of audio signals in MPEG Surround.

FIG. 2 illustrates an embodiment in which a down-mixed signal is up-mixed using channel conversion modules according to the present invention. The first channel conversion module is described as a TTT module and the second as an OTT module, and the figure shows five channels being converted into eleven channels by these modules.

FIG. 2 shows an arbitrary decoder tree composed of one TTT module and five OTT modules.
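The channel counts quoted for this tree can be checked with simple arithmetic: a TTT box turns two channels into three and an OTT box turns one channel into two, so each box adds exactly one channel. The sketch below encodes that observation; the function name is hypothetical, and the check on the number of TTT boxes reuses the range 0 <= bsNumTttBoxes <= numInChan/2 that the description gives for "bsNumTttBoxes".

```python
def output_channel_count(num_in_chan, num_ttt_boxes, num_ott_boxes):
    """Number of output channels of a tree built from TTT and OTT boxes."""
    # Each TTT box consumes two input channels, so at most numInChan/2 fit.
    if not 0 <= num_ttt_boxes <= num_in_chan // 2:
        raise ValueError("number of TTT boxes outside 0..numInChan/2")
    if num_ott_boxes < 0:
        raise ValueError("number of OTT boxes must be non-negative")
    # Every box (2->3 or 1->2) adds exactly one channel.
    return num_in_chan + num_ttt_boxes + num_ott_boxes


fig2_outputs = output_channel_count(5, 1, 5)  # 11, as stated for FIG. 2
fig3_outputs = output_channel_count(3, 1, 8)  # 12, as stated for FIG. 3
```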

The TTT module 200 receives two input signals on two channels, converts the two channels into three channels, and outputs three signals on them. The first of these three signals is input to OTT module 220, which outputs two output signals on two channels; the second is input to OTT module 230, which outputs two output signals on two channels; and the third is input to OTT module 240, which outputs two output signals on two channels.

Two further input signals each have the bit sequence '0', which indicates that no OTT module is present. They are not connected to any OTT module and are passed straight through as two output signals.

The remaining input signal has the bit sequence '10100'. The first OTT module 210 converts it into two channels. The signal on one of these two channels is not converted further and is output directly as one output signal. The signal on the other channel is converted into two channels again by OTT module 250, which outputs two output signals on them.
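One reading that is consistent with both bit sequences quoted for FIG. 2 is a pre-order traversal in which '1' means "an OTT box splits this channel" and '0' means "this channel is an output". The sketch below parses a sequence under that assumption; the traversal order, the tuple representation, and the function names are illustrative and are not taken from the syntax tables.

```python
def parse_ott_tree(bits, pos=0):
    """Parse a pre-order bit sequence; return (tree, next_position).

    A tree is either the string "out" (a leaf output channel) or a tuple
    ("OTT", first_child, second_child).
    """
    if pos >= len(bits):
        raise ValueError("bit sequence ended before the tree was complete")
    if bits[pos] == "0":
        return "out", pos + 1
    first, pos = parse_ott_tree(bits, pos + 1)
    second, pos = parse_ott_tree(bits, pos)
    return ("OTT", first, second), pos


def count_outputs(tree):
    """Count the leaf output channels of a parsed tree."""
    if tree == "out":
        return 1
    return count_outputs(tree[1]) + count_outputs(tree[2])


direct, _ = parse_ott_tree("0")       # no OTT box: the input is output as-is
nested, _ = parse_ott_tree("10100")   # OTT box, one leaf, then a second OTT box
```

Under this reading, '10100' yields two OTT boxes and three output channels, which matches the description of OTT modules 210 and 250.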

FIG. 3 illustrates an embodiment in which input channels are up-mixed to multiple channels to generate output channels according to the present invention, and in particular shows an up-mix from 3 input channels to 12 output channels.

Referring to FIG. 3, the complete tree is described in four successive steps. The input channels are reordered in the channel remapping module 300 using the channel remapping information "bsChannelRemapping[ch]".

For example, if one or more TTT modules are present, as determined by the information on the number of TTT modules, the first and second channels after channel remapping are the two inputs of the first TTT module, and the third and fourth channels after channel remapping are the two inputs of the second TTT module.
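The pairing rule above can be sketched as follows: the first two remapped channels feed the first TTT module, the next two feed the second, and any channels left over bypass the TTT stage. The function name and the list-based channel representation are assumptions made for this illustration.

```python
def assign_ttt_inputs(remapped_channels, num_ttt_boxes):
    """Split remapped channels into TTT input pairs and leftover channels."""
    if 2 * num_ttt_boxes > len(remapped_channels):
        raise ValueError("not enough channels for the requested TTT boxes")
    # TTT box i takes the remapped channels at positions 2*i and 2*i + 1.
    ttt_inputs = [
        (remapped_channels[2 * i], remapped_channels[2 * i + 1])
        for i in range(num_ttt_boxes)
    ]
    # Channels after the last pair are not connected to any TTT box.
    remaining = remapped_channels[2 * num_ttt_boxes:]
    return ttt_inputs, remaining


# FIG. 3 case: three remapped channels and one TTT module.
pairs, rest = assign_ttt_inputs(["ch0", "ch1", "ch2"], 1)
# pairs == [("ch0", "ch1")], rest == ["ch2"]
```

Because the pairing is purely positional, knowing the number of TTT boxes is enough to know where each one sits in the tree.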

In FIG. 3, the first and second channels after channel remapping are the input channels of TTT module 310, which converts the two channels into three channels, producing subtrees "A", "B", and "C".

Subtree "A", the first subtree, comprises three OTT modules (320, 360, 370) and generates four output channels. Subtree "B", the second subtree, comprises OTT module 330 and generates two output channels. Subtree "C", the third subtree, comprises OTT module 340 and generates two output channels.

Subtree "D", the fourth subtree, is fed by the channel that is not connected to TTT module 310 after channel remapping; it comprises three OTT modules (350, 380, 390) and generates four output channels.

Thus, 12 output channels are produced from the 3 input channels, and each output channel is matched to an external loudspeaker by the output channel position information.

FIG. 4 illustrates an embodiment of the spatial parameter configuration syntax according to the present invention.

Referring to FIG. 4, the channel configuration identifier indicates the channel configuration and defines the tree configuration. The channel configuration identifier "bsTreeConfig" carries the channel configuration information in 4 bits.

For example, a "bsTreeConfig" value of '0' means the 5151 configuration, '1' means the 5152 configuration, '2' means the 525 configuration, and '15' means that the entire tree description is signalled.

The channel configuration identifier is extracted, and if "bsTreeConfig" is '15', the tree description information is extracted from "TreeDescription()", which contains the channel configuration information. "TreeDescription()" is examined in detail below with reference to FIG. 5.
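The decision described above can be written as a small lookup. This is a minimal sketch: only the values 0, 1, 2, and 15 are described in this text, so the remaining 4-bit values are reported as unspecified here rather than guessed, and the function and constant names are hypothetical.

```python
# Values of bsTreeConfig named in the description.
PREDEFINED_TREE_CONFIGS = {0: "5151", 1: "5152", 2: "525"}
FULL_TREE_DESCRIPTION = 15


def interpret_tree_config(bs_tree_config):
    """Classify a 4-bit bsTreeConfig value."""
    if not 0 <= bs_tree_config <= 15:
        raise ValueError("bsTreeConfig is a 4-bit field (0..15)")
    if bs_tree_config == FULL_TREE_DESCRIPTION:
        # Arbitrary configuration: TreeDescription() must be parsed next.
        return ("arbitrary", None)
    if bs_tree_config in PREDEFINED_TREE_CONFIGS:
        return ("predefined", PREDEFINED_TREE_CONFIGS[bs_tree_config])
    # Values 3..14 are not described in this text.
    return ("unspecified", None)


kind, name = interpret_tree_config(15)  # ("arbitrary", None)
```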

FIG. 5 illustrates an embodiment of the channel configuration information syntax according to the present invention; here, the channel configuration information means the tree description information.

Referring to FIG. 5, "TreeDescription()" is the general element for describing the tree, that is, the channel configuration. "bsNumInChan" carries the number of input channels: the number of input channels "numInChan" equals "bsNumInChan+1", and the field is 4 bits long.
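Reading this field amounts to pulling 4 bits from the stream and adding one, which gives a range of 1 to 16 input channels. The sketch below uses a tiny hand-written bit reader; the class, its MSB-first bit order, and the function name are assumptions for illustration, not part of the described syntax.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits):
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def read_num_in_chan(reader):
    """Read the 4-bit bsNumInChan field and return numInChan."""
    bs_num_in_chan = reader.read(4)
    return bs_num_in_chan + 1  # numInChan = bsNumInChan + 1


# Hypothetical stream whose first four bits are 0100 (bsNumInChan = 4).
num_in_chan = read_num_in_chan(BitReader(bytes([0b01000000])))  # 5
```

Storing the count minus one lets a 4-bit field cover 16 channels without wasting a code on the meaningless value of zero channels.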

또한, 본 발명에서 "bsNumTttBoxes"는 TTT 박스의 개수를 나타내며, 그 범위는 0에서부터 입력채널 개수의 절반 사이에서 존재하는데, 즉 0 <= bsNumTttBoxes <= numInChan/2 의 범위 사이에서 존재하는 것이다.In addition, in the present invention, "bsNumTttBoxes" indicates the number of TTT boxes, and the range is between 0 and half of the number of input channels, that is, between 0 <= bsNumTttBoxes <= numInChan / 2.

In particular, the bit field of "bsNumTttBoxes" has a variable length. When "numInChan" is in the range 2^(n-1) to (2^n)-1, "bsNumTttBoxes" is n-1 bits long. For example, when "numInChan" is 1 to 16, "bsNumTttBoxes" is between 0 and 4 bits long. Details are given below with reference to FIG. 6. Here, n is an arbitrary integer.

In addition, "bsChannelRemapping[ch]" is a field that carries the channel remapping information for each input channel, according to the defined number of input channels. The channel remapping module performs channel remapping according to this information.

Once the number of TTT boxes is determined for an arbitrary set of input channels, the positions of the TTT boxes are also determined.

In this regard, in the bit stream structure that describes the relationship between input channels and output channels, the remapping information may contain either as many entries as there are input channels or one entry fewer.

The channel remapping information for every channel lies between 0 and the number of input channels, that is, 0 <= bsChannelRemapping[ch] < numInChan.

In particular, the number of bits of "bsChannelRemapping[ch]", which carries the channel remapping information for each input channel, varies with the number of input channels. When "numInChan" is in the range 2^(n-1)+1 to 2^n, "bsChannelRemapping[ch]" is n bits long, and when "numInChan" is 1 the field is not used (0 bits).

For example, when "numInChan" is 1 to 16, "bsChannelRemapping[ch]" is between 0 and 4 bits long. Details are given below with reference to FIG. 7. Here, n is an arbitrary integer.

In this regard, the channel conversion module configuration information describes how a channel conversion module is configured within the tree structure. It indicates whether the output signal of the channel conversion module is a final output signal of the tree structure or an intermediate signal that is input to another channel conversion module in the tree structure.

This is explained using "bsOttBoxPresent", which indicates the configuration information of the second channel conversion module. When "bsOttBoxPresent" is '0', the output signal is a final output signal of the tree structure. When "bsOttBoxPresent" is '1', the output signal is input to another second channel conversion module.

In this regard, the channel conversion box default information indicates the default of a channel conversion box. This is explained using "bsOttDefaultCld", which indicates the default of the spatial information. "bsOttDefaultCld" carries the default value for idxCLD[][][]: when "bsOttDefaultCld" is '0', the default is idxCLD[][][] = 0, and when "bsOttDefaultCld" is '1', the default is idxCLD[][][] = 15.

In this regard, the channel conversion box mode information indicates the mode in which a channel conversion box operates. This is explained using "bsOttModeLfe", which indicates the mode information of the second channel conversion box.

"bsOttModeLfe" indicates whether an OTT box operates in normal mode or in LFE mode. When "bsOttModeLfe" is '0', the OTT box operates in normal mode, and when "bsOttModeLfe" is '1', the OTT box operates in LFE mode.

In this regard, the output channel position information indicates the position of an output channel in the tree structure. Taking "bsOutputChannelPos" as an example, "bsOutputChannelPos" indicates the positions of each output channel in the tree structure and of the external loudspeakers.

The number of bits of "bsNumTttBoxes" and "bsChannelRemapping", which indicate the number of TTT boxes and the channel remapping information respectively, is described below as a function of the number of input channels, with reference to FIGS. 6 and 7.

FIG. 6 illustrates an embodiment of the number of bits used to indicate the number of channel conversion boxes as a function of the number of input channels. Specifically, it shows an example in which, in the bit stream structure that describes the relationship between input channels and output channels, the bit field expressing the number of TTT boxes (the first channel conversion boxes) varies with the number of input channels.

Here, the number of TTT boxes cannot exceed half the number of input channels, and, as shown in FIG. 6, the number of bits of "bsNumTttBoxes" varies with the number of input channels, as described in detail below.

Referring to FIG. 6, "bsNumTttBoxes" uses 0 bits (the field is not used) when "numInChan" is 1, 1 bit when "numInChan" is in the range 2 to 3, 2 bits when "numInChan" is in the range 4 to 7, 3 bits when "numInChan" is in the range 8 to 15, and 4 bits when "numInChan" is 16.
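The FIG. 6 mapping can be reproduced with integer arithmetic. The sketch below is illustrative only, and the function name `num_ttt_boxes_bits` is hypothetical. It relies on the fact that an integer in the range 2^(n-1) to (2^n)-1 has a binary length of exactly n, so the field width n-1 is the binary length minus one.

```python
def num_ttt_boxes_bits(num_in_chan: int) -> int:
    """Width in bits of bsNumTttBoxes for a given numInChan (1..16)."""
    if not 1 <= num_in_chan <= 16:
        raise ValueError("numInChan must be in 1..16")
    # numInChan in 2^(n-1) .. 2^n - 1 has bit_length() == n; the field uses n-1 bits.
    return num_in_chan.bit_length() - 1


# Print the width for every admissible number of input channels.
for n in range(1, 17):
    print(n, num_ttt_boxes_bits(n))
```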

FIG. 7 illustrates an embodiment of the number of bits used for the channel remapping information as a function of the number of input channels. Specifically, it shows an example in which, in the bit stream structure that describes the relationship between input channels and output channels, the number of bits carrying the remapping information for each input channel varies with the number of input channels.

Here, each entry of the channel remapping information is an arbitrary channel index. The number of entries cannot exceed the number of input channels, as described in detail below.

Referring to FIG. 7, "bsChannelRemapping[ch]" uses 0 bits (the field is not used) when "numInChan" is 1, 1 bit when "numInChan" is 2, 2 bits when "numInChan" is in the range 3 to 4, 3 bits when "numInChan" is in the range 5 to 8, and 4 bits when "numInChan" is in the range 9 to 16.
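The FIG. 7 mapping is the number of bits needed to represent the largest index, numInChan-1. A minimal sketch follows, with the function name `channel_remapping_bits` chosen here only for illustration.

```python
def channel_remapping_bits(num_in_chan: int) -> int:
    """Width in bits of each bsChannelRemapping[ch] entry for numInChan in 1..16."""
    if not 1 <= num_in_chan <= 16:
        raise ValueError("numInChan must be in 1..16")
    # Indices run from 0 to numInChan-1, so the width is the binary length
    # of the largest index; a single channel needs no bits at all.
    return (num_in_chan - 1).bit_length()


# Print the width for every admissible number of input channels.
for n in range(1, 17):
    print(n, channel_remapping_bits(n))
```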

With this scheme, "bsNumTttBoxes" and "bsChannelRemapping[ch]" use fewer bits when "numInChan" is smaller, which makes the audio coding more efficient.

In this regard, for the number of input channels "numInChan" and the remapping information "bsChannelRemapping[i]" for the i-th input channel, the relation 0 <= bsChannelRemapping[i] < numInChan for i = 0, 1, ..., numInChan-1 can be used to express the complete set of "bsChannelRemapping[i]" combinations, which can be expressed by the following formula.

FIG. 8 illustrates an embodiment of a method of remapping channels using the channel remapping information according to the present invention. Specifically, it shows the case in which channel remapping is performed using one fewer channel remapping entry than the number of input channels.

The channel remapping information "bsChannelRemapping" maps one-to-one onto the input channels in order to reorder them. The channel left unmapped, which corresponds to the last element of "bsChannelRemapping", can be derived from the mapping information of the preceding elements, "bsChannelRemapping[0..numInChan-2]". The channel mapping information of the last element, "bsChannelRemapping[numInChan-1]", can therefore be omitted.

For the number of input channels "numInChan" and the remapping information "bsChannelRemapping[i]" for the i-th input channel, the present invention uses the relation that bsChannelRemapping[i] is an integer satisfying 0 <= bsChannelRemapping[i] < numInChan for i = 0, 1, ..., numInChan-1.

That is, the encoding apparatus transmits a bit stream containing one fewer channel remapping entry than the number of input channels, and the decoding apparatus receives the bit stream and performs the channel remapping. The bit stream contains "bsChannelRemapping[i]" for only numInChan-1 input channels.

The numInChan-1 values of "bsChannelRemapping[i]" are extracted from the bit stream, and the numInChan-th value, "bsChannelRemapping[numInChan-1]", is calculated using the relation 0 <= bsChannelRemapping[i] < numInChan, as expressed by the following formula.

bsChannelRemapping[numInChan-1] = (0 + 1 + ... + (numInChan-1)) - (bsChannelRemapping[0] + bsChannelRemapping[1] + ... + bsChannelRemapping[numInChan-2])

For example, with reference to FIG. 8 and the above formula, the channel remapping information specifies how the channels are reordered. Performing the reordering with it maps "bsChannelRemapping[0]" to Input channel #1, "bsChannelRemapping[1]" to Input channel #n-1, ..., "bsChannelRemapping[n-2]" to Input channel #n-2, and the last element, "bsChannelRemapping[n-1]", consequently to Input channel #0.

That is, subtracting the sum of the channel numbers already mapped to input channels from the sum of all remapping channel numbers determines the input channel that is mapped to the last remapping channel.
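The sum-difference rule can be sketched as follows. This is an illustration under the assumption that the transmitted entries are distinct indices in 0..numInChan-1 (a one-to-one mapping); the function name `complete_remapping` is hypothetical.

```python
def complete_remapping(partial: list[int], num_in_chan: int) -> list[int]:
    """Append the omitted last entry to numInChan-1 transmitted remapping entries."""
    if len(partial) != num_in_chan - 1:
        raise ValueError("expected numInChan-1 transmitted entries")
    if len(set(partial)) != len(partial) or any(not 0 <= v < num_in_chan for v in partial):
        raise ValueError("entries must be distinct indices in 0..numInChan-1")
    total = num_in_chan * (num_in_chan - 1) // 2   # 0 + 1 + ... + (numInChan-1)
    last = total - sum(partial)                    # the one index not yet used
    return list(partial) + [last]


# Pattern of FIG. 8 with four input channels: entries 1, 3, 2 are transmitted.
print(complete_remapping([1, 3, 2], 4))  # [1, 3, 2, 0]
```

Since the full mapping is a permutation of 0..numInChan-1, its entries always sum to numInChan*(numInChan-1)/2, which is why a single subtraction recovers the missing entry.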

In this regard, the set of combinations of the numInChan-1 values of "bsChannelRemapping[i]" can be expressed by the following formula.

FIG. 9 illustrates another embodiment of the channel configuration information syntax according to the present invention. Here, the channel configuration information means the tree description information.

The difference from the syntax described with reference to FIG. 5 is that the channel remapping information contains one fewer entry than the number of input channels. The last channel remapping entry, "bsChannelRemapping[numInChan-1]", is not included in the bit stream and must be calculated by the decoder.

Referring to FIG. 9, the number of bits of "bsNumTttBoxes" is determined by the number of input channels "numInChan", as expressed by the following formula.

No. of bits of bsNumTttBoxes = ceil(log2((numInChan+1)/2))

Likewise, the number of bits of "bsChannelRemapping[ch]" is determined by the number of input channels "numInChan", as expressed by the following formula.

No. of bits of bsChannelRemapping = ceil(log2(numInChan))

The present invention can therefore code "bsNumTttBoxes" and "bsChannelRemapping[ch]" more efficiently.

In this regard, consider how many bits the present invention saves when the number of input channels is N. The number of bits required for N input channels is compared for the case of fixed bit counts and the case of variable bit counts.

With fixed bit counts, "bsNumTttBoxes" and "bsChannelRemapping[ch]" require 4 and 4N bits, respectively. With variable bit counts, they require ceil(log2((N+1)/2)) and ceil(log2(N))*(N-1) bits, respectively. The bit savings according to the present invention are therefore 4 - ceil(log2((N+1)/2)) and 4N - ceil(log2(N))*(N-1), respectively.

For example, consider the 5-2-5 case, in which five channels are downmixed to two channels for transmission and then upmixed back to five channels, so that N = 2. With fixed bit counts, "bsNumTttBoxes" and "bsChannelRemapping[ch]" together use 12 bits (4 + 8). With variable bit counts according to the present invention, they together use 2 bits (1 + 1). A bit saving of 10 bits is therefore possible.
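The saving quoted for the 5-2-5 case can be checked numerically. The sketch below simply evaluates the formulas given above for a chosen N; the function names are hypothetical, and N = 2 is taken as the number of input channels of the 5-2-5 configuration because that is the value that reproduces the 12-bit and 2-bit totals.

```python
import math


def fixed_bits(n: int) -> tuple[int, int]:
    """Bits for (bsNumTttBoxes, all bsChannelRemapping entries) with fixed 4-bit fields."""
    return 4, 4 * n


def variable_bits(n: int) -> tuple[int, int]:
    """Bits for the same two items with variable widths and N-1 transmitted entries."""
    ttt = math.ceil(math.log2((n + 1) / 2))
    remap = math.ceil(math.log2(n)) * (n - 1)
    return ttt, remap


def total_saving(n: int) -> int:
    """Total bits saved by the variable-width scheme for N input channels."""
    return sum(fixed_bits(n)) - sum(variable_bits(n))


print(sum(fixed_bits(2)), sum(variable_bits(2)), total_saving(2))  # 12 2 10
```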

FIG. 10 is a flowchart illustrating a method of decoding an audio signal according to the present invention, in particular a method of generating multiple channels using the information in the bit stream of a received audio signal.

The decoding apparatus receives the spatial parameter bit stream and checks the channel configuration identifier (S10).

After the check (S10), the decoding apparatus determines whether the channel configuration is predetermined (S20).

If the check (S20) shows that the channel configuration is predetermined, the decoding apparatus already holds the channel conversion module information and the information needed to generate the multiple channels, for example in table form, and uses it to generate the multiple channels (S30).

If the check (S20) shows that the channel configuration is not predetermined, the decoding apparatus extracts the information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated (S40).

After the information on the number of first channel conversion modules has been extracted, the channel remapping information extraction unit extracts the channel remapping information, and the channel remapping module remaps the input channels (S50). The bit field carrying the channel remapping information is allocated a number of bits corresponding to the number of input channels. For example, the channel remapping information may contain as many entries as there are input channels, or one entry fewer.

After the channel remapping, the second channel conversion module configuration information is extracted (S60). This configuration information includes, among other things, the position information of the second channel conversion modules.

The multiple channels are then generated from the channels remapped in step S50, using the channel conversion module information (S70). In other words, the positions of the first and second channel conversion modules are determined from the information extracted in steps S40 and S60, and the multiple channels are generated from the remapped channels accordingly.
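Steps S10 to S50 can be sketched as a small parser. This is a hedged illustration, not the syntax of FIG. 5 or FIG. 9: the 4-bit width of "bsTreeConfig", the field order, the string-based bit stream, and the names `BitReader` and `parse_tree_header` are all assumptions made here. The sketch follows the variant in which numInChan-1 remapping entries are transmitted, and it leaves out the second channel conversion module configuration (S60) and the multichannel generation itself (S30, S70).

```python
TREE_DESCRIPTION_SIGNALLED = 15  # bsTreeConfig value meaning TreeDescription() follows


class BitReader:
    """Reads unsigned fields from a string of '0'/'1' characters, MSB first."""

    def __init__(self, bits: str):
        self.bits, self.pos = bits, 0

    def read(self, width: int) -> int:
        if self.pos + width > len(self.bits):
            raise ValueError("bit stream exhausted")
        chunk = self.bits[self.pos:self.pos + width]
        self.pos += width
        return int(chunk, 2) if width else 0  # a 0-bit field carries no data


def parse_tree_header(bits: str) -> dict:
    reader = BitReader(bits)
    tree_config = reader.read(4)                   # S10 (assumed 4-bit field)
    if tree_config != TREE_DESCRIPTION_SIGNALLED:  # S20
        # S30: a predetermined configuration would be looked up in stored tables.
        return {"bsTreeConfig": tree_config, "predetermined": True}

    num_in_chan = reader.read(4) + 1               # numInChan = bsNumInChan + 1
    num_ttt = reader.read(num_in_chan.bit_length() - 1)       # S40, variable width
    if num_ttt > num_in_chan // 2:
        raise ValueError("bsNumTttBoxes exceeds numInChan/2")

    width = (num_in_chan - 1).bit_length()         # S50, variable width per entry
    remap = [reader.read(width) for _ in range(num_in_chan - 1)]
    remap.append(num_in_chan * (num_in_chan - 1) // 2 - sum(remap))  # omitted last entry
    if sorted(remap) != list(range(num_in_chan)):
        raise ValueError("remapping is not one-to-one")
    return {"bsTreeConfig": tree_config, "predetermined": False,
            "numInChan": num_in_chan, "bsNumTttBoxes": num_ttt,
            "bsChannelRemapping": remap}


# Two input channels, one TTT box, channels swapped: 4 + 4 + 1 + 1 bits.
print(parse_tree_header("1111" + "0001" + "1" + "1"))
```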

The present invention is not limited to the embodiments described above. As the appended claims make clear, it may be modified by those of ordinary skill in the art to which the invention pertains, and such modifications fall within the scope of the present invention.

The effects of the method and apparatus for encoding/decoding an audio signal according to the present invention, as described above, are as follows.

First, in multichannel audio coding, the amount of data can be reduced effectively by compressing the signal to a specific number of channels, such as mono or stereo, and transmitting or storing the spatial information together with it.

Second, the audio signal can be processed effectively by efficiently structuring the bit stream that contains the spatial information.

Claims

(a) identifying a channel configuration identifier;

(b) if the channel configuration is not predetermined, extracting information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated; and

(c) generating a multi-channel using the first channel conversion module.

The method of claim 1, wherein step (b) further comprises extracting channel configuration information.

The method of claim 1, wherein the first channel conversion module outputs three channels when its input is two channels.

The method of claim 1, wherein the configuration of the second channel conversion module is determined once the number information of the first channel conversion module has been confirmed.

The method of claim 4, wherein the second channel conversion module outputs two channels when its input is one channel.

The method of claim 1, wherein the number of first channel conversion modules is in the range from 0 to half the number of input channels.

The method of claim 1, wherein, when the number of input channels is in the range 2^(n-1) to (2^n)-1 for an arbitrary integer n, the number information of the first channel conversion module is represented by n-1 bits.

The method of claim 1, wherein the number of bits representing the number information of the first channel conversion module is ceil(log2((numInChan+1)/2)).

The method of claim 1 or 4, wherein the position of the first channel conversion module is determined using the number information of the first channel conversion module, and the first channel conversion module is positioned before the second channel conversion module.

The method of claim 1, wherein the input channels are remapped by a channel remapping module.

The method of claim 1, wherein the channel conversion module number field, which is the field indicating the number information of the first channel conversion module, is represented by 0 to 4 bits.

The method of claim 11, wherein the channel conversion module number field is 4 bits when the number of input channels is 16, 3 bits when the number of input channels is in the range 8 to 15, 2 bits when the number of input channels is in the range 4 to 7, 1 bit when the number of input channels is in the range 2 to 3, and 0 bits when the number of input channels is 1.

a module for checking a channel configuration identifier;

a first number information extracting unit for extracting, if the channel configuration is not predetermined, information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated; and

a first channel conversion module for generating multiple channels using the extracted number information.

if the channel configuration is not predetermined, generating information on the number of first channel conversion modules, to which a number of bits corresponding to the number of output channels is allocated; and

generating configuration information of the second channel conversion module according to the number information of the first channel conversion module.

a first generator for generating, if the channel configuration is not predetermined, information on the number of first channel conversion modules, to which a number of bits corresponding to the number of output channels is allocated; and

a second generator for generating configuration information of the second channel conversion module according to the number information of the first channel conversion module.

An audio signal comprising a channel configuration identifier and, if the channel configuration is not predetermined, information on the number of first channel conversion modules, to which a number of bits corresponding to the number of input channels is allocated.

The audio signal of claim 16, further comprising channel configuration information.