KR20060109296A

KR20060109296A - Reference channel adaptation method considering subband spatial cues for multi-channel audio signal

Info

Publication number: KR20060109296A
Application number: KR1020060013751A
Authority: KR
Inventors: 방희석; 김동수; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2005-04-14
Filing date: 2006-02-13
Publication date: 2006-10-19
Also published as: KR20060109298A; KR20060109297A; KR20060109299A

Abstract

A method of applying a variable reference channel to a multi-channel audio signal using the spatial information of each sub-band is provided. By using a variable reference channel for the spatial information, chosen in consideration of the characteristics of the multi-channel audio signal, the spatial information bitstream is represented efficiently, which improves the encoding, transmission and decoding efficiency of the multi-channel audio signal. A multi-channel audio signal is down-mixed, and spatial information of the multi-channel audio signal is extracted for each sub-band (315, 316). A reference channel is set for each sub-band using the spatial information, and the spatial information of the remaining channels is re-extracted using the spatial information of the set reference channel (318). The entire bitstream is constructed from the down-mixed audio signal and the re-extracted spatial information (321).

Description

Reference channel adaptation method considering subband spatial cues for multi-channel audio signal

Brief description of the drawings

Fig. 1 is a diagram illustrating how a human perceives spatial information of an audio signal in the present invention.

Fig. 2 is a diagram of a method of coding a multichannel audio signal using a spatial encoder and decoder in the present invention.

Fig. 3A is a detailed diagram of a spatial encoder according to a first embodiment of the present invention.

Fig. 3B is a flowchart of an encoding process according to the first embodiment of the present invention.

Fig. 4A is a detailed diagram of a spatial encoder according to a second embodiment of the present invention.

Fig. 4B is a flowchart of an encoding process according to the second embodiment of the present invention.

Fig. 5A is a detailed diagram of a spatial decoder according to the first embodiment of the present invention.

Fig. 5B is a flowchart of a decoding process according to the first embodiment of the present invention.

Explanation of reference numerals for the main parts of the drawings

101: remote sound source
102: direct sound wave
104: reflected sound wave
201: multichannel audio signal
202: downmix unit
203: spatial parameter extraction unit
204: spatial encoder
205: artistic downmix audio signal
206: mono or stereo audio signal
207: spatial parameters
208: spatial decoder
302: filter bank
303: reference channel selector
310: bitstream formatter
407: comparison selection unit
502: bitstream decoding unit
504: spatial information calculation unit

The present invention relates to a method of constructing a bitstream for a multichannel audio signal and, more particularly, to a method of efficiently constructing a bitstream for spatial information by using a variable reference channel in multichannel audio coding.

Various coding techniques and methods for digital audio signals have recently been developed, and related products are being produced. In addition, methods of coding multi-channel audio signals using a psychoacoustic model are being developed, and standardization work on them is in progress.

The psychoacoustic model exploits the way humans perceive sound, for example the facts that a quiet sound following a loud sound is not heard and that only sounds at frequencies from 20 Hz to 20000 Hz can be heard. By removing the parts of the audio signal that are unnecessary in the coding process, it effectively reduces the amount of data required.

Audio standards such as MPEG-1 Audio (MPEG-1 Layer III), MPEG-4 AAC (Advanced Audio Coding) and MPEG-4 HE-AAC (High-Efficiency AAC) have been developed and commercialized. Methods of coding a multichannel audio signal using spatial information are also being developed. Such a method greatly improves the transmission efficiency of a multichannel audio signal by using a compressed audio signal (for example, a stereo or mono audio signal) together with a low-bit-rate side information channel (for example, spatial information).

However, when constructing the bitstream of a multichannel audio signal in such a coding method, spatial information has conventionally been represented by ordering it with respect to a fixed reference channel (for example, the Front left signal). The conventional method therefore takes no account of the characteristics of the audio signal and requires a high bit rate to code the spatial information, so the bitstream for the audio signal is constructed inefficiently.

The present invention, proposed to solve the above problem, therefore has as its object to provide encoding and decoding methods that improve the encoding, transmission and decoding efficiency of a multichannel audio signal by representing the spatial information bitstream efficiently, using a variable reference channel for the spatial information that takes the characteristics of the audio signal into account.

To achieve the above object, the present invention provides a method of encoding a multichannel audio signal, comprising: downmixing the multichannel audio signal and extracting spatial information of the multichannel audio signal for each subband; setting, in a reference channel selector, a reference channel for each subband using the per-subband spatial information, and re-extracting the spatial information of the remaining channels using the spatial information of the set reference channel; and encoding the downmixed audio signal and the re-extracted spatial information to construct a bitstream in a bitstream formatter. Here, the spatial information may include CLD (Channel Level Difference), CTD (Channel Time Difference), ICC (Inter-Channel Coherence), CPC (Channel Prediction Coefficients) and the like. The reference channel is set according to a preset criterion, which may be to set, as the reference channel, the channel corresponding to either the maximum value or the median value of the spatial information. In addition, index information for the reference channel may be included in the bitstream; the index information may be included using the same table as that used for the spatial information, or using a separate table.

To achieve the above object, the present invention also provides a method of encoding a multichannel audio signal, comprising: (a) downmixing the multichannel audio signal and extracting spatial information of the multichannel audio signal for each subband; (b) setting, in a reference channel selector, a reference channel for each subband using the per-subband spatial information, and re-extracting the spatial information of the remaining channels using the spatial information of the set reference channel to form a bitstream; (c) extracting the spatial information of the remaining channels using the spatial information of a fixed reference channel to form a bitstream; and (d) comparing, in a comparison selection unit, the bitstream formed in step (b) with the bitstream formed in step (c) and then selecting one of the two methods. Here, identification information that distinguishes the case in which the reference channel is set variably for each subband from the case in which the fixed reference channel is used may be included in the bitstream. Selecting one method may mean, for example, selecting the method with the lowest bit rate.

To achieve the above object, the present invention also provides a method of decoding a multichannel audio signal, comprising: receiving a core codec bitstream and a spatial information bitstream for a downmixed audio signal; and decoding the core codec bitstream and the spatial information bitstream, and reading from the decoded spatial information bitstream the index information of the reference channel set variably for each subband, the spatial information of the reference channel, and the spatial information of the remaining channels extracted using the spatial information of the reference channel. The decoding method may further comprise converting the downmix audio signal, generated by decoding the core codec bitstream, into a multichannel signal using the read index information of the reference channel, the spatial information of the reference channel, and the spatial information of the remaining channels.

To achieve the above object, the present invention also provides a method of decoding a multichannel audio signal, comprising: receiving a core codec bitstream and a spatial information bitstream for a downmixed audio signal; and decoding the core codec bitstream, reading from the spatial information bitstream identification information that distinguishes the case in which the bitstream is constructed using a reference channel set variably for each subband from the case in which it is constructed using a fixed reference channel, and decoding the spatial information bitstream according to the read identification information. The decoding method may further comprise reading the index information of the reference channel, the spatial information of the reference channel and the spatial information of the remaining channels from the decoded spatial information bitstream, and using them to convert the downmix signal obtained by decoding the core codec bitstream into a multichannel signal.
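The decoding steps described above can be sketched in Python. This is only an illustration under stated assumptions: the frame layout (a dictionary with the hypothetical keys `variable_flag`, `num_channels`, `bands`, `ref_index`, `ref_level` and `diffs`), the choice of channel 0 as the fixed reference, and the use of simple additive differences are not taken from the patent, which does not specify a bitstream syntax.

```python
def reconstruct_levels(num_channels, ref_index, ref_level, diffs):
    """Rebuild per-channel values from the reference channel's value and the
    differences of the remaining channels (channel order, reference skipped)."""
    remaining = iter(diffs)
    return [ref_level if ch == ref_index else ref_level + next(remaining)
            for ch in range(num_channels)]


def decode_frame(frame, fixed_ref_index=0):
    """Decode one frame of spatial information (hypothetical layout)."""
    out = []
    for band in frame["bands"]:
        # Flag set: a reference channel index was transmitted for this subband.
        # Flag cleared: no index is read and the fixed reference channel applies.
        ref = band["ref_index"] if frame["variable_flag"] else fixed_ref_index
        out.append(reconstruct_levels(frame["num_channels"], ref,
                                      band["ref_level"], band["diffs"]))
    return out


variable_frame = {
    "variable_flag": 1,
    "num_channels": 3,
    "bands": [
        {"ref_index": 2, "ref_level": 9.0, "diffs": [-6.0, -26.0]},
        {"ref_index": 0, "ref_level": 4.0, "diffs": [-1.0, -3.0]},
    ],
}
print(decode_frame(variable_frame))
```

The identification flag is read once and then decides, for every subband, whether a reference channel index is taken from the stream or the fixed channel is assumed.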

To achieve the above object, the present invention also provides a method of generating an audio signal, in which the audio signal is generated so as to include a core codec bitstream and a spatial information bitstream. For the spatial information bitstream, a reference channel selector sets a reference channel for each subband using the spatial information of a multichannel audio signal, the spatial information of the remaining channels is re-extracted using the spatial information of the set reference channel, and the bitstream is generated using the spatial information of the reference channel and the re-extracted spatial information of the remaining channels.

To achieve the above object, the present invention also provides a method of generating an audio signal, in which the audio signal is generated by comparing two methods and then selecting one of them. In the first method, spatial information of a multichannel audio signal is extracted for each subband, a reference channel selector variably sets a reference channel for each subband using the per-subband spatial information, and the spatial information of the remaining channels is re-extracted using the spatial information of the set reference channel to construct a bitstream. In the second method, the spatial information of the remaining channels is extracted using the spatial information of a fixed reference channel to construct a bitstream. Here, the bitstream may include identification information that distinguishes the case in which the bitstream is constructed using the variable reference channel set for each subband from the case in which it is constructed using the fixed reference channel.

Preferred embodiments of the present invention that can concretely realize the above object are described below with reference to the accompanying drawings.

Fig. 1 shows how a human perceives spatial information of an audio signal in the present invention. Coding methods for multichannel audio signals build on the fact that humans perceive audio signals in three-dimensional space, and exploit the possibility of representing the audio signal as three-dimensional spatial information through a plurality of parameter sets. These parameters, called "spatial parameters", represent the spatial information of a multichannel audio signal and include CLD (Channel Level Differences), ICC (Inter-Channel Coherences), CPC (Channel Prediction Coefficients) and CTD (Channel Time Difference). CLD denotes the energy difference between two channels, ICC denotes the correlation between two channels, CPC denotes the prediction coefficients used when generating three channels from two channels, and CTD denotes the time difference between two channels.
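As a rough illustration of two of the parameters defined above, the following Python sketch computes a CLD and an ICC for a pair of channels within one subband. The specific formulas are assumptions made for the example: the text only says that CLD is an energy difference and ICC a correlation, so the decibel energy ratio and the zero-lag normalized cross-correlation used here are common conventions rather than the definitions of this document. The function names and the `eps` guard are hypothetical.

```python
import math


def channel_energy(x):
    """Sum of squared samples of one channel in one subband."""
    return sum(s * s for s in x)


def cld_db(x, y, eps=1e-12):
    """Channel level difference, taken here as the energy ratio in dB (assumption)."""
    return 10.0 * math.log10((channel_energy(x) + eps) / (channel_energy(y) + eps))


def icc(x, y, eps=1e-12):
    """Inter-channel coherence, taken here as the normalized
    cross-correlation at lag zero (assumption)."""
    cross = sum(a * b for a, b in zip(x, y))
    return cross / math.sqrt((channel_energy(x) + eps) * (channel_energy(y) + eps))


left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]   # same waveform at half the amplitude
print(round(cld_db(left, right), 2))   # about 6.02 dB
print(round(icc(left, right), 3))      # fully coherent pair
```

Two channels that carry the same waveform at different gains give a non-zero CLD and an ICC close to 1, while unrelated signals drive the ICC toward 0.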

Fig. 1 shows how a human perceives an audio signal spatially and how the concept of the spatial parameters arises. A direct sound wave (103) from a remote sound source (105) reaches the left ear (107), and another direct sound wave (102) is diffracted around the head and reaches the right ear (106). The two sound waves (102 and 103) differ in arrival time and energy level, and these differences give rise to the CLD, CPC and CTD parameters.

Furthermore, if reflected sound waves (104 and 105) reach both ears, or if the sound source (105) is diffuse, sound waves that are uncorrelated with each other reach the two ears, and this gives rise to the ICC parameter. Spatial parameters generated on this principle are known to allow a substantial reduction in the number of bits when a multichannel audio signal is transmitted as a mono or stereo signal and then output again as multichannel. The present invention proposes a method of representing these spatial parameters in the bitstream very efficiently by using a variable reference channel.

Fig. 2 shows the principle of coding a multichannel audio signal using a spatial encoder and decoder in the present invention. As shown, the spatial encoder (204) first receives a multichannel audio signal (201). Here, N denotes the number of input channels. The multichannel audio signal (201) is downmixed in a downmix unit (202) to become a downmix signal (206).

The spatial information of the multichannel audio signal, that is, the spatial parameters, is extracted from the multichannel audio signal (201) in a spatial parameter extraction unit (203). Here, spatial information refers to the information about the audio signal channels that is used when a multichannel (for example, Left, Right, Center, Left surround, Right surround) audio signal is downmixed, the downmix signal (206) is transmitted, and the transmitted downmix signal is upmixed back to multichannel. Optionally, the downmix signal (206) may be generated using a downmix signal supplied directly from outside, for example an artistic downmix signal (205).

The downmix signal (206) is encoded with a core codec, compressed and transmitted, and the spatial information, that is, the spatial parameters (207), is transmitted along with it. The core codec is the codec that codes the audio signal itself rather than the spatial information (207). It may be MP3, AC-3, DTS or AAC, and, as long as it performs a codec function on the audio signal, it may be a codec developed in the future as well as an existing one. If the user's system can output only the downmix signal (206), the compressed and transmitted downmix signal (206) may be decoded and output directly (209). If the system can output a multichannel audio signal, the compressed and transmitted audio signal is decoded and then converted in a spatial decoder (208) into a multichannel audio signal (210), using the spatial information (207) of the multichannel audio signal transmitted with it, and output.

Rather than transmitting the multichannel audio signal directly, downmixing it to the downmix signal (206) and transmitting that signal together with the spatial information (207) of the multichannel audio signal is highly advantageous in terms of compression and transmission efficiency. In the present invention, when the spatial parameters (207) are transmitted, they are represented more efficiently by using a variable reference channel when forming the bit sequence, which improves compression and transmission efficiency.

Fig. 3A shows in detail a spatial encoder according to the first embodiment of the present invention. As shown, the multichannel audio signal (301) is downmixed and encoded (312). In addition, to extract spatial information from the multichannel audio signal (301), the signal is passed through a filter bank (302) and divided into subbands. The filter bank (302) splits the audio signal, which spans all frequency bands, into individual subbands, and may be a sub-band filter bank or a QMF filter bank. Spatial information is extracted for each subband from the audio signal that has passed through the filter bank, and a reference channel selector (303) uses the extracted spatial information to set the reference channel variably according to a preset criterion. The reference channel may be set for each subband. The spatial information is then represented using the variably set reference channel. Spatial information represented using the reference channel may be expressed as a value compared against a neighboring value, for example as the difference from a neighboring value. The spatial information may include CLD, CTD, ICC or CPC. The preset criterion may be, for example, to set as the reference channel the channel whose spatial information has the maximum or the median value.

Using the per-subband spatial information, the reference channel selector (303) may set (304) the reference channel according to a preset criterion, for example by choosing the channel whose spatial information has the maximum or median value within the subband. The reference channel selector (303) may consist of several stages. For example, it may calculate (305) spatial information using the per-subband reference channel set as above and quantize (306) the calculated spatial information, thereby generating index information (307) of the reference channel and quantized spatial information (308). The index information (307) and the quantized spatial information (308) are then encoded (309) and, in a bitstream formatter (310), form the entire bitstream (311) together with the downmixed audio signal. The index information (307) and the quantized spatial information (308) form the spatial information bitstream, and the downmixed audio signal forms the core codec bitstream. The spatial information bitstream is the bitstream for the spatial information (the spatial parameters) extracted from the multichannel audio signal, and may consist of a configuration bitstream and a spatial data bitstream for the spatial information. The core codec bitstream is the bitstream formed from the audio signal, excluding the spatial information.

Taking CLD as an example of the spatial information, the energy of each channel (CLD) is first calculated for each subband from the audio signal that the filter bank (302) has divided into subbands. Using the calculated per-channel energies, the reference channel selector (303) then sets a particular channel as the reference channel for each subband according to a preset criterion. Energy information, such as the energy differences between channels, is expressed using the energy of the set reference channel, and the energy information expressed in this way is quantized. The quantized energy information is then entropy coded, for example with Huffman coding, either together with or separately from the index information of the reference channel, and in the bitstream formatter (310) it forms the entire bitstream together with the downmixed audio signal.
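The CLD procedure above (per-channel energy, reference channel selection per subband, differences relative to the reference, quantization) can be sketched as follows. This is a minimal illustration, not the encoder of the patent: the decibel scale, the uniform quantizer with a hypothetical step of 1.5 dB, and all function names are assumptions, and the entropy coding stage is left out.

```python
import math


def subband_levels_db(subband):
    """subband maps a channel name to its samples; returns the energy of each channel in dB."""
    return {ch: 10.0 * math.log10(sum(s * s for s in x) + 1e-12)
            for ch, x in subband.items()}


def select_reference(levels, criterion="max"):
    """Pick the channel holding the maximum or the median level."""
    ordered = sorted(levels, key=levels.get)
    if criterion == "max":
        return ordered[-1]
    if criterion == "median":
        return ordered[len(ordered) // 2]
    raise ValueError(f"unknown criterion: {criterion}")


def encode_subband(subband, criterion="max", step_db=1.5):
    """Return the reference channel and the quantized level differences
    of the remaining channels relative to that reference."""
    levels = subband_levels_db(subband)
    ref = select_reference(levels, criterion)
    diffs = {ch: round((levels[ch] - levels[ref]) / step_db)
             for ch in levels if ch != ref}
    return ref, diffs


subband = {"L": [1.0, 1.0], "R": [0.1, 0.1], "C": [2.0, 2.0]}
ref, diffs = encode_subband(subband)
print(ref, diffs)   # C {'L': -4, 'R': -17}
```

With the maximum criterion every transmitted difference is zero or negative, which narrows the range of values the entropy coder has to cover; the median criterion instead centers the differences around zero.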

Taking CTD, ICC or CPC as an example of the spatial information, the CTD, ICC or CPC is first extracted for each subband from the audio signal that the filter bank (302) has divided into subbands. Using the extracted CTD, ICC or CPC, the reference channel selector (303) then sets a particular channel as the reference channel for each subband according to a preset criterion. The CTD, ICC or CPC is re-extracted using the CTD, ICC or CPC of the set reference channel, and the values expressed in this way are quantized. The quantized CTD, ICC or CPC is then entropy coded, for example with Huffman coding, either together with or separately from the index information of the reference channel, and in the bitstream formatter (310) it forms the entire bitstream together with the downmixed audio signal.

Fig. 3B is a flowchart of the encoding process according to the first embodiment of the present invention. To represent the spatial information more efficiently, the audio signal is first downmixed (315) from the multichannel audio signal (314), and a core codec bitstream is generated (317) from the downmixed audio signal. The downmixed audio signal may be a mono or stereo signal. Next, spatial information, for example spatial parameters, is extracted (316) for each subband from the multichannel audio signal (314), and a reference channel is selected (318) for each subband in a preset manner using the extracted spatial information. The preset manner may be to set as the reference channel the channel corresponding to either the maximum value or the median value of the spatial information. The spatial information is then represented using the set reference channel, a spatial information bitstream containing the spatial information represented in this way is generated (320), and the entire bitstream, consisting of the core codec bitstream and the spatial information bitstream, is transmitted (321).

Fig. 4A shows a detailed diagram of a spatial encoder according to the second embodiment of the present invention. As shown, the multichannel audio signal (401) is downmixed and encoded (410). To extract spatial information, the multichannel audio signal (401) also passes through a filter bank (402), which divides it into subbands. Spatial information is extracted for each subband from the audio signal that has passed through the filter bank, and the extracted spatial information is represented in two ways. In the first method, the reference channel selector (403) variably sets a reference channel for each subband according to a preset criterion using the spatial information extracted for each subband, the spatial information of the remaining channels is re-extracted using the variably set reference channel, and the encoding unit (405) encodes the spatial information of the reference channel and the re-extracted spatial information to form a bitstream. In the second method, the reference channel selector (404) extracts the spatial information of the remaining channels using the spatial information of a fixed reference channel, and the encoding unit (406) encodes the spatial information of the reference channel and the spatial information of the remaining channels to form a bitstream. The reference channel selector (404) may be the same unit as the reference channel selector (403) or a separate unit. The comparison selector (407) then selects one of the two methods: forming the bitstream using the variably set reference channel, or forming the bitstream using the fixed reference channel. The comparison selector (407) may compare the bit rates of the bitstreams formed by the two methods and select the method with the lowest bit rate. The bitstream of the selected method is then combined in the bitstream formatter (408) with the bitstream for the downmixed audio signal to form the entire bitstream (409).
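The role of the comparison selector (407) can be sketched as below. Everything in this block is an assumption made for illustration: the cost model `toy_bits` (small magnitudes are cheaper, as with a typical entropy coder), the median rule used for the variable reference, the per-subband cost of signalling the reference index, and the omission of the reference channel's own value from the cost. Only the idea of comparing the two bit rates and keeping the smaller one comes from the text.

```python
import math
from statistics import median_low


def toy_bits(values):
    """Hypothetical code length: 2*|v| + 1 bits per value."""
    return sum(2 * abs(v) + 1 for v in values)


def diff_cost(levels, ref):
    """Bits needed to store the non-reference channels relative to channel `ref`."""
    return toy_bits(v - levels[ref] for i, v in enumerate(levels) if i != ref)


def choose_method(subbands, fixed_ref=0):
    """Compare variable-reference and fixed-reference coding over all subbands.

    Returns (method_name, total_bits); ties go to the fixed method.
    """
    n_channels = len(subbands[0])
    # Assumed overhead: the variable method must signal a reference index per subband.
    index_bits = math.ceil(math.log2(n_channels))
    variable = sum(
        diff_cost(s, s.index(median_low(s))) + index_bits for s in subbands
    )
    fixed = sum(diff_cost(s, fixed_ref) for s in subbands)
    return ("variable", variable) if variable < fixed else ("fixed", fixed)
```

Under this toy model the variable reference wins when the fixed channel is an outlier in some subband, and the fixed reference wins when the index overhead outweighs the savings, which is why the comparison is made rather than one method being assumed better.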

Fig. 4B is a flowchart of an encoding process according to the second embodiment of the present invention. To represent spatial information more efficiently, an audio signal is first downmixed (412) from the multichannel audio signal (411), and a core codec bitstream is generated (414) from the downmixed audio signal. The downmixed audio signal may include a mono or stereo signal. Next, spatial information, for example spatial parameters, is extracted (413) for each subband from the multichannel audio signal (411). A variable reference channel is selected for each subband in a preset manner using the spatial information extracted for each subband, and a spatial information bitstream is generated (416) using the selected variable reference channel. In addition, a spatial information bitstream is generated (415) from the spatial information extracted for each subband using a fixed channel. The spatial information bitstream generated using the variable reference channel is then compared (417) with the spatial information bitstream generated using the fixed channel. If the bitstream generated using the variable reference channel has the lower bit rate, that bitstream is selected (419); if the bitstream generated using the fixed channel has the lower bit rate, that bitstream is selected (418). The entire bitstream, consisting of the selected spatial information bitstream and the core codec bitstream, is then transmitted (420).

Fig. 5A shows a spatial decoder according to the first embodiment of the present invention in detail. As shown, the spatial decoder receives the entire bitstream (501), which includes a core codec bitstream and a spatial information bitstream, and the bitstream decoding unit (502) parses it. The core codec bitstream is decoded (412) into a downmixed audio signal (413), which may include a mono or stereo audio signal. The spatial information bitstream is decoded (503) for each subband and passed to the spatial information calculator (504). The decoding step may use, for example, Huffman decoding. The spatial information calculator (504) calculates (507) the spatial information using the index information (505) of the reference channel and the spatial information (506) expressed using the reference channel, calculates (408) the physical quantity corresponding to the calculated spatial information, and outputs (409) that physical quantity for each subband. The multichannel audio signal generator (510) can then produce the multichannel audio signal (511) using the downmixed audio signal and the physical quantities output for each subband.
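Since the text names Huffman decoding as one possible way to decode the spatial information bitstream, a minimal prefix-code decoder is sketched here. The code table `TABLE` is invented for this example and is not taken from the patent; a real codec would define its own tables for each kind of spatial parameter.

```python
def huffman_decode(bits, table):
    """Decode a string of '0'/'1' characters with a prefix-code table {codeword: symbol}."""
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        # In a prefix code no codeword is a prefix of another,
        # so the first match is the only possible one.
        if current in table:
            symbols.append(table[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols


# Hypothetical table: shorter codewords for values assumed to be more frequent.
TABLE = {"0": 0, "10": 1, "110": -1, "111": 2}
```

For instance, `huffman_decode("0101100111", TABLE)` splits the input as `0 | 10 | 110 | 0 | 111` and yields the symbols `[0, 1, -1, 0, 2]`.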

Fig. 5B shows a flowchart of a decoding process according to the first embodiment of the present invention. First, the spatial decoder receives (514) the entire bitstream, which includes a core codec bitstream and a spatial information bitstream. The core codec bitstream is then decoded (515), and a downmixed audio signal can be extracted (517) from the decoded core codec bitstream. The downmixed audio signal may include a mono or stereo audio signal. The spatial information bitstream is also decoded (516), the index information of the reference channel is read (518) from the decoded spatial information bitstream, the spatial information is calculated using the index information of the reference channel, and the physical quantity corresponding to the spatial information is calculated (519). The extracted downmix signal is then converted (520) into a multichannel audio signal using the physical quantity corresponding to the calculated spatial information.
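The calculation of spatial information from the reference channel index (steps 518 and 519) can be illustrated with a short sketch. It assumes, purely for illustration, that the encoder stored each non-reference channel as a difference from the reference channel's value, in channel order with the reference channel skipped. The function name `decode_subband` is hypothetical.

```python
def decode_subband(ref_index, ref_value, diffs):
    """Rebuild the per-channel spatial values of one subband.

    ref_index: index of the reference channel read from the bitstream.
    ref_value: spatial value of the reference channel.
    diffs:     remaining channels as offsets from ref_value (assumed format).
    """
    # Undo the differences, then put the reference value back in its slot.
    levels = [ref_value + d for d in diffs]
    levels.insert(ref_index, ref_value)
    return levels
```

For example, a subband transmitted as reference index 1, reference value 9 and offsets `[-6, -4]` is restored to the three channel values `[3, 9, 5]`. The index is what lets the decoder place the reference value correctly when the reference channel changes from subband to subband.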

Although the present invention has been described in detail with reference to several embodiments, those embodiments are presented to aid understanding of the invention, and the scope of the invention is not limited to them. Those skilled in the art will understand that various modifications are possible without departing from the technical idea of the present invention, and the scope of the invention should be interpreted according to the appended claims.

As described above, in coding a multichannel audio signal according to the present invention, a reference channel is set variably according to a preset criterion using the spatial information extracted for each subband, and the spatial information is represented using that variable reference channel. The entire bitstream can thereby be represented at a lower bit rate, which improves encoding, transmission, and decoding efficiency.

In addition, by comparing the case where spatial information is represented using the variable reference channel with the case where it is represented using a fixed channel, and selecting the method that can be represented at the lower bit rate, encoding, transmission, and decoding efficiency can be improved.

Claims

A method of encoding a multichannel audio signal,

(a) downmixing the multichannel audio signal and extracting spatial information of the multichannel audio signal for each subband;

(b) setting a reference channel for each subband using the spatial information for each subband, and re-extracting spatial information for the remaining channels using spatial information of the set reference channel; and

(c) constructing an entire bitstream by using the downmixed audio signal and the re-extracted spatial information.

The method of claim 1,

The spatial information includes a channel level difference (CLD), a channel time difference (CTD), an inter-channel coherence (ICC), or channel prediction coefficients (CPC).

The method of claim 1,

In step (b),

a channel whose spatial information has the maximum value or the median value among the spatial information extracted for each subband is set as the reference channel for that subband, and the spatial information of the remaining channels is re-extracted using the set reference channel.

The method of claim 1,

In step (b),

index information of the reference channel is incorporated into the bitstream.

The method of claim 4, wherein

the index information is included in the bitstream using either the same table as the table for the spatial information or a separate table.

The method of claim 1,

In step (b),

the re-extracted spatial information is expressed as a comparison value with adjacent spatial information.

A method of encoding a multichannel audio signal,

(b) setting a reference channel for each subband using the spatial information for each subband, and reconstructing spatial information of the remaining channels using spatial information of the set reference channel to form a bitstream;

(c) forming a bitstream from the spatial information for each subband using spatial information of a fixed reference channel; and

(d) comparing the bitstream formed in step (b) with the bitstream formed in step (c), and then selecting one of the two methods.

The method of claim 7, wherein

In step (d),

when the entire bitstream is composed using the bitstream of the selected method, identification information is included that distinguishes whether the bitstream was formed in step (b) or in step (c).

The method of claim 7, wherein

In step (d),

the method having the minimum bit rate is selected from among the bitstream formed in step (b) and the bitstream formed in step (c).

A method of decoding a multichannel audio signal,

(a) receiving a core codec bitstream for a downmixed audio signal and a spatial information bitstream; and

(b) decoding the core codec bitstream and the spatial information bitstream, and reading, from the decoded spatial information bitstream, index information of a reference channel variably set for each subband, spatial information of the reference channel, and spatial information of the remaining channels extracted using the spatial information of the reference channel.

The method of claim 10,

In step (b),

the downmix signal generated by decoding the core codec bitstream is converted into a multichannel audio signal using the read index information of the reference channel, the spatial information of the reference channel, and the spatial information of the remaining channels.

A method of decoding a multichannel audio signal,

(b) decoding the core codec bitstream, reading, from the spatial information bitstream, identification information that distinguishes the case where the bitstream was configured using a reference channel set variably for each subband from the case where it was configured using a fixed reference channel, and decoding the spatial information bitstream according to the read identification information.

The method of claim 12,

In step (b),

the index information of the reference channel, the spatial information of the reference channel, and the spatial information of the remaining channels are read from the decoded spatial information bitstream, and the downmix signal obtained by decoding the core codec bitstream is converted into a multichannel audio signal using the read index information of the reference channel, the spatial information of the reference channel, and the spatial information of the remaining channels.

In generating an audio signal,

The audio signal is generated to include a core codec bitstream and a spatial information bitstream,

For the spatial information bitstream, a reference channel is set for each subband using spatial information of a multichannel audio signal, spatial information of the remaining channels is re-extracted using spatial information of the set reference channel, and the entire bitstream is generated using the spatial information of the reference channel and the re-extracted spatial information of the remaining channels.

The method of claim 14,

The setting of the reference channel may include setting a channel having spatial information of a maximum value or a median value among the spatial information extracted for each subband as a reference channel.

In generating an audio signal,

The audio signal is generated by extracting spatial information of a multichannel audio signal for each subband, comparing a first method with a second method, and then selecting one of them, wherein the first method composes a bitstream by variably setting a reference channel for each subband using the spatial information for each subband and re-extracting spatial information of the remaining channels using spatial information of the set reference channel, and the second method composes a bitstream by extracting the spatial information of the remaining channels using spatial information of a fixed reference channel.

The method of claim 17,

The bitstream is generated to include identification information that distinguishes the case where the bitstream is configured using a reference channel set variably for each subband from the case where it is configured using the fixed reference channel.