KR20060109299A

KR20060109299A - Method for encoding-decoding subband spatial cues of multi-channel audio signal

Info

Publication number: KR20060109299A
Application number: KR1020060013754A
Authority: KR
Inventors: 방희석; 김동수; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2005-04-14
Filing date: 2006-02-13
Publication date: 2006-10-19
Also published as: KR20060109298A; KR20060109296A; KR20060109297A

Abstract

A method of encoding and decoding per-subband spatial information of a multi-channel audio signal is provided. The spatial information of the audio signal is formed from per-channel, per-subband, and per-frame differential values to improve the encoding, transmission, and decoding efficiency of the multi-channel audio signal. A multi-channel audio signal is downmixed (310), and first spatial information is extracted from the multi-channel audio signal per subband using a reference channel (311). Second spatial information is generated as per-channel, per-subband, and per-frame differential values of the first spatial information (313). A core codec bitstream is generated using the downmixed audio signal (312), and a spatial information bitstream is generated using the second spatial information (314).

Description

Method for encoding and decoding subband spatial information of a multi-channel audio signal {METHOD FOR ENCODING-DECODING SUBBAND SPATIAL CUES OF MULTI-CHANNEL AUDIO SIGNAL}

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating how a human perceives spatial information of an audio signal in the present invention.

FIG. 2 is a diagram of a method of coding a multi-channel audio signal using a spatial encoder and decoder in the present invention.

FIG. 3A is a detailed diagram of a spatial encoder according to a first embodiment of the present invention.

FIG. 3B is a flowchart of an encoding process according to the first embodiment of the present invention.

FIG. 4A is a detailed diagram of a spatial encoder according to a second embodiment of the present invention.

FIG. 4B is a flowchart of an encoding process according to the second embodiment of the present invention.

FIG. 5A is a detailed diagram of a spatial decoder according to the first embodiment of the present invention.

FIG. 5B is a flowchart of a decoding process according to the first embodiment of the present invention.

* Explanation of symbols for main parts of the drawings *

101. Distant sound source
102. Direct sound wave
104. Reflected sound wave
201. Multi-channel audio signal
202. Downmix unit
203. Spatial parameter extraction unit
204. Spatial encoder
205. Artistic downmix audio signal
206. Mono or stereo audio signal
207. Spatial parameters
208. Spatial decoder
302. Filter bank
303. Reference channel selection and spatial information calculation unit
304. Comparison value generation unit
305. Grouping generation unit
306. Minimum bit-rate decision unit
410. Comparison and selection unit
502. Spatial information detection unit
502. Differential spatial information detection unit
503. Grouped spatial information detection unit
506. Spatial information calculation unit

The present invention relates to a method of encoding and decoding spatial information of a multi-channel audio signal and, more particularly, to a method of efficiently constructing a bitstream in multi-channel audio coding by generating per-subband spatial information as differential values or through grouping.

Recently, various coding techniques and methods for digital audio signals have been developed, and related products are being produced. Coding methods for multi-channel audio signals that use a psychoacoustic model have also been developed, and their standardization is in progress.

The psychoacoustic model exploits the way humans perceive sound, for example the facts that a quiet sound following a loud sound is not heard and that only sounds at frequencies between 20 Hz and 20,000 Hz are audible. By removing the parts of the audio signal that are unnecessary during coding, it effectively reduces the amount of data required.

Audio standard technologies such as MPEG-1 Audio (MPEG-1 Layer III), MPEG-4 AAC (Advanced Audio Coding), and MPEG-4 HE-AAC (High-Efficiency AAC) have been developed and commercialized. Coding methods for multi-channel audio signals that use spatial information are also being developed. Such a method greatly improves the transmission efficiency of a multi-channel audio signal by using a compressed audio signal (for example, a stereo or mono audio signal) together with a low-bit-rate side information channel (for example, spatial information).

However, in constructing the bitstream of the multi-channel audio signal, the spatial information has conventionally been quantized per subband and then encoded and decoded. Because the conventional method does not take the characteristics of the spatial information of the audio signal into account at all, coding the spatial information requires a high bit rate, and the bitstream generated from the spatial information is constructed inefficiently.

Accordingly, the present invention, proposed to solve the above problem, has the object of providing encoding and decoding methods that improve the encoding, transmission, and decoding efficiency of a multi-channel audio signal. In coding the multi-channel audio signal, the methods take the characteristics of the spatial information of the audio signal into account and construct a spatial information bitstream by generating the spatial information as per-channel, per-subband, and per-frame differential values, or by grouping it by channel, subband, and frame.

To achieve the above object, the present invention provides a method of encoding a multi-channel audio signal, comprising: downmixing the multi-channel audio signal and extracting first spatial information per subband from the multi-channel audio signal using a reference channel; generating second spatial information as per-channel differential values, per-subband differential values, or per-frame differential values of the first spatial information; and forming a core codec bitstream using the downmixed audio signal and forming a spatial information bitstream using the second spatial information. Here, the second spatial information may be generated by combining two or more of the per-channel, per-subband, and per-frame differentials of the first spatial information. The spatial information may include ICLD (Inter-Channel Level Difference), ICTD (Inter-Channel Time Difference), or ICC (Inter-Channel Coherence), and the per-channel differential may take as its reference channel the channel with the highest energy, a channel with intermediate energy, or the channel with the lowest energy.

To achieve the above object, the present invention also provides a method of encoding a multi-channel audio signal, comprising: downmixing the multi-channel audio signal and extracting first spatial information per subband from the multi-channel audio signal using a reference channel; generating second spatial information by grouping the first spatial information by channel, subband, or frame; and forming a core codec bitstream using the downmixed audio signal and forming a spatial information bitstream using the spatial information generated as the second comparison value.

To achieve the above object, the present invention also provides a method of encoding a multi-channel audio signal, comprising: (a) downmixing the multi-channel audio signal and extracting first spatial information per subband from the multi-channel audio signal using a reference channel; (b) generating second spatial information as per-channel differential values, per-subband differential values, or per-frame differential values of the first spatial information; (c) generating second spatial information by grouping the first spatial information by channel, subband, or frame; (d) generating a spatial information bitstream by selecting whichever of the second spatial information generated in step (b) and the second spatial information generated in step (c) has the lower bit rate; (e) generating a spatial information bitstream using the first spatial information; and (f) selecting one of the spatial information bitstream generated in step (d) and the spatial information bitstream generated in step (e). Here, when the entire bitstream is constructed using the spatial information bitstream generated by the selected method, the entire bitstream may include identification information that indicates whether the spatial information bitstream was formed by step (d) or by step (e).

To achieve the above object, the present invention also provides a method of decoding a multi-channel audio signal, comprising: (a) receiving a core codec bitstream and a spatial information bitstream for a downmixed audio signal; (b) reading, from the spatial information bitstream, channel index information and second spatial information that was generated as per-channel differential values, per-subband differential values, or per-frame differential values of first spatial information generated per subband using a reference channel; and (c) decoding the core codec bitstream, and decoding the spatial information bitstream using the spatial information and channel index information read in step (b). Here, step (b) may further include reading spatial information and channel index information generated by grouping the second spatial information by channel, subband, or frame. In addition, the downmix audio signal obtained by decoding the core codec bitstream may be converted into a multi-channel audio signal using the spatial information obtained from the decoded spatial information bitstream.

To achieve the above object, the present invention also provides a method of decoding a multi-channel audio signal, comprising: (a) receiving a core codec bitstream and a spatial information bitstream for a downmixed audio signal; (b) reading, from the spatial information bitstream, spatial information generated by grouping by channel, subband, or frame, together with channel index information; and (c) decoding the core codec bitstream, and decoding the spatial information bitstream using the spatial information and channel index information read in step (b).
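The decoding claims above amount to undoing whichever transform the encoder applied. The following Python sketch illustrates this for one row of quantized values. The mode names (`"raw"`, `"differential"`, `"grouped"`), the payload layouts, and the reading of grouping as (value, run length) pairs are assumptions made for illustration; the patent does not specify a bitstream syntax.

```python
def undelta(diffs):
    """Rebuild values from a first value followed by successive differences."""
    out, acc = [], 0
    for d in diffs:
        acc += d
        out.append(acc)
    return out


def decode_band_values(mode, payload):
    """Recover one row of quantized spatial values.

    mode is a hypothetical identification flag read from the bitstream:
      "raw"          payload is the values themselves
      "differential" payload is [first value, diff, diff, ...]
      "grouped"      payload is [(value, run_length), ...]  (assumed layout)
    """
    if mode == "raw":
        return list(payload)
    if mode == "differential":
        return undelta(payload)
    if mode == "grouped":
        return [value for value, run in payload for _ in range(run)]
    raise ValueError(f"unknown mode: {mode}")
```

A real decoder would first entropy-decode the payload and would apply the same idea along the channel and frame axes as well.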

To achieve the above object, the present invention also provides a method of generating an audio signal, wherein the audio signal is generated to include a core codec bitstream and a spatial information bitstream, and the spatial information bitstream is generated by extracting first spatial information per subband from a multi-channel audio signal using a reference channel and representing second spatial information as per-channel differential values, per-subband differential values, or per-frame differential values of the extracted first spatial information.

To achieve the above object, the present invention also provides a method of generating an audio signal, wherein the audio signal is generated to include a core codec bitstream and a spatial information bitstream, and the spatial information bitstream is generated by extracting first spatial information per subband from a multi-channel audio signal using a reference channel and representing second spatial information by grouping the extracted first spatial information by channel, subband, or frame.

To achieve the above object, the present invention also provides a method of generating an audio signal in which first spatial information is extracted per subband from a multi-channel audio signal using a reference channel and one of two methods is selected after comparing them. The first method generates the spatial information bitstream from whichever has the lower bit rate: second spatial information generated as per-channel, per-subband, or per-frame differential values of the first spatial information, or second spatial information generated by grouping the first spatial information by channel, subband, or frame. The second method generates the spatial information bitstream directly from the first spatial information.

Hereinafter, preferred embodiments of the present invention that can concretely realize the above object are described with reference to the accompanying drawings.

FIG. 1 illustrates how a human perceives spatial information of an audio signal in the present invention. Coding of a multi-channel audio signal relies on the fact that humans perceive audio signals in three-dimensional space, so the audio signal can be represented as three-dimensional spatial information through a plurality of parameter sets. These parameters, called "spatial parameters" because they represent the spatial information of a multi-channel audio signal, include ICLD (Inter-Channel Level Difference), ICC (Inter-Channel Coherence), and ICTD (Inter-Channel Time Difference). ICLD denotes the energy difference between two channels, ICC denotes the correlation between two channels, and ICTD denotes the time difference between two channels.
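As a rough illustration of the three parameters just defined, the sketch below computes them for two short sample sequences. The formulas used (energy ratio in dB for ICLD, peak normalized cross-correlation for ICC, and the lag of that peak for ICTD) are common textbook choices assumed here; the patent does not give formulas, and the function names are hypothetical.

```python
import math


def icld_db(x, y, eps=1e-12):
    """Level difference in dB: energy of x relative to energy of y."""
    ex = sum(v * v for v in x)
    ey = sum(v * v for v in y)
    return 10.0 * math.log10((ex + eps) / (ey + eps))


def _xcorr(x, y, lag):
    """Sum of x[n] * y[n + lag] over the samples where both exist."""
    n = len(x)
    return sum(x[i] * y[i + lag] for i in range(n) if 0 <= i + lag < n)


def icc_and_ictd(x, y, max_lag):
    """Return (coherence, lag): peak normalized cross-correlation and its lag in samples."""
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0.0, 0
    best_lag = max(range(-max_lag, max_lag + 1), key=lambda d: abs(_xcorr(x, y, d)))
    return abs(_xcorr(x, y, best_lag)) / norm, best_lag
```

For example, if `y` is `x` delayed by one sample and halved in amplitude, `icld_db(x, y)` is about 6 dB and `icc_and_ictd` reports a coherence of 1.0 at a lag of one sample. In a codec these quantities would be computed per subband and per frame rather than on the full-band signal.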

FIG. 1 shows how a human perceives an audio signal spatially and how the concept of the spatial parameters arises. A direct sound wave (103) from a distant sound source (105) reaches the left ear (107), and another direct sound wave (102) is diffracted around the head and reaches the right ear (106). The two sound waves (102 and 103) differ in arrival time and energy level, and these differences give rise to the CLD, CPC, and CTD parameters.

In addition, if reflected sound waves (104 and 105) reach both ears, or if the sound source (105) is diffuse, mutually uncorrelated sound waves reach both ears, which gives rise to the ICC parameter. Spatial parameters generated on this principle are known to enable a large reduction in the number of bits when a multi-channel audio signal is transmitted as a mono or stereo signal and then output again as multiple channels. The present invention proposes a method of forming a bitstream by generating the spatial parameters as per-channel, per-subband, and per-frame comparison values, or through grouping.

FIG. 2 illustrates the principle of coding a multi-channel audio signal using a spatial encoder and decoder in the present invention. As shown, the spatial encoder (204) first receives a multi-channel audio signal (201), where N denotes the number of input channels. The multi-channel audio signal (201) is downmixed in the downmix unit (202) into a downmix signal (206).
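A minimal sketch of the downmix step follows, assuming the simplest possible rule: an equal-weight average of the N input channels into one mono channel. The patent does not state the downmix weights, so both the rule and the function name are illustrative assumptions.

```python
def downmix_to_mono(channels):
    """Average N equally long channels, sample by sample, into one mono channel.

    channels[ch] is the list of samples for channel ch.
    Equal weights are an assumption; practical downmixers often weight
    centre and surround channels differently.
    """
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]
```

A stereo downmix would follow the same pattern with two output rows and a weight matrix in place of the single average.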

Spatial information of the multi-channel audio signal, that is, the spatial parameters, is extracted from the multi-channel audio signal (201) by the spatial parameter extraction unit (203). Here, spatial information means the information about the audio signal channels that is used when a multi-channel (for example, Left, Right, Center, Left surround, Right surround) audio signal is downmixed, the downmix signal (206) is transmitted, and the transmitted downmix signal is upmixed back to multiple channels. Optionally, the downmix signal (206) may be generated using a downmix signal provided directly from outside, for example an artistic downmix signal (205).

The downmix signal (206) is encoded and compressed using a core codec and then transmitted, and the spatial information, that is, the spatial parameters (207), is transmitted along with it. The core codec is the codec that codes the audio signal itself rather than the spatial information (207). It may be MP3, AC-3, DTS, or AAC, and more generally it may be any codec that performs a codec function on audio signals, whether already developed or developed in the future. If the user's system can output only the downmix signal (206), the compressed and transmitted downmix signal (206) may be decoded and output directly (209). If the system can output a multi-channel audio signal, the compressed and transmitted audio signal is decoded and then converted by the spatial decoder (208) into a multi-channel audio signal (210), using the spatial information (207) transmitted with it, and output.

Instead of transmitting the multi-channel audio signal directly, downmixing it into the downmix signal (206) for transmission and transmitting the spatial information (207) of the multi-channel audio signal with it is highly advantageous for compression and transmission efficiency. In the present invention, when the spatial parameters (207) are transmitted, adaptive quantization is applied to the spatial parameters (207) extracted per subband so that the bit stream is constructed more efficiently, which improves compression and transmission efficiency.

FIG. 3A shows the spatial encoder according to the first embodiment of the present invention in detail. In the drawing, a thick arrow indicates that a plurality of signals is carried. As shown, the multi-channel audio signal (301) passes through the filter bank (302) and is divided into subbands. The filter bank (302) divides the audio signal spanning all frequency bands into subbands; a subband filter bank or a QMF filter bank, for example, may be used. Spatial information is extracted per subband from the audio signal that has passed through the filter bank, and the reference channel selection and spatial information calculation unit (303) may set the reference channel variably using that per-subband spatial information. The unit (303) may use either the variably set reference channel or a fixed reference channel. The spatial information may include ICLD, ICTD, or ICC. The reference channel may be set per subband or may be the same for all subbands. The reference channel selection and spatial information calculation unit (303) then generates first spatial information per subband using the reference channel.
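The reference channel selection described above can be sketched as follows. The example assumes the filter bank output is already available as `subband_signals[ch][band]` (a list of samples per channel and subband) and that the selection criterion is energy rank, which is one of the options the text names for the per-channel differential. The function names and the `rule` strings are hypothetical.

```python
def band_energies(subband_signals):
    """subband_signals[ch][band] is a list of samples; returns energies[ch][band]."""
    return [[sum(v * v for v in band) for band in ch] for ch in subband_signals]


def select_reference(energies, band, rule="max"):
    """Pick the reference channel index for one subband by energy rank.

    rule: "max" (highest energy), "min" (lowest), or "median" (middle rank).
    """
    order = sorted(range(len(energies)), key=lambda ch: energies[ch][band])
    if rule == "max":
        return order[-1]
    if rule == "min":
        return order[0]
    if rule == "median":
        return order[len(order) // 2]
    raise ValueError(f"unknown rule: {rule}")
```

Calling `select_reference` once per subband gives a variable reference channel; calling it for a single band and reusing the result everywhere corresponds to one reference channel shared by all subbands.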

Next, the comparison value generation unit (304) may generate second spatial information as per-channel differential values, per-subband differential values, or per-frame differential values of the first spatial information. The per-channel differential may be generated relative to the channel with the highest energy, relative to a channel with intermediate energy, or relative to the channel with the lowest energy. The second spatial information may be obtained by applying the per-subband differential to the per-channel differential values, by applying the per-frame differential to the per-channel differential values, or by applying both the per-subband and per-frame differentials to the per-channel differential values. The grouping generation unit (305) may generate second spatial information by grouping the first spatial information by channel, subband, or frame. The grouping generation unit (305) may also regenerate the second spatial information by grouping, by channel, subband, or frame, the second spatial information that the comparison value generation unit (304) generated as differential values.
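The three differentials, and the way they can be stacked, can be sketched on a small table of quantized values `frame[ch][band]`. This is an illustrative reading of the paragraph above rather than the patent's exact procedure; the helper names and the choice to keep the reference row and the first value of each row unchanged are assumptions.

```python
def delta(seq):
    """Keep the first value, then store each value minus its predecessor."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]


def diff_by_subband(frame):
    """frame[ch][band] -> differences between neighbouring subbands within each channel."""
    return [delta(row) for row in frame]


def diff_by_channel(frame, ref=0):
    """Subtract the reference channel from every other channel; the reference row is kept."""
    return [row[:] if ch == ref else [v - r for v, r in zip(row, frame[ref])]
            for ch, row in enumerate(frame)]


def diff_by_frame(prev_frame, frame):
    """Subtract the previous frame's value for the same channel and subband."""
    return [[v - p for v, p in zip(row, prow)] for row, prow in zip(frame, prev_frame)]
```

With `frame = [[5, 6, 6, 7], [3, 4, 4, 5]]`, the call `diff_by_subband(diff_by_channel(frame))` yields `[[5, 1, 0, 1], [-2, 0, 0, 0]]`: stacking the per-subband differential on the per-channel one concentrates the values around zero, which is what makes them cheaper to entropy-code.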

Next, the minimum bit-rate decision unit (306) selects whichever has the lowest bit rate among the second spatial information generated as differential values, the second spatial information generated by grouping, and the first spatial information. The selected spatial information is encoded by the encoding unit (307) to form a spatial information bitstream (308). Entropy encoding, for example Huffman encoding, may be applied in the encoding unit (307). The spatial information bitstream (308), together with the core codec bitstream generated from the downmix audio signal obtained by downmixing the multi-channel audio signal, forms the entire bitstream. The spatial information bitstream is the bitstream for the spatial information, that is, the spatial parameters, extracted from the multi-channel audio signal, and may consist of a configuration bitstream and a spatial data bitstream for the spatial information. The core codec bitstream is the bitstream formed from the audio signal, excluding the spatial information.
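The minimum bit-rate decision can be sketched as a comparison of estimated coding costs for the three candidates. The sketch below makes several assumptions not found in the patent: cost is approximated by the empirical zeroth-order entropy of the symbols (a stand-in for the output size of a Huffman coder, ignoring code-table overhead), grouping is modelled as (value, run length) pairs, and the mode names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Approximate entropy-coder cost in bits (empirical zeroth-order entropy)."""
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in Counter(symbols).values())


def delta(seq):
    """Keep the first value, then store each value minus its predecessor."""
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]


def group_runs(seq):
    """Group neighbouring equal values into (value, run_length) pairs (assumed grouping)."""
    runs = []
    for v in seq:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]


def choose_mode(values):
    """Return (mode, payload, costs) for the cheapest of the three candidates."""
    candidates = {
        "raw": list(values),
        "differential": delta(values),
        "grouped": group_runs(values),
    }
    costs = {mode: entropy_bits(syms) for mode, syms in candidates.items()}
    best = min(costs, key=costs.get)
    return best, candidates[best], costs
```

On a steadily rising row such as `[3, 4, 5, 6, 7, 8, 9, 10]` the differential candidate wins, while on a row with long constant stretches such as `[2, 2, 2, 2, 7, 7, 7, 7]` the grouped candidate wins. The chosen mode would be written to the bitstream as identification information so that the decoder can undo it.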

Taking ICLD as an example of the spatial information, the energy of each channel is first calculated per subband from the audio signal divided into subbands by the filter bank (302). The reference channel selection and spatial information calculation unit (303) may then use the per-channel energies to select a variable reference channel per subband according to a preset criterion. The unit (303) may use either the variable reference channel or a fixed reference channel, and may generate the inter-channel level difference (ICLD) using the reference channel. Next, the comparison value generation unit (304) generates the inter-channel energy information and channel index information expressed in this way as per-channel, per-band, or per-frame differential values. The grouping generation unit (305) may group the ICLD generated using the reference channel by channel, subband, or frame. The minimum bit-rate decision unit (306) may then compare the ICLD generated as differential values, the ICLD generated by grouping, and the ICLD generated using the reference channel, and select the one with the lowest bit rate. The selected ICLD information is entropy-coded, for example by Huffman coding, either together with the channel index information or separately, to form the spatial information bitstream, which together with the core codec bitstream formed from the downmix audio signal may make up the entire bitstream.
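The first half of the ICLD walk-through (per-subband energies, reference channel, level difference) can be sketched as below. The uniform quantizer step of 1.5 dB, the `eps` guard, and the function name are assumptions for illustration; the patent does not specify a quantizer.

```python
import math


def icld_vs_reference(energies, rule=max, step_db=1.5, eps=1e-12):
    """energies[ch][band] -> (reference index per band, quantized ICLD indices [ch][band]).

    rule picks the reference energy in each subband (max or min of the column).
    step_db is an assumed uniform quantizer step.
    """
    n_ch, n_band = len(energies), len(energies[0])
    refs = []
    q = [[0] * n_band for _ in range(n_ch)]
    for b in range(n_band):
        col = [energies[ch][b] for ch in range(n_ch)]
        ref = col.index(rule(col))  # channel index information for this subband
        refs.append(ref)
        for ch in range(n_ch):
            level_db = 10.0 * math.log10((col[ch] + eps) / (col[ref] + eps))
            q[ch][b] = round(level_db / step_db)
    return refs, q
```

The returned `refs` list plays the role of the channel index information, and each row of `q` is the kind of integer sequence that the differential, grouping, and minimum bit-rate stages would then operate on.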

Taking the ICTD or ICC as an example of the spatial information, the ICTD or ICC is first calculated for each subband from the audio signal that the filter bank 302 has divided into subbands. The reference channel selection and spatial information calculation unit 303 may then use the ICTD or ICC to select a variable reference channel for each subband according to a preset criterion. The unit 303 may use either this variable reference channel or a fixed reference channel, and may recalculate the ICTD or ICC using the reference channel. Next, the comparison value generator 304 converts the ICTD or ICC expressed in this way, together with the channel index information, into per-channel, per-subband, or per-frame differential values. In addition, the grouping generator 305 may group the ICTD or ICC generated using the reference channel by channel, subband, or frame. The minimum bit rate determination unit 306 may then compare the ICTD or ICC generated as differential values, the ICTD or ICC generated by grouping, and the ICTD or ICC generated using the reference channel, and select the representation with the lowest bit rate. The ICTD or ICC is then entropy coded, for example by Huffman coding, either together with or separately from the channel index information to form the spatial information bitstream, which is combined with the core codec bitstream carrying the downmixed signal to form the entire bitstream.
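The text does not define how the ICTD and ICC are computed. A common formulation, used here purely as an assumption, takes the ICTD as the lag that maximizes the magnitude of the cross-correlation between a channel and the reference channel, and the ICC as that maximum normalized by the signal energies. The helper names below are hypothetical.

```python
import math

def cross_corr(x, y, lag):
    """Sum of x[n] * y[n + lag] over the overlapping samples (equal lengths assumed)."""
    n = len(x)
    return sum(x[i] * y[i + lag] for i in range(max(0, -lag), min(n, n - lag)))

def ictd_and_icc(x, y, max_lag):
    """Return (lag in samples, normalized correlation) for one subband signal pair.

    A positive lag means y is a delayed copy of x (sign convention assumed here).
    """
    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0, 0.0  # at least one signal is silent
    best_lag = max(range(-max_lag, max_lag + 1),
                   key=lambda d: abs(cross_corr(x, y, d)))
    return best_lag, abs(cross_corr(x, y, best_lag)) / norm

reference = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]  # reference shifted by 2 samples
lag, icc = ictd_and_icc(reference, delayed, max_lag=3)
```

In this example the second signal is an exact delayed copy of the first, so the sketch reports a lag of 2 samples and a coherence of 1.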

Fig. 3B is a flowchart of the encoding process according to the first embodiment of the present invention. To represent the spatial information more efficiently, the multichannel audio signal (309) is first downmixed (310), and a core codec bitstream is generated from the downmixed audio signal (312). The downmixed audio signal may be a mono or stereo signal. Next, first spatial information, for example spatial parameters, is extracted for each subband from the multichannel audio signal (309) using a reference channel (311), and second spatial information is generated from per-channel, per-subband, or per-frame differential values of the first spatial information (313). In addition, the spatial information may be generated by grouping the first spatial information extracted for each subband by channel, subband, or frame (313). A spatial information bitstream is then generated using the second spatial information (314), and the entire bitstream, consisting of the core codec bitstream and the spatial information bitstream, is transmitted (315).
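The three differential directions named in step 313 can be sketched as follows. The exact differencing rule is not given in the text, so this sketch assumes quantized integer parameters, keeps the first element as an absolute value, and codes each following element relative to its neighbor; all function names are hypothetical.

```python
def diff_along_subbands(values):
    """values[b]: quantized parameter of subband b for one channel and one frame."""
    return [values[0]] + [values[b] - values[b - 1] for b in range(1, len(values))]

def diff_along_frames(prev_frame, cur_frame):
    """Difference of each subband value against the same subband in the previous frame."""
    return [cur - prev for prev, cur in zip(prev_frame, cur_frame)]

def diff_along_channels(per_channel):
    """per_channel[c][b]: the first channel is kept, the others are coded against the previous channel."""
    out = [list(per_channel[0])]
    for c in range(1, len(per_channel)):
        out.append([a - b for a, b in zip(per_channel[c], per_channel[c - 1])])
    return out

frame0 = [-3, -3, -2, -2, -1]
frame1 = [-3, -3, -2, -1, -1]
by_subband = diff_along_subbands(frame1)        # [-3, 0, 1, 1, 0]
by_frame = diff_along_frames(frame0, frame1)    # [0, 0, 0, 1, 0]
by_channel = diff_along_channels([[0, 0, 0], [-2, -3, -3], [-2, -4, -3]])
```

Because spatial parameters tend to change slowly across neighboring subbands and frames, the differential values cluster around zero, which is what makes a subsequent entropy code shorter.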

Fig. 4A is a detailed diagram of a spatial encoder according to the second embodiment of the present invention. As shown, the multichannel audio signal 401 passes through the filter bank 402 and is divided into subbands. Spatial information is extracted for each subband from the filtered audio signal, and the reference channel selection and spatial information calculation unit 403 may use this spatial information to set a variable reference channel for each subband. The unit 403 may use either the variable reference channel or a fixed reference channel, and generates first spatial information using whichever reference channel is used. Next, the comparison value generator 404 may generate second spatial information from per-channel, per-subband, or per-frame differential values of the first spatial information. In addition, the comparison value generator 404 may generate second spatial information by grouping the first spatial information by channel, subband, or frame.
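The text does not state the rule by which values are grouped. One plausible reading, shown here strictly as an assumption, is that adjacent subbands carrying the same quantized value are merged into a single (value, count) group so that the value is sent once. The names `group_runs` and `ungroup_runs` are hypothetical.

```python
def group_runs(values):
    """Merge runs of equal adjacent values into (value, run_length) pairs."""
    groups = []
    for v in values:
        if groups and groups[-1][0] == v:
            groups[-1][1] += 1  # extend the current run
        else:
            groups.append([v, 1])  # start a new run
    return [tuple(g) for g in groups]

def ungroup_runs(groups):
    """Inverse of group_runs: expand each (value, run_length) pair."""
    return [v for v, n in groups for _ in range(n)]

first_spatial = [-3, -3, -2, -2, -2, 0]
second_spatial = group_runs(first_spatial)  # [(-3, 2), (-2, 3), (0, 1)]
restored = ungroup_runs(second_spatial)
```

The same idea applies along the channel or frame axis; grouping pays off only when runs are long enough to offset the cost of sending the run lengths.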

Next, the minimum bit rate determination unit 406 selects whichever has the lower bit rate: the second spatial information generated as differential values or the second spatial information generated by grouping. The selected second spatial information is encoded by the encoding unit 408 to generate a spatial information bitstream. In addition, the first spatial information generated for each subband may be encoded by the encoding unit 409 to generate a spatial information bitstream. The encoding unit 408 and the encoding unit 409 may be the same unit or different units. The comparison selection unit 410 then compares the spatial information bitstream generated by encoding the second spatial information with the spatial information bitstream generated by encoding the first spatial information, and selects one of them. Here, the comparison selection unit 410 may select the spatial information bitstream with the lower bit rate.
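The two-stage selection performed by units 406 and 410 can be sketched as below. Everything specific is an assumption made for illustration: `code_bits` is a hypothetical stand-in for a real Huffman table (small magnitudes cost fewer bits), run lengths are costed with the same function, and ties are resolved in favor of the first spatial information.

```python
def code_bits(symbol):
    """Hypothetical code length in bits; a real encoder would use its Huffman table."""
    return 2 * abs(symbol) + 1

def total_bits(symbols):
    return sum(code_bits(s) for s in symbols)

def differential(values):
    """First value absolute, the rest relative to the previous subband."""
    return [values[0]] + [values[i] - values[i - 1] for i in range(1, len(values))]

def grouped(values):
    """Flattened [value, run_length, value, run_length, ...] representation."""
    out = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1
        else:
            out += [v, 1]
    return out

def select_representation(first):
    second = {"differential": differential(first), "grouped": grouped(first)}
    # Stage 1 (unit 406): cheaper of the two second-spatial-information forms.
    name = min(second, key=lambda k: total_bits(second[k]))
    # Stage 2 (unit 410): compare against coding the first spatial information directly.
    if total_bits(first) <= total_bits(second[name]):
        return "first", first
    return name, second[name]

smooth = select_representation([-3, -3, -2, -1, -1])
jumpy = select_representation([0, 5, -5, 5])
```

For the slowly varying input the differential form wins (15 bits against 25 under the assumed code), while for the rapidly alternating input the first spatial information is cheapest, which is the situation the second embodiment is designed to catch.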

Fig. 4B is a flowchart of the encoding process according to the second embodiment of the present invention. To represent the spatial information more efficiently, the multichannel audio signal (412) is first downmixed (413), and a core codec bitstream is generated from the downmixed audio signal (415). The downmixed audio signal may be a mono or stereo signal. Next, first spatial information is extracted for each subband from the multichannel audio signal (412) using a reference channel (414), and a spatial information bitstream is generated using the first spatial information (416). In addition, second spatial information is generated from per-channel, per-subband, or per-frame differential values of the first spatial information, and a spatial information bitstream is generated using the second spatial information (417). The spatial information bitstream generated using the first spatial information is then compared with the spatial information bitstream generated using the second spatial information (418). If the bitstream generated using the first spatial information has the lower bit rate, that bitstream is selected (419); if the bitstream generated using the second spatial information has the lower bit rate, that bitstream is selected (420). Finally, the entire bitstream, comprising the spatial information bitstream formed in the selected manner and the core codec bitstream, is transmitted (421).
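Steps 418 to 421 can be illustrated with a minimal framing sketch. The byte layout is entirely hypothetical (one mode byte, a two-byte big-endian length, the spatial payload, then the core codec payload); the text requires only that the shorter spatial information bitstream be chosen and, per the claims, that identification information record which one was chosen.

```python
MODE_FIRST = 0   # hypothetical flag value: bitstream built from first spatial information
MODE_SECOND = 1  # hypothetical flag value: bitstream built from second spatial information

def build_frame(core_payload: bytes, first_bits: bytes, second_bits: bytes) -> bytes:
    """Select the shorter spatial bitstream and prepend identification information."""
    if len(first_bits) <= len(second_bits):  # tie goes to the first (assumption)
        mode, spatial = MODE_FIRST, first_bits
    else:
        mode, spatial = MODE_SECOND, second_bits
    header = bytes([mode]) + len(spatial).to_bytes(2, "big")
    return header + spatial + core_payload

frame = build_frame(b"CORE", first_bits=b"\x01\x02\x03", second_bits=b"\x09")
```

Here the second spatial information needs one byte against three, so the frame starts with mode 1 and a length of 1, followed by the selected payload and the core codec data.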

Fig. 5A shows the spatial decoder according to the first embodiment of the present invention in detail. As shown, the spatial decoder receives the entire bitstream 501, which includes a core codec bitstream and a spatial information bitstream. The received bitstream may be passed to a plurality of spatial information detectors, which may include, for example, a spatial information detector 502, a differential spatial information detector 503, and a grouped spatial information detector 504. These detectors read flag information indicating how the spatial information was generated in the encoder. If the spatial information was generated for each subband, it is read by the spatial information detector 502. If the spatial information was generated as differential values of the per-subband spatial information, it is read by the differential spatial information detector 503. If the spatial information was generated by grouping either the per-subband spatial information or the spatial information generated as differential values, it is read by the grouped spatial information detector 504.
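The flag-driven choice among the three detectors can be sketched as a dispatch table. The flag values 0, 1, and 2, the symbol layouts, and the reader names are hypothetical; the text specifies only that the flag tells the decoder which generation method the encoder used.

```python
from itertools import accumulate

def read_plain(symbols):
    """Per-subband spatial information sent as is (detector 502)."""
    return list(symbols)

def read_differential(symbols):
    """First value absolute, the rest differences to the previous subband (detector 503)."""
    return list(accumulate(symbols))

def read_grouped(symbols):
    """(value, run_length) pairs expanded back to one value per subband (detector 504)."""
    return [v for v, n in symbols for _ in range(n)]

READERS = {0: read_plain, 1: read_differential, 2: read_grouped}

def read_spatial_info(flag, symbols):
    try:
        reader = READERS[flag]
    except KeyError:
        raise ValueError(f"unknown spatial information flag: {flag}") from None
    return reader(symbols)

expected = [-3, -3, -2, -1, -1]
plain = read_spatial_info(0, [-3, -3, -2, -1, -1])
diff = read_spatial_info(1, [-3, 0, 1, 1, 0])
runs = read_spatial_info(2, [(-3, 2), (-2, 1), (-1, 2)])
```

All three calls recover the same per-subband values, showing that the flag changes only how the payload is interpreted, not what it represents.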

The decoding unit 505 then decodes the spatial information bitstream using the spatial information that was read, and the spatial information calculation unit 506 uses the channel index information and the spatial information obtained from the decoding to calculate the actual physical quantities corresponding to the spatial information. The channel index information may include information about the reference channel. Using these physical quantities, the multi-channel generator 507 may convert the downmix audio signal (509), obtained by decoding the core codec bitstream, into a multichannel audio signal (507). If the system supports only mono or stereo audio signals, the downmix audio signal (509) may be output directly.

Fig. 5B is a flowchart of the decoding process according to the first embodiment of the present invention. The spatial decoder first receives the entire bitstream, which includes a core codec bitstream and a spatial information bitstream (509). The core codec bitstream is then decoded (510), and the downmixed audio signal may be extracted from the decoded core codec bitstream (512). The downmixed audio signal may be a mono or stereo audio signal. In addition, the spatial information is read from the spatial information bitstream according to the way it was generated in the encoder (511): for example, as per-subband spatial information, as spatial information generated from differential values of the per-subband spatial information, or as spatial information generated by grouping the per-subband spatial information. The spatial information bitstream is decoded using the spatial information that was read (513), and the physical quantities corresponding to the spatial information are calculated using the channel index information and the spatial information obtained by the decoding (514). The downmix signal is then converted into a multichannel audio signal using the calculated physical quantities (515).
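Steps 514 and 515 can be illustrated for the ICLD case with a minimal sketch. The quantizer step `STEP_DB`, the energy-preserving gain normalization, and the function names are assumptions for illustration; the text says only that physical quantities are computed from the decoded spatial information and used to convert the downmix into a multichannel signal.

```python
import math

STEP_DB = 1.5  # hypothetical quantizer step: one index unit equals 1.5 dB

def channel_gains(icld_indices):
    """Turn decoded ICLD indices (relative to the reference channel) into gains.

    Gains are normalized so that their squares sum to 1, which keeps the
    total energy of the upmixed channels equal to the downmix energy.
    """
    icld_db = [i * STEP_DB for i in icld_indices]    # physical quantity in dB
    amp = [10.0 ** (d / 20.0) for d in icld_db]      # dB to amplitude ratio
    norm = math.sqrt(sum(a * a for a in amp))
    return [a / norm for a in amp]

def upmix_subband(downmix_samples, icld_indices):
    """Scale one subband of a mono downmix into one signal per output channel."""
    gains = channel_gains(icld_indices)
    return [[g * s for s in downmix_samples] for g in gains]

gains = channel_gains([0, -4])  # reference channel and a channel 6 dB below it
channels = upmix_subband([1.0, -2.0], [0, -4])
```

A practical decoder would also apply the ICTD and ICC, for instance through delays and decorrelation, which this sketch leaves out.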

Although the present invention has been described in detail with reference to several embodiments, those embodiments are presented to aid understanding of the invention, and the scope of the invention is not limited to them. Those skilled in the art will understand that various modifications are possible without departing from the technical idea of the present invention, and the scope of the invention should be interpreted according to the appended claims.

As described above, in coding a multichannel audio signal according to the present invention, the spatial information is represented efficiently by generating second spatial information either from per-channel, per-subband, or per-frame differential values of the first spatial information extracted for each subband, or by grouping the first spatial information by channel, subband, or frame, and by generating the spatial information bitstream from the second spatial information. The entire bitstream can therefore be represented at a lower bit rate, which improves encoding, transmission, and decoding efficiency.

In addition, by comparing the case in which the spatial information bitstream is generated using the second spatial information with the case in which it is generated using the first spatial information, and selecting the case that can be represented at the lower bit rate, encoding, transmission, and decoding efficiency can be improved.

Claims

A method of encoding a multichannel audio signal, the method comprising:

(a) downmixing the multichannel audio signal and extracting first spatial information for each subband from the multichannel audio signal using a reference channel;

(b) generating second spatial information from a per-channel differential value, a per-subband differential value, or a per-frame differential value of the first spatial information; and

(c) forming a core codec bitstream using the downmixed audio signal and forming a spatial information bitstream using the second spatial information.

The method of claim 1,

wherein the second spatial information is generated by combining two or more of the per-channel, per-subband, and per-frame differentials of the first spatial information.

The method of claim 1,

wherein step (b) further comprises regenerating the second spatial information by grouping the second spatial information by channel, subband, and frame.

The method of claim 1,

wherein the spatial information includes an inter-channel level difference (ICLD), an inter-channel time difference (ICTD), or an inter-channel coherence (ICC).

The method of claim 1,

The differential for each channel is generated by using one of the channel with the highest energy, the channel with the highest energy, or the channel with the lowest energy as a reference channel.

A method of encoding a multichannel audio signal,

(b) generating second spatial information by grouping the first spatial information by channel, subband, or frame; And

A method of encoding a multichannel audio signal,

(b) generating second spatial information based on a channel-specific differential value, a subband-specific differential value, or a frame-specific differential value of the first spatial information;

(c) generating second spatial information by grouping the first spatial information by channel, subband, or frame;

(d) generating a spatial information bitstream by selecting one having a smaller bit rate among the second spatial information generated in step (b) and the second spatial information generated in step (c);

(e) generating a spatial information bitstream using the first spatial information; and

(f) selecting one of the spatial information bitstream generated in step (d) and the spatial information bitstream generated in step (e).

The method of claim 7, wherein

step (f) comprises, when the entire bitstream is formed using the selected spatial information bitstream, generating within the entire bitstream identification information that distinguishes whether the spatial information bitstream was formed in step (d) or in step (e).

A method of decoding a multichannel audio signal, the method comprising:

(a) receiving a core codec bitstream for a downmixed audio signal and a spatial information bitstream;

(b) reading, from the spatial information bitstream, second spatial information and channel index information, the second spatial information having been generated as a per-channel, per-subband, or per-frame differential value of first spatial information generated for each subband using a reference channel; and

(c) decoding the core codec bitstream, and decoding the spatial information bitstream using the spatial information and the channel index information read in step (b).

The method of claim 9,

wherein step (b) comprises reading the spatial information and the channel index information generated by grouping the second spatial information by channel, subband, or frame.

The method of claim 9,

wherein step (c) comprises converting the downmix audio signal obtained by decoding the core codec bitstream into a multichannel audio signal using the spatial information obtained from the decoded spatial information bitstream.

A method of decoding a multichannel audio signal, the method comprising:

(b) reading, from the spatial information bitstream, second spatial information and channel index information, the second spatial information having been generated by grouping, by channel, subband, or frame, first spatial information generated for each subband using a reference channel; and

(c) decoding the core codec bitstream, and decoding the spatial information bitstream using the read second spatial information and channel index information.

The method of claim 12,

wherein step (c) comprises converting the downmix audio signal obtained by decoding the core codec bitstream into a multichannel audio signal using the second spatial information obtained from the decoded spatial information bitstream.

A method of decoding a multichannel audio signal, the method comprising:

(b) reading, from the spatial information bitstream, identification information indicating how the spatial information was generated; and

(c) decoding the core codec bitstream, and decoding the spatial information bitstream according to the identification information read in step (b).

The method of claim 14,

wherein the identification information distinguishes among: a case in which the spatial information bitstream was generated using first spatial information extracted for each subband using a reference channel; a case in which the spatial information bitstream was generated using second spatial information generated as per-channel, per-subband, or per-frame differential values of the first spatial information; and a case in which the spatial information bitstream was generated using second spatial information generated by grouping the first spatial information by channel, subband, or frame.

The method of claim 14,

wherein step (c) comprises converting the downmix audio signal obtained by decoding the core codec bitstream into a multichannel audio signal using the spatial information obtained from the decoded spatial information bitstream.

A method of generating an audio signal, wherein:

the audio signal is generated to include a core codec bitstream and a spatial information bitstream; and

the spatial information bitstream is generated by extracting first spatial information for each subband from a multichannel audio signal using a reference channel, and generating second spatial information from a per-channel differential value, a per-subband differential value, or a per-frame differential value of the extracted first spatial information.

The method of claim 17,

wherein the second spatial information is grouped by channel, subband, and frame to generate the spatial information bitstream.

A method of generating an audio signal,

wherein a spatial information bitstream of the audio signal is generated by extracting first spatial information for each subband from a multichannel audio signal using a reference channel, and grouping the extracted first spatial information by channel, by subband, or by frame.

A method of generating an audio signal, the method comprising:

comparing (i) a method of generating a spatial information bitstream using whichever has the smaller bit rate of second spatial information generated, from first spatial information extracted for each subband from a multichannel audio signal using a reference channel, as a per-channel, per-subband, or per-frame differential value, and second spatial information generated by grouping the first spatial information by channel, subband, or frame, with (ii) a method of generating a spatial information bitstream using the first spatial information; and

selecting one of the two methods.

The method of claim 21,

wherein, when the entire bitstream is formed using the spatial information bitstream formed by the selected method, the entire bitstream includes identification information that distinguishes whether the spatial information bitstream was formed using the first spatial information or using the second spatial information.