KR100740807B1

KR100740807B1 - Spatial Information Extraction Method in Spatial Information-based Audio Coding

Info

Publication number: KR100740807B1
Application number: KR1020040117805A
Authority: KR
Inventors: 서정일; 백승권; 이병화; 강경옥; 홍진우; 한민수
Original assignee: Electronics and Telecommunications Research Institute (한국전자통신연구원)
Priority date: 2004-12-31
Filing date: 2004-12-31
Publication date: 2007-07-19
Anticipated expiration: 2024-12-31
Also published as: KR20060077832A

Abstract

The present invention relates to a method of extracting and applying the spatial cues that are carried as side information when a multichannel audio signal is encoded and decoded with spatial audio coding (SAC). According to one aspect of the present invention, there is provided a method of extracting spatial information for each subband during spatial audio coding of a multichannel audio signal. The method comprises selecting a window function for each subband such that the window functions are left-right asymmetric, share the same maximum value, and sum to a constant wherever neighboring window functions overlap, and extracting the spatial information using the selected window functions. In one embodiment, the spatial information is the inter-channel level difference (ICLD).

Spatial audio coding (SAC), Inter-Channel Level Difference (ICLD)

Description

Method for obtaining spatial information in spatial information-based audio coding {Method for obtaining spatial cues in Spatial Audio Coding}

FIG. 1 is a block diagram schematically illustrating the configuration of a general SAC coding system to which the present invention is applied.

FIG. 2 illustrates one embodiment of a window function used for extracting and applying spatial information according to the present invention.

The present invention relates to spatial audio coding (SAC) and, more particularly, to a method of extracting the spatial information used in spatial audio coding.

Recently introduced SAC technology represents a multichannel signal, or several independent signals, as a downmixed mono or stereo signal plus spatial information, transmits them, and restores the original signals. This makes it possible to deliver high-quality multichannel audio at low bit rates. The core of SAC is to analyze the multichannel signal subband by subband, estimate the spatial information of each band, and recover the original multichannel signal from this information and the downmixed signal. Spatial information is therefore essential to restoring the original signal and largely determines the sound quality of the audio reproduced by SAC. Binaural cue coding (BCC), a representative SAC technique introduced recently, uses the inter-channel level difference (ICLD), inter-channel time difference (ICTD), inter-channel coherence (ICC), and virtual source location information as spatial information. Of these, ICLD is the most important for recovering the spectral content of the original signal. Each spatial cue is applied to a subband of the downmixed signal. Because a single representative cue is applied to an entire subband, frequency distortion occurs within bands and at band boundaries. The spatial information must therefore be smoothed to prevent this distortion.
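The following minimal Python sketch (not taken from the patent) makes the boundary problem concrete. It expands one gain per subband into a per-bin gain, which is what happens when a single representative cue is applied across a whole subband. The subband edges and gain values are hypothetical. The point is that the gain jumps by the full difference between two bands within a single bin.

```python
import numpy as np

# Hypothetical subband edges in DFT bins (8 subbands covering bins 0..512);
# illustrative values only, not taken from the patent.
EDGES = np.array([0, 4, 8, 16, 32, 64, 128, 256, 513])

def piecewise_gain(band_gains, edges=EDGES):
    """Expand one gain per subband into a per-bin gain that is constant inside each band."""
    gain = np.empty(edges[-1])
    for b, g in enumerate(band_gains):
        gain[edges[b]:edges[b + 1]] = g
    return gain

def max_boundary_jump(gain):
    """Largest change in gain between two adjacent frequency bins."""
    return float(np.max(np.abs(np.diff(gain))))

band_gains = np.array([1.0, 0.5, 1.0, 0.25, 0.8, 0.6, 0.9, 0.4])
g = piecewise_gain(band_gains)
print(max_boundary_jump(g))  # 0.75: the full step between band 2 (1.0) and band 3 (0.25)
```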

Accordingly, an object of the present invention is to minimize the frequency distortion and discontinuity of the restored signal. This is done by smoothing the spatial information with window functions when the spatial information is extracted and applied in spatial audio coding.

To achieve the above object, according to one aspect of the present invention, there is provided a method of extracting spatial information for each subband during spatial audio coding of a multichannel audio signal. The method comprises selecting a window function for each subband such that the window functions are left-right asymmetric, share the same maximum value, and sum to a constant wherever neighboring window functions overlap, and extracting the spatial information using the selected window functions. In one embodiment, the spatial information includes the inter-channel level difference (ICLD), inter-channel time difference (ICTD), inter-channel coherence (ICC), and virtual source location information.

According to another aspect of the present invention, there is provided a method of restoring the original multichannel signal from a downmix signal using spatial information extracted by the spatial information extraction method described above. The restoration method comprises obtaining a gain value for each subband of each channel signal from the spatial information, and restoring each channel signal subband by subband by applying the gain values to the downmix signal.

Hereinafter, the present invention is described in detail by way of example with reference to the embodiments shown in the accompanying drawings. The following detailed description is provided for illustrative purposes only and should not be construed as limiting the inventive concept to any particular physical configuration.

FIG. 1 is a block diagram schematically showing the configuration of an SAC coding system to which the present invention is applied. As shown in FIG. 1, the downmix unit of the SAC encoder 110 converts the input multichannel signals into a downmixed (mono or stereo) signal. The analysis unit 112 extracts spatial information for each subband from the input multichannel signals. The SAC decoder 120 receives the downmixed signal and the spatial information, and its synthesis unit 121 uses them to synthesize and reproduce the multichannel signal subband by subband.
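Here is a minimal sketch of the encoder front end in FIG. 1. It assumes a 1024-point real DFT per frame, which is consistent with the bin range 0 ≤ k ≤ 512 used later. It also assumes a mono downmix formed by averaging the channel spectra. The patent does not specify the transform length, the time-domain analysis window, or the downmix rule. `frame_spectra` and `mono_downmix` are therefore hypothetical helpers.

```python
import numpy as np

N_FFT = 1024  # assumed: bins k = 0..512 correspond to a 1024-point real DFT

def frame_spectra(frames, n_fft=N_FFT):
    """DFT of one frame per channel: (C, n_fft) real samples -> (C, n_fft // 2 + 1) complex bins."""
    # Time-domain analysis window (an assumption); distinct from the subband windows H_b[k].
    window = np.hanning(n_fft)
    return np.fft.rfft(frames * window, n=n_fft, axis=-1)

def mono_downmix(spectra):
    """Hypothetical mono downmix: average of the channel spectra, (C, K) -> (K,)."""
    return spectra.mean(axis=0)

rng = np.random.default_rng(1)
frames = rng.standard_normal((5, N_FFT))  # five channels, one frame each
S = frame_spectra(frames)
downmix = mono_downmix(S)
print(S.shape, downmix.shape)  # (5, 513) (513,)
```

In this sketch, `S` stands in for the per-channel spectra S_c[k] from which the analysis unit extracts cues. `downmix` is what the decoder receives alongside those cues.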

Specifically, the analysis unit 112 uses the per-subband window functions proposed by the present invention when extracting the spatial information. FIG. 2 illustrates one embodiment of the window function H_b[k] used to extract and apply spatial information according to the present invention, where b is the subband index and k is the frequency-bin index. The window function shown is an example of a triangular window. According to the present invention, the window functions used for spatial information extraction are selected to meet three conditions:

- each window is left-right asymmetric;
- its sum with a neighboring window function is constant over the interval where they overlap;
- every window function has the same maximum value h_i.

The maximum h_i of each window function is located at the midpoint of its subband. For example, as in FIG. 2, let the boundaries of adjacent subbands b and b+1 be denoted A_b and A_{b+1}. The frequency f_i at which the window function attains its maximum h_i is then given by Equation 1.

Here, A_b denotes the boundary of subband b.
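The window conditions can be written out as follows. Equation 1 is not reproduced in this text, so the first line is an assumption: subband b occupies bins A_b ≤ k < A_{b+1}, and the peak sits at its midpoint. Here h stands for the common maximum h_i. The triangular form is one window that satisfies the conditions. It is a sketch, not necessarily the patent's Equation 2.

```latex
% Assumed form of Equation 1: subband b spans A_b <= k < A_{b+1}; the peak sits at its midpoint
f_b = \frac{A_b + A_{b+1}}{2}

% Equal maxima (the h_i of the text, all equal to h)
H_b[f_b] = h \quad \text{for every } b

% Constant sum where neighboring windows overlap
H_b[k] + H_{b+1}[k] = h, \qquad f_b \le k \le f_{b+1}

% A triangular window that meets both conditions
H_b[k] =
\begin{cases}
  h \, \dfrac{k - f_{b-1}}{f_b - f_{b-1}}, & f_{b-1} \le k \le f_b \\[4pt]
  h \, \dfrac{f_{b+1} - k}{f_{b+1} - f_b}, & f_b \le k \le f_{b+1} \\[4pt]
  0, & \text{otherwise}
\end{cases}

% Check in the overlap f_b <= k <= f_{b+1}
H_b[k] + H_{b+1}[k] = h \, \frac{(f_{b+1} - k) + (k - f_b)}{f_{b+1} - f_b} = h
```

The rising slope h/(f_b - f_{b-1}) and the falling slope h/(f_{b+1} - f_b) differ whenever the two neighboring center spacings differ. This is where the left-right asymmetry comes from when subbands have unequal widths.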

When the window function used for spatial information extraction is a triangular window, H_b[k] (0 ≤ k ≤ 512) can be obtained from Equation 2.
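Here is a minimal Python sketch of a triangular window bank with these properties. It assumes bins 0..512, a peak of 1 at each subband midpoint (an assumed reading of Equation 1), and hypothetical subband edges. Each window is a linear interpolation between neighboring subband centers, so the windows sum to 1 in every bin. Holding the first and last windows flat beyond the outermost centers is an additional assumption for the spectrum edges.

```python
import numpy as np

def band_centers(edges):
    """Peak bin f_b of each window, assumed to be the midpoint of subband [A_b, A_{b+1})."""
    edges = np.asarray(edges, dtype=float)
    return 0.5 * (edges[:-1] + edges[1:])

def triangular_windows(edges, n_bins=513, peak=1.0):
    """Window bank H[b, k] for k = 0..n_bins-1.

    Window b equals `peak` at f_b, falls linearly to 0 at the neighboring centers
    f_{b-1} and f_{b+1}, and is 0 beyond them. np.interp holds end values flat,
    so H[0] stays at `peak` below f_0 and H[-1] stays at `peak` above the last center.
    """
    f = band_centers(edges)
    k = np.arange(n_bins)
    knots = peak * np.eye(len(f))  # row b: `peak` at f_b, 0 at every other center
    return np.stack([np.interp(k, f, row) for row in knots])

# Hypothetical subband edges in DFT bins (8 subbands covering bins 0..512).
EDGES = [0, 4, 8, 16, 32, 64, 128, 256, 513]
H = triangular_windows(EDGES)
print(H.shape)                          # (8, 513)
print(np.allclose(H.sum(axis=0), 1.0))  # True: constant sum in every bin
```

With these edges, window 3 rises over 12 bins (from center 12 to 24) but falls over 24 bins (to center 48). It is therefore asymmetric, while every column of H still sums to 1.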

The analysis unit 112 of the encoder can extract the ICLD value ΔL_{c,b}, which serves as the spatial information, according to Equation 3. It does so using the window function H_b[k] obtained from Equation 2. Because the ICLD extracted with the selected window functions is estimated over regions that overlap the neighboring subbands, it is smoothed, which minimizes frequency distortion and discontinuity.

Here, ref denotes the reference channel index. The average energy P_{c,b} of channel c in subband b (1 ≤ c ≤ C, where C is the total number of channels) is obtained from Equation 4.

Here, S_c[k] is the DFT of the signal of channel c.
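The following Python sketch shows one way the extraction in Equations 3 and 4 could be computed. Those equations are not reproduced in this text, so the sketch makes two assumptions. First, it uses the window-weighted average energy P_{c,b} = Σ_k H_b[k]|S_c[k]|² / Σ_k H_b[k]. Second, it uses the level-difference form ΔL_{c,b} = 10·log10(P_{c,b} / P_{ref,b}), which is common in BCC. The helper names, the hypothetical window centers, and the small `eps` guard are illustrative.

```python
import numpy as np

def subband_power(S, H):
    """Window-weighted average energy P[c, b] (assumed form of Equation 4).

    S: complex DFT spectra, shape (C, K); H: window bank, shape (B, K).
    """
    energy = np.abs(S) ** 2                # |S_c[k]|^2
    return (energy @ H.T) / H.sum(axis=1)  # weighted sum per band, divided by window area

def icld_db(P, ref=0, eps=1e-12):
    """ICLD in dB of every channel relative to channel `ref` (assumed form of Equation 3)."""
    return 10.0 * np.log10((P + eps) / (P[ref] + eps))

# Usage: channel 1 is channel 0 scaled by 0.5, so its ICLD should be about -6.02 dB in every band.
centers = np.array([2.0, 6.0, 12.0, 24.0, 48.0, 96.0, 192.0, 384.5])  # hypothetical f_b
bins = np.arange(513)
H = np.stack([np.interp(bins, centers, row) for row in np.eye(len(centers))])
S0 = np.fft.rfft(np.random.default_rng(0).standard_normal(1024))
S = np.stack([S0, 0.5 * S0])
L = icld_db(subband_power(S, H))
print(np.round(L[1], 2))
```

Each window reaches into its neighbors, so bins near a subband boundary contribute to both adjacent subbands. This is the overlap-based smoothing described above.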

Meanwhile, the synthesis unit 121 of the decoder uses the ICLD information transmitted from the analysis unit 112 to obtain a gain for each subband of each channel. It then restores each channel signal by applying these gains to the downmixed signal. Let F_{c,b} denote the gain of channel c obtained from the ICLD. Each channel signal can then be restored subband by subband using Equation 5.

Here, … is the frequency-domain signal of channel c.
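On the decoder side, here is a minimal Python sketch of gain computation and per-subband restoration. Equation 5 is not reproduced in this text, so the sketch makes three assumptions:

- the downmix is mono;
- the gains are power-preserving, F_{c,b} = 10^(ΔL_{c,b}/20) / sqrt(Σ_i 10^(ΔL_{i,b}/10)), a normalization commonly used in BCC-style decoders;
- the same window bank is used when applying the gains, so the per-bin gain is Σ_b H_b[k]·F_{c,b}.

`window_bank`, the center values, and the ICLD values are hypothetical.

```python
import numpy as np

def gains_from_icld(icld_db):
    """Gains F[c, b] from ICLDs in dB (shape C x B; the reference channel row is 0 dB).

    Assumption: power-preserving normalization, so sum_c F[c, b]**2 == 1 in every subband.
    """
    amp = 10.0 ** (np.asarray(icld_db, dtype=float) / 20.0)  # amplitude ratio to the reference
    return amp / np.sqrt(np.sum(amp ** 2, axis=0, keepdims=True))

def window_bank(centers, n_bins=513):
    """Hypothetical helper: triangular windows peaking at `centers` that sum to 1 in every bin."""
    k = np.arange(n_bins)
    return np.stack([np.interp(k, centers, row) for row in np.eye(len(centers))])

def synthesize(downmix, F, H):
    """Scale the mono downmix spectrum per bin by the window-interpolated gain sum_b H_b[k] F[c, b]."""
    G = F @ H  # (C, B) @ (B, K) -> per-bin gains, shape (C, K)
    return G * downmix[np.newaxis, :], G

# Usage: 2 channels, 8 subbands with hypothetical centers (in bins) and ICLDs (in dB).
centers = np.array([2.0, 6.0, 12.0, 24.0, 48.0, 96.0, 192.0, 384.5])
icld = np.vstack([np.zeros(8), [-6.0, 3.0, -9.0, 0.0, 6.0, -3.0, 0.0, -12.0]])
F = gains_from_icld(icld)
H = window_bank(centers)
downmix = np.fft.rfft(np.random.default_rng(0).standard_normal(1024))
Y, G = synthesize(downmix, F, H)
```

Between two subband centers the gain now changes linearly instead of in a single step. The largest bin-to-bin change in G is therefore a fraction of the largest change between neighboring subband gains.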

Extracting spatial information with window functions according to the present invention minimizes the frequency distortion of the reproduced audio signal and reduces the discontinuity at subband boundaries, yielding reproduced audio of better quality. By increasing the reliability of the spatial information used in SAC, the present invention is expected to improve sound quality, which may help accelerate the commercialization of SAC technology.

Claims

1. A method of extracting spatial information for each subband during spatial audio coding of a multichannel audio signal, the method comprising:

selecting a window function for each subband such that the window functions are left-right asymmetric, have the same maximum value, and sum to a constant where each overlaps a neighboring window function; and

extracting the spatial information using the selected window functions.

2. The method of claim 1, wherein the spatial information includes an inter-channel level difference (ICLD), an inter-channel time difference (ICTD), an inter-channel coherence (ICC), and virtual source location information.

3. The method of claim 1, wherein the window function is selected based on Equation 6,

where b is the subband index, k is the frequency-bin index, and f_b is the frequency at which the window function applied to subband b attains its maximum.

4. The method of claim 3, wherein the extracting of the spatial information is performed based on Equation 7,

where P_{c,b} (1 ≤ c ≤ C, where C is the total number of channels) is the average energy of channel c in subband b, computed from S_c[k], the DFT of each channel signal, and ref denotes the reference channel.

5. A method of restoring an original multichannel signal from a downmix signal using spatial information extracted by performing the method according to any one of claims 1 to 4, the method comprising:

obtaining a gain value for each subband of each channel signal using the spatial information; and

restoring each channel signal for each subband by applying the gain values to the downmix signal, wherein the restoring is performed according to Equation 8,

where … is the frequency-domain signal of channel c.