KR20070011100A - Methods for energy compensation for multi-channel audio coding and methods for generating encoded audio signal for the compensation

Info

Publication number: KR20070011100A
Application number: KR1020060055012A
Authority: KR
Inventors: 방희석; 오현오; 김동수; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2005-07-18
Filing date: 2006-06-19
Publication date: 2007-01-24

Abstract

A method of compensating the energy of an audio signal in multi-channel audio coding and a method of generating an encoded audio signal for the compensation are provided to compensate channel and frequency distortions generated when a multi-channel signal is reconstructed by using energy compensation information. A method of compensating the energy of a multi-channel audio signal by using energy compensation information includes steps of analyzing the energy compensation information, converting the analyzed information into a value applicable when the multi-channel audio signal is decoded, and performing energy compensation of the multi-channel audio signal when the multi-channel audio signal is decoded by using the converted value.

Description

METHODS FOR ENERGY COMPENSATION FOR MULTI-CHANNEL AUDIO CODING AND METHODS FOR GENERATING ENCODED AUDIO SIGNAL FOR THE COMPENSATION

FIG. 1 is a block diagram of a decoding apparatus that decodes an audio signal using the method of correcting the energy of an audio signal in multi-channel audio coding according to an embodiment of the present invention;

FIG. 2 is a detailed block diagram of the multi-channel decoder shown in FIG. 1;

FIG. 3 is a block diagram of an encoding apparatus that encodes an audio signal using the method of correcting the energy of an audio signal in multi-channel audio coding according to an embodiment of the present invention; and

FIGS. 4 and 5 are structural diagrams of a bit stream generated by the method of generating an encoded audio signal according to an embodiment of the present invention.

* Explanation of reference numerals for the main parts of the drawings *

10: signal separator; 20: audio decoder

30: multi-channel decoder; 31: high-frequency band energy compensator

32: full-band energy compensator; 33: multi-channel generator

50: spatial encoder; 60: spatial bit stream generator

70: energy compensation determining unit; 100: audio bit stream

110: downmix bit stream; 120: energy correction information

130: spatial information; 140: energy correction data

150: multi-channel audio signal; 160: downmix audio signal

170: downmix signal information

The present invention relates to multi-channel audio coding, and more particularly to an energy correction method that performs energy correction of a multi-channel signal and to a method of generating an audio signal for that correction.

Recently, various coding techniques and methods for digital audio signals have been developed, and related products are being produced. In addition, coding methods that use the spatial information of a multi-channel audio signal to convert a mono or stereo audio signal into a multi-channel signal at the decoding stage have been developed, and products based on them are coming into practical use.

However, although multi-channel audio signal processing techniques of this kind have the advantage of reducing the amount of data during signal processing, they have the problem that signal distortion occurs over time in specific channels or specific frequency bands.

The present invention has been proposed to solve the conventional problems described above. An object of the present invention is to provide, for a multi-channel audio signal coding method, a method of correcting the energy of an audio signal in multi-channel audio coding. The method corrects the per-channel and per-frequency distortion that arises when a multi-channel signal is reconstructed from a downmix signal and spatial information (spatial cues), using energy correction information obtained from the difference between the original signal and the downmixed signal, or between the original signal and the final output signal.

Another object of the present invention is to provide a method of generating encoded audio that includes energy correction information for such correction.

A method of correcting the energy of an audio signal in multi-channel audio coding according to the present invention is a method of correcting the energy of a multi-channel audio signal using energy correction information, comprising: interpreting the energy correction information; converting the interpreted information into a value applicable when decoding the multi-channel audio signal; and performing energy correction of the multi-channel audio signal, using the converted value, when decoding the multi-channel audio signal.

The method of correcting the energy of an audio signal in multi-channel audio coding according to the present invention may also be performed by: obtaining, when encoding a multi-channel audio input signal, energy correction information from the energy difference between the multi-channel audio input signal and a downmix signal obtained by downmixing it; and combining the obtained energy correction information with spatial information and correcting the energy using the combined result.

Meanwhile, a method of generating an encoded audio signal according to the present invention is a method that, in multi-channel audio coding, downmixes a multi-channel audio input signal, extracts spatial information from the multi-channel audio input signal, and generates an encoded audio signal from the downmix signal and the spatial information. The encoded audio signal includes energy correction information, and the energy correction information includes: flag information indicating whether energy correction is to be performed or, when energy correction is performed, whether the energy correction information of the previous frame is to be reused as is or new energy correction information is to be used; correction band information indicating the band in which energy correction is to be performed; and correction level information indicating the degree of energy correction in that band.

Hereinafter, an embodiment according to the technical idea of the present invention, namely the method of correcting the energy of an audio signal in multi-channel audio coding and the method of generating an encoded audio signal for that correction, is described with reference to the drawings.

FIG. 1 is a block diagram of a decoding apparatus that decodes an audio signal using the method of correcting the energy of an audio signal in multi-channel audio coding according to an embodiment of the present invention. The illustrated decoding apparatus comprises a signal separator (10), an audio decoder (20), a multi-channel decoder (30), and an energy level compensator (40).

As shown in the figure, when an audio bit stream (100) containing energy correction information is applied to the signal separator (10), the signal separator (10) separately extracts the downmix bit stream (110), the energy correction information (120), and the spatial information (130) from the audio bit stream (100) and outputs them. Examples of the downmix bit stream (110) include a mono or stereo bit stream.

The downmix bit stream (110) is applied to the audio decoder (20) and output as the downmix audio signal (160). Examples of the audio decoder (20) include AAC (Advanced Audio Coding) and MP3 decoders. Separately from the downmix audio signal (160), the audio decoder (20) may also output downmix signal information (170) for generating the multi-channel audio signal (150). The downmix signal information (170) is information produced when the audio decoder (20) decodes, for example information arising in transform processes such as QMF (Quadrature Mirror Filter), MDCT (Modified Discrete Cosine Transform), or PCM (Pulse Code Modulation).

Meanwhile, the multi-channel decoder (30) combines the downmix signal information (170) supplied by the audio decoder (20) with the spatial information (130) supplied by the signal separator (10) and outputs the multi-channel audio signal (150). The energy correction information (120) extracted by the signal separator (10) is applied to the energy level compensator (40), converted into a suitable value applicable to the multi-channel decoder (30) (energy correction data, 140), and input to the multi-channel decoder (30), which then performs the correction and outputs the multi-channel audio signal (150). The energy level compensator (40) interprets the energy correction information (120) and converts it into the value (140) applicable to the multi-channel decoder (30). There are various ways to interpret the energy correction information; the following are examples.

The energy correction information (120) transmitted to the energy level compensator (40) may be interpreted as is, without processing; a smoothing technique may be applied to it along the time axis; or an interpolation technique may be applied to it along the time axis. Likewise, a smoothing technique or an interpolation technique may be applied to the energy correction information (120) across bands.

Smoothing means filling in smoothly between scattered values, for example with a low-pass filter. In the process, however, the scattered values themselves are also affected and may change slightly.

Interpolation resembles smoothing in that it fills in smoothly between scattered values, but the scattered values themselves do not change. Both smoothing and interpolation can be regarded as kinds of smoothing in a broad sense. The point is simply that, as a concrete way of interpreting the energy correction information (120), either a low-pass filter or interpolation may be used.
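The distinction between the two techniques can be illustrated with a short sketch. It is not taken from the patent: the function names, the one-pole low-pass filter, and linear interpolation are assumptions chosen only to show that smoothing alters the transmitted values while interpolation keeps them.

```python
def smooth(values, alpha=0.5):
    """One-pole low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].

    The filter choice and alpha are illustrative assumptions.
    """
    out = []
    prev = values[0]
    for x in values:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def interpolate(known):
    """Linear interpolation between known (index, value) points.

    Indices are assumed to be distinct integers; the known values
    themselves are reproduced unchanged in the output.
    """
    known = sorted(known)
    first = known[0][0]
    out = [known[0][1]] * (known[-1][0] - first + 1)
    for (i0, v0), (i1, v1) in zip(known, known[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i - first] = v0 + t * (v1 - v0)
    return out


# Smoothing changes the input values; interpolation passes through them.
smoothed = smooth([0.0, 4.0, 0.0, 4.0])
filled = interpolate([(0, 0.0), (2, 4.0), (4, 0.0)])
```

The same helpers could be applied either to one band's values over successive frames (time axis) or to the values of all bands within one frame (across bands).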

Applying a smoothing or interpolation technique to the energy correction information (120) along the time axis can be understood as follows.

At any given time there are several bands, from low to high. When the value of a particular band changes over time, that value is smoothed over time.

Applying a smoothing or interpolation technique to the energy correction information (120) across bands means smoothing the correction values from the low band to the high band at a given time.

Because the bands used in MPEG Surround differ in resolution and in purpose, the energy correction information (120) can be interpreted, converted into a value applicable to the multi-channel decoder (30), and applied per domain: for example, in the subband domain, the hybrid band domain, or the QMF (Quadrature Mirror Filter) band domain. The energy correction information (120) may also be applied to the values of the spatial information (130).

FIG. 2 is a detailed block diagram of the multi-channel decoder shown in FIG. 1. The illustrated multi-channel decoder comprises a high-frequency band energy compensator (31), a full-band energy compensator (32), and a multi-channel generator (33).

As shown in the figure, in the method of correcting the energy of an audio signal in multi-channel audio coding according to an embodiment of the present invention, the energy correction data (140) may be applied to the high-frequency band energy compensator (31) so that correction is performed only on the high-frequency band before the multi-channel generator (33) outputs the multi-channel audio signal (150). Alternatively, the energy correction data (140) may be applied to the full-band energy compensator (32) so that correction is performed on all bands before the multi-channel generator (33) outputs the multi-channel audio signal (150). The high-frequency band is singled out because relatively more distortion occurs there.

FIG. 3 is a block diagram of an encoding apparatus that encodes an audio signal using the method of correcting the energy of an audio signal in multi-channel audio coding according to an embodiment of the present invention. The apparatus comprises a spatial encoder (50), an energy compensation determining unit (70), and a spatial bit stream generator (60).

As shown in the figure, the multi-channel audio input signal (200) is applied to the spatial encoder (50), where it is downmixed and output as the downmix audio signal (210). Examples of the downmix audio signal (210) include a mono or stereo audio signal.

The spatial encoder (50) also extracts spatial information (220) from the multi-channel audio input signal (200). The energy compensation determining unit (70) decides whether energy correction is to be performed and outputs energy correction information (230).

The energy compensation determining unit (70) may produce the energy correction information (230) in various ways and applies it to the spatial bit stream generator (60). The spatial bit stream generator (60) then combines the energy correction information (230) and the spatial information (220) into a single bit string and outputs the spatial bit stream (240).

The energy correction information (230) may be information about the energy difference between the downmix audio signal (210) and the multi-channel audio input signal (200), obtained by comparing the two signals.

Specifically, each signal (the multi-channel audio input signal (200) and the downmix audio signal (210)) is divided into suitable band ranges, each band's energy is normalized by dividing it by the total energy over all bands, and the energy difference between the two signals is obtained as the ratio of the normalized band energies.
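The normalization and ratio described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names are hypothetical, the band energies are assumed to be precomputed and strictly positive, and taking the ratio as original over downmix is an assumption, since the text does not fix the direction.

```python
def normalize(band_energies):
    """Divide each band energy by the total energy over all bands."""
    total = sum(band_energies)  # assumed to be greater than zero
    return [e / total for e in band_energies]


def band_energy_ratio(original_energies, downmix_energies):
    """Per-band ratio of normalized energies (original / downmix).

    Both lists are assumed to use the same band partition.
    """
    orig = normalize(original_energies)
    down = normalize(downmix_energies)
    return [o / d for o, d in zip(orig, down)]


# Two bands: the downmix has relatively too much energy in band 0
# and too little in band 1.
ratios = band_energy_ratio([2.0, 2.0], [3.0, 1.0])
```

A ratio near 1 for a band would indicate that the downmix preserved that band's share of the total energy; values far from 1 mark bands that are candidates for correction.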

There are various ways to divide the band ranges suitably. For example, low frequencies may be divided finely and high frequencies relatively coarsely. Another way is to divide on the equivalent rectangular bandwidth (ERB) scale.

The energy correction information (230) may also be obtained as follows.

First, for each band of the multi-channel audio input signal (200) and of the downmix audio signal (210), the ratio of the band's energy to the total energy is obtained (for convenience, the 'first energy ratio'). Then, within each band, the ratio between the energy of the multi-channel audio input signal (200) and that of the downmix audio signal (210) is obtained (for convenience, the 'second energy ratio'). The combination of the first energy ratio and the second energy ratio is used as the energy correction information (230).

Here, combining may include an operation that adds the first and second energy ratios, or may include forming a table for each case. The specifics can be determined experimentally.

In addition, the energy correction information (230) may or may not be included in the spatial bit stream (240), depending on each band's energy relative to the total over all bands, or on the degree of difference between the multi-channel audio input signal (200) and the downmix audio signal (210) in each band. For example, if a band's energy is small in absolute terms, it can be ignored, so energy compensation may be skipped for that band. If the total energy is very small, as with a very quiet sound, energy correction has no effect and may likewise be skipped. Correction may also be skipped when the difference between the multi-channel audio input signal (200) and the downmix audio signal (210) is negligibly small. Energy correction is performed selectively in this way because the human ear does not perceive every sound, only some of them. Therefore, when the energy difference is imperceptible to the human ear or the absolute energy is small, the step of adding the energy correction information (230) can be omitted, reducing the amount of multi-channel data.

Specifically, the energy correction information (230) may be added to the spatial bit stream (240) when a band's energy relative to the total over all bands exceeds a certain value (reference value), and not added otherwise.

Alternatively, the energy correction information (230) may be added to the spatial bit stream (240) when the difference between the multi-channel audio input signal (200) and the downmix audio signal (210) in a band exceeds a certain value (reference value), and not added otherwise.

Furthermore, the value that determines when adding the energy correction information (230) begins and the value that determines when adding it stops may be set to different values, so that the addition of correction information does not switch on and off excessively over time.
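Using separate start and stop values in this way is a form of hysteresis. The sketch below illustrates the idea only; the function name, the per-frame input values, and the example thresholds are all hypothetical.

```python
def hysteresis_flags(values, on_threshold, off_threshold):
    """Decide per frame whether correction information is added.

    Adding starts when a value rises above on_threshold and stops only
    when it falls below off_threshold (assumed lower than on_threshold),
    so values hovering between the two do not toggle the decision.
    """
    sending = False
    flags = []
    for v in values:
        if not sending and v > on_threshold:
            sending = True
        elif sending and v < off_threshold:
            sending = False
        flags.append(sending)
    return flags


# Values between 0.3 and 0.5 keep whatever state was already active.
flags = hysteresis_flags([0.1, 0.6, 0.45, 0.35, 0.2, 0.45], 0.5, 0.3)
```

With a single threshold of 0.4, the same sequence would switch state four times; with the two thresholds it switches only twice.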

In addition, the energy correction information (230) may or may not be included in the spatial bit stream (240) depending on information that combines each band's energy relative to the total over all bands with the degree of difference between the multi-channel audio input signal (200) and the downmix audio signal (210) in each band.

For example, consider two factors: one corresponding to each band's energy relative to the total over all bands, and one corresponding to the degree of difference between the multi-channel audio input signal (200) and the downmix audio signal (210) in each band. When only one of them exceeds its reference ratio for correction and the other does not, another condition factor may be defined, and correction information added according to it.

For example, when one factor exceeds its reference and the other does not, the sum of the two values may be defined as a further factor, and the energy correction information (230) added to the spatial bit stream (240) when that further factor exceeds a certain reference.
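The combined decision can be sketched as below. The text only specifies the mixed case (one factor above its reference, the other not); treating "both above" as send and "both below" as skip is an assumption made here to complete the example, and all names and reference values are hypothetical.

```python
def should_add_correction(relative_energy, difference,
                          energy_ref, difference_ref, combined_ref):
    """Decide whether correction information is added for one band."""
    over_energy = relative_energy > energy_ref
    over_difference = difference > difference_ref
    if over_energy and over_difference:
        return True   # assumption: both factors exceed their references
    if not over_energy and not over_difference:
        return False  # assumption: neither factor exceeds its reference
    # Mixed case from the text: the sum of the two values is a further
    # factor that is compared against its own reference.
    return relative_energy + difference > combined_ref


decision = should_add_correction(0.3, 0.05, energy_ref=0.2,
                                 difference_ref=0.1, combined_ref=0.3)
```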

Hereinafter, a method of generating an encoded audio signal, which is another embodiment of the present invention, is described.

FIGS. 4 and 5 are structural diagrams of a bit stream generated by the method of generating an encoded audio signal according to an embodiment of the present invention.

As illustrated, when signal distortion occurs while processing a multi-channel audio signal and the energy of the signal is corrected using energy correction information, the energy correction information is inserted into the bit string for that correction. The energy correction information included in the bit string contains the following information.

The energy correction information inserted into the bit string may consist of flag information (200), correction band information (210), and correction level information (220). The flag information (200) indicates whether energy correction is to be performed or, when energy correction is performed, whether the energy correction information of the previous frame is to be reused as is or new energy correction information is to be used. The correction band information (210) indicates the band in which correction is to be performed. The correction level information (220) indicates the degree of energy correction.

Whether to perform energy correction may be decided as follows. For example, a reference energy is set; if the energy difference relative to that reference energy is at or below a reference ratio, energy correction is not performed, and if it is above, energy correction is performed. This works because, when the energy difference is at or below the reference ratio, the signal distortion is slight, so ignoring the difference makes little audible difference.

The correction band information (210) may include at least one of: whether to perform correction for each band, the start and end of the band to be corrected, the start value of the band to be corrected, and the end value of the band to be corrected.

The correction level information (220) preferably includes at least one of: an energy correction value for each band to be corrected, a single value for all bands on which energy correction is performed, a single slope value for the bands on which energy correction is performed, and several values for interpolating correction values over the bands on which energy correction is performed.

As one example of inserting the energy correction information into the bit string, when energy correction is performed, the flag information is set to 0 if the correction information of the previous frame is reused as is, and to 1 if new correction information is used.

In addition, two possible starting bands for correction may be distinguished: the correction band information (210) is set to 0 for correction to start from one of them and to 1 for correction to start from the other, and is inserted into the bit string accordingly. For example, the correction band information (210) may be set to 0 when correction starts from band 10 and to 1 when correction starts from band 15. This is possible because a rough model with one of two cases is adequate.

The method of inserting correction information into the bit string according to an embodiment of the present invention may also be applied separately for each channel.

For example, as shown in FIG. 5, the correction band information and the correction level information are each split in two: for the left channel, the relevant bits are inserted into band information 1 (310) and energy level information 1 (320), and for the right channel into band information 2 (310′) and energy level information 2 (320′). In this case, the correction value applied to the center channel is half the sum of the correction values applied to the left and right channels. For example, if the left-channel correction value is 3 and the right-channel correction value is 6, then (3 + 6)/2 = 4.5 is applied to the center channel as its correction value.
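The center-channel rule amounts to a per-band average of the left and right correction values. A minimal sketch, with a hypothetical function name and the assumption that both channels carry one correction value per band:

```python
def center_correction(left_values, right_values):
    """Center-channel correction: half the sum of left and right, per band."""
    return [(l + r) / 2 for l, r in zip(left_values, right_values)]


# The single-band example from the text: left 3, right 6 gives 4.5.
center = center_correction([3], [6])
```

Deriving the center value this way means no separate center-channel fields need to be transmitted.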

In addition, when the flag information (200, 300) is set to 1, the correction level information (210, 310, 310′) may be set to 0 when no correction is performed, to 1 when the energy is to be attenuated linearly so that the energy difference at the end band is 1.5 dB, and to 2 when the energy is to be attenuated linearly so that the energy difference at the end band is 3 dB, and is then inserted into the bitstream. The attenuation is limited to these two linear cases, 1.5 dB and 3 dB, because a finer division, although possible, would consume too many bits; the aim is to model the attenuation with a minimum number of bits. Linear attenuation is used because energy often varies linearly with frequency.

In addition, in case the energy difference at the end band needs to be divided more finely than 1.5 dB and 3 dB, the value 3 of the correction level information (210, 310, 310′) may be kept as reserved.
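One possible reading of the correction level codes is sketched below. The code-to-decibel mapping (0: none, 1: 1.5 dB, 2: 3 dB, 3: reserved) follows the text. The assumptions of this sketch are that the attenuation is linear in dB over the band index, that it is 0 dB at the start band, and that it reaches the signaled value at the end band; the function name and the band range used in the example are hypothetical.

```python
# Attenuation reached at the end band, in dB, for each correction level code.
# Code 3 is reserved in the text and therefore has no entry here.
END_BAND_DB = {0: 0.0, 1: 1.5, 2: 3.0}


def linear_attenuation_db(level_code: int, start_band: int, end_band: int) -> dict:
    """Return a {band: attenuation in dB} map for bands start_band..end_band.

    Assumption: the ramp is linear in dB over the band index, starting at 0 dB.
    """
    if level_code == 3:
        raise ValueError("correction level 3 is reserved")
    if level_code not in END_BAND_DB:
        raise ValueError("correction level must be 0, 1, or 2")
    if end_band <= start_band:
        raise ValueError("end_band must be greater than start_band")
    target = END_BAND_DB[level_code]
    span = end_band - start_band
    return {
        band: target * (band - start_band) / span
        for band in range(start_band, end_band + 1)
    }


if __name__ == "__main__":
    # Hypothetical range: correction from band 10 up to band 20 with level code 2 (3 dB).
    for band, db in linear_attenuation_db(2, 10, 20).items():
        print(band, db)
```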

In addition, instead of being signaled independently, two or all three of the flag information, the correction band information, and the correction level information may be combined and represented as a single value that is inserted into the bitstream.
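As a non-limiting sketch, the three fields may be combined into a single value by simple bit packing. The field widths (1 bit for the flag, 1 bit for the correction band information, 2 bits for the correction level information), the bit order, and the function names are assumptions of this sketch; the specification does not fix how the combined value is formed.

```python
def pack_correction_info(flag: int, band_bit: int, level_code: int) -> int:
    """Pack flag (1 bit), band information (1 bit), and level (2 bits) into 4 bits."""
    if flag not in (0, 1) or band_bit not in (0, 1) or not 0 <= level_code <= 3:
        raise ValueError("field value out of range")
    return (flag << 3) | (band_bit << 2) | level_code


def unpack_correction_info(value: int) -> tuple:
    """Inverse of pack_correction_info: return (flag, band_bit, level_code)."""
    if not 0 <= value <= 0b1111:
        raise ValueError("packed value must fit in 4 bits")
    return (value >> 3) & 1, (value >> 2) & 1, value & 0b11


if __name__ == "__main__":
    packed = pack_correction_info(flag=1, band_bit=1, level_code=2)
    print(packed, unpack_correction_info(packed))
```

A jointly designed code table would be another way to form the single value; bit packing is used here only because it is the simplest form to show.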

Although preferred embodiments of the present invention have been described above, the present invention may employ various changes, modifications, and equivalents. It is clear that the present invention can be applied equally by appropriately modifying the above embodiments. Accordingly, the above description does not limit the scope of the present invention, which is defined by the following claims.

As described above, the method of correcting the energy of an audio signal in multi-channel audio coding and the method of generating an encoded audio signal for that correction according to the present invention have the effect that the per-channel and per-frequency distortion arising when the multi-channel signal is reconstructed can be compensated using the energy correction information.

Claims

A method of correcting the energy of a multi-channel audio signal using energy correction information, the method comprising:

(a) interpreting the energy correction information;

(b) converting the interpreted information into a value applicable when decoding the multichannel audio signal; and

(c) performing energy correction of the multichannel audio signal, using the converted value, when decoding the multichannel audio signal.

The method of claim 1,

wherein step (a) comprises at least one of: using the energy correction information as it is; applying a smoothing technique to the energy correction information on the time axis; applying an interpolation technique to the energy correction information on the time axis; applying a smoothing technique to the energy correction information between bands; and applying an interpolation technique to the energy correction information between bands.

The method of claim 1,

wherein step (b) converts the interpreted information into a value applicable to at least one of a subband domain, a hybrid band domain, and a QMF band domain.

The method of claim 1,

wherein step (b) converts the interpreted information into a value applicable to spatial information.

The method of claim 1,

wherein the energy correction is performed by applying the energy correction information only to a high-frequency band.

The method of claim 1,

wherein the energy correction is performed by applying the energy correction information to the entire band.

(a) obtaining energy correction information, when encoding a multichannel audio input signal, from the energy difference between the multichannel audio input signal and a downmix signal obtained by downmixing it;

and (b) combining spatial information with the obtained energy correction information and correcting the energy using the combined result.

The method of claim 7, wherein

wherein, in step (a), the downmix signal and the multi-channel audio input signal are divided into bands, the energy of each divided band is normalized by dividing it by the total band energy, and the energy correction information is obtained as the ratio of the normalized band energies.

The method according to claim 7 or 8,

wherein, in step (a), for each band of the divided multichannel audio input signal and downmix signal, the ratio of the energy of the divided band to the total energy and the ratio of the energies of the multichannel audio signal and the downmix signal within the divided band are calculated, and the sum of the two energy ratios is taken as the energy correction information.

The method of claim 9,

wherein step (b) is preceded by a step of determining whether to permit combining the energy correction information with the spatial information, and the energy is corrected only when the combination of the energy correction information and the spatial information is permitted.

The method of claim 10,

wherein whether the combination of the energy correction information and the spatial information is permitted is determined according to the energy of the divided band relative to the entire band.

The method of claim 10,

The permission of combining the energy correction information and the spatial information is determined according to an energy difference between the multi-channel audio input signal of the divided band and the downmix signal.

The method of claim 10,

wherein whether the combination of the energy correction information and the spatial information is permitted is determined according to the sum of the energy of the divided band relative to the entire band and the energy difference between the multi-channel audio input signal and the downmix signal in the divided band.

The method of claim 11,

wherein the combination of the energy correction information and the spatial information is permitted only when the energy of the divided band relative to the entire band exceeds a predetermined value.

The method of claim 12,

wherein the combination of the energy correction information and the spatial information is permitted only when the energy difference between the multi-channel audio input signal and the downmix signal in the divided band exceeds a predetermined value.

The method of claim 13,

wherein the combination of the energy correction information and the spatial information is permitted only when the sum of the energy of the divided band relative to the entire band and the energy difference between the multi-channel audio input signal and the downmix signal in the divided band exceeds a predetermined value.

A method of generating an encoded audio signal in multichannel audio coding by downmixing a multichannel audio input signal, extracting spatial information from the multichannel audio input signal, and encoding the downmix signal and the spatial information,

wherein the encoded audio signal includes energy correction information, the energy correction information comprising:

flag information indicating whether energy correction is performed or, when energy correction is performed, whether the energy correction information of the previous frame is used as it is or new energy correction information is used;

correction band information indicating a band in which energy correction is performed; and

correction level information indicating the degree of energy correction in the band in which energy correction is performed.

The method of claim 17,

wherein the correction band information indicates at least one of: whether energy correction is performed for a band, the start and end of the band to be corrected, the start value of the band to be corrected, and the end value of the band to be corrected.

The method of claim 17,

wherein the correction level information indicates at least one of: an energy correction value for each band to be corrected, a single value for all bands to be corrected, a single slope value for the bands to be corrected, and a predetermined number of values for interpolating the correction values across the bands to be corrected.