KR101001748B1

KR101001748B1 - Method and apparatus for decoding audio signal

Info

Publication number: KR101001748B1
Application number: KR1020070040560A
Authority: KR
Inventors: 김중회; 오은미; 콘스탄틴 오시포브; 보리스 쿠드야쇼브
Original assignee: 삼성전자주식회사
Priority date: 2007-04-25
Filing date: 2007-04-25
Publication date: 2010-12-15
Also published as: KR20070050035A

Abstract

Disclosed are a method and apparatus for encoding the important frequency components of an audio signal, and an audio signal decoding method and apparatus using the same. The method of encoding the important frequency components of the audio signal comprises: calculating an SMR value, using a psychoacoustic model, for an audio signal converted into the frequency domain; selecting, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; and extracting spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components, and selecting those peaks as important frequency components.

According to the present invention, perceptually important frequency components are encoded efficiently, so that high sound quality can be provided at a low bit rate. In addition, perceptually important components are extracted through a psychoacoustic model, encoding is possible without phase information, and the spectral signal can be represented efficiently at a low bit rate. The present invention is applicable to any field that requires low bit rate audio coding, and can be applied to next-generation audio schemes.

Description

Method and apparatus for decoding audio signal

FIG. 1 is a block diagram of an apparatus according to the present invention for extracting important frequency components of an audio signal, which extracts the important frequency components from an input audio signal in order to compress the audio signal at a low bit rate.

FIG. 2 is a flowchart of a method according to the present invention for extracting important frequency components of an audio signal, which extracts the important frequency components from an input audio signal in order to compress the audio signal at a low bit rate.

FIG. 3 conceptually illustrates the above method according to the present invention for extracting important frequency components of an audio signal.

FIG. 4 is a block diagram of an apparatus for encoding a low bit rate audio signal that uses the apparatus for extracting important frequency components of an audio signal according to the present invention.

FIG. 5 is a block diagram showing the detailed configuration of the quantization unit.

FIG. 6 is a block diagram showing the detailed configuration of the lossless encoding unit.

FIG. 7 is a flowchart of an embodiment of a method for encoding a low bit rate audio signal that uses the method for extracting important frequency components of an audio signal according to the present invention.

FIG. 8 is a flowchart showing ISC quantization in more detail.

FIG. 9 is a block diagram of a low bit rate audio signal decoding apparatus that decodes a low bit rate audio signal encoded using the apparatus for extracting important frequency components of an audio signal.

FIG. 10 is a flowchart of a low bit rate audio signal decoding method that decodes a low bit rate audio signal encoded using the apparatus for extracting important frequency components of an audio signal.

The present invention relates to audio encoding/decoding, and more particularly to a method and apparatus for extracting important frequency components of an audio signal, and to a method and apparatus for encoding/decoding a low bit rate audio signal using the same.

MPEG audio is the ISO/IEC standard for high-quality, high-efficiency stereo coding. It was standardized in parallel with moving picture coding within the Moving Picture Experts Group (MPEG), established under ISO/IEC SC 29/WG11. Compression uses subband coding (band-division coding) based on 32 bands together with the Modified Discrete Cosine Transform (MDCT), and exploits psychoacoustic characteristics to achieve highly efficient compression. With this technology, MPEG audio achieves better sound quality than conventional compression coding schemes.

To compress audio signals with high efficiency, MPEG audio uses "perceptual coding", a compression method that reduces the amount of code by exploiting the sensory characteristics of the human listener and omitting fine detail to which hearing has low sensitivity.

Perceptual coding in MPEG audio relies mainly on two psychoacoustic characteristics: the minimum audible limit in quiet and masking. The minimum audible limit in quiet is the lowest sound level that hearing can detect, and it is related to the limit of noise that hearing can detect in quiet. This limit depends on the frequency of the sound. At a given frequency, a sound above the minimum audible limit can be heard, whereas a sound below it cannot. In addition, the detection limit of a particular sound changes considerably in the presence of other sounds heard at the same time; this is called the masking effect. The frequency width over which the masking effect occurs is called the critical band. To exploit psychoacoustic properties such as the critical band efficiently, the signal must first be divided into frequency components, so the band is subdivided into 32 bands and subband coding is performed. MPEG audio uses a filter bank to cancel the aliasing noise of the 32 bands.
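The frequency dependence of the minimum audible limit can be illustrated with a short sketch. The text above gives no formula, so the code below uses Terhardt's widely quoted approximation of the threshold in quiet; this choice, and the function names, are assumptions made for illustration and are not taken from the patent.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate threshold in quiet in dB SPL.

    Uses Terhardt's approximation (an assumption; the patent does not
    specify a formula). Valid for positive frequencies only.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def is_audible_in_quiet(freq_hz: float, level_db_spl: float) -> bool:
    """A tone is audible in quiet only if it lies above the threshold."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)
```

Under this approximation the threshold is lowest around 3 to 4 kHz and rises steeply toward low frequencies, so a tone at a fixed level can be audible at 1 kHz yet inaudible at 100 Hz.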

MPEG audio thus consists of a filter bank, together with bit allocation and quantization driven by a psychoacoustic model. The coefficients produced by the MDCT are compressed while optimal quantization bits are allocated using psychoacoustic model 2. Psychoacoustic model 2, which is used to allocate the optimal bits, is based on the FFT and calculates the masking effect with a spreading function, so it requires considerable computational complexity.

In general, when an audio signal is compressed at a low bit rate (32 kbps or less), the number of bits that can be allocated per signal is insufficient to quantize and losslessly encode every frequency component of the audio signal. It is therefore necessary to extract the perceptually important frequency components and to quantize and losslessly encode only those.
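A rough bit budget shows why the bits run short. The sketch below assumes a 44.1 kHz sampling rate and 1024 new samples per frame; neither value is stated in this text, and both are illustrative assumptions.

```python
def bits_per_frame(bitrate_bps: float, sample_rate_hz: float, frame_len: int) -> float:
    """Average number of bits available for one frame of frame_len new samples."""
    return bitrate_bps * frame_len / sample_rate_hz


def bits_per_coefficient(bitrate_bps: float, sample_rate_hz: float) -> float:
    """Average number of bits available per spectral coefficient."""
    return bitrate_bps / sample_rate_hz


# Assumed operating point (not from the patent): 32 kbps, 44.1 kHz, 1024 samples.
budget = bits_per_frame(32_000, 44_100, 1024)
per_coeff = bits_per_coefficient(32_000, 44_100)
```

Under these assumptions a frame has roughly 743 bits, which is less than one bit per coefficient before any side information is counted, so coding every coefficient is not feasible.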

A technical object of the present invention is to provide a method and apparatus for extracting important frequency components of an audio signal, which extract the important frequency components from an input audio signal in order to compress the audio signal at a low bit rate.

Another technical object of the present invention is to provide a method and apparatus for encoding a low bit rate audio signal using the above method and apparatus for extracting important frequency components of an audio signal.

Yet another technical object of the present invention is to provide a low bit rate audio signal decoding method and apparatus for decoding a low bit rate audio signal encoded using the above method and apparatus for extracting important frequency components of an audio signal.

To achieve the above technical object, a method for extracting important frequency components of an audio signal according to the present invention comprises: calculating an SMR value, using a psychoacoustic model, for an audio signal converted into the frequency domain; selecting, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; and extracting spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components, and selecting those peaks as important frequency components. The weight is preferably obtained from a predetermined number of frequency spectrum values in the vicinity of the frequency of the current signal whose weight is to be obtained.

The method for extracting important frequency components of an audio signal preferably further comprises obtaining an SNR for each frequency band and selecting, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low.

To achieve the above technical object, another method for extracting important frequency components of an audio signal according to the present invention comprises: calculating an SMR value, using a psychoacoustic model, for an audio signal converted into the frequency domain; selecting, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; and obtaining an SNR for each frequency band from among the audio signals selected as important frequency components, and selecting, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low.

To achieve the other technical object described above, a low bit rate audio signal encoding method according to the present invention comprises: (a) calculating an SMR value, using a psychoacoustic model, for an audio signal in the frequency domain; (b) selecting, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; (c) extracting spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components, and selecting the frequencies of those peaks as important frequency components; and (d) quantizing and losslessly encoding the audio signal of the important frequency components. Step (c) preferably further comprises obtaining an SNR for each frequency band and selecting, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low. The frequency-domain audio signal of step (a) is preferably generated by converting a time-domain audio signal into the frequency domain using the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST). The quantization of the important frequency component audio signal in step (d) preferably comprises: grouping the components so that side information is minimized, taking the relationship between bit usage and quantization error into account; determining a quantization step size in consideration of the data distribution (dynamic range) and the SMR of each group; and quantizing the audio signal group by group using a predetermined quantizer. The quantizer is preferably selected using a normalization value, obtained by normalizing with respect to the maximum value within the group, and the quantization step size. The quantization is preferably Max-Lloyd quantization.
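The per-group normalization and Max-Lloyd quantization mentioned above can be sketched as follows. This is a minimal illustration, not the patent's quantizer design: the uniform initialization, the fixed iteration count, and the names `normalize_group`, `lloyd_max`, and `quantize` are all assumptions, and the grouping and step-size decisions are left out.

```python
def normalize_group(values):
    """Scale a group by its largest absolute value; return (peak, normalized)."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 0.0, [0.0] * len(values)
    return peak, [v / peak for v in values]


def quantize(x, codebook):
    """Index of the reconstruction level nearest to x."""
    return min(range(len(codebook)), key=lambda i: abs(x - codebook[i]))


def lloyd_max(samples, levels, iterations=50):
    """Design a scalar codebook by alternating nearest-level assignment
    and centroid update (the Lloyd-Max iteration).

    Assumption: levels start uniformly spread over the sample range.
    """
    lo, hi = min(samples), max(samples)
    codebook = [lo + (hi - lo) * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iterations):
        cells = [[] for _ in range(levels)]
        for x in samples:
            cells[quantize(x, codebook)].append(x)
        # An empty cell keeps its previous level.
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    return codebook
```

Normalizing by the group maximum keeps every group in the range [-1, 1], so a small set of pre-designed codebooks could be shared across groups, with the peak value sent as side information.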

The lossless encoding of the quantized signal in step (d) is preferably performed by context arithmetic coding. The context arithmetic coding preferably comprises: representing, frame by frame, each frequency component constituting the frame by a frequency index that indicates whether an important frequency component is present; and losslessly encoding the side information (including quantizer information, the quantization step, and grouping information), the quantized values of the audio signal, and the frequency index values, with a probability model selected in consideration of the correlation with the previous frame and the distribution of neighboring important frequency components.
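The idea of selecting a probability model from the previous frame and from neighboring components can be sketched for the binary frequency index alone. The code below is not an arithmetic coder; it only estimates the ideal code length that an adaptive coder driven by such a model would approach. The specific context (previous-frame flag at the same bin plus the flag of the bin just below) and the Laplace-smoothed counts are illustrative assumptions.

```python
import math
from collections import defaultdict


def context_of(prev_frame, current, k):
    """Context for bin k: the previous frame's flag at k and the flag of
    the already-coded neighbor k-1 (an assumed, simplified context)."""
    left = current[k - 1] if k > 0 else 0
    return (prev_frame[k], left)


def estimated_bits(frames):
    """Ideal adaptive code length, in bits, for a sequence of 0/1 index frames."""
    counts = defaultdict(lambda: [1, 1])  # smoothed counts of 0s and 1s per context
    prev = [0] * len(frames[0])
    total = 0.0
    for frame in frames:
        for k, bit in enumerate(frame):
            c = counts[context_of(prev, frame, k)]
            total += -math.log2(c[bit] / (c[0] + c[1]))
            c[bit] += 1  # adapt the model after coding the symbol
        prev = frame
    return total
```

When the positions of the important components change little from frame to frame, the per-context statistics become highly skewed and the estimated cost falls well below one bit per index.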

To achieve the other technical object described above, another low bit rate audio signal encoding method according to the present invention comprises: calculating an SMR value, using a psychoacoustic model, for an audio signal in the frequency domain; selecting, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; obtaining an SNR for each frequency band from among the audio signals selected as important frequency components, and selecting, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low; and quantizing and losslessly encoding the audio signal of the important frequency components.

To achieve the above technical object, an apparatus for extracting important frequency components of an audio signal according to the present invention comprises: a psychoacoustic model unit that calculates an SMR value for an audio signal converted into the frequency domain, taking psychoacoustic characteristics into account; a first ISC selector that uses the SMR value calculated by the psychoacoustic model unit to select, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; and a second ISC selector that extracts spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components, and selects those peaks as important frequency components. The weight used by the second ISC selector is preferably obtained from a predetermined number of frequency spectrum values in the vicinity of the frequency of the current signal whose weight is to be obtained. The apparatus preferably further comprises a third ISC selector that obtains an SNR for each frequency band and selects, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low.

To achieve the above technical object, another apparatus for extracting important frequency components of an audio signal according to the present invention comprises: a psychoacoustic model unit that calculates an SMR value for an audio signal converted into the frequency domain, taking psychoacoustic characteristics into account; a first ISC selector that uses the SMR value calculated by the psychoacoustic model unit to select, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; and a third ISC selector that obtains an SNR for each frequency band from among the audio signals selected as important frequency components and selects, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low.

To achieve the other technical object described above, a low bit rate audio signal encoding apparatus according to the present invention comprises: a psychoacoustic model unit that calculates an SMR value for an audio signal converted into the frequency domain, taking psychoacoustic characteristics into account; a first ISC selector that uses the SMR value calculated by the psychoacoustic model unit to select, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; a second ISC selector that extracts spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components, and selects the frequencies of those peaks as important frequency components; a quantization unit that quantizes the audio signal of the important frequency components; and a lossless encoding unit that losslessly encodes the quantized signal.

The low bit rate audio signal encoding apparatus according to the present invention preferably further comprises a third ISC selector that obtains an SNR for each frequency band and selects, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low.

The low bit rate audio signal encoding apparatus according to the present invention preferably further comprises a T/F conversion unit that converts a time-domain audio signal into a frequency-domain audio signal using the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST).

The quantization unit preferably comprises: a grouping unit that groups the components so that side information is minimized, taking the relationship between bit usage and quantization error into account; a quantization step size determiner that determines a quantization step size in consideration of the data distribution (dynamic range) and the SMR of each group; and a quantizer that is selected using a normalization value, obtained by normalizing with respect to the maximum value within the group, and the quantization step size, and that quantizes the audio signal group by group. The quantization performed by the quantizer is preferably Max-Lloyd quantization. The lossless encoding unit preferably performs lossless encoding by context arithmetic coding.

The lossless encoding unit preferably comprises: an index unit that represents, frame by frame, each frequency component constituting the frame by a frequency index indicating whether an important frequency component is present; and a probability model lossless encoding unit that losslessly encodes the side information (including quantizer information, the quantization step size, and grouping information), the quantized values of the audio signal, and the frequency index values, with a probability model selected in consideration of the correlation with the previous frame and the distribution of neighboring important frequency components.

To achieve the other technical object described above, another low bit rate audio signal encoding apparatus according to the present invention comprises: a psychoacoustic model unit that calculates an SMR value for an audio signal converted into the frequency domain, taking psychoacoustic characteristics into account; a first ISC selector that uses the SMR value calculated by the psychoacoustic model unit to select, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal; a third ISC selector that obtains an SNR for each frequency band from among the audio signals selected as important frequency components and selects, as important frequency components, frequency components having a peak value of at least a predetermined magnitude in the frequency bands whose SNR is low; a quantization unit that quantizes the audio signal of the important frequency components; and a lossless encoding unit that losslessly encodes the quantized signal.

To achieve the yet another technical object described above, a low bit rate audio signal decoding method according to the present invention comprises: extracting probability model information frame by frame and restoring the index information indicating whether an ISC is present, the quantizer information, the quantization step size, the ISC grouping information, and the quantized values of the audio signal; dequantizing the quantized values with reference to the restored quantizer information, quantization step size, and grouping information; and converting the dequantized values into a time-domain signal.
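The dequantization step of the decoder can be sketched as placing reconstructed values back at the positions flagged by the index information. The data layout below (a per-group peak value, a codebook, and a list of quantizer indices) is a hypothetical arrangement chosen for illustration; the patent does not specify a bitstream format in this text, and the lossless decoding and inverse transform are omitted.

```python
def dequantize_frame(isc_flags, groups):
    """Rebuild a sparse spectrum from decoded side information.

    isc_flags: list of 0/1, one per frequency bin (1 = ISC present).
    groups:    list of (peak, codebook, indices) tuples covering the ISC
               bins in ascending frequency order (hypothetical layout).
    """
    values = []
    for peak, codebook, indices in groups:
        # Undo the per-group normalization after the codebook lookup.
        values.extend(peak * codebook[i] for i in indices)
    if len(values) != sum(isc_flags):
        raise ValueError("number of decoded values does not match index flags")
    it = iter(values)
    # Bins without an ISC are reconstructed as zero.
    return [next(it) if flag else 0.0 for flag in isc_flags]
```

For example, flags `[0, 1, 1, 0, 1]` with two groups yield a five-bin spectrum in which bins 0 and 3 are zero and the remaining bins carry the rescaled codebook values.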

To achieve the yet another technical object described above, a low bit rate audio signal decoding apparatus according to the present invention comprises: a lossless decoding unit that extracts probability model information frame by frame and uses it to restore the index information indicating whether an ISC is present, the quantizer information, the quantization step size, the ISC grouping information, and the quantized values of the audio signal; a dequantization unit that dequantizes the quantized values with a dequantizer, with reference to the restored quantizer information, quantization step size, and grouping information; and an F/T conversion unit that converts the dequantized values into a time-domain signal.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram of an apparatus according to the present invention for extracting important frequency components of an audio signal, which extracts the important frequency components from an input audio signal in order to compress the audio signal at a low bit rate. The apparatus comprises a psychoacoustic model unit 100 and an ISC selector 150.

The psychoacoustic model unit 100 calculates an SMR value for the audio signal converted into the frequency domain, taking psychoacoustic characteristics into account. The spectral audio signal input to the psychoacoustic model unit 100 is generated using the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST) rather than the Discrete Fourier Transform (DFT). Because the MDCT represents the real part and the MDST represents the imaginary part, the phase information of the audio signal can additionally be represented, which resolves the mismatch between the DFT and the MDCT. This mismatch arises when the time-domain audio signal is analyzed with a DFT and the result is then used to quantize MDCT coefficients.
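A minimal sketch of the MDCT/MDST pair, treated as real and imaginary parts, is given below. It uses the textbook direct-form definitions without a window and at O(N^2) cost; the absence of windowing and the function names are simplifying assumptions, not details from the patent.

```python
import math


def mdct_mdst(block):
    """Return (mdct, mdst) for a block of 2N samples.

    Direct-form textbook definitions, no analysis window (an assumption
    made to keep the sketch short).
    """
    n = len(block) // 2
    mdct, mdst = [], []
    for k in range(n):
        c = s = 0.0
        for i, x in enumerate(block):
            angle = math.pi / n * (i + 0.5 + n / 2) * (k + 0.5)
            c += x * math.cos(angle)
            s += x * math.sin(angle)
        mdct.append(c)
        mdst.append(s)
    return mdct, mdst


def magnitudes(block):
    """Per-bin magnitude, with MDCT as real part and MDST as imaginary part."""
    mdct, mdst = mdct_mdst(block)
    return [math.hypot(c, s) for c, s in zip(mdct, mdst)]
```

The benefit of the pair shows up with a phase-shifted tone: for a sinusoid centered on bin k, the MDCT coefficient alone scales with the cosine of the phase and can vanish, while the combined magnitude stays constant.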

The ISC selector 150 uses the SMR value to select the important frequency components of the audio signal, and comprises a first ISC selector 152, a second ISC selector 154, and a third ISC selector 156.

The first ISC selector 152 uses the SMR value calculated by the psychoacoustic model unit 100 to select, as important frequency components, the signals at frequencies where the masking threshold is lower than the frequency-domain audio signal. Of all the selection stages, the selection made by the first ISC selector 152 carries the greatest weight in choosing the important frequency components.
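The first selection stage amounts to keeping the bins whose level exceeds the masking threshold. The sketch below assumes that signal levels and masking thresholds are available per bin in dB, so that the SMR is their difference and a positive SMR marks an unmasked component; the per-bin dB representation and the function name are assumptions.

```python
def select_isc_by_smr(signal_db, mask_db):
    """Indices of bins whose signal level lies above the masking threshold.

    Assumes both inputs are per-bin levels in dB, so SMR = signal - mask.
    """
    return [k for k, (s, m) in enumerate(zip(signal_db, mask_db)) if s - m > 0.0]


# Example with made-up levels: only bins 1 and 3 exceed their thresholds.
selected = select_isc_by_smr([30.0, 55.0, 20.0, 41.0], [35.0, 40.0, 25.0, 40.0])
```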

The second ISC selector 154 extracts spectral peaks, taking a predetermined weight into account, from among the audio signals selected as important frequency components by the first ISC selector 152, and selects those peaks as important frequency components.

Spectral peaks are searched for among the important frequency components. A spectral peak is determined from the magnitude of the signal: the real part and the imaginary part of the signal transformed by the MDCT and the MDST are squared and summed, and the square root of that sum is taken as the magnitude of the signal. The weight of the signal is obtained from the spectral values around it; that is, the second ISC selector 154 computes the weight from a predetermined number of frequency spectrum values in the vicinity of the frequency of the current signal. The weight can be obtained by Equation 1.

In Equation 1, one term is the magnitude of the current signal whose weight is to be obtained, and the other terms are the magnitudes of the signals surrounding the current signal. In addition, len denotes the number of signals counted, that is, the current signal together with the signals around it.

An important frequency component is selected on the basis of the peak value and the weight thus obtained. For example, the peak value is multiplied by the weight, the product is compared with a predetermined threshold value, and only components whose product exceeds the threshold value are selected as important frequency components.
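The peak search and thresholding described above can be sketched as follows. Because the weight formula of Equation 1 is not reproduced in this text, the sketch substitutes an assumed weight (the current magnitude divided by the mean magnitude of its neighborhood). The names `spectral_magnitude`, `select_weighted_peaks`, `half_width`, and `threshold` are hypothetical.

```python
import numpy as np

def spectral_magnitude(mdct, mdst):
    """Magnitude per bin: sqrt(real^2 + imag^2), with MDCT as real, MDST as imaginary."""
    return np.sqrt(np.asarray(mdct, float) ** 2 + np.asarray(mdst, float) ** 2)

def select_weighted_peaks(mag, candidates, half_width=2, threshold=1.0):
    """Keep candidate bins that are local peaks and whose weighted peak exceeds threshold."""
    mag = np.asarray(mag, float)
    selected = []
    for k in candidates:
        left = mag[k - 1] if k > 0 else -np.inf
        right = mag[k + 1] if k < len(mag) - 1 else -np.inf
        if mag[k] < left or mag[k] < right:
            continue  # not a local spectral peak
        lo, hi = max(0, k - half_width), min(len(mag), k + half_width + 1)
        # ASSUMED weight (stand-in for Equation 1): magnitude relative to
        # the mean magnitude of the surrounding bins, current bin included.
        weight = mag[k] / (mag[lo:hi].mean() + 1e-12)
        if mag[k] * weight > threshold:
            selected.append(int(k))
    return selected

mag = spectral_magnitude([0, 3, 0, 0, 0.5, 0, 0], [0, 4, 0, 0, 0, 0, 0])
peaks = select_weighted_peaks(mag, [1, 2, 4], half_width=1, threshold=2.0)
```

In this toy input, bin 1 (magnitude 5) passes, bin 2 is not a peak, and bin 4 is a peak whose weighted value falls below the threshold.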

The third ISC selector 156 performs SNR equalization. That is, the SNR is obtained for each frequency band, and in the bands with a low SNR, frequency components having a peak value of at least a predetermined magnitude are selected as important frequency components. This prevents the ISCs from being concentrated in a particular frequency band. Dominant peaks are selected from the low-SNR bands so that the SNRs of the bands across the whole spectrum become similar to one another. As a result, the SNR of the low-SNR bands rises, and the SNR values of the bands become similar.
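A minimal sketch of the SNR equalization step follows. The text does not define how the per-band SNR is computed, so the sketch assumes that the energy of already selected bins counts as signal and the energy of unselected bins counts as noise. The names `band_snr_db`, `equalize_snr`, `min_peak`, and `max_extra` are hypothetical.

```python
import numpy as np

def band_snr_db(mag, selected, band):
    """ASSUMED band SNR: energy of selected bins over energy of unselected bins."""
    lo, hi = band
    coded = sum(mag[k] ** 2 for k in range(lo, hi) if k in selected)
    uncoded = sum(mag[k] ** 2 for k in range(lo, hi) if k not in selected)
    if uncoded == 0.0:
        return float("inf")
    return 10.0 * np.log10((coded + 1e-12) / uncoded)

def equalize_snr(mag, selected, bands, min_peak, max_extra):
    """Add dominant peaks from the lowest-SNR bands, at most max_extra of them."""
    mag = np.asarray(mag, float)
    selected = set(selected)
    for _ in range(max_extra):
        best = None
        # Visit bands from lowest to highest SNR and take the first usable peak.
        for band in sorted(bands, key=lambda b: band_snr_db(mag, selected, b)):
            rest = [k for k in range(*band)
                    if k not in selected and mag[k] >= min_peak]
            if rest:
                best = max(rest, key=lambda k: mag[k])
                break
        if best is None:
            break  # no remaining component reaches min_peak
        selected.add(best)
    return sorted(selected)

result = equalize_snr([9, 1, 1, 1, 4, 3, 1, 1], {0}, [(0, 4), (4, 8)],
                      min_peak=2, max_extra=5)
```

With this input the second band starts with no selected bins, so its two dominant peaks (bins 4 and 5) are added before the loop runs out of components at or above `min_peak`.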

The first ISC selector 152, the second ISC selector 154, and the third ISC selector 156 constituting the ISC selector 150 may be used selectively. For example, only the first ISC selector 152 and the second ISC selector 154 may be used, or only the first ISC selector 152 and the third ISC selector 156 may be used. Alternatively, the first ISC selector 152, the second ISC selector 154, and the third ISC selector 156 may all be used.

FIG. 2 is a flowchart of the method according to the present invention for extracting important frequency components from an input audio signal in order to compress the audio signal at a low bit rate. First, an SMR value is calculated, using a psychoacoustic model, for the audio signal transformed into the frequency domain (step 200). Then, using the SMR value, the signals at frequencies where the masking threshold is smaller than the frequency-domain audio signal are selected as important frequency components (ISCs) (step 220).

From among the audio signals selected as important frequency components, spectral peaks are extracted in consideration of a predetermined weight and selected as important frequency components (step 240). The weight is preferably obtained from a predetermined number of frequency spectrum values near the frequency of the current signal whose weight is to be obtained. Step 240 is the same as the operation of the second ISC selector 154 described above, so its description is omitted.

In addition, SNR equalization is performed for each frequency band to select ISCs (step 260). That is, the SNR is obtained for each frequency band, and in the bands with a low SNR, frequency components having a peak value of at least a predetermined magnitude are selected as important frequency components. As described above, this prevents the ISCs from being concentrated in a particular frequency band. Dominant peaks are selected from the low-SNR bands so that the SNRs of the bands across the whole spectrum become similar to one another. As a result, the SNR of the low-SNR bands rises, and the SNR values of the bands become similar.

The important frequency component (ISC) extraction of steps 220 to 260 may be used selectively. For example, the ISCs may be extracted using only steps 200 and 220, using only steps 200 and 260, or by going through all of steps 200, 240, and 260.

FIG. 3 conceptually illustrates the above-described method according to the present invention for extracting important frequency components from an input audio signal in order to compress the audio signal at a low bit rate.

FIG. 4 is a block diagram of an apparatus for encoding a low bit rate audio signal that uses the apparatus for extracting important frequency components of an audio signal according to the present invention. The encoding apparatus includes an ISC extractor 420, a quantizer 440, and a lossless encoder 460, and may further include a T/F converter 400.

The T/F converter 400 transforms a time-domain audio signal into a frequency-domain signal using the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST). The spectral audio signal input to the psychoacoustic model of the ISC extractor 420 is generated with the MDCT and the MDST rather than with the Discrete Fourier Transform (DFT). Because the MDCT represents the real part and the MDST represents the imaginary part, the phase information of the audio signal can additionally be represented, which resolves the mismatch between the DFT and the MDCT. This mismatch arises when the time-domain audio signal is analyzed with a DFT and the result is then used to quantize MDCT coefficients.
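For illustration, the MDCT/MDST pair can be computed directly from its textbook definition, as in the sketch below. It uses the standard phase term with offset n0 = 1/2 + N/2, applies no analysis window, and evaluates the transform as a dense matrix product for clarity rather than speed. These choices, and the name `mdct_mdst`, are assumptions; the specification does not state the window or the block length.

```python
import numpy as np

def mdct_mdst(frame):
    """Direct MDCT and MDST of one frame of 2N samples, returning N coefficients each.

    No window is applied here (an assumption made to keep the sketch short).
    """
    x = np.asarray(frame, dtype=float)
    two_n = len(x)
    n_coef = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_coef)
    # Standard MDCT phase: (pi / N) * (n + 1/2 + N/2) * (k + 1/2)
    phase = np.pi / n_coef * np.outer(k + 0.5, n + 0.5 + n_coef / 2.0)
    mdct = np.cos(phase) @ x   # treated as the real part
    mdst = np.sin(phase) @ x   # treated as the imaginary part
    return mdct, mdst

# A cosine aligned with basis function k0 = 3 of an N = 8 transform.
N, k0 = 8, 3
n = np.arange(2 * N)
x = np.cos(np.pi / N * (n + 0.5 + N / 2.0) * (k0 + 0.5))
mdct, mdst = mdct_mdst(x)
magnitude = np.hypot(mdct, mdst)
```

For this test tone the magnitude is N at bin k0 and zero elsewhere, because the input coincides with one MDCT basis function.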

The ISC extractor 420 extracts the audio signal of the important frequency components from the frequency-domain audio signal, and is identical to the above-described apparatus for extracting important frequency components of an audio signal according to the present invention. That is, the ISC extractor 420 includes the psychoacoustic model unit 100 and the ISC selector 150.

The quantizer 440 quantizes the audio signal of the important frequency components and, as shown in FIG. 5, includes a grouping unit 442, a quantization step size determiner 444, and a quantizer 446.

The grouping unit 442 groups the ISCs so as to minimize the side information, taking into account the relationship between bit usage and quantization error. Quantization of the selected ISCs proceeds as follows. First, the selected ISCs are grouped so that the side information is minimized in view of rate-distortion. Rate-distortion denotes the relationship between bit usage and quantization error, which are in a trade-off: increasing the bit usage reduces the quantization error, and reducing the bit usage increases the quantization error. The selected ISCs are grouped, a cost is calculated for each group, and the grouping is arranged so that the cost becomes small. The grouping may initially be uniform; groups are then merged, band by band, so that the cost decreases. The cost is obtained, as in Equation 2, by adding the number of bits required by the group to the number of side information bits.

cost = q_bit + side information [number of bits]

Here, q_bit is the number of bits required by each group, and the side information consists of the scale factor, the quantization information, and the like.
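The cost-driven merging can be sketched as a greedy procedure: start from a uniform grouping and merge the adjacent pair that lowers the total cost of Equation 2 the most, until no merge helps. The text does not say how q_bit is computed, so `q_bits` below is a toy model that grows with the group's dynamic range, and `SIDE_INFO_BITS` is an arbitrary constant. Both are assumptions, as is the greedy strategy itself.

```python
import math

SIDE_INFO_BITS = 12  # hypothetical side information per group (scale factor etc.)

def q_bits(group):
    """TOY bit model: bits per value grow with the group's max/min ratio.

    Assumes all magnitudes in the group are strictly positive.
    """
    ratio = max(group) / min(group)
    bits_per_value = 1 + math.ceil(math.log2(ratio))
    return len(group) * bits_per_value

def cost(group):
    """Equation 2: cost = q_bit + side information bits."""
    return q_bits(group) + SIDE_INFO_BITS

def merge_groups(groups):
    """Greedily merge adjacent groups while a merge reduces the total cost."""
    groups = [list(g) for g in groups]
    while len(groups) > 1:
        gains = [cost(groups[i]) + cost(groups[i + 1])
                 - cost(groups[i] + groups[i + 1])
                 for i in range(len(groups) - 1)]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # no merge lowers the cost any further
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
    return groups

merged = merge_groups([[8, 7], [6, 8], [1, 64]])
```

Here the first two groups have similar magnitudes, so merging them saves one set of side information bits; the third group has a wide dynamic range and stays separate because merging would raise the bits per value for every member.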

Once the grouping is done, the quantization step size determiner 444 determines the quantization step size in consideration of the data distribution (dynamic range) of each group and the SMR. The ISCs are also normalized with respect to the maximum value among the ISCs constituting the group.

The quantizer 446 quantizes the audio signal group by group. The quantizer 446 is determined using the values normalized with respect to the maximum value in the group and the quantization step size. Max-Lloyd quantization is preferably used.
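As an illustration of the normalization and quantizer design, the sketch below normalizes a group by its maximum magnitude and then designs a scalar codebook with the Lloyd-Max iteration (alternating nearest-level assignment and centroid update). The number of levels stands in for the quantization step size, and the uniform initialization and fixed iteration count are assumptions. The names `lloyd_max` and `quantize_group` are hypothetical.

```python
import numpy as np

def lloyd_max(values, levels, iterations=50):
    """Design a scalar codebook by alternating assignment and centroid steps."""
    values = np.asarray(values, dtype=float)
    # Assumed initialization: levels spread uniformly over the data range.
    codebook = np.linspace(values.min(), values.max(), levels)
    for _ in range(iterations):
        idx = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
        for j in range(levels):
            members = values[idx == j]
            if members.size:
                codebook[j] = members.mean()  # centroid of the cell
    idx = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
    return codebook, idx

def quantize_group(isc_values, levels):
    """Normalize a group by its maximum magnitude, then quantize it.

    Assumes the group contains at least one nonzero value.
    """
    isc_values = np.asarray(isc_values, dtype=float)
    peak = float(np.max(np.abs(isc_values)))
    codebook, idx = lloyd_max(isc_values / peak, levels)
    return peak, codebook, idx

peak, codebook, idx = quantize_group([1.0, 1.2, 7.8, 8.0], levels=2)
```

The decoder would need `peak`, the codebook (or enough quantizer information to rebuild it), and the indices; which of these travel as side information is left open here.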

The lossless encoder 460 losslessly encodes the quantized signal and, as shown in FIG. 6, includes an index unit 462 and a probability model lossless encoder 464. Context arithmetic coding may be used for the lossless encoding.

The index unit 462 represents, frame by frame, each frequency component of the frame by a frequency index indicating whether an important frequency component is present. The frequency information of the ISCs is encoded by context arithmetic coding. Specifically, for each frame, every frequency component of the frame is assigned a frequency index indicating whether it was selected as an ISC. The frequency index expresses the presence or absence of an ISC as 1 or 0.

The probability model lossless encoder 464 losslessly encodes the side information (including the quantizer information, the quantization step, and the grouping information), the quantized values of the audio signal, and the frequency index values, selecting a probability model in consideration of the correlation with the previous frame and the distribution of neighboring important frequency components. Bit packing is then performed on the encoded values.
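The frequency index and the context-dependent probability model can be sketched as follows. Instead of a full arithmetic coder, the sketch estimates the ideal code length (the sum of -log2 p) under an adaptive model whose context is the flag at the same bin in the previous frame and the flag at the preceding bin of the current frame. The context definition, the Laplace-initialized counts, and the names `frequency_index` and `estimated_bits` are assumptions.

```python
import math

def frequency_index(num_bins, isc_bins):
    """One flag per bin: 1 if the bin was selected as an ISC, else 0."""
    isc = set(isc_bins)
    return [1 if k in isc else 0 for k in range(num_bins)]

def estimated_bits(current, previous):
    """Ideal code length of `current` under an adaptive context model.

    ASSUMED context: (flag of the same bin in the previous frame,
                      flag of the preceding bin in the current frame).
    A decoder can form the same context from data it has already decoded.
    """
    counts = {}  # context -> [count of 0s, count of 1s]
    bits = 0.0
    for k, flag in enumerate(current):
        context = (previous[k], current[k - 1] if k > 0 else 0)
        c = counts.setdefault(context, [1, 1])  # Laplace initialization
        p = c[flag] / (c[0] + c[1])
        bits += -math.log2(p)
        c[flag] += 1  # adapt the model after coding the symbol
    return bits

index = frequency_index(6, [1, 4])
bits_for_silence = estimated_bits([0] * 8, [0] * 8)
```

For eight identical flags in one context the estimate is log2(9), about 3.17 bits, which shows how the model spends less than one bit per flag once a context becomes predictable.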

A time-domain audio signal is transformed into a frequency-domain signal using the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST) (step 700). The audio signal transformed into the frequency domain is input to the psychoacoustic model. To predict the importance of the frequency-domain audio signal, the psychoacoustic model calculates the SMR (Signal to Mask Ratio) (step 720). The ISCs are extracted using the SMR value (step 740). The ISC extraction is the same as the ISC extraction method according to the present invention described above, so its description is omitted.

Once the ISCs are extracted, they are quantized (step 760). In more detail, as shown in FIG. 8, the ISCs are grouped so as to minimize the side information, taking into account the relationship between bit usage and quantization error (step 762). The grouping is the same as that described for the grouping unit 442 of FIG. 5, so its description is omitted.

Once the grouping is done, the quantization step size is determined in consideration of the data distribution (dynamic range) of each group and the SMR (step 764). The ISCs are also normalized with respect to the maximum value among the ISCs constituting the group.

Then a quantizer is determined using the values normalized with respect to the maximum value in the group and the quantization step size, and the audio signal is quantized group by group (step 766). Max-Lloyd quantization is preferably used.

After quantization as described above, the result is losslessly encoded (step 780). The frequency information and the quantized values of the ISCs are encoded by context arithmetic coding. For each frame, every frequency component of the frame is assigned a frequency index indicating whether it was selected as an ISC; the frequency index expresses the presence or absence of an ISC as 1 or 0. The frequency index values are then encoded losslessly, with the probability model selected in consideration of the correlation with the previous frame and the distribution of neighboring ISCs. Finally, the encoded values are bit-packed.

FIG. 9 is a block diagram of a low bit rate audio signal decoding apparatus that decodes a low bit rate audio signal encoded using the apparatus for extracting important frequency components of an audio signal. The decoding apparatus includes a lossless decoder 900, a dequantizer 920, and an F/T converter 940.

The lossless decoder 900 extracts probability model information for each frame and uses it to restore the index information indicating whether an ISC is present, the quantizer information, the quantization step size, the ISC grouping information, and the quantized values of the audio signal for each group.

The dequantizer 920 dequantizes the quantized values using an inverse quantizer, with reference to the restored quantizer information, quantization step size, and grouping information.

The F/T converter 940 transforms the dequantized values into a time-domain signal.

FIG. 10 is a flowchart of a low bit rate audio signal decoding method that decodes a low bit rate audio signal encoded using the apparatus for extracting important frequency components of an audio signal. The low bit rate audio signal decoding method according to the present invention and the operation of the corresponding apparatus are described below with reference to FIGS. 9 and 10.

First, the lossless decoder 900 extracts probability model information for each frame (step 1000). The probability model information is then used to restore the index information indicating whether an ISC is present, the quantizer information, the quantization step size, the grouping information, and the quantized values of the audio signal (step 1020). Next, the dequantizer 920 dequantizes the quantized values with reference to the restored quantizer information, quantization step size, and grouping information (step 1040). After dequantization, the F/T converter 940 transforms the dequantized values into a time-domain signal (step 1060).
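Steps 1020 to 1040 can be sketched as rebuilding a sparse spectrum: the frequency index says which bins carry ISCs, and the dequantized values are written into those bins in order. The sketch assumes each group is restored as a (peak, codebook, indices) triple and that the groups cover the flagged bins in ascending order. That representation and the name `dequantize_frame` are hypothetical, and the inverse transform of step 1060 is not shown.

```python
import numpy as np

def dequantize_frame(freq_index, groups):
    """Place dequantized ISC values at the bins flagged in the frequency index.

    ASSUMED group format: (peak, codebook, indices), where a dequantized
    value is peak * codebook[index]. The total number of values must equal
    the number of flagged bins.
    """
    spectrum = np.zeros(len(freq_index))
    positions = np.flatnonzero(np.asarray(freq_index) == 1)
    values = []
    for peak, codebook, idx in groups:
        levels = np.asarray(codebook, dtype=float)
        values.extend(peak * levels[np.asarray(idx, dtype=int)])
    spectrum[positions] = values  # bins without an ISC stay zero
    return spectrum

spectrum = dequantize_frame(
    [0, 1, 0, 0, 1, 1],
    [(8.0, [0.25, 1.0], [1, 0]), (2.0, [0.5], [0])],
)
```

Bins that were not selected as ISCs are reconstructed as zero in this sketch, which reflects that only the important frequency components are transmitted.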

The present invention described above can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

The preferred embodiments have been disclosed in the drawings and the specification above. Although specific terms are used herein, they are used only to describe the present invention and not to limit their meaning or the scope of the invention set forth in the claims. Those of ordinary skill in the art will therefore understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

As described above, the method and apparatus for extracting important frequency components of an audio signal according to the present invention, and the low bit rate audio signal encoding/decoding method and apparatus using them, efficiently encode perceptually important frequency components and thereby provide high sound quality at a low bit rate. Perceptually important components are extracted through the psychoacoustic model, encoding is possible without phase information, and the spectral signal can be represented efficiently at a low bit rate. The present invention is applicable to every field that requires a low bit rate audio coding scheme and can be applied as a next-generation audio scheme.

Claims

delete

decoding frequency information of the important frequency component;

dequantizing and decoding the value of the important frequency component; and

restoring an audio signal by using the decoded frequency information to inverse-transform, into the time domain, the signal reconstructed from the decoded value of the important frequency component.

44. The method of claim 43, wherein decoding the frequency information comprises:

decoding index information indicating whether the important frequency component is present; and

decoding the quantization step size and the quantization value of the audio signal.

45. The method of claim 44, wherein the dequantizing and decoding comprises

dequantizing and decoding the value of the important frequency component using the decoded quantization step size and the quantization value.

a frequency information decoder for decoding frequency information of the important frequency component;

a frequency component decoder for dequantizing and decoding the value of the important frequency component; and

a signal reconstruction unit for restoring an audio signal by using the decoded frequency information to inverse-transform, into the time domain, the signal reconstructed from the decoded value of the important frequency component.

47. The apparatus of claim 46, wherein the frequency information decoder comprises:

a first information decoder which decodes index information indicating whether the important frequency component is present; and

a second information decoder which decodes the quantization step size and the quantization value of the audio signal.

48. The apparatus of claim 47, wherein the frequency component decoder

delete