KR100928966B1

KR100928966B1 - Low bitrate encoding/decoding method and apparatus

Info

Publication number: KR100928966B1
Application number: KR1020070010614A
Authority: KR
Inventors: 김중회; 오은미; 보리스 쿠드야쇼브; 콘스탄틴 오시포브
Original assignee: 삼성전자주식회사
Priority date: 2007-02-01
Filing date: 2007-02-01
Publication date: 2009-11-26
Also published as: KR20070027669A

Abstract

The present invention provides a low bit rate encoding/decoding method and apparatus. The low bit rate encoding method according to the present invention comprises the steps of: (a) converting an input time-domain audio signal into a frequency-domain audio signal; (b) extracting important spectral components from the frequency-domain audio signal and quantizing them; (c) extracting the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal, and calculating and quantizing a noise level of the residual spectral components; and (d) losslessly encoding the quantized important spectral components and the noise level to output a bitstream.

Description

Low bit rate encoding/decoding method and apparatus

FIG. 1 is a block diagram showing the configuration of a low bit rate encoding apparatus according to the present invention.

FIG. 2 is a block diagram showing the noise component processing unit of FIG. 1 in more detail.

FIG. 3 is a flowchart illustrating the operation of the low bit rate encoding apparatus according to the present invention.

FIG. 4 is a flowchart illustrating step S330 of FIG. 3 in more detail.

FIGS. 5A to 5D are diagrams showing an example of how the signal changes through the frequency-domain signal processing of the low bit rate audio encoding/decoding apparatus according to the present invention.

FIG. 6 is a block diagram showing the configuration of a low bit rate decoding apparatus according to the present invention.

FIG. 7 is a flowchart illustrating the operation of the low bit rate decoding apparatus according to the present invention.

The present invention relates to an encoding/decoding method and apparatus, and more particularly to a low bit rate encoding/decoding method and apparatus that compresses data efficiently at a low bit rate while providing high sound quality.

A waveform that carries information is originally an analog signal that is continuous in both amplitude and time. Analog-to-digital (A/D) conversion is therefore required to represent the waveform as a discrete signal. A/D conversion involves two processes. One is sampling, which converts a time-continuous signal into a discrete-time signal; the other is amplitude quantization, which limits the number of possible amplitudes to a finite value. That is, amplitude quantization converts the input amplitude x(n) at time n into y(n), one element of a finite set of possible amplitudes.
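As a minimal sketch of the amplitude quantization step described above, the following maps an amplitude x(n) to one element y(n) of a finite set. The uniform step size, the `n_bits` parameter and the function name `quantize` are assumptions made for illustration; the text does not prescribe a particular quantizer.

```python
def quantize(x, n_bits, x_max=1.0):
    """Uniformly quantize amplitude x in [-x_max, x_max].

    Returns (index, y): the cell index and the reconstruction value y,
    which is one of 2**n_bits possible amplitudes.
    Hypothetical helper, not taken from the patent.
    """
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    # Clamp so that x == x_max still falls into the top cell.
    x = max(-x_max, min(x_max - 1e-12, x))
    index = int((x + x_max) // step)
    # Reconstruct at the centre of the selected cell.
    y = -x_max + (index + 0.5) * step
    return index, y
```

With `n_bits=2` there are only four possible output amplitudes (-0.75, -0.25, 0.25, 0.75 for `x_max=1.0`), which makes the loss of amplitude resolution easy to see.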

With recent advances in digital signal processing, audio storage and reproduction have also changed: an analog signal is sampled and quantized into PCM (Pulse Code Modulation) data, a digital signal, and stored on recording media such as CD (Compact Disc) and DAT (Digital Audio Tape), so that the user can play the stored signal back whenever needed. This technology has become commonplace. Compared with analog formats such as LP (Long-Play Record) and tape, digital storage and reproduction improved sound quality and overcame degradation over the storage period, but the large size of the digital data caused problems for storage and transmission.

To solve this problem, methods such as DPCM (Differential Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation) were developed to compress digital speech signals. These methods were used in attempts to reduce the amount of data in digital speech signals, but their efficiency varies greatly with the type of signal. The MPEG/audio (Moving Picture Experts Group) technique, recently standardized by the ISO (International Organization for Standardization), and the AC-2/AC-3 techniques developed by Dolby reduce the amount of data by using a human psychoacoustic model. This approach has contributed greatly to reducing the amount of data efficiently regardless of the characteristics of the signal.

Conventional audio compression techniques such as MPEG-1/audio, MPEG-2/audio and AC-2/AC-3 group the time-domain signal into blocks of a fixed size and convert each block into a frequency-domain signal. The converted signal is then scalar quantized using a human psychoacoustic model. This quantization technique is simple, but it is not optimal even when the input samples are statistically independent, and it is even less adequate when the input samples are statistically dependent. For this reason, encoding is performed together with lossless coding such as entropy coding, or with some form of adaptive quantization. The process is therefore considerably more complicated than simply storing PCM data, and the bitstream consists not only of quantized PCM data but also of additional information used to compress the signal.

The MPEG/audio standard and the AC-2/AC-3 methods provide sound quality close to that of a compact disc at bit rates of 64 kbps to 384 kbps, which is 1/6 to 1/8 of the rate of conventional digital coding. For this reason, the MPEG/audio standard is expected to play an important role in the storage and transmission of audio signals in multimedia systems such as DAB (Digital Audio Broadcasting), Internet phone and AOD (Audio on Demand).

However, at low bit rates of 32 kbps or less, these coding techniques do not have enough available bits to encode the signal by quantization alone. A technique is therefore required that can compress an audio signal efficiently and restore it close to the original sound even at a low bit rate.

The technical problem to be solved by the present invention is to provide a low bit rate encoding/decoding method and apparatus that can compress data efficiently and restore it close to the original sound.

To achieve the above technical problem, a low bit rate encoding method according to the present invention comprises: (a) converting an input time-domain audio signal into a frequency-domain audio signal; (b) extracting important spectral components from the frequency-domain audio signal and quantizing the important spectral components; (c) extracting the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal, and calculating and quantizing a noise level of the residual spectral components; and (d) losslessly encoding the quantized important spectral components and the noise level to output a bitstream. The method may further comprise, before step (b), (e) receiving the time-domain audio signal, modeling it according to human auditory characteristics and calculating encoding bit allocation information, in which case steps (b) and (c) quantize the important spectral components and the noise level with the number of bits allocated according to the encoding bit allocation information. In the residual spectrum, the spectral components surrounding each important spectral component may be set to zero. Step (c) may comprise: (c1) extracting the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal; (c2) dividing the residual spectral components into predetermined subbands and calculating a noise level according to the noise magnitude; and (c3) quantizing the calculated noise level. The noise magnitude may be calculated using linear prediction analysis.

To achieve the above technical problem, a low bit rate encoding apparatus comprises: a signal conversion unit that converts an input time-domain audio signal into a frequency-domain audio signal; an important spectral component processing unit that extracts important spectral components from the frequency-domain audio signal and quantizes the important spectral components; a noise component processing unit that extracts the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal, and calculates and quantizes a noise level of the residual spectral components; and a lossless encoding unit that losslessly encodes the quantized important spectral components and the noise level to output a bitstream. The low bit rate encoding apparatus may further comprise a psychoacoustic model unit that receives the time-domain audio signal, models it according to human auditory characteristics and calculates encoding bit allocation information, in which case the important spectral component processing unit and the noise component processing unit quantize the important spectral components and the noise level, respectively, with the number of bits allocated according to the encoding bit allocation information. The noise component processing unit may comprise: a residual spectral component extraction unit that extracts the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal; a noise level calculation unit that divides the residual spectral components into predetermined subbands and calculates a noise level according to the noise magnitude; and a noise level quantization unit that quantizes the calculated noise level.

To achieve the above technical problem, a low bit rate decoding method comprises: (a) losslessly decoding an input bitstream to output a decoded audio signal; (b) dequantizing the important spectral components, which are the quantized data in the decoded audio signal; (c) dequantizing the noise level, which is the additional information in the decoded audio signal, to generate noise components; (d) combining the dequantized important spectral components and the noise components to output a frequency-domain audio signal; and (e) converting the frequency-domain audio signal into a time-domain audio signal. Step (c) may comprise: (c1) dequantizing the noise level, which is the additional information in the decoded audio signal; and (c2) generating noise components from the dequantized noise level, excluding a predetermined region around each important spectral component.

To achieve the above technical problem, a low bit rate decoding apparatus comprises: a lossless decoding unit that losslessly decodes an input bitstream to output a decoded audio signal; an important spectral component dequantization unit that dequantizes the important spectral components, which are the quantized data in the decoded audio signal; a noise component processing unit that dequantizes the noise level, which is the additional information in the decoded audio signal, to generate noise components; a spectral component combining unit that combines the dequantized important spectral components and the noise components to output a frequency-domain audio signal; and a signal generation unit that generates a time-domain audio signal from the frequency-domain audio signal. The noise component processing unit may comprise: a noise level dequantization unit that dequantizes the noise level, which is the additional information in the decoded audio signal; and a noise component generation unit that generates noise components from the dequantized noise level, excluding a predetermined region around each important spectral component.

Hereinafter, the low bit rate encoding/decoding method and apparatus according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of an embodiment of the low bit rate audio encoding apparatus according to the present invention, which comprises a signal conversion unit 100, a psychoacoustic model unit 110, an important spectral component processing unit 120, a noise component processing unit 130 and a lossless encoding unit 140.

The signal conversion unit 100 converts a time-domain audio signal into a frequency-domain audio signal. The MDCT (Modified Discrete Cosine Transform) may be used for this time/frequency conversion. The signal conversion unit 100 also divides the frequency components into subbands.
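For illustration, the time/frequency conversion can be sketched with the textbook direct-form MDCT, which maps a block of 2N time samples to N coefficients. This is an unoptimized O(N²) sketch with no analysis window and no block overlap handling; the function name `mdct` is hypothetical and nothing here is claimed to match the transform configuration actually used in the apparatus.

```python
import math


def mdct(block):
    """Direct-form MDCT of a block of 2N samples, returning N coefficients.

    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    Windowing and overlap between consecutive blocks are omitted.
    """
    two_n = len(block)
    if two_n % 2 != 0:
        raise ValueError("block length must be even")
    n = two_n // 2
    return [
        sum(
            block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2.0) * (k + 0.5))
            for i in range(two_n)
        )
        for k in range(n)
    ]
```

A practical encoder would apply a window to each block and use a fast algorithm; the direct sum is shown only because it makes the definition visible.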

The psychoacoustic model unit 110 calculates encoding bit allocation information for each subband produced by the signal conversion unit 100 in order to remove perceptual redundancy arising from human auditory characteristics. Using these characteristics, the psychoacoustic model unit 110 omits fine detail to which the ear is less sensitive and varies the bit allocation by frequency so that the amount of code can be reduced; preferably, it calculates the bit allocation information with psychoacoustic considerations taken into account. The psychoacoustic model unit 110 outputs the calculated encoding bit allocation information to the important spectral component processing unit 120 and the noise component processing unit 130.

The important spectral component processing unit 120 extracts and quantizes the important spectral components of the frequency-domain audio signal from the signal conversion unit 100, and comprises an important spectral component extraction unit 121 and an important spectral component quantization unit 122. The important spectral component extraction unit 121 extracts, for each predetermined spectral section, the spectral components judged to be important. The important spectral component quantization unit 122 quantizes the extracted important spectral components at a bit rate determined by the encoding bit allocation information input from the psychoacoustic model unit 110.
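The text does not state the criterion by which a component is judged important. As a sketch under the assumption that "important" means largest magnitude within each section, the extraction could look like the following; `extract_important`, `section_size` and `peaks_per_section` are hypothetical names introduced only for this example.

```python
def extract_important(spectrum, section_size, peaks_per_section=1):
    """Keep the largest-magnitude bins of each spectral section.

    Returns a spectrum Y of the same length as the input in which every
    bin that was not selected is zero. The magnitude criterion is an
    assumption; the source only says components "judged to be important".
    """
    important = [0.0] * len(spectrum)
    for start in range(0, len(spectrum), section_size):
        section = range(start, min(start + section_size, len(spectrum)))
        # Rank the bins of this section by absolute value, largest first.
        ranked = sorted(section, key=lambda k: abs(spectrum[k]), reverse=True)
        for k in ranked[:peaks_per_section]:
            important[k] = spectrum[k]
    return important
```

Any other selection rule, for example one driven by the psychoacoustic model, would slot into the same place without changing the rest of the pipeline.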

The noise component processing unit 130 extracts the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal, and calculates and quantizes the noise level of the residual spectral components. The noise component processing unit 130 is the core block of the present invention and is described in more detail later.

The lossless encoding unit 140 receives the quantized audio signal from the important spectral component processing unit 120 and the noise component processing unit 130 and losslessly encodes it to output a bitstream. Efficient compression can be achieved by using a lossless coding scheme such as Huffman coding or arithmetic coding.
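To make the lossless coding stage concrete, here is a compact sketch of Huffman code construction over a sequence of quantized symbols. It builds a code table only; how the table is signalled in the bitstream, and whether the encoder uses Huffman or arithmetic coding in a given configuration, is not specified by this sketch. The function name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix-free code table {symbol: bit string} for a sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)
        c2, _, t2 = heapq.heappop(heap)
        # Prefix the two least frequent subtrees with 0 and 1 and merge them.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]
```

Frequent symbols (typically small quantized values) receive short codewords, which is where the compression gain comes from.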

FIG. 2 is a block diagram showing the configuration of the noise component processing unit 130 of FIG. 1, which comprises a residual spectral component extraction unit 200, a noise level calculation unit 210 and a noise level quantization unit 220.

Referring to FIGS. 1 and 2, the residual spectral component extraction unit 200 extracts the residual spectral components by taking the difference between the original spectrum signal from the signal conversion unit 100 and the important spectral component signal from the important spectral component extraction unit 121. The noise level calculation unit 210 divides the residual spectral components into predetermined subbands and calculates a noise level according to the noise magnitude. The noise level quantization unit 220 quantizes the calculated noise level at a bit rate determined by the encoding bit allocation information input from the psychoacoustic model unit 110.

FIG. 3 is a flowchart illustrating the operation of the low bit rate encoding apparatus according to the present invention, and is described in conjunction with the components of FIG. 1.

Referring to FIGS. 1 and 3, in step S300 the signal conversion unit 100 converts a time-domain audio signal into a frequency-domain audio signal. The MDCT (Modified Discrete Cosine Transform) may be used for this time/frequency conversion. The signal conversion unit 100 also divides the frequency components into subbands. The MDCT spectrum (X) of the frequency-domain audio signal is shown in FIG. 5A.

In step S310, the psychoacoustic model unit 110 calculates encoding bit allocation information for each subband produced by the signal conversion unit 100 in order to remove perceptual redundancy arising from human auditory characteristics. Because the psychoacoustic model unit 110 calculates the bit allocation information with psychoacoustic considerations taken into account, it can allocate more bits to frequencies at which human hearing is more sensitive and fewer bits to the other frequencies.

In step S320, the important spectral component processing unit 120 extracts and quantizes the important spectral components of the frequency-domain audio signal from the signal conversion unit 100. The spectrum (Y) obtained by extracting only the important spectral components from the MDCT spectrum of FIG. 5A is shown in FIG. 5B. At this time, the spectral region surrounding each important spectral component is set to zero. The size (nAround) of the surrounding spectral region that is set to zero is given in Table 1.

In step S330, the noise component processing unit 130 extracts the residual spectral components that remain after the important spectral components are excluded from the frequency-domain audio signal, and calculates and quantizes the noise level of the residual spectral components. Step S330 is the core of the present invention and is described in more detail later.

In step S340, the lossless encoding unit 140 receives the quantized audio signal from the important spectral component processing unit 120 and the noise component processing unit 130 and losslessly encodes it to output a bitstream. Efficient compression can be achieved by using a lossless coding scheme such as Huffman coding or arithmetic coding. The output bitstream consists of the quantized data, which are the important spectral components, and the additional information, which is the noise level.

FIG. 4 is a flowchart illustrating step S330 in more detail, and is described in conjunction with the components of FIGS. 1 and 2.

Referring to FIGS. 1, 2 and 4, in step S400 the residual spectral component extraction unit 200 extracts the residual spectral components by taking the difference between the original spectrum signal from the signal conversion unit 100 and the important spectrum signal from the important spectral component extraction unit 121. The residual spectrum (Z), obtained by removing the important spectrum (Y) of FIG. 5B from the original spectrum (X) of FIG. 5A, is shown in FIG. 5C.
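Step S400 can be sketched as a bin-by-bin difference Z = X - Y, followed by zeroing the bins surrounding each important component as described for step S320. The symmetric neighbourhood of `n_around` bins on each side is an assumption about how nAround is applied (Table 1 is not reproduced here), and `extract_residual` is a hypothetical name.

```python
def extract_residual(spectrum, important, n_around):
    """Return the residual spectrum Z = X - Y with neighbourhoods zeroed.

    spectrum  -- original MDCT spectrum X
    important -- important spectrum Y (zero wherever nothing was selected)
    n_around  -- number of bins on each side of an important bin to zero
                 (assumed symmetric; illustrative only)
    """
    residual = [x - y for x, y in zip(spectrum, important)]
    for k, y in enumerate(important):
        if y != 0.0:
            lo = max(0, k - n_around)
            hi = min(len(residual), k + n_around + 1)
            for j in range(lo, hi):
                residual[j] = 0.0
    return residual
```

With `n_around=0` the result is the plain difference; larger values keep the noise estimate from being inflated by energy that leaks around strong tonal peaks.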

In step S410, the noise level calculation unit 210 divides the residual spectral components into predetermined subbands and calculates a noise level according to the noise magnitude.

The noise magnitude is calculated by performing linear prediction analysis. The linear prediction analysis is performed using the well-known autocorrelation method; the covariance method, Durbin's method and the like may also be used. These methods are widely known in the art, so their description is omitted here. Through linear prediction, the encoder estimates how much noise is present in the current frame. If the noise component is strong, the noise magnitude is transmitted as is; if the noise component is weak and the tonal component is strong, the noise magnitude is reduced accordingly before transmission. A short window indicates that the noise is changing rapidly, so in that case the noise magnitude is reduced further before transmission.
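As a sketch of the autocorrelation method, the code below computes the autocorrelation sequence, runs the Levinson-Durbin recursion, and reports the prediction gain (signal energy divided by prediction error energy). Using the prediction gain as the noisiness indicator is an assumption made for this example: a gain near 1 suggests a noise-like frame, while a large gain suggests a tonal one. How the encoder maps the analysis result to the transmitted noise magnitude is not specified here, and all function names are hypothetical.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sequence x."""
    return [
        sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the normal equations; return (coefficients a[0..order], error).

    a[0] is 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def prediction_gain(x, order):
    """Ratio of signal energy to linear prediction error energy."""
    r = autocorrelation(x, order)
    if r[0] == 0.0:
        return 1.0
    _, err = levinson_durbin(r, order)
    return r[0] / err if err > 0.0 else float("inf")
```

A pure sinusoid is almost perfectly predictable with two coefficients and yields a high gain, whereas white noise yields a gain close to 1.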

The noise level can be calculated by the following equation.

Here, Energy is the energy of each subband, nCountFreq is the number of non-zero spectral components in that subband, dNoise is the noise magnitude calculated for that subband, and α is a perceptual weight selected according to the noise characteristics. For transient noise, α is set so that the noise level is low (for example, 0.3); in this case the data are usually transformed using a short window. For stationary noise (for example, white noise), α is set so that the noise level is high (for example, 0.7); in this case the data are usually transformed using a long window.
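The quantities named in the definition above can be computed per subband as sketched below. The sketch deliberately stops at the inputs (Energy, nCountFreq and the choice of α) and does not assume how the equation combines them with dNoise. The band-edge representation, the sum-of-squares definition of energy, and the function names are assumptions; the values 0.3 and 0.7 are the example values given in the text.

```python
def subband_statistics(residual, band_edges):
    """Return [(energy, n_count_freq), ...] for each subband.

    band_edges -- increasing bin indices; subband b covers
                  residual[band_edges[b]:band_edges[b + 1]] (assumed layout)
    energy       -- sum of squared residual coefficients in the subband
    n_count_freq -- number of non-zero coefficients in the subband
    """
    stats = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = residual[lo:hi]
        energy = sum(v * v for v in band)
        n_count_freq = sum(1 for v in band if v != 0.0)
        stats.append((energy, n_count_freq))
    return stats


def perceptual_weight(short_window):
    """Example alpha: 0.3 for transient (short window), 0.7 for stationary."""
    return 0.3 if short_window else 0.7
```

Counting only non-zero bins matters because the bins zeroed around important components would otherwise dilute the per-bin noise estimate.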

In step S420, the noise level quantization unit 220 quantizes the calculated noise level at a bit rate determined by the encoding bit allocation information input from the psychoacoustic model unit 110.

FIG. 6 is a block diagram showing the configuration of the low bit rate decoding apparatus according to the present invention, which comprises a lossless decoding unit 600, an important spectral component dequantization unit 610, a noise level processing unit 620, a spectral component combining unit 630 and a signal generation unit 640.

The lossless decoding unit 600 decodes the received bitstream and outputs the decoded audio signal to the important spectral component dequantization unit 610 and the noise level processing unit 620. That is, the lossless decoding unit 600 extracts the quantized data and the additional information from the bitstream, which has a layered structure.

The important spectral component dequantization unit 610 dequantizes the important spectral components in the decoded audio signal.

The noise level processing unit 620 dequantizes the noise level in the decoded audio signal to generate noise components, and comprises a noise level dequantization unit 621 and a noise component generation unit 622. The noise level dequantization unit 621 dequantizes the noise level in the decoded audio signal, and the noise component generation unit 622 generates noise components from the dequantized noise level, excluding a predetermined region around each important spectral component.

The spectral component combining unit 630 combines the dequantized important spectral components and the noise components to output a frequency-domain audio signal.

The signal generation unit 640 generates a time-domain audio signal from the frequency-domain audio signal.

도 7은 본 발명에 따른 저비트율 복호화장치의 동작을 흐름도로 도시한 것으로서, 도 6의 구성요소와 결부시켜 설명하기로 한다. FIG. 7 is a flowchart illustrating an operation of a low bit rate decoding apparatus according to the present invention, and will be described with reference to the components of FIG. 6.

도 6 및 도 7을 살펴보면, 단계 S700에서 무손실 복호화부(600)는 수신되는 부호화된 비트스트림에 대하여 무손실 부호화부(140)의 역과정을 수행하며, 그 결과로서 복호화된 오디오 신호를 중요스펙트럼성분역양자화부(610) 및 노이즈레벨처리부(620)로 출력한다. 즉, 무손실 복호화부(600)는 계층적 구조를 가진 비트스트림에서 양자화된 데이터 및 부가정보를 추출한다. 무손실 복호화는 산술복호화 방법에 의해 복호화하거나 호프만 복호화 방법에 의해 복호화할 수 있다.6 and 7, in step S700, the lossless decoder 600 performs an inverse process of the lossless encoder 140 on the received coded bitstream, and as a result, the significant spectrum component is included in the decoded audio signal. The dequantizer 610 and the noise level processor 620 output the result. That is, the lossless decoder 600 extracts quantized data and additional information from a bitstream having a hierarchical structure. Lossless decoding can be decoded by the arithmetic decoding method or by the Hoffman decoding method.

In step S710, the important spectrum component inverse quantizer 610 inversely quantizes the important spectrum component, which is the quantized data in the decoded audio signal.
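Step S710 can be sketched as follows, under the assumption of a uniform scalar quantizer with a single step size. The quantizer actually used is not specified in this passage, so `dequantize` and `step_size` are illustrative names only.

```python
def dequantize(quantized, step_size):
    """Map integer quantization indices back to spectral amplitudes.

    Assumes a uniform scalar quantizer: each index is multiplied by the
    step size. An index of 0 stays 0.0, i.e. a bin that carries no
    important spectrum component.
    """
    return [q * step_size for q in quantized]


important = dequantize([0, 0, 3, 0, -2, 0], step_size=0.25)
# important == [0.0, 0.0, 0.75, 0.0, -0.5, 0.0]
```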

In step S720, the noise level processor 620 inversely quantizes the noise level, which is carried as additional information in the decoded audio signal, to generate a noise component. The noise level inverse quantizer 621 inversely quantizes the noise level in the decoded audio signal, and the noise component generator 622 generates the noise component from the dequantized noise level, excluding the spectral region surrounding the important spectrum component.

In step S730, the spectral component combiner 630 combines the dequantized important spectrum component and the noise component, and outputs an audio signal in the frequency domain. FIG. 5D shows a signal spectrum in which the important spectrum component and the noise component are combined. As can be seen from FIG. 5D, the noise component is significantly reduced compared with the original spectrum signal of FIG. 5A.
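The combining of step S730 can be sketched as a bin-by-bin sum. This assumes, for illustration only, that both inputs are lists of equal length on the same frequency grid and that the noise component is already zero inside the excluded regions; the name `combine_spectrum` is hypothetical.

```python
def combine_spectrum(important, noise):
    """Add the important spectrum component and the noise component per bin.

    Where the noise component is zero (the excluded regions around
    important components), the important component passes through unchanged.
    """
    if len(important) != len(noise):
        raise ValueError("both components must cover the same bins")
    return [s + n for s, n in zip(important, noise)]


spectrum = combine_spectrum(
    [0.0, 0.0, 0.75, 0.0, 0.0],   # important spectrum component at bin 2
    [0.1, 0.0, 0.0, 0.0, -0.2],   # noise, zero in bins 1 to 3
)
# spectrum == [0.1, 0.0, 0.75, 0.0, -0.2]
```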

In step S740, the signal generator 640 generates an audio signal in the time domain from the audio signal in the frequency domain.
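The frequency-to-time conversion of step S740 can be illustrated with a naive inverse discrete Fourier transform. The transform is not named in this passage, so the inverse DFT is used here purely as a stand-in, and `inverse_dft` is a hypothetical name; an actual codec may use a different transform.

```python
import cmath


def inverse_dft(spectrum):
    """Naive O(n^2) inverse DFT returning real time-domain samples.

    Only the real part is kept, which is exact when the spectrum is
    conjugate-symmetric (the spectrum of a real signal).
    """
    n = len(spectrum)
    samples = []
    for t in range(n):
        total = sum(
            spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)
        )
        samples.append(total.real / n)
    return samples


# A spectrum with energy only in bin 0 yields a constant signal.
samples = inverse_dft([4.0, 0.0, 0.0, 0.0])
```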

The present invention can be embodied as code that can be read by a computer (including any device having an information processing function) on a computer-readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of computer-readable recording devices include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those skilled in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be defined by the technical spirit of the appended claims.

According to the low bit rate encoding/decoding method and apparatus of the present invention, the important spectrum component and the noise component of an audio signal are separated and encoded, so that data can be compressed efficiently and restored close to the original sound.

Claims

delete

Inversely quantizing spectral components, included in the bitstream, that are quantized to a value other than '0';

Dequantizing a noise level, which is additional information included in the bitstream;

Generating a noise component for a frequency component other than the spectral component quantized to a value other than '0' using the dequantized noise level;

And combining the dequantized spectral component and the generated noise component.

The method of claim 11, wherein the noise component generation step comprises

generating the noise component from the dequantized noise level except for a predetermined region around the spectral component quantized to a non-zero value.

An important spectral component inverse quantizer for inversely quantizing spectral components quantized to a value other than '0' included in the bitstream;

A noise component processing unit that inversely quantizes a noise level, which is additional information included in the bitstream, and generates a noise component for a frequency component other than the spectral component quantized to a non-zero value by using the dequantized noise level; And

And a spectral component combiner for outputting an audio signal in a frequency domain by combining the dequantized spectral component and the noise component.

The apparatus of claim 13, wherein the noise component processing unit comprises

A noise level inverse quantizer for inversely quantizing a noise level which is additional information included in the bitstream; And

And a noise component generator for generating the noise component except for a predetermined region around the spectral component quantized to a value other than '0'.

And a program for executing a low bit rate decoding method comprising combining said dequantized spectral component and said generated noise component.