KR100300957B1

KR100300957B1 - Digital audio encoding method using lookup table and apparatus for the same

Info

Publication number: KR100300957B1
Application number: KR1019950031355A
Authority: KR
Inventors: 김도형; 서양석
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1995-09-22
Filing date: 1995-09-22
Publication date: 2001-11-22
Also published as: KR970019116A

Abstract

PURPOSE: A digital audio encoding method using a lookup table, and an apparatus for the same, are provided to reduce the delay in audio encoding by storing in advance, in a lookup table, the bit allocation that corresponds to the characteristics of the input audio signal. CONSTITUTION: A frequency mapping unit (31) passes time-domain input audio data through a band-splitting filter to divide it into 32 frequency bands. A characteristic value acquisition unit (33) calculates, for each band, the occupancy ratios of the scale factor and of the mean square of the input signal, and selects the larger of the two to determine the address in the lookup table (35). The lookup table (35) holds the number of bits allocated for encoding each frequency band. A bit allocation and quantization unit (37) allocates bits to each band according to the lookup table (35) and adjusts the total number of bits used. A frame packing unit (39) forms the quantized audio signal into a bitstream.

Description

Digital audio encoding method and apparatus using lookup table

FIG. 1 is a block diagram of a typical MPEG audio encoder.

FIG. 2A is a diagram illustrating an apparatus for generating a lookup table.

FIG. 2B is a diagram illustrating a process of generating a lookup table.

FIG. 3 is a block diagram of a digital audio encoding apparatus using a lookup table according to the present invention.

FIG. 4 is a graph comparing, in terms of left-channel NMR, the performance of MPEG and of the encoding apparatus according to the present invention.

FIG. 5 is a graph comparing, in terms of right-channel NMR, the performance of MPEG and of the encoding apparatus according to the present invention.

The present invention relates to a digital audio encoding method and apparatus, and more particularly to a digital audio encoding method and apparatus using a lookup table (hereinafter, LUT).

Today's communication technology is moving everything from analog to digital. In line with this trend, digital transmission has become indispensable for all audio equipment and audio transmission. Digital audio transmission is more robust against ambient noise than conventional analog transmission, and the sound can be reproduced very cleanly, as with a compact disc (CD). However, as the amount of data to be transmitted has grown, it has raised various problems, such as the memory capacity needed for storage and the capacity of the transmission line.

The technology needed to solve these problems is data compression. For audio, the goal of compression is that the original sound, after being compressed, transmitted, and decompressed, is reproduced so that it sounds almost identical to the original. In other words, the same level of sound quality is reproduced while less information is transmitted per unit time.

This technology is now being developed worldwide. Its starting point was the MiniDisc (MD) introduced by Sony of Japan in 1992 and the Digital Compact Cassette (DCC) produced by Philips. The MD, for example, reproduces CD-level sound quality while being smaller than a conventional CD; with a compression ratio of about 5:1 it can store far more data than a CD, and it is also robust against external shock.

Meanwhile, MPEG (Moving Picture Experts Group), an international standardization body for digital compression coding technology, was established. MPEG consists of three main parts: systems, video, and audio. The audio part is further divided into three layers.

MPEG compares, analyzes, and tests the many low-bit-rate coding techniques that have been proposed, in order to establish an international standard for the coded representation of moving pictures and the associated audio signals. Once this standard is established, all digital storage media will have to encode and store data in conformance with it. Here, digital storage media include CD-ROM, digital audio tape (DAT), magneto-optical disk (MOD), and computer disks.

A technique commonly used in the compression coding of audio signals is the use of a human psychoacoustic model. By exploiting auditory characteristics such as the masking phenomenon and the critical bands, signals that a human listener cannot perceive are removed, and bits are allocated only to the signals that must be kept. In this way, sound quality almost equal to that of the original is obtained even though fewer bits are used than for the original signal.

Here, masking is the phenomenon in which, through interference among audio signals, one signal masks another so that the masked signal is not perceived by a human listener at all. A critical band is a unit by which the human ear resolves the frequency of sound; the audible range is generally divided into 24 such bands. The width of these bands grows on a logarithmic scale toward higher frequencies. Consequently, the human ear discriminates frequencies less easily for high-frequency signals than for low-frequency ones.

To allocate bits using these auditory characteristics, the signal-to-noise ratio (hereinafter, SNR) and the signal-to-mask ratio (hereinafter, SMR) are obtained, and from these values the mask-to-noise ratio (hereinafter, MNR) is calculated. Here, the mask level is the signal level below which a sound is not perceived by a human listener. Accordingly, no bits need to be allocated to signals below the mask level.

After the final MNR is obtained through this process, bits are allocated iteratively on the basis of its value. This series of steps, however, takes considerable computation time, which directly increases the real-time delay of the encoder, so the computational complexity needs to be reduced.

A typical MPEG audio encoder is briefly described below with reference to FIG. 1.

The frequency mapping unit 11 uses a band-splitting filter to convert the time-domain audio data into 32 equal-width frequency bands. Each band then contains 12 samples in Layer I and 36 samples in Layer II. The scale factor takes values that start at a maximum of 2.0 and decrease by a factor of 2^(1/3) per step, so that the value at step N is 2.0 / (2^(1/3))^N and halves every three steps; as shown in Table 1, there are 63 steps in total. The number of bits needed to encode a scale factor is therefore 6.
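
The scale factor steps described above can be tabulated in a few lines. The following is a minimal Python sketch, assuming that Table 1 follows the rule sf(i) = 2.0 * 2^(-i/3) for i = 0 to 62; the function names are illustrative and do not come from the specification.

```python
import math

NUM_SCALE_FACTORS = 63  # number of steps stated in the text


def scale_factor_table():
    """Return the 63 scale factor values, largest first.

    Assumption: step i has the value 2.0 * 2^(-i/3), so the value halves
    every three steps.
    """
    return [2.0 * 2.0 ** (-i / 3.0) for i in range(NUM_SCALE_FACTORS)]


def bits_for_index(num_entries):
    """Smallest number of bits able to index num_entries distinct values."""
    return math.ceil(math.log2(num_entries))
```

With 63 entries, `bits_for_index(63)` gives 6, which matches the 6-bit scale factor index stated above.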

The encoding method differs somewhat by layer. In Layer I, the largest value among the 12 samples in each band is found, and the table value equal to or slightly larger than it is used as the scale factor of that band. In Layer II, each band has three scale factors, so their similarity is examined to decide how many of the three to encode; that is, the selection depends on the range in which the differences between adjacent scale factors fall. Unlike Layer I, Layer II therefore requires additional information indicating which scale factors were selected, and this information is encoded with 2 bits.

The psychoacoustic model 13 is the most computationally complex part of the encoder. Its final output is the SMR of each band, which serves as the basis for bit allocation. The SMR is calculated through the following series of steps. In the first step, the time-domain audio signal is converted to the frequency domain by a fast Fourier transform (FFT). In the second step, the sound pressure level of each band is calculated. In the third step, the absolute masking threshold is calculated. In the fourth step, the tonal and non-tonal components of the audio signal are determined. In the fifth step, the maskers are determined. In the sixth step, the individual masking thresholds are calculated. In the seventh step, the global masking threshold is calculated. In the eighth step, the minimum masking threshold of each band is calculated. In the ninth step, the SMR of each band is calculated.

In the bit allocation and quantization unit 15, the bit allocation process first obtains the number of bits for each band by iterating the following steps on the basis of the SMR values from the psychoacoustic model 13. In the first step, the initial bit allocation is set to 0. In the second step, the MNR of each band is obtained, where the MNR is the SNR minus the SMR. In the third step, the band with the minimum MNR is found and its number of allocated bits is increased by one. In the fourth step, as long as the required number of bits is not exceeded, the second and third steps are repeated for the remaining bands.
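
The four-step loop above can be sketched as follows. This is only an illustration under stated assumptions: the SNR of a band is approximated as 6.02 dB per allocated bit (a stand-in for the quantizer SNR table, which is not reproduced here), and the bit budget is counted in allocation increments rather than in actual frame bits. The function name and parameters are hypothetical.

```python
def allocate_bits(smr_db, bit_budget, max_bits=15, snr_per_bit_db=6.02):
    """Greedy allocation: repeatedly give one more bit to the band with the lowest MNR.

    smr_db      -- per-band SMR values in dB (output of the psychoacoustic model)
    bit_budget  -- number of one-bit increments available (simplifying assumption)
    """
    alloc = [0] * len(smr_db)  # step 1: every band starts at 0 bits
    used = 0
    while used < bit_budget:
        # Step 2: MNR = SNR - SMR for every band that can still take a bit.
        candidates = [
            (alloc[band] * snr_per_bit_db - smr_db[band], band)
            for band in range(len(smr_db))
            if alloc[band] < max_bits
        ]
        if not candidates:
            break  # every band is saturated
        # Step 3: the band with the minimum MNR receives one more bit.
        _, worst_band = min(candidates)
        alloc[worst_band] += 1
        used += 1  # step 4: repeat while the budget is not exhausted
    return alloc
```

Because every increment requires recomputing and searching the MNR of all bands, the cost of this loop grows with the bit budget, which is the delay the lookup table is meant to avoid.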

The quantization process is performed through the following steps. In the first step, the samples in each band are divided by the scale factor, and the result is denoted X. In the second step, A*X+B is calculated, where A and B are predetermined table values. In the third step, as many bits of the calculated value are taken as the number of bits obtained in the bit allocation process. In the fourth step, the most significant bit (MSB) is inverted.

As described above, a conventional digital audio encoder uses a psychoacoustic model and therefore needs nine processing steps to obtain the SMR values, which increases the computational complexity and has a large effect on the total execution time. In addition, the MNR is calculated from the SMR values obtained in this way, and the bit allocation loop is then iterated on the basis of the MNR, so this process also introduces delay.

In an actual experiment, as shown in Table 2, the psychoacoustic model and the bit allocation process together accounted for about 49.9% of the total execution time of the encoder, which shows how computationally complex they are. A method that can replace these processes is therefore needed.

Accordingly, to solve the above problems, an object of the present invention is to provide a digital audio encoding method that remains compatible with existing MPEG while reducing the delay incurred in audio compression coding: the characteristics of the input audio signal are identified, a lookup table of bit allocations corresponding to those characteristics is prepared in advance, and bits are then allocated using the prepared lookup table.

Another object of the present invention is to provide an apparatus best suited to implementing the digital audio encoding method.

To achieve the above object, a digital audio encoding method according to the present invention comprises the steps of: preparing in advance a lookup table that contains numbers of allocated bits for encoding frequency bands, each number being stored at an address calculated using the occupancy ratios of characteristics of the frequency band, the characteristics including a scale factor and a mean square; dividing a time-domain input audio signal into a plurality of equal frequency bands; determining the number of allocated bits for each frequency band by calculating the scale factor and the mean square of each frequency band, determining the occupancy ratios of the scale factor and the mean square of each band, comparing the occupancy ratios of each frequency band and calculating an address for each frequency band using the largest occupancy ratio, and extracting the number of allocated bits from the previously prepared lookup table using the address calculated for each frequency band; allocating bits to the frequency bands in accordance with the extracted numbers of allocated bits and quantizing the allocated bits; and forming the quantized audio signal into a bitstream from the quantized bits.

To achieve the other object, a digital audio encoding apparatus according to the present invention comprises: a frequency mapping unit that divides a time-domain input audio signal into a plurality of equal frequency bands; a lookup table that contains numbers of allocated bits for encoding frequency bands, each number being stored at an address calculated using the occupancy ratios of characteristics of the frequency band, the characteristics including a scale factor and a mean square; a characteristic acquisition unit that determines the address in the lookup table according to the occupancy ratios of the characteristics of the input audio signal; a bit allocation and quantization unit that allocates to each frequency band the bits corresponding to the address in the lookup table and quantizes the allocated bits; and a frame packing unit that forms the quantized audio signal into a bitstream from the quantized bits. The characteristic acquisition unit includes a calculation unit that calculates the characteristics of each frequency band of the input audio signal, a determination unit that determines the occupancy ratios of the characteristics of each frequency band, and an address calculation unit that compares the occupancy ratios of each frequency band and calculates an address for each frequency band using the largest occupancy ratio.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The LUT-based bit allocation scheme, which is the core of the present invention, eliminates the psychoacoustic model and the bit allocation loop of FIG. 1 and instead allocates bits using a LUT obtained in advance. The digital audio encoding apparatus proposed by the present invention is shown in FIG. 3. The process of generating the LUT needed for bit allocation is described first.

FIG. 2A illustrates an apparatus for creating the LUT, which includes a frequency mapping unit 21, a psychoacoustic model 23, a bit allocation unit 25, a characteristic value acquisition unit 27, and a LUT creation unit 29.

FIG. 2B is a flowchart illustrating a method of creating the LUT. The operation of the LUT creation apparatus is described in detail with reference to FIG. 2B.

First, the input audio data, more precisely a test audio input signal, is band-split filtered by the frequency mapping unit 21 and divided into a plurality of frequency bands (step 210). The characteristic value acquisition unit 27 then calculates the scale factor and the mean square of each band as its time-domain characteristics (step 220).

More specifically, the LUT is a table that determines the bit allocation of each band and is used to minimize the time required for bit allocation. The total execution time can be reduced only if the input signal is not analyzed in the frequency domain, for example by the FFT or the power distribution calculation used in the psychoacoustic model, so the time-domain characteristics of the input signal are considered instead. That is, an incoming signal is not analyzed in the frequency domain; its characteristics are found directly in the time domain, and those values serve as the basis for creating the LUT. In the present invention, the characteristics that stand in for the psychoacoustic model are defined as the scale factor and the mean square. These characteristics are obtained immediately after the band-splitting filter of the frequency mapping unit 11 in FIG. 1 and are used to determine the LUT address. The characteristics are first reviewed briefly, followed by how they are applied in the encoding algorithm.

The final output of the psychoacoustic model 23 is the SMR, the difference between the signal level and the masking level of each band. To obtain it, the signal level must first be calculated; the psychoacoustic model 23 takes as the signal level of a band the larger of the band's power and the power spectrum of its scale factor (sf_max). This is expressed by Equation 1, where Lsb(n) is the sound pressure level of band n and corresponds to the S level in the SMR.

Lsb(n) = MAX[power(n), 20*log10(sf_max(n) * 32768) - 10] dB ... (1)

In Equation 1, the factors that determine the signal level of each band are its scale factor and its power. The psychoacoustic model obtains the power through a frequency transform, but by analogy in the time domain it can be obtained as the mean square of the sample values of each band after the band-splitting filter. The scale factor, being the largest sample value in each band, can be obtained directly in the time domain.
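
As a rough illustration of Equation 1 and of its time-domain counterparts, the following Python sketch computes the signal level of a band and the two characteristics used later for the LUT address. The function names are illustrative, and the logarithm in Equation 1 is assumed to be base 10, as is usual for dB values.

```python
import math


def signal_level_db(power_db, sf_max):
    """Equation 1: larger of the band power and the scale factor level, in dB."""
    return max(power_db, 20.0 * math.log10(sf_max * 32768.0) - 10.0)


def band_features(samples):
    """Time-domain characteristics of one band after the band-splitting filter.

    Returns (peak, mean_square): the largest sample magnitude, which stands in
    for the scale factor, and the mean of the squared samples, which stands in
    for the band power.
    """
    peak = max(abs(x) for x in samples)
    mean_square = sum(x * x for x in samples) / len(samples)
    return peak, mean_square
```

Neither feature requires an FFT; both are obtained in a single pass over the 12 (Layer I) or 36 (Layer II) samples of the band.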

In general, the greater the power of an audio signal, the more bits must be allocated to it, and the power in the frequency domain can be regarded as determined by the magnitude of the sample values in the time domain. In view of this, the scale factor, which is the largest value in each band, is correlated with the power and can be said to determine the bit allocation.

When the LUT is created from the characteristics of the audio signal, the most important consideration is how those characteristics are used to find the LUT address. The method of creating the LUT from the two characteristics, the scale factor and the mean square, is as follows.

Referring again to FIG. 2B, after step 220 the characteristic value acquisition unit 27 obtains, for each band, the occupancy ratios of the scale factor and of the mean square relative to all bands (step 230). Next, the occupancy ratios of each band are compared, and the address of each frequency band is calculated from the largest occupancy ratio (step 240).

The LUT stores the addresses and bit allocations of each band and is used to minimize the time required for bit allocation. That is, the address determined by the two parameters above is looked up in the LUT, and the bits are allocated to the corresponding band directly. The MPEG bit allocation scheme starts from the band-wise SMR values obtained from the psychoacoustic model and adds one bit at a time, beginning with the band that has the largest value. This treats the SMR values as relative values within one frame, taking the interdependence between bands into account. With this in mind, the present invention introduces the occupancy ratio SR(n), so that the scale factor and the mean square of each band are expressed as relative values when the LUT address is calculated.

The occupancy ratio SR_sf(n) of the scale factor of band n is the scale factor of band n, sf(n), divided by the sum of the scale factors of all bands, as expressed by Equation 2.

SR_sf(n) = sf(n) / Σ_k sf(k) ... (2)

Similarly, the occupancy ratio SR_pwr(n) of the mean square of band n is the mean square of band n, pwr(n), divided by the sum of the mean squares of all bands, as expressed by Equation 3.

SR_pwr(n) = pwr(n) / Σ_k pwr(k) ... (3)

Here, "all bands" means the bands of both channels, for consistency with joint mode. The occupancy ratio indicates what share of the total the value of one band accounts for. It is thus a relative value that takes all values into account, which makes it conceptually similar to the iterative bit allocation loop of MPEG.

Of the occupancy ratios of the scale factor and of the mean square obtained in this way for each band, the larger is selected and used to obtain the address of that band. Both characteristics could be used in the address calculation, but the address space, and with it the overall size of the LUT, would grow accordingly. Moreover, as Equation 1 shows, the psychoacoustic model likewise takes the larger of the two as the signal level of the band, so the present invention applies similar processing.
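
The occupancy ratios and the choice of the larger one can be sketched as follows. This is a minimal Python illustration; the function names are hypothetical, and the input lists are assumed to hold one value per band across both channels.

```python
def occupancy_ratios(values):
    """Share of the total that each band's value accounts for."""
    total = sum(values)
    if total == 0:
        return [0.0] * len(values)  # silent frame: no band dominates
    return [v / total for v in values]


def dominant_occupancy(scale_factors, mean_squares):
    """Per band, the larger of the scale factor and mean square occupancy ratios."""
    sr_sf = occupancy_ratios(scale_factors)
    sr_pwr = occupancy_ratios(mean_squares)
    return [max(a, b) for a, b in zip(sr_sf, sr_pwr)]
```

Keeping only one ratio per band means the LUT needs a single address dimension per band rather than a pair of indices.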

All occupancy ratios are less than 1. In the present invention these values are converted to integers using the 63-step scale factor mapping table of Table 1, and the result designates the LUT address. In this case, the largest possible address in each band is 62.
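
One way to picture this conversion is shown below. The sketch assumes that Table 1 follows sf(i) = 2.0 * 2^(-i/3) and that an occupancy ratio maps to the index of the smallest table value that is still greater than or equal to it, so that large ratios yield small addresses; the exact mapping rule is not spelled out in the text, and the names are illustrative.

```python
# Assumed form of the 63-step table: 2.0, 1.587..., 1.259..., 1.0, ...
SF_TABLE = [2.0 * 2.0 ** (-i / 3.0) for i in range(63)]


def occupancy_to_address(ratio):
    """Map an occupancy ratio in [0, 1] to an integer LUT address in 0..62."""
    # The table is in descending order, so scan from the smallest value upward
    # and stop at the first entry that is large enough to cover the ratio.
    for index in range(len(SF_TABLE) - 1, -1, -1):
        if SF_TABLE[index] >= ratio:
            return index
    return 0  # ratio above the table maximum: clamp to the first entry
```

Under this assumption a near-silent band lands at address 62, which agrees with the statement that 62 is the largest address.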

Referring again to FIG. 2B, the number of allocated bits for each band is meanwhile obtained by the bit allocation unit 25 on the basis of the SMR obtained with the psychoacoustic model 23 (step 250). After step 250, the bit allocation given by the psychoacoustic model and the occupancy ratios of the two characteristics are both available for each band, so the number of allocated bits of each band is recorded at the corresponding address (step 260).

After the final occupancy ratio of each band is obtained, the bit allocation corresponding to each value is surveyed over a large amount of audio data and then stored in the LUT. That is, the bit allocation for the selected audio data is obtained with the MPEG algorithm using the psychoacoustic model, the occupancy ratio of each band at that time is obtained, and the bit count that occurs most frequently is taken as the bit count for that occupancy ratio.

Taking one band as an example, if in band 0 the bit allocation given by the psychoacoustic model is 7 and the address derived from the occupancy ratio at that time is 5, the count in the 7-bit bin of address 5 is increased by 1. This process is repeated over many pieces of audio data (step 270), and the most frequent value is taken as the bit allocation for that address of the band (step 280). In the example of Table 3, the final bit allocations are 8 bits for address 5, 7 bits for address 30, and 0 bits for addresses 61 and 62. If two or more bit counts share the highest frequency, the higher bit count is selected.
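
The counting and voting procedure of steps 260 to 280 can be sketched as follows. This is a minimal Python illustration; the observation format `(band, address, bits)` and the function name are assumptions made for the example, not details from the specification.

```python
from collections import defaultdict


def build_lut(observations, max_bits=15):
    """Build a LUT by majority vote over reference-encoder observations.

    observations -- iterable of (band, address, bits) tuples, where bits is the
                    allocation produced by the psychoacoustic model for that band
                    and address is derived from the band's occupancy ratio.
    Returns a dict mapping (band, address) to the most frequent bit count.
    """
    histogram = defaultdict(lambda: [0] * (max_bits + 1))
    for band, address, bits in observations:
        histogram[(band, address)][bits] += 1  # steps 260 and 270: count occurrences

    lut = {}
    for key, counts in histogram.items():
        best = max(counts)
        # Step 280: pick the most frequent bit count; ties go to the higher count.
        lut[key] = max(b for b, c in enumerate(counts) if c == best)
    return lut
```

For instance, observations `(0, 5, 7)`, `(0, 5, 8)`, `(0, 5, 8)` give 8 bits for address 5 of band 0, while a tie between 6 and 7 bits at another address resolves to 7.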

The LUT obtained in this way holds 62 entries for each of the 32 bands, for a total of 1984 entries. Because this is a large capacity of about 25 kbytes, an optimization step is added to reduce the overall size and use memory more efficiently (step 290). Specifically, entries whose final bit allocation is 0, such as addresses 61 and 62 in Table 3, are not stored in the LUT, and where consecutive addresses have the same bit allocation, only the entry at the smallest address is stored, so that only the addresses at which the bit value changes are kept as information. This optimization greatly reduces the memory needed to store the LUT.
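
The boundary-only storage can be sketched as a run-length style table with a binary search for lookup. This is a Python illustration with hypothetical names; unlike the text, the sketch keeps the starting address of a zero-bit run as a boundary marker so that lookups stay unambiguous, because the exact storage layout is not given.

```python
import bisect


def compress_band(bits_by_address):
    """Keep only the addresses at which the bit allocation changes.

    bits_by_address -- list indexed by address for one band.
    Returns a list of (start_address, bits) runs in ascending address order.
    """
    runs = []
    for address, bits in enumerate(bits_by_address):
        if not runs or runs[-1][1] != bits:
            runs.append((address, bits))
    return runs


def lookup(runs, address):
    """Return the bit allocation of the run that contains the given address."""
    starts = [start for start, _ in runs]
    index = bisect.bisect_right(starts, address) - 1
    return runs[index][1]
```

A band whose 63 addresses hold only three distinct allocations is then stored as three pairs instead of 63 entries.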

Audio encoding using the LUT created in this way is described next.

FIG. 3 is a block diagram of a digital audio encoding apparatus using a lookup table according to the present invention. The apparatus comprises a frequency mapping unit 31, a characteristic value acquisition unit 33, a LUT 35, a bit allocation and quantization unit 37, and a frame packing unit 39.

The operation is described with reference to the configuration of FIG. 3. The frequency mapping unit 31 first passes the audio data through a band-splitting filter to divide it into 32 frequency bands. Each band then contains 12 samples in the case of Layer I and 36 samples in the case of Layer II.

To obtain the address of the LUT 35, the characteristic value acquisition unit 33 calculates, for each band and by the method described earlier, the occupancy rate of the scale factor and the occupancy rate of the mean square of the input signal, then selects the larger of the two and uses it as the address of the LUT 35.
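The selection of the larger occupancy rate can be illustrated with the sketch below. The exact occupancy formula and address mapping are defined earlier in the patent and are not reproduced here, so everything specific in this code is an assumption: the occupancy rate is taken to be a percentage share of the total over all bands, the address is that share rounded to an integer, and `max_address` is a hypothetical upper limit.

```python
def band_addresses(scale_factors, mean_squares, max_address=61):
    """Per band, use the larger of two percentage shares as an integer address."""
    sf_total = sum(scale_factors)
    ms_total = sum(mean_squares)
    addresses = []
    for sf, ms in zip(scale_factors, mean_squares):
        # Assumed definition: share of this band in the total over all bands.
        sf_share = 100.0 * sf / sf_total if sf_total else 0.0
        ms_share = 100.0 * ms / ms_total if ms_total else 0.0
        share = max(sf_share, ms_share)  # the larger occupancy rate is used
        addresses.append(min(max_address, int(round(share))))
    return addresses


# Hypothetical three-band example.
addresses = band_addresses([2.0, 1.0, 1.0], [1.0, 1.0, 2.0])
```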

The bit allocation and quantization unit 37 allocates bits to each band using the LUT 35 obtained by the method described above, compares the result with the required number of bits to check whether bits are left over or the budget has been exceeded, and adjusts the total bit usage accordingly. Ideally, the allocation produced by the LUT 35 already matches the required number of bits, because no additional adjustment is then needed and the processing time is reduced by that amount.

Consider first the case in which the bits allocated by the LUT 35 exceed the required number of bits. For both channels, the bands are examined starting from the highest band, and one bit at a time is removed, beginning with the band that has the smallest occupancy rate and at least one bit allocated. This is repeated until the required number of bits is no longer exceeded. A band that has been reduced once is given the lowest priority, so that the reduction is spread evenly over all bands. The reduction starts from the high bands because important information is generally concentrated at the low frequencies.

Conversely, if after allocation by the LUT 35 the total number of bits used is less than the required number, so that spare bits remain to be allocated, the bands are examined starting from the lowest band. One bit at a time is added, beginning with the band that has the largest occupancy rate and has not yet reached its per-band maximum bit allocation, and this is repeated for as long as the required number of bits is not exceeded.
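The two adjustment directions can be sketched together as below. This is a simplified illustration under stated assumptions: the function name `adjust_allocation`, the single-channel view, and the cost model (each allocated bit costs `samples_per_band` bits per frame, with side information ignored) are not taken from the patent.

```python
def adjust_allocation(bits, occupancy, budget, max_bits, samples_per_band=12):
    """Move a per-band bit allocation toward `budget` one bit at a time."""
    bits = list(bits)
    n = len(bits)

    def used():
        # Assumed cost model: every allocated bit is spent on each sample.
        return sum(bits) * samples_per_band

    # Over budget: remove bits, smallest occupancy first, high bands first.
    already_reduced = set()
    while used() > budget:
        candidates = [b for b in range(n) if bits[b] > 0]
        if not candidates:
            break
        fresh = [b for b in candidates if b not in already_reduced]
        if not fresh:  # every candidate has had a turn, so start a new round
            already_reduced.clear()
            fresh = candidates
        band = min(fresh, key=lambda b: (occupancy[b], -b))
        bits[band] -= 1
        already_reduced.add(band)

    # Under budget: add bits, largest occupancy first, low bands first.
    while True:
        candidates = [b for b in range(n) if bits[b] < max_bits[b]]
        if not candidates or used() + samples_per_band > budget:
            break
        band = max(candidates, key=lambda b: (occupancy[b], -b))
        bits[band] += 1
    return bits


# Hypothetical four-band example.
occupancy = [50, 30, 15, 5]
reduced = adjust_allocation([3, 2, 1, 1], occupancy, 60, [15] * 4)
raised = adjust_allocation([3, 2, 1, 1], occupancy, 110, [15] * 4)
```

The `already_reduced` set is one way to give a band that has just lost a bit the lowest priority; the fewer iterations these loops need, the closer the LUT allocation was to the budget in the first place.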

This bit adjustment process has a large effect on the overall execution time of audio encoding. The more closely the bit allocation produced by the LUT 35 matches the required number of bits, the fewer iterations of this additional adjustment are needed, so it is necessary to build an accurate and reliable LUT 35.

The joint-stereo mode, meanwhile, is derived from human psychoacoustic characteristics and exploits the fact that the ability to localize an audio source precisely generally declines toward higher frequencies. Its hardware complexity is generally about the same as that of stereo coding, and the encoder delay is almost unchanged. The main purpose of this mode is to encode with higher audio quality, with a bit-rate reduction on the order of 10 to 30 kbps.

In the joint-stereo coding scheme standardized in MPEG audio, a particular band is chosen, and above that band the samples of the two channels are not coded separately; instead their sum is formed and only that one signal is coded. The range of bands to be joint-stereo coded is decided by determining the bit allocation needed to bring the noise of each band below the mask level, checking whether it exceeds the required number of bits, and selecting one of the four bands 4, 8, 12, and 16. In the joint-stereo bands determined in this way, the larger of the left and right channel bit allocations is selected for quantization.

The difficulty in applying this concept to the LUT of the present invention is determining the range of bands to be joint-stereo coded. In the present invention, the occupancy rate of each band is examined, and the range is determined by checking up to which band the cumulative occupancy rate exceeds 99% of the total. In the bands to be joint-stereo coded, the larger of the left and right channel scale factors and the larger of the left and right channel mean squares are selected and used to compute the occupancy rates. When each occupancy rate is computed, the sum over all bands is smaller than in stereo mode, so each occupancy rate is larger than in stereo mode. Consequently, a different address is calculated when encoding in joint-stereo mode, and the bit allocation changes accordingly. A single LUT can therefore be used to encode in both stereo and joint-stereo modes.
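One possible reading of the 99% rule is sketched below. The function name `joint_stereo_start`, the use of percentage shares, and the interpretation that the bands above the 99% point are the ones coded jointly are assumptions. Mapping the result onto the standard MPEG bounds 4, 8, 12, and 16 is left out.

```python
def joint_stereo_start(occupancy_percent, threshold=99.0):
    """Index of the first band after the cumulative share exceeds `threshold`.

    Bands from this index upward would be coded jointly. The band count is
    returned when no bands remain above the threshold point.
    """
    running = 0.0
    for band, share in enumerate(occupancy_percent):
        running += share
        if running > threshold:
            return band + 1
    return len(occupancy_percent)


# Hypothetical five-band occupancy profile, in percent.
start = joint_stereo_start([60.0, 30.0, 9.5, 0.3, 0.2])
```

In this example the first three bands carry more than 99% of the total, so bands 3 and 4 are the candidates for joint coding.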

To test the performance of the present invention in practice, the total execution time was measured using 12 audio data items. The experiments were run on a SUN SPARC-10 under a Unix system, and the audio data were taken from CDs.

Table 4 below shows the execution times in the encoder. It gives the experimental results comparing the performance of the LUT-based encoding method proposed in the present invention with that of the conventional MPEG algorithm.

In Table 4, the 12 audio data items have been given arbitrary names, and the numbers are the actual execution times. The performance improvement is obtained from Equation 4 below.

Looking first at the execution speed of the algorithm proposed in the present invention, Table 4 shows that, although there are slight differences between audio data items, the speed improves by about 44% on average over the conventional method. In Table 4, data items 1 to 6 give the execution speed for Layer II and the rest for Layer I. The improvement translates directly into faster real-time processing in the encoder, so the processing delay in the encoder is reduced when it is implemented in hardware. However, the bit allocation proposed in the present invention is not carried out by the LUT alone. It is followed by a step that handles surplus or missing bits, so the overall execution time can be shortened much further depending on how quickly this adjustment step is performed. As for audio quality, it is almost the same as that produced by the conventional MPEG algorithm: CD quality nearly identical to the original sound is obtained down to 128 kbps in Layer I and down to 96 kbps in Layer II.

To find out how far the performance of the algorithm proposed in the present invention differs from that of conventional MPEG, the performance evaluation method based on psychoacoustic characteristics proposed by Karlheinz Brandenburg was used. This method uses the noise-to-mask ratio (NMR) as its measure. A negative NMR means that the noise lies below the mask level, so less noise is audible and the reproduced sound is judged to be cleaner.
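The sign convention can be illustrated with a short sketch. The function name `nmr_db` and the sample energies are assumptions; the formula is the usual decibel ratio of noise energy to masking threshold, not a reproduction of the evaluation tool used in the experiments.

```python
import math


def nmr_db(noise_energy, mask_threshold):
    """Noise-to-mask ratio in dB; a negative value means the noise is masked."""
    return 10.0 * math.log10(noise_energy / mask_threshold)


# Hypothetical energies in arbitrary linear units.
masked = nmr_db(0.5, 1.0)   # noise below the mask level
audible = nmr_db(2.0, 1.0)  # noise above the mask level
```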

FIGS. 4 and 5 are graphs comparing the evaluation results of the two algorithms for each channel. In the low-frequency region, the LUT-based method is relatively more robust to noise than the MPEG method. In other words, even for the same number of required bits, the LUT-based encoding method uses the bits more efficiently.

As described above, the digital audio encoding method and apparatus using a lookup table according to the present invention do not use the psychoacoustic model. This greatly reduces the computation delay in the encoder and makes real-time processing in the encoder more feasible when it is implemented in hardware. One practical application is the encoding of music over a network. Similar in concept to VOD, this is a system that receives the music requested by a user over the network, encodes it in real time, and transmits it.

In addition, the algorithm of the psychoacoustic model is so complex that it is difficult to implement in actual hardware, and it is the factor that most strongly determines the price of the encoder. Because the present invention uses a LUT instead of the psychoacoustic model, only memory needs to be added, which makes hardware implementation easier and reduces the cost of the encoder.

Furthermore, the present invention yields a much smaller chip than a conventional MPEG encoder, so it can be applied to small devices such as portable cameras. It is also now common for computers to carry an MPEG chip. Because the present invention remains fully compatible with MPEG, using it can reduce the size of the board inside the computer, making it possible to manufacture light, thin, and compact products.

Claims

A digital audio encoding method comprising: creating in advance a lookup table that contains allocation bits for encoding frequency bands, the allocation bits being stored at addresses calculated using occupancy rates of characteristics of the frequency bands, the characteristics including a scale factor and a mean square; dividing a time-domain input audio signal into a plurality of equal frequency bands; determining the number of allocated bits for each frequency band, including calculating the scale factor and the mean square of each frequency band, determining the occupancy rates of the scale factor and the mean square of each band, comparing the occupancy rates of each frequency band, calculating an address for each frequency band using the largest occupancy rate, and extracting the number of allocated bits from the previously created lookup table using the address calculated for each frequency band; allocating bits to the frequency bands according to the extracted numbers of allocated bits and quantizing; and forming the quantized audio signal into a bitstream.

The digital audio encoding method of claim 1, wherein the creating of the lookup table comprises optimizing the bit allocation by excluding addresses whose allocated number of bits is 0 and storing only the boundary portions where the bit value changes.

The digital audio encoding method of claim 1, further comprising: comparing the required number of bits with the actual number of allocated bits after allocating bits using the lookup table, and adjusting the allocated bits according to the comparison result.

4. The digital audio encoding method of claim 3, wherein when the actual bit allocation is less than the required number of bits, the band having the largest occupancy rate is found starting from the low frequency bands and is increased by one bit, and when the actual bit allocation exceeds the required number of bits, the band having the smallest occupancy rate is found starting from the high frequency bands and is decreased by one bit.

The digital audio encoding method of claim 1, wherein allocation numbers of bits are determined using the lookup table in a stereo mode and a joint-stereo mode without changing the lookup table.

6. The method of claim 5, wherein the method finds boundaries of frequency bands to be encoded in a joint-stereo mode by obtaining occupancy rates for each frequency band.

A digital audio encoding apparatus for encoding an input audio signal, comprising: a frequency mapping unit for dividing a time-domain input audio signal into a plurality of equal frequency bands; a lookup table that contains allocation bits for encoding the frequency bands, the allocation bits being stored at addresses calculated using occupancy rates of characteristics of the frequency bands, the characteristics including a scale factor and a mean square; a characteristic acquisition unit for determining an address of the lookup table according to the occupancy rates of the characteristics of the input audio signal; a bit allocation and quantization unit for allocating to each frequency band the bits corresponding to its address in the lookup table and quantizing; and a frame packing unit for forming the quantized audio signal into a bitstream, wherein the characteristic acquisition unit comprises: a calculation unit for calculating the characteristics of each frequency band of the input audio signal; a determination unit for determining the occupancy rates of the characteristics; and an address calculation unit for comparing the occupancy rates of each frequency band and calculating an address for each frequency band using the largest occupancy rate.

10. The digital audio encoding apparatus of claim 7, wherein the lookup table excludes addresses whose allocated number of bits is 0 and stores only the boundary portions at which the bit value changes, to optimize bit allocation.

The method of claim 1, wherein generating the lookup table comprises:

(a) dividing the test audio input signal into a plurality of frequency bands;

(b) calculating a scale factor and a mean square for each frequency band;

(c) calculating occupancy rates corresponding to the scale factor and the mean square of each frequency band;

(d) comparing the occupancy rates of each frequency band and calculating an address for each frequency band based on the largest occupancy rate;

(e) determining the number of allocated bits for each frequency band by using the psychoacoustic model;

(f) storing the number of allocated bits for each frequency band at an address corresponding to each frequency band;

(g) repeating steps (a) to (f) and storing a plurality of allocated bit numbers in each address; and

(h) generating the lookup table by storing, for each address, the number of allocated bits that occurs most frequently at that address.