KR100300887B1

KR100300887B1 - A method for backward decoding an audio data

Info

Publication number: KR100300887B1
Application number: KR1019990006157A
Authority: KR
Inventors: 유수근; 박정재
Original assignee: 유수근
Priority date: 1999-02-24
Filing date: 1999-02-24
Publication date: 2001-09-26
Also published as: KR20000056661A; AU1693400A; JP2002538503A; WO2000051243A1

Abstract

The present invention relates to a method of decoding digital audio data in the backward direction. Backward decoding can be done either by using the forward decoding algorithm as it is or by developing a dedicated backward decoding algorithm to improve efficiency. The backward decoding algorithm of the present invention comprises: a first step of identifying the header of the last frame of compressed digital audio data; a second step of simultaneously dequantizing, based on the identified header information, all of the plural unit blocks of data constituting that frame; a third step of restoring the dequantized data into per-frequency-band data while maintaining continuity between the unit blocks; and a fourth step of converting the restored per-frequency-band data into audio sample data in reverse time order and outputting it. The algorithm, amount of computation, and memory usage are thus nearly the same as in forward decoding, yet backward decoding becomes possible. When digitally compressed audio data is recorded onto an analog medium that has forward and reverse tracks, such as tape, both tracks can be recorded in a single pass in one direction, which makes the invention very useful.

Description

A method for backward decoding an audio data

The present invention relates to a method of decoding MPEG audio data backward into an analog signal, and more particularly to a backward decoding method for digital audio that, when digital audio data is recorded at high speed onto an analog medium such as tape, performs backward decoding without greatly increasing memory usage or computation, so that the analog audio signal can be recorded in the reverse direction.

In general, storing audio data digitally has several advantages over analog storage: there is no loss of sound quality when copying between recording media; an efficient compression scheme applied at digital conversion can reduce the amount of data considerably without a large loss of sound quality, which makes the data easy to store and manage; and transmission over a communication link is highly efficient.

Because of these advantages, various encoding methods have been devised to convert audio data into digital data more efficiently. A representative one is the MPEG (Moving Picture Experts Group) audio standard. MPEG audio is an International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) standard for high-quality, high-efficiency stereo coding. It achieves better sound quality than earlier compression coding schemes, so it can be combined with MPEG video for highly efficient compression of multimedia information, and it can also be used on its own, for example in digital music broadcasting. Starting with MPEG1, and driven by the growing need for multimedia data compression standards, the work proceeded through MPEG2 to MPEG4 for object-oriented multimedia communication, and research is still ongoing.

MPEG1, the first to be standardized, is a coding technology for compressing and storing video and audio on digital storage devices at up to 1.5 Mbps, and it consists of five parts. One of these, the audio part, defines three audio coding schemes called Layer 1, Layer 2, and Layer 3. Layer 3 (hereinafter 'MP3') uses the most elaborate algorithm but offers excellent compression performance, and it conceptually includes Layer 1 and Layer 2 (that is, it is backward compatible). The encoding of analog audio data is therefore briefly described below on the basis of MP3.

The MPEG audio layers use a compression coding technique called perceptual coding. It applies an analysis of the human auditory model and can be regarded as a kind of trick that exploits an insensitivity of the human ear known as the masking effect.

The human ear can normally hear sounds from 20 Hz to 20 kHz, which is called the audible frequency range. Hearing sensitivity differs across this range and is known to be greatest in the band from 2 kHz to 5 kHz.

For example, when an audience listens to a piano performance, they can hear the lingering resonance of the strings while the pianist is not playing, but the moment the pianist strikes the keys again they can no longer hear it. This is not because the resonance has disappeared but because it is quiet compared with the sound of the struck keys. In other words, when a loud sound (the masker) is present, sounds within a certain range around it, the critical band, that fall below a certain level cannot be heard even if they are pure tones of a fairly high level. This principle is called frequency masking. The masking threshold varies with the frequency band: it is small in bands where the ear is sensitive and large in bands where the ear is insensitive.

In addition to frequency masking, there is temporal masking: after hearing a loud sound, a certain delay passes before a quieter sound can be heard. For example, if a 60 dB sound is played for 5 ms and a 40 dB sound follows immediately, the latter can be perceived only after about 5 ms. This delay also depends on the frequency band.

Using this psychoacoustic model of human hearing, MP3 allows the quantization noise produced in each frequency band to remain within the masking threshold and the masking delay. This reduces the bit rate of the audio data and makes compression possible without loss of perceived sound quality.

The MP3 encoding method is described step by step below with reference to FIGS. 1 and 2, which schematically show part of the configuration of a typical encoding apparatus.

(1) Subband Coding and MDCT

To exploit psychoacoustic properties such as critical bands efficiently, the signal must first be divided into frequency components. To this end, the filter bank (10) of FIG. 1 divides the full band of the input audio PCM samples into 32 equally spaced frequency bands, and each band signal is then subsampled to 1/32 of the original sampling frequency and encoded (subband coding).

However, an ordinary filter that extracts 1/32 of the frequency band is not an ideal filter, so aliasing occurs at the subsampling stage: frequency components above half the sampling frequency that were not removed before processing fold back as noise into lower frequencies. For this reason a filter known as a polyphase filter bank is used, or the MDCT unit (20) and aliasing reduction unit (30) of FIG. 1 perform the MDCT (Modified Discrete Cosine Transform), so that the aliasing noise of the 32 bands cancels out and the degradation caused by the filter is eliminated.

Because the MDCT is a DCT on critically sampled values, the original signal can be recovered perfectly if no quantization is applied. In practice, however, quantization is applied, and discontinuities arise between transmitted blocks.

In other words, the audio input is divided into 32 frequency bands by the 32-band filter bank. For each band, the masking effect of components in adjacent bands is calculated, and quantization bits are allocated only to signals above the masking threshold, in such a way that the quantization noise in a given band stays below the masking threshold.

(2) Scaling

The sample data in each of the 32 subbands is separated into a waveform and a gain. The waveform is normalized so that its maximum amplitude is 1.0, and the gain at that point is encoded as the scale factor. With this encoding, even for large signal components the quantization noise is confined to a region close to the signal in both frequency and time, so the same psychoacoustic effects apply and the noise is not perceived.
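The separation into a unit-peak waveform and a gain can be sketched as follows. This is only an illustration of the normalization idea described above; the function name is hypothetical, and a real MP3 encoder stores quantized scale-factor indices rather than a raw floating-point gain.

```python
def normalize_block(samples):
    """Split a block of subband samples into a unit-peak waveform and a gain.

    Hypothetical helper: returns (waveform, scale), where the largest
    absolute value in waveform is 1.0 and waveform[i] * scale == samples[i].
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # A silent block has nothing to normalize; use a zero gain.
        return [0.0 for _ in samples], 0.0
    return [s / peak for s in samples], peak


waveform, scale = normalize_block([0.5, -2.0, 1.0])
```

Multiplying each element of `waveform` by `scale` gives back the original samples, which is the rescaling the decoder performs later.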

(3) Huffman Coding

Huffman coding is a form of variable-length coding, also called entropy coding. It is a redundancy reduction method that removes redundant bits by exploiting the statistical properties of digital data: values with a high probability of occurrence are assigned short codes and values with a low probability are assigned long codes, which reduces the average code length of the coded data.

For example, suppose the quantized input values are 0, 1, 2, and 3 (binary 00, 01, 10, 11) and that experiments have shown their probabilities of occurrence to be 0.6, 0.2, 0.1, and 0.1. If a fixed code length of 2 bits is assigned regardless of probability, the average code length is the sum of (code length × probability of occurrence) over all values, that is, 2×0.6 + 2×0.2 + 2×0.1 + 2×0.1 = 2 bits.

If instead the code length is varied, with 1 bit assigned to 0 (the most probable value), 2 bits to 1 (the next most probable), and 3 bits to each of 2 and 3 (the least probable), the average code length after variable-length coding is 1×0.6 + 2×0.2 + 3×0.1 + 3×0.1 = 1.6 bits.

Variable-length coding, which assigns different numbers of bits according to the probability of occurrence, therefore compresses the data more strongly.
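The average-length comparison in this example can be checked with a few lines of code. The sketch below weights each code length by its probability; the prefix-code table `CODES` is one possible assignment with lengths 1, 2, 3, 3 and is an assumption for illustration, not a table taken from the MP3 standard.

```python
def average_code_length(probabilities, code_lengths):
    """Expected number of bits per symbol: sum of length x probability."""
    return sum(p * n for p, n in zip(probabilities, code_lengths))


# Probabilities of the values 0, 1, 2, 3 from the example in the text.
PROBS = [0.6, 0.2, 0.1, 0.1]

fixed_avg = average_code_length(PROBS, [2, 2, 2, 2])     # about 2.0 bits
variable_avg = average_code_length(PROBS, [1, 2, 3, 3])  # about 1.6 bits

# Hypothetical prefix code with the lengths 1, 2, 3, 3 used above.
CODES = {0: "0", 1: "10", 2: "110", 3: "111"}


def encode(values):
    """Concatenate the variable-length code words for a sequence of values."""
    return "".join(CODES[v] for v in values)
```

Because no code word in `CODES` is a prefix of another, the concatenated bit string can be decoded unambiguously from left to right.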

In addition, MP3 further improves sound quality and compression by also using techniques such as bit reservoir buffering for passages that cannot be compressed at the given rate. The coded values produced by all these steps are formatted into a bit stream and output. FIG. 3 shows a unit frame of the final MP3 bit stream, divided by type of data.

Regarding the compression efficiency of MP3 encoding: existing digital audio equipment widely uses PCM (Pulse Code Modulation) codes with 16 bits per sample at sampling frequencies of 32 kHz, 44.1 kHz, and 48 kHz. For two-channel stereo with 44.1 kHz sampling and 16-bit quantization, the bit rate is 16×44100×2 = 1,411,200 bps (about 1.4 Mbps), whereas MPEG audio Layer 3 can encode such a signal at about 128 to 256 kbps. This is roughly 1/12 to 1/6 of the original PCM rate, so the original 16 bits per sample are reduced to about 1.5 to 3 bits. A CD of MP3 files can therefore hold about 12 audio CDs' worth of material. Even at this compression the difference from the original is hardly noticeable, and at 200 kbps or more it is nearly indistinguishable.
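The arithmetic behind these figures can be reproduced with a short sketch. The function names are illustrative; the inputs are the values quoted in the text (16-bit, 44.1 kHz, two-channel PCM and MP3 rates of 128 and 256 kbps).

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


def compression_summary(mp3_bps, sample_rate_hz=44100, channels=2,
                        bits_per_sample=16):
    """Return (compression ratio, coded bits per sample) for an MP3 rate."""
    pcm_bps = pcm_bit_rate(bits_per_sample, sample_rate_hz, channels)
    ratio = pcm_bps / mp3_bps
    coded_bits_per_sample = mp3_bps / (sample_rate_hz * channels)
    return ratio, coded_bits_per_sample


pcm_bps = pcm_bit_rate(16, 44100, 2)                  # 1,411,200 bps
ratio_128, bits_128 = compression_summary(128_000)
ratio_256, bits_256 = compression_summary(256_000)
```

The exact ratios come out near 11.0 and 5.5, with about 1.45 and 2.9 coded bits per sample, which the text rounds to 1/12 to 1/6 and 1.5 to 3 bits.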

Despite these advantages of digital recording, digital recording/playback devices are not yet widespread, and analog recording/playback devices such as the Walkman still account for most of the market. There is thus a growing need to store such digital data on widely used analog media such as tape.

Most tape media have a forward track and a reverse track on the upper and lower halves of the same recording surface. If the forward track is recorded by playing forward-decoded audio, and the tape is then run backward from the end of the forward track while the reverse track is likewise recorded from forward-decoded audio, each track must be traversed separately and recording takes a long time.

Alternatively, data could be prepared in advance by encoding the analog audio while it is played backward and then decoded for recording, but this requires separate storage space, and the original sound cannot be reproduced because of the masking effects that apply to reversed playback.

The present invention was devised to solve these problems. Its object is to provide a backward decoding method for MPEG audio that, when digital audio data is recorded at high speed onto an analog medium such as tape, allows backward decoding without greatly increasing memory usage or computation.

FIGS. 1 and 2 schematically show the configuration of an MPEG audio encoder;

FIG. 3 shows the structure of a unit frame of MPEG audio data;

FIG. 4 shows the general configuration of an MPEG audio decoder;

FIG. 5 shows how side information is used to locate the data belonging to a unit frame;

FIG. 6 schematically shows the overlap-add process in the IMDCT;

FIG. 7 shows the structure of the synthesis filter for forward decoding;

FIG. 8 is an algorithm flowchart for implementing the synthesis filter of FIG. 7;

FIG. 9 is a block diagram corresponding to FIG. 8;

FIG. 10 shows the structure of the synthesis filter for backward decoding according to the present invention;

FIG. 11 is an algorithm flowchart for implementing the synthesis filter of FIG. 10; and

FIG. 12 is a block diagram corresponding to FIG. 11.

※ Description of reference numerals for the main parts of the drawings

10: filter bank
20: MDCT unit
30: alias reduction unit
40: quantization unit
50: Huffman encoder
60: bitstream formatting unit
100: demultiplexer (demux)
110: side information decoder
120: Huffman decoder
130: dequantization unit
140: IMDCT unit
150: synthesis filter bank

To achieve the above object, the method of backward decoding digital audio data according to the present invention comprises: a first step of identifying the header of the last frame of the recorded audio data; a second step of simultaneously dequantizing, based on the identified header information, all of the plural unit blocks of data constituting that frame; a third step of restoring the dequantized data into per-frequency-band data while maintaining continuity between the unit blocks; and a fourth step of converting the restored per-frequency-band data into audio sample data in reverse time order and outputting it.

A preferred embodiment of the MP3 audio backward decoding method according to the present invention is described in detail below.

FIG. 4 schematically shows an embodiment of an MP3 audio backward decoding apparatus for implementing the present invention. It comprises: a demultiplexer (100) that separates the input MP3 audio bit stream into several kinds of data; a side information decoder (110) that reads and interprets the side information among the separated data; a Huffman decoder (120) that Huffman-decodes the separated audio data; a dequantization unit (130) that restores the Huffman-decoded data to actual sample energy values in the frequency domain; an IMDCT (Inverse MDCT) unit (140) that restores the dequantized data to the data as it was before the MDCT; and a synthesis filter bank (150) that combines the subband values of the restored data and outputs the final PCM samples.

The method of decoding an MPEG audio bit stream backward using the MPEG audio backward decoding apparatus configured as above is as follows.

(1) Finding the Frame Header

Backward decoding of an MP3 bit stream must start from the very end of the audio bit stream, so unlike forward decoding, the first problem is finding the position at which to start decoding. Because the frames of the MPEG audio format are independent of one another, the first step of backward decoding is to find a frame header.

Finding a frame header requires knowing the frame size in advance. A compressed MP3 audio bit stream is divided into frames, and the number of frames per second is fixed for each MPEG format. This means that for a given bit rate and sampling frequency, each input frame has a fixed size and produces a fixed number of output samples.

Although the frame size in MPEG audio is fixed in this sense, it is fixed only for a particular bit rate and sampling frequency, so the actual frame size can be known only by first finding and parsing a frame header.
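As a rough illustration of why the frame size is fixed for a given bit rate and sampling frequency, the sketch below uses the commonly cited MPEG-1 Layer III relationship: 1152 samples per frame, hence 144 × bit rate / sampling frequency bytes, plus one optional padding byte. This formula is not stated in the text above and is included here as an assumption; the function names are illustrative.

```python
SAMPLES_PER_FRAME = 1152  # assumed MPEG-1 Layer III value


def layer3_frame_size(bit_rate_bps, sample_rate_hz, padding=0):
    """Frame length in bytes; padding is 0 or 1 (the header padding bit)."""
    return (SAMPLES_PER_FRAME // 8) * bit_rate_bps // sample_rate_hz + padding


def frames_per_second(sample_rate_hz):
    """Number of frames per second, fixed once the sampling rate is known."""
    return sample_rate_hz / SAMPLES_PER_FRAME


unpadded = layer3_frame_size(128_000, 44_100)           # 417 bytes
padded = layer3_frame_size(128_000, 44_100, padding=1)  # 418 bytes
```

At 128 kbps and 44.1 kHz the frame size is not a whole number of bytes on average, which is why some frames carry one padding byte and others do not.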

The question is how to find the header. A header is normally marked by a sync word, but the same pattern as the sync word can also occur in the audio sample data, so a header may be misidentified.

To solve this problem, the demultiplexer (100) assumes that the stream characteristics do not change within one stream clip. It parses the first header of the stream in advance to obtain the frame size for the case with no padding bit, and then uses that frame size, measured back from the end of the bit stream file for the piece, to locate the header of the last frame.

However, the padding bit in the header can change the size of each frame by 1 byte, and when searching backward the exact size of the frame containing the sought header is not known. The demultiplexer therefore moves backward from the end of the file by the frame size obtained from the first header and, to allow for the padding bit, searches for the start of the header from one byte before that position to one byte after it.
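The backward header search described above might be sketched as follows. The sketch assumes the stream is held in memory as `bytes`, that the sync word is twelve 1-bits at a byte boundary (as in MPEG-1 audio), and that `frame_size` is the unpadded size taken from the first header; the function names are hypothetical. It checks only the sync pattern, so a real implementation would also validate the remaining header fields against the first header to reject false matches.

```python
def looks_like_header(data, pos):
    """True if a 12-bit all-ones sync word starts at byte offset pos."""
    return (0 <= pos < len(data) - 1
            and data[pos] == 0xFF
            and (data[pos + 1] & 0xF0) == 0xF0)


def find_last_header(data, frame_size):
    """Locate the last frame header by stepping back from the end of data.

    The candidate offsets cover one byte before through one byte after the
    nominal position, which allows for the optional padding byte.
    """
    base = len(data) - frame_size
    for pos in (base - 1, base, base + 1):
        if looks_like_header(data, pos):
            return pos
    return None


# Two toy 10-byte "frames"; the second one carries a padding byte.
frame = bytes([0xFF, 0xFB]) + bytes(8)
stream = frame + frame + b"\x00"
last = find_last_header(stream, frame_size=10)
```

Here `last` is the offset of the second toy frame, found one byte earlier than the nominal position because of the padding byte.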

(2) Side Information Analysis

Once a frame header has been found in the input MP3 audio bit stream, the demultiplexer (100) separates and outputs, in order and as in forward decoding, the side information (which describes how the frame was encoded), the scale factors (which control the gain of each frequency band), and the Huffman-coded data. The side information decoder (110) decodes the separated side information, and this information determines how the data in the frame is to be processed.

As briefly noted in the description of encoding, MPEG audio Layer 3 uses a bit reservoir to improve sound quality at a fixed bit rate.

Depending on the characteristics of the audio samples, encoding them at a given sound quality may require a lot of data or only a little. To let the amount of data vary from frame to frame while keeping the frame size and the number of audio samples coded per frame constant, data from a frame that needs a lot of data is placed in the spare space of a fixed-size frame that needs only a little. The data area of the current frame may therefore contain data of the following frame as well as its own. So that decoding can start from an arbitrary position, however, the MPEG standard limits how far back the data of the current frame may reach: it may lie in preceding frames only within a distance of 511 bytes. The standard also forbids placing data of a past frame in a later frame.

To indicate how much of the current frame's data lies in preceding frames, the side information contains a pointer value, as shown in FIG. 5. To obtain the data to be decoded, the side information is parsed and the amount of data given by the pointer is fetched from the data area of the preceding frame.

In summary, the data belonging to the current frame, that is, its scale factors and Huffman-coded data, is not confined to the current frame. It may also lie in preceding frames (earlier in time in normal playback) within 511 bytes of the current frame that holds the side information. In forward decoding it suffices to append the current frame's data to whatever was left over after decoding the previous frame. In backward decoding, the header of the frame preceding the one to be decoded must be found and the required amount of additional audio data fetched from its data area. The amount to fetch from the preceding frame is determined from the decoded side information, so exactly the required data can be obtained.

In some cases this process must be repeated over several preceding frames to obtain all the required data, but this too is straightforward provided the headers are found correctly.
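Fetching the borrowed bytes in backward decoding could look roughly like the sketch below. It assumes the headers have already been located and that `data_areas` holds, in stream order, the data area of each frame (the bytes after its header and side information); `back_pointer` stands for the pointer value read from the side information. The names and this in-memory layout are hypothetical.

```python
MAX_BACK_POINTER = 511  # limit stated in the text


def gather_main_data(data_areas, index, back_pointer):
    """Return the bytes for frame `index`, including borrowed bytes.

    Walks back through as many preceding frames as needed to collect
    `back_pointer` bytes, then appends the frame's own data area.
    """
    if not 0 <= back_pointer <= MAX_BACK_POINTER:
        raise ValueError("back pointer out of range")
    needed = back_pointer
    pieces = []
    j = index - 1
    while needed > 0 and j >= 0:
        area = data_areas[j]
        take = min(needed, len(area))
        pieces.append(area[len(area) - take:])  # tail of the earlier frame
        needed -= take
        j -= 1
    if needed > 0:
        raise ValueError("not enough preceding data for this frame")
    pieces.reverse()  # restore stream order
    return b"".join(pieces) + data_areas[index]


areas = [b"AAAA", b"BB", b"CCC"]
combined = gather_main_data(areas, 2, 3)
```

In this toy example the third frame borrows three bytes, so one byte comes from the first frame and two from the second, which mirrors the case where the process repeats over several preceding frames.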

(3) Huffman Decoding

Once the audio data to be decoded has been identified from the side information, the Huffman decoder (120) Huffman-decodes the separated Huffman-coded data (all of it, including data fetched from preceding frames), using the side information and the Huffman trees that the encoder selected according to the data characteristics to reduce the total number of bits.

This step is the same as in forward decoding, with one difference. A frame is normally encoded as two granules (granule 0 and granule 1), and the position of granule 1's data is known only after granule 0 has been decoded. Forward decoding can therefore proceed granule by granule, whereas backward decoding must decode a whole frame, that is, both granules, at once.

(4) Dequantization and Rescaling

After the Huffman decoder (120) has restored the audio data to its state before Huffman coding, the dequantization unit (130) restores the Huffman-decoded result to actual sample energy values in the frequency domain. For example, if a Huffman-decoded value is Y, Y^(4/3) is computed first, and the result is rescaled by multiplying it by the scale value obtained from the scale factor, which yields the actual spectral energy value.
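The power-law step and the rescaling can be sketched as below. The text mentions only Y^(4/3) followed by multiplication with a scale value; applying the power to the magnitude and restoring the sign afterwards is an assumption made here so that negative values are handled, and the single `scale` argument is a simplification of the per-band scale factors.

```python
import math


def dequantize(values, scale):
    """Apply |y|^(4/3) with the sign of y, then multiply by the scale value."""
    result = []
    for y in values:
        magnitude = abs(y) ** (4.0 / 3.0)
        result.append(math.copysign(magnitude, y) * scale)
    return result


spectrum = dequantize([8, -8, 0], scale=0.5)
```

For the input 8 the power-law step gives approximately 16, so with a scale of 0.5 the restored value is approximately 8.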

If the bit stream was encoded as a stereo signal, the two channels may be transmitted separately, but encoders often remove the redundancy between them by transmitting the sum and the difference of the two channels instead. If the stream was encoded in this way, stereo recovery is performed at this stage.
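Stereo recovery from a sum and a difference channel might look like the sketch below. The text only says that the sum and the difference are transmitted; the 1/sqrt(2) normalization and the names `ms_to_lr`, `mid` and `side` are assumptions chosen for the example.

```python
import math

def ms_to_lr(mid, side):
    """Recover left/right channels from sum (mid) and difference (side).

    Assumption: the encoder formed mid = (L + R) / sqrt(2) and
    side = (L - R) / sqrt(2), so the same factor inverts the mapping.
    """
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    left = [(m + s) * inv_sqrt2 for m, s in zip(mid, side)]
    right = [(m - s) * inv_sqrt2 for m, s in zip(mid, side)]
    return left, right
```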

(5) IMDCT

The signal obtained so far is in the frequency domain, so it must be transformed into the time domain before the actual output samples can be synthesized. This transform is the inverse of the time-to-frequency transform used in the encoder, and it is performed by the IMDCT unit (140), which restores the actual sample energy values.

Layer 3 additionally uses the MDCT to obtain better frequency resolution than the other layers. This transform is a DCT on critically sampled values, so the original signal could be recovered perfectly if no quantization were applied. In practice quantization is applied, so discontinuities arise between the transmitted blocks.

Consequently, in the IMDCT, which is performed block by block (granule by granule), discontinuities arise between the transmitted blocks. They produce noise and clicks that severely degrade sound quality, so to remove them the IMDCT output is overlap-added, with 50% overlap, to the values from the previous granule.

That is, the IMDCT yields 36 values in total. In forward decoding, as shown in FIG. 6, the first 18 values of the current granule are added to the last 18 values of the previous granule. A forward decoder holds the values of the previous frame, but a reverse decoder holds data for the frame that follows in time, so the overlap order must be changed: the last 18 of the 36 IMDCT values of the current granule are overlap-added with the first 18 values of the following granule. For the frame at which reverse decoding starts, that is, the last frame of the stream, the second granule of the frame has no data to overlap with, so either zeros are placed in the overlap region or the values are used without overlap-adding.

The process can be summarized in equation form as follows.

Let $x_i(n)$ be the target samples passed to the next stage, $y_i(n)$ the IMDCT result, $i$ the granule number, and $N$ the total number of frames.

Forward decoding proceeds as follows.

$$x_i(n) = y_i(n) + y_{i-1}(n+18), \qquad 0 \le n < 18,\quad i = 1, 2, \ldots, 2N$$

Here $y_0(n+18)$, $0 \le n < 18$, must all be initialized to 0.

In reverse decoding, however, the information of the previous frame is not available, so the equation must be changed as follows so that each block is instead added with 50% overlap to the following frame.

$$x_i(n) = y_i(n+18) + y_{i+1}(n), \qquad 0 \le n < 18,\quad i = 2N, 2N-1, \ldots, 1$$

In this case too, the initial values $y_{2N+1}(n)$, $0 \le n < 18$, must all be initialized to 0.

This overlap process differs from the forward case only in its order, so its computational cost and memory usage are the same as in the forward case.
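The two overlap-add orders can be compared with a small sketch. It follows the description in the text (forward: first half of the current granule plus second half of the previous one; reverse: second half of the current granule plus first half of the following one). The function names and the list-of-lists data layout are assumptions for the example, and each "block" stands for the 36 IMDCT values of one granule in one subband.

```python
HALF = 18  # half of the 36 IMDCT output values

def forward_overlap_add(blocks):
    """Process granules in time order; keep the tail of the previous one."""
    prev_tail = [0.0] * HALF          # zero initial state
    out = []
    for y in blocks:
        out.append([y[n] + prev_tail[n] for n in range(HALF)])
        prev_tail = y[HALF:]
    return out

def backward_overlap_add(blocks):
    """Process granules latest first; keep the head of the following one."""
    next_head = [0.0] * HALF          # zero initial state
    out = []
    for y in reversed(blocks):
        out.append([y[n + HALF] + next_head[n] for n in range(HALF)])
        next_head = y[:HALF]
    return out                        # latest granule first
```

Both functions keep exactly 18 values of state per subband, which matches the statement that memory usage does not change. Apart from the first forward block and the last reverse block, the two produce the same 18-sample sums, only in opposite granule order.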

(6) Subband synthesis

After the IMDCT unit (140) has performed the IMDCT and the 50% overlap-add, the last step before the actual audio samples are output is subband synthesis, which combines the samples that were split into subbands and restores the original time-domain sample signal. It recombines what the encoder separated into 32 frequency bands, and it is an interpolation process that merges the samples of each subband into a single time-domain sample sequence.

The subband synthesis filter uses delayed filter inputs from the previous frame and therefore needs those earlier input values. In reverse decoding, unlike forward decoding, the subband data enter the synthesis filter in reverse order and the earlier values are not available, so the structure of the synthesis filter must be redesigned accordingly.

The filter bank for the synthesis filter is defined in the MPEG standard, and the synthesis filter for reverse decoding was designed from it. The synthesis filter used in ordinary forward decoding is examined first, and the new synthesis filter according to the present invention is then described in detail by comparison.

FIG. 7 shows the structure of the synthesis filter used in MPEG audio forward decoding. The goal of the synthesis filter is to combine the samples of each subband, treated as a signal in the corresponding frequency band, into a single signal, in a manner similar to frequency division multiplexing (FDM).

That is, the 32 signals $x_r(mT_{S1})$, critically sampled at interval $T_{S1}$, are combined into a single signal $s(nT_{S2})$, critically sampled at interval $T_{S2} = T_{S1}/32$. This restores the original signal from the 32 equal-width frequency bands into which it was divided at encoding.

Here $x_r(mT_{S1})$ is the sample signal of the $r$-th subband, and $x_r(nT_{S2})$ is that signal up-sampled by a factor of 32. In this process 31 zeros are inserted between $(m-1)T_{S1}$ and $mT_{S1}$. The effect is that the harmonics of the signal sampled at interval $T_{S1}$ keep their shape, and only the frequency range is extended by a factor of 32.

In other words, sampling at interval $T_{S1}$ produces harmonics that repeat at frequency intervals of $f_{S1} = 1/T_{S1}$. Up-sampling to interval $T_{S2}$ extends the sampling frequency range to $f_{S2} = 1/T_{S2}$, so that 32 of the $f_{S1}$ harmonics fall within this range, and the result is a signal sampled at interval $T_{S2} = T_{S1}/32$.
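The zero-insertion up-sampling described above can be illustrated with a short sketch. The function name `upsample` and the placement of each input sample at the start of its group of 32 are assumptions for the example.

```python
def upsample(samples, factor=32):
    """Insert (factor - 1) zeros after each input sample.

    With factor = 32, consecutive input samples end up 32 positions
    apart with 31 zeros between them.
    """
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (factor - 1))
    return out
```

Because 31 of every 32 values are zero, a filter applied to this sequence only needs to multiply one coefficient in every 32, which is one of the facts the optimized form of the synthesis equation relies on.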

For each subband, everything except the corresponding band is then removed from this signal. The band-pass filter used for this is $H_r(z)$, where $r$ is the subband number.

The filter $H_r(z)$ has 512 taps, and the band-pass filter for each subband is formed by phase-shifting a prototype low-pass filter. The block diagram of FIG. 7 is expressed by the following equation.

Here $S_t(nT_{S2})$ is an output sample of the synthesized signal, and $t$ counts time in units of $T_{S1}$ and denotes the current input sample of the subband signals. That is, $S_t(nT_{S2})$ is the synthesized output signal for the input samples $x_r(tT_{S1})$ of the subbands. The subband index is $r = 0, 1, 2, \ldots, 31$ and the output sample index is $n = 0, 1, 2, \ldots, 31$: one sample is input for each of the 32 subbands, and 32 synthesized output samples are produced.

The equation has the form of a convolution of $x_r(kT_{S2})$ with $H_r(kT_{S2})$, which shows that it is a filtering operation. $H_r(kT_{S2})$ has 512 taps and is the product of the prototype low-pass filter $h(kT_{S2})$ and $N_r(k)$, which phase-shifts that filter to produce the band-pass filter of the corresponding subband.

Up-sampling each subband, passing it through its band-pass filter, and summing the signals of all subbands produces the output samples. Evaluating Equation 1 directly requires a very large number of operations, so the equation is rearranged as Equation 2 below to reduce the computation. The expansion and optimization of Equation 1 exploit the symmetry of the cosine term and the zeros inserted when the subband signal $x_r(tT_{S1})$ is up-sampled to $x_r(kT_{S2})$. From here on the sampling period is omitted from the notation for convenience; it is implicitly $T_{S2}$.

Here $n = 0, 1, 2, \ldots, 31$; $i = 0, 1, 2, \ldots, 15$; $k = 0, 1, 2, \ldots, 63$; and $r = 0, 1, 2, \ldots, 31$. The subscripts $k$, $n$ and $i$ are indices used in the computation, $r$ is the subband number, and $t$ is the time at which the current sample of the subband signal was input. In the equation above, $[x]$ denotes the largest integer $a$ satisfying $a \le x$ for a real number $x$, that is, $x$ with its fractional part discarded. The operator % denotes the modulo operation: $a \% b$ is the remainder of $a$ divided by $b$.

Expressed as an algorithm flowchart, the optimized Equation 2 is as shown in FIG. 8 (see the MPEG Audio Standard specification), and FIG. 9 shows the flowchart of FIG. 8 as a block diagram following the calculation procedure.

One sample is input for each subband and multiplied by the $N_r(k)$ matrix, giving 64 results. These values are written into a 1024-entry first-in, first-out (FIFO) buffer, and the values already in the buffer are shifted by 64 positions. The required values are then read from the FIFO in sequence and multiplied by the window coefficients, and the products are summed to produce the PCM output samples. This is easy to see by comparing the procedure with Equation 2.
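The first two stages of this procedure, the matrix multiplication and the FIFO shift, can be sketched as below. The cosine form used for the matrix is the one commonly given for the MPEG-1 audio synthesis filter bank and is an assumption here, not something taken from this text; the function names and the choice to put the newest values at the front of the list are also assumptions. The windowing stage is omitted because it needs the 512-entry coefficient table of the standard.

```python
import math

FIFO_SIZE = 1024  # entries in the FIFO buffer
STEP = 64         # values produced per input of 32 subband samples

def matrixing(subband_samples):
    """Map 32 subband samples to 64 values with a cosine matrix (assumed form)."""
    return [
        sum(math.cos((16 + i) * (2 * k + 1) * math.pi / 64) * subband_samples[k]
            for k in range(32))
        for i in range(STEP)
    ]

def push_fifo(fifo, new_values):
    """Shift the FIFO by 64 positions and put the 64 new values at the front."""
    assert len(fifo) == FIFO_SIZE and len(new_values) == STEP
    return list(new_values) + fifo[:FIFO_SIZE - STEP]
```

A real implementation would usually avoid copying 1024 values per step and use a circular index instead; the list copy here only keeps the shift visible.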

The synthesis filter for reverse decoding according to the present invention is described below with reference to the structure of the forward-decoding synthesis filter described above.

Several points must be considered in converting the forward synthesis filter above into a structure for reverse decoding. In reverse decoding the samples of each subband are input in reverse time order. That is, if a subband has N samples in total, they are input in the order $t = 0, 1, 2, \ldots, N-1$ in forward decoding, but in the order $t = N-1, N-2, N-3, \ldots, 1, 0$ in reverse decoding.

The MPEG audio synthesis filter uses sample values that are earlier in time. When the samples are input in reverse order those earlier values are not known, so the forward synthesis filter cannot be used as it is for reverse synthesis. The filter structure must therefore be changed so that synthesis can be carried out in reverse order on samples input in reverse order. A synthesis filter with such a structure is presented below.

FIG. 10 shows the structure of the synthesis filter for reverse decoding according to the present invention. It is similar to the structure of the synthesis filter for forward decoding, except that the transfer function of the filter bank is replaced by $B_r(z)$. In reverse decoding, the samples of the subband signals are input in the order $x_r(mT_{S1})$, $m = N-1, N-2, N-3, \ldots, 1, 0$.

Modifying Equation 1 to take this input order into account gives Equation 4 below.

Comparing Equation 4 with Equation 1, the band-pass filter of each subband is the same. In the forward case the convolution reverses the order of the input signal before multiplying it by the filter coefficients; because the input samples now arrive in reverse time order, they are used as they are, without that reversal. That is, since the input order itself is reversed, the forward-decoding subband input $x_r(((32t+n)-k)T_s)$ becomes $x_r(((32t+n)-511+k)T_s)$ for the reversed input.

Evaluating Equation 4 directly requires a very large number of operations, so, as in forward decoding, the equation is expanded and optimized, which yields the following equations.

The subscripts and operators used in Equations 5 and 6 are the same as those used in Equations 3 and 4.

Substituting $j = 31 - n$ and $m = 63 - k$ into Equations 5 and 6 gives the following.

Comparing Equations 7 and 8 with Equations 3 and 4 shows that they are similar and differ only in the indices of the input samples. FIG. 11 shows these equations as an algorithm flowchart, and FIG. 12 shows that flowchart as a block diagram following the calculation procedure.

The synthesis filter for reverse decoding designed above has an algorithm structure similar to that of the forward synthesis filter, so it can be implemented with the same computational cost and memory usage. If forward and reverse decoding are implemented together, the forward synthesis filter can be used as it is; only the shifting direction of the FIFO and the final sample synthesis order (index $j$) need to be reversed.

That is, the 32 samples produced by each synthesis pass are in time order, but the samples produced by the next pass are earlier in time. The output therefore consists of the 32 samples of one pass in reverse order, followed by the 32 samples of the next pass in reverse order, and so on until the first frame is reached. This synthesis filter can be used with MPEG Audio Layers 1, 2 and 3.
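The output ordering described above can be checked with a short sketch. The function name and the representation of each synthesis pass as a list of 32 samples are assumptions for the example.

```python
def reverse_output(passes_latest_first):
    """Emit a fully time-reversed stream from synthesis passes.

    Each pass holds its 32 samples in time order, and the passes arrive
    latest first, so reversing every pass and concatenating the results
    gives the whole signal in reverse time order.
    """
    out = []
    for block in passes_latest_first:
        out.extend(reversed(block))
    return out
```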

For faster recording, it is desirable to record the reverse track at the same time as the forward track. The tape then runs through the reverse track in the direction opposite to playback, so the data for that track must be decoded in the reverse direction. Reverse decoding can be carried out in several ways other than the embodiment described above. A reverse decoding method that uses the forward algorithm is described first.

The first conceivable way of recording compressed MPEG digital audio data on an analog recording medium in the reverse direction is to decode all the data in the forward direction, store the resulting PCM samples, and convert them to analog starting from the last PCM sample. Because the forward algorithm is used unchanged, this is easy to implement, but the decoded data is very large and requires a large storage medium. Moreover, data clips differ in length and their length is not limited, so the maximum storage capacity required cannot be determined.

The second conceivable way of recording compressed MPEG digital audio data on an analog recording medium in the reverse direction is to divide the stream into groups of a fixed number of frames, starting from the last frame, and to decode each group in the forward direction.

That is, if there are N frames in total, they are divided into groups of M frames: frames (N-M) through N are decoded and stored, and the result is played back in reverse order from frame N to frame (N-M+1). Frame (N-M) is the first frame decoded, so it has no continuity with a preceding frame, does not yield correct data, and is not output. The important point is that continuity between frames extends over only one frame; frames two or more frames apart do not affect each other in the decoding algorithm. Correct samples are therefore obtained from the second decoded frame onward, and those results can be used.

When playback of the first block of frames is finished, frames (N-2M) through (N-M) are likewise decoded in the forward direction and stored. Note that frame (N-M) was decoded as the first frame of the block in the previous pass, but its samples were not output because, without the information of the preceding frame, they were not decoded correctly. It is therefore decoded again here. This time it is the last frame of the block and the information of the preceding frame is available, so it is decoded to correct samples. The output then runs in reverse order from frame (N-M) to frame (N-2M+1).

This process is repeated until all frames have been decoded.

The first frame of the last block to be decoded is under the same conditions as in ordinary forward decoding, so the same sample values are obtained and can be used directly as output samples.

With this method of reverse decoding, not all frames are decoded and stored; only M frames are decoded, stored and output in reverse order, so storage for the samples of M frames is sufficient. Compared with the first method, the buffer for the final output samples is smaller, and because the required storage has a definite bound, the design is also easier.
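The block-wise scheme can be sketched with a toy decoder in which each output depends on the current frame and the one before it, which mirrors the one-frame continuity described above. Everything here is hypothetical: `toy_decode` is a stand-in for a real forward decoder, frames are plain numbers, and frame indices are zero-based.

```python
def toy_decode(frames, start, stop):
    """Forward-decode frames[start:stop].

    Stand-in for a real decoder: the output for a frame is the frame plus
    its predecessor inside the run, so the first output of a run that does
    not begin at frame 0 is wrong, as with a missing previous frame.
    """
    out = []
    prev = 0
    for k in range(start, stop):
        out.append(frames[k] + prev)
        prev = frames[k]
    return out

def reverse_by_blocks(frames, m):
    """Produce decoded frames latest first, decoding blocks of m forward."""
    out = []
    end = len(frames)
    while end > 0:
        start = max(end - m, 0)
        warm = max(start - 1, 0)              # one extra frame for continuity
        decoded = toy_decode(frames, warm, end)
        valid = decoded[start - warm:]        # drop the warm-up frame, if any
        out.extend(reversed(valid))
        end = start
    return out
```

At most m + 1 frames are decoded per block, so the buffer size is bounded regardless of the clip length, at the cost of decoding one frame per block twice.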

The method of reverse decoding digital audio data according to the present invention, configured and operating as described above, makes it possible to store a digitally compressed audio signal on an analog medium such as tape at high speed. The compressed digital audio is decoded in the forward direction to supply the signal for the forward track and in the reverse direction to supply the signal for the opposite track, so that both tracks can be recorded simultaneously.

When reverse decoding is carried out with the forward decoding scheme, that is, by dividing the stream into groups of M frames from the last frame and decoding each group in the forward direction, the algorithm is identical to forward decoding and is easy to implement. A buffer for the M decoded frames is needed, which is a drawback, but its size is fixed, so implementation is not difficult.

When reverse decoding is carried out with the reverse decoding algorithm, the computational cost is the same as in forward decoding and only a small amount of additional memory is required. The additional memory arises in Huffman decoding: forward decoding can process a frame as two separate blocks, whereas reverse decoding must Huffman-decode the whole frame in advance, so the buffer must be twice as large. For a stereo signal, forward decoding needs 576×2 words, whereas reverse decoding needs 1152×2 words of memory.
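The buffer sizes quoted above follow from simple arithmetic, shown here as a small sketch. The constant names are assumptions; the figures of 576 samples per granule, two granules per frame and two channels are the ones used in the text.

```python
SAMPLES_PER_GRANULE = 576   # per channel
GRANULES_PER_FRAME = 2
CHANNELS = 2                # stereo

# Forward decoding buffers one granule at a time.
forward_words = SAMPLES_PER_GRANULE * CHANNELS

# Reverse decoding must hold both granules of a frame at once.
backward_words = SAMPLES_PER_GRANULE * GRANULES_PER_FRAME * CHANNELS
```

This gives 1152 words for forward decoding and 2304 words for reverse decoding, that is, exactly twice as much.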

Claims

A method of reverse decoding digital audio data, comprising:

A first step of identifying the last frame header of the recorded audio data;

A second step of simultaneously dequantizing all of the plurality of unit block data constituting the frame, based on the identified header information;

A third step of restoring the dequantized data into frequency band data while maintaining continuity between the unit blocks; and

A fourth step of converting the restored frequency band data into audio sample data in reverse time order and outputting it.

The method of claim 1,

The frame header identification of the first step is performed on the basis of the first header information of the recorded audio data.

The method of claim 1,

The first step comprises:

A first sub-step of reading a frame size from the first header information of the recorded audio data;

A second sub-step of moving to the presumed header position of a frame based on the read frame size; and

A third sub-step of checking for the actual frame header at the position moved to, taking into account the presence or absence of padding bits.

The method of claim 1,

The second step comprises:

A first sub-step of reading side information based on the identified header information;

A second sub-step of identifying the location of the data corresponding to the unit frame based on the read side information; and

A third sub-step of simultaneously dequantizing all of the plurality of unit blocks constituting the unit frame, based on the identified location.

The method of claim 1,

The third step,

Huffman decoding all the plurality of unit blocks constituting the unit frame; And

Dequantizing and descaling the Huffman decoded data.

The method of claim 1,

In the third step, continuity between the unit blocks is maintained by overlapping the second-half unit block data of the current inverse modified discrete cosine transform (IMDCT) result with the first-half unit block data of the IMDCT result of the unit block that immediately follows it.

The method of claim 6,

The data to be overlapped is set to 0 for the first unit block of the first frame input in the reverse direction.

The method of claim 1,

In the third step, continuity between the unit blocks is maintained by superimposing, for each frequency band, the second-half unit block data currently restored by the inverse MDCT and the first-half unit block data restored immediately before.

The method of claim 1,

The fourth step,

The MDCT data is input to a memory sequentially in the time order of normal playback and is then delayed and output in the reverse of the playback time order.

A reverse decoding method in which the compressed digital audio data is divided, starting from its last portion, into groups of M frames, each group is decoded in the forward direction, and the decoded data is then output in the reverse of the decoding order.