KR100923300B1

KR100923300B1 - Method and apparatus for encoding/decoding audio data using bandwidth extension technology

Info

Publication number: KR100923300B1
Application number: KR1020030017977A
Authority: KR
Inventors: 김중회; 김상욱
Original assignee: 삼성전자주식회사
Priority date: 2003-03-22
Filing date: 2003-03-22
Publication date: 2009-10-23
Also published as: KR20040086878A; CN1290078C; CN1532809A

Abstract

Disclosed are a method and an apparatus for encoding audio data using a band extension technique, and a method and an apparatus for decoding such data.

A method of encoding audio data according to the present invention comprises: (a) band-extension encoding the audio data to output band-limited audio data and to generate band extension information; (b) Huffman-encoding the band-limited audio data into a layered structure having a base layer and at least one upper layer so that the bit rate is adjustable; and (c) multiplexing the Huffman-encoded band-limited audio data with the band extension information. Accordingly, the bit rate can be adjusted according to network conditions and the like, and better quality can be ensured even when the decoder restores the audio from only part of the bitstream.

Description

Method and apparatus for encoding/decoding audio data using bandwidth extension technology

FIG. 1 is a block diagram of an encoding apparatus according to the present invention;

FIG. 2 is a detailed block diagram of the encoding apparatus of FIG. 1;

FIG. 3 is a block diagram of a decoding apparatus according to the present invention;

FIG. 4 is a detailed block diagram of the decoding apparatus of FIG. 3;

FIG. 5 shows the structure of a bitstream output from the FGS encoder 2;

FIG. 6 shows the detailed structure of the side information of FIG. 5;

FIG. 7 shows the structure of a bitstream output from the multiplexer 3 or input to the demultiplexer 7;

FIG. 8 is a reference diagram for explaining the Huffman encoding/decoding scheme performed in the encoding apparatus and the decoding apparatus of the present invention, respectively;

FIG. 9 is a reference diagram for explaining in more detail the band extension decoding, that is, the BWE decoding, performed by the BWE decoder 9;

FIG. 10 is a flowchart for explaining an encoding method according to the present invention; and

FIG. 11 is a flowchart for explaining a decoding method according to the present invention.

The present invention relates to the encoding and decoding of audio data, and more particularly, to a method and an apparatus for encoding audio data using a band extension technique and a method and an apparatus for decoding the same.

With recent advances in digital signal processing, audio signals are in most cases stored and reproduced as digital data. A digital audio storage/playback device samples and quantizes an analog audio signal, converts it into PCM (Pulse Code Modulation) audio data, which is a digital signal, stores the data on an information storage medium such as a CD or DVD, and plays it back when the user wants to listen. Compared with analog storage/reproduction methods such as the LP (Long-Play Record) and magnetic tape, digital storage/reproduction greatly improves sound quality and markedly reduces degradation over the storage period. However, the size of the digital data is considerable, which hinders storage and transmission.

To address this problem, various compression schemes are used to reduce the size of digital audio signals. MPEG (Moving Picture Experts Group)/audio, standardized by the ISO (International Organization for Standardization), and AC-2/AC-3, developed by Dolby, reduce the amount of data by using a human psychoacoustic model, and as a result can reduce the amount of data efficiently regardless of the characteristics of the signal. That is, the MPEG/audio standard and the AC-2/AC-3 schemes provide sound quality nearly equal to that of a CD at a bit rate of only 64 kbps to 384 kbps, which is 1/6 to 1/8 of that of earlier digital encoding schemes.

However, all of these methods search for the optimal state for a fixed bit rate and then perform quantization and encoding. Consequently, when the data is transmitted over a network and the transmission bandwidth drops because of poor network conditions, playback is interrupted and the service can no longer be provided to the user. In addition, converting the bitstream to a smaller size suited to a mobile device with limited storage capacity requires re-encoding, which demands a large amount of computation.

Accordingly, the present applicant filed an audio encoding/decoding method and apparatus with an adjustable bit rate, using the BSAC (Bit-Sliced Arithmetic Coding) technique, as Korean Patent Application No. 97-61298 on November 19, 1997, and it was registered as Korean Patent No. 261253 on April 17, 2000. With BSAC, a bitstream encoded at a high bit rate can be turned into a bitstream of a lower bit rate, and the audio can be restored from only part of the bitstream. Therefore, when the network is overloaded, the decoder has limited performance, or the user requests a low bit rate, service can still be provided at some level of sound quality using only part of the bitstream, although quality degrades in proportion to the reduction in bit rate. Nevertheless, the degradation that accompanies a lower bit rate remains an unavoidable problem.

In addition, because BSAC employs arithmetic coding, its complexity is high, which increases cost when it is actually implemented in a device. Furthermore, because BSAC uses the MDCT (Modified Discrete Cosine Transform) to transform the audio signal, degradation of sound quality becomes more severe in the lower layers.

Accordingly, a technical object of the present invention is to provide a bit-rate-adjustable audio encoding method and apparatus, and a decoding method and apparatus, that can ensure good quality even when the audio is restored from only part of the bitstream.

Another technical object of the present invention is to provide a bit-rate-adjustable audio encoding method and apparatus, and a decoding method and apparatus, with lower complexity.

Yet another technical object of the present invention is to provide a bit-rate-adjustable audio encoding method, decoding method, encoding apparatus, and decoding apparatus that can provide better sound quality even in the lower layers.

According to the present invention, the above technical objects are achieved by a method of encoding audio data, the method comprising: (a) band-extension encoding the audio data to output band-limited audio data and to generate band extension information; (b) Huffman-encoding the band-limited audio data into a layered structure having a base layer and at least one upper layer so that the bit rate is adjustable; and (c) multiplexing the Huffman-encoded band-limited audio data with the band extension information.

Step (b) preferably comprises: (b11) differentially encoding side information corresponding to the base layer; (b12) bit-sliced encoding a plurality of quantized samples corresponding to the base layer; and (b13) repeating steps (b11) and (b12) for the next upper layer until encoding of a predetermined plurality of layers is complete.

Step (b) preferably comprises: (b21) differentially encoding side information, including scale factor information and coding model information, corresponding to the base layer; (b22) bit-sliced encoding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and (b23) repeating steps (b21) and (b22) for the next upper layer until encoding of a predetermined plurality of layers is complete.
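As a rough illustration of the differential encoding of side information in steps (b11) and (b21), the sketch below stores the first scale factor as-is and each later one as the difference from its predecessor. The predictor (the previous value within the layer) and the names `diff_encode` and `diff_decode` are assumptions made for illustration only; the text above does not specify them, and the Huffman coding of the resulting differences is omitted.

```python
def diff_encode(values):
    """Return the first value followed by successive differences (hypothetical helper)."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def diff_decode(deltas):
    """Invert diff_encode by accumulating the differences."""
    values = []
    acc = 0
    for i, d in enumerate(deltas):
        acc = d if i == 0 else acc + d
        values.append(acc)
    return values


scale_factors = [60, 62, 61, 61]        # illustrative scale factors for one layer
deltas = diff_encode(scale_factors)     # small differences suit short Huffman codes
restored = diff_decode(deltas)
```

Neighbouring scale factors tend to be close, so the differences cluster around zero, which is what makes a subsequent entropy code effective.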

The quantized samples are preferably obtained by PWT transformation.

Step (c) is preferably a step of multiplexing in an order in which the data of the encoded band-limited audio data corresponding to the base layer is placed first, followed by the band extension information, and then the data corresponding to the remaining upper layers; or a step of multiplexing in an order in which the band extension information is placed first, followed by the data of the encoded band-limited audio data corresponding to the base layer, and then the data corresponding to the remaining upper layers.

According to another aspect of the present invention, the above technical objects are also achieved by a method of decoding audio data, the method comprising: (a) demultiplexing an input audio bitstream to extract band extension information and band-limited audio data encoded in a layered structure having a base layer and at least one upper layer; (b) Huffman-decoding the band-limited audio data corresponding to at least the base layer; and (c) generating, on the basis of the decoded audio data and with reference to the band extension information, audio data of at least part of a band not covered by the decoded audio data, and appending it to the decoded audio data.

Step (c) preferably comprises generating the audio data of the partial band so that it aligns with the boundary of the decoded audio data. More preferably, step (c) comprises generating the audio data of the partial band so that it aligns with a boundary of the filter bank used in the wavelet transform, or, when it does not align with a boundary of that filter bank, interpolating the portion where the decoded audio data and the generated audio data of the partial band overlap.

Step (a) is preferably a step of demultiplexing in an order in which the data corresponding to the base layer is extracted from the bitstream first, then the band extension information, and then the data corresponding to the remaining upper layers.

Alternatively, step (a) is preferably a step of demultiplexing in an order in which the band extension information is extracted from the bitstream first, then the data corresponding to the base layer, and then the data corresponding to the remaining upper layers.

Step (b) preferably comprises: (b11) differentially decoding side information corresponding to the base layer; (b12) bit-sliced decoding a plurality of quantized samples corresponding to the base layer; and (b13) repeating steps (b11) and (b12) for the next upper layer until decoding of a predetermined plurality of layers is complete.

Step (b) preferably comprises: (b21) differentially decoding side information, including scale factor information and coding model information, corresponding to the base layer; (b22) bit-sliced decoding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and (b23) repeating steps (b21) and (b22) for the next upper layer until decoding of a predetermined plurality of layers is complete.

According to another aspect of the present invention, the above technical objects are also achieved by an apparatus for encoding audio data, the apparatus comprising: a BWE encoder that band-extension encodes the audio data to output band-limited audio data and to generate band extension information; an FGS encoder that Huffman-encodes the band-limited audio data into a layered structure having a base layer and at least one upper layer so that the bit rate is adjustable; and a multiplexer that multiplexes the encoded band-limited audio data with the band extension information.

Preferably, the FGS encoder differentially encodes side information corresponding to the base layer, bit-sliced encodes a plurality of quantized samples corresponding to the base layer, and, until encoding of a predetermined plurality of layers is complete, encodes the side information and the plurality of quantized samples corresponding to each next upper layer in the same manner.

Preferably, the FGS encoder differentially encodes side information, including scale factor information and coding model information, corresponding to the base layer, bit-sliced encodes a plurality of quantized samples corresponding to the base layer with reference to the coding model information, and, until encoding of a predetermined plurality of layers is complete, encodes the side information including scale factor information and coding model information corresponding to the next upper layer and bit-sliced encodes the plurality of quantized samples corresponding to that layer.

Preferably, the FGS encoder obtains the quantized samples by PWT transformation.

Preferably, the multiplexer multiplexes in an order in which the data of the encoded band-limited audio data corresponding to the base layer is placed first, followed by the band extension information, and then the data corresponding to the remaining upper layers.

According to another aspect of the present invention, the above technical objects are also achieved by an apparatus for decoding audio data, the apparatus comprising: a demultiplexer that demultiplexes an input audio bitstream to extract band extension information and band-limited audio data encoded in a layered structure having a base layer and at least one upper layer; an FGS Huffman decoder that decodes the band-limited audio data corresponding to at least the base layer; and a BWE decoder that generates, on the basis of the decoded audio data and with reference to the band extension information, audio data of at least part of a band not covered by the decoded audio data, and appends it to the decoded audio data.

Preferably, the FGS decoder differentially decodes side information corresponding to the base layer, bit-sliced decodes a plurality of quantized samples corresponding to the base layer, and, until decoding of a predetermined plurality of layers is complete, decodes the side information corresponding to the next upper layer and bit-sliced decodes the corresponding plurality of quantized samples.

Preferably, the demultiplexer demultiplexes in an order in which the data corresponding to the base layer is extracted from the bitstream first, then the band extension information, and then the data corresponding to the remaining upper layers; or in an order in which the band extension information is extracted from the bitstream first, then the data corresponding to the base layer, and then the data corresponding to the remaining upper layers.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an encoding apparatus according to the present invention.

Referring to FIG. 1, the encoding apparatus receives PCM (Pulse Code Modulation) audio data, encodes it according to the present invention, and outputs an audio bitstream. It includes a BWE encoder 1, an FGS encoder 2, and a multiplexer 3.

The BWE encoder 1 band-extension encodes the PCM audio data to output band-limited audio data and to generate band extension information. Band extension encoding refers to receiving audio data, cutting off and discarding the data in the high frequency band above a predetermined frequency, and generating the additional information needed to restore the discarded high-band data. Here, the data that remains after the high-band data has been cut off from the input audio data is called band-limited audio data, and the additional information needed to restore the discarded data is called band extension information. A representative example of band extension technology is the SBR (Spectral Band Replication) technology of Coding Technologies. A detailed description of SBR is given in Convention Paper 5560, presented at the 112th Convention of the Audio Engineering Society, May 10-13, 2002.
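The split performed by band extension encoding can be pictured with a small sketch. This is a toy illustration under stated assumptions, not SBR and not the encoder of this patent: it assumes the input is already a list of spectral coefficients, keeps the coefficients below a cutoff index as the band-limited data, and summarises the discarded high band as one mean energy value per group of `band_size` coefficients. The function name `bwe_encode` and the choice of mean energy as the band extension information are hypothetical.

```python
def bwe_encode(spectrum, cutoff, band_size):
    """Split a spectrum into band-limited data and coarse high-band info (toy model)."""
    low = spectrum[:cutoff]      # band-limited audio data kept for the core coder
    high = spectrum[cutoff:]     # discarded; only a coarse summary is kept
    info = []
    for start in range(0, len(high), band_size):
        band = high[start:start + band_size]
        info.append(sum(x * x for x in band) / len(band))  # mean energy per band
    return low, info


low_band, extension_info = bwe_encode([1.0, 2.0, 3.0, 4.0, 2.0, 2.0], cutoff=2, band_size=2)
```

The point of the split is that the band extension information is far smaller than the coefficients it replaces, so the core coder spends its bits on the low band only.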

The FGS encoder 2 encodes the band-limited audio data into a layered structure having a base layer and at least one upper layer so that the bit rate is adjustable. FGS encoding means encoding into a structure of multiple layers so that the bit rate is adjustable, that is, so that FGS (Fine Grain Scalability) can be provided. One example of FGS encoding is the bit-sliced coding technique, namely BSAC (Bit-Sliced Arithmetic Coding), disclosed in Korean Patent No. 261253, "Audio encoding/decoding method and apparatus with adjustable bit rate", filed by the present applicant as Korean Patent Application No. 97-61298 on November 19, 1997 and registered on April 17, 2000. That is, the FGS encoder 2 differentially encodes the side information corresponding to the base layer, bit-sliced encodes the plurality of quantized samples corresponding to the base layer, and, until encoding of a predetermined plurality of layers is complete, differentially encodes the side information for the next upper layer and bit-sliced encodes the corresponding plurality of quantized samples. The side information includes scale factor information and coding model information. The quantized samples are obtained by transforming and quantizing the input audio data. A more detailed description is given below.

The multiplexer 3 multiplexes the band-limited audio data encoded by the FGS encoder 2 with the band extension information generated by the BWE encoder 1.

FIG. 2 is a detailed block diagram of the encoding apparatus of FIG. 1.

Referring to FIG. 2, the encoding apparatus includes a BWE encoder 1, an FGS encoder 2, and a multiplexer 3. Blocks that perform substantially the same functions as those of FIG. 1 are given the same reference numerals, and redundant descriptions are omitted.

In particular, the FGS encoder 2 includes a PWT transform unit 21, a psychoacoustic unit 22, a quantization unit 23, and an FGS Huffman encoding unit 24.

The PWT transform unit 21 receives PCM audio data, which is an audio signal in the time domain, and applies a PWT (Pseudo Wavelet Transform) to convert it into a signal in the frequency domain, with reference to the information about the psychoacoustic model provided by the psychoacoustic unit 22. In the time domain, the characteristics of an audio signal as perceived by humans do not differ greatly. In the frequency domain, however, the signals that humans can perceive and those they cannot differ greatly from one frequency band to another according to the human psychoacoustic model, so compression efficiency can be increased by allocating a different number of bits to each frequency band. With the MDCT, the frequency resolution in the low frequency band is higher than necessary, so even a small distortion causes degradation that is perceptible to the human ear. By comparison, the PWT has a more appropriate time/frequency resolution and can therefore provide more stable sound quality even in the lower layers, which cover the low frequency bands.

The psychoacoustic unit 22 provides information about the psychoacoustic model, such as attack detection information, to the PWT transform unit 21. It also groups the audio signal transformed by the PWT transform unit 21 into signals of appropriate subbands, calculates the masking threshold in each subband using the masking phenomenon caused by the interaction of the signals, and provides it to the quantization unit 23. The masking threshold is the maximum magnitude of a signal that a human cannot perceive, even though it is present, because of the interaction of audio signals. In this embodiment, the psychoacoustic unit 22 uses BMLD (Binaural Masking Level Depression) to calculate the masking thresholds and the like for the stereo components.

The quantization unit 23 scalar-quantizes the audio signals of each band on the basis of the corresponding scale factor information, so that the magnitude of the quantization noise in each band is smaller than the masking threshold provided by the psychoacoustic unit 22 and therefore cannot be perceived by a human, and outputs the quantized samples. That is, the quantization unit 23 uses the NMR (Noise-to-Mask Ratio), which is the ratio of the noise generated in each band to the masking threshold calculated by the psychoacoustic unit 22, and quantizes so that the NMR value is 0 dB or less in every band. An NMR value of 0 dB or less means that a human cannot hear the quantization noise.
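The 0 dB criterion can be checked with a few lines of arithmetic. The sketch below assumes the noise and the masking threshold are both given as positive energies per band, so that the ratio in decibels is 10·log10(noise / threshold); the function names `nmr_db` and `all_inaudible` are hypothetical and chosen for illustration.

```python
import math


def nmr_db(noise_energy, masking_threshold):
    """Noise-to-Mask Ratio in dB, assuming both inputs are positive energies."""
    return 10.0 * math.log10(noise_energy / masking_threshold)


def all_inaudible(noise_per_band, threshold_per_band):
    """True when the NMR is 0 dB or less in every band."""
    return all(
        nmr_db(n, t) <= 0.0
        for n, t in zip(noise_per_band, threshold_per_band)
    )


ok = all_inaudible([0.5, 0.2], [1.0, 0.2])   # noise at or below the threshold
too_noisy = all_inaudible([2.0], [1.0])      # noise is about 3 dB above the mask
```

In a quantizer loop, a band that fails this check would typically be requantized with a finer step until the check passes or the bit budget is exhausted; that loop is not shown here.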

The FGS Huffman encoding unit 24 encodes the quantized samples and the side information belonging to each layer into a layered structure. The side information includes the scale band information, the coding band information, the scale factor information, and the coding model information corresponding to each layer. The scale band information and the coding band information may be packed as header information of each frame constituting the audio bitstream and transmitted to the decoding apparatus, may be encoded and packed as side information for each layer and transmitted to the decoding apparatus, or may not be transmitted at all because they are stored in the decoding apparatus in advance.

More specifically, the FGS Huffman encoding unit 24 differentially encodes the side information, including scale factor information and coding model information, corresponding to the first layer, and bit-sliced encodes the quantized samples corresponding to the first layer with reference to the corresponding coding model information. Bit-sliced encoding is the encoding employed in the BSAC encoding described above, and means lossless encoding in the order of the most significant bits of the quantized samples, then the next most significant bits, and so on down to the least significant bits. The same process is then repeated for the second layer. That is, encoding proceeds layer by layer until encoding of a predetermined plurality of layers is complete. The first layer is called the base layer, and the remaining layers are called upper layers. The layered structure is described in more detail below.
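The most-significant-bit-first ordering can be sketched as follows. The sketch assumes non-negative integer sample magnitudes and a fixed word length `num_bits`; sign handling and the lossless (Huffman) coding of each bit plane are left out, and the names `bit_planes` and `rebuild` are hypothetical.

```python
def bit_planes(samples, num_bits):
    """Slice non-negative samples into bit planes, most significant plane first."""
    planes = []
    for bit in range(num_bits - 1, -1, -1):
        planes.append([(s >> bit) & 1 for s in samples])
    return planes


def rebuild(planes, num_bits):
    """Reassemble samples from the first len(planes) planes; missing low bits stay 0."""
    if not planes:
        return []
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i
        values = [v | (b << shift) for v, b in zip(values, plane)]
    return values


samples = [5, 3, 0, 6]                 # 101, 011, 000, 110 in binary
planes = bit_planes(samples, 3)
coarse = rebuild(planes[:2], 3)        # decoder received only the top two planes
```

Because the most significant bits arrive first, a decoder that stops early still recovers a coarse approximation of every sample, which is what lets a truncated bitstream remain decodable.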

The scale band information is information for performing quantization more appropriately according to the frequency characteristics of the audio signal. When the frequency domain is divided into a plurality of bands and a suitable scale factor is assigned to each band, the scale band information indicates the scale band corresponding to each layer. Thus each layer belongs to at least one scale band, and each scale band has one assigned scale factor. Likewise, the coding band information is information for performing encoding more appropriately according to the frequency characteristics of the audio signal. When the frequency domain is divided into a plurality of bands and a suitable coding model is assigned to each band, the coding band information indicates the coding band corresponding to each layer. The scale bands and coding bands are divided appropriately by experiment, and the corresponding scale factors and coding models are determined accordingly.

The multiplexer 3 multiplexes in one of two orders. In the first, the data corresponding to the base layer among the encoded quantized samples is placed first, followed by the band extension information, and then by the data corresponding to the remaining upper layers. In the second, the band extension information is placed first, followed by the data corresponding to the base layer, and then by the data corresponding to the remaining upper layers.
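The two multiplexing orders can be sketched in Python as follows. This is only an illustration of the ordering: the payloads are treated as opaque items, and the function name `multiplex` and the `bwe_first` flag are assumptions introduced here.

```python
def multiplex(base_layer, bwe_info, upper_layers, bwe_first=False):
    """Order the bitstream parts: base layer and BWE info lead, upper layers follow."""
    head = [bwe_info, base_layer] if bwe_first else [base_layer, bwe_info]
    return head + list(upper_layers)


order_a = multiplex("L0", "BWE", ["L1", "L2"])
order_b = multiplex("L0", "BWE", ["L1", "L2"], bwe_first=True)
```

In either order the first two parts contain both the base layer and the band extension information, so a stream truncated after them can still be band-extended.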

FIG. 3 is a block diagram of a decoding apparatus according to the present invention.

Referring to FIG. 3, the decoding apparatus receives an audio bitstream, decodes it according to the present invention, and outputs audio data. It includes a demultiplexer 7, an FGS decoder 8, and a BWE decoder 9.

The demultiplexer 7 demultiplexes the input audio bitstream to extract the band extension information and the band-limited audio data encoded in a layer structure having a base layer and at least one upper layer. Here, the band-limited audio data and the band extension information have the same meanings as described with reference to FIG. 1. The FGS decoder 8 decodes, from the band-limited data extracted by the demultiplexer 7, at least the band-limited audio data corresponding to the base layer. How many layers are decoded is determined by the network state, the user's selection, and the like.

Based on the audio data decoded by the FGS decoder 8 and with reference to the band extension information extracted by the demultiplexer 7, the BWE decoder 9 generates audio data for at least part of the band not covered by the data decoded by the FGS decoder 8, and appends it to the band-limited audio data decoded by the FGS decoder 8.

Because the present invention relies on the pseudo wavelet transform (PWT), the BWE decoder 9 proceeds as follows. When encoding with the pseudo wavelet transform, the cutoff frequency of the band-limited audio data is selected by determining the last node on the frequency axis. Unlike the MDCT, the wavelet transform has low frequency resolution in the high-frequency region, so band limiting at the chosen last node does not allow fine adjustment. During decoding, therefore, the BWE decoder 9 aligns the core part produced by the FGS decoder 8 on the frequency axis, determines the frequency bandwidth of that core part, and modifies the BWE part to match before decoding it.

For example, suppose that a bitstream encoded at 64 kbps in 16 layers is reconstructed using only 8 of those layers, and that the frequency corresponding to the eighth layer is 8.5 kHz. The BWE decoder 9 must then reconstruct the data from 8.5 kHz up to 15 kHz or higher. Owing to the characteristics of the QMF (quadrature mirror filter) bank, however, the BWE decoder 9 can adjust its frequency bandwidth only in units of the bandwidth of one QMF channel. Suppose the frequency bandwidth of the n-th filter of the QMF bank is 8.3 kHz. The frequency components between 8.3 and 8.5 kHz then exist in both the core part and the BWE part, so the two sets of data must be handled appropriately.

The first way to handle this is to remove all frequency components between 8.3 and 8.5 kHz from the core part; in this case, the FGS decoder 8 performs decoding with the bandwidth information of the BWE part taken into account. The second way is to pass the core data through the QMF filter used in the BWE decoder 9, create the QMF data by interpolation, and then reconstruct the signal by inverse QMF filtering.
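The overlap in the 8.3 to 8.5 kHz example, and the first handling method (dropping the overlapping core components), can be sketched in Python. This is a simplified illustration under stated assumptions: the list of QMF band edges and the (frequency, coefficient) representation of the core spectrum are hypothetical, and the second, interpolation-based method is not shown.

```python
def overlap_band(core_cutoff_hz, qmf_edges_hz):
    """Return (start, end) of the band present in both the core and BWE parts.

    The BWE part can only start on a QMF channel edge, so it starts at the
    highest edge that does not exceed the core cutoff frequency.
    """
    below = [edge for edge in qmf_edges_hz if edge <= core_cutoff_hz]
    if not below:
        raise ValueError("no QMF edge at or below the core cutoff")
    return max(below), core_cutoff_hz


def trim_core(core_bins, start_hz):
    """First method: zero every core coefficient at or above the BWE start."""
    return [(f, 0.0 if f >= start_hz else c) for f, c in core_bins]


# Hypothetical QMF edges; only the 8300 Hz edge comes from the example above.
start, end = overlap_band(8500, [7600, 8300, 9000])
trimmed = trim_core([(8200, 1.0), (8400, 0.5)], start)
```

After trimming, the core part ends exactly where the BWE part begins, so no frequency component is represented twice.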

In this way, even when the audio data decoded by the FGS decoder 8 belongs only to the base band, the BWE decoder 9 generates and appends the audio data of the missing band, which raises the quality of the decoded audio data.

FIG. 4 is a detailed block diagram of the decoding apparatus of FIG. 3.

Referring to FIG. 4, the decoding apparatus includes a demultiplexer 7, an FGS decoder 8, and a BWE decoder 9. Blocks that perform substantially the same function from the standpoint of the present invention are given the same reference numerals as in FIG. 3, and duplicate descriptions are omitted.

In particular, the FGS decoder 8 can adjust the bit rate by decoding up to a target layer determined by network conditions, device performance, user selection, and the like. It includes an FGS Huffman decoder 81, an inverse quantizer 82, and an inverse PWT unit 83. The FGS Huffman decoder 81 decodes the audio bitstream up to the target layer. More specifically, it decodes the side information of each layer, which includes scale factor information and coding model information, and, based on the coding model information thus obtained, Huffman-decodes the encoded quantized samples belonging to each layer to obtain the quantized samples. A more detailed description is given below.

The scale band information and coding band information may be obtained from the header information of the bitstream or by decoding the side information of each layer. Alternatively, the decoding apparatus may store the scale band information and coding band information in advance.

The inverse quantizer 82 reconstructs the quantized samples of each layer by inverse-quantizing them according to the corresponding scale factor information. The inverse PWT unit 83 applies frequency/time mapping to the reconstructed samples, inverse-transforming them into time-domain PCM audio data, and outputs the result.

The BWE decoder 9 includes a transform unit 91, a high-frequency generator 92, an adjuster 93, and a synthesizer 94. The transform unit 91 transforms the PCM audio data output from the inverse PWT unit 83 into frequency-domain data; the transformed data is called the low-frequency part. With reference to the BWE information, the high-frequency generator 92 creates the part not covered by the transformed data, that is, the high-frequency part, by copying the low-frequency part and patching it in. The adjuster 93 adjusts the level of the high-frequency part produced by the high-frequency generator 92 using envelope information, which is one item of the BWE information. The envelope information is sent from the encoder and describes the envelope of the audio data in the high-frequency part that the encoder cut off during BWE encoding. The synthesizer 94 combines the low-frequency part output from the transform unit 91 with the high-frequency part output from the adjuster 93 and outputs PCM audio data.
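The patch-and-adjust flow of the high-frequency generator 92 and the adjuster 93 can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the envelope information is reduced to a single target RMS level (a real envelope would normally be given per band), the final synthesis back to PCM is omitted, and all function names are hypothetical.

```python
import math


def generate_high_band(low_band, num_high):
    """Patch: fill num_high coefficients by copying the low band cyclically."""
    return [low_band[i % len(low_band)] for i in range(num_high)]


def adjust_level(high_band, target_rms):
    """Scale the patched band so that its RMS matches the transmitted level."""
    rms = math.sqrt(sum(x * x for x in high_band) / len(high_band))
    if rms == 0.0:
        return list(high_band)  # nothing to scale
    gain = target_rms / rms
    return [x * gain for x in high_band]


def bwe_decode(low_band, num_high, target_rms):
    """Return the low band followed by the level-adjusted high band."""
    high = adjust_level(generate_high_band(low_band, num_high), target_rms)
    return list(low_band) + high


# Ask for a high band at half the RMS level of the low band [3.0, 4.0].
spectrum = bwe_decode([3.0, 4.0], 2, math.sqrt(12.5) / 2)
```

The low band passes through unchanged, while the copied coefficients keep their shape and take their level from the envelope value.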

In this way, even if the FGS decoder 8 decodes only the base-band audio data, the BWE decoder 9 reconstructs and appends the audio data of the missing band, which raises the quality of the audio data.

FIG. 5 shows the structure of the bitstream output from the FGS encoder 2.

Referring to FIG. 5, a frame of the bitstream encoded by the FGS encoder 2 is encoded by mapping quantized samples and side information onto a layer structure to provide FGS (Fine Grain Scalability). That is, the bitstream has a layer structure in which the bitstream of a lower layer is contained in the bitstream of the layer above it. The side information needed by each layer is divided and encoded layer by layer.

A header area storing header information is provided at the head of the bitstream, followed by the packed layer 0 information and then, in order, the information belonging to layers 1 through N, which are the upper layers (enhancement layers). The span from the header area through the layer 0 information is called the base layer; the span from the header area through the layer 1 information is called layer 1, and the span through the layer 2 information is called layer 2. Likewise, the top layer spans from the header area through the layer N information, that is, from the base layer through upper layer N. Each layer's information consists of side information and encoded data; for example, the layer 2 information contains side information 2 and encoded quantized samples. Here, N is an integer greater than or equal to 1.
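The nesting described here, where each layer is the span from the header through that layer's information, can be sketched in Python. The byte strings stand in for real header and layer payloads, and the function name `layer_span` is hypothetical.

```python
def layer_span(header, layer_infos, k):
    """Return the bytes that make up 'layer k': header plus layer 0..k info."""
    if not 0 <= k < len(layer_infos):
        raise ValueError("layer index out of range")
    return header + b"".join(layer_infos[: k + 1])


header = b"H"
infos = [b"a", b"bb", b"ccc"]            # layer 0, 1, 2 information
base = layer_span(header, infos, 0)      # the base layer
top = layer_span(header, infos, 2)       # the top layer
```

Because every lower layer is a prefix of the layers above it, cutting the frame at a layer boundary leaves a stream that is still decodable at a lower bit rate.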

FIG. 6 shows the detailed structure of the side information of FIG. 5.

Referring to FIG. 6, the information of any given layer contains side information and encoded quantized samples. Because Huffman coding is applied to the quantized samples in this embodiment, the side information includes Huffman coding model information, quantization factor information, side information about the channels, and other side information. The Huffman coding model information is index information identifying the Huffman coding model to be used for encoding or decoding the quantized samples belonging to the corresponding layer. The quantization factor information gives the quantization step size for quantizing or inverse-quantizing the audio data belonging to the corresponding layer. The side information about the channels is channel-related information such as M/S stereo. The other side information is, for example, flag information indicating whether M/S stereo is used.

FIG. 7 shows the structure of the bitstream output from the multiplexer 3 or input to the demultiplexer 7.

Referring to FIG. 7, layer 0, the base layer encoded by the FGS encoder 2, is placed at the head of the bitstream, followed by the BWE information and then by the upper layers in order: layer 1, layer 2, ..., layer N. As a result, even if the decoder receives or decodes only the base layer, it can refer to the BWE information and generate the audio data of the missing layers from the decoded base-layer audio data.

FIG. 8 is a reference diagram for explaining the Huffman encoding and decoding performed in the encoding apparatus and the decoding apparatus of the present invention, respectively.

Referring to FIG. 8, the quantized samples to be encoded are organized into three layers. The hatched rectangles represent spectral lines made up of quantized samples, the solid lines mark scale bands, and the band lines mark coding bands. Layer 0 contains scale bands ①, ②, ③, ④, and ⑤ and coding bands ①, ②, ③, ④, and ⑤. Layer 1 contains scale bands ⑤ and ⑥ and coding bands ⑥, ⑦, ⑧, ⑨, and ⑩. Layer 2 contains scale bands ⑥ and ⑦ and coding bands ⑪, ⑫, ⑬, ⑭, and ⑮. Layer 0 is fixed to encode up to frequency band ⓐ, layer 1 up to frequency band ⓑ, and layer 2 up to frequency band ⓒ.

First, within 100 bits, the quantized samples of layer 0 are encoded using the coding models defined for the corresponding coding bands ①, ②, ③, ④, and ⑤. Scale bands ①, ②, ③, ④, and ⑤ and coding bands ①, ②, ③, ④, and ⑤, which belong to layer 0, are encoded as the side information of layer 0. The layer 0 samples are encoded symbol by symbol while the number of bits is counted; once the allowed bit range of 100 bits is exceeded, encoding of layer 0 stops and encoding of layer 1 begins. The layer 0 samples left unencoded are encoded later, when room remains in the bit ranges allowed for layer 1 and layer 2.

Next, the quantized samples of layer 1 are encoded using the coding model of whichever layer 1 coding band (⑥, ⑦, ⑧, ⑨, or ⑩) the sample being encoded belongs to. Scale bands ⑤ and ⑥ and coding bands ⑥, ⑦, ⑧, ⑨, and ⑩, which belong to layer 1, are encoded as the side information of layer 1. If all of the layer 1 samples have been encoded and the allowed bit range of 100 bits has not been used up, the samples that could not be encoded in layer 0 are encoded until the 100 bits are filled. The layer 1 samples are encoded symbol by symbol while the number of bits is counted; once the allowed bit range of 100 bits is exceeded, encoding of layer 1 stops and encoding moves on to layer 2.

Finally, the quantized samples of layer 2 are encoded using the coding model of whichever layer 2 coding band (⑪, ⑫, ⑬, ⑭, or ⑮) the sample being encoded belongs to. Scale bands ⑥ and ⑦ and coding bands ⑪, ⑫, ⑬, ⑭, and ⑮, which belong to layer 2, are encoded as the side information of layer 2. If all of the layer 1 samples have been encoded and the allowed bit range of 100 bits has not been used up, the samples that could not be encoded in layer 0 are encoded until the 100 bits are filled.
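The layer-by-layer bit budgeting walked through above can be sketched in Python. This is a simplified illustration under stated assumptions: each symbol is represented only by its cost in bits, side-information bits are ignored, a symbol is deferred when it would not fit in the remaining budget (rather than stopping after the budget is exceeded), and the function name `allocate` is hypothetical.

```python
from collections import deque


def allocate(layer_costs, budget):
    """Plan which symbols each layer carries under a per-layer bit budget.

    layer_costs[k][i] is the bit cost of symbol i of layer k. Returns
    (plan, leftover): plan[k] lists (source_layer, symbol_index) pairs sent
    in layer k; leftover lists the symbols that never fit.
    """
    pending = deque()  # symbols deferred from earlier layers
    plan = []
    for layer, costs in enumerate(layer_costs):
        used = 0
        sent = []
        own = deque((layer, i, c) for i, c in enumerate(costs))
        # Encode this layer's own symbols while they fit in the budget.
        while own and used + own[0][2] <= budget:
            src, idx, cost = own.popleft()
            sent.append((src, idx))
            used += cost
        # Spend any remaining room on symbols deferred from earlier layers.
        while pending and used + pending[0][2] <= budget:
            src, idx, cost = pending.popleft()
            sent.append((src, idx))
            used += cost
        pending.extend(own)  # defer whatever did not fit
        plan.append(sent)
    return plan, [(src, idx) for src, idx, _ in pending]


plan, leftover = allocate([[40, 40, 40], [30, 30], [20]], budget=100)
```

In this run the third symbol of layer 0 does not fit in layer 0's 100 bits and is carried in the spare room of layer 1, mirroring the deferral described in the text.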

If all of the quantized samples of layer 0 or layer 1 were encoded without regard to the allowed bit range, that is, if encoding continued even though the number of encoded bits had already exceeded the allowed 100 bits, the layer would in effect borrow at least part of the bit range allowed for the next layer, layer 1, and the quantized samples that actually belong to layer 1 could no longer be encoded. Under bit-rate-scalable decoding, if decoding stops at layer 1, the samples up to frequency ⓑ of layer 1 will not all have been encoded, so the decoded quantized samples fluctuate up and down below frequency ⓑ. This produces the birdy effect, which degrades sound quality.

When the plurality of layers (target layers) is determined, the bit ranges are allocated with the total size of the audio data to be encoded taken into account, so encoding never fails because the overall bit range is insufficient.

The decoding process performs the reverse of the encoding process and likewise counts bits against the allowed bit range, so the decoder can determine the point at which decoding moves on to layer 1.

FIG. 9 is a reference diagram for explaining in more detail the band extension decoding, that is, the BWE decoding, performed by the BWE decoder 9.

Referring to FIG. 9, the striped portions represent data decoded by the FGS decoder 8, and the gray portions represent data generated by the BWE decoder 9. Assuming that the data up to one quarter of the sampling frequency Fs belongs to the base layer, (a) shows the case where the decoder decodes only the data corresponding to the base band, while (b), (c), and (d) show cases where data belonging to the base layer and at least one upper layer is decoded by the FGS decoder 8. That is, the FGS decoder 8 can decode the data in a bit-rate-scalable manner, and the BWE decoder 9 generates the data of the missing band that the FGS decoder 8 did not decode.

An encoding method and a decoding method according to a preferred embodiment of the present invention, based on the configuration described above, are as follows.

FIG. 10 is a flowchart for explaining an encoding method according to the present invention.

Referring to FIG. 10, the encoding apparatus performs band extension encoding on the audio data to output band-limited audio data and generates band extension information for the base layer (step 1001). The band extension information for the base layer is information that allows the decoder to generate the audio data of the remaining missing band from the audio data belonging to the base layer, and includes envelope information and the like. Next, the encoding apparatus encodes the band-limited data into a layer structure having a base layer and at least one upper layer so that the bit rate is adjustable. More specifically, for each layer the band-limited audio data is PWT-transformed (step 1002), quantized (step 1003), Huffman-encoded, and then packed into the layer structure so that the bit rate is adjustable (step 1004). Finally, the audio bitstream obtained by multiplexing the encoded band-limited audio data and the band extension information is output (step 1004). More specifically, the encoding apparatus multiplexes in one of two orders: either the data corresponding to the base layer among the encoded band-limited audio data is placed first, followed by the band extension information and then the data corresponding to the remaining upper layers; or the band extension information is placed first, followed by the data corresponding to the base layer and then the data corresponding to the remaining upper layers.

Referring to FIG. 11, the decoding apparatus demultiplexes the input audio bitstream to extract the band extension information and the band-limited audio data encoded in a layer structure having a base layer and at least one upper layer (step 1101). That is, it demultiplexes in one of two orders: either it first extracts the data corresponding to the base layer from the input audio bitstream, then the band extension information, and then the data corresponding to the remaining upper layers; or it first extracts the band extension information, then the data corresponding to the base layer, and then the data corresponding to the remaining upper layers. Next, the decoding apparatus decodes at least the band-limited audio data corresponding to the base layer in a bit-rate-scalable manner. More specifically, it Huffman-decodes up to the target layer (step 1102), inverse-quantizes (step 1103), and applies the inverse PWT (step 1104) to obtain band-limited PCM audio data. Then, based on the PCM audio data obtained in step 1104 and with reference to the band extension information, it generates PCM audio data for at least part of the band not covered by the audio data obtained in step 1104, appends it to the PCM audio data obtained in step 1104, and outputs the result (step 1105).

As described above, the present invention provides a bit-rate-scalable audio encoding method and apparatus and decoding method and apparatus that can ensure better quality even when reconstruction uses only part of the bitstream.

In addition, complexity is lower, and better sound quality can be provided even at the lower layers. Compared with MPEG-4 Audio BSAC, which uses arithmetic coding, the encoding and decoding apparatus of the present invention, which uses Huffman coding, requires far less computation in bit packing and unpacking. Even when bit packing according to the present invention is performed to provide FGS, the overhead is small, so the coding gain is nearly the same as when scalability is not provided.

Furthermore, when an audio stream is transmitted over a network, the transmission bit rate can be changed according to the user's wishes or the network environment, which makes uninterrupted service possible. When the data is stored on an information storage medium of limited capacity, the file size can be adjusted as desired. At a lower bit rate the band is limited, so the complexity of the filters, which account for most of the complexity of the encoding and decoding apparatus, falls considerably; the actual complexity of the encoding and decoding apparatus therefore decreases along with the bit rate.

Moreover, because the PWT is adopted, the resolution along the time and frequency axes is superior to that of conventional MDCT-based coding, which provides better sound quality at the lower layers.

Claims

delete

In a method of encoding audio data,

(a) band extending encoding the audio data to output band limited audio data and generating band extension information;

(b) Huffman encoding the band limited data into a hierarchical structure having a base layer and at least one upper layer to enable bit rate control; And

(c) multiplexing the Huffman coded band-limited audio data and the band extension information,

Step (b) is

(b11) differentially encoding side information corresponding to the base layer;

(b12) bit division encoding a plurality of quantized samples corresponding to the base layer; And

(b13) repeating steps (b11) and (b12) for the next higher layer until encoding of a plurality of predetermined layers is completed.

In a method of encoding audio data,

Step (b) is

(b21) differentially encoding side information including scale factor information and coding model information corresponding to the base layer;

(b22) bit-dividing encoding the plurality of quantization samples corresponding to the base layer with reference to the coding model information; And

(b23) repeating steps (b21) and (b22) for the next higher layer until encoding of a plurality of predetermined layers is completed.

The method according to claim 2 or 3,

wherein the quantized samples are obtained by PWT transformation.

The method according to claim 2 or 3,

Step (c) is

multiplexing in an order in which data corresponding to the base layer among the encoded band-limited audio data is placed first, followed by the band extension information, and then by data corresponding to the remaining higher layers.

The method according to claim 2 or 3,

Step (c) is

multiplexing in an order in which the band extension information is placed first, followed by data corresponding to the base layer among the encoded band-limited audio data, and then by data corresponding to the remaining upper layers.

delete

In the method of decoding audio data,

(a) demultiplexing the input audio bitstream to extract band limited audio data and band extension information encoded in a hierarchical structure having a base layer and at least one upper layer;

(b) Huffman decoding the band limited audio data corresponding to at least base layer; And

(c) generating, based on the decoded audio data, audio data of at least some bands not covered by the decoded audio data, and adding the generated audio data to the decoded audio data;

Step (c) is

generating the audio data of the partial band so as to conform to the boundary of the decoded audio data.

The method of claim 8,

Step (c) is

generating the audio data of the partial band so as to fit the boundary of the filter bank used in the wavelet transform.

The method of claim 8,

Step (c) is

interpolating the overlapping portion of the decoded audio data and the generated partial-band audio data when the filter bank used in the wavelet transform does not fit the boundary.

The method of claim 8,

Step (a) is

demultiplexing in the order of first extracting data corresponding to the base layer from the bitstream, then extracting the band extension information, and then extracting data corresponding to the remaining upper layers.

The method of claim 8,

Step (a) is

demultiplexing in the order of first extracting the band extension information from the bitstream, then extracting data corresponding to the base layer, and then extracting data corresponding to the remaining upper layers.

In the method of decoding audio data,

step (b) comprises:

(b11) differentially decoding side information corresponding to the base layer;

(b12) bit-division decoding a plurality of quantized samples corresponding to the base layer; and

(b13) repeating steps (b11) and (b12) for the next higher layer until decoding of a plurality of predetermined layers is completed.
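Steps (b11) to (b13) can be sketched as a loop over layers. This is only an illustration under stated assumptions: each layer is a dictionary with a `side_deltas` list (first value followed by successive differences) and a `bit_planes` list (one list of bits per plane, most significant plane first), samples are unsigned, and the Huffman entropy decoding stage is left out. All names and the data layout are hypothetical.

```python
def differential_decode(deltas):
    """Rebuild values from a first value followed by successive differences."""
    values, running = [], 0
    for d in deltas:
        running += d
        values.append(running)
    return values

def bit_division_decode(bit_planes):
    """Rebuild unsigned quantized samples from bit planes, MSB plane first."""
    if not bit_planes:
        return []
    samples = [0] * len(bit_planes[0])
    for plane in bit_planes:
        # Shift what has been decoded so far and append this plane's bit.
        samples = [(s << 1) | bit for s, bit in zip(samples, plane)]
    return samples

def decode_layers(layers, target_layers):
    """Decode the base layer, then each higher layer, up to target_layers."""
    decoded = []
    for layer in layers[:target_layers]:
        side_info = differential_decode(layer["side_deltas"])   # (b11)
        samples = bit_division_decode(layer["bit_planes"])      # (b12)
        decoded.append({"side_info": side_info, "samples": samples})
    return decoded                                              # (b13) loop
```

Passing a smaller `target_layers` stops after fewer layers, which mirrors decoding only part of the layered stream.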

In the method of decoding audio data,

step (b) comprises:

(b21) differentially decoding side information including scale factor information and coding model information corresponding to the base layer;

(b22) bit-division decoding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and

(b23) repeating steps (b21) and (b22) for the next higher layer until decoding of a plurality of predetermined layers is completed.

(Deleted)

An apparatus for encoding audio data, the apparatus comprising:

a BWE encoder for performing band extension encoding on the audio data to output band-limited audio data and to generate band extension information;

an FGS encoder for Huffman coding the band-limited audio data into a hierarchical structure having a base layer and at least one upper layer so that the bit rate is adjustable; and

a multiplexer for multiplexing the coded band-limited audio data and the band extension information,

wherein the FGS encoder

differentially encodes the side information corresponding to the base layer, bit-division encodes the plurality of quantized samples corresponding to the base layer, and, until encoding of a plurality of predetermined layers is completed, differentially encodes the side information corresponding to the next higher layer and bit-division encodes the corresponding plurality of quantized samples.
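As a rough illustration of the per-layer work attributed to the FGS encoder, the sketch below differentially encodes a layer's side information and splits its quantized samples into bit planes. It assumes unsigned samples, a fixed number of bits per sample, and a plain Python data layout; the Huffman coding stage is omitted, and every name (`differential_encode`, `bit_division_encode`, `encode_layers`) is hypothetical.

```python
def differential_encode(values):
    """Store the first value, then each value as a difference from the last."""
    deltas, previous = [], 0
    for v in values:
        deltas.append(v - previous)
        previous = v
    return deltas

def bit_division_encode(samples, num_bits):
    """Split unsigned quantized samples into bit planes, MSB plane first."""
    if any(s < 0 or s >= (1 << num_bits) for s in samples):
        raise ValueError("sample does not fit in num_bits")
    return [[(s >> bit) & 1 for s in samples]
            for bit in range(num_bits - 1, -1, -1)]

def encode_layers(layer_inputs, num_bits):
    """Encode the base layer first, then each higher layer in turn.

    layer_inputs is a list of (side_info, samples) pairs, base layer first.
    """
    return [
        {"side_deltas": differential_encode(side_info),
         "bit_planes": bit_division_encode(samples, num_bits)}
        for side_info, samples in layer_inputs
    ]
```

Ordering the planes from most to least significant means a decoder that stops early still holds the coarsest part of every sample, which is the usual motivation for this kind of layering.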

In the apparatus for encoding audio data,

the FGS encoder

differentially encodes side information including scale factor information and coding model information corresponding to the base layer, bit-division encodes a plurality of quantized samples corresponding to the base layer with reference to the coding model information, and, until encoding of a plurality of predetermined layers is completed, differentially encodes the side information including scale factor information and coding model information corresponding to the next higher layer and bit-division encodes the plurality of quantized samples corresponding to that layer.

The apparatus according to claim 16 or 17,

wherein the FGS encoder

encodes quantized samples obtained by a PWT transform.

The apparatus according to claim 16 or 17,

wherein the multiplexer

multiplexes in an order in which the data corresponding to the base layer among the encoded band-limited audio data is arranged first, followed by the band extension information, and then the data corresponding to the remaining upper layers.

(Deleted)

An apparatus for decoding audio data, the apparatus comprising:

a demultiplexer for demultiplexing the input audio bitstream to extract band extension information and band-limited audio data encoded in a hierarchical structure having a base layer and at least one upper layer;

an FGS decoder for Huffman decoding the band-limited audio data corresponding to at least the base layer; and

a BWE decoder that generates, based on the decoded audio data, audio data of at least a partial band not covered by the decoded audio data, and adds the generated audio data to the decoded audio data,

wherein the FGS decoder

differentially decodes the side information corresponding to the base layer, bit-division decodes a plurality of quantized samples corresponding to the base layer, and, until decoding of a plurality of predetermined layers is completed, differentially decodes the side information corresponding to the next higher layer and bit-division decodes the corresponding plurality of quantized samples.

The apparatus of claim 21,

wherein the demultiplexer

demultiplexes in an order of first extracting the data corresponding to the base layer from the bitstream, then extracting the band extension information, and then extracting the data corresponding to the remaining upper layers.

The apparatus of claim 21,

wherein the demultiplexer

demultiplexes in an order of first extracting the band extension information from the bitstream, then extracting the data corresponding to the base layer, and then extracting the data corresponding to the remaining upper layers.