KR101176703B1

KR101176703B1 - Decoder and decoding method for multichannel audio coder using sound source location cue

Info

Publication number: KR101176703B1
Application number: KR1020090064918A
Authority: KR
Inventors: 서정일; 백승권; 이용주; 강경옥; 홍진우; 김진웅; 안치득
Original assignee: Electronics and Telecommunications Research Institute (한국전자통신연구원)
Priority date: 2008-12-03
Filing date: 2009-07-16
Publication date: 2012-08-23
Also published as: KR20100063639A; CN101754086A; CN101754086B

Abstract

Disclosed are a multi-channel audio decoding apparatus and method based on sound source location cues. The apparatus includes: a demultiplexer that receives a signal and parses it into an audio bitstream and a side information bitstream; an audio decoder that restores a downmix signal from the audio bitstream; an integrated upmixing unit that predicts a multi-channel signal using the downmix signal and the side information bitstream, and generates an upmix signal by upmixing the downmix signal based on the multi-channel signal; and a synthesis filter bank windowing unit that extracts a time-domain signal by applying a synthesis filter bank to the upmix signal, and extracts an output signal by windowing the upmix signal.

SSLCC, multichannel, downmix, Huffman coding, inverse quantization.

Description

Multi-channel audio decoding apparatus and method based on sound source location cues {DECODER AND DECODING METHOD FOR MULTICHANNEL AUDIO CODER USING SOUND SOURCE LOCATION CUE}

The present invention relates to a multi-channel audio decoding apparatus and method based on sound source location cues.

The basic coding concept of Sound Source Location Cue Coding (SSLCC) is derived from Spatial Audio Coding (SAC).

SAC is a technique for compressing multi-channel audio signals. It maximizes compression efficiency by removing the redundancy among the channel signals on the basis of the spatial cues that a listener perceives in space.

In addition, because the multi-channel signal is downmixed, the downmix signal becomes the core signal of the transmitted audio. In other words, the basic principle of SAC is that the transmitted signal remains playable on existing stereo audio equipment.

SSLCC is one such SAC scheme. It treats the position of a sound source as the spatial cue that a listener perceives in space, and it extracts the sound source position information from the multi-channel signal, represents it, and transmits it.

Because the amount of information extracted by SAC coding is small, it can be carried in the ancillary (spare) data area of the stream. A receiver that does not support SAC coding can therefore play back only the stereo signal using the basic stereo audio codec, while a receiver that does support SAC coding can use the transmitted ancillary information to restore the multi-channel audio signal from the downmix stereo signal on the basis of the sound source location cues.

However, restoring the multi-channel audio signal from the downmix stereo signal with the ancillary information requires the same time-to-frequency (T/F) transform that was used when the information was extracted by SAC coding. If that transform is not the one best suited to the receiver, the conversion process may be adversely affected.

Therefore, there is a need for an apparatus or method that can reconstruct a multi-channel audio signal based on sound source location cues using a T/F transform optimized for the receiver.

The present invention provides a decoding apparatus and method in which a multi-channel audio signal is received and compressed, and a stereo signal is compressed and transmitted through a core stereo codec. This provides backward compatibility with existing stereo audio codecs while also enabling multi-channel audio transmission.

The present invention also provides a decoding apparatus and method that, by using a TDAC (Time Domain Aliasing Cancellation) filter bank, can vary the T/F transform applied to the stereo downmix audio signal according to an option.

A multi-channel audio decoding apparatus based on sound source location cues according to an embodiment of the present invention includes: a demultiplexer that receives a signal and parses the received signal into an audio bitstream and a side information bitstream; an audio decoder that restores a downmix signal based on the audio bitstream; an integrated upmixing unit that predicts a multi-channel signal using the downmix signal and the side information bitstream, and generates an upmix signal by upmixing the downmix signal based on the multi-channel signal; and a synthesis filter bank windowing unit that extracts a time-domain signal by applying a synthesis filter bank to the upmix signal, and extracts an output signal by windowing the upmix signal.

In addition, a multi-channel audio decoding method based on sound source location cues according to an embodiment of the present invention includes: parsing a received signal into an audio bitstream and a side information bitstream; restoring a downmix signal based on the audio bitstream; restoring side information by decoding the side information bitstream; predicting a multi-channel signal using the downmix signal and the side information; generating an upmix signal by upmixing the downmix signal based on the multi-channel signal; extracting a time-domain signal by applying a synthesis filter bank to the upmix signal; and extracting an output signal by windowing the upmix signal.

According to the present invention, a multi-channel audio signal is received and compressed, and a stereo signal is compressed and transmitted through a core stereo codec, so that backward compatibility with existing stereo audio codecs is provided while multi-channel audio transmission is also possible.

According to the present invention, the T/F transform applied to the stereo downmix audio signal can be varied according to an option by using a TDAC (Time Domain Aliasing Cancellation) filter bank.

Hereinafter, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 illustrates an example of a multi-channel audio encoding apparatus based on sound source location cues according to an embodiment of the present invention.

The multi-channel audio encoding apparatus 100 based on sound source location cues according to an embodiment of the present invention is a five-channel audio encoding apparatus based on Sound Source Location Cue Coding (SSLCC). As shown in FIG. 1, it may include a preprocessing filter bank unit 110, an analyzer 120, a downmix processor 130, an audio encoder 140, and a multiplexer 150.

The multi-channel audio encoding apparatus 100 may also be extended to multi-channel audio content with five or more channels.

The preprocessing filter bank unit 110 preprocesses the multi-channel input audio signal supplied to the encoding apparatus 100 and passes the preprocessed signal through a filter bank to convert it into a frequency-domain signal. The filter bank performs a time-to-frequency (T/F) transform based on sub-band analysis; an MDCT, MDST, DFT, or the like may be used.

The input audio signal may include the channels LF (left front), RF (right front), C (center front), Ls (left surround), and Rs (right surround).

The analyzer 120 extracts spatial cues from the input audio signal that the preprocessing filter bank unit 110 has converted into the frequency domain, represents the spatial cues as a side information bitstream, and transmits it. The analyzer 120 may also compress the input audio signal and pass it to the downmix processor 130.

The downmix processor 130 downmixes, in the frequency domain, the input audio signal compressed by the analyzer 120. The downmix processor 130 may also perform the downmix according to an ITU-T recommendation.
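Because a downmix is a linear combination of channels, it can be applied bin by bin to the frequency-domain spectra. The text names an ITU recommendation but gives no coefficients, so the minimal sketch below assumes the commonly used 1/sqrt(2) (about -3 dB) weights for the centre and surround channels, in the style of the ITU-R BS.775 stereo downmix; the function name `downmix_5_to_2` is hypothetical.

```python
import numpy as np

# Assumed gain for the centre and surround channels (about -3 dB). The patent
# does not specify coefficients; this mirrors a common ITU-R BS.775-style downmix.
G = 1.0 / np.sqrt(2.0)


def downmix_5_to_2(lf, rf, c, ls, rs):
    """Downmix five equally shaped channel spectra (real or complex) to stereo.

    The downmix is linear, so it can be applied independently per frequency bin.
    """
    left = lf + G * c + G * ls
    right = rf + G * c + G * rs
    return left, right


# Usage: one frame of five 4-bin spectra (arbitrary test data).
spectra = {name: np.ones(4) for name in ("LF", "RF", "C", "Ls", "Rs")}
L, R = downmix_5_to_2(*(spectra[k] for k in ("LF", "RF", "C", "Ls", "Rs")))
```

A practical downmixer would usually also scale or limit the result to avoid clipping; that step is omitted here.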

The audio signal downmixed by the downmix processor 130 may be represented as a bitstream by a conventional stereo audio codec such as MP3 (MPEG-1 Audio Layer III) or AAC (Advanced Audio Coding).

The audio encoder 140 encodes the audio signal downmixed by the downmix processor 130.

The multiplexer 150 combines the signal encoded by the audio encoder 140 with the side information bitstream from the analyzer 120 and transmits the result.

FIG. 2 illustrates an example of a multi-channel audio decoding apparatus based on sound source location cues according to an embodiment of the present invention.

The multi-channel audio decoding apparatus 200 based on sound source location cues according to an embodiment of the present invention is an SSLCC-based five-channel audio decoding apparatus. As shown in FIG. 2, it may include a demultiplexer 210, an audio decoder 220, a windowing filter bank unit 230, an integrated upmixing unit 240, and a synthesis filter bank windowing unit 250.

The demultiplexer 210 receives the signal transmitted by the multiplexer 150 and parses it into an audio bitstream and a side information bitstream.

The audio decoder 220 restores the downmix signal based on the audio bitstream.

The windowing filter bank unit 230 applies an analysis filter bank to the downmix signal to perform a T/F transform, windows the transformed downmix signal, and passes it on.

The integrated upmixing unit 240 predicts a multi-channel signal using the downmix signal and the side information bitstream, and upmixes the downmix signal based on the multi-channel signal to generate an upmix signal.

Specifically, the integrated upmixing unit 240 separates the downmix signal into amplitude information and phase information, windows a pre-stored random sequence based on the amplitude information and uses it to weight the phase information, and predicts the multi-channel signal from the amplitude information and the weighted phase information.

To do so, the integrated upmixing unit 240 applies a complex transform to the downmix signal and separates the amplitude information and the phase information from the complex-transformed downmix signal.

The integrated upmixing unit 240 also models a spectral-shape window for correcting the phase information based on the envelope of the amplitude information, applies this window to the pre-stored random sequence, and weights the phase information using the windowed random sequence.

Finally, the integrated upmixing unit 240 combines the amplitude information with the weighted phase information and applies an inverse complex transform to the result to predict the multi-channel signal.
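The amplitude/phase steps of the integrated upmixing unit can be sketched for one frame as follows. This is an illustration under stated assumptions, not the patented method: the DFT is used as the complex transform, the envelope is a simple moving average, and "weighting the phase" is interpreted as adding the envelope-windowed random sequence to the phase. The name `predict_channel` and the `smooth_bins` parameter are hypothetical.

```python
import numpy as np


def predict_channel(frame, random_seq, smooth_bins=5):
    """Sketch: amplitude/phase separation with random-sequence phase weighting.

    frame      : real time-domain downmix frame of length N
    random_seq : stored random sequence with N // 2 + 1 values (radians)
    smooth_bins: moving-average length for a crude amplitude envelope
                 (assumption; requires N // 2 + 1 >= smooth_bins)
    """
    spec = np.fft.rfft(frame)                   # complex transform (DFT)
    amp, phase = np.abs(spec), np.angle(spec)   # separate amplitude and phase

    # Spectral-shape window modeled from the amplitude envelope, scaled to [0, 1].
    env = np.convolve(amp, np.ones(smooth_bins) / smooth_bins, mode="same")
    shape_win = env / env.max() if env.max() > 0 else np.zeros_like(env)
    shape_win[0] = 0.0                           # keep the DC bin real
    if len(frame) % 2 == 0:
        shape_win[-1] = 0.0                      # keep the Nyquist bin real

    # Window the stored random sequence and use it to perturb the phase.
    weighted_phase = phase + shape_win * random_seq

    # Recombine amplitude and weighted phase, then inverse complex transform.
    return np.fft.irfft(amp * np.exp(1j * weighted_phase), n=len(frame))


rng = np.random.default_rng(0)
stored = rng.uniform(-np.pi, np.pi, 33)   # stands in for the pre-stored sequence
frame = rng.standard_normal(64)
out = predict_channel(frame, stored)
```

Because only the phase is altered, the predicted channel keeps the downmix magnitude spectrum, and an all-zero random sequence returns the input frame unchanged.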

The synthesis filter bank windowing unit 250 applies a synthesis filter bank to the upmix signal to extract a time-domain signal, and windows the upmix signal to extract the output signal.
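The analysis and synthesis filter banks with windowing can be illustrated by a windowed DFT with overlap-add. The sketch assumes a sine window, 256-sample frames, and 50% overlap, none of which the text specifies; with these choices the squared windows of overlapping frames sum to one, so samples covered by two frames are reconstructed exactly when the spectra are left unmodified (the upmixing between the two stages is omitted).

```python
import numpy as np


def sine_window(n):
    """Sine window; with 50% overlap its squared copies sum to one."""
    return np.sin(np.pi * (np.arange(n) + 0.5) / n)


def analysis(x, n=256):
    """Windowed DFT analysis filter bank with hop n // 2."""
    hop, w = n // 2, sine_window(n)
    return np.array([np.fft.rfft(w * x[i:i + n])
                     for i in range(0, len(x) - n + 1, hop)])


def synthesis(frames, n=256):
    """Synthesis filter bank: inverse DFT per frame, window, then overlap-add."""
    hop, w = n // 2, sine_window(n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for k, spec in enumerate(frames):
        out[k * hop:k * hop + n] += w * np.fft.irfft(spec, n=n)
    return out


x = np.random.default_rng(0).standard_normal(2048)
y = synthesis(analysis(x))
err = np.max(np.abs(y[128:-128] - x[128:-128]))   # interior samples: two frames overlap
```

Applying the window at both stages is what makes windowed overlap-add a smooth cross-fade between consecutive frames.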

FIG. 3 illustrates an example of a multi-channel audio decoding apparatus based on sound source location cues according to another embodiment of the present invention.

The multi-channel audio decoding apparatus 300 according to another embodiment of the present invention is a multi-channel audio decoding apparatus based on sound source location cues that applies a real transform. As shown in FIG. 3, it may include a demultiplexer 310, a TDAC filter bank 320, an integrated upmixing unit 330, and a synthesis filter bank windowing unit 340.

SSLCC is basically built on a DFT filter bank (transform). However, various filter bank types may be applied so that it can interoperate with the core stereo audio codec.

Even if the filter bank type changes, the operating principle of the analysis in the SSLCC analyzer 120 and of the synthesis in the integrated upmixing unit 240 and the synthesis filter bank windowing unit 250 can remain the same.

Because no decorrelator is applied during stereo transmission, a real filter bank (transform) can be used. The multi-channel audio decoding apparatus 300 according to another embodiment of the present invention is an example of a configuration that interoperates with the core codec by using the MDCT, which is such a real transform.

The TDAC filter bank 320 restores the downmix signal based on the audio bitstream and passes it to the integrated upmixing unit 330 without separately applying an analysis filter bank or windowing to the downmix signal. The signals passed on may be the frequency-domain downmix signals L and R.

The synthesis filter bank windowing unit 340 applies a synthesis filter bank to the upmix signal generated by the integrated upmixing unit 330 to extract a time-domain signal, and windows the upmix signal so that it matches the analysis windowing of the core stereo audio codec, thereby extracting the output signal.

The demultiplexer 310 and the integrated upmixing unit 330 have the same configuration as the demultiplexer 210 and the integrated upmixing unit 240 of the decoding apparatus 200 according to an embodiment of the present invention, so a detailed description is omitted.

The decoding apparatus 300 according to another embodiment of the present invention may vary the T/F transform applied to the stereo downmix audio signal according to an option. For example, when the decorrelator of the decoding apparatus is turned off, a real T/F transform may be applied. A complex T/F transform is also possible in this case, but the phase information is not used even then.

However, when an operation that requires phase information is turned on in the decoding apparatus, that is, when the decorrelator is needed, a complex T/F transform must be used. The DFT is the default complex T/F transform, but the MDCT and MDST may be used together as a complex transform pair.
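The real and complex options above can be sketched with a direct (matrix) MDCT and MDST. The MDCT alone is the real transform; overlap-adding windowed inverse MDCTs cancels the time-domain aliasing (TDAC), which is why it can serve when phase is not needed. Pairing MDCT and MDST gives a complex spectrum that carries phase. Assumptions: a sine window satisfying the Princen-Bradley condition, a 2/N inverse scaling chosen to match that window, MDST as the imaginary part (a sign convention), and the hypothetical function names.

```python
import numpy as np


def _kernel(N, fn):
    """(2N x N) kernel fn(pi/N * (n + 1/2 + N/2) * (k + 1/2))."""
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)[None, :]
    return fn(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame):
    """Real transform: 2N samples -> N MDCT coefficients."""
    return frame @ _kernel(len(frame) // 2, np.cos)


def mdst(frame):
    """Sine counterpart of the MDCT (same kernel with sin)."""
    return frame @ _kernel(len(frame) // 2, np.sin)


def complex_pair(frame):
    """Complex spectrum with the MDCT as real part and the MDST as imaginary part."""
    return mdct(frame) + 1j * mdst(frame)


def imdct(coeffs):
    """Inverse MDCT, scaled by 2/N for use with a Princen-Bradley window."""
    N = len(coeffs)
    return (2.0 / N) * (_kernel(N, np.cos) @ coeffs)


def tdac_roundtrip(x, N):
    """Windowed MDCT -> IMDCT with hop N; overlap-add cancels the aliasing."""
    w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))  # w[n]^2 + w[n+N]^2 = 1
    out = np.zeros(len(x))
    for start in range(0, len(x) - 2 * N + 1, N):
        seg = w * x[start:start + 2 * N]
        out[start:start + 2 * N] += w * imdct(mdct(seg))
    return out
```

For example, `tdac_roundtrip(x, 32)` reproduces every sample of `x` that is covered by two overlapping frames, even though each individual inverse MDCT is aliased.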

FIG. 4 illustrates an example of an apparatus for decoding the side information bitstream according to an embodiment of the present invention.

The side information bitstream decoding apparatus according to an embodiment of the present invention decodes the Virtual Sound Location Angle (VSLA), which is the side information, from the side information bitstream parsed by the demultiplexer 210. As shown in FIG. 4, it may include a Huffman decoder 410 and an inverse quantizer 420.

The side information bitstream decoding apparatus may also be included in the windowing filter bank unit 230 or the synthesis filter bank windowing unit 250.

The Huffman decoder 410 Huffman-decodes the side information bitstream using a Huffman codebook to generate differential indices.

To generate the Huffman codebook, the Huffman decoder 410 may include an inverse differential coding unit 411, a differential coding unit 412, a mapping unit 413, and a Huffman coding unit 414.

The inverse differential coding unit 411 performs inverse differential coding based on the previously processed frame and the Huffman coding type information to recover the original indices.

The differential coding unit 412 removes negative values from the original indices according to the sign-bit information and then performs differential coding to generate index information.

Next, the mapping unit 413 removes the offset that was used to eliminate negative values from the indices, then maps the indices according to the frequency resolution and classifies them into the first sub-band and the remaining bands.

Finally, the Huffman coding unit 414 applies Huffman coding separately to the first sub-band and to the remaining bands to generate the Huffman codebooks.
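The differential steps handled by units 411 to 413 can be sketched as a pair of inverse functions: differences either along frequency (band to band) or along time (against the previous frame), with an offset that keeps the first band non-negative. The offset of 15 and sending the first band as an offset absolute value are assumptions, chosen to be consistent with the 0..30 index range of Table 1 and the -15..15 range of the quantization tables; the text does not state them. Differences of indices in -15..15 span -30..30, which matches the index range of Table 2. All names here are hypothetical.

```python
from typing import List, Optional

OFFSET = 15  # hypothetical offset mapping indices -15..15 to 0..30


def diff_encode(idx: List[int], prev: Optional[List[int]] = None) -> List[int]:
    """Frequency-differential coding if prev is None, else time-differential."""
    if prev is None:
        # First band as a non-negative offset value, remaining bands as deltas.
        return [idx[0] + OFFSET] + [b - a for a, b in zip(idx, idx[1:])]
    return [c - p for c, p in zip(idx, prev)]


def diff_decode(codes: List[int], prev: Optional[List[int]] = None) -> List[int]:
    """Inverse differential coding, undoing diff_encode."""
    if prev is None:
        out = [codes[0] - OFFSET]
        for d in codes[1:]:
            out.append(out[-1] + d)
        return out
    return [p + d for p, d in zip(prev, codes)]


current = [3, 5, 4, -2]
previous = [3, 3, 3, 3]
print(diff_encode(current))            # [18, 2, -1, -6]
print(diff_encode(current, previous))  # [0, 2, 1, -5]
```

Which direction is used per frame corresponds to the "Huffman coding type" mentioned for unit 411; here it is simply whether `prev` is supplied.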

The Huffman decoder 410 may Huffman-decode the value of the first sub-band by referring to the Huffman codebook of Table 1 below.

Table 1
Index  Bits  Codeword    Index  Bits  Codeword
0      5     0x17        16     5     0x1d
1      8     0x64        17     5     0x19
2      8     0x65        18     5     0x1c
3      8     0xf0        19     5     0x16
4      8     0xf1        20     5     0x18
5      7     0x33        21     5     0x14
6      7     0x79        22     5     0x13
7      6     0x18        23     5     0x15
8      6     0x22        24     5     0x1b
9      6     0x23        25     5     0x10
10     6     0x3d        26     5     0x0e
11     5     0x0b        27     5     0x0f
12     5     0x12        28     5     0x0d
13     5     0x1a        29     5     0x0a
14     4     0x04        30     2     0x00
15     5     0x1f
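The first-sub-band codebook can be used directly for bit-by-bit prefix-code decoding: each (number of bits, codeword) pair is expanded to a bit string, and input bits are accumulated until they match one. The sketch below transcribes Table 1; the function names and the string-of-bits input format are illustrative assumptions, and the same functions would work with a column of Table 2 or Table 3 loaded in the same form.

```python
# Table 1 (first sub-band): index -> (number of bits, codeword)
TABLE1 = {
    0: (5, 0x17), 1: (8, 0x64), 2: (8, 0x65), 3: (8, 0xF0), 4: (8, 0xF1),
    5: (7, 0x33), 6: (7, 0x79), 7: (6, 0x18), 8: (6, 0x22), 9: (6, 0x23),
    10: (6, 0x3D), 11: (5, 0x0B), 12: (5, 0x12), 13: (5, 0x1A), 14: (4, 0x04),
    15: (5, 0x1F), 16: (5, 0x1D), 17: (5, 0x19), 18: (5, 0x1C), 19: (5, 0x16),
    20: (5, 0x18), 21: (5, 0x14), 22: (5, 0x13), 23: (5, 0x15), 24: (5, 0x1B),
    25: (5, 0x10), 26: (5, 0x0E), 27: (5, 0x0F), 28: (5, 0x0D), 29: (5, 0x0A),
    30: (2, 0x00),
}


def build_decoder(table):
    """Map each codeword, written as a fixed-width bit string, to its index."""
    return {format(code, f"0{bits}b"): idx for idx, (bits, code) in table.items()}


def huffman_decode(bits, table, count):
    """Decode `count` symbols from a '0'/'1' string; return (symbols, bits used)."""
    lookup = build_decoder(table)
    symbols, pos, cur = [], 0, ""
    while len(symbols) < count:
        if pos >= len(bits):
            raise ValueError("bitstream ended in the middle of a codeword")
        cur += bits[pos]
        pos += 1
        if cur in lookup:          # prefix code: the first match is the symbol
            symbols.append(lookup[cur])
            cur = ""
    return symbols, pos


print(huffman_decode("01000011111", TABLE1, 3))  # ([14, 30, 15], 11)
```

As a consistency check, the Kraft sum of the code lengths in Table 1 is exactly 1, so the table forms a complete prefix code and the greedy first-match loop above is unambiguous.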

When the signal received by the demultiplexer 210 has been quantized with 5 bits, the Huffman decoder 410 may perform Huffman decoding with reference to the Huffman codebook of Table 2 below.

Table 2 (DF = differential frequency, DT = differential time)
Index  DF bits  DF codeword   DT bits  DT codeword   Index  DF bits  DF codeword   DT bits  DT codeword
-30    34       0x7907FFFF    27       0x5D0FFF      1      3        0x00000003    3        0x000003
-29    34       0x7907FFFE    27       0x5D0FFE      2      4        0x00000003    5        0x000000
-28    33       0x3C83FFFE    26       0x2E87FE      3      5        0x00000004    5        0x000004
-27    32       0x1E41FFFE    25       0x1743FE      4      6        0x00000004    6        0x000004
-26    31       0x0F20FFFE    24       0x0BA1FE      5      6        0x00000010    7        0x000004
-25    30       0x07907FFE    19       0x005D0E      6      6        0x00000013    7        0x000014
-24    29       0x03C83FFE    17       0x001742      7      7        0x00000025    8        0x000014
-23    20       0x0001E41E    15       0x0015E6      8      8        0x0000001D    9        0x000014
-22    28       0x01E41FFE    17       0x00174D      9      9        0x00000031    9        0x000054
-21    18       0x00007906    16       0x000BA0      10     9        0x0000003D    10       0x00002C
-20    17       0x00003C82    15       0x0005D2      11     10       0x00000070    10       0x0000AD
-19    15       0x00000F22    15       0x0005D1      12     11       0x000000C0    11       0x00005E
-18    15       0x00000F21    14       0x0002EA      13     11       0x000000E3    11       0x000159
-17    13       0x000003CE    14       0x000AC4      14     12       0x000001E5    12       0x0000B8
-16    13       0x00000307    13       0x000563      15     13       0x00000306    12       0x0002B0
-15    12       0x000001E6    12       0x0000BB      16     13       0x000003C9    12       0x0002BD
-14    12       0x00000182    12       0x0000B9      17     14       0x0000079E    13       0x000578
-13    11       0x000000E2    11       0x00015F      18     14       0x0000079F    14       0x0002EB
-12    10       0x00000078    11       0x00005F      19     16       0x00001E47    14       0x000AC5
-11    10       0x00000061    10       0x0000AE      20     16       0x00001E40    15       0x0015E4
-10    9        0x00000039    10       0x00002D      21     17       0x00003C8C    15       0x0015E5
-9     8        0x0000001F    9        0x000055      22     17       0x00003C8D    16       0x002BCE
-8     8        0x00000019    9        0x000015      23     19       0x0000F20E    17       0x00174C
-7     7        0x00000024    8        0x000015      24     27       0x000F20FFE   16       0x000BA7
-6     7        0x0000000D    7        0x00000B      25     26       0x0007907FE   16       0x002BCF
-5     6        0x00000011    6        0x00000B      26     25       0x0003C83FE   18       0x002E86
-4     6        0x00000005    6        0x000003      27     24       0x0001E41FE   23       0x05D0FE
-3     5        0x00000005    5        0x000003      28     23       0x0000F20FE   22       0x02E87E
-2     4        0x00000005    4        0x000003      29     22       0x00007907E   21       0x01743E
-1     4        0x00000000    3        0x000002      30     21       0x00003C83E   20       0x00BA1E
0      1        0x00000001    1        0x000001

When the signal received by the demultiplexer 210 has been quantized with 4 bits, the Huffman decoder 410 may perform Huffman decoding with reference to the Huffman codebook of Table 3 below.

Table 3 (DF = differential frequency, DT = differential time)
Index  DF bits  DF codeword   DT bits  DT codeword   Index  DF bits  DF codeword   DT bits  DT codeword
-14    21       0xFCA7F       17       0x1F9EF       1      3        0x00000       3        0x00005
-13    21       0xFCA7E       17       0x1F9EE       2      4        0x00004       4        0x0000D
-12    17       0x0FCA6       13       0x01FFE       3      5        0x0000C       5        0x0001D
-11    16       0x07E52       13       0x01F9F       4      6        0x0001D       6        0x0003C
-10    14       0x01F95       12       0x00FFE       5      7        0x0003D       7        0x0007C
-9     11       0x003F3       11       0x007FE       6      8        0x0007D       8        0x000FD
-8     11       0x003F0       10       0x003FE       7      9        0x000FD       9        0x001F8
-7     8        0x0007F       9        0x001FE       8      11       0x003F1       10       0x003F2
-6     8        0x0007C       8        0x000FE       9      12       0x007E4       11       0x007E6
-5     7        0x0003C       7        0x0007D       10     13       0x00FCB       12       0x00FCE
-4     6        0x0001C       6        0x0003D       11     15       0x03F28       13       0x01FFF
-3     5        0x0000D       5        0x0001C       12     20       0x7E53E       14       0x03F3C
-2     4        0x00005       4        0x0000C       13     19       0x3F29E       15       0x07E7A
-1     3        0x00001       3        0x00004       14     18       0x1F94E       16       0x0FCF6
0      1        0x00001       1        0x00000

The inverse quantizer 420 restores the side information by inverse-quantizing the differential indices with inverse quantization tables. Specifically, the inverse quantizer 420 maps the VSLA (Virtual Sound Location Angle) information of each frame to the quantization table corresponding to each VSLA parameter. Because the multi-channel audio based on sound source location cues according to the present invention is decoded frame by frame with a DFT or MDCT, smoothing between frames is achieved mainly by windowed overlap-add.

When the VSLA information is LHA (Left Half-plane vector Angle), the inverse quantizer 420 may perform inverse quantization by mapping it to the quantization table of Table 4 below.

Table 4
idx       -15    -14    -13    -12    -11    -10    -9     -8     -7     -6     -5
LHA[idx]  -55    -51.3  -47.6  -44.0  -40.3  -36.6  -33.0  -29.3  -25.6  -22.0  -18.3
idx       -4     -3     -2     -1     0      1      2      3      4      5      6
LHA[idx]  -14.6  -11    -7.3   -3.6   0      3.6    7.3    11     14.6   18.3   22.0
idx       7      8      9      10     11     12     13     14     15
LHA[idx]  25.6   29.3   33.0   36.6   40.3   44.0   47.6   51.3   55

When the VSLA information is a right half-plane vector angle (RHA), the inverse quantization unit 420 may dequantize it using the quantization table in Table 5 below.

| Idx | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RHA[idx] | -55 | -51.3 | -47.6 | -44.0 | -40.3 | -36.6 | -33.0 | -29.3 | -25.6 | -22.0 | -18.3 |
| Idx | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| RHA[idx] | -14.6 | -11 | -7.3 | -3.6 | 0 | 3.6 | 7.3 | 11 | 14.6 | 18.3 | 22.0 |
| Idx | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | |
| RHA[idx] | 25.6 | 29.3 | 33.0 | 36.6 | 40.3 | 44.0 | 47.6 | 51.3 | 55 | | |

When the VSLA information is a left subsequent vector angle (LSA), the inverse quantization unit 420 may dequantize it using the quantization table in Table 6 below.

| Idx | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LSA[idx] | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 |
| Idx | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| LSA[idx] | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| Idx | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | |
| LSA[idx] | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | |

When the VSLA information is a right subsequent vector angle (RSA), the inverse quantization unit 420 may dequantize it using the quantization table in Table 7 below.

| Idx | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| RSA[idx] | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 |
| Idx | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| RSA[idx] | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| Idx | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | |
| RSA[idx] | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | | |
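The four VSLA tables reduce to two lookups. LHA and RHA share one set of angle values, and LSA and RSA map each index to itself. The sketch below encodes them as a table-driven dequantizer. The function names are hypothetical. The half-plane values use a uniform step of 55/15 ≈ 3.67 truncated to one decimal, so idx 0 maps to 0 and idx 1 maps to 3.6. `half_plane_rule` only documents that observed pattern; the dequantizer itself uses the table.

```python
import math

# Tables 4 and 5 (LHA and RHA use the same values), listed for idx = -15..15.
HALF_PLANE_ANGLES = [
    -55, -51.3, -47.6, -44.0, -40.3, -36.6, -33.0, -29.3, -25.6, -22.0, -18.3,
    -14.6, -11, -7.3, -3.6, 0, 3.6, 7.3, 11, 14.6, 18.3, 22.0,
    25.6, 29.3, 33.0, 36.6, 40.3, 44.0, 47.6, 51.3, 55,
]


def dequantize_vsla(kind, idx):
    """Map a quantization index to its VSLA value using Tables 4-7."""
    if not -15 <= idx <= 15:
        raise ValueError("index out of range")
    if kind in ("LHA", "RHA"):
        return float(HALF_PLANE_ANGLES[idx + 15])
    if kind in ("LSA", "RSA"):
        return float(idx)  # Tables 6 and 7 map each index to itself
    raise ValueError(f"unknown VSLA type: {kind}")


def half_plane_rule(idx):
    """Observed pattern of Tables 4/5: 11*idx/3, truncated toward zero to 0.1."""
    return math.trunc(110 * idx / 3) / 10
```

For example, `dequantize_vsla("LHA", -14)` gives -51.3 and `dequantize_vsla("RSA", 7)` gives 7.0.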

The inverse quantization unit 420 may also extract, from the VSLA information obtained from each index, the parameter values that satisfy Equation 1.

The number of subbands can be obtained from bsFreqRes, the variable in the side information that defines the number of subbands. The inverse quantization unit 420 may likewise map the number of transmitted VSLA values according to the number of subbands. Taking a maximum of 28 bands (M_par = 28) as the reference, the number of bands can be varied by changing the frequency resolution according to the bit rate or the frequency characteristics of the frame.

The inverse quantization unit 420 may perform this mapping with mapsubbands(bsFreqRes, M_par), as shown in Table 8 below.

Mapfunc(m, M_par)
| m | M_par=28 | M_par=20 | M_par=14 | M_par=10 | M_par=7 | M_par=5 | M_par=4 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 2 | 1 | 1 | 0 | 0 | 0 |
| 3 | 3 | 3 | 1 | 1 | 0 | 0 | 0 |
| 4 | 4 | 4 | 2 | 2 | 1 | 1 | 0 |
| 5 | 5 | 5 | 3 | 2 | 1 | 1 | 0 |
| 6 | 6 | 6 | 4 | 3 | 2 | 1 | 1 |
| 7 | 7 | 7 | 4 | 3 | 2 | 1 | 1 |
| 8 | 8 | 8 | 5 | 4 | 2 | 2 | 1 |
| 9 | 9 | 9 | 6 | 4 | 3 | 2 | 1 |
| 10 | 10 | 10 | 6 | 5 | 3 | 2 | 1 |
| 11 | 11 | 11 | 7 | 5 | 3 | 2 | 2 |
| 12 | 12 | 12 | 7 | 6 | 3 | 2 | 2 |
| 13 | 13 | 13 | 8 | 6 | 4 | 2 | 2 |
| 14 | 14 | 14 | 8 | 7 | 4 | 3 | 2 |
| 15 | 15 | 14 | 8 | 7 | 4 | 3 | 2 |
| 16 | 16 | 15 | 9 | 7 | 4 | 3 | 2 |
| 17 | 17 | 15 | 9 | 7 | 4 | 3 | 2 |
| 18 | 18 | 16 | 10 | 8 | 5 | 3 | 2 |
| 19 | 19 | 16 | 10 | 8 | 5 | 3 | 2 |
| 20 | 20 | 17 | 11 | 8 | 5 | 3 | 2 |
| 21 | 21 | 17 | 11 | 8 | 5 | 3 | 2 |
| 22 | 22 | 18 | 12 | 9 | 6 | 4 | 3 |
| 23 | 23 | 18 | 12 | 9 | 6 | 4 | 3 |
| 24 | 24 | 18 | 12 | 9 | 6 | 4 | 3 |
| 25 | 25 | 19 | 13 | 9 | 6 | 4 | 3 |
| 26 | 26 | 19 | 13 | 9 | 6 | 4 | 3 |
| 27 | 27 | 19 | 13 | 9 | 6 | 4 | 3 |
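One way to read Table 8: each of the 28 processing bands m takes the parameter of coarse band Mapfunc(m, M_par). The sketch below encodes the table column by column and expands a list of M_par coarse parameters to 28 bands. The helper `expand_to_processing_bands` is hypothetical. It is not the document's mapsubbands routine, and the mapping from bsFreqRes to M_par is not reproduced here.

```python
# Table 8: Mapfunc(m, M_par) for the 28 processing bands m = 0..27.
MAPFUNC = {
    28: list(range(28)),
    20: list(range(15)) + [14, 15, 15, 16, 16, 17, 17, 18, 18, 18, 19, 19, 19],
    14: [0, 0, 1, 1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 10, 10,
         11, 11, 12, 12, 12, 13, 13, 13],
    10: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8,
         8, 8, 9, 9, 9, 9, 9, 9],
    7: [0, 0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5,
        5, 5, 6, 6, 6, 6, 6, 6],
    5: [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
        3, 3, 4, 4, 4, 4, 4, 4],
    4: [0] * 6 + [1] * 5 + [2] * 11 + [3] * 6,
}


def expand_to_processing_bands(params, m_par):
    """Hypothetical helper: out[m] = params[Mapfunc(m, M_par)] for m = 0..27."""
    if m_par not in MAPFUNC:
        raise ValueError(f"unsupported M_par: {m_par}")
    if len(params) != m_par:
        raise ValueError("expected one parameter per coarse band")
    table = MAPFUNC[m_par]
    return [params[table[m]] for m in range(28)]
```

With M_par = 4, for example, the first coarse parameter covers processing bands 0 to 5, and the last one covers bands 22 to 27.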

The inverse quantization unit 420 may process Equation 1 using band resolutions designed on the basis of the ERB (equivalent rectangular bandwidth) scale. For M_par = 28 with the mapping of Table 8, the resolution of each band may be as shown in Table 9 below.

| m | M_par=28 | kHz |
|---|---|---|
| 0 | 0 | 0.0702 |
| 1 | 1 | 0.1639 |
| 2 | 2 | 0.2576 |
| 3 | 3 | 0.3512 |
| 4 | 4 | 0.4449 |
| 5 | 5 | 0.5385 |
| 6 | 6 | 0.6322 |
| 7 | 7 | 0.7259 |
| 8 | 8 | 0.9132 |
| 9 | 9 | 1.1005 |
| 10 | 10 | 1.2878 |
| 11 | 11 | 1.4751 |
| 12 | 12 | 1.8498 |
| 13 | 13 | 2.2244 |
| 14 | 14 | 2.599 |
| 15 | 15 | 2.9737 |
| 16 | 16 | 3.7229 |
| 17 | 17 | 4.4722 |
| 18 | 18 | 5.2215 |
| 19 | 19 | 5.9707 |
| 20 | 20 | 6.72 |
| 21 | 21 | 7.4693 |
| 22 | 22 | 8.5932 |
| 23 | 23 | 9.7171 |
| 24 | 24 | 11.2156 |
| 25 | 25 | 13.0888 |
| 26 | 26 | 15.3366 |
| 27 | 27 | 24 |

In this case, the cyclic prefix remover 210, the inverse discrete Fourier transform unit 220, and the guard band remover 230 may each be provided in the same number as the receive antennas used in the decoding apparatus, one per receive antenna.

The windowing filter bank unit 230 first performs a time-to-frequency (T/F) transform and then groups frequency-domain bands into processing bands.

For example, with a 2048-point DFT, the frequency-domain bins from each start bin to the corresponding end bin shown in Table 10 below may be grouped into one processing band.

| m | M_par=28 | Frequency bin: start bin's position | Frequency bin: end bin's position |
|---|---|---|---|
| 0 | 0 | 1 | 3 |
| 1 | 1 | 4 | 7 |
| 2 | 2 | 8 | 11 |
| 3 | 3 | 12 | 15 |
| 4 | 4 | 16 | 19 |
| 5 | 5 | 20 | 23 |
| 6 | 6 | 24 | 27 |
| 7 | 7 | 28 | 31 |
| 8 | 8 | 32 | 39 |
| 9 | 9 | 40 | 47 |
| 10 | 10 | 48 | 55 |
| 11 | 11 | 56 | 63 |
| 12 | 12 | 64 | 79 |
| 13 | 13 | 80 | 95 |
| 14 | 14 | 96 | 111 |
| 15 | 15 | 112 | 127 |
| 16 | 16 | 128 | 159 |
| 17 | 17 | 160 | 191 |
| 18 | 18 | 192 | 223 |
| 19 | 19 | 224 | 255 |
| 20 | 20 | 256 | 287 |
| 21 | 21 | 288 | 319 |
| 22 | 22 | 320 | 367 |
| 23 | 23 | 368 | 415 |
| 24 | 24 | 416 | 479 |
| 25 | 25 | 480 | 559 |
| 26 | 26 | 560 | 655 |
| 27 | 27 | 656 | 1025 |
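The bands in Table 10 are contiguous and cover bins 1 to 1025. The kHz column of Table 9 matches end bin × 24/1025 kHz for every band to four decimals, which suggests (though the text does not state it) a 48 kHz sampling rate with Table 9 listing each band's upper edge. The sketch below groups a spectrum into these processing bands and computes per-band power. It assumes 1-based bin positions over the 1025 non-negative-frequency bins of a 2048-point DFT, so `spectrum[k - 1]` holds bin k. `band_powers` and `upper_edge_khz` are illustrative names only.

```python
# Table 10: (start bin, end bin) of each processing band m for M_par = 28.
BAND_EDGES = [
    (1, 3), (4, 7), (8, 11), (12, 15), (16, 19), (20, 23), (24, 27), (28, 31),
    (32, 39), (40, 47), (48, 55), (56, 63), (64, 79), (80, 95), (96, 111),
    (112, 127), (128, 159), (160, 191), (192, 223), (224, 255), (256, 287),
    (288, 319), (320, 367), (368, 415), (416, 479), (480, 559), (560, 655),
    (656, 1025),
]


def upper_edge_khz(m):
    """End bin of band m scaled by 24/1025 kHz (agrees with Table 9)."""
    return BAND_EDGES[m][1] * 24.0 / 1025


def band_powers(spectrum):
    """Sum |X[k]|**2 over each processing band; spectrum[k - 1] holds bin k."""
    if len(spectrum) < BAND_EDGES[-1][1]:
        raise ValueError("spectrum must contain at least 1025 bins")
    return [sum(abs(spectrum[k - 1]) ** 2 for k in range(s, e + 1))
            for s, e in BAND_EDGES]
```

Grouping bins this way gives the low bands a resolution of four or fewer bins and widens the upper bands, consistent with the ERB-based design described for Table 9.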

The integrated upmixing unit 240 may reconstruct the multichannel signal by localizing sound images within each subband of each downmix channel, based on the VSLA information recovered from the side information bitstream. Specifically, as shown in FIG. 5, it predicts the power in each subband from the panning angle and applies that power, which yields a prediction of each channel's subband signal.

FIG. 5 is an example of the process by which the integrated upmixing unit predicts the gain of each channel, according to an embodiment of the present invention.

As illustrated in FIG. 5, the integrated upmixing unit 240 may reconstruct the sound image information of each channel hierarchically and predict the gain of each channel.

First, the integrated upmixing unit 240 may restore LHA[idx] 510 and RHA[idx] 520.

Next, the integrated upmixing unit 240 predicts gLs[idx] 530 from LHA[idx] 510 and restores LSA[idx] 511. Likewise, it predicts gRs[idx] 540 from RHA[idx] 520 and restores RSA[idx] 521.

The integrated upmixing unit 240 then predicts gL[idx] 550 and gCL[idx] 512 from LSA[idx] 511, and gR[idx] 560 and gCR[idx] 522 from RSA[idx] 521.

Finally, the integrated upmixing unit 240 may predict gCL[idx]/sqrt(2) 570 from gCL[idx] 512 and gCR[idx] 522. Here gCL[idx]/sqrt(2), that is, gCL[idx] 512 × 0.7071, may be the scaled gain of the center channel.
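The text gives the order of the FIG. 5 gain tree but not the panning law that turns an angle into gains, because the corresponding equations are not reproduced here. The sketch below fills that gap with an assumption: a constant-power sine/cosine split, applied first to LHA (surround vs. front group) and then to LSA (left vs. center-left within the front group). The angle ranges come from Tables 4 and 6. The linear angle-to-pan mapping, the assignment of each split output to a channel, and all function names are hypothetical. The only rule taken from the text is the center scaling gCL/sqrt(2).

```python
import math


def constant_power_split(angle, max_angle):
    """Hypothetical panning law: map angle in [-max_angle, max_angle] to two
    gains (g_a, g_b) with g_a**2 + g_b**2 == 1 (sine/cosine law)."""
    if not -max_angle <= angle <= max_angle:
        raise ValueError("angle out of range")
    phi = (angle + max_angle) / (2 * max_angle) * (math.pi / 2)
    return math.cos(phi), math.sin(phi)


def left_side_gains(lha, lsa, lha_max=55.0, lsa_max=15.0):
    """Two-level gain tree for the left half-plane in FIG. 5 order:
    LHA -> (gLs, front group), then LSA -> (gL, gCL) inside the front group.
    Which split output feeds which channel is an assumption."""
    g_ls, g_front = constant_power_split(lha, lha_max)
    g_l_rel, g_cl_rel = constant_power_split(lsa, lsa_max)
    return {"gLs": g_ls, "gL": g_front * g_l_rel, "gCL": g_front * g_cl_rel}


def center_gain(g_cl):
    """Center-channel scaling stated in the text: gCL / sqrt(2) (= gCL * 0.7071)."""
    return g_cl / math.sqrt(2)
```

Under this assumed law, gLs² + gL² + gCL² = 1 for every angle pair, so the hierarchy redistributes the left downmix power without changing it. The right half-plane would mirror this with RHA and RSA.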

The integrated upmixing unit 240 may generate an upmix signal by upmixing the downmix signal based on the multichannel signal predicted through the above process.

Let X_dmxL(m,k) denote the k-th frequency bin of the m-th subband of the transmitted left downmix signal. The left upmixing matrix may then satisfy Equation 2 below.

Similarly, for the right downmix signal, the right upmixing matrix may satisfy Equation 3 below.

The integrated upmixing unit 240 may also include D_L and D_R, which are DFT-based decorrelators.

D_L and D_R may operate in either a high-complexity mode or a low-complexity mode, the latter being the normal mode. D_L and D_R exist only in the decoder. They operate in the high-complexity mode when a rich sense of sound field is to be produced, and in the low-complexity mode for ordinary sound quality.

In the high-complexity mode, D_L and D_R generate decorrelated signals by applying the matrixing operation of Equation 4 below to L(m,k) and R(m,k).

In the normal mode, D_L and D_R satisfy Equation 5 below and may not generate decorrelated signals.

The integrated upmixing unit 240 may then perform the upmix by substituting the values computed in Equations 2 and 3 into Equation 6 below.

Here, α(m) may be a factor representing the correlation between the L and R signals in each band. δ is a fixed coefficient: the reciprocal of the mixing coefficient applied to the surround signals when the encoder performs the downmix.

α(m) may be obtained by substituting the values computed in Equations 4 and 5 into Equation 7 below.

Here, λ is a weighting coefficient that controls how strongly the decorrelated signal is mixed in. Accordingly, α(m) satisfies 0 ≤ α(m) ≤ …, where … is defined within the range 0 ≤ … ≤ 1.

The decorrelated signals _wetL(m,k) and _wetR(m,k) may be generated by the decorrelation processing performed in the decorrelator.

FIG. 6 illustrates an example of a decorrelator according to an embodiment of the present invention.

The decorrelator 600 according to an embodiment of the present invention is included in the integrated upmixing unit 240 and generates the decorrelated signals. As shown in FIG. 6, it may include a complex converter 610, an amplitude information extractor 620, a phase information extractor 630, a random sequence storage unit 640, a windowing unit 650, a phase converter 660, a combining unit 670, and an inverse complex transform unit 680.

The complex converter 610 may apply a complex transform to the downmix signal.

The amplitude information extractor 620 and the phase information extractor 630 may extract amplitude information and phase information, respectively, from the downmix signal transformed by the complex converter 610, thereby separating the downmix signal into amplitude and phase.

The windowing unit 650 models a spectral window for correcting the phase information, based on the envelope of the amplitude information extracted by the amplitude information extractor 620. It then applies this spectral window to the random sequence stored in advance in the random sequence storage unit 640.

The number of random sequences stored in the random sequence storage unit 640 is determined by the number of downmix signals. That is, different random sequences are used to generate _wetL(m,k) and _wetR(m,k), and the correlation between the two sequences may be close to zero.

The phase converter 660 may weight the phase information extracted by the phase information extractor 630 using the random sequence windowed by the windowing unit 650.

The combining unit 670 may combine the amplitude information extracted by the amplitude information extractor 620 with the phase information weighted by the phase converter 660.

The inverse complex transform unit 680 may compute the decorrelated signal by applying an inverse complex transform to the information combined by the combining unit 670.
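The decorrelator steps above can be sketched on complex spectral bins, assuming the complex transform (610) has already been applied and leaving the final inverse transform (680) out. Two choices here are assumptions. First, "weighting the phase" is interpreted as adding a window-scaled random phase offset. Second, the spectral window is modeled as a moving-average envelope of the magnitudes normalized to a peak of 1. `random_phase_sequence` stands in for the stored sequences, with one seed per downmix channel so that the left and right sequences differ.

```python
import cmath
import math
import random


def random_phase_sequence(n, seed):
    """Stand-in for a stored random sequence (one per downmix channel)."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) for _ in range(n)]


def spectral_window(mags, radius=2):
    """Hypothetical window model: moving-average envelope of the magnitudes,
    normalized so its peak is 1."""
    n = len(mags)
    env = []
    for k in range(n):
        lo, hi = max(0, k - radius), min(n, k + radius + 1)
        env.append(sum(mags[lo:hi]) / (hi - lo))
    peak = max(env, default=0.0) or 1.0  # silent input: avoid division by zero
    return [e / peak for e in env]


def decorrelate(bins, random_seq, radius=2):
    """Phase-randomizing decorrelator on complex bins; magnitudes are kept."""
    if len(random_seq) != len(bins):
        raise ValueError("need one random value per bin")
    mags = [abs(x) for x in bins]            # amplitude information (620)
    phases = [cmath.phase(x) for x in bins]  # phase information (630)
    window = spectral_window(mags, radius)   # window from the envelope (650)
    offsets = [w * r for w, r in zip(window, random_seq)]  # windowed sequence
    new_phases = [p + d for p, d in zip(phases, offsets)]  # phase weighting (660)
    return [cmath.rect(m, p) for m, p in zip(mags, new_phases)]  # combine (670)
```

For example, `decorrelate(left_bins, random_phase_sequence(len(left_bins), seed=1))` and the same call on the right channel with a different seed produce two wet signals. Each keeps its channel's magnitude spectrum while the phases are perturbed more strongly where the spectral envelope is high.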

FIG. 7 is a flowchart illustrating a multichannel audio decoding method based on sound source location cues according to an embodiment of the present invention.

In operation S710, the demultiplexer 210 may receive the signal transmitted by the multiplexer 150 and parse it into an audio bitstream and a side information bitstream.

In operation S720, the audio decoder 220 may restore a downmix signal based on the audio bitstream parsed in operation S710.

In operation S730, the Huffman decoder 410 may generate a differential index by Huffman decoding, with the Huffman codebook, the side information bitstream parsed in operation S710.

In operation S740, the inverse quantization unit 420 may restore the side information by dequantizing, with the inverse quantization table, the differential index generated in operation S730. Specifically, for each frame, the inverse quantization unit 420 maps each item of virtual sound location angle (VSLA) information to the quantization table for that VSLA type and dequantizes it.

In operation S750, the integrated upmixing unit 240 predicts a multichannel signal using the downmix signal restored in operation S720 and the side information restored in operation S740. It then generates an upmix signal by upmixing the downmix signal based on that multichannel signal.

In operation S760, the integrated filter bank windowing unit 250 applies the integrated filter bank to the upmix signal generated in operation S750 to obtain a time-domain signal, and windows the upmix signal to obtain the output signal.

As described above, the multichannel audio decoding apparatus and method based on sound source location cues operate in a scheme where a multichannel audio input is compressed and the resulting stereo signal is compressed and transmitted through a core stereo codec. This preserves backward compatibility with existing stereo audio codecs while also enabling multichannel audio transmission.

The present invention has been described above with reference to specific details such as particular components, limited embodiments, and drawings. These are provided only to aid a general understanding of the invention, and the invention is not limited to these embodiments. Those skilled in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be construed as limited to the described embodiments. The following claims, and everything equal or equivalent to them, fall within the scope of the spirit of the invention.


Claims

1. A multichannel audio decoding apparatus based on sound source location cues, comprising:

a demultiplexer that receives a signal and parses the received signal into an audio bitstream and a side information bitstream;

an audio decoder that restores a downmix signal based on the audio bitstream and passes the restored downmix signal to an integrated upmixing unit without applying an analysis filter bank to it;

an integrated upmixing unit that predicts a multichannel signal using the downmix signal and the side information bitstream, and generates an upmix signal by upmixing the downmix signal based on the multichannel signal; and

an integrated filter bank windowing unit that extracts a time-domain signal by applying an integrated filter bank to the upmix signal, and extracts an output signal by windowing the upmix signal.

2. The apparatus of claim 1,

wherein the audio decoder comprises a stereo core codec that uses a time domain aliasing cancellation (TDAC) filter bank.

3. The apparatus of claim 1,

wherein the integrated filter bank windowing unit windows the upmix signal so that the windowing matches the analysis windowing of the core stereo audio codec.

4. The apparatus of claim 1,

wherein the integrated upmixing unit separates amplitude information and phase information from the downmix signal, weights the phase information using a pre-stored random sequence windowed based on the amplitude information, and predicts the multichannel signal based on the amplitude information and the weighted phase information.

5. The apparatus of claim 4,

wherein the integrated upmixing unit applies a complex transform to the downmix signal and separates the amplitude information and the phase information from the complex-transformed downmix signal.

6. The apparatus of claim 4,

wherein the integrated upmixing unit models a spectral window for correcting the phase information based on the envelope of the amplitude information, windows the pre-stored random sequence with the spectral window, and weights the phase information using the windowed random sequence.

7. The apparatus of claim 4,

wherein the integrated upmixing unit combines the amplitude information and the weighted phase information, and predicts the multichannel signal by applying an inverse complex transform to the combined information.

8. A multichannel audio decoding method based on sound source location cues, comprising:

parsing a received signal into an audio bitstream and a side information bitstream;

restoring a downmix signal based on the audio bitstream without applying an analysis filter bank;

generating a differential index by Huffman decoding the side information bitstream;

restoring side information by dequantizing the differential index with a quantization table;

predicting a multichannel signal using the downmix signal and the side information;

generating an upmix signal by upmixing the downmix signal based on the multichannel signal;

extracting a time-domain signal by applying an integrated filter bank to the upmix signal; and

windowing the upmix signal to extract an output signal.

9. The method of claim 8,

wherein generating the upmix signal comprises:

separating amplitude information and phase information from the downmix signal;

weighting the phase information by windowing a pre-stored random sequence based on the amplitude information;

predicting a multichannel signal based on the amplitude information and the weighted phase information; and

upmixing the downmix signal based on the multichannel signal.

10. The method of claim 9,

wherein the separating comprises:

applying a complex transform to the downmix signal; and

separating the amplitude information and the phase information from the complex-transformed downmix signal.

11. The method of claim 9,

wherein the weighting comprises:

modeling a spectral window for correcting the phase information based on the envelope of the amplitude information;

windowing the pre-stored random sequence with the spectral window; and

weighting the phase information using the windowed random sequence.

12. The method of claim 9,

wherein predicting the multichannel signal comprises:

combining the amplitude information with the weighted phase information; and

predicting the multichannel signal by applying an inverse complex transform to the information combined in the combining step.