KR20100090962A

KR20100090962A - Multi-channel audio decoder, transceiver comprising the same decoder, and method for decoding multi-channel audio

Info

Publication number: KR20100090962A
Application number: KR1020090010213A
Authority: KR
Inventors: 서상원; 김헌중
Original assignee: 주식회사 코아로직
Priority date: 2009-02-09
Filing date: 2009-02-09
Publication date: 2010-08-18

Abstract

PURPOSE: A multichannel audio decoder is provided that reduces the amount of computation in the decoder of a multichannel audio system using transform coding, thereby improving the decoder in terms of speed and memory size. CONSTITUTION: A frequency domain signal generator (310) generates a frequency-domain data signal from an encoded bitstream data signal. A down mixer down-mixes the data signal in the frequency domain. An IFFT (Inverse Fast Fourier Transform) unit (350) transforms the down-mixed data signal into a time-domain data signal. A window unit (370) generates a PCM (Pulse Code Modulation) output signal through an overlap-add operation.

Description

Multi-channel audio decoder, transceiver including the decoder and multi-channel audio decoding method {Multi-channel audio decoder, transceiver comprising the same decoder, and method for decoding multi-channel audio}

The present invention relates to signal processing, and in particular to multichannel audio decoding and a decoding method using a transform coding scheme.

In general, a digital audio coder, or codec, is a device that converts an analog audio signal into digital form and back again. The conversion from analog to digital takes place in the encoder stage of the audio coder. Once the signal is represented as a series of numbers, it can be stored, processed, and transmitted, and the sound can be played back later. Playback requires converting the digital signal back into an analog signal that the human ear can hear. This inverse conversion from digital to analog takes place in the decoder stage of the coder.

The advantage of digital audio over analog is that the signal can be processed with almost no change in sound quality. The introduction of digital audio has therefore made it possible to hear the original sound with almost no loss and to transmit it elsewhere. In addition, digital audio is highly compatible with other media such as film and computers, so its range of applications continues to expand.

On the other hand, a high-quality digital audio signal obtained by sampling an analog audio signal contains a very large amount of data, so it requires a storage medium of enormous capacity and occupies a very wide frequency band when transmitted. A method of compressing the audio signal effectively without loss of sound quality is therefore needed. The main coding technique for audio signals today uses time-to-frequency mapping to exploit signal characteristics in the frequency domain and thereby raise the compression ratio. Most audio signals are better described as a function of frequency than of time, because our perception of sound depends mainly on its tonal components and because sound can be represented far more compactly in frequency than in time.

A multichannel audio decoder for audio coded with such a time-frequency transform generally converts the frequency-domain signal of each input channel into a time-domain signal and then down-mixes the result to match the number of output channels. The main purpose of such multichannel coding is to reduce the data error rate by removing inter-channel redundancy and irrelevance in the spatial representation of the signal, while preserving the original spatial properties and basic sound quality of the multichannel audio signal.

Today, thanks to advances in digital technology and the continuing growth of transmission bandwidth and storage capacity, multichannel audio has become a realistic option for a wide range of audio playback systems. However, the listener of an audio recording or film mix may have an audio system that supports fewer channels than the number in which the recording or mix was produced, so the multichannel signal must be down-mixed to the number of supported channels. The current down-mixing method requires as many time-frequency inverse transforms as there are input channels before down-mixing can take place, so it needs a large amount of computation and consequently consumes a large amount of memory.

The problem to be solved by the present invention is to provide a multichannel audio decoder, a transceiver including the decoder, and a multichannel audio decoding method that perform down-mixing in advance according to the number of output channels and then apply the time-frequency inverse transform to the mixed coefficients to generate the time-domain signal, thereby reducing both the amount of computation in the transform process and the use of memory.

To solve the above problem, the present invention provides a multichannel audio decoder comprising: a frequency domain signal generator that generates a frequency-domain data signal from an encoded bitstream data signal; a down mixer that down-mixes the frequency-domain data signal in the frequency domain according to the number of output channels; an IFFT (Inverse Fast Fourier Transform) unit that converts the down-mixed frequency-domain data signal into a time-domain data signal; and a window unit that generates a PCM (Pulse Code Modulation) output signal by performing windowing and overlap-add operations on the time-domain data signal.

In the present invention, the IFFT unit may perform the time-frequency inverse transform as many times as there are output channels. For example, when the frequency-domain data signal is input to the down mixer through M input channels, the IFFT unit may convert the frequency-domain data signal, down-mixed by the down mixer into N output channels with N smaller than M, into the time-domain data signal.

The frequency domain signal generator generates the frequency-domain data signal according to each frequency band of the input channels. The multichannel audio decoder of the present invention may further include a block switching unit that determines whether the block sizes of the input channels are the same and accordingly routes the frequency-domain data signal to either the down mixer or the IFFT unit. When the block sizes are the same, the frequency-domain data signal is input to the frequency domain down mixer, down-mixed, and then converted into a time-domain data signal by the IFFT unit. When the block sizes differ, the frequency-domain data signal is input to the IFFT unit, converted into a time-domain data signal, and then down-mixed in the time domain by the down mixer.

To achieve the above object, the present invention also provides a transceiver comprising: a transceiver unit that transmits and receives an encoded bitstream; a multichannel audio encoder that encodes a PCM (Pulse Code Modulation) signal into a bitstream; a multichannel audio decoder that has a frequency domain signal generator, a down mixer, an IFFT unit, and a window unit and decodes the bitstream into the PCM signal; an input/output unit that inputs a voice signal to the multichannel audio encoder or outputs the PCM signal from the multichannel audio decoder; and a controller that controls the transceiver unit, the multichannel encoder and decoder, and the input/output unit, wherein the down mixer of the multichannel audio decoder performs down-mixing in the frequency domain.

To achieve the above object, the present invention further provides a multichannel audio decoding method comprising: generating a frequency-domain data signal from an encoded bitstream data signal; down-mixing the frequency-domain data signal in the frequency domain according to the number of output channels; converting the down-mixed frequency-domain data signal into a time-domain data signal through an IFFT; and outputting a PCM signal by performing windowing and overlap-add operations on the time-domain data signal.

In the present invention, when there are M input channels and N output channels, the down-mixing step may perform down-mixing in the frequency domain or the time domain using an N×M down-mix matrix G(N,M). Further, when the data signal of the input channels in the frequency domain is expressed as an M×1 column matrix F_M(x) and the data signal of the output channels in the frequency domain is expressed as an N×1 column matrix F_N'(x), the down-mixing step multiplies G(N,M) by the column matrix F_M(x) to obtain the column matrix F_N'(x), and the converting step performs an IFFT on F_N'(x) to generate a 1×N column matrix T_N'(x), which is the data signal of the output channels in the time domain.
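The matrix form of the down-mixing step can be illustrated with a minimal sketch. The gain values in `G`, the four-bin spectra in `F_M`, and the function names are hypothetical and chosen only for illustration; a naive inverse DFT stands in for the IFFT unit, which is an assumption of this sketch and not a statement of the transform the decoder actually uses.

```python
import cmath

def downmix(G, F):
    """Multiply the N x M down-mix matrix G by the M-channel signal F.

    F[m] holds the coefficients of input channel m; the result has one
    coefficient list per output channel.
    """
    length = len(F[0])
    return [[sum(G[n][m] * F[m][k] for m in range(len(F)))
             for k in range(length)]
            for n in range(len(G))]

def idft(X):
    """Naive inverse DFT, used here only as a stand-in for the IFFT unit."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / K) for k in range(K)) / K
            for t in range(K)]

# Hypothetical example: M = 3 input channels, N = 2 output channels.
G = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
F_M = [[1, 0, 0, 0],
       [0, 1, 0, 0],
       [2, 0, 0, 0]]

F_N = downmix(G, F_M)              # frequency-domain output channels F_N'(x)
T_N = [idft(ch) for ch in F_N]     # only N = 2 inverse transforms are needed
```

In this sketch the inverse transform runs once per output channel, so its cost scales with N and not with M.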

To achieve the above object, the present invention also provides a multichannel audio decoding method comprising: generating a frequency-domain data signal from an encoded bitstream data signal according to each frequency band of M input channels; determining whether the block sizes of the input channels are the same; executing down-mixing and the IFFT in an order that depends on whether the block sizes are the same; and outputting a PCM signal by performing windowing and overlap-add operations on the time-domain data signal.

The multichannel audio decoder, the transceiver including the decoder, and the multichannel audio decoding method according to the present invention reduce the number of time-frequency inverse transforms in a multichannel audio system that uses transform coding, and thereby reduce the amount of computation in the decoder while producing the same output as before. A decoder with improved performance in terms of speed and memory size can therefore be implemented, and the total number of operating clock cycles can be reduced, which enables a low-power decoder design.

The multichannel audio decoder and decoding method of the present invention are applicable to any other audio or voice decoder that decodes by transforming from the frequency domain to the time domain and that down-mixes data having M channels into N (N<M) channels for output.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In the following description, when a component is described as being connected to another component, it may be connected directly to that component, or a third component may be interposed between them. In the drawings, the structure and size of each component are exaggerated for convenience and clarity of explanation, and parts unrelated to the description are omitted. Like reference numerals refer to like elements throughout the drawings. The terms used herein serve only to describe the present invention and are not intended to limit their meaning or to limit the scope of the invention set forth in the claims.

FIG. 1 is a block diagram of a transceiver according to an embodiment of the present invention.

Referring to FIG. 1, the transceiver of the present invention includes a transceiver unit (100), a multichannel audio encoder (200), a multichannel audio decoder (300), an input/output unit (400), and a controller (500).

The transceiver unit (100) transmits and receives the bitstream encoded by the encoder. The multichannel audio encoder (200) encodes an audio PCM (Pulse Code Modulation) data signal into a bitstream for transmission. That is, it applies windowing to the audio PCM data signal, converts the result into a frequency-domain data signal through a time-frequency transform (IFFT), and then compresses it into a bitstream.

Windowing means applying a window function to generate a windowed data signal; four window types are generally used to accommodate variable time segmentation. Here, the term windowing is used in a sense that also includes the overlap-add operation that maintains signal continuity.

Current compression techniques use a model of the human perceptual system to remove information that is irrelevant to the source signal. For example, compression in the multichannel audio encoder of this embodiment performs appropriate quantization and coding of the frequency lines to generate an output signal in a prescribed bitstream format, and this quantization and coding uses a bit-allocation algorithm steered by a psycho-acoustic model. Because the multichannel audio encoder is not central to the main features of the present invention, a detailed description of it is omitted.

The multichannel audio decoder (300) decodes the bitstream received through the transceiver unit (100) into the original PCM signal. The multichannel audio decoder (300) includes a frequency domain signal generator, a down mixer, an IFFT unit, and a window unit, and generates the PCM signal by processing the received bitstream signal in the reverse order of the multichannel audio encoder (200). The multichannel audio decoder (300) of this embodiment includes a down mixer that down-mixes the input channels to the output channels to match the number of channels of the user's audio output system, and this down mixer is characterized in that it performs down-mixing in the frequency domain. The multichannel audio decoder (300) is described in more detail in the description of FIGS. 2 and 3.

The input/output unit (400) inputs a voice signal to the multichannel audio encoder (200) or outputs the PCM signal from the multichannel audio decoder (300). The controller (500) provides overall control of each component of the transceiver, namely the transceiver unit (100), the multichannel encoder and decoder (200, 300), and the input/output unit (400). Although not shown, the transceiver of this embodiment may of course include a storage unit that stores various data signals such as video and audio data signals.

By performing down-mixing in the frequency domain in a multichannel audio system that uses transform coding, the transceiver of this embodiment can reduce the number of time-frequency transforms and therefore the amount of computation in the decoder, while producing the same output as before. The reduced computation improves the audio decoder in terms of speed and memory size, and the total number of operating clock cycles can be reduced, which enables a low-power decoder design and in turn improves the size and reception performance of the transceiver.

FIG. 2 is a block diagram showing the multichannel audio decoder of FIG. 1 in more detail.

Referring to FIG. 2, the multichannel audio decoder of this embodiment includes a frequency domain signal generator (310), a frequency domain down mixer (330), an IFFT (Inverse Fast Fourier Transform) unit (350), and a window unit (370).

The frequency domain signal generator (310) generates a frequency-domain data signal from the encoded bitstream data signal. It does so by performing decoding and inverse quantization, which correspond to the quantization and coding of the multichannel audio encoder described above, and the frequency-domain data signal is extracted according to each frequency band of the input channels. The frequency domain down mixer (330) down-mixes the frequency-domain data signal in the frequency domain according to the number of output channels. Frequency-domain down-mixing is described in more detail, in comparison with time-domain down-mixing, in the description of FIGS. 4 and 5.

The IFFT unit (350) converts the down-mixed frequency-domain data signal into a time-domain data signal through the time-frequency inverse transform. The window unit (370) applies a window function to the time-domain data signal and outputs a PCM signal. As described above for the multichannel audio encoder, the window unit (370) also performs the overlap-add operation.

By performing down-mixing in the frequency domain in advance, the multichannel audio decoder of this embodiment can reduce the number of time-frequency transforms and therefore the amount of computation in the decoder, while producing the same output as before.

FIG. 3 is a block diagram showing a modification of the multichannel audio decoder of FIG. 2.

Referring to FIG. 3, the multichannel audio decoder of this embodiment is similar to the multichannel audio decoder of FIG. 2 but further includes a block switching unit (320) after the frequency domain signal generator (310).

The block switching unit (320) determines whether the block sizes of the input channels of the data signal received from the frequency domain signal generator (310) are the same, and thereby decides whether to perform frequency-domain or time-domain down-mixing. When the block sizes of the input channels are the same, down-mixing is performed first in the frequency domain down mixer (330) to reduce the number of time-frequency inverse transforms, and the time-frequency inverse transform is then performed by the IFFT unit (350). When the block sizes of the input channels differ, the time-frequency inverse transform is performed first by the IFFT unit (350), and down-mixing is then performed by the time domain down mixer (330a). Although the frequency domain down mixer (330) and the time domain down mixer (330a) are given different reference numerals, a single down mixer may of course be used to perform either frequency-domain or time-domain down-mixing selectively.
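The routing decision of the block switching unit (320) can be sketched as follows. The function name `plan_decode`, the stage labels, and the representation of block sizes as a list of integers are assumptions made for illustration only; the source does not specify how block sizes are signalled.

```python
def plan_decode(block_sizes, num_outputs):
    """Pick the processing order and count the inverse transforms required.

    block_sizes: one (hypothetical) block size per input channel.
    num_outputs: number of output channels N.
    Returns (stage order, number of inverse transforms).
    """
    all_same = len(set(block_sizes)) == 1
    if all_same:
        # Same block size: down-mix in the frequency domain first,
        # then run one inverse transform per output channel.
        return ("downmix", "ifft"), num_outputs
    # Different block sizes: transform every input channel first,
    # then down-mix in the time domain.
    return ("ifft", "downmix"), len(block_sizes)

# Six input channels down-mixed to two outputs.
same_plan = plan_decode([512] * 6, 2)
mixed_plan = plan_decode([512, 256, 512, 512, 512, 512], 2)
```

With equal block sizes the sketch reports two inverse transforms; with one channel using a different block size it falls back to six.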

When the block sizes of the input channels differ, the time-frequency inverse transform performed by the IFFT unit (350) differs from channel to channel, so the result of time-domain down-mixing after the inverse transform differs from the result of the inverse transform after frequency-domain down-mixing, and frequency-domain down-mixing cannot be performed first. When the block sizes of the input channels are the same, however, the two results are equal, so it is advantageous to perform frequency-domain down-mixing first to reduce the number of time-frequency inverse transforms. FIGS. 4 and 5 are used below to explain that, when the block sizes of the input channels are the same, time-domain down-mixing after the inverse transform gives the same result as the inverse transform after frequency-domain down-mixing.

FIGS. 4 and 5 are block diagrams showing the time-domain down-mixing process and the frequency-domain down-mixing process in the multichannel audio decoding method of the present invention. FIG. 4 shows down-mixing performed after the time-frequency inverse transform, and FIG. 5 shows the time-frequency inverse transform performed after down-mixing. The block sizes of the input channels are assumed to be the same, and FIGS. 2 and 3 are referred to as well for ease of understanding.

Referring to FIG. 4, after the frequency domain signal generator (310) extracts the frequency-domain data signal through bit allocation (inverse quantization) and decoding, the IFFT unit (350) performs the time-frequency inverse transform. The frequency-domain data signal is extracted according to each frequency band of the input channels of the down mixer. The IFFT unit (350) performs the time-frequency inverse transform channel by channel, so with M input channels it runs M loops of the inverse transform. The inverse-transformed signals are input to the M input channels of the down mixer (330a) and down-mixed. For example, when there are N (N<M) output channels, the data signals of the M input channels are down-mixed into the data signals of the N output channels. The down-mixed signal is then windowed by the window unit (370) and converted into a PCM signal.

Referring to FIG. 5, after the frequency domain signal generator (310) extracts the frequency-domain data signal through bit allocation and decoding, the down mixer (330) performs down-mixing in the frequency domain. That is, when the down mixer (330) has M input channels and N output channels, the frequency-domain data signals corresponding to the M channels are down-mixed by the down mixer (330) into frequency-domain data signals corresponding to the N channels. The IFFT unit (350) then performs the time-frequency inverse transform on the down-mixed frequency-domain data signal. Because the incoming frequency-domain data signal has N channels, only N loops of the inverse transform are needed to convert it into the time-domain data signal. The time-domain data signal is likewise windowed by the window unit (370) and converted into a PCM signal.

Comparing FIGS. 4 and 5 shows that performing the time-frequency inverse transform after frequency-domain down-mixing reduces the number of inverse transforms. With time-domain down-mixing, the inverse transform must be performed M times, once per input channel, whereas with frequency-domain down-mixing it needs to be performed only N times, once per output channel. The number of time-frequency inverse transforms is thus reduced by the difference between the number of input channels and the number of output channels.

Quantitatively, the reduction in computation in the IFFT unit 350 can be expressed by the following equations.

$$\text{IFFT computation per channel} = Q\ \text{MIPS} \tag{1}$$

$$\text{IFFT computation with time-domain down-mixing} = M \cdot Q\ \text{MIPS} \tag{2}$$

$$\text{IFFT computation with frequency-domain down-mixing} = N \cdot Q\ \text{MIPS} \tag{3}$$

$$\text{Reduction in IFFT computation} = (M - N) \cdot Q\ \text{MIPS}, \quad \text{where } N < M \tag{4}$$

For example, when a bitstream data signal encoded in the widely used 5.1-channel format (corresponding to six channels) is down-mixed to two channels using frequency-domain down-mixing, the equations above give a reduction of 4 * Q MIPS. Given that the inverse time-frequency transform typically accounts for about 30% of the total computation in a decoder based on transform coding, this is a substantial saving.
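The arithmetic behind this example can be sketched in a few lines of Python. The function name `ifft_savings` and its parameters are illustrative and do not appear in the source. The sketch also assumes that the roughly 30% share refers to a decoder using time-domain down-mixing (M transforms) and that every channel costs the same Q per transform.

```python
def ifft_savings(m_inputs, n_outputs, ifft_share=0.30):
    """Return (transforms saved, fraction of total decoder load saved).

    Assumptions (not stated in the source): ifft_share is the IFFT share of
    the total load when all m_inputs channels are transformed, and the cost
    Q of one transform is the same for every channel.
    """
    if not 0 < n_outputs < m_inputs:
        raise ValueError("requires 0 < N < M")
    saved_transforms = m_inputs - n_outputs  # Equation (4), in units of Q
    saved_fraction = ifft_share * saved_transforms / m_inputs
    return saved_transforms, saved_fraction


# 5.1 channels (six inputs) down-mixed to stereo (two outputs).
saved, fraction = ifft_savings(6, 2)
print(f"{saved} * Q MIPS saved, about {fraction:.0%} of the total decoder load")
```

Under these assumptions, four of the six transforms are avoided, which corresponds to roughly two thirds of the IFFT share of the load.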

To substitute frequency-domain down-mixing for time-domain down-mixing, the final results must be guaranteed to be identical. This equivalence follows from the linearity property of the time-frequency transform, as described below.

First, when the signals of M input channels are down-mixed in the time domain into the signals of N output channels, the relationship between the input channel signals and the output channel signals can be expressed in matrix form as Equation (5).

$$\mathbf{f}'(x) = G\,\mathbf{f}(x) \tag{5}$$

As expressed in Equation (5), down-mixing the time-domain signals $f_1(x), f_2(x), \ldots, f_m(x)$ of the m input channels with an arbitrary n × m downmix matrix G gives the result in Equation (6).

Here,

$n < m$,

$\mathbf{f}(x) = \{f_1(x), f_2(x), \ldots, f_m(x)\}$: signals of the m input channels,

$\mathbf{f}'(x) = \{f_1'(x), f_2'(x), \ldots, f_n'(x)\}$: signals of the n output channels,

$G = \{\{A_{11}, B_{12}, \ldots, K_{1m}\}, \{A_{21}, B_{22}, \ldots, K_{2m}\}, \ldots, \{A_{n1}, B_{n2}, \ldots, K_{nm}\}\}$: downmix coefficients, each entry being the coefficient of one input channel (column index) for one output channel (row index).

$$
\begin{aligned}
f_1'(x) &= A_{11} f_1(x) + B_{12} f_2(x) + \cdots + K_{1m} f_m(x), \\
f_2'(x) &= A_{21} f_1(x) + B_{22} f_2(x) + \cdots + K_{2m} f_m(x), \\
&\;\;\vdots \\
f_n'(x) &= A_{n1} f_1(x) + B_{n2} f_2(x) + \cdots + K_{nm} f_m(x)
\end{aligned}
\tag{6}
$$
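As a minimal sketch of Equation (6), the down-mix is a matrix product applied sample by sample. The function name `downmix`, the channel order, and the gain values below are illustrative assumptions. The source does not specify any particular coefficients.

```python
def downmix(gains, channels):
    """Apply an n x m gain matrix to m equal-length channels (Equation (6)).

    gains:    n rows, each with m coefficients (one per input channel).
    channels: m sequences of samples, all of the same length.
    Returns n output channels as lists of samples.
    """
    m = len(channels)
    if any(len(row) != m for row in gains):
        raise ValueError("every row of G needs one gain per input channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all input channels must have the same length")
    return [
        [sum(g * ch[i] for g, ch in zip(row, channels)) for i in range(length)]
        for row in gains
    ]


# Hypothetical 5.1 -> stereo gains, channel order L, R, C, LFE, Ls, Rs.
# These values are for illustration only and are not taken from the source.
G = [
    [1.0, 0.0, 0.707, 0.0, 0.707, 0.0],
    [0.0, 1.0, 0.707, 0.0, 0.0, 0.707],
]
inputs = [[1.0, 2.0], [3.0, 4.0], [1.0, 1.0], [9.0, 9.0], [2.0, 0.0], [0.0, 2.0]]
left, right = downmix(G, inputs)
print(left, right)
```

Because the function only multiplies and adds, the same code applies unchanged to frequency-domain coefficients, which is the case written out in Equation (7).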

Down-mixing with these same coefficients in the frequency domain gives the result in Equation (7).

$$
\begin{aligned}
F_1'(X) &= A_{11} F_1(X) + B_{12} F_2(X) + \cdots + K_{1m} F_m(X), \\
F_2'(X) &= A_{21} F_1(X) + B_{22} F_2(X) + \cdots + K_{2m} F_m(X), \\
&\;\;\vdots \\
F_n'(X) &= A_{n1} F_1(X) + B_{n2} F_2(X) + \cdots + K_{nm} F_m(X)
\end{aligned}
\tag{7}
$$

Let $x_1(x)$ and $x_2(x)$ be two different time-domain signals, and let $X_1(X)$ and $X_2(X)$ be the frequency-domain signals obtained by the time-frequency transform of those signals, as in Equations (8) and (9). Then Equation (10) holds by the linearity property of the time-frequency transform.

$$x_1(x) \leftrightarrow X_1(X) \tag{8}$$

$$x_2(x) \leftrightarrow X_2(X) \tag{9}$$

$$A\,x_1(x) + B\,x_2(x) \leftrightarrow A\,X_1(X) + B\,X_2(X) \tag{10}$$

By the same principle, because the time-frequency transform is linear, applying the inverse time-frequency transform to Equation (7) yields Equation (6). It follows that the final output signal produced with frequency-domain down-mixing is identical to the final output signal produced with time-domain down-mixing.
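The equivalence can be checked numerically. The sketch below uses a naive inverse DFT as a stand-in for the IFFT unit and compares the two processing orders on small made-up spectra. The names `idft` and `downmix`, the test data, and the gain matrix are all illustrative assumptions rather than parts of the described decoder.

```python
import cmath


def idft(spectrum):
    """Naive inverse DFT, O(K^2); a stand-in for the IFFT unit."""
    k_len = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / k_len)
            for k in range(k_len)) / k_len
        for t in range(k_len)
    ]


def downmix(gains, channels):
    """Apply an n x m gain matrix to m equal-length channels."""
    return [
        [sum(g * ch[i] for g, ch in zip(row, channels))
         for i in range(len(channels[0]))]
        for row in gains
    ]


# Three input channels (M = 3), four frequency bins each, mixed to N = 2.
spectra = [
    [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j],
    [0 + 1j, 1 + 1j, 2 + 0j, 0 - 2j],
    [3 + 0j, 0 + 0j, -1 + 1j, 1 + 1j],
]
G = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

# Frequency-domain down-mix first: only N = 2 inverse transforms.
freq_first = [idft(ch) for ch in downmix(G, spectra)]
# Inverse transform first: M = 3 inverse transforms, then time-domain down-mix.
time_first = downmix(G, [idft(ch) for ch in spectra])

max_error = max(
    abs(a - b)
    for out_f, out_t in zip(freq_first, time_first)
    for a, b in zip(out_f, out_t)
)
print("largest difference between the two orders:", max_error)
```

The two results differ only by floating-point rounding, which is what Equation (10) predicts for any linear transform applied to channels of equal length.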

FIG. 6 is a flowchart illustrating a multichannel audio decoding method according to another embodiment of the present invention.

Referring to FIG. 6, in the multichannel audio decoding method of this embodiment, an encoded bitstream is first input to the multichannel decoder (S110). Next, a frequency-domain data signal is generated from the encoded bitstream data signal through decoding, inverse quantization, and related processes (S130). The frequency-domain data signal is then down-mixed in the frequency domain to match the number of output channels (S150). The down-mixed frequency-domain data signal undergoes the inverse time-frequency transform, that is, the IFFT, and is converted into a time-domain data signal (S170). Finally, the time-domain data signal is converted into a PCM signal through windowing and output (S190).

The multichannel audio decoding method of this embodiment performs down-mixing in the frequency domain before the inverse time-frequency transform, which reduces the number of inverse transforms later performed in the IFFT unit. That is, with M input channels and N output channels, the number of inverse time-frequency transforms is reduced by M - N. Given that the inverse time-frequency transform accounts for about 30% of the total computation in a multichannel decoder, this is a substantial reduction.

FIG. 7 is a flowchart illustrating a multichannel audio decoding method according to yet another embodiment of the present invention.

Audio signals can be broadly divided into stationary signals and non-stationary signals. Because the two have different characteristics, they are also coded differently. Block switching can be used to code stationary and non-stationary signals separately: by coding them with different block sizes, block switching allows the characteristics of the signal to be captured more accurately. A stationary signal is one whose characteristics do not change over time and is coded with a constant block size, whereas a non-stationary signal is one whose characteristics change over time and is coded with a block size adjusted to the situation. Consequently, for non-stationary signals the block size may differ from channel to channel.

Referring to FIG. 7, in the multichannel audio decoding method of this embodiment, as in FIG. 6, an encoded bitstream is input to the multichannel decoder (S210), and a frequency-domain data signal is generated from the encoded bitstream data signal through decoding, inverse quantization, and related processes (S230). The block switching unit then determines whether the block sizes of all input channels are the same (S230).

If the block sizes of the input channels are the same, the process of FIG. 6 is performed. That is, frequency-domain down-mixing is performed (S270), then the inverse time-frequency transform is performed (S250), and finally a PCM signal is output through windowing (S290).

On the other hand, when the block sizes of the input channels differ, the inverse time-frequency transform in the IFFT unit differs from channel to channel, as noted above. The result of time-domain down-mixing after the inverse transform then no longer matches the result of the inverse transform after frequency-domain down-mixing, so frequency-domain down-mixing cannot be performed first. Therefore, when the block sizes of the input channels differ, unlike in FIG. 6, the inverse time-frequency transform is performed first (S250), and time-domain down-mixing is performed afterward (S270a). Finally, a PCM signal is output through windowing (S290).

The multichannel audio decoding method of this embodiment takes the block sizes of the input channels into account and selects the decoding order according to whether those block sizes are the same, so that a wide variety of audio signals can be decoded efficiently.
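The selection between the two orders can be sketched as a small decision function. The name `plan_decoding`, the returned labels, and the block sizes 512 and 256 are hypothetical. The source only states that the order depends on whether all input channels share one block size.

```python
def plan_decoding(block_sizes, n_outputs):
    """Choose the down-mix/IFFT order for one frame.

    block_sizes: transform block size of each of the M input channels.
    n_outputs:   number of output channels N.
    Returns (order label, number of inverse transforms required).
    """
    m_inputs = len(block_sizes)
    if not 0 < n_outputs <= m_inputs:
        raise ValueError("requires 0 < N <= M")
    if len(set(block_sizes)) == 1:
        # Equal block sizes: coefficients line up bin by bin, so the
        # down-mix can run first and only N transforms are needed.
        return "downmix_then_ifft", n_outputs
    # Mixed block sizes: transform every input channel first, then
    # down-mix the time-domain signals.
    return "ifft_then_downmix", m_inputs


# Hypothetical block sizes for a six-channel frame down-mixed to stereo.
print(plan_decoding([512] * 6, 2))
print(plan_decoding([512, 256, 512, 512, 512, 512], 2))
```

In this sketch the saving of M - N transforms is available only on frames where every channel uses the same block size. Frames with mixed block sizes fall back to M transforms.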

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

FIGS. 4 and 5 are block diagrams showing the time-domain down-mixing process and the frequency-domain down-mixing process, respectively, in the multichannel audio decoding method of the present invention.

FIG. 7 is a flowchart illustrating a multichannel audio decoding method according to yet another embodiment of the present invention.

Claims

A frequency domain signal generator configured to generate a frequency domain data signal from the encoded bitstream data signal;

A down mixer for down-mixing the data signal in the frequency domain according to the number of output channels;

An Inverse Fast Fourier Transform (IFFT) unit for converting the down-mixed frequency-domain data signal into a time-domain data signal; and

A window unit configured to generate a pulse code modulation (PCM) output signal by performing windowing and overlap-add operations on the time-domain data signal.

According to claim 1,

Wherein the IFFT unit performs the inverse time-frequency transform as many times as there are output channels.

According to claim 1,

The frequency domain signal generator generates a data signal of the frequency domain according to each frequency band of input channels.

And a block switching unit configured to determine whether the block sizes of the input channels are the same and to input the data signal in the frequency domain to the down mixer or the IFFT unit.

The method of claim 3,

If the block sizes are the same, the data signal of the frequency domain is input to the frequency domain down mixer, down-mixed, and then converted into a time domain data signal through the IFFT unit.

If the block sizes are different, the frequency-domain data signal is input to the IFFT unit and converted into a time-domain data signal, and is then down-mixed in the time domain by the down mixer.

According to claim 1,

The down mixer has M input channels and N output channels,

And performs down-mixing in the frequency domain or the time domain using an N × M downmix matrix of gain values.

6. The method of claim 5,

The time-frequency inverse transform performed by the IFFT unit has linearity,

The result of down-mixing in the frequency domain using the downmix matrix and then performing the inverse time-frequency transform is the same as the result of performing the inverse time-frequency transform and then down-mixing in the time domain using the downmix matrix.

A transceiver for transmitting and receiving an encoded bitstream;

A multichannel audio encoder for encoding Pulse Code Modulation (PCM) into a bitstream;

A multi-channel audio decoder having a frequency domain signal generator, a down mixer, an IFFT unit, and a window unit to decode the PCM signal;

An input / output unit for inputting a voice signal to the multichannel audio encoder or outputting a PCM signal from the multichannel audio decoder; And

And a controller for controlling the transceiver, the multichannel encoder, the decoder, and the input / output unit.

And the down mixer of the multichannel audio decoder performs down-mixing in a frequency domain.

The method of claim 7, wherein

The frequency domain signal generator generates a frequency domain data signal from the encoded bitstream data signal,

The down mixer performs down-mixing of the data signal in the frequency domain in the frequency domain according to the number of output channels,

The IFFT unit converts the down-mixed data signal of the frequency domain into a data signal of the time domain, and performs time-frequency inverse conversion by the number of output channels.

And the window unit generates a pulse code modulation (PCM) output signal by performing windowing and overlap add operations on the data signal in the time domain.

The method of claim 7, wherein

The data signal of the frequency domain is input to the down mixer through M input channels,

And the IFFT unit converts the data signal in the frequency domain down-mixed into N output channels smaller than M by the down mixer into a data signal in the time domain.

The method of claim 7, wherein

And a block switching unit which determines whether the block sizes of the input channels are the same, and inputs the data signal in the frequency domain to the down mixer or the IFFT unit.

Generating a data signal in a frequency domain from the encoded bitstream data signal;

Down-mixing the frequency-domain data signal according to the number of output channels;

Converting the down-mixed data signal in the frequency domain into a time domain data signal through an IFFT; And

Outputting a PCM signal by performing windowing and overlap-add operations on the data signal in the time domain.

12. The method of claim 11,

Wherein, in the converting step, the inverse time-frequency transform is performed as many times as there are output channels.

12. The method of claim 11,

In the down-mixing step, when there are M input channels and N output channels,

Down-mixing in the frequency domain or the time domain is performed using an N × M downmix matrix of gain values.

The method of claim 13,

When the data signal of the input channels in the frequency domain is represented by the M × 1 column matrix $F_M(x)$, and the data signal of the output channels in the frequency domain is represented by the N × 1 column matrix $F_N'(x)$,

In the down-mixing step, G(N, M) is multiplied by the $F_M(x)$ column matrix to calculate the $F_N'(x)$ column matrix,

In the converting step, the IFFT is performed on $F_N'(x)$ to generate the 1 × N column matrix $T_N'(x)$, which is the data signal of the output channels in the time domain.

15. The method of claim 14,

Wherein the $T_N'(x)$ column matrix is equal to the value representing the time-domain output channel data signal obtained by performing the IFFT on the frequency-domain data signal and then down-mixing in the time domain using G(N, M).

Generating a data signal in a frequency domain according to each frequency band of the M input channels from the encoded bitstream data signal;

Determining whether the block sizes of the input channels are the same;

Changing the order of down-mixing and performing IFFT according to whether the block size is the same; And

The method of claim 16,

If the block size is the same,

Down-mixing the frequency-domain data signal according to the number of output channels, and then converting the down-mixed frequency-domain data signal into a time-domain data signal through an IFFT,

If the block size is different,

Converting the frequency-domain data signal into a time-domain data signal through an IFFT, and then down-mixing the time-domain data signal according to the number of output channels.

18. The method of claim 17,

Wherein, when the block sizes are the same, the time-domain data signal converted through the IFFT

Is the same as the result of converting the frequency-domain data signal into a time-domain data signal through an IFFT and then down-mixing that time-domain data signal according to the number of output channels.