KR20090037806A

KR20090037806A - Encoding and decoding method using variable subband analysis and apparatus thereof

Info

Publication number: KR20090037806A
Application number: KR1020080095541A
Authority: KR
Inventors: 서정일; 백승권; 장인선; 강경옥; 홍진우; 김진웅; 안치득
Original assignee: 한국전자통신연구원
Priority date: 2007-10-12
Filing date: 2008-09-29
Publication date: 2009-04-16

Abstract

An encoding and decoding method using variable subband analysis, and an apparatus therefor, are provided to increase the number of subbands while minimizing the increase in bit rate, thereby outputting higher-quality sound. Audio objects (1 to M) are input to an audio encoding unit and a frequency conversion unit. The audio encoding unit downmixes the audio objects. The frequency conversion unit (202) converts the audio objects into the frequency domain. A subband configuration unit (203) subdivides the subbands of the frequency-converted signal into variable subbands. A parameter generation unit (205) extracts the parameters necessary for restoring the audio objects from the downmix signal. An encoding unit (206) encodes parameter information including the parameters generated by the parameter generation unit.

Description

Encoding and decoding method using variable subband analysis and apparatus therefor {Encoding and Decoding Method using Variable Subband Analysis and Apparatus}

The present invention relates to an encoding and decoding method and an apparatus therefor, and more particularly to an encoding and decoding method and apparatus using variable subband analysis.

The present invention is derived from research performed as part of the information and communication standard development support program of the Ministry of Information and Communication and the Institute for Information Technology Advancement. [Project management number: 2007-S-004-01, Project title: Development of Glassless Single-User 3D Broadcasting Technologies]

According to the prior art, multiple audio objects composed of various channels cannot be freely combined according to a user's needs, so a single piece of audio content cannot be consumed in various forms. As a result, the user can only consume audio content passively.

The MPEG audio subgroup, one of the standardization groups, has been developing audio coding standards such as AAC (Advanced Audio Coding) and MPEG Surround (MPS). AAC is a technique for high-quality audio coding of mono or stereo channel signals, and MPS is a technique suited to multi-channel audio coding. Like these techniques, conventional audio coders have focused on channel-based audio signals. Spatial Audio Coding (SAC), a recently introduced spatial-cue-based audio coding method, is one example.

According to the conventional Spatial Audio Coding (SAC) technique, a multichannel audio signal is encoded into a downmixed mono or stereo channel signal plus spatial cue information, so that a high-quality multichannel signal can be transmitted even at a low bit rate. In SAC, the audio signal is analyzed per subband, and the original multichannel audio signal is restored from the downmixed mono or stereo channel signal based on the spatial cue information corresponding to each subband. The spatial cue information contains the information needed to restore the original signal during decoding, and it determines the sound quality of the audio signal reproduced by the SAC decoding apparatus. MPEG is standardizing the SAC technique under the name MPEG Surround (MPS), which uses the CLD (Channel Level Difference) as a spatial cue.

According to SAC, only a single audio object, in the form of a multichannel audio signal, can be encoded and decoded. A multi-object audio signal composed of multiple channels, for example the audio signals of various objects consisting of mono, stereo, and 5.1 channels, therefore cannot be encoded and decoded.

According to another conventional technique, Binaural Cue Coding (BCC), only multi-object audio signals composed solely of mono channels can be encoded and decoded. Multi-object audio signals composed of multiple channels other than mono therefore cannot be encoded and decoded.

Therefore, when a conventional audio coding scheme is used and various audio objects including mono, stereo, or multi-channel signals must be transmitted, the unavoidable problem of a high bit rate arises.

To solve this problem, another audio coding scheme called Spatial Audio Object Coding (SAOC) has been proposed. Whereas the conventional SAC scheme focuses on multichannel audio coding, the SAOC scheme is a method for coding multi-object audio. Multi-object audio coding is a technique for compressing and transmitting mutually different audio objects, and here too a spatial cue representing the characteristics of each object can be extracted. Thus, while conventional audio coding schemes compress each object separately, SAOC processes multiple objects simultaneously. In SAOC, one or more audio objects are represented by a single stereo downmixed signal and side information. Consequently, SAOC can reduce the bit rate dramatically compared with conventional audio coders.

In conventional audio services, the user was generally restricted to listening passively to the transmitted audio content. Object-based audio coding, by contrast, is intended to provide a more active service: each audio object can be controlled at the user's request, and a variety of audio services and content can be created from a single combination of content. In addition, because SAOC can apply a mixer/renderer that provides functions such as panning, attenuation, and suppression, audio objects can be controlled flexibly through interaction with the user.

However, the SAC-based SAOC coding technique fundamentally cannot restore the original sound source perfectly, so the control performance of key functions such as Karaoke and Solo representation is limited. In particular, such an SAOC system suffers severe degradation of sound quality because its downmix process is based on a limited number of subbands. Solving this problem requires increasing the number of subbands. If the number of subbands is increased indiscriminately, however, the side information to be transmitted also increases.

Accordingly, an object of the present invention is to provide an encoding and decoding method and apparatus capable of improving sound quality by subdividing the subband structure while minimizing the increase in bit rate.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To solve the above problem, an encoding method using variable subband analysis according to an embodiment of the present invention includes: converting an audio object into the frequency domain; subdividing the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands, and generating variable subband information including information about the subdivided variable subbands; and generating, based on the variable subbands, parameter information used for restoring the audio object.

An encoding apparatus using variable subband analysis according to another embodiment of the present invention includes: a frequency conversion unit that converts an audio object into the frequency domain; a subband configuration unit that subdivides the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands and generates variable subband information including information about the subdivided variable subbands; and a parameter generation unit that generates, based on the variable subbands, parameter information used for restoring the audio object.

A decoding method using variable subband analysis according to still another embodiment of the present invention includes: receiving a bitstream that includes variable subband information, which contains information about variable subbands subdivided from subbands according to the characteristics of the subbands of an audio object, and parameter information for restoring the audio object based on the variable subbands; reconstructing the subbands based on the variable subband information; and restoring the audio object using the parameter information.

A decoding apparatus using variable subband analysis according to still another embodiment of the present invention includes: a receiving unit that receives a bitstream including variable subband information, which contains information about variable subbands subdivided from subbands according to the characteristics of the subbands of an audio object, and parameter information for restoring the audio object based on the variable subbands; a subband reconstruction unit that reconstructs the subbands based on the variable subband information; and a restoration unit that restores the audio object using the parameter information.

An encoding method using variable subband analysis according to still another embodiment of the present invention includes: generating a downmix signal from a plurality of input audio objects and encoding it; converting the plurality of audio objects into the frequency domain; subdividing the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands, and generating variable subband information including information about the subdivided variable subbands; generating, based on the variable subbands, parameter information used for restoring the downmix signal; and encoding the variable subband information and the parameter information.

An encoding apparatus using variable subband analysis according to still another embodiment of the present invention includes: an audio encoding unit that generates a downmix signal from a plurality of input audio objects and encodes it; a frequency conversion unit that converts the plurality of audio objects into the frequency domain; a subband configuration unit that subdivides the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands and generates variable subband information including information about the subdivided variable subbands; a parameter generation unit that generates, based on the variable subbands, parameter information used for restoring the downmix signal; and an encoding unit that encodes the variable subband information and the parameter information.

A decoding method using variable subband analysis according to still another embodiment of the present invention includes: decoding, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information containing information about variable subbands subdivided according to the characteristics of the subbands of the plurality of audio objects, and parameter information for restoring the downmix signal based on the variable subbands; converting the decoded downmix signal into the frequency domain; reconstructing the subbands based on the decoded variable subband information; restoring the plurality of audio objects using the decoded parameter information, the frequency-domain downmix signal, and the reconstructed subbands; and converting the restored plurality of audio objects into the time domain.

A decoding apparatus using variable subband analysis according to still another embodiment of the present invention includes: a decoding unit that decodes, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information containing information about variable subbands subdivided according to the characteristics of the subbands of the plurality of audio objects, and parameter information for restoring the downmix signal based on the variable subbands; a frequency conversion unit that converts the decoded downmix signal into the frequency domain; a subband reconstruction unit that reconstructs the subbands based on the decoded variable subband information; a restoration unit that restores the plurality of audio objects using the decoded parameter information, the frequency-domain downmix signal, and the reconstructed subbands; and a time conversion unit that converts the restored plurality of audio objects into the time domain.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice the technical idea of the present invention. In describing the present invention, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

According to the present invention, sound quality can be improved by subdividing the subband structure of the audio objects.

The following merely illustrates the principles of the invention. Those skilled in the art will therefore be able to devise various apparatuses that, although not explicitly described or shown herein, embody the principles of the invention and fall within its concept and scope. Furthermore, all conditional terms and embodiments recited herein are, in principle, expressly intended only to aid understanding of the concept of the invention, and should be understood as not being limited to the specifically recited embodiments and conditions.

In addition, all detailed descriptions reciting the principles, aspects, and embodiments of the invention, as well as specific embodiments, should be understood as intended to include their structural and functional equivalents. Such equivalents should be understood to include not only currently known equivalents but also equivalents to be developed in the future, that is, any element invented to perform the same function regardless of structure.

Thus, for example, the block diagrams herein should be understood as representing conceptual views of illustrative circuitry embodying the principles of the invention. Similarly, all flowcharts, state transition diagrams, pseudocode, and the like should be understood as representing various processes that can be substantially represented in a computer-readable medium and executed by a computer or processor, whether or not such a computer or processor is explicitly shown.

The functions of the various elements shown in the drawings, including functional blocks labeled as processors or similar concepts, may be provided through dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared.

Moreover, explicit use of terms such as processor, control, or similar concepts should not be construed as referring exclusively to hardware capable of executing software, and should be understood to implicitly include, without limitation, digital signal processor (DSP) hardware, and read-only memory (ROM), random-access memory (RAM), and non-volatile memory for storing software. Other well-known, commonly used hardware may also be included.

In the claims of this specification, an element expressed as a means for performing a function described in the detailed description is intended to encompass any way of performing that function, including, for example, a combination of circuit elements that performs the function, or software in any form, including firmware, microcode, and the like, combined with appropriate circuitry for executing that software to perform the function. Because the invention as defined by such claims combines the functions provided by the variously recited means in the manner the claims require, any means that can provide those functions should be understood as equivalent to those ascertained from this specification.

Encoding using variable subband analysis according to the present invention subdivides the subbands of the audio objects converted into the frequency domain according to the characteristics of those subbands, and can thereby achieve high sound quality while minimizing the increase in bit rate.

Specifically, in encoding using variable subband analysis according to the present invention, the audio objects are converted into the frequency domain (T/F transform), the subbands of the frequency-domain signal are subdivided into variable subbands according to the characteristics of the subbands, and variable subband information including information about the subdivided variable subbands is generated. Parameter information used for restoring the audio objects is generated based on the variable subbands. Subdividing the subbands provides high sound quality, and subdividing them selectively suppresses the increase in bit rate.
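The effect of a finer subband layout on the generated parameters can be illustrated with a minimal Python sketch. The text above does not define the parameter itself, so the per-band power share of each object used here is only an assumed stand-in, and `band_power_ratios`, `object_powers`, and `band_edges` are hypothetical names.

```python
def band_power_ratios(object_powers, band_edges):
    """Return, for every band, each object's share of the band's total power.

    object_powers: one list of per-bin powers per audio object (assumed input).
    band_edges: ascending bin indices; band b covers [band_edges[b], band_edges[b + 1]).
    The power share is an illustrative parameter, not one defined by the source.
    """
    ratios = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        per_object = [sum(power[start:stop]) for power in object_powers]
        total = sum(per_object)
        ratios.append([p / total if total > 0 else 0.0 for p in per_object])
    return ratios


# Two objects, four frequency bins each.
powers = [[4.0, 4.0, 1.0, 1.0], [0.0, 0.0, 3.0, 3.0]]
coarse = band_power_ratios(powers, [0, 4])      # one wide subband
fine = band_power_ratios(powers, [0, 2, 4])     # same range split in two
```

With the single wide band, the parameter averages over bins where the objects behave very differently; with the split band, each half is described separately, at the cost of transmitting twice as many values.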

The variable subband information and the parameter information are encoded. This encoding may be performed as lossless coding. The audio objects are downmixed into a downmix signal, which is then audio-encoded. A conventional coding method may be used for the audio encoding. The encoded variable subband information, parameter information, and audio objects are assembled into a bitstream.

The characteristics of a subband may include the dispersion coefficient of the power ratio for that subband. For example, if the dispersion coefficient of the power ratio of a particular subband is greater than or equal to a predetermined threshold, that subband is subdivided; if it is below the threshold, the subband is not subdivided and the existing subband is kept.
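The threshold rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the dispersion coefficient is taken to be the population variance of the per-bin power ratio inside the subband, a flagged subband is split into two halves, and the names `subdivide_subbands`, `power_ratio`, `band_edges`, and `threshold` are hypothetical. The source does not fix any of these details.

```python
from statistics import pvariance


def subdivide_subbands(power_ratio, band_edges, threshold):
    """Split each subband whose power-ratio variance reaches the threshold.

    power_ratio: per-bin power ratio of one object (assumed input).
    band_edges: ascending bin indices; band b covers [band_edges[b], band_edges[b + 1]).
    Returns the new band edges and one split flag per original subband.
    """
    new_edges = [band_edges[0]]
    split_flags = []
    for start, stop in zip(band_edges[:-1], band_edges[1:]):
        bins = power_ratio[start:stop]
        # Assumption: variance stands in for the "dispersion coefficient".
        split = len(bins) >= 2 and pvariance(bins) >= threshold
        split_flags.append(split)
        if split:
            new_edges.append(start + (stop - start) // 2)  # split at the midpoint
        new_edges.append(stop)
    return new_edges, split_flags


ratio = [0.9, 0.9, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
edges, flags = subdivide_subbands(ratio, [0, 4, 8], threshold=0.05)
```

In this example the first subband has a strongly varying power ratio and is split, while the second is flat and is kept as is, so only one extra subband has to be described in the side information.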

The parameter information may be quantized using a variable bit level, and may include spatial parameter information. Quantizing with a variable bit level minimizes the growth in bit rate that comes with an increasing number of subbands.
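A minimal sketch of quantization with a selectable bit level is given below. It assumes a uniform scalar quantizer over a known range `[lo, hi]`; the source does not say which quantizer is used or how the bit level is chosen per subband, and `quantize` and `dequantize` are hypothetical helper names.

```python
def quantize(value, lo, hi, bits):
    """Map value in [lo, hi] to an integer index using `bits` bits (bits >= 1)."""
    levels = (1 << bits) - 1              # highest index for this bit level
    clipped = min(max(value, lo), hi)     # keep the value inside the range
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(index, lo, hi, bits):
    """Inverse of quantize: map an index back to a value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)


# The same parameter value coded at two different bit levels.
coarse_index = quantize(0.4, 0.0, 1.0, bits=3)
fine_index = quantize(0.4, 0.0, 1.0, bits=5)
coarse_error = abs(dequantize(coarse_index, 0.0, 1.0, bits=3) - 0.4)
fine_error = abs(dequantize(fine_index, 0.0, 1.0, bits=5) - 0.4)
```

Spending fewer bits on some parameters and more on others is one way the total side information could be kept close to that of the unsplit layout; which subbands receive which bit level is left open here.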

Decoding using variable subband analysis according to the present invention restores the audio objects using information about how the subbands of the frequency-domain audio objects were subdivided according to the characteristics of those subbands, and can thereby achieve high sound quality while minimizing the increase in bit rate.

Specifically, in decoding using variable subband analysis according to the present invention, a bitstream is received that includes variable subband information, which contains information about the variable subbands subdivided according to the characteristics of the subbands of the audio objects, and parameter information for restoring the audio objects based on the variable subbands. The subbands are reconstructed based on the variable subband information, and the audio objects are restored using the parameter information. Subdividing the subbands provides high sound quality, and subdividing them selectively suppresses the increase in bit rate.

The variable subband information and the parameter information are decoded. This decoding may be performed as lossless decoding. The bitstream may include a bitstream for the audio objects, and the audio objects are audio-decoded. A conventional decoding method may be used for the audio decoding. The decoded audio objects are converted into the frequency domain (T/F transform). The audio objects are restored using the frequency-converted audio objects and the reconstructed subbands. The restored audio objects are converted into the time domain (F/T transform) and output.

The characteristics of a subband may include the dispersion coefficient of the power ratio for that subband. For example, if the dispersion coefficient of the power ratio of a particular subband is greater than or equal to a predetermined threshold, that subband is subdivided; if it is below the threshold, the subband is not subdivided and is kept as the existing subband. The variable subband information may include this subband characteristic information, which is used to reconstruct the subbands.
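On the decoder side, the subband reconstruction can be sketched in Python as below. The layout of the variable subband information is not specified in the text, so this sketch assumes it carries one boolean split flag per base subband and that a flagged subband (with at least two bins) is divided at its midpoint; `rebuild_band_edges`, `base_edges`, and `split_flags` are hypothetical names.

```python
def rebuild_band_edges(base_edges, split_flags):
    """Rebuild the variable subband layout from per-subband split flags.

    base_edges: ascending bin indices of the fixed base layout known to both sides.
    split_flags: one boolean per base subband (assumed side-information format).
    Flagged subbands are assumed to span at least two bins.
    """
    edges = [base_edges[0]]
    bands = zip(base_edges[:-1], base_edges[1:])
    for (start, stop), split in zip(bands, split_flags):
        if split:
            edges.append(start + (stop - start) // 2)  # midpoint split
        edges.append(stop)
    return edges


decoder_edges = rebuild_band_edges([0, 4, 8], [True, False])
```

Because the decoder applies the same split convention as the encoder, it arrives at the same band edges without the edges themselves being transmitted.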

The parameter information may be inversely quantized using a variable bit level, and may include spatial parameter information. Quantizing with a variable bit level minimizes the growth in bit rate that comes with an increasing number of subbands.

The invention is described in detail below together with embodiments.

<Encoding>

An encoding method using variable subband analysis according to the present invention includes: generating a downmix signal from a plurality of input audio objects and encoding it; converting the plurality of audio objects into the frequency domain; subdividing the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands, and generating variable subband information including information about the subdivided variable subbands; generating, based on the variable subbands, parameter information used for restoring the downmix signal; and encoding the variable subband information and the parameter information. Here, the characteristics of a subband may include the dispersion coefficient of the power ratio for the subband, and the parameter information may include spatial parameter information.

The step of generating the parameter information may include quantizing the parameter information using a variable bit level.

The encoding apparatus using variable subband analysis according to the present invention includes: an audio encoder that generates a downmix signal from a plurality of input audio objects and encodes it; a frequency converter that converts the plurality of audio objects into the frequency domain; a subband configuration unit that subdivides the subbands of the frequency-domain signal into variable subbands according to the characteristics of the subbands and generates variable subband information that includes information about the subdivided variable subbands; a parameter generator that generates, based on the variable subbands, parameter information used to reconstruct the downmix signal; and an encoder that encodes the variable subband information and the parameter information. Here, the characteristics of a subband may include a dispersion coefficient characteristic of the power ratio for the subband, and the parameter information may include spatial parameter information.

Meanwhile, the parameter generator may include a quantizer that quantizes the parameter information using a variable bit level.

<Decoding>

The decoding method using variable subband analysis according to the present invention includes: decoding, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information that includes information about variable subbands subdivided according to the characteristics of the subbands of the plurality of audio objects, and parameter information for reconstructing the downmix signal based on the variable subbands; converting the decoded downmix signal into the frequency domain; reconstructing the subbands based on the decoded variable subband information; restoring the plurality of audio objects using the decoded parameter information, the frequency-domain downmix signal, and the reconstructed subbands; and converting the restored audio objects into the time domain. Here, the characteristics of a subband may include a dispersion coefficient characteristic of the power ratio for the subband, and the parameter information may include spatial parameter information.

Meanwhile, the decoding step may include dequantizing the parameter information using a variable bit level.

The decoding apparatus using variable subband analysis according to the present invention includes: a decoder that decodes, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information that includes information about variable subbands subdivided according to the characteristics of the subbands of the plurality of audio objects, and parameter information for reconstructing the downmix signal based on the variable subbands; a frequency converter that converts the decoded downmix signal into the frequency domain; a subband reconstruction unit that reconstructs the subbands based on the decoded variable subband information; a reconstruction unit that restores the plurality of audio objects using the decoded parameter information, the frequency-domain downmix signal, and the reconstructed subbands; and a time converter that converts the restored audio objects into the time domain. Here, the characteristics of a subband may include a dispersion coefficient characteristic of the power ratio for the subband, and the parameter information may include spatial parameter information.

Meanwhile, the decoder may include a dequantizer that dequantizes the parameter information using a variable bit level.

Specific embodiments are described below with reference to the drawings.

In a typical multi-object/multi-channel audio encoding process, the objects are downmixed into a single object, so the individual objects cannot be restored perfectly during decoding. In particular, when the power of one object signal is reduced completely, as in karaoke mode, the degradation in sound quality becomes markedly worse.

Accordingly, the present invention extracts more accurate parameters by variably increasing the number of subbands in which parameters are analyzed during multi-object/multi-channel audio encoding and decoding, and as a result restores the objects more cleanly from the downmix signal. In this process, the increase in bit rate can be minimized by applying different quantization levels according to the frequency characteristics of the signals.

FIG. 1 illustrates the audio encoding/decoding process according to the present invention. The encoder 101 receives audio objects. There is no restriction on the input audio objects, so a plurality of audio objects (Object #1, Object #2, Object #3, ...) may be input. The encoder 101 generates a downmix signal from the input objects and extracts the parameters that will be needed during decoding. The side information (Side Info.) in FIG. 1 may be included in these parameters. The decoder 102 performs decoding: it uses the downmix signal and the parameters from the encoder 101 to output restored audio objects. The mixer/renderer 103 applies interaction control, such as position and level control, to the restored audio objects and outputs them as channels (Chan. #1, Chan. #2, Chan. #3, ...). Here, the encoder 101 and the decoder 102 may use the SAOC scheme.

FIG. 2 is a block diagram of the multi-object audio encoder according to the present invention. Multi-object audio encoding according to the present invention analyzes the frequency band characteristics of the signals, defines the subband structure in which parameters are analyzed, and applies a different parameter quantization method depending on the frequency characteristics. The defined subband structure is rebuilt during decoding for use in restoration.

The audio objects (1, 2, ..., M) are input to the audio encoder 201 and the frequency converter (T/F Transform) 202. The audio encoder 201 downmixes the audio objects (1, 2, ..., M) and encodes the result. The frequency converter 202 converts the audio objects (1, 2, ..., M) into the frequency domain.

The subband configuration unit 203 subdivides the subbands of the frequency-converted signal into variable subbands according to the characteristics of the subbands. Based on the variable subbands, the parameter generator 205 extracts the parameters needed to restore the audio objects from the downmix signal during decoding.

The parameters for a subband may include Inter-Object Level Difference (IOLD) information. IOLD is a parameter that gives the power ratio between two objects, computed for each subband. It is expressed by [Equation 1].

[Equation 1]

Here, M is the number of subbands, and k and b are the frequency coefficient and the subband index, respectively. The numerator and denominator may also be swapped in the definition. Depending on the encoding method, the subbands may be fixed. For example, MPEG Surround applies 20 or 28 fixed subbands to each audio signal frame. When the parameter is computed over fixed subbands like this, a low band resolution makes it difficult to separate the two signals well during decoding. To improve on this, in the present invention the subband configuration unit 203 divides the subbands further into variable subbands, so that more accurate parameters can be obtained. The subband configuration unit 203 is described in more detail with reference to FIG. 4.
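As a rough illustration of the per-subband power ratio described above, the following Python sketch computes one ratio per subband from two objects that have already been converted to the frequency domain. The function names, the `edges` list of band boundaries, and the small `eps` guard against division by zero are assumptions made for this example; the exact form of [Equation 1] may differ (for instance, it may be expressed in dB).

```python
from typing import List, Sequence


def band_power(spectrum: Sequence[complex], start: int, stop: int) -> float:
    """Sum of squared magnitudes over frequency bins start..stop-1."""
    return sum(abs(x) ** 2 for x in spectrum[start:stop])


def iold_per_subband(obj1: Sequence[complex],
                     obj2: Sequence[complex],
                     edges: Sequence[int],
                     eps: float = 1e-12) -> List[float]:
    """Power ratio of object 1 to object 2 for each subband.

    Subband b covers bins edges[b] .. edges[b + 1] - 1 (hypothetical layout).
    """
    ratios = []
    for b in range(len(edges) - 1):
        p1 = band_power(obj1, edges[b], edges[b + 1])
        p2 = band_power(obj2, edges[b], edges[b + 1])
        ratios.append(p1 / (p2 + eps))  # eps avoids division by zero
    return ratios


# Toy spectra with four bins split into two subbands of two bins each.
s1 = [2 + 0j, 2 + 0j, 1 + 0j, 1 + 0j]
s2 = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
edges = [0, 2, 4]
iold = iold_per_subband(s1, s2, edges)
```

Swapping `obj1` and `obj2` gives the alternative definition in which the numerator and denominator are exchanged.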

The encoder 204 encodes the variable subband information generated by the subband configuration unit 203. The encoder 206 encodes parameter information that includes the parameters generated by the parameter generator 205. The encoders 204 and 206 may use lossless coding. The bitstream formatter 207 combines the encoded variable subband information, the encoded parameter information, and the encoded audio objects into a bitstream, which may be an SAOC bitstream.

FIG. 4 illustrates how the variable subbands are configured according to the present invention. The subband configuration unit 203 of FIG. 2 may include the spectrum analysis unit 401 of FIG. 4. In FIG. 4, the object that the user will control freely is called Object #1, and the other objects are called Object #2, Object #3, and so on. The spectrum analysis unit 401 analyzes the power in the frequency bands of each signal and outputs variable subband information, which is the new subband information.

The basic structure of the subbands in which parameters are analyzed follows the 28 bands used in MPEG Surround. When the power ratio of the two signals varies strongly within a subband, that band is subdivided further. This condition is expressed by [Equation 2] and [Equation 3].

[Equation 2]

[Equation 3]

Here, avrg_b is a variable representing the mean of the power ratio of the two signals within the b-th subband, and var_b is the dispersion coefficient representing how much that power ratio varies.

When the dispersion coefficient from [Equation 3] exceeds a predetermined threshold, the analysis band (the b-th subband) is subdivided further, and the overall subband structure is built from the result. A parameter describing the resulting variable subband structure is transmitted additionally so that the subdivided variable subbands can be rebuilt easily at the decoding stage. For example, the parameter describing the subband structure marks each band with 0 or 1; a band marked 1 is one that needs to be subdivided further.
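The thresholding rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-bin power ratio, the use of a plain variance as the dispersion measure, and the names `split_flags`, `edges`, `threshold`, and `eps` are all hypothetical, and the actual [Equation 2] and [Equation 3] may be normalized differently.

```python
from typing import List, Sequence


def split_flags(obj1: Sequence[complex],
                obj2: Sequence[complex],
                edges: Sequence[int],
                threshold: float,
                eps: float = 1e-12) -> List[int]:
    """Return one 0/1 flag per subband; 1 means 'subdivide this band'.

    Each subband b covers bins edges[b] .. edges[b + 1] - 1 and is assumed
    to contain at least one bin.
    """
    flags = []
    for b in range(len(edges) - 1):
        # Per-bin power ratio of the two objects inside the band (assumed form).
        ratios = [abs(obj1[k]) ** 2 / (abs(obj2[k]) ** 2 + eps)
                  for k in range(edges[b], edges[b + 1])]
        avrg = sum(ratios) / len(ratios)                          # mean ratio
        var = sum((r - avrg) ** 2 for r in ratios) / len(ratios)  # dispersion
        flags.append(1 if var > threshold else 0)
    return flags


# Band 0 has a constant ratio; band 1 has a strongly fluctuating ratio.
s1 = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j, 3 + 0j, 0.1 + 0j, 3 + 0j, 0.1 + 0j]
s2 = [1 + 0j] * 8
flags = split_flags(s1, s2, edges=[0, 4, 8], threshold=1.0)
```

With these toy inputs only the second band exceeds the threshold, so only that band would be marked for subdivision.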

FIG. 3 is a block diagram of the multi-object audio decoder according to the present invention. The bitstream demultiplexer 301 receives a bitstream, separates it into a signal for the audio objects, a signal for the parameter information, and a signal for the variable subband information, and outputs each to its decoder (302, 304, 305). The bitstream may be an SAOC bitstream. The audio decoder 302 decodes the audio object signal and outputs it as a downmix signal, which the frequency converter 303 converts into the frequency domain. The parameter decoder 304 decodes the parameter information signal and outputs the result to the reconstruction unit 307. The variable subband decoder 305 decodes the variable subband information signal and outputs the result to the subband reconstruction unit 306. The parameter information signal and the variable subband information signal may be decoded with a lossless decoding method.

The reconstruction unit 307 restores the audio objects from the frequency-converted downmix signal and the parameter information, working on the subbands that the subband reconstruction unit 306 rebuilt from the variable subband information. Here, the parameter information may be spatial parameters that include spatial cue information. The restored audio objects are converted into the time domain by the time converter and finally output as audio objects. As one example, two objects can be restored from a single downmix signal using the IOLD parameter according to [Equation 4] below.

[Equation 4]
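To make the idea of restoring two objects from one downmix more concrete, here is a minimal Python sketch. It assumes that the IOLD value r for a band is the power ratio of object 1 to object 2, and it splits the downmix with the power-preserving gains sqrt(r / (1 + r)) and sqrt(1 / (1 + r)). This gain rule is a common choice used here purely for illustration; it is not claimed to be the form of [Equation 4], and the names `split_downmix` and `edges` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_downmix(downmix: Sequence[complex],
                  iold: Sequence[float],
                  edges: Sequence[int]) -> Tuple[List[complex], List[complex]]:
    """Estimate two objects from one downmix using one power ratio per subband.

    Subband b covers bins edges[b] .. edges[b + 1] - 1.
    """
    obj1 = [0j] * len(downmix)
    obj2 = [0j] * len(downmix)
    for b, r in enumerate(iold):
        g1 = math.sqrt(r / (1.0 + r))    # share of the band power given to object 1
        g2 = math.sqrt(1.0 / (1.0 + r))  # share of the band power given to object 2
        for k in range(edges[b], edges[b + 1]):
            obj1[k] = g1 * downmix[k]
            obj2[k] = g2 * downmix[k]
    return obj1, obj2


# Toy downmix with four bins in two subbands.
downmix = [2 + 0j] * 4
est1, est2 = split_downmix(downmix, iold=[3.0, 1.0], edges=[0, 2, 4])
```

Because the same ratio is applied to every bin of a band, finer subbands let the gains follow the true spectra more closely, which is the motivation for the variable subband structure.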

FIG. 5 illustrates the reconstruction of subbands according to the present invention. It shows how the subband reconstruction unit 306 of FIG. 3 rebuilds the subband layout that was used during encoding. According to the variable subband information, each of the 28 subbands is marked 0 or 1. A band marked 0 is used as is, while a band marked 1 is divided evenly into a predetermined number of parts. Block 501 holds the partition information of the 28 subbands used in MPEG Surround, expressed on an FFT basis, and its output A(k) denotes the partition of the k-th band.
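The reconstruction rule above can be sketched in Python as follows. The base partition is given as a list of FFT-bin boundaries; the three-band table in the example is a toy stand-in, since the actual 28-band MPEG Surround partition is not reproduced here. The names `rebuild_edges`, `base_edges`, and `n_split` are assumptions for this example, and each base band is assumed to contain at least one bin.

```python
from typing import List, Sequence


def rebuild_edges(base_edges: Sequence[int],
                  flags: Sequence[int],
                  n_split: int) -> List[int]:
    """Rebuild the variable subband boundaries from per-band 0/1 flags.

    Band b covers bins base_edges[b] .. base_edges[b + 1] - 1. A band flagged
    0 is kept as is; a band flagged 1 is divided evenly into n_split parts
    (or fewer, if the band has fewer bins than n_split).
    """
    edges = [base_edges[0]]
    for b, flag in enumerate(flags):
        lo, hi = base_edges[b], base_edges[b + 1]
        pieces = min(n_split, hi - lo) if flag else 1
        for i in range(1, pieces + 1):
            # Integer arithmetic keeps every boundary on an FFT bin.
            edges.append(lo + (hi - lo) * i // pieces)
    return edges


# Toy partition with three base bands; the last two are flagged for splitting.
new_edges = rebuild_edges(base_edges=[0, 4, 8, 16], flags=[0, 1, 1], n_split=2)
```

Since the encoder and decoder share the base partition and the split count, only the flags need to be carried in the bitstream.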

FIG. 6 illustrates quantization using a variable bit level according to the present invention. The variable-level quantizer 601 may be included in the parameter generator 205 of FIG. 2. The variable-level quantizer 601 analyzes the frequency band characteristics of the input parameters, performs variable-bit quantization accordingly, and outputs the quantized parameters. Because the parameter generator quantizes with a variable number of bits, the increase in bit rate can be minimized.
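As an illustration of quantizing with a different number of bits per band, the following Python sketch uses a uniform quantizer over a fixed value range, together with its matching dequantizer. The text does not specify how the bit level is chosen for each band, so the per-band bit counts are simply supplied by the caller here; the value range of -30 to 30 and the function names are likewise assumptions made for this example.

```python
def quantize(value: float, bits: int, lo: float = -30.0, hi: float = 30.0) -> int:
    """Uniformly quantize value into one of 2**bits levels over [lo, hi]."""
    levels = (1 << bits) - 1             # highest index; bits must be >= 1
    clipped = min(max(value, lo), hi)    # keep the value inside the range
    return round((clipped - lo) / (hi - lo) * levels)


def dequantize(index: int, bits: int, lo: float = -30.0, hi: float = 30.0) -> float:
    """Map a quantization index back to its reconstruction value."""
    levels = (1 << bits) - 1
    return lo + index / levels * (hi - lo)


# Two parameters: the first band gets 5 bits, the second only 3 bits.
params = [6.0, -12.0]
bits_per_band = [5, 3]
indices = [quantize(p, n) for p, n in zip(params, bits_per_band)]
restored = [dequantize(i, n) for i, n in zip(indices, bits_per_band)]
total_bits = sum(bits_per_band)
```

In this toy case the two parameters cost 8 bits instead of the 10 bits that a fixed 5-bit quantizer would need, at the price of a coarser step in the second band.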

FIG. 7 illustrates dequantization using a variable bit level according to the present invention. The variable-level dequantizer 701 may be included in the variable subband decoder 305 of FIG. 3. The variable-level dequantizer 701 receives the quantized parameters, performs variable-bit dequantization according to the frequency band characteristics, and outputs the dequantized parameters.

The method of the present invention described above may be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Since a person of ordinary skill in the art to which the invention pertains can readily carry this out, it is not described in further detail.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, since a person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from its technical spirit.

The present invention is used for encoding and decoding audio.

FIG. 1 illustrates the audio encoding/decoding process according to the present invention.

FIG. 2 is a block diagram of the multi-object audio encoder according to the present invention.

FIG. 3 is a block diagram of the multi-object audio decoder according to the present invention.

FIG. 4 illustrates the configuration of variable subbands according to the present invention.

FIG. 5 illustrates the reconstruction of subbands according to the present invention.

FIG. 6 illustrates quantization using a variable bit level according to the present invention.

FIG. 7 illustrates dequantization using a variable bit level according to the present invention.

Claims

An encoding method using variable subband analysis, comprising:

generating a downmix signal from a plurality of input audio objects and encoding the downmix signal;

converting the plurality of audio objects into a frequency domain;

subdividing a subband of the signal converted into the frequency domain into variable subbands according to characteristics of the subband, and generating variable subband information including information about the subdivided variable subbands;

generating parameter information used for reconstruction of the downmix signal based on the variable subbands; and

encoding the variable subband information and the parameter information.

The method of claim 1, wherein the characteristics of the subband include a dispersion coefficient characteristic of a power ratio for the subband.

The method of claim 1, wherein generating the parameter information includes quantizing the parameter information using a variable bit level.

The method of claim 1, wherein the parameter information includes spatial parameter information.

An encoding apparatus using variable subband analysis, comprising:

an audio encoder for generating a downmix signal from a plurality of input audio objects and encoding the downmix signal;

a frequency converter for converting the plurality of audio objects into a frequency domain;

a subband configuration unit for subdividing a subband of the signal converted into the frequency domain into variable subbands according to characteristics of the subband, and generating variable subband information including information about the subdivided variable subbands;

a parameter generator for generating parameter information used for reconstruction of the downmix signal based on the variable subbands; and

an encoding unit for encoding the variable subband information and the parameter information.

The apparatus of claim 5, wherein the characteristics of the subband include a dispersion coefficient characteristic of a power ratio for the subband.

The apparatus of claim 5, wherein the parameter generator includes a quantizer for quantizing the parameter information using a variable bit level.

The apparatus of claim 5, wherein the parameter information includes spatial parameter information.

A decoding method using variable subband analysis, comprising:

decoding, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information including information about variable subbands subdivided according to characteristics of subbands of the plurality of audio objects, and parameter information for reconstructing the downmix signal based on the variable subbands;

converting the decoded downmix signal into a frequency domain;

reconstructing subbands based on the decoded variable subband information;

restoring the plurality of audio objects using the decoded parameter information, the downmix signal converted into the frequency domain, and the reconstructed subbands; and

converting the restored plurality of audio objects into a time domain.

The method of claim 9, wherein the characteristics of the subbands include a dispersion coefficient characteristic of a power ratio for the subbands.

The method of claim 9, wherein the decoding includes dequantizing the parameter information using a variable bit level.

The method of claim 9, wherein the parameter information includes spatial parameter information.

A decoding apparatus using variable subband analysis, comprising:

a decoder for decoding, from an input bitstream, a downmix signal for a plurality of audio objects, variable subband information including information about variable subbands subdivided according to characteristics of subbands of the plurality of audio objects, and parameter information for reconstructing the downmix signal based on the variable subbands;

a frequency converter for converting the decoded downmix signal into a frequency domain;

a subband reconstruction unit for reconstructing subbands based on the decoded variable subband information;

a reconstruction unit for restoring the plurality of audio objects using the decoded parameter information, the downmix signal converted into the frequency domain, and the reconstructed subbands; and

a time converter for converting the restored plurality of audio objects into a time domain.

The apparatus of claim 13, wherein the characteristics of the subbands include a dispersion coefficient characteristic of a power ratio for the subbands.

The apparatus of claim 13, wherein the decoder includes a dequantizer for dequantizing the parameter information using a variable bit level.

The apparatus of claim 13, wherein the parameter information includes spatial parameter information.

A method comprising:

converting an audio object into a frequency domain;

subdividing a subband of the signal converted into the frequency domain into variable subbands according to characteristics of the subband, and generating variable subband information including information about the subdivided variable subbands; and

generating parameter information used for reconstruction of the audio object based on the variable subbands.

An apparatus comprising:

a frequency converter for converting an audio object into a frequency domain;

a subband configuration unit for subdividing a subband of the signal converted into the frequency domain into variable subbands according to characteristics of the subband, and generating variable subband information including information about the subdivided variable subbands; and

a parameter generator for generating parameter information used for reconstruction of the audio object based on the variable subbands.

A method comprising:

receiving a bitstream including variable subband information, which includes information about variable subbands subdivided according to characteristics of a subband of an audio object, and parameter information for reconstructing the audio object based on the variable subbands;

reconstructing subbands based on the variable subband information; and

reconstructing the audio object using the parameter information.

An apparatus comprising:

a receiving unit for receiving a bitstream including variable subband information, which includes information about variable subbands subdivided according to characteristics of a subband of an audio object, and parameter information for reconstructing the audio object based on the variable subbands;

a subband reconstruction unit for reconstructing subbands based on the variable subband information; and

a reconstruction unit for reconstructing the audio object using the parameter information.