KR102191260B1

KR102191260B1 - Apparatus and method for encoding/decoding of audio using multi channel audio codec and multi object audio codec

Info

Publication number: KR102191260B1
Application number: KR1020130094386A
Authority: KR
Inventors: 서정일; 백승권; 장대영; 강경옥; 김진웅
Original assignee: 한국전자통신연구원
Priority date: 2013-04-19
Filing date: 2013-08-08
Publication date: 2020-12-16
Also published as: KR20140126222A

Abstract

Disclosed are an audio encoding/decoding apparatus and method using a multi-channel audio codec and a multi-object audio codec.
The audio encoding apparatus may include: a multi-object audio encoder that generates a second downmix signal by downmixing a voice object signal with a first downmix signal obtained by downmixing an input signal, and that encodes the voice object signal; a downmix audio encoder that encodes the second downmix signal; and a bitstream multiplexer that generates an encoded bitstream by multiplexing a bitstream in which the input signal is encoded, a bitstream in which the voice object signal is encoded, and a bitstream in which the second downmix signal is encoded.

Description

Audio encoding/decoding device and method using multi-channel audio codec and multi-object audio codec {APPARATUS AND METHOD FOR ENCODING/DECODING OF AUDIO USING MULTI CHANNEL AUDIO CODEC AND MULTI OBJECT AUDIO CODEC}

The present invention relates to an audio encoding/decoding apparatus and method using a multi-channel audio codec and a multi-object audio codec and, more particularly, to an apparatus and method that use a multi-channel audio codec and a multi-object audio codec to control the volume difference between a voice signal and ambient sound in stereo and multi-channel audio systems so that the voice signal is reproduced clearly.

With the full-scale digitalization of TV broadcasting and the spread of 5.1-channel audio systems, a variety of audio services are being provided.

Korean Patent Publication No. 10-2012-0009150 (published February 1, 2012) discloses a technique for encoding and decoding a multi-channel audio signal. However, to reproduce a sound field, multi-channel audio content may carry more ambient noise and instrument sound than mono or stereo content.

Accordingly, when multi-channel audio content is encoded and decoded as it is, a voice signal carrying the content the viewer wants to hear, such as dialogue or a singer's voice, may not be heard clearly because of ambient noise or instrument sound.

There is therefore a demand for a method of reproducing the voice signal clearly in multi-channel audio content.

The present invention may provide an apparatus and method that encode an audio signal by combining a multi-channel audio codec with a multi-object audio codec, thereby improving the sound quality of a voice signal in stereo and multi-channel audio systems, or controlling the volume difference between the voice signal and ambient sound so that the voice signal is reproduced clearly.

An audio encoding apparatus according to an embodiment of the present invention may include: a multi-object audio encoder that generates a second downmix signal by downmixing a voice object signal with a first downmix signal obtained by downmixing an input signal, and that encodes the voice object signal; a downmix audio encoder that encodes the second downmix signal; and a bitstream multiplexer that generates an encoded bitstream by multiplexing a bitstream in which the input signal is encoded, a bitstream in which the voice object signal is encoded, and a bitstream in which the second downmix signal is encoded.

The audio encoding apparatus according to an embodiment of the present invention may further include a multi-channel audio encoder that generates the first downmix signal by downmixing the input signal and that encodes the input signal.

The multi-channel audio encoder of the audio encoding apparatus according to an embodiment of the present invention may generate side information for reconstructing the input signal from the first downmix signal and transmit it to the multi-object audio encoder.

The multi-object audio encoder of the audio encoding apparatus according to an embodiment of the present invention may extract the sound source object signal from the first downmix signal using the side information and rendering information.

An audio decoding apparatus according to an embodiment of the present invention may include: a bitstream demultiplexer that demultiplexes a received encoded bitstream into a bitstream in which an input signal is encoded, a bitstream in which a voice object signal is encoded, and a bitstream in which a second downmix signal is encoded; a downmix audio decoder that decodes the second downmix signal from the bitstream in which the second downmix signal is encoded; a multi-object audio decoder that decodes the voice object signal and a first downmix signal using the second downmix signal and the bitstream in which the voice object signal is encoded; and a multi-channel audio decoder that decodes the input signal using the first downmix signal and the bitstream in which the input signal is encoded.

The audio decoding apparatus according to an embodiment of the present invention may further include a rendering unit that controls the level of the voice object signal and the level of the input signal based on rendering information and outputs them.

The multi-object audio decoder of the audio decoding apparatus according to an embodiment of the present invention may generate transcoded multi-channel side information for reconstructing the input signal from the first downmix signal, based on the second downmix signal and the bitstream in which the second downmix signal is encoded.

An audio encoding apparatus according to an embodiment of the present invention may include: a multi-channel audio encoder that generates a second downmix signal by downmixing a first downmix signal, obtained by downmixing a specific channel of a multi-channel input signal and a voice object signal, with the remaining channels of the multi-channel input signal other than the specific channel, and that encodes the remaining channels; a downmix audio encoder that encodes the second downmix signal; and a bitstream multiplexer that outputs an encoded bitstream by multiplexing a bitstream in which the specific channel and the voice object signal are encoded, a bitstream in which the remaining channels are encoded, and a bitstream in which the second downmix signal is encoded.

The audio encoding apparatus according to an embodiment of the present invention may further include a multi-object audio encoder that generates the first downmix signal by downmixing the specific channel and the voice object signal, and that encodes the specific channel and the voice object signal.

An audio decoding apparatus according to an embodiment of the present invention may include: a bitstream demultiplexer that demultiplexes a received encoded bitstream into a bitstream in which a specific channel of an input signal and a voice object signal are encoded, a bitstream in which the remaining channels of the input signal are encoded, and a bitstream in which a second downmix signal is encoded; a downmix audio decoder that decodes the second downmix signal from the bitstream in which the second downmix signal is encoded; a multi-channel audio decoder that decodes the remaining channels and a first downmix signal using the second downmix signal and the bitstream in which the remaining channels are encoded; and a multi-object audio decoder that decodes the specific channel and the voice object signal using the first downmix signal and the bitstream in which the specific channel and the voice object signal are encoded.

The audio decoding apparatus according to an embodiment of the present invention may further include a delay unit that applies a delay to the remaining channels before outputting them, based on the delay incurred while the multi-object audio decoder decodes the specific channel.

An audio encoding method according to an embodiment of the present invention may include: generating a second downmix signal by downmixing a voice object signal with a first downmix signal obtained by downmixing an input signal, and encoding the voice object signal; encoding the second downmix signal; and generating an encoded bitstream by multiplexing a bitstream in which the input signal is encoded, a bitstream in which the voice object signal is encoded, and a bitstream in which the second downmix signal is encoded.

An audio decoding method according to an embodiment of the present invention may include: demultiplexing a received encoded bitstream into a bitstream in which an input signal is encoded, a bitstream in which a voice object signal is encoded, and a bitstream in which a second downmix signal is encoded; decoding the second downmix signal from the bitstream in which the second downmix signal is encoded; decoding the voice object signal and a first downmix signal using the second downmix signal and the bitstream in which the voice object signal is encoded; and decoding the input signal using the first downmix signal and the bitstream in which the input signal is encoded.

According to an embodiment of the present invention, encoding an audio signal by combining a multi-channel audio codec with a multi-object audio codec makes it possible to improve the sound quality of a voice signal in stereo and multi-channel audio systems, or to control the volume difference between the voice signal and ambient sound so that the voice signal is reproduced clearly.

FIG. 1 is a diagram illustrating an audio encoding apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing the input and output of information between the components of the audio encoding apparatus according to the first embodiment of the present invention.
FIG. 3 is a diagram illustrating an audio decoding apparatus according to the first embodiment of the present invention.
FIG. 4 is a diagram showing the input and output of information between the components of the audio decoding apparatus according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the input and output of information between the audio transcoder and the surrounding components of the audio decoding apparatus according to the first embodiment of the present invention.
FIG. 6 is a diagram illustrating an audio encoding apparatus according to a second embodiment of the present invention.
FIG. 7 is a diagram showing the input and output of information between the components of the audio encoding apparatus according to the second embodiment of the present invention.
FIG. 8 is a diagram illustrating an audio decoding apparatus according to the second embodiment of the present invention.
FIG. 9 is a diagram showing the input and output of information between the components of the audio decoding apparatus according to the second embodiment of the present invention.
FIG. 10 is a flowchart illustrating an audio encoding method according to the first embodiment of the present invention.
FIG. 11 is a flowchart illustrating an audio decoding method according to the first embodiment of the present invention.
FIG. 12 is a flowchart illustrating an audio encoding method according to the second embodiment of the present invention.
FIG. 13 is a flowchart illustrating an audio encoding method according to the second embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The audio encoding method and decoding method according to an embodiment of the present invention may be performed by an audio encoding apparatus and an audio decoding apparatus.

FIG. 1 is a diagram illustrating an audio encoding apparatus according to an embodiment of the present invention.

Referring to FIG. 1, an audio encoding apparatus 100 according to an embodiment of the present invention may include a multi-channel audio encoder 110, a multi-object audio encoder 120, a downmix audio encoder 130, and a bitstream multiplexer 140.

The multi-channel audio encoder 110 may downmix the input signal to generate a first downmix signal, and may encode the input signal to generate a multi-channel audio side information bitstream. For example, the multi-channel audio encoder 110 may encode the input signal using MPEG Surround (MPS).

The multi-channel audio encoder 110 may then transmit the first downmix signal to the multi-object audio encoder 120 and transmit the multi-channel audio side information bitstream to the bitstream multiplexer 140.

The multi-channel audio encoder 110 may also generate the side information needed to decode the input signal from the first downmix signal and transmit it to the multi-object audio encoder 120.

The multi-object audio encoder 120 may receive a voice object signal and the first downmix signal that the multi-channel audio encoder 110 generated by downmixing the input signal. The multi-object audio encoder 120 may then downmix the received first downmix signal and the voice object signal to generate a second downmix signal, and may encode the voice object signal to generate a multi-object audio side information bitstream. For example, the multi-object audio encoder 120 may encode the voice object signal using Spatial Audio Object Coding (SAOC).

The multi-object audio encoder 120 may also transmit the second downmix signal to the downmix audio encoder 130 and transmit the multi-object audio side information bitstream to the bitstream multiplexer 140.

When the multi-channel audio encoder 110 has generated the side information, the multi-object audio encoder 120 may also extract a sound source object signal from the first downmix signal using the side information and rendering information.

The number of channels in the first downmix signal generated by the multi-channel audio encoder 110, or in the second downmix signal generated by the multi-object audio encoder 120, may be less than or equal to the number of channels in the input signal. For example, the input signal may be a 5.1-channel signal, and the first and second downmix signals may each be a mono signal or a stereo signal.
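The channel reduction described above can be illustrated with a minimal sketch that folds one 5.1 sample frame into stereo or mono. The description does not specify downmix coefficients; the gain of 1/√2 for the center and rear channels, the decision to drop the LFE channel, and the function names are assumptions chosen only for illustration.

```python
import math

# Assumed gain for the center and rear channels; not taken from the patent.
DEFAULT_GAIN = 1 / math.sqrt(2)

def downmix_to_stereo(frame, g=DEFAULT_GAIN):
    """Fold one 5.1 sample frame (dict of channel name -> sample) into (left, right)."""
    left = frame["FL"] + g * frame["C"] + g * frame["RL"]
    right = frame["FR"] + g * frame["C"] + g * frame["RR"]
    return left, right  # the LFE channel is dropped in this sketch

def downmix_to_mono(frame, g=DEFAULT_GAIN):
    """Fold one 5.1 sample frame into a single mono sample."""
    left, right = downmix_to_stereo(frame, g)
    return 0.5 * (left + right)
```

Either result has fewer channels than the six-channel input, which matches the "less than or equal to" condition stated in the description.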

The downmix audio encoder 130 may encode the second downmix signal generated by the multi-object audio encoder 120.

The bitstream multiplexer 140 may generate an encoded bitstream by multiplexing, into a single bitstream structure, the multi-channel audio side information bitstream generated by the multi-channel audio encoder 110, the multi-object audio side information bitstream generated by the multi-object audio encoder 120, and the downmix audio bitstream generated by the downmix audio encoder 130.

FIG. 2 is a diagram showing the input and output of information between the components of the audio encoding apparatus according to the first embodiment of the present invention.

First, as shown in FIG. 2, the multi-channel audio encoder 110 may receive an input signal 200 that includes a plurality of channels. For example, the input signal 200 may be a 5.1-channel signal that includes a C (Center) channel corresponding to the center speaker, an FR (Front Right) channel corresponding to the right front speaker, an FL (Front Left) channel corresponding to the left front speaker, an RR (Rear Right) channel corresponding to the right surround speaker, an RL (Rear Left) channel corresponding to the left surround speaker, and an LFE (Low Frequency Effects) channel corresponding to the subwoofer.

The multi-channel audio encoder 110 may then encode the received input signal 200 to generate a multi-channel audio side information bitstream 211 and transmit the generated bitstream 211 to the bitstream multiplexer 140.

The multi-channel audio encoder 110 may also downmix the received input signal 200 to generate a first downmix signal 212 and transmit the generated first downmix signal 212 to the multi-object audio encoder 120.

Next, the multi-object audio encoder 120 may receive a voice object signal from an external component and receive the first downmix signal 212 from the multi-channel audio encoder 110. For example, the external component may be an audio object extraction device capable of extracting the voice object signal from the input signal 201.

The multi-object audio encoder 120 may encode the voice object signal to generate a multi-object audio side information bitstream 221 and transmit the generated bitstream 221 to the bitstream multiplexer 140.

The multi-object audio encoder 120 may also downmix the first downmix signal and the voice object signal to generate a second downmix signal 222 and transmit the generated second downmix signal 222 to the downmix audio encoder 130.

Here, the first downmix signal 212 and the second downmix signal 222 may each be a mono signal or a stereo signal.

Next, the downmix audio encoder 130 may encode the received second downmix signal 222 to generate a downmix audio bitstream 231 and transmit the generated downmix audio bitstream 231 to the bitstream multiplexer 140.

Finally, the bitstream multiplexer 140 may receive the multi-channel audio side information bitstream 211 from the multi-channel audio encoder 110, the multi-object audio side information bitstream 221 from the multi-object audio encoder 120, and the downmix audio bitstream 231 from the downmix audio encoder 130.

The bitstream multiplexer 140 may then generate an encoded bitstream 241 by multiplexing the multi-channel audio side information bitstream 211, the multi-object audio side information bitstream 221, and the downmix audio bitstream 231 into a single bitstream structure.
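The data flow of FIG. 2 can be sketched as follows. This is only an illustration of which block produces which signal: the energy values packed as "side information", the raw float packing used in place of a core audio codec, and the length-prefixed framing in `mux` are all hypothetical stand-ins, not MPS, SAOC, or any bitstream syntax defined by the patent.

```python
import struct

def mux(*streams: bytes) -> bytes:
    """Concatenate sub-bitstreams, each prefixed with a 4-byte big-endian length (assumed framing)."""
    return b"".join(struct.pack(">I", len(s)) + s for s in streams)

def encode(input_channels, voice):
    """Follow the FIG. 2 data flow with toy stand-ins for blocks 110, 120, 130 and 140."""
    n = len(voice)
    k = len(input_channels)
    # 110: downmix input signal 200 to a mono first downmix signal 212
    first_dmx = [sum(ch[i] for ch in input_channels) / k for i in range(n)]
    # 110: stand-in for bitstream 211 (per-channel energies, not real MPS parameters)
    mc_side = struct.pack(f">{k}f", *(sum(x * x for x in ch) for ch in input_channels))
    # 120: downmix 212 with the voice object signal to get the second downmix signal 222
    second_dmx = [a + b for a, b in zip(first_dmx, voice)]
    # 120: stand-in for bitstream 221 (voice energy, not real SAOC parameters)
    obj_side = struct.pack(">f", sum(x * x for x in voice))
    # 130: stand-in core coder producing the downmix audio bitstream 231
    dmx_bits = struct.pack(f">{n}f", *second_dmx)
    # 140: multiplex 211, 221 and 231 into the encoded bitstream 241
    return mux(mc_side, obj_side, dmx_bits)
```

Only one audio waveform (the second downmix) is carried in this flow; the other two sub-bitstreams hold side information, which is the point of chaining the two codecs.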

FIG. 3 is a diagram illustrating an audio decoding apparatus according to the first embodiment of the present invention.

Referring to FIG. 3, an audio decoding apparatus 300 according to an embodiment of the present invention may include a bitstream demultiplexer 310, a downmix audio decoder 320, a multi-object audio decoder 330, a multi-channel audio decoder 340, and a rendering unit 350.

The bitstream demultiplexer 310 may demultiplex the encoded bitstream received from the audio encoding apparatus 100 into a multi-channel audio side information bitstream in which the input signal is encoded, a multi-object audio side information bitstream in which the voice object signal is encoded, and a downmix audio bitstream in which the second downmix signal is encoded.

The bitstream demultiplexer 310 may transmit the multi-channel audio side information bitstream to the multi-channel audio decoder 340, the multi-object audio side information bitstream to the multi-object audio decoder 330, and the downmix audio bitstream to the downmix audio decoder 320.

The bitstream demultiplexer 310 may also extract rendering information included in the encoded bitstream in the form of a preset and transmit it to the rendering unit 350.

The downmix audio decoder 320 may decode the second downmix signal from the downmix audio bitstream received from the bitstream demultiplexer 310. In doing so, the downmix audio decoder 320 may decode the second downmix signal from the downmix audio bitstream based on the encoding method that the downmix audio encoder 130 used to encode the second downmix signal.

The multi-object audio decoder 330 may decode the voice object signal and the first downmix signal using the multi-object audio side information bitstream received from the bitstream demultiplexer 310 and the second downmix signal decoded by the downmix audio decoder 320. For example, the multi-object audio decoder 330 may decode the voice object signal and the first downmix signal by applying SAOC to the multi-object audio side information bitstream and the second downmix signal.

The multi-channel audio decoder 340 may decode the input signal using the multi-channel audio side information bitstream received from the bitstream demultiplexer 310 and the first downmix signal decoded by the multi-object audio decoder 330. For example, the multi-channel audio decoder 340 may decode the input signal by applying MPS to the multi-channel audio side information bitstream and the first downmix signal.
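The order in which decoders 320, 330, and 340 depend on one another can be sketched as a small driver function. The three decoders are passed in as callables because real MPS and SAOC decoding is outside the scope of this sketch; the function name, the dictionary keys, and the callable signatures are hypothetical.

```python
def decode_first_embodiment(bitstreams, core_decode, saoc_decode, mps_decode):
    """Chain the three decoders in the order described for FIG. 3.

    bitstreams: dict with assumed keys "downmix", "object_side", "channel_side".
    """
    # 320: recover the second downmix signal from the downmix audio bitstream
    second_downmix = core_decode(bitstreams["downmix"])
    # 330: recover the voice object signal and the first downmix signal
    voice, first_downmix = saoc_decode(bitstreams["object_side"], second_downmix)
    # 340: expand the first downmix signal back to the multi-channel input signal
    channels = mps_decode(bitstreams["channel_side"], first_downmix)
    return voice, channels
```

The decoding order is the reverse of the encoding order: the stage applied last at the encoder (the downmix audio codec) is undone first.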

Based on the rendering information, the rendering unit 350 may control the level of the voice object signal decoded by the multi-object audio decoder 330 and the level of the input signal decoded by the multi-channel audio decoder 340, and output the result.

By controlling the level of the voice object signal and the level of the input signal at the user's request, the rendering unit 350 allows the user to perceive the voice signal contained in the input signal more clearly.

When the encoded bitstream does not include rendering information in the form of a preset, the rendering unit 350 may instead receive rendering information entered externally by the user.
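A minimal sketch of the level control performed by the rendering unit 350 follows. The description states only that the levels of the voice object signal and the input signal are controlled from rendering information, with a preset taking effect when present and user input otherwise; the linear gain fields `voice_gain` and `background_gain`, the unity-gain fallback, and mixing the voice into the center channel are assumptions made for this example.

```python
def render(voice, channels, preset=None, user=None, voice_channel="C"):
    """Scale the decoded channels and the voice object, then mix the voice into one channel.

    preset: rendering information carried in the bitstream, if any.
    user:   rendering information supplied by the user when no preset exists.
    """
    info = preset if preset is not None else user
    if info is None:
        info = {"voice_gain": 1.0, "background_gain": 1.0}  # assumed neutral default
    out = {
        name: [info["background_gain"] * x for x in signal]
        for name, signal in channels.items()
    }
    out[voice_channel] = [
        b + info["voice_gain"] * v for b, v in zip(out[voice_channel], voice)
    ]
    return out
```

For example, `user={"voice_gain": 2.0, "background_gain": 0.5}` raises the voice relative to the background, which is the dialogue-clarity use case the description targets.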

When the encoded bitstream was produced by encoding the input signal and the voice signal with an SAOC encoder, the multi-object audio decoder 330 may include a multi-object/multi-channel transcoder.

In this case, the multi-object audio decoder 330 that includes the multi-object/multi-channel transcoder may decode the voice object signal using the multi-object audio side information bitstream and the second downmix signal, and may generate the first downmix signal by adjusting the levels of the voice object signal and the background sound based on the rendering information.

The multi-object audio decoder 330 that includes the multi-object/multi-channel transcoder may also generate, based on the multi-object audio side information bitstream and the second downmix signal, the side information needed to expand the first downmix signal to multiple channels and decode the input signal. The side information generated by this multi-object audio decoder 330 may be a transcoded multi-channel audio side information bitstream.

그리고, 다채널 오디오 복호화부(340)는 제1 다운믹스 신호와 트랜스코딩 된 다채널 오디오 부가 정보 비트스트림을 이용하여 입력 신호를 복호화할 수 있다.
In addition, the multi-channel audio decoding unit 340 may decode the input signal using the first downmix signal and the transcoded multi-channel audio side information bitstream.

FIG. 4 is a diagram illustrating the input and output of information between components of the audio decoding apparatus according to the first embodiment of the present invention.

First, the bitstream demultiplexer 310 may receive an encoded bitstream 400 from the audio encoding apparatus 100.

The bitstream demultiplexer 310 may demultiplex the received bitstream 400 into a multi-channel audio side information bitstream 413 in which the input signal is encoded, a multi-object audio side information bitstream 412 in which the voice object signal is encoded, and a downmix audio bitstream 411 in which the second downmix signal is encoded. The bitstream demultiplexer 310 may then transmit the multi-channel audio side information bitstream 413 to the multi-channel audio decoder 340, the multi-object audio side information bitstream 412 to the multi-object audio decoder 330, and the downmix audio bitstream 411 to the downmix audio decoder 320.
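
As a non-limiting illustration, the multiplexing and demultiplexing of the three sub-bitstreams can be sketched as follows. The container layout used here (a one-byte tag followed by a four-byte length and the payload) and the tag values are assumptions made only for this example; the description does not define the syntax of the encoded bitstream 400.

```python
import struct

# Hypothetical tags for the three sub-bitstreams (not defined by the description).
TAG_DOWNMIX = 1       # downmix audio bitstream 411
TAG_OBJECT_SIDE = 2   # multi-object audio side information bitstream 412
TAG_CHANNEL_SIDE = 3  # multi-channel audio side information bitstream 413

# Assumed record header: 1-byte tag, 4-byte big-endian payload length.
HEADER = struct.Struct(">BI")


def multiplex(substreams):
    """Pack {tag: payload} into a single byte string (encoder side)."""
    out = bytearray()
    for tag, payload in substreams.items():
        out += HEADER.pack(tag, len(payload))
        out += payload
    return bytes(out)


def demultiplex(bitstream):
    """Split a byte string produced by multiplex() back into {tag: payload}."""
    substreams = {}
    offset = 0
    while offset < len(bitstream):
        tag, length = HEADER.unpack_from(bitstream, offset)
        offset += HEADER.size
        substreams[tag] = bitstream[offset:offset + length]
        offset += length
    return substreams
```

In such a sketch, the payload stored under each tag would be handed to the downmix audio decoder 320, the multi-object audio decoder 330, and the multi-channel audio decoder 340, respectively.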

Next, the downmix audio decoder 320 may decode the second downmix signal 421 from the received downmix audio bitstream 411 and transmit the second downmix signal 421 to the multi-object audio decoder 330.

Then, the multi-object audio decoder 330 may decode the voice object signal 431 and the first downmix signal 432 using the received multi-object audio side information bitstream 412 and the second downmix signal 421. The multi-object audio decoder 330 may transmit the voice object signal 431 to the rendering unit 350 and the first downmix signal 432 to the multi-channel audio decoder 340.

Next, the multi-channel audio decoder 340 may decode the input signal 441 using the received multi-channel audio side information bitstream 413 and the first downmix signal 432.

Finally, the rendering unit 350 may receive rendering information supplied externally by the user and, based on that rendering information, control and output the level of the voice object signal 431 decoded by the multi-object audio decoder 330 and the level of the input signal 441 decoded by the multi-channel audio decoder 340.

In this case, the rendering unit 350 may control the level of the voice object signal 431 and the level of the input signal at the user's request, so that the user can more clearly perceive the voice signal included in the input signal.

In addition, when the encoded bitstream 400 contains rendering information in the form of a preset, the rendering unit 350 may receive the rendering information from the bitstream demultiplexer 310.
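
As a non-limiting illustration, the level control performed by the rendering unit 350 can be sketched as a pair of gains applied to the voice object signal and to one channel of the input signal. The `RenderingInfo` structure, the rule that user-supplied information takes precedence over a preset, the unity-gain fallback, and the summation of the two scaled signals into one output are all assumptions of this sketch; the description does not fix the format of the rendering information or a priority between the two sources.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RenderingInfo:
    """Hypothetical rendering information: one linear gain per signal."""
    voice_gain: float
    input_gain: float


def choose_rendering_info(preset: Optional[RenderingInfo],
                          user: Optional[RenderingInfo]) -> RenderingInfo:
    """Assumed policy: user input first, then the preset, then unity gains."""
    if user is not None:
        return user
    if preset is not None:
        return preset
    return RenderingInfo(voice_gain=1.0, input_gain=1.0)


def render(voice: Sequence[float], input_channel: Sequence[float],
           info: RenderingInfo) -> List[float]:
    """Scale both signals by their gains and sum them sample by sample."""
    if len(voice) != len(input_channel):
        raise ValueError("signals must have the same length")
    return [info.voice_gain * v + info.input_gain * x
            for v, x in zip(voice, input_channel)]
```

Raising `voice_gain` relative to `input_gain` corresponds to the case in which the user asks for the voice to be heard more clearly over the background.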

FIG. 5 is a diagram illustrating the input and output of information between the audio transcoder and its neighboring components in the audio decoding apparatus according to the first embodiment of the present invention.

When the input signal and the voice signal have been encoded into the encoded bitstream by an SAOC encoder, the multi-object audio decoder 330 may include a multi-object/multi-channel audio transcoder unit 500.

In this case, the multi-object/multi-channel audio transcoder unit 500 may decode the voice object signal 431 using the multi-object audio side information bitstream 412 and the second downmix signal 421, and may generate the first downmix signal 432 by adjusting the levels of the voice object signal and the background sound based on the rendering information. The operations of the bitstream demultiplexer 310 and the downmix audio decoder 320 are the same as in FIG. 4, so a detailed description is omitted. The multi-object/multi-channel audio transcoder unit 500 may transmit the decoded voice object signal 431 to the rendering unit 350.

In addition, the multi-object/multi-channel audio transcoder unit 500 may generate, based on the multi-object audio side information bitstream 412 and the second downmix signal 421, extension side information for expanding the first downmix signal 432 to multiple channels. The extension side information generated by the multi-object/multi-channel audio transcoder unit 500 may be a transcoded multi-channel audio side information bitstream 501. For example, the multi-object/multi-channel audio transcoder unit 500 may generate the extension side information by transcoding the multi-channel audio side information bitstream demultiplexed by the bitstream demultiplexer 310.

Next, the multi-channel audio decoder 340 may decode the input signal using the first downmix signal 432 and the transcoded multi-channel audio side information bitstream 501.

Finally, the rendering unit 350 may output a multi-channel audio signal by rendering, according to the rendering information, the voice object signal 431 received from the multi-object/multi-channel audio transcoder unit 500 together with the input signal decoded by the multi-channel audio decoder 340.

FIG. 6 is a diagram illustrating an audio encoding apparatus according to a second embodiment of the present invention.

Referring to FIG. 6, the audio encoding apparatus 600 according to an embodiment of the present invention may include a multi-object audio encoder 610, a multi-channel audio encoder 620, a downmix audio encoder 630, and a bitstream multiplexer 640.

The multi-object audio encoder 610 may generate a first downmix signal by downmixing a specific channel of the multi-channel input signal together with the voice object signal. The multi-object audio encoder 610 may also generate a multi-object audio side information bitstream by encoding the specific channel of the multi-channel input signal and the voice object signal. For example, the multi-object audio encoder 610 may encode the specific channel and the voice object signal using SAOC.

Here, the specific channel may be at least one channel located in a given direction among the plurality of channels included in the multi-channel input signal. For example, when the input signal is a 5.1-channel signal, the multi-object audio encoder 610 may receive front channels such as the FL, FR, and C channels and downmix the received front channels to generate the first downmix signal. The first downmix signal may be a mono signal, a stereo signal, or a three-channel signal consisting of modified FL, FR, and C channels.

The multi-object audio encoder 610 may then transmit the first downmix signal to the multi-channel audio encoder 620 and the multi-object audio side information bitstream to the bitstream multiplexer 640.

The multi-channel audio encoder 620 may receive the channels of the multi-channel input signal other than the specific channel. For example, when the specific channel corresponds to the front channels of a 5.1-channel signal, the remaining channels may be the LFE channel, the RL channel, and the RR channel.

In addition, the multi-channel audio encoder 620 may generate a second downmix signal by downmixing the first downmix signal received from the multi-object audio encoder 610 together with the remaining channels, and may generate a multi-channel audio side information bitstream by encoding the remaining channels. For example, the multi-channel audio encoder 620 may encode the remaining channels using MPS.

The multi-channel audio encoder 620 may then transmit the second downmix signal to the downmix audio encoder 630 and the multi-channel audio side information bitstream to the bitstream multiplexer 640.

The downmix audio encoder 630 may generate a downmix audio bitstream by encoding the second downmix signal received from the multi-channel audio encoder 620. The downmix audio encoder 630 may transmit the downmix audio bitstream to the bitstream multiplexer 640.

The bitstream multiplexer 640 may multiplex the multi-object audio side information bitstream received from the multi-object audio encoder 610, the multi-channel audio side information bitstream received from the multi-channel audio encoder 620, and the downmix audio bitstream received from the downmix audio encoder 630 into a single bitstream or package.

FIG. 7 is a diagram illustrating the input and output of information between components of the audio encoding apparatus according to the second embodiment of the present invention.

First, the multi-object audio encoder 610 may receive the front channels 701 among the signals included in the multi-channel input signal, together with the voice object signal 702.

The multi-object audio encoder 610 may then generate a first downmix signal 712 by downmixing the received front channels 701 and the voice object signal 702, and may transmit the first downmix signal 712 to the multi-channel audio encoder 620.

In addition, the multi-object audio encoder 610 may generate a multi-object audio side information bitstream 711 by encoding the specific channel of the multi-channel input signal and the voice object signal. The multi-object audio encoder 610 may transmit the multi-object audio side information bitstream 711 to the bitstream multiplexer 640.

Next, the multi-channel audio encoder 620 may receive the remaining channels 703 of the multi-channel input signal, excluding the front channels.

The multi-channel audio encoder 620 may then generate a second downmix signal 722 by downmixing the received first downmix signal 712 and the remaining channels 703, and may transmit the second downmix signal 722 to the downmix audio encoder 630.

In addition, the multi-channel audio encoder 620 may generate a multi-channel audio side information bitstream 721 by encoding the remaining channels 703, and may transmit the multi-channel audio side information bitstream 721 to the bitstream multiplexer 640.

Then, the downmix audio encoder 630 may generate a downmix audio bitstream 731 by encoding the received second downmix signal 722, and may transmit the downmix audio bitstream 731 to the bitstream multiplexer 640.

Finally, the bitstream multiplexer 640 may generate an encoded bitstream 741 by multiplexing the received multi-object audio side information bitstream 711, multi-channel audio side information bitstream 721, and downmix audio bitstream 731 into a single bitstream or package.
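
As a non-limiting illustration, the two-stage downmix of FIG. 7 can be sketched with toy signal models. Everything below is a simplification assumed for this example: the downmix is a plain sample-wise sum to a mono signal, the side information is reduced to one broadband energy value per signal, the downmix audio encoder 630 is a pass-through, and the multiplexed output is a Python dictionary. Actual SAOC and MPS encoders compute their parameters per time-frequency tile and are not reproduced here.

```python
from typing import Dict, List, Sequence

# The "specific channel" group used in the 5.1 example of the description.
FRONT_CHANNELS = ("FL", "FR", "C")


def downmix(signals: Sequence[Sequence[float]]) -> List[float]:
    """Toy downmix: sample-wise sum of equal-length signals."""
    if not signals:
        raise ValueError("at least one signal is required")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("all signals must have the same length")
    return [sum(samples) for samples in zip(*signals)]


def energy(signal: Sequence[float]) -> float:
    """Toy side information: broadband energy of one signal."""
    return sum(x * x for x in signal)


def encode(channels: Dict[str, List[float]],
           voice_object: List[float]) -> Dict[str, object]:
    front = {n: s for n, s in channels.items() if n in FRONT_CHANNELS}
    remaining = {n: s for n, s in channels.items() if n not in FRONT_CHANNELS}

    # Multi-object audio encoder 610: front channels 701 + voice object 702.
    first_downmix = downmix(list(front.values()) + [voice_object])
    object_side_info = {n: energy(s) for n, s in front.items()}
    object_side_info["voice_object"] = energy(voice_object)

    # Multi-channel audio encoder 620: first downmix 712 + remaining channels 703.
    second_downmix = downmix([first_downmix] + list(remaining.values()))
    channel_side_info = {n: energy(s) for n, s in remaining.items()}

    # Downmix audio encoder 630 (pass-through here) and multiplexer 640.
    return {
        "multi_object_side_info_711": object_side_info,
        "multi_channel_side_info_721": channel_side_info,
        "downmix_audio_731": second_downmix,
    }
```

The point of the sketch is the ordering: the voice object is folded into the front channels first, and only the resulting first downmix is combined with the remaining channels.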

FIG. 8 is a diagram illustrating an audio decoding apparatus according to the second embodiment of the present invention.

Referring to FIG. 8, the audio decoding apparatus 800 according to an embodiment of the present invention may include a bitstream demultiplexer 810, a downmix audio decoder 820, a multi-channel audio decoder 830, a multi-object audio decoder 840, and a delay unit 850.

The bitstream demultiplexer 810 may demultiplex the encoded bitstream received from the audio encoding apparatus 600 into a multi-object audio side information bitstream in which the specific channel of the input signal and the voice object signal are encoded, a multi-channel audio side information bitstream in which the remaining channels of the input signal are encoded, and a downmix audio bitstream in which the second downmix signal is encoded.

The bitstream demultiplexer 810 may then transmit the multi-object audio side information bitstream to the multi-object audio decoder 840, the multi-channel audio side information bitstream to the multi-channel audio decoder 830, and the downmix audio bitstream to the downmix audio decoder 820.

The bitstream demultiplexer 810 may also extract rendering information included in the encoded bitstream in the form of a preset and transmit it to the multi-object audio decoder 840.

The downmix audio decoder 820 may decode the second downmix signal from the downmix audio bitstream received from the bitstream demultiplexer 810. The downmix audio decoder 820 may transmit the decoded second downmix signal to the multi-channel audio decoder 830.

The multi-channel audio decoder 830 may decode the remaining channels of the input signal and the first downmix signal using the multi-channel audio side information bitstream received from the bitstream demultiplexer 810 and the second downmix signal received from the downmix audio decoder 820. Here, the remaining channels of the input signal may be those channels of the input signal that the multi-object audio encoder 610 did not encode. For example, the multi-channel audio decoder 830 may decode the remaining channels and the first downmix signal by applying MPS to the multi-channel audio side information bitstream and the second downmix signal.

In addition, the multi-channel audio decoder 830 may transmit the first downmix signal to the multi-object audio decoder 840 and the remaining channels to the delay unit 850.

The multi-object audio decoder 840 may decode the specific channel of the input signal and the voice object signal using the multi-object audio side information bitstream received from the bitstream demultiplexer 810 and the first downmix signal received from the multi-channel audio decoder 830. For example, the multi-object audio decoder 840 may decode the specific channel of the input signal and the voice object signal by applying SAOC to the multi-object audio side information bitstream and the first downmix signal.

In this case, the multi-object audio decoder 840 may control and output the level of the voice object signal and the level of the specific channel of the input signal based on the rendering information.

The delay unit 850 may apply a delay to the remaining channels decoded by the multi-channel audio decoder 830, based on the delay that arises while the multi-object audio decoder 840 decodes the specific channel, and output the delayed channels.

At the point when the multi-channel audio decoder 830 has decoded the remaining channels, the specific channel may still be encoded in the first downmix signal. The process by which the multi-object audio decoder 840 decodes the specific channel from the first downmix signal takes a certain amount of time. Therefore, if the decoded remaining channels were output as they are, they could be misaligned in playback position or time with the specific channel output by the multi-object audio decoder 840.

That is, the delay unit 850 may delay the remaining channels by the time the multi-object audio decoder 840 takes to decode the specific channel from the first downmix signal before outputting them, thereby synchronizing the specific channel output by the multi-object audio decoder 840 with the remaining channels output by the delay unit 850.
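
As a non-limiting illustration, the delay unit 850 can be sketched as a fixed-length delay line applied to each remaining channel. The class name, the sample-based delay value, and the zero-filled initial state are assumptions of this sketch; the description only requires that the remaining channels be delayed by the time the multi-object audio decoder 840 needs.

```python
from collections import deque
from typing import List, Sequence


class DelayUnit:
    """Delays one channel by a fixed number of samples across successive frames."""

    def __init__(self, delay_samples: int) -> None:
        if delay_samples < 0:
            raise ValueError("delay_samples must be non-negative")
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer = deque([0.0] * delay_samples)

    def process(self, frame: Sequence[float]) -> List[float]:
        """Push a frame in and return the equally long delayed frame."""
        out = []
        for sample in frame:
            self._buffer.append(sample)
            out.append(self._buffer.popleft())
        return out
```

One `DelayUnit` instance per remaining channel, all configured with the decoding latency of the multi-object audio decoder 840 expressed in samples, would keep the remaining channels aligned with the specific channel.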

FIG. 9 is a diagram illustrating the input and output of information between components of the audio decoding apparatus according to the second embodiment of the present invention.

First, the bitstream demultiplexer 810 may receive an encoded bitstream 901 from the audio encoding apparatus 600. The bitstream demultiplexer 810 may then demultiplex the encoded bitstream 901 into a multi-object audio side information bitstream 911 in which the specific channel of the input signal and the voice object signal are encoded, a multi-channel audio side information bitstream 913 in which the remaining channels of the input signal are encoded, and a downmix audio bitstream 912 in which the second downmix signal is encoded.

The bitstream demultiplexer 810 may then transmit the multi-object audio side information bitstream 911 to the multi-object audio decoder 840, the multi-channel audio side information bitstream 913 to the multi-channel audio decoder 830, and the downmix audio bitstream 912 to the downmix audio decoder 820.

Next, the downmix audio decoder 820 may decode the second downmix signal 921 from the received downmix audio bitstream 912. The downmix audio decoder 820 may transmit the decoded second downmix signal 921 to the multi-channel audio decoder 830.

Then, the multi-channel audio decoder 830 may decode the remaining channels 932 of the input signal and the first downmix signal 931 using the received multi-channel audio side information bitstream 913 and the second downmix signal 921. Here, the remaining channels of the input signal may be those channels of the input signal that the multi-object audio encoder 610 did not encode. The multi-channel audio decoder 830 may transmit the first downmix signal 931 to the multi-object audio decoder 840 and the remaining channels 932 to the delay unit 850.

Next, the multi-object audio decoder 840 may decode the specific channel 941 of the input signal and the voice object signal using the received multi-object audio side information bitstream 911 and the first downmix signal 931. The multi-object audio decoder 840 may control and output the level of the voice object signal and the level of the specific channel 941 of the input signal based on the rendering information.

The delay unit 850 may then delay the output of the remaining channels 932 by the time the multi-object audio decoder 840 takes to decode the specific channel 941. Specifically, the delay unit 850 may delay the remaining channels 932 to synchronize them with the specific channel 941 output by the multi-object audio decoder 840, and may output the synchronized remaining channels 951.

FIG. 10 is a flowchart illustrating an audio encoding method according to the first embodiment of the present invention.

In step 1010, the multi-channel audio encoder 110 may generate a first downmix signal by downmixing the input signal, and may generate a multi-channel audio side information bitstream by encoding the input signal.

In step 1020, the multi-object audio encoder 120 may receive the voice object signal and the first downmix signal generated in step 1010. The multi-object audio encoder 120 may then generate a second downmix signal by downmixing the received first downmix signal and the voice object signal, and may generate a multi-object audio side information bitstream by encoding the voice object signal.

In step 1030, the downmix audio encoder 130 may encode the second downmix signal generated in step 1020.

In step 1040, the bitstream multiplexer 140 may generate an encoded bitstream by multiplexing the multi-channel audio side information bitstream generated in step 1010, the multi-object audio side information bitstream generated in step 1020, and the downmix audio bitstream generated in step 1030 into a single bitstream structure.

FIG. 11 is a flowchart illustrating an audio decoding method according to the first embodiment of the present invention.

In step 1110, the bitstream demultiplexer 310 may receive an encoded bitstream from the audio encoding apparatus 100. The bitstream demultiplexer 310 may then demultiplex the received encoded bitstream into a multi-channel audio side information bitstream in which the input signal is encoded, a multi-object audio side information bitstream in which the voice object signal is encoded, and a downmix audio bitstream in which the second downmix signal is encoded.

The bitstream demultiplexer 310 may also extract rendering information included in the encoded bitstream in the form of a preset.

In step 1120, the downmix audio decoder 320 may decode the second downmix signal from the downmix audio bitstream obtained in step 1110.

In step 1130, the multi-object audio decoder 330 may decode the voice object signal and the first downmix signal using the multi-object audio side information bitstream obtained in step 1110 and the second downmix signal decoded in step 1120.

In step 1140, the multi-channel audio decoder 340 may decode the input signal using the multi-channel audio side information bitstream obtained in step 1110 and the first downmix signal decoded in step 1130.

In step 1150, the rendering unit 350 may control and output the level of the voice object signal decoded in step 1130 and the level of the input signal decoded in step 1140, based on the rendering information extracted in step 1110 or on rendering information supplied externally.
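
As a non-limiting illustration, steps 1130 to 1150 can be sketched with a toy parametric model. The sketch assumes that steps 1110 and 1120 have already produced a mono second downmix signal and that each side information bitstream has been reduced to one broadband energy value per component; each component is then estimated by scaling the downmix with a Wiener-style gain (its energy divided by the total energy). The dictionary keys, the function names, and the gain parameters are hypothetical, and actual SAOC and MPS decoding is considerably more elaborate.

```python
from typing import Dict, List, Sequence, Tuple


def split_by_energy(downmix: Sequence[float],
                    energies: Dict[str, float]) -> Dict[str, List[float]]:
    """Estimate each component as downmix * (energy_i / total energy)."""
    total = sum(energies.values())
    if total <= 0:
        raise ValueError("total energy must be positive")
    return {name: [sample * e / total for sample in downmix]
            for name, e in energies.items()}


def decode(second_downmix: Sequence[float],
           object_side_info: Dict[str, float],
           channel_side_info: Dict[str, float],
           voice_gain: float = 1.0,
           input_gain: float = 1.0) -> Tuple[List[float], Dict[str, List[float]]]:
    # Step 1130: recover the voice object and the first downmix signal.
    parts = split_by_energy(second_downmix, object_side_info)
    voice = parts["voice_object"]
    first_downmix = parts["first_downmix"]

    # Step 1140: expand the first downmix signal into the input channels.
    input_channels = split_by_energy(first_downmix, channel_side_info)

    # Step 1150: apply the rendering gains to both outputs.
    rendered_voice = [voice_gain * v for v in voice]
    rendered_input = {name: [input_gain * x for x in signal]
                      for name, signal in input_channels.items()}
    return rendered_voice, rendered_input
```

The ordering mirrors the flowchart: the object-level split precedes the channel-level expansion, and the level control is applied last.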

도 12는 본 발명의 제2 실시예에 따른 오디오 부호화 방법을 도시한 플로우차트이다.12 is a flowchart showing an audio encoding method according to a second embodiment of the present invention.

단계(1210)에서 다객체 오디오 부호화부(610)는 다채널 입력 신호의 특정 채널과 음성 객체 신호를 다운믹스하여 제1 다운믹스 신호를 생성할 수 있다. 또한, 다객체 오디오 부호화부(610)는 다채널 입력 신호의 특정 채널과 음성 객체 신호를 부호화하여 다객체 오디오 부가 정보 비트스트림을 생성할 수 있다.In operation 1210, the multi-object audio encoder 610 may downmix a specific channel of the multi-channel input signal and an audio object signal to generate a first downmix signal. In addition, the multi-object audio encoder 610 may generate a multi-object audio side information bitstream by encoding a specific channel of a multi-channel input signal and an audio object signal.

단계(1220)에서 다채널 오디오 부호화부(620)는 다채널 입력 신호에서 특정 채널을 제외한 나머지 채널을 수신할 수 있다. 또한, 다채널 오디오 부호화부(620)는 단계(1210)에서 생성한 제1 다운믹스 신호와 수신한 나머지 신호를 다운믹스하여 제2 다운믹스 신호를 생성하고, 나머지 채널을 부호화하여 다채널 오디오 부가정보 비트스트림을 생성할 수 있다.In operation 1220, the multi-channel audio encoder 620 may receive channels other than a specific channel from the multi-channel input signal. In addition, the multi-channel audio encoder 620 downmixes the first downmix signal generated in step 1210 and the other received signal to generate a second downmix signal, and encodes the remaining channels to add multi-channel audio. You can create an information bitstream.

In step 1230, the downmix audio encoder 630 may generate a downmix audio bitstream by encoding the second downmix signal generated in step 1220.

In step 1240, the bitstream multiplexer 640 may generate an encoded bitstream by multiplexing the multi-object audio side information bitstream generated in step 1210, the multi-channel audio side information bitstream generated in step 1220, and the downmix audio bitstream generated in step 1230 into a single bitstream or package.
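One simple way to package the three bitstreams of step 1240 into a single byte sequence is length-prefixed framing. The patent does not define a container format, so the stream identifiers, the 5-byte header layout, and the function names below are purely hypothetical. The inverse function illustrates the kind of demultiplexing a decoder would need.

```python
import struct

# Hypothetical stream identifiers; not taken from the patent or any standard.
STREAM_IDS = {"object_side_info": 0, "channel_side_info": 1, "downmix_audio": 2}
HEADER = ">BI"  # 1-byte stream id, 4-byte big-endian payload length

def multiplex(streams):
    """Pack named byte strings as a sequence of [id][length][payload] frames."""
    out = bytearray()
    for name, payload in streams.items():
        out += struct.pack(HEADER, STREAM_IDS[name], len(payload))
        out += payload
    return bytes(out)

def demultiplex(packed):
    """Recover the named byte strings produced by multiplex()."""
    names = {stream_id: name for name, stream_id in STREAM_IDS.items()}
    streams, pos = {}, 0
    while pos < len(packed):
        stream_id, length = struct.unpack_from(HEADER, packed, pos)
        pos += struct.calcsize(HEADER)
        streams[names[stream_id]] = packed[pos:pos + length]
        pos += length
    return streams
```

Because every frame carries its own length, the demultiplexer can split the package without understanding the payloads.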

FIG. 13 is a flowchart illustrating an audio decoding method according to a second embodiment of the present invention.

In step 1310, the bitstream demultiplexer 810 may receive an encoded bitstream from the audio encoding apparatus 600. The bitstream demultiplexer 810 may then demultiplex the received encoded bitstream into a multi-object audio side information bitstream in which a specific channel of the input signal and the voice object signal are encoded, a multi-channel audio side information bitstream in which the remaining channels of the input signal are encoded, and a downmix audio bitstream in which the second downmix signal is encoded.

In addition, the bitstream demultiplexer 810 may extract rendering information included in the encoded bitstream in the form of a preset.

In step 1320, the downmix audio decoder 820 may decode the second downmix signal from the downmix audio bitstream obtained in step 1310.

In step 1330, the multi-channel audio decoder 830 may decode the remaining channels of the input signal and the first downmix signal using the multi-channel audio side information bitstream obtained in step 1310 and the second downmix signal decoded in step 1320. Here, the remaining channels of the input signal may be those channels of the input signal that the multi-object audio encoder 610 did not encode.

In step 1340, the multi-object audio decoder 840 may decode the specific channel of the input signal and the voice object signal using the multi-object audio side information bitstream obtained in step 1310 and the first downmix signal decoded in step 1330.

In step 1350, the multi-object audio decoder 840 may control and output the level of the voice object signal decoded in step 1340 and the level of the specific channel of the input signal, based on the rendering information extracted in step 1310 or on rendering information input from the outside.

In step 1360, the delay unit 850 may apply a delay to the remaining channels decoded in step 1330, based on the delay incurred while step 1340 is performed, and output the delayed channels.

Specifically, the delay unit 850 delays the remaining channels by the time required to perform step 1340 before outputting them, so that the specific channel output in step 1350 and the remaining channels output in step 1360 are synchronized.
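The delay compensation of step 1360 amounts to shifting the remaining channels by the latency of the multi-object decoding path. The following is a minimal sketch under two assumptions: the latency is known as a whole number of samples, and the block length is preserved by zero-padding the start and dropping the tail. The function name `apply_delay` is hypothetical.

```python
import numpy as np

def apply_delay(remaining_channels, delay_samples):
    """Delay every channel (row) by delay_samples, keeping the block length.

    The start of each channel is filled with zeros and the samples pushed
    past the end of the block are discarded (an assumption for this sketch;
    a streaming decoder would carry them over to the next block instead).
    """
    remaining_channels = np.atleast_2d(np.asarray(remaining_channels, dtype=float))
    if delay_samples <= 0:
        return remaining_channels.copy()
    n = remaining_channels.shape[1]
    delayed = np.zeros_like(remaining_channels)
    if delay_samples < n:
        delayed[:, delay_samples:] = remaining_channels[:, :n - delay_samples]
    return delayed
```

With the same `delay_samples` applied to every remaining channel, those channels line up in time with the specific channel and voice object that pass through the additional multi-object decoding stage.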

By encoding an audio signal with a combination of a parametric multi-channel audio codec, represented by MPS, and a parametric multi-object audio codec, represented by SAOC, the present invention can improve the sound quality of a voice signal in stereo and multi-channel audio systems, or control the volume difference between the voice signal and ambient sound so that the voice signal is reproduced clearly.

As described above, although the present invention has been described with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations based on this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set forth below and their equivalents.

110: multi-channel audio encoder
120: multi-object audio encoder
130: downmix audio encoder
140: bitstream multiplexer

Claims

delete

An audio encoding apparatus comprising:
a multi-object audio encoder configured to generate a first downmix signal by downmixing a voice object signal and a multi-channel audio signal corresponding to M designated channels among N channels, and to generate a multi-object audio side information bitstream by encoding the multi-channel audio signal corresponding to the M channels and the voice object signal, wherein M is a number less than N;
a multi-channel audio encoder configured to generate a second downmix signal by downmixing the first downmix signal and a multi-channel audio signal corresponding to the remaining channels excluding the M channels among the N channels, and to generate a multi-channel audio side information bitstream by encoding the multi-channel audio signal corresponding to the remaining channels;
a downmix audio encoder configured to generate a downmix audio bitstream by encoding the second downmix signal; and
a bitstream multiplexer configured to output an encoded bitstream by multiplexing the multi-object audio side information bitstream, the multi-channel audio side information bitstream, and the downmix audio bitstream.

delete

An audio decoding apparatus comprising:
a bitstream demultiplexer configured to demultiplex an encoded bitstream received from an audio encoding apparatus into a multi-object audio side information bitstream, a multi-channel audio side information bitstream, and a downmix audio bitstream, wherein the multi-object audio side information bitstream is obtained by encoding a voice object signal and a multi-channel audio signal corresponding to M designated channels among N channels, the multi-channel audio side information bitstream is obtained by encoding a multi-channel audio signal corresponding to the remaining channels excluding the M channels among the N channels, and M is a number less than N;
a downmix audio decoder configured to decode a second downmix signal from the downmix audio bitstream;
a multi-channel audio decoder configured to decode the remaining channels and a first downmix signal using the multi-channel audio side information bitstream and the second downmix signal; and
a multi-object audio decoder configured to decode the M channels and the voice object signal using the first downmix signal and the multi-object audio side information bitstream.

The audio decoding apparatus of claim 13, further comprising:
a delay unit configured to apply a delay to the remaining channels, based on the delay incurred while the multi-object audio decoder decodes the M channels, and to output the delayed remaining channels.

The audio decoding apparatus of claim 13,
wherein the bitstream demultiplexer extracts rendering information included in the encoded bitstream in the form of a preset.

The audio decoding apparatus of claim 13,
wherein the multi-object audio decoder controls and outputs the levels of the voice object signal and the M channels based on rendering information.

delete