KR20060122695A

KR20060122695A - Method and apparatus for decoding audio signal

Info

Publication number: KR20060122695A
Application number: KR1020060030670A
Authority: KR
Inventors: 오현오; 방희석; 김동수; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2005-05-26
Filing date: 2006-04-04
Publication date: 2006-11-30

Abstract

A method and an apparatus for decoding an audio signal are provided that use spatial information extracted from a multichannel signal. An encoding apparatus (1) includes a spatial encoder (10), a core encoder (20) that encodes a downmix audio signal, and a multiplexer (30) that multiplexes the encoded downmix audio signal and the spatial information. When an audio signal is input over N channels, a downmix unit (11) downmixes the input audio signal to produce a downmix channel. The core encoder encodes the received downmix audio signal and passes the encoded signal to the multiplexer. A spatial information extraction unit (12) extracts the spatial information from the multichannel signal and transmits it to the multiplexer.

Description

Method and apparatus for decoding audio signal

FIG. 1 illustrates an embodiment of a signal encoding apparatus and a signal decoding apparatus according to the present invention.

FIG. 2 illustrates an audio bitstream structure according to the present invention.

FIG. 3 illustrates an example of a decoding method and apparatus according to the present invention.

FIG. 4 illustrates a first embodiment of the virtual surround rendering process and the spatial information conversion process according to the present invention.

FIG. 5 illustrates a second embodiment of the virtual surround rendering process and the spatial information conversion process according to the present invention.

FIGS. 6A and 6B illustrate an example of a channel mapping process according to the present invention.

FIG. 7 illustrates an example of the filter coefficients for each channel according to the present invention.

FIGS. 8 to 10 illustrate in detail examples of the process of generating filter coefficients according to the present invention.

* Explanation of reference numerals for main parts of the drawings *

1: encoding apparatus; 2: decoding apparatus

10: spatial encoder; 11: downmix unit

12: spatial information extraction unit; 20: core encoder

30: multiplexer; 40, 300: demultiplexer

50, 310: spatial information conversion unit; 60, 320: audio decoder

311: spatial information decoder; 312, 400, 500: spatial information converter

321: core decoder; 322, 510: virtual surround renderer

401, 501: channel mapping unit

402, 502, 800, 900, 1000: coefficient generator

403, 503, 810, 910, 1010: integrator/domain converter

411, 412, 511: domain converter; 413: renderer L

414: renderer R; 416, 417: adder

417, 418, 513, 514: inverse domain converter

512: renderer M

601, 602, 603, 604, 605, 611, 612, 613, 614, 615: OTT box

811, 912, 1012: integrator; 812, 911, 1000: interpolator

813, 913, 1013: domain converter

The present invention relates to a method and apparatus for decoding an audio signal and, more particularly, to an effective method of generating a virtual surround signal from a received audio signal.

Standards for digital video and digital audio specify how each signal is compressed and reconstructed. Standards for digital systems, in turn, specify how compressed video and audio are each divided into packets of a fixed size and multiplexed for transmission together with timing information, stream-related information, and the like, and conversely how demultiplexing recovers the timing and stream-related information and separates the compressed video and audio.

Recently, various coding techniques and methods for digital audio signals have been developed, and related products are being produced. Coding methods for multichannel audio signals that use a psychoacoustic model are also being developed, and standardization work on them is in progress.

The psychoacoustic model exploits the way humans perceive sound, for example the facts that a quiet sound following a loud sound is not heard and that only sounds with frequencies of 20 Hz to 20,000 Hz are audible. By removing signal components that are unnecessary in the coding process, it effectively reduces the amount of data required.

Audio standard technologies such as MPEG-1 audio, MPEG-4 AAC (advanced audio coding), and MPEG-4 HE-AAC (high-efficiency AAC) have been developed and commercialized.

However, no method has been specifically proposed for decoding an audio signal so as to generate a virtual surround signal from an audio bitstream that includes spatial information, and this has caused many problems in processing audio signals efficiently.

The present invention has been made to solve the above problem, and its object is to provide a method and apparatus for decoding an audio signal that provide a virtual stereophonic effect in an audio system.

To achieve the above object, the present invention provides a method of decoding an audio signal, comprising: (a) demultiplexing a received bitstream into a core codec bitstream and a spatial information bitstream; (b) decoding the core codec bitstream to generate a decoded downmix signal; (c) decoding the spatial information bitstream to generate spatial information; and (d) generating a virtual surround signal from the decoded downmix signal using the spatial information.
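Steps (a) to (d) can be read as a simple processing pipeline. The following is a minimal sketch of that control flow only, not the patented implementation: the function name `decode_audio` and the four callables passed to it are hypothetical placeholders for the demultiplexer, core decoder, spatial information decoder, and renderer.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]
Stereo = Tuple[Signal, Signal]


def decode_audio(
    bitstream: bytes,
    demultiplex: Callable[[bytes], Tuple[bytes, bytes]],
    decode_core: Callable[[bytes], Signal],
    decode_spatial: Callable[[bytes], Dict[str, float]],
    render: Callable[[Signal, Dict[str, float]], Stereo],
) -> Stereo:
    """Run steps (a)-(d) in order; all four stages are supplied by the caller."""
    # (a) split the received bitstream into its two sub-streams
    core_bits, spatial_bits = demultiplex(bitstream)
    # (b) decode the core codec bitstream into the downmix signal
    downmix = decode_core(core_bits)
    # (c) decode the spatial information bitstream
    spatial_info = decode_spatial(spatial_bits)
    # (d) render a virtual surround signal from the downmix and spatial info
    return render(downmix, spatial_info)
```

Passing the stages in as callables keeps the sketch independent of any particular core codec, which matches the statement later in the description that the core codec is not restricted to one format.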

The present invention also provides a method of decoding an audio signal, comprising: demultiplexing a received bitstream and generating a downmixed signal and spatial information from the demultiplexed bitstream; generating filter coefficients using the spatial information and filter information; and generating a virtual surround signal from the downmixed signal using the filter coefficients.

The present invention also provides an apparatus for decoding an audio signal, comprising: a demultiplexer that demultiplexes a received bitstream into a core codec bitstream and a spatial information bitstream; a spatial information conversion unit that decodes the spatial information bitstream to generate spatial information and converts the spatial information for generating a virtual surround signal; and an audio decoder that decodes the core codec bitstream to generate a decoded downmix signal and generates a virtual surround signal from the decoded downmix signal using the converted spatial information.

Therefore, according to the present invention, a decoding apparatus that receives an audio bitstream, generated by downmixing a multichannel signal into a downmix channel and extracting the spatial information of the multichannel signal, can decode it so as to obtain a virtual surround effect even in an environment where the multichannel signal cannot be generated.

Hereinafter, preferred embodiments of the present invention that can concretely achieve the above object are described with reference to the accompanying drawings.

The terms used in the present invention have been chosen, as far as possible, from general terms in wide current use. In certain cases, however, terms were selected arbitrarily by the applicant, and their meanings are described in detail in the corresponding part of the description. The invention should therefore be understood through the meanings the terms carry rather than through their names alone.

In the present invention, "spatial information" means the information required for a decoding apparatus to generate a multichannel signal by upmixing a signal that the encoding apparatus downmixed from the multichannel signal and transmitted. The spatial information is described here in terms of spatial parameters, but the invention is obviously not limited to this. The spatial parameters include the CLD (channel level difference), meaning the energy difference between two channels; the ICC (inter-channel coherence), meaning the correlation between two channels; and the CPC (channel prediction coefficients), the prediction coefficients used when generating three channels from two channels.
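To make the CLD and ICC definitions concrete, the sketch below computes both for a pair of sample blocks. It rests on assumptions the text does not state: the CLD is expressed in decibels as a ratio of block energies, and the ICC is taken as the normalized cross-correlation at lag zero. The function names `cld_db` and `icc` are illustrative.

```python
import math


def cld_db(ch1, ch2, eps=1e-12):
    """Channel level difference in dB (assumed form: 10*log10(E1/E2))."""
    e1 = sum(x * x for x in ch1)
    e2 = sum(x * x for x in ch2)
    return 10.0 * math.log10((e1 + eps) / (e2 + eps))


def icc(ch1, ch2, eps=1e-12):
    """Inter-channel coherence (assumed form: normalized correlation at lag 0)."""
    e1 = sum(x * x for x in ch1)
    e2 = sum(x * x for x in ch2)
    cross = sum(a * b for a, b in zip(ch1, ch2))
    return cross / math.sqrt(e1 * e2 + eps)


left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]   # same waveform at half the amplitude
level = cld_db(left, right)      # about 6.02 dB: left has 4x the energy
coherence = icc(left, right)     # about 1.0: the channels are fully correlated
```

In a practical codec these values would be estimated per time/frequency tile rather than over a whole block; the sketch ignores that for brevity.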

In the present invention, "core codec" refers to a codec that codes an audio signal other than the spatial information, that is, other than the spatial parameters. The downmix audio signal is used here as the example of such an audio signal. The core codec may be MP3, AC-3, DTS, or AAC, and the signal may also be an uncompressed PCM signal. Any codec that performs a codec function on an audio signal may be used, including codecs developed in the future as well as existing ones.

In the present invention, "channel splitting unit" means a unit that splits a given number of input channels into a given number of output channels different from the number of input channels. For example, the channel splitting unit may be described in terms of a TTT (two-to-three, hereinafter "TTT") unit or TTT box, which converts two input channels into three output channels, or an OTT (one-to-two, hereinafter "OTT") unit or OTT box, which converts one input channel into two output channels. The channel splitting unit of the present invention is not limited to TTT and OTT units, however, and obviously applies to any numbers of input and output channels.
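As an illustration of what a one-to-two split can look like, the sketch below derives two gains from a CLD value so that the output energies have the requested ratio and sum to the input energy. This is an assumption-laden simplification: the CLD is taken in dB, the ICC and any decorrelation are ignored, and `ott_gains` and `ott_split` are hypothetical names, not functions defined by the patent or by any standard.

```python
import math


def ott_gains(cld_db):
    """Return (g1, g2) with g1^2 / g2^2 = 10^(CLD/10) and g1^2 + g2^2 = 1."""
    ratio = 10.0 ** (cld_db / 10.0)      # energy ratio E1/E2 implied by the CLD
    g1 = math.sqrt(ratio / (1.0 + ratio))
    g2 = math.sqrt(1.0 / (1.0 + ratio))
    return g1, g2


def ott_split(mono, cld_db):
    """Split one input channel into two scaled output channels."""
    g1, g2 = ott_gains(cld_db)
    return [g1 * x for x in mono], [g2 * x for x in mono]


# A CLD of 0 dB splits the energy evenly between the two outputs.
out1, out2 = ott_split([1.0, 2.0], 0.0)
```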

FIG. 1 illustrates an embodiment of a signal encoding apparatus and a signal decoding apparatus according to the present invention. It serves, for example, to describe an encoding apparatus and a decoding apparatus for audio signals in MPEG Surround.

FIG. 1 shows an encoding apparatus 1 and a decoding apparatus 2 for audio signal processing. Although the present invention is described with respect to audio signals, it is also applicable to the processing of signals other than audio signals.

The encoding apparatus 1 comprises a spatial encoder 10, which includes a downmix unit 11 and a spatial information extraction (spatial parameter estimation) unit 12; a core encoder 20, which encodes the downmix audio signal; and a multiplexer 30, which multiplexes the encoded downmix audio signal and the spatial information.

When an audio signal is input over N channels, the downmix unit 11 downmixes the input audio signal to a specific number of channels according to predetermined downmix information or an external control command, thereby generating a downmix channel. The downmix audio signal output on the downmix channel is input to the core encoder 20.

The downmix channel may consist of one channel or two channels, or of a specific number of channels according to a downmix command; the number of downmix channels is configurable. Optionally, a downmix audio signal supplied directly from outside, that is, an artistic downmix signal, may be used as the downmix audio signal.
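A downmix of N input channels to a configurable number of output channels can be sketched as a matrix of gains applied sample by sample. The gain matrix stands in for the "predetermined downmix information"; the equal-weight values used in the example are an assumption for illustration, and `downmix` is a hypothetical helper name.

```python
def downmix(channels, weights):
    """Mix N equal-length input channels into len(weights) downmix channels.

    channels: list of N lists of samples.
    weights:  one row of N gains per downmix channel (assumed downmix info).
    """
    n_samples = len(channels[0])
    if any(len(ch) != n_samples for ch in channels):
        raise ValueError("all input channels must have the same length")
    mixed = []
    for row in weights:
        if len(row) != len(channels):
            raise ValueError("each weight row needs one gain per input channel")
        mixed.append(
            [sum(g * ch[t] for g, ch in zip(row, channels)) for t in range(n_samples)]
        )
    return mixed


# Three input channels mixed to one (mono) downmix channel with equal weights.
three_ch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mono = downmix(three_ch, [[1 / 3, 1 / 3, 1 / 3]])
```

Adding a second row to `weights` would produce a stereo downmix from the same inputs, which mirrors the configurable channel count described above.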

The core encoder 20 receives the downmix audio signal transmitted on the downmix channel and encodes it. The encoded downmix audio signal is input to the multiplexer 30.

The spatial information extraction unit 12 extracts spatial information from the multichannel signal and transmits the extracted spatial information to the multiplexer 30.

The multiplexer 30 receives the encoded downmix audio signal from the core encoder 20 and the spatial information from the spatial information extraction unit 12. It multiplexes the two to generate a bitstream and transmits the bitstream to the decoding apparatus 2. The bitstream includes a core codec bitstream and a spatial information bitstream, and is described with reference to FIG. 2.

The decoding apparatus 2 comprises a demultiplexer 40, a spatial information conversion unit 50, and an audio decoder 60.

The demultiplexer 40 receives the bitstream and demultiplexes it into a core codec bitstream and a spatial information bitstream.

The spatial information conversion unit 50 receives the spatial information bitstream from the demultiplexer 40, decodes it to generate spatial information, and receives filter information from outside. To generate a virtual surround signal using the generated spatial information and the received filter information, it converts the spatial information into a form that can be applied to the output signal (for example, a stereo output). The present invention can therefore decode to virtual surround in addition to stereo. The filter information is described here using HRTF (head-related transfer function) parameters as an example, but the invention is obviously not limited to this.

The audio decoder 60 receives the core codec bitstream from the demultiplexer 40 and generates a decoded downmix signal. Using the decoded downmix signal and the spatial information received from the spatial information conversion unit 50, it generates and outputs a virtual surround signal. The spatial information here is the spatial information converted for generating the virtual surround signal. The virtual surround signal is, for example, a signal that provides a virtual stereophonic effect in an audio system that has only stereo devices, although the present invention is obviously also applicable to other audio systems. The rendering performed by the audio decoder 60 can take various forms depending on the configured mode.

The spatial information conversion unit 50 and the audio decoder 60 are described in detail with reference to FIGS. 3 to 10.

Thus, when the encoding apparatus 1 downmixes a multichannel audio signal to a stereo or mono audio signal and transmits it together with the spatial information of the multichannel audio signal, instead of transmitting the multichannel audio signal directly, a decoding apparatus 2 that includes the spatial information conversion unit 50 and the audio decoder 60 according to the present invention lets the user experience a virtual 3D effect even when the output is a mono or stereo channel rather than multichannel.

FIG. 2 illustrates an audio bitstream structure according to the present invention. The audio bitstream includes spatial information.

Referring to FIG. 2, one frame of the audio payload includes a field containing coded audio data and an ancillary data field. The coded spatial information may be carried in the ancillary data field. For example, when the audio payload is 48 to 128 kbps, the spatial information may occupy roughly 5 to 32 kbps; this is only an example and no limitation is implied.
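The frame layout of FIG. 2, and the multiplexing and demultiplexing around it, can be sketched with a toy byte format. The layout below (two big-endian 16-bit length fields followed by the coded audio and then the ancillary data) is purely hypothetical and is not the syntax of any real codec; `pack_frame` and `unpack_frame` are illustrative names.

```python
import struct

HEADER = ">HH"                      # hypothetical header: two 16-bit lengths
HEADER_SIZE = struct.calcsize(HEADER)


def pack_frame(coded_audio: bytes, spatial_info: bytes) -> bytes:
    """Multiplex coded audio and spatial info (ancillary data) into one frame."""
    return struct.pack(HEADER, len(coded_audio), len(spatial_info)) + coded_audio + spatial_info


def unpack_frame(frame: bytes):
    """Demultiplex a frame back into (coded_audio, spatial_info)."""
    audio_len, anc_len = struct.unpack_from(HEADER, frame, 0)
    audio_end = HEADER_SIZE + audio_len
    coded_audio = frame[HEADER_SIZE:audio_end]
    spatial_info = frame[audio_end:audio_end + anc_len]
    if len(coded_audio) != audio_len or len(spatial_info) != anc_len:
        raise ValueError("truncated frame")
    return coded_audio, spatial_info


frame = pack_frame(b"core-codec-bits", b"cld-icc")
core_bits, spatial_bits = unpack_frame(frame)
```

Because the spatial information sits in its own field, a decoder that does not understand it can still read the coded audio field, which is the practical benefit of carrying it as ancillary data.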

FIG. 3 illustrates an example of a decoding method and apparatus according to the present invention.

Referring to FIG. 3, the decoding apparatus comprises a demultiplexer 300, which demultiplexes a received bitstream into a core codec bitstream and a spatial information bitstream; an audio decoder 320, which includes a core decoder 321 and a virtual surround renderer (pseudo-surround renderer) 322; and a spatial information conversion unit 310, which includes a spatial information decoder 311 and a spatial information converter 312.

The spatial information decoder 311 receives the spatial information bitstream sent by the demultiplexer 300 and decodes it to generate spatial information. The spatial information converter 312 receives filter information and the spatial information, converts the spatial information into a form suitable for generating a virtual surround signal, and outputs the converted spatial information to the virtual surround renderer 322. The converted spatial information may be, for example, filter coefficients.

The core decoder 321 receives the core codec bitstream sent by the demultiplexer 300 and outputs a decoded downmix signal. For example, if the encoding apparatus downmixed the multichannel signal to a mono or stereo channel, the decoded downmix signal is a mono or stereo channel. The embodiments describe the downmix channel as a mono or stereo channel, but the present invention is not limited to this and applies regardless of the number of downmix channels.

The virtual surround renderer 322 receives the decoded downmix signal from the core decoder 321 and the converted spatial information from the spatial information converter 312, performs virtual surround rendering of the decoded downmix signal, and outputs a virtual surround signal. For example, when the output is a stereo channel, the virtual surround signal is a pseudo-surround stereo output with virtual three-dimensional sound. A stereo output is used here as the example, but the present invention applies regardless of the number of output channels.

For example, the spatial information of MPEG Surround includes the CLD, CPC, and ICC. Of these, the CLD represents the relative level difference between channels. In encoding, the downmix from the multichannel signal to the downmix channel (for example, one or two channels) proceeds in multiple stages using channel conversion units (for example, OTT boxes or TTT boxes), and spatial information is generated for each channel conversion unit at each stage. The decoding apparatus decodes this spatial information and uses it to apply spatial information to the downmix signal, thereby generating a signal with virtual surround. The present invention does not generate a multichannel signal by spatially upmixing the downmix signal; instead, it extracts only the spatial information of each stage and performs virtual surround rendering using the extracted spatial information.

One virtual surround rendering method, for example, is HRTF (head-related transfer function, hereinafter "HRTF") filtering. The spatial information consists of values that can be applied in the hybrid filterbank domain defined in MPEG Surround. Depending on the domain, virtual surround rendering can be performed in the following ways.

First, the downmix signal is passed through a hybrid filterbank and virtual surround rendering is performed in the hybrid domain. In this case, no domain conversion of the spatial information is required.

Second, virtual surround rendering is performed in the time domain. This method relies on the fact that an HRTF filter is modeled as an FIR or IIR filter in the time domain, and it requires converting the spatial information into time-domain filter coefficients.
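For the time-domain case, applying an FIR-modeled filter amounts to convolving the downmix signal with the filter taps. The sketch below shows a direct-form FIR convolution; the taps are arbitrary illustrative values rather than real HRTF coefficients, and `fir_filter` is a hypothetical helper name.

```python
def fir_filter(signal, taps):
    """Full linear convolution of a signal with FIR taps (direct form)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h     # each input sample excites every tap
    return out


# A two-tap filter applied to a three-sample signal.
rendered = fir_filter([1.0, 2.0, 3.0], [1.0, 1.0])   # [1.0, 3.0, 5.0, 3.0]
```

The cost grows with the product of signal length and tap count, which is one reason long HRTF responses motivate the frequency-domain alternative.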

Third, virtual surround rendering is performed in a different frequency domain, for example the DFT (discrete Fourier transform) domain. This method requires converting the spatial information into that domain. Because it replaces time-domain filtering with operations in the frequency domain, it enables fast computation.
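The equivalence behind the DFT-domain method is that pointwise multiplication of zero-padded spectra reproduces time-domain convolution. The sketch below demonstrates this with a plain O(n²) DFT so that it needs no external library; a practical implementation would use an FFT to obtain the speed advantage. The names `dft` and `dft_filter` are illustrative.

```python
import cmath


def dft(x, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def dft_filter(signal, taps):
    """Filter by multiplying spectra; zero-padding avoids circular wrap-around."""
    n = len(signal) + len(taps) - 1
    x = list(signal) + [0.0] * (n - len(signal))
    h = list(taps) + [0.0] * (n - len(taps))
    spectrum = [a * b for a, b in zip(dft(x), dft(h))]
    return [v.real for v in dft(spectrum, inverse=True)]


# Matches direct convolution of [1, 2, 3] with [1, 1] up to rounding error.
rendered = dft_filter([1.0, 2.0, 3.0], [1.0, 1.0])
```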

In the present invention, a filter coefficient means a coefficient of a particular filter. Examples of filter coefficients are as follows. A prototype HRTF filter coefficient is an original filter coefficient of a particular HRTF filter and may be written as GL_L and so on. A converted HRTF filter coefficient is the filter coefficient obtained after the prototype HRTF filter coefficient has been converted and may be written as GL_L' and so on. A spatialized HRTF filter coefficient is a filter coefficient obtained by spatializing the prototype HRTF filter coefficient for generating a virtual surround signal and may be written as FL_L1 and so on. A master rendering coefficient is a filter coefficient needed to perform rendering and may be written as HL_L and so on. An interpolated/blurred master rendering coefficient is a filter coefficient obtained by interpolating and/or blurring the master rendering coefficient and may be written as HL_L' and so on. The present invention is not limited by the names of these filter coefficients.

FIG. 4 illustrates a first embodiment of the virtual surround rendering process and the spatial information conversion process according to the present invention. In particular, it takes as its example the case in which the decoded downmix signal input to the virtual surround renderer is stereo.

The domain conversion may differ depending on the domain in which virtual surround rendering is applied. The domain in which the downmix signal is rendered and the domain of the spatial information must be the same, so when the two differ, they must be converted to a common domain.

공간 정보 변환기(400)는 채널 매핑부(401), 계수 생성부(402), 합성기/도메인 변환기(integrator/domain converter:403)를 포함하여 구성된다. The spatial information converter 400 includes a channel mapping unit 401, a coefficient generator 402, and an integrator / domain converter 403.

채널 매핑부(401)는 입력된 공간 정보를 멀티채널 신호의 각 채널에 매핑되도록 채널 매핑을 수행한다. 상기 채널 매핑 과정에 대한 상세한 설명은 이하 도 6a, 도 6b를 참조하여 상세히 설명하도록 한다.The channel mapping unit 401 performs channel mapping so that the input spatial information is mapped to each channel of the multichannel signal. A detailed description of the channel mapping process will be given below with reference to FIGS. 6A and 6B.

Each per-channel coefficient generator (402_1, 402_2, ..., 402_N) of the coefficient generation unit 402 receives filter information as an external input and channel-mapped spatial information from the channel mapping unit 401, and generates per-channel coefficients with which the corresponding channel signal can be virtually generated. For example, when the pseudo-surround renderer 410 is an HRTF filter, the coefficient generation unit 402 generates HRTF coefficients. The per-channel coefficients are described in detail below with reference to FIG. 7.

The integrator/domain converter 403, having received the per-channel coefficients, integrates them for each downmix channel, converts them into the domain used for virtual surround rendering, and outputs the resulting filter coefficients. The integration step serves to reduce the computation required for virtual surround rendering and may be omitted. When the downmix signal is stereo, the coefficient sets to be applied to the left and right downmix signals are generated during per-channel coefficient generation. Each set of filter coefficients includes coefficients passed from a channel to its own output channel and coefficients passed to the opposite channel.

That is, the spatial information converter 400 uses the spatial information to generate the coefficients passed to the own channel and the coefficients passed to the opposite channel of the virtual surround renderer 410. For example, it generates the coefficient HL_L, which is input to renderer L 413 and passed to the left output (the own-channel output), and the coefficient HL_R, which is passed to the right output (the opposite channel). Likewise, it generates the coefficient HR_R, which is input to renderer R 414 and passed to the right output (the own-channel output), and the coefficient HR_L, which is passed to the left output (the opposite channel).

The pseudo-surround renderer 410 comprises domain converters 411 and 412, renderer L 413, renderer R 414, adders 415 and 416, and inverse domain converters 417 and 418.

The domain converters 411 and 412 perform domain conversion to bring the two domains into agreement when the domain of the spatial information differs from the domain in which virtual surround rendering is performed.

Renderer L 413 and renderer R 414 receive the stereo downmix signal together with the four sets of filter coefficients that the integrator/domain converter 403 outputs for application to the left and right downmix signals. Using these four sets (for example, HL_L, HL_R, HR_L, HR_R), they render the downmix signal to generate a virtual surround signal.

For example, renderer L 413 performs rendering using, from the left set of filter coefficients, the set HL_L passed to its own channel and the set HL_R passed to the opposite channel. Renderer L 413 may comprise a renderer HL_L and a renderer HL_R. Renderer HL_L renders using the filter coefficient set HL_L passed to the left output (the own-channel output), and renderer HL_R renders using the filter coefficient set HL_R passed to the right output (the opposite channel).

Likewise, renderer R 414 performs rendering using, from the right set of filter coefficients, the set HR_R passed to its own channel and the set HR_L passed to the opposite channel. Renderer R 414 may comprise a renderer HR_R and a renderer HR_L. Renderer HR_R renders using the filter coefficient set HR_R passed to the right output (the own-channel output), and renderer HR_L renders using the filter coefficient set HR_L passed to the left output (the opposite channel). The signals processed with HL_R and HR_L are cross terms and are added into the opposite channel at the adders 415 and 416. In some cases HL_R and HR_L may be zero, that is, the cross-term coefficients may take the value 0, in which case they contribute nothing to the corresponding path.
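The signal flow through renderer L, renderer R, and the two adders can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the signals and the four coefficient sets are already in a common frequency-like domain with one value per bin, so that filtering reduces to a per-bin multiplication. The function name `render_stereo` and the dictionary layout are hypothetical.

```python
def render_stereo(x_left, x_right, h):
    """Render a stereo downmix with four coefficient sets (one value per bin).

    h is a dict with keys "HL_L", "HL_R", "HR_L", "HR_R" (hypothetical layout).
    """
    out_left, out_right = [], []
    for k in range(len(x_left)):
        # Left output: own-channel path of renderer L plus cross term of renderer R.
        out_left.append(h["HL_L"][k] * x_left[k] + h["HR_L"][k] * x_right[k])
        # Right output: own-channel path of renderer R plus cross term of renderer L.
        out_right.append(h["HR_R"][k] * x_right[k] + h["HL_R"][k] * x_left[k])
    return out_left, out_right
```

When the cross-term sets HL_R and HR_L are all zero, each output depends only on its own input channel, which matches the remark that zero cross terms contribute nothing to the opposite path.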

The inverse domain converters 417 and 418 apply the inverse domain transform to the received signals and output a virtual surround stereo signal. The user can then hear sound with a multichannel effect through stereo earphones or the like.

In the following, for the case where the input is a stereo downmix signal as in FIG. 4, let x denote the downmix signal, D the coefficients obtained by channel-mapping the spatial information, G the prototype HRTF filter coefficients supplied as an external input, p the temporary multichannel signal, and y the rendered output signal. In matrix form these are as follows.

[Equations not reproduced in this text: matrix definitions of x, p, D, G, and y.]

Here, if each coefficient is a value in the frequency domain, the relations can be expanded as follows. First, the temporary multichannel signal can be expressed as the product of the channel-mapped spatial-information coefficients and the downmix signal:

p = D * x

Rendering the temporary multichannel signal p with the prototype HRTF filter coefficients G gives:

y = G * p

Substituting p = D * x yields y:

y = G * D * x

If H is defined as

H = G * D

then the rendered output signal y and the downmix signal x are related by:

y = H * x

Therefore, the products of the filter coefficients can be computed first to generate H, and y is then obtained by multiplying H by the downmix signal x.
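The benefit of forming H first can be checked with a small numeric sketch. The matrices below are hypothetical single-bin values chosen only for illustration (six temporary channels, two downmix channels); the helper `matmul` is a plain-Python matrix product written for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical values for one frequency bin.
D = [[1, 0], [0, 1], [1, 1], [0, 0], [2, 0], [0, 2]]   # 6 x 2: channel-mapped coefficients
G = [[1, 0, 1, 0, 1, 0], [0, 1, 1, 0, 0, 1]]           # 2 x 6: prototype HRTF coefficients
x = [[3], [5]]                                          # 2 x 1: stereo downmix

# Two-step path: build the temporary multichannel signal, then render it.
p = matmul(D, x)
y_two_step = matmul(G, p)

# One-step path: combine the coefficients once, then apply them to the downmix.
H = matmul(G, D)
y_one_step = matmul(H, x)
```

Both paths give the same y, but the one-step path applies only a 2 x 2 matrix to each downmix sample, and H needs to be recomputed only when the spatial information changes.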

According to the definitions above, the F coefficients described below can be obtained from the relation H = G * D. The F coefficients are discussed below with reference to FIG. 8.

FIG. 5 illustrates a second embodiment of the virtual surround rendering process and the spatial information conversion process according to the present invention. It takes as its example the case where the decoded downmix signal input to the virtual surround renderer is mono.

Referring to FIG. 5, the spatial information converter 500 comprises a channel mapping unit 501, a coefficient generation unit 502, and an integrator/domain converter 503. These components perform the same functions as the corresponding components of the spatial information converter 400 described with reference to FIG. 4, so their detailed description is omitted here. The spatial information converter 500 generates the final filter coefficients converted into the virtual surround rendering domain. When the decoded downmix signal is mono, these comprise the filter coefficient set HM_L used to render the mono signal to the left channel and the filter coefficient set HM_R used to render the mono signal to the right channel.

The pseudo-surround renderer 510 comprises a domain converter 511, a renderer M 512, and inverse domain converters 513 and 514. It differs from the pseudo-surround renderer 410 of FIG. 4 in that, because the decoded downmix signal is mono, it has a single domain converter 511 and a single renderer, renderer M 512, to perform virtual surround rendering. Renderer M 512 receives the filter coefficients from the spatial information converter 500 and uses them to perform virtual surround rendering that generates the virtual surround signal. These filter coefficients are the set HM_L used to render the mono signal to the left channel and the set HM_R used to render the mono signal to the right channel.
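For the mono case the rendering step reduces to two multiplications per bin. The sketch below assumes, as an illustration only, that the mono downmix and the sets HM_L and HM_R are given as equal-length lists of per-bin values in the rendering domain; `render_mono` is a hypothetical name.

```python
def render_mono(m, hm_l, hm_r):
    """Render a mono downmix m to left and right outputs, one value per bin."""
    left = [h * v for h, v in zip(hm_l, m)]    # mono input to left output
    right = [h * v for h, v in zip(hm_r, m)]   # mono input to right output
    return left, right
```

Because there is only one input channel, no adders or cross terms are needed, which is why renderer 510 has a single renderer M.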

Meanwhile, for a mono downmix input, if the output after virtual surround rendering is to take the form of a stereo downmix, the following two methods are possible.

First, instead of using filter coefficients for the virtual surround effect in the virtual surround renderer (for example, an HRTF filter), the values used for a stereo downmix are used. For example, the coefficients for the left output may be left front = 1, right front = 0, and so on.

Second, in the decoding process that generates multichannel signals from the downmix channel using spatial information, decoding proceeds only up to the step that yields the desired number of channels, rather than all the way to the final multichannel signal.

In the following, for the case where the input is a mono downmix signal as in FIG. 5, let x denote the downmix signal, D the coefficients obtained by channel-mapping the spatial information, G the prototype HRTF filter coefficients supplied as an external input, p the temporary multichannel signal, and y the rendered output signal. In matrix form these are as follows.

[Equations not reproduced in this text: matrix definitions of x, p, D, G, and y for the mono case.]

The relations among these matrices were described with reference to FIG. 4 and are not repeated here. The difference is that FIG. 4 takes a stereo input downmix signal as its example, whereas FIG. 5 takes a mono one. Under the definitions above, the relation between H and GD in the present invention is described below with reference to FIG. 8.

FIGS. 6A and 6B illustrate examples of the channel mapping process according to the present invention.

The channel mapping process generates, from the received spatial information, the values mapped to each channel of the multichannel signal in a form suited to the virtual surround renderer. It is performed by the channel mapping units 401 and 501.

For example, when the downmix signal is mono, the channel mapping output values are generated using coefficients such as CLD1 to CLD5 and ICC1 to ICC5. [Example symbols for the channel mapping output values are not reproduced in this text.] The process of generating the channel mapping output values varies according to the tree configuration corresponding to the spatial information received by the decoding apparatus, the range of spatial information used by the decoding apparatus, and so on.

FIG. 6A illustrates an example of a first channel configuration for explaining the channel mapping process according to the present invention. The channel converting units that make up the configuration are OTT boxes, and the configuration has the 5151 structure.

Referring to FIG. 6A, the multichannel signals (L, R, C, LFE, Ls, Rs) can be generated from the downmix channel m using the OTT boxes 601, 602, 603, 604, and 605 and spatial information [example symbols not reproduced in this text].

For example, when the tree structure is 5151, the channel mapping output values can be obtained using only CLD as in the following equation.

[Equation and the definitions of its terms are not reproduced in this text.]
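A CLD-only mapping of this kind can be sketched as a product of split gains along the path from the downmix to each output channel. The sketch below is an illustration under stated assumptions, not the equation of the source: it assumes the usual conversion of a channel level difference in dB into two gains whose squares sum to one, and it assumes a tree in which box 0 separates a front group from the surround pair, box 1 separates the L/R pair from the C/LFE pair, and boxes 2, 3, and 4 split Ls/Rs, L/R, and C/LFE. The box order and the function names are hypothetical.

```python
import math

def ott_gains(cld_db):
    """Return the two split gains of one OTT box for a level difference in dB."""
    ratio = 10.0 ** (cld_db / 10.0)
    c1 = math.sqrt(ratio / (1.0 + ratio))   # gain toward the first output
    c2 = math.sqrt(1.0 / (1.0 + ratio))     # gain toward the second output
    return c1, c2

def map_channels(cld):
    """Map five CLD values (dB) to six channel gains for the assumed tree."""
    a0, b0 = ott_gains(cld[0])   # assumed: front group / surround pair
    a1, b1 = ott_gains(cld[1])   # assumed: L,R pair / C,LFE pair
    a2, b2 = ott_gains(cld[2])   # assumed: Ls / Rs
    a3, b3 = ott_gains(cld[3])   # assumed: L / R
    a4, b4 = ott_gains(cld[4])   # assumed: C / LFE
    return {
        "L": a0 * a1 * a3, "R": a0 * a1 * b3,
        "C": a0 * b1 * a4, "LFE": a0 * b1 * b4,
        "Ls": b0 * a2, "Rs": b0 * b2,
    }
```

Because each box splits energy without loss, the squared gains of the six channels sum to one for any set of CLD values. A different tree, such as the second configuration of FIG. 6B, would change only which gains are multiplied along each path.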

Referring to FIG. 6B, the multichannel signals (L, Ls, R, Rs, C, LFE) can be generated from the downmix channel m using the OTT boxes 611, 612, 613, 614, and 615 and spatial information [example symbols not reproduced in this text].

For example, when the tree structure is 5152, the channel mapping output values can be obtained using only CLD as in the following equation.

[Equation not reproduced in this text.]

The channel mapping output values generated above take different values for each frequency band, each parameter band, and/or each transmitted time slot. If the values differ greatly between neighboring bands or across a time-slot boundary, distortion may occur during virtual surround rendering. To prevent this distortion, blurring in the frequency domain and the time domain is required. The frequency blurring and time blurring mentioned above can be used for this purpose, as can other methods suited to virtual surround rendering. Each channel mapping output value may also be multiplied by a specific gain. Time blurring is described in detail below with reference to FIG. 8.

FIG. 7 illustrates an example of the per-channel filter coefficients according to the present invention. For example, HRTF coefficients may be used as the filter coefficients.

For virtual surround rendering, the signal from the left channel source is passed through the GL_L filter and sent to the left output, and passed through the GL_R filter and sent to the right output. The signals received from all channels are then summed to generate the final left output (for example, Lo) and the final right output (for example, Ro).

Accordingly, the left and right channel outputs after virtual surround rendering are as follows.

Lo = L * GL_L + C * GC_L + R * GR_L + Ls * GLs_L + Rs * GRs_L

Ro = L * GL_R + C * GC_R + R * GR_R + Ls * GLs_R + Rs * GRs_R
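The two sums can be written compactly as a loop over the source channels. The sketch assumes, for illustration, that every quantity is a single frequency-domain value for one bin, so that each `*` is an ordinary multiplication; in the time domain the same expressions would be convolutions. The function name and dictionary keys are hypothetical.

```python
CHANNELS = ("L", "C", "R", "Ls", "Rs")

def render_outputs(sources, g_to_left, g_to_right):
    """Sum each source channel through its filter toward the left and right outputs."""
    lo = sum(sources[ch] * g_to_left[ch] for ch in CHANNELS)    # Lo
    ro = sum(sources[ch] * g_to_right[ch] for ch in CHANNELS)   # Ro
    return lo, ro
```

Here `g_to_left["L"]` plays the role of GL_L, `g_to_right["L"]` the role of GL_R, and so on for the other channels.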

In the present invention, L (710), C (700), R (720), Ls (730), and Rs (740) can be obtained, for example, as follows. First, they can be obtained with a decoding method that generates multichannel signals from the downmix channel using spatial information; MPEG Surround decoding is one such method. Second, they can be expressed by relations that involve only the spatial information.

FIGS. 8 to 10 illustrate in detail, by way of example, the process of generating filter coefficients according to the present invention.

Referring to FIG. 8, the coefficient generation unit 800 comprises at least one coefficient generator (coef_1 generator 800_1, coef_2 generator 800_2, ..., coef_N generator 800_N). The integrator/domain converter 810 comprises an integrator 811, an interpolator 812, and a domain converter 813.

The coefficient generation unit 800 generates coefficients from the filter information and the spatial information. The generation process in a particular coefficient generator (for example, the first coefficient generator, coef_1 generator 800_1) can be expressed as follows.

For example, when the downmix channel is mono, the first coefficient generator (coef_1 generator) is the coefficient generator corresponding to the left channel of the multichannel signal, and its outputs can be expressed as follows using the coefficient D_L generated from the spatial information.

FL_L = D_L * GL_L (coefficient used from the mono input to the left output channel)

FL_R = D_L * GL_R (coefficient used from the mono input to the right output channel)

Here, D_L is a value generated from the spatial information during channel mapping. The way D_L is obtained may differ according to the channel tree configuration transmitted by the encoding apparatus and received by the decoding apparatus. Similarly, if the second coefficient generator is called coef_2 generator and the third coef_3 generator, then by the same method coef_2 generator can generate FR_L and FR_R, and coef_3 generator can generate FC_L, FC_R, and so on.

For example, when the downmix channel is stereo, the first coefficient generator (coef_1 generator) is the coefficient generator corresponding to the left channel of the multichannel signal, and its outputs can be expressed as follows using the coefficients D_L1 and D_L2 generated from the spatial information.

FL_L1 = D_L1 * GL_L (coefficient used from the left input to the left output channel)

FL_L2 = D_L2 * GL_L (coefficient used from the right input to the left output channel)

FL_R1 = D_L1 * GL_R (coefficient used from the left input to the right output channel)

FL_R2 = D_L2 * GL_R (coefficient used from the right input to the right output channel)
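One coefficient generator for the stereo case can be sketched as a function that combines the two channel-mapped values of a channel with that channel's two prototype filter coefficients. The sketch assumes single frequency-domain values per bin; the function name and the result keys "L1", "L2", "R1", "R2" (mirroring the suffixes of FL_L1, FL_L2, FL_R1, FL_R2) are hypothetical.

```python
def coef_generator(d1, d2, g_to_left, g_to_right):
    """Spatialize one channel's prototype coefficients for a stereo downmix.

    d1, d2: channel-mapped values for the left and right downmix inputs
            (D_L1 and D_L2 for the left channel).
    g_to_left, g_to_right: prototype coefficients toward each output
            (GL_L and GL_R for the left channel).
    """
    return {
        "L1": d1 * g_to_left,    # left input to left output
        "L2": d2 * g_to_left,    # right input to left output
        "R1": d1 * g_to_right,   # left input to right output
        "R2": d2 * g_to_right,   # right input to right output
    }
```

Calling the same function with the values for the right, center, surround, and LFE channels would produce the corresponding FR, FC, FLS, FRS, and FLFE coefficients.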

As in the mono case, when the downmix channel is stereo the at least one coefficient generator can generate multiple coefficients in the same way.

The process of generating filter coefficients in the spatial information converter is described in detail below, by way of example, with reference to FIGS. 8 to 10.

FIG. 8 illustrates a first embodiment of the process of generating filter coefficients in the spatial information converter according to the present invention.

First, the integrator 811 integrates the filter coefficients generated for each channel into filter coefficients for virtual surround rendering. The integration in the integrator 811 is described below separately for a mono input and a stereo input.

<Example of mono input>

HM_L = FL_L + FR_L + FC_L + FLS_L + FRS_L + FLFE_L

HM_R = FL_R + FR_R + FC_R + FLS_R + FRS_R + FLFE_R

<Example of stereo input>

HL_L = FL_L1 + FR_L1 + FC_L1 + FLS_L1 + FRS_L1 + FLFE_L1

HR_L = FL_L2 + FR_L2 + FC_L2 + FLS_L2 + FRS_L2 + FLFE_L2

HL_R = FL_R1 + FR_R1 + FC_R1 + FLS_R1 + FRS_R1 + FLFE_R1

HR_R = FL_R2 + FR_R2 + FC_R2 + FLS_R2 + FRS_R2 + FLFE_R2

Here, HM_L and HM_R are the coefficients integrated into filter coefficients for virtual surround rendering in the case of a mono input, and HL_L, HR_L, HL_R, and HR_R are the coefficients integrated into filter coefficients for virtual surround rendering in the case of a stereo input.
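The stereo integration can be sketched as four sums over the channels. The data layout is an assumption made for illustration: `f[ch]` holds the four spatialized coefficients of channel `ch` for one bin under the hypothetical keys "L1", "L2", "R1", "R2", matching the suffixes used in the formulas above.

```python
CHANNELS = ("L", "R", "C", "LS", "RS", "LFE")

def integrate_stereo(f):
    """Sum per-channel spatialized coefficients into the four master sets."""
    return {
        "HL_L": sum(f[ch]["L1"] for ch in CHANNELS),   # left input to left output
        "HR_L": sum(f[ch]["L2"] for ch in CHANNELS),   # right input to left output
        "HL_R": sum(f[ch]["R1"] for ch in CHANNELS),   # left input to right output
        "HR_R": sum(f[ch]["R2"] for ch in CHANNELS),   # right input to right output
    }
```

After integration the renderer applies four coefficient sets instead of twenty-four, which is the reduction in computation that the integration step is meant to provide.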

The interpolator/time blurer 812 can perform interpolation and time-domain blurring on the filter coefficients.

Interpolation is performed to obtain spatial information for positions between transmitted or generated spatial information when that information is widely spaced on the time axis. For example, suppose spatial information exists at the n-th paramSlot and the (n+k)-th paramSlot (k > 1). Linear interpolation over the paramSlots that were not transmitted, using the generated coefficients (for example, HL_L, HR_L, HL_R, HR_R), can then be expressed as follows.

<Example of mono input>

HM_L(n+j) = HM_L(n) * a + HM_L(n+k) * (1-a)

HM_R(n+j) = HM_R(n) * a + HM_R(n+k) * (1-a)

<Example of stereo input>

HL_L(n+j) = HL_L(n) * a + HL_L(n+k) * (1-a)

HR_L(n+j) = HR_L(n) * a + HR_L(n+k) * (1-a)

HL_R(n+j) = HL_R(n) * a + HL_R(n+k) * (1-a)

HR_R(n+j) = HR_R(n) * a + HR_R(n+k) * (1-a)

Here, HM_L(n+j) and HM_R(n+j) are the coefficients obtained by interpolating the integrated filter coefficients for virtual surround rendering in the case of a mono input. HL_L(n+j), HR_L(n+j), HL_R(n+j), and HR_R(n+j) are the coefficients obtained by interpolating the integrated filter coefficients for virtual surround rendering in the case of a stereo input. Further, 0 < j < k, where j and k are integers, and a is a real number with 0 < a < 1 given by the following expression.

a = j/k

The expressions above thus use the value at the n-th parameter slot and the value at the (n+k)-th parameter slot to find the values at the parameter slots between them. The result is the value at the corresponding position on the straight line connecting the values at the two slots.
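A straight-line interpolation between two parameter slots can be sketched as follows. With a = j/k, the point on the line at slot n+j weights the value at slot n by (1-a) and the value at slot n+k by a, so that the result approaches the slot-n value as j approaches 0. The sketch uses that assignment; it is an interpretation of the straight-line description rather than a transcription of the printed expressions, which pair the two weights the other way round. The function name is hypothetical.

```python
def interpolate_slot(h_n, h_nk, j, k):
    """Linearly interpolate a coefficient at slot n+j between slots n and n+k."""
    if not 0 < j < k:
        raise ValueError("j must satisfy 0 < j < k")
    a = j / k
    # Point on the straight line from (n, h_n) to (n+k, h_nk) at position n+j.
    return h_n * (1.0 - a) + h_nk * a
```

The same function applies unchanged to HM_L, HM_R, HL_L, HR_L, HL_R, and HR_R, one coefficient at a time.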

Time blurring in the time blurer can be performed to prevent the distortion that arises when coefficient values change abruptly between neighboring blocks in the time domain and create discontinuities. Time blurring can be carried out together with interpolation, and the method applied may differ depending on where it is placed. An example of the time blurring expressions for a mono downmix channel follows.

HM_L(n)' = HM_L(n) * b + HM_L(n-1)' * (1-b)

HM_R(n)' = HM_R(n) * b + HM_R(n-1)' * (1-b)

That is, blurring in the form of a 1-pole IIR filter can be performed by multiplying the blurred filter coefficient of the previous block n-1 (HM_L(n-1)' or HM_R(n-1)') by (1-b), multiplying the filter coefficient generated in the current block n (HM_L(n) or HM_R(n)) by b, and adding the two. Here b is a constant with 0 < b < 1; the smaller b is, the stronger the blurring, and the larger b is, the weaker the blurring. The same method can be applied to the remaining filters.
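The 1-pole IIR blurring can be sketched for one coefficient tracked over successive blocks. The recursion is the one given above; the treatment of the first block is an assumption, since the text does not say how the filter state is initialized. The function name and the `initial` parameter are hypothetical.

```python
def blur(coeffs, b, initial=None):
    """Apply h'(n) = h(n) * b + h'(n-1) * (1 - b) to a sequence of coefficients."""
    if not 0 < b < 1:
        raise ValueError("b must satisfy 0 < b < 1")
    # Assumed initial state: the first coefficient itself unless one is supplied.
    prev = coeffs[0] if initial is None else initial
    out = []
    for h in coeffs:
        prev = h * b + prev * (1.0 - b)
        out.append(prev)
    return out
```

For a step from 0 to 1 with b = 0.5 and a zero initial state, the blurred sequence moves halfway toward the new value in each block, which illustrates how a smaller b would slow the transition further.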

상기 시간영역 블러링을 위한 수식을 이용하여 인터폴레이션(interpolation)과 블러링(blurring)을 하나의 수식으로 표현하면, 다음과 같다.Interpolation and blurring are expressed as one equation using the equation for time domain blurring as follows.

HM_L(n+j)' = (HM_L(n)*a + HM_L(n+k)*(1-a)) * b + HM_L(n+j-1)' * (1-b)

HM_R(n+j)' = (HM_R(n)*a + HM_R(n+k)*(1-a)) * b + HM_R(n+j-1)' * (1-b)
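The combined equation can be sketched as one loop over the slots from n+1 to n+k. The weight a is taken here as (k-j)/k, which is what linear interpolation between slots n and n+k implies; that choice, the function name `interpolate_and_blur`, and running the loop up to j = k are assumptions of this example.

```python
def interpolate_and_blur(h_n, h_n_plus_k, k, b, prev_smoothed):
    """Combined interpolation and blurring for slots n+1 .. n+k.

    h_n, h_n_plus_k : transmitted coefficient values at slots n and n+k
    b               : blurring constant, 0 < b < 1
    prev_smoothed   : HM(n)', the smoothed value at slot n
    """
    if k < 1:
        raise ValueError("k must be a positive slot distance")
    if not 0.0 < b < 1.0:
        raise ValueError("b must satisfy 0 < b < 1")
    out = []
    prev = prev_smoothed
    for j in range(1, k + 1):
        a = (k - j) / k  # assumed linear-interpolation weight
        interpolated = h_n * a + h_n_plus_k * (1.0 - a)
        prev = interpolated * b + prev * (1.0 - b)
        out.append(prev)
    return out
```

For `h_n = 0.0`, `h_n_plus_k = 4.0`, `k = 4`, `b = 0.5`, and a smoothed start of 0.0, the result is `[0.5, 1.25, 2.125, 3.0625]`: the output follows the interpolated line 1, 2, 3, 4 but lags behind it because of the blurring.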

Meanwhile, when the interpolator/time blurer 812 performs interpolation and time-domain blurring, the resulting filter coefficients may have an energy different from that of the original filter coefficients; an energy normalization step may be added to prevent this problem.
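The text does not specify how the energy normalization is done. One plausible sketch, given purely as an assumption, rescales the processed coefficients so that their sum of squared magnitudes matches that of a reference coefficient set; the function name `normalize_energy` and the choice of reference are hypothetical.

```python
import math


def normalize_energy(processed, reference):
    """Scale `processed` so its energy equals the energy of `reference`.

    Energy is taken as the sum of squared magnitudes (an assumption);
    abs() lets the same code handle real or complex coefficients.
    """
    reference_energy = sum(abs(x) ** 2 for x in reference)
    processed_energy = sum(abs(x) ** 2 for x in processed)
    if processed_energy == 0.0:
        # Nothing to rescale; return a copy unchanged.
        return list(processed)
    gain = math.sqrt(reference_energy / processed_energy)
    return [x * gain for x in processed]
```

A single common gain keeps the shape of the filter intact and changes only its overall level, which is why it is used in this sketch.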

The domain converter 813 performs domain conversion to the domain in which virtual surround rendering is applied. When the domain in which virtual surround rendering is applied and the domain in which the spatial information is applied are the same, the domain conversion may be omitted.

FIG. 9 illustrates a second embodiment of the process of generating filter coefficients in the spatial information converter according to the present invention.

Referring to FIG. 9, the integrator/domain converter 910 comprises at least one interpolator (911_1, 911_2, ..., 911_N), an integrator 912, and a domain converter 913.

The difference from the first embodiment described with reference to FIG. 8 is that interpolation is performed on every one of the coefficients generated per channel by the coefficient generator 900 (for example, FL_L and FL_R in the mono case, and FL_L1, FL_L2, FL_R1, and FL_R2 in the stereo case).

FIG. 10 illustrates a third embodiment of the process of generating filter coefficients in the spatial information converter according to the present invention.

FIG. 10 differs from the embodiments of FIGS. 8 and 9 described above in that the interpolator 1000 first performs interpolation on each item of channel-mapped spatial information, and the per-channel coefficients are then generated from the interpolated values.

Accordingly, in FIG. 10 the integrator/domain converter 1010 comprises a coefficient generator 1011, an integrator 1012, and a domain converter 1013.

In each of the embodiments described with reference to FIGS. 8 to 10, the output obtained by channel-mapping the spatial information is a frequency-domain value (for example, one value per parameter band), so the description assumes that the filter coefficient generation and the related steps are all carried out in the frequency domain. Moreover, when virtual surround rendering is also performed in the frequency domain, the domain converter need not perform any domain conversion: it may output the frequency-domain filter coefficients unchanged, or it may perform only a conversion that matches the frequency resolution before outputting them.
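As an illustration of a conversion that only matches the frequency resolution, the sketch below copies each per-parameter-band value to every frequency bin of that band. The band-edge representation, the function name `expand_bands_to_bins`, and the simple value-copying rule are all assumptions; the text does not describe how the resolution is matched.

```python
def expand_bands_to_bins(band_values, band_edges):
    """Repeat each parameter-band value over the bins of that band.

    band_values : one coefficient per parameter band
    band_edges  : hypothetical bin indices delimiting the bands, so band i
                  covers bins band_edges[i] .. band_edges[i + 1] - 1
    """
    if len(band_edges) != len(band_values) + 1:
        raise ValueError("need exactly one more edge than band values")
    bins = []
    for value, start, stop in zip(band_values, band_edges, band_edges[1:]):
        if stop < start:
            raise ValueError("band edges must be non-decreasing")
        bins.extend([value] * (stop - start))
    return bins
```

For example, two bands with values 0.5 and 2.0 and edges `[0, 2, 5]` expand to five bins: `[0.5, 0.5, 2.0, 2.0, 2.0]`.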

The present invention is not limited to the embodiments described above; as can be seen from the appended claims, it may be modified by a person having ordinary skill in the art to which the invention pertains, and such modifications fall within the scope of the invention.

The method and apparatus for decoding an audio signal according to the present invention described above make it possible for a decoding apparatus that receives an audio bitstream, generated by downmixing a multichannel signal into a downmix channel and extracting the spatial information of the multichannel signal, to decode it so as to obtain a virtual surround effect even in an environment in which the multichannel signal cannot be generated.

Claims

(a) demultiplexing the received bitstream into a core codec bitstream and a spatial information bitstream;

(b) decoding the core codec bitstream to produce a decoded downmix signal;

(c) decoding the spatial information bitstream to generate spatial information;

(d) generating a virtual surround signal from the decoded downmix signal using the spatial information.

The method of claim 1,

wherein the domain in which the virtual surround signal is generated and the domain of the spatial information are the same domain.

The method of claim 2,

wherein, when the domain in which the virtual surround signal is generated is a frequency domain, the downmix signal is converted into the frequency domain.

The method of claim 2,

wherein, when the domain in which the virtual surround signal is generated is a time domain, the domain of the spatial information is converted into the time domain.

The method of claim 1, wherein step (d)

comprises generating filter coefficients using the spatial information and external filter information in order to generate the virtual surround signal.

The method of claim 5,

wherein the filter information is a head-related transfer function (HRTF) parameter.

The method of claim 5,

wherein generating the filter coefficients includes a channel mapping step of mapping the spatial information to each channel.

The method of claim 7,

further comprising generating filter coefficients for each channel using the channel-mapped spatial information.

The method of claim 8,

further comprising decoding and/or domain-converting the per-channel filter coefficients for each downmix channel.

The method of claim 1, wherein step (d)

comprises converting the domain of the downmix signal and of the spatial information so that their domains match.

The method of claim 1, wherein step (d)

comprises performing inverse domain conversion after generating the virtual surround signal when domain conversion has been performed to match the domains of the downmix signal and the spatial information.

The method of claim 1,

wherein, when the downmix signal is mono, two filter coefficient sets are generated using the spatial information, and the downmix signal is rendered using the filter coefficient sets.

The method of claim 12,

wherein the rendering generates a virtually rendered first output signal using the first coefficient set and generates a second output signal using the second coefficient set.

The method of claim 1,

wherein, when the downmix signal is stereo, four filter coefficient sets are generated using the spatial information, and the downmix signal is rendered in each channel using the filter coefficient sets.

The method of claim 14,

wherein the rendering uses a first coefficient set delivered to the channel itself and a second coefficient set delivered to the counterpart channel.

Demultiplexing a received bitstream and generating a downmix signal and spatial information from the demultiplexed bitstream;

Generating filter coefficients using the spatial information and the filter information;

Generating a virtual surround signal from the downmixed signal using the filter coefficients.

The method of claim 16,

further comprising receiving the filter information.

The method of claim 17,

wherein the filter information is an HRTF parameter.

An apparatus for decoding an audio signal, comprising:

A demultiplexer which demultiplexes the received bitstream into a core codec bitstream and a spatial information bitstream;

A spatial information converter which decodes the spatial information bitstream to generate spatial information, and converts the spatial information for use in generating a virtual surround signal; and

An audio decoder which decodes the core codec bitstream to generate a decoded downmix signal and generates a virtual surround signal from the decoded downmix signal using the converted spatial information.

The apparatus of claim 19, wherein the spatial information converter comprises:

A spatial information decoder for decoding the spatial information bitstream; and

A spatial information converter for generating, from the spatial information, the coefficients needed to generate the virtual surround signal.

The apparatus of claim 20, wherein the spatial information converter comprises

a channel mapping unit for mapping the received spatial information to each channel of the multichannel signal.

The apparatus of claim 21, wherein the spatial information converter comprises

a coefficient generator for generating at least one filter coefficient for each channel from the channel-mapped spatial information.

The apparatus of claim 22, wherein the spatial information converter comprises

an integrator for synthesizing the per-channel filter coefficients for each downmix channel.

The apparatus of claim 22, wherein the spatial information converter comprises

an interpolator for interpolating the filter coefficients of each channel.

The apparatus of claim 22, wherein the spatial information converter comprises

a domain converter for matching the domain of the per-channel filter coefficients to that of the decoded downmix signal.

The apparatus of claim 19, wherein the audio decoder comprises:

A core decoder for decoding the core codec bitstream to produce a decoded downmix signal; and

A virtual surround renderer for generating a virtual surround signal using the decoded downmix signal and the spatial information.

The apparatus of claim 26, wherein the virtual surround renderer comprises

a domain converter configured to perform domain conversion to match the domains of the downmix signal and the spatial information.

The apparatus of claim 26, wherein the virtual surround renderer comprises

an inverse domain converter which, when domain conversion has been performed to match the domains of the downmix signal and the spatial information, performs inverse domain conversion after the decoded downmix signal is rendered.

The apparatus of claim 26, wherein,

when the decoded downmix signal is mono, two filter coefficient sets are generated using the spatial information, and the virtual surround renderer renders the decoded downmix signal in each channel using the filter coefficient sets.

The apparatus of claim 29, wherein the virtual surround renderer

generates a virtually rendered first output signal using the first coefficient set and generates a second output signal using the second coefficient set.

The apparatus of claim 26, wherein,

when the decoded downmix signal is stereo, four filter coefficient sets are generated using the spatial information, and the virtual surround renderer renders the decoded downmix signal in each channel using the filter coefficient sets.

The apparatus of claim 31, wherein the virtual surround renderer comprises

a first renderer and a second renderer, each generating a virtual surround signal using a first coefficient set delivered to its own channel and a second coefficient set delivered to the counterpart channel.

The apparatus of claim 26,

wherein the coefficients used by the virtual surround renderer are filter coefficients generated using the spatial information and filter information.