KR20140037118A

KR20140037118A - Method of processing audio signal, audio encoding apparatus, audio decoding apparatus and terminal employing the same

Info

Publication number: KR20140037118A
Application number: KR1020137032698A
Authority: KR
Inventors: 이남숙
Original assignee: 삼성전자주식회사
Priority date: 2011-06-07
Filing date: 2012-06-07
Publication date: 2014-03-26
Also published as: WO2012169808A3; WO2012169808A2; EP2720223A2; CN103733256A

Abstract

A method of processing an audio signal, when downmixing a first plurality of input channels to a second plurality of output channels, comprises: comparing the positions of the first plurality of input channels with the positions of the second plurality of output channels; downmixing each input channel that has the same position as one of the output channels to that output channel; searching for at least one adjacent channel for each of the remaining input channels; determining a weight for each adjacent channel found, in consideration of at least one of the inter-channel distance, the signal correlation, and the reconstruction error; and downmixing the remaining input channels to the adjacent channels based on the determined weights.

Description

Method of processing audio signal, audio encoding apparatus, audio decoding apparatus and terminal employing the same

The present invention relates to audio encoding and decoding and, more particularly, to an audio signal processing method, an audio encoding apparatus, an audio decoding apparatus, and a terminal employing the same, which can minimize sound quality degradation when a multi-channel audio signal is reconstructed.

Recently, as multimedia content has become widespread, users increasingly demand a more realistic and richer sound environment. To meet this demand, research on multi-channel audio is being actively conducted.

Multi-channel audio signals require highly efficient data compression, depending on the transmission environment. In particular, spatial parameters are used to reconstruct a multi-channel audio signal. In the process of extracting the spatial parameters, distortion may occur due to the influence of reverberation signals. As a result, sound quality may degrade when the multi-channel audio signal is reconstructed.

Accordingly, there is a need for a multi-channel audio codec technology capable of reducing or eliminating the sound quality degradation that may occur when a multi-channel audio signal is reconstructed using spatial parameters.

An object of the present invention is to provide an audio signal processing method, an audio encoding apparatus, an audio decoding apparatus, and a terminal employing the same, which can minimize sound quality degradation when a multi-channel audio signal is reconstructed.

To achieve the above object, an audio signal processing method according to an embodiment of the present invention may include, when downmixing a first plurality of input channels to a second plurality of output channels: comparing the positions of the first plurality of input channels with the positions of the second plurality of output channels; downmixing each channel of the first plurality of input channels that has the same position as a channel of the second plurality of output channels to the output channel at that position; searching for at least one adjacent channel for each of the remaining channels of the first plurality of input channels; determining a weight for each adjacent channel found, in consideration of at least one of the inter-channel distance, the signal correlation, and the reconstruction error; and downmixing the remaining channels of the first plurality of input channels to the adjacent channels based on the determined weights.

..

FIG. 1 is a block diagram showing the configuration of an audio signal processing system to which the present invention is applied.
FIG. 2 is a block diagram showing the configuration of an audio encoding apparatus to which the present invention is applied.
FIG. 3 is a block diagram showing the configuration of an audio decoding apparatus to which the present invention is applied.
FIG. 4 is a diagram illustrating an example of channel matching between a 10.2-channel audio signal and a 5.1-channel audio signal according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a downmixing method according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an upmixing method according to an embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a spatial parameter encoding apparatus according to an embodiment of the present invention.
FIGS. 8A and 8B are diagrams showing an example of a quantization step that varies, for each downmix channel, according to the energy value in a frequency band of each frame.
FIG. 9 is a diagram showing an example of the energy distribution per frequency band of the spectral data for all channels.
FIGS. 10A to 10C are diagrams showing an example of adjusting the overall bit rate by varying a threshold.
FIG. 11 is a flowchart illustrating a method of generating spatial parameters according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating a method of generating spatial parameters according to another embodiment of the present invention.
FIG. 13 is a flowchart illustrating an audio signal processing method according to an embodiment of the present invention.
FIGS. 14A to 14C are diagrams of one example for describing step 1110 of FIG. 11 or step 1330 of FIG. 13.
FIG. 15 is a diagram of another example for describing step 1110 of FIG. 11 or step 1330 of FIG. 13.
FIGS. 16A to 16D are diagrams of yet another example for describing step 1110 of FIG. 11 or step 1330 of FIG. 13.
FIG. 17 is a graph showing the total sum of the angle parameters.
FIG. 18 is a diagram for describing the calculation of angle parameters according to an embodiment of the present invention.
FIG. 19 is a block diagram showing the configuration of an audio signal processing system using a multi-channel codec and a core codec according to an embodiment of the present invention.
FIG. 20 is a block diagram showing the configuration of an audio encoding apparatus according to an embodiment of the present invention.
FIG. 21 is a block diagram showing the configuration of an audio decoding apparatus according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its technical spirit and scope. In describing the present invention, detailed descriptions of related known technology are omitted where they are judged to obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

The terms used in the present invention are used only to describe specific embodiments and are not intended to limit the invention. Wherever possible, general terms that are currently in wide use have been selected in consideration of their function in the present invention, but these may vary according to the intention of those skilled in the art, precedent, or the emergence of new technology. In certain cases, terms have been arbitrarily selected by the applicant, in which case their meaning is described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined based on their meaning and on the content of the invention as a whole, not simply on their names.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present invention, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In the description, identical or corresponding components are given the same reference numerals, and redundant descriptions of them are omitted.

FIG. 1 is a block diagram showing the configuration of an audio signal processing system to which the present invention is applied. The audio signal processing system 100 corresponds to a multimedia device and may include a voice-communication-dedicated terminal such as a telephone or mobile phone, a broadcast- or music-dedicated terminal such as a TV or MP3 player, or a hybrid terminal combining a voice-communication-dedicated terminal with a broadcast- or music-dedicated terminal, but is not limited thereto. The audio signal processing system 100 may also be used as a client, a server, or a converter placed between a client and a server.

Referring to FIG. 1, the audio signal processing system 100 includes an encoding apparatus 110 and a decoding apparatus 120. In one embodiment, the audio signal processing system 100 may include both the encoding apparatus 110 and the decoding apparatus 120; in another embodiment, it may include only one of them.

The encoding apparatus 110 receives an original signal composed of a plurality of channels, that is, a multichannel audio signal, and downmixes it to generate a downmixed audio signal. The encoding apparatus 110 also generates and encodes a prediction parameter. Here, the prediction parameter is a parameter applied to restore the downmixed audio signal to the original signal. Specifically, it is a value related to the downmix matrix used to downmix the original signal, the coefficient values contained in the downmix matrix, and the like. For example, the prediction parameter may include a spatial parameter. The prediction parameter may vary according to the product specification, design specification, and so on of the encoding apparatus 110 or the decoding apparatus 120, and may be set to an experimentally optimized value. Here, a channel may mean a speaker.

The decoding apparatus 120 upmixes the downmixed audio signal using the prediction parameter to generate a reconstructed signal corresponding to the original multichannel audio signal.

FIG. 2 is a block diagram showing the configuration of an audio encoding apparatus to which the present invention is applied.

Referring to FIG. 2, the audio encoding apparatus 200 may include a downmixer 210, an additional information generator 220, and an encoder 230. The components may be integrated into at least one module and implemented by at least one processor (not shown).

The downmixer 210 receives an N-channel multichannel audio signal and downmixes it. The N-channel audio signal may be downmixed to generate a mono-channel audio signal or an M-channel audio signal (where M < N). For example, a 10.2-channel audio signal may be downmixed into a three-channel or six-channel audio signal so as to correspond to a 2.1-channel or 5.1-channel audio signal.

According to an embodiment, two of the N channels are selected and downmixed to generate a first mono channel, and the first mono channel is then downmixed with another channel to generate a second mono channel. By repeatedly adding another channel to the mono channel produced by the previous downmix and downmixing again, a final mono-channel audio signal or M-channel audio signal can be generated.
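As an illustration only, the sequential folding described above might be sketched as follows. The function name `sequential_downmix` is hypothetical, and the plain sample-wise sum is an assumption standing in for whatever downmix operation an implementation actually uses; the text does not specify it.

```python
def sequential_downmix(channels):
    """Fold equal-length channels into one mono channel, one channel per step.

    Returns the mono channel and the number of downmix steps performed.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    mono = list(channels[0])
    steps = 0
    for channel in channels[1:]:
        # Assumption: a sample-wise sum stands in for the downmix operation.
        mono = [a + b for a, b in zip(mono, channel)]
        steps += 1
    return mono, steps


# Twelve identical two-sample channels are folded in eleven steps.
mono, steps = sequential_downmix([[1.0, 2.0] for _ in range(12)])
print(steps, mono)
```

Folding N channels this way always takes N - 1 steps, which is why side information is later described as being generated once per step.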

When downmixing an N-channel audio signal, it is desirable to downmix similar channels together so that entropy is minimized. Accordingly, the downmixer 210 can downmix the multichannel audio signal at a higher compression rate by downmixing highly correlated channels with one another.
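One way to pick which channels to downmix together is sketched below. It assumes that "correlation" means the zero-lag normalized cross-correlation of the sample sequences; the text does not fix a particular measure, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def normalized_correlation(x, y):
    """Zero-lag normalized cross-correlation of two equal-length sequences."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator if denominator > 0 else 0.0


def most_correlated_pair(channels):
    """Return the pair of channel names whose signals correlate most strongly.

    `channels` maps a channel name to its list of samples.
    """
    return max(
        combinations(sorted(channels), 2),
        key=lambda pair: normalized_correlation(channels[pair[0]], channels[pair[1]]),
    )


channels = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [1.0, -1.0, 0.0]}
print(most_correlated_pair(channels))
```

In this toy input, channel B is a scaled copy of channel A, so that pair is selected for the first downmix step.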

The additional information generator 220 generates the additional information needed to restore the multichannel signal from the downmixed channels. Each time the downmixer 210 performs one of its sequential downmixing steps, the generator produces the additional information needed to restore the multichannel signal from the downmixed channel. At each step it may generate information for determining the intensities of the two downmixed channels and information for determining their phases.

In addition, each time a downmixing step is performed, the additional information generator 220 generates information indicating which channels were downmixed. When the channels are downmixed sequentially in an order based on correlation calculations rather than in a fixed order, the downmixing order of the channels may be generated as additional information.

The additional information generator 220 repeats this generation at every downmixing step, producing the information needed to restore the downmixed channels from the mono channel. For example, if 12 channels are sequentially downmixed in 11 steps to generate one mono channel, the information on the downmixing order, the information for determining channel intensity, and the information for determining channel phase are each generated 11 times. Also, according to an embodiment, when the intensity and phase information is generated for each of a plurality of frequency bands and the number of frequency bands is k, 11*k items of intensity information and 11*k items of phase information may be generated.
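The counts in the example above follow from the fact that folding N channels into one takes N - 1 steps. A small sketch of that arithmetic is given below; the function name and the dictionary keys are hypothetical labels, not terms from the specification.

```python
def side_info_counts(num_channels, num_bands=1):
    """Count the side-information items for a sequential downmix to mono.

    One order item is produced per downmix step; intensity and phase items
    are produced per step and per frequency band.
    """
    steps = num_channels - 1
    return {
        "order": steps,
        "intensity": steps * num_bands,
        "phase": steps * num_bands,
    }


# 12 channels with k = 20 bands: 11 order items, 11*20 intensity and phase items.
print(side_info_counts(12, num_bands=20))
```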

The encoder 230 may encode the mono-channel audio signal or M-channel audio signal generated by the downmixer 210. If the audio output by the downmixer 210 is an analog signal, the analog signal is converted into a digital signal, and the symbols are then encoded according to a predetermined algorithm. The encoding algorithm is not restricted; any algorithm that encodes an audio signal to generate a bitstream may be used by the encoder 230. The encoder 230 may also encode the additional information generated by the additional information generator 220 for restoring the multichannel audio signal from the mono-channel audio signal.

FIG. 3 is a block diagram showing the configuration of an audio decoding apparatus to which the present invention is applied.

Referring to FIG. 3, the audio decoding apparatus 300 may include an extractor 310, a decoder 320, and an upmixer 330. The components may be integrated into at least one module and implemented by at least one processor (not shown).

The extractor 310 extracts the encoded audio and the encoded additional information from the received audio data, that is, from the bitstream. The encoded audio may have been generated by downmixing N channels into one mono channel or into M channels (where M < N) and then encoding the audio signal according to a predetermined algorithm.

The decoder 320 decodes the encoded audio and the encoded additional information extracted by the extractor 310, using the same algorithm that was used for encoding. As a result of decoding the audio, one mono-channel audio signal or an M-channel audio signal is restored.

The upmixer 330 upmixes the audio signal decoded by the decoder 320 to restore the N-channel audio signal that existed before downmixing. The N-channel audio signal is restored based on the additional information decoded by the decoder 320.

That is, the downmixing process is performed in reverse with reference to the additional information, which consists of spatial parameters, so that the downmixed audio signal is upmixed into a multichannel audio signal. The channels are separated from the mono channel in order, with reference to the additional information containing the downmixing order of the channels. The separation is done by determining the intensities and phases of the downmixed channels from the corresponding intensity and phase information.

FIG. 4 is a diagram illustrating an example of channel matching between a 10.2-channel audio signal and a 5.1-channel audio signal according to an embodiment of the present invention.

When the input multichannel audio signal is a 10.2-channel audio signal, a downmixed multichannel audio signal with fewer channels than 10.2, such as a 7.1-channel, 5.1-channel, or 2.0-channel audio signal, may be used as the output multichannel audio signal.

As shown in FIG. 4, when the 10.2-channel audio signal 410 is downmixed to the 5.1-channel audio signal 420, if the FL and RL channels of the 5.1-channel layout are identified as the channels adjacent to the LW channel of the 10.2-channel layout, the weights may be determined in consideration of position, correlation, or reconstruction error. In one embodiment, if the weight for the FL channel is determined to be 0 and the weight for the RL channel is determined to be 1, the signal of the LW channel of the 10.2-channel layout may be downmixed to the RL channel of the 5.1-channel layout.

Meanwhile, in FIG. 4, the L channel and the Ls channel of the 10.2-channel layout may be assigned to the FL channel and the RL channel of the 5.1-channel layout, which are at the same positions.

FIG. 5 is a flowchart illustrating a downmixing method according to an embodiment of the present invention.

Referring to FIG. 5, in step 510, the number and positions of the input channels are identified from first layout information. For example, the first layout information is IC(1), IC(2), ..., IC(N), from which the positions of the N input channels can be determined.

In step 520, the number and positions of the downmixed channels, that is, the output channels, are identified from second layout information. For example, the second layout information is DC(1), DC(2), ..., DC(M), from which the positions of the M output channels (where M < N) can be determined.

In step 530, starting from the first input channel IC(1), it is checked whether any output channel has the same output position as the input channel.

In step 540, if an output channel has the same output position as the input channel, the signal of the input channel may be assigned to the output channel at that position. For example, when the input channel IC(n) and the output channel DC(m) have the same output position, DC(m) = DC(m) + IC(n).

In step 550, if no output channel has the same output position as the input channel, it is checked, starting from the first input channel IC(1), whether any of the output channels is adjacent to the input channel IC(n).

In step 560, if a plurality of adjacent channels were found in step 550, the signal of the input channel IC(n) is distributed to the adjacent channels using a predetermined weight corresponding to each of them. For example, if the output channels DC(i), DC(j), and DC(k) are identified as the channels adjacent to the input channel IC(n), weights w_i, w_j, and w_k may be set for the pairs IC(n) and DC(i), IC(n) and DC(j), and IC(n) and DC(k), respectively. Using these weights, the signal of the input channel IC(n) may be distributed as DC(i) = DC(i) + w_i * IC(n), DC(j) = DC(j) + w_j * IC(n), and DC(k) = DC(k) + w_k * IC(n).
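The weighted distribution of step 560 can be sketched as a sample-wise accumulation. This is a minimal illustration: the function name `distribute`, the channel keys, and the weight values 0.5, 0.25, and 0.25 are hypothetical and are not taken from the specification.

```python
def distribute(outputs, input_signal, weights):
    """Accumulate w * IC(n) into each adjacent output channel DC(.).

    `outputs` maps an output channel name to its sample list and is updated
    in place; `weights` maps each adjacent output channel name to its weight.
    """
    for name, weight in weights.items():
        outputs[name] = [
            out + weight * sample
            for out, sample in zip(outputs[name], input_signal)
        ]
    return outputs


outputs = {"DC_i": [0.0, 0.0], "DC_j": [1.0, 1.0], "DC_k": [0.0, 0.0]}
# Hypothetical weights w_i, w_j, w_k for one input channel IC(n).
distribute(outputs, [4.0, 8.0], {"DC_i": 0.5, "DC_j": 0.25, "DC_k": 0.25})
print(outputs)
```

Each output channel keeps whatever was already accumulated in it, which matches the update form DC(i) = DC(i) + w_i * IC(n).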

Meanwhile, the weights may be set by the following methods.

In one embodiment, the weights may be determined according to the relationship between the plurality of adjacent channels and the input channel IC(n). As this relationship, at least one of the following may be applied: the distance between each adjacent channel and the input channel IC(n), the correlation between the signal of each adjacent channel and the signal of the input channel IC(n), and the reconstruction error at each adjacent channel.

In another embodiment, each weight may be set to 0 or 1 according to the relationship between the plurality of adjacent channels and the input channel IC(n). For example, the weight of the adjacent channel closest to the input channel IC(n) may be set to 1 and the weights of the other adjacent channels to 0. Alternatively, the weight of the adjacent channel whose signal has the highest correlation with the signal of the input channel IC(n) may be set to 1 and the others to 0. Alternatively, the weight of the adjacent channel with the smallest reconstruction error may be set to 1 and the others to 0.
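The distance-based variant of the 0-or-1 rule might look like the sketch below. It assumes, purely for illustration, that each channel position is given as an azimuth angle in degrees and that distance is the angular difference; the specification does not define how positions or distances are represented, and the names and angles used here are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def one_hot_weights(input_azimuth, adjacent_azimuths):
    """Give weight 1 to the nearest adjacent channel and 0 to all others.

    `adjacent_azimuths` maps each adjacent output channel name to its azimuth.
    """
    nearest = min(
        adjacent_azimuths,
        key=lambda name: angular_distance(input_azimuth, adjacent_azimuths[name]),
    )
    return {name: 1.0 if name == nearest else 0.0 for name in adjacent_azimuths}


# Hypothetical layout: the input channel sits at 60 degrees.
print(one_hot_weights(60.0, {"DC_i": 30.0, "DC_j": 110.0}))
```

The correlation-based and error-based variants would follow the same pattern with a different key function (highest correlation, or lowest reconstruction error).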

In step 570, it is determined whether all of the input channels have been checked; if not, the process returns to step 530 and repeats steps 530 to 560.

In step 580, once all of the input channels have been checked, configuration information of the downmixed channels, which finally hold the signals assigned in step 540 and the signals distributed in step 560, is generated together with the corresponding spatial parameters.
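Steps 510 to 580 can be pulled together in one loop, as in the minimal sketch below. Several assumptions are made for illustration and are not from the specification: positions are azimuth angles in degrees, "adjacent" means the `num_adjacent` nearest output channels, the weights follow the 0-or-1 nearest-channel rule, and the returned routing table is only a stand-in for the configuration information and spatial parameters of step 580.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuth angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def downmix(in_signals, in_azimuth, out_azimuth, num_adjacent=2):
    """Downmix input channels IC(n) to output channels DC(m) by position."""
    n_samples = len(next(iter(in_signals.values())))
    out = {m: [0.0] * n_samples for m in out_azimuth}       # steps 510-520
    routing = {}
    for n, signal in in_signals.items():                    # steps 530-570
        same = [m for m in out_azimuth if out_azimuth[m] == in_azimuth[n]]
        if same:
            weights = {same[0]: 1.0}                         # step 540
        else:
            # Step 550 (assumption): adjacent = nearest outputs by angle.
            adjacent = sorted(
                out_azimuth,
                key=lambda m: angular_distance(in_azimuth[n], out_azimuth[m]),
            )[:num_adjacent]
            # Step 560 with 0-or-1 weights: nearest channel gets everything.
            weights = {m: 0.0 for m in adjacent}
            weights[adjacent[0]] = 1.0
        for m, w in weights.items():
            out[m] = [o + w * s for o, s in zip(out[m], signal)]
        routing[n] = weights
    return out, routing                                      # step 580 stand-in


signals = {"IC1": [1.0, 1.0], "IC2": [0.5, 0.5], "IC3": [2.0, 2.0]}
out, routing = downmix(
    signals,
    in_azimuth={"IC1": 30.0, "IC2": 60.0, "IC3": 110.0},
    out_azimuth={"DC1": 30.0, "DC2": 110.0},
)
print(out, routing)
```

Here IC1 and IC3 coincide with DC1 and DC2 and are assigned directly, while IC2 has no matching position and is routed to its nearest neighbour under the assumed rule.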

상기 실시예에 따른 다운 믹싱방법은 채널, 프레임, 주파수밴드 혹은 주파수대역 단위로 수행될 수 있으므로 필요에 따라서 성능 향상의 정밀도를 조절할 수 있다. 여기서, 주파수밴드는 오디오 스펙트럼의 샘플들을 그루핑한 단위로서, 임계대역을 반영하여 균일 혹은 비균일 길이를 가질 수 있다. 비균일한 경우, 한 프레임에 대하여 시작 샘플에서부터 마지막 샘플에 이르기까지 주파수밴드에 포함되는 샘플의 개수가 점점 증가하도록 설정할 수 있다. 또한 다중 비트율을 지원하는 경우, 서로 다른 비트율에서 대응하는 각 주파수밴드에 포함되는 샘플의 갯수가 동일해지도록 설정할 수 있다. 한 프레임에 포함되는 주파수밴드의 개수 혹은 한 주파수밴드에 포함되는 샘플의 개수는 미리 결정될 수 있다.The downmixing method according to the above embodiment may be performed in units of channels, frames, frequency bands, or frequency bands, thereby adjusting the precision of performance improvement as necessary. Here, the frequency band is a unit of grouping samples of the audio spectrum and may have a uniform or non-uniform length reflecting a critical band. In the case of non-uniformity, the number of samples included in the frequency band from the start sample to the last sample may be gradually increased for one frame. In the case of supporting multiple bit rates, the number of samples included in each frequency band corresponding to different bit rates may be set to be the same. The number of frequency bands included in one frame or the number of samples included in one frequency band may be predetermined.

In addition, the downmixing method according to the embodiment may determine the weights used for channel downmixing according to the layout of the downmixed channels and the layout of the input channels. This allows the method to adapt to various layouts, and the reconstructed sound quality can be improved because the weights are determined by considering not only channel positions but also the correlation between channel signals or the reconstruction error. Furthermore, because the downmixed channels are constructed in view of channel position, correlation between channel signals, or reconstruction error, when the audio decoding apparatus has the same number of channels as the downmixed channels, a user who listens only to the downmixed channels, without any separate upmixing process, does not perceive subjective sound quality degradation.

FIG. 6 is a flowchart illustrating an upmixing method according to an embodiment of the present invention.

Referring to FIG. 6, in step 610, the configuration information of the downmixed channels and the corresponding spatial parameters, generated through the process shown in FIG. 5, are received.

In step 620, upmixing is performed using the configuration information of the downmixed channels and the corresponding spatial parameters received in step 610, thereby reconstructing the input-channel audio signal.

FIG. 7 is a block diagram showing the configuration of a spatial parameter encoding apparatus according to an embodiment of the present invention, which may be included in the encoder 230 of FIG. 2.

Referring to FIG. 7, the spatial parameter encoding apparatus 700 may include an energy calculator 710, a quantization step determiner 720, a quantizer 730, and a multiplexer 740. The components may be integrated into at least one module and implemented by at least one processor (not shown).

The energy calculator 710 receives the downmixed channel signal provided by the downmixer (210 of FIG. 2) and calculates an energy value per channel, frame, frequency band, or frequency region. A norm value is one example of such an energy value.

The quantization step determiner 720 determines the quantization step using the energy values calculated per channel, frame, frequency band, or frequency region by the energy calculator 710. For example, the quantization step may be made small for a channel, frame, frequency band, or frequency region with a large energy value, and large for one with a small energy value. In this case, two quantization steps may be defined and one of them selected according to the result of comparing the energy value with a predetermined threshold. Alternatively, when quantization steps are assigned adaptively according to the distribution of energy values, a quantization step matching that distribution may be selected. In this way, the bits allocated to quantization can be adjusted according to perceptual importance, which makes it possible to improve sound quality. In addition, according to an embodiment, the overall bit rate may be adjusted by varying the threshold frequency while maintaining the weights assigned according to the energy distribution of each downmixed channel.
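The two-step selection described above can be sketched as follows. The sketch assumes the energy value is an L2 norm over the samples of a band (the text names a norm only as one example), and the names `band_norm`, `select_quant_steps`, and the step sizes 0.5 and 2.0 are hypothetical values chosen for illustration.

```python
import math


def band_norm(samples):
    """Energy of one frequency band, taken here as the L2 norm (an assumption)."""
    return math.sqrt(sum(s * s for s in samples))


def select_quant_steps(band_energies, threshold, small_step=0.5, large_step=2.0):
    """Pick one of two quantization steps per band by comparing energy to a threshold.

    Bands at or above the threshold get the small (finer) step;
    the remaining bands get the large (coarser) step.
    """
    return [small_step if e >= threshold else large_step for e in band_energies]


# Three made-up bands of spectral samples.
bands = [[3.0, 4.0], [0.1, 0.2], [6.0, 8.0]]
energies = [band_norm(b) for b in bands]
steps = select_quant_steps(energies, threshold=5.0)
```

A finer step on high-energy bands spends more bits where the signal is perceptually more important, which is the effect the paragraph above describes.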

The quantization and lossless encoding unit 730 quantizes the spatial parameters per channel, frame, frequency band, or frequency region using the quantization step determined by the quantization step determiner 720, and then losslessly encodes them.

The multiplexer 740 multiplexes the losslessly encoded spatial parameters together with the losslessly encoded downmixed audio signal to form a bitstream.

FIGS. 8A and 8B show an example of a quantization step that varies, for each downmixed channel, with the energy value in each frequency band of each frame. As an example, channel 1 and channel 2 are downmixed together, and channel 3 and channel 4 are downmixed together. Here, d0 denotes the energy value of the downmixed channel for channels 1 and 2, and d1 denotes the energy value of the downmixed channel for channels 3 and 4.

FIGS. 8A and 8B are examples in which two quantization steps are defined. The hatched portions correspond to frequency bands whose energy value is equal to or greater than the predetermined threshold, so the small quantization step is applied to them.

FIG. 9 shows an example of the per-band energy distribution of the spectral data for all channels. FIGS. 10A to 10C show an example in which, with weights assigned according to the energy value of each channel, the overall bit rate is adjusted by varying the threshold frequency in view of the energy distribution.

FIG. 10A shows an example in which, based on an initial threshold frequency 100a, the quantization step is set small for the left-hand portion, that is, the low-frequency regions 110a, 120a, and 130a below the threshold frequency, and set large for the right-hand portion, that is, the high-frequency regions 110b, 120b, and 130b above the threshold frequency. FIG. 10B shows an example in which a threshold frequency 100b higher than the initial threshold frequency 100a is used, so that the regions 140a, 150a, and 160a with the small quantization step grow and the overall bit rate increases. FIG. 10C shows an example in which a threshold frequency 100c lower than the initial threshold frequency 100a is used, so that the regions 170a, 180a, and 190a with the small quantization step shrink and the overall bit rate decreases.
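The effect of moving the threshold frequency can be illustrated with a small sketch. It assumes the threshold is expressed as a band index and uses the number of finely quantized bands as a stand-in for bit rate; `steps_by_threshold_band`, `count_fine_bands`, and the step sizes are hypothetical names and values, not taken from the specification.

```python
def steps_by_threshold_band(num_bands, threshold_band, small_step=0.5, large_step=2.0):
    """Bands below the threshold index get the small step, the rest the large step."""
    return [small_step if b < threshold_band else large_step for b in range(num_bands)]


def count_fine_bands(steps, small_step=0.5):
    """Number of bands quantized with the small step (a rough proxy for bit cost)."""
    return sum(1 for s in steps if s == small_step)


initial = steps_by_threshold_band(8, threshold_band=4)  # like FIG. 10A
raised = steps_by_threshold_band(8, threshold_band=6)   # like FIG. 10B
lowered = steps_by_threshold_band(8, threshold_band=2)  # like FIG. 10C

fine_counts = [count_fine_bands(s) for s in (initial, raised, lowered)]
```

Raising the threshold enlarges the finely quantized region and lowering it shrinks that region, which mirrors the bit-rate increase and decrease described for FIGS. 10B and 10C.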

FIG. 11 is a flowchart illustrating a spatial parameter generation method according to an embodiment of the present invention, which may be performed by the encoding apparatus 200 of FIG. 2.

Referring to FIG. 11, in step 1110, N angle parameters are generated.

In step 1120, (N-1) of the N angle parameters are each encoded independently.

In step 1130, the remaining angle parameter is predicted from the (N-1) angle parameters.

In step 1140, residual coding is performed on the predicted angle parameter to generate the residue of the remaining angle parameter.
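Steps 1110 to 1140 can be sketched as follows. This is a minimal sketch, assuming the prediction subtracts the (N-1) transmitted angles from a known total (180 degrees for three channels, as stated later in the description) and that the residue is the original value minus the predicted value. The function name `encode_angles` and the sample angles are assumptions.

```python
def encode_angles(angles_deg, total_deg=180.0):
    """Split N angle parameters into (N-1) directly coded angles plus one residue.

    The last angle is predicted as total_deg minus the sum of the others;
    only the prediction error (residue) is kept for it.
    """
    *sent, last = angles_deg
    predicted = total_deg - sum(sent)
    residue = last - predicted
    return sent, residue


# Made-up angle parameters for three channels (they sum to 178, not exactly 180).
sent_angles, residue = encode_angles([50.0, 60.0, 68.0])
```

Because the sum of the angles tends to cluster near the assumed total, the residue is usually small and should cost fewer bits than coding the last angle directly.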

FIG. 12 is a flowchart illustrating a spatial parameter generation method according to another embodiment of the present invention, which may be performed by the decoding apparatus 300 of FIG. 3.

Referring to FIG. 12, in step 1210, (N-1) of the N angle parameters are received.

In step 1220, the remaining angle parameter is predicted from the (N-1) angle parameters.

In step 1230, the predicted angle parameter and the residue are added to generate the remaining angle parameter.
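The decoder-side steps 1210 to 1230 can be sketched in the same spirit. The sketch assumes the same prediction rule as the encoder (a known total minus the received angles, 180 degrees for three channels); `decode_last_angle` and the sample values are hypothetical.

```python
def decode_last_angle(received_deg, residue_deg, total_deg=180.0):
    """Rebuild the angle parameter that was not sent directly.

    received_deg: the (N-1) angle parameters that were received.
    residue_deg:  the residue received for the remaining angle parameter.
    """
    predicted = total_deg - sum(received_deg)
    return predicted + residue_deg


# Made-up received values: two angles and a residue of -2 degrees.
third_angle = decode_last_angle([50.0, 60.0], -2.0)
```

As long as the encoder and decoder use the same prediction rule, adding the residue restores the original angle value.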

FIG. 13 is a flowchart illustrating an audio signal processing method according to an embodiment of the present invention.

Referring to FIG. 13, in step 1310, the n channel signals (ch1 to chn) of a multi-channel signal are downmixed. Specifically, the n channel signals (ch1 to chn) may be downmixed into one mono signal (DM). The operation of step 1310 may be performed by the downmixer (210 of FIG. 2).

In step 1320, either (n-1) of the n input channel signals (ch1 to chn) are summed, or all n input channel signals (ch1 to chn) are summed. Specifically, the channel signals other than a reference channel signal among the first to n-th channel signals (ch1 to chn) may be summed, in which case the summed signal is the first sum signal described above. Alternatively, all of the first to n-th channel signals (ch1 to chn) may be summed, in which case the summed signal is the second sum signal described above.

In step 1330, the first spatial parameter described above may be generated using the correlation between the first sum signal generated in step 1320 and the reference channel signal. Alternatively, in step 1330, instead of generating the first spatial parameter, the second spatial parameter described above may be generated using the correlation between the second sum signal generated in step 1320 and the reference channel signal.

In addition, each of the first to n-th channel signals (ch1 to chn) may serve as the reference channel signal. Accordingly, there may be n reference channel signals in total, and n spatial parameters corresponding to the reference channel signals may be generated.

Accordingly, step 1330 may further include generating n spatial parameters by taking each of the first to n-th channel signals (ch1 to chn) as the reference channel signal.

The operations of steps 1320 and 1330 may be performed by the downmixer 210.

In step 1340, the spatial parameter (SP) generated in step 1330 is encoded and transmitted to the decoding apparatus (300 of FIG. 3). The mono signal (DM) generated in step 1310 is also encoded and transmitted to the decoding apparatus (300 of FIG. 3). Specifically, the encoded spatial parameter and the encoded mono signal may be included in a transport stream (TS) and transmitted to the decoding apparatus (300 of FIG. 3). The spatial parameter included in the transport stream (TS) refers to a spatial parameter set containing the first to n-th spatial parameters described above.

The operation of step 1340 may be performed by the encoding apparatus (200 of FIG. 2).

FIGS. 14A to 14C illustrate one example of step 1110 of FIG. 11 or step 1330 of FIG. 13. The operation of generating the first sum signal and the first spatial parameter is described in detail below with reference to FIGS. 14A to 14C, which take as an example a multi-channel signal consisting of first to third channel signals (ch1, ch2, ch3). FIGS. 14A to 14C also illustrate signal summation as a vector sum of the signals. Here, signal summation means downmixing, and various downmixing methods other than the vector sum may be used.

FIGS. 14A, 14B, and 14C show the cases in which the reference channel signal is the first channel signal (ch1), the second channel signal (ch2), and the third channel signal (ch3), respectively.

Referring to FIG. 14A, when the reference channel signal is the first channel signal (ch1), the additional information generator 220 sums the second and third channel signals (ch2, ch3), which exclude the reference channel signal, to generate a sum signal 1410 (ch2 + ch3). A spatial parameter is then generated using the correlation between the first channel signal (ch1), which is the reference channel signal, and the sum signal 1410, that is, between ch1 and ch2 + ch3. The spatial parameter carries information indicating the correlation between the reference channel signal and the sum signal, and information indicating the relative signal magnitudes of the reference channel signal and the sum signal.

Referring to FIG. 14B, when the reference channel signal is the second channel signal (ch2), the additional information generator 220 sums the first and third channel signals (ch1, ch3), which exclude the reference channel signal, to generate a sum signal 1420 (ch1 + ch3). A spatial parameter is then generated using the correlation between the second channel signal (ch2), which is the reference channel signal, and the sum signal 1420, that is, between ch2 and ch1 + ch3.

Referring to FIG. 14C, when the reference channel signal is the third channel signal (ch3), the additional information generator 220 sums the first and second channel signals (ch1, ch2), which exclude the reference channel signal, to generate a sum signal 1430 (ch1 + ch2). A spatial parameter is then generated using the correlation between the third channel signal (ch3), which is the reference channel signal, and the sum signal 1430, that is, between ch3 and ch1 + ch2.

When the multi-channel signal includes three channel signals, there are three reference channel signals, and three spatial parameters may be generated. The generated spatial parameters are encoded by the encoding apparatus 200 and transmitted to the decoding apparatus 300 over a network (not shown).

The mono signal (DM) obtained by downmixing the first to third channel signals (ch1, ch2, ch3) is identical to the sum of the first to third channel signals and can be expressed as DM = ch1 + ch2 + ch3. Therefore, the relationship ch1 = DM - (ch2 + ch3) holds.
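The relationship between the mono downmix and the two kinds of sum signal can be checked with a short sketch. It assumes a plain sample-wise sum as the downmix (the text notes that other downmixing methods are possible); the helper names `first_sum_signal` and `second_sum_signal` and the sample values are hypothetical.

```python
def first_sum_signal(channels, ref_index):
    """Sample-wise sum of every channel except the reference channel."""
    length = len(channels[0])
    return [
        sum(ch[i] for j, ch in enumerate(channels) if j != ref_index)
        for i in range(length)
    ]


def second_sum_signal(channels):
    """Sample-wise sum of all channels; equals the mono downmix DM in this sketch."""
    return [sum(samples) for samples in zip(*channels)]


# Made-up two-sample signals for ch1, ch2, ch3.
channels = [[1, 2], [3, 4], [5, 6]]
dm = second_sum_signal(channels)             # DM = ch1 + ch2 + ch3
rest = first_sum_signal(channels, 0)         # ch2 + ch3
ch1_from_dm = [d - r for d, r in zip(dm, rest)]  # ch1 = DM - (ch2 + ch3)
```

The last line reproduces the identity ch1 = DM - (ch2 + ch3) on exact sample values; in a real codec the sum ch2 + ch3 is not transmitted and has to be estimated from the spatial parameters.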

The decoding apparatus 300 receives and decodes the first spatial parameter, which is the spatial parameter described with reference to FIGS. 14A to 14C, and reconstructs the original channel signals using the decoded mono signal and the decoded spatial parameter. As described above, the relationship ch1 = DM - (ch2 + ch3) holds, and the spatial parameter generated in FIG. 14A may include a parameter indicating the relative magnitudes of the signals ch1 and ch2 + ch3 and a parameter expressing their similarity. Therefore, the signals ch1 and ch2 + ch3 can be reconstructed using the spatial parameter generated in FIG. 14A and the mono signal (DM). In the same way, the spatial parameters generated in FIGS. 14B and 14C can be used to reconstruct the signals ch2 and ch1 + ch3, and the signals ch3 and ch1 + ch2. That is, the upmixer (330 of FIG. 3) can reconstruct all of the first to third channel signals (ch1, ch2, ch3).

FIG. 15 illustrates another example of step 1110 of FIG. 11 or step 1330 of FIG. 13. The operation of generating the second sum signal and the second spatial parameter is described in detail below with reference to FIG. 15, which takes as an example a multi-channel signal consisting of first to third channel signals (ch1, ch2, ch3). FIG. 15 also illustrates signal summation as a vector sum of the signals.

Referring to FIG. 15, the second sum signal is the sum of all of the first to third channel signals (ch1, ch2, ch3) of the multi-channel signal. Therefore, the signal 1520 (ch1 + ch2 + ch3), obtained by adding the ch3 signal to the signal 1510 that sums the ch1 and ch2 signals, is the second sum signal.

With the first channel signal (ch1) as the reference channel signal, a spatial parameter between the first channel signal (ch1) and the second sum signal 1520 is generated. Specifically, a spatial parameter including at least one of the first parameter and the second parameter may be generated using the correlation between the first channel signal (ch1) and the second sum signal 1520, that is, between ch1 and ch1 + ch2 + ch3.

Likewise, with the second channel signal (ch2) as the reference channel signal, a spatial parameter is generated using the correlation between the second channel signal (ch2) and the second sum signal 1520, that is, between ch2 and ch1 + ch2 + ch3. With the third channel signal (ch3) as the reference channel signal, a spatial parameter is generated using the correlation between the third channel signal (ch3) and the second sum signal 1520, that is, between ch3 and ch1 + ch2 + ch3.

The decoding apparatus (300 of FIG. 3) receives and decodes the spatial parameter described with reference to FIG. 15, that is, the second spatial parameter, and reconstructs the original channel signals using the decoded mono signal and the decoded spatial parameter. Here, the decoded mono signal corresponds to the sum signal (ch1 + ch2 + ch3) of the multi-channel signals.

Therefore, the first channel signal (ch1) can be reconstructed using the decoded mono signal and the spatial parameter generated from the correlation between the first channel signal (ch1) and the second sum signal 1520, that is, between ch1 and ch1 + ch2 + ch3. Similarly, the second channel signal (ch2) can be reconstructed using the spatial parameter generated from the correlation between the second channel signal (ch2) and the second sum signal 1520, that is, between ch2 and ch1 + ch2 + ch3. The third channel signal (ch3) can be reconstructed using the spatial parameter generated from the correlation between the third channel signal (ch3) and the second sum signal 1520, that is, between ch3 and ch1 + ch2 + ch3.

FIGS. 16A to 16D illustrate yet another example of step 1110 of FIG. 11 or step 1330 of FIG. 13.

First, in the encoding apparatus 200 of FIG. 2, the spatial parameter generated by the additional information generator 220 may include an angle parameter as the first parameter. The angle parameter expresses, as an angle value, the relationship in signal magnitude between a reference channel signal, which is any one of the first to n-th channel signals (ch1 to chn), and the remaining channel signals other than the reference channel signal. The angle parameter may be called a global vector angle (GVA). The angle parameter can also be regarded as a parameter that expresses the relative magnitudes of the reference channel signal and the first sum signal as an angle value.

In addition, the additional information generator 220 may generate n angle parameters, the first to n-th angle parameters, by taking each of the first to n-th channel signals (ch1 to chn) as the reference channel signal. Hereinafter, the angle parameter generated with the k-th channel signal as the reference channel signal is referred to as the k-th angle parameter.

FIG. 16A shows, as an example, a case in which the multi-channel signal received by the encoding apparatus 200 includes first to third channel signals (ch1, ch2, ch3). FIGS. 16B, 16C, and 16D show the cases in which the reference channel signal is the first channel signal (ch1), the second channel signal (ch2), and the third channel signal (ch3), respectively.

Referring to FIG. 16B, when the reference channel signal is the first channel signal (ch1), the additional information generator 220 sums the remaining channel signals, namely the second and third channel signals (ch2, ch3), and obtains the first angle parameter (angle 1) 1622, which is the angle parameter between the summed signal 1620 (ch2 + ch3) and the first channel signal (ch1).

Specifically, the first angle parameter (angle 1) 1622 may be obtained by taking the inverse tangent of the absolute value of the summed signal 1620 (ch2 + ch3) divided by the absolute value of the first channel signal (ch1).

Referring to FIG. 16C, the second angle parameter (angle 2) 1632, which uses the second channel signal (ch2) as the reference channel signal, may be obtained by taking the inverse tangent of the absolute value of the summed signal 1630 (ch1 + ch3) divided by the absolute value of the second channel signal (ch2).

Referring to FIG. 16D, the third angle parameter (angle 3) 1642, which uses the third channel signal (ch3) as the reference channel signal, may be obtained by taking the inverse tangent of the absolute value of the summed signal 1640 (ch1 + ch2) divided by the absolute value of the third channel signal (ch3).
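The inverse-tangent computation of FIGS. 16B to 16D can be sketched as follows. The sketch assumes each channel is a short vector of samples, takes the "absolute value" of a signal to be its Euclidean norm, and maps a zero-magnitude reference channel to 90 degrees, in line with the silent-channel case mentioned in the description. The names `norm` and `angle_parameters` and the sample signals are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm, used here as the magnitude of a signal (an assumption)."""
    return math.sqrt(sum(x * x for x in v))


def angle_parameters(channels):
    """Return one angle (in degrees) per channel, each channel taken as reference in turn."""
    length = len(channels[0])
    angles = []
    for k, ref in enumerate(channels):
        # Sum of all channels except the reference channel.
        rest = [
            sum(ch[i] for j, ch in enumerate(channels) if j != k)
            for i in range(length)
        ]
        ref_mag, rest_mag = norm(ref), norm(rest)
        if ref_mag == 0.0:
            angles.append(90.0)  # limiting value when the reference is silent
        else:
            angles.append(math.degrees(math.atan(rest_mag / ref_mag)))
    return angles


# Made-up signals: ch1 and ch2 are orthogonal unit vectors, ch3 is silent.
angles = angle_parameters([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
```

For this particular input the three angles are 45, 45, and 90 degrees, so their sum is 180 degrees; that is a property of the chosen example, not a proof of the convergence behaviour discussed with FIG. 17.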

FIG. 17 is a graph showing the sum of the angle parameters, in which the x axis represents the angle value and the y axis represents the distribution probability. One unit on the x axis corresponds to 6 degrees; for example, a value of 30 on the x axis corresponds to 180 degrees.

Specifically, the sum of the n angle parameters calculated by taking each of the first to n-th channel signals as the reference channel signal converges to a predetermined value. The predetermined value may vary with the value of n and may be optimized through simulation or experiment. For example, when n is 3, it may be approximately 180 degrees.

Referring to FIG. 17, when n is 3, the sum of the angle parameters converges at 30 units, that is, in the vicinity of approximately 180 degrees (1710), as shown. The graph of FIG. 17 was obtained through simulation or experiment.

As an exception, the sum of the angle parameters may converge at 45 units, that is, in the vicinity of 270 degrees (1720). This happens when all three channel signals are silent, so that each angle parameter has a value of 90 degrees. In this exceptional case, changing the value of one of the three angle parameters to 0 makes the sum of the angle parameters converge to 180 degrees again. When all three channel signals are silent, the downmixed mono signal is also 0, and upmixing and decoding the mono signal still yields 0. Because the upmixing and decoding result does not change when an angle parameter is set to 0, one of the three angle parameters may safely be changed to 0.
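The silent-frame exception can be handled with a small check such as the one below. This is a sketch only: detecting silence by testing whether every angle equals 90 degrees, and zeroing the first angle rather than another one, are assumptions, since the text only says that one of the three angle parameters is set to 0. The name `fix_silent_frame` is hypothetical.

```python
def fix_silent_frame(angles_deg, silent_value=90.0):
    """If every angle equals the silent value, set one angle (here the first) to 0."""
    if angles_deg and all(a == silent_value for a in angles_deg):
        return [0.0] + list(angles_deg[1:])
    return list(angles_deg)


# All three channels silent: the sum drops from 270 back to 180 degrees.
fixed = fix_silent_frame([90.0, 90.0, 90.0])
# A normal frame is returned unchanged.
unchanged = fix_silent_frame([50.0, 60.0, 70.0])
```

With this adjustment, the same 180-degree prediction rule can be applied to silent and non-silent frames alike.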

FIG. 18 illustrates the calculation of the angle parameters, taking as an example a multi-channel signal consisting of three channel signals (ch1, ch2, ch3). According to an embodiment, a spatial parameter may be generated that includes the first to n-th angle parameters other than the k-th angle parameter, together with the residue of the k-th angle parameter, which is used to calculate the k-th angle parameter.

Referring to FIG. 18, when the first channel signal (ch1) is the reference channel signal, the first angle parameter is calculated and encoded, and the encoded first angle parameter is placed in a predetermined bit region 1810 and transmitted to the decoding apparatus (300 of FIG. 3). When the second channel signal (ch2) is the reference channel signal, the second angle parameter is calculated and encoded, and the encoded second angle parameter is placed in a predetermined bit region 1830 and transmitted to the decoding apparatus (300 of FIG. 3).

When the third angle parameter is the k-th angle parameter described above, the residue of the k-th angle parameter may be obtained as follows.

Because the sum of the n angle parameters converges to a predetermined value, the value of the k-th angle parameter can be obtained by subtracting the values of the other angle parameters from that predetermined value. Specifically, when n is 3, the sum of the three angle parameters converges to 180 degrees unless all three channel signals are silent. Therefore, 'value of the third angle parameter = 180 degrees - (value of the first angle parameter + value of the second angle parameter)'. The third angle parameter can be predicted from the first and second angle parameters using this relationship.

Specifically, the additional information generator (220 of FIG. 2) predicts the value of the k-th angle parameter among the first to n-th angle parameters. The predetermined bit region (1870) represents the data region containing the predicted value of the k-th angle parameter.

Next, the additional information generator (220 of FIG. 2) compares the predicted value of the k-th angle parameter with the original value of the k-th angle parameter. The predetermined bit region (1850) represents the data region containing the value of the third angle parameter calculated as in FIG. 13d.

Next, the additional information generator (220 of FIG. 2) generates the difference between the predicted value (1870) of the k-th angle parameter and the original value (1850) of the k-th angle parameter as the residue of the k-th angle parameter. The predetermined bit region (1890) represents the data region containing the residue of the k-th angle parameter.

The encoding apparatus (200 of FIG. 2) encodes a spatial parameter that includes the angle parameters other than the k-th angle parameter among the first to n-th angle parameters (the parameters contained in regions 1810 and 1830) and the residue of the k-th angle parameter (the parameter contained in region 1890), and transmits it to the decoding apparatus (300 of FIG. 3).

The decoding apparatus (300 of FIG. 3) receives the spatial parameter that includes the angle parameters other than the k-th angle parameter among the first to n-th angle parameters and the residue of the k-th angle parameter.

The decoding unit (320) of the decoding apparatus (300 of FIG. 3) restores the k-th angle parameter using the received spatial parameter and the predetermined value.

Specifically, the decoding unit (320) may subtract the values of the angle parameters other than the k-th angle parameter among the first to n-th angle parameters from the predetermined value, and then compensate the result with the residue of the k-th angle parameter to generate the k-th angle parameter.
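The prediction, residue generation, and decoder-side restoration described above can be sketched as a round trip. This is a minimal sketch under stated assumptions: the names `encode_angles` and `decode_angles` are hypothetical, the predetermined value is taken as 180 degrees (the n = 3, non-silent case), the residue is defined as original minus predicted (the text only calls it a difference), and quantization and bit packing are left out.

```python
TARGET_SUM_DEG = 180.0  # predetermined value for three non-silent channels


def encode_angles(angles_deg, k):
    """Return (other angle parameters, residue of the k-th angle parameter)."""
    others = [a for i, a in enumerate(angles_deg) if i != k]
    predicted = TARGET_SUM_DEG - sum(others)   # prediction from the sum constraint
    residue = angles_deg[k] - predicted        # assumed sign: original - predicted
    return others, residue


def decode_angles(others, residue, k):
    """Rebuild the full angle list from the transmitted parameters."""
    predicted = TARGET_SUM_DEG - sum(others)   # same prediction as the encoder
    restored = predicted + residue             # compensate with the residue
    return others[:k] + [restored] + others[k:]


original = [50.0, 70.0, 61.5]
others, residue = encode_angles(original, k=2)
restored = decode_angles(others, residue, k=2)
```

In this example the third angle is predicted as 60 degrees, so only the small residue of 1.5 degrees has to be carried for it.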

The residue of the k-th angle parameter has a smaller data size than the value of the k-th angle parameter itself. Therefore, transmitting to the decoding apparatus (300 of FIG. 3) a spatial parameter that includes the angle parameters other than the k-th angle parameter among the first to n-th angle parameters and the residue of the k-th angle parameter reduces the amount of data exchanged between the encoding apparatus (200 of FIG. 2) and the decoding apparatus (300 of FIG. 3).

Meanwhile, when angle parameters are generated for three channels, for example, a value of 0, 1, or 2 can be used to indicate which channel's angle parameter was residual-encoded. Encoding such values independently for all three channels requires 2 bits * 3 = 6 bits, but the following method can represent them in 5 bits.

First, let D = A + B*3 + C*9, where D ranges from 0 to 26 and therefore fits in 5 bits. If the value of D is known at decoding time, A, B, and C can be recovered as C = floor(D/9); D' = mod(D, 9); B = floor(D'/3); A = mod(D', 3).
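The joint coding of three ternary values into one 5-bit index can be checked with a short sketch. The function names `pack3` and `unpack3` are hypothetical; the arithmetic follows the relation D = A + B*3 + C*9, with the decoder using floor division and remainder.

```python
def pack3(a, b, c):
    """Combine three values, each in {0, 1, 2}, into one index D in 0..26."""
    for v in (a, b, c):
        if v not in (0, 1, 2):
            raise ValueError("each value must be 0, 1, or 2")
    return a + 3 * b + 9 * c


def unpack3(d):
    """Recover (A, B, C) from D."""
    if not 0 <= d <= 26:
        raise ValueError("D must be in 0..26")
    c, rem = divmod(d, 9)   # C = floor(D/9), D' = mod(D, 9)
    b, a = divmod(rem, 3)   # B = floor(D'/3), A = mod(D', 3)
    return a, b, c


d = pack3(1, 2, 0)
```

The largest index is 26, which needs 5 bits, compared with 6 bits when each value is stored separately in 2 bits.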

FIG. 19 is a block diagram showing the configuration of an audio signal processing system that integrates a multi-channel codec and a core codec, according to an embodiment of the present invention.

The audio signal processing system (1900) shown in FIG. 19 includes an encoding apparatus (1910) and a decoding apparatus (1940). In one embodiment, the audio signal processing system (1900) may include both the encoding apparatus (1910) and the decoding apparatus (1940); in another embodiment, the audio signal processing system (1900) may include only one of the encoding apparatus (1910) and the decoding apparatus (1940).

The encoding apparatus (1910) may include a multi-channel encoder (1920) and a core encoder (1930), and the decoding apparatus (1940) may include a core decoder (1950) and a multi-channel decoder (1960).

Examples of the codec algorithm used in the core encoder (1930) and the core decoder (1950) include AC-3, Enhanced AC-3, and AAC, which use the MDCT (Modified Discrete Cosine Transform) as their transform algorithm, but the codec algorithm is not limited to these.

FIG. 20 is a block diagram showing the configuration of an audio encoding apparatus according to an embodiment of the present invention, in which a multi-channel encoder (2010) and a core encoder (2040) are integrated.

The audio encoding apparatus (2000) shown in FIG. 20 includes a multi-channel encoder (2010) and a core encoder (2040). The multi-channel encoder (2010) may include a transform unit (2020) and a downmixing unit (2030), and the core encoder (2040) may include an envelope encoding unit (2050), a bit allocation unit (2060), a quantization unit (2070), and a bitstream combining unit (2080). The components may be integrated into at least one module and implemented by at least one processor (not shown).

Referring to FIG. 20, the transform unit (2020) converts a time-domain PCM input into frequency-domain spectral data. Here, the MODFT (Modified Odd Discrete Fourier Transform) may be applied. Because the MODFT is a complex combination of the MDCT and the MDST, with the MDCT as its real term and the MDST multiplied by j as its imaginary term, applying it yields the MDCT component directly, so the existing inverse transform part and analysis filter bank part become unnecessary. In addition, because MODFT coefficients are complex-valued, level, phase, and correlation can be obtained more accurately than with the MDCT.
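The relation between a complex odd-frequency transform and the MDCT/MDST pair can be illustrated numerically. The sketch below is an assumption-laden stand-in, not the patent's MODFT: the function name `mdct_mdst_complex`, the phase offset `n0`, the absence of windowing and scaling, and the `exp(-j...)` kernel are all choices made for the example. With this kernel the complex coefficient equals MDCT - j*MDST; with the opposite kernel sign it would be MDCT + j*MDST.

```python
import cmath
import math


def mdct_mdst_complex(x):
    """Return (mdct, mdst, complex_coeffs) for a frame of 2N samples."""
    two_n = len(x)
    n_half = two_n // 2
    n0 = 0.5 + n_half / 2.0  # usual MDCT phase offset (assumed here)
    mdct, mdst, cplx = [], [], []
    for k in range(n_half):
        w = math.pi / n_half * (k + 0.5)  # odd-frequency grid
        mdct.append(sum(x[n] * math.cos(w * (n + n0)) for n in range(two_n)))
        mdst.append(sum(x[n] * math.sin(w * (n + n0)) for n in range(two_n)))
        cplx.append(sum(x[n] * cmath.exp(-1j * w * (n + n0)) for n in range(two_n)))
    return mdct, mdst, cplx


frame = [0.1, 0.4, -0.3, 0.8, 0.5, -0.2, 0.0, 0.7]
mdct, mdst, cplx = mdct_mdst_complex(frame)
levels = [abs(z) for z in cplx]           # per-bin level
phases = [cmath.phase(z) for z in cplx]   # per-bin phase
```

The real part of each complex coefficient is the MDCT coefficient, which is why an encoder working in such a domain already has the MDCT spectrum available, while the magnitude and phase give the level and phase information that a real-valued MDCT alone does not expose.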

The downmixing unit (2030) extracts spatial parameters from the spectral data provided by the transform unit (2020) and performs downmixing to generate a downmixed spectrum. The extracted spatial parameters are provided to the bitstream combining unit (2080).

The envelope encoding unit (2050) obtains an envelope value for each predetermined frequency band from the MDCT transform coefficients of the downmixed spectrum provided by the downmixing unit (2030), and losslessly encodes it. Here, the envelope may be formed from any one of the power, average amplitude, norm value, and average energy obtained for each predetermined frequency band.
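A per-band envelope of the kind described above can be sketched as follows. This is only an illustration: the function name `band_envelopes`, the band layout, and the use of the root-mean-square value as the "norm" are assumptions, and the lossless coding stage is not shown.

```python
import math


def band_envelopes(coeffs, band_edges):
    """Return one RMS value per band.

    band_edges lists the start index of each band plus the end of the last one,
    e.g. [0, 4, 8] describes two bands of four coefficients each.
    """
    envelopes = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = coeffs[lo:hi]
        energy = sum(c * c for c in band) / len(band)  # average energy
        envelopes.append(math.sqrt(energy))            # RMS, used here as the norm
    return envelopes


env = band_envelopes([3.0, -3.0, 3.0, -3.0, 0.0, 0.0, 0.0, 0.0], [0, 4, 8])
```

Swapping the last two lines of the loop for the average energy or the mean absolute value would give the other envelope definitions the text mentions.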

The bit allocation unit (2060) uses the envelope value obtained for each frequency band to generate the bit allocation information needed to encode the transform coefficients, and normalizes the MDCT transform coefficients. In this case, the envelope values, quantized and losslessly encoded per frequency band, may be included in the bitstream and provided to the decoding apparatus (2100 of FIG. 21). For bit allocation based on the envelope value of each frequency band, dequantized envelope values may be used so that the encoding apparatus and the decoding apparatus can run the same process. Taking the norm value as an example of the envelope value, a masking threshold may be calculated from the norm value of each frequency band, and the perceptually required number of bits may be estimated from the masking threshold.
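The reason for using dequantized envelope values can be shown with a small sketch: the encoder normalizes with the same value the decoder will later reconstruct from the bitstream. Everything specific here is an assumption for illustration, including the function names, the log2-domain quantizer with a step of 0.5, and the floor applied to very small norms. The masking-threshold-based bit estimate is not modeled.

```python
import math

STEP = 0.5             # assumed quantizer step in the log2 domain
MIN_NORM = 2.0 ** -10  # assumed floor so that log2 is defined for silent bands


def quantize_norm(norm):
    """Map a band norm to an integer index (what the bitstream would carry)."""
    return round(math.log2(max(norm, MIN_NORM)) / STEP)


def dequantize_norm(index):
    """Reconstruct the norm exactly as the decoder would."""
    return 2.0 ** (index * STEP)


def normalize_band(coeffs, norm):
    """Normalize with the dequantized norm so encoder and decoder agree."""
    index = quantize_norm(norm)
    deq = dequantize_norm(index)
    return index, [c / deq for c in coeffs]


index, normalized = normalize_band([8.0, -4.0], 4.0)
```

Because both sides derive every later decision from `dequantize_norm(index)` rather than from the unquantized norm, the decoder can repeat the encoder's bit allocation without extra side information.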

The quantization unit (2070) quantizes the MDCT transform coefficients of the downmixed spectrum based on the bit allocation information provided by the bit allocation unit (2060), and generates quantization indices.

The bitstream combining unit (2080) generates a bitstream by combining the encoded spectral envelope, the quantization indices of the downmixed spectrum, and the spatial parameters.

FIG. 21 is a block diagram showing the configuration of an audio decoding apparatus according to an embodiment of the present invention, in which a core decoder (2110) and a multi-channel decoder (2160) are integrated.

The audio decoding apparatus (2100) shown in FIG. 21 includes a core decoder (2110) and a multi-channel decoder (2160). The core decoder (2110) may include a bitstream parsing unit (2120), an envelope decoding unit (2130), a bit allocation unit (2140), and an inverse quantization unit (2150), and the multi-channel decoder (2160) may include an upmixing unit (2170) and an inverse transform unit (2180). The components may be integrated into at least one module and implemented by at least one processor (not shown).

Referring to FIG. 21, the bitstream parsing unit (2120) parses a bitstream transmitted over a network (not shown) and extracts the encoded spectral envelope, the quantization indices of the downmixed spectrum, and the spatial parameters.

The envelope decoding unit (2130) losslessly decodes the encoded spectral envelope provided by the bitstream parsing unit (2120).

The bit allocation unit (2140) uses the encoded spectral envelope, provided per frequency band by the bitstream parsing unit (2120), for the bit allocation needed to decode the transform coefficients. The bit allocation unit (2140) may operate in the same way as the bit allocation unit (2060) of the audio encoding apparatus (2000).

The inverse quantization unit (2150) inverse-quantizes the quantization indices of the downmixed spectrum provided by the bitstream parsing unit (2120), based on the bit allocation information provided by the bit allocation unit (2140), and generates spectral data of the MDCT component.

The upmixing unit (2170) upmixes the spectral data of the MDCT component provided by the inverse quantization unit (2150), using the spatial parameters provided by the bitstream parsing unit (2120), and performs denormalization using the decoded spectral envelope provided by the envelope decoding unit (2130).

The inverse transform unit (2180) inverse-transforms the upmixed spectrum provided by the upmixing unit (2170) to generate a time-domain PCM output. Here, an inverse MODFT may be applied so as to correspond to the transform unit (2020 of FIG. 20). To this end, spectral data of the MDST component may be generated or predicted from the spectral data of the MDCT component. The spectral data of the MDCT component and the generated or predicted spectral data of the MDST component are then used to produce spectral data of the MODFT component, to which the inverse MODFT is applied. Alternatively, the inverse transform unit (2180) may apply an inverse MDCT to the spectral data of the MDCT component. For that purpose, a parameter for compensating the error that arises when upmixing is performed in the MDCT domain may be transmitted from the audio encoding apparatus (2000 of FIG. 20).

According to an embodiment, multi-channel decoding may be performed in the MDCT domain for stationary signal sections. For non-stationary signal sections, such as transient sections, the MDST component may be generated or predicted from the MDCT component to form the MODFT component, and multi-channel decoding may then be performed in the MODFT domain.

Whether the current signal belongs to a stationary or a non-stationary section can be checked using flag information or window information added to the bitstream per predetermined frequency band or per frame. For example, a frame to which a short window is applied may correspond to a non-stationary section, and a frame to which a long window is applied may correspond to a stationary section.
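The per-frame choice of decoding domain can be sketched as a simple dispatch on the window type. The function name `select_decoding_domain` and the boolean frame flag are hypothetical; an actual bitstream would carry codec-specific flag or window fields rather than a plain boolean.

```python
def select_decoding_domain(short_window):
    """Pick the domain for multi-channel decoding of one frame.

    Assumption: a short window marks a non-stationary (transient) frame,
    a long window marks a stationary frame.
    """
    return "MODFT" if short_window else "MDCT"


# One flag per frame: True means a short window was used for that frame.
frame_flags = [False, False, True, False]
domains = [select_decoding_domain(flag) for flag in frame_flags]
```

Only the frame flagged as transient takes the costlier path in which the MDST component is generated or predicted before upmixing.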

More specifically, when the Enhanced AC-3 algorithm is applied to the core codec, the blksw and AHT flag information can be used to identify the characteristics of the current signal; when the AC-3 algorithm is applied, the blksw flag information can be used.

According to FIGS. 20 and 21, using the MODFT for the time/frequency domain transform reduces the complexity of the decoding stage even when a multi-channel codec and a core codec that use different transform techniques are integrated. In addition, even when such codecs are integrated, the existing synthesis filter bank part and transform part become unnecessary, so the overlap-add can be omitted and no additional delay occurs.

The methods according to the above embodiments can be written as computer-executable programs and implemented on a general-purpose digital computer that runs the programs from a computer-readable recording medium. In addition, the data structures, program instructions, or data files usable in the embodiments of the present invention described above may be recorded on a computer-readable recording medium by various means. The computer-readable recording medium may include any kind of storage device that stores data readable by a computer system. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The computer-readable recording medium may also be a transmission medium that carries signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although embodiments of the present invention have been described above with reference to limited embodiments and drawings, the present invention is not limited to the embodiments described above, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Therefore, the scope of the present invention is defined not by the foregoing description but by the claims, and all equivalents or equivalent modifications thereof fall within the scope of the technical idea of the present invention.

Claims

When downmixing the first plurality of input channels to the second plurality of output channels, comparing the positions of the first plurality of input channels with the positions of the second plurality of output channels;
Downmixing a channel having the same position as the second plurality of output channels among the first plurality of input channels to a channel having the same position among the second plurality of output channels;
Searching for at least one adjacent channel with respect to the remaining channels of the first plurality of input channels;
Determining a weight with respect to the found adjacent channel in consideration of at least one of a distance between channels, a correlation of a signal, and an error in reconstruction; and
Downmixing the remaining channels of the first plurality of input channels to the adjacent channel based on the determined weight.