KR20040102164A

KR20040102164A - Parametric representation of spatial audio

Info

Publication number: KR20040102164A
Application number: KR10-2004-7017073A
Authority: KR
Inventors: Dirk J. Breebaart; Steven L. J. D. E. van de Par
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2002-04-22
Filing date: 2003-04-22
Publication date: 2004-12-03
Also published as: JP5498525B2; US20090287495A1; US20130094654A1; EP1881486B1; CN1307612C; ES2300567T3; CN1647155A; JP2012161087A; DE60318835D1; ATE385025T1; ATE426235T1; BR0304540A; KR20100039433A; US20080170711A1; WO2003090208A1; BRPI0304540B1; JP5101579B2; US9137603B2; JP2005523480A; JP2009271554A

Abstract

In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multi-channel audio signals. This parametric description allows strong bit rate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters that describe the spatial properties of the signal. The decoder can re-create the original number of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bit rate of 10 kbit/s or less for these spatial parameters seems sufficient to reproduce the correct spatial impression at the receiving end.

Description

Parametric representation of spatial audio

In the field of audio coding, it is generally desirable to encode an audio signal so as to reduce, for example, the bit rate needed to communicate the signal or the storage needed to store it, without unduly compromising the perceptual quality of the audio signal. This is an important issue when audio signals must be transmitted over communication channels of limited capacity or stored on a storage medium of limited capacity.

Prior-art solutions proposed in audio coders to reduce the bit rate of stereo program material include:

'Intensity stereo'. In this algorithm, high frequencies (typically above 5 kHz) are represented by a single audio signal (i.e. mono) combined with time-varying and frequency-dependent scale factors.

'M/S stereo'. In this algorithm, the signal is decomposed into a sum (or mid, or common) signal and a difference (or side, or uncommon) signal. This decomposition is sometimes combined with principal component analysis or time-varying scale factors. The two signals are then coded independently, by either a transform coder or a waveform coder. The amount of information reduction achieved by this algorithm depends strongly on the spatial properties of the source signal. For example, if the source signal is monaural, the difference signal is zero and can be discarded. However, if the correlation of the left and right audio signals is low (which is often the case), this scheme offers little advantage.
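The sum/difference decomposition can be illustrated with a minimal sketch. The function names `ms_encode` and `ms_decode` and the 1/2 scaling are assumptions chosen for illustration; the text does not prescribe a particular normalization.

```python
def ms_encode(left, right):
    """Split two equal-length channels into mid (sum) and side (difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A monaural source (identical channels) yields an all-zero side signal,
# which is the case in which the side signal can be discarded.
mono_like = [0.5, -0.25, 0.125]
mid, side = ms_encode(mono_like, mono_like)

# Fully different channels leave as much energy in side as in mid.
mid2, side2 = ms_encode([1.0, 0.0], [0.0, 1.0])
```

With uncorrelated channels the side signal is as large as the mid signal, which is why the scheme saves little in that case.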

Parametric descriptions of audio signals have attracted interest in recent years, especially in the field of audio coding. It has been shown that transmitting (quantized) parameters that describe audio signals requires only little transmission capacity to resynthesize a perceptually equivalent signal at the receiving end. However, current parametric audio coders focus on coding monaural signals, and stereo signals are often processed as dual mono.

European patent application EP 1 107 232 discloses a method of encoding a stereo signal having an L and an R component, in which the stereo signal is represented by one of the stereo components together with parametric information capturing phase and level differences of the audio signal. At the decoder, the other stereo component is recovered from the encoded stereo component and the parametric information.

The present invention relates to the coding of audio signals and, more particularly, to the coding of multi-channel audio signals.

It is an object of the present invention to solve the problem of providing improved audio coding that yields a high perceptual quality of the recovered signal.

The above and other problems are solved by a method of coding an audio signal, the method comprising:

- generating a monaural signal comprising a combination of at least two input audio channels,

- determining a set of spatial parameters indicative of spatial properties of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels, and

- generating an encoded signal comprising the monaural signal and the set of spatial parameters.
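The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the downmix is a plain average, the similarity measure is the normalized correlation at lag zero, and `encode_frame` and the dictionary layout are hypothetical names, not the format defined by the application.

```python
import math


def encode_frame(left, right):
    """Return a mono downmix plus one waveform-similarity parameter."""
    # Step 1: monaural signal, here simply the average of the two channels.
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    # Step 2: similarity of the waveforms, here the normalized correlation
    # at lag zero (a simplification chosen for this sketch).
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    similarity = cross / energy if energy > 0 else 0.0
    # Step 3: the encoded signal carries both the mono signal and the parameters.
    return {"mono": mono, "spatial": {"similarity": similarity}}


# Identical channels are fully similar; sign-inverted channels are not.
same = encode_frame([1.0, -1.0, 0.5], [1.0, -1.0, 0.5])
inverted = encode_frame([1.0, -1.0, 0.5], [-1.0, 1.0, -0.5])
```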

The inventors have found that, by encoding a multi-channel audio signal as a monaural audio signal and a number of spatial attributes that include a measure of similarity of the corresponding waveforms, the multi-channel signal can be recovered with high perceptual quality. A further advantage of the invention is that it provides an efficient encoding of a multi-channel signal, i.e. a signal comprising at least a first and a second channel, for example a stereo signal, a four-channel signal, etc.

Hence, according to the invention, the spatial attributes of multi-channel audio signals are parameterized. For general audio coding applications, transmitting these parameters combined with only one monaural audio signal reduces the transmission capacity needed to transmit the stereo signal, compared to audio coders that process the channels independently, while maintaining the original spatial impression. An important point is that, although people receive the waveforms of an auditory object twice (once by the left ear and once by the right ear), only a single auditory object is perceived, at a certain position and with a certain size (or spatial diffuseness).

Therefore, it seems unnecessary to describe audio signals as two or more (independent) waveforms; it would be better to describe multi-channel audio as a set of auditory objects, each with its own spatial properties. One difficulty that arises immediately is that it is almost impossible to separate individual auditory objects automatically from a given ensemble of auditory objects, for example a music recording. This problem can be circumvented by not splitting the program material into individual auditory objects, but rather describing the spatial parameters in a way that resembles the effective (peripheral) processing of the auditory system. When the spatial attributes include a measure of (dis)similarity of the corresponding waveforms, an efficient coding is achieved while a high level of perceptual quality is maintained.

In particular, the parametric description of multi-channel audio presented here is related to the binaural processing model presented by Breebaart et al. This model aims at describing the effective signal processing of the binaural auditory system. For a description of the binaural processing model by Breebaart et al., see Breebaart, J., van de Par, S. and Kohlrausch, A. (2001a). Binaural processing model based on contralateral inhibition. I. Model setup. J. Acoust. Soc. Am. 110, 1074-1088; Breebaart, J., van de Par, S. and Kohlrausch, A. (2001b). Binaural processing model based on contralateral inhibition. II. Dependence on spectral parameters. J. Acoust. Soc. Am. 110, 1089-1104; and Breebaart, J., van de Par, S. and Kohlrausch, A. (2001c). Binaural processing model based on contralateral inhibition. III. Dependence on temporal parameters. J. Acoust. Soc. Am. 110, 1105-1117. A short interpretation is given below to aid the understanding of the invention.

In a preferred embodiment, the set of spatial parameters includes at least one localization cue. When the spatial attributes comprise one or, preferably, two localization cues as well as a measure of (dis)similarity of the corresponding waveforms, a particularly efficient coding is achieved while a particularly high level of perceptual quality is maintained.

The term localization cue comprises any suitable parameter conveying information about the localization of the auditory objects contributing to the audio signal, e.g. the direction and/or distance of an auditory object.

In a preferred embodiment of the invention, the set of spatial parameters includes at least two localization cues comprising an inter-channel level difference (ILD) and a selected one of an inter-channel time difference (ITD) and an inter-channel phase difference (IPD). The inter-channel level difference and the inter-channel time difference are considered to be the most important localization cues in the horizontal plane.

The measure of similarity of the waveforms corresponding to the first and second audio channels may be any suitable function describing how similar or dissimilar the corresponding waveforms are. Hence, the measure of similarity may be an increasing function of similarity, e.g. a parameter determined from the inter-channel cross-correlation (function).

According to a preferred embodiment, the measure of similarity corresponds to the value of the cross-correlation function at its maximum (also known as coherence). The maximum inter-channel cross-correlation is strongly related to the perceived spatial diffuseness (or compactness) of a sound source, i.e. it provides additional information that is not accounted for by the above localization cues. The resulting set of parameters therefore carries little redundant information, which makes the coding efficient.

Alternatively, other measures of similarity may be used, e.g. a function that increases with the dissimilarity of the waveforms. An example of such a function is 1-c, where c is a cross-correlation that may assume values between 0 and 1.
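One way to write these measures down is given here as a sketch. The symbols x_1 and x_2 (the two band-limited channel signals), \rho, \tau and d are notation assumed for illustration, and the normalization shown is one common choice rather than one fixed by the text.

```latex
\rho(\tau) = \frac{\sum_{n} x_1[n]\, x_2[n+\tau]}
                  {\sqrt{\sum_{n} x_1[n]^2 \,\sum_{n} x_2[n]^2}},
\qquad
c = \max_{\tau} \rho(\tau),
\qquad
d = 1 - c
```

Here c plays the role of the coherence (an increasing function of similarity), while d = 1 - c increases with dissimilarity and equals 0 when the two waveforms are identical up to a delay and a positive gain.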

According to a preferred embodiment of the invention, the step of determining a set of spatial parameters indicative of spatial properties comprises determining a set of spatial parameters as a function of time and frequency.

It is an insight of the inventors that it is sufficient to describe the spatial attributes of any multi-channel audio signal by specifying the ILD, the ITD (or IPD) and the maximum correlation as a function of time and frequency.

In a further preferred embodiment of the invention, the step of determining a set of spatial parameters indicative of spatial properties comprises:

- dividing each of the at least two input audio channels into a corresponding plurality of frequency bands, and

- for each of the plurality of frequency bands, determining the set of spatial parameters indicative of spatial properties of the at least two input audio channels within the corresponding frequency band.
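The band-splitting step can be sketched by computing band edges that are spaced uniformly on an ERB-rate scale. The conversion formula used here is the Glasberg and Moore approximation, which is an assumption of this sketch and is not given in the text; the function names, the frequency range and the number of bands are likewise hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    """Glasberg-Moore ERB-rate approximation (assumed, not from the text)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def band_edges(f_low, f_high, n_bands):
    """Edges (in Hz) of n_bands bands spaced linearly on the ERB-rate scale."""
    e_low, e_high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (e_high - e_low) / n_bands
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_bands + 1)]


# Hypothetical layout: 20 bands between 50 Hz and 16 kHz.
edges = band_edges(50.0, 16000.0, 20)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
```

Because the spacing is uniform on the ERB-rate scale, the bandwidth in Hz grows with the center frequency, so low frequencies get narrow bands and high frequencies wide ones.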

Hence, the incoming audio signal is split into several band-limited signals, which are (preferably) spaced linearly on an ERB-rate scale. Preferably, the analysis filters show a partial overlap in the frequency and/or time domain. The bandwidth of these signals depends on the center frequency, following the ERB rate. Subsequently, preferably for every frequency band, the following properties of the incoming signals are analyzed:

- the inter-channel level difference, or ILD, defined by the relative levels of the band-limited signals stemming from the left and right signals,

- the inter-channel time (or phase) difference (ITD or IPD), defined by the inter-channel delay (or phase shift) corresponding to the position of the peak in the inter-channel cross-correlation function, and

- the (dis)similarity of the waveforms that cannot be accounted for by ITDs or ILDs, which can be parameterized by the maximum inter-channel cross-correlation (i.e. the value of the normalized cross-correlation function at the position of the maximum peak, also known as coherence).
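The three properties listed above can be estimated for one band-limited frame as sketched here. The function name `spatial_parameters`, the dB scaling of the ILD, the search range `max_lag` and the sign convention (a positive lag means the right channel lags the left) are assumptions made for this illustration; both channels are assumed to have non-zero energy.

```python
import math


def spatial_parameters(left, right, max_lag):
    """Return (ild_db, itd_samples, coherence) for one band-limited frame."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    # ILD: relative level of the two band-limited signals, expressed in dB.
    ild_db = 10.0 * math.log10(e_left / e_right)
    norm = math.sqrt(e_left * e_right)
    n = len(left)
    best_lag, best_val = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        acc = sum(left[i] * right[i + lag]
                  for i in range(max(0, -lag), min(n, n - lag)))
        val = acc / norm
        if val > best_val:
            best_lag, best_val = lag, val
    # ITD: position of the peak; coherence: normalized value at that peak.
    return ild_db, best_lag, best_val


# Right channel = left channel at half amplitude, delayed by two samples.
left = [0.0, 0.0, 1.0, 0.5, -0.5, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.5, 0.25, -0.25, 0.0]
ild_db, itd, coherence = spatial_parameters(left, right, max_lag=3)
```

In this example the level and delay differences are fully captured by the ILD and ITD, so the coherence is 1; a coherence below 1 would indicate waveform differences that the two localization cues cannot explain.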

The three parameters described above vary over time; however, since the binaural auditory system is very sluggish in its processing, the update rate of these properties is rather low (typically tens of milliseconds).

It may be assumed here that the (slowly) time-varying properties mentioned above are available to the binaural auditory system and that, from these time- and frequency-dependent parameters, the perceived auditory world is reconstructed by higher levels of the auditory system.

An embodiment of the invention aims at describing a multi-channel audio signal by:

one monaural signal, consisting of a certain combination of the input signals, and

a set of spatial parameters: two localization cues (ILD, and ITD or IPD) and a parameter that describes the similarity or dissimilarity of the waveforms that cannot be accounted for by ILDs and/or ITDs (e.g. the maximum of the cross-correlation function), preferably for every time/frequency slot. Preferably, spatial parameters are included for each additional auditory channel.

An important issue in the transmission of parameters is the accuracy of the parameter representation (i.e. the size of the quantization errors), which is directly related to the necessary transmission capacity.

According to another preferred embodiment of the invention, the step of generating an encoded signal comprising the monaural signal and the set of spatial parameters comprises generating a set of quantized spatial parameters, each introducing a corresponding quantization error relative to the corresponding determined spatial parameter, wherein at least one of the introduced quantization errors is controlled to depend on a value of at least one of the determined spatial parameters.

Hence, the quantization error introduced by the quantization of the parameters is controlled according to the sensitivity of the human auditory system to changes in these parameters. This sensitivity depends strongly on the values of the parameters themselves. Hence, by controlling the quantization error to depend on the values of the parameters, an improved encoding is achieved.
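A value-dependent quantization error can be obtained with a non-uniform quantizer, sketched here for the ILD. The grid `ILD_LEVELS_DB` is a hypothetical example and not a table from the application; it merely assumes, for illustration, that small ILDs need finer steps than large ones.

```python
# Hypothetical non-uniform grid (in dB): fine steps near 0 dB, coarse steps
# at large level differences. These values are illustrative only.
ILD_LEVELS_DB = [0.0, 1.0, 2.5, 4.5, 7.0, 10.0, 14.0, 19.0]


def quantize_ild(ild_db):
    """Map an ILD to the nearest level of a symmetric, non-uniform grid."""
    magnitude = min(ILD_LEVELS_DB, key=lambda level: abs(level - abs(ild_db)))
    return magnitude if ild_db >= 0 else -magnitude


# The error introduced depends on the parameter value itself.
small_error = abs(quantize_ild(0.4) - 0.4)    # near 0 dB: fine grid
large_error = abs(quantize_ild(16.0) - 16.0)  # large ILD: coarse grid
```

The same idea carries over to the other parameters: the step size is chosen per value range, so the bits are spent where a change would be most audible.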

An advantage of the invention is that it decouples the monaural and binaural signal parameters in audio coders. Hence, difficulties related to stereo audio coders (for example, the audibility of interaurally uncorrelated quantization noise compared to interaurally correlated quantization noise, or interaural phase inconsistencies in parametric coders that encode in dual mono mode) are strongly reduced.

A further advantage of the invention is that a strong bit rate reduction is achieved in audio coders, owing to the low update rate and low frequency resolution required for the spatial parameters. The bit rate associated with coding the spatial parameters is typically 10 kbit/s or less (see the embodiment described below).
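The link between update rate, frequency resolution and bit rate can be made concrete with a back-of-the-envelope estimate. All figures used here (20 bands, three parameters of 5 bits each, one update every 40 ms, no entropy coding) are hypothetical values chosen for illustration, not numbers taken from the application.

```python
def spatial_bitrate(n_bands, bits_per_band, update_interval_s):
    """Bits per second needed to send one parameter set per update interval."""
    return n_bands * bits_per_band / update_interval_s


# Hypothetical figures: 20 bands, 3 parameters x 5 bits per band, every 40 ms.
rate = spatial_bitrate(n_bands=20, bits_per_band=3 * 5, update_interval_s=0.040)

# Halving the update rate or the number of bands halves the bit rate.
slower = spatial_bitrate(n_bands=20, bits_per_band=3 * 5, update_interval_s=0.080)
```

Under these assumptions the side information costs about 7.5 kbit/s, which is of the same order as the 10 kbit/s bound stated above.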

A further advantage of the invention is that it can easily be combined with existing audio coders. The proposed scheme produces one mono signal that can be coded and decoded with any existing coding strategy. After monaural decoding, the system described here regenerates a stereo or multi-channel signal with the appropriate spatial attributes.

The set of spatial parameters can be used as an enhancement layer in audio coders. For example, only a mono signal is transmitted if only a low bit rate is allowed, while by including the spatial enhancement layer the decoder can reproduce stereo sound.

Note that the invention is not limited to stereo signals but may be applied to any multi-channel signal comprising n channels (n>1). In particular, the invention can be used to generate n channels from one mono signal if (n-1) sets of spatial parameters are transmitted. In that case, the spatial parameters describe how to form the n different audio channels from the single mono signal.

The invention can be implemented in different ways, including the method described above and in the following, a method of decoding a coded audio signal, an encoder, a decoder, and further product means, each yielding one or more of the benefits and advantages described in connection with the first-mentioned method, and each having one or more preferred embodiments corresponding to the preferred embodiments described in connection with the first-mentioned method and disclosed in the dependent claims.

It is noted that the features of the method described above and in the following may be implemented in software and carried out in a data processing system or other processing means caused by the execution of computer-executable instructions. The instructions may be program code means loaded into a memory, such as a RAM, from a storage medium or from another computer via a computer network. Alternatively, the described features may be implemented by hardwired circuitry instead of software or in combination with software.

The invention further relates to an encoder for coding an audio signal, the encoder comprising:

- means for generating a monaural signal comprising a combination of at least two input audio channels,

- means for determining a set of spatial parameters indicative of spatial properties of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels, and

- means for generating an encoded signal comprising the monaural signal and the set of spatial parameters.

It is noted that the above means for generating a monaural signal, the means for determining a set of spatial parameters, and the means for generating an encoded signal may be implemented by any suitable circuit or device, e.g. as general- or special-purpose programmable microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic arrays (PLA), field-programmable gate arrays (FPGA), special-purpose electronic circuits, etc., or a combination thereof.

The invention further relates to an apparatus for supplying an audio signal, the apparatus comprising:

- an input for receiving an audio signal,

- an encoder as described above and in the following for encoding the audio signal to obtain an encoded audio signal, and

- an output for supplying the encoded audio signal.

The apparatus may be any electronic equipment or part of such equipment, such as stationary or portable computers, stationary or portable radio communication equipment, or other handheld or portable devices, such as media players, recording devices, etc. The term portable radio communication equipment includes mobile telephones, pagers, communicators (i.e. electronic organizers), smart phones, personal digital assistants (PDAs), handheld computers, and the like.

The input may comprise any suitable circuitry or device for receiving a multi-channel audio signal in analog or digital form, e.g. via a wired connection such as a line jack, via a wireless connection such as a radio signal, or in any other suitable way.

Similarly, the output may comprise any suitable circuitry or device for supplying the encoded signal. Examples of such outputs include a network interface for providing the signal to a communication network such as a LAN or the Internet, and communications circuitry for communicating the signal via a communications channel, e.g. a wireless communications channel. In other embodiments, the output may comprise a device for storing the signal on a storage medium.

The invention further relates to an encoded audio signal, the signal comprising:

- a monaural signal comprising a combination of at least two audio channels, and

- a set of spatial parameters indicative of spatial properties of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels.

The invention further relates to a storage medium having such an encoded signal stored thereon. Here, the term storage medium includes, but is not limited to, magnetic tape, optical disc, digital video disc (DVD), compact disc (CD or CD-ROM), mini-disc, hard disk, floppy disk, ferroelectric memory, electrically erasable programmable read-only memory (EEPROM), flash memory, EPROM, read-only memory (ROM), static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), ferromagnetic memory, optical storage, charge-coupled devices, smart cards, PCMCIA cards, etc.

The invention further relates to a method of decoding an encoded audio signal, the method comprising:

- obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a combination of at least two audio channels,

- obtaining a set of spatial parameters from the encoded audio signal, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two audio channels, and

- generating a multi-channel output signal from the monaural signal and the spatial parameters.
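The final decoding step can be illustrated with a deliberately reduced sketch that applies only a level difference and a time difference to the mono signal. It works on the full band rather than per frequency band and ignores the similarity parameter, which would require a decorrelator; the names `delay` and `decode_frame`, the power-preserving gain split and the sign convention (a positive ITD delays the right channel) are assumptions of this sketch.

```python
import math


def delay(signal, d):
    """Delay a frame by d samples (0 <= d <= len(signal)), zero-padding the start."""
    return [0.0] * d + signal[:len(signal) - d]


def decode_frame(mono, ild_db, itd_samples):
    """Rebuild (left, right) from a mono frame using ILD and ITD only."""
    g = 10.0 ** (ild_db / 20.0)               # left/right amplitude ratio
    gain_r = math.sqrt(2.0 / (1.0 + g * g))   # so that gain_l**2 + gain_r**2 == 2
    gain_l = g * gain_r
    # A positive ITD delays the right channel, a negative one the left channel.
    left = [gain_l * x for x in delay(mono, max(0, -itd_samples))]
    right = [gain_r * x for x in delay(mono, max(0, itd_samples))]
    return left, right


mono = [1.0, 0.5, -0.5, 0.0, 0.0]
left, right = decode_frame(mono, ild_db=6.0, itd_samples=2)
```

The reconstructed pair shows the requested 6 dB level difference and two-sample delay, yet both channels remain fully coherent; reproducing a coherence below 1 is the part this sketch leaves out.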

The invention further relates to a decoder for decoding an encoded audio signal, the decoder comprising:

- means for obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a combination of at least two audio channels,

- means for obtaining a set of spatial parameters from the encoded audio signal, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two audio channels, and

- means for generating a multi-channel output signal from the monaural signal and the spatial parameters.

It is noted that the above means may be implemented by any suitable circuit or device, e.g. as general- or special-purpose programmable microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic arrays (PLA), field-programmable gate arrays (FPGA), special-purpose electronic circuits, etc., or a combination thereof.

본 발명은 디코딩된 오디오 신호를 공급하는 장치와 더 관련되며, 상기 장치는,The invention further relates to an apparatus for supplying a decoded audio signal, the apparatus comprising:

- an input for receiving an encoded audio signal,

- a decoder as claimed in claim 14 for decoding the encoded audio signal to obtain a multi-channel output signal, and

- an output for supplying or reproducing the multi-channel output signal.

The apparatus may be any electronic equipment, or part of such equipment, as described above.

The input may comprise any suitable circuit or device for receiving a coded audio signal. Examples of such inputs include a network interface for receiving a signal via a computer network such as a LAN or the Internet, and communications circuitry for receiving a signal via a communications channel, e.g. a wireless communications channel. In other embodiments, the input may comprise a device for reading a signal from a storage medium.

Similarly, the output may comprise any suitable circuit or device for supplying a multi-channel signal in digital or analog form.

These and other aspects of the invention will become apparent from the embodiments described below with reference to the drawings.

Fig. 1 shows a flow diagram of a method of encoding an audio signal according to an embodiment of the invention.

Fig. 2 shows a schematic block diagram of a coding system according to an embodiment of the invention.

Fig. 3 shows a filter method for use in synthesizing an audio signal.

Fig. 4 shows a decorrelator for use in synthesizing an audio signal.

In an initial step S1, the incoming signals L and R are split into band-pass signals, indicated by reference numeral 101 (preferably with a bandwidth that increases with frequency), so that their parameters can be analyzed as a function of time. One possible method for time/frequency slicing is time-windowing followed by a transform operation, but time-continuous methods (e.g. filter banks) could also be used. The time and frequency resolution of this process is preferably adapted to the signal: for transient signals, a fine time resolution (on the order of a few milliseconds) and a coarse frequency resolution are preferred, while for non-transient signals, a finer frequency resolution and a coarser time resolution (on the order of tens of milliseconds) are preferred. Subsequently, in step S2, the level difference (ILD) of corresponding subband signals is determined; in step S3, the time difference (ITD or IPD) of corresponding subband signals is determined; and in step S4, the amount of similarity or dissimilarity of the waveforms that cannot be accounted for by ILDs or ITDs is described. The analysis of these parameters is discussed below.

Step S2: Analysis of ILDs

The ILD is determined by the level difference of the signals at a certain time instance for a given frequency band. One method of determining the ILD is to measure the root-mean-square (rms) value of the corresponding frequency band of both input channels and to compute the ratio of these rms values (preferably expressed in dB).
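The rms-ratio computation can be illustrated with a minimal sketch. The function names, the left-over-right sign convention and the small `eps` guard are assumptions made for illustration; the text only specifies a ratio of rms values expressed in dB.

```python
import math

def rms(samples):
    """Root-mean-square value of a sequence of real samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def ild_db(left_band, right_band, eps=1e-12):
    """Level difference in dB between two band-limited signals.

    eps is a hypothetical guard against log(0) for silent bands.
    """
    return 20.0 * math.log10((rms(left_band) + eps) / (rms(right_band) + eps))

# Example: the left band has twice the amplitude of the right band.
left = [2.0 * math.sin(0.1 * n) for n in range(1000)]
right = [math.sin(0.1 * n) for n in range(1000)]
print(round(ild_db(left, right), 2))  # 6.02
```

A factor of two in amplitude corresponds to 20·log10(2), roughly 6 dB, which is what the example prints.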

Step S3: Analysis of ITDs

The ITD is determined by the time or phase alignment that gives the best match between the waveforms of both channels. One method of obtaining the ITD is to compute the cross-correlation function between two corresponding subband signals and to search for the maximum. The delay that corresponds to this maximum in the cross-correlation function can be used as the ITD value. A second method is to compute the analytic signals of the left and right subbands (i.e. to compute phase and envelope values) and to use the (average) phase difference between the channels as the IPD parameter.
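The first method (searching the cross-correlation for its maximum) can be sketched directly in the time domain. This is only an illustration: the function names, the search range `max_lag` and the convention that a positive lag means the right channel lags the left are assumptions.

```python
import math

def cross_correlation(left, right, lag):
    """Sum of left[n] * right[n + lag] over the overlapping samples."""
    total = 0.0
    for n in range(len(left)):
        m = n + lag
        if 0 <= m < len(right):
            total += left[n] * right[m]
    return total

def estimate_itd(left, right, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(left, right, lag))

# Example: the right channel is the left channel delayed by 5 samples.
left = [math.sin(0.05 * n) * math.exp(-((n - 200) / 60.0) ** 2)
        for n in range(400)]
right = [0.0] * 5 + left[:-5]
print(estimate_itd(left, right, 20))  # 5
```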

Step S4: Analysis of the correlation

The correlation is obtained by first finding the ILD and ITD that give the best match between the corresponding subband signals, and subsequently measuring the similarity of the waveforms after compensation for the ITD and/or ILD. Thus, in this framework, the correlation is defined as the similarity or dissimilarity of corresponding subband signals that cannot be attributed to ILDs and/or ITDs. A suitable measure for this parameter is the maximum value of the cross-correlation function (i.e. the maximum across a set of delays). However, other measures could be used, such as the relative energy of the difference signal after ILD and/or ITD compensation compared to the sum signal of the corresponding subbands (preferably also compensated for ILDs and/or ITDs). This difference parameter is basically a linear transformation of the (maximum) correlation.
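As a rough sketch of the first measure, the maximum of an energy-normalised cross-correlation over a set of delays can be computed as follows. Normalising by the signal energies is an assumption used here to compensate for the level difference; the function names and the lag range are likewise illustrative.

```python
import math

def normalized_xcorr(left, right, lag):
    """Cross-correlation at one lag, normalised by the signal energies."""
    num = sum(left[n] * right[n + lag]
              for n in range(len(left)) if 0 <= n + lag < len(right))
    den = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    return num / den if den > 0.0 else 0.0

def max_correlation(left, right, max_lag):
    """Maximum of the normalised cross-correlation over a set of lags."""
    return max(normalized_xcorr(left, right, lag)
               for lag in range(-max_lag, max_lag + 1))

# A scaled, delayed copy is fully explained by ILD and ITD, so r is close to 1.
sig = [math.sin(0.05 * n) * math.exp(-((n - 200) / 60.0) ** 2)
       for n in range(400)]
delayed = [0.0] * 5 + [0.25 * s for s in sig[:-5]]
print(round(max_correlation(sig, delayed, 20), 3))  # 1.0
```

In this example the two channels differ only in level and delay, so nothing is left for the correlation parameter to describe and the measure stays at its maximum.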

In subsequent steps S5, S6 and S7, the determined parameters are quantized. An important issue in transmitting the parameters is the accuracy of the parameter representation (i.e. the size of the quantization errors), which is directly related to the required transmission capacity. This section discusses several issues concerning the quantization of the spatial parameters. The basic idea is to base the quantization errors on the so-called just-noticeable differences (JNDs) of the spatial cues. More specifically, the quantization error is determined by the sensitivity of the human auditory system to changes in the parameters. Since the sensitivity to changes in the parameters strongly depends on the values of the parameters themselves, the following methods are applied to determine the discrete quantization steps.

Step S5: Quantization of ILDs

It is known from psychoacoustic research that the sensitivity to changes in the ILD depends on the ILD itself. If the ILD is expressed in dB, deviations of approximately 1 dB from a reference of 0 dB are detectable, while changes on the order of 3 dB are required if the reference level difference amounts to 20 dB. Therefore, quantization errors can be larger if the signals of the left and right channels have a larger level difference. This can be applied, for example, by first measuring the level difference between the channels, followed by a non-linear (compressive) transformation of the obtained level difference and subsequently a linear quantization process, or by using a lookup table of available ILD values that have a non-linear distribution. The embodiment below gives an example of such a lookup table.
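The lookup-table variant can be sketched as a nearest-value search over a non-uniform grid. The grid below is a hypothetical example chosen only to show finer steps near 0 dB and coarser steps for large level differences; it is not presented as the table of the embodiment.

```python
# Hypothetical non-uniform ILD grid in dB: fine steps near 0 dB, coarse steps
# for large level differences. These values are illustrative only.
ILD_TABLE_DB = [-19, -16, -13, -10, -8, -6, -4, -2, 0,
                2, 4, 6, 8, 10, 13, 16, 19]

def quantize_ild(ild_db):
    """Return (index, value) of the table entry closest to ild_db."""
    index = min(range(len(ILD_TABLE_DB)),
                key=lambda i: abs(ILD_TABLE_DB[i] - ild_db))
    return index, ILD_TABLE_DB[index]

print(quantize_ild(0.7))    # (8, 0)
print(quantize_ild(14.0))   # (14, 13)
print(quantize_ild(-30.0))  # (0, -19)
```

Only the index needs to be transmitted; values beyond the ends of the grid saturate at the outermost entries.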

Step S6: Quantization of ITDs

The sensitivity to changes in the ITD can be characterized as having a constant phase threshold. In terms of delay times, this means that the quantization steps for the ITD should decrease with frequency. Alternatively, if the ITD is represented in the form of phase differences, the quantization steps should be independent of frequency. One method of implementing this is to take a fixed phase difference as the quantization step and to determine the corresponding time delay for each frequency band. This ITD value is then used as the quantization step. Another method is to transmit phase differences that follow a frequency-independent quantization scheme. It is also known that, above a certain frequency, the human auditory system is not sensitive to ITDs in the fine-structure waveforms. This phenomenon can be exploited by transmitting ITD parameters only up to a certain frequency (typically 2 kHz).

A third method of bitstream reduction is to use ITD quantization steps that depend on the ILD and/or the correlation parameters of the same subband. For large ILDs, the ITDs can be coded less accurately. Furthermore, if the correlation is very low, the human sensitivity to changes in the ITD is known to be reduced. Hence, larger ITD quantization errors may be applied if the correlation is small. An extreme example of this idea is not to transmit ITDs at all if the correlation is below a certain threshold and/or if the ILD is sufficiently large for the same subband (typically about 20 dB).

Step S7: Quantization of the correlation

The quantization error of the correlation depends on (1) the correlation value itself and possibly (2) the ILD. Correlation values near +1 are coded with a high accuracy (i.e. a small quantization step), while correlation values near 0 can be coded with a low accuracy (a large quantization step). An example of a set of non-linearly distributed correlation values is given in the embodiment. A second possibility is to use quantization steps for the correlation that depend on the measured ILD of the same subband: for large ILDs (i.e. one channel is dominant in terms of energy), the quantization errors in the correlation may be larger. An extreme example of this principle would be not to transmit correlation values for a certain subband at all if the absolute value of the ILD for that subband exceeds a certain threshold.

In step S8, a monaural signal S is generated from the incoming audio signals, e.g. as a sum signal of the incoming signal components, by determining a dominant signal, or by generating a principal component signal from the incoming signal components. This process preferably uses the extracted spatial parameters to generate the mono signal, i.e. by first aligning the subband waveforms using the ITD or IPD before combining them.

Finally, in step S9, a coded signal 102 is generated from the monaural signal and the determined parameters. Alternatively, the sum signal and the spatial parameters may be communicated as separate signals via the same or different channels.

It is noted that the above method may be implemented by a corresponding arrangement, e.g. implemented as general- or special-purpose programmable microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), programmable logic arrays (PLA), field-programmable gate arrays (FPGA), special-purpose electronic circuits, etc., or a combination thereof.

Fig. 2 shows a schematic block diagram of a coding system according to an embodiment of the invention. The system comprises an encoder 201 and a corresponding decoder 202. The encoder 201 receives a stereo signal with two components L and R and generates a coded signal 203 comprising a sum signal S and spatial parameters P, which is communicated to the decoder 202. The signal 203 may be communicated via any suitable communications channel 204. Alternatively or additionally, the signal may be stored on a removable storage medium 214, e.g. a memory card, which may be transferred from the encoder to the decoder.

The encoder 201 comprises analysis modules 205 and 206 for analyzing the spatial parameters of the incoming signals L and R, respectively, preferably for each time/frequency slot. The encoder further comprises a parameter extraction module 207 that generates the quantized spatial parameters, and a combiner module 208 that generates a sum (or dominant) signal consisting of a certain combination of the at least two input signals. The encoder further comprises an encoding module 209 that generates the resulting coded signal 203 comprising the monaural signal and the spatial parameters. In one embodiment, the module 209 further performs one or more of the following functions: bit rate allocation, framing, lossless coding, etc.

Synthesis (in the decoder 202) is performed by applying the spatial parameters to the sum signal to generate left and right output signals. Hence, the decoder 202 comprises a decoding module 210 that performs the inverse operation of module 209 and extracts the sum signal S and the parameters P from the coded signal 203. The decoder further comprises a synthesis module 211 that recovers the stereo components L and R from the sum (or dominant) signal and the spatial parameters.

In this embodiment, the spatial parameter description is combined with a monaural (single-channel) audio coder to encode a stereo audio signal. It should be noted that although the described embodiment works on stereo signals, the general idea can be applied to n-channel audio signals with n>1.

In the analysis modules 205 and 206, the left and right incoming signals L and R, respectively, are segmented into time frames (e.g. each comprising 2048 samples at a 44.1 kHz sampling rate) and windowed with a square-root Hanning window. Subsequently, FFTs are computed. The negative FFT frequencies are discarded and the resulting FFTs are subdivided into groups (subbands) of FFT bins. The number of FFT bins that are combined in a subband g depends on the frequency: at higher frequencies more bins are combined than at lower frequencies. In one embodiment, FFT bins corresponding to approximately 1.8 ERBs (equivalent rectangular bandwidth) are grouped, resulting in 20 subbands that represent the entire audible frequency range. The resulting number of FFT bins S[g] of each subsequent subband (starting at the lowest frequency) is as follows.

Thus, the first three subbands contain 4 FFT bins each, and the fourth subband contains 5 FFT bins. For each subband, the corresponding ILD, ITD and correlation (r) are computed. The ITD and the correlation are computed simply by setting all FFT bins that belong to other groups to zero, multiplying the resulting (band-limited) FFTs from the left and right channels, and then applying an inverse FFT. The resulting cross-correlation function is scanned for a peak within an interchannel delay between -64 and +63 samples. The interchannel delay corresponding to the peak is used as the ITD value, and the value of the cross-correlation function at this peak is used as the interchannel correlation of this subband. Finally, the ILD is computed simply by taking the power ratio of the left and right channels for each subband.

In the combiner module 208, the left and right subbands are summed after a phase correction (temporal alignment). This phase correction follows from the ITD computed for that subband and consists of delaying the left-channel subband by ITD/2 and the right-channel subband by -ITD/2. The delay is performed in the frequency domain by an appropriate modification of the phase angle of each FFT bin. Subsequently, the sum signal is computed by adding the phase-modified versions of the left and right subband signals. Finally, to compensate for uncorrelated or correlated addition, each subband of the sum signal is multiplied by sqrt(2/(1+r)), where r is the correlation of the corresponding subband. If necessary, the sum signal can be converted to the time domain by (1) inserting complex conjugates at the negative frequencies, (2) an inverse FFT, (3) windowing, and (4) overlap-add.
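A minimal sketch of this downmix for a single subband is given below. The function name, the representation of bins as Python complex numbers, the bin frequencies in rad/sample and the sign convention (a positive ITD means the right channel lags the left) are assumptions for illustration.

```python
import cmath
import math

def downmix_subband(left_bins, right_bins, omegas, itd, r):
    """Phase-align one subband, add the channels and rescale the sum.

    left_bins, right_bins: complex FFT bins of the subband
    omegas: bin frequencies in rad/sample; itd: delay in samples
    r: interchannel correlation of the subband (assumed > -1)
    """
    gain = math.sqrt(2.0 / (1.0 + r))
    out = []
    for l_bin, r_bin, w in zip(left_bins, right_bins, omegas):
        l_aligned = l_bin * cmath.exp(-1j * w * itd / 2.0)  # delay by +ITD/2
        r_aligned = r_bin * cmath.exp(+1j * w * itd / 2.0)  # delay by -ITD/2
        out.append(gain * (l_aligned + r_aligned))
    return out

# Example: the right channel is the left channel delayed by 4 samples.
omegas = [0.1, 0.2, 0.3]
left = [1 + 0j, 0.5j, -0.25 + 0j]
right = [b * cmath.exp(-1j * w * 4) for b, w in zip(left, omegas)]
mono = downmix_subband(left, right, omegas, itd=4, r=1.0)
print([round(abs(m), 3) for m in mono])  # [2.0, 1.0, 0.5]
```

With fully correlated channels (r = 1) the gain is 1 and the aligned bins add coherently; for uncorrelated channels (r = 0) the gain becomes sqrt(2).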

In the parameter extraction module 207, the spatial parameters are quantized. The ILDs (in dB) are quantized to the closest value out of the following set I:

The ITD quantization steps are determined by a constant phase difference of 0.1 rad in each subband. Thus, for each subband, the time difference that corresponds to 0.1 rad at the subband center frequency is used as the quantization step. For frequencies above 2 kHz, no ITD information is transmitted.
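The relation between the constant phase step and the per-band time step can be worked out as step = 0.1 / (2π·f_c). The sketch below assumes center frequencies in Hz and ITDs in seconds; the function names and the uniform rounding are illustrative.

```python
import math

PHASE_STEP_RAD = 0.1    # constant phase step from the text
ITD_CUTOFF_HZ = 2000.0  # no ITD is transmitted above this frequency

def itd_step_seconds(center_hz):
    """Time step equal to a 0.1 rad phase shift at the band centre."""
    return PHASE_STEP_RAD / (2.0 * math.pi * center_hz)

def quantize_itd(itd_seconds, center_hz):
    """Return the quantised ITD, or None if the band is above the cutoff."""
    if center_hz > ITD_CUTOFF_HZ:
        return None
    step = itd_step_seconds(center_hz)
    return round(itd_seconds / step) * step

for fc in (250.0, 500.0, 1000.0):
    print(fc, round(itd_step_seconds(fc) * 1e6, 1), "microseconds")
```

The step halves each time the center frequency doubles (about 63.7, 31.8 and 15.9 microseconds for the three bands above), which is the "decreasing with frequency" behaviour described for step S6.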

The interchannel correlation value r is quantized to the closest value of the following ensemble R:

This costs another 3 bits per correlation value.

If the absolute value of the (quantized) ILD of the current subband amounts to 19 dB, no ITD and correlation values are transmitted for this subband. If the (quantized) correlation value of a certain subband amounts to zero, no ITD value is transmitted for that subband.
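These transmission rules can be summarised in a small decision function. The function name, the argument names and the returned list of labels are hypothetical; the 19 dB limit, the zero-correlation rule and the 2 kHz ITD limit are taken from the text.

```python
def parameters_to_send(ild_q_db, corr_q, center_hz,
                       ild_limit_db=19.0, itd_cutoff_hz=2000.0):
    """Return the names of the parameters transmitted for one subband."""
    names = ["ILD"]
    if abs(ild_q_db) >= ild_limit_db:
        return names                      # one channel dominates: ILD only
    names.append("correlation")
    if corr_q != 0.0 and center_hz <= itd_cutoff_hz:
        names.append("ITD")
    return names

print(parameters_to_send(19, 0.9, 500.0))   # ['ILD']
print(parameters_to_send(4, 0.0, 500.0))    # ['ILD', 'correlation']
print(parameters_to_send(4, 0.9, 500.0))    # ['ILD', 'correlation', 'ITD']
print(parameters_to_send(4, 0.9, 5000.0))   # ['ILD', 'correlation']
```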

In this way, each frame requires a maximum of 233 bits to transmit the spatial parameters. With a frame length of 1024 samples, the maximum bit rate for transmission amounts to 10.25 kbit/s. It should be noted that this bit rate can be reduced further by using entropy coding or differential coding.

The decoder comprises a synthesis module 211 in which the stereo signal is synthesized from the received sum signal and the spatial parameters. For the purpose of this description, it is assumed that the synthesis module receives a frequency-domain representation of the sum signal as described above. This representation may be obtained by windowing and FFT operations on the time-domain waveform. First, the sum signal is copied to the left and right output signals. Subsequently, the correlation between the left and right signals is modified with a decorrelator. In a preferred embodiment, a decorrelator as described below is used. Subsequently, each subband of the left signal is delayed by -ITD/2 and the right signal is delayed by ITD/2, given the (quantized) ITD corresponding to that subband. Finally, the left and right subbands are scaled according to the ILD for that subband. In one embodiment, the above modification is performed by a filter as described below. To convert the output signals to the time domain, the following steps are performed: (1) inserting complex conjugates at the negative frequencies, (2) an inverse FFT, (3) windowing, and (4) overlap-add.

Fig. 3 illustrates a filter method for use in synthesizing an audio signal. In an initial step 301, the incoming audio signal x(t) is segmented into a number of frames. The segmentation step 301 splits the signal into frames x_n(t) of a suitable length, for example in the range of 500 to 5000 samples, e.g. 1024 or 2048 samples.

Preferably, the segmentation is performed using overlapping analysis and synthesis window functions, thereby suppressing artifacts that may be introduced at the frame boundaries (see e.g. Princen, J. P. and Bradley, A. B.: "Analysis/synthesis filterbank design based on time domain aliasing cancellation", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP 34, 1989).
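Why overlapping analysis and synthesis windows can leave the signal unchanged is easy to check numerically. The sketch below assumes a square-root Hann window applied once at analysis and once at synthesis, with 50% overlap; the text requires only overlapping window functions, so this particular pairing is an illustrative choice.

```python
import math

def sqrt_hann(length):
    """Square root of a periodic Hann window of the given length."""
    return [math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / length)))
            for n in range(length)]

def overlap_add_gain(length):
    """Sum of analysis*synthesis windows of two frames offset by length/2."""
    w = sqrt_hann(length)
    hop = length // 2
    return [w[n] * w[n] + w[n + hop] * w[n + hop] for n in range(hop)]

gains = overlap_add_gain(2048)
print(min(gains), max(gains))  # both very close to 1.0
```

Because the combined window sums to one across adjacent frames, an unmodified frame sequence is reconstructed without amplitude ripple at the frame boundaries.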

In step 302, each of the frames x_n(t) is transformed into the frequency domain by applying a Fourier transform, preferably implemented as a fast Fourier transform (FFT). The resulting frequency representation of the n-th frame x_n(t) comprises a number of frequency components X(k,n), where the parameter n indicates the frame number and the parameter k, 0<k<K, indicates the frequency component or frequency bin corresponding to a frequency ω_k. In general, the frequency-domain components X(k,n) are complex numbers.

In step 303, the desired filter for the current frame is determined according to the received time-varying spatial parameters. The desired filter is expressed as a desired filter response comprising a set of K complex weight factors F(k,n), 0<k<K, for the n-th frame. The filter response F(k,n) may be represented by two real numbers, i.e. its amplitude a(k,n) and its phase φ(k,n), according to F(k,n) = a(k,n)·exp[jφ(k,n)].

In the frequency domain, the filtered frequency components are Y(k,n) = F(k,n)·X(k,n), i.e. they result from a multiplication of the frequency components X(k,n) of the input signal with the filter response F(k,n). As will be apparent to a skilled person, this multiplication in the frequency domain corresponds to a convolution of the input signal frame x_n(t) with a corresponding filter f_n(t).

In step 304, the desired filter response F(k,n) is modified before it is applied to the current frame X(k,n). In particular, the actual filter response F'(k,n) to be applied is determined as a function of the desired filter response F(k,n) and of information 308 about previous frames. Preferably, this information comprises the actual and/or desired filter responses of one or more previous frames, according to the following.

Hence, by making the actual filter response dependent on the history of previous filter responses, artifacts introduced by changes in the filter response between consecutive frames can be suppressed efficiently. Preferably, the actual form of the transformation function Φ is selected to reduce overlap-add artifacts resulting from dynamically varying filter responses.

For example, the transformation function Φ may be a function of a single previous response function, e.g. F'(k,n) = Φ₁[F(k,n), F(k,n-1)] or F'(k,n) = Φ₂[F(k,n), F'(k,n-1)]. In another embodiment, the transformation function may comprise a moving average over a number of previous response functions, e.g. a filtered version of previous response functions, or the like. Preferred embodiments of the transformation function Φ are described in more detail below.

In step 305, the actual filter response F'(k,n) is applied to the current frame by multiplying the frequency components X(k,n) of the current frame of the input signal with the corresponding filter response factors F'(k,n), according to Y(k,n) = F'(k,n)·X(k,n).

In step 306, the resulting processed frequency components Y(k,n) are transformed back into the time domain, resulting in filtered frames y_n(t). Preferably, the inverse transform is implemented as an inverse fast Fourier transform (IFFT).

Finally, in step 307, the filtered frames are recombined into a filtered signal y(t) by an overlap-add method. An efficient implementation of such an overlap-add method is disclosed in Bergmans, J. W. M.: "Digital baseband transmission and recording", Kluwer, 1996.

In one embodiment, the transformation function Φ of step 304 is implemented as a phase-change limiter between the current and the previous frame. According to this embodiment, the phase change δ(k) of each frequency component F(k,n), compared to the actual phase modification φ'(k,n-1) applied to the previous sample of the corresponding frequency component, is computed as δ(k) = φ(k,n) - φ'(k,n-1), where φ(k,n) denotes the phase of F(k,n) and φ'(k,n-1) the phase of F'(k,n-1).

순차로, 목적하는 필터 F(k,n)의 위상 성분은 프레임들을 가로지르는 위상변화가 감소되는 방식으로, 그 변화가 오버랩-부가된 아티팩트들을 초래할 수 있는 경우에 변형된다. 이러한 실시예에 따라, 이는 실제 위상 차이가 소정의 임계값을 초과하지 않도록 보장함으로써, 예를 들면 다음에 따르는 위상 차이의 단순한 커팅에 의해 성취된다.In turn, the phase component of the desired filter F (k, n) is modified in such a way that the phase change across the frames is reduced, in which case the change can lead to overlap-added artifacts. According to this embodiment, this is achieved by ensuring that the actual phase difference does not exceed a predetermined threshold, for example by simple cutting of the following phase difference.

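$$
P(\delta(k)) =
\begin{cases}
-c & \text{if } \delta(k) < -c \\
\delta(k) & \text{if } -c \le \delta(k) \le c \\
+c & \text{if } \delta(k) > c
\end{cases}
\tag{1}
$$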

The threshold c may be a predetermined constant, e.g. between π/8 and π/3 rad. In one embodiment, the threshold c is not a constant but a function of, for example, time, frequency, and/or the like. Furthermore, as an alternative to the above hard limit on the phase change, other phase-change-limiting functions may be used.

In general, in the above embodiment, the desired phase change across subsequent time frames for the individual frequency components is transformed by an input-output function P(δ(k)), and the actual filter response F'(k,n) is given by

$$
F'(k,n) = F'(k,n-1)\cdot\exp\left[\,jP(\delta(k))\,\right]
\tag{2}
$$

Hence, according to this embodiment, a transformation function P of the phase change across subsequent time frames is introduced.
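The phase-change limiter described above can be sketched as follows, under stated assumptions: P is taken to be a hard clip to [−c, +c], the phase difference is wrapped to [−π, π) before clipping, and the default c = π/4 is simply one value inside the range named in the text. All function names are hypothetical.

```python
import cmath
import math


def wrap_phase(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def clip_phase_change(delta, c):
    """One possible input-output function P: hard-limit delta to [-c, +c]."""
    return max(-c, min(c, delta))


def limit_filter_phase(desired, previous_actual, c=math.pi / 4):
    """Return the actual response F'(k, n) for one frame.

    desired[k] is F(k, n); previous_actual[k] is F'(k, n-1).
    """
    actual = []
    for f, f_prev in zip(desired, previous_actual):
        # Phase change of the desired response relative to the response
        # that was actually applied in the previous frame.
        delta = wrap_phase(cmath.phase(f) - cmath.phase(f_prev))
        # Equation (2): rotate the previous actual response by P(delta).
        actual.append(f_prev * cmath.exp(1j * clip_phase_change(delta, c)))
    return actual
```

A desired jump of π/2 with c = π/4 is reduced to π/4, while a jump of 0.1 rad passes through unchanged. Note that equation (2), as written, carries the magnitude of the previous response forward; the sketch follows the equation literally.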

In another embodiment of the transformation of the filter response, the phase-limiting procedure is driven by a suitable measure of tonality, e.g. the prediction method described below. This has the advantage that phase jumps between consecutive frames which occur in noise-like signals may be excluded from the phase-change limiting procedure according to the invention. This is advantageous because limiting such phase jumps in noise-like signals would make the noise-like signal sound more tonal, which is often perceived as synthetic or metallic.

According to this embodiment, a predicted phase error θ(k) = φ(k,n) − φ(k,n−1) − ω_k·h is calculated, where φ(k,n) denotes the phase of the k-th frequency component in frame n. Here, ω_k denotes the frequency corresponding to the k-th frequency component, and h denotes the hop size in samples. The term hop size refers to the distance between two adjacent window centres, i.e. half the analysis length for symmetric windows. In the following, it is assumed that the above error is wrapped to the interval [−π, +π].

Subsequently, a prediction measure P_k for the amount of phase predictability in the k-th frequency bin is calculated according to P_k = (π − |θ(k)|)/π ∈ [0,1], where |·| denotes the absolute value.

Hence, the above measure P_k yields a value between 0 and 1 corresponding to the amount of phase predictability in the k-th frequency bin. If P_k is close to 1, the underlying signal may be assumed to have a high degree of tonality, i.e. a substantially sinusoidal waveform. For such a signal, phase jumps are easily perceived, e.g. by a listener of the audio signal. Hence, phase jumps should preferably be removed in this case. On the other hand, if the value of P_k is close to 0, the underlying signal may be assumed to be noisy. For noisy signals, phase jumps are not easily perceived and may therefore be allowed.

Accordingly, the phase-limiting function is applied if P_k exceeds a predetermined threshold, i.e. P_k > A, yielding the actual filter response F'(k,n) according to

$$
F'(k,n) =
\begin{cases}
\Phi(F(k,n)) & \text{if } P_k > A \\
F(k,n) & \text{otherwise}
\end{cases}
$$

Here, A is limited by the upper and lower boundaries of P_k, which are +1 and 0, respectively. The exact value of A depends on the actual implementation. For example, A may be selected between 0.6 and 0.9.
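The tonality-driven decision can be illustrated with a short sketch. It assumes that ω_k is expressed in radians per sample, that the phases are those of the k-th frequency component in two consecutive frames, and that A = 0.8 (one value inside the range mentioned above); the function names are hypothetical.

```python
import math


def wrap_phase(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def phase_predictability(phi_now, phi_prev, omega_k, hop):
    """Return P_k = (pi - |theta|) / pi for one frequency bin.

    theta is the wrapped difference between the observed phase advance
    and the advance omega_k * hop expected for a stationary sinusoid.
    """
    theta = wrap_phase(phi_now - phi_prev - omega_k * hop)
    return (math.pi - abs(theta)) / math.pi


def should_limit_phase(phi_now, phi_prev, omega_k, hop, threshold_a=0.8):
    """Apply the phase-change limiter only to bins judged tonal (P_k > A)."""
    return phase_predictability(phi_now, phi_prev, omega_k, hop) > threshold_a
```

A bin whose phase advances exactly as predicted gives P_k = 1 and is limited; a bin whose phase is off by π gives P_k = 0 and is left alone.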

Alternatively, it is understood that any other suitable measure for estimating tonality may be used. In yet another embodiment, the allowed phase jump c described above may be made dependent on a suitable measure of tonality, e.g. the measure P_k above, thereby allowing larger phase jumps if P_k is large, and vice versa.

Fig. 4 shows a decorrelator for use in synthesizing an audio signal. The decorrelator comprises an all-pass filter 401 receiving the monaural signal x and a set of spatial parameters P including the interchannel cross-correlation r and a parameter indicative of the channel difference c. Note that the parameter c is related to the interchannel level difference by ILD = k·log(c), where k is a constant, i.e. the ILD is proportional to the logarithm of c.

Preferably, the all-pass filter comprises a frequency-dependent delay providing a relatively smaller delay at high frequencies than at low frequencies. This may be achieved by replacing a fixed delay of the all-pass filter with an all-pass filter comprising one period of a Schroeder-phase complex (see e.g. M. R. Schroeder, "Synthesis of low-peak-factor signals and binary sequences with low autocorrelation", IEEE Transact. Inf. Theor., 16:85-89, 1970). The decorrelator further comprises an analysis circuit 402 that receives the spatial parameters from the decoder and extracts the interchannel cross-correlation r and the channel difference c. The circuit 402 determines a mixing matrix M(α,β), as discussed below. The components of the mixing matrix are fed into a transformation circuit 403, which further receives the input signal x and the filtered signal H⊗x. The circuit 403 performs a mixing operation according to

$$
\begin{pmatrix} L \\ R \end{pmatrix}
= M(\alpha,\beta)
\begin{pmatrix} x \\ H \otimes x \end{pmatrix}
\tag{3}
$$

resulting in the output signals L and R.

The correlation between the signals L and R may be expressed as the angle α between the vectors representing the L and R signals, respectively, in the space spanned by the signals x and H⊗x, according to r = cos(α). Consequently, any pair of vectors that exhibits the correct angular distance has the specified correlation.

Hence, a mixing matrix M which transforms the signals x and H⊗x into signals L and R with a predetermined correlation r may be expressed as follows:

$$
M =
\begin{pmatrix}
\cos(\alpha/2) & \sin(\alpha/2) \\
\cos(-\alpha/2) & \sin(-\alpha/2)
\end{pmatrix}
\tag{4}
$$

Thus, the amount of all-pass filtered signal depends on the desired correlation. Furthermore, the energy of the all-pass signal component is the same in both output channels (but with a 180° phase shift).
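The relation r = cos(α) can be checked numerically with a small sketch. It assumes the mixing matrix has rows (cos(α/2), sin(α/2)) and (cos(−α/2), sin(−α/2)), and it uses two short, mutually orthogonal sequences of equal energy as idealized stand-ins for x and its all-pass filtered version; the function names and test signals are hypothetical.

```python
import math


def mixing_matrix(alpha):
    """2x2 matrix placing L and R at angles +alpha/2 and -alpha/2."""
    half = alpha / 2
    return [[math.cos(half), math.sin(half)],
            [math.cos(-half), math.sin(-half)]]


def mix(matrix, x, x_filtered):
    """Apply the 2x2 mixing matrix sample by sample."""
    left = [matrix[0][0] * a + matrix[0][1] * b
            for a, b in zip(x, x_filtered)]
    right = [matrix[1][0] * a + matrix[1][1] * b
             for a, b in zip(x, x_filtered)]
    return left, right


def correlation(u, v):
    """Normalized cross-correlation at lag zero."""
    dot = sum(a * b for a, b in zip(u, v))
    energy_u = sum(a * a for a in u)
    energy_v = sum(b * b for b in v)
    return dot / math.sqrt(energy_u * energy_v)


# Idealized inputs: orthogonal sequences with equal energy.
x = [1.0, 0.0, -1.0, 0.0]
x_filtered = [0.0, 1.0, 0.0, -1.0]
left, right = mix(mixing_matrix(math.acos(0.5)), x, x_filtered)
```

For a target correlation of 0.5, `correlation(left, right)` evaluates to cos(α) = 0.5 up to rounding, because the filtered component enters the two outputs with equal magnitude and opposite sign.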

It is noted that the case where the matrix M is given by

$$
M = \frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & 1 \\
1 & -1
\end{pmatrix}
\tag{5}
$$

i.e. α = 90°, corresponding to uncorrelated output signals (r = 0), corresponds to a Lauridsen decorrelator.

In order to illustrate a problem with the matrix of eq. (5), we assume a situation with extreme amplitude panning towards the left channel, i.e. a case where a certain signal is present in the left channel only. We further assume that the desired correlation between the outputs is zero. In this case, the output of the left channel of the transformation of eq. (3) with the mixing matrix of eq. (5) yields L = (x + H⊗x)/√2. Thus, this output consists of the original signal x combined with its all-pass filtered version H⊗x.

However, this is an undesired situation, since the all-pass filter usually deteriorates the perceptual quality of the signal. Furthermore, the addition of the original signal and the filtered signal results in comb-filter effects, such as perceived coloration of the output signal. In this assumed extreme case, the best solution would be for the left output signal to consist of the input signal only. In this way, the correlation of the two output signals would still be zero.

In situations with more moderate level differences, the preferred situation is that the louder output channel contains relatively more of the original signal, and the softer output channel contains relatively more of the filtered signal. Hence, in general, it is preferred to maximize the amount of the original signal present in the two outputs together, and to minimize the amount of the filtered signal.

According to this embodiment, this is achieved by introducing a different mixing matrix including an additional common rotation:

$$
M = C
\begin{pmatrix}
\cos(\beta+\alpha/2) & \sin(\beta+\alpha/2) \\
\cos(\beta-\alpha/2) & \sin(\beta-\alpha/2)
\end{pmatrix}
\tag{6}
$$

Here β is an additional rotation, and C is a scaling matrix which ensures that the relative level difference between the output signals equals c, i.e.

$$
C =
\begin{pmatrix}
\dfrac{c}{1+c} & 0 \\
0 & \dfrac{1}{1+c}
\end{pmatrix}
$$

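Inserting the matrix of eq. (6) in eq. (3) yields the output signals generated by the matrixing operation according to this embodiment:

$$
\begin{pmatrix} L \\ R \end{pmatrix}
=
\begin{pmatrix}
\dfrac{c}{1+c} & 0 \\
0 & \dfrac{1}{1+c}
\end{pmatrix}
\begin{pmatrix}
\cos(\beta+\alpha/2) & \sin(\beta+\alpha/2) \\
\cos(\beta-\alpha/2) & \sin(\beta-\alpha/2)
\end{pmatrix}
\begin{pmatrix} x \\ H \otimes x \end{pmatrix}
$$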

Hence, the output signals L and R still have an angular difference α, i.e. the correlation between the L and R signals is not affected by the scaling of the signals L and R according to the desired level difference, nor by the additional rotation of both the L and the R signal by the angle β.

As mentioned above, preferably, the amount of the original signal x in the summed output of L and R should be maximized. This condition may be used to determine the angle β according to

$$
\frac{\partial}{\partial\beta}
\left[
\frac{c}{1+c}\cos(\beta+\alpha/2) + \frac{1}{1+c}\cos(\beta-\alpha/2)
\right] = 0
$$

which yields the condition

$$
\tan(\beta) = \frac{1-c}{1+c}\,\tan(\alpha/2)
$$
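The choice of β can be illustrated with a short sketch. It assumes a scaling matrix with diagonal entries c/(1+c) and 1/(1+c), rotated rows at angles β ± α/2, and the condition tan(β) = ((1−c)/(1+c))·tan(α/2) obtained by maximizing the share of x in L + R; the function names are hypothetical.

```python
import math


def rotation_angle(alpha, c):
    """Angle beta that maximizes the amount of x in the sum L + R."""
    return math.atan((1 - c) / (1 + c) * math.tan(alpha / 2))


def original_share(alpha, beta, c):
    """Coefficient of the unfiltered signal x in L + R."""
    return (c * math.cos(beta + alpha / 2)
            + math.cos(beta - alpha / 2)) / (1 + c)


def rotated_mixing_matrix(alpha, c):
    """Scaled and rotated 2x2 mixing matrix for correlation cos(alpha)."""
    beta = rotation_angle(alpha, c)
    gain_left, gain_right = c / (1 + c), 1 / (1 + c)
    return [
        [gain_left * math.cos(beta + alpha / 2),
         gain_left * math.sin(beta + alpha / 2)],
        [gain_right * math.cos(beta - alpha / 2),
         gain_right * math.sin(beta - alpha / 2)],
    ]
```

For equal channel levels (c = 1) the extra rotation vanishes. For extreme panning to the left (very large c) with α = 90°, β approaches −45°, so the left output contains essentially only the unfiltered signal, which is the behaviour the text asks for.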

In summary, this application describes a psycho-acoustically motivated, parametric description of the spatial attributes of multichannel audio signals. This parametric description allows strong bitrate reductions in audio coders, since only one monaural signal has to be transmitted, combined with (quantized) parameters which describe the spatial properties of the signal. The decoder can form the original number of audio channels by applying the spatial parameters. For near-CD-quality stereo audio, a bitrate associated with these spatial parameters of 10 kbit/s or less seems sufficient to reproduce the correct spatial impression at the receiving end. This bitrate can be scaled down further by reducing the spatial and/or temporal resolution of the spatial parameters and/or by processing the spatial parameters using lossless compression algorithms.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims.

For example, the invention has primarily been described in connection with an embodiment using the two localization cues ILD and ITD/IPD. In alternative embodiments, other localization cues may be used. Furthermore, in one embodiment, the ILD, the ITD/IPD, and the interchannel cross-correlation may be determined as described above, but only the interchannel cross-correlation is transmitted together with the monaural signal, thereby further reducing the bandwidth/storage capacity required for transmitting/storing the audio signal. Alternatively, the interchannel cross-correlation and one of the ILD and ITD/IPD may be transmitted. In these embodiments, the signal is synthesized from the monaural signal on the basis of the transmitted parameters only.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims

1. A method of coding an audio signal, the method comprising:

- generating a monaural signal comprising a combination of at least two input audio channels,

- determining a set of spatial parameters indicative of spatial characteristics of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels, and

- generating an encoded signal comprising the monaural signal and the set of spatial parameters.

2. The method of claim 1, wherein determining the set of spatial parameters indicative of spatial characteristics comprises determining the set of spatial parameters as a function of time and frequency.

3. The method of claim 2, wherein determining the set of spatial parameters indicative of the spatial characteristics comprises:

- dividing each of the at least two input audio channels into a corresponding plurality of frequency bands, and

- for each of the plurality of frequency bands, determining the set of spatial parameters indicative of the spatial characteristics of the at least two input audio channels within the corresponding frequency band.

4. A method according to any of the preceding claims, wherein said set of spatial parameters comprises at least one localization cue.

5. The method of claim 4, wherein said set of spatial parameters comprises at least two localization cues comprising an interchannel level difference and a selected one of an interchannel time difference and an interchannel phase difference.

6. The method of claim 4 or 5, wherein the measure of similarity comprises information that cannot be accounted for by the localization cues.

7. The method of any one of claims 1 to 6, wherein the measure of similarity corresponds to a value of the cross-correlation function at the maximum value of the cross-correlation function.

8. The method of any one of claims 1 to 7, wherein generating an encoded signal comprising the monaural signal and the set of spatial parameters comprises generating a set of quantized spatial parameters, each introducing a corresponding quantization error relative to the corresponding determined spatial parameter, wherein at least one of the introduced quantization errors is controlled to depend on the value of at least one of the determined spatial parameters.

9. An encoder for coding an audio signal, the encoder comprising:

- means for generating a monaural signal comprising a combination of at least two input audio channels,

- means for determining a set of spatial parameters indicative of spatial characteristics of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels, and

- means for generating an encoded signal comprising said monaural signal and said set of spatial parameters.

10. A device for supplying an audio signal, the device comprising:

- an input for receiving an audio signal,

- an encoder as claimed in claim 9 for encoding said audio signal to obtain an encoded audio signal, and

- an output for supplying the encoded audio signal.

11. An encoded audio signal, the signal comprising:

- a monaural signal comprising a combination of at least two audio channels, and

- a set of spatial parameters indicative of spatial characteristics of the at least two input audio channels, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two input audio channels.

12. A storage medium having stored thereon the encoded signal claimed in claim 11.

13. A method of decoding an encoded audio signal, the method comprising:

- obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a combination of at least two audio channels,

- obtaining a set of spatial parameters from the encoded audio signal, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two audio channels, and

- generating a multi-channel output signal from the monaural signal and the spatial parameters.

14. A decoder for decoding an encoded audio signal, the decoder comprising:

- means for obtaining a monaural signal from the encoded audio signal, the monaural signal comprising a combination of at least two audio channels,

- means for obtaining a set of spatial parameters from the encoded audio signal, the set of spatial parameters including a parameter representing a measure of similarity of waveforms of the at least two audio channels, and

- means for generating a multi-channel output signal from the monaural signal and the spatial parameters.

15. An apparatus for supplying a decoded audio signal, the apparatus comprising:

- an input for receiving an encoded audio signal,

- a decoder as claimed in claim 14 for decoding said encoded audio signal to obtain a multi-channel output signal, and

- an output for supplying or reproducing the multi-channel output signal.