WO2013029225A1 - Parametric multichannel encoder and decoder - Google Patents

Parametric multichannel encoder and decoder

Info

Publication number
WO2013029225A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
channel signal
time difference
interaural time
fuzziness
Prior art date
Application number
PCT/CN2011/079051
Other languages
French (fr)
Inventor
Christof Faller
David Virette
Yue Lang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2011/079051 priority Critical patent/WO2013029225A1/en
Priority to CN201180068689.6A priority patent/CN103403801B/en
Publication of WO2013029225A1 publication Critical patent/WO2013029225A1/en


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing


Abstract

The present invention relates to a parametric multichannel encoder 501 for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder 501 comprising an estimator 505 for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator 505 being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator 507 for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer 511 for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.

Description

DESCRIPTION
Parametric multichannel encoder and decoder

TECHNICAL FIELD

The present invention relates to audio coding.

BACKGROUND OF THE INVENTION
Parametric stereo or multi-channel audio coding, as described e.g. in C. Faller and F. Baumgarte, "Efficient representation of spatial audio using perceptual parameterization," in Proc. IEEE Workshop on Appl. of Sig. Proc. to Audio and Acoust., Oct. 2001, pp. 199-202, uses spatial cues and a down-mix audio signal - usually mono or stereo - to synthesize signals with more channels. Usually, the down-mix audio signals result from a superposition of a plurality of audio channel signals of a multi-channel audio signal, e.g. of a stereo audio signal. These fewer channels are waveform coded, and side information, i.e. the spatial cues relating to the original signal channel relations, is added to the coded audio channels. The decoder uses this side information to re-generate the original number of audio channels based on the decoded waveform of the down-mix audio channels.
SUMMARY OF THE INVENTION
A goal to be achieved by the present invention is to provide an efficient concept for synthesizing a multi-channel audio signal from a down-mix audio signal.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, the invention relates to a parametric multichannel encoder for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder comprising an estimator for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.
In a first possible implementation form of the audio signal processor according to the first aspect the estimator is configured to determine a delay between the first channel signal and the second channel signal for estimating the interaural time difference.
In a second possible implementation form of the audio signal processor according to the first aspect as such or according to the first implementation form of the first aspect the estimator comprises a Fourier transformer for transforming the first channel signal and the second channel signal into the frequency domain to obtain a first transformed channel signal and a second transformed channel signal, and wherein the estimator is configured to estimate a phase difference between the first and the second transformed channel signal, the phase difference indicating the interaural time difference.
In a third possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a standard deviation of a delay between the first channel signal and the second channel signal in different frequency bands to determine the fuzziness indicator.

In a fourth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a first value of the fuzziness indicator or a second value of the fuzziness indicator, the first value indicating that the interaural time difference is non-reliable, the second value indicating that the interaural time difference is reliable.
In a fifth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine one of a plurality of values of the fuzziness indicator, each value being associated with a different grade of non-reliability of the estimate of the interaural time difference.
In a sixth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a cross-correlation between the first channel signal and the second channel signal to estimate the interaural time difference.
In a seventh possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the grade of non-reliability of the estimate of the interaural time difference is determined by a delay between the first channel signal and the second channel signal.
In an eighth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the downmix signal generator is configured to combine the first channel signal and the second channel signal to obtain the downmix signal.
In a ninth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to quantize the interaural time difference to obtain a quantized interaural time difference, and wherein the multiplexer is configured to include the quantized interaural time difference into the encoded signal.

In a tenth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the first audio signal is a superposition of audio signal components originating from a first audio signal source and a second audio signal source from different directions.

According to a second aspect, the invention relates to a parametric multichannel decoder for decoding a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between the first channel signal and the second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the parametric multichannel decoder comprising a demultiplexer for demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and a synthesizer for synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the encoded downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
In a first possible implementation form of the audio signal processor according to the second aspect the demultiplexer is configured to extract a first portion of the received signal to obtain the multichannel audio signal, to extract a second portion of the received signal to obtain the estimate of the interaural time difference, and to extract a third portion of the received signal to obtain the fuzziness indicator.
In a second possible implementation form of the audio signal processor according to the second aspect as such or according to the first implementation form of the second aspect the parametric multichannel decoder is configured to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the estimated interaural time difference if the fuzziness indicator indicates a grade corresponding to a reliability of the estimate of the interaural time difference.
In a third possible implementation form of the audio signal processor according to the second aspect as such or according to any of the preceding implementation forms of the second aspect the parametric multichannel decoder is configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a non-reliability grade of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the modulated estimate of the interaural time difference.
According to a third aspect, the invention relates to an encoding method for parametric multichannel encoding of a multichannel signal comprising a first channel signal and a second channel signal, the encoding method comprising estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference; determining a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; generating a downmix signal from the first channel signal and the second channel signal; and multiplexing the downmix signal, the interaural time difference and the fuzziness indicator to obtain an encoded signal.

According to a fourth aspect, the invention relates to a decoding method for parametric multichannel decoding of a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between the first channel signal and the second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the decoding method comprising demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the encoded downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
According to a fifth aspect, the invention relates to a computer program for performing the method of the third aspect or the fourth aspect when run on a computer.

BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, in which:

Fig. 1 shows an illustration of the principles of interaural time differences (ITDs);

Fig. 2 shows a typical scenario of an ITD switching problem;

Fig. 3 shows an example of ITD estimation and a standard deviation;

Fig. 4 shows technologies for ITD extraction and synthesis;

Fig. 5 shows a diagram of fuzzy ITD synthesis.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
The interaural time difference (ITD) is the difference in arrival time of a sound between the two ears of a hearer. The ITD is important in the localization of sounds, as it provides a cue to the direction or angle of the sound source with respect to the head. If a signal arrives at the head from one side, the signal has farther to travel to reach the far ear than the near ear. This path length difference results in a time difference between the arrivals of the sound at the ears, which is perceived and aids the process of identifying the direction of the sound source.
Fig. 1 shows an example of ITD. Differences in time of arrival at the two ears 103 and 105 of a hearer 101 are indicated by a delay of the sound waveform. It can be defined that if a waveform arrives first at the left ear, the ITD is positive, whereas it is negative otherwise. If the sound source is directly in front of the listener, the ITD is zero.
When coding audio items with inherent ITD cues, for example binaural audio, e.g. recorded by a binaural microphone or by two high-fidelity microphones mounted in a dummy head, ITD estimation is unreliable and the estimated ITD is often fast-varying when multiple sources from different directions are simultaneously active. The parametric stereo decoded signal in such a case is often unstable, i.e. sources are perceived with unstable directions.
Fig. 2 shows a typical scenario of this problem. There are two talkers, on the left and right side. The left talker 201 starts speaking first, i.e. his ITD is positive, and then the right talker 205 starts speaking, i.e. his ITD is negative. Between these two talkers there is crosstalk 203, when both talkers speak simultaneously. During the crosstalk, a single ITD cannot represent the channel relation correctly.
If the ITD switches directly from a positive to a negative value, a click is perceived around the switching point. If, however, the ITD is set to zero before switching from a positive to a negative value, the right talker is perceived as moving from the left side to the right side and staying at the right side. Fig. 3 shows an example of an ITD estimation 301 and a standard deviation 303 of the binaural speech shown in figure 2. Around time index 600, a switch occurs from one source of speech to another, with an intermediate overlap. The ITD standard deviation 303 exhibits a peak, indicating an unstable ITD. An unstable ITD indicates that the localization information is blurry. Thus, it is not possible to rely on only one delay parameter to maintain the stereo image.
In a situation with an unstable ITD, e.g. switching between two values corresponding to two sources, the ITD is ambiguous. It can randomly take a value from either one or the other source, depending on frequency and time and according to which source is stronger, i.e. has more energy than the other. In this case a "fuzzy ITD" is used, i.e. the stereo signal is rendered such that it is perceived as being blurred. Using this strategy, direct switching and instability of the ITD are not perceived as such. A parametric stereo encoder using fuzzy ITD can be described by the steps of receiving a stereo signal; estimating inter-channel cues, including estimating the ITD; estimating the ITD as being fuzzy, if the ITD is fluctuating or ambiguous; and packing into a bit-stream the mono downmix plus the inter-channel cues, which include the fuzzy ITDs, if such were estimated or determined.
A parametric stereo decoder using fuzzy ITD can be described by the steps of receiving a bit-stream; obtaining from the bit-stream a mono signal plus inter-channel cues; synthesizing a stereo signal using the inter-channel cues; and synthesizing the stereo signal at the corresponding frequency and time as being blurred, if a fuzzy ITD is detected.

Fuzzy ITDs can be represented in various ways in the bit-stream: in addition to using conventional ITDs, one or more ITD levels can be defined as being fuzzy, and different degrees of fuzziness may be defined; only one parameter may be used, e.g. defining a fuzzy ITD as being zero (ITD=0) but with different fuzziness levels; or two parameters may be used, e.g. one conventional ITD and a degree of fuzziness.
At the decoder, fuzzy ITDs may be synthesized in different ways: the desired ITD is synthesized and the left and right channels are de-correlated depending on the desired degree of fuzziness; the desired ITD and an ICC corresponding to the desired fuzziness are synthesized; an ITD fluctuating over time is synthesized; or an ITD fluctuating over frequency is synthesized.
Fig. 4 shows an embodiment for ITD extraction and synthesis comprising an encoder 401 and a decoder 403.
The parametric multichannel encoder 401 comprises an estimator 405, a downmix means 407, an encoder 409 and a multiplexer 411. The parametric multichannel decoder 403 comprises a demultiplexer 413, a decoder 417, and a de-quantizer 419. The estimator 405 is, according to an implementation form, configured to extract the fuzziness indicator according to the principles described herein.
In the encoder 401 the ITD is extracted from the left channel x1(n) and the right channel x2(n) in the short-time discrete Fourier transform domain (STFT domain). Without loss of generality, the technique can be described as using sampled signals. A time-domain signal is written with lower-case letters, e.g. as x(n), where n is the time index. The sampling frequency, at which a time-domain signal is sampled, is fs Hz. The corresponding short-time discrete Fourier transform (STFT) is denoted X(k,i), where k is the frame index, i.e. the downsampled time index, and i is the frequency index. The discrete Fourier transform (DFT) size is denoted N.
The time-smoothed cross-spectrum is estimated with single-pole averaging

C(k,i) = a \, X_1(k,i) \, X_2^*(k,i) + (1-a) \, C(k-1,i), \quad (1)

where * denotes the complex conjugate and a is a smoothing factor in the range from 0 to 1 which determines the decay of the exponential time smoothing. The delay as a function of time and frequency is given by

d(k,i) = \frac{N}{2\pi i} \, \angle C(k,i), \quad (2)

where \angle C(k,i) is the unwrapped phase of C(k,i). The unwrapping ensures that all appropriate multiples of 2\pi have been included. Finally, the full-band delay is estimated as

d(k) = \frac{1}{i_h - i_l + 1} \sum_{i=i_l}^{i_h} d(k,i), \quad (3)

where i_l and i_h are the STFT bin indices which can be calculated from equation (4),

i_l = \left\lceil \frac{f_l N}{f_s} \right\rceil, \qquad i_h = \left\lfloor \frac{f_h N}{f_s} \right\rfloor, \quad (4)

where [f_l, f_h] Hz is the delay estimation frequency range; l and h are abbreviations for low and high. Using a smoothed version of the cross-spectrum and only a part of the frequency band for ITD estimation can make the delay estimate robust and reliable.
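The estimation chain of equations (1)-(4) can be sketched in a few lines of numpy. This is an illustrative sketch, not the patented implementation: the function name, the one-sided spectrum layout and the convention that a lagging second channel yields a positive delay are assumptions made here.

```python
import numpy as np

def estimate_itd(X1, X2, C_prev, a, fs, N, fl, fh):
    """Full-band delay estimate for one STFT frame, following eqs. (1)-(4).

    X1, X2 : one-sided STFT spectra of the current frame (bins 0..N/2)
    C_prev : smoothed cross-spectrum from the previous frame
    a      : smoothing factor, 0 < a <= 1
    fl, fh : delay-estimation frequency range in Hz
    """
    # Eq. (1): single-pole averaged cross-spectrum
    C = a * X1 * np.conj(X2) + (1.0 - a) * C_prev

    # Eq. (2): per-bin delay from the unwrapped phase of C
    phi = np.unwrap(np.angle(C))
    i = np.arange(len(C))
    d = np.zeros(len(C))
    d[1:] = phi[1:] * N / (2.0 * np.pi * i[1:])  # the DC bin carries no delay information

    # Eq. (4): STFT bin indices spanning [fl, fh] Hz
    il = int(np.ceil(fl * N / fs))
    ih = int(np.floor(fh * N / fs))

    # Eq. (3): full-band delay as the average over the estimation range
    d_k = float(np.mean(d[il:ih + 1]))
    return d_k, d, C
```

Feeding this a spectrum pair whose second channel is a pure d0-sample delay of the first recovers d0 exactly, since the per-bin delays are then constant over frequency.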
At the parametric stereo decoder, the delay d(k) can be synthesized by modifying the left and right spectra,

\hat{X}_1(k,i) = e^{\,j \frac{2\pi i \, d(k)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = e^{\,-j \frac{2\pi i \, d(k)}{2N}} X_2(k,i). \quad (5)

Note that equation (5) gives a delay of d(k)/2 to the left channel and a delay of -d(k)/2 to the right channel. Alternatively, it is also possible to leave one channel unmodified and give the full delay to the other channel.

In the following, fuzziness indicator extraction will be described.
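Before that, the delay synthesis of equation (5) can be sketched as follows. The sign convention (which channel gets the positive half-delay) is an assumption; only the relative phase between the two outputs matters.

```python
import numpy as np

def synthesize_itd(X1, X2, d_k, N):
    """Eq. (5): give opposite half-delays d_k/2 and -d_k/2 to the two channels
    as per-bin phase shifts in the STFT domain."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_k / (2.0 * N))
    return shift * X1, np.conj(shift) * X2
```

Applying this to two copies of a decoded mono spectrum yields a pair whose inter-channel phase difference corresponds to a relative delay of d_k samples.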
In complex audio signals a single delay often cannot represent the relation between the audio channels precisely. The standard deviation of the delay in samples,

\sigma_d(k) = \sqrt{ \frac{1}{i_h - i_l + 1} \sum_{i=i_l}^{i_h} \left( d(k,i) - d(k) \right)^2 }, \quad (6)

can be used as a measure of how precisely a single delay describes the channel relations or, equivalently, of how much the delay varies as a function of frequency. If \sigma_d(k) is higher than a threshold, the fuzziness indicator can be set to 1; else the indicator is set to 0.
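The fuzziness decision of equation (6) reduces to a few lines; the threshold value is application-dependent and the names are illustrative:

```python
import numpy as np

def itd_fuzziness(d, d_k, il, ih, threshold):
    """Eq. (6): standard deviation of the per-bin delays d over [il, ih] around the
    full-band delay d_k; the indicator is 1 (fuzzy) when it exceeds the threshold."""
    sigma = float(np.sqrt(np.mean((d[il:ih + 1] - d_k) ** 2)))
    return (1 if sigma > threshold else 0), sigma
```

A delay that is constant over frequency gives sigma = 0 (reliable), while a delay toggling between two source delays gives a large sigma (fuzzy).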
Cross-correlation can also be used to extract the fuzziness indicator. If the maximum cross-correlation of the two channels is lower than a threshold, the fuzziness indicator can be set to 1; else it is set to 0.
In the following a fuzzy time-delay synthesis will be described.
Application of a single full-band delay, as done with equation (5), can still be improved. If a binaural signal has two sources on the left and right side, the delay between the audio channels toggles between the delays related to the directions of the two sources, as long as only one source is active at a time. However, if both sources are simultaneously active, a single delay cannot represent the channel relation. Localization blur increases when sources are simultaneously active. A single delay in this case will result in the perception of a non-blurred object at a specific direction. To improve single-delay parametric stereo synthesis, an additional localization blur parameter can be used.
A way to synthesize a blurred source is to modulate the estimated delay over frequency, e.g.

d(k,i) = d(k) + \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right), \quad (7)

where \beta is the modulation amplitude in samples and \gamma determines how many periods of sinusoidal modulation are contained up to the Nyquist frequency.
Given d(k,i), blurred time delay synthesis is carried out with

\hat{X}_1(k,i) = e^{\,j \frac{2\pi i \, d(k,i)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = e^{\,-j \frac{2\pi i \, d(k,i)}{2N}} X_2(k,i). \quad (8)
In order to prevent fuzziness from switching on and off too quickly, the fuzziness indicator is time-smoothed at the decoder. If the time-smoothed fuzziness indicator, in a range from 0 to 1, is denoted f(k), then a blurred delay adapting to the degree of fuzziness can be computed as

d(k,i) = d(k) + f(k) \, \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right). \quad (9)
Alternatively, d(k) can be decreased or set to zero when a certain degree of fuzziness is present.
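Equations (7)-(9) can be sketched together. The parameter values used below for β and γ are arbitrary illustrations, not values from the text:

```python
import numpy as np

def blurred_delay(d_k, f_k, beta, gamma, N, nbins):
    """Eq. (9): delay modulated sinusoidally over frequency, scaled by the
    time-smoothed fuzziness f_k in [0, 1]. With f_k = 0 this reduces to d_k."""
    i = np.arange(nbins)
    return d_k + f_k * beta * np.sin(4.0 * np.pi * i * gamma / N)

def blurred_synthesis(X1, X2, d_ki, N):
    """Eq. (8): apply the per-bin blurred delay as opposite phase shifts."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_ki / (2.0 * N))
    return shift * X1, np.conj(shift) * X2
```

Because the modification is a pure phase shift per bin, magnitudes are preserved; only the perceived localization is blurred.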
In contrast to using delay modulation, it is also possible to apply different all-pass filters to the left and right channels to obtain a blurred spatial image:

\hat{X}_1(k,i) = \left( f(k) A_1(i) + 1 - f(k) \right) e^{\,j \frac{2\pi i \, d(k,i)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = \left( f(k) A_2(i) + 1 - f(k) \right) e^{\,-j \frac{2\pi i \, d(k,i)}{2N}} X_2(k,i), \quad (10)

where A_1(i) and A_2(i) represent all-pass filters. More generally speaking, de-correlators can be used instead of all-pass filters.
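A sketch of the all-pass variant of equation (10). The random-phase "all-pass" used here is a toy decorrelator chosen purely for illustration; a real implementation would use properly designed all-pass or de-correlation filters:

```python
import numpy as np

def allpass_blur(X1, X2, d_ki, f_k, A1, A2, N):
    """Eq. (10): crossfade, controlled by f_k, between the direct path and an
    all-pass (decorrelating) path, on top of the opposite phase shifts."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_ki / (2.0 * N))
    Y1 = (f_k * A1 + (1.0 - f_k)) * shift * X1
    Y2 = (f_k * A2 + (1.0 - f_k)) * np.conj(shift) * X2
    return Y1, Y2

def random_phase_allpass(nbins, seed):
    """Toy decorrelator: unit magnitude, random phase per bin."""
    rng = np.random.default_rng(seed)
    return np.exp(1j * rng.uniform(-np.pi, np.pi, nbins))
```

With f(k) = 0 the all-pass path is fully bypassed and equation (10) degenerates to equation (8); with f(k) = 1 the output keeps unit magnitude per bin but has decorrelated phases.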
The parametric stereo encoder estimates the time delay d(k), e.g. as aforementioned. Additionally, a fuzziness indicator f(k) is determined. For example, an indicator could be used which is either 0 or 1, where 1 indicates that the delay is fuzzy. The delay and the fuzziness indicator are transmitted as parametric stereo parameters to the decoder, usually in combination with other parameters, such as level differences. If f(k)=0 (reliable ITD), then the parametric stereo decoder uses equation (5) for delay synthesis. Else, if f(k)=1 (non-reliable ITD), then the parametric stereo decoder uses equation (8) for delay synthesis. Alternatively, a fuzziness indicator could be used which has different values. In this case, the parametric decoder has the capability of synthesizing delays with different degrees of fuzziness. For example, the degree of fuzziness could be varied by varying β in equation (9). The higher the degree of fuzziness, the more unstable the source is perceived.

Fig. 5 shows a diagram of fuzzy ITD synthesis. The fuzzy ITD synthesis implementation includes a parametric multichannel encoder 501 and a parametric multichannel decoder 503. The parametric multichannel encoder 501 comprises an estimator 505, a downmix means 507, an encoder 509, a fuzziness indicator extraction means 523 and a multiplexer 511. The parametric multichannel decoder 503 comprises a demultiplexer 513, a fuzziness indicator 515, a decoder 517 and a de-quantizer 519.
In the following, a multiple-channel implementation will be described.
ITDs are extracted from the multi-channel signal by using the following equation:

d_j = \arg\max_{d} \, IC_j(d), \quad (11)

with IC_j(d) being the normalized cross-correlation defined as

IC_j(d) = \frac{\sum_n x_{ref}(n) \, x_j(n+d)}{\sqrt{ \sum_n x_{ref}^2(n) \sum_n x_j^2(n+d) }}, \quad (12)

wherein x_{ref} represents the reference signal and x_j represents the channel signal j. The reference signal x_{ref} can be chosen as one of the channels x_j (for j in [1, M]), and then M-1 spatial cues are calculated in the decoder. The reference signal x_{ref} can also be a mono downmix signal, which is the average of all M channels, and then M spatial cues can be calculated in the decoder.
The advantage of using a downmix signal as the reference for a multichannel audio signal is that it avoids using a silent signal as the reference signal. Indeed, the downmix represents an average of the energy of all the channels and is hence less likely to be silent.
If the maximum of IC_j(d) is lower than a given threshold, the fuzziness indicator of channel j can be set to 1; else it is set to 0. This fuzziness indicator needs to be transmitted to the decoder.
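The multichannel cue extraction of equations (11) and (12), together with the per-channel fuzziness decision, can be sketched as a brute-force search over integer lags. The search range and threshold below are assumptions:

```python
import numpy as np

def normalized_xcorr(x_ref, x_j, d):
    """Eq. (12): normalized cross-correlation between the reference and channel j
    at integer lag d, computed over the overlapping samples."""
    n = len(x_ref)
    if d >= 0:
        a, b = x_ref[:n - d], x_j[d:]
    else:
        a, b = x_ref[-d:], x_j[:n + d]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def channel_itd(x_ref, x_j, max_lag, threshold):
    """Eq. (11)-style search: the lag maximizing IC_j is the channel ITD;
    a low maximum flags the estimate as fuzzy (indicator = 1)."""
    lags = list(range(-max_lag, max_lag + 1))
    ics = [normalized_xcorr(x_ref, x_j, d) for d in lags]
    best = int(np.argmax(ics))
    fuzzy = 1 if ics[best] < threshold else 0
    return lags[best], fuzzy
```

A delayed copy of the reference produces a sharp correlation peak at the true lag (reliable), whereas an unrelated signal never exceeds the threshold (fuzzy).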
In the decoder, a blurred source is synthesized in the same way. The estimated delay is modulated over frequency, e.g.

d_j(k,i) = d_j(k) + \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right), \quad (13)

where \beta is the modulation amplitude in samples, \gamma determines how many periods of sinusoidal modulation are contained up to the Nyquist frequency, and j is the channel index.
Given d_j(k,i), blurred time delay synthesis is carried out with

\hat{X}_{ref}(k,i) = X_{ref}(k,i), \quad (14)

\hat{X}_j(k,i) = e^{\,-j \frac{2\pi i \, d_j(k,i)}{N}} X_j(k,i). \quad (15)
To prevent fuzziness from switching on and off too quickly, the fuzziness indicator is time-smoothed at the decoder. If the time-smoothed fuzziness indicator, in a range from 0 to 1, is denoted f_j(k), then a blurred delay adapting to the degree of fuzziness can be computed as

d_j(k,i) = d_j(k) + f_j(k) \, \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right). \quad (16)
Alternatively, d_j(k) can be decreased or set to zero when a certain degree of fuzziness is present.
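The per-channel synthesis of equations (13)-(16) can be sketched as follows. Unlike the stereo case, the reference spectrum is assumed to stay unchanged while each channel j receives its full (possibly blurred) delay; the sign convention is again an assumption:

```python
import numpy as np

def multichannel_fuzzy_synthesis(X_ref, channels, delays, fuzziness, beta, gamma, N):
    """Eqs. (13)-(16): the reference is kept as-is (eq. 14) and each channel j gets
    its full blurred delay d_j(k,i) as a per-bin phase shift (eq. 15)."""
    i = np.arange(len(X_ref))
    outs = []
    for X_j, d_j, f_j in zip(channels, delays, fuzziness):
        d_ji = d_j + f_j * beta * np.sin(4.0 * np.pi * i * gamma / N)  # eq. (16)
        outs.append(np.exp(-1j * 2.0 * np.pi * i * d_ji / N) * X_j)   # eq. (15)
    return X_ref, outs
```

With f_j(k) = 0 and d_j(k) = 0 each channel passes through unchanged; a non-zero d_j appears as a linear phase ramp relative to the untouched reference.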
In contrast to using delay modulation, different all-pass filters can be applied to all channels to obtain a blurred spatial image:

\hat{X}_{ref}(k,i) = X_{ref}(k,i), \qquad \hat{X}_j(k,i) = \left( f_j(k) A_j(i) + 1 - f_j(k) \right) e^{\,-j \frac{2\pi i \, d_j(k,i)}{N}} X_j(k,i), \quad (17)

where A_j(i) represents the applied all-pass filter. More generally speaking, de-correlators can be used instead of all-pass filters.
The parametric multichannel decoder 403, 503 can be configured to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the estimated interaural time difference if the fuzziness indicator indicates a first grade of non-reliability of the estimate of the interaural time difference, in particular a non-reliability of the estimate of the interaural time difference that is smaller than a second grade of non-reliability of the estimate of the interaural time difference, in particular a reliability of the estimate of the interaural time difference.
The parametric multichannel decoder 403, 503 can be further configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a second grade of non-reliability of the estimate of the interaural time difference, in particular a non-reliability of the estimate of the interaural time difference that is greater than a first grade of non-reliability of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the modulated estimate of the interaural time difference.

Claims

CLAIMS:
1. A parametric multichannel encoder (401, 501) for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder (401, 501) comprising: an estimator (405, 505) for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator (405, 505) being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator (407, 507) for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer (411, 511) for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.
2. The parametric multichannel encoder (401 , 501 ) of claim 1 , wherein the estimator (405, 505) is configured to determine a delay between the first channel signal and the second channel signal for estimating the interaural time difference.
3. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) comprises a Fourier Transformer for transforming the first channel signal and the second channel signal into the frequency domain to obtain a first transformed channel signal and a second transformed channel signal, and wherein the estimator (405, 505) is configured to estimate a phase difference between the first and the second transformed channel signals, the phase difference indicating the interaural time difference.
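The frequency-domain estimation of claim 3 can be illustrated with a short sketch: the phase of the cross-spectrum between the two transformed channel signals encodes the inter-channel delay. This is not the claimed implementation — a codec operates on windowed subband frames rather than one full-length DFT, and the energy weighting used here is an assumption.

```python
import numpy as np

def estimate_itd(x_left, x_right, fs):
    """Estimate the inter-channel time difference (in seconds) from the
    phase difference between the transformed channel signals."""
    X = np.fft.rfft(x_left)
    Y = np.fft.rfft(x_right)
    cross = X * np.conj(Y)            # cross-spectrum phase = inter-channel phase difference
    freqs = np.fft.rfftfreq(len(x_left), d=1.0 / fs)
    w = np.abs(cross[1:])             # weight each bin by its energy (DC skipped)
    phi = np.angle(cross[1:])         # wrapped phase difference per bin
    omega = 2.0 * np.pi * freqs[1:]
    # Least-squares fit of phi ~ omega * itd (linear-phase model of a pure delay).
    return float(np.sum(w * phi * omega) / np.sum(w * omega ** 2))
```

The fit is only valid while the phase differences stay unwrapped, i.e. for delays small relative to the signal's shortest period — one reason a reliability (fuzziness) indicator is useful.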
4. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a standard deviation of the delays between the first channel signal and the second channel signal in different frequency bands to determine the fuzziness indicator.
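A minimal sketch of the criterion in claim 4: when the delays estimated in different frequency bands scatter widely, the global ITD estimate is flagged as non-reliable. The binary mapping and the threshold value are assumptions for illustration.

```python
import numpy as np

def fuzziness_from_band_delays(band_delays, threshold):
    """Map the spread of per-band delay estimates to a binary fuzziness
    indicator: widely scattered delays mean the global ITD is unreliable."""
    spread = float(np.std(band_delays))
    return 1 if spread > threshold else 0   # 1 = non-reliable, 0 = reliable
```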
5. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a first value of the fuzziness indicator or a second value of the fuzziness indicator, the first value indicating that the interaural time difference is non-reliable, and the second value indicating that the interaural time difference is reliable.
6. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine one of a plurality of values of the fuzziness indicator, each value being associated with a different grade of non-reliability of the estimate of the interaural time difference.
7. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a cross-correlation between the first channel signal and the second channel signal to estimate the interaural time difference.
8. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the grade of non-reliability of the estimate of the interaural time difference is determined by a cross-correlation between the first channel signal and the second channel signal.
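The cross-correlation approach of claims 7 and 8 can be sketched together: the lag of the correlation maximum gives the delay, and the height of the normalized peak serves as a reliability cue. This is illustrative only; the search range and normalization are assumptions.

```python
import numpy as np

def itd_by_crosscorrelation(x_left, x_right, max_lag):
    """Estimate the inter-channel delay (in samples) as the lag maximizing the
    cross-correlation; the normalized peak height doubles as a reliability cue."""
    lags = np.arange(-max_lag, max_lag + 1)
    # Circular correlation via np.roll keeps the sketch short; a codec would
    # use windowed frames instead.
    corr = [float(np.dot(x_left, np.roll(x_right, -lag))) for lag in lags]
    best = int(lags[int(np.argmax(corr))])   # positive if the right channel lags the left
    peak = max(corr) / (np.linalg.norm(x_left) * np.linalg.norm(x_right) + 1e-12)
    return best, peak
```

A peak near 1 means the channels are essentially delayed copies of each other (reliable ITD); a low peak indicates diffuse or multi-source content (non-reliable ITD).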
9. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the downmix signal generator (407, 507) is configured to combine the first channel signal and the second channel signal to obtain the downmix signal.
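The simplest combination meeting claim 9 is a scaled sum of the two channels; the 1/sqrt(2) gain is an assumption (it roughly preserves the energy of uncorrelated channels), and real encoders may use adaptive weights.

```python
import numpy as np

def downmix(x_left, x_right):
    """Combine the two channel signals into a single downmix signal."""
    # 1/sqrt(2) keeps the energy of uncorrelated channels roughly constant.
    return (np.asarray(x_left, dtype=float) + np.asarray(x_right, dtype=float)) / np.sqrt(2.0)
```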
10. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to quantize the interaural time difference to obtain a quantized interaural time difference, and wherein the multiplexer (411, 511) is configured to include the quantized interaural time difference in the encoded signal.
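A uniform quantizer in the spirit of claim 10 could look as follows; the 0.1 ms step and the ±1 ms clipping range are assumptions, chosen to roughly match the physical range of interaural delays.

```python
def quantize_itd(itd_seconds, step=1e-4, max_abs=1e-3):
    """Clip the ITD estimate to a plausible range and uniformly quantize it
    to an integer codeword suitable for multiplexing into the encoded signal."""
    clipped = max(-max_abs, min(max_abs, itd_seconds))
    index = round(clipped / step)        # integer codeword to transmit
    return index, index * step           # (codeword, dequantized value)
```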
11. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the first channel signal is a superposition of audio signal components originating from a first audio signal source and a second audio signal source from different directions.
12. A parametric multichannel decoder (403, 503) for decoding a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between a first channel signal and a second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the parametric multichannel decoder (403, 503) comprising: a demultiplexer (413, 513) for demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and a synthesizer (421, 521) for synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
13. The parametric multichannel decoder (403, 503) of claim 12, wherein the demultiplexer (513) is configured to extract a first portion of the received signal to obtain the downmix signal, to extract a second portion of the received signal to obtain the estimate of the interaural time difference, and to extract a third portion of the received signal to obtain the fuzziness indicator.
14. The parametric multichannel decoder (403, 503) of claim 12 or 13, wherein the parametric multichannel decoder (403, 503) is configured to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a reliability of the estimate of the interaural time difference.
15. The parametric multichannel decoder (403, 503) of claim 12, 13 or 14, wherein the parametric multichannel decoder (403, 503) is configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a non-reliability of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the modulated estimate of the interaural time difference.
16. An encoding method for parametric multichannel encoding of a multichannel signal comprising a first channel signal and a second channel signal, the encoding method comprising: estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference; determining a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; generating a downmix signal from the first channel signal and the second channel signal; and multiplexing the downmix signal, the interaural time difference and the fuzziness indicator to obtain an encoded signal.
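The steps of the encoding method of claim 16 can be sketched end-to-end. The dict stands in for the multiplexed bitstream, and the lag range, correlation threshold, and downmix gain are all assumptions for illustration.

```python
import numpy as np

def encode_frame(x_left, x_right):
    """Estimate ITD and fuzziness, form the downmix, and bundle the three
    parts of the encoded signal (a dict stands in for the multiplexed bitstream)."""
    lags = np.arange(-8, 9)
    corr = [float(np.dot(x_left, np.roll(x_right, -lag))) for lag in lags]
    itd = int(lags[int(np.argmax(corr))])                       # delay in samples
    peak = max(corr) / (np.linalg.norm(x_left) * np.linalg.norm(x_right) + 1e-12)
    fuzziness = 0 if peak > 0.7 else 1                          # threshold is an assumption
    dmx = (x_left + x_right) / np.sqrt(2.0)
    return {"downmix": dmx, "itd": itd, "fuzziness": fuzziness}
```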
17. A decoding method for parametric multichannel decoding of a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between a first channel signal and a second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the decoding method comprising: demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
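The decoding method of claim 17 can be sketched in the same illustrative style. The dict layout (`downmix`, `itd`, `fuzziness` keys), the circular sample shift, and the convention that fuzziness 0 means reliable are all assumptions, not taken from the patent.

```python
import numpy as np

def decode_frame(encoded):
    """Unbundle the encoded parts and synthesize two channels from the
    downmix, applying the ITD only when the fuzziness indicator marks it reliable."""
    dmx = encoded["downmix"]
    itd = encoded["itd"] if encoded["fuzziness"] == 0 else 0
    scale = 1.0 / np.sqrt(2.0)            # undo the downmix gain
    left = scale * dmx
    right = scale * np.roll(dmx, itd)     # delay the right channel by the ITD
    return left, right
```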
18. A computer program for performing the method of claim 16 or 17 when run on a computer.
PCT/CN2011/079051 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder WO2013029225A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder
CN201180068689.6A CN103403801B (en) 2011-08-29 2011-08-29 Parametric multi-channel encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder

Publications (1)

Publication Number Publication Date
WO2013029225A1 (en) 2013-03-07

Family

ID=47755184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder

Country Status (2)

Country Link
CN (1) CN103403801B (en)
WO (1) WO2013029225A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107742521A (en) * 2016-08-10 2018-02-27 华为技术有限公司 The coding method of multi-channel signal and encoder
WO2022262960A1 (en) * 2021-06-15 2022-12-22 Telefonaktiebolaget Lm Ericsson (Publ) Improved stability of inter-channel time difference (itd) estimator for coincident stereo capture

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN103916217B (en) * 2014-03-25 2017-06-13 烽火通信科技股份有限公司 The implementation method and device of XLGMII interface multichannel frequency reducing DIC mechanism

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1600791A1 (en) * 2004-05-26 2005-11-30 Honda Research Institute Europe GmbH Sound source localization based on binaural signals
WO2009042386A1 (en) * 2007-09-25 2009-04-02 Motorola, Inc. Apparatus and method for encoding a multi channel audio signal
CN101408615A (en) * 2008-11-26 2009-04-15 武汉大学 Method and device for measuring binaural sound time difference ILD critical apperceive characteristic

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP2323130A1 (en) * 2009-11-12 2011-05-18 Koninklijke Philips Electronics N.V. Parametric encoding and decoding

Non-Patent Citations (1)

Title
FALLER, CHRISTOF ET AL.: "Efficient Representation of Spatial Audio Using Perceptual Parametrization", IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS 2001, 21 October 2001 (2001-10-21), pages 199 - 202 *

Cited By (15)

Publication number Priority date Publication date Assignee Title
KR20210093384A (en) * 2016-08-10 2021-07-27 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
US11217257B2 (en) 2016-08-10 2022-01-04 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
EP3486904A4 (en) * 2016-08-10 2019-06-19 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
RU2718231C1 (en) * 2016-08-10 2020-03-31 Хуавэй Текнолоджиз Ко., Лтд. Method for encoding multichannel signal and encoder
US10643625B2 (en) 2016-08-10 2020-05-05 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
KR102281668B1 (en) 2016-08-10 2021-07-23 후아웨이 테크놀러지 컴퍼니 리미티드 Multi-channel signal encoding method and encoder
KR20190030735A (en) * 2016-08-10 2019-03-22 후아웨이 테크놀러지 컴퍼니 리미티드 Multichannel signal encoding method and encoder
CN107742521B (en) * 2016-08-10 2021-08-13 华为技术有限公司 Coding method and coder for multi-channel signal
CN107742521A (en) * 2016-08-10 2018-02-27 华为技术有限公司 The coding method of multi-channel signal and encoder
KR102464300B1 (en) 2016-08-10 2022-11-04 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
KR20220151043A (en) * 2016-08-10 2022-11-11 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
KR102617415B1 (en) 2016-08-10 2023-12-21 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
EP4131260A1 (en) * 2016-08-10 2023-02-08 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
US11756557B2 (en) 2016-08-10 2023-09-12 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
WO2022262960A1 (en) * 2021-06-15 2022-12-22 Telefonaktiebolaget Lm Ericsson (Publ) Improved stability of inter-channel time difference (itd) estimator for coincident stereo capture

Also Published As

Publication number Publication date
CN103403801A (en) 2013-11-20
CN103403801B (en) 2015-11-25

Similar Documents

Publication Publication Date Title
RU2705007C1 (en) Device and method for encoding or decoding a multichannel signal using frame control synchronization
JP7091411B2 (en) Multi-channel signal coding method and encoder
US9449603B2 (en) Multi-channel audio encoder and method for encoding a multi-channel audio signal
US8831759B2 (en) Audio coding
JP5290956B2 (en) Audio signal correlation separator, multi-channel audio signal processor, audio signal processor, method and computer program for deriving output audio signal from input audio signal
EP2702776B1 (en) Parametric encoder for encoding a multi-channel audio signal
US9009057B2 (en) Audio encoding and decoding to generate binaural virtual spatial signals
JP5947971B2 (en) Method for determining coding parameters of a multi-channel audio signal and multi-channel audio encoder
EP2633520B1 (en) Parametric encoder for encoding a multi-channel audio signal
KR101662682B1 (en) Method for inter-channel difference estimation and spatial audio coding device
KR20050021484A (en) Audio coding
WO2006108543A1 (en) Temporal envelope shaping of decorrelated signal
CN108369810B (en) Adaptive channel reduction processing for encoding multi-channel audio signals
WO2010097748A1 (en) Parametric stereo encoding and decoding
EP2730102B1 (en) Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral weights generator
CN101421779A (en) Apparatus and method for production of a surrounding-area signal
JP2015528926A (en) Generalized spatial audio object coding parametric concept decoder and method for downmix / upmix multichannel applications
EP2984857A1 (en) Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio
WO2013029225A1 (en) Parametric multichannel encoder and decoder
JP2017058696A (en) Inter-channel difference estimation method and space audio encoder
Vilkamo Perceptually motivated time-frequency processing of spatial audio

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11871767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11871767

Country of ref document: EP

Kind code of ref document: A1