WO2013029225A1 - Parametric multichannel encoder and decoder - Google Patents

Parametric multichannel encoder and decoder

Info

Publication number
WO2013029225A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
channel signal
time difference
interaural time
fuzziness
Prior art date
Application number
PCT/CN2011/079051
Other languages
French (fr)
Inventor
Christof Faller
David Virette
Yue Lang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/CN2011/079051 priority Critical patent/WO2013029225A1/en
Priority to CN201180068689.6A priority patent/CN103403801B/en
Publication of WO2013029225A1 publication Critical patent/WO2013029225A1/en


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing


Abstract

The present invention relates to a parametric multichannel encoder 501 for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder 501 comprising an estimator 505 for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator 505 being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator 507 for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer 511 for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.

Description

DESCRIPTION
Parametric multichannel encoder and decoder

TECHNICAL FIELD

The present invention relates to audio coding.

BACKGROUND OF THE INVENTION
Parametric stereo or multi-channel audio coding, as described e.g. in C. Faller and F. Baumgarte, "Efficient representation of spatial audio using perceptual parameterization," in Proc. IEEE Workshop on Appl. of Sig. Proc. to Audio and Acoust., Oct. 2001, pp. 199-202, uses spatial cues and a down-mix audio signal - usually mono or stereo - to synthesize signals with more channels. Usually, the down-mix audio signals result from a superposition of a plurality of audio channel signals of a multi-channel audio signal, e.g. of a stereo audio signal. These fewer channels are waveform coded, and side information, i.e. the spatial cues relating to the original signal channel relations, is added to the coded audio channels. The decoder uses this side information to re-generate the original number of audio channels based on the decoded waveform of the down-mix audio channels.
SUMMARY OF THE INVENTION
A goal to be achieved by the present invention is to provide an efficient concept for synthesizing a multi-channel audio signal from a down-mix audio signal.
This object is achieved by the features of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.

According to a first aspect, the invention relates to a parametric multichannel encoder for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder comprising an estimator for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.
In a first possible implementation form of the audio signal processor according to the first aspect the estimator is configured to determine a delay between the first channel signal and the second channel signal for estimating the interaural time difference.
In a second possible implementation form of the audio signal processor according to the first aspect as such or according to the first implementation form of the first aspect the estimator comprises a Fourier transformer for transforming the first channel signal and the second channel signal into the frequency domain to obtain a first transformed channel signal and a second transformed channel signal, and wherein the estimator is configured to estimate a phase difference between the first and the second transformed channel signal, the phase difference indicating the interaural time difference.
In a third possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a standard deviation of a delay between the first channel signal and the second channel signal in different frequency bands to determine the fuzziness indicator.

In a fourth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a first value of the fuzziness indicator or a second value of the fuzziness indicator, the first value indicating that the interaural time difference is non-reliable, the second value indicating that the interaural time difference is reliable.
In a fifth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine one of a plurality of values of the fuzziness indicator, each value being associated with a different grade of non-reliability of the estimate of the interaural time difference.
In a sixth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to determine a cross-correlation between the first channel signal and the second channel signal to estimate the interaural time difference.
In a seventh possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the grade of non-reliability of the estimate of the interaural time difference is determined by a delay between the first channel signal and the second channel signal.
In an eighth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the downmix signal generator is configured to combine the first channel signal and the second channel signal to obtain the downmix signal.
In a ninth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the estimator is configured to quantize the interaural time difference to obtain a quantized interaural time difference, and wherein the multiplexer is configured to include the quantized interaural time difference into the encoded signal.

In a tenth possible implementation form of the audio signal processor according to the first aspect as such or according to any of the preceding implementation forms of the first aspect the first audio signal is a superposition of audio signal components originating from a first audio signal source and a second audio signal source from different directions.

According to a second aspect, the invention relates to a parametric multichannel decoder for decoding a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between the first channel signal and the second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the parametric multichannel decoder comprising a demultiplexer for demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and a synthesizer for synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the encoded downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
In a first possible implementation form of the audio signal processor according to the second aspect the demultiplexer is configured to extract a first portion of the received signal to obtain the multichannel audio signal, to extract a second portion of the received signal to obtain the estimate of the interaural time difference, and to extract a third portion of the received signal to obtain the fuzziness indicator.
In a second possible implementation form of the audio signal processor according to the second aspect as such or according to the first implementation form of the second aspect the parametric multichannel decoder is configured to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the estimated interaural time difference if the fuzziness indicator indicates a grade corresponding to a reliability of the estimate of the interaural time difference.
In a third possible implementation form of the audio signal processor according to the second aspect as such or according to any of the preceding implementation forms of the second aspect the parametric multichannel decoder is configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a non-reliability grade of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the modulated estimate of the interaural time difference.
According to a third aspect, the invention relates to an encoding method for parametric multichannel encoding of a multichannel signal comprising a first channel signal and a second channel signal, the encoding method comprising estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference; determining a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; generating a downmix signal from the first channel signal and the second channel signal; and multiplexing the downmix signal, the interaural time difference and the fuzziness indicator to obtain an encoded signal.

According to a fourth aspect, the invention relates to a decoding method for parametric multichannel decoding of a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between the first channel signal and the second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the decoding method comprising demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the encoded downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
According to a fifth aspect, the invention relates to a computer program for performing the method of the third aspect or the fourth aspect when run on a computer.

BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, in which:

Fig. 1 shows an illustration of the principles of interaural time differences (ITDs);

Fig. 2 shows a typical scenario of an ITD switching problem;

Fig. 3 shows an example of ITD estimation and a standard deviation;

Fig. 4 shows technologies for ITD extraction and synthesis;

Fig. 5 shows a diagram of fuzzy ITD synthesis.

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION
The interaural time difference (ITD) is the difference in arrival time of a sound between the two ears of a hearer. The ITD is important in the localization of sounds, as it provides a cue to the direction or angle of the sound source with respect to the head. If a signal arrives at the head from one side, the signal has farther to travel to reach the far ear than the near ear. This path length difference results in a time difference between the arrivals of the sound at the ears, which is perceived and aids the process of identifying the direction of the sound source.
Fig. 1 shows an example of ITD. Differences in time of arrival at the two ears 103 and 105 of a hearer 101 are indicated by a delay of the sound waveform. It can be defined that if a waveform arrives first at the left ear, the ITD is positive, whereas it is negative otherwise. If the sound source is directly in front of the listener, the ITD is zero.
When coding audio items with inherent ITD cues, for example binaural audio, e.g. recorded by a binaural microphone or by two high-fidelity microphones mounted in a dummy head, ITD estimation is unreliable and the estimated ITD is often fast-varying when multiple sources from different directions are simultaneously active. The parametric stereo decoded signal in such a case is often unstable, i.e. sources are perceived with unstable directions.
Fig. 2 shows a typical scenario of this problem. There are two talkers, on the left and right side. The left talker 201 starts speaking first, i.e. his ITD is positive, and then the right talker 205 starts speaking, i.e. his ITD is negative. Between these two talkers there is crosstalk 203, when both talkers speak simultaneously. During the crosstalk, a single ITD cannot represent the channel relation correctly.
If the ITD switches directly from a positive to a negative value, a click is perceived around the switching point. If, however, the ITD is set to zero before switching from a positive to a negative value, the right talker is perceived as moving from the left side to the right side and staying at the right side. Fig. 3 shows an example of an ITD estimation 301 and a standard deviation 303 of the binaural speech shown in figure 2. Around time index 600, a switch occurs from one source of speech to another, with an intermediate overlap. The ITD standard deviation 303 exhibits a peak, indicating an unstable ITD. An unstable ITD indicates that the localization information is blurry. Thus, it is not possible to rely on only one delay parameter to maintain the stereo image.
In a situation with an unstable ITD, e.g. switching between two values corresponding to two sources, the ITD is ambiguous. It can randomly take a value from either one or the other source, depending on frequency and time and according to which source is stronger, i.e. has more energy than the other. In this case a "fuzzy ITD" is used, i.e. the stereo signal is rendered such that it is perceived as being blurred. Using this strategy, direct switching and instability of the ITD are not perceived as such. A parametric stereo encoder using fuzzy ITD can be described by the steps of receiving a stereo signal; estimating inter-channel cues, including estimating the ITD; estimating the ITD as being fuzzy, if the ITD is fluctuating or ambiguous; and packing into a bit-stream the mono downmix plus the inter-channel cues, which include the fuzzy ITDs, if such were estimated or determined.
A parametric stereo decoder using fuzzy ITD can be described by the steps of receiving a bit-stream; obtaining from the bit-stream a mono signal plus inter-channel cues; synthesizing a stereo signal using the inter-channel cues; and synthesizing the stereo signal at the corresponding frequency and time as being blurred, if a fuzzy ITD is detected.

Fuzzy ITDs can be represented in various ways in the bit-stream: in addition to using conventional ITDs, one or more ITD levels can be defined as being fuzzy, and different degrees of fuzziness may be defined; only one parameter may be used, e.g. defining a fuzzy ITD as being zero (ITD=0) but with different fuzziness levels; or two parameters may be used, e.g. one conventional ITD and a degree of fuzziness.
At the decoder, fuzzy ITDs may be synthesized in different ways: the desired ITD is synthesized and the left and right channels are de-correlated depending on the desired degree of fuzziness; the desired ITD and an ICC corresponding to the desired fuzziness are synthesized; an ITD fluctuating over time is synthesized; or an ITD fluctuating over frequency is synthesized.
Fig. 4 shows an embodiment for ITD extraction and synthesis comprising an encoder 401 and a decoder 403.
The parametric multichannel encoder 401 comprises an estimator 405, a downmix means 407, an encoder 409 and a multiplexer 411. The parametric multichannel decoder 403 comprises a demultiplexer 413, a decoder 417, and a de-quantizer 419. The estimator 405 is, according to an implementation form, configured to extract the fuzziness indicator according to the principles described herein.
In the encoder 401 the ITD is extracted from the left channel x1(n) and the right channel x2(n) in the short-time discrete Fourier transform domain (STFT domain). Without loss of generality, the technique can be described as using sampled signals. A time-domain signal is written with lower-case letters, e.g. as x(n), where n is the time index. The sampling frequency, at which a time-domain signal is sampled, is fs Hz. The corresponding short-time discrete Fourier transform (STFT) is denoted X(k,i), where k is the frame index, i.e. the downsampled time index, and i is the frequency index. The discrete Fourier transform (DFT) size is denoted N.
The time-smoothed cross-spectrum is estimated with single-pole averaging

C(k,i) = a \, X_1(k,i) \, X_2^*(k,i) + (1-a) \, C(k-1,i), \quad (1)

where * denotes the complex conjugate and a is a smoothing factor in the range from 0 to 1 which determines the decay of the exponential time smoothing. The delay as a function of time and frequency is given by

d(k,i) = \frac{N}{2\pi i} \, \angle C(k,i), \quad (2)

where \angle C(k,i) is the unwrapped phase of C(k,i). The unwrapping ensures that all appropriate multiples of 2\pi have been included. Finally, the full-band delay is estimated as

d(k) = \frac{1}{i_h - i_l + 1} \sum_{i=i_l}^{i_h} d(k,i), \quad (3)

where i_l and i_h are the STFT bin indices which can be calculated from equation (4),

i_l = \left\lceil \frac{f_l N}{f_s} \right\rceil, \qquad i_h = \left\lfloor \frac{f_h N}{f_s} \right\rfloor, \quad (4)

where [f_l, f_h] Hz is the delay estimation frequency range; l and h are abbreviations for low and high. Using a smoothed version of the cross-spectrum and only a part of the frequency band for ITD estimation can make the delay estimate robust and reliable.
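The estimation chain of equations (1)-(4) can be sketched in a few lines of numpy. This is an illustrative sketch, not the patented implementation: the function name, the one-sided spectrum layout and the convention that a lagging second channel yields a positive delay are assumptions made here.

```python
import numpy as np

def estimate_itd(X1, X2, C_prev, a, fs, N, fl, fh):
    """Full-band delay estimate for one STFT frame, following eqs. (1)-(4).

    X1, X2 : one-sided STFT spectra of the current frame (bins 0..N/2)
    C_prev : smoothed cross-spectrum from the previous frame
    a      : smoothing factor, 0 < a <= 1
    fl, fh : delay-estimation frequency range in Hz
    """
    # Eq. (1): single-pole averaged cross-spectrum
    C = a * X1 * np.conj(X2) + (1.0 - a) * C_prev

    # Eq. (2): per-bin delay from the unwrapped phase of C
    phi = np.unwrap(np.angle(C))
    i = np.arange(len(C))
    d = np.zeros(len(C))
    d[1:] = phi[1:] * N / (2.0 * np.pi * i[1:])  # the DC bin carries no delay information

    # Eq. (4): STFT bin indices spanning [fl, fh] Hz
    il = int(np.ceil(fl * N / fs))
    ih = int(np.floor(fh * N / fs))

    # Eq. (3): full-band delay as the average over the estimation range
    d_k = float(np.mean(d[il:ih + 1]))
    return d_k, d, C
```

Feeding this a spectrum pair whose second channel is a pure d0-sample delay of the first recovers d0 exactly, since the per-bin delays are then constant over frequency.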
At the parametric stereo decoder, the delay d(k) can be synthesized by modifying the left and right spectra,

\hat{X}_1(k,i) = e^{\,j \frac{2\pi i \, d(k)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = e^{\,-j \frac{2\pi i \, d(k)}{2N}} X_2(k,i). \quad (5)

Note that equation (5) gives a delay of d(k)/2 to the left channel and a delay of -d(k)/2 to the right channel. Alternatively, it is also possible to leave one channel unmodified and give the full delay to the other channel.

In the following, fuzziness indicator extraction will be described.
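Before that, the delay synthesis of equation (5) can be sketched as follows. The sign convention (which channel gets the positive half-delay) is an assumption; only the relative phase between the two outputs matters.

```python
import numpy as np

def synthesize_itd(X1, X2, d_k, N):
    """Eq. (5): give opposite half-delays d_k/2 and -d_k/2 to the two channels
    as per-bin phase shifts in the STFT domain."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_k / (2.0 * N))
    return shift * X1, np.conj(shift) * X2
```

Applying this to two copies of a decoded mono spectrum yields a pair whose inter-channel phase difference corresponds to a relative delay of d_k samples.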
In complex audio signals a single delay often cannot represent the relation between the audio channels precisely. The standard deviation of the delay in samples,

\sigma_d(k) = \sqrt{ \frac{1}{i_h - i_l + 1} \sum_{i=i_l}^{i_h} \left( d(k,i) - d(k) \right)^2 }, \quad (6)

can be used as a measure of how precisely a single delay describes the channel relations or, equivalently, of how much the delay varies as a function of frequency. If \sigma_d(k) is higher than a threshold, the fuzziness indicator can be set to 1; else the indicator is set to 0.
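The fuzziness decision of equation (6) reduces to a few lines; the threshold value is application-dependent and the names are illustrative:

```python
import numpy as np

def itd_fuzziness(d, d_k, il, ih, threshold):
    """Eq. (6): standard deviation of the per-bin delays d over [il, ih] around the
    full-band delay d_k; the indicator is 1 (fuzzy) when it exceeds the threshold."""
    sigma = float(np.sqrt(np.mean((d[il:ih + 1] - d_k) ** 2)))
    return (1 if sigma > threshold else 0), sigma
```

A delay that is constant over frequency gives sigma = 0 (reliable), while a delay toggling between two source delays gives a large sigma (fuzzy).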
Cross-correlation can also be used to extract the fuzziness indicator. If the maximum cross-correlation of the two channels is lower than a threshold, the fuzziness indicator can be set to 1; else it is set to 0.
In the following a fuzzy time-delay synthesis will be described.
Application of a single full-band delay, as done with equation (5), can still be improved. If a binaural signal has two sources on the left and right side, the delay between the audio channels toggles between the delays related to the directions of the two sources, as long as only one source is active at a time. However, if both sources are simultaneously active, a single delay cannot represent the channel relation. Localization blur increases when sources are simultaneously active. A single delay in this case will result in the perception of a non-blurred object at a specific direction. To improve single-delay parametric stereo synthesis, an additional localization blur parameter can be used.
A way to synthesize a blurred source is to modulate the estimated delay over frequency, e.g.

d(k,i) = d(k) + \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right), \quad (7)

where \beta is the modulation amplitude in samples and \gamma determines how many periods of sinusoidal modulation are contained up to the Nyquist frequency.
Given d(k,i), blurred time delay synthesis is carried out with

\hat{X}_1(k,i) = e^{\,j \frac{2\pi i \, d(k,i)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = e^{\,-j \frac{2\pi i \, d(k,i)}{2N}} X_2(k,i). \quad (8)
In order to prevent fuzziness from switching on and off too quickly, the fuzziness indicator is time-smoothed at the decoder. If the time-smoothed fuzziness indicator, in a range from 0 to 1, is denoted f(k), then a blurred delay adapting to the degree of fuzziness can be computed as

d(k,i) = d(k) + f(k) \, \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right). \quad (9)
Alternatively, d(k) can be decreased or set to zero when a certain degree of fuzziness is present.
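Equations (7)-(9) can be sketched together. The parameter values used below for β and γ are arbitrary illustrations, not values from the text:

```python
import numpy as np

def blurred_delay(d_k, f_k, beta, gamma, N, nbins):
    """Eq. (9): delay modulated sinusoidally over frequency, scaled by the
    time-smoothed fuzziness f_k in [0, 1]. With f_k = 0 this reduces to d_k."""
    i = np.arange(nbins)
    return d_k + f_k * beta * np.sin(4.0 * np.pi * i * gamma / N)

def blurred_synthesis(X1, X2, d_ki, N):
    """Eq. (8): apply the per-bin blurred delay as opposite phase shifts."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_ki / (2.0 * N))
    return shift * X1, np.conj(shift) * X2
```

Because the modification is a pure phase shift per bin, magnitudes are preserved; only the perceived localization is blurred.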
In contrast to using delay modulation, it is also possible to apply different all-pass filters to the left and right channels to obtain a blurred spatial image:

\hat{X}_1(k,i) = \left( f(k) A_1(i) + 1 - f(k) \right) e^{\,j \frac{2\pi i \, d(k,i)}{2N}} X_1(k,i), \qquad \hat{X}_2(k,i) = \left( f(k) A_2(i) + 1 - f(k) \right) e^{\,-j \frac{2\pi i \, d(k,i)}{2N}} X_2(k,i), \quad (10)

where A_1(i) and A_2(i) represent all-pass filters. More generally speaking, de-correlators can be used instead of all-pass filters.
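A sketch of the all-pass variant of equation (10). The random-phase "all-pass" used here is a toy decorrelator chosen purely for illustration; a real implementation would use properly designed all-pass or de-correlation filters:

```python
import numpy as np

def allpass_blur(X1, X2, d_ki, f_k, A1, A2, N):
    """Eq. (10): crossfade, controlled by f_k, between the direct path and an
    all-pass (decorrelating) path, on top of the opposite phase shifts."""
    i = np.arange(len(X1))
    shift = np.exp(1j * 2.0 * np.pi * i * d_ki / (2.0 * N))
    Y1 = (f_k * A1 + (1.0 - f_k)) * shift * X1
    Y2 = (f_k * A2 + (1.0 - f_k)) * np.conj(shift) * X2
    return Y1, Y2

def random_phase_allpass(nbins, seed):
    """Toy decorrelator: unit magnitude, random phase per bin."""
    rng = np.random.default_rng(seed)
    return np.exp(1j * rng.uniform(-np.pi, np.pi, nbins))
```

With f(k) = 0 the all-pass path is fully bypassed and equation (10) degenerates to equation (8); with f(k) = 1 the output keeps unit magnitude per bin but has decorrelated phases.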
The parametric stereo encoder estimates the time delay d(k), e.g. as aforementioned. Additionally, a fuzziness indicator f(k) is determined. For example, an indicator could be used which is either 0 or 1, where 1 indicates that the delay is fuzzy. The delay and the fuzziness indicator are transmitted as parametric stereo parameters to the decoder, usually in combination with other parameters, such as level differences. If f(k)=0 (reliable ITD), then the parametric stereo decoder uses equation (5) for delay synthesis. Else, if f(k)=1 (non-reliable ITD), then the parametric stereo decoder uses equation (8) for delay synthesis. Alternatively, a fuzziness indicator could be used which has different values. In this case, the parametric decoder has the capability of synthesizing delays with different degrees of fuzziness. For example, the degree of fuzziness could be varied by varying β in equation (9). The higher the degree of fuzziness, the more unstable the source is perceived.

Fig. 5 shows a diagram of fuzzy ITD synthesis. The fuzzy ITD synthesis implementation includes a parametric multichannel encoder 501 and a parametric multichannel decoder 503. The parametric multichannel encoder 501 comprises an estimator 505, a downmix means 507, an encoder 509, a fuzziness indicator extraction means 523 and a multiplexer 511. The parametric multichannel decoder 503 comprises a demultiplexer 513, a fuzziness indicator 515, a decoder 517 and a de-quantizer 519.
In the following, a multiple-channel implementation will be described.
ITDs are extracted from the multi-channel signal by using the following equation:

d_j = \arg\max_{d} \, IC_j(d), \quad (11)

with IC_j(d) being the normalized cross-correlation defined as

IC_j(d) = \frac{\sum_n x_{ref}(n) \, x_j(n+d)}{\sqrt{ \sum_n x_{ref}^2(n) \sum_n x_j^2(n+d) }}, \quad (12)

wherein x_{ref} represents the reference signal and x_j represents the channel signal j. The reference signal x_{ref} can be chosen as one of the channels x_j (for j in [1, M]), and then M-1 spatial cues are calculated in the decoder. The reference signal x_{ref} can also be a mono downmix signal, which is the average of all M channels, and then M spatial cues can be calculated in the decoder.
The advantage of using a downmix signal as the reference for a multichannel audio signal is that it avoids using a silent signal as the reference signal. Indeed, the downmix represents an average of the energy of all the channels and is hence less likely to be silent.
If the maximum of IC_j(d) is lower than a given threshold, the fuzziness indicator of channel j can be set to 1; else it is set to 0. This fuzziness indicator needs to be transmitted to the decoder.
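The multichannel cue extraction of equations (11) and (12), together with the per-channel fuzziness decision, can be sketched as a brute-force search over integer lags. The search range and threshold below are assumptions:

```python
import numpy as np

def normalized_xcorr(x_ref, x_j, d):
    """Eq. (12): normalized cross-correlation between the reference and channel j
    at integer lag d, computed over the overlapping samples."""
    n = len(x_ref)
    if d >= 0:
        a, b = x_ref[:n - d], x_j[d:]
    else:
        a, b = x_ref[-d:], x_j[:n + d]
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def channel_itd(x_ref, x_j, max_lag, threshold):
    """Eq. (11)-style search: the lag maximizing IC_j is the channel ITD;
    a low maximum flags the estimate as fuzzy (indicator = 1)."""
    lags = list(range(-max_lag, max_lag + 1))
    ics = [normalized_xcorr(x_ref, x_j, d) for d in lags]
    best = int(np.argmax(ics))
    fuzzy = 1 if ics[best] < threshold else 0
    return lags[best], fuzzy
```

A delayed copy of the reference produces a sharp correlation peak at the true lag (reliable), whereas an unrelated signal never exceeds the threshold (fuzzy).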
In the decoder, a blurred source is synthesized in the same way. The estimated delay is modulated over frequency, e.g.

d_j(k,i) = d_j(k) + \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right), \quad (13)

where \beta is the modulation amplitude in samples, \gamma determines how many periods of sinusoidal modulation are contained up to the Nyquist frequency, and j is the channel index.
Given d_j(k,i), blurred time delay synthesis is carried out with

\hat{X}_{ref}(k,i) = X_{ref}(k,i), \quad (14)

\hat{X}_j(k,i) = e^{\,-j \frac{2\pi i \, d_j(k,i)}{N}} X_j(k,i). \quad (15)
To prevent fuzziness from switching on and off too quickly, the fuzziness indicator is time-smoothed at the decoder. If the time-smoothed fuzziness indicator, in a range from 0 to 1, is denoted f_j(k), then a blurred delay adapting to the degree of fuzziness can be computed as

d_j(k,i) = d_j(k) + f_j(k) \, \beta \sin\!\left( \frac{4\pi i}{N} \gamma \right). \quad (16)
Alternatively, d_j(k) can be decreased or set to zero when a certain degree of fuzziness is present.
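The per-channel synthesis of equations (13)-(16) can be sketched as follows. Unlike the stereo case, the reference spectrum is assumed to stay unchanged while each channel j receives its full (possibly blurred) delay; the sign convention is again an assumption:

```python
import numpy as np

def multichannel_fuzzy_synthesis(X_ref, channels, delays, fuzziness, beta, gamma, N):
    """Eqs. (13)-(16): the reference is kept as-is (eq. 14) and each channel j gets
    its full blurred delay d_j(k,i) as a per-bin phase shift (eq. 15)."""
    i = np.arange(len(X_ref))
    outs = []
    for X_j, d_j, f_j in zip(channels, delays, fuzziness):
        d_ji = d_j + f_j * beta * np.sin(4.0 * np.pi * i * gamma / N)  # eq. (16)
        outs.append(np.exp(-1j * 2.0 * np.pi * i * d_ji / N) * X_j)   # eq. (15)
    return X_ref, outs
```

With f_j(k) = 0 and d_j(k) = 0 each channel passes through unchanged; a non-zero d_j appears as a linear phase ramp relative to the untouched reference.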
In contrast to using delay modulation, different all-pass filters can be applied to all channels to obtain a blurred spatial image:

\hat{X}_{ref}(k,i) = X_{ref}(k,i), \qquad \hat{X}_j(k,i) = \left( f_j(k) A_j(i) + 1 - f_j(k) \right) e^{\,-j \frac{2\pi i \, d_j(k,i)}{N}} X_j(k,i), \quad (17)

where A_j(i) represents the applied all-pass filter. More generally speaking, de-correlators can be used instead of all-pass filters.
The parametric multichannel decoder 403, 503 can be configured to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the estimated interaural time difference if the fuzziness indicator indicates a first grade of non-reliability of the estimate of the interaural time difference, in particular a non-reliability of the estimate of the interaural time difference that is smaller than a second grade of non-reliability of the estimate of the interaural time difference, in particular a reliability of the estimate of the interaural time difference.
The parametric multichannel decoder 403, 503 can be further configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a second grade of non-reliability of the estimate of the interaural time difference, in particular a non-reliability of the estimate of the interaural time difference that is greater than a first grade of non-reliability of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal on the basis of the modulated estimate of the interaural time difference.

Claims

CLAIMS:
1. A parametric multichannel encoder (401, 501) for encoding a multichannel signal comprising a first channel signal and a second channel signal, the parametric multichannel encoder (401, 501) comprising: an estimator (405, 505) for estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference, the estimator (405, 505) being further configured to determine a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; a downmix signal generator (407, 507) for generating a downmix signal from the first channel signal and the second channel signal; and a multiplexer (411, 511) for multiplexing the downmix signal, the interaural time difference, and the fuzziness indicator to obtain an encoded signal.
2. The parametric multichannel encoder (401 , 501 ) of claim 1 , wherein the estimator (405, 505) is configured to determine a delay between the first channel signal and the second channel signal for estimating the interaural time difference.
3. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) comprises a Fourier Transformer for transforming the first channel signal and the second channel signal into the frequency domain to obtain a first transformed channel signal and a second transformed channel signal, and wherein the estimator (405, 505) is configured to estimate a phase difference between the first and the second transformed channel signals, the phase difference indicating the interaural time difference.
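The frequency-domain estimation of claim 3 can be illustrated with a short sketch: the phase of the cross-spectrum between the two transformed channel signals encodes the inter-channel delay. This is not the claimed implementation — a codec operates on windowed subband frames rather than one full-length DFT, and the energy weighting used here is an assumption.

```python
import numpy as np

def estimate_itd(x_left, x_right, fs):
    """Estimate the inter-channel time difference (in seconds) from the
    phase difference between the transformed channel signals."""
    X = np.fft.rfft(x_left)
    Y = np.fft.rfft(x_right)
    cross = X * np.conj(Y)            # cross-spectrum phase = inter-channel phase difference
    freqs = np.fft.rfftfreq(len(x_left), d=1.0 / fs)
    w = np.abs(cross[1:])             # weight each bin by its energy (DC skipped)
    phi = np.angle(cross[1:])         # wrapped phase difference per bin
    omega = 2.0 * np.pi * freqs[1:]
    # Least-squares fit of phi ~ omega * itd (linear-phase model of a pure delay).
    return float(np.sum(w * phi * omega) / np.sum(w * omega ** 2))
```

The fit is only valid while the phase differences stay unwrapped, i.e. for delays small relative to the signal's shortest period — one reason a reliability (fuzziness) indicator is useful.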
4. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a standard deviation of the delays between the first channel signal and the second channel signal in different frequency bands to determine the fuzziness indicator.
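A minimal sketch of the criterion in claim 4: when the delays estimated in different frequency bands scatter widely, the global ITD estimate is flagged as non-reliable. The binary mapping and the threshold value are assumptions for illustration.

```python
import numpy as np

def fuzziness_from_band_delays(band_delays, threshold):
    """Map the spread of per-band delay estimates to a binary fuzziness
    indicator: widely scattered delays mean the global ITD is unreliable."""
    spread = float(np.std(band_delays))
    return 1 if spread > threshold else 0   # 1 = non-reliable, 0 = reliable
```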
5. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a first value of the fuzziness indicator or a second value of the fuzziness indicator, the first value indicating that the interaural time difference is non-reliable, and the second value indicating that the interaural time difference is reliable.
6. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine one of a plurality of values of the fuzziness indicator, each value being associated with a different grade of non-reliability of the estimate of the interaural time difference.
7. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to determine a cross-correlation between the first channel signal and the second channel signal to estimate the interaural time difference.
8. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the grade of non-reliability of the estimate of the interaural time difference is determined by a cross-correlation between the first channel signal and the second channel signal.
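The cross-correlation approach of claims 7 and 8 can be sketched together: the lag of the correlation maximum gives the delay, and the height of the normalized peak serves as a reliability cue. This is illustrative only; the search range and normalization are assumptions.

```python
import numpy as np

def itd_by_crosscorrelation(x_left, x_right, max_lag):
    """Estimate the inter-channel delay (in samples) as the lag maximizing the
    cross-correlation; the normalized peak height doubles as a reliability cue."""
    lags = np.arange(-max_lag, max_lag + 1)
    # Circular correlation via np.roll keeps the sketch short; a codec would
    # use windowed frames instead.
    corr = [float(np.dot(x_left, np.roll(x_right, -lag))) for lag in lags]
    best = int(lags[int(np.argmax(corr))])   # positive if the right channel lags the left
    peak = max(corr) / (np.linalg.norm(x_left) * np.linalg.norm(x_right) + 1e-12)
    return best, peak
```

A peak near 1 means the channels are essentially delayed copies of each other (reliable ITD); a low peak indicates diffuse or multi-source content (non-reliable ITD).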
9. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the downmix signal generator (407, 507) is configured to combine the first channel signal and the second channel signal to obtain the downmix signal.
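The simplest combination meeting claim 9 is a scaled sum of the two channels; the 1/sqrt(2) gain is an assumption (it roughly preserves the energy of uncorrelated channels), and real encoders may use adaptive weights.

```python
import numpy as np

def downmix(x_left, x_right):
    """Combine the two channel signals into a single downmix signal."""
    # 1/sqrt(2) keeps the energy of uncorrelated channels roughly constant.
    return (np.asarray(x_left, dtype=float) + np.asarray(x_right, dtype=float)) / np.sqrt(2.0)
```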
10. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the estimator (405, 505) is configured to quantize the interaural time difference to obtain a quantized interaural time difference, and wherein the multiplexer (411, 511) is configured to include the quantized interaural time difference in the encoded signal.
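A uniform quantizer in the spirit of claim 10 could look as follows; the 0.1 ms step and the ±1 ms clipping range are assumptions, chosen to roughly match the physical range of interaural delays.

```python
def quantize_itd(itd_seconds, step=1e-4, max_abs=1e-3):
    """Clip the ITD estimate to a plausible range and uniformly quantize it
    to an integer codeword suitable for multiplexing into the encoded signal."""
    clipped = max(-max_abs, min(max_abs, itd_seconds))
    index = round(clipped / step)        # integer codeword to transmit
    return index, index * step           # (codeword, dequantized value)
```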
11. The parametric multichannel encoder (401, 501) of any of the preceding claims, wherein the first channel signal is a superposition of audio signal components originating from a first audio signal source and a second audio signal source from different directions.
12. A parametric multichannel decoder (403, 503) for decoding a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between a first channel signal and a second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the parametric multichannel decoder (403, 503) comprising: a demultiplexer (413, 513) for demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and a synthesizer (421, 521) for synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
13. The parametric multichannel decoder (403, 503) of claim 12, wherein the demultiplexer (513) is configured to extract a first portion of the received signal to obtain the downmix signal, to extract a second portion of the received signal to obtain the estimate of the interaural time difference, and to extract a third portion of the received signal to obtain the fuzziness indicator.
14. The parametric multichannel decoder (403, 503) of claim 12 or 13, wherein the parametric multichannel decoder (403, 503) is configured to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a reliability of the estimate of the interaural time difference.
15. The parametric multichannel decoder (403, 503) of claim 12, 13 or 14, wherein the parametric multichannel decoder (403, 503) is configured to modulate the estimate of the interaural time difference if the fuzziness indicator indicates a grade corresponding to a non-reliability of the estimate of the interaural time difference, and to amend phases of the first decoded channel signal and the second decoded channel signal upon the basis of the modulated estimate of the interaural time difference.
16. An encoding method for parametric multichannel encoding of a multichannel signal comprising a first channel signal and a second channel signal, the encoding method comprising: estimating an interaural time difference between the first channel signal and the second channel signal to obtain an estimate of the interaural time difference; determining a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference; generating a downmix signal from the first channel signal and the second channel signal; and multiplexing the downmix signal, the interaural time difference and the fuzziness indicator to obtain an encoded signal.
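The steps of the encoding method of claim 16 can be sketched end-to-end. The dict stands in for the multiplexed bitstream, and the lag range, correlation threshold, and downmix gain are all assumptions for illustration.

```python
import numpy as np

def encode_frame(x_left, x_right):
    """Estimate ITD and fuzziness, form the downmix, and bundle the three
    parts of the encoded signal (a dict stands in for the multiplexed bitstream)."""
    lags = np.arange(-8, 9)
    corr = [float(np.dot(x_left, np.roll(x_right, -lag))) for lag in lags]
    itd = int(lags[int(np.argmax(corr))])                       # delay in samples
    peak = max(corr) / (np.linalg.norm(x_left) * np.linalg.norm(x_right) + 1e-12)
    fuzziness = 0 if peak > 0.7 else 1                          # threshold is an assumption
    dmx = (x_left + x_right) / np.sqrt(2.0)
    return {"downmix": dmx, "itd": itd, "fuzziness": fuzziness}
```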
17. A decoding method for parametric multichannel decoding of a received signal to obtain a multichannel audio signal with a first decoded channel signal and a second decoded channel signal, the received signal comprising a downmix signal, an estimate of an interaural time difference between a first channel signal and a second channel signal, and a fuzziness indicator, the fuzziness indicator indicating a grade of non-reliability of the estimate of the interaural time difference, the decoding method comprising: demultiplexing the received signal to provide the downmix signal, the estimate of the interaural time difference and the fuzziness indicator; and synthesizing the first decoded channel signal and the second decoded channel signal of the multichannel audio signal using the downmix signal, the estimate of the interaural time difference and the fuzziness indicator.
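The decoding method of claim 17 can be sketched in the same illustrative style. The dict layout (`downmix`, `itd`, `fuzziness` keys), the circular sample shift, and the convention that fuzziness 0 means reliable are all assumptions, not taken from the patent.

```python
import numpy as np

def decode_frame(encoded):
    """Unbundle the encoded parts and synthesize two channels from the
    downmix, applying the ITD only when the fuzziness indicator marks it reliable."""
    dmx = encoded["downmix"]
    itd = encoded["itd"] if encoded["fuzziness"] == 0 else 0
    scale = 1.0 / np.sqrt(2.0)            # undo the downmix gain
    left = scale * dmx
    right = scale * np.roll(dmx, itd)     # delay the right channel by the ITD
    return left, right
```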
18. A computer program for performing the method of claim 16 or 17 when run on a computer.
PCT/CN2011/079051 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder WO2013029225A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder
CN201180068689.6A CN103403801B (en) 2011-08-29 2011-08-29 Parametric multi-channel encoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder

Publications (1)

Publication Number Publication Date
WO2013029225A1 (en) 2013-03-07

Family

ID=47755184

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079051 WO2013029225A1 (en) 2011-08-29 2011-08-29 Parametric multichannel encoder and decoder

Country Status (2)

Country Link
CN (1) CN103403801B (en)
WO (1) WO2013029225A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107742521A (en) * 2016-08-10 2018-02-27 华为技术有限公司 The coding method of multi-channel signal and encoder
WO2022262960A1 (en) * 2021-06-15 2022-12-22 Telefonaktiebolaget Lm Ericsson (Publ) Improved stability of inter-channel time difference (itd) estimator for coincident stereo capture

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN103916217B (en) * 2014-03-25 2017-06-13 烽火通信科技股份有限公司 The implementation method and device of XLGMII interface multichannel frequency reducing DIC mechanism

Citations (3)

Publication number Priority date Publication date Assignee Title
EP1600791A1 (en) * 2004-05-26 2005-11-30 Honda Research Institute Europe GmbH Sound source localization based on binaural signals
WO2009042386A1 (en) * 2007-09-25 2009-04-02 Motorola, Inc. Apparatus and method for encoding a multi channel audio signal
CN101408615A (en) * 2008-11-26 2009-04-15 武汉大学 Method and device for measuring binaural sound time difference ILD critical apperceive characteristic

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
EP2323130A1 (en) * 2009-11-12 2011-05-18 Koninklijke Philips Electronics N.V. Parametric encoding and decoding

Non-Patent Citations (1)

Title
FALLER, CHRISTOF ET AL.: "Efficient Representation of Spatial Audio Using Perceptual Parametrization", IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS 2001, 21 October 2001 (2001-10-21), pages 199 - 202 *

Cited By (15)

Publication number Priority date Publication date Assignee Title
KR20210093384A (en) * 2016-08-10 2021-07-27 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
US11217257B2 (en) 2016-08-10 2022-01-04 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
EP3486904A4 (en) * 2016-08-10 2019-06-19 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
RU2718231C1 (en) * 2016-08-10 2020-03-31 Хуавэй Текнолоджиз Ко., Лтд. Method for encoding multichannel signal and encoder
US10643625B2 (en) 2016-08-10 2020-05-05 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
KR102281668B1 (en) 2016-08-10 2021-07-23 후아웨이 테크놀러지 컴퍼니 리미티드 Multi-channel signal encoding method and encoder
KR20190030735A (en) * 2016-08-10 2019-03-22 후아웨이 테크놀러지 컴퍼니 리미티드 Multichannel signal encoding method and encoder
CN107742521B (en) * 2016-08-10 2021-08-13 华为技术有限公司 Coding method and coder for multi-channel signal
CN107742521A (en) * 2016-08-10 2018-02-27 华为技术有限公司 The coding method of multi-channel signal and encoder
KR102464300B1 (en) 2016-08-10 2022-11-04 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
KR20220151043A (en) * 2016-08-10 2022-11-11 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
KR102617415B1 (en) 2016-08-10 2023-12-21 후아웨이 테크놀러지 컴퍼니 리미티드 Method for encoding multi-channel signal and encoder
EP4131260A1 (en) * 2016-08-10 2023-02-08 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
US11756557B2 (en) 2016-08-10 2023-09-12 Huawei Technologies Co., Ltd. Method for encoding multi-channel signal and encoder
WO2022262960A1 (en) * 2021-06-15 2022-12-22 Telefonaktiebolaget Lm Ericsson (Publ) Improved stability of inter-channel time difference (itd) estimator for coincident stereo capture

Also Published As

Publication number Publication date
CN103403801A (en) 2013-11-20
CN103403801B (en) 2015-11-25

Similar Documents

Publication Publication Date Title
RU2705007C1 (en) Device and method for encoding or decoding a multichannel signal using frame control synchronization
JP7091411B2 (en) Multi-channel signal coding method and encoder
US9449603B2 (en) Multi-channel audio encoder and method for encoding a multi-channel audio signal
US8831759B2 (en) Audio coding
JP5290956B2 (en) Audio signal correlation separator, multi-channel audio signal processor, audio signal processor, method and computer program for deriving output audio signal from input audio signal
EP2702776B1 (en) Parametric encoder for encoding a multi-channel audio signal
US9009057B2 (en) Audio encoding and decoding to generate binaural virtual spatial signals
JP5947971B2 (en) Method for determining coding parameters of a multi-channel audio signal and multi-channel audio encoder
EP2633520B1 (en) Parametric encoder for encoding a multi-channel audio signal
KR101662682B1 (en) Method for inter-channel difference estimation and spatial audio coding device
KR20050021484A (en) Audio coding
WO2006108543A1 (en) Temporal envelope shaping of decorrelated signal
CN108369810B (en) Adaptive channel reduction processing for encoding multi-channel audio signals
WO2010097748A1 (en) Parametric stereo encoding and decoding
EP2730102B1 (en) Method and apparatus for decomposing a stereo recording using frequency-domain processing employing a spectral weights generator
CN101421779A (en) Apparatus and method for production of a surrounding-area signal
JP2015528926A (en) Generalized spatial audio object coding parametric concept decoder and method for downmix / upmix multichannel applications
EP2984857A1 (en) Apparatus and method for center signal scaling and stereophonic enhancement based on a signal-to-downmix ratio
WO2013029225A1 (en) Parametric multichannel encoder and decoder
JP2017058696A (en) Inter-channel difference estimation method and space audio encoder
Vilkamo Perceptually motivated time-frequency processing of spatial audio

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11871767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11871767

Country of ref document: EP

Kind code of ref document: A1