WO2010075895A1 - Parametric audio coding - Google Patents

Parametric audio coding

Info

Publication number
WO2010075895A1
WO2010075895A1 (PCT/EP2008/068371)
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
signal frame
channel audio
reconstructed
frequency subband
Prior art date
Application number
PCT/EP2008/068371
Other languages
French (fr)
Inventor
Juha Petteri OJANPERÄ
Original Assignee
Nokia Corporation
Priority date
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/EP2008/068371 priority Critical patent/WO2010075895A1/en
Publication of WO2010075895A1 publication Critical patent/WO2010075895A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • This invention relates to parametric audio coding.
  • Channel capacity for transmitting an audio signal or storage capacity for storing an audio signal is often limited.
  • Audio coding is applied to reduce the required bitrate. It has been the industry's constant goal to develop audio coding techniques enabling high quality audio signal reconstruction while reducing the required bitrate to a minimum. This is especially true for multichannel audio coding where, compared to single channel audio signals, an even larger amount of information has to be dealt with.
  • a first apparatus comprising a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • a second apparatus comprising means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the first apparatus as well as the second apparatus according to the first aspect of the present invention can be a module that forms part of or is to form part of another apparatus (such as for instance an audio encoder or decoder), for instance a processor, or it can be a separate apparatus.
  • a method comprising determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • a program comprising program code for performing the method according to the second aspect of the present invention, when the program is executed on a processor.
  • the program may for instance be a computer program that is readable by a computer or processor.
  • the program code may then for instance be computer program code.
  • the program may for instance be distributed via a network, such as for instance the Internet.
  • the program may for instance be stored on a tangible readable medium, for instance a computer-readable or processor-readable medium.
  • the readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
  • a readable storage medium encoded with instructions that, when executed by a processor, perform the method according to the second aspect of the present invention is disclosed.
  • the readable storage medium may for instance be a computer-readable or processor-readable storage medium. It may be embodied as an electric, magnetic, electromagnetic, optic or other storage medium, and may either be a removable storage medium or a storage medium that is fixedly installed in an apparatus or device.
  • a program which causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the program may for instance be a computer program that is readable and/or executable by a computer or processor.
  • the program may for instance be a computer program with computer program code.
  • the program may be stored on a tangible readable medium, for instance a computer- readable or processor-readable medium.
  • the readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
  • auditory spatial image may be used to refer to the listener's spatial perception of components of an audio signal.
  • Human listeners are capable of perceiving an auditory spatial image by interpreting differences of the signals received by the left and by the right ear. Among these differences are the interaural level difference (ILD) and the interaural time difference (ITD). Diffraction and reflection may be caused by at least some of the listener's body parts, thus affecting the audio signal on its way from a sound source to the listener's eardrum.
  • a head related transfer function (HRTF) describes the transform an audio signal may be subject to due to these diffraction and reflection phenomena. It is direction-dependent. Hence, being accustomed to the listener's individual HRTF, the listener's brain can derive directional cues from the interaural signal difference caused by the HRTF.
  • An exemplary multichannel audio signal is a stereo audio signal.
  • a stereo audio signal comprises a first channel audio signal and a second channel audio signal.
  • the first channel and the second channel are often referred to as a left channel and a right channel, respectively.
  • a stereo audio signal is usually intended to be reproduced via a set of speakers, wherein for each channel there is at least one speaker provided. In accordance with the naming of the channels, these two speakers can be called a left speaker and a right speaker, respectively.
  • the left and the right speaker can be arranged with a certain offset to one another.
  • a user of the audio signal reproduction system comprising said left and said right speaker, who may also be referred to as a listener or an auditor, may then localize whether a certain component of a replayed audio signal is provided through the left or through the right speaker.
  • the audio signal may also be provided through both speakers, possibly with a different sound level for each speaker, so that the listener does not only perceive said signal component exclusively on one side.
  • the listener can perceive an auditory spatial image.
  • the auditory spatial image the listener perceives may also be called a stereo image.
  • the audio signal provided to the speakers or headphones may, among other parameters, exhibit a certain inter-channel level difference (ICLD), a certain inter-channel time difference (ICTD) and a certain inter-channel correlation (ICC).
  • the ICLD and the ICTD may correspond to the ILD and to the ITD, respectively.
  • the ICLD and the ICTD may only partially determine the ILD and the ITD, respectively.
  • a set of four loudspeakers for example a front right loudspeaker, a front left loudspeaker, a rear right loudspeaker and a rear left loudspeaker, may be provided for audio signal reproduction.
  • Five channel audio signal reproduction systems often comprise a fifth loudspeaker arranged in between the front right loudspeaker and the front left loudspeaker.
  • an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be formed by selecting two channels of a plurality of provided channels. For instance, if five channels are provided, the left and right front channels may form one channel pair and the left and right rear channels form another channel pair.
  • front left and rear left channels form a channel pair and the front right and rear right channels form another channel pair.
  • exemplary embodiments of the present invention are explained with respect to a scenario in which only two channels are provided. It is to be understood that, as stated above, exemplary embodiments according to all aspects of the present invention may be applied in scenarios in which more than two channels are provided by forming at least one channel pair, each channel pair comprising two selected channels.
  • the auditory spatial image a listener will perceive if provided with the decoded audio signal that has been obtained by decoding the previously encoded audio signal may be referred to as a reconstructed auditory spatial image.
  • a reconstructed auditory spatial image that closely resembles the auditory spatial image of the original (unmodified and uncoded) audio signal.
  • Parametric audio coding may be thought of as generating a downmix signal from a multichannel audio signal and providing spatial extension information that allows reconstruction of the multichannel audio signal from the downmix signal.
  • a downmix signal is generated from a multichannel audio signal in such a way that M input channels are used to generate N downmix channels, with N ⁇ M.
  • the audio signal may be divided into a sequence of frames, each frame comprising a limited number of samples of the audio signal.
  • an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be encoded as a single mono audio signal frame and spatial extension information.
  • the parametrically encoded audio signal represents the audio signal frame pair.
  • the spatial extension information may contain parameters such as ICLDs, ICTDs and ICCs, as well as other parameters.
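  • As a minimal illustration only (not the claimed encoder; the names L_f, R_f, the averaging downmix and the band list are assumptions of the example), the following Python sketch derives a mono downmix together with per-subband ICLD and ICC values from one stereo frame:

```python
import numpy as np

def downmix_and_cues(L_f, R_f, bands):
    """Toy parametric encoding step: mono downmix plus per-subband
    inter-channel level difference (ICLD, in dB) and correlation (ICC).
    L_f, R_f: complex spectra of one frame; bands: list of (lo, hi) bin ranges."""
    M_f = 0.5 * (L_f + R_f)  # simple averaging downmix (assumption)
    icld, icc = [], []
    for lo, hi in bands:
        l, r = L_f[lo:hi], R_f[lo:hi]
        e_l, e_r = np.sum(np.abs(l) ** 2), np.sum(np.abs(r) ** 2)
        icld.append(10.0 * np.log10((e_l + 1e-12) / (e_r + 1e-12)))
        icc.append(float(np.abs(np.vdot(l, r)) / np.sqrt(e_l * e_r + 1e-12)))
    return M_f, np.array(icld), np.array(icc)
```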
  • Any parametric audio coding technology may be used in the scope of the present invention.
  • Exemplary parametric audio coding technologies comprise Binaural Cue Coding (BCC) and Spatial Audio Coding (SAC).
  • An advantage of parametric audio coding is that bitrate requirements can often be reduced significantly. Instead of separately encoding a full set of input channel signals, a smaller number of signals and the spatial extension information have to be provided for transmission or storage. For example instead of two audio channel signals only a mono signal, i.e. a single channel signal, and the spatial extension information has to be provided for transmission or storage.
  • the bitrate required for the spatial extension information is usually smaller than the bitrate required for transmitting another mono audio signal.
  • Any pertinent mono codec may be used for mono signal coding, for example Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) or the International Telecommunication Union (ITU) G.718 mono codec.
  • Decoding a parametrically encoded audio signal may involve decoding the mono signal. Exploiting the spatial extension information, the mono signal can serve as a basis for reconstructing a first channel audio signal and a second channel audio signal.
  • the reconstructed first channel audio signal frame may be a representation of the first channel audio signal frame
  • the reconstructed second channel audio signal frame may be a representation of the second channel audio signal frame.
  • the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • An audio signal frame may be divided into a plurality of frequency subbands.
  • a signal is represented by a set of frequency components.
  • a frequency subband of the signal may then be thought of as comprising each frequency component falling within an upper frequency bound and a lower frequency bound limiting the respective frequency subband.
  • the frequency subbands, i.e. their bounds, may be identical for the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. Moreover, they may also be identical to the frequency subbands of the audio signal frame pair.
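  • Purely as an illustration of such shared subband bounds (the helper name and the example bounds in Hz are assumptions, not taken from the application), frequency-bin ranges common to both reconstructed channel frames could be derived as follows:

```python
def bin_ranges(band_bounds_hz, n_bins, sample_rate):
    """Map subband bounds in Hz to frequency-bin index ranges; the bins are
    assumed to cover 0 Hz up to the Nyquist frequency."""
    hz_per_bin = sample_rate / (2.0 * n_bins)
    edges = [min(int(round(b / hz_per_bin)), n_bins) for b in band_bounds_hz]
    return list(zip(edges[:-1], edges[1:]))

# Example: three subbands, identical for both channel frames.
print(bin_ranges([0, 500, 2000, 8000], n_bins=512, sample_rate=16000))
```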
  • Exemplary embodiments according to all aspects of the present invention involve determination whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • One possibility to apply a gain to at least one frequency subband of a reconstructed audio signal frame is to multiply each frequency component of the respective frequency subband by a gain factor.
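  • A minimal sketch of this operation (assuming a complex spectrum per reconstructed channel frame and a half-open bin range) could look as follows:

```python
import numpy as np

def apply_subband_gain(spectrum, lo, hi, gain):
    """Apply a gain to one frequency subband of a reconstructed channel frame
    by multiplying every frequency component in [lo, hi) by the gain factor."""
    out = np.array(spectrum, dtype=complex)
    out[lo:hi] *= gain
    return out
```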
  • the listener's perception of an audio signal can be modified by applying a gain to a certain frequency band.
  • the auditory spatial image perceived by the listener may be affected by applying a gain to a frequency subband of either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • a listener's perception may exhibit different characteristics in different frequency subbands.
  • a gain can be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. Audio signal components falling within the subband of the reconstructed audio signal frame to which a gain is applied may be perceived more prominently in the respective channel than audio signal components falling within the subband of the reconstructed audio signal frame of the other channel.
  • the decision whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on a first parameter and a set of second parameters.
  • the first parameter may be indicative of a long-term estimate of a perceived direction of the audio signal frame pair.
  • the first parameter may be indicative of a perceived direction of the (current) audio signal frame pair and at least one preceding audio signal frame pair.
  • No limitations pertain to the first parameter as long as it is indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair. It is readily clear that different listeners are likely to have a different audio signal perception. Hence, there is probably not a single parameter that can exactly describe the perceived direction of an audio signal for all possible listeners.
  • a model describing human audio signal localization is likely to be afflicted with at least some amount of inaccuracy.
  • various parameters can be determined that are indicative of a perceived direction of an audio signal, while it may be difficult to determine a parameter exactly representing said direction.
  • other examples for directional sensing of sounds may include techniques based on subspace decomposition of a covariance matrix of received signals, such as ESPRIT and MUSIC.
  • the former is introduced in A. Paulraj, R. Roy and T. Kailath, "ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques," IEEE Trans.
  • the first parameter is indicative of a perceived direction of the audio signal frame pair and at least one preceding frame pair.
  • this may be understood as taking into account not only the perceived direction of a current audio signal frame pair but also considering the audio signal history.
  • the first parameter may change more slowly than, for instance, a parameter indicative of a perceived direction of the current audio signal frame pair only.
  • any number of preceding audio signal frames may be taken into account. Of course, if the current audio signal frame pair does not have a predecessor, a preceding audio signal frame pair cannot be taken into account.
  • Each second parameter of the set of second parameters may be indicative of a short-term estimate of a perceived direction of a frequency subband (or several frequency subbands) of the audio signal frame pair.
  • each second parameter may be indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • each second parameter may be understood as a parameter that does not describe the perceived direction of the entire audio signal frame pair, but of one or more specific frequency subbands of the audio signal frame pair. In case of one specific subband, one may also call each second parameter a frequency subband specific parameter.
  • the set of second parameters may comprise at least two second parameters respectively indicative of a perceived direction of a frequency subband of the audio signal frame pair but may also comprise more than two second parameters. If the set of second parameters is only derived from the current audio signal frame pair and not also from a preceding audio signal frame pair, it may not only be thought of as comprising frequency subband specific information but it may also change more quickly than the first parameter for a sequence of audio signal frame pairs.
  • any suitable processing of the first parameter and the set of second parameters may be employed in the course of determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. Any condition that has to be met in order to cause determination to yield a positive result can be imposed on the first parameter and the set of second parameters.
  • a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • determination may be performed at a parametric audio encoder or at a parametric audio decoder. It is also possible that it takes place at a separate module neither forming part of an encoder nor of a decoder, such as for instance a network element located in a transmission path between an encoder and a decoder, a separate module co-located with the encoder, or a separate module co-located with the decoder.
  • the auditory spatial image a listener perceives when provided with the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame which will in the following be referred to as the reconstructed auditory spatial image, may differ from the auditory spatial image the listener would perceive if provided directly with the first channel audio signal frame and the second channel audio signal frame.
  • the latter auditory spatial image will in the following be referred to as the original auditory spatial image.
  • One reason for the occurrence of a difference between the reconstructed auditory spatial image and the original auditory spatial image may be that parametric coding may possibly not allow exact reconstruction of the auditory spatial image. This may be, for instance, due to the spatial extension information only representing some but not every aspect of the interrelations of the first channel audio signal frame and the second channel audio signal frame. Coarse quantization of the spatial extension information may also adversely affect the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame with regard to the auditory spatial image.
  • the reconstructed auditory spatial image may be perceived by the listeners as being different from the desired auditory spatial image.
  • the desired reconstructed auditory spatial image may be the original auditory spatial image or for example a wider or a narrower auditory spatial image than the original one.
  • Exemplary embodiments according to all aspects of the present invention can be employed to determine whether auditory spatial image width modification should be applied to the reconstructed auditory spatial image.
  • the reconstructed auditory spatial image may be perceived as being narrow compared to the original auditory spatial image, and it may be desirable to apply auditory spatial image widening to the reconstructed auditory spatial image.
  • Applying a gain to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame may yield a widened auditory spatial image, for example by means of the thereby obtained subband- and channel-specific audio signal amplification. Due to auditory spatial image widening, even a reconstructed auditory spatial image obtained from a parametrically encoded low-bitrate audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may closely resemble the original auditory spatial image.
  • exemplary embodiments according to all aspects of the present invention take into account parameters indicative of a perceived direction of the audio signal frame pair, of a preceding audio signal frame pair and also of a frequency subband of the audio signal frame pair.
  • a hypothetical listener's directional audio signal perception can serve as a criterion whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • An advantage of this approach can be that an auditory spatial image width modification result can be obtained that causes the listener's perception of the modified reconstructed auditory spatial image to closely resemble that of the original auditory spatial image.
  • the determination criteria used may help to apply width modification only when it is beneficial. Applying a gain only in these cases can also have the advantage of reducing the signal processing load, since gain application may be omitted whenever the criteria are not met.
  • the first apparatus comprises a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the second apparatus further comprises means for calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • These means may for instance be embodied as a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but are not limited thereto.
  • An advantage of these embodiments can be that it may then not be necessary to transmit the first parameter and the set of second parameters from another apparatus or device that is configured to determine the first parameter and the set of second parameters to the first apparatus or the second apparatus.
  • the processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, or the corresponding means do not necessarily have to be separate entities.
  • the processor configured to calculate the first parameter and the set of second parameters may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
  • the method comprises calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the apparatus forms part of a parametric audio encoder.
  • the apparatus forms part of means for parametric audio encoding.
  • the means for parametric audio coding may for instance be embodied as a parametric audio encoder, but are not limited thereto.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor forming part of a parametric audio encoder to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • a parametric audio encoder can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • information that may be necessary for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair may be available at the parametric audio encoder.
  • the first parameter and the set of second parameters may then be computed on the parametric audio encoder side, too.
  • This approach can render it unnecessary to forward these parameters to another module serving for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. It may suffice to generate a flag indicating the determination result and to provide it to an entity configured to carry out the actual application of the gain if permitted by the flag value.
  • the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
  • the first channel audio signal frames, for example left channel audio signal frames, are assigned to a first loudspeaker, and the second channel audio signal frames, i.e. right channel audio signal frames, are assigned to a second loudspeaker.
  • a listener's position can be expressed relative to the first loudspeaker and the second loudspeaker. It is possible that the listener's perceived auditory spatial image depends on his position relative to the first loudspeaker and to the second loudspeaker. As an example, being located closer to the first loudspeaker than to the second loudspeaker, audio signals emitted from the first loudspeaker may reach the listener with a smaller time difference and a higher sound intensity than audio signals emitted from the second speaker.
  • the listener's position relative to the first loudspeaker and the second loudspeaker can be assumed, for instance based on one or more pre-defined assumptions. To this end, it can be for instance assumed that the listener is positioned with the same distance to both the first and the second loudspeaker. Moreover, it may be assumed that the distance between the first loudspeaker and the second loudspeaker corresponds to the distance from the listener's position to each of the loudspeakers. Thus, the listener's position, the position of the first loudspeaker and the position of the second loudspeaker form the vertices of an equilateral triangle.
  • the reconstructed first and second channel audio signal frames can be reproduced by means of the two loudspeakers. Since the reconstructed first channel audio signal frame can be seen as a representation of the first channel audio signal frame and since the reconstructed second channel audio signal frame can be seen as a representation of the second channel audio signal frame, also the first channel audio signal frame is assigned to the first loudspeaker and the second channel audio signal frame is assigned to the second loudspeaker.
  • the listener's position relative to these two loudspeakers may then be measured and provided to an entity configured to calculate the first parameter and the set of second parameters.
  • perceived direction determination of the audio signal frame pair may be adapted to the reconstructed first and second audio signal frame reproduction scenario.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame can aim at reproduction of an auditory spatial image that could be perceived with the underlying assumed or known listener and loudspeaker position configuration.
  • Exemplary embodiments according to all aspects of the present invention comprise that the first parameter is obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
  • all frequency subbands or only a subset thereof (for instance a subset that comprises the frequency subbands that are considered important, such as for instance low frequencies, wherein this consideration may depend on the sample rate applied) may be considered.
  • These embodiments may provide a good representation of the perceived direction of the audio signal frame pair and the at least one preceding audio signal frame pair. Knowing the direction from the assumed or known position of the listener to the first loudspeaker and the direction from the assumed or known position of the listener to the second loudspeaker, the first parameter may be determined with low computational effort.
  • the second parameters of the set of second parameters are obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
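  • One possible, purely illustrative reading of the two preceding items is sketched below; the equilateral-triangle geometry (left loudspeaker at 120°, right loudspeaker at 60°, with 90° as the front reference) and the normalisation of the energy-weighted directions to a weighted mean are assumptions of the example, not definitions from the application:

```python
import numpy as np

THETA_L, THETA_R = 120.0, 60.0  # assumed loudspeaker directions in degrees

def band_energies(spec, bands):
    return np.array([np.sum(np.abs(spec[lo:hi]) ** 2) for lo, hi in bands])

def direction_parameters(L_cur, R_cur, L_prev, R_prev, bands):
    """First parameter: long-term direction from the energies of the current and
    the preceding frame pair.  Second parameters: per-subband directions from the
    current frame pair only (both as energy-weighted loudspeaker directions)."""
    e_l = band_energies(L_cur, bands).sum() + band_energies(L_prev, bands).sum()
    e_r = band_energies(R_cur, bands).sum() + band_energies(R_prev, bands).sum()
    first = (THETA_L * e_l + THETA_R * e_r) / (e_l + e_r + 1e-12)
    el_b = band_energies(L_cur, bands)
    er_b = band_energies(R_cur, bands)
    second = (THETA_L * el_b + THETA_R * er_b) / (el_b + er_b + 1e-12)
    return first, second
```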
  • Exemplary embodiments according to all aspects of the present invention comprise that the first parameter and the second parameters are indicative of a perceived direction relative to a reference direction, for example relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker.
  • the first parameter can be indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair relative to said first imaginary line and each second parameter of the set of second parameters can be indicative of a perceived direction of a frequency subband of the audio signal frame pair relative to said first imaginary line.
  • the angular offset relative to the first imaginary line may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair and in a specific subband of the audio signal frame pair, respectively. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame.
  • the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters.
  • the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to base the determination on averaged second parameters.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame based on averaged second parameters.
  • Averaging may be based on any existing averaging approach. Besides computing the mean value or the median value, averaging may also comprise assigning weights to the second parameters before summing them up. The weights may control the influence of each second parameter on the averaged value.
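  • A small sketch of such averaging (plain mean, or an assumed weighting of the second parameters) is given below:

```python
import numpy as np

def average_second_parameters(second_params, weights=None):
    """Average the per-subband direction parameters; optional weights control
    how strongly each second parameter influences the averaged value."""
    p = np.asarray(second_params, dtype=float)
    if weights is None:
        return float(np.mean(p))
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * p) / np.sum(w))
```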
  • the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • a listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to a comparatively large number of preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands of the currently perceived reconstructed audio signal frames. Thus, it may not be necessary to apply a gain to any of the subbands of the currently perceived reconstructed audio signal frames.
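  • The description leaves the exact criteria open; the rule below is therefore only one hypothetical example, combining the first parameter, the averaged second parameters and a short history of previous gain decisions (the threshold values are invented for the sketch):

```python
def should_apply_gain(first_param, avg_second_param, gain_history,
                      angle_threshold_deg=5.0, history_len=3):
    """Suggest a gain when the short-term direction (averaged second parameters)
    deviates from the long-term direction (first parameter), unless widening was
    already applied in all of the last `history_len` frames."""
    deviates = abs(avg_second_param - first_param) > angle_threshold_deg
    recently_widened = (len(gain_history) >= history_len
                        and all(gain_history[-history_len:]))
    return deviates and not recently_widened
```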
  • the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
  • the human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies.
  • An example of non-uniform frequency subbands corresponding to the human auditory filters are equivalent rectangular bandwidth (ERB) frequency subbands.
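  • For illustration, non-uniform band edges of this kind can be obtained by spacing edges uniformly on the ERB-rate scale (Glasberg/Moore formula); the helper below is an assumption about one possible realisation, not the band layout of the application:

```python
import numpy as np

def erb_band_edges(f_max_hz, n_bands):
    """Band edges spaced uniformly on the ERB-rate scale, so that bandwidths
    grow with frequency, roughly following the human auditory filters."""
    hz_to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    edges = np.linspace(0.0, hz_to_erb(f_max_hz), n_bands + 1)
    return erb_to_hz(edges)

print(np.round(erb_band_edges(8000.0, 8)))  # edges grow further apart towards high frequencies
```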
  • determination whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced modified auditory spatial images.
  • the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame.
  • the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband, but the means are not limited thereto.
  • the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • These embodiments may for instance enable applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair that is assigned to the loudspeaker for which the direction from an assumed or known position of a listener to said loudspeaker is closer to the perceived direction in the respective frequency subband as indicated by the second parameter of said frequency subband.
  • the perceived direction as indicated by the second parameter of a certain frequency subband of an audio signal frame pair can be 110°.
  • the direction is measured from an assumed or known position of a listener relative to a pair of speakers.
  • two speakers referred to as a left speaker and as a right speaker are used.
  • First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker.
  • the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
  • An angle of 90° corresponds to the direction of a first imaginary line perpendicular to a second imaginary line connecting the left and the right loudspeaker, the first imaginary line passing through the assumed or known position of the listener. Therefore, directions describable by an angle greater than 90° are closer to the direction from the assumed or known position of the listener to the left speaker, while directions describable by an angle of less than 90° are closer to the direction from the assumed or known position of the listener to the right speaker.
  • the perceived direction of 110° is thus closer to the direction from the assumed or known position of the listener to the left loudspeaker.
  • For the specific subband for which the perceived direction indicated by the second parameter of said subband is 110°, it is determined to apply a gain to the subband of the reconstructed first channel audio signal frame.
  • Said frame is associated with the left, i.e. first, channel audio signal frame of the audio signal frame pair that is assigned to the left loudspeaker because it may be thought of as a representation of the left channel audio signal frame.
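  • With the 90° reference used in this example, the discrimination reduces to comparing the subband's second parameter with 90°; a trivial sketch (the channel labels are illustrative):

```python
def pick_channel_for_gain(second_param_deg, reference_deg=90.0):
    """Directions above the reference point towards the left (first) channel's
    loudspeaker, directions below it towards the right (second) channel's."""
    return "first" if second_param_deg > reference_deg else "second"

print(pick_channel_for_gain(110.0))  # -> 'first', as in the 110 degree example
```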
  • the embodiments of the present invention currently discussed may enable emphasizing the content of the reconstructed audio signal frame contributing more significantly to the spatial auditory effects in the respective frequency subband.
  • the modified auditory spatial image may then be close to the desired auditory spatial image, it may for example closely resemble the original auditory spatial image.
  • the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
  • the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, but the means are not limited thereto.
  • the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first or second channel audio signal frame at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
  • the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
  • Large subband specific indicator ratios may occur if the indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame is much greater than the indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
  • large subband specific indicator ratios may occur if the energies of the first channel audio signal frame and the second channel audio signal frame in the respective subband differ significantly.
  • applying the gain may yield a widened auditory spatial image, in which subtle inter-channel differences are emphasized more than prominent inter- channel differences.
  • the widened auditory spatial image may closely resemble the original auditory spatial image.
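  • Read literally, the gain could for instance be computed as sketched below; this is one possible interpretation of the "reciprocal average value of frequency subband specific indicator ratios", with the per-subband energies assumed to be available:

```python
import numpy as np

def widening_gain(e_left, e_right, eps=1e-12):
    """Gain = 1 / mean over subbands of max(E_L, E_R) / min(E_L, E_R).
    Subtle inter-channel energy differences thus lead to a larger gain than
    already prominent differences."""
    e_left = np.asarray(e_left, dtype=float)
    e_right = np.asarray(e_right, dtype=float)
    ratios = np.maximum(e_left, e_right) / (np.minimum(e_left, e_right) + eps)
    return float(1.0 / np.mean(ratios))
```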
  • exemplary embodiments comprise that the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame.
  • the gain may then be computed at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters for gain computation.
  • the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
  • Exemplary embodiments according to all aspects of the present invention comprise that the gain is a quantized value.
  • Gain quantization may enable representation of the gain by a reduced number of bits.
  • transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame may require a reduced bandwidth.
  • the gain is obtainable from a lookup table.
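  • A toy example of such a quantization via a lookup table is given below; the table entries and the table size are invented for the sketch and are not values from the application:

```python
import numpy as np

GAIN_TABLE = np.array([1.0, 1.1, 1.25, 1.4, 1.6, 1.8, 2.0, 2.2])  # hypothetical 3-bit codebook

def quantize_gain(gain):
    """Return the table index (only a few bits to transmit or store)
    and the corresponding quantized gain value."""
    idx = int(np.argmin(np.abs(GAIN_TABLE - gain)))
    return idx, float(GAIN_TABLE[idx])
```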
  • Fig. 1 An exemplary illustration of a method for parametric audio coding;
  • Fig. 2 A more detailed representation of method step 101 of Fig. 1;
  • Fig. 3 A more detailed representation of method step 102 of Fig. 1;
  • Fig. 4 A schematic exemplary illustration of the assumed or known position of a listener relative to a first loudspeaker to which a first channel audio signal frame is assigned and relative to a second loudspeaker to which a second channel audio signal frame is assigned;
  • Fig. 5 A flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention;
  • Fig. 6 A flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention;
  • Fig. 7 A more detailed illustration of method step 112 of Fig. 6;
  • Fig. 8 A flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6, including gain application;
  • Fig. 9 A schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention;
  • Fig. 10 A schematic illustration of a second exemplary embodiment of an apparatus according to the first aspect of the present invention;
  • Fig. 11 A schematic illustration of an exemplary embodiment of a readable medium according to the fourth aspect of the present invention.
  • Method step 101 comprises parametrically encoding the input audio signals L_t and R_t.
  • Method step 101 delivers a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • the parametrically encoded audio signal is decoded. In decoding the parametrically encoded audio signal frame pair, a reconstructed first channel audio signal frame and a reconstructed second channel audio signal frame can be obtained.
  • Fig. 2 is a more detailed representation of method step 101 of Fig. 1.
  • the input signals are divided into frames in step 103. Successive frames may overlap in time, for example by half of their duration. As a result, the first channel audio signal frame L_t and the second channel audio signal frame R_t are obtained. They comprise the samples t_L and t_R, respectively.
  • windowing is applied to the audio signal frames L_t and R_t.
  • windowing may serve for suppressing framing induced artefacts in time-to-frequency transformed representations of L_t and R_t.
  • any pertinent windowing function w can be employed.
  • an exemplary windowing function is the sinusoidal window given by w(n) = sin(π (n + 0.5) / N), 0 ≤ n < N (equation (1)), wherein N is the number of samples in each frame L_t and R_t.
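As an illustration of the framing and windowing of steps 103 and 104, a minimal sketch follows. It assumes the sinusoidal window of equation (1) and an overlap of half a frame; the frame length N and all helper names are illustrative and not taken from the patent.

```python
import numpy as np

def sinusoidal_window(N):
    """Sinusoidal window of equation (1): w(n) = sin(pi * (n + 0.5) / N)."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

def frame_and_window(signal, N=1024):
    """Steps 103/104: split one channel into 50%-overlapping frames and window them."""
    hop = N // 2                      # successive frames overlap by half their duration
    w = sinusoidal_window(N)
    frames = [signal[s:s + N] * w for s in range(0, len(signal) - N + 1, hop)]
    return np.array(frames)           # rows are the windowed frames L_w or R_w
```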
  • in step 105 the windowed audio signal frames L_w and R_w are transformed to the frequency domain.
  • any transform TF that provides complex valued output may be used.
  • exemplary transforms comprise the Discrete Fourier Transform (DFT), the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST).
  • the transformed windowed audio signal frames are referred to as L f and R f .
  • the transformed windowed audio signal frames comprise the samples f_L and f_R, respectively. How they are obtained from the samples t_L and t_R can be described by equation (2). In step 106 the audio signal frames L_f and R_f are transformed to a downmix signal, which in this example embodiment is a mono signal M_f.
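A sketch of steps 105 and 106 under simplifying assumptions: the DFT is used as one of the complex-valued transforms named above, and a plain average of the two channel frames stands in for the actual downmix rule, whose equation is not reproduced in the text.

```python
import numpy as np

def to_frequency_domain(L_w, R_w):
    """Step 105: complex-valued time-to-frequency transform of the windowed frames (DFT here)."""
    return np.fft.fft(L_w), np.fft.fft(R_w)

def downmix_mono(L_f, R_f):
    """Step 106: placeholder downmix; the patent's actual rule for M_f is not reproduced."""
    return 0.5 * (L_f + R_f)
```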
  • Step 107 comprises encoding the mono signal M f .
  • Any pertinent mono codec, such as Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) or the International Telecommunication Union (ITU) G.718 mono codec, may be employed.
  • spatial extension information is derived from the first channel audio signal frame L f and the second channel audio signal frame R f .
  • exemplary spatial extension parameters are the inter-channel level difference (ICLD), the inter-channel time difference (ICTD) and the inter-channel correlation (ICC).
  • the bitstream representative of an input frame may comprise one or more encoded frames or packets. It can be thought of as a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame L_f or L_t and a second channel audio signal frame R_f or R_t.
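A hedged sketch of how one spatial extension parameter, the ICLD, could be derived per frequency subband; the common log-energy-ratio definition is assumed here, and the patent's exact parameterization of the spatial extension information may differ.

```python
import numpy as np

def subband_energy(X_f, sb_offset, m):
    """Energy of frequency subband m of a frequency-domain frame X_f."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    return np.sum(np.abs(X_f[lo:hi]) ** 2)

def icld_db(L_f, R_f, sb_offset):
    """Per-subband inter-channel level differences in dB (common log-energy-ratio form)."""
    eps = 1e-12
    M = len(sb_offset) - 1
    return np.array([10.0 * np.log10((subband_energy(L_f, sb_offset, m) + eps) /
                                     (subband_energy(R_f, sb_offset, m) + eps))
                     for m in range(M)])
```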
  • Fig. 3 is a more detailed representation of method step 102 of Fig. 1.
  • the bitstream provided from the encoding step 101 is demultiplexed in step 109. Thereby, an encoded mono signal and spatial extension information are obtained.
  • the mono signal is decoded, yielding the decoded mono signal M f .
  • Step 111 comprises applying the spatial extension information, thus obtaining the reconstructed first channel audio signal frame L f and the reconstructed second channel audio signal frame R f .
  • Fig. 4 schematically illustrates the assumed or known position A of a listener relative to a first loudspeaker LS to which a first channel audio signal frame L_t is assigned and relative to a second loudspeaker RS to which a second channel audio signal frame R_t is assigned.
  • first channel audio signal frames, for example left channel audio signal frames L_t, are assigned to the first loudspeaker LS.
  • second channel audio signal frames, i.e. right channel audio signal frames R_t, are assigned to the second loudspeaker RS.
  • first and second audio signal frames L f and R f can be reproduced by means of the two loudspeakers LS and RS. Since the reconstructed first channel audio signal frame L f can be seen as a representation of the first audio signal frame L 1 and since the reconstructed second channel audio signal frame R f can be seen as a representation of the second audio signal frame R t , the first channel audio signal frame L 1 is assigned to the first loudspeaker LS and the second channel audio signal frame R 1 is assigned to the second loudspeaker RS in this case, too.
  • the position A can be expressed relative to the loudspeakers LS and RS.
  • the distance from position A to the first loudspeaker LS is Si
  • the distance from position A to the second loudspeaker RS is S2.
  • the distances Si and S2 are equal.
  • the direction from position A to the first loudspeaker LS can be described by the angle θ_L.
  • the direction from position A to the second loudspeaker RS can be described by the angle θ_R.
  • θ_L is 120° and θ_R is 60°.
  • the triangle A, RS, LS is an equilateral triangle.
  • the first imaginary line S3 is perpendicular to the second imaginary line S4 that connects the first loudspeaker LS and the second loudspeaker RS.
  • the first imaginary line S3 passes through the position A.
  • the listener may localize a sound component of the replayed first channel audio signal frame L_t or of the second channel audio signal frame R_t at a position P which can be described by its distance to the position A of the listener and the direction θ as seen from that position A.
  • the energy e_L of the first channel audio signal frame L_t and the energy e_R of the second channel audio signal frame R_t are computed. This can be done by calculating the sum of the squared absolute values of the frequency transformed samples f_L or f_R of the respective frame and then extracting the square root of this sum according to equation (4).
  • instead of considering all frequency subbands in the calculation of the energy e_L of the first channel audio signal frame L_t and the energy e_R of the second channel audio signal frame R_t, only the most important frequency subbands may be considered, for instance the frequency subbands corresponding to the low frequencies. Which frequency subbands are important may depend on the sample rate of the signal. For instance, half of the frequency subbands could be left out in case of a sample rate of 48 kHz, whereas, at a sample rate of 8 kHz, the frequency band coverage may only extend up to 4 kHz, so that in this case more than half of the bands may have to be included in the calculations.
  • according to equation (5), with each update the sum of the former value of the variable e_L weighted with α and the current value of e_L weighted with (1 − α) is assigned to e_L; e_R is computed accordingly.
  • α may have the value 0.95. It may thus happen that e_L and e_R change only slightly with each update, and e_L and e_R can be considered to represent the long-term energies of the first and second channels, respectively.
  • e_L and e_R are initialized to zero so that they can be determined even if there has not been a preceding audio signal frame pair.
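A minimal sketch of the energy computation of equation (4) and the long-term smoothing of equation (5), with α = 0.95 as mentioned above; restricting the sum to the most important bins is shown as an optional argument and is illustrative.

```python
import numpy as np

def frame_energy(X_f, num_bins=None):
    """Equation (4): sqrt of the sum of squared magnitudes of the frequency samples f_L or f_R.

    num_bins may restrict the sum to the most important (e.g. lowest-frequency) bins.
    """
    x = X_f if num_bins is None else X_f[:num_bins]
    return np.sqrt(np.sum(np.abs(x) ** 2))

def update_long_term_energy(e_prev, e_frame, alpha=0.95):
    """Equation (5): e <- alpha * e_previous + (1 - alpha) * e_current; e starts at zero."""
    return alpha * e_prev + (1.0 - alpha) * e_frame
```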
  • the direction θ_L from the assumed or known position A of the listener to the first loudspeaker LS is weighted with the sum of the energy e_L of the first channel audio signal frame L_t and the energy e_L of a first channel audio signal frame of at least one preceding audio signal frame pair.
  • the direction θ_R from the assumed or known position A of the listener to the second loudspeaker RS is weighted with the sum of the energy e_R of the second channel audio signal frame R_t and the energy e_R of a second channel audio signal frame of at least one preceding audio signal frame pair. Normalization with the sum of the energies e_L and e_R is performed.
  • a_r_frame and a_i_frame can be seen as a coordinate pair representing the overall localized position of sound components within the pair of audio signal frames L_t and R_t and preceding audio signal frames.
  • a_r_frame and a_i_frame can describe the position of the point P in Fig. 4.
  • the angle or direction θ_frame is an indicator of the perceived direction of the audio signal frame pair L_t and R_t.
  • θ_frame can be a good representation of the perceived direction of the audio signal frame pair L_t and R_t and the at least one preceding audio signal frame pair:
  • θ_frame may be seen as a representation of a long-term estimate of a perceived direction of the audio signal.
  • the first parameter θ_frame may be determined with low computational effort.
  • the difference θ_frame − 90° can also be seen as a first parameter indicative of a perceived direction of an audio signal frame pair L_f and R_f and at least one preceding audio signal frame pair or, in the time domain, of the pair L_t and R_t and at least one preceding audio signal frame pair.
  • the angular offset θ_frame − 90° relative to the first imaginary line S3 may also be thought of as an indicator of the amount of spatial effects contained in the entire audio signal frame pair L_f and R_f and at least one preceding audio signal frame pair. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
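The following sketch reconstructs the first parameter θ_frame from the verbal description above (energy-weighted loudspeaker directions combined into the coordinate pair a_r_frame, a_i_frame and converted to an angle). The patent's equations (6) and (7) are not reproduced here, so the exact form is an assumption; θ_L = 120° and θ_R = 60° are taken from Fig. 4.

```python
import numpy as np

THETA_L = 120.0   # direction (degrees) from position A to the first loudspeaker LS, Fig. 4
THETA_R = 60.0    # direction (degrees) from position A to the second loudspeaker RS, Fig. 4

def frame_direction(e_L, e_R):
    """First parameter theta_frame from the smoothed long-term channel energies e_L, e_R."""
    norm = e_L + e_R + 1e-12
    a_r_frame = (e_L * np.cos(np.deg2rad(THETA_L)) + e_R * np.cos(np.deg2rad(THETA_R))) / norm
    a_i_frame = (e_L * np.sin(np.deg2rad(THETA_L)) + e_R * np.sin(np.deg2rad(THETA_R))) / norm
    return np.rad2deg(np.arctan2(a_i_frame, a_r_frame))   # theta_frame in degrees
```

With equal channel energies this yields θ_frame = 90°, i.e. a perceived direction on the first imaginary line S3, which is consistent with the geometry of Fig. 4.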
  • each second parameter indicative of a perceived direction of a frequency subband of an audio signal frame pair can be obtained.
  • the frequency domain audio signal frames L_f and R_f are divided into M frequency subbands m, wherein 0 ≤ m < M. sbOffset[m] is the lower bound of frequency subband m.
  • the frequency subbands m i.e. their bounds, may be identical for the first channel audio signal frame L f and the second channel audio signal frame R f . Moreover, they may also be identical to the frequency subbands of the reconstructed audio signal frame pair L f and R f .
  • the frequency subbands m of the audio signal frame pair L f and R f are non-uniform frequency subbands corresponding to the human auditory filters. Namely, they are equivalent rectangular bandwidth (ERB) frequency subbands.
  • the human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies.
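One way to obtain non-uniform, auditory-filter-like subband offsets sbOffset[m] is to space the band edges uniformly on an ERB-rate scale. The sketch below uses the Glasberg & Moore ERB-rate mapping as a common choice; the patent's actual subband table is not specified here.

```python
import numpy as np

def erb_subband_offsets(num_bins, sample_rate, num_bands):
    """Non-uniform subband offsets sbOffset[m] spaced uniformly on an ERB-rate scale."""
    hz_to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)   # Glasberg & Moore ERB-rate
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    f_max = sample_rate / 2.0
    edges_hz = erb_to_hz(np.linspace(0.0, hz_to_erb(f_max), num_bands + 1))
    offsets = np.round(edges_hz / f_max * num_bins).astype(int)
    offsets[0], offsets[-1] = 0, num_bins                      # band m spans offsets[m]..offsets[m+1]
    return offsets
```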
  • determination whether a gain should be applied to a frequency subband m of the reconstructed first channel audio signal frame L f or of the reconstructed second channel audio signal frame R f can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced widened auditory spatial images.
  • the energies e L and e R can be considered to represent the short-term energies of the first and second channels in each frequency subband m, respectively.
  • the direction θ_R from the assumed or known position A of the listener to the second loudspeaker RS, weighted with the energy e_R of the second channel audio signal frame R_f (which corresponds to the energy of said frame in the time domain, R_t), is calculated for each subband m.
  • a directional angle θ_m can be derived from a_r_m and a_i_m.
  • the angles or direction angles θ_m are second parameters indicative of the perceived direction of the respective frequency subband m of the audio signal frame pair L_f and R_f.
  • the angles or direction angles θ_m may be seen as indicative of a short-term estimate of a perceived direction of the respective frequency subband m of the audio signal frame pair.
  • the second parameters θ_m of the set of second parameters are obtainable from the direction from the assumed or known position A of the listener to the first loudspeaker LS weighted with the first channel audio signal frame energy e_L within the respective frequency subband m, and the direction from the assumed or known position A of the listener to the second loudspeaker RS weighted with the second channel audio signal frame energy e_R within the respective frequency subband m.
  • the differences θ_m − 90° can also be seen as a set of second parameters indicative of a perceived direction of the respective frequency subband m of the audio signal frame pair L_f and R_f.
  • the angular offset relative to the first imaginary line S3 may also be thought of as an indicator of the amount of spatial effects contained in the respective subband m of the audio signal frame pair L_f and R_f. This indicator may also be considered in deciding whether a gain should be applied to a specific subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
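A sketch of the per-subband second parameters θ_m, computed analogously to θ_frame but from the short-term subband energies of the current frame pair only; this is again a reconstruction of the verbal description rather than of equations (8) to (10).

```python
import numpy as np

THETA_L, THETA_R = 120.0, 60.0   # loudspeaker directions in degrees, as in Fig. 4

def subband_directions(L_f, R_f, sb_offset):
    """Second parameters theta_m for each frequency subband m of the current frame pair."""
    M = len(sb_offset) - 1
    theta_m = np.empty(M)
    for m in range(M):
        lo, hi = sb_offset[m], sb_offset[m + 1]
        e_L = np.sqrt(np.sum(np.abs(L_f[lo:hi]) ** 2))   # short-term subband energies
        e_R = np.sqrt(np.sum(np.abs(R_f[lo:hi]) ** 2))
        norm = e_L + e_R + 1e-12
        a_r_m = (e_L * np.cos(np.deg2rad(THETA_L)) + e_R * np.cos(np.deg2rad(THETA_R))) / norm
        a_i_m = (e_L * np.sin(np.deg2rad(THETA_L)) + e_R * np.sin(np.deg2rad(THETA_R))) / norm
        theta_m[m] = np.rad2deg(np.arctan2(a_i_m, a_r_m))
    return theta_m
```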
  • the parameter aveDiff can be seen as an averaged representation of the set of second parameters θ_m.
  • the parameter aveDiff may consider only a subset of the second parameters θ_m, for example the θ_m corresponding to a selected number of lowest frequency bands, instead of considering the full set of second parameters θ_m for the M frequency bands as in equation (11).
  • using aveDiff in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or to at least one frequency subband m of a reconstructed second channel audio signal frame R_f may thus correspond to basing the determination on averaged second parameters.
  • aveDiff assumes the value 0° if there are no spatial effects within the pair of audio signal frames L_f and R_f, whereas a value of 90° indicates maximum spatial effects, i.e. the frames differ completely from each other and are not correlated.
  • an advantage of basing the determination whether a gain should be applied to at least one frequency subband m on aveDiff can be that fast determination is enabled. Checking whether each second parameter θ_m meets a certain condition may require more time or higher processing power. This may be even more significant if several conditions have to be met.
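A possible reading of aveDiff, assuming it averages the magnitudes |θ_m − 90°|, which matches the stated range from 0° (no spatial effects) to 90° (maximum spatial effects); restriction to the lowest bands is optional.

```python
import numpy as np

def ave_diff(theta_m, num_lowest_bands=None):
    """Averaged offset of the subband directions from 90 degrees (cf. equation (11))."""
    t = np.asarray(theta_m) if num_lowest_bands is None else np.asarray(theta_m)[:num_lowest_bands]
    return float(np.mean(np.abs(t - 90.0)))
```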
  • Fig. 5 is a flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention.
  • whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame is taken into account in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first or second channel audio signal frame L f and R f , respectively.
  • the array st_wideArray stores flags indicating whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame.
  • st_wideArray is initialized to zero at start-up. It is updated after each audio signal frame pair by performing a left shift, thereby discarding the flag indicating whether a gain has been applied to the oldest reconstructed audio signal frame pair represented by st_wideArray.
  • the flag for the current reconstructed audio signal frame pair L_f and R_f is stored at the rightmost position st_wideArray[0] in the array. In the present example, eight flags are stored in st_wideArray.
  • in step 1 it is checked whether a gain has been applied to each of the last eight frames by comparing st_wideArray to the binary number 11111111. It is also checked whether the conditions aveDiff < 10° and |θ_frame − 90°| < 10° are met.
  • if true, applying a gain to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f is not enabled.
  • auditory spatial image widening is not enabled.
  • the flag widening_enabled is used to indicate whether or not a gain should be applied.
  • the value 1 corresponds to enabled auditory spatial image widening and the value 0 corresponds to disabled auditory spatial image widening.
  • a listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands m of the current reconstructed audio signal frames L_f and R_f. Thus, it may not be necessary to apply a gain to any of the subbands m of the currently perceived reconstructed audio signal frames L_f and R_f.
  • in step 2 it is checked whether the conditions aveDiff > 10° or |θ_frame − 90°| > 10° are met.
  • in this case auditory spatial image widening should be applied. Being greater than 10°, the amount of spatial effects in the audio signal frame pair L_t and R_t is rather large.
  • a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or to at least one frequency subband m of the reconstructed second channel audio signal frame R_f in order to obtain a reconstructed auditory spatial image closely resembling the original auditory spatial image of the audio signal frame pair L_t and R_t.
  • if the check of step 2 does not yield a positive result, a third check is performed in step 4.
  • this time it is determined whether the conditions aveDiff > 9° and |θ_frame − 90°| > 9° are met and whether widening has been enabled for the past two frames.
  • a somewhat lesser amount of spatial effects in the audio signal frame pair L_t and R_t is then sufficient to cause determination that widening should be applied.
  • step 5 comprises checking whether the conditions aveDiff > 8° and |θ_frame − 90°| > 8° are met and whether widening has been enabled for the past six frames.
  • otherwise, step 6 is entered, in which it is checked whether g_enc < 2 and aveDiff < 10° hold.
  • determination whether auditory spatial image widening should be applied is based on properties of human auditory perception. In consequence, reconstructed auditory spatial images may be obtained that closely resemble the original auditory spatial image.
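The decision logic of Fig. 5 might be sketched as below. The thresholds follow the text (steps 1, 2, 4 and 5); step 6 is omitted because the definition of g_enc is not reproduced here, and the exact control flow of the flowchart is paraphrased.

```python
def widening_decision(ave_diff_deg, theta_frame_deg, st_wide_array, widening_history):
    """Hysteresis-style decision of Fig. 5 (steps 1, 2, 4 and 5).

    st_wide_array: 8-bit history of 'gain was applied' flags, newest flag in bit 0.
    widening_history: recent widening_enabled flags, newest first.
    Returns widening_enabled (1 = apply auditory spatial image widening, 0 = do not).
    """
    offset = abs(theta_frame_deg - 90.0)

    # step 1: gain applied for the last eight frames, current spatial effects small -> disable
    if st_wide_array == 0b11111111 and ave_diff_deg < 10.0 and offset < 10.0:
        return 0
    # step 2: clearly audible spatial effects -> enable
    if ave_diff_deg > 10.0 or offset > 10.0:
        return 1
    # step 4: slightly weaker effects, but widening enabled for the past two frames
    if ave_diff_deg > 9.0 and offset > 9.0 and len(widening_history) >= 2 and all(widening_history[:2]):
        return 1
    # step 5: even weaker effects, but widening enabled for the past six frames
    if ave_diff_deg > 8.0 and offset > 8.0 and len(widening_history) >= 6 and all(widening_history[:6]):
        return 1
    return 0

# history update after each frame pair (left shift, newest flag stored at bit/index 0):
# st_wide_array = ((st_wide_array << 1) | gain_applied_flag) & 0xFF
```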
  • a binary value flags_widening(m) is determined for each frequency subband m, indicating whether a gain should be applied to said frequency subband m of the reconstructed left channel audio signal frame L_f or to said frequency subband m of the reconstructed right channel audio signal frame R_f.
  • if θ_m is greater than 90°, a gain should be applied to the respective frequency subband m of the reconstructed left channel audio signal frame L_f. Otherwise, it should be applied to the respective frequency subband m of the reconstructed right channel audio signal frame R_f.
  • each indicator ratio is the ratio of an indicator of the maximum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f, and an indicator of the minimum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • MAX(a_m, b_m) is an indicator of the maximum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • MIN(a_m, b_m) is an indicator of the minimum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • the gain g_widening is then given by equation (14); it corresponds to the reciprocal of the average of the frequency subband specific indicator ratios.
  • the gain g_widening is comparatively small if the subband specific indicator ratios are large.
  • large subband specific indicator ratios may occur if the indicator MAX(a_m, b_m) is much greater than the indicator MIN(a_m, b_m).
  • large subband specific indicator ratios may occur if the energies of the first channel audio signal frame L_f and the second channel audio signal frame R_f in the respective subband m differ significantly.
  • applying the gain g_widening may yield a widened reconstructed auditory spatial image in which subtle inter-channel differences are emphasized more than prominent inter-channel differences.
  • the widened reconstructed auditory spatial image may closely resemble the original auditory spatial image.
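A sketch of the per-subband flags flags_widening(m) and of a gain computed as the reciprocal of the average indicator ratio. Taking a_m and b_m to be the subband energies of the two channels and using one over the mean ratio for equation (14) are assumptions consistent with, but not dictated by, the text.

```python
import numpy as np

def widening_flags_and_gain(theta_m, e_L_subband, e_R_subband):
    """flags_widening(m) and g_widening for the current frame pair.

    flags: True -> apply the gain to subband m of the left (first) channel frame,
           False -> apply it to subband m of the right (second) channel frame.
    """
    flags = np.asarray(theta_m) > 90.0
    eps = 1e-12
    a, b = np.asarray(e_L_subband), np.asarray(e_R_subband)
    ratios = (np.maximum(a, b) + eps) / (np.minimum(a, b) + eps)   # MAX / MIN per subband
    g_widening = 1.0 / float(np.mean(ratios))                      # reciprocal average ratio
    return flags, g_widening
```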
  • Gain quantization may enable representation of the gain by a reduced number of bits. If the gain is to be applied to a frequency subband, transmitting it to an entity configured to apply it may only require a reduced bandwidth.
  • K is 16.
  • the quantized gain can be chosen as the quantized value qTbl(k) minimizing the error distance to the unquantized gain value g_widening.
  • a lookup table can be used.
  • the entity configured to determine the gain can then derive the index widening_gain_idx of the respective quantized value.
  • an entity configured to apply the gain can retrieve the quantized gain g_widening from the lookup table using the index widening_gain_idx.
  • g_widening = qTbl[widening_gain_idx] (16)
  • the index widening_gain_idx can possibly be represented by fewer bits than the quantized gain g_widening itself.
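A sketch of the gain quantization with a K = 16 entry lookup table. The table values qTbl below are illustrative placeholders; only the nearest-entry selection and the dequantization of equation (16) follow the text.

```python
import numpy as np

Q_TBL = np.linspace(0.05, 1.0, 16)   # illustrative 16-entry table; actual qTbl values not given

def quantize_gain(g_widening):
    """Pick the table entry with the smallest error distance to the unquantized gain."""
    widening_gain_idx = int(np.argmin(np.abs(Q_TBL - g_widening)))
    return widening_gain_idx, float(Q_TBL[widening_gain_idx])

def dequantize_gain(widening_gain_idx):
    """Equation (16): g_widening = qTbl[widening_gain_idx]."""
    return float(Q_TBL[widening_gain_idx])
```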
  • Fig. 6 is a flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention.
  • in step 112 it is determined whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
  • the results obtained in method step 112 are also incorporated into a bitstream to be provided for a parametric audio decoder.
  • Fig. 7 is a more detailed illustration of method step 112 of Fig. 6.
  • in step 113 the first parameter θ_frame as defined by equation (7) is calculated.
  • step 114 comprises calculating the set of second parameters θ_m as defined by equation (10).
  • in step 115 the value θ_frame − 90° is computed, while in step 116 the values θ_m − 90° are determined.
  • in step 117 aveDiff is determined according to equation (11).
  • the reconstructed audio signal frames L_t and R_t are obtained in step 122 by frequency-to-time transformation.
  • the actual gain application can be performed by multiplying the samples f_L of the reconstructed first channel audio signal frame L_f or the samples f_R of the reconstructed second channel audio signal frame R_f by g_widening if the flags flags_widening(m) indicate to do so.
  • the gain g_widening can also be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum MAX(icld_L(m), icld_R(m)) of the inter-channel level difference icld_L(m) in the frequency subband m of the reconstructed first channel audio signal frame L_f and the inter-channel level difference icld_R(m) in said frequency subband m of the reconstructed second channel audio signal frame R_f, and an indicator of the minimum MIN(icld_L(m), icld_R(m)) of the inter-channel level difference icld_L(m) in the frequency subband m of the reconstructed first channel audio signal frame L_f and the inter-channel level difference icld_R(m) in said frequency subband m of the reconstructed second channel audio signal frame R_f.
  • the parameters determining the inter-channel level differences icld_L(m) and icld_R(m) may anyway form part of the spatial extension information needed in order to establish these differences between the reconstructed first channel audio signal frame L_f and the reconstructed second channel audio signal frame R_f. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame L_f or the reconstructed second channel audio signal frame R_f at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frames L_f and R_f, i.e. at a decoder.
  • the gain can then be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f at the decoder.
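A decoder-side sketch: applying g_widening to the flagged subbands of one reconstructed channel frame, and deriving the gain from the transmitted ICLDs as a reciprocal average of per-subband ratios. The ICLD-based formula mirrors the wording above; in practice the sign handling of dB-valued ICLDs would need care, and the patent's exact equation is not reproduced.

```python
import numpy as np

def apply_widening(L_f, R_f, sb_offset, flags_widening, g_widening):
    """Multiply the flagged subband samples of one reconstructed channel frame by g_widening."""
    L_out, R_out = L_f.copy(), R_f.copy()
    for m, to_left in enumerate(flags_widening):
        lo, hi = sb_offset[m], sb_offset[m + 1]
        if to_left:
            L_out[lo:hi] *= g_widening
        else:
            R_out[lo:hi] *= g_widening
    return L_out, R_out

def gain_from_icld(icld_L, icld_R):
    """Gain from the transmitted per-subband ICLDs as a reciprocal average ratio."""
    eps = 1e-12
    a, b = np.asarray(icld_L, dtype=float), np.asarray(icld_R, dtype=float)
    ratios = (np.maximum(a, b) + eps) / (np.minimum(a, b) + eps)
    return 1.0 / float(np.mean(ratios))
```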
  • Fig. 9 is a schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention.
  • the apparatus comprises a processor 201 configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the processor 201 can also be seen as means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the processor 201 may be part of an audio encoder or an audio decoder, to name but a few examples. For instance, processor 201 may be configured to implement the steps of the flowcharts of Figs. 5-7 or 8.
  • Fig. 10 is a schematic illustration of a second exemplary embodiment of an apparatus 203 according to the first aspect of the present invention.
  • the apparatus 203 forms part of a parametric audio encoder 202. It comprises a processor 204.
  • the processor 204 comprises a frequency subband selection circuit 205 and a parameter determination circuit 206.
  • the processor 204 is configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • Processor 204 thus may be configured to implement the steps of the flowcharts of Fig. 5-7.
  • the first parameter and the second parameter can be determined by means of the parameter determination circuit 206.
  • Processor 204 is thus also configured to calculate the first parameter and the set of second parameters (see steps 113 and 114 of Fig. 7) .
  • the parameter determination circuit 206 may also be thought of as means for calculating the first parameter and the set of second parameters and so may the processor 204.
  • the parametric audio encoder 202 can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • the information that may be necessary for the parameter determination circuit 206 for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder 202.
  • this information is available to the parameter determination circuit 206.
  • the frequency subband selection circuit 205 is configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • the processor 204 can be considered to be configured accordingly.
  • the frequency subband selection circuit 205 or the processor 204 may thus also be seen as means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • Fig. 11 is a schematic illustration of an embodiment of a readable medium 300 according to the fourth aspect of the present invention.
  • the readable medium 300 is a computer- readable medium.
  • a program 301 according to the fifth aspect of the present invention is stored thereon.
  • the program 301 comprises program code 302. When executed by a processor, the instructions of the program code 302 cause the processor (for instance processor 201 of Fig. 9 or processor 204 of Fig. 10) to perform the method according to the second aspect of the present invention.
  • the program 301 can also be considered as a program according to the third aspect of the present invention.
  • the processor may for instance be implemented in hardware alone, may have certain aspects implemented in software, or may be a combination of hardware and software.
  • the processor may either be a separate module or it may be a subcomponent of a module such as, for example, a processor or an application specific integrated circuit (ASIC) that has other functional components or structures, too.
  • the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software.
  • the presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs) , application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) or other programmable devices.
  • the computer software may be stored in a variety of computer- readable storage media of electric, magnetic, electro- magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor.
  • the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.

Abstract

It is disclosed to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.

Description

Parametric Audio Coding
FIELD
This invention relates to parametric audio coding.
BACKGROUND
Channel capacity for transmitting an audio signal or storage capacity for storing an audio signal is often limited. In consequence, audio coding is applied to reduce the required bitrate. It has been the industries constant goal to develop audio coding techniques enabling high quality audio signal reconstruction while reducing the required bitrate to a minimum. This is especially true for multichannel audio coding where, compared to single channel audio signals, an even larger amount of information has to be dealt with.
SUMMARY OF SOME EXEMPLARY EMBODIMENTS OF THE INVENTION According to a first aspect of the present invention, a first apparatus is disclosed, comprising a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to the first aspect of the present invention, further a second apparatus is disclosed, comprising means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
The means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may for instance comprise a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but said means is not limited thereto.
The first apparatus as well as the second apparatus according to the first aspect of the present invention can be a module that forms part of or is to form part of another apparatus (such as for instance an audio encoder or decoder) , for instance a processor, or it can be a separate apparatus . According to a second aspect of the present invention, further a method is disclosed, comprising determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to a third aspect of the present invention, further a program is disclosed, comprising program code for performing the method according to the second aspect of the present invention, when the program is executed on a processor. The program may for instance be a computer program that is readable by a computer or processor. The program code may then for instance be computer program code. The program may for instance be distributed via a network, such as for instance the Internet. The program may for instance be stored on a tangible readable medium, for instance a computer-readable or processor-readable medium. The readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
According to a fourth aspect of the present invention, further a readable storage medium encoded with instructions that, when executed by a processor, perform the method according to the second aspect of the present invention is disclosed.
The readable storage medium may for instance be a computer-readable or processor-readable storage medium. It may be embodied as an electric, magnetic, electromagnetic, optic or other storage medium, and may either be a removable storage medium or a storage medium that is fixedly installed in an apparatus or device.
According to a fifth aspect of the present invention, further a program is disclosed which causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The program may for instance be a computer program that is readable and/or executable by a computer or processor. The program may for instance be a computer program with computer program code. The program may be stored on a tangible readable medium, for instance a computer- readable or processor-readable medium. The readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
The term auditory spatial image may be used to refer to the listener's spatial perception of components of an audio signal.
Human listeners are capable of perceiving an auditory spatial image by interpreting differences of the signals received by the left and by the right ear. Among these differences are the interaural level difference (ILD) and the interaural time difference (ITD) . Diffraction and reflection may be caused by at least some of the listener's body parts, thus affecting the audio signal on its way from a sound source to the listener's eardrum. A head related transfer function (HRTF) describes the transform an audio signal may be subject to due to these diffraction and reflection phenomena. It is direction- dependent. Hence, being used to the listener's individual HRTF, the listener's brain can derive directional cues from the interaural signal difference caused by the HRTF. Many audio signal reproduction technologies aim at obtaining an auditory spatial image of the reproduced audio signal that resembles the auditory spatial image of the original audio signal as closely as possible. Multichannel audio signals exist. An exemplary multichannel audio signal is a stereo audio signal. A stereo audio signal comprises a first channel audio signal and a second channel audio signal. The first channel and the second channel are often referred to as a left channel and a right channel, respectively. A stereo audio signal is usually intended to be reproduced via a set of speakers, wherein for each channel there is at least one speaker provided. In accordance with the naming of the channels, these two speakers can be called a left speaker and a right speaker, respectively. The left and the right speaker can be arranged with a certain offset to one another. A user of the audio signal reproduction system comprising said left and said right speaker, who may also be referred to as a listener or an auditor, may then localize whether a certain component of a replayed audio signal is provided through the left or through the right speaker. Of course, the audio signal may also be provided through both speakers, possibly with a different sound level for each speaker, so that the listener does not only perceive said signal component exclusively on one side. In consequence, the listener can perceive an auditory spatial image. In the case of two channels being provided, the auditory spatial image the listener perceives may also be called a stereo image.
For audio reproduction by means of speakers or headphones, the audio signal provided to the speakers or headphones may, among other parameters, exhibit a certain inter-channel level difference (ICLD) and a certain inter-channel time difference (ICTD) and a certain inter- channel correlation (ICC) . If reproduced by means of head phones, the ICLD and the ICTD may correspond to the ILD and to the ITD, respectively. In loudspeaker reproduction, the ICLD and the ICTD may only partially determine the ILD and the ITD, respectively. It is to be understood that the above explanations represent only some of the factors contributing to the complex system of human sound localization. Furthermore, it is readily clear to a skilled person that recording or encoding audio signals may involve using much more elaborate spatial information capturing or encoding techniques.
Using more than two channels may enable an even better spatial reproduction of an audio signal. For instance, four channel audio systems exist. A set of four loudspeakers, for example a front right loudspeaker, a front left loudspeaker, a rear right loudspeaker and a rear left loudspeaker, may be provided for audio signal reproduction. Five channel audio signal reproduction systems often comprise a fifth loudspeaker arranged in between the front right loudspeaker and the front left loudspeaker. It is noted that an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be formed by selecting two channels of a plurality of provided channels. For instance, if five channels are provided, the left and right front channels may form one channel pair and the left and right rear channels form another channel pair. It is also possible that the front left and rear left channels form a channel pair and the front right and rear right channels form another channel pair. In the following, exemplary embodiments of the present invention are explained with respect to a scenario in which only two channels are provided. It is to be understood that, as stated above, exemplary embodiments according to all aspects of the present invention may be applied in scenarios in which more than two channels are provided by forming at least one channel pair, each channel pair comprising two selected channels.
In order to reduce the bitrate required by an audio signal, various audio signal coding approaches exist. In the field of audio coding, the auditory spatial image a listener will perceive if provided with the decoded audio signal that has been obtained by decoding the previously encoded audio signal may be referred to as a reconstructed auditory spatial image. Again, it is often desired to obtain a reconstructed auditory spatial image that closely resembles the auditory spatial image of the original (unmodified and uncoded) audio signal. On the other hand, in some cases it may be desirable to modify the perceived auditory spatial image to be different from the original auditory spatial image, for example by introducing a wider or narrower auditory spatial image than the one present in the original audio signal.
Parametric audio coding may be thought of as generating a downmix signal from a multichannel audio signal and providing spatial extension information that allows reconstruction of the multichannel audio signal from the downmix signal. A downmix signal is generated from a multichannel audio signal in such a way that M input channels are used to generate N downmix channels, with N < M. As an example, a single-channel downmix signal - a mono signal - is generated from a stereo audio signal, implying N=1 and M=2. In the course of audio signal encoding, the audio signal may be divided into a sequence of frames, each frame comprising a limited number of samples of the audio signal. In parametric audio coding, an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be encoded as a single mono audio signal frame and spatial extension information. One may say that the parametrically encoded audio signal represents the audio signal frame pair.
The spatial extension information may contain parameters such as ICLDs, ICTDs and ICCs, as well as other parameters. Any parametric audio coding technology may be used in the scope of the present invention. Exemplary parametric audio coding technologies comprise Binaural Cue Coding (BCC) and Spatial Audio Coding (SAC) . An advantage of parametric audio coding is that bitrate requirements can often be reduced significantly. Instead of separately encoding a full set of input channel signals, a smaller number of signals and the spatial extension information have to be provided for transmission or storage. For example instead of two audio channel signals only a mono signal, i.e. a single channel signal, and the spatial extension information has to be provided for transmission or storage. The bitrate required for the spatial extension information is usually smaller than the bitrate required for transmitting another mono audio signal. One may therefore say that parametric audio coding exploits channel interrelations to reduce the bitrate required by the parametrically encoded audio signal compared to the bitrate required by the original multichannel audio signal.
Any pertinent mono codec may be used for mono signal coding. Among them are Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) and the International Telecommunication Union (ITU) G.718 mono codec, to give only some examples of mono codecs .
Decoding a parametrically encoded audio signal may involve decoding the mono signal. Exploiting the spatial extension information, the mono signal can serve as a basis for reconstructing a first channel audio signal and a second channel audio signal. The reconstructed first channel audio signal frame may be a representation of the first channel audio signal frame, while the reconstructed second channel audio signal frame may be a representation of the second channel audio signal frame. Hence, one may say that the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
An audio signal frame may be divided into a plurality of frequency subbands . In the frequency domain, a signal is represented by a set of frequency components. A frequency subband of the signal may then be thought of as comprising each frequency component falling within an upper frequency bound and a lower frequency bound limiting the respective frequency subband. The frequency subbands, i.e. their bounds, may be identical for the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. Moreover, they may also be identical to the frequency subbands of the audio signal frame pair.
Exemplary embodiments according to all aspects of the present invention involve determination whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
One possibility to apply a gain to at least one frequency subband of a reconstructed audio signal frame is to multiply each frequency component of the respective frequency subband by a gain factor. Thus, the listener's perception of an audio signal can be modified by applying a gain to a certain frequency band. In particular, the auditory spatial image perceived by the listener may be affected by applying a gain to a frequency subband of either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. A listener's perception may exhibit different characteristics in different frequency subbands . In consequence, it can be relevant to which specific frequency subband or frequency subbands a gain is applied. In addition, it can also be relevant whether a gain is applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. Audio signal components falling within the subband of the reconstructed audio signal frame to which a gain is applied may be perceived more prominently in the respective channel than audio signal components falling within the subband of the reconstructed audio signal frame of the other channel.
According to all aspects of the present invention, the decision whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on a first parameter and a set of second parameters.
The first parameter may be indicative of a long-term estimate of a perceived direction of the audio signal frame pair. For instance, the first parameter may be indicative of a perceived direction of the (current) audio signal frame pair and at least one preceding audio signal frame pair. No limitations pertain to the first parameter as long as it is indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair. It is readily clear that different listeners are likely to have a different audio signal perception. Hence, there is probably not a single parameter that can exactly describe the perceived direction of an audio signal for all possible listeners. Moreover, with a variety of parameters contributing to the auditory spatial image a listener perceives and complex interrelations among these parameters, a model describing human audio signal localization is likely to be afflicted with at least some amount of inaccuracy. In consequence, various parameters can be determined that are indicative of a perceived direction of an audio signal, while it may be difficult to determine a parameter exactly representing said direction. In this respect, other examples for directional sensing of sounds may include techniques based on subspace decomposition of a covariance matrix of received signals; ESPRIT and MUSIC. The former is introduced in ,,A. Paulraj and R. R. Kailath, "ESPRIT-Estimation of Signal Parameters via Rotational Invariance Techniques," IEEE Trans. Acoust., Speech, Signal Processing, vol.37, no.7, pp.984-995, 1989." And the latter in ,,R. O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation, " IEEE Trans. Antennas and Propagation, vol.34, no.3, 1986."
According to all aspects of the present invention, the first parameter is indicative of a perceived direction of the audio signal frame pair and at least one preceding frame pair. In exemplary embodiments according to all aspects of the present invention, this may be understood as taking into account not only the perceived direction of a current audio signal frame pair but also considering the audio signal history. In consequence, the first parameter may change more slowly than, for instance, a parameter indicative of a perceived direction of the current audio signal frame pair only. It is noted that according to all aspects of the present invention, any number of preceding audio signal frames may be taken into account. Of course, if the current audio signal frame pair does not have a predecessor, a preceding audio signal frame pair cannot be taken into account.
Each second parameter of the set of second parameters may be indicative of a short-term estimate of a perceived direction of a frequency subband (or several frequency subbands) of the audio signal frame pair. For instance, each second parameter may be indicative of a perceived direction of a frequency subband of the audio signal frame pair. In exemplary embodiments according to all aspects of the present invention, each second parameter may be understood as a parameter that does not describe the perceived direction of the entire audio signal frame pair, but of one or more specific frequency subbands of the audio signal frame pair. In case of one specific subband, one may also call each second parameter a frequency subband specific parameter. Again, as already explained with regard to the first parameter indicative of a perceived direction of the audio signal frame pair, different parameters indicative of a perceived direction of a frequency subband of the audio signal frame pair can be employed. The set of second parameters may comprise at least two second parameters respectively indicative of a perceived direction of a frequency subband of the audio signal frame pair but may also comprise more than two second parameters. If the set of second parameters is only derived from the current audio signal frame pair and not also from a preceding audio signal frame pair, it may not only be thought of as comprising frequency subband specific information but it may also change more quickly than the first parameter for a sequence of audio signal frame pairs.
It is noted that according to all aspects of the present invention, any suitable processing of the first parameter and the set of second parameters may be employed in the course of determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. Any condition that has to be met in order to cause determination to yield a positive result can be imposed on the first parameter and the set of second parameters.
According to all aspects of the present invention, there are no restrictions regarding the location at which it is determined whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. For example, determination may be performed at a parametric audio encoder or at a parametric audio decoder. It is also possible that it takes place at a separate module neither forming part of an encoder nor of a decoder, such as for instance a network element located in a transmission path between an encoder and a decoder, a separate module co-located with the encoder, or a separate module co-located with the decoder. The auditory spatial image a listener perceives when provided with the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame, which will in the following be referred to as the reconstructed auditory spatial image, may differ from the auditory spatial image the listener would perceive if provided directly with the first channel audio signal frame and the second channel audio signal frame. The latter auditory spatial image will in the following be referred to as the original auditory spatial image.
One reason for the occurrence of a difference between the reconstructed auditory spatial image and the original auditory spatial image may be that parametric coding may possibly not allow exact reconstruction of the auditory spatial image. This may be, for instance, due to the spatial extension information only representing some but not every aspect of the interrelations of the first channel audio signal frame and the second channel audio signal frame. Coarse quantization of the spatial extension information may also adversely affect the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame with regard to the auditory spatial image.
The reconstructed auditory spatial image may be perceived by the listeners as being different from the desired auditory spatial image. The desired reconstructed auditory spatial image may be the original auditory spatial image or for example a wider or a narrower auditory spatial image than the original one. Exemplary embodiments according to all aspects of the present invention can be employed to determine whether auditory spatial image width modification should be applied to the reconstructed auditory spatial image. As an example, the reconstructed auditory spatial image may be perceived as being narrow compared to the original auditory spatial image, and it may be desirable to apply auditory spatial image widening to the reconstructed auditory spatial image. Applying a gain to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame may yield a widened auditory spatial image, for example by means of a thereby obtained subband and channel specific audio signal amplification. Due to auditory spatial image widening, even a widened reconstructed auditory spatial image obtained from a parametrically encoded low bitrate audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may closely resemble the original auditory spatial image.
However, so as to attain a desired auditory spatial image, for example one closely resembling the original auditory spatial image, well-targeted auditory spatial image width modification can be necessary. In order to minimize the risk of adversely affecting the auditory spatial image, exemplary embodiments according to all aspects of the present invention take into account parameters indicative of a perceived direction of the audio signal frame pair, of a preceding audio signal frame pair and also of a frequency subband of the audio signal frame pair.
Thereby, a hypothetical listener's directional audio signal perception can serve as a criterion for whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. An advantage of this approach can be that an auditory spatial image width modification result can be obtained that causes the listener's perception of the modified reconstructed auditory spatial image to closely resemble that of the original auditory spatial image. Also, the determination criteria used may help to apply width modification only if it is beneficial. Only applying a gain in these cases can also have the advantage of reducing the signal processing load by potentially omitting gain application if the criteria are not met.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the first apparatus comprises a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the second apparatus further comprises means for calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. These means may for instance be embodied as a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but are not limited thereto.
An advantage of these embodiments can be that it may then not be necessary to transmit the first parameter and the set of second parameters from another apparatus or device that is configured to determine the first parameter and the set of second parameters to the first apparatus or the second apparatus. The processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, or the corresponding means do not necessarily have to be separate entities. For example, the processor configured to calculate the first parameter and the set of second parameters may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus forms part of a parametric audio encoder.
According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus forms part of means for parametric audio encoding. The means for parametric audio coding may for instance be embodied as a parametric audio encoder, but are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor forming part of a parametric audio encoder to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
For parametric audio encoding, a parametric audio encoder can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. Thus, information that may be necessary for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder. The first parameter and the set of second parameters may then be computed on the parametric audio encoder side, too. This approach can render it unnecessary to forward these parameters to another module serving for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. It may suffice to generate a flag indicating the determination result and to provide it to an entity configured to carry out the actual application of the gain if permitted by the flag value.
According to exemplary embodiments according to all aspects of the present invention, the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
For instance and as already elucidated above, for stereo audio reproduction, i.e. two channel audio reproduction, two speakers referred to as a left speaker and as a right speaker are used. First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker. Put differently, the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
A listener's position can be expressed relative to the first loudspeaker and the second loudspeaker. It is possible that the listener's perceived auditory spatial image depends on his position relative to the first loudspeaker and to the second loudspeaker. As an example, if the listener is located closer to the first loudspeaker than to the second loudspeaker, audio signals emitted from the first loudspeaker may reach the listener with a smaller delay and a higher sound intensity than audio signals emitted from the second loudspeaker.
According to exemplary embodiments according to all aspects of the present invention, the listener's position relative to the first loudspeaker and the second loudspeaker can be assumed, for instance based on one or more pre-defined assumptions. To this end, it can be for instance assumed that the listener is positioned with the same distance to both the first and the second loudspeaker. Moreover, it may be assumed that the distance between the first loudspeaker and the second loudspeaker corresponds to the distance from the listener's position to each of the loudspeakers. Thus, the listener's position, the position of the first loudspeaker and the position of the second loudspeaker form the vertices of an equilateral triangle.
It is also possible that the position of a listener relative to a first loudspeaker and a second loudspeaker is known. For instance, reconstructed and possibly widened first and second audio signal frames can be reproduced by means of the two loudspeakers. Since the reconstructed first channel audio signal frame can be seen as a representation of the first audio signal frame and since the reconstructed second channel audio signal frame can be seen as a representation of the second audio signal frame, also the first channel audio signal frame is assigned to the first loudspeaker and the second channel audio signal frame is assigned to the second loudspeaker.
The listener's position relative to these two loudspeakers may then be measured and provided to an entity configured to calculate the first parameter and the set of second parameters. In turn, perceived direction determination of the audio signal frame pair may be adapted to the reconstructed first and second audio signal frame reproduction scenario.
By taking into account a listener's assumed or known position relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned, determining whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame can aim at reproduction of an auditory spatial image that could be perceived with the underlying assumed or known listener and loudspeaker position configuration.
Exemplary embodiments according to all aspects of the present invention comprise that the first parameter is obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair. Therein, in the calculation of the energies, all frequency subbands, or only a subset thereof (for instance a subset that comprises the frequency subbands that are considered important, such as for instance low frequencies, wherein this consideration may depend on the sample rate applied) may be considered.
These embodiments may provide a good representation of the perceived direction of the audio signal frame pair and the at least one preceding audio signal frame pair. Knowing the direction from the assumed or known position of the listener to the first loudspeaker and the direction from the assumed or known position of the listener to the second loudspeaker, the first parameter may be determined with low computational effort.
According to exemplary embodiments according to all aspects of the present invention, the second parameters of the set of second parameters are obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband. Exemplary embodiments according to all aspects of the present invention comprise that the first parameter and the second parameters are indicative of a perceived direction relative to a reference direction, for example relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker. In other words, the first parameter can be indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair relative to said first imaginary line and each second parameter of the set of second parameters can be indicative of a perceived direction of a frequency subband of the audio signal frame pair relative to said first imaginary line.
Describing the above perceived directions relative to the first imaginary line enables arranging the perceived directions in two groups, wherein the first group comprises perceived directions having a positive angular offset relative to the first imaginary line and the second group comprises perceived directions having a negative angular offset relative to the first imaginary line. In exemplary embodiments according to all aspects of the present invention, the angular offset relative to the first imaginary line may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair and in a specific subband of the audio signal frame pair, respectively. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to base the determination on averaged second parameters.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention. According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame based on averaged second parameters.
Averaging may be based on any existing averaging approach. Besides computing the mean value or the median value, averaging may also comprise assigning weights to the second parameters before summing them up. The weights may control the influence of each second parameter on the averaged value. An advantage of basing the determination whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame on averaged second parameters can be that fast determination may be enabled. Checking if each second parameter meets a certain condition may require more time. This may be even more significant if several conditions have to be met.
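Purely as an illustration of such averaging, the following sketch computes a plain mean, a median and a weighted mean of a set of second parameters. The function names and the choice of degrees as angle unit are hypothetical and not prescribed by the embodiments described above.

```python
import statistics

def average_second_parameters(theta_m, weights=None):
    """Average a list of subband direction parameters (degrees).

    theta_m -- second parameters, one per frequency subband
    weights -- optional per-subband weights controlling the influence of
               each second parameter on the averaged value
    """
    if weights is None:
        return sum(theta_m) / len(theta_m)                      # plain mean
    total = sum(weights)
    return sum(t * w for t, w in zip(theta_m, weights)) / total  # weighted mean

def median_second_parameter(theta_m):
    return statistics.median(theta_m)                            # median as an alternative
```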
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
A listener's perception of the auditory spatial images of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to a comparatively large number of preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands of the currently perceived reconstructed audio signal frames. Thus, it may not be necessary to apply a gain to any of the subbands of the currently perceived reconstructed audio signal frames. On the other hand, scenarios are imaginable in which neither the first parameter nor the set of second parameters would cause it to be determined that a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame. However, if a gain has been applied to a subband of a preceding reconstructed audio signal frame, it may be advisable to also apply a gain to at least one subband of a current reconstructed first or second channel audio signal frame so as not to cause sudden changes in the perceived auditory spatial image. An advantage of the embodiments of the present invention currently discussed can thus be that the history of the audio signal is taken into account, thereby allowing adequate use of auditory spatial image width modification.
In further exemplary embodiments according to all aspects of the present invention, the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters. The human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies. An example of non-uniform frequency subbands corresponding to the human auditory filters is equivalent rectangular bandwidth (ERB) frequency subbands. By splitting the audio signal frame pair into non-uniform frequency subbands corresponding to the human auditory filters, determination whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced modified auditory spatial images.
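As an illustrative sketch only, ERB-like subband boundaries could be derived as follows. The Glasberg-Moore ERB-rate formula used here is an assumption, as are the helper names; the embodiments above merely require non-uniform subbands corresponding to the human auditory filters.

```python
import math

def erb_rate(f_hz):
    # Glasberg & Moore ERB-rate scale (an assumption; no particular formula
    # is prescribed above): number of ERBs below the frequency f_hz.
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_subband_offsets(sample_rate, fft_size, num_bands):
    """Return lower-bound FFT bin indices of non-uniform (ERB-like) subbands."""
    nyquist = sample_rate / 2.0
    step = erb_rate(nyquist) / num_bands
    offsets = []
    for m in range(num_bands):
        f_lower = erb_rate_to_hz(m * step)
        offsets.append(int(round(f_lower / nyquist * (fft_size // 2))))
    return offsets  # analogous to the sbOffset[m] array used further below

print(erb_subband_offsets(sample_rate=48000, fft_size=1024, num_bands=20))
```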
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband, but the means are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
These embodiments may for instance enable applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair that is assigned to the loudspeaker for which the direction from an assumed or known position of a listener to said loudspeaker is closer to the perceived direction in the respective frequency subband as indicated by the second parameter of said frequency subband.
As an example, the perceived direction as indicated by the second parameter of a certain frequency subband of an audio signal frame pair can be 110°. The direction is measured from an assumed or known position of a listener relative to a pair of speakers. For instance and as already elucidated above, for stereo audio reproduction, i.e. two channel audio reproduction, two speakers referred to as a left speaker and as a right speaker are used. First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker. Put differently, the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
An angle of 90° corresponds to the direction of a first imaginary line perpendicular to a second imaginary line connecting the left and the right loudspeaker, the first imaginary line passing through the assumed or known position of the listener. Therefore, directions describable by an angle greater than 90° are closer to the direction from the assumed or known position of the listener to the left speaker, while directions describable by an angle of less than 90° are closer to the direction from the assumed or known position of the listener to the right speaker.
The perceived direction of 110° is thus closer to the direction from the assumed or known position of the listener to the left loudspeaker. According to the exemplary embodiments of the present invention currently discussed, for the specific subband for which the perceived direction as indicated by the second parameter of said subband is 110°, it is determined to apply a gain to the subband of the reconstructed first channel audio signal frame. Said frame is associated with the left, i.e. first, channel audio signal frame of the audio signal frame pair that is assigned to the left loudspeaker because it may be thought of as a representation of the left channel audio signal frame.
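A minimal sketch of this per-subband discrimination, assuming the second parameters are available as angles in degrees with 90° corresponding to the first imaginary line, could look as follows (the function name is hypothetical, and how the exact 90° case is handled is not specified above):

```python
def channel_for_gain(theta_m_deg):
    """Decide, per subband, which reconstructed channel the gain targets.

    Angles above 90 degrees lie towards the left (first channel) loudspeaker,
    angles below 90 degrees towards the right (second channel) loudspeaker.
    """
    return "first" if theta_m_deg > 90.0 else "second"

# The worked example above: a perceived direction of 110 degrees selects the
# reconstructed first (left) channel audio signal frame.
assert channel_for_gain(110.0) == "first"
assert channel_for_gain(70.0) == "second"
```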
Hence, the embodiments of the present invention currently discussed may enable emphasizing the content of the reconstructed audio signal frame contributing more significantly to the spatial auditory effects in the respective frequency subband. The modified auditory spatial image may then be close to the desired auditory spatial image, it may for example closely resemble the original auditory spatial image. Of course it does not have to be determined for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame if it has previously been determined that no gain should be applied at all.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, but the means are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. These embodiments may be advantageous because the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first or second channel audio signal frame at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
According to exemplary embodiments according to all aspects of the present invention, the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame. Thus, due to the computation of the reciprocal average value of frequency subband specific indicator ratios, it may be achieved that the gain is comparatively small if the subband specific indicator ratios are large. Large subband specific indicator ratios may occur if the indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame is much greater than the indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame. In other words, large subband specific indicator ratios may occur if the energies of the first channel audio signal frame and the second channel audio signal frame in the respective subband differ significantly. In consequence, applying the gain may yield a widened auditory spatial image, in which subtle inter-channel differences are emphasized more than prominent inter-channel differences. Thus the widened auditory spatial image may closely resemble the original auditory spatial image.
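A minimal sketch of such a gain computation is given below; it uses the subband energies themselves as the indicators of the maximum and minimum, which is only one possible choice, and the function name is hypothetical.

```python
def widening_gain(e_l_m, e_r_m):
    """Gain from the reciprocal average of subband-specific indicator ratios.

    e_l_m, e_r_m -- per-subband energies of the first and second channel audio
                    signal frame (used here directly as the 'indicators' of the
                    maximum and minimum energy; an assumption).
    """
    eps = 1e-12                                   # guard against division by zero
    ratios = [max(el, er) / max(min(el, er), eps)
              for el, er in zip(e_l_m, e_r_m)]
    return 1.0 / (sum(ratios) / len(ratios))      # reciprocal of the average ratio

# Strongly differing channel energies -> large ratios -> comparatively small gain.
print(widening_gain([1.0, 0.5, 0.2], [0.9, 0.6, 0.25]))
```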
Other exemplary embodiments according to all aspects of the present invention comprise that the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
These embodiments may be advantageous because the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. The gain may then be computed at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters for gain computation. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
Exemplary embodiments according to all aspects of the present invention comprise that the gain is a quantized value. Gain quantization may enable representation of the gain by a reduced number of bits. Thus, transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame may require a reduced bandwidth. According to exemplary embodiments according to all aspects of the present invention, the gain is obtainable from a lookup table. An advantage of these embodiments can be that instead of transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame, it may suffice to merely transmit a lookup table index, which can possibly be represented by fewer bits than the quantized gain itself.
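Purely for illustration, gain quantization via a lookup table could be sketched as follows; the table values and size are hypothetical and would in practice be agreed between the entities involved.

```python
# Hypothetical gain lookup table; the actual values and size are not specified above.
GAIN_TABLE = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

def quantize_gain(gain):
    """Return the lookup-table index of the entry closest to the gain."""
    return min(range(len(GAIN_TABLE)), key=lambda i: abs(GAIN_TABLE[i] - gain))

def dequantize_gain(index):
    return GAIN_TABLE[index]

idx = quantize_gain(0.6)        # only the 3-bit index needs to be transmitted
print(idx, dequantize_gain(idx))
```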
The features of the present invention and of its exemplary embodiments as presented above shall also be understood to be disclosed in all possible combinations with each other.
It is to be noted that the above description of embodiments of the present invention is to be understood to be merely exemplary and non-limiting.
Further aspects of the invention will be apparent from and elucidated with reference to the detailed description presented hereinafter.
BRIEF DESCRIPTION OF THE FIGURES

The figures show:
Fig. 1: An exemplary illustration of a method for parametric audio coding;
Fig. 2: A more detailed representation of method step 101 of Fig. 1;

Fig. 3: A more detailed representation of method step 102 of Fig. 1;
Fig. 4: A schematic exemplary illustration of the assumed or known position of a listener relative to a first loudspeaker to which a first channel audio signal frame is assigned and relative to a second loudspeaker to which a second channel audio signal frame is assigned;
Fig. 5: A flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention;
Fig. 6: A flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention;
Fig. 7: A more detailed illustration of method step 112 of Fig. 6;
Fig. 8: A flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6 including gain application;
Fig. 9: A schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention;
Fig. 10: A schematic illustration of a second exemplary embodiment of an apparatus according to the first aspect of the present invention;

Fig. 11: A schematic illustration of an exemplary embodiment of a readable medium according to the fourth aspect of the present invention.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
In the following detailed description, exemplary embodiments of the present invention will be described.
An input audio signal comprising two channels, namely Lt and Rt, is provided to method step 101. Method step 101 comprises parametrically encoding the input audio signals Lt and Rt. As an output, method step 101 delivers a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. In step 102, the parametrically encoded audio signal is decoded. In decoding the parametrically encoded audio signal frame pair, a reconstructed first channel audio signal frame and a reconstructed second channel audio signal frame can be obtained.
Fig. 2 is a more detailed representation of method step 101 of Fig. 1.
The input signals Lt and Rt are divided into frames in step 103. Successive frames may overlap in time, for example by half of their duration. As a result, the first channel audio signal frame Lt and the second channel audio signal frame Rt are obtained. They comprise the samples tL and tR, respectively.
In step 104 windowing is applied to the audio signal frames Lt and Rt. Windowing may serve for suppressing framing-induced artefacts in time-to-frequency transformed representations of Lt and Rt. Any pertinent windowing function w can be employed. An exemplary windowing function is the sinusoidal window given by:
$w(n) = \sin\!\left(\frac{\pi\,(n + 0.5)}{N}\right), \qquad n = 0, 1, \ldots, N - 1 \qquad (1)$
wherein N is the number of samples in each frame Lt and Rt. As a result, the windowed first channel audio signal frame Lw and the windowed second channel audio signal frame Rw are obtained.
In step 105 the windowed audio signal frames Lw and Rw are transformed to the frequency domain. To this end, any transform TF that provides complex valued output may be used. For example, Discrete Fourier Transform (DFT), Modified Discrete Cosine Transform (MDCT), Modified Discrete Sine Transform (MDST) or Quadrature Mirror Filtering (QMF) may be used. The transformed windowed audio signal frames are referred to as Lf and Rf. The transformed windowed audio signal frames comprise the samples fL and fR, respectively. How they are obtained from the samples tL and tR can be described by the following equations:
$f_L = \mathrm{TF}\{w \cdot t_L\}, \qquad f_R = \mathrm{TF}\{w \cdot t_R\} \qquad (2)$

In step 106 the audio signal frames Lf and Rf are transformed to a downmix signal, which in this example embodiment is a mono signal Mf.
$M_f = 0.5 \cdot \left(L_f + R_f\right) \qquad (3)$
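As a non-limiting sketch of steps 103 to 106, assuming a DFT as the transform TF and a hypothetical frame length of 1024 samples:

```python
import numpy as np

N = 1024                                            # hypothetical frame length
HOP = N // 2                                        # successive frames overlap by half
window = np.sin(np.pi * (np.arange(N) + 0.5) / N)   # sinusoidal window, eq. (1)

def frames(signal):
    """Step 103: split one channel signal into half-overlapping frames."""
    for start in range(0, len(signal) - N + 1, HOP):
        yield signal[start:start + N]

def analyse_frame(l_t, r_t):
    """Steps 104-106 for one frame pair: window, transform, downmix."""
    l_f = np.fft.fft(window * l_t)                  # eq. (2), DFT as example transform TF
    r_f = np.fft.fft(window * r_t)
    m_f = 0.5 * (l_f + r_f)                         # eq. (3), mono downmix
    return l_f, r_f, m_f
```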
Step 107 comprises encoding the mono signal Mf. Any pertinent mono codec, such as Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) and the International Telecommunication Union (ITU) G.718 mono codec, may be employed.
In step 108, spatial extension information is derived from the first channel audio signal frame Lf and the second channel audio signal frame Rf. Exemplary spatial extension parameters are the inter-channel level difference (ICLD), the inter-channel time difference (ICTD) and the inter-channel correlation (ICC).
By multiplexing the encoded mono signal and the spatial extension information, a bitstream can be formed. The bitstream representative of an input frame may comprise one or more encoded frames or packets. It can be thought of as a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame Lf or Lt and a second channel audio signal frame Rf or Rt. Fig. 3 is a more detailed representation of method step 102 of Fig. 1.
According to Fig. 3, the bitstream provided from the encoding step 101 is demultiplexed in step 109. Thereby, an encoded mono signal and spatial extension information are obtained. In step 110 the mono signal is decoded, yielding the decoded mono signal Mf. Step 111 comprises applying the spatial extension information, thus obtaining the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf.
Fig. 4 schematically illustrates the assumed or known position A of a listener relative to a first loudspeaker LS to which a first channel audio signal frame Lt is assigned and relative to a second loudspeaker RS to which a second channel audio signal frame Rt is assigned.
For stereo audio reproduction, i.e. two channel audio reproduction, two loudspeakers referred to as a left loudspeaker LS and as a right loudspeaker RS are used. First channel audio signal frames, for example left channel audio signal frames Lt, are replayed by means of the left loudspeaker LS, while second channel audio signal frames, i.e. right channel audio signal frames Rt, are replayed by means of the right loudspeaker RS. Put differently, the first channel audio signal frames Lt are assigned to the first loudspeaker LS, while the second channel audio signal frames Rt are assigned to the second loudspeaker RS.
Also reconstructed and possibly widened first and second audio signal frames Lf and Rf can be reproduced by means of the two loudspeakers LS and RS. Since the reconstructed first channel audio signal frame Lf can be seen as a representation of the first audio signal frame Lt and since the reconstructed second channel audio signal frame Rf can be seen as a representation of the second audio signal frame Rt, the first channel audio signal frame Lt is assigned to the first loudspeaker LS and the second channel audio signal frame Rt is assigned to the second loudspeaker RS in this case, too.
The position A can be expressed relative to the loudspeakers LS and RS. For instance, the distance from position A to the first loudspeaker LS is S1, while the distance from position A to the second loudspeaker RS is S2. The distances S1 and S2 are equal. Furthermore, the direction from position A to the first loudspeaker LS can be described by the angle ΘL and the direction from position A to the second loudspeaker RS can be described by the angle ΘR. In the present example, ΘL is 120° and ΘR is 60°. Thus, the triangle A, RS, LS is an equilateral triangle. The first imaginary line S3 is perpendicular to the second imaginary line S4 that connects the first loudspeaker LS and the second loudspeaker RS. The first imaginary line S3 passes through the position A. As an example, the listener may localize a sound component of the replayed first channel audio signal frame Lt or of the second channel audio signal frame Rt at a position P which can be described by its distance to the position A of the listener and the direction θ as seen from that position A.
In the following it is explained how a first parameter indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair may be obtained.
In a first step, the energy eLframe of the first channel audio signal frame Lt and the energy eRframe of the second channel audio signal frame Rt are computed. This can be done by calculating the sum of the squared absolute values of the frequency transformed samples fL or fR of the respective frame and then extracting the square root of this sum according to equation (4).
$e_{L_{frame}} = \sqrt{\sum_{n} \left|f_L(n)\right|^2}, \qquad e_{R_{frame}} = \sqrt{\sum_{n} \left|f_R(n)\right|^2} \qquad (4)$
It should be noted that, instead of considering all frequency subbands in the calculation of the energy eLframe of the first channel audio signal frame Lt and the energy eRframe of the second channel audio signal frame Rt, only the most important frequency bands may be considered, for instance the frequency subbands corresponding to the low frequencies. Which frequency subbands are important may depend on the sample rate of the signal. For instance, half of the frequency subbands could be left out in case of a sample rate of 48kHz, whereas, at a sample rate of 8kHz, the frequency band coverage may only be up to 4kHz, so in this case more than half of the bands may probably have to be included in the calculations.
The energies eLframe and eRframe of preceding frames are also taken into account according to the pseudocode given in equation (5):

$e_{L_{frame}} = e_{L_{frame}} \cdot \beta + e_{L_{frame}} \cdot (1-\beta), \qquad e_{R_{frame}} = e_{R_{frame}} \cdot \beta + e_{R_{frame}} \cdot (1-\beta) \qquad (5)$
According to equation (5), with each update the sum of the former value of the variable eLframe weighted with β and the current value of eLframe from equation (4) weighted with (1-β) is assigned to eLframe; eRframe is computed accordingly. As an example, β may have the value 0.95. It may thus happen that eLframe and eRframe change only slightly with each update, and eLframe and eRframe can be considered to represent the long-term energies of the first and second channels, respectively. eLframe and eRframe are initialized to zero so that they can be determined even if there has not been a preceding audio signal frame pair.
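A minimal sketch of equations (4) and (5), with the in-place update of equation (5) written out explicitly (variable names are hypothetical):

```python
import numpy as np

beta = 0.95
e_l_frame = 0.0          # long-term energies, initialized to zero
e_r_frame = 0.0

def frame_energy(f_x, important_bins=None):
    """Eq. (4): square root of the sum of squared magnitudes.

    important_bins may restrict the sum to the perceptually important
    (e.g. low-frequency) bins, depending on the sample rate.
    """
    x = f_x if important_bins is None else f_x[important_bins]
    return np.sqrt(np.sum(np.abs(x) ** 2))

def update_long_term(l_f, r_f):
    """Eq. (5): leaky integration of the current frame energies."""
    global e_l_frame, e_r_frame
    e_l_frame = e_l_frame * beta + frame_energy(l_f) * (1.0 - beta)
    e_r_frame = e_r_frame * beta + frame_energy(r_f) * (1.0 - beta)
    return e_l_frame, e_r_frame
```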
In the next step, the direction ΘL from the assumed or known position A of the listener to the first loudspeaker LS is weighted with eLframe, i.e. with the sum of the energy of the first channel audio signal frame Lt and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair. Accordingly, the direction ΘR from the assumed or known position A of the listener to the second loudspeaker RS is weighted with eRframe, i.e. with the sum of the energy of the second channel audio signal frame Rt and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair. Normalization with the sum of the energies eLframe and eRframe is performed.
$a\_r_{frame} = \dfrac{e_{L_{frame}} \cdot \cos(\theta_L) + e_{R_{frame}} \cdot \cos(\theta_R)}{e_{L_{frame}} + e_{R_{frame}}}, \qquad a\_l_{frame} = \dfrac{e_{L_{frame}} \cdot \sin(\theta_L) + e_{R_{frame}} \cdot \sin(\theta_R)}{e_{L_{frame}} + e_{R_{frame}}} \qquad (6)$
a_r_frame and a_l_frame can be seen as a coordinate pair representing the overall localized position of sound components within the pair of audio signal frames Lt and Rt and preceding audio signal frames. For instance, a_r_frame and a_l_frame can describe the position of the point P in Fig. 4. The angle or direction θframe is an indicator of the perceived direction of the audio signal frame pair Lt and Rt.
θframe can be a good representation of the perceived direction of the audio signal frame pair Lt and Rt and the at least one preceding audio signal frame pair:

$\theta_{frame} = \angle\!\left(a\_r_{frame},\; a\_l_{frame}\right) \qquad (7)$

i.e. the direction of the point with the coordinates a_r_frame and a_l_frame as seen from the position A. In other words, θframe may be seen as a representation of a long-term estimate of a perceived direction of the audio signal.
Knowing the direction from the assumed or known position A of the listener to the first loudspeaker LS and the direction from the assumed or known position A of the listener to the second loudspeaker RS, the first parameter θframe may be determined with low computational effort.
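The computation of equations (6) and (7) can be sketched as follows, assuming the loudspeaker directions of Fig. 4 and angles expressed in degrees (function name hypothetical):

```python
import math

THETA_L = 120.0        # direction of the left loudspeaker LS, degrees (Fig. 4)
THETA_R = 60.0         # direction of the right loudspeaker RS, degrees

def frame_direction(e_l_frame, e_r_frame):
    """Eqs. (6)-(7): energy-weighted direction of the frame pair, in degrees."""
    total = e_l_frame + e_r_frame
    if total == 0.0:
        return 90.0                       # no energy: fall back to the centre line (assumption)
    a_r = (e_l_frame * math.cos(math.radians(THETA_L)) +
           e_r_frame * math.cos(math.radians(THETA_R))) / total
    a_l = (e_l_frame * math.sin(math.radians(THETA_L)) +
           e_r_frame * math.sin(math.radians(THETA_R))) / total
    return math.degrees(math.atan2(a_l, a_r))

print(frame_direction(1.0, 1.0))          # equal long-term energies -> approx. 90 degrees
```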
The difference θframe-90° can also be seen as a first parameter indicative of a perceived direction of an audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair or, in the time domain, of the pair Lt and Rt and at least one preceding audio signal frame pair.
It is indicative of a perceived direction of the audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair relative to a first imaginary line S3 passing through the assumed or known position A of the listener, the first imaginary line S3 being perpendicular to a second imaginary line S4 connecting the first loudspeaker LS and the second loudspeaker RS.
The angular offset θframe - 90° relative to the first imaginary line S3 may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf.
In the following it is explained how a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of an audio signal frame pair, can be obtained.
First, the frequency domain audio signal frames Lf and Rf are divided into M frequency subbands m, wherein 0 ≤ m < M. sbOffset[m] is the lower bound of frequency subband m. The frequency subbands m, i.e. their bounds, may be identical for the first channel audio signal frame Lf and the second channel audio signal frame Rf. Moreover, they may also be identical to the frequency subbands of the reconstructed audio signal frame pair Lf and Rf.
The frequency subbands m of the audio signal frame pair Lf and Rf are non-uniform frequency subbands corresponding to the human auditory filters. Namely, they are equivalent rectangular bandwidth (ERB) frequency subbands. The human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies. By splitting the audio signal frame pair Lf and Rf into non-uniform frequency subbands m corresponding to the human auditory filters, determination whether a gain should be applied to a frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced widened auditory spatial images.
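The concrete subband boundaries sbOffset[m] are not specified above. As an illustration only, the following C sketch derives non-uniform boundaries by spacing them uniformly on the ERB-rate scale using the Glasberg-Moore approximation, which is one common way to obtain ERB-like subbands; the actual table used by a codec may differ.

#include <math.h>

/* Illustrative construction of non-uniform, ERB-like subband boundaries.
 * n_bins is the number of frequency domain samples per frame and
 * sample_rate the sampling frequency in Hz; sbOffset must hold M+1 entries,
 * sbOffset[m] being the lower bound of subband m. */
static double hz_to_erb_rate(double f) { return 21.4 * log10(1.0 + 0.00437 * f); }
static double erb_rate_to_hz(double e) { return (pow(10.0, e / 21.4) - 1.0) / 0.00437; }

static void make_sb_offsets(int *sbOffset, int M, int n_bins, double sample_rate)
{
    double e_max = hz_to_erb_rate(sample_rate / 2.0);
    for (int m = 0; m <= M; m++) {
        double f = erb_rate_to_hz(e_max * (double)m / (double)M);   /* band edge in Hz */
        int bin = (int)((f / (sample_rate / 2.0)) * (double)n_bins);
        if (bin > n_bins) bin = n_bins;
        sbOffset[m] = bin;
    }
    sbOffset[0] = 0;        /* ensure the full spectrum is covered */
    sbOffset[M] = n_bins;
}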
In the next step the energies eL in each frequency subband m of the first channel audio signal frame Lf and the energies eR in each frequency subband m of the second channel audio signal frame Rf are determined:

e_{L_m} = \sum_{j=sbOffset[m]}^{sbOffset[m+1]-1} L_f(j)^2, \qquad e_{R_m} = \sum_{j=sbOffset[m]}^{sbOffset[m+1]-1} R_f(j)^2    (8)
The energies eL and eR can be considered to represent the short-term energies of the first and second channels in each frequency subband m, respectively.
Subsequently, the direction ΘL from the assumed or known position A of the listener to the first loudspeaker LS, weighted with the energy eL in the respective frequency subband m of the first channel audio signal frame Lf, which corresponds to the energy of said frame in the time domain Lt, is calculated for each subband m. Accordingly, the direction ΘR from the assumed or known position A of the listener to the second loudspeaker RS, weighted with the energy eR in the respective frequency subband m of the second channel audio signal frame Rf, which corresponds to the energy of said frame in the time domain Rt, is calculated for each subband m. Normalization with the sum of the energies eL and eR is performed:

a\_r_m = \frac{e_{L_m} \cos(\theta_L) + e_{R_m} \cos(\theta_R)}{e_{L_m} + e_{R_m}}, \qquad a\_l_m = \frac{e_{L_m} \sin(\theta_L) + e_{R_m} \sin(\theta_R)}{e_{L_m} + e_{R_m}}    (9)
Similar to the explanations given above with respect to the first parameter, a directional angle θm can be derived from a_r_m and a_l_m:

\theta_m = \tan^{-1}\!\left(\frac{a\_l_m}{a\_r_m}\right)    (10)
The angles or direction angles θm are second parameters indicative of the perceived direction of the respective frequency subband m of the audio signal frame pair Lf and Rf, and thus of the audio signal frame pair Lt and Rt in the time domain. In other words, the angles or direction angles θm may be seen as indicative of a short-term estimate of a perceived direction of the respective frequency subband m of the audio signal frame pair.
The second parameters θm of the set of second parameters are obtainable from the direction from the assumed or known position of the listener A to the first loudspeaker LS weighted with the first channel audio signal frame energy eL within the respective frequency subband m , and the direction from the assumed or known position A of the listener to the second loudspeaker RS weighted with the second channel audio signal frame energy eR within the respective frequency subband m .
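Again purely as an illustration, the C sketch below shows how the per-subband energies and second parameters of equations (8) to (10) might be computed. The reconstruction of equation (8) as a sum of squared spectral samples, the conversion of the result to degrees and the handling of silent subbands are assumptions of this sketch.

#include <math.h>

#define PI_VAL 3.14159265358979323846

/* Sketch of equations (8)-(10): per-subband energies and short-term
 * direction angles theta_m. Lf and Rf are the frequency domain frames,
 * sbOffset[] holds the M+1 subband boundaries, theta_L_deg and theta_R_deg
 * the loudspeaker directions in degrees; theta_m_deg receives the M second
 * parameters in degrees. */
static void directions_per_subband(const double *Lf, const double *Rf,
                                   const int *sbOffset, int M,
                                   double theta_L_deg, double theta_R_deg,
                                   double *theta_m_deg)
{
    const double tL = theta_L_deg * PI_VAL / 180.0;
    const double tR = theta_R_deg * PI_VAL / 180.0;
    for (int m = 0; m < M; m++) {
        double eL = 0.0, eR = 0.0;
        for (int j = sbOffset[m]; j < sbOffset[m + 1]; j++) {   /* eq. (8) */
            eL += Lf[j] * Lf[j];
            eR += Rf[j] * Rf[j];
        }
        double norm = eL + eR;
        if (norm <= 0.0) { theta_m_deg[m] = 90.0; continue; }   /* silent subband: treat as centred */
        double a_r = (eL * cos(tL) + eR * cos(tR)) / norm;      /* eq. (9) */
        double a_l = (eL * sin(tL) + eR * sin(tR)) / norm;      /* eq. (9) */
        theta_m_deg[m] = atan2(a_l, a_r) * 180.0 / PI_VAL;      /* eq. (10) */
    }
}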
The differences θm − 90° can also be seen as a set of second parameters indicative of a perceived direction of the respective frequency subband m of the audio signal frame pair Lf and Rf.
They are indicative of a perceived direction of a respective frequency subband m of the audio signal frame pair Lf and Rf relative to a first imaginary line S3 passing through the assumed or known position A of the listener, the first imaginary line S3 being perpendicular to a second imaginary line S4 connecting the first loudspeaker LS and the second loudspeaker RS.
By describing the above perceived directions θm relative to the first imaginary line S3, the perceived directions θm can be arranged in two groups, wherein the first group comprises perceived directions having a positive angular offset relative to the first imaginary line S3 and the second group comprises perceived directions having a negative angular offset relative to the first imaginary line S3. The angular offset relative to the first imaginary line S3 may also be thought of as an indicator for the amount of spatial effects contained in the respective subband m of the audio signal frame pair Lf and Rf. This indicator may also be considered in deciding whether a gain should be applied to a specific subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf.
The parameter aveDiff can be seen as an averaged representation of the set of second parameters θm:

aveDiff = \frac{1}{M} \sum_{m=0}^{M-1} \left|\theta_m - 90^\circ\right|    (11)
Alternatively, the computation of parameter aveDiff may consider only a subset of the second parameters θm, for example θm corresponding to a selected number of lowest frequency bands, instead of considering the full set of second parameters θm for the M frequency bands as in equation (11) .
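A minimal C sketch of the averaging of equation (11) as reconstructed above is given below; n_bands may be M or, as just mentioned, a smaller number of lowest frequency subbands.

#include <math.h>

/* aveDiff as the average absolute deviation of the second parameters
 * theta_m (in degrees) from 90 degrees, cf. equation (11). */
static double ave_diff(const double *theta_m_deg, int n_bands)
{
    double sum = 0.0;
    for (int m = 0; m < n_bands; m++)
        sum += fabs(theta_m_deg[m] - 90.0);
    return (n_bands > 0) ? sum / (double)n_bands : 0.0;
}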
Considering aveDiff in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or to at least one frequency subband m of a reconstructed second channel audio signal frame Rf , may thus correspond to basing the determination on averaged second parameters .
If aveDiff assumes the value 0, this can be understood as indicating that no spatial effects are contained within the pair of audio signal frames Lf and Rf, i.e. that the first audio signal frame Lf and the second audio signal frame Rf are identical. On the other hand, a value of 90° indicates maximum spatial effects within the pair of audio signal frames Lf and Rf, i.e. they differ completely from each other and are not correlated.
An advantage of basing the determination whether a gain should be applied to at least one frequency subband m on the averaged parameter aveDiff can be that a fast determination may be enabled. Checking if each second parameter θm meets a certain condition may require more time or higher processing power. This may be even more significant if several conditions have to be met.
Fig. 5 is a flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention.
According to this embodiment, whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame is taken into account in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first or second channel audio signal frame Lf and Rf , respectively.
To this end, it is stored in the array st_wideArray whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame. st_wideArray is initialized to zero at start-up. It is updated after each audio signal frame pair by performing a left shift, thereby discarding the flag indicating whether a gain has been applied to the oldest reconstructed audio signal frame pair represented by st_wideArray. The flag for the current reconstructed audio signal frame pair Lf and Rf is stored at the rightmost position st_wideArray[0] in the array. In the present example, eight flags are stored in st_wideArray.
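The array st_wideArray can, for example, be realized as an eight-bit shift register; the following C sketch shows such a realization, which is an implementation choice made here for illustration and not mandated by the description above.

#include <stdint.h>

static uint8_t st_wideArray = 0;               /* initialized to zero at start-up */

/* Shift in the flag of the current reconstructed audio signal frame pair at
 * the rightmost position (bit 0), discarding the flag of the oldest pair. */
static void update_widening_history(int widening_enabled)
{
    st_wideArray = (uint8_t)((st_wideArray << 1) | (widening_enabled ? 1u : 0u));
}

/* True if a gain has been applied to each of the last n frames; n = 8
 * corresponds to comparing st_wideArray with the binary number 11111111. */
static int widened_last_n_frames(int n)
{
    uint8_t mask = (uint8_t)((1u << n) - 1u);
    return (st_wideArray & mask) == mask;
}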
In step 1, it is checked whether a gain has been applied to the last eight frames by comparing st_wideArray to the binary number 11111111. It is also checked whether the conditions aveDiff < 10° and |θframe − 90°| < 10° are met. If true, applying a gain to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is not enabled. In other words, auditory spatial image widening is not enabled. The flag widening_enabled is used to indicate whether or not a gain should be applied. The value 1 corresponds to enabled auditory spatial image widening and the value 0 corresponds to disabled auditory spatial image widening.
A listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands m of the current reconstructed audio signal frames Lf and Rf. Thus, it may not be necessary to apply a gain to any of the subbands m of the currently perceived reconstructed audio signal frames Lf and Rf.
In step 2, it is checked whether the conditions aveDiff < 10° and |θframe − 90°| < 10° are met.
If false, i.e. if at least one of them is greater than 10°, it is determined that auditory spatial image widening should be applied. Being greater than 10°, the amount of spatial effects in the audio signal frame pair Lt and Rt is rather large. Thus, a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or to at least one frequency subband m of the reconstructed second channel audio signal frame Rf in order to obtain a reconstructed auditory spatial image closely resembling the original auditory spatial image of the audio signal frame pair Lt and Rt.
If the check of step 2 yields a positive result, a third check is performed in step 4. This time, it is determined if the conditions aveDiff > 9° and |θframe − 90°| > 9° are met and if widening has been enabled for the past two frames. Thus, a somewhat lesser amount of spatial effects in the audio signal frame pair Lt and Rt is sufficient to cause determination that widening should be applied. Among other reasons, this stems from the fact that if a gain has been applied to a subband of a preceding reconstructed audio signal frame it may be advisable to also apply a gain to at least one subband m of the current reconstructed audio signal frames Lf and Rf so as not to cause sudden changes in the perceived auditory spatial image. Moreover, as widening has not been applied to the past eight frames, but only to a smaller number of frames, it will make a perceptual difference whether or not widening is applied to at least one subband m of the current reconstructed audio signal frames Lf and Rf.
Similarly, step 5 comprises checking if the conditions aveDiff > 8° and |θframe − 90°| > 8° are met and if widening has been enabled for the past six frames.
If the checks in step 5 are negative, step 6 is entered. It is checked if g_enc < 2 and aveDiff < 10° hold, with g_enc being obtained from the frequency subband specific indicator ratios of MAX(a_m, b_m) and MIN(a_m, b_m). The variables a_m and b_m and the functions MAX() and MIN() will be explained in more detail below (see the discussion of equation (13)). If this is not the case, it is determined that auditory spatial image widening should be applied.
By taking into account the amount of spatial effects in the audio signal frame pair Lt and Rt as well as frequency subband specific spatial effect information and the audio signal history according to the exemplary embodiment of the present invention currently discussed, the determination whether auditory spatial image widening should be applied is based on properties of human auditory perception. In consequence, reconstructed auditory spatial images may be obtained that closely resemble the original auditory spatial image.
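For illustration only, the decision of Fig. 5 can be sketched in C as follows. All angles are in degrees, the history arguments correspond to the checks against st_wideArray, and g_enc is assumed here to behave like an average of the subband indicator ratios MAX(a_m, b_m)/MIN(a_m, b_m) discussed in connection with equation (13) below; this interpretation of g_enc is a reconstruction, not a quotation.

#include <math.h>

/* Sketch of the widening decision of Fig. 5. Returns 1 if widening should
 * be enabled (a gain should be applied), 0 otherwise. */
static int decide_widening(double aveDiff, double theta_frame_deg, double g_enc,
                           int widened_last8, int widened_last2, int widened_last6)
{
    double off = fabs(theta_frame_deg - 90.0);

    /* Step 1: widening applied to the last eight frames and little spatial
     * content left -> do not enable widening. */
    if (widened_last8 && aveDiff < 10.0 && off < 10.0)
        return 0;

    /* Step 2: if the spatial effects are not small, enable widening. */
    if (!(aveDiff < 10.0 && off < 10.0))
        return 1;

    /* Step 4: slightly lower thresholds if widening was enabled for the past two frames. */
    if (aveDiff > 9.0 && off > 9.0 && widened_last2)
        return 1;

    /* Step 5: still lower thresholds if widening was enabled for the past six frames. */
    if (aveDiff > 8.0 && off > 8.0 && widened_last6)
        return 1;

    /* Step 6: enable widening unless the subband energies are similar
     * (assumed meaning of g_enc, see the note above) and the direction spread is small. */
    if (!(g_enc < 2.0 && aveDiff < 10.0))
        return 1;

    return 0;
}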
In the following it is explained how it can be determined for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf , wherein discrimination whether the gain should be applied to the respective subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is based on the second parameter θm associated with the respective frequency subband m .
To this end, if it has been determined that a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf, i.e. the left channel audio signal frame, or to at least one frequency subband m of the reconstructed second channel audio signal frame Rf, i.e. the right channel audio signal frame, a binary value flags_widening(m) is determined for each frequency subband m indicating whether a gain should be applied to said frequency subband m of the reconstructed left channel audio signal frame Lf or to said frequency subband m of the reconstructed right channel audio signal frame Rf. If θm is greater than 90°, a gain should be applied to the respective frequency subband m of the reconstructed left channel audio signal frame Lf. Otherwise, it should be applied to the respective frequency subband m of the reconstructed right channel audio signal frame Rf.
flags_{widening}(m) = \begin{cases} Left, & \theta_m > 90^\circ \\ Right, & \text{otherwise} \end{cases}, \qquad 0 \le m < M    (12)
Thus, applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair Lt and Rt that is assigned to the loudspeaker for which the direction from the assumed or known position A of the listener to said loudspeaker is closer to the perceived direction in the respective frequency subband m, as indicated by the second parameter θm of said frequency subband m, is enabled.
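A minimal C sketch of the per-subband channel selection of equation (12) follows; the numeric encodings chosen for Left and Right are illustrative only.

/* Channel selection per frequency subband, cf. equation (12). */
#define CHAN_LEFT  1
#define CHAN_RIGHT 0

static void select_channels(const double *theta_m_deg, int M, int *flags_widening)
{
    for (int m = 0; m < M; m++)
        flags_widening[m] = (theta_m_deg[m] > 90.0) ? CHAN_LEFT : CHAN_RIGHT;
}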
In the following it is explained how the gain can be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf , and an indicator of the minimum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf .
(Equation (13): definitions of the frequency subband specific variables a_m and b_m as indicators of the energy eL in subband m of the first channel audio signal frame Lf and of the energy eR in subband m of the second channel audio signal frame Rf, respectively.)
Defining a_m and b_m according to equation (13), MAX(a_m, b_m) is an indicator of the maximum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf. Similarly, MIN(a_m, b_m) is an indicator of the minimum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf. The gain g_widening is then given by equation (14):
g_{widening} = \frac{1}{\dfrac{1}{M} \sum_{m=0}^{M-1} \dfrac{MAX(a_m, b_m)}{MIN(a_m, b_m)}}    (14)

g_widening is a reciprocal average value of the frequency subband specific indicator ratios of MAX(a_m, b_m) and MIN(a_m, b_m).
With g_widening being the reciprocal average value of the frequency subband specific indicator ratios, the gain g_widening is comparatively small if the subband specific indicator ratios are large. Large subband specific indicator ratios may occur if the indicator MAX(a_m, b_m) is greater than the indicator MIN(a_m, b_m). In other words, large subband specific indicator ratios may occur if the energies of the first channel audio signal frame Lf and the second channel audio signal frame Rf in the respective subband m differ significantly. In consequence, applying the gain g_widening may yield a widened reconstructed auditory spatial image in which subtle inter-channel differences are emphasized more than prominent inter-channel differences. Thus the widened reconstructed auditory spatial image may closely resemble the original auditory spatial image.
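As an illustration, the gain of equations (13) and (14) can be computed as sketched below in C. Taking a_m and b_m directly as the subband energies eL and eR and adding a small floor against division by zero are assumptions made for this sketch.

/* Widening gain as the reciprocal of the average subband indicator ratio,
 * cf. equations (13) and (14). eL_m and eR_m hold the energies of the M
 * frequency subbands of the first and second channel frames. */
static double widening_gain(const double *eL_m, const double *eR_m, int M)
{
    double sum = 0.0;
    for (int m = 0; m < M; m++) {
        double a = eL_m[m];                          /* a_m (assumed: subband energy eL) */
        double b = eR_m[m];                          /* b_m (assumed: subband energy eR) */
        double mx = (a > b) ? a : b;                 /* MAX(a_m, b_m) */
        double mn = (a < b) ? a : b;                 /* MIN(a_m, b_m) */
        sum += mx / (mn > 1e-12 ? mn : 1e-12);       /* indicator ratio */
    }
    return (double)M / sum;                          /* reciprocal average, eq. (14) */
}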
Gain quantization may enable representation of the gain by a reduced number of bits. If the gain is to be applied to a frequency subband, transmitting it to an entity configured to apply it may only require a reduced bandwidth.
A quantized representation of g_widening can be obtained using the quantization table qTbl(k): qTbl(k) = 2025k, 0 ≤ k < K. (15)
In the present example, K is 16. The quantized gain can be chosen as the quantized value qTbl(k) minimizing the error distance to the unquantized gain value g_widening.
To further reduce the required bitrate for representing the gain, for example for transmission or storage, a lookup table can be used. The entity configured to determine the gain can then derive the index widening_gain_idx of the respective quantized value. An entity configured to apply the gain can retrieve the quantized gain g_widening from the lookup table using the index widening_gain_idx:

g_{widening} = qTbl[widening\_gain\_idx]    (16)
The index widening_gain_idx can possibly be represented by fewer bits than the quantized gain g_widening itself.
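The following C sketch illustrates nearest-value quantization and the table lookup in the spirit of equations (15) and (16); the table contents are deliberately left as a parameter, since the concrete qTbl entries are not reproduced here.

#include <math.h>

#define K_ENTRIES 16   /* K = 16 in the present example */

/* Choose the table entry minimizing the error distance to the unquantized
 * gain and return its index widening_gain_idx, cf. equation (15). */
static int quantize_gain(double g_widening, const double qTbl[K_ENTRIES])
{
    int best = 0;
    double best_err = fabs(g_widening - qTbl[0]);
    for (int k = 1; k < K_ENTRIES; k++) {
        double err = fabs(g_widening - qTbl[k]);
        if (err < best_err) { best_err = err; best = k; }
    }
    return best;
}

/* Retrieve the quantized gain from the lookup table, cf. equation (16). */
static double dequantize_gain(int widening_gain_idx, const double qTbl[K_ENTRIES])
{
    return qTbl[widening_gain_idx];
}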
Fig. 6 is a flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention.
It is similar to the flowchart of Fig. 2. Hence, corresponding method steps are not repeatedly discussed. In contrast to the flowchart of Fig. 2, it is determined in step 112 whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf . The results obtained in method step 112 are also incorporated into a bitstream to be provided for a parametric audio decoder.
Fig. 7 is a more detailed illustration of method step 112 of Fig. 6.
In step 113, the first parameter θframe as defined by equation (7) is calculated. Step 114 comprises calculating the set of second parameters θm as defined by equation (10). In step 115 the value θframe − 90° is computed, while in step 116 the values θm − 90° are determined. Subsequently, in step 117, aveDiff is determined according to equation (11). In step 118, the actual decision whether a gain should be applied is performed as elucidated above with respect to Fig. 5. If gain application is not to be carried out, merely the flag widening_enabled = 0 comprising this information is to form part of the bitstream to be provided for a parametric audio decoder. If a gain should be applied, it is computed in step 120 in accordance with equations (13) to (16). Also, the flags flags_widening(m) according to equation (12) are determined in step 119. The index widening_gain_idx and the flags flags_widening(m) are multiplexed into the bitstream along with the flag widening_enabled = 1 indicating that the gain should be applied, the mono signal and the spatial extension information.
Fig. 8 is a flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6 including gain application. It is similar to the flowchart of Fig. 3, but in step 121 the index widening_gain_idx, the flags flags_widening(m) and the flag widening_enabled = 1 are used to apply a gain to frequency subbands m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf. Audio signal frames Lt and Rt are obtained in step 122 by frequency-to-time transformation. The actual gain application can be performed by multiplying the samples fL of the reconstructed first channel audio signal frame Lf or the samples fR of the reconstructed second channel audio signal frame Rf by g_widening if the flags flags_widening(m) indicate to do so.
The following pseudocode also describes this process:

for (i = 0; i < M; i++) {
    if (widening_enabled == 1) {
        for (j = sbOffset[i]; j < sbOffset[i + 1]; j++) {
            if (flags_widening(i) == Right)
                fR(j) = fR(j) * g_widening;
            else
                fL(j) = fL(j) * g_widening;
        }
    }
}
In another embodiment, it is also possible to determine for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf, wherein discrimination whether the gain should be applied to the respective subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is based on the inter-channel level difference icldL(m) in said frequency subband m of the reconstructed first channel audio signal frame Lf and on the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf.
flags_{widening}(m) = \begin{cases} Left, & icld_L(m) > icld_R(m) \\ Right, & \text{otherwise} \end{cases}, \qquad 0 \le m < M    (17)
The gain g_widening can also be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum MAX(icldL(m), icldR(m)) of the inter-channel level difference icldL(m) in the frequency subband m of the reconstructed first channel audio signal frame Lf and the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf, and an indicator of the minimum MIN(icldL(m), icldR(m)) of the inter-channel level difference icldL(m) in the frequency subband m of the reconstructed first channel audio signal frame Lf and the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf:
g_{widening} = \frac{1}{\dfrac{1}{M} \sum_{m=0}^{M-1} \dfrac{MAX(icld_L(m), icld_R(m))}{MIN(icld_L(m), icld_R(m))}}    (18)
The parameters determining the inter-channel level differences icldL(m) and icldR(m) may anyway form part of the spatial extension information needed in order to establish these differences between the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frames Lf and Rf, i.e. at a decoder. To this end, it may not be necessary to provide said entity with additional parameters for determining for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf. Also, the gain can be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf at the decoder.
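For the decoder-side variant of equations (17) and (18), a corresponding C sketch is given below. It reuses the CHAN_LEFT/CHAN_RIGHT encoding of the earlier sketch and assumes that icldL(m) and icldR(m) are available as positive level values derived from the spatial extension information; both assumptions are made for illustration only.

/* Channel selection and widening gain from the inter-channel level
 * differences, cf. equations (17) and (18). */
static double widening_gain_from_icld(const double *icld_L, const double *icld_R,
                                      int M, int *flags_widening)
{
    double sum = 0.0;
    for (int m = 0; m < M; m++) {
        flags_widening[m] = (icld_L[m] > icld_R[m]) ? CHAN_LEFT : CHAN_RIGHT;   /* eq. (17) */
        double mx = (icld_L[m] > icld_R[m]) ? icld_L[m] : icld_R[m];
        double mn = (icld_L[m] < icld_R[m]) ? icld_L[m] : icld_R[m];
        sum += mx / (mn > 1e-12 ? mn : 1e-12);                                  /* indicator ratio */
    }
    return (double)M / sum;                                                     /* eq. (18) */
}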
Fig. 9 is a schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention.
The apparatus comprises a processor 201 configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The processor 201 can also be seen as means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The processor 201 may be part of an audio encoder or an audio decoder, to name but a few examples. For instance, processor 201 may be configured to implement the steps of the flowcharts of Figs. 5-7 or 8.
Fig. 10 is a schematic illustration of a second exemplary embodiment of an apparatus 203 according to the first aspect of the present invention. The apparatus 203 forms part of a parametric audio encoder 202. It comprises a processor 204. In turn, the processor 204 comprises a frequency subband selection circuit 205 and a parameter determination circuit 206.
The processor 204 is configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. Processor 204 thus may be configured to implement the steps of the flowcharts of Fig. 5-7.
The first parameter and the second parameter can be determined by means of the parameter determination circuit 206. Processor 204 is thus also configured to calculate the first parameter and the set of second parameters (see steps 113 and 114 of Fig. 7) . The parameter determination circuit 206 may also be thought of as means for calculating the first parameter and the set of second parameters and so may the processor 204. For parametric audio encoding, the parametric audio encoder 202 can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. Thus, information that may be necessary for the parameter determination circuit 206 for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder 202. As the processor 204 and thus the parameter determination circuit 206 form part of the parametric audio encoder 202, this information is available to the parameter determination circuit 206.
The frequency subband selection circuit 205 is configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. Hence, the processor 204 can be considered to be configured accordingly. The frequency subband selection circuit 205 or the processor 204 may thus also be seen as means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
Fig. 11 is a schematic illustration of an embodiment of a readable medium 300 according to the fourth aspect of the present invention.
In this example the readable medium 300 is a computer- readable medium. A program 301 according to the fifth aspect of the present invention is stored thereon. The program 301 comprises program code 302. When executed by a processor, the instructions of the program code 302 cause a processor (for instance processor 201 of Fig. 9 or processor 204 of Fig. 10) to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The program 301 can also be considered as a program according to the third aspect of the present invention.
It is to be understood with respect to all of the above embodiments that relate to a processor that the processor may for instance be implemented in hardware alone, may have certain aspects implemented in software, or may be a combination of hardware and software. The processor may either be a separate module or it may be a subcomponent of a module such as, for example, a processor or an application specific integrated circuit (ASIC) that has other functional components or structures, too.
Furthermore, it is readily clear for a person skilled in the art that the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software. The presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable devices. The computer software may be stored in a variety of computer-readable storage media of electric, magnetic, electro-magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor. To this end, the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.
The invention has been described above by means of embodiments, which shall be understood to be exemplary and non-limiting. In particular, it should be noted that there are alternative ways and variations which are obvious to a skilled person in the art and can be implemented without deviating from the scope and spirit of the appended claims. It should also be understood that the sequence of all method steps presented above is not mandatory, also alternative sequences may be possible.

Claims

1. An apparatus, comprising: a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
2. The apparatus of claim 1, wherein the apparatus comprises a processor configured to calculate the first parameter and the set of second parameters.
3. The apparatus of any of the claims 1-2, wherein the apparatus forms part of a parametric audio encoder.
4. The apparatus of any of the claims 1-3, wherein the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener with respect to a first loudspeaker to which the first channel audio signal frame is assigned and with respect to a second loudspeaker to which the second channel audio signal frame is assigned.
5. The apparatus of claim 4, wherein the first parameter is obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
6. The apparatus of claim 4 or 5, wherein the second parameters of the set of second parameters are obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
7. The apparatus of any of the claims 4-6, wherein the first parameter and the second parameters are indicative of a perceived direction relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker .
8. The apparatus of any of the claims 1-7, wherein the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters .
9. The apparatus of any of the claims 1-8, wherein the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
10. The apparatus of any of the claims 1-9, wherein the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
11. The apparatus of any of the claims 1-10, wherein the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
12. The apparatus of any of the claims 1-10, wherein the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
13. The apparatus of any of the claims 1-12, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and
- an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
14. The apparatus of any of the claims 1-12, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and
- an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
15. The apparatus of any of the claims 1-14, wherein the gain is a quantized value.
16. The apparatus of claim 15, wherein the gain is obtainable from a lookup table.
17. An apparatus, comprising: means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
18. A method, comprising: determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
19. The method of claim 18, wherein it further comprises calculating the first parameter and the set of second parameters .
20. The method of any of the claims 18-19, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side.
21. The method of any of the claims 18-20, wherein the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
22. The method of claim 21, wherein the first parameter is obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
23. The method of any of the claims 21-22, wherein the second parameters of the set of second parameters are obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
24. The method of any of the claims 21-23, wherein the first parameter and the second parameters are indicative of a perceived direction relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker .
25. The method of any of the claims 18-24, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters .
26. The method of any of the claims 18-25, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame.
27. The method of any of the claims 18-26, wherein the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
28. The method of any of the claims 18-27, wherein it comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
29. The method of any of the claims 18-27, wherein it comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
30. The method of any of the claims 18-29, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of: - an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and
- an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
31. The method of any of the claims 18-29, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and
- an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
32. The method of any of the claims 18-31, wherein the gain is a quantized value.
33. The method of claim 32, wherein the gain is obtainable from a lookup table.
34. A program comprising: program code for performing the method according to any of the claims 18-33, when the program is executed on a processor.
PCT/EP2008/068371 2008-12-30 2008-12-30 Parametric audio coding WO2010075895A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Publications (1)

Publication Number Publication Date
WO2010075895A1 true WO2010075895A1 (en) 2010-07-08

Family

ID=40414085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Country Status (1)

Country Link
WO (1) WO2010075895A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9478224B2 (en) 2013-04-05 2016-10-25 Dolby International Ab Audio processing system
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007080225A1 (en) * 2006-01-09 2007-07-19 Nokia Corporation Decoding of binaural audio signals

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007080225A1 (en) * 2006-01-09 2007-07-19 Nokia Corporation Decoding of binaural audio signals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAKKA J: "Binaural to Multichannel Audio Upmix", THESIS HELSINKI UNIVERSITY OF TECHNOLOGY, 6 June 2005 (2005-06-06), pages 34, XP007907636 *
VILLE PULKKI: "Spatial Sound Generation and Perception by Amplitude Panning Techniques", PHD DISSERTATION, 3 August 2001 (2001-08-03), Helsinki, FI, pages 1 - 42, XP007907630 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9478224B2 (en) 2013-04-05 2016-10-25 Dolby International Ab Audio processing system
US9812136B2 (en) 2013-04-05 2017-11-07 Dolby International Ab Audio processing system
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10057808B2 (en) 2013-09-12 2018-08-21 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10383003B2 (en) 2013-09-12 2019-08-13 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10694424B2 (en) 2013-09-12 2020-06-23 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US11297533B2 (en) 2013-09-12 2022-04-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters
US11838798B2 (en) 2013-09-12 2023-12-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters

Similar Documents

Publication Publication Date Title
JP7161564B2 (en) Apparatus and method for estimating inter-channel time difference
US11664034B2 (en) Optimized coding and decoding of spatialization information for the parametric coding and decoding of a multichannel audio signal
KR101782917B1 (en) Audio signal processing method and apparatus
US9449603B2 (en) Multi-channel audio encoder and method for encoding a multi-channel audio signal
US9009057B2 (en) Audio encoding and decoding to generate binaural virtual spatial signals
CA2820351C (en) Apparatus and method for decomposing an input signal using a pre-calculated reference curve
JP2022137052A (en) Multi-channel signal encoding method and encoder
US20120039477A1 (en) Audio signal synthesizing
ES2547232T3 (en) Method and apparatus for processing a signal
US9275646B2 (en) Method for inter-channel difference estimation and spatial audio coding device
CN112262433A (en) Apparatus, method or computer program for estimating inter-channel time difference
GB2574667A (en) Spatial audio capture, transmission and reproduction
US20120195435A1 (en) Method, Apparatus and Computer Program for Processing Multi-Channel Signals
WO2010075895A1 (en) Parametric audio coding
JP2017058696A (en) Inter-channel difference estimation method and space audio encoder
Jansson Stereo coding for the ITU-T G. 719 codec
KR102195976B1 (en) Audio signal processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08875572

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08875572

Country of ref document: EP

Kind code of ref document: A1