WO2010075895A1 - Parametric audio coding - Google Patents

Parametric audio coding

Info

Publication number
WO2010075895A1
WO2010075895A1 (PCT/EP2008/068371)
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
signal frame
channel audio
reconstructed
frequency subband
Prior art date
Application number
PCT/EP2008/068371
Other languages
French (fr)
Inventor
Juha Petteri OJANPERÄ
Original Assignee
Nokia Corporation
Priority date
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/EP2008/068371 priority Critical patent/WO2010075895A1/en
Publication of WO2010075895A1 publication Critical patent/WO2010075895A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • This invention relates to parametric audio coding.
  • Channel capacity for transmitting an audio signal or storage capacity for storing an audio signal is often limited.
  • Audio coding is applied to reduce the required bitrate. It has been the industry's constant goal to develop audio coding techniques enabling high quality audio signal reconstruction while reducing the required bitrate to a minimum. This is especially true for multichannel audio coding where, compared to single channel audio signals, an even larger amount of information has to be dealt with.
  • a first apparatus comprising a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • a second apparatus comprising means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the first apparatus as well as the second apparatus according to the first aspect of the present invention can be a module that forms part of or is to form part of another apparatus (such as for instance an audio encoder or decoder), for instance a processor, or it can be a separate apparatus.
  • a method comprising determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • a program comprising program code for performing the method according to the second aspect of the present invention, when the program is executed on a processor.
  • the program may for instance be a computer program that is readable by a computer or processor.
  • the program code may then for instance be computer program code.
  • the program may for instance be distributed via a network, such as for instance the Internet.
  • the program may for instance be stored on a tangible readable medium, for instance a computer-readable or processor-readable medium.
  • the readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
  • a readable storage medium encoded with instructions that, when executed by a processor, perform the method according to the second aspect of the present invention is disclosed.
  • the readable storage medium may for instance be a computer-readable or processor-readable storage medium. It may be embodied as an electric, magnetic, electromagnetic, optic or other storage medium, and may either be a removable storage medium or a storage medium that is fixedly installed in an apparatus or device.
  • a program which causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the program may for instance be a computer program that is readable and/or executable by a computer or processor.
  • the program may for instance be a computer program with computer program code.
  • the program may be stored on a tangible readable medium, for instance a computer- readable or processor-readable medium.
  • the readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
  • auditory spatial image may be used to refer to the listener's spatial perception of components of an audio signal.
  • Human listeners are capable of perceiving an auditory spatial image by interpreting differences of the signals received by the left and by the right ear. Among these differences are the interaural level difference (ILD) and the interaural time difference (ITD). Diffraction and reflection may be caused by at least some of the listener's body parts, thus affecting the audio signal on its way from a sound source to the listener's eardrum.
  • a head related transfer function (HRTF) describes the transform an audio signal may be subject to due to these diffraction and reflection phenomena. It is direction-dependent. Hence, being accustomed to the listener's individual HRTF, the listener's brain can derive directional cues from the interaural signal difference caused by the HRTF.
  • An exemplary multichannel audio signal is a stereo audio signal.
  • a stereo audio signal comprises a first channel audio signal and a second channel audio signal.
  • the first channel and the second channel are often referred to as a left channel and a right channel, respectively.
  • a stereo audio signal is usually intended to be reproduced via a set of speakers, wherein for each channel there is at least one speaker provided. In accordance with the naming of the channels, these two speakers can be called a left speaker and a right speaker, respectively.
  • the left and the right speaker can be arranged with a certain offset to one another.
  • a user of the audio signal reproduction system comprising said left and said right speaker, who may also be referred to as a listener or an auditor, may then localize whether a certain component of a replayed audio signal is provided through the left or through the right speaker.
  • the audio signal may also be provided through both speakers, possibly with a different sound level for each speaker, so that the listener does not only perceive said signal component exclusively on one side.
  • the listener can perceive an auditory spatial image.
  • the auditory spatial image the listener perceives may also be called a stereo image.
  • the audio signal provided to the speakers or headphones may, among other parameters, exhibit a certain inter-channel level difference (ICLD), a certain inter-channel time difference (ICTD) and a certain inter-channel correlation (ICC).
  • the ICLD and the ICTD may correspond to the ILD and to the ITD, respectively.
  • the ICLD and the ICTD may only partially determine the ILD and the ITD, respectively.
  • a set of four loudspeakers for example a front right loudspeaker, a front left loudspeaker, a rear right loudspeaker and a rear left loudspeaker, may be provided for audio signal reproduction.
  • Five channel audio signal reproduction systems often comprise a fifth loudspeaker arranged in between the front right loudspeaker and the front left loudspeaker.
  • an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be formed by selecting two channels of a plurality of provided channels. For instance, if five channels are provided, the left and right front channels may form one channel pair and the left and right rear channels form another channel pair.
  • front left and rear left channels form a channel pair and the front right and rear right channels form another channel pair.
  • exemplary embodiments of the present invention are explained with respect to a scenario in which only two channels are provided. It is to be understood that, as stated above, exemplary embodiments according to all aspects of the present invention may be applied in scenarios in which more than two channels are provided by forming at least one channel pair, each channel pair comprising two selected channels.
  • the auditory spatial image a listener will perceive if provided with the decoded audio signal that has been obtained by decoding the previously encoded audio signal may be referred to as a reconstructed auditory spatial image.
  • a reconstructed auditory spatial image that closely resembles the auditory spatial image of the original (unmodified and uncoded) audio signal.
  • Parametric audio coding may be thought of as generating a downmix signal from a multichannel audio signal and providing spatial extension information that allows reconstruction of the multichannel audio signal from the downmix signal.
  • a downmix signal is generated from a multichannel audio signal in such a way that M input channels are used to generate N downmix channels, with N ⁇ M.
  • the audio signal may be divided into a sequence of frames, each frame comprising a limited number of samples of the audio signal.
  • an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be encoded as a single mono audio signal frame and spatial extension information.
  • the parametrically encoded audio signal represents the audio signal frame pair.
  • the spatial extension information may contain parameters such as ICLDs, ICTDs and ICCs, as well as other parameters.
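  • As a minimal illustration only (not the claimed encoder; the names L_f, R_f, the averaging downmix and the band list are assumptions of the example), the following Python sketch derives a mono downmix together with per-subband ICLD and ICC values from one stereo frame:

```python
import numpy as np

def downmix_and_cues(L_f, R_f, bands):
    """Toy parametric encoding step: mono downmix plus per-subband
    inter-channel level difference (ICLD, in dB) and correlation (ICC).
    L_f, R_f: complex spectra of one frame; bands: list of (lo, hi) bin ranges."""
    M_f = 0.5 * (L_f + R_f)  # simple averaging downmix (assumption)
    icld, icc = [], []
    for lo, hi in bands:
        l, r = L_f[lo:hi], R_f[lo:hi]
        e_l, e_r = np.sum(np.abs(l) ** 2), np.sum(np.abs(r) ** 2)
        icld.append(10.0 * np.log10((e_l + 1e-12) / (e_r + 1e-12)))
        icc.append(float(np.abs(np.vdot(l, r)) / np.sqrt(e_l * e_r + 1e-12)))
    return M_f, np.array(icld), np.array(icc)
```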
  • Any parametric audio coding technology may be used in the scope of the present invention.
  • Exemplary parametric audio coding technologies comprise Binaural Cue Coding (BCC) and Spatial Audio Coding (SAC).
  • An advantage of parametric audio coding is that bitrate requirements can often be reduced significantly. Instead of separately encoding a full set of input channel signals, a smaller number of signals and the spatial extension information have to be provided for transmission or storage. For example instead of two audio channel signals only a mono signal, i.e. a single channel signal, and the spatial extension information has to be provided for transmission or storage.
  • the bitrate required for the spatial extension information is usually smaller than the bitrate required for transmitting another mono audio signal.
  • Any pertinent mono codec may be used for mono signal coding, for example Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) or the International Telecommunication Union (ITU) G.718 mono codec.
  • Decoding a parametrically encoded audio signal may involve decoding the mono signal. Exploiting the spatial extension information, the mono signal can serve as a basis for reconstructing a first channel audio signal and a second channel audio signal.
  • the reconstructed first channel audio signal frame may be a representation of the first channel audio signal frame
  • the reconstructed second channel audio signal frame may be a representation of the second channel audio signal frame.
  • the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • An audio signal frame may be divided into a plurality of frequency subbands.
  • a signal is represented by a set of frequency components.
  • a frequency subband of the signal may then be thought of as comprising each frequency component falling within an upper frequency bound and a lower frequency bound limiting the respective frequency subband.
  • the frequency subbands, i.e. their bounds, may be identical for the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. Moreover, they may also be identical to the frequency subbands of the audio signal frame pair.
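  • Purely as an illustration of such shared subband bounds (the helper name and the example bounds in Hz are assumptions, not taken from the application), frequency-bin ranges common to both reconstructed channel frames could be derived as follows:

```python
def bin_ranges(band_bounds_hz, n_bins, sample_rate):
    """Map subband bounds in Hz to frequency-bin index ranges; the bins are
    assumed to cover 0 Hz up to the Nyquist frequency."""
    hz_per_bin = sample_rate / (2.0 * n_bins)
    edges = [min(int(round(b / hz_per_bin)), n_bins) for b in band_bounds_hz]
    return list(zip(edges[:-1], edges[1:]))

# Example: three subbands, identical for both channel frames.
print(bin_ranges([0, 500, 2000, 8000], n_bins=512, sample_rate=16000))
```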
  • Exemplary embodiments according to all aspects of the present invention involve determination whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • One possibility to apply a gain to at least one frequency subband of a reconstructed audio signal frame is to multiply each frequency component of the respective frequency subband by a gain factor.
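  • A minimal sketch of this operation (assuming a complex spectrum per reconstructed channel frame and a half-open bin range) could look as follows:

```python
import numpy as np

def apply_subband_gain(spectrum, lo, hi, gain):
    """Apply a gain to one frequency subband of a reconstructed channel frame
    by multiplying every frequency component in [lo, hi) by the gain factor."""
    out = np.array(spectrum, dtype=complex)
    out[lo:hi] *= gain
    return out
```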
  • the listener's perception of an audio signal can be modified by applying a gain to a certain frequency band.
  • the auditory spatial image perceived by the listener may be affected by applying a gain to a frequency subband of either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • a listener's perception may exhibit different characteristics in different frequency subbands.
  • a gain can be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. Audio signal components falling within the subband of the reconstructed audio signal frame to which a gain is applied may be perceived more prominently in the respective channel than audio signal components falling within the subband of the reconstructed audio signal frame of the other channel.
  • the decision whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on a first parameter and a set of second parameters.
  • the first parameter may be indicative of a long-term estimate of a perceived direction of the audio signal frame pair.
  • the first parameter may be indicative of a perceived direction of the (current) audio signal frame pair and at least one preceding audio signal frame pair.
  • No limitations pertain to the first parameter as long as it is indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair. It is readily clear that different listeners are likely to have a different audio signal perception. Hence, there is probably not a single parameter that can exactly describe the perceived direction of an audio signal for all possible listeners.
  • a model describing human audio signal localization is likely to be afflicted with at least some amount of inaccuracy.
  • various parameters can be determined that are indicative of a perceived direction of an audio signal, while it may be difficult to determine a parameter exactly representing said direction.
  • other examples for directional sensing of sounds may include techniques based on subspace decomposition of a covariance matrix of received signals, such as ESPRIT and MUSIC.
  • the former is introduced in A. Paulraj, R. Roy and T. Kailath, "ESPRIT - Estimation of Signal Parameters via Rotational Invariance Techniques," IEEE Trans.
  • the first parameter is indicative of a perceived direction of the audio signal frame pair and at least one preceding frame pair.
  • this may be understood as taking into account not only the perceived direction of a current audio signal frame pair but also considering the audio signal history.
  • the first parameter may change more slowly than, for instance, a parameter indicative of a perceived direction of the current audio signal frame pair only.
  • any number of preceding audio signal frames may be taken into account. Of course, if the current audio signal frame pair does not have a predecessor, a preceding audio signal frame pair cannot be taken into account.
  • Each second parameter of the set of second parameters may be indicative of a short-term estimate of a perceived direction of a frequency subband (or several frequency subbands) of the audio signal frame pair.
  • each second parameter may be indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • each second parameter may be understood as a parameter that does not describe the perceived direction of the entire audio signal frame pair, but of one or more specific frequency subbands of the audio signal frame pair. In case of one specific subband, one may also call each second parameter a frequency subband specific parameter.
  • the set of second parameters may comprise at least two second parameters respectively indicative of a perceived direction of a frequency subband of the audio signal frame pair but may also comprise more than two second parameters. If the set of second parameters is only derived from the current audio signal frame pair and not also from a preceding audio signal frame pair, it may not only be thought of as comprising frequency subband specific information but it may also change more quickly than the first parameter for a sequence of audio signal frame pairs.
  • any suitable processing of the first parameter and the set of second parameters may be employed in the course of determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. Any condition that has to be met in order to cause determination to yield a positive result can be imposed on the first parameter and the set of second parameters.
  • a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • determination may be performed at a parametric audio encoder or at a parametric audio decoder. It is also possible that it takes place at a separate module neither forming part of an encoder nor of a decoder, such as for instance a network element located in a transmission path between an encoder and a decoder, a separate module co-located with the encoder, or a separate module co-located with the decoder.
  • the auditory spatial image a listener perceives when provided with the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame which will in the following be referred to as the reconstructed auditory spatial image, may differ from the auditory spatial image the listener would perceive if provided directly with the first channel audio signal frame and the second channel audio signal frame.
  • the latter auditory spatial image will in the following be referred to as the original auditory spatial image.
  • One reason for the occurrence of a difference between the reconstructed auditory spatial image and the original auditory spatial image may be that parametric coding may possibly not allow exact reconstruction of the auditory spatial image. This may be, for instance, due to the spatial extension information only representing some but not every aspect of the interrelations of the first channel audio signal frame and the second channel audio signal frame. Coarse quantization of the spatial extension information may also adversely affect the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame with regard to the auditory spatial image.
  • the reconstructed auditory spatial image may be perceived by the listeners as being different from the desired auditory spatial image.
  • the desired reconstructed auditory spatial image may be the original auditory spatial image or for example a wider or a narrower auditory spatial image than the original one.
  • Exemplary embodiments according to all aspects of the present invention can be employed to determine whether auditory spatial image width modification should be applied to the reconstructed auditory spatial image.
  • the reconstructed auditory spatial image may be perceived as being narrow compared to the original auditory spatial image, and it may be desirable to apply auditory spatial image widening to the reconstructed auditory spatial image.
  • Applying a gain to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame may yield a widened auditory spatial image, for example by means of the thereby obtained subband- and channel-specific audio signal amplification. Due to auditory spatial image widening, even a reconstructed auditory spatial image obtained from a parametrically encoded low-bitrate audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may closely resemble the original auditory spatial image.
  • exemplary embodiments according to all aspects of the present invention take into account parameters indicative of a perceived direction of the audio signal frame pair, of a preceding audio signal frame pair and also of a frequency subband of the audio signal frame pair.
  • a hypothetical listener's directional audio signal perception can serve as a criterion whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • An advantage of this approach can be that an auditory spatial image width modification result can be obtained that causes the listener's perception of the modified reconstructed auditory spatial image to closely resemble that of the original auditory spatial image.
  • the determination criteria used may help to apply width modification only when it is beneficial. Applying a gain only in these cases can also have the advantage of reducing the signal processing load, since gain application may be omitted whenever the criteria are not met.
  • the first apparatus comprises a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the second apparatus further comprises means for calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • These means may for instance be embodied as a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but are not limited thereto.
  • An advantage of these embodiments can be that it may then not be necessary to transmit the first parameter and the set of second parameters from another apparatus or device that is configured to determine the first parameter and the set of second parameters to the first apparatus or the second apparatus.
  • the processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, or the corresponding means do not necessarily have to be separate entities.
  • the processor configured to calculate the first parameter and the set of second parameters may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
  • the method comprises calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the apparatus forms part of a parametric audio encoder.
  • the apparatus forms part of means for parametric audio encoding.
  • the means for parametric audio coding may for instance be embodied as a parametric audio encoder, but are not limited thereto.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor forming part of a parametric audio encoder to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
  • a parametric audio encoder can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • information that may be necessary for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair may be available at the parametric audio encoder.
  • the first parameter and the set of second parameters may then be computed on the parametric audio encoder side, too.
  • This approach can render it unnecessary to forward these parameters to another module serving for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. It may suffice to generate a flag indicating the determination result and to provide it to an entity configured to carry out the actual application of the gain if permitted by the flag value.
  • the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
  • the first channel audio signal frames, for example left channel audio signal frames, are assigned to a first loudspeaker, and the second channel audio signal frames, i.e. right channel audio signal frames, are assigned to a second loudspeaker.
  • a listener's position can be expressed relative to the first loudspeaker and the second loudspeaker. It is possible that the listener's perceived auditory spatial image depends on his position relative to the first loudspeaker and to the second loudspeaker. As an example, being located closer to the first loudspeaker than to the second loudspeaker, audio signals emitted from the first loudspeaker may reach the listener with a smaller time difference and a higher sound intensity than audio signals emitted from the second speaker.
  • the listener's position relative to the first loudspeaker and the second loudspeaker can be assumed, for instance based on one or more pre-defined assumptions. To this end, it can be for instance assumed that the listener is positioned with the same distance to both the first and the second loudspeaker. Moreover, it may be assumed that the distance between the first loudspeaker and the second loudspeaker corresponds to the distance from the listener's position to each of the loudspeakers. Thus, the listener's position, the position of the first loudspeaker and the position of the second loudspeaker form the vertices of an equilateral triangle.
  • the reconstructed first and second channel audio signal frames can be reproduced by means of the two loudspeakers. Since the reconstructed first channel audio signal frame can be seen as a representation of the first channel audio signal frame and since the reconstructed second channel audio signal frame can be seen as a representation of the second channel audio signal frame, also the first channel audio signal frame is assigned to the first loudspeaker and the second channel audio signal frame is assigned to the second loudspeaker.
  • the listener's position relative to these two loudspeakers may then be measured and provided to an entity configured to calculate the first parameter and the set of second parameters.
  • perceived direction determination of the audio signal frame pair may be adapted to the reconstructed first and second audio signal frame reproduction scenario.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame can aim at reproduction of an auditory spatial image that could be perceived with the underlying assumed or known listener and loudspeaker position configuration.
  • Exemplary embodiments according to all aspects of the present invention comprise that the first parameter is obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
  • all frequency subbands or only a subset thereof (for instance a subset that comprises the frequency subbands that are considered important, such as for instance low frequencies, wherein this consideration may depend on the sample rate applied) may be considered.
  • These embodiments may provide a good representation of the perceived direction of the audio signal frame pair and the at least one preceding audio signal frame pair. Knowing the direction from the assumed or known position of the listener to the first loudspeaker and the direction from the assumed or known position of the listener to the second loudspeaker, the first parameter may be determined with low computational effort.
  • the second parameters of the set of second parameters are obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
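  • One possible, purely illustrative reading of the two preceding items is sketched below; the equilateral-triangle geometry (left loudspeaker at 120°, right loudspeaker at 60°, with 90° as the front reference) and the normalisation of the energy-weighted directions to a weighted mean are assumptions of the example, not definitions from the application:

```python
import numpy as np

THETA_L, THETA_R = 120.0, 60.0  # assumed loudspeaker directions in degrees

def band_energies(spec, bands):
    return np.array([np.sum(np.abs(spec[lo:hi]) ** 2) for lo, hi in bands])

def direction_parameters(L_cur, R_cur, L_prev, R_prev, bands):
    """First parameter: long-term direction from the energies of the current and
    the preceding frame pair.  Second parameters: per-subband directions from the
    current frame pair only (both as energy-weighted loudspeaker directions)."""
    e_l = band_energies(L_cur, bands).sum() + band_energies(L_prev, bands).sum()
    e_r = band_energies(R_cur, bands).sum() + band_energies(R_prev, bands).sum()
    first = (THETA_L * e_l + THETA_R * e_r) / (e_l + e_r + 1e-12)
    el_b = band_energies(L_cur, bands)
    er_b = band_energies(R_cur, bands)
    second = (THETA_L * el_b + THETA_R * er_b) / (el_b + er_b + 1e-12)
    return first, second
```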
  • Exemplary embodiments according to all aspects of the present invention comprise that the first parameter and the second parameters are indicative of a perceived direction relative to a reference direction, for example relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker.
  • the first parameter can be indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair relative to said first imaginary line and each second parameter of the set of second parameters can be indicative of a perceived direction of a frequency subband of the audio signal frame pair relative to said first imaginary line.
  • the angular offset relative to the first imaginary line may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair and in a specific subband of the audio signal frame pair, respectively. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame.
  • the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters.
  • the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to base the determination on averaged second parameters.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame based on averaged second parameters.
  • Averaging may be based on any existing averaging approach. Besides computing the mean value or the median value, averaging may also comprise assigning weights to the second parameters before summing them up. The weights may control the influence of each second parameter on the averaged value.
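  • A small sketch of such averaging (plain mean, or an assumed weighting of the second parameters) is given below:

```python
import numpy as np

def average_second_parameters(second_params, weights=None):
    """Average the per-subband direction parameters; optional weights control
    how strongly each second parameter influences the averaged value."""
    p = np.asarray(second_params, dtype=float)
    if weights is None:
        return float(np.mean(p))
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * p) / np.sum(w))
```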
  • the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
  • a listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to a comparatively large number of preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands of the currently perceived reconstructed audio signal frames. Thus, it may not be necessary to apply a gain to any of the subbands of the currently perceived reconstructed audio signal frames.
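  • The description leaves the exact criteria open; the rule below is therefore only one hypothetical example, combining the first parameter, the averaged second parameters and a short history of previous gain decisions (the threshold values are invented for the sketch):

```python
def should_apply_gain(first_param, avg_second_param, gain_history,
                      angle_threshold_deg=5.0, history_len=3):
    """Suggest a gain when the short-term direction (averaged second parameters)
    deviates from the long-term direction (first parameter), unless widening was
    already applied in all of the last `history_len` frames."""
    deviates = abs(avg_second_param - first_param) > angle_threshold_deg
    recently_widened = (len(gain_history) >= history_len
                        and all(gain_history[-history_len:]))
    return deviates and not recently_widened
```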
  • the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
  • the human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies.
  • An example of non-uniform frequency subbands corresponding to the human auditory filters are equivalent rectangular bandwidth (ERB) frequency subbands.
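  • For illustration, non-uniform band edges of this kind can be obtained by spacing edges uniformly on the ERB-rate scale (Glasberg/Moore formula); the helper below is an assumption about one possible realisation, not the band layout of the application:

```python
import numpy as np

def erb_band_edges(f_max_hz, n_bands):
    """Band edges spaced uniformly on the ERB-rate scale, so that bandwidths
    grow with frequency, roughly following the human auditory filters."""
    hz_to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    edges = np.linspace(0.0, hz_to_erb(f_max_hz), n_bands + 1)
    return erb_to_hz(edges)

print(np.round(erb_band_edges(8000.0, 8)))  # edges grow further apart towards high frequencies
```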
  • determination whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced modified auditory spatial images.
  • the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame.
  • the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband, but the means are not limited thereto.
  • the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • These embodiments may for instance enable applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair that is assigned to the loudspeaker for which the direction from an assumed or known position of a listener to said loudspeaker is closer to the perceived direction in the respective frequency subband as indicated by the second parameter of said frequency subband.
  • the perceived direction as indicated by the second parameter of a certain frequency subband of an audio signal frame pair can be 110°.
  • the direction is measured from an assumed or known position of a listener relative to a pair of speakers.
  • two speakers referred to as a left speaker and as a right speaker are used.
  • First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker.
  • the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
  • An angle of 90° corresponds to the direction of a first imaginary line perpendicular to a second imaginary line connecting the left and the right loudspeaker, the first imaginary line passing through the assumed or known position of the listener. Therefore, directions describable by an angle greater than 90° are closer to the direction from the assumed or known position of the listener to the left speaker, while directions describable by an angle of less than 90° are closer to the direction from the assumed or known position of the listener to the right speaker.
  • the perceived direction of 110° is thus closer to the direction from the assumed or known position of the listener to the left loudspeaker.
  • For the specific subband for which the perceived direction indicated by the second parameter of said subband is 110°, it is determined to apply a gain to the subband of the reconstructed first channel audio signal frame.
  • Said frame is associated with the left, i.e. first, channel audio signal frame of the audio signal frame pair that is assigned to the left loudspeaker because it may be thought of as a representation of the left channel audio signal frame.
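  • With the 90° reference used in this example, the discrimination reduces to comparing the subband's second parameter with 90°; a trivial sketch (the channel labels are illustrative):

```python
def pick_channel_for_gain(second_param_deg, reference_deg=90.0):
    """Directions above the reference point towards the left (first) channel's
    loudspeaker, directions below it towards the right (second) channel's."""
    return "first" if second_param_deg > reference_deg else "second"

print(pick_channel_for_gain(110.0))  # -> 'first', as in the 110 degree example
```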
  • the embodiments of the present invention currently discussed may enable emphasizing the content of the reconstructed audio signal frame contributing more significantly to the spatial auditory effects in the respective frequency subband.
  • the modified auditory spatial image may then be close to the desired auditory spatial image, it may for example closely resemble the original auditory spatial image.
  • the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
  • the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, but the means are not limited thereto.
  • the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
  • the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first or second channel audio signal frame at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
  • the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
  • Large subband specific indicator ratios may occur if the indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame is much greater than the indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
  • large subband specific indicator ratios may occur if the energies of the first channel audio signal frame and the second channel audio signal frame in the respective subband differ significantly.
  • applying the gain may yield a widened auditory spatial image, in which subtle inter-channel differences are emphasized more than prominent inter- channel differences.
  • the widened auditory spatial image may closely resemble the original auditory spatial image.
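  • Read literally, the gain could for instance be computed as sketched below; this is one possible interpretation of the "reciprocal average value of frequency subband specific indicator ratios", with the per-subband energies assumed to be available:

```python
import numpy as np

def widening_gain(e_left, e_right, eps=1e-12):
    """Gain = 1 / mean over subbands of max(E_L, E_R) / min(E_L, E_R).
    Subtle inter-channel energy differences thus lead to a larger gain than
    already prominent differences."""
    e_left = np.asarray(e_left, dtype=float)
    e_right = np.asarray(e_right, dtype=float)
    ratios = np.maximum(e_left, e_right) / (np.minimum(e_left, e_right) + eps)
    return float(1.0 / np.mean(ratios))
```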
  • exemplary embodiments comprise that the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
  • the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame.
  • the gain may then be computed at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters for gain computation.
  • the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
  • Exemplary embodiments according to all aspects of the present invention comprise that the gain is a quantized value.
  • Gain quantization may enable representation of the gain by a reduced number of bits.
  • transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame may require a reduced bandwidth.
  • the gain is obtainable from a lookup table.
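  • A toy example of such a quantization via a lookup table is given below; the table entries and the table size are invented for the sketch and are not values from the application:

```python
import numpy as np

GAIN_TABLE = np.array([1.0, 1.1, 1.25, 1.4, 1.6, 1.8, 2.0, 2.2])  # hypothetical 3-bit codebook

def quantize_gain(gain):
    """Return the table index (only a few bits to transmit or store)
    and the corresponding quantized gain value."""
    idx = int(np.argmin(np.abs(GAIN_TABLE - gain)))
    return idx, float(GAIN_TABLE[idx])
```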
  • Fig. 1 An exemplary illustration of a method for parametric audio coding;
  • Fig. 2 A more detailed representation of method step 101 of Fig. 1;
  • Fig. 3 A more detailed representation of method step 102 of Fig. 1;
  • Fig. 4 A schematic exemplary illustration of the assumed or known position of a listener relative to a first loudspeaker to which a first channel audio signal frame is assigned and relative to a second loudspeaker to which a second channel audio signal frame is assigned;
  • Fig. 5 A flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention;
  • Fig. 6 A flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention;
  • Fig. 7 A more detailed illustration of method step 112 of Fig. 6;
  • Fig. 8 A flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6, including gain application;
  • Fig. 9 A schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention;
  • Fig. 10 A schematic illustration of a second exemplary embodiment of an apparatus according to the first aspect of the present invention;
  • Fig. 11 A schematic illustration of an exemplary embodiment of a readable medium according to the fourth aspect of the present invention.
  • Method step 101 comprises parametrically encoding the input audio signals L_t and R_t.
  • Method step 101 delivers a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • the parametrically encoded audio signal is decoded. In decoding the parametrically encoded audio signal frame pair, a reconstructed first channel audio signal frame and a reconstructed second channel audio signal frame can be obtained.
  • Fig. 2 is a more detailed representation of method step 101 of Fig. 1.
  • the input signals are divided into frames in step 103. Successive frames may overlap in time, for example by half of their duration. As a result, the first channel audio signal frame L_t and the second channel audio signal frame R_t are obtained. They comprise the samples t_L and t_R, respectively.
  • windowing is applied to the audio signal frames L_t and R_t.
  • windowing may serve for suppressing framing induced artefacts in time-to-frequency transformed representations of L_t and R_t.
  • any pertinent windowing function w can be employed.
  • an exemplary windowing function is the sinusoidal window given by w(n) = sin(π (n + 0.5) / N), 0 ≤ n < N (equation (1)), wherein N is the number of samples in each frame L_t and R_t.
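As an illustration of the framing and windowing of steps 103 and 104, a minimal sketch follows. It assumes the sinusoidal window of equation (1) and an overlap of half a frame; the frame length N and all helper names are illustrative and not taken from the patent.

```python
import numpy as np

def sinusoidal_window(N):
    """Sinusoidal window of equation (1): w(n) = sin(pi * (n + 0.5) / N)."""
    n = np.arange(N)
    return np.sin(np.pi * (n + 0.5) / N)

def frame_and_window(signal, N=1024):
    """Steps 103/104: split one channel into 50%-overlapping frames and window them."""
    hop = N // 2                      # successive frames overlap by half their duration
    w = sinusoidal_window(N)
    frames = [signal[s:s + N] * w for s in range(0, len(signal) - N + 1, hop)]
    return np.array(frames)           # rows are the windowed frames L_w or R_w
```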
  • in step 105 the windowed audio signal frames L_w and R_w are transformed to the frequency domain.
  • any transform TF that provides complex valued output may be used.
  • exemplary transforms comprise the Discrete Fourier Transform (DFT), the Modified Discrete Cosine Transform (MDCT) and the Modified Discrete Sine Transform (MDST).
  • the transformed windowed audio signal frames are referred to as L f and R f .
  • the transformed windowed audio signal frames comprise the samples f_L and f_R, respectively. How they are obtained from the samples t_L and t_R can be described by equation (2). In step 106 the audio signal frames L_f and R_f are transformed to a downmix signal, which in this example embodiment is a mono signal M_f.
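A sketch of steps 105 and 106 under simplifying assumptions: the DFT is used as one of the complex-valued transforms named above, and a plain average of the two channel frames stands in for the actual downmix rule, whose equation is not reproduced in the text.

```python
import numpy as np

def to_frequency_domain(L_w, R_w):
    """Step 105: complex-valued time-to-frequency transform of the windowed frames (DFT here)."""
    return np.fft.fft(L_w), np.fft.fft(R_w)

def downmix_mono(L_f, R_f):
    """Step 106: placeholder downmix; the patent's actual rule for M_f is not reproduced."""
    return 0.5 * (L_f + R_f)
```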
  • Step 107 comprises encoding the mono signal M f .
  • Any pertinent mono codec, such as Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) or the International Telecommunication Union (ITU) G.718 mono codec, may be employed.
  • spatial extension information is derived from the first channel audio signal frame L f and the second channel audio signal frame R f .
  • exemplary spatial extension parameters are the inter-channel level difference (ICLD), the inter-channel time difference (ICTD) and the inter-channel correlation (ICC).
  • the bitstream representative of an input frame may comprise one or more encoded frames or packets. It can be thought of as a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame L_f or L_t and a second channel audio signal frame R_f or R_t.
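A hedged sketch of how one spatial extension parameter, the ICLD, could be derived per frequency subband; the common log-energy-ratio definition is assumed here, and the patent's exact parameterization of the spatial extension information may differ.

```python
import numpy as np

def subband_energy(X_f, sb_offset, m):
    """Energy of frequency subband m of a frequency-domain frame X_f."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    return np.sum(np.abs(X_f[lo:hi]) ** 2)

def icld_db(L_f, R_f, sb_offset):
    """Per-subband inter-channel level differences in dB (common log-energy-ratio form)."""
    eps = 1e-12
    M = len(sb_offset) - 1
    return np.array([10.0 * np.log10((subband_energy(L_f, sb_offset, m) + eps) /
                                     (subband_energy(R_f, sb_offset, m) + eps))
                     for m in range(M)])
```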
  • Fig. 3 is a more detailed representation of method step 102 of Fig. 1.
  • the bitstream provided from the encoding step 101 is demultiplexed in step 109. Thereby, an encoded mono signal and spatial extension information are obtained.
  • the mono signal is decoded, yielding the decoded mono signal M f .
  • Step 111 comprises applying the spatial extension information, thus obtaining the reconstructed first channel audio signal frame L f and the reconstructed second channel audio signal frame R f .
  • Fig. 4 schematically illustrates the assumed or known position A of a listener relative to a first loudspeaker LS to which a first channel audio signal frame L_t is assigned and relative to a second loudspeaker RS to which a second channel audio signal frame R_t is assigned.
  • first channel audio signal frames, for example left channel audio signal frames L_t, are assigned to the first loudspeaker LS.
  • second channel audio signal frames, i.e. right channel audio signal frames R_t, are assigned to the second loudspeaker RS.
  • first and second audio signal frames L f and R f can be reproduced by means of the two loudspeakers LS and RS. Since the reconstructed first channel audio signal frame L f can be seen as a representation of the first audio signal frame L 1 and since the reconstructed second channel audio signal frame R f can be seen as a representation of the second audio signal frame R t , the first channel audio signal frame L 1 is assigned to the first loudspeaker LS and the second channel audio signal frame R 1 is assigned to the second loudspeaker RS in this case, too.
  • the position A can be expressed relative to the loudspeakers LS and RS.
  • the distance from position A to the first loudspeaker LS is Si
  • the distance from position A to the second loudspeaker RS is S2.
  • the distances Si and S2 are equal.
  • the direction from position A to the first loudspeaker LS can be described by the angle θ_L.
  • the direction from position A to the second loudspeaker RS can be described by the angle θ_R.
  • θ_L is 120° and θ_R is 60°.
  • the triangle A, RS, LS is an equilateral triangle.
  • the first imaginary line S3 is perpendicular to the second imaginary line S4 that connects the first loudspeaker LS and the second loudspeaker RS.
  • the first imaginary line S3 passes through the position A.
  • the listener may localize a sound component of the replayed first channel audio signal frame L_t or of the second channel audio signal frame R_t at a position P which can be described by its distance to the position A of the listener and the direction θ as seen from that position A.
  • the energy e_L of the first channel audio signal frame L_t and the energy e_R of the second channel audio signal frame R_t are computed. This can be done by calculating the sum of the squared absolute values of the frequency transformed samples f_L or f_R of the respective frame and then extracting the square root of this sum according to equation (4).
  • instead of considering all frequency subbands in the calculation of the energy e_L of the first channel audio signal frame L_t and the energy e_R of the second channel audio signal frame R_t, only the most important frequency subbands may be considered, for instance the frequency subbands corresponding to the low frequencies. Which frequency subbands are important may depend on the sample rate of the signal. For instance, half of the frequency subbands could be left out in case of a sample rate of 48 kHz, whereas, at a sample rate of 8 kHz, the frequency band coverage may only extend up to 4 kHz, so that in this case more than half of the bands may have to be included in the calculations.
  • according to equation (5), with each update the sum of the former value of the variable e_L weighted with α and the current value of e_L weighted with (1 − α) is assigned to e_L; e_R is computed accordingly.
  • α may have the value 0.95. It may thus happen that e_L and e_R change only slightly with each update, and e_L and e_R can be considered to represent the long-term energies of the first and second channels, respectively.
  • e_L and e_R are initialized to zero so that they can be determined even if there has not been a preceding audio signal frame pair.
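A minimal sketch of the energy computation of equation (4) and the long-term smoothing of equation (5), with α = 0.95 as mentioned above; restricting the sum to the most important bins is shown as an optional argument and is illustrative.

```python
import numpy as np

def frame_energy(X_f, num_bins=None):
    """Equation (4): sqrt of the sum of squared magnitudes of the frequency samples f_L or f_R.

    num_bins may restrict the sum to the most important (e.g. lowest-frequency) bins.
    """
    x = X_f if num_bins is None else X_f[:num_bins]
    return np.sqrt(np.sum(np.abs(x) ** 2))

def update_long_term_energy(e_prev, e_frame, alpha=0.95):
    """Equation (5): e <- alpha * e_previous + (1 - alpha) * e_current; e starts at zero."""
    return alpha * e_prev + (1.0 - alpha) * e_frame
```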
  • the direction θ_L from the assumed or known position A of the listener to the first loudspeaker LS is weighted with the sum of the energy e_L of the first channel audio signal frame L_t and the energy e_L of a first channel audio signal frame of at least one preceding audio signal frame pair.
  • the direction θ_R from the assumed or known position A of the listener to the second loudspeaker RS is weighted with the sum of the energy e_R of the second channel audio signal frame R_t and the energy e_R of a second channel audio signal frame of at least one preceding audio signal frame pair. Normalization with the sum of the energies e_L and e_R is performed.
  • a_r_frame and a_i_frame can be seen as a coordinate pair representing the overall localized position of sound components within the pair of audio signal frames L_t and R_t and preceding audio signal frames.
  • a_r_frame and a_i_frame can describe the position of the point P in Fig. 4.
  • the angle or direction θ_frame is an indicator of the perceived direction of the audio signal frame pair L_t and R_t.
  • θ_frame can be a good representation of the perceived direction of the audio signal frame pair L_t and R_t and the at least one preceding audio signal frame pair:
  • θ_frame may be seen as a representation of a long-term estimate of a perceived direction of the audio signal.
  • the first parameter θ_frame may be determined with low computational effort.
  • the difference θ_frame − 90° can also be seen as a first parameter indicative of a perceived direction of an audio signal frame pair L_f and R_f and at least one preceding audio signal frame pair or, in the time domain, of the pair L_t and R_t and at least one preceding audio signal frame pair.
  • the angular offset θ_frame − 90° relative to the first imaginary line S3 may also be thought of as an indicator of the amount of spatial effects contained in the entire audio signal frame pair L_f and R_f and at least one preceding audio signal frame pair. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
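The following sketch reconstructs the first parameter θ_frame from the verbal description above (energy-weighted loudspeaker directions combined into the coordinate pair a_r_frame, a_i_frame and converted to an angle). The patent's equations (6) and (7) are not reproduced here, so the exact form is an assumption; θ_L = 120° and θ_R = 60° are taken from Fig. 4.

```python
import numpy as np

THETA_L = 120.0   # direction (degrees) from position A to the first loudspeaker LS, Fig. 4
THETA_R = 60.0    # direction (degrees) from position A to the second loudspeaker RS, Fig. 4

def frame_direction(e_L, e_R):
    """First parameter theta_frame from the smoothed long-term channel energies e_L, e_R."""
    norm = e_L + e_R + 1e-12
    a_r_frame = (e_L * np.cos(np.deg2rad(THETA_L)) + e_R * np.cos(np.deg2rad(THETA_R))) / norm
    a_i_frame = (e_L * np.sin(np.deg2rad(THETA_L)) + e_R * np.sin(np.deg2rad(THETA_R))) / norm
    return np.rad2deg(np.arctan2(a_i_frame, a_r_frame))   # theta_frame in degrees
```

With equal channel energies this yields θ_frame = 90°, i.e. a perceived direction on the first imaginary line S3, which is consistent with the geometry of Fig. 4.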
  • each second parameter indicative of a perceived direction of a frequency subband of an audio signal frame pair can be obtained.
  • the frequency domain audio signal frames L_f and R_f are divided into M frequency subbands m, wherein 0 ≤ m < M. sbOffset[m] is the lower bound of frequency subband m.
  • the frequency subbands m i.e. their bounds, may be identical for the first channel audio signal frame L f and the second channel audio signal frame R f . Moreover, they may also be identical to the frequency subbands of the reconstructed audio signal frame pair L f and R f .
  • the frequency subbands m of the audio signal frame pair L f and R f are non-uniform frequency subbands corresponding to the human auditory filters. Namely, they are equivalent rectangular bandwidth (ERB) frequency subbands.
  • the human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies.
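One way to obtain non-uniform, auditory-filter-like subband offsets sbOffset[m] is to space the band edges uniformly on an ERB-rate scale. The sketch below uses the Glasberg & Moore ERB-rate mapping as a common choice; the patent's actual subband table is not specified here.

```python
import numpy as np

def erb_subband_offsets(num_bins, sample_rate, num_bands):
    """Non-uniform subband offsets sbOffset[m] spaced uniformly on an ERB-rate scale."""
    hz_to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)   # Glasberg & Moore ERB-rate
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    f_max = sample_rate / 2.0
    edges_hz = erb_to_hz(np.linspace(0.0, hz_to_erb(f_max), num_bands + 1))
    offsets = np.round(edges_hz / f_max * num_bins).astype(int)
    offsets[0], offsets[-1] = 0, num_bins                      # band m spans offsets[m]..offsets[m+1]
    return offsets
```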
  • determination whether a gain should be applied to a frequency subband m of the reconstructed first channel audio signal frame L f or of the reconstructed second channel audio signal frame R f can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced widened auditory spatial images.
  • the energies e L and e R can be considered to represent the short-term energies of the first and second channels in each frequency subband m, respectively.
  • the direction θ_R from the assumed or known position A of the listener to the second loudspeaker RS, weighted with the energy e_R of the second channel audio signal frame R_f (which corresponds to the energy of said frame in the time domain, R_t), is calculated for each subband m.
  • a directional angle θ_m can be derived from a_r_m and a_i_m.
  • the angles or direction angles θ_m are second parameters indicative of the perceived direction of the respective frequency subband m of the audio signal frame pair L_f and R_f.
  • the angles or direction angles θ_m may be seen as indicative of a short-term estimate of a perceived direction of the respective frequency subband m of the audio signal frame pair.
  • the second parameters θ_m of the set of second parameters are obtainable from the direction from the assumed or known position A of the listener to the first loudspeaker LS weighted with the first channel audio signal frame energy e_L within the respective frequency subband m, and the direction from the assumed or known position A of the listener to the second loudspeaker RS weighted with the second channel audio signal frame energy e_R within the respective frequency subband m.
  • the differences θ_m − 90° can also be seen as a set of second parameters indicative of a perceived direction of the respective frequency subband m of the audio signal frame pair L_f and R_f.
  • the angular offset relative to the first imaginary line S3 may also be thought of as an indicator of the amount of spatial effects contained in the respective subband m of the audio signal frame pair L_f and R_f. This indicator may also be considered in deciding whether a gain should be applied to a specific subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
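A sketch of the per-subband second parameters θ_m, computed analogously to θ_frame but from the short-term subband energies of the current frame pair only; this is again a reconstruction of the verbal description rather than of equations (8) to (10).

```python
import numpy as np

THETA_L, THETA_R = 120.0, 60.0   # loudspeaker directions in degrees, as in Fig. 4

def subband_directions(L_f, R_f, sb_offset):
    """Second parameters theta_m for each frequency subband m of the current frame pair."""
    M = len(sb_offset) - 1
    theta_m = np.empty(M)
    for m in range(M):
        lo, hi = sb_offset[m], sb_offset[m + 1]
        e_L = np.sqrt(np.sum(np.abs(L_f[lo:hi]) ** 2))   # short-term subband energies
        e_R = np.sqrt(np.sum(np.abs(R_f[lo:hi]) ** 2))
        norm = e_L + e_R + 1e-12
        a_r_m = (e_L * np.cos(np.deg2rad(THETA_L)) + e_R * np.cos(np.deg2rad(THETA_R))) / norm
        a_i_m = (e_L * np.sin(np.deg2rad(THETA_L)) + e_R * np.sin(np.deg2rad(THETA_R))) / norm
        theta_m[m] = np.rad2deg(np.arctan2(a_i_m, a_r_m))
    return theta_m
```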
  • the parameter aveDiff can be seen as an averaged representation of the set of second parameters θ_m.
  • the parameter aveDiff may consider only a subset of the second parameters θ_m, for example the θ_m corresponding to a selected number of lowest frequency bands, instead of considering the full set of second parameters θ_m for the M frequency bands as in equation (11).
  • using aveDiff in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or to at least one frequency subband m of a reconstructed second channel audio signal frame R_f may thus correspond to basing the determination on averaged second parameters.
  • aveDiff assumes the value 0° if there are no spatial effects within the pair of audio signal frames L_f and R_f, whereas a value of 90° indicates maximum spatial effects, i.e. the frames differ completely from each other and are not correlated.
  • an advantage of basing the determination whether a gain should be applied to at least one frequency subband m on aveDiff can be that fast determination is enabled. Checking whether each second parameter θ_m meets a certain condition may require more time or higher processing power. This may be even more significant if several conditions have to be met.
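A possible reading of aveDiff, assuming it averages the magnitudes |θ_m − 90°|, which matches the stated range from 0° (no spatial effects) to 90° (maximum spatial effects); restriction to the lowest bands is optional.

```python
import numpy as np

def ave_diff(theta_m, num_lowest_bands=None):
    """Averaged offset of the subband directions from 90 degrees (cf. equation (11))."""
    t = np.asarray(theta_m) if num_lowest_bands is None else np.asarray(theta_m)[:num_lowest_bands]
    return float(np.mean(np.abs(t - 90.0)))
```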
  • Fig. 5 is a flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention.
  • whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame is taken into account in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first or second channel audio signal frame L f and R f , respectively.
  • the array st_wideArray stores flags indicating whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame.
  • st_wideArray is initialized to zero at start-up. It is updated after each audio signal frame pair by performing a left shift, thereby discarding the flag indicating whether a gain has been applied to the oldest reconstructed audio signal frame pair represented by st_wideArray.
  • the flag for the current reconstructed audio signal frame pair L_f and R_f is stored at the rightmost position st_wideArray[0] in the array. In the present example, eight flags are stored in st_wideArray.
  • in step 1 it is checked whether a gain has been applied to each of the last eight frames by comparing st_wideArray to the binary number 11111111. It is also checked whether the conditions aveDiff < 10° and |θ_frame − 90°| < 10° are met.
  • if true, applying a gain to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f is not enabled.
  • auditory spatial image widening is not enabled.
  • the flag widening_enabled is used to indicate whether or not a gain should be applied.
  • the value 1 corresponds to enabled auditory spatial image widening and the value 0 corresponds to disabled auditory spatial image widening.
  • a listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands m of the current reconstructed audio signal frames L_f and R_f. Thus, it may not be necessary to apply a gain to any of the subbands m of the currently perceived reconstructed audio signal frames L_f and R_f.
  • in step 2 it is checked whether the conditions aveDiff > 10° or |θ_frame − 90°| > 10° are met.
  • in this case auditory spatial image widening should be applied. Being greater than 10°, the amount of spatial effects in the audio signal frame pair L_t and R_t is rather large.
  • a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or to at least one frequency subband m of the reconstructed second channel audio signal frame R_f in order to obtain a reconstructed auditory spatial image closely resembling the original auditory spatial image of the audio signal frame pair L_t and R_t.
  • if the check of step 2 does not yield a positive result, a third check is performed in step 4.
  • this time it is determined whether the conditions aveDiff > 9° and |θ_frame − 90°| > 9° are met and whether widening has been enabled for the past two frames.
  • a somewhat lesser amount of spatial effects in the audio signal frame pair L_t and R_t is then sufficient to cause determination that widening should be applied.
  • step 5 comprises checking whether the conditions aveDiff > 8° and |θ_frame − 90°| > 8° are met and whether widening has been enabled for the past six frames.
  • otherwise, step 6 is entered, in which it is checked whether g_enc < 2 and aveDiff < 10° hold.
  • determination whether auditory spatial image widening should be applied is based on properties of human auditory perception. In consequence, reconstructed auditory spatial images may be obtained that closely resemble the original auditory spatial image.
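The decision logic of Fig. 5 might be sketched as below. The thresholds follow the text (steps 1, 2, 4 and 5); step 6 is omitted because the definition of g_enc is not reproduced here, and the exact control flow of the flowchart is paraphrased.

```python
def widening_decision(ave_diff_deg, theta_frame_deg, st_wide_array, widening_history):
    """Hysteresis-style decision of Fig. 5 (steps 1, 2, 4 and 5).

    st_wide_array: 8-bit history of 'gain was applied' flags, newest flag in bit 0.
    widening_history: recent widening_enabled flags, newest first.
    Returns widening_enabled (1 = apply auditory spatial image widening, 0 = do not).
    """
    offset = abs(theta_frame_deg - 90.0)

    # step 1: gain applied for the last eight frames, current spatial effects small -> disable
    if st_wide_array == 0b11111111 and ave_diff_deg < 10.0 and offset < 10.0:
        return 0
    # step 2: clearly audible spatial effects -> enable
    if ave_diff_deg > 10.0 or offset > 10.0:
        return 1
    # step 4: slightly weaker effects, but widening enabled for the past two frames
    if ave_diff_deg > 9.0 and offset > 9.0 and len(widening_history) >= 2 and all(widening_history[:2]):
        return 1
    # step 5: even weaker effects, but widening enabled for the past six frames
    if ave_diff_deg > 8.0 and offset > 8.0 and len(widening_history) >= 6 and all(widening_history[:6]):
        return 1
    return 0

# history update after each frame pair (left shift, newest flag stored at bit/index 0):
# st_wide_array = ((st_wide_array << 1) | gain_applied_flag) & 0xFF
```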
  • a binary value flags_widening(m) is determined for each frequency subband m, indicating whether a gain should be applied to said frequency subband m of the reconstructed left channel audio signal frame L_f or to said frequency subband m of the reconstructed right channel audio signal frame R_f.
  • if θ_m is greater than 90°, a gain should be applied to the respective frequency subband m of the reconstructed left channel audio signal frame L_f. Otherwise, it should be applied to the respective frequency subband m of the reconstructed right channel audio signal frame R_f.
  • each indicator ratio is the ratio of an indicator of the maximum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f, and an indicator of the minimum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • MAX(a_m, b_m) is an indicator of the maximum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • MIN(a_m, b_m) is an indicator of the minimum of the energy e_L in the frequency subband m of the first channel audio signal frame L_f and the energy e_R in said frequency subband m of the second channel audio signal frame R_f.
  • the gain g_widening is then given by equation (14); it corresponds to the reciprocal of the average of the frequency subband specific indicator ratios.
  • the gain g_widening is comparatively small if the subband specific indicator ratios are large.
  • large subband specific indicator ratios may occur if the indicator MAX(a_m, b_m) is much greater than the indicator MIN(a_m, b_m).
  • large subband specific indicator ratios may occur if the energies of the first channel audio signal frame L_f and the second channel audio signal frame R_f in the respective subband m differ significantly.
  • applying the gain g_widening may yield a widened reconstructed auditory spatial image in which subtle inter-channel differences are emphasized more than prominent inter-channel differences.
  • the widened reconstructed auditory spatial image may closely resemble the original auditory spatial image.
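A sketch of the per-subband flags flags_widening(m) and of a gain computed as the reciprocal of the average indicator ratio. Taking a_m and b_m to be the subband energies of the two channels and using one over the mean ratio for equation (14) are assumptions consistent with, but not dictated by, the text.

```python
import numpy as np

def widening_flags_and_gain(theta_m, e_L_subband, e_R_subband):
    """flags_widening(m) and g_widening for the current frame pair.

    flags: True -> apply the gain to subband m of the left (first) channel frame,
           False -> apply it to subband m of the right (second) channel frame.
    """
    flags = np.asarray(theta_m) > 90.0
    eps = 1e-12
    a, b = np.asarray(e_L_subband), np.asarray(e_R_subband)
    ratios = (np.maximum(a, b) + eps) / (np.minimum(a, b) + eps)   # MAX / MIN per subband
    g_widening = 1.0 / float(np.mean(ratios))                      # reciprocal average ratio
    return flags, g_widening
```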
  • Gain quantization may enable representation of the gain by a reduced number of bits. If the gain is to be applied to a frequency subband, transmitting it to an entity configured to apply it may only require a reduced bandwidth.
  • K is 16.
  • the quantized gain can be chosen as the quantized value qTbl(k) minimizing the error distance to the unquantized gain value g_widening.
  • a lookup table can be used.
  • the entity configured to determine the gain can then derive the index widening_gain_idx of the respective quantized value.
  • an entity configured to apply the gain can retrieve the quantized gain g_widening from the lookup table using the index widening_gain_idx.
  • g_widening = qTbl[widening_gain_idx] (16)
  • the index widening_gain_idx can possibly be represented by fewer bits than the quantized gain g_widening itself.
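A sketch of the gain quantization with a K = 16 entry lookup table. The table values qTbl below are illustrative placeholders; only the nearest-entry selection and the dequantization of equation (16) follow the text.

```python
import numpy as np

Q_TBL = np.linspace(0.05, 1.0, 16)   # illustrative 16-entry table; actual qTbl values not given

def quantize_gain(g_widening):
    """Pick the table entry with the smallest error distance to the unquantized gain."""
    widening_gain_idx = int(np.argmin(np.abs(Q_TBL - g_widening)))
    return widening_gain_idx, float(Q_TBL[widening_gain_idx])

def dequantize_gain(widening_gain_idx):
    """Equation (16): g_widening = qTbl[widening_gain_idx]."""
    return float(Q_TBL[widening_gain_idx])
```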
  • Fig. 6 is a flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention.
  • in step 112 it is determined whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f.
  • the results obtained in method step 112 are also incorporated into a bitstream to be provided for a parametric audio decoder.
  • Fig. 7 is a more detailed illustration of method step 112 of Fig. 6.
  • in step 113 the first parameter θ_frame as defined by equation (7) is calculated.
  • step 114 comprises calculating the set of second parameters θ_m as defined by equation (10).
  • in step 115 the value θ_frame − 90° is computed, while in step 116 the values θ_m − 90° are determined.
  • in step 117 aveDiff is determined according to equation (11).
  • the reconstructed audio signal frames L_t and R_t are obtained in step 122 by frequency-to-time transformation.
  • the actual gain application can be performed by multiplying the samples f_L of the reconstructed first channel audio signal frame L_f or the samples f_R of the reconstructed second channel audio signal frame R_f by g_widening if the flags flags_widening(m) indicate to do so.
  • the gain g_widening can also be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum MAX(icld_L(m), icld_R(m)) of the inter-channel level difference icld_L(m) in the frequency subband m of the reconstructed first channel audio signal frame L_f and the inter-channel level difference icld_R(m) in said frequency subband m of the reconstructed second channel audio signal frame R_f, and an indicator of the minimum MIN(icld_L(m), icld_R(m)) of the inter-channel level difference icld_L(m) in the frequency subband m of the reconstructed first channel audio signal frame L_f and the inter-channel level difference icld_R(m) in said frequency subband m of the reconstructed second channel audio signal frame R_f.
  • the parameters determining the inter-channel level differences icld_L(m) and icld_R(m) may anyway form part of the spatial extension information needed in order to establish these differences between the reconstructed first channel audio signal frame L_f and the reconstructed second channel audio signal frame R_f. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame L_f or the reconstructed second channel audio signal frame R_f at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frames L_f and R_f, i.e. at a decoder.
  • the gain can then be applied to at least one frequency subband m of the reconstructed first channel audio signal frame L_f or of the reconstructed second channel audio signal frame R_f at the decoder.
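A decoder-side sketch: applying g_widening to the flagged subbands of one reconstructed channel frame, and deriving the gain from the transmitted ICLDs as a reciprocal average of per-subband ratios. The ICLD-based formula mirrors the wording above; in practice the sign handling of dB-valued ICLDs would need care, and the patent's exact equation is not reproduced.

```python
import numpy as np

def apply_widening(L_f, R_f, sb_offset, flags_widening, g_widening):
    """Multiply the flagged subband samples of one reconstructed channel frame by g_widening."""
    L_out, R_out = L_f.copy(), R_f.copy()
    for m, to_left in enumerate(flags_widening):
        lo, hi = sb_offset[m], sb_offset[m + 1]
        if to_left:
            L_out[lo:hi] *= g_widening
        else:
            R_out[lo:hi] *= g_widening
    return L_out, R_out

def gain_from_icld(icld_L, icld_R):
    """Gain from the transmitted per-subband ICLDs as a reciprocal average ratio."""
    eps = 1e-12
    a, b = np.asarray(icld_L, dtype=float), np.asarray(icld_R, dtype=float)
    ratios = (np.maximum(a, b) + eps) / (np.minimum(a, b) + eps)
    return 1.0 / float(np.mean(ratios))
```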
  • Fig. 9 is a schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention.
  • the apparatus comprises a processor 201 configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the processor 201 can also be seen as means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • the processor 201 may be part of an audio encoder or an audio decoder, to name but a few examples. For instance, processor 201 may be configured to implement the steps of the flowcharts of Figs. 5-7 or 8.
  • Fig. 10 is a schematic illustration of a second exemplary embodiment of an apparatus 203 according to the first aspect of the present invention.
  • the apparatus 203 forms part of a parametric audio encoder 202. It comprises a processor 204.
  • the processor 204 comprises a frequency subband selection circuit 205 and a parameter determination circuit 206.
  • the processor 204 is configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
  • Processor 204 thus may be configured to implement the steps of the flowcharts of Fig. 5-7.
  • the first parameter and the second parameter can be determined by means of the parameter determination circuit 206.
  • Processor 204 is thus also configured to calculate the first parameter and the set of second parameters (see steps 113 and 114 of Fig. 7) .
  • the parameter determination circuit 206 may also be thought of as means for calculating the first parameter and the set of second parameters and so may the processor 204.
  • the parametric audio encoder 202 can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
  • the information that may be necessary for the parameter determination circuit 206 for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder 202.
  • this information is available to the parameter determination circuit 206.
  • the frequency subband selection circuit 205 is configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • the processor 204 can be considered to be configured accordingly.
  • the frequency subband selection circuit 205 or the processor 204 may thus also be seen as means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
  • Fig. 11 is a schematic illustration of an embodiment of a readable medium 300 according to the fourth aspect of the present invention.
  • the readable medium 300 is a computer- readable medium.
  • a program 301 according to the fifth aspect of the present invention is stored thereon.
  • the program 301 comprises program code 302. When executed by a processor, the instructions of the program code 302 cause the processor (for instance processor 201 of Fig. 9 or processor 204 of Fig. 10) to perform the method according to the second aspect of the present invention.
  • the program 301 can also be considered as a program according to the third aspect of the present invention.
  • the processor may for instance be implemented in hardware alone, may have certain aspects implemented in software, or may be a combination of hardware and software.
  • the processor may either be a separate module or it may be a subcomponent of a module such as, for example, a processor or an application specific integrated circuit (ASIC) that has other functional components or structures, too.
  • the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software.
  • the presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs) , application specific integrated circuits (ASICs) , field programmable gate arrays (FPGAs) or other programmable devices.
  • the computer software may be stored in a variety of computer- readable storage media of electric, magnetic, electro- magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor.
  • the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.

Abstract

It is disclosed to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.

Description

Parametric Audio Coding
FIELD
This invention relates to parametric audio coding.
BACKGROUND
Channel capacity for transmitting an audio signal or storage capacity for storing an audio signal is often limited. In consequence, audio coding is applied to reduce the required bitrate. It has been the industries constant goal to develop audio coding techniques enabling high quality audio signal reconstruction while reducing the required bitrate to a minimum. This is especially true for multichannel audio coding where, compared to single channel audio signals, an even larger amount of information has to be dealt with.
SUMMARY OF SOME EXEMPLARY EMBODIMENTS OF THE INVENTION According to a first aspect of the present invention, a first apparatus is disclosed, comprising a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to the first aspect of the present invention, further a second apparatus is disclosed, comprising means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
The means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may for instance comprise a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but said means is not limited thereto.
The first apparatus as well as the second apparatus according to the first aspect of the present invention can be a module that forms part of or is to form part of another apparatus (such as for instance an audio encoder or decoder) , for instance a processor, or it can be a separate apparatus . According to a second aspect of the present invention, further a method is disclosed, comprising determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to a third aspect of the present invention, further a program is disclosed, comprising program code for performing the method according to the second aspect of the present invention, when the program is executed on a processor. The program may for instance be a computer program that is readable by a computer or processor. The program code may then for instance be computer program code. The program may for instance be distributed via a network, such as for instance the Internet. The program may for instance be stored on a tangible readable medium, for instance a computer-readable or processor-readable medium. The readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
According to a fourth aspect of the present invention, further a readable storage medium encoded with instructions that, when executed by a processor, perform the method according to the second aspect of the present invention is disclosed.
The readable storage medium may for instance be a computer-readable or processor-readable storage medium. It may be embodied as an electric, magnetic, electromagnetic, optic or other storage medium, and may either be a removable storage medium or a storage medium that is fixedly installed in an apparatus or device.
According to a fifth aspect of the present invention, further a program is disclosed which causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The program may for instance be a computer program that is readable and/or executable by a computer or processor. The program may for instance be a computer program with computer program code. The program may be stored on a tangible readable medium, for instance a computer- readable or processor-readable medium. The readable medium may for instance be embodied as an electric, magnetic, electro-magnetic, optic or other storage medium, and may either be a removable medium or a medium that is fixedly installed in an apparatus or device.
The term auditory spatial image may be used to refer to the listener's spatial perception of components of an audio signal.
Human listeners are capable of perceiving an auditory spatial image by interpreting differences of the signals received by the left and by the right ear. Among these differences are the interaural level difference (ILD) and the interaural time difference (ITD) . Diffraction and reflection may be caused by at least some of the listener's body parts, thus affecting the audio signal on its way from a sound source to the listener's eardrum. A head related transfer function (HRTF) describes the transform an audio signal may be subject to due to these diffraction and reflection phenomena. It is direction- dependent. Hence, being used to the listener's individual HRTF, the listener's brain can derive directional cues from the interaural signal difference caused by the HRTF. Many audio signal reproduction technologies aim at obtaining an auditory spatial image of the reproduced audio signal that resembles the auditory spatial image of the original audio signal as closely as possible. Multichannel audio signals exist. An exemplary multichannel audio signal is a stereo audio signal. A stereo audio signal comprises a first channel audio signal and a second channel audio signal. The first channel and the second channel are often referred to as a left channel and a right channel, respectively. A stereo audio signal is usually intended to be reproduced via a set of speakers, wherein for each channel there is at least one speaker provided. In accordance with the naming of the channels, these two speakers can be called a left speaker and a right speaker, respectively. The left and the right speaker can be arranged with a certain offset to one another. A user of the audio signal reproduction system comprising said left and said right speaker, who may also be referred to as a listener or an auditor, may then localize whether a certain component of a replayed audio signal is provided through the left or through the right speaker. Of course, the audio signal may also be provided through both speakers, possibly with a different sound level for each speaker, so that the listener does not only perceive said signal component exclusively on one side. In consequence, the listener can perceive an auditory spatial image. In the case of two channels being provided, the auditory spatial image the listener perceives may also be called a stereo image.
For audio reproduction by means of speakers or headphones, the audio signal provided to the speakers or headphones may, among other parameters, exhibit a certain inter-channel level difference (ICLD) and a certain inter-channel time difference (ICTD) and a certain inter- channel correlation (ICC) . If reproduced by means of head phones, the ICLD and the ICTD may correspond to the ILD and to the ITD, respectively. In loudspeaker reproduction, the ICLD and the ICTD may only partially determine the ILD and the ITD, respectively. It is to be understood that the above explanations represent only some of the factors contributing to the complex system of human sound localization. Furthermore, it is readily clear to a skilled person that recording or encoding audio signals may involve using much more elaborate spatial information capturing or encoding techniques.
Using more than two channels may enable an even better spatial reproduction of an audio signal. For instance, four channel audio systems exist. A set of four loudspeakers, for example a front right loudspeaker, a front left loudspeaker, a rear right loudspeaker and a rear left loudspeaker, may be provided for audio signal reproduction. Five channel audio signal reproduction systems often comprise a fifth loudspeaker arranged in between the front right loudspeaker and the front left loudspeaker. It is noted that an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be formed by selecting two channels of a plurality of provided channels. For instance, if five channels are provided, the left and right front channels may form one channel pair and the left and right rear channels form another channel pair. It is also possible that the front left and rear left channels form a channel pair and the front right and rear right channels form another channel pair. In the following, exemplary embodiments of the present invention are explained with respect to a scenario in which only two channels are provided. It is to be understood that, as stated above, exemplary embodiments according to all aspects of the present invention may be applied in scenarios in which more than two channels are provided by forming at least one channel pair, each channel pair comprising two selected channels.
In order to reduce the bitrate required by an audio signal, various audio signal coding approaches exist. In the field of audio coding, the auditory spatial image a listener will perceive if provided with the decoded audio signal that has been obtained by decoding the previously encoded audio signal may be referred to as a reconstructed auditory spatial image. Again, it is often desired to obtain a reconstructed auditory spatial image that closely resembles the auditory spatial image of the original (unmodified and uncoded) audio signal. On the other hand, in some cases it may be desirable to modify the perceived auditory spatial image to be different from the original auditory spatial image, for example by introducing a wider or narrower auditory spatial image than the one present in the original audio signal.
Parametric audio coding may be thought of as generating a downmix signal from a multichannel audio signal and providing spatial extension information that allows reconstruction of the multichannel audio signal from the downmix signal. A downmix signal is generated from a multichannel audio signal in such a way that M input channels are used to generate N downmix channels, with N < M. As an example, a single-channel downmix signal - a mono signal - is generated from a stereo audio signal, implying N=1 and M=2. In the course of audio signal encoding, the audio signal may be divided into a sequence of frames, each frame comprising a limited number of samples of the audio signal. In parametric audio coding, an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may be encoded as a single mono audio signal frame and spatial extension information. One may say that the parametrically encoded audio signal represents the audio signal frame pair.
The spatial extension information may contain parameters such as ICLDs, ICTDs and ICCs, as well as other parameters. Any parametric audio coding technology may be used in the scope of the present invention. Exemplary parametric audio coding technologies comprise Binaural Cue Coding (BCC) and Spatial Audio Coding (SAC) . An advantage of parametric audio coding is that bitrate requirements can often be reduced significantly. Instead of separately encoding a full set of input channel signals, a smaller number of signals and the spatial extension information have to be provided for transmission or storage. For example instead of two audio channel signals only a mono signal, i.e. a single channel signal, and the spatial extension information has to be provided for transmission or storage. The bitrate required for the spatial extension information is usually smaller than the bitrate required for transmitting another mono audio signal. One may therefore say that parametric audio coding exploits channel interrelations to reduce the bitrate required by the parametrically encoded audio signal compared to the bitrate required by the original multichannel audio signal.
Any pertinent mono codec may be used for mono signal coding. Among them are Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) and the International Telecommunication Union (ITU) G.718 mono codec, to give only some examples of mono codecs .
Decoding a parametrically encoded audio signal may involve decoding the mono signal. Exploiting the spatial extension information, the mono signal can serve as a basis for reconstructing a first channel audio signal and a second channel audio signal. The reconstructed first channel audio signal frame may be a representation of the first channel audio signal frame, while the reconstructed second channel audio signal frame may be a representation of the second channel audio signal frame. Hence, one may say that the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame.
An audio signal frame may be divided into a plurality of frequency subbands . In the frequency domain, a signal is represented by a set of frequency components. A frequency subband of the signal may then be thought of as comprising each frequency component falling within an upper frequency bound and a lower frequency bound limiting the respective frequency subband. The frequency subbands, i.e. their bounds, may be identical for the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. Moreover, they may also be identical to the frequency subbands of the audio signal frame pair.
Exemplary embodiments according to all aspects of the present invention involve determination whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
One possibility to apply a gain to at least one frequency subband of a reconstructed audio signal frame is to multiply each frequency component of the respective frequency subband by a gain factor. Thus, the listener's perception of an audio signal can be modified by applying a gain to a certain frequency band. In particular, the auditory spatial image perceived by the listener may be affected by applying a gain to a frequency subband of either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. A listener's perception may exhibit different characteristics in different frequency subbands . In consequence, it can be relevant to which specific frequency subband or frequency subbands a gain is applied. In addition, it can also be relevant whether a gain is applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. Audio signal components falling within the subband of the reconstructed audio signal frame to which a gain is applied may be perceived more prominently in the respective channel than audio signal components falling within the subband of the reconstructed audio signal frame of the other channel.
According to all aspects of the present invention, the decision whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on a first parameter and a set of second parameters.
The first parameter may be indicative of a long-term estimate of a perceived direction of the audio signal frame pair. For instance, the first parameter may be indicative of a perceived direction of the (current) audio signal frame pair and at least one preceding audio signal frame pair. No limitations pertain to the first parameter as long as it is indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair. It is readily clear that different listeners are likely to have a different audio signal perception. Hence, there is probably not a single parameter that can exactly describe the perceived direction of an audio signal for all possible listeners. Moreover, with a variety of parameters contributing to the auditory spatial image a listener perceives and complex interrelations among these parameters, a model describing human audio signal localization is likely to be afflicted with at least some amount of inaccuracy. In consequence, various parameters can be determined that are indicative of a perceived direction of an audio signal, while it may be difficult to determine a parameter exactly representing said direction. In this respect, other examples for directional sensing of sounds may include techniques based on subspace decomposition of a covariance matrix of received signals; ESPRIT and MUSIC. The former is introduced in ,,A. Paulraj and R. R. Kailath, "ESPRIT-Estimation of Signal Parameters via Rotational Invariance Techniques," IEEE Trans. Acoust., Speech, Signal Processing, vol.37, no.7, pp.984-995, 1989." And the latter in ,,R. O. Schmidt, "Multiple Emitter Location and Signal Parameter Estimation, " IEEE Trans. Antennas and Propagation, vol.34, no.3, 1986."
According to all aspects of the present invention, the first parameter is indicative of a perceived direction of the audio signal frame pair and at least one preceding frame pair. In exemplary embodiments according to all aspects of the present invention, this may be understood as taking into account not only the perceived direction of a current audio signal frame pair but also considering the audio signal history. In consequence, the first parameter may change more slowly than, for instance, a parameter indicative of a perceived direction of the current audio signal frame pair only. It is noted that according to all aspects of the present invention, any number of preceding audio signal frames may be taken into account. Of course, if the current audio signal frame pair does not have a predecessor, a preceding audio signal frame pair cannot be taken into account.
Each second parameter of the set of second parameters may be indicative of a short-term estimate of a perceived direction of a frequency subband (or several frequency subbands) of the audio signal frame pair. For instance, each second parameter may be indicative of a perceived direction of a frequency subband of the audio signal frame pair. In exemplary embodiments according to all aspects of the present invention, each second parameter may be understood as a parameter that does not describe the perceived direction of the entire audio signal frame pair, but of one or more specific frequency subbands of the audio signal frame pair. In case of one specific subband, one may also call each second parameter a frequency subband specific parameter. Again, as already explained with regard to the first parameter indicative of a perceived direction of the audio signal frame pair, different parameters indicative of a perceived direction of a frequency subband of the audio signal frame pair can be employed. The set of second parameters may comprise at least two second parameters respectively indicative of a perceived direction of a frequency subband of the audio signal frame pair but may also comprise more than two second parameters. If the set of second parameters is only derived from the current audio signal frame pair and not also from a preceding audio signal frame pair, it may not only be thought of as comprising frequency subband specific information but it may also change more quickly than the first parameter for a sequence of audio signal frame pairs.
It is noted that according to all aspects of the present invention, any suitable processing of the first parameter and the set of second parameters may be employed in the course of determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. Any condition that has to be met in order to cause determination to yield a positive result can be imposed on the first parameter and the set of second parameters.
According to all aspects of the present invention, there are no restrictions regarding the location at which it is determined whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. For example, determination may be performed at a parametric audio encoder or at a parametric audio decoder. It is also possible that it takes place at a separate module neither forming part of an encoder nor of a decoder, such as for instance a network element located in a transmission path between an encoder and a decoder, a separate module co-located with the encoder, or a separate module co-located with the decoder. The auditory spatial image a listener perceives when provided with the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame, which will in the following be referred to as the reconstructed auditory spatial image, may differ from the auditory spatial image the listener would perceive if provided directly with the first channel audio signal frame and the second channel audio signal frame. The latter auditory spatial image will in the following be referred to as the original auditory spatial image.
One reason for the occurrence of a difference between the reconstructed auditory spatial image and the original auditory spatial image may be that parametric coding may possibly not allow exact reconstruction of the auditory spatial image. This may be, for instance, due to the spatial extension information only representing some but not every aspect of the interrelations of the first channel audio signal frame and the second channel audio signal frame. Coarse quantization of the spatial extension information may also adversely affect the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame with regard to the auditory spatial image.
The reconstructed auditory spatial image may be perceived by the listeners as being different from the desired auditory spatial image. The desired reconstructed auditory spatial image may be the original auditory spatial image or for example a wider or a narrower auditory spatial image than the original one. Exemplary embodiments according to all aspects of the present invention can be employed to determine whether auditory spatial image width modification should be applied to the reconstructed auditory spatial image. As an example, the reconstructed auditory spatial image may be perceived as being narrow compared to the original auditory spatial image, and it may be desirable to apply auditory spatial image widening to the reconstructed auditory spatial image. Applying a gain to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame may yield a widened auditory spatial image, for example by means of a thereby obtained subband and channel specific audio signal amplification. Due to auditory spatial image widening, even a widened reconstructed auditory spatial image obtained from a parametrically encoded low bitrate audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame may closely resemble the original auditory spatial image.
However, so as to attain a desired auditory spatial image, for example one closely resembling the original auditory spatial image, well-targeted auditory spatial image width modification can be necessary. In order to minimize the risk of adversely affecting the auditory spatial image, exemplary embodiments according to all aspects of the present invention take into account parameters indicative of a perceived direction of the audio signal frame pair, of a preceding audio signal frame pair and also of a frequency subband of the audio signal frame pair.
Thereby, a hypothetical listener's directional audio signal perception can serve as a criterion for whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame. An advantage of this approach can be that an auditory spatial image width modification result can be obtained that causes the listener's perception of the modified reconstructed auditory spatial image to closely resemble that of the original auditory spatial image. Also, the determination criteria used may help to apply width modification only if it is beneficial. Only applying a gain in these cases can also have the advantage of reducing the signal processing load by potentially omitting gain application if the criteria are not met.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the first apparatus comprises a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the second apparatus further comprises means for calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. These means may for instance be embodied as a processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, but are not limited thereto.
An advantage of these embodiments can be that it may then not be necessary to transmit the first parameter and the set of second parameters from another apparatus or device that is configured to determine the first parameter and the set of second parameters to the first apparatus or the second apparatus. The processor configured to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, or the corresponding means do not necessarily have to be separate entities. For example, the processor configured to calculate the first parameter and the set of second parameters may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises calculating the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to calculate the first parameter indicative of a perceived direction of the audio signal frame pair and the set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus forms part of a parametric audio encoder.
According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus forms part of means for parametric audio encoding. The means for parametric audio coding may for instance be embodied as a parametric audio encoder, but are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor forming part of a parametric audio encoder to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame.
For parametric audio encoding, a parametric audio encoder can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. Thus, information that may be necessary for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder. The first parameter and the set of second parameters may then be computed on the parametric audio encoder side, too. This approach can render it unnecessary to forward these parameters to another module serving for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame. It may suffice to generate a flag indicating the determination result and to provide it to an entity configured to carry out the actual application of the gain if permitted by the flag value.
According to exemplary embodiments according to all aspects of the present invention, the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
For instance and as already elucidated above, for stereo audio reproduction, i.e. two channel audio reproduction, two speakers referred to as a left speaker and as a right speaker are used. First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker. Put differently, the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
A listener's position can be expressed relative to the first loudspeaker and the second loudspeaker. It is possible that the listener's perceived auditory spatial image depends on his position relative to the first loudspeaker and to the second loudspeaker. As an example, if the listener is located closer to the first loudspeaker than to the second loudspeaker, audio signals emitted from the first loudspeaker may reach the listener with a smaller delay and a higher sound intensity than audio signals emitted from the second loudspeaker.
According to exemplary embodiments according to all aspects of the present invention, the listener's position relative to the first loudspeaker and the second loudspeaker can be assumed, for instance based on one or more pre-defined assumptions. To this end, it can be for instance assumed that the listener is positioned with the same distance to both the first and the second loudspeaker. Moreover, it may be assumed that the distance between the first loudspeaker and the second loudspeaker corresponds to the distance from the listener's position to each of the loudspeakers. Thus, the listener's position, the position of the first loudspeaker and the position of the second loudspeaker form the vertices of an equilateral triangle.
It is also possible that the position of a listener relative to a first loudspeaker and a second loudspeaker is known. For instance, reconstructed and possibly widened first and second audio signal frames can be reproduced by means of the two loudspeakers. Since the reconstructed first channel audio signal frame can be seen as a representation of the first audio signal frame and since the reconstructed second channel audio signal frame can be seen as a representation of the second audio signal frame, also the first channel audio signal frame is assigned to the first loudspeaker and the second channel audio signal frame is assigned to the second loudspeaker.
The listener's position relative to these two loudspeakers may then be measured and provided to an entity configured to calculate the first parameter and the set of second parameters. In turn, perceived direction determination of the audio signal frame pair may be adapted to the reconstructed first and second audio signal frame reproduction scenario.
By taking into account a listener's assumed or known position relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned, determining whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame can aim at reproduction of an auditory spatial image that could be perceived with the underlying assumed or known listener and loudspeaker position configuration.
Exemplary embodiments according to all aspects of the present invention comprise that the first parameter is obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair. Therein, in the calculation of the energies, all frequency subbands, or only a subset thereof (for instance a subset that comprises the frequency subbands that are considered important, such as for instance low frequencies, wherein this consideration may depend on the sample rate applied) may be considered.
These embodiments may provide a good representation of the perceived direction of the audio signal frame pair and the at least one preceding audio signal frame pair. Knowing the direction from the assumed or known position of the listener to the first loudspeaker and the direction from the assumed or known position of the listener to the second loudspeaker, the first parameter may be determined with low computational effort.
According to exemplary embodiments according to all aspects of the present invention, the second parameters of the set of second parameters are obtainable from the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband. Exemplary embodiments according to all aspects of the present invention comprise that the first parameter and the second parameters are indicative of a perceived direction relative to a reference direction, for example relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker. In other words, the first parameter can be indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair relative to said first imaginary line and each second parameter of the set of second parameters can be indicative of a perceived direction of a frequency subband of the audio signal frame pair relative to said first imaginary line.
Describing the above perceived directions relative to the first imaginary line enables arranging the perceived directions in two groups, wherein the first group comprises perceived directions having a positive angular offset relative to the first imaginary line and the second group comprises perceived directions having a negative angular offset relative to the first imaginary line. In exemplary embodiments according to all aspects of the present invention, the angular offset relative to the first imaginary line may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair and in a specific subband of the audio signal frame pair, respectively. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to base the determination on averaged second parameters.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention. According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame based on averaged second parameters.
Averaging may be based on any existing averaging approach. Besides computing the mean value or the median value, averaging may also comprise assigning weights to the second parameters before summing them up. The weights may control the influence of each second parameter on the averaged value. An advantage of basing the determination whether a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame on averaged second parameters can be that fast determination may be enabled. Checking if each second parameter meets a certain condition may require more time. This may be even more significant if several conditions have to be met.
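Purely as an illustration of such averaging, the following sketch computes a plain mean, a median and a weighted mean of a set of second parameters. The function names and the choice of degrees as angle unit are hypothetical and not prescribed by the embodiments described above.

```python
import statistics

def average_second_parameters(theta_m, weights=None):
    """Average a list of subband direction parameters (degrees).

    theta_m -- second parameters, one per frequency subband
    weights -- optional per-subband weights controlling the influence of
               each second parameter on the averaged value
    """
    if weights is None:
        return sum(theta_m) / len(theta_m)                      # plain mean
    total = sum(weights)
    return sum(t * w for t, w in zip(theta_m, weights)) / total  # weighted mean

def median_second_parameter(theta_m):
    return statistics.median(theta_m)                            # median as an alternative
```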
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame are configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
According to exemplary embodiments of the method according to the second aspect of the present invention, determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
A listener's perception of the auditory spatial images of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to a comparatively large number of preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands of the currently perceived reconstructed audio signal frames. Thus, it may not be necessary to apply a gain to any of the subbands of the currently perceived reconstructed audio signal frames. On the other hand, scenarios are imaginable in which neither the first parameter nor the set of second parameters would cause it to be determined that a gain should be applied to at least one frequency subband of a reconstructed first or second channel audio signal frame. However, if a gain has been applied to a subband of a preceding reconstructed audio signal frame, it may be advisable to also apply a gain to at least one subband of a current reconstructed first or second channel audio signal frame so as not to cause sudden changes in the perceived auditory spatial image. An advantage of the embodiments of the present invention currently discussed can thus be that the history of the audio signal is taken into account, thereby allowing adequate use of auditory spatial image width modification.
In further exemplary embodiments according to all aspects of the present invention, the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters. The human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies. An example of non-uniform frequency subbands corresponding to the human auditory filters is equivalent rectangular bandwidth (ERB) frequency subbands. By splitting the audio signal frame pair into non-uniform frequency subbands corresponding to the human auditory filters, determination whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced modified auditory spatial images.
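As an illustrative sketch only, ERB-like subband boundaries could be derived as follows. The Glasberg-Moore ERB-rate formula used here is an assumption, as are the helper names; the embodiments above merely require non-uniform subbands corresponding to the human auditory filters.

```python
import math

def erb_rate(f_hz):
    # Glasberg & Moore ERB-rate scale (an assumption; no particular formula
    # is prescribed above): number of ERBs below the frequency f_hz.
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_subband_offsets(sample_rate, fft_size, num_bands):
    """Return lower-bound FFT bin indices of non-uniform (ERB-like) subbands."""
    nyquist = sample_rate / 2.0
    step = erb_rate(nyquist) / num_bands
    offsets = []
    for m in range(num_bands):
        f_lower = erb_rate_to_hz(m * step)
        offsets.append(int(round(f_lower / nyquist * (fft_size // 2))))
    return offsets  # analogous to the sbOffset[m] array used further below

print(erb_subband_offsets(sample_rate=48000, fft_size=1024, num_bands=20))
```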
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first or second channel audio signal frame. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband, but the means are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
These embodiments may for instance enable applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair that is assigned to the loudspeaker for which the direction from an assumed or known position of a listener to said loudspeaker is closer to the perceived direction in the respective frequency subband as indicated by the second parameter of said frequency subband.
As an example, the perceived direction as indicated by the second parameter of a certain frequency subband of an audio signal frame pair can be 110°. The direction is measured from an assumed or known position of a listener relative to a pair of speakers. For instance and as already elucidated above, for stereo audio reproduction, i.e. two channel audio reproduction, two speakers referred to as a left speaker and as a right speaker are used. First channel audio signal frames, for example left channel audio signal frames, are replayed by means of the left loudspeaker, while second channel audio signal frames, i.e. right channel audio signal frames, are replayed by means of the right loudspeaker. Put differently, the first channel audio signal frames are assigned to a first loudspeaker and the second channel audio signal frames are assigned to a second loudspeaker.
An angle of 90° corresponds to the direction of a first imaginary line perpendicular to a second imaginary line connecting the left and the right loudspeaker, the first imaginary line passing through the assumed or known position of the listener. Therefore, directions describable by an angle greater than 90° are closer to the direction from the assumed or known position of the listener to the left speaker, while directions describable by an angle of less than 90° are closer to the direction from the assumed or known position of the listener to the right speaker.
The perceived direction of 110° is thus closer to the direction from the assumed or known position of the listener to the left loudspeaker. According to the exemplary embodiments of the present invention currently discussed, for the specific subband for which the perceived direction as indicated by the second parameter of said subband is 110°, it is determined to apply a gain to the subband of the reconstructed first channel audio signal frame. Said frame is associated with the left, i.e. first, channel audio signal frame of the audio signal frame pair that is assigned to the left loudspeaker because it may be thought of as a representation of the left channel audio signal frame.
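A minimal sketch of this per-subband discrimination, assuming the second parameters are available as angles in degrees with 90° corresponding to the first imaginary line, could look as follows (the function name is hypothetical, and how the exact 90° case is handled is not specified above):

```python
def channel_for_gain(theta_m_deg):
    """Decide, per subband, which reconstructed channel the gain targets.

    Angles above 90 degrees lie towards the left (first channel) loudspeaker,
    angles below 90 degrees towards the right (second channel) loudspeaker.
    """
    return "first" if theta_m_deg > 90.0 else "second"

# The worked example above: a perceived direction of 110 degrees selects the
# reconstructed first (left) channel audio signal frame.
assert channel_for_gain(110.0) == "first"
assert channel_for_gain(70.0) == "second"
```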
Hence, the embodiments of the present invention currently discussed may enable emphasizing the content of the reconstructed audio signal frame contributing more significantly to the spatial auditory effects in the respective frequency subband. The modified auditory spatial image may then be close to the desired auditory spatial image, it may for example closely resemble the original auditory spatial image. Of course it does not have to be determined for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame if it has previously been determined that no gain should be applied at all.
According to exemplary embodiments of the first apparatus according to the first aspect of the present invention, the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. This processor may either be a separate module or it may form part of the processor configured to determine whether a gain should be applied to a frequency subband of a reconstructed first channel audio signal frame or to a frequency subband of a reconstructed second channel audio signal frame. According to exemplary embodiments of the second apparatus according to the first aspect of the present invention, the apparatus comprises means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. These means may for example be embodied as a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, but the means are not limited thereto.
According to exemplary embodiments of the method according to the second aspect of the present invention, the method comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. This feature also pertains to exemplary embodiments of the program according to the third aspect of the present invention and to the readable storage medium according to the fourth aspect of the present invention.
According to exemplary embodiments of the program according to the fifth aspect of the present invention, the program causes a processor to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame. These embodiments may be advantageous because the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first or second channel audio signal frame at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
According to exemplary embodiments according to all aspects of the present invention, the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame. Thus, due to the computation of the reciprocal average value of frequency subband specific indicator ratios, it may be achieved that the gain is comparatively small if the subband specific indicator ratios are large. Large subband specific indicator ratios may occur if the indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame is much greater than the indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame. In other words, large subband specific indicator ratios may occur if the energies of the first channel audio signal frame and the second channel audio signal frame in the respective subband differ significantly. In consequence, applying the gain may yield a widened auditory spatial image, in which subtle inter-channel differences are emphasized more than prominent inter-channel differences. Thus the widened auditory spatial image may closely resemble the original auditory spatial image.
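A minimal sketch of such a gain computation is given below; it uses the subband energies themselves as the indicators of the maximum and minimum, which is only one possible choice, and the function name is hypothetical.

```python
def widening_gain(e_l_m, e_r_m):
    """Gain from the reciprocal average of subband-specific indicator ratios.

    e_l_m, e_r_m -- per-subband energies of the first and second channel audio
                    signal frame (used here directly as the 'indicators' of the
                    maximum and minimum energy; an assumption).
    """
    eps = 1e-12                                   # guard against division by zero
    ratios = [max(el, er) / max(min(el, er), eps)
              for el, er in zip(e_l_m, e_r_m)]
    return 1.0 / (sum(ratios) / len(ratios))      # reciprocal of the average ratio

# Strongly differing channel energies -> large ratios -> comparatively small gain.
print(widening_gain([1.0, 0.5, 0.2], [0.9, 0.6, 0.25]))
```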
Other exemplary embodiments according to all aspects of the present invention comprise that the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
These embodiments may be advantageous because the inter-channel level differences may form part of the spatial extension information that may be needed in order to establish these differences between the reconstructed first channel audio signal frame and the reconstructed second channel audio signal frame. The gain may then be computed at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frame. To this end, it may then not be necessary to provide said entity with additional parameters for gain computation. Also, the gain can be applied to at least one frequency subband of the reconstructed first or second channel audio signal frame at said entity in exemplary embodiments according to all aspects of the present invention.
Exemplary embodiments according to all aspects of the present invention comprise that the gain is a quantized value. Gain quantization may enable representation of the gain by a reduced number of bits. Thus, transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame may require a reduced bandwidth. According to exemplary embodiments according to all aspects of the present invention, the gain is obtainable from a lookup table. An advantage of these embodiments can be that instead of transmitting the gain to an entity configured to apply it to at least one frequency subband of a reconstructed first or second channel audio signal frame, it may suffice to merely transmit a lookup table index, which can possibly be represented by fewer bits than the quantized gain itself.
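Purely for illustration, gain quantization via a lookup table could be sketched as follows; the table values and size are hypothetical and would in practice be agreed between the entities involved.

```python
# Hypothetical gain lookup table; the actual values and size are not specified above.
GAIN_TABLE = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

def quantize_gain(gain):
    """Return the lookup-table index of the entry closest to the gain."""
    return min(range(len(GAIN_TABLE)), key=lambda i: abs(GAIN_TABLE[i] - gain))

def dequantize_gain(index):
    return GAIN_TABLE[index]

idx = quantize_gain(0.6)        # only the 3-bit index needs to be transmitted
print(idx, dequantize_gain(idx))
```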
The features of the present invention and of its exemplary embodiments as presented above shall also be understood to be disclosed in all possible combinations with each other.
It is to be noted that the above description of embodiments of the present invention is to be understood to be merely exemplary and non-limiting.
Further aspects of the invention will be apparent from and elucidated with reference to the detailed description presented hereinafter.
BRIEF DESCRIPTION OF THE FIGURES

The figures show:
Fig. 1: An exemplary illustration of a method for parametric audio coding;
Fig. 2: A more detailed representation of method step 101 of Fig. 1;

Fig. 3: A more detailed representation of method step 102 of Fig. 1;
Fig. 4: A schematic exemplary illustration of the assumed or known position of a listener relative to a first loudspeaker to which a first channel audio signal frame is assigned and relative to a second loudspeaker to which a second channel audio signal frame is assigned;
Fig. 5: A flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention;
Fig. 6: A flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention;
Fig. 7: A more detailed illustration of method step 112 of Fig. 6;
Fig. 8: A flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6 including gain application;
Fig. 9: A schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention;
Fig. 10: A schematic illustration of a second exemplary embodiment of an apparatus according to the first aspect of the present invention;

Fig. 11: A schematic illustration of an exemplary embodiment of a readable medium according to the fourth aspect of the present invention.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
In the following detailed description, exemplary embodiments of the present invention will be described.
An input audio signal comprising two channels, namely Lt and Rt, is provided to method step 101. Method step 101 comprises parametrically encoding the input audio signals Lt and Rt. As an output, method step 101 delivers a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. In step 102, the parametrically encoded audio signal is decoded. In decoding the parametrically encoded audio signal frame pair, a reconstructed first channel audio signal frame and a reconstructed second channel audio signal frame can be obtained.
Fig. 2 is a more detailed representation of method step 101 of Fig. 1.
The input signals Lt and Rt are divided into frames in step 103. Successive frames may overlap in time, for example by half of their duration. As a result, the first channel audio signal frame Lt and the second channel audio signal frame Rt are obtained. They comprise the samples tL and tR, respectively.
In step 104 windowing is applied to the audio signal frames Lt and Rt. Windowing may serve for suppressing framing-induced artefacts in time-to-frequency transformed representations of Lt and Rt. Any pertinent windowing function w can be employed. An exemplary windowing function is the sinusoidal window given by:
$w(n) = \sin\!\left(\frac{\pi\,(n + 0.5)}{N}\right), \qquad n = 0, 1, \ldots, N - 1 \qquad (1)$
wherein N is the number of samples in each frame Lt and Rt. As a result, the windowed first channel audio signal frame Lw and the windowed second channel audio signal frame Rw are obtained.
In step 105 the windowed audio signal frames Lw and Rw are transformed to the frequency domain. To this end, any transform TF that provides complex valued output may be used. For example, Discrete Fourier Transform (DFT), Modified Discrete Cosine Transform (MDCT), Modified Discrete Sine Transform (MDST) or Quadrature Mirror Filtering (QMF) may be used. The transformed windowed audio signal frames are referred to as Lf and Rf. The transformed windowed audio signal frames comprise the samples fL and fR, respectively. How they are obtained from the samples tL and tR can be described by the following equations:
$f_L = \mathrm{TF}\{w \cdot t_L\}, \qquad f_R = \mathrm{TF}\{w \cdot t_R\} \qquad (2)$

In step 106 the audio signal frames Lf and Rf are transformed to a downmix signal, which in this example embodiment is a mono signal Mf.
$M_f = 0.5 \cdot \left(L_f + R_f\right) \qquad (3)$
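As a non-limiting sketch of steps 103 to 106, assuming a DFT as the transform TF and a hypothetical frame length of 1024 samples:

```python
import numpy as np

N = 1024                                            # hypothetical frame length
HOP = N // 2                                        # successive frames overlap by half
window = np.sin(np.pi * (np.arange(N) + 0.5) / N)   # sinusoidal window, eq. (1)

def frames(signal):
    """Step 103: split one channel signal into half-overlapping frames."""
    for start in range(0, len(signal) - N + 1, HOP):
        yield signal[start:start + N]

def analyse_frame(l_t, r_t):
    """Steps 104-106 for one frame pair: window, transform, downmix."""
    l_f = np.fft.fft(window * l_t)                  # eq. (2), DFT as example transform TF
    r_f = np.fft.fft(window * r_t)
    m_f = 0.5 * (l_f + r_f)                         # eq. (3), mono downmix
    return l_f, r_f, m_f
```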
Step 107 comprises encoding the mono signal Mf. Any pertinent mono codec, such as Advanced Audio Coding (AAC), Advanced Audio Coding with spectral band replication (AAC+) and the International Telecommunication Union (ITU) G.718 mono codec, may be employed.
In step 108, spatial extension information is derived from the first channel audio signal frame Lf and the second channel audio signal frame Rf. Exemplary spatial extension parameters are the inter-channel level difference (ICLD), the inter-channel time difference (ICTD) and the inter-channel correlation (ICC).
By multiplexing the encoded mono signal and the spatial extension information, a bitstream can be formed. The bitstream representative of an input frame may comprise one or more encoded frames or packets. It can be thought of as a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame Lf or Lt and a second channel audio signal frame Rf or Rt. Fig. 3 is a more detailed representation of method step 102 of Fig. 1.
According to Fig. 3, the bitstream provided from the encoding step 101 is demultiplexed in step 109. Thereby, an encoded mono signal and spatial extension information are obtained. In step 110 the mono signal is decoded, yielding the decoded mono signal Mf. Step 111 comprises applying the spatial extension information, thus obtaining the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf.
Fig. 4 schematically illustrates the assumed or known position A of a listener relative to a first loudspeaker LS to which a first channel audio signal frame Lt is assigned and relative to a second loudspeaker RS to which a second channel audio signal frame Rt is assigned.
For stereo audio reproduction, i.e. two channel audio reproduction, two loudspeakers referred to as a left loudspeaker LS and as a right loudspeaker RS are used. First channel audio signal frames, for example left channel audio signal frames Lt, are replayed by means of the left loudspeaker LS, while second channel audio signal frames, i.e. right channel audio signal frames Rt, are replayed by means of the right loudspeaker RS. Put differently, the first channel audio signal frames Lt are assigned to the first loudspeaker LS, while the second channel audio signal frames Rt are assigned to the second loudspeaker RS.
Also reconstructed and possibly widened first and second audio signal frames Lf and Rf can be reproduced by means of the two loudspeakers LS and RS. Since the reconstructed first channel audio signal frame Lf can be seen as a representation of the first audio signal frame Lt and since the reconstructed second channel audio signal frame Rf can be seen as a representation of the second audio signal frame Rt, the first channel audio signal frame Lt is assigned to the first loudspeaker LS and the second channel audio signal frame Rt is assigned to the second loudspeaker RS in this case, too.
The position A can be expressed relative to the loudspeakers LS and RS. For instance, the distance from position A to the first loudspeaker LS is S1, while the distance from position A to the second loudspeaker RS is S2. The distances S1 and S2 are equal. Furthermore, the direction from position A to the first loudspeaker LS can be described by the angle ΘL and the direction from position A to the second loudspeaker RS can be described by the angle ΘR. In the present example, ΘL is 120° and ΘR is 60°. Thus, the triangle A, RS, LS is an equilateral triangle. The first imaginary line S3 is perpendicular to the second imaginary line S4 that connects the first loudspeaker LS and the second loudspeaker RS. The first imaginary line S3 passes through the position A. As an example, the listener may localize a sound component of the replayed first channel audio signal frame Lt or of the second channel audio signal frame Rt at a position P which can be described by its distance to the position A of the listener and the direction θ as seen from that position A.
In the following it is explained how a first parameter indicative of a perceived direction of an audio signal frame pair and at least one preceding audio signal frame pair may be obtained.
In a first step, the energy eLframe of the first channel audio signal frame Lt and the energy eRframe of the second channel audio signal frame Rt are computed. This can be done by calculating the sum of the squared absolute values of the frequency transformed samples fL or fR of the respective frame and then extracting the square root of this sum according to equation (4).
$e_{L_{frame}} = \sqrt{\sum_{n} \left|f_L(n)\right|^2}, \qquad e_{R_{frame}} = \sqrt{\sum_{n} \left|f_R(n)\right|^2} \qquad (4)$
It should be noted that, instead of considering all frequency subbands in the calculation of the energy eLframe of the first channel audio signal frame Lt and the energy eRframe of the second channel audio signal frame Rt, only the most important frequency bands may be considered, for instance the frequency subbands corresponding to the low frequencies. Which frequency subbands are important may depend on the sample rate of the signal. For instance, half of the frequency subbands could be left out in case of a sample rate of 48kHz, whereas, at a sample rate of 8kHz, the frequency band coverage may only be up to 4kHz, so in this case more than half of the bands may probably have to be included in the calculations.
The energies eLframe and eRframe of preceding frames are also taken into account according to the pseudocode given in equation (5):

$e_{L_{frame}} = e_{L_{frame}} \cdot \beta + e_{L_{frame}} \cdot (1-\beta), \qquad e_{R_{frame}} = e_{R_{frame}} \cdot \beta + e_{R_{frame}} \cdot (1-\beta) \qquad (5)$
According to equation (5), with each update the sum of the former value of the variable eLframe weighted with β and the current value of eLframe from equation (4) weighted with (1-β) is assigned to eLframe; eRframe is computed accordingly. As an example, β may have the value 0.95. It may thus happen that eLframe and eRframe change only slightly with each update, and eLframe and eRframe can be considered to represent the long-term energies of the first and second channels, respectively. eLframe and eRframe are initialized to zero so that they can be determined even if there has not been a preceding audio signal frame pair.
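A minimal sketch of equations (4) and (5), with the in-place update of equation (5) written out explicitly (variable names are hypothetical):

```python
import numpy as np

beta = 0.95
e_l_frame = 0.0          # long-term energies, initialized to zero
e_r_frame = 0.0

def frame_energy(f_x, important_bins=None):
    """Eq. (4): square root of the sum of squared magnitudes.

    important_bins may restrict the sum to the perceptually important
    (e.g. low-frequency) bins, depending on the sample rate.
    """
    x = f_x if important_bins is None else f_x[important_bins]
    return np.sqrt(np.sum(np.abs(x) ** 2))

def update_long_term(l_f, r_f):
    """Eq. (5): leaky integration of the current frame energies."""
    global e_l_frame, e_r_frame
    e_l_frame = e_l_frame * beta + frame_energy(l_f) * (1.0 - beta)
    e_r_frame = e_r_frame * beta + frame_energy(r_f) * (1.0 - beta)
    return e_l_frame, e_r_frame
```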
In the next step, the direction ΘL from the assumed or known position A of the listener to the first loudspeaker LS is weighted with eLframe, i.e. with the sum of the energy of the first channel audio signal frame Lt and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair. Accordingly, the direction ΘR from the assumed or known position A of the listener to the second loudspeaker RS is weighted with eRframe, i.e. with the sum of the energy of the second channel audio signal frame Rt and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair. Normalization with the sum of the energies eLframe and eRframe is performed.
$a\_r_{frame} = \dfrac{e_{L_{frame}} \cdot \cos(\theta_L) + e_{R_{frame}} \cdot \cos(\theta_R)}{e_{L_{frame}} + e_{R_{frame}}}, \qquad a\_l_{frame} = \dfrac{e_{L_{frame}} \cdot \sin(\theta_L) + e_{R_{frame}} \cdot \sin(\theta_R)}{e_{L_{frame}} + e_{R_{frame}}} \qquad (6)$
a_r_frame and a_l_frame can be seen as a coordinate pair representing the overall localized position of sound components within the pair of audio signal frames Lt and Rt and preceding audio signal frames. For instance, a_r_frame and a_l_frame can describe the position of the point P in Fig. 4. The angle or direction θframe is an indicator of the perceived direction of the audio signal frame pair Lt and Rt.
θframe can be a good representation of the perceived direction of the audio signal frame pair Lt and Rt and the at least one preceding audio signal frame pair:

$\theta_{frame} = \angle\!\left(a\_r_{frame},\; a\_l_{frame}\right) \qquad (7)$

i.e. the direction of the point with the coordinates a_r_frame and a_l_frame as seen from the position A. In other words, θframe may be seen as a representation of a long-term estimate of a perceived direction of the audio signal.
Knowing the direction from the assumed or known position A of the listener to the first loudspeaker LS and the direction from the assumed or known position A of the listener to the second loudspeaker RS, the first parameter θframe may be determined with low computational effort.
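The computation of equations (6) and (7) can be sketched as follows, assuming the loudspeaker directions of Fig. 4 and angles expressed in degrees (function name hypothetical):

```python
import math

THETA_L = 120.0        # direction of the left loudspeaker LS, degrees (Fig. 4)
THETA_R = 60.0         # direction of the right loudspeaker RS, degrees

def frame_direction(e_l_frame, e_r_frame):
    """Eqs. (6)-(7): energy-weighted direction of the frame pair, in degrees."""
    total = e_l_frame + e_r_frame
    if total == 0.0:
        return 90.0                       # no energy: fall back to the centre line (assumption)
    a_r = (e_l_frame * math.cos(math.radians(THETA_L)) +
           e_r_frame * math.cos(math.radians(THETA_R))) / total
    a_l = (e_l_frame * math.sin(math.radians(THETA_L)) +
           e_r_frame * math.sin(math.radians(THETA_R))) / total
    return math.degrees(math.atan2(a_l, a_r))

print(frame_direction(1.0, 1.0))          # equal long-term energies -> approx. 90 degrees
```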
The difference θframe-90° can also be seen as a first parameter indicative of a perceived direction of an audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair or, in the time domain, of the pair Lt and Rt and at least one preceding audio signal frame pair.
It is indicative of a perceived direction of the audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair relative to a first imaginary line S3 passing through the assumed or known position A of the listener, the first imaginary line S3 being perpendicular to a second imaginary line S4 connecting the first loudspeaker LS and the second loudspeaker RS.
The angular offset θframe - 90° relative to the first imaginary line S3 may also be thought of as an indicator for the amount of spatial effects contained in the entire audio signal frame pair Lf and Rf and at least one preceding audio signal frame pair. This indicator may also be considered in deciding whether a gain should be applied to a specific subband of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf.
In the following it is explained how a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of an audio signal frame pair, can be obtained.
First, the frequency domain audio signal frames Lf and Rf are divided into M frequency subbands m, wherein 0 ≤ m < M. sbOffset[m] is the lower bound of frequency subband m. The frequency subbands m, i.e. their bounds, may be identical for the first channel audio signal frame Lf and the second channel audio signal frame Rf. Moreover, they may also be identical to the frequency subbands of the reconstructed audio signal frame pair Lf and Rf.
The frequency subbands m of the audio signal frame pair Lf and Rf are non-uniform frequency subbands corresponding to the human auditory filters. Namely, they are equivalent rectangular bandwidth (ERB) frequency subbands. The human auditory system can be modeled as a system comprising a plurality of auditory filters. The bandwidths of these auditory filters increase with increasing audio signal frequencies. By splitting the audio signal frame pair Lf and Rf into non-uniform frequency subbands m corresponding to the human auditory filters, determination whether a gain should be applied to a frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf can take into account psychoacoustic parameters of the human auditory system, thus possibly attaining enhanced widened auditory spatial images.
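The concrete subband boundaries sbOffset[m] are not specified above. As an illustration only, the following C sketch derives non-uniform boundaries by spacing them uniformly on the ERB-rate scale using the Glasberg-Moore approximation, which is one common way to obtain ERB-like subbands; the actual table used by a codec may differ.

#include <math.h>

/* Illustrative construction of non-uniform, ERB-like subband boundaries.
 * n_bins is the number of frequency domain samples per frame and
 * sample_rate the sampling frequency in Hz; sbOffset must hold M+1 entries,
 * sbOffset[m] being the lower bound of subband m. */
static double hz_to_erb_rate(double f) { return 21.4 * log10(1.0 + 0.00437 * f); }
static double erb_rate_to_hz(double e) { return (pow(10.0, e / 21.4) - 1.0) / 0.00437; }

static void make_sb_offsets(int *sbOffset, int M, int n_bins, double sample_rate)
{
    double e_max = hz_to_erb_rate(sample_rate / 2.0);
    for (int m = 0; m <= M; m++) {
        double f = erb_rate_to_hz(e_max * (double)m / (double)M);   /* band edge in Hz */
        int bin = (int)((f / (sample_rate / 2.0)) * (double)n_bins);
        if (bin > n_bins) bin = n_bins;
        sbOffset[m] = bin;
    }
    sbOffset[0] = 0;        /* ensure the full spectrum is covered */
    sbOffset[M] = n_bins;
}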
In the next step the energies eL in each frequency subband m of the first channel audio signal frame Lf and the energies eR in each frequency subband m of the second channel audio signal frame Rf are determined:

e_{L_m} = \sum_{j=sbOffset[m]}^{sbOffset[m+1]-1} L_f(j)^2, \qquad e_{R_m} = \sum_{j=sbOffset[m]}^{sbOffset[m+1]-1} R_f(j)^2    (8)
The energies eL and eR can be considered to represent the short-term energies of the first and second channels in each frequency subband m, respectively.
Subsequently, the direction ΘL from the assumed or known position A of the listener to the first loudspeaker LS, weighted with the energy eL in the respective frequency subband m of the first channel audio signal frame Lf, which corresponds to the energy of said frame in the time domain Lt, is calculated for each subband m. Accordingly, the direction ΘR from the assumed or known position A of the listener to the second loudspeaker RS, weighted with the energy eR in the respective frequency subband m of the second channel audio signal frame Rf, which corresponds to the energy of said frame in the time domain Rt, is calculated for each subband m. Normalization with the sum of the energies eL and eR is performed:

a\_r_m = \frac{e_{L_m} \cos(\theta_L) + e_{R_m} \cos(\theta_R)}{e_{L_m} + e_{R_m}}, \qquad a\_l_m = \frac{e_{L_m} \sin(\theta_L) + e_{R_m} \sin(\theta_R)}{e_{L_m} + e_{R_m}}    (9)
Similar to the explanations given above with respect to the first parameter, a directional angle θm can be derived from a_r_m and a_l_m:

\theta_m = \tan^{-1}\!\left(\frac{a\_l_m}{a\_r_m}\right)    (10)
The angles or direction angles θm are second parameters indicative of the perceived direction of the respective frequency subband m of the audio signal frame pair Lf and Rf, and thus of the audio signal frame pair Lt and Rt in the time domain. In other words, the angles or direction angles θm may be seen as indicative of a short-term estimate of a perceived direction of the respective frequency subband m of the audio signal frame pair.
The second parameters θm of the set of second parameters are obtainable from the direction from the assumed or known position of the listener A to the first loudspeaker LS weighted with the first channel audio signal frame energy eL within the respective frequency subband m , and the direction from the assumed or known position A of the listener to the second loudspeaker RS weighted with the second channel audio signal frame energy eR within the respective frequency subband m .
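Again purely as an illustration, the C sketch below shows how the per-subband energies and second parameters of equations (8) to (10) might be computed. The reconstruction of equation (8) as a sum of squared spectral samples, the conversion of the result to degrees and the handling of silent subbands are assumptions of this sketch.

#include <math.h>

#define PI_VAL 3.14159265358979323846

/* Sketch of equations (8)-(10): per-subband energies and short-term
 * direction angles theta_m. Lf and Rf are the frequency domain frames,
 * sbOffset[] holds the M+1 subband boundaries, theta_L_deg and theta_R_deg
 * the loudspeaker directions in degrees; theta_m_deg receives the M second
 * parameters in degrees. */
static void directions_per_subband(const double *Lf, const double *Rf,
                                   const int *sbOffset, int M,
                                   double theta_L_deg, double theta_R_deg,
                                   double *theta_m_deg)
{
    const double tL = theta_L_deg * PI_VAL / 180.0;
    const double tR = theta_R_deg * PI_VAL / 180.0;
    for (int m = 0; m < M; m++) {
        double eL = 0.0, eR = 0.0;
        for (int j = sbOffset[m]; j < sbOffset[m + 1]; j++) {   /* eq. (8) */
            eL += Lf[j] * Lf[j];
            eR += Rf[j] * Rf[j];
        }
        double norm = eL + eR;
        if (norm <= 0.0) { theta_m_deg[m] = 90.0; continue; }   /* silent subband: treat as centred */
        double a_r = (eL * cos(tL) + eR * cos(tR)) / norm;      /* eq. (9) */
        double a_l = (eL * sin(tL) + eR * sin(tR)) / norm;      /* eq. (9) */
        theta_m_deg[m] = atan2(a_l, a_r) * 180.0 / PI_VAL;      /* eq. (10) */
    }
}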
The differences θm − 90° can also be seen as a set of second parameters indicative of a perceived direction of the respective frequency subband m of the audio signal frame pair Lf and Rf.
They are indicative of a perceived direction of a respective frequency subband m of the audio signal frame pair Lf and Rf relative to a first imaginary line S3 passing through the assumed or known position A of the listener, the first imaginary line S3 being perpendicular to a second imaginary line S4 connecting the first loudspeaker LS and the second loudspeaker RS.
By describing the above perceived directions θm relative to the first imaginary line S3, the perceived directions θm can be arranged in two groups, wherein the first group comprises perceived directions having a positive angular offset relative to the first imaginary line S3 and the second group comprises perceived directions having a negative angular offset relative to the first imaginary line S3. The angular offset relative to the first imaginary line S3 may also be thought of as an indicator for the amount of spatial effects contained in the respective subband m of the audio signal frame pair Lf and Rf. This indicator may also be considered in deciding whether a gain should be applied to a specific subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf.
The parameter aveDiff can be seen as an averaged representation of the set of second parameters θm:

aveDiff = \frac{1}{M} \sum_{m=0}^{M-1} \left|\theta_m - 90^\circ\right|    (11)
Alternatively, the computation of parameter aveDiff may consider only a subset of the second parameters θm, for example θm corresponding to a selected number of lowest frequency bands, instead of considering the full set of second parameters θm for the M frequency bands as in equation (11) .
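A minimal C sketch of the averaging of equation (11) as reconstructed above is given below; n_bands may be M or, as just mentioned, a smaller number of lowest frequency subbands.

#include <math.h>

/* aveDiff as the average absolute deviation of the second parameters
 * theta_m (in degrees) from 90 degrees, cf. equation (11). */
static double ave_diff(const double *theta_m_deg, int n_bands)
{
    double sum = 0.0;
    for (int m = 0; m < n_bands; m++)
        sum += fabs(theta_m_deg[m] - 90.0);
    return (n_bands > 0) ? sum / (double)n_bands : 0.0;
}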
Considering aveDiff in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or to at least one frequency subband m of a reconstructed second channel audio signal frame Rf , may thus correspond to basing the determination on averaged second parameters .
If aveDiff assumes the value 0, this can be understood as indicating that no spatial effects are contained within the pair of audio signal frames Lf and Rf, i.e. that the first audio signal frame Lf and the second audio signal frame Rf are identical. On the other hand, a value of 90° indicates maximum spatial effects within the pair of audio signal frames Lf and Rf, i.e. they differ completely from each other and are not correlated.
An advantage of basing the determination whether a gain should be applied to at least one frequency subband m on the averaged parameter aveDiff can be that a fast determination may be enabled. Checking if each second parameter θm meets a certain condition may require more time or higher processing power. This may be even more significant if several conditions have to be met.
Fig. 5 is a flowchart illustrating an exemplary embodiment of a first method according to the second aspect of the present invention.
According to this embodiment, whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame is taken into account in determining whether a gain should be applied to at least one frequency subband m of the reconstructed first or second channel audio signal frame Lf and Rf , respectively.
To this end, it is stored in the array st_wideArray whether a gain has been applied to at least one frequency subband m of a preceding reconstructed first or second channel audio signal frame. st_wideArray is initialized to zero at start-up. It is updated after each audio signal frame pair by performing a left shift, thereby discarding the flag indicating whether a gain has been applied to the oldest reconstructed audio signal frame pair represented by st_wideArray. The flag for the current reconstructed audio signal frame pair Lf and Rf is stored at the rightmost position st_wideArray[0] in the array. In the present example, eight flags are stored in st_wideArray.
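The array st_wideArray can, for example, be realized as an eight-bit shift register; the following C sketch shows such a realization, which is an implementation choice made here for illustration and not mandated by the description above.

#include <stdint.h>

static uint8_t st_wideArray = 0;               /* initialized to zero at start-up */

/* Shift in the flag of the current reconstructed audio signal frame pair at
 * the rightmost position (bit 0), discarding the flag of the oldest pair. */
static void update_widening_history(int widening_enabled)
{
    st_wideArray = (uint8_t)((st_wideArray << 1) | (widening_enabled ? 1u : 0u));
}

/* True if a gain has been applied to each of the last n frames; n = 8
 * corresponds to comparing st_wideArray with the binary number 11111111. */
static int widened_last_n_frames(int n)
{
    uint8_t mask = (uint8_t)((1u << n) - 1u);
    return (st_wideArray & mask) == mask;
}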
In step 1, it is checked whether a gain has been applied to the last eight frames by comparing st_wideArray to the binary number 11111111. It is also checked whether the conditions aveDiff < 10° and |θframe − 90°| < 10° are met. If true, applying a gain to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is not enabled. In other words, auditory spatial image widening is not enabled. The flag widening_enabled is used to indicate whether or not a gain should be applied. The value 1 corresponds to enabled auditory spatial image widening and the value 0 corresponds to disabled auditory spatial image widening.
A listener's perception of the auditory spatial image of an audio signal he is provided with at a certain time may depend on the history of the audio signal. For example, due to auditory spatial image widening having been applied to preceding reconstructed audio signal frames, the listener may continue to perceive a widened auditory spatial image even if no gain is applied to any of the subbands m of the current reconstructed audio signal frames Lf and Rf. Thus, it may not be necessary to apply a gain to any of the subbands m of the currently perceived reconstructed audio signal frames Lf and Rf.
In step 2, it is checked whether the conditions aveDiff < 10° and |θframe − 90°| < 10° are met.
If false, i.e. if at least one of them is greater than 10°, it is determined that auditory spatial image widening should be applied. Being greater than 10°, the amount of spatial effects in the audio signal frame pair Lt and Rt is rather large. Thus, a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or to at least one frequency subband m of the reconstructed second channel audio signal frame Rf in order to obtain a reconstructed auditory spatial image closely resembling the original auditory spatial image of the audio signal frame pair Lt and Rt.
If the check of step 2 yields a positive result, a third check is performed in step 4. This time, it is determined if the conditions aveDiff > 9° and |θframe − 90°| > 9° are met and if widening has been enabled for the past two frames. Thus, a somewhat lesser amount of spatial effects in the audio signal frame pair Lt and Rt is sufficient to cause determination that widening should be applied. Among other reasons, this stems from the fact that if a gain has been applied to a subband of a preceding reconstructed audio signal frame it may be advisable to also apply a gain to at least one subband m of the current reconstructed audio signal frames Lf and Rf so as not to cause sudden changes in the perceived auditory spatial image. Moreover, as widening has not been applied to the past eight frames, but only to a smaller number of frames, it will make a perceptual difference whether or not widening is applied to at least one subband m of the current reconstructed audio signal frames Lf and Rf.
Similarly, step 5 comprises checking if the conditions aveDiff > 8° and |θframe − 90°| > 8° are met and if widening has been enabled for the past six frames.
If the checks in step 5 are negative, step 6 is entered. It is checked if g_enc < 2 and aveDiff < 10° hold, with g_enc being obtained from the frequency subband specific indicator ratios of MAX(a_m, b_m) and MIN(a_m, b_m). The variables a_m and b_m and the functions MAX() and MIN() will be explained in more detail below (see the discussion of equation (13)). If this is not the case, it is determined that auditory spatial image widening should be applied.
By taking into account the amount of spatial effects in the audio signal frame pair Lt and Rt as well as frequency subband specific spatial effect information and the audio signal history according to the exemplary embodiment of the present invention currently discussed, the determination whether auditory spatial image widening should be applied is based on properties of human auditory perception. In consequence, reconstructed auditory spatial images may be obtained that closely resemble the original auditory spatial image.
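For illustration only, the decision of Fig. 5 can be sketched in C as follows. All angles are in degrees, the history arguments correspond to the checks against st_wideArray, and g_enc is assumed here to behave like an average of the subband indicator ratios MAX(a_m, b_m)/MIN(a_m, b_m) discussed in connection with equation (13) below; this interpretation of g_enc is a reconstruction, not a quotation.

#include <math.h>

/* Sketch of the widening decision of Fig. 5. Returns 1 if widening should
 * be enabled (a gain should be applied), 0 otherwise. */
static int decide_widening(double aveDiff, double theta_frame_deg, double g_enc,
                           int widened_last8, int widened_last2, int widened_last6)
{
    double off = fabs(theta_frame_deg - 90.0);

    /* Step 1: widening applied to the last eight frames and little spatial
     * content left -> do not enable widening. */
    if (widened_last8 && aveDiff < 10.0 && off < 10.0)
        return 0;

    /* Step 2: if the spatial effects are not small, enable widening. */
    if (!(aveDiff < 10.0 && off < 10.0))
        return 1;

    /* Step 4: slightly lower thresholds if widening was enabled for the past two frames. */
    if (aveDiff > 9.0 && off > 9.0 && widened_last2)
        return 1;

    /* Step 5: still lower thresholds if widening was enabled for the past six frames. */
    if (aveDiff > 8.0 && off > 8.0 && widened_last6)
        return 1;

    /* Step 6: enable widening unless the subband energies are similar
     * (assumed meaning of g_enc, see the note above) and the direction spread is small. */
    if (!(g_enc < 2.0 && aveDiff < 10.0))
        return 1;

    return 0;
}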
In the following it is explained how it can be determined for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf , wherein discrimination whether the gain should be applied to the respective subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is based on the second parameter θm associated with the respective frequency subband m .
To this end, if it has been determined that a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf, i.e. the left channel audio signal frame, or to at least one frequency subband m of the reconstructed second channel audio signal frame Rf, i.e. the right channel audio signal frame, a binary value flags_widening(m) is determined for each frequency subband m indicating whether a gain should be applied to said frequency subband m of the reconstructed left channel audio signal frame Lf or to said frequency subband m of the reconstructed right channel audio signal frame Rf. If θm is greater than 90°, a gain should be applied to the respective frequency subband m of the reconstructed left channel audio signal frame Lf. Otherwise, it should be applied to the respective frequency subband m of the reconstructed right channel audio signal frame Rf.
flags_{widening}(m) = \begin{cases} Left, & \theta_m > 90^\circ \\ Right, & \text{otherwise} \end{cases}, \qquad 0 \le m < M    (12)
Thus, applying a gain to the reconstructed audio signal frame associated with the audio signal frame of the audio signal frame pair Lt and Rt that is assigned to the loudspeaker for which the direction from the assumed or known position A of the listener to said loudspeaker is closer to the perceived direction in the respective frequency subband m, as indicated by the second parameter θm of said frequency subband m, is enabled.
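A minimal C sketch of the per-subband channel selection of equation (12) follows; the numeric encodings chosen for Left and Right are illustrative only.

/* Channel selection per frequency subband, cf. equation (12). */
#define CHAN_LEFT  1
#define CHAN_RIGHT 0

static void select_channels(const double *theta_m_deg, int M, int *flags_widening)
{
    for (int m = 0; m < M; m++)
        flags_widening[m] = (theta_m_deg[m] > 90.0) ? CHAN_LEFT : CHAN_RIGHT;
}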
In the following it is explained how the gain can be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf , and an indicator of the minimum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf .
(Equation (13): definitions of the frequency subband specific variables a_m and b_m as indicators of the energy eL in subband m of the first channel audio signal frame Lf and of the energy eR in subband m of the second channel audio signal frame Rf, respectively.)
Defining a_m and b_m according to equation (13), MAX(a_m, b_m) is an indicator of the maximum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf. Similarly, MIN(a_m, b_m) is an indicator of the minimum of the energy eL in the frequency subband m of the first channel audio signal frame Lf and the energy eR in said frequency subband m of the second channel audio signal frame Rf. The gain g_widening is then given by equation (14):
g_{widening} = \frac{1}{\dfrac{1}{M} \sum_{m=0}^{M-1} \dfrac{MAX(a_m, b_m)}{MIN(a_m, b_m)}}    (14)

g_widening is a reciprocal average value of the frequency subband specific indicator ratios of MAX(a_m, b_m) and MIN(a_m, b_m).
With g_widening being the reciprocal average value of the frequency subband specific indicator ratios, the gain g_widening is comparatively small if the subband specific indicator ratios are large. Large subband specific indicator ratios may occur if the indicator MAX(a_m, b_m) is greater than the indicator MIN(a_m, b_m). In other words, large subband specific indicator ratios may occur if the energies of the first channel audio signal frame Lf and the second channel audio signal frame Rf in the respective subband m differ significantly. In consequence, applying the gain g_widening may yield a widened reconstructed auditory spatial image in which subtle inter-channel differences are emphasized more than prominent inter-channel differences. Thus the widened reconstructed auditory spatial image may closely resemble the original auditory spatial image.
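As an illustration, the gain of equations (13) and (14) can be computed as sketched below in C. Taking a_m and b_m directly as the subband energies eL and eR and adding a small floor against division by zero are assumptions made for this sketch.

/* Widening gain as the reciprocal of the average subband indicator ratio,
 * cf. equations (13) and (14). eL_m and eR_m hold the energies of the M
 * frequency subbands of the first and second channel frames. */
static double widening_gain(const double *eL_m, const double *eR_m, int M)
{
    double sum = 0.0;
    for (int m = 0; m < M; m++) {
        double a = eL_m[m];                          /* a_m (assumed: subband energy eL) */
        double b = eR_m[m];                          /* b_m (assumed: subband energy eR) */
        double mx = (a > b) ? a : b;                 /* MAX(a_m, b_m) */
        double mn = (a < b) ? a : b;                 /* MIN(a_m, b_m) */
        sum += mx / (mn > 1e-12 ? mn : 1e-12);       /* indicator ratio */
    }
    return (double)M / sum;                          /* reciprocal average, eq. (14) */
}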
Gain quantization may enable representation of the gain by a reduced number of bits. If the gain is to be applied to a frequency subband, transmitting it to an entity configured to apply it may only require a reduced bandwidth.
A quantized representation of g_widening can be obtained using the quantization table qTbl(k): qTbl(k) = 2025k, 0 ≤ k < K. (15)
In the present example, K is 16. The quantized gain can be chosen as the quantized value qTbl(k) minimizing the error distance to the unquantized gain value g_widening.
To further reduce the required bitrate for representing the gain, for example for transmission or storage, a lookup table can be used. The entity configured to determine the gain can then derive the index widening_gain_idx of the respective quantized value. An entity configured to apply the gain can retrieve the quantized gain g_widening from the lookup table using the index widening_gain_idx:

g_{widening} = qTbl[widening\_gain\_idx]    (16)
The index widening_gain_idx can possibly be represented by fewer bits than the quantized gain g_widening itself.
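The following C sketch illustrates nearest-value quantization and the table lookup in the spirit of equations (15) and (16); the table contents are deliberately left as a parameter, since the concrete qTbl entries are not reproduced here.

#include <math.h>

#define K_ENTRIES 16   /* K = 16 in the present example */

/* Choose the table entry minimizing the error distance to the unquantized
 * gain and return its index widening_gain_idx, cf. equation (15). */
static int quantize_gain(double g_widening, const double qTbl[K_ENTRIES])
{
    int best = 0;
    double best_err = fabs(g_widening - qTbl[0]);
    for (int k = 1; k < K_ENTRIES; k++) {
        double err = fabs(g_widening - qTbl[k]);
        if (err < best_err) { best_err = err; best = k; }
    }
    return best;
}

/* Retrieve the quantized gain from the lookup table, cf. equation (16). */
static double dequantize_gain(int widening_gain_idx, const double qTbl[K_ENTRIES])
{
    return qTbl[widening_gain_idx];
}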
Fig. 6 is a flowchart illustrating an exemplary embodiment of a second method according to the second aspect of the present invention.
It is similar to the flowchart of Fig. 2. Hence, corresponding method steps are not repeatedly discussed. In contrast to the flowchart of Fig. 2, it is determined in step 112 whether a gain should be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf . The results obtained in method step 112 are also incorporated into a bitstream to be provided for a parametric audio decoder.
Fig. 7 is a more detailed illustration of method step 112 of Fig. 6.
In step 113, the first parameter θframe as defined by equation (7) is calculated. Step 114 comprises calculating the set of second parameters θm as defined by equation (10). In step 115 the value θframe − 90° is computed, while in step 116 the values θm − 90° are determined. Subsequently, in step 117, aveDiff is determined according to equation (11). In step 118, the actual decision whether a gain should be applied is performed as elucidated above with respect to Fig. 5. If gain application is not to be carried out, merely the flag widening_enabled = 0 comprising this information is to form part of the bitstream to be provided for a parametric audio decoder. If a gain should be applied, it is computed in step 120 in accordance with equations (13) to (16). Also, the flags flags_widening(m) according to equation (12) are determined in step 119. The index widening_gain_idx and the flags flags_widening(m) are multiplexed into the bitstream along with the flag widening_enabled = 1 indicating that the gain should be applied, the mono signal and the spatial extension information.
Fig. 8 is a flowchart exemplarily illustrating decoding of a bitstream generated according to the method of Fig. 6 including gain application. It is similar to the flowchart of Fig. 3, but in step 121 the index widening_gain_idx, the flags flags_widening(m) and the flag widening_enabled = 1 are used to apply a gain to frequency subbands m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf. Audio signal frames Lt and Rt are obtained in step 122 by frequency-to-time transformation. The actual gain application can be performed by multiplying the samples fL of the reconstructed first channel audio signal frame Lf or the samples fR of the reconstructed second channel audio signal frame Rf by g_widening if the flags flags_widening(m) indicate to do so.
The following pseudocode also describes this process:

for (i = 0; i < M; i++) {
    if (widening_enabled == 1) {
        for (j = sbOffset[i]; j < sbOffset[i + 1]; j++) {
            if (flags_widening(i) == Right)
                fR(j) = fR(j) * g_widening;
            else
                fL(j) = fL(j) * g_widening;
        }
    }
}
In another embodiment, it is also possible to determine for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf, wherein discrimination whether the gain should be applied to the respective subband m of the reconstructed first channel audio signal frame Lf or of the reconstructed second channel audio signal frame Rf is based on the inter-channel level difference icldL(m) in said frequency subband m of the reconstructed first channel audio signal frame Lf and on the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf.
flags_{widening}(m) = \begin{cases} Left, & icld_L(m) > icld_R(m) \\ Right, & \text{otherwise} \end{cases}, \qquad 0 \le m < M    (17)
The gain g_widening can also be obtained from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of an indicator of the maximum MAX(icldL(m), icldR(m)) of the inter-channel level difference icldL(m) in the frequency subband m of the reconstructed first channel audio signal frame Lf and the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf, and an indicator of the minimum MIN(icldL(m), icldR(m)) of the inter-channel level difference icldL(m) in the frequency subband m of the reconstructed first channel audio signal frame Lf and the inter-channel level difference icldR(m) in said frequency subband m of the reconstructed second channel audio signal frame Rf:
g_{widening} = \frac{1}{\dfrac{1}{M} \sum_{m=0}^{M-1} \dfrac{MAX(icld_L(m), icld_R(m))}{MIN(icld_L(m), icld_R(m))}}    (18)
The parameters determining the inter-channel level differences icldL(m) and icldR(m) may anyway form part of the spatial extension information needed in order to establish these differences between the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf. It may be possible to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf at the entity configured to perform reconstruction of the reconstructed first and second channel audio signal frames Lf and Rf, i.e. at a decoder. To this end, it may not be necessary to provide said entity with additional parameters for determining for each frequency subband m whether a gain should be applied to either the reconstructed first channel audio signal frame Lf or the reconstructed second channel audio signal frame Rf. Also, the gain can be applied to at least one frequency subband m of the reconstructed first channel audio signal frame Lf and the reconstructed second channel audio signal frame Rf at the decoder.
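For the decoder-side variant of equations (17) and (18), a corresponding C sketch is given below. It reuses the CHAN_LEFT/CHAN_RIGHT encoding of the earlier sketch and assumes that icldL(m) and icldR(m) are available as positive level values derived from the spatial extension information; both assumptions are made for illustration only.

/* Channel selection and widening gain from the inter-channel level
 * differences, cf. equations (17) and (18). */
static double widening_gain_from_icld(const double *icld_L, const double *icld_R,
                                      int M, int *flags_widening)
{
    double sum = 0.0;
    for (int m = 0; m < M; m++) {
        flags_widening[m] = (icld_L[m] > icld_R[m]) ? CHAN_LEFT : CHAN_RIGHT;   /* eq. (17) */
        double mx = (icld_L[m] > icld_R[m]) ? icld_L[m] : icld_R[m];
        double mn = (icld_L[m] < icld_R[m]) ? icld_L[m] : icld_R[m];
        sum += mx / (mn > 1e-12 ? mn : 1e-12);                                  /* indicator ratio */
    }
    return (double)M / sum;                                                     /* eq. (18) */
}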
Fig. 9 is a schematic illustration of a first exemplary embodiment of an apparatus according to the first aspect of the present invention.
The apparatus comprises a processor 201 configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The processor 201 can also be seen as means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The processor 201 may be part of an audio encoder or an audio decoder, to name but a few examples. For instance, processor 201 may be configured to implement the steps of the flowcharts of Figs. 5-7 or 8.
Fig. 10 is a schematic illustration of a second exemplary embodiment of an apparatus 203 according to the first aspect of the present invention. The apparatus 203 forms part of a parametric audio encoder 202. It comprises a processor 204. In turn, the processor 204 comprises a frequency subband selection circuit 205 and a parameter determination circuit 206.
The processor 204 is configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. Processor 204 thus may be configured to implement the steps of the flowcharts of Fig. 5-7.
The first parameter and the second parameter can be determined by means of the parameter determination circuit 206. Processor 204 is thus also configured to calculate the first parameter and the set of second parameters (see steps 113 and 114 of Fig. 7) . The parameter determination circuit 206 may also be thought of as means for calculating the first parameter and the set of second parameters and so may the processor 204. For parametric audio encoding, the parametric audio encoder 202 can be provided with an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame. Thus, information that may be necessary for the parameter determination circuit 206 for calculating a first parameter indicative of a perceived direction of the audio signal frame pair and a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair, may be available at the parametric audio encoder 202. As the processor 204 and thus the parameter determination circuit 206 form part of the parametric audio encoder 202, this information is available to the parameter determination circuit 206.
The frequency subband selection circuit 205 is configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband. Hence, the processor 204 can be considered to be configured accordingly. The frequency subband selection circuit 205 or the processor 204 may thus also be seen as means for determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
Fig. 11 is a schematic illustration of an embodiment of a readable medium 300 according to the fourth aspect of the present invention.
In this example the readable medium 300 is a computer- readable medium. A program 301 according to the fifth aspect of the present invention is stored thereon. The program 301 comprises program code 302. When executed by a processor, the instructions of the program code 302 cause a processor (for instance processor 201 of Fig. 9 or processor 204 of Fig. 10) to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair. The program 301 can also be considered as a program according to the third aspect of the present invention.
It is to be understood with respect to all of the above embodiments that relate to a processor that the processor may for instance be implemented in hardware alone, may have certain aspects implemented in software, or may be a combination of hardware and software. The processor may either be a separate module or it may be a subcomponent of a module such as, for example, a processor or an application specific integrated circuit (ASIC) that has other functional components or structures, too.
Furthermore, it is readily clear for a person skilled in the art that the logical blocks in the schematic block diagrams as well as the flowchart and algorithm steps presented in the above description may at least partially be implemented in electronic hardware and/or computer software, wherein it may depend on the functionality of the logical block, flowchart step and algorithm step and on design constraints imposed on the respective devices to which degree a logical block, a flowchart step or algorithm step is implemented in hardware or software. The presented logical blocks, flowchart steps and algorithm steps may for instance be implemented in one or more digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable devices. The computer software may be stored in a variety of computer-readable storage media of electric, magnetic, electro-magnetic or optic type and may be read and executed by a processor, such as for instance a microprocessor. To this end, the processor and the storage medium may be coupled to interchange information, or the storage medium may be included in the processor.
The invention has been described above by means of embodiments, which shall be understood to be exemplary and non-limiting. In particular, it should be noted that there are alternative ways and variations which are obvious to a skilled person in the art and can be implemented without deviating from the scope and spirit of the appended claims. It should also be understood that the sequence of all method steps presented above is not mandatory, also alternative sequences may be possible.

Claims

1. An apparatus, comprising: a processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
2. The apparatus of claim 1, wherein the apparatus comprises a processor configured to calculate the first parameter and the set of second parameters.
3. The apparatus of any of the claims 1-2, wherein the apparatus forms part of a parametric audio encoder.
4. The apparatus of any of the claims 1-3, wherein the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener with respect to a first loudspeaker to which the first channel audio signal frame is assigned and with respect to a second loudspeaker to which the second channel audio signal frame is assigned.
5. The apparatus of claim 4, wherein the first parameter is obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
6. The apparatus of claim 4 or 5, wherein the second parameters of the set of second parameters are obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
7. The apparatus of any of the claims 4-6, wherein the first parameter and the second parameters are indicative of a perceived direction relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker .
8. The apparatus of any of the claims 1-7, wherein the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is configured to base the determination on averaged second parameters .
9. The apparatus of any of the claims 1-8, wherein the processor configured to determine whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is further configured to consider whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame in determining whether a gain should be applied to at least one frequency subband of the reconstructed first channel audio signal frame or to at least one frequency subband of the reconstructed second channel audio signal frame.
10. The apparatus of any of the claims 1-9, wherein the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
11. The apparatus of any of the claims 1-10, wherein the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
12. The apparatus of any of the claims 1-10, wherein the apparatus comprises a processor configured to determine for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
13. The apparatus of any of the claims 1-12, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and
- an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
14. The apparatus of any of the claims 1-12, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and
- an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
15. The apparatus of any of the claims 1-14, wherein the gain is a quantized value.
16. The apparatus of claim 15, wherein the gain is obtainable from a lookup table.
17. An apparatus, comprising: means for determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
18. A method, comprising: determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame, wherein said reconstructed first and second channel audio signal frame are obtainable by decoding a parametrically encoded audio signal representing an audio signal frame pair comprising a first channel audio signal frame and a second channel audio signal frame, based on a first parameter indicative of a perceived direction of the audio signal frame pair and at least one preceding audio signal frame pair and on a set of second parameters, each second parameter indicative of a perceived direction of a frequency subband of the audio signal frame pair.
19. The method of claim 18, wherein it further comprises calculating the first parameter and the set of second parameters .
20. The method of any of the claims 18-19, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is performed on a parametric audio encoder side.
21. The method of any of the claims 18-20, wherein the first parameter and the second parameters are at least partially obtainable from an assumed or known position of a listener relative to a first loudspeaker to which the first channel audio signal frame is assigned and relative to a second loudspeaker to which the second channel audio signal frame is assigned.
22. The method of claim 21, wherein the first parameter is obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the sum of the energy of the first channel audio signal frame and the energy of a first channel audio signal frame of the at least one preceding audio signal frame pair, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the sum of the energy of the second channel audio signal frame and the energy of a second channel audio signal frame of the at least one preceding audio signal frame pair.
23. The method of any of the claims 21-22, wherein the second parameters of the set of second parameters are obtainable from:
- the direction from the assumed or known position of the listener to the first loudspeaker weighted with the first channel audio signal frame energy within the respective frequency subband, and
- the direction from the assumed or known position of the listener to the second loudspeaker weighted with the second channel audio signal frame energy within the respective frequency subband.
24. The method of any of the claims 21-23, wherein the first parameter and the second parameters are indicative of a perceived direction relative to a first imaginary line passing through the assumed or known position of the listener, the first imaginary line being perpendicular to a second imaginary line connecting the first loudspeaker and the second loudspeaker .
25. The method of any of the claims 18-24, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame is based on averaged second parameters .
26. The method of any of the claims 18-25, wherein determining whether a gain should be applied to at least one frequency subband of a reconstructed first channel audio signal frame or to at least one frequency subband of a reconstructed second channel audio signal frame involves considering whether a gain has been applied to at least one frequency subband of a preceding reconstructed first channel audio signal frame or a preceding reconstructed second channel audio signal frame.
27. The method of any of the claims 18-26, wherein the frequency subbands of the audio signal frame pair are non-uniform frequency subbands corresponding to the human auditory filters.
28. The method of any of the claims 18-27, wherein it comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the second parameter associated with the respective frequency subband.
29. The method of any of the claims 18-27, wherein it comprises determining for each frequency subband whether a gain should be applied to either the reconstructed first channel audio signal frame or the reconstructed second channel audio signal frame, wherein discrimination whether the gain should be applied to the respective subband of the reconstructed first channel audio signal frame or of the reconstructed second channel audio signal frame is based on the inter-channel level difference in said frequency subband of the reconstructed first channel audio signal frame and on the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
30. The method of any of the claims 18-29, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of: - an indicator of the maximum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame, and
- an indicator of the minimum of the energy in the frequency subband of the first channel audio signal frame and the energy in said frequency subband of the second channel audio signal frame.
31. The method of any of the claims 18-29, wherein the gain is obtainable from a reciprocal average value of frequency subband specific indicator ratios, wherein each indicator ratio is the ratio of:
- an indicator of the maximum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame, and
- an indicator of the minimum of the inter-channel level difference in the frequency subband of the reconstructed first channel audio signal frame and the inter-channel level difference in said frequency subband of the reconstructed second channel audio signal frame.
32. The method of any of the claims 18-31, wherein the gain is a quantized value.
33. The method of claim 32, wherein the gain is obtainable from a lookup table.
34. A program comprising: program code for performing the method according to any of the claims 18-33, when the program is executed on a processor.
PCT/EP2008/068371 2008-12-30 2008-12-30 Parametric audio coding WO2010075895A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Publications (1)

Publication Number Publication Date
WO2010075895A1 true WO2010075895A1 (en) 2010-07-08

Family

ID=40414085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/068371 WO2010075895A1 (en) 2008-12-30 2008-12-30 Parametric audio coding

Country Status (1)

Country Link
WO (1) WO2010075895A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9478224B2 (en) 2013-04-05 2016-10-25 Dolby International Ab Audio processing system
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007080225A1 (en) * 2006-01-09 2007-07-19 Nokia Corporation Decoding of binaural audio signals

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007080225A1 (en) * 2006-01-09 2007-07-19 Nokia Corporation Decoding of binaural audio signals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAKKA J: "Binaural to Multichannel Audio Upmix", THESIS HELSINKI UNIVERSITY OF TECHNOLOGY, 6 June 2005 (2005-06-06), pages 34, XP007907636 *
VILLE PULKKI: "Spatial Sound Generation and Perception by Amplitude Panning Techniques", PHD DISSERTATION, 3 August 2001 (2001-08-03), Helsinki, FI, pages 1 - 42, XP007907630 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9478224B2 (en) 2013-04-05 2016-10-25 Dolby International Ab Audio processing system
US9812136B2 (en) 2013-04-05 2017-11-07 Dolby International Ab Audio processing system
US9672837B2 (en) 2013-09-12 2017-06-06 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10057808B2 (en) 2013-09-12 2018-08-21 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10383003B2 (en) 2013-09-12 2019-08-13 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US10694424B2 (en) 2013-09-12 2020-06-23 Dolby International Ab Non-uniform parameter quantization for advanced coupling
US11297533B2 (en) 2013-09-12 2022-04-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters
US11838798B2 (en) 2013-09-12 2023-12-05 Dolby International Ab Method and apparatus for audio decoding based on dequantization of quantized parameters

Similar Documents

Publication Publication Date Title
JP7161564B2 (en) Apparatus and method for estimating inter-channel time difference
US11664034B2 (en) Optimized coding and decoding of spatialization information for the parametric coding and decoding of a multichannel audio signal
KR101782917B1 (en) Audio signal processing method and apparatus
US9449603B2 (en) Multi-channel audio encoder and method for encoding a multi-channel audio signal
US9009057B2 (en) Audio encoding and decoding to generate binaural virtual spatial signals
CA2820351C (en) Apparatus and method for decomposing an input signal using a pre-calculated reference curve
JP2022137052A (en) Multi-channel signal encoding method and encoder
US20120039477A1 (en) Audio signal synthesizing
ES2547232T3 (en) Method and apparatus for processing a signal
US9275646B2 (en) Method for inter-channel difference estimation and spatial audio coding device
CN112262433A (en) Apparatus, method or computer program for estimating inter-channel time difference
GB2574667A (en) Spatial audio capture, transmission and reproduction
US20120195435A1 (en) Method, Apparatus and Computer Program for Processing Multi-Channel Signals
WO2010075895A1 (en) Parametric audio coding
JP2017058696A (en) Inter-channel difference estimation method and space audio encoder
Jansson Stereo coding for the ITU-T G. 719 codec
KR102195976B1 (en) Audio signal processing method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08875572

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08875572

Country of ref document: EP

Kind code of ref document: A1