WO2014091375A1 - Reverberation processing in an audio signal - Google Patents

Info

Publication number
WO2014091375A1
Authority
WO
WIPO (PCT)
Prior art keywords
reverberation
audio
set
signals
environment
Application number
PCT/IB2013/060692
Other languages
French (fr)
Inventor
Erik Gosuinus Petrus Schuijers
Werner Paulus Josephus De Bruijn
Arnoldus Werner Johannes Oomen
Jeroen Gerardus Henricus Koppens
Original Assignee
Koninklijke Philips N.V.
Priority to US61/737,144
Application filed by Koninklijke Philips N.V.
Publication of WO2014091375A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30: Control circuits for electronic adaptation of the sound field
    • H04S 7/305: Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 2227/00: Details of public address [PA] systems covered by H04R 27/00 but not provided for in any of its subgroups
    • H04R 2227/007: Electronic adaptation of audio signals to reverberation of the listening space for PA

Abstract

An apparatus for processing an audio signal comprises an audio receiver (601) for receiving a set of input audio signals and a reference receiver (609) for receiving a reference reverberation for a rendering of the audio signal. A reverberation processor (607) determines an environment reverberation of an acoustic rendering environment and an audio processor (603) generates a set of output audio signals for a set of audio transducers (605) by processing the set of input audio signals. The audio processor (603) is arranged to modify reverberation for the set of input audio signals when generating the set of output audio signals, with the modification being dependent on the environment reverberation and the reference reverberation. The modification may specifically be such that the total overall reverberation matches the reference reverberation. The invention may provide increased control, e.g. by a content provider, over the audio environment experienced by a user.

Description

REVERBERATION PROCESSING IN AN AUDIO SIGNAL

FIELD OF THE INVENTION

The invention relates to processing of an audio signal and in particular, but not exclusively, to processing of audio objects that are not associated with a specific audio transducer rendering configuration.

BACKGROUND OF THE INVENTION

Digital encoding of various source signals has become increasingly important over the last decades as digital signal representation and communication has increasingly replaced analogue representation and communication. For example, audio content, such as speech and music, is increasingly based on digital content encoding. Furthermore, audio consumption has increasingly become an enveloping three dimensional experience with e.g. surround sound and home cinema setups becoming prevalent.

Audio encoding formats have been developed to provide increasingly capable, varied and flexible audio services and in particular audio encoding formats supporting spatial audio services have been developed.

Well known audio coding technologies like DTS and Dolby Digital produce a coded multi-channel audio signal that represents the spatial image as a number of channels that are placed around the listener at fixed positions. For a speaker setup that is different from the setup that corresponds to the multi-channel signal, the spatial image will be suboptimal. Also, these channel based audio coding systems are typically not able to cope with a different number of speakers.

(MPEG-D) MPEG Surround provides a multi-channel audio coding tool that allows existing mono- or stereo-based coders to be extended to multi-channel audio applications. Fig. 1 illustrates an example of elements of an MPEG Surround system. Using spatial parameters obtained by analysis of the original multichannel input, an MPEG Surround decoder can recreate the spatial image by a controlled upmix of the mono- or stereo signal to obtain a multichannel output signal.

Since the spatial image of the multi-channel input signal is parameterized, MPEG Surround allows for decoding of the same multi-channel bit-stream by rendering devices that do not use a multichannel speaker setup. An example is virtual surround reproduction on headphones, which is referred to as the MPEG Surround binaural decoding process. In this mode a realistic surround experience can be provided while using regular headphones. Another example is the pruning of higher order multichannel outputs, e.g. 7.1 channels, to lower order setups, e.g. 5.1 channels.

Indeed, the variation and flexibility in the rendering configurations used for rendering spatial sound has increased significantly in recent years with more and more reproduction formats becoming available to the mainstream consumer. This requires flexible representation of audio. Important steps have been taken with the introduction of the MPEG Surround codec. Nevertheless, audio is still produced and transmitted for a specific loudspeaker setup. Reproduction over different setups and over non-standard (i.e. flexible or user-defined) speaker setups is not specified. Indeed, there is a desire to make audio encoding and representation increasingly independent of specific predetermined and nominal speaker setups. It is increasingly preferred that flexible adaptation to a wide variety of different speaker setups can be performed at the decoder/rendering side.

In order to provide for a more flexible representation of audio, MPEG standardized a format known as 'Spatial Audio Object Coding' (MPEG-D SAOC). In contrast to multichannel audio coding systems such as DTS, Dolby Digital and MPEG Surround, SAOC provides efficient coding of individual audio objects rather than audio channels. Whereas in MPEG Surround, each speaker channel can be considered to originate from a different mix of sound objects, SAOC makes individual sound objects available at the decoder side for interactive manipulation as illustrated in Fig. 2. In SAOC, multiple sound objects are coded into a mono or stereo downmix together with parametric data allowing the sound objects to be extracted at the rendering side thereby allowing the individual audio objects to be available for manipulation e.g. by the end-user.

Indeed, similarly to MPEG Surround, SAOC also creates a mono or stereo downmix. In addition object parameters are calculated and included. At the decoder side, the user may manipulate these parameters to control various features of the individual objects, such as position, level, equalization, or even to apply effects such as reverb. Fig. 3 illustrates an interactive interface that enables the user to control the individual objects contained in an SAOC bitstream. By means of a rendering matrix individual sound objects are mapped onto speaker channels.

SAOC allows a more flexible approach and in particular allows more rendering based adaptability by transmitting audio objects instead of only reproduction channels. This allows the decoder-side to place the audio objects at arbitrary positions in space, provided that the space is adequately covered by speakers. This way there is no relation between the transmitted audio and the reproduction or rendering setup, hence arbitrary speaker setups can be used. This is advantageous for e.g. home cinema setups in a typical living room, where the speakers are almost never at the intended positions. In SAOC, it is decided at the decoder side where the objects are placed in the sound scene, which is often not desired from an artistic point-of-view. The SAOC standard does provide ways to transmit a default rendering matrix in the bitstream, eliminating the decoder responsibility. However the provided methods rely on either fixed reproduction setups or on unspecified syntax. Thus SAOC does not provide normative means to fully transmit an audio scene independently of the speaker setup. Also, SAOC is not well equipped for the faithful rendering of diffuse signal components. Although there is the possibility to include a so called multichannel background object to capture the diffuse sound, this object is tied to one specific speaker configuration.

Another specification for an audio format for 3D audio is being developed by the 3D Audio Alliance (3DAA), an industry alliance initiated by SRS (Sound Retrieval System) Labs. 3DAA is dedicated to developing standards for the transmission of 3D audio that "will facilitate the transition from the current speaker feed paradigm to a flexible object-based approach". In 3DAA, a bitstream format is to be defined that allows the transmission of a legacy multichannel downmix along with individual sound objects. In addition, object positioning data is included. The principle of generating a 3DAA audio stream is illustrated in Fig. 4.

In the 3DAA approach, the sound objects are received separately in the extension stream and these may be extracted from the multi-channel downmix. The resulting multi-channel downmix is rendered together with the individually available objects.

The objects may consist of so called stems. These stems are basically grouped (downmixed) tracks or objects. Hence, an object may consist of multiple sub-objects packed into a stem. In 3DAA, a multichannel reference mix can be transmitted with a selection of audio objects. 3DAA transmits the 3D positional data for each object. The objects can then be extracted using the 3D positional data. Alternatively, the inverse mix-matrix may be transmitted, describing the relation between the objects and the reference mix.

From the description of 3DAA, sound-scene information is likely transmitted by assigning an angle and distance to each object, indicating where the object should be placed relative to e.g. the default forward direction. This is useful for point-sources but fails to describe wide sources (like e.g. a choir or applause) or diffuse sound fields (such as ambiance). When all point-sources are extracted from the reference mix, an ambient multichannel mix remains. Similar to SAOC, the residual in 3DAA is fixed to a specific speaker setup.

Thus, both the SAOC and 3DAA approaches incorporate the transmission of individual audio objects that can be individually manipulated at the decoder side. A difference between the two approaches is that SAOC provides information on the audio objects by providing parameters characterizing the objects relative to the downmix (i.e. such that the audio objects are generated from the downmix at the decoder side) whereas 3DAA provides audio objects as full and separate audio objects (i.e. that can be generated independently from the downmix at the decoder side).

In MPEG, a new work item on 3D Audio, referred to as MPEG-H 3D Audio, is currently being initiated. Fig. 5 provides an illustration of the current high level block diagram of the intended MPEG 3D Audio system.

In addition to the traditional channel based format, object based and scene based formats are also to be supported. An important aspect of the system is that its quality should scale to transparency for increasing bitrate. This puts a burden on the use of parametric coding techniques that have been used quite heavily in the past (viz. HE-AAC v2, MPEG-D Surround, SAOC, USAC).

An important feature of the standard is that the encoded bitstream should be independent of the reproduction/rendering setup. Envisioned reproduction possibilities include flexible loudspeaker setups (envisaged up to 22.2 channels), virtual surround over headphones, and closely spaced speakers. Flexible loudspeaker setups refer to any number of speakers at arbitrary physical locations.

The decoder of MPEG 3D Audio is intended to comprise a rendering module that is responsible for translating the decoded individual audio channels/objects into speaker feeds based on the physical location of the speakers, i.e. based on the specific rendering speaker configuration/setup.

Thus, audio distribution approaches and standards are increasingly being driven towards an independence of the rendering setup. This requires the receiving/decoding end to be able to adapt the processing and rendering to match the specific rendering setup used. This provides a high degree of flexibility and allows the individual renderer to adapt to the specific characteristics of the environment as well as to individual preferences. However, the increased flexibility also results in an increased uncertainty which may be disadvantageous in many scenarios. Indeed, the reproduction independent approach means that the content provider or generator does not know the specific rendering setup used, and this makes it more difficult for the content provider to control or influence the rendered audio. This may be disadvantageous in many scenarios where it is desired, either by the content provider or by the content consumer, that the rendered audio is controlled from the content provision side. This is the case, for example, if the audio perceived by a user is required to correspond as accurately as possible to a given original audio scene (as e.g. captured or generated by the content creator).

The high degree of flexibility may also provide uncertainty relating to how the renderer should process the audio signal, and may result in a more complicated processing at the decoder/rendering side.

Hence, an improved approach would be advantageous and in particular an approach allowing increased content provider control, facilitated or improved adaptability to different rendering configurations, facilitated or reduced determination of suitable processing algorithms, reduced complexity, an improved user experience, and/or improved performance would be advantageous.

SUMMARY OF THE INVENTION

Accordingly, the invention seeks to preferably mitigate, alleviate or eliminate one or more of the above mentioned disadvantages singly or in any combination.

According to an aspect of the invention there is provided an apparatus for processing an audio signal, the apparatus comprising: an audio receiver for receiving a set of input audio signals; a reference receiver for receiving a reference reverberation for a rendering of the set of input audio signals; a reverberation processor for determining an environment reverberation of an acoustic rendering environment; an audio processor for generating a set of output audio signals for a set of audio transducers by processing the set of input audio signals; wherein the audio processor is arranged to modify reverberation for the set of input audio signals when generating the set of output audio signals, the modification being dependent on the environment reverberation and the reference reverberation.

The invention may provide an improved audio experience, and in particular an improved spatial audio experience. The approach may support rendering over a wide range of loudspeaker configurations and/or audio environments with increased adaptability of the user experience to the given configuration/environment. The approach may specifically allow improved control of the perceived spaciousness of the rendered audio scene, and may e.g. in many embodiments facilitate or enable that such a perception may be at least partly controlled from the content provider end.

The reference reverberation may e.g. be a desired reverberation or target reverberation. The audio processor may in many embodiments modify the reverberation present on the set of output signals relative to the set of input signals. In many embodiments, the modification may increase reverberation on the set of output signals relative to the set of input signals, e.g. reverberation may be added if the reference reverberation is indicative of a higher amount of reverberation than indicated by the environment reverberation, and/or reduced if the reference reverberation is indicative of a lower amount of reverberation than indicated by the environment reverberation.

In some embodiments, additional reverberation may be added corresponding to a difference between the environment reverberation and the reference reverberation.

The set of input audio signals may comprise only a single audio signal. The set of output audio signals may comprise only a single audio signal. The set of input audio signals may comprise a different number of audio signals than the set of output audio signals.

Each signal of the set of input audio signals and the set of output audio signals may for example be an audio object, audio scene, audio channel or audio component. The set of input audio signals and/or the set of output audio signals may comprise different types of audio signals.

The reference reverberation and/or the environment reverberation may be represented by a single value, such as a level of reverberation or a duration of reverberation. For example, the reverberation(s) may be indicated by a proportion of energy after a given time threshold, or a duration at which the reverberation has reduced to a given level. The reverberation(s) may specifically be represented by e.g. the value known as T60, i.e. by the time required for reflections of a direct sound to decay by 60 dB. In some embodiments, the environment reverberation and/or reference reverberation may be represented by more complex parameters and characteristics. For example, the reverberation(s) may be represented by transfer functions/impulse responses.

In some embodiments, the reverberation(s) may be represented by frequency dependent data. For example, the reverberation(s) may be represented by different values of T60 for different frequency bands, or by individual transfer functions/impulse responses for different frequency bands.

The set of audio signals may specifically be audio drive signals for audio transducers/loudspeakers.

The environment reverberation may e.g. be determined from measurements of the rendering environment, from a user input, and/or from data received from an internal or external source.

In accordance with an optional feature of the invention, the audio processor is arranged to modify reverberation in response to a comparison of a combination of reverberation by the audio processor and the environment reverberation to the reference reverberation.

The audio processor may specifically be arranged to modify reverberation such that a combination of reverberation by the audio processor and the environment reverberation substantially matches the reference reverberation. Any suitable match criterion or measure may be used.

This may provide a particularly advantageous system in many embodiments. E.g. it may allow for a low complexity approach of controlling the perceived spaciousness and reverberation from e.g. an external source, such as from the content provider side.

In some embodiments, a magnitude (e.g. measured as an energy or duration of decay) of an added reverberation corresponds to a difference in magnitude between the measured environment reverberation and the reference reverberation. For example, the audio processor may be arranged to add reverberation such that the combination of the added reverberation and the reverberation occurring when rendering (represented by the environment reverberation) results in a T60 which is substantially equal to a reference reverberation T60 value received together with the set of input signals from a remote content provider.
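
As an illustration of how such a match might be realised for T60 values, the following Python sketch builds an FIR reverberation tail whose decay rate equals the reference T60. It is a minimal sketch, assuming the late field decays exponentially and that the slower of the two decays dominates the combined tail; the function name and parameters are illustrative and not taken from this application.

```python
import numpy as np

def added_reverb_tail(t60_env, t60_ref, fs=48000, length_s=1.5, seed=0):
    """Sketch: FIR reverb tail whose decay rate matches the reference
    T60.  Because the slower decay dominates the combined late field,
    rendering this tail alongside the faster-decaying room response
    gives an overall decay close to the reference."""
    if t60_ref <= t60_env:
        return np.zeros(1)  # room is already at least as reverberant
    n = int(length_s * fs)
    t = np.arange(n) / fs
    rng = np.random.default_rng(seed)
    # exponentially decaying noise: amplitude down 60 dB at t = t60_ref
    envelope = 10.0 ** (-3.0 * t / t60_ref)
    return rng.standard_normal(n) * envelope
```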

In accordance with an optional feature of the invention, the environment reverberation comprises a direct sound to reverberant sound measure.

This may provide a particularly advantageous system in many embodiments, and specifically a direct sound to reverberant sound measure, such as a direct-to-reverberant energy ratio, may provide a particularly good indication of the perceived effect of reverberation.

In accordance with an optional feature of the invention, the audio processor comprises: a generator for generating a set of direct audio signals and a set of diffuse audio signals from the set of input audio signals; a level adjuster for adjusting levels of the set of diffuse audio signals relative to the set of direct audio signals in response to the environment reverberation and the reference reverberation; an output generator for generating the set of output signals from level adjusted diffuse signals and the set of direct audio signals.

This may provide an efficient and/or facilitated adjustment and control of reverberation. In some embodiments, the level adjuster may be arranged to adjust the level of the set of diffuse audio signals relative to the set of direct audio signals by adjusting the level of either the set of direct audio signals or the set of diffuse audio signals.

The generator may use any suitable criterion for designating signals or signal components as diffuse or direct. The generation of the direct and diffuse audio signals may include decomposition of at least one signal of the set of input audio signals.

In some embodiments, the generator may be arranged to divide the set of input audio signals into the set of direct audio signals and the set of diffuse audio signals by for each input audio signal designating the audio signal as a direct audio signal or as a diffuse audio signal.

In some embodiments, the generator may be arranged to decompose at least one audio signal of the input audio signals into a direct audio signal and a diffuse audio signal.

In accordance with an optional feature of the invention, the audio processor is arranged to increase a level of the set of diffuse signals relative to the set of direct signals for an increasing difference between the reference reverberation and the environment reverberation if the reference reverberation exceeds the environment reverberation.

This may provide a particularly efficient way of increasing reverberation. The approach may be particularly suitable for scenarios wherein the environment reverberation is indicative of a lower reverberation than indicated by the reference reverberation.

In accordance with an optional feature of the invention, the audio processor is arranged to decrease a level of the set of diffuse signals relative to the set of direct signals for an increasing difference between the environment reverberation and the reference reverberation if the environment reverberation exceeds the reference reverberation.

This may provide a particularly efficient way of reducing reverberation. The approach may be particularly suitable for scenarios wherein the environment reverberation is indicative of a higher reverberation than indicated by the reference reverberation.

In accordance with an optional feature of the invention, the audio signal processor is arranged to determine a reverberation for the direct signals in response to the environment reverberation; and to determine the modification in response to the diffuse signals and the reverberation for the direct signals. The reverberation for the direct signals may correspond to the reverberation that is assumed to be introduced to a non-reverberant source when rendered in an acoustic environment which has a reverberation characteristic as indicated by the environment reverberation. A total reverberation may be considered to correspond to the reverberation resulting from the rendering of the direct audio signals combined with the rendering of the diffuse signal. Thus, the entire signal energy of the diffuse signals may be considered to be reverberation. In some embodiments, the combination may further include additional reverberation caused by the rendering of the diffuse signals, i.e. the total reverberation may also consider the combination with the reverberation that is assumed to be introduced to a diffuse signal when rendered in an acoustic environment that has a reverberation characteristic as indicated by the environment reverberation.

In some embodiments, the audio signal processor is arranged to adjust a level of the set of diffuse signals relative to the set of direct signals to result in a combination of the level adjusted diffuse signal and the reverberation of the direct signals matching the reference reverberation.

In accordance with an optional feature of the invention, at least one of the reference reverberation and the measured reverberation is frequency dependent, and the audio processor is arranged to apply a frequency variant modification.

This may provide a more flexible and/or accurate adjustment of the perceived reverberation characteristics.

In accordance with an optional feature of the invention, the environment reverberation comprises an individual environment reverberation for at least a first audio transducer of the set of audio transducers.

This may provide improved, and in particular more flexible, control of the reverberation performance of the system.

The environment reverberation may specifically reflect a direct sound to reverberant sound measure for a given (nominal) listening position. The environment reverberation may specifically be indicative of a reverberation characteristic of an acoustic transfer function from the audio transducer to the listening position.

In accordance with an optional feature of the invention, the audio processor is arranged to adapt a level of an output audio signal for the first audio transducer relative to a level of an output audio signal for a second audio transducer in response to the individual environment reverberation for the first audio transducer.

This may provide improved performance in many embodiments.

In accordance with an optional feature of the invention, the audio processor is arranged to generate the output audio signal for the first audio transducer to have substantially zero amplitude if the individual environment reverberation for the first audio transducer exceeds a threshold.

This may provide improved performance in many embodiments.

In accordance with an optional feature of the invention, the reference receiver and the audio receiver are arranged to receive an audio data signal from a remote source, the audio data signal comprising both the set of audio signals and the reference reverberation.

The invention may in many embodiments allow an improved control of reverberation characteristics of rendered audio from a remote source, such as e.g. from a content provider also providing the set of audio input signals.

In accordance with an optional feature of the invention, the reference receiver comprises a user interface, and the reference receiver is arranged to determine the reference reverberation in response to a user input.

The invention may in many embodiments allow improved control of reverberation characteristics of rendered audio by a user.

According to an aspect of the invention there is provided a method of processing an audio signal, the method comprising: receiving a set of input audio signals; receiving a reference reverberation for a rendering of the audio signal; determining an environment reverberation of an acoustic rendering environment; generating a set of output audio signals for a set of audio transducers by processing the set of input audio signals; wherein generating the set of output audio signals comprises modifying reverberation for the set of input audio signals, the modification being dependent on the environment reverberation and the reference reverberation.

These and other aspects, features and advantages of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described, by way of example only, with reference to the drawings, in which

Fig. 1 illustrates an example of elements of an MPEG Surround system;

Fig. 2 exemplifies the manipulation of audio objects possible in MPEG SAOC;

Fig. 3 illustrates an interactive interface that enables the user to control the individual objects contained in an SAOC bitstream;

Fig. 4 illustrates an example of the principle of audio encoding of 3DAA;

Fig. 5 illustrates an example of the principle of audio encoding envisaged for MPEG 3D Audio;

Fig. 6 illustrates an example of an audio rendering system in accordance with some embodiments of the invention;

Fig. 7 illustrates an example of a loudspeaker rendering configuration;

Fig. 8 illustrates an example of an impulse response of an acoustic transfer function;

Fig. 9 illustrates an example of an audio rendering unit in accordance with some embodiments of the invention; and

Fig. 10 illustrates an example of a processor for determining a reverberation modification parameter in accordance with some embodiments of the invention.

DETAILED DESCRIPTION OF SOME EMBODIMENTS OF THE INVENTION

The following description focuses on embodiments of the invention applicable to rendering of spatial audio data which includes audio objects that are not associated with a specific speaker configuration. However, it will be appreciated that the invention is not limited to this application but may be applied to many other audio signals and audio renderings.

Fig. 6 illustrates an example of an audio renderer in accordance with some embodiments of the invention.

The audio renderer comprises an audio receiver 601 which is arranged to receive audio data for audio that is to be rendered. The audio data may be received from any internal or external source. For example, the audio data may be received from any suitable communication medium including direct communication or broadcast links. For example, communication may be via the Internet, data networks, radio broadcasts etc. As another example, the audio data may be received from a physical storage medium such as a CD, Blu-Ray™ disc, memory card etc. As yet another example, the audio data may be generated locally, e.g. by a 3D audio model (as e.g. used by a gaming application).

In the example, the audio data comprises a plurality of audio signals which may include audio channel components associated with a specific rendering loudspeaker configuration (such as a spatial audio channel of a 5.1 surround signal) or audio objects that are not associated with any specific rendering loudspeaker configuration.

In the following, the rendering of audio signals will be described with reference to an example wherein the audio receiver 601 may receive a set of input signals that includes both audio channels and audio objects and to an example wherein the audio receiver 601 receives only a single audio signal. In the examples, the audio data may furthermore comprise position information indicating a position where the audio components should be rendered, i.e. indicative of a position that a listener at the nominal listening position should perceive the audio to originate from.

The audio renderer comprises a rendering unit 603 which is coupled to the audio receiver 601 and which is fed the received audio signal. The rendering unit 603 is furthermore coupled to a set of audio transducers, and specifically a set of loudspeakers 605, which are arranged in a given configuration. The specific loudspeaker configuration may vary substantially with both the number of loudspeakers and the position varying. The rendering unit 603 is arranged to generate drive signals for the speakers 605 from the received set of input audio signals. The rendering unit 603 is specifically arranged to generate a sound scene from the received signals to provide a spatial experience, and specifically may provide a surround sound experience.

The rendering unit 603 may specifically be arranged to perform positioning of audio sources, including for example positioning audio objects depending on position information provided for the audio objects, or to render an audio channel from a position corresponding to a nominal position associated with the audio channel. Such positioning may e.g. be based on panning operations which will be well known to the skilled person.
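
The panning operation itself is not specified here. Purely as an illustration, a constant-power pairwise pan between two adjacent speakers might look as follows; the degree-based angle convention and function name are assumptions for the sketch.

```python
import numpy as np

def constant_power_pan(source_angle, left_angle, right_angle):
    """Constant-power gains for a source lying between two speakers.
    All angles in degrees relative to the forward direction."""
    frac = (source_angle - left_angle) / (right_angle - left_angle)
    frac = min(max(frac, 0.0), 1.0)       # clamp to the speaker pair
    theta = frac * np.pi / 2
    return np.cos(theta), np.sin(theta)   # (gain_left, gain_right)

# e.g. a source at 15 degrees between C (0) and R (30) gets ~0.707 on each
g_c, g_r = constant_power_pan(15.0, 0.0, 30.0)
```

Since cos^2 + sin^2 = 1, the total radiated power is independent of the pan position, which is the usual design goal for amplitude panning.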

In order to perform such positioning the rendering unit 603 will take into account the actual speaker configuration, i.e. the number and positions of the speakers relative to a nominal listening position.

Fig. 7 illustrates an example of a possible speaker configuration corresponding to a five speaker surround setup. In the example, the speaker configuration comprises five speakers, namely a center speaker C, a left front speaker L, a right front speaker R, a left surround (or rear) speaker LS, and a right surround (or rear) speaker RS. The speakers are in this example positioned on a circle around a listening position. The speaker configuration is in the example referenced to a listening position and furthermore to a listening orientation. Thus, in the example, a nominal listening position and orientation is assumed for the rendering. The rendering seeks to position the audio signal such that it, for a listener positioned at the nominal listening position and with the nominal listening orientation, will be perceived to originate from a sound source in the desired direction.

In the example, the positions may specifically be positions that are defined with respect to the (nominal) listening position and to the (nominal) listening orientation. In many embodiments, positions may only be considered in a horizontal plane, and distance may often be ignored. In such examples, the position may be considered as a one-dimensional position given by an angular direction relative to a reference direction from the listening position. The reference direction may typically correspond to the direction assumed to be directly in front of the nominal listener, i.e. to the forward direction. Specifically, in Fig. 7, the reference direction is that from the listening position to the front center speaker C.

In the following, the angle between the reference direction and the direction from the listening position to a given speaker will simply be referred to as the angular position of the speaker.
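
Under this convention, the angular position follows directly from Cartesian speaker coordinates; a minimal sketch, assuming the forward direction is the +y axis and angles increase to the right:

```python
import math

def angular_position(speaker_xy, listener_xy=(0.0, 0.0)):
    """Angle (degrees) between the forward direction (+y, toward the
    centre speaker) and the direction from the listening position to
    the speaker; positive to the right of the listener."""
    dx = speaker_xy[0] - listener_xy[0]
    dy = speaker_xy[1] - listener_xy[1]
    return math.degrees(math.atan2(dx, dy))

# front-left speaker 1 m ahead, 0.58 m to the left: about -30 degrees
print(angular_position((-0.58, 1.0)))
```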

The rendering of the audio is dependent on the physical locations of the speakers, i.e. on the rendering configuration. These positions may be determined or provided to the rendering unit 603 in various ways. For example, they may simply be provided by a direct user input, such as by the user directly providing a user input indicating the floor plan of speaker locations, e.g. using a mobile app interface.

Several fully or semi-automatic methods also exist for determining speaker positions. Most are based on relative speaker position estimation algorithms, using e.g. ultrasound or audible signals to determine the relative positions. The acoustic methods (both those using ultrasound and those using audible sound) are typically based on the concept of acoustic Time-Of-Flight: the distance between any two speakers is determined by measuring the time it takes for sound to travel from one speaker to the other. This requires a microphone (or ultrasound receiver) to be integrated into each loudspeaker.
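
As a hedged illustration of how positions could be recovered from such measurements, the sketch below converts a complete matrix of pairwise time-of-flight values into relative 2-D positions using classical multidimensional scaling. It assumes noise-free, symmetric measurements, and the result is unique only up to rotation, reflection and translation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def positions_from_tof(tof_matrix):
    """Relative 2-D speaker positions from pairwise time-of-flight
    measurements (seconds), via classical multidimensional scaling."""
    d2 = (np.asarray(tof_matrix) * SPEED_OF_SOUND) ** 2  # squared distances
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    b = -0.5 * j @ d2 @ j                 # Gram matrix of the positions
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:2]      # two largest eigenvalues -> 2-D
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```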

In the example, the rendering unit 603 thus processes the input signals to generate a set of output signals which in the exemplary embodiments corresponds to drive signals for a set of speakers 605. Each drive signal is associated with one audio transducer, i.e. a signal is provided for each speaker 605 of the speaker configuration. It will be appreciated that the drive signals may in some embodiments directly be used to drive the loudspeakers 605 but in many embodiments the signal paths may include further processing including for example amplification, filtering, impedance matching etc. The rendering unit 603 is arranged to perform positioning of the received sound sources. For example, for specific audio objects that are desired to be rendered from a specific position, the rendering unit 603 may map the audio object to the different drive signals and loudspeakers, e.g. based on panning operations.

The rendering unit 603 thus takes a set of input signals and processes these to generate a set of output signals, which in the specific case is a set of drive signals for a set of speakers 605. The processing may include any processing used to generate output signals from input signals including for example filtering, amplification, level adjustments, mixing, etc.

Each of the output signals may be a combination of contributions from the input signals, with each contribution corresponding to a signal path from the input signal to the output signal. Each contribution may be dependent on a number of factors and parameters including the type of audio, the desired frequency response, the desired position, the position of the speakers etc.

The audio renderer of Fig. 6 furthermore comprises a reverberation processor 607 for determining an environment reverberation of an acoustic rendering environment. The environment reverberation provides an indication of a reverberation characteristic for an acoustic rendering environment associated with the output signals. In particular, for the system of Fig. 6 where the output signals are drive signals for a set of speakers 605, the environment reverberation provides an indication of a reverberation characteristic for an acoustic rendering environment for the rendering of the drive signals from the speakers 605. The environment reverberation may specifically provide an indication of one or more reverberation characteristics for the room in which the speakers 605 are positioned.

The reverberation processor 607 may in general obtain information with respect to the acoustic environment. This acoustic information may consist of speaker specific information, such as a transfer function from each speaker to the sweet spot listening position (which may be dynamic due to the specific placement of the speaker), or may consist of speaker-independent information such as information on the room acoustics.

In some embodiments, reverberation processor 607 may optionally receive information about the listener position, which may for instance be absolute (relative to room boundaries) or relative to speaker locations.

In the perception of an audio signal rendered by a transducer (loudspeaker), a number of aspects related to spatial perception play an important role. These include:

- Position of the speaker with respect to the head. By positioning a speaker at a certain elevation and azimuth angle from the user, localization cues, such as Interaural Intensity Differences (due to shadowing effects of the head) and Interaural Time Differences (due to differences in propagation time to the left and right ears) and monaural spectral cues are introduced. As a result, the user has the perception that the sound is originating from the direction of the speaker.

- Room acoustics. The perception of the same sound played in a church, a bathroom or a small living room can be completely different. This is primarily caused by the presence of reflections. In a small living room, the floor, ceiling, walls and objects in the room typically absorb a large portion of the sound. The sound is "dry"; the resulting reflections are limited. In a bathroom however, reflections are typically clearly present due to low absorption of the floor, ceiling and walls. The sound is "reverberant". The same is true for a church. However, a sound played in a church is mostly still perceived significantly different than when played in a bathroom. This is primarily due to the different dimensions. In a bathroom, the first reflections already enter the ears relatively shortly after the sound that comes directly from the speaker. The so called "early reflections" are close together and secondary reflections follow relatively shortly after the direct sound and early reflections. In a church, the walls are further apart causing early reflections and secondary reflections to be further spaced in time because it takes more time to travel between the boundaries and to the ears. Perceptually, especially for transient type of sounds, the individual early reflections can therefore often be well distinguished. The reflection density increases over time when higher order reflections are introduced. After a while the separate reflections fuse together into the "late reverberation" where the individual reflections can't be distinguished any more, rather the sound starts to become diffuse (see Fig. 8).

Thus, reverberation is a very significant factor in the audio perception for a listener. Various characteristics may be used together or individually to describe various characteristics of the reverberation of an acoustic environment, such as a room. Parameters may for example include:

- T60. The time it takes for the reverberation to drop by 60 dB. The T60 parameter may be frequency dependent. The T60 parameter is usually considered to be the same across the room, so it may be a single number that characterizes the room, independent of source and listener configurations (see the estimation sketch following this list).

- Direct-reverb ratio. The ratio between the energy of the direct sound field and the reverberant sound field, often expressed in dB. This ratio may be frequency dependent. The direct-reverb ratio is typically dependent on both the acoustic properties of the room and the distance between the listener and the sound source, as well as on the directional properties of the sound source. It may also be dependent on the sound source position relative to other objects, such as walls. The direct sound to reverberated sound ratio is often considered to characterize the combination of the room and a specific source/listener configuration within the room.

- Correlation. The amount of correlation between the two ears. This typically refers to the late reverberation. The correlation may be frequency dependent.

- Coloration. Absorption of sound energy is frequency dependent due to material properties. Due to the different materials in the room, the reverberation obtains a specific coloration. In addition, close proximity of a sound source to a reflective room boundary (e.g. a wall) can introduce coloration due to comb-filter effects.
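
Purely as an illustration of how the first two parameters above could be estimated from a measured impulse response, the following sketch uses Schroeder backward integration for T60 and a simple energy split for the direct-to-reverberant ratio. The split offset and the T20-based extrapolation are assumed conventions; a real measurement would also need to handle the noise floor.

```python
import numpy as np

def schroeder_edc(h):
    """Energy decay curve (dB) of impulse response h, via backward
    integration (Schroeder, 1965)."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]
    return 10 * np.log10(energy / energy[0])

def estimate_t60(h, fs):
    """T60 extrapolated from the -5..-25 dB span of the decay curve
    (a 'T20 x 3' estimate)."""
    edc = schroeder_edc(h)
    i5 = np.argmax(edc <= -5.0)    # first sample below -5 dB
    i25 = np.argmax(edc <= -25.0)  # first sample below -25 dB
    return 3.0 * (i25 - i5) / fs

def direct_to_reverb_db(h, fs, split_ms=2.5):
    """Direct-to-reverberant energy ratio, splitting the response a
    few milliseconds after its strongest peak."""
    split = np.argmax(np.abs(h)) + int(split_ms * 1e-3 * fs)
    return 10 * np.log10(np.sum(h[:split] ** 2) / np.sum(h[split:] ** 2))
```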

In some embodiments, the reverberation processor 607 may be arranged to determine the environment reverberation in response to measurements of the rendering environment. Such measurements may e.g. be performed (semi-)manually or using automated electronic measurements. As a specific example, the reverberation processor 607 may generate test signals that are rendered by the speakers 605 and captured by a test microphone positioned at the listening position. The test signals may be rendered time sequentially by the different speakers, thereby allowing an individual characterization of reverberation for each speaker. The test signals may furthermore be designed to allow the suitable reverberation characteristics to be determined, e.g. they may correspond to short pulses and/or frequency sweeps.
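
One possible realisation of such an automated measurement, sketched under the assumption that the test signal (e.g. a sweep) and the microphone recording are available as sample arrays, is regularised frequency-domain deconvolution:

```python
import numpy as np

def impulse_response(recording, test_signal, eps=1e-6):
    """Estimate the speaker-to-microphone impulse response by
    deconvolving the recording with the known test signal
    (spectral division with a small regulariser)."""
    n = len(recording) + len(test_signal) - 1
    r = np.fft.rfft(recording, n)
    s = np.fft.rfft(test_signal, n)
    return np.fft.irfft(r * np.conj(s) / (np.abs(s) ** 2 + eps), n)
```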

In other embodiments, the environment reverberation may be determined in response to a user input. For example, a user may simply provide an input indicating whether the system is used in a room with low reverberation or with high reverberation (e.g. whether it is a dry/wet or echoic/anechoic room etc.).

The audio renderer of Fig. 6 furthermore comprises a reference receiver 609 for receiving a reference reverberation for a rendering of the audio signal. This reference reverberation may e.g. be information with respect to the acoustics of the mastering system, i.e., the acoustics from the reference mastering, and/or may e.g. be information describing a desired set of acoustical parameters. The reference reverberation may specifically indicate a desired or target reverberation characteristic. The reference reverberation may define parameter(s) that correspond to the parameter(s) used to describe the environment reverberation. E.g. it may provide a T60 value, a direct to reverb ratio, a reverberation energy indication etc. In some embodiments, the environment reverberation and/or the reference reverberation may be provided as a full or partial transfer function. As an example, the reference reverberation may be used to provide an indication of a desired reverberation regarding the acoustics of the playback room.

Alternatively or additionally, the reference reverberation may provide an indication of a desired reverberation for the represented scene, i.e. an indication of the acoustics that should be perceived for the rendered audio as the combined result of the rendered (signal-domain) acoustics and the physical acoustics of the playback room.

In the system of Fig. 6, the rendering unit 603 is arranged to modify the reverberation such that the reverberation of the output signals may be different from the reverberation on the input signals. The rendering unit 603 may specifically be arranged to modify the reverberation in dependence on the environment reverberation and the reference reverberation. Specifically, the rendering unit 603 may modify the reverberation such that a combination of the added reverberation and the environment reverberation applied to the output audio signals matches the reference reverberation. For example, the magnitude of the added reverberation may correspond to a difference in magnitude between the measured environment reverberation and the reference reverberation.

The approach may in particular provide an efficient way for controlling and manipulating the audio user experience, e.g. from a remote source and without any knowledge of the specific rendering environment or speaker configuration that will be used. For example, a content provider may provide a reference reverberation for e.g. an audio scene. The reference reverberation may reflect the desired degree of reverberation. This reference reverberation can be defined without any knowledge of the actual rendering situation and can accordingly be determined at the content provider side. Furthermore, the same reference reverberation can be used for all renderers and accordingly is a generic indication that can be created e.g. together with the creation of the audio signal. In particular, the reference reverberation may be included with audio data and can be distributed with the audio data.

A renderer receiving this reference reverberation can accordingly adapt the local rendering to provide the desired audio experience. This adaptation takes into account the actual acoustic environment such that the combined effect of the modification of the rendering by the rendering unit 603 and the acoustic reverberation introduced by the acoustic environment results in a desired overall reverberation. For example, if the reference reverberation indicates a given amount of reverberation which is higher than the acoustic reverberation in the room, an additional reverberation may be added by the rendering unit 603 such that the desired reverberation is introduced. The audio renderer of Fig. 6 may compare the acoustic properties determined for the rendering room to the reference acoustic properties, with the rendering then being adjusted such that a sound representation is obtained which substantially matches the reference acoustic properties.

The modification of reverberation may for example be introduced individually to at least some audio signals of the input set of audio signals. For example, for a single audio object (say corresponding to a snare drum), a reference reverberation may indicate that the energy level of reverberation after, say, 100 msec should be e.g. 20% of the total energy level. The environment reverberation may indicate that in the current room, the energy level of reverberation after 100 msec is only 10%. In such a case, the rendering unit 603 may add reverberation such that the combined effect results in 20% of the energy being in reverberation after 100 msec. The reverberation may for example be added by applying a FIR filter to the signal path(s) generating the drive signal(s).
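
A minimal sketch of this specific example follows. It assumes the environment reverberation is available as an impulse response and that the added FIR tail is constructed to start at the 100 msec split point, so that its energy counts entirely as late energy; the gain then follows from solving (E_late + g^2 E_tail) / (E_total + g^2 E_tail) = target for g.

```python
import numpy as np

def late_energy_fraction(h, fs, t_split=0.100):
    """Fraction of the response's total energy arriving after t_split."""
    k = int(t_split * fs)
    e = h ** 2
    return e[k:].sum() / e.sum()

def tail_gain_for_target(h_env, tail, fs, target=0.20, t_split=0.100):
    """Gain for an added FIR reverb tail (assumed to start at t_split)
    so the combined late-energy fraction reaches the target."""
    k = int(t_split * fs)
    e_total = np.sum(h_env ** 2)
    e_late = np.sum(h_env[k:] ** 2)
    e_tail = np.sum(tail ** 2)
    g2 = (target * e_total - e_late) / ((1.0 - target) * e_tail)
    return np.sqrt(max(g2, 0.0))   # never below zero gain
```

With the figures of the example (10% late energy in the room, 20% target), the solved tail energy is 0.125 of the room response's total energy, and the combined fraction becomes 0.225/1.125 = 20% as intended.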

The approach thus allows improved performance, and specifically enables rendering application independent remote control of reverberation, thereby e.g. allowing this to be done at the content provider side. Specifically, information of the acoustic properties of the reproduction room allows the properties of diffuse/reverberant sound to be altered such that it substantially matches that of a target reproduction.

As a specific example, the reference receiver 609 may for one audio object receive a reference reverberation which is given as a sampled transfer function. The sampled transfer function may specifically be one that has been measured in the master acoustic environment, i.e. it may characterize a room associated with the audio scene (whether a real or virtual room). Indeed, in some embodiments, the reference reverberation may be generated from acoustic transfer function measurements in a given room. Similarly, the environment reverberation may be a specific acoustic transfer function generated from measurements in the room in which the audio object will be rendered, i.e. the listening room. Thus, if the audio object is rendered as it is received, the resulting user experience will be that of a sound source being present in the listening room. However, in the example, the rendering unit 603 may modify the reverberation and specifically it may add reverberation. Thus, if, for example, the listening room is relatively small and acoustically relatively dead (i.e. reflections are strongly attenuated), the environment reverberation will indicate a transfer function with relatively little reverberation. Thus, the transfer function will have a strong component corresponding to the direct sound, and perhaps a few early reflections, but will relatively quickly die out with little reverberation. However, if the audio is intended to be rendered to emulate a large concert hall, which typically has a fair amount of reverberation, the rendering of the audio object in the listening room will deviate substantially from that intended. In the example, the rendering of the audio object may include the addition of a filter, such as e.g. a FIR filter, which introduces a certain amount of delayed components, i.e. the filter may have a transfer function that emulates reverberation. In this way, the audio perceived by the user may correspond more to the intended acoustic environment than to the actual environment. Specifically, the introduced filter may be selected such that the combination of the filter and the acoustic transfer function of the environment reverberation corresponds to the received reference transfer function. Specifically, the filter may be selected such that a convolution of the impulse response of the transfer function given by the environment reverberation with the impulse response of the filter is equal to the impulse response of the transfer function given by the reference reverberation.
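
A sketch of this filter selection, assuming both the environment and reference responses are available as sampled impulse responses, is a regularised spectral division. Room responses are rarely minimum-phase, so the raw inverse can be badly behaved and the regulariser matters; the function name is illustrative.

```python
import numpy as np

def matching_filter(h_env, h_ref, eps=1e-4):
    """Filter f such that h_env convolved with f approximates h_ref,
    computed by regularised spectral division."""
    n = len(h_env) + len(h_ref) - 1
    he = np.fft.rfft(h_env, n)
    hr = np.fft.rfft(h_ref, n)
    return np.fft.irfft(hr * np.conj(he) / (np.abs(he) ** 2 + eps), n)
```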

The use of a reference reverberation and environment reverberation to control the reverberation operation of the rendering unit 603 provides a system wherein the content provider side can efficiently and effectively control reverberation performance at the rendering side while at the same time supporting e.g. audio objects that are speaker setup independent.

In many embodiments, the audio data and the reference reverberation may be received together from a remote source. Thus, the rendering of the audio data can be controlled from the remote source while still allowing the support of any kind of audio component. Furthermore, the remote source does not need to consider any aspects of the rendering or the rendering environment but can simply define characteristics of the desired reverberation, with the specific adaptation to the specific local characteristics being left to the individual renderer.

The received audio data may in many embodiments specifically contain a reference reverberation that represents the artistic intent for the acoustical parameters of the sound scene environment. This could e.g. be in the form of a dry signal in combination with a parameterization of the desired level of diffuseness, e.g. by providing a desired reverberation time and direct to reverb ratio. Additionally, a parameterization of the reverb's coloration could be transmitted. This makes it possible for a decoder to adapt the amount of diffuseness depending on the acoustic properties using e.g. a synthetic reverb (e.g. a Jot reverberator). The parameters for such a reverberator could be controlled using the reference reverberation and the environment reverberation. However, the reference reverberation need not be received from the remote source together with the audio data. For example, in some embodiments, the reference reverberation can be received from another source. For example, a third party may, based on specific measurements of various sound environments, generate a characterization of various different acoustic environments. For example, such a third party may provide acoustic characteristics for e.g. various concert venues and may provide these to be used with any audio data. Thus, a user may e.g. have a library of stored music. By downloading acoustic data related to a specific concert hall, the stored music may be rendered to sound like it was performed in that concert hall. Such an approach would e.g. allow a user to select whether the stored audio should be rendered as if it was played at a stadium concert, in a small studio, in a church hall, etc. with the corresponding characteristics being provided by the third party.
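
The Jot reverberator is only named here, not specified. The following minimal feedback delay network merely illustrates how such a synthetic reverb's parameters could be tied to a target reverberation time; the delay lengths and the Householder feedback matrix are illustrative choices, and each line's feedback gain follows the standard relation g_i = 10^(-3 d_i / (fs T60)).

```python
import numpy as np

def fdn_reverb(x, fs, t60, delays=(1031, 1327, 1523, 1871)):
    """Tiny 4-line feedback delay network.  Each line's gain makes
    the signal circulating through it decay by 60 dB in t60 seconds."""
    m = len(delays)
    gains = [10.0 ** (-3.0 * d / (fs * t60)) for d in delays]
    a = np.eye(m) - 2.0 / m * np.ones((m, m))  # lossless Householder mix
    bufs = [np.zeros(d) for d in delays]
    idx = [0] * m
    y = np.zeros(len(x))
    for n in range(len(x)):
        outs = np.array([bufs[i][idx[i]] for i in range(m)])
        y[n] = outs.sum() / m
        fb = a @ outs
        for i in range(m):
            bufs[i][idx[i]] = x[n] + gains[i] * fb[i]
            idx[i] = (idx[i] + 1) % len(bufs[i])
    return y
```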

In some embodiments, the reference reverberation may be provided locally by a user. The audio renderer may for example include a user interface via which the user can enter a desired reverberation characteristic. The audio renderer may then proceed to render the audio signal to provide the desired reverberation. Thus, the approach may in some embodiments provide a very efficient way for user control of the provided audio experience.

In the following a specific example of an operation of the rendering unit 603 will be described.

In the example, a set of direct audio signals and a set of diffuse audio signals are generated from the input signals. The reverberation of the rendered set of output signals is then adjusted by changing the level of the diffuse audio signals relative to the direct signals.

It will be appreciated that any suitable generation/designation of direct signals and diffuse signals may be used. In particular, direct signals may be considered to be signals that have a reverberation below a given threshold and diffuse signals may be considered to be signals that have a reverberation above a given amount. In many embodiments, the designation of channels or audio components as either direct or diffuse may be based on an assumption of the amount of reverberation they contain or may be based on how spatially focused they are assumed to be. For example, audio objects may be considered to be direct signals as they typically represent specific well-defined sound sources whereas audio channels may in some cases be considered diffuse signals as they often represent background noise and/or sounds. In many embodiments, audio signals, such as audio channels, may comprise both specific well defined sound sources and general background sources.

Therefore, in many embodiments, one or more audio signals may be decomposed into (sub)signals corresponding to respectively direct audio signals and diffuse audio signals. Thus, a direct audio signal may be a signal which is designated as such and therefore processed with the assumption that it corresponds to spatially narrow sources, whereas a diffuse signal may be a signal which is designated as such and therefore processed under the assumption that it corresponds to a spatially wider sound source. Specifically, a diffuse signal may be a signal which is considered to not be spatially well defined.

A direct audio signal may be an audio signal that is associated with a direct incidence at the human ears, i.e., the sound waves travel from the transducer through the air directly to the human head without being reflected by physical objects. A diffuse audio signal may be an audio signal that is associated with an indirect incidence at the human ears, i.e., the sound waves travel from the transducer through the air past one or more physical objects, through refraction and/or reflection, before arriving at the human head. A diffuse audio signal may therefore comprise some or all of the sound except the direct audio signal.

Fig. 9 illustrates an example of elements of the rendering unit 603.

In the example, the rendering unit 603 comprises a generator 901 which is fed the set of input signals. The generator 901 then proceeds to generate a set of direct audio signals and a set of diffuse audio signals from the input audio signals.

The rendering unit 603 is arranged to adjust the levels/gains for the set of diffuse audio signals relative to the set of direct audio signals in response to the environment reverberation and the reference reverberation. In the example of Fig. 9, this is done by adjusting the gain/level of the diffuse audio signals based on the reference reverberation and the environment reverberation. The adjustment is performed by a level adjuster 903.

The resulting direct audio signals and adjusted diffuse audio signals are fed to an output generator 905 which proceeds to generate the output/drive signals for the individual speakers.

The output generator 905 may specifically be arranged to position the various sound sources and thus may be based on both the specific speaker configuration and desired sound source positions. Typically, the direct audio signals will be rendered from definite positions (e.g. using panning operations) whereas the diffuse audio signals will typically be rendered with less definite positions, and may typically be spatially spread over a larger area and possibly be rendered with no spatial definiteness.

In some embodiments, the output generator 905 may be arranged to furthermore adapt the processing based on acoustic properties for the rendering environment. For example, if frequency responses of loudspeakers and/or acoustic transfer functions from the speakers to the listening position are known, the output generator 905 may compensate for these by e.g. using a filter with a reciprocal response.
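As a sketch of such compensation, a regularized reciprocal is typically preferred over a literal 1/H to avoid excessive gain near response nulls; the function below is illustrative, and the regularization constant eps is an assumption:

```python
import numpy as np

def compensation_filter(h, eps=1e-3):
    """Frequency-domain compensation for a measured transfer function h
    (speaker response or speaker-to-listening-position path), using the
    regularized reciprocal H*(f) / (|H(f)|^2 + eps) rather than 1/H(f)."""
    h = np.asarray(h, dtype=complex)
    return np.conj(h) / (np.abs(h) ** 2 + eps)
```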

In the example, a low complexity approach can be used to adapt the amount of reverberation to a desired level without requiring complex filtering functionality or consideration of transfer functions. Rather, the generator 901 generates a set of signals that correspond to direct sound sources, i.e. to sound perceived to be received from a specific direction and thus being non-reverberant sound, and a set of diffuse signals which correspond to sound perceived not to have a specific well-defined position. The diffuse signals thus correspond to reverberant sound which is a result of combinations of multiple reflections, and which accordingly is generally perceived as not being received from a specific spatial position.

It is a particular advantage of the approach of Fig. 9 that the amount of reverberation can be controlled by a simple level adjustment and does not require any additional generation of echoes etc. Indeed, the rendering unit 603 illustrated in Fig. 9 can individually adjust the weighting of signals that are perceived as direct signals relative to signals that are perceived as reverberation, thereby resulting in a system wherein the perceived reverberation characteristics of a perceived audio scene can be adjusted. Another advantage of the system of Fig. 9 is that the amount of reverberation can be reduced with relatively low complexity. Indeed, the amount of perceived reverberation of a rendered audio scene can be reduced simply by reducing the level/gain for the diffuse signals.

The level adjuster 903 may determine the amount of gain or attenuation to apply to one or more of the diffuse signals based on a comparison of the reference reverberation and the environment reverberation. If the reference reverberation indicates a desired level of reverberation which is higher than that of the environment reverberation, it may proceed to set the gain higher than unity (relative to the gain for the direct audio signals), thereby increasing the perceived amount of reverberation for the audio scene. Thus, if the acoustic properties of the rendering environment do not result in the introduction of as much reverberation as is indicated by the reference reverberation, the level adjuster 903 weighs the diffuse signals higher, thereby increasing the perceived reverberation level towards the desired level. The gain may typically be a monotonic function of how much the reference reverberation exceeds the environment reverberation, i.e. the level of the diffuse signals may be increased for an increasing difference between the reference reverberation and the environment reverberation.

Similarly, if the reference reverberation indicates a desired level of reverberation which is lower than that of the environment reverberation, it may proceed to set the gain lower than unity (relative to the gain for the direct audio signals), thereby decreasing the perceived amount of reverberation for the audio scene. Thus, if the acoustic properties of the rendering environment result in the introduction of more reverberation than indicated by the reference reverberation, the level adjuster 903 weighs the diffuse signals lower, thereby decreasing the perceived reverberation level towards the desired level. The gain may typically be a monotonic function of how much the environment reverberation exceeds the reference reverberation, i.e. the level of the diffuse signals may be decreased for an increasing difference between the environment reverberation and the reference reverberation.

It will be appreciated that in some embodiments, the difference between the levels of reverberation indicated by the reference reverberation and the environment reverberation will be determined. Depending on the sign of the difference, i.e. depending on whether the reference level is higher than the environment level or not, the gain may be above or below unity (relative to the gain for the direct audio signals), and may increase or decrease with the absolute value of the difference. Thus, for the reference level being higher than the environment level, the gain is above unity and increases with the absolute value of the difference, and for the reference level being lower than the environment level, the gain is below unity and decreases with the absolute value of the difference.
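A minimal sketch of such a mapping, assuming reverberation levels expressed in dB and an illustrative slope parameter, could be:

```python
def diffuse_gain(ref_level_db: float, env_level_db: float, slope: float = 0.5) -> float:
    """Gain for the diffuse signals relative to the direct signals:
    above unity when the reference reverberation level exceeds the
    environment reverberation level, below unity otherwise, and
    monotonic in the (signed) level difference."""
    return 10.0 ** (slope * (ref_level_db - env_level_db) / 20.0)
```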

The approach may furthermore be used with a range of representations of reference reverberations and environment reverberations. For example, a signal level for reverberation may be provided respectively by the environment reverberation and the reference reverberation. The level may be given as a direct to reverberant sound ratio. For example, the reference reverberation may indicate a percentage of energy of a desired transfer function after, say, 100 msec. Similarly, the environment reverberation may indicate a percentage of energy of a measured transfer function for the environment after 100 msec. A look-up table may then be provided which for each percentage pair provides a gain for the diffuse signals (which may include gains both above and below unity, dependent on whether the environment reverberation or the reference reverberation is the larger).
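As an illustrative sketch (the table values below are placeholders, not taken from the text), the energy percentages could be computed from impulse responses and mapped to a gain by bilinear interpolation:

```python
import numpy as np

def late_energy_percentage(ir, fs, split_ms=100.0):
    """Percentage of impulse-response energy arriving after split_ms."""
    split = int(round(fs * split_ms / 1000.0))
    e = np.asarray(ir, dtype=float) ** 2
    return 100.0 * e[split:].sum() / e.sum()

# hypothetical lookup: rows = reference percentage, cols = environment percentage
ref_pts = np.array([10.0, 30.0, 50.0])
env_pts = np.array([10.0, 30.0, 50.0])
gain_table = np.array([[1.0, 0.6, 0.4],   # unity gain on the diagonal,
                       [1.7, 1.0, 0.7],   # boost when reference > environment,
                       [2.5, 1.4, 1.0]])  # attenuation when environment > reference

def table_gain(ref_pct, env_pct):
    """Bilinear interpolation in the (reference, environment) gain table."""
    g_env = [np.interp(env_pct, env_pts, row) for row in gain_table]
    return float(np.interp(ref_pct, ref_pts, g_env))
```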

However, the approach may also be used with many other measures, including for example whole transfer functions (which could e.g. be used to determine various reverberant energy levels), or non-energy/level indications. For example, a look-up table may provide various gains for pairs of environment reverberations and reference reverberations represented by T60 values. Such an approach may not result in the overall reverberation exactly matching the desired reverberation, but will typically still provide a significant improvement.

The generator 901 may generate the direct and diffuse signals in any suitable way. For example, the signals may be generated by simply designating received audio signals as either a direct audio signal or a diffuse audio signal dependent e.g. on a type of audio signal, a property of the audio signal, or metadata indicating characteristics of the audio signal. In other embodiments, direct and diffuse audio components may be extracted from individual audio signals.

As an example, in some embodiments audio data may support the following input formats:

1) Combination of point-source objects, optionally extended with a (set of) signal(s) describing the diffuse sound field (e.g. in the form of additional channels). In this case, the diffuse (reverberant) signal is given. Therefore, based on the determined acoustic parameters of the reproduction room, the total level of reverberation at the ears of the listener can be approximated. The level of the diffuse signal may then be compensated to match the original level. Therefore, in this case the generator may simply implement a selection operation to select the diffuse sound field signal(s).

2) Channel-based input format. In case of a channel-based input format, the diffuse signal component is not known a priori. One way of estimating the diffuse signal from a channel-based representation is a signal decomposition, where common (coherent) components between channels are estimated and are said to represent the direct sound source. The residual or remainder signal represents the diffuse signal.

An example of such a decomposition for a stereo signal is described below. The decomposition generates two main and two ambient components from a stereo signal. First a rotation operation is used to generate a single directional/main component, where the angle $\alpha$ is chosen such that the energy of the signal $m$ is maximized:

$$m = \begin{bmatrix} \cos\alpha & \sin\alpha \end{bmatrix} \begin{bmatrix} l \\ r \end{bmatrix}$$

This operation is preferably conducted within a number of frequency bands. The left and right "main components" are then estimated as the least-squares fit of the estimated mono signal $m$:

$$\hat{l} = a_l \, m, \qquad \hat{r} = a_r \, m,$$

where

$$a_l = \frac{\sum_{k \in k_{\mathrm{tile}}} m[k]\, l^*[k]}{\sum_{k \in k_{\mathrm{tile}}} m[k]\, m^*[k]}, \qquad a_r = \frac{\sum_{k \in k_{\mathrm{tile}}} m[k]\, r^*[k]}{\sum_{k \in k_{\mathrm{tile}}} m[k]\, m^*[k]},$$

and $m[k]$, $l[k]$ and $r[k]$ represent the main, left and right frequency/subband domain samples belonging to the T/F tile $k_{\mathrm{tile}}$. The two left and right ambient components $d_l$ and $d_r$ are then calculated as:

$$d_l = l - a_l \, m, \qquad d_r = r - a_r \, m.$$
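A per-tile sketch of this decomposition is given below; the closed-form expression for the angle maximizing the energy of m follows from the 2x2 channel covariance and is an assumption consistent with, though not spelled out in, the text:

```python
import numpy as np

def decompose_tile(l, r):
    """Main/ambient decomposition of one time/frequency tile of a stereo
    signal: rotate (l, r) to the maximum-energy mono signal m, least-squares
    fit l and r onto m, and keep the residuals as ambient components."""
    c_ll = np.vdot(l, l).real
    c_rr = np.vdot(r, r).real
    c_lr = np.vdot(l, r)
    # angle maximizing |cos(a)*l + sin(a)*r|^2 (dominant covariance direction)
    alpha = 0.5 * np.arctan2(2.0 * c_lr.real, c_ll - c_rr)
    m = np.cos(alpha) * l + np.sin(alpha) * r
    mm = np.vdot(m, m)
    a_l = np.vdot(m, l) / mm  # least-squares fit coefficients
    a_r = np.vdot(m, r) / mm
    d_l = l - a_l * m         # ambient (diffuse) residuals
    d_r = r - a_r * m
    return (a_l * m, a_r * m), (d_l, d_r)
```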

It will be appreciated that the specific approach for determining how to modify the reverberation may depend on the individual preferences and requirements of the individual embodiment.

In some embodiments, the modification may simply be determined in response to the energy of the diffuse signals and without considering any reverberation resulting from the direct audio signals. However, in other embodiments, the determination of the change of reverberation may be dependent on an estimate of reverberation caused by rendering the direct audio signals. Specifically, the level adjuster 903 may determine a reverberation contribution for the direct signals in response to the environment reverberation. This contribution may thus reflect the amount of reverberation that will result from the rendering of the direct audio signals. The modification may then be performed taking both the reverberation contribution from the direct audio signals and the diffuse signals into account. The diffuse signals will typically be considered to correspond to reverberant audio, and accordingly the reverberation effect when rendering diffuse signals may in many embodiments be ignored. As a specific example, the level of the diffuse signals may be set such that an energy of the level adjusted signal together with the energy of the reverberation for the direct audio signals matches an energy level indicated by the reference reverberation. It will be appreciated that all the energy levels referred to may be relative energy levels, and specifically may be relative to a direct or early sound energy. E.g. the energy levels may be expressed as direct to reverberant sound ratios.

In more detail, Fig. 10 illustrates an example of how the level adjuster 903 may determine a suitable gain for the diffuse signals. The approach is based on the notion that the overall perceived reverberation consists of the reverberation of the room itself and the reverberation that is part of the audio signals. The first component can be estimated by determining the amount/character of reverb from the direct portion of the audio signals. For this, both the direct signals as well as the acoustical information (represented by the environment reverberation) are used to establish the contribution. The second component is predominantly described by the contribution of the diffuse signals. The additional reverb introduced by the acoustics on these signals may typically be neglected. These two components together form the overall reverberation. This overall reverberation can be matched to the reference reverberation. Adaptation of the diffuse sound signal may consist of applying a single fixed gain that has been determined before playback. This may be a fixed gain per speaker.
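Assuming all quantities are expressed as reverberant energies on a comparable scale (e.g. relative to the direct sound energy), the fixed gain can be solved for directly; a minimal sketch:

```python
import math

def fixed_diffuse_gain(e_diffuse, e_direct_reverb, e_reference):
    """Gain g such that g^2 * e_diffuse + e_direct_reverb matches the
    reference reverberant energy e_reference (all relative energies).
    Clamped at zero when the reverberation caused by the direct signals
    alone already exceeds the reference."""
    target = max(e_reference - e_direct_reverb, 0.0)
    return math.sqrt(target / e_diffuse) if e_diffuse > 0.0 else 1.0
```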

In some embodiments, the adjustment may be adaptive by means of a control loop.

In some embodiments, at least one of the reference reverberation and the measured reverberation is frequency dependent and the rendering unit 603 is arranged to apply a frequency varying modification.

For example, the audio frequency range may be divided into a number of frequency bands, and the room reverberation may be measured in each of the frequency bands. If a frequency invariant reference reverberation is provided, the rendering unit 603 may proceed to determine a modification in each frequency band such that the desired reference reverberation is achieved individually in each frequency band. For example, a gain for the diffuse signals relative to the direct audio signals may be determined in each frequency band. The same approach of determining individual variations in frequency bands may be applied if the reference reverberation is frequency dependent or if both are frequency dependent.

In the previous examples, the environment reverberation has been determined as a common parameter (or set of parameters) for the entire room. However, in some embodiments, the environment reverberation comprises an individual environment reverberation for at least a first audio transducer of the set of audio transducers. Specifically, an environment reverberation may be determined individually for each speaker. The environment reverberation may specifically be a direct sound to reverberant sound measure, and accordingly it may reflect the amount of reverberant sound each speaker provides relative to the sound that reaches the listening position directly (including with sufficiently few reflections). As a specific example, the direct to reverberant sound ratio may be determined as a ratio between an energy level for sound that reaches the listening position within a given time and an energy level for sound that reaches the listening position after a given duration.

Such a measure will typically depend on the position of the speaker relative to the listening position as well as relative to walls etc. The measure will also provide a strong indication of how strongly audio from the speaker is perceived to originate from a spatially well-defined audio source and how strongly it is perceived to be diffuse non-spatially specific sound.

The approach may be used to determine an individual modification of the reverberation for each loudspeaker. E.g. if the reference reverberation indicates a desired direct to reverberant ratio, the rendering unit 603 may adjust the level for the diffuse signals individually for each speaker to result in the indicated direct to reverberant ratio.
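A sketch combining the per-frequency-band and per-transducer refinements, assuming direct-to-reverberant ratios (DRRs) in dB; the sign convention is an assumption: a lower environment DRR than the reference means the room contributes too much reverberation, so the diffuse part is attenuated:

```python
import numpy as np

def diffuse_gains(ref_drr_db, env_drr_db):
    """Per-speaker, per-band gains for the diffuse signals from
    direct-to-reverberant ratios in dB. env_drr_db has shape
    (num_speakers, num_bands); ref_drr_db may be a scalar
    (frequency-invariant reference) or a broadcastable array."""
    env = np.asarray(env_drr_db, dtype=float)
    ref = np.broadcast_to(np.asarray(ref_drr_db, dtype=float), env.shape)
    return 10.0 ** ((env - ref) / 20.0)
```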

Also, in some embodiments, the generated reverberation may be controlled by adjusting the relative gains to the individual speakers. For example, relative gains for the loudspeakers may be determined based on the individual environment reverberation for the speakers, i.e. differences in environment reverberation may result in different gains.

Indeed, the differences in the acoustic response, and specifically the differences in the direct to reverberation ratio, may be exploited to control the overall direct to reverberation ratio as perceived by the listener. For example, a signal fed to a speaker that is placed close to a reflecting wall will make a much more significant contribution to the reverberation than a signal fed to a speaker that is placed near a curtain. Such effects may be employed by the rendering to control the perceived acoustics.

Indeed, in some embodiments, the rendering unit 603 may set the gain for one or more speakers to substantially zero (say less than 5% of a nominal gain for the speaker or of the gain applied to another speaker) if the environment reverberation for that speaker meets a given criterion. Specifically, if the environment reverberation for the speaker indicates a reverberation amount which is higher than a given threshold, the rendering unit 603 may set the gain to substantially zero, thereby effectively suppressing the speaker. The threshold may in many embodiments be dependent on the reference reverberation. This may for example allow the approach to reduce the overall reverberation. For example, if the reference reverberation indicates relatively low reverberation, the rendering unit 603 may decide to use only speakers for which the resulting acoustic reverberation (specifically the direct to reverberation ratio) is lower than the indicated reference reverberation.
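A sketch of such a selection rule, assuming both the per-speaker environment reverberation and the threshold derived from the reference reverberation are expressed as direct-to-reverberant ratios in dB:

```python
def speaker_selection_gains(env_drr_db, ref_drr_db, floor=0.0):
    """Per-speaker gains: speakers whose environment direct-to-reverberant
    ratio falls below the reference-derived threshold (i.e. that are too
    reverberant) get substantially zero gain; the others keep nominal gain."""
    return [1.0 if drr >= ref_drr_db else floor for drr in env_drr_db]

# e.g. with a reference DRR of 6 dB
gains = speaker_selection_gains([10.0, 4.5, 7.2], 6.0)  # -> [1.0, 0.0, 1.0]
```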

It will be appreciated that the above description for clarity has described embodiments of the invention with reference to different functional circuits, units and processors. However, it will be apparent that any suitable distribution of functionality between different functional circuits, units or processors may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controllers. Hence, references to specific functional units or circuits are only to be seen as references to suitable means for providing the described functionality rather than indicative of a strict logical or physical structure or organization.

The invention can be implemented in any suitable form including hardware, software, firmware or any combination of these. The invention may optionally be implemented at least partly as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may be physically, functionally and logically implemented in any suitable way. Indeed the functionality may be implemented in a single unit, in a plurality of units or as part of other functional units. As such, the invention may be implemented in a single unit or may be physically and functionally distributed between different units, circuits and processors.

Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present invention is limited only by the accompanying claims. Additionally, although a feature may appear to be described in connection with particular embodiments, one skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. In the claims, the term comprising does not exclude the presence of other elements or steps.

Furthermore, although individually listed, a plurality of means, elements, circuits or method steps may be implemented by e.g. a single circuit, unit or processor. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. Also the inclusion of a feature in one category of claims does not imply a limitation to this category but rather indicates that the feature is equally applicable to other claim categories as appropriate.

Furthermore, the order of features in the claims does not imply any specific order in which the features must be worked, and in particular the order of individual steps in a method claim does not imply that the steps must be performed in this order. Rather, the steps may be performed in any suitable order. In addition, singular references do not exclude a plurality. Thus references to "a", "an", "first", "second" etc. do not preclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.

Claims

CLAIMS:
1. An apparatus for processing an audio signal, the apparatus comprising:
an audio receiver (601) for receiving a set of input audio signals;
a reference receiver (609) for receiving a reference reverberation for a rendering of the set of input audio signals;
a reverberation processor (607) for determining an environment reverberation of an acoustic rendering environment;
an audio processor (603) for generating a set of output audio signals for a set of audio transducers (605) by processing the set of input audio signals;
wherein the audio processor (603) is arranged to modify reverberation for the set of input audio signals when generating the set of output audio signals, the modification being dependent on the environment reverberation and the reference reverberation.
2. The apparatus of claim 1 wherein the audio processor (603) is arranged to modify reverberation in response to a comparison of a combination of reverberation by the audio processor and the environment reverberation to the reference reverberation.
3. The apparatus of claim 1 wherein the environment reverberation comprises a direct sound to reverberant sound measure.
4. The apparatus of claim 1 wherein the audio processor (603) comprises:
a generator (901) for generating a set of direct audio signals and a set of diffuse audio signals from the set of input audio signals;
a level adjuster (903) for adjusting levels of the set of diffuse audio signals relative to the set of direct audio signals in response to the environment reverberation and the reference reverberation;
an output generator (905) for generating the set of output audio signals from level adjusted diffuse signals and the set of direct audio signals.
5. The apparatus of claim 4 wherein the audio processor (603) is arranged to increase a level of the set of diffuse signals relative to the set of direct signals for an increasing difference between the reference reverberation and the environment reverberation if the reference reverberation exceeds the environment reverberation.
6. The apparatus of claim 4 wherein the audio processor (603) is arranged to decrease a level of the set of diffuse signals relative to the set of direct signals for an increasing difference between the environment reverberation and the reference reverberation if the environment reverberation exceeds the reference reverberation.
7. The apparatus of claim 4 wherein the audio signal processor (603) is arranged to determine a reverberation for the direct signals in response to the environment reverberation; and to determine the modification in response to the diffuse signals and the reverberation for the direct signals.
8. The apparatus of claim 1 wherein at least one of the reference reverberation and the measured reverberation is frequency dependent, and the audio processor (603) is arranged to apply a frequency variant modification.
9. The apparatus of claim 1 wherein the environment reverberation comprises an individual environment reverberation for at least a first audio transducer of the set of audio transducers.
10. The apparatus of claim 9 wherein the audio processor (603) is arranged to adapt a level of an output audio signal for the first audio transducer relative to a level of an output audio signal for a second audio transducer in response to the individual environment reverberation for the first audio transducer.
11. The apparatus of claim 10 wherein the audio processor (603) is arranged to generate the output audio signal for the first audio transducer to have substantially zero amplitude if the individual environment reverberation for the first audio transducer exceeds a threshold.
12. The apparatus of claim 1 wherein the reference receiver (609) and the audio receiver (601) are arranged to receive an audio data signal from a remote source, the audio data signal comprising both the set of audio signals and the reference reverberation.
13. The apparatus of claim 1 wherein the reference receiver (609) comprises a user interface, and the reference receiver (609) is arranged to determine the reference reverberation in response to a user input.
14. A method of processing an audio signal, the method comprising:
receiving a set of input audio signals;
receiving a reference reverberation for a rendering of the set of input audio signals;
determining an environment reverberation of an acoustic rendering environment;
generating a set of output audio signals for a set of audio transducers (605) by processing the set of input audio signals;
wherein generating the set of output audio signals comprises modifying reverberation for the set of input audio signals, the modification being dependent on the environment reverberation and the reference reverberation.
15. A computer program product comprising computer program code means adapted to perform all the steps of claim 14 when said program is run on a computer.
PCT/IB2013/060692 2012-12-14 2013-12-06 Reverberation processing in an audio signal WO2014091375A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US201261737144P true 2012-12-14 2012-12-14
US61/737,144 2012-12-14

Publications (1)

Publication Number Publication Date
WO2014091375A1 true WO2014091375A1 (en) 2014-06-19

Family

ID=49955445

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/060692 WO2014091375A1 (en) 2012-12-14 2013-12-06 Reverberation processing in an audio signal

Country Status (1)

Country Link
WO (1) WO2014091375A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI607655B (en) * 2015-06-19 2017-12-01 Sony Corp Coding apparatus and method, decoding apparatus and method, and program
US10051403B2 (en) 2016-02-19 2018-08-14 Nokia Technologies Oy Controlling audio rendering
US10393571B2 (en) 2015-07-06 2019-08-27 Dolby Laboratories Licensing Corporation Estimation of reverberant energy component from active audio source


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080019535A1 (en) * 2004-07-05 2008-01-24 Pioneer Corporation Reverberation Adjusting Apparatus, Reverberation Correcting Method, And Sound Reproducing System
US20120275613A1 (en) * 2006-09-20 2012-11-01 Harman International Industries, Incorporated System for modifying an acoustic space with audio source content
US20110081032A1 (en) * 2009-10-05 2011-04-07 Harman International Industries, Incorporated Multichannel audio system having audio channel compensation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13820977; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase in: Ref country code: DE
122 Ep: pct app. not ent. europ. phase (Ref document number: 13820977; Country of ref document: EP; Kind code of ref document: A1)