GB2494894A - Dynamic range control - Google Patents

Dynamic range control

Info

Publication number
GB2494894A
Authority
GB
United Kingdom
Prior art keywords
text
audio signal
dynamic range
input audio
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB201116348A
Other versions
GB201116348D0 (en)
Inventor
Stephen Baldwin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Earsoft Ltd
Original Assignee
Earsoft Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Earsoft Ltd filed Critical Earsoft Ltd
Priority to GB201116348A priority Critical patent/GB2494894A/en
Publication of GB201116348D0 publication Critical patent/GB201116348D0/en
Priority to KR1020147007801A priority patent/KR20140067064A/en
Priority to CN201280046326.7A priority patent/CN103828232A/en
Priority to US14/345,614 priority patent/US20140369527A1/en
Priority to PCT/GB2012/052339 priority patent/WO2013041875A2/en
Priority to EP12778773.7A priority patent/EP2759057A2/en
Publication of GB2494894A publication Critical patent/GB2494894A/en
Priority to IN2621CHN2014 priority patent/IN2014CN02621A/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G7/00Volume compression or expansion in amplifiers
    • H03G7/002Volume compression or expansion in amplifiers in untuned or low-frequency amplifiers, e.g. audio amplifiers
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G7/00Volume compression or expansion in amplifiers
    • H03G7/007Volume compression or expansion in amplifiers of digital or coded signals
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G9/00Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/02Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
    • H03G9/025Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers frequency-dependent volume compression or expansion, e.g. multiple-band systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Control Of Amplification And Gain Control (AREA)

Abstract

A method for adjusting dynamic range of an audio signal comprises providing an input audio signal with a first dynamic range, mapping the first dynamic range to a second dynamic range using a transfer function 107 with a linear portion aligned to an average level of the input audio signal, and generating an output audio signal with the second dynamic range from the input audio signal. The average level 105 may be determined by using a one pole filter. It is also disclosed that the average may be computed over a predetermined psychoacoustic timescale. A user input representing a desired dynamic range (a dynamic range tolerance or DRT) may be used to constrain the second dynamic range. The transfer function may be determined on the basis of the user input, and may be dynamically adjusted in response to changes in the noise floor of the listening environment. Fade-in and fade-out portions of an audio signal are maintained. A multi-band approach is also disclosed, so that different frequency regions of a signal are compressed by different amounts.

Description

DYNAMIC RANGE CONTROL
The present application is related to the Applicant's co-pending application entitled "Dynamic Range Control" filed on the same day, the contents of which are incorporated herein by reference in their entirety.
BACKGROUND
Dynamic range (for audio) generally describes the ratio of the softest sound to the loudest sound for a piece of audio, a musical instrument or piece of electronic equipment, and is measured in decibels (dB). Dynamic range measurements are used in audio equipment to indicate a component's maximum output signal and to rate a system's noise floor. For example, the dynamic range of human hearing, which is the difference between the softest and loudest sounds that a human can typically perceive, is around 120dB.
In a noisy listening environment, quiet sections of audio at the lower end of its dynamic range can be obscured by ambient noise. To prevent this, it is typical for the dynamics to be compressed during mastering so that the relative level of quiet and loud parts of the signal is made more similar. For example, modern audio, such as music or television audio, normally has a small dynamic range. By reducing the dynamic range of the signal, the audibility of the dynamics is reduced. Reducing the dynamic range is not optimal when it is desired to maximise the total audibility in all listening environments.
This requirement for the signal to be louder than the noise, but not so loud that it is uncomfortable, leads to the definition of the dynamic range tolerance (DRT) of a listening environment. The DRT alters depending on the listener's mood and requirements for the audio (for example, whether the audio is being used as background or for active listening). A larger dynamic range is associated with a greater difference between peak and root-mean-square (RMS) signal level. Therefore, in a better listening environment, a similarly greater difference between these is tolerated.
SUMMARY
According to an example, there is provided a method for adjusting dynamic range of an audio signal comprising providing an input audio signal with a first dynamic range, mapping the first dynamic range to a second dynamic range using a transfer function with a linear portion aligned to an average level of the input audio signal, and generating an output audio signal with the second dynamic range from the input audio signal. The average level of the input audio signal can be determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value.
The method can further comprise aligning the linear portion to the average level using a gain value to shift the transfer function with respect to the input audio signal. User input representing a dynamic range window can be used to substantially constrain the second dynamic range of the output audio signal. In an example, the transfer function is determined on the basis of the user input, and can be dynamically adjusted in response to changes in a noise floor of the listening environment. The measurement can be adjusted to account for the output audio signal. In an example, a fade-in or fade-out portion of the input audio signal is maintained. This can be by preserving a noise floor of the input audio signal.
According to an example there is provided a method for configuring the dynamic range of an output audio signal, comprising providing a dynamic range tolerance window, computing an average value for an input audio signal over a predetermined psychoacoustic timescale, using the average to generate a gain value to shift the dynamic range tolerance window, and using the input audio signal to generate the output audio signal, the output audio signal having a dynamic range substantially confined within the dynamic range tolerance window. In an example, the average level of the input audio signal can be determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value.
User input defining the dynamic range tolerance window can be received.
A fade-in or fade-out portion of the input audio signal can be maintained.
According to an example there is provided a system for processing an audio signal, comprising a signal processor to receive data representing an input audio signal, map the dynamic range of the input audio signal to an output dynamic range using a transfer function with a linear portion aligned to an average level of the input audio signal, and generate an output audio signal with the output dynamic range from the input audio signal. The average level of the input audio signal can be determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value. The signal processor is further operable to align the linear portion to the average level using a gain value to shift the transfer function with respect to the input audio signal. In an example, user input representing a dynamic range window for substantially constraining the dynamic range of the output audio signal can be received. A transfer function can be determined on the basis of user input. The signal processor can adjust the transfer function in response to changes in a noise floor of the listening environment, and can maintain a fade-in or fade-out portion of the input audio signal.
According to an example there is provided a computer program embedded on a non-transitory tangible computer readable storage medium, the computer program including machine readable instructions that, when executed by a processor, implement a method for adjusting dynamic range of an audio signal comprising receiving data representing a user selection for a dynamic range tolerance, determining a transfer function based on the dynamic range tolerance, processing an input audio signal to generate an output audio signal using the transfer function by maintaining an average level of the input audio signal within a range defined by the user selection.
BRIEF DESCRIPTION OF THE DRAWINGS
An embodiment of the invention will now be described, by way of example only, and with reference to the accompanying drawings, in which:
Figure 1 is a schematic block diagram of a method according to an example;
Figure 2 is a schematic representation of a transfer function according to an example;
Figure 3 is a schematic block diagram of an averaging method according to an example;
Figure 4 is a schematic block diagram of a method for processing a stereo signal according to an example;
Figure 5 is a schematic block diagram of a method according to an example;
Figure 6 is a schematic representation of the overall macro dynamics of a song according to an example;
Figure 7 is a schematic representation of the overall macro dynamics of the song of figure 6 following processing using a method according to an example; and
Figure 8 is a schematic block diagram of a device according to an example.
DETAILED DESCRIPTION
According to an example, there is provided an automatic dynamic range control method and system which provides a processed audio signal on the basis of a listener's DRT. Multiple layers of compression and dynamic range control operate to map an input signal to a desired DRT of a listener in a listening environment whilst performing a minimal amount of dynamic range compression. In an example, coefficients related to time scales over which compression can be varied are selected on the basis of psychoacoustic metrics. Accordingly, the scales are general to humans.
The DRT for a listener embodies a desired audio treatment in a listening environment, and is characterised by a dynamic range window giving a preferred average dynamic range region plus a dynamic range headroom region for an output audio signal. For a signal whose dynamic range is within the window characterising the DRT in the environment in which the signal is present, narrative and the main instruments in a piece of music for example can be easily heard and comprehended, and sudden disturbances in the form of loud effects, distortion and other such sounds do not affect the signal (inasmuch as the listener will typically not be inclined to desire a change in the level of volume of the signal as a result of the loud effects etc). If, however, the level of the signal fluctuates outside of the DRT window, there can be a tendency for a listener to seek to adjust the volume of the signal to compensate. This is typically because sounds will either appear too soft or too loud for the user.
Figure 1 is a schematic block diagram of a method according to an example. An input audio signal 101 can be any audio signal including a signal which is composed of music, spoken word/narrative, effects based audio or a combination of all three. For example, an input audio stream 101 can be a song, or a movie soundtrack. Input audio signal 101 has a first dynamic range 103 associated with it. The first dynamic range 103 represents the dynamic range of the input audio signal 101, and can be any dynamic range from zero. According to an example, an input dynamic range from an input audio signal 101 is not calculated. In block 105, the average level of the input audio signal 101 is determined. In an example, a running RMS of the signal 101 is computed using a selected averaging length.
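By way of illustration only (not part of the original disclosure), a minimal Python sketch of one way such a running level measure could be computed using a one pole average of the absolute signal; numpy, the function name and the 3 second averaging length are assumptions:

    import numpy as np

    def running_level(x, fs, avg_seconds=3.0):
        # One-pole (exponential) moving average of |x|: a simple running
        # level measure.  avg_seconds is a hypothetical averaging length;
        # the text only requires it to exceed a predetermined minimum.
        alpha = 1.0 - np.exp(-1.0 / (fs * avg_seconds))
        level = np.zeros(len(x))
        acc = 0.0
        for n, sample in enumerate(np.abs(x)):
            acc += alpha * (sample - acc)    # y[n] = y[n-1] + alpha * (|x[n]| - y[n-1])
            level[n] = acc
        return level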
In block 109, input is received representing a listening environment. The input can be received using a user interface (UI) which can provide multiple selectable options for a listening environment, at least. For example, an environment could be: cinema, home theatre, living room, kitchen, bedroom, portable music device, car, in-flight entertainment, each of which can have suitable selectable elements in the UI to enable a user to execute environmental dependent processing. In an example each of the environments has a different DRT associated with it which is related, amongst other things, to the noise floor of the environment in question. For example, the DRT for an in-flight entertainment environment will be smaller than that for a cinema environment due to differences in the noise floors associated with these environments as a result of ambient noise levels (the noise floor in an in-flight entertainment situation being relatively higher than that of the cinema environment for example).
In block 107 a transfer function is provided. The transfer function is determined using the input from block 109 representing the listening environment, and using the average level 105 of the input audio signal 101.
In an example, the transfer function 107 is used to map the first dynamic range 103 to a second dynamic range 111. An output audio signal 113 with the second dynamic range 111 is generated from the input audio signal 101.
Figure 2 is a schematic representation of a transfer curve according to an example. The transfer curve 201 has several portions depicted generally at 203, 205, 207 and 209, and is used to map a dynamic range value of an input audio signal (Input (dB)) to a dynamic range value for an output audio signal (Output (dB)). Accordingly, transfer curve 201 is a graphical representation of a transfer function 107. The transfer function 107 therefore defines how different signal levels are scaled or mapped. In an example, in order to minimise perceivable processing artifacts in an audio signal, the transfer curve in the region of the DRT for the listening environment in question is substantially linear; that is, signals are scaled substantially in direct proportion in region 207. The region 207 is therefore selected to coincide with a DRT window for an environment, such that an output signal has a dynamic range corresponding to the DRT of a listener in that environment.
Regions 205 and 209 correspond to regions of dynamic range control outside the DRT region 207. To confine signals to within the DRT region would require a limiter for an upper level control for region 209, and an aggressive expander for the lower level control for region 205. However, extreme transfer curves such as those of regions 205, 209 typically produce undesirable end results; that is, extreme upward expansion of a signal below the DRT region results in multiple zero-crossing distortions, which occur when the transfer curve has a discontinuity at zero. Accordingly, the signal will have discontinuities every time it crosses zero.
According to an example, in order to minimise the number of times that a signal is within the regions of dynamic range control (that is, when the signal is being modified in regions 205 and 209), the average level of the signal should lie within the DRT region 207 where the transfer curve is typically linear. To achieve this, a running RMS of an input audio signal is computed. According to an example, the RMS value is used to compute a gain value to shift the transfer function with respect to the input audio signal in order to align the linear portion to the average level of the input audio signal. Accordingly, the dynamic range of an output signal can be controlled so that the DRT of a user in a given listening environment is not exceeded (at either extreme) and the quality of the signal which is perceptible by the listener is not compromised. That is, by maintaining a level of dynamic range control in which signal changes are minimized as a result of an environment-dependent DRT shift, an output signal can be generated which improves a user experience within the sound environment in which they are listening.
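By way of illustration only (not part of the original disclosure), a minimal dB-domain sketch of such a piecewise transfer curve and of a gain that aligns the average level to the linear region; the piecewise shape, ratios and function names are assumptions:

    def transfer_out_db(in_db, lower_db, upper_db, expand_ratio=2.0, limit_ratio=10.0):
        # Piecewise transfer curve in dB: 1:1 inside the DRT window,
        # upward expansion below it, limiting above it (ratios are
        # illustrative, not taken from the description).
        if in_db < lower_db:
            return lower_db - (lower_db - in_db) / expand_ratio
        if in_db > upper_db:
            return upper_db + (in_db - upper_db) / limit_ratio
        return in_db

    def alignment_gain_db(average_db, lower_db, upper_db):
        # Gain that shifts the signal so its long-term average sits in the
        # middle of the linear (DRT) region of the curve.
        return 0.5 * (lower_db + upper_db) - average_db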
In an example, the average level of the input audio signal is determined using an RMS measure of the input audio signal with an averaging length greater than a predetermined minimum value. For example, the averaging length can be a time period which is greater than the typical memory time of humans for a perceived sound level. When exposed to a sound with a consistent level, and given time, listeners typically lose track of how loud or how quiet the sound is because there is no basis for reference. The strongest sense of the current loudness occurs at changes from one volume level to another, whereas the steady overall level contributes comparatively little to perceived loudness. Therefore, by setting an averaging time to be on the scale at which the brain tends to forget the volume level at the beginning of an interval, the effect of changes on the overall level of the signal will be slow enough for listeners not to perceive what is happening.
For times shorter than this, the transfer curve ensures that the dynamic range of the signal is within tolerance. According to an example, an averaging time of the order of several seconds to several minutes or more can be used. Averaging time can vary depending on user input relating to a DRT. For example, a user input representing a larger DRT can have a slower rate of change. Expansion and limiting typically hide the rate of change for smaller selected DRT sizes, but this will also decrease how hard a limiting region is working, especially for small DRT ranges.
When the input audio has an RMS that lies within region 203, a very large gain would be produced, which tends to infinity as the signal RMS tends to zero. To ensure this does not happen and to ensure that quiet sections of the input audio are not processed to be higher in volume than the sections that should be high in volume, the averaging happens in two steps.
Figure 3 is a schematic block diagram of an averaging method according to an example. Initially, an input audio signal 101 is averaged over a short timescale, such as of the order of a second. In block 303, if the value computed for the short scale average implies that for that time the signal would be inaudible (even in an ideal listening environment), then it is deemed that these parts of the signal should not be expanded. A new function of time is therefore defined which takes a cut-off value, such as 0.003, when the average falls below a minimum threshold, and otherwise takes the value of the average of the signal over the past second at time t. The cut-off can be an adaptive, signal-dependent value based upon the measured noise floor of the input audio for example.
In block 305, the new function is averaged over a predetermined psychoacoustic timescale and used to define a gain value 307.
Accordingly, the playback level will be low for fade-outs, so that the sound will emerge from inaudible, just as it does in a mastering house for example.
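By way of illustration only (not part of the original disclosure), a minimal sketch of this two-stage average driving a gain; numpy, the 30 second long average and the target level are assumed values:

    import numpy as np

    def driving_gain(x, fs, cutoff=0.003, short_s=1.0, long_s=30.0, target_rms=0.1):
        # Two-stage average: a ~1 s short-term level, floored at a cut-off
        # so inaudible passages are not expanded, then a long ("memory
        # rate") average that defines the driving gain.
        short_n = max(1, int(fs * short_s))
        short_avg = np.convolve(np.abs(x), np.ones(short_n) / short_n, mode="same")
        floored = np.maximum(short_avg, cutoff)      # cut-off for inaudible sections
        long_n = max(1, int(fs * long_s))
        long_avg = np.convolve(floored, np.ones(long_n) / long_n, mode="same")
        return target_rms / long_avg                 # gain driving the expansion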
Upward expansion (region 205 of figure 2) is difficult to achieve musically without significant look-ahead (i.e. knowing what the signal will be in the future). Such extreme expansion can result in the signal overshooting the desired threshold for short periods of time unless rapid gain correction is used. However, rapid gain changes create undesirable distortions.
According to an example, extreme levels of upward expansion are achieved by separately processing the signal in two different ways that, when summed together, give the required expansion. This signal is then limited (region 209 in figure 2) in a similar way to achieve sound within the DRT region 207.
In an example, upward expansion of an audio signal can be achieved by compressing the dynamic range to zero and setting the playback level to be at the lower threshold. Accordingly, for any input level, the signal will be at least at the lower threshold.
Another copy of the audio can then be added at the correct level so that the signal RMS rises above the lower threshold and towards the upper threshold. By applying a similar process in the limiting region (region 209), a signal within the DRT can be obtained. The extreme compression needed to create a zero dynamics version of an input signal is in general masked by the second signal added on top. In an example, the playback level of this zero dynamics signal is at the level of ambient noise. Thus, if distortion harmonics created by compression have an amplitude below the amplitude of the signal being compressed (which is at the noise floor level), the distortions will be masked by the listening environment and therefore be inaudible.
For stereo processing, two input channels (left and right) are turned into four input channels according to an example: left, right, mid (the sum of left and right), and side (the difference between left and right). The four input channels (feeds) are processed independently of each other, except for the overall averages which define the overall driving gains for the expansion and memory rate feeds. In an example, these are taken as the average of the left, right, mid and side levels post filtering. Before limiting, the mid and side feeds are turned into left and right feeds and combined in equal measure with the processed left and right feeds. In an example, the left and right channels are then limited independently of each other.
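By way of illustration only (not part of the original disclosure), a minimal sketch of the four-feed split and the mid/side recombination; the 0.5 recombination factor is an assumption to undo the sum and difference:

    def split_feeds(left, right):
        # Four feeds: left, right, mid (sum) and side (difference).
        return left, right, left + right, left - right

    def recombine_mid_side(mid, side):
        # Turn processed mid/side feeds back into left/right before
        # limiting; the 0.5 factor undoes the sum/difference.
        return 0.5 * (mid + side), 0.5 * (mid - side)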
Figure 4 is a schematic block diagram of a method for processing a stereo signal according to an example. User input representative of a listening environment is provided via a UI in block 109. A DRT 401 can be selected on the basis of the selected listening environment. Accordingly, multiple different DRT metrics can be provided which map to respective different listening environments. For example, where the selected listening environment is a cinema, the DRT metric can provide a preferred average dynamic range window from around -38dB to 0dB, and a dynamic range headroom (peak) from around 0dB to +24dB. An in-flight entertainment listening environment can provide a preferred average dynamic range window from around -6dB to 0dB, and a headroom from around 0dB to +6dB. Other alternatives are possible. DRT metrics can be stored in a database 400. That is, a selected listening environment can map to a DRT metric from database 400 which provides the DRT 401.
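By way of illustration only (not part of the original disclosure), a sketch of how such a database of DRT metrics might be keyed by listening environment; the keys, structure and any values beyond the two examples above are assumptions:

    # Hypothetical DRT database keyed by listening environment (values in dB,
    # loosely following the two examples given in the description).
    DRT_METRICS = {
        "cinema":    {"avg_window": (-38.0, 0.0), "headroom": (0.0, 24.0)},
        "in_flight": {"avg_window": (-6.0, 0.0),  "headroom": (0.0, 6.0)},
        # further environments (car, kitchen, bedroom, ...) would go here
    }

    def drt_for(environment):
        # Look up the DRT metric selected via the UI.
        return DRT_METRICS[environment]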
In an example, input from a UI in block 109 can be in the form of input representing multiple sliding scale values which can be used to define a DRT metric. That is, a user can use a UI to select values for a preferred average dynamic range window and a dynamic range headroom. Such a selection can be executed by a user entering specific values using a sliding scale (or otherwise, such as raw numeric entry for example), or by using an interface which allows easy selection of values, such as a sliding scale which provides only a visual representation for a DRT metric. In the latter case, the actual values selected for a DRT metric may be unknown to a user, as they may simply use a UI element to provide a range within which they wish to constrain an audio signal for example.
An input audio signal 101 is provided, and both signal 101 and DRT 401 are input to blocks 403 and 405. Block 403 is a pre-processing filter which applies a gain value to each of the left, right, mid and side channels of the input signal 101. In an example, the pre-processing filter can be a K-filter which includes two stages of filtering: a first-stage shelving filter and a second-stage high pass filter. In block 405, zero dynamic range and playback level at lower threshold processing occurs on the left, right, mid and side channels of signal 101. In block 407 the processed signals from blocks 403 and 405 can be combined, and converted back to left and right channel signals only in block 409.
According to an example, the signal feed used for expansion is averaged with a relatively short average (of the order of ~2.4 seconds for instance) and is used to define a gain which, when applied to the original signal, produces a signal that has a constant RMS of 1 for the same averaging time. This constant signal 406 is the output for the first set of processing on the second signal stream from block 405. Similarly, the memory rate signal from the first feed from block 403 is referred to as 404. According to an example, this signal still needs further compression, which is achieved as described below. The signal is finally scaled by a value which places it at the bottom of the DRT. This is done to maintain values near the number 1, which minimises discretisation error.
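By way of illustration only (not part of the original disclosure), a minimal sketch of how the constant-RMS expansion feed could be formed and placed at the bottom of the DRT; numpy, the square-law RMS and the -16dB threshold are assumptions:

    import numpy as np

    def zero_dynamics_feed(x, fs, avg_s=2.4, lower_db=-16.0):
        # Expansion feed: a ~2.4 s running RMS is used to normalise the
        # signal to a constant RMS of 1, which is then scaled to sit at
        # the bottom of the DRT.
        n = max(1, int(fs * avg_s))
        rms = np.sqrt(np.convolve(x ** 2, np.ones(n) / n, mode="same"))
        rms = np.maximum(rms, 1e-6)                    # guard against silence
        constant = x / rms                             # "zero dynamics" signal
        return constant * (10.0 ** (lower_db / 20.0))  # place at the lower DRT threshold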
A digital hard clipper (whereby the signal is simply set to a certain threshold value when it goes beyond it) applies a gain reduction for the shortest amount of time, and uses the exact level of gain reduction required to ensure the signal never exceeds the limit. Accordingly, when the signal is within the limit, a clipper has no effect. However, due to rapid changes in the gain caused by a digital hard clipper, the level of distortion harmonics can be too strong and of an unpleasant, unmusical character (unless an aggressive, painful, hard hitting sound is the desired goal). Smoothing the transfer curve provides smoother distortion harmonics, even though a small amount of compression is then applied when it does not need to be, that is, even when the signal is below the threshold. According to an example, a different method is used.
Figure 5 is a schematic block diagram of a method according to an example. A clipped version 501 of 406 divided by 406 is defined as a gain reduction envelope (GRE) 503 according to an example. The GRE, if multiplied with the original signal, gives the clipped signal. According to an example, the GRE can be smoothed in time by averaging it over a certain timescale. If the original signal is a continuous tone (i.e. a sine wave with constant amplitude), then the smoothed GRE will be approximately a flat line provided the averaging is done over a sufficiently large timescale.
Therefore, multiplying 406 with the smoothed GRE would simply have the effect of scaling it so that its peak is at the threshold. If the signal varies in time in such a way that compression is needed initially, but not later (a transient signal that constantly decreases in amplitude), compression would fade away on the timescale of the averaging of the GRE. However, once the signal drops below the threshold, the smoothed GRE will take a moment to respond. This will mean that after a transient sound there will be a moment of lower amplitude, giving rise to an effect known as 'pump'.
In order to minimise distortions, the GRE is smoothed with multiple single pole low pass filters. In an example, the GRE is smoothed at the aural reflex relaxation rate of ~0.63Hz using four identical single pole low pass filters. The aural reflex relaxation time is the amount of time it typically takes for the muscles which contract when a loud sound is incident upon the ear to relax. This is a useful psychoacoustic timescale as the ear-brain system learns to correct sounds which are heard when the aural reflex occurs; thus, altering sound at this timescale tricks the brain into thinking its aural reflex has relaxed, which implies that the preceding sound was loud.
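By way of illustration only (not part of the original disclosure), a minimal sketch of a GRE and of this four-stage one-pole smoothing; numpy and the function names are assumptions:

    import numpy as np

    def gain_reduction_envelope(x, threshold):
        # GRE: below the threshold the envelope is 1 (no gain reduction);
        # above it, the value that brings the sample exactly to the
        # threshold, so x * GRE equals the hard-clipped signal.
        mag = np.abs(x)
        return np.where(mag > threshold, threshold / np.maximum(mag, 1e-12), 1.0)

    def smooth_gre(gre, fs, cutoff_hz=0.63, stages=4):
        # Four identical one-pole low-pass filters, tuned here to the
        # aural reflex relaxation rate (~0.63 Hz) described in the text.
        alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / fs)
        out = np.array(gre, dtype=float)
        for _ in range(stages):
            acc = out[0]
            for n in range(len(out)):
                acc += alpha * (out[n] - acc)
                out[n] = acc
        return out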
When driven with a steady state sine wave, the filtered GRE does not typically go to a small enough value to achieve limiting. According to an example, a level correction for steady state 603 is therefore applied to the smoothed GRE so that it does so. This correction is derived from the average level of gain reduction relative to the required minimum level. This correction is pre-calculated and applied using a polynomial. Therefore, even after smoothing the GRE with a single pole filter, steady state sounds peaking over the threshold reduce the gain by the amount needed to limit the signal without any clipping.
Put another way, the GRE created to limit steady state sounds does not typically provide sufficient gain reduction to cause limiting post filtering, unless the steady state sound is a digital square wave for example.
Because of this, the GRE is processed further in an example. The processing alters the GRE for any driving signal to be similar to that created by a square wave of the same amplitude. To achieve this, the lowest value of the GRE is held until the input signal used to define the GRE goes through a zero crossing point (a sample at which the sign of the signal flips from positive to negative or negative to positive). At the zero crossing points, the hold of the minimum is reset to the current GRE value. The result is that the GRE is altered to be more comparable to that formed from a square wave (and is identical for the portion of the wavelet after the minimum in the GRE has occurred). The GRE may still provide insufficient gain reduction to cause limiting for all steady state sounds. In an example, a correction polynomial can therefore be applied to the altered GRE so that, post filtering, sine tones are limited properly. This typically leaves triangle waves and most impulse trains mildly under compressed, with square waves mildly over compressed. However, the deviation in gain reduction is significantly less than if the polynomial required in this instance is applied without the 'hold until zero crossing point' alteration.
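By way of illustration only (not part of the original disclosure), a minimal sketch of the 'hold until zero crossing' step; numpy and the function name are assumptions:

    import numpy as np

    def hold_until_zero_crossing(gre, x):
        # Hold the lowest GRE value seen so far and reset the hold each
        # time the driving signal crosses zero, making the envelope more
        # comparable to that produced by a square wave of the same amplitude.
        out = np.array(gre, dtype=float)
        held = out[0]
        for n in range(1, len(out)):
            if x[n - 1] * x[n] < 0:      # sign flip: zero crossing, reset the hold
                held = out[n]
            else:
                held = min(held, out[n])
            out[n] = held
        return out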
The points in time where the zero crossing points take place are affected by the presence of DC in the signal. Because of this, frequencies below 14Hz can be removed using a high pass filter before any processing is performed in an example.
Typically, there are sounds present in most signals which have volume envelopes that vary faster than 0.63Hz. Accordingly, a new fundamental GRE of the signal is formed. According to an example, this GRE is smoothed with another four identical single pole low pass filters tuned to ~2.3Hz, which is a temporal masking rate, instead of ~0.63Hz. The pump effect mentioned previously occurs similarly with uncompressed sounds due to a psychoacoustic phenomenon known as temporal masking.
Temporal masking is when a low amplitude sound is inaudible due to a preceding high amplitude sound. The lack of audibility is perceived as quiet, so giving a similar effect to pump. Thus, pump can trick the brain into thinking this current sound was preceded by a loud sound, making the previous sound appear louder than its amplitude alone would suggest.
Smoothing the GRE on a timescale similar to that of temporal masking will therefore result in a signal which the brain perceives similarly to the uncompressed one, making the required levels of compression more acceptable.
The distortion harmonics produced with this limiter would be more audible than with the first, slower limiter, but because the slower limiter has come first, the faster limiter will perform less compression than if it were used on its own. This rate of compression is still too slow to catch transients, however. Therefore, a 'fast' limiter is applied to the signal resulting from the second layer of limiting. According to an example, the low pass filters on this third limiter's GRE are tuned to 14Hz. The 'roughness' caused by the beating of two frequencies differing by 14Hz or more begins to be perceived by humans, until the difference in frequency is so great that it is perceived as two separate tones. Compressing at a rate faster than 14Hz leads to an added roughness to the sound, whereas compressing slower than or at this rate only changes the dynamic character rather than the tonal character. As a result, there are no audible distortions without comparison, such as listening to the original sound and the distorted one side by side repeatedly. After this 'third limiter' the signal is very compressed.
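Pulling these layers together, by way of illustration only (not part of the original disclosure), a sketch of the three-layer cascade; it reuses the gain_reduction_envelope, hold_until_zero_crossing and smooth_gre sketches above, and the single shared threshold is an assumption:

    def three_layer_limit(x, fs, threshold, rates_hz=(0.63, 2.3, 14.0)):
        # Cascade of the three limiting layers: each layer builds a GRE,
        # holds it to the next zero crossing, smooths it at a progressively
        # faster psychoacoustic rate and applies it, so the faster layers
        # have less work left to do.
        y = x
        for rate in rates_hz:
            gre = gain_reduction_envelope(y, threshold)
            gre = hold_until_zero_crossing(gre, y)
            gre = smooth_gre(gre, fs, cutoff_hz=rate)
            y = y * gre
        return y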
Typically, most musical material is not highly transient in nature, and the dynamic range is typically much less than 6dB. By setting the overall average of the signal to be at the threshold, the compression is therefore always taking place. The compression does not alter the tone however, and so the result is that the signal is typically less than 3dB away from being at the noise floor of the listening environment at all times.
Although the RMS level of a signal is the largest factor in its perceived loudness, some frequencies are perceived as louder than others due to a plethora of factors. A K-filter, as described above, has been shown to typically offer a more accurate map of the input signal to loudness: averaging a signal that varies in its frequency content after such filtering leads to a value that tracks more closely how a constant, frequency-balanced sound (e.g. shaped noise) sounds louder or quieter when varied by the same number of dB. Filtering before averaging therefore gives a better guide to how the signal will be perceived in loudness.
In an example, the signal resulting from the 14Hz limiter is at the volume level of the noise floor, and is added to the signal 404. Because the processing on the two feeds of figure 4 has not altered phase, the feeds add constructively. Therefore, on summing the signals, the result will almost always be above the noise floor and thus is assumed to be always audible (even if only just). According to an example, this summed signal is now limited so that the high volume parts of the signal never exceed the dynamic range tolerance (or a DAC output level). The second feed (404) is of a higher average volume than the compressed (14Hz limited) version and thus masks the distortions in it. The result is a rich, full sound with improved depth, which is only normally present in the mastering studio.
According to an example, the same three layer limiting technique is used in the final output limiting stage. However, in order to capture the remaining peaks without buffering a short sequence of samples that are about to be played ("look-ahead"), a clipper can be used. As discussed before, simply clipping the signal adds unwanted distortions. Therefore, a compromise is made to keep the processing as close to real time as possible while producing an acceptable level of distortions.
When two signals are multiplied together in the linear time domain, the result is a signal which contains the sum and the difference of the two frequencies. Therefore, multiplication of a low frequency tone with a high frequency tone will produce two tones close to the original high frequency tone. Because the gain changes a clipper makes are very rapid, the GRE of a clipper has a very wide frequency content, and so a large number of distortion products are created across the entire frequency spectrum.
Typically, the human ear hears best near 3 kHz. Typically, most of the energy in music resides in frequencies which are very small compared to 3 kHz, and so the resulting distortions are near 3 kHz, which is undesirable.
Thus, if the frequency content of the GRE can be reduced in amplitude in the frequency range where the human ear hears best, the audibility of the distortions will be lower and thus the result will be more pleasant on the ear.
In an example, by filtering the GRE with a finite impulse response (FIR) filter rather than an infinite impulse response (IIR) filter, the signal, after multiplication with the filtered GRE, will not go above unity. A FIR filter consists of a set of coefficients which multiply the past and present input samples. These are then summed to give the output. The number of past input samples used defines the tap count: a 16 tap filter, as used in an example, uses the past 15 samples and the current sample. Typically, limiting occurs, but the frequency content of the filtered GRE will mean that the distortions produced by the smoothed clipper will be in the frequency regions where the ear is insensitive, i.e. at frequencies which are significantly higher or lower than 3kHz.
A FIR filter capable of attenuating 3kHz requires enough delay (look-ahead) to do so. At a sampling rate of 44.1kHz (which is used in CDs and most other consumer audio formats), a filter of length 16 samples leads to a resolution of 2.756kHz. In an example, an elliptic filter is used as it has good distortion-reducing characteristics when the first notch is set to the lowest frequency which can be attenuated for this filter length, that is, typically 2.756kHz. The filter also mildly attenuates the high frequencies in a 16 tap implementation. An average filter has a lower computational load while being similar to an elliptic filter, and can be used in CPU-critical implementations in an example.
To ensure that limiting still occurs, the GRE is 'held' at the lowest local value for 16 samples and then tails off as if the hold was not present (but including the delay). The filter is designed by taking the filter with the desired characteristics and then making the coefficients positive only by subtracting the smallest coefficient value. Applying the modified filter to the GRE will now only produce positive values. By adding the coefficients together and dividing each coefficient by this total, a filter is obtained where the sum of the coefficients is unity. Therefore, if the filter is applied to a flat line of the length of the filter (the held value), the value of the filter at the end of the flat line is that same value. Thus, the filter will ensure limiting.
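By way of illustration only (not part of the original disclosure), a minimal sketch of this coefficient conditioning and of the held-minimum FIR smoothing; numpy, the function names and the generic coeffs argument are assumptions:

    import numpy as np

    def positive_unity_fir(coeffs):
        # Make the FIR usable for limiting: shift all coefficients to be
        # positive (subtract the smallest), then normalise them so they
        # sum to one; a held (flat) GRE then passes through unchanged.
        c = np.asarray(coeffs, dtype=float)
        c = c - c.min()
        return c / c.sum()

    def smoothed_clipper_gre(gre, coeffs, hold=16):
        # Hold the local minimum of the GRE for the filter length (16
        # samples here), then filter with the positive, unity-sum FIR.
        gre = np.asarray(gre, dtype=float)
        held = np.array([gre[max(0, n - hold + 1): n + 1].min()
                         for n in range(len(gre))])
        fir = positive_unity_fir(coeffs)
        return np.convolve(held, fir, mode="full")[: len(gre)]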
The result is a psychoacoustic smooth look-ahead limiter which allows for levels of limiting of signals many dB higher than that bearable with generic hard clipping. When combined with the previous 'three layers of limiting', very high levels of total gain reduction are acceptable.
One should note that the 'GRE hold' process also smooths the GRE and alters its frequency distribution similarly to a low pass filter. The frequency response is similar to a sinc function tuned to 2.75kHz at the first notch.
The result is that for frequencies above 3kHz the limiting is very smooth sounding, meaning that, for example, hi-hats and the top frequencies of a snare crack are very pleasantly limited.
Another advantage of this FIR based approach with a filter that is as short as possible, is that limiting occurs for the shortest acceptable time, which leads to the highest possible overall RMS level. This is in fact higher than musically achievable with hard clipping as more gain reduction can be applied with the FIR smoothed approach before it becomes unacceptably unpleasant. This allows the entire dynamic range available within the DRT of the environment to be utilised to its fullest and allows audio equipment with limited peak output to achieve greater perceived loudness.
Figure 6 is a schematic representation of the overall macro dynamics of a song. As generally depicted by 601, the song starts quiet and crescendos, then jumps to a constant high level. It then jumps to a quieter section, and after this the music jumps to a high volume section which is roughly the same volume as that before, before jumping to a very high level denoted generally by 603. After this 'big finish' the music jumps to a very quiet section before fading away to dither noise at 605.
Consider that this song is being listened to in a car. The dynamic range tolerance thresholds are -7dBFS rms for the upper limit, with the lower threshold being -16dBFS rms. The DRT is thus only 9dB, which is significantly smaller than that of the input music, which is typically ~24dB.
Figure 7 is a schematic representation of the overall macro dynamics of the song of figure 6 following processing using a method according to an example.
Assuming that no other tracks were playing before this song started, the very slow 'memory rate' average is zero at the start of the song. Once the track starts, the RMS builds and the gain falls from zero to a more correct value, so that by the time the song has reached half way through the first loud section the level has effectively settled. The expansion feed has taken the input and squashed it to the lower threshold of the DRT.
Once the loud section begins, the level of the input from the 'memory rate' gain movement is similar to that of the lower DRT threshold. The two levels add to give an overall level of -10dB, which is just above the middle of the DRT range. Note, though, how the overall level has jumped up by ~6dB at the start of this new section, a level of deviation not too dissimilar to that of the uncompressed version.
As the track continues through the first loud section, denoted generally by 701, the RMS level grows and the output level of the second feed before the sum and limiter falls, so that by the end of that section the level has fallen to the middle of the DRT, at -11.5dB. Note that the rate at which this has happened is so slow that almost all listeners will not notice that the level was not constant. When the first quiet section, 703, comes at the end of the first loud section 701, the level will drop to the bottom of the DRT, but will still be audible at all times; by the end of the quiet section the level will have risen slightly towards the middle of the DRT.
At the jump to the second loud section, 705, the level will jump to the top limit of the DRT and will be hitting the limiter at the end of the chain hard; the result will be a compressed sound, but it will be loud and with the minimal possible distortions. As the section continues, the RMS increases so the level is reduced. This means that when the very loud section hits there is still a level jump up back to maximum compression. Through this section the level falls back towards the middle of the DRT, and then jumps down to the bottom of the DRT as the ending quiet section, 707, begins. The level rises and then falls with the fade, getting closer and closer to the lower level of the DRT, but with details of the fade brought forward. Providing that the fade is slower than the 'memory average' level control, the fade will appear to keep happening, even if only due to a reduction in SNR and at a rate of 0.1dB/s rather than 1dB/s for example.
According to an example, the system and method described above have generally been described with reference to a single band, and using a fixed level as the noise floor which is defined by user selection of the noise environment using a UI. In an example, a built in microphone of a portable player (or any other playback equipment) can be used to measure the noise floor of the environment continuously, thereby allowing the DRT to dynamically adjust to that of the listening environment.
In an example, a multiband approach with noise floors of each band would allow music to be changed in tone so that different frequency regions of a signal are compressed by respective different amounts. Accordingly, the perceived tone in the listening environment would remain the same as that within a poor listening environment. A multiband approach could enhance the quality of music in environments with large amounts of low frequency rumble, such as in cars or planes for example.
Figure 8 is a schematic block diagram of an apparatus according to an example suitable for implementing any of the system or processes described above. Apparatus 800 includes one or more processors, such as processor 801, providing an execution platform for executing machine readable instructions such as software. Commands and data from the processor 801 are communicated over a communication bus 399. The system 800 also includes a main memory 802, such as a Random Access Memory (RAM), where machine readable instructions may reside during runtime, and a secondary memory 805. The secondary memory 805 includes, for example, a hard disk drive 807 and/or a removable storage drive 830, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, etc., or a nonvolatile memory where a copy of the machine readable instructions or software may be stored. The secondary memory 805 may also include ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM). In addition to software, data representing any one or more of an input audio signal, output audio signal, transfer function, average value for an audio signal and so on may be stored in the main memory 802 and/or the secondary memory 805. The removable storage drive 830 reads from and/or writes to a removable storage unit 809 in a well-known manner.
A user can interface with the system 800 with one or more input devices 811, such as a keyboard, a mouse, a stylus, and the like in order to provide user input data. The display adaptor 815 interfaces with the communication bus 399 and the display 817 and receives display data from the processor 801 and converts the display data into display commands for the display 817. A network interface 819 is provided for communicating with other systems and devices via a network (not shown). The system can include a wireless interface 821 for communicating with wireless devices in a wireless community.
It will be apparent to one of ordinary skill in the art that one or more of the components of the system 800 may not be included and/or other components may be added as is known in the art. The system 800 shown in figure 8 is provided as an example of a possible platform that may be used, and other types of platforms may be used as is known in the art.
One or more of the steps described above may be implemented as instructions embedded on a computer readable medium and executed on the system 800. The steps may be embodied by a computer program, which may exist in a variety of forms both active and inactive. For example, they may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats for performing some of the steps. Any of the above may be embodied on a computer readable medium, which includes storage devices and signals, in compressed or uncompressed form. Examples of suitable computer readable storage devices include conventional computer system RAM (random access memory), ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes. Examples of computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running a computer program may be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that those functions enumerated above may be performed by any electronic device capable of executing the above-described functions.
According to an example, an input audio signal 805 and an output audio signal 805 can reside in memory 802, either wholly or partially.

Claims (1)

CLAIMS
What is claimed is:
1. A method for adjusting dynamic range of an audio signal comprising: providing an input audio signal with a first dynamic range; mapping the first dynamic range to a second dynamic range using a transfer function with a linear portion aligned to an average level of the input audio signal; and generating an output audio signal with the second dynamic range from the input audio signal.
2. A method as claimed in claim 1, wherein the average level of the input audio signal is determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value.
3. A method as claimed in claim 1 or 2, further comprising aligning the linear portion to the average level using a gain value to shift the transfer function with respect to the input audio signal.
4. A method as claimed in any preceding claim, further comprising: receiving user input representing a dynamic range window for substantially constraining the second dynamic range of the output audio signal.
5. A method as claimed in claim 4, wherein the transfer function is determined on the basis of the user input.
6. A method as claimed in any preceding claim, wherein the transfer function is dynamically adjusted in response to changes in a noise floor of the listening environment.
7. A method as claimed in claim 6, wherein the measurement is adjusted to account for the output audio signal.
8. A method as claimed in any preceding claim, wherein a fade-in or fade-out portion of the input audio signal is maintained.
9. A method as claimed in claim 8, wherein maintaining a fade-in or fade-out includes preserving a noise floor of the input audio signal.
10. A method for configuring the dynamic range of an output audio signal, comprising: providing a dynamic range tolerance window; computing an average value for an input audio signal over a predetermined psychoacoustic timescale; using the average to generate a gain value to shift the dynamic range tolerance window; and using the input audio signal to generate the output audio signal, the output audio signal having a dynamic range substantially confined within the dynamic range tolerance window.
11. A method as claimed in claim 10, wherein the average level of the input audio signal is determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value.
12. A method as claimed in claim 10 or 11, further comprising: receiving user input defining the dynamic range tolerance window.
13. A method as claimed in any of claims 10 to 12, wherein a fade-in or fade-out portion of the input audio signal is maintained.
14. A system for processing an audio signal, comprising: a signal processor to: receive data representing an input audio signal; map the dynamic range of the input audio signal to an output dynamic range using a transfer function with a linear portion aligned to an average level of the input audio signal; generate an output audio signal with the output dynamic range, from the input audio signal.
15. A system as claimed in claim 14, wherein the average level of the input audio signal is determined using a one pole low pass filter in combination with an absolute sum and average of the input audio signal with an averaging length greater than a predetermined minimum value.
16. A system as claimed in claim 14 or 15, the signal processor further operable to align the linear portion to the average level using a gain value to shift the transfer function with respect to the input audio signal.
17. A system as claimed in any of claims 14 to 16, further comprising: receiving user input representing a dynamic range window for substantially constraining the dynamic range of the output audio signal.
18. A method as claimed in claim 14, wherein the transfer function is determined on the basis of user input.
19. A system as claimed in claim 18, the signal processor to adjust the transfer function in response to changes in a noise floor of the listening environment.
20. A system as claimed in any of claims 14 to 19, the signal processor to maintain a fade-in or fade-out portion of the input audio signal.
21. A computer program embedded on a non-transitory tangible computer readable storage medium, the computer program including machine readable instructions that, when executed by a processor, implement a method for adjusting dynamic range of an audio signal comprising: receiving data representing a user selection for a dynamic range tolerance; determining a transfer function based on the dynamic range tolerance; processing an input audio signal to generate an output audio signal using the transfer function by maintaining an average level of the input audio signal within a range defined by the user selection.
22. A method substantially as hereinbefore described with reference to the accompanying drawings.
23. A system substantially as hereinbefore described with reference to and as shown in the accompanying drawings.
GB201116348A 2011-09-22 2011-09-22 Dynamic range control Withdrawn GB2494894A (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
GB201116348A GB2494894A (en) 2011-09-22 2011-09-22 Dynamic range control
KR1020147007801A KR20140067064A (en) 2011-09-22 2012-09-21 Dynamic range control
CN201280046326.7A CN103828232A (en) 2011-09-22 2012-09-21 Dynamic range control
US14/345,614 US20140369527A1 (en) 2011-09-22 2012-09-21 Dynamic range control
PCT/GB2012/052339 WO2013041875A2 (en) 2011-09-22 2012-09-21 Dynamic range control
EP12778773.7A EP2759057A2 (en) 2011-09-22 2012-09-21 Dynamic range control
IN2621CHN2014 IN2014CN02621A (en) 2011-09-22 2014-04-07

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB201116348A GB2494894A (en) 2011-09-22 2011-09-22 Dynamic range control

Publications (2)

Publication Number Publication Date
GB201116348D0 GB201116348D0 (en) 2011-11-02
GB2494894A true GB2494894A (en) 2013-03-27

Family

ID=44937658

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201116348A Withdrawn GB2494894A (en) 2011-09-22 2011-09-22 Dynamic range control

Country Status (1)

Country Link
GB (1) GB2494894A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5892834A (en) * 1997-06-30 1999-04-06 Ford Motor Company Audio level dynamic range compression
US20060018493A1 (en) * 2004-07-24 2006-01-26 Yoon-Hark Oh Apparatus and method of automatically compensating an audio volume in response to channel change
GB2429346A (en) * 2006-03-15 2007-02-21 Nec Technologies User-selectable limits in audio level control
US20080007332A1 (en) * 2006-07-10 2008-01-10 Apple Computer, Inc. Multiband dynamic range control graphical interface
US20100046765A1 (en) * 2006-12-21 2010-02-25 Koninklijke Philips Electronics N.V. System for processing audio data
WO2010093917A2 (en) * 2009-02-13 2010-08-19 University Of Florida Research Digital sound leveling device and method to reduce the risk of noise induced hearing loss


Also Published As

Publication number Publication date
GB201116348D0 (en) 2011-11-02

Similar Documents

Publication Publication Date Title
US11296668B2 (en) Methods and apparatus for adjusting a level of an audio signal
US20140369527A1 (en) Dynamic range control
US20090097676A1 (en) Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US10128809B2 (en) Intelligent method and apparatus for spectral expansion of an input signal
GB2494894A (en) Dynamic range control

Legal Events

Date Code Title Description
COOA Change in applicant's name or ownership of the application

Owner name: EARSOFT LIMITED

Free format text: FORMER OWNER: STEPHEN BALDWIN

WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)