WO2010127024A1 - Controlling the loudness of an audio signal in response to spectral localization - Google Patents


Info

Publication number
WO2010127024A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
loudness
broadband
measure
response
Prior art date
Application number
PCT/US2010/032807
Other languages
French (fr)
Inventor
Michael J. Smithers
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation
Priority to US13/265,691 (patent US8761415B2)
Publication of WO2010127024A1

Links

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G9/00 Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/02 Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
    • H03G9/025 Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers frequency-dependent volume compression or expansion, e.g. multiple-band systems
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G9/00 Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/005 Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals

Definitions

  • This invention relates to audio signal processing.
  • the invention relates to modifying the loudness of an audio signal by measuring the weighted broadband level of the audio signal and modifying that weighted broadband level as a function of a spectral localization estimate of the audio signal.
  • a method for controlling the loudness of an audio signal comprises receiving an audio signal, generating an estimate of the spectral localization of the audio signal, generating a broadband measure of loudness of the audio signal, modifying said broadband measure of loudness in response to an estimate of the spectral localization of the audio signal, and modifying the broadband level of the audio signal in response to the modified broadband measure of loudness.
  • Generating an estimate of the spectral localization of the audio signal may include determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed, which may, in turn, include dividing the audio signal into a plurality of frequency bands and generating a scaling factor in response to the relative level in two frequency bands.
  • the modified broadband measure of loudness may be temporally smoothed and the broadband level of the audio signal may be modified in response to the smoothed modified broadband measure of loudness.
  • the temporal smoothing may have one or more time constants useful for syllabic speech processing.
  • the two frequency bands may be the bands having the second largest and largest level power values.
  • the two frequency bands may be the two lowest frequency bands.
  • Generating a broadband measure of loudness of the audio signal may determine the broadband measure of loudness of the audio signal after its processing by a weighting filter.
  • the broadband measure of loudness and the level in two or more of the frequency bands may each be based on short-term levels.
  • In FIGS. 1 and 2, essentially identical devices and functions bear the same reference numeral when they appear in multiple figures. Modified devices and functions are distinguished by a prime (') or double prime (") symbol.
  • FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal according to aspects of the present invention.
  • the arrangement receives the audio signal and applies it to two paths: a control path and a signal path.
  • the loudness of the audio signal is measured in a Measure Loudness device or function 2.
  • the resulting measure of loudness is applied to a Modify Loudness device or function 4 that uses the measure of loudness and one or more "Loudness Modification Parameters" to create a gain value that when multiplied with the original audio signal, as in multiplier 8, produces a modified audio signal having a desired modified loudness.
  • the audio signal in the signal path may be delayed by a delay 6 to match latency incurred in Measure Loudness 2 and Modify Loudness 4.
  • the arrangement may operate uninterruptedly, intermittently, or just once for a finite length audio signal. For the examples described herein, the audio signal may be processed intermittently in consecutive time interval blocks of approximately 5 to 20 milliseconds; however, such a block length is not critical to the invention.
  • Modify Loudness 4 includes a loudness versus gain characteristic that outputs a gain value in response to the input loudness measure and one or more loudness modification parameters.
  • the Loudness Modification Parameters may select and/or modify the loudness-versus-gain characteristic.
  • the resulting gain value may, for example, impose a dynamic range modification on the audio signal by, for example, applying compression and/or expansion as the loudness measure changes dynamically.
  • Measure Loudness 2 employs a weighted-level-based loudness measurement.
  • the audio signal is passed through a weighting filter 10, for example an A-, B-, or C-weighting filter, that emphasizes more perceptually relevant audio frequencies and de-emphasizes less perceptually relevant frequencies.
  • the broadband level (typically, broadband power) of the audio signal is calculated in Level Calculation device or function 12 and, optionally, temporally smoothed in Smoothing device or function 14 to produce a time-smoothed broadband power measurement, which is a measure of loudness.
  • by "broadband" is meant the entire frequency band or spectrum of the audio signal (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed).
  • smoothing in Smoothing 14 may be performed with time constants commensurate with perceptual temporal smoothing of loudness.
  • a temporal smoothing having an attack time of 15 ms and a release time of 50 ms may be employed.
  • those time constants are not critical to the invention and other values may be used.
  • spectrally localized signals are measured as being significantly louder than corresponding subjective loudness estimates and are treated disproportionately to other signals.
  • the result may be that more gain change is applied to spectrally localized signals than is necessary.
  • the audio signal is applied to a Filterbank device or function 16 that splits the audio signal into a multiplicity of frequency bands.
  • a Spectral Localization device or function 18 determines how spectrally localized the audio signal is and calculates a loudness-reducing scale factor.
  • the scale factor is multiplied by the broadband signal level estimate in order to lower the loudness estimate as the audio signal becomes more and more spectrally localized.
  • the resulting lowered broadband level estimate is optionally smoothed in Smoothing 14 as described above.
  • the optionally smoothed broadband level estimate which is a measure of loudness, is applied to Modify Loudness 4 that produces a gain value for use in producing the loudness-adjusted modified audio signal.
  • by "spectral localization" is meant a measure of narrowbandedness of the signal component distribution in the bandwidth of an audio signal undergoing processing.
  • a signal may be considered narrowbanded or spectrally localized when the majority of its energy is within half of the human auditory bandwidth of 20 Hz to 20 kHz when a perceptual frequency banding scale such as ERB or critical band (Bark) scaling is employed.
  • the human auditory bandwidth may be divided into five bands, as shown in FIG. 4.
  • the audio signal may be considered to be narrowbanded or spectrally localized when a majority of the signal's energy is in one of the bands, which would be less than half of the signal's total audio bandwidth on a Bark scale. It is not critical to the invention to determine narrowbandedness or spectral localization in the manner of the example. Other ways of determining narrowbandedness or spectral localization may be usable.
  • FIG. 2 is a functional schematic block diagram showing an example of a variation of the arrangement of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1.
  • the weighted audio signal rather than the unweighted audio signal provides the input to Filterbank 16.
  • the Weighting Filter 10 acts as a DC blocking filter, reducing the effects of DC signals interfering with the Filterbank 16 and Spectral Localization 18 calculations.
  • FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal in accordance with aspects of the present invention.
  • FIG. 2 is a functional schematic block diagram showing an example of a variation of the example of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1.
  • FIG. 3 is an idealized response characteristic (gain versus frequency) of a weighting filter that is suitable for use in the arrangements of FIGS. 1 and 2.
  • FIG. 4 shows idealized filter response characteristics (power response versus frequency) of bands in a Filterbank 16, the response characteristics being suitable for use in the arrangements of FIGS. 1 and 2.
  • the audio signal is filtered by a weighting filter 10 whose frequency response may, for example, be as shown in the idealized response of FIG. 3.
  • the filter may have a first order highpass characteristic with a corner frequency at 300 Hz and a low frequency characteristic similar to other common A, B & C weighting filters used in weighted power measures.
  • Level Calculation 12 then computes the average sample (n) power over a block of N samples, where k is the block index.
  • the Level Calculation may be represented as
  • the Filterbank computes spectral band power values using the autocorrelation of the audio samples in the block k. More specifically, a sliding overlapping block of N+Q samples of the audio signal may be constructed, where Q samples overlap with the adjacent blocks, and the samples are windowed by a window function.
  • the window function may be unity in the center and taper down toward zero at its edges to reduce edge related errors in the autocorrelation.
  • a useful value for the overlap length Q may be 31 samples at 48 kHz, although this is not critical to the invention and other overlap lengths and sampling rates may be employed.
  • the autocorrelation of the k-th windowed sample block may then be computed. This may be represented as
  • the autocorrelation values A(k,l) may be transformed into band power values
  • a common method for calculating the power of an audio signal in a frequency band of interest is to filter the audio signal, and then calculate the autocorrelation of the filtered signal.
  • the 0 lag of the autocorrelation is the band power.
  • the band filter is a FIR filter
  • the band power can be equivalently calculated as the dot product of the autocorrelation of the signal and the autocorrelation of the filter impulse response. Because both autocorrelation vectors are symmetrical, the dot product can be performed using one half of each of the autocorrelation vectors, with the non-zero lag values summed twice.
  • Each row of the matrix M represents the one sided autocorrelation of a band filter.
  • Non-zero lag values are doubled to effect the necessary double summation.
  • Matrix M has one row for each band. In this example, five bands were found to produce useful results. The choice of the number of bands involves a tradeoff — although a small number of bands reduces complexity, when the number of bands is too small the arrangement may fail to detect narrowbandedness under common signal conditions.
  • Suitable Filterbank 16 band filter power responses are shown in the idealized responses of FIG. 4.
  • the matrix M implements these filters and also includes a scaling such that the "energy per ERB band" is the same from band to band.
  • the ERB scale is a psychoacoustic-based frequency mapping. In FIG. 4, the low "bump" at approximately 20 kHz is ripple from the 2nd band filter that is centered at approximately 8 kHz in this example.
  • the band power of the first band preferably is reduced so as to be approximately similar to the others for commonly occurring signals by dropping its power by approximately 10 dB (×0.1).
  • the result is the modified band power B' that may be expressed as:
  • the Spectral Localization device or function 18 may then calculate the scaling factor as the ratio of the second largest and largest band power values, a simple calculation requiring low processing power and memory.
  • the scaling factor may be constrained to be between approximately -7 dB (×0.2) and 0 dB. If the denominator of the ratio is zero, the division result is undefined and so the scaling factor D(k) is set to 1.0.
  • a scaled weighted power measurement P_D(k) may be calculated as the product of the weighted power measure and the scaling factor:
  • the scaled weighted power measure may then be smoothed in Smoothing 14.
  • the calculation in Equation 6 may be further simplified by only considering the first (lowest in frequency) two bands (after reducing band 1).
  • the scaling factor may be calculated as the ratio of the smaller to the larger of the first two band powers.
  • the scaling factor preferably is constrained to a range of between -7 dB and 0 dB. As above, if the denominator of the ratio is zero, the divide result is undefined and so the scaling factor is set to 1.0.
  • Equation 8 has been found to have sound quality benefits.
  • One problem with Equation 6 is that during vocal singing, there can be instances of "ess" where not only does the scaling factor rise toward 1.0, but the signal power also rises. The net effect is a dramatic increase in the power that can cause the downstream loudness processing to, in the case of dynamics processing, apply more gain reduction to the "ess" than is necessary. De-essing is a common tool of audio mixing but when overused, it can become perceptually annoying. Since Equation 8 only looks at the lower frequency bands, the scaling factor does not rise as quickly during the sibilance "ess" in vocal singing.
  • the invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, algorithms and processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
  • the language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein.
  • the inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

Landscapes

  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)

Abstract

The invention relates to modifying the loudness of an audio signal by measuring the weighted broadband level of the audio signal and modifying that weighted broadband level as a function of a spectral localization estimate of the audio signal.

Description

CONTROLLING THE LOUDNESS OF AN AUDIO SIGNAL IN RESPONSE TO
SPECTRAL LOCALIZATION
Cross-Reference to Related Applications
[0001] This application claims priority to United States Provisional Patent
Application No. 61/174,468, filed 30 April 2009, hereby incorporated by reference in its entirety.
Field of the Invention
[0002] This invention relates to audio signal processing. In particular, the invention relates to modifying the loudness of an audio signal by measuring the weighted broadband level of the audio signal and modifying that weighted broadband level as a function of a spectral localization estimate of the audio signal.
Summary of the Invention
[0003] According to aspects of the present invention, a method for controlling the loudness of an audio signal comprises receiving an audio signal, generating an estimate of the spectral localization of the audio signal, generating a broadband measure of loudness of the audio signal, modifying said broadband measure of loudness in response to an estimate of the spectral localization of the audio signal, and modifying the broadband level of the audio signal in response to the modified broadband measure of loudness. Generating an estimate of the spectral localization of the audio signal may include determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed, which may, in turn, include dividing the audio signal into a plurality of frequency bands and generating a scaling factor in response to the relative level in two frequency bands. The modified broadband measure of loudness may be temporally smoothed and the broadband level of the audio signal may be modified in response to the smoothed modified broadband measure of loudness. The temporal smoothing may have one or more time constants useful for syllabic speech processing. The two frequency bands may be the bands having the second largest and largest level power values. The two frequency bands may be the two lowest frequency bands. Generating a broadband measure of loudness of the audio signal may determine the broadband measure of loudness of the audio signal after its processing by a weighting filter. The broadband measure of loudness and the level in two or more of the frequency bands may each be based on short-term levels.
[0004] Although in principle the invention may be practiced either in the analog or digital domain (or some combination of the two), in practical embodiments of the invention, audio signals are represented by samples in blocks of data and processing is done in the digital domain.
[0005] In FIGS. 1 and 2, essentially identical devices and functions bear the same reference numeral when they appear in multiple figures. Modified devices and functions are distinguished by a prime (') or double prime (") symbol.
[0006] FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal according to aspects of the present invention. The arrangement receives the audio signal and applies it to two paths: a control path and a signal path. In the control path, the loudness of the audio signal is measured in a Measure Loudness device or function 2. The resulting measure of loudness is applied to a Modify Loudness device or function 4 that uses the measure of loudness and one or more "Loudness Modification Parameters" to create a gain value that when multiplied with the original audio signal, as in multiplier 8, produces a modified audio signal having a desired modified loudness. The audio signal in the signal path may be delayed by a delay 6 to match latency incurred in Measure Loudness 2 and Modify Loudness 4. Although the arrangement may operate uninterruptedly, intermittently, or just once for a finite length audio signal, for the examples described herein, the audio signal may be processed intermittently in consecutive time interval blocks of approximately 5 to 20 milliseconds — however, such a block length is not critical to the invention.
[0007] Modify Loudness 4 includes a loudness versus gain characteristic that outputs a gain value in response to the input loudness measure and one or more loudness modification parameters. The Loudness Modification Parameters may select and/or modify the loudness-versus-gain characteristic. The resulting gain value may, for example, impose a dynamic range modification on the audio signal by, for example, applying compression and/or expansion as the loudness measure changes dynamically.
[0008] Measure Loudness 2 employs a weighted-level-based loudness measurement. The audio signal is passed through a weighting filter 10, for example an A-, B-, or C-weighting filter, that emphasizes more perceptually relevant audio frequencies and de-emphasizes less perceptually relevant frequencies. At each time interval (a block, in this example), the broadband level (typically, broadband power) of the audio signal is calculated in Level Calculation device or function 12 and, optionally, temporally smoothed in Smoothing device or function 14 to produce a time-smoothed broadband power measurement, which is a measure of loudness. By "broadband" is meant the entire frequency band or spectrum of the audio signal (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed). If the goal is to modify the loudness of the audio signal over time, smoothing in Smoothing 14 may be performed with time constants commensurate with perceptual temporal smoothing of loudness. For example, a temporal smoothing having an attack time of 15 ms and a release time of 50 ms may be employed. However, those time constants are not critical to the invention and other values may be used.
[0009] The inventor has determined that weighted level (typically, power) measurements such as performed by Weighting Filter 10 and Level Calculation 12, as described below, significantly overrate spectrally localized signals. In other words, when a short-term (a time commensurate with the block time interval, 5 to 20 ms, in the above example) loudness measurement based on weighted level is calibrated to match subjective loudness estimates for average and spectrally complex audio signals, spectrally localized signals are measured as being significantly louder than corresponding subjective loudness estimates and are treated disproportionately to other signals. For example, in the case of loudness-based dynamics processing, the result may be that more gain change is applied to spectrally localized signals than is necessary.
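The attack/release smoothing mentioned in paragraph [0008] can be illustrated with a minimal sketch. The text gives only the time constants (15 ms attack, 50 ms release); the one-pole exponential form, the function names, and the 200 Hz block rate (5 ms blocks) used below are assumptions made for illustration, not details taken from the application.

```python
import math

def smoothing_coeff(time_constant_s, block_rate_hz):
    """One-pole coefficient for a time constant (exponential form is an assumption)."""
    return math.exp(-1.0 / (time_constant_s * block_rate_hz))

def smooth_power(powers, block_rate_hz, attack_s=0.015, release_s=0.050):
    """Smooth per-block power values with separate attack and release times.

    The attack coefficient is used while the input exceeds the current
    smoothed value; the release coefficient is used otherwise.
    """
    a_attack = smoothing_coeff(attack_s, block_rate_hz)
    a_release = smoothing_coeff(release_s, block_rate_hz)
    state = 0.0
    smoothed = []
    for p in powers:
        a = a_attack if p > state else a_release
        state = a * state + (1.0 - a) * p
        smoothed.append(state)
    return smoothed
```

With these assumed values the smoothed measure rises quickly toward a sustained level and decays more slowly once the input drops, which is the asymmetric behaviour that separate attack and release times are meant to provide.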
[0010] Still referring to the example of FIG. 1, in parallel to the weighted broadband level measurement performed in Weighting Filter 10 and Level Calculation 12, the audio signal is applied to a Filterbank device or function 16 that splits the audio signal into a multiplicity of frequency bands. A Spectral Localization device or function 18 determines how spectrally localized the audio signal is and calculates a loudness-reducing scale factor. The scale factor is multiplied by the broadband signal level estimate in order to lower the loudness estimate as the audio signal becomes more and more spectrally localized. The resulting lowered broadband level estimate is optionally smoothed in Smoothing 14 as described above. The optionally smoothed broadband level estimate, which is a measure of loudness, is applied to Modify Loudness 4 that produces a gain value for use in producing the loudness-adjusted modified audio signal.
[0011] By "spectral localization" is meant a measure of narrowbandedness of the signal component distribution in the bandwidth of an audio signal undergoing processing. For the purposes of this invention, a signal may be considered narrowbanded or spectrally localized when the majority of its energy is within half of the human auditory bandwidth of 20 Hz to 20 kHz when a perceptual frequency banding scale such as ERB or critical band (Bark) scaling is employed. In an example of a practical embodiment found to be useful, the human auditory bandwidth may be divided into five bands, as shown in FIG. 4, and the audio signal considered to be narrowbanded or spectrally localized when a majority of the signal's energy is in one of the bands, which would be less than half of the signal's total audio bandwidth on a Bark scale. It is not critical to the invention to determine narrowbandedness or spectral localization in the manner of the example. Other ways of determining narrowbandedness or spectral localization may be usable.
[0012] FIG. 2 is a functional schematic block diagram showing an example of a variation of the arrangement of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1. In this FIG. 2 example, the weighted audio signal rather than the unweighted audio signal provides the input to Filterbank 16. This alternative arrangement has a few benefits. First, the Weighting Filter 10 acts as a DC blocking filter, reducing the effects of DC signals interfering with the Filterbank 16 and Spectral Localization 18 calculations. Second, it provides a more perceptually relevant audio signal to the Filterbank 16 and Spectral Localization 18, which leads to a final scaling factor that has been found to work better when the loudness of the audio signal is subsequently modified by the gain value in multiplier 8.
Brief Description of the Drawings
[0013] FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal in accordance with aspects of the present invention.
[0014] FIG. 2 is a functional schematic block diagram showing an example of a variation of the example of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1.
[0015] FIG. 3 is an idealized response characteristic (gain versus frequency) of a weighting filter that is suitable for use in the arrangements of FIGS. 1 and 2.
[0016] FIG. 4 shows idealized filter response characteristics (power response versus frequency) of bands in a Filterbank 16, the response characteristics being suitable for use in the arrangements of FIGS. 1 and 2.
Detailed Description of the Invention
[0017] Referring to the example of FIG. 1, the audio signal is filtered by a weighting filter 10 whose frequency response may, for example, be as shown in the idealized response of FIG. 3. The filter may have a first order highpass characteristic with a corner frequency at 300 Hz and a low frequency characteristic similar to other common A, B & C weighting filters used in weighted power measures. The filter action may be represented as
$$ x'(n) = H[x(n)] \qquad (1) $$
where the weighting filter input is x(n), the weighting filter output is x'(n) and the filter's transfer function is H. Although this weighting filter characteristic has been found to be useful, it is not critical to the invention and other weighting filter characteristics may be employed.
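As a rough illustration of Equation 1, the sketch below implements a generic first-order highpass filter with a 300 Hz corner frequency. The application does not give filter coefficients, so the discrete RC-style recurrence, the function name, and the 48 kHz default sample rate are assumptions; the filter of FIG. 3 may differ in detail.

```python
import math

def highpass_first_order(x, fs=48000.0, fc=300.0):
    """Apply a simple first-order highpass (discrete RC form, an assumption).

    x  : sequence of input samples x(n)
    fs : sample rate in Hz
    fc : corner frequency in Hz
    Returns the list of filtered samples x'(n).
    """
    rc = 1.0 / (2.0 * math.pi * fc)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for sample in x:
        cur = alpha * (prev_out + sample - prev_in)
        out.append(cur)
        prev_in, prev_out = sample, cur
    return out
```

A constant (DC) input decays to zero at the output of such a filter, while components well above the corner frequency pass almost unchanged.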
[0018] Level Calculation 12 then computes the average sample (n) power over a block of N samples, where k is the block index. The Level Calculation may be represented as
$$ P(k) = \frac{1}{N} \sum_{n \,\in\, \text{block } k} x'(n)^2 \qquad (2) $$
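The Level Calculation of Equation 2 amounts to a mean-square computation per block. The sketch below assumes consecutive, non-overlapping blocks of N samples and discards any trailing partial block; both choices, and the function name, are illustrative assumptions.

```python
def block_power(weighted, block_len):
    """Return P(k), the mean of x'(n)^2 over each block of block_len samples.

    weighted  : weighted samples x'(n)
    block_len : N, the number of samples per block
    """
    n_blocks = len(weighted) // block_len
    powers = []
    for k in range(n_blocks):
        block = weighted[k * block_len:(k + 1) * block_len]
        powers.append(sum(s * s for s in block) / block_len)
    return powers
```

At 48 kHz, a block length of 240 to 960 samples corresponds to the 5 to 20 ms interval mentioned earlier in the description.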
[0019] As explained further below, the Filterbank computes spectral band power values using the autocorrelation of the audio samples in the block k. More specifically, a sliding overlapping block of N+Q samples of the audio signal may be constructed, where Q samples overlap with the adjacent blocks, and the samples are windowed by a window function. The window function may be unity in the center and taper down toward zero at its edges to reduce edge related errors in the autocorrelation. A useful value for the overlap length Q may be 31 samples at 48 kHz, although this is not critical to the invention and other overlap lengths and sampling rates may be employed. The autocorrelation of the k-th windowed sample block may then be computed. This may be represented as
$$ A(k,l) = \sum_{n} w(n)\, w(n-l) \quad \text{for } l = 0, \ldots, 5 \qquad (3) $$
where w(n) are windowed samples, the sum runs over the N+Q samples of block k, and l is the autocorrelation lag index.
[0020] The autocorrelation values A(k,l) may be transformed into band power values B using a matrix M, where b is the band index. Values for M suitable for a sample rate of 48 kHz, rounded to 5 decimal places, are:
$$ M = \begin{bmatrix}
0.01051 & 0.01889 & 0.01447 & 0.00920 & 0.00454 & 0.00143 \\
0.01640 & 0.01262 & -0.0111 & -0.01636 & -0.00528 & 0.00374 \\
0.02058 & 0.01052 & -0.02625 & -0.01667 & 0.00566 & 0.00616 \\
0.02870 & -0.0096 & -0.03827 & 0.01435 & 0.00957 & -0.00478 \\
0.03576 & -0.0358 & -0.02043 & 0.03320 & -0.01532 & 0.00255
\end{bmatrix} $$
$$ B = M A \quad \text{where } A = \begin{bmatrix} A(k,0) & A(k,1) & \cdots & A(k,5) \end{bmatrix}^{T} \qquad (4) $$
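The two steps above (autocorrelation at lags 0 to 5, then multiplication by M) can be sketched as follows. The matrix values are those listed in the text for 48 kHz; the function names are hypothetical, and the input is assumed to be a block that has already been windowed, since the exact window shape is not specified.

```python
# Matrix M from the text (one row per band, one column per lag 0..5).
M = [
    [0.01051, 0.01889, 0.01447, 0.00920, 0.00454, 0.00143],
    [0.01640, 0.01262, -0.0111, -0.01636, -0.00528, 0.00374],
    [0.02058, 0.01052, -0.02625, -0.01667, 0.00566, 0.00616],
    [0.02870, -0.0096, -0.03827, 0.01435, 0.00957, -0.00478],
    [0.03576, -0.0358, -0.02043, 0.03320, -0.01532, 0.00255],
]

def autocorrelation(windowed, max_lag=5):
    """A(k, l) for l = 0..max_lag over one block of already-windowed samples."""
    return [
        sum(windowed[n] * windowed[n - lag] for n in range(lag, len(windowed)))
        for lag in range(max_lag + 1)
    ]

def band_powers(windowed):
    """B = M A: five band power values for one windowed block."""
    a = autocorrelation(windowed, max_lag=5)
    return [sum(m * v for m, v in zip(row, a)) for row in M]
```

For a single unit impulse only the zero-lag autocorrelation is non-zero, so the band powers reduce to the first column of M.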
[0021] A common method for calculating the power of an audio signal in a frequency band of interest is to filter the audio signal, and then calculate the autocorrelation of the filtered signal. The 0 lag of the autocorrelation is the band power. If the band filter is a FIR filter, the band power can be equivalently calculated as the dot product of the autocorrelation of the signal and the autocorrelation of the filter impulse response. Because both autocorrelation vectors are symmetrical, the dot product can be performed using one half of each of the autocorrelation vectors, with the non-zero lag values summed twice. Each row of the matrix M represents the one sided autocorrelation of a band filter. Non-zero lag values are doubled to effect the necessary double summation. Matrix M has one row for each band. In this example, five bands were found to produce useful results. The choice of the number of bands involves a tradeoff: although a small number of bands reduces complexity, when the number of bands is too small the arrangement may fail to detect narrowbandedness under common signal conditions. Suitable Filterbank 16 band filter power responses are shown in the idealized responses of FIG. 4. The matrix M implements these filters and also includes a scaling such that the "energy per ERB band" is the same from band to band. As is well known, the ERB scale is a psychoacoustic-based frequency mapping. In FIG. 4, the low "bump" at approximately 20 kHz is ripple from the 2nd band filter that is centered at approximately 8 kHz in this example.
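The equivalence described in this paragraph can be checked numerically. The sketch below compares the energy of a FIR-filtered signal with the dot product of the signal autocorrelation and the one-sided filter autocorrelation (non-zero lags doubled). The example filter and signal are arbitrary illustrative values, not the band filters of FIG. 4, and total energy over a finite block stands in for band power.

```python
def autocorr(seq, max_lag):
    """One-sided autocorrelation of a finite sequence for lags 0..max_lag."""
    return [
        sum(seq[n] * seq[n - lag] for n in range(lag, len(seq)))
        for lag in range(max_lag + 1)
    ]

def energy_by_filtering(x, h):
    """Filter x with FIR h (full convolution) and sum the squared output."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return sum(v * v for v in y)

def energy_by_autocorr(x, h):
    """Same quantity from the two autocorrelations, without filtering x."""
    max_lag = len(h) - 1
    rx = autocorr(x, max_lag)
    rh = autocorr(h, max_lag)
    # Like a row of M: zero lag once, every non-zero lag doubled.
    row = [rh[0]] + [2.0 * v for v in rh[1:]]
    return sum(m * a for m, a in zip(row, rx))
```

Because the filter autocorrelation can be precomputed, each band costs only one short dot product per block once the signal autocorrelation is available, which is the saving the matrix formulation relies on.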
[0022] Because typical audio signals have more bass energy and less energy with rising frequency (similar to a pink noise signal), the first band nearly always has significantly more energy than all the other bands. To compensate for this situation, the band power of the first band preferably is reduced so as to be approximately similar to the others for commonly occurring signals by dropping its power by approximately 10 dB (x 0.1). The result is the modified band power B' that may be expressed as:
$$
B'[b] = \begin{cases}
0.1 \cdot B[b] & \text{where } b = 0 \\
B[b] & \text{where } b > 0
\end{cases} \tag{5}
$$
[0023] After reducing band 1, the Spectral Localization device or function 18 may then calculate the scaling factor as the ratio of the second largest and largest band power values, a simple calculation requiring low processing power and memory. The scaling factor may be constrained to be between approximately -7 dB (x 0.2) and 0 dB. If the denominator of the ratio is zero, the division result is undefined and so the scaling factor D(k) is set to 1.0.
Figure imgf000008_0001
where B'' is B' not including max(B').
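The first-band reduction and the scaling-factor calculation described above can be sketched as follows. This is one reading of the text, not the equation from the figure: the function names and the constants `D_MIN` and `D_MAX` are hypothetical, with 0.2 taken as the approximately -7 dB lower limit.

```python
import numpy as np

D_MIN = 0.2   # lower limit, approximately -7 dB as a power ratio
D_MAX = 1.0   # upper limit, 0 dB


def reduce_first_band(B):
    """Equation 5: scale the first band (b = 0) by 0.1, about -10 dB."""
    B_mod = np.array(B, dtype=float)
    B_mod[0] *= 0.1
    return B_mod


def scaling_factor(B):
    """Ratio of the second largest to the largest modified band power,
    limited to the range [D_MIN, D_MAX]."""
    B_sorted = np.sort(reduce_first_band(B))
    largest, second_largest = B_sorted[-1], B_sorted[-2]
    if largest == 0.0:
        return 1.0             # ratio undefined: the text sets D(k) to 1.0
    return float(np.clip(second_largest / largest, D_MIN, D_MAX))
```

With band powers `[10, 1, 1, 1, 1]` the reduced first band matches the others and the factor is 1.0; with all power in a single band, such as `[0, 0, 1, 0, 0]`, the ratio is 0 and the factor is held at 0.2.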
[0024] For typical audio signals that have roughly a pink-noise-shaped spectrum, the scaling factor D(k) is close to 1.0 and for spectrally localized signals is close to 0.2.
[0025] Finally, a scaled weighted power measurement P_D(k) may be calculated as the product of the weighted power measure and the scaling factor:
$$
P_D(k) = D(k) \cdot P(k) \tag{7}
$$
[0026] The scaled weighted power measure, optionally, may then be smoothed in Smoothing 14.
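Equation 7 and the optional smoothing step can be sketched as below. The text here does not specify the smoother, so the one-pole recursion and its coefficient `alpha` are purely assumptions chosen for illustration; `scaled_power` and `smooth` are hypothetical names.

```python
def scaled_power(P, D):
    """Equation 7: P_D(k) = D(k) * P(k)."""
    return D * P


def smooth(values, alpha=0.9):
    """Hypothetical one-pole smoother: s[k] = alpha * s[k-1] + (1 - alpha) * v[k].

    The first output is initialized to the first input value.
    """
    smoothed, state = [], None
    for v in values:
        state = v if state is None else alpha * state + (1.0 - alpha) * v
        smoothed.append(state)
    return smoothed
```

For example, `smooth([scaled_power(P, D) for P, D in zip(powers, factors)])` would yield a smoothed sequence of scaled power values, one per block k.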
[0027] The calculation in Equation 6 may be further simplified by only considering the first (lowest in frequency) two bands (after reducing band 1). The scaling factor may be calculated as the ratio of the smaller to the larger of the first two band powers. As above, the scaling factor preferably is constrained to a range of between -7 dB and 0 dB. As above, if the denominator of the ratio is zero, the divide result is undefined and so the scaling factor is set to 1.0.
Figure imgf000008_0002
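The simplified calculation can be sketched as follows, under the same assumptions as before: the first band is scaled by 0.1, the limits are 0.2 and 1.0, and the function name `scaling_factor_two_band` is hypothetical. This is a reading of the paragraph above rather than a transcription of Equation 8.

```python
def scaling_factor_two_band(B, d_min=0.2, d_max=1.0):
    """Ratio of the smaller to the larger of the two lowest band powers,
    after reducing the first band, limited to [d_min, d_max]."""
    first = 0.1 * float(B[0])      # first band reduced by about 10 dB
    second = float(B[1])
    smaller, larger = min(first, second), max(first, second)
    if larger == 0.0:
        return 1.0                 # ratio undefined: scaling factor set to 1.0
    return min(max(smaller / larger, d_min), d_max)
```

Only `B[0]` and `B[1]` are read, so energy in the higher bands does not affect the result.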
[0028] In addition to being slightly faster to compute, Equation 8 has been found to have sound quality benefits. One problem with Equation 6 is that during vocal singing, there can be instances of "ess" where not only the scaling factor rises toward 1.0, but the signal power also rises. The net effect is a dramatic increase in the power that can cause the downstream loudness processing, in the case of dynamics processing, to apply more gain reduction to the "ess" than is necessary. De-essing is a common tool of audio mixing, but when overused it can become perceptually annoying. Since Equation 8 only looks at the lower frequency bands, the scaling factor does not rise as quickly during the sibilance "ess" in vocal singing.
Implementation
[0029] The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, algorithms and processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
[0030] Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
[0031] Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
[0032] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus can be performed in an order different from that described.

CLAIMS:
1. A method for controlling the loudness of an audio signal, comprising receiving an audio signal, generating an estimate of the spectral localization of the audio signal, generating a broadband measure of loudness of the audio signal, modifying said broadband measure of loudness in response to an estimate of the spectral localization of the audio signal, and modifying the broadband level of the audio signal in response to the modified broadband measure of loudness.
2. The method of claim 1 wherein generating an estimate of the spectral localization of the audio signal includes determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed.
3. The method of claim 2 wherein determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed includes dividing the audio signal into a plurality of frequency bands and generating a scaling factor in response to the relative level in two frequency bands.
4. The method of any one of claims 1-3 further comprising temporally smoothing the modified broadband measure of loudness and wherein the broadband level of the audio signal is modified in response to the smoothed modified broadband measure of loudness.
5. The method of claim 3 or claim 4 as dependent on claim 3 wherein the two frequency bands are the bands having the second largest and largest level power values.
6. The method of any one of claim 3, claim 4 as dependent on claim 3, or claim 5 wherein generating a scaling factor in response to the relative level in two frequency bands uses the two lowest frequency bands.
7. A method according to any one of claims 1-6 wherein generating a broadband measure of loudness of the audio signal determines the broadband measure of loudness of the audio signal after its processing by a weighting filter.
8. A method according to any one of claims 3-7 wherein the broadband measure of loudness and the level in two or more of the frequency bands are each based on short-term levels.
9. A method according to any one of claims 4-8 wherein the temporally smoothing has one or more time constants useful for syllabic speech processing.
10. Apparatus comprising means adapted to perform the method of any one of claims 1 through 9.
11. A computer program, stored on a computer-readable medium, for causing a computer to perform the method of any one of claims 1 through 9.
12. A computer-readable medium storing thereon the computer program performing the method of any one of claims 1 through 9.
PCT/US2010/032807 2009-04-30 2010-04-28 Controlling the loudness of an audio signal in response to spectral localization WO2010127024A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US13/265,691 US8761415B2 (en) 2009-04-30 2010-04-28 Controlling the loudness of an audio signal in response to spectral localization

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17446809P 2009-04-30 2009-04-30
US61/174,468 2009-04-30

Publications (1)

Publication Number Publication Date
WO2010127024A1 true WO2010127024A1 (en) 2010-11-04

Family

ID=42288540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/032807 WO2010127024A1 (en) 2009-04-30 2010-04-28 Controlling the loudness of an audio signal in response to spectral localization

Country Status (3)

Country Link
US (1) US8761415B2 (en)
TW (1) TWI538393B (en)
WO (1) WO2010127024A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101597375B1 (en) 2007-12-21 2016-02-24 디티에스 엘엘씨 System for adjusting perceived loudness of audio signals
US8538042B2 (en) 2009-08-11 2013-09-17 Dts Llc System for increasing perceived loudness of speakers
JP5942463B2 (en) * 2012-02-17 2016-06-29 株式会社ソシオネクスト Audio signal encoding apparatus and audio signal encoding method
JP5827442B2 (en) 2012-04-12 2015-12-02 ドルビー ラボラトリーズ ライセンシング コーポレイション System and method for leveling loudness changes in an audio signal
US9312829B2 (en) 2012-04-12 2016-04-12 Dts Llc System for adjusting loudness of audio signals in real time
US9685921B2 (en) * 2012-07-12 2017-06-20 Dts, Inc. Loudness control with noise detection and loudness drop detection
CN104080024B (en) 2013-03-26 2019-02-19 杜比实验室特许公司 Volume leveller controller and control method and audio classifiers
US10142731B2 (en) 2016-03-30 2018-11-27 Dolby Laboratories Licensing Corporation Dynamic suppression of non-linear distortion
TWI590236B (en) * 2016-12-09 2017-07-01 宏碁股份有限公司 Voice signal processing apparatus and voice signal processing method
US9860644B1 (en) 2017-04-05 2018-01-02 Sonos, Inc. Limiter for bass enhancement

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5878391A (en) * 1993-07-26 1999-03-02 U.S. Philips Corporation Device for indicating a probability that a received signal is a speech signal
US20040044525A1 (en) * 2002-08-30 2004-03-04 Vinton Mark Stuart Controlling loudness of speech in signals that contain speech and other types of audio material
US20070092089A1 (en) * 2003-05-28 2007-04-26 Dolby Laboratories Licensing Corporation Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8090120B2 (en) 2004-10-26 2012-01-03 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
TWI517562B (en) 2006-04-04 2016-01-11 杜比實驗室特許公司 Method, apparatus, and computer program for scaling the overall perceived loudness of a multichannel audio signal by a desired amount
MY141426A (en) 2006-04-27 2010-04-30 Dolby Lab Licensing Corp Audio gain control using specific-loudness-based auditory event detection
US8849433B2 (en) 2006-10-20 2014-09-30 Dolby Laboratories Licensing Corporation Audio dynamics processing using a reset
US8521314B2 (en) 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
WO2008085330A1 (en) 2007-01-03 2008-07-17 Dolby Laboratories Licensing Corporation Hybrid digital/analog loudness-compensating volume control
EP2162879B1 (en) 2007-06-19 2013-06-05 Dolby Laboratories Licensing Corporation Loudness measurement with spectral modifications
WO2009011826A2 (en) 2007-07-13 2009-01-22 Dolby Laboratories Licensing Corporation Time-varying audio-signal level using a time-varying estimated probability density of the level
TWI503816B (en) 2009-05-06 2015-10-11 Dolby Lab Licensing Corp Adjusting the loudness of an audio signal with perceived spectral balance preservation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALAN SEEFELDT ET AL: "A new objective measure of perceived loudness", AUDIO ENGINEERING SOCIETY CONVENTION PAPER, NEW YORK, NY, US, 28 October 2004 (2004-10-28), XP009087934 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8938313B2 (en) 2009-04-30 2015-01-20 Dolby Laboratories Licensing Corporation Low complexity auditory event boundary detection
US8891789B2 (en) 2009-05-06 2014-11-18 Dolby Laboratories Licensing Corporation Adjusting the loudness of an audio signal with perceived spectral balance preservation
CN108281148A (en) * 2016-12-30 2018-07-13 宏碁股份有限公司 Speech signal processing device and audio signal processing method
CN108281148B (en) * 2016-12-30 2020-12-22 宏碁股份有限公司 Speech signal processing apparatus and speech signal processing method

Also Published As

Publication number Publication date
TWI538393B (en) 2016-06-11
US8761415B2 (en) 2014-06-24
TW201106619A (en) 2011-02-16
US20120039490A1 (en) 2012-02-16

Similar Documents

Publication Publication Date Title
US8761415B2 (en) Controlling the loudness of an audio signal in response to spectral localization
US20220394380A1 (en) Audio Control Using Auditory Event Detection
US6915264B2 (en) Cochlear filter bank structure for determining masked thresholds for use in perceptual audio coding
EP2381574B1 (en) Apparatus and method for modifying an input audio signal
EP1629463B1 (en) Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
US9672834B2 (en) Dynamic range compression with low distortion for use in hearing aids and audio systems
AU2011244268A1 (en) Apparatus and method for modifying an input audio signal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 10716978; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 13265691; Country of ref document: US)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 10716978; Country of ref document: EP; Kind code of ref document: A1)