WO2010127024A1 - Controlling the loudness of an audio signal in response to spectral localization - Google Patents
- Publication number
- WO2010127024A1 (PCT/US2010/032807)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio signal
- loudness
- broadband
- measure
- response
- Prior art date
Classifications
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G9/00—Combinations of two or more types of control, e.g. gain control and tone control
- H03G9/02—Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
- H03G9/025—Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers frequency-dependent volume compression or expansion, e.g. multiple-band systems
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G9/00—Combinations of two or more types of control, e.g. gain control and tone control
- H03G9/005—Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals
Definitions
- the Spectral Localization device or function 18 may then calculate the scaling factor as the ratio of the second largest and largest band power values, a simple calculation requiring low processing power and memory.
- the scaling factor may be constrained to be between approximately -7 dB (x 0.2) and 0 dB. If the denominator of the ratio is zero, the division result is undefined and so the scaling factor D(k) is set to 1.0.
- a scaled weighted power measurement P'(k) may be calculated as the product of the weighted power measure and the scaling factor:
- the scaled weighted power measure may then be smoothed in
- the calculation in Equation 6 may be further simplified by only considering the first (lowest in frequency) two bands (after reducing band 1).
- the scaling factor may be calculated as the ratio of the smaller to the larger of the first two band powers.
- the scaling factor preferably is constrained to a range of between -7 dB and 0 dB. As above, if the denominator of the ratio is zero, the divide result is undefined and so the scaling factor is set to 1.0.
- Equation 8 has been found to have sound quality benefits.
- One problem with Equation 6 is that during vocal singing, there can be instances of "ess" where not only the scaling factor rises toward 1.0, but the signal power also rises. The net effect is a dramatic increase in the power that can cause the downstream loudness processing to, in the case of dynamics processing, apply more gain reduction to the "ess" than is necessary. De-essing is a common tool of audio mixing, but when overused it can become perceptually annoying. Since Equation 8 only looks at the lower frequency bands, the scaling factor does not rise as quickly during the sibilance "ess" in vocal singing.
- the invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, algorithms and processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
- the language may be a compiled or interpreted language.
- Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein.
- the inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
Landscapes
- Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
Abstract
The invention relates to modifying the loudness of an audio signal by measuring the weighted broadband level of the audio signal and modifying that weighted broadband level as a function of a spectral localization estimate of the audio signal.
Description
CONTROLLING THE LOUDNESS OF AN AUDIO SIGNAL IN RESPONSE TO SPECTRAL LOCALIZATION
Cross-Reference to Related Applications
[0001] This application claims priority to United States Provisional Patent Application No. 61/174,468, filed 30 April 2009, hereby incorporated by reference in its entirety.
Field of the Invention
[0002] This invention relates to audio signal processing. In particular, the invention relates to modifying the loudness of an audio signal by measuring the weighted broadband level of the audio signal and modifying that weighted broadband level as a function of a spectral localization estimate of the audio signal.
Summary of the Invention
[0003] According to aspects of the present invention, a method for controlling the loudness of an audio signal comprises receiving an audio signal, generating an estimate of the spectral localization of the audio signal, generating a broadband measure of loudness of the audio signal, modifying said broadband measure of loudness in response to an estimate of the spectral localization of the audio signal, and modifying the broadband level of the audio signal in response to the modified broadband measure of loudness. Generating an estimate of the spectral localization of the audio signal may include determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed, which may, in turn, include dividing the audio signal into a plurality of frequency bands and generating a scaling factor in response to the relative level in two frequency bands. The modified broadband measure of loudness may be temporally smoothed and the broadband level of the audio signal may be modified in response to the smoothed modified broadband measure of loudness. The temporal smoothing may have one or more time constants useful for syllabic speech processing. The two frequency bands may be the bands having the second largest and largest level power values. The two frequency bands may be the two lowest frequency bands. Generating a broadband measure of loudness of the audio signal may determine the broadband measure of loudness of the audio signal after its processing by a weighting filter. The broadband measure of loudness and the level in two or more of the frequency bands may each be based on short-term levels.
[0004] Although in principle the invention may be practiced either in the analog or digital domain (or some combination of the two), in practical embodiments of the invention, audio signals are represented by samples in blocks of data and processing is done in the digital domain.
[0005] In FIGS. 1 and 2, essentially identical devices and functions bear the same reference numeral when they appear in multiple figures. Modified devices and functions are distinguished by a prime (') or double prime (") symbol.
[0006] FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal according to aspects of the present invention. The arrangement receives the audio signal and applies it to two paths: a control path and a signal path. In the control path, the loudness of the audio signal is measured in a Measure Loudness device or function 2. The resulting measure of loudness is applied to a Modify Loudness device or function 4 that uses the measure of loudness and one or more "Loudness Modification Parameters" to create a gain value that when multiplied with the original audio signal, as in multiplier 8, produces a modified audio signal having a desired modified loudness. The audio signal in the signal path may be delayed by a delay 6 to match latency incurred in Measure Loudness 2 and Modify Loudness 4. Although the arrangement may operate uninterruptedly, intermittently, or just once for a finite length audio signal, for the examples described herein, the audio signal may be processed intermittently in consecutive time interval blocks of approximately 5 to 20 milliseconds — however, such a block length is not critical to the invention.
[0007] Modify Loudness 4 includes a loudness versus gain characteristic that outputs a gain value in response to the input loudness measure and one or more loudness modification parameters. The Loudness Modification Parameters may select and/or modify the loudness-versus-gain characteristic. The resulting gain value may, for example, impose a dynamic range modification on the audio signal by, for example, applying compression and/or expansion as the loudness measure changes dynamically.
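As an illustration of a loudness-versus-gain characteristic of the kind Modify Loudness 4 may apply, the following is a minimal sketch of a downward compressor. The function name, the threshold and the ratio are hypothetical stand-ins for the "Loudness Modification Parameters"; the text does not specify a particular characteristic.

```python
import math

def compressor_gain(loudness_power, threshold_db=-20.0, ratio=3.0):
    """Map a loudness measure (mean power) to a linear amplitude gain.

    threshold_db and ratio are assumed example parameters, not values
    taken from the patent text.
    """
    if loudness_power <= 0.0:
        return 1.0  # silence: leave the signal untouched
    level_db = 10.0 * math.log10(loudness_power)
    if level_db <= threshold_db:
        return 1.0  # below threshold: unity gain
    # Above threshold the output level rises only 1/ratio dB per input dB.
    gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
    return 10.0 ** (gain_db / 20.0)
```

The returned gain would be multiplied with the audio samples of the block, as in multiplier 8.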
[0008] Measure Loudness 2 employs a weighted-level-based loudness measurement. The audio signal is passed through a weighting filter 10, for example an A-, B-, or C-weighting filter, that emphasizes more perceptually relevant audio frequencies and de-emphasizes less perceptually relevant frequencies. At each time interval (a block, in this example), the broadband level (typically, broadband power) of the audio signal is calculated in Level Calculation device or function 12 and, optionally, temporally smoothed in Smoothing device or function 14 to produce a time-smoothed broadband power measurement, which is a measure of loudness. By "broadband" is meant the entire frequency band or spectrum of the audio signal (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed). If the goal is to modify the loudness of the audio signal over time, smoothing in Smoothing 14 may be performed with time constants commensurate with perceptual temporal smoothing of loudness. For example, a temporal smoothing having an attack time of 15 ms and a release time of 50 ms may be employed. However, those time constants are not critical to the invention and other values may be used.
[0009] The inventor has determined that weighted level (typically, power) measurements such as performed by Weighting Filter 10 and Level Calculation 12, as described below, significantly overrate spectrally localized signals. In other words, when a short-term (a time commensurate with the block time interval, 5 to 20 ms, in the above example) loudness measurement based on weighted level is calibrated to match subjective loudness estimates for average and spectrally complex audio signals, spectrally localized signals are measured as being significantly louder than corresponding subjective loudness estimates and are treated disproportionately to other signals. For example, in the case of loudness-based dynamics processing, the result may be that more gain change is applied to spectrally localized signals than is necessary.
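The attack/release smoothing mentioned for Smoothing 14 can be sketched as a one-pole smoother applied once per block. This is only an illustration: treating the 15 ms and 50 ms figures as exponential time constants, and the 10 ms block interval, are assumptions, and the function names are hypothetical.

```python
import math

def smoothing_coeff(time_constant_s, block_s):
    """One-pole coefficient for a given time constant and block interval."""
    return math.exp(-block_s / time_constant_s)

def smooth_levels(levels, block_s=0.010, attack_s=0.015, release_s=0.050):
    """Smooth per-block power values with separate attack and release."""
    a_att = smoothing_coeff(attack_s, block_s)
    a_rel = smoothing_coeff(release_s, block_s)
    out = []
    state = 0.0
    for p in levels:
        # Rising level: fast attack coefficient; falling level: slow release.
        a = a_att if p > state else a_rel
        state = a * state + (1.0 - a) * p
        out.append(state)
    return out
```

With these defaults the smoothed value climbs quickly when the block power rises and decays more slowly when it falls.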
[0010] Still referring to the example of FIG. 1, in parallel to the weighted broadband level measurement performed in Weighting Filter 10 and Level Calculation 12, the audio signal is applied to a Filterbank device or function 16 that splits the audio signal into a multiplicity of frequency bands. A Spectral Localization device or function 18 determines how spectrally localized the audio signal is and calculates a loudness-reducing scale factor. The scale factor is multiplied by the broadband signal level estimate in order to lower the loudness estimate as the audio signal becomes more and more spectrally localized. The resulting lowered broadband level estimate is optionally smoothed in Smoothing 14 as described above. The optionally smoothed broadband level estimate, which is a measure of loudness, is applied to Modify Loudness 4 that produces a gain value for use in producing the loudness-adjusted modified audio signal.
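A minimal sketch of the scale-factor step follows. It uses the rule given elsewhere in this document for Spectral Localization 18: the ratio of the second largest to the largest band power, limited to roughly 0.2 (about -7 dB) through 1.0, and set to 1.0 when the denominator is zero. The function names are hypothetical, and the band powers are assumed to have already received any per-band adjustment.

```python
def localization_scale(band_powers, floor=0.2):
    """Scale factor: second-largest band power divided by the largest."""
    ordered = sorted(band_powers, reverse=True)
    largest, second = ordered[0], ordered[1]
    if largest == 0.0:
        return 1.0  # ratio undefined: apply no loudness reduction
    return min(1.0, max(floor, second / largest))

def scaled_level(broadband_power, band_powers):
    """Lower the broadband loudness estimate for spectrally localized input."""
    return broadband_power * localization_scale(band_powers)
```

A signal with energy spread evenly over the bands keeps its broadband level, while a signal concentrated in one band has its level estimate reduced by up to the floor value.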
[0011] By "spectral localization" is meant a measure of narrowbandedness of the signal component distribution in the bandwidth of an audio signal undergoing processing. For the purposes of this invention, a signal may be considered narrowbanded or spectrally localized when the majority of its energy is within half of the human auditory bandwidth of 20 Hz to 20 kHz when a perceptual frequency banding scale such as ERB or critical band (Bark) scaling is employed. In an example of a practical embodiment found to be useful, the human auditory bandwidth may be divided into five bands, as shown in FIG. 4, and the audio signal considered to be narrowbanded or spectrally localized when a majority of the signal's energy is in one of the bands, which would be less than half of the signal's total audio bandwidth on a Bark scale. It is not critical to the invention to determine narrowbandedness or spectral localization in the manner of the example. Other ways of determining narrowbandedness or spectral localization may be usable.
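The five-band criterion in the example above can be written as a simple test on band energies. This is a sketch only; the function name is hypothetical, and "majority" is read here as more than half of the total energy.

```python
def is_spectrally_localized(band_energies):
    """True when a single band holds more than half of the total energy."""
    total = sum(band_energies)
    if total <= 0.0:
        return False  # no energy, so nothing to classify
    return max(band_energies) > 0.5 * total
```

For instance, band energies of 0.1, 0.8, 0.05, 0.03 and 0.02 would count as localized, while five equal energies would not.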
[0012] FIG. 2 is a functional schematic block diagram showing an example of a variation of the arrangement of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1. In this FIG. 2 example, the weighted audio signal rather than the unweighted audio signal provides the input to Filterbank 16. This alternative arrangement has a few benefits. First, the Weighting Filter 10 acts as a DC blocking filter, reducing the effects of DC signals interfering with the Filterbank 16 and Spectral Localization 18 calculations. Second, it provides a more perceptually relevant audio signal to the Filterbank 16 and Spectral Localization 18, which leads to a final scaling factor that has been found to work better when the loudness of the audio signal is subsequently modified by the gain value in multiplier 8.
Brief Description of the Drawings
[0013] FIG. 1 is a functional schematic block diagram of an arrangement for controlling the loudness of an audio signal in accordance with aspects of the present invention.
[0014] FIG. 2 is a functional schematic block diagram showing an example of a variation of the arrangement of FIG. 1 in which Measure Loudness 2' differs from Measure Loudness 2 of FIG. 1.
[0015] FIG. 3 is an idealized response characteristic (gain versus frequency) of a weighting filter that is suitable for use in the arrangements of FIGS. 1 and 2.
[0016] FIG. 4 shows idealized filter response characteristics (power response versus frequency) of bands in a Filterbank 16, the response characteristics being suitable for use in the arrangements of FIGS. 1 and 2.
Detailed Description of the Invention
[0017] Referring to the example of FIG. 1, the audio signal is filtered by a weighting filter 10 whose frequency response may, for example, be as shown in the idealized response of FIG. 3. The filter may have a first order highpass characteristic with a corner frequency at 300 Hz and a low frequency characteristic similar to other common A, B & C weighting filters used in weighted power measures. The filter action may be represented as

$$x'(n) = H[x(n)], \tag{1}$$

where the weighting filter input is x(n), the weighting filter output is x'(n) and the filter's transfer function is H. Although this weighting filter characteristic has been found to be useful, it is not critical to the invention and other weighting filter characteristics may be employed.
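One way to realize the first order highpass part of Equation 1 is a one-pole, one-zero digital filter obtained by the bilinear transform. The sketch below is an assumption about how H could be implemented; the text gives only the corner frequency and the idealized response of FIG. 3, and the function names are hypothetical.

```python
import math

def highpass_coeffs(fc_hz=300.0, fs_hz=48000.0):
    """Bilinear-transform coefficients of H(s) = s / (s + wc), prewarped."""
    k = math.tan(math.pi * fc_hz / fs_hz)
    b0 = 1.0 / (1.0 + k)
    b1 = -b0
    a1 = (k - 1.0) / (k + 1.0)
    return b0, b1, a1

def weighting_filter(x, fc_hz=300.0, fs_hz=48000.0):
    """Compute x'(n) = H[x(n)] for a list of samples x."""
    b0, b1, a1 = highpass_coeffs(fc_hz, fs_hz)
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        # Difference equation: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]
        out = b0 * sample + b1 * x_prev - a1 * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y
```

Because the filter has a zero at DC, a constant input decays to zero at the output, which matches the DC-blocking behavior noted for the FIG. 2 arrangement.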
[0018] Level Calculation 12 then computes the average sample (n) power over a block of N samples, where k is the block index. The Level Calculation may be represented as

$$P(k) = \frac{1}{N} \sum_{n \,\in\, \text{block } k} x'(n)^2. \tag{2}$$
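Equation 2 amounts to a mean-square computation per block. A short sketch, assuming non-overlapping blocks of `block_len` samples and dropping any trailing partial block (both choices are assumptions made for illustration):

```python
def block_power(weighted, block_len):
    """Return P(k), the mean of x'(n)^2 over each block of block_len samples."""
    powers = []
    for start in range(0, len(weighted) - block_len + 1, block_len):
        block = weighted[start:start + block_len]
        powers.append(sum(s * s for s in block) / block_len)
    return powers
```

At 48 kHz, the 5 to 20 ms block intervals mentioned earlier correspond to `block_len` values of 240 to 960 samples.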
[0019] As explained further below, the Filterbank computes spectral band power values using the autocorrelation of the audio samples in the block k. More specifically, a sliding overlapping block of N+Q samples of the audio signal may be constructed, where Q samples overlap with the adjacent blocks, and the samples are windowed by a window function. The window function may be unity in the center and taper down toward zero at its edges to reduce edge related errors in the autocorrelation. A useful value for the overlap length Q may be 31 samples at 48 kHz, although this is not critical to the invention and other overlap lengths and sampling rates may be employed. The autocorrelation of the kth windowed sample block may then be computed. This may be represented as
$$A(k,l) = \sum_{n} w(n)\,x(n)\,w(n-l)\,x(n-l) \quad \text{for } l = 0, \ldots, 5 \tag{3}$$
where w(n)·x(n) are the windowed samples and where l is the autocorrelation lag index.
[0020] The autocorrelation values A(k,l) may be transformed into band power values B using a matrix M, where b is the band index. The following values are suitable for a sample rate of 48 kHz and may be rounded to 5 decimal places.
$$M = \begin{bmatrix}
0.01051 & 0.01889 & 0.01447 & 0.00920 & 0.00454 & 0.00143 \\
0.01640 & 0.01262 & -0.0111 & -0.01636 & -0.00528 & 0.00374 \\
0.02058 & 0.01052 & -0.02625 & -0.01667 & 0.00566 & 0.00616 \\
0.02870 & -0.0096 & -0.03827 & 0.01435 & 0.00957 & -0.00478 \\
0.03576 & -0.0358 & -0.02043 & 0.03320 & -0.01532 & 0.00255
\end{bmatrix}$$
$$B = MA, \quad \text{where } A = \left[A(k,0),\, A(k,1),\, \ldots,\, A(k,5)\right]^{T} \tag{4}$$
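The windowed autocorrelation for lags 0 to 5 and the product B = MA may be sketched as follows. The matrix values are those listed above; the linear taper in `tapered_window` and all function names are assumptions made for this illustration, since the text only requires a window that is unity in the center and tapers toward zero at its edges.

```python
# Matrix M as listed in the text (one row per band, one column per lag 0..5).
M = [
    [0.01051, 0.01889, 0.01447, 0.00920, 0.00454, 0.00143],
    [0.01640, 0.01262, -0.0111, -0.01636, -0.00528, 0.00374],
    [0.02058, 0.01052, -0.02625, -0.01667, 0.00566, 0.00616],
    [0.02870, -0.0096, -0.03827, 0.01435, 0.00957, -0.00478],
    [0.03576, -0.0358, -0.02043, 0.03320, -0.01532, 0.00255],
]

def tapered_window(length, taper):
    """Window that is unity in the center with a linear taper at each edge.

    The linear shape is an assumption; requires 2 * taper <= length.
    """
    w = [1.0] * length
    for i in range(taper):
        gain = (i + 1) / (taper + 1)
        w[i] = gain
        w[length - 1 - i] = gain
    return w

def autocorrelation(block, window, max_lag=5):
    """Autocorrelation of the windowed block for lags 0..max_lag."""
    v = [w * x for w, x in zip(window, block)]
    return [
        sum(v[n] * v[n - lag] for n in range(lag, len(v)))
        for lag in range(max_lag + 1)
    ]

def band_powers(acf, matrix=M):
    """Band power vector B = M A for one block."""
    return [sum(m * a for m, a in zip(row, acf)) for row in matrix]
```

A block containing a single unit impulse has an autocorrelation that is non-zero only at lag 0, so its band powers are simply the first column of M.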
[0021] A common method for calculating the power of an audio signal in a frequency band of interest is to filter the audio signal and then calculate the autocorrelation of the filtered signal. The zero-lag value of the autocorrelation is the band power. If the band filter is an FIR filter, the band power can be equivalently calculated as the dot product of the autocorrelation of the signal and the autocorrelation of the filter impulse response. Because both autocorrelation vectors are symmetrical, the dot product can be performed using one half of each of the autocorrelation vectors, with the non-zero lag values summed twice. Each row of the matrix M represents the one-sided autocorrelation of a band filter. Non-zero lag values are doubled to effect the necessary double summation. Matrix M has one row for each band. In this example, five bands were found to produce useful results. The choice of the number of bands involves a tradeoff: although a small number of bands reduces complexity, when the number of bands is too small the arrangement may fail to detect narrowbandedness under common signal conditions. Suitable Filterbank 16 band filter power responses are shown in the idealized responses of FIG. 4. The matrix M implements these filters and also includes a scaling such that the "energy per ERB band" is the same from band to band. As is well known, the ERB (equivalent rectangular bandwidth) scale is a psychoacoustic-based frequency mapping. In FIG. 4, the low "bump" at approximately 20 kHz is ripple from the 2nd band filter that is centered at approximately 8 kHz in this example.
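The equivalence described above may be checked numerically. The sketch below compares the energy of an FIR-filtered signal with the dot product of the signal autocorrelation and the one-sided filter autocorrelation whose non-zero lags are doubled. The function names and the example filter are hypothetical and are not the band filters of FIG. 4; energy is used in place of average power, which differs only by a normalization.

```python
def autocorr(seq, lag):
    """Sum over n of seq[n] * seq[n - lag] for a non-negative lag."""
    return sum(seq[n] * seq[n - lag] for n in range(lag, len(seq)))

def fir_filter(x, h):
    """Full linear convolution of x with the FIR impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def band_energy_direct(x, h):
    """Filter first, then take the zero-lag autocorrelation of the output."""
    return autocorr(fir_filter(x, h), 0)

def band_energy_from_autocorr(x, h):
    """Dot product of signal and filter autocorrelations, one-sided."""
    # One-sided filter autocorrelation with non-zero lags doubled,
    # analogous to one row of the matrix M.
    row = [autocorr(h, 0)] + [2.0 * autocorr(h, lag) for lag in range(1, len(h))]
    acf = [autocorr(x, lag) for lag in range(len(h))]
    return sum(m * a for m, a in zip(row, acf))
```

Under this construction a band filter with six taps has six one-sided autocorrelation values, which matches the six columns (lags 0 to 5) of each row of M.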
[0022] Because typical audio signals have more bass energy and less energy with rising frequency (similar to a pink noise signal), the first band nearly always has significantly more energy than all the other bands. To compensate for this situation, the band power of the first band preferably is reduced so as to be approximately similar to the others for commonly occurring signals by dropping its power by approximately 10 dB (× 0.1). The result is the modified band power B' that may be expressed as:
$$B'[b] = \begin{cases} 0.1 \cdot B[b] & \text{where } b = 0 \\ B[b] & \text{where } b > 0 \end{cases} \tag{5}$$
[0023] After reducing band 1, the Spectral Localization device or function 18 may then calculate the scaling factor as the ratio of the second largest and largest band power values, a simple calculation requiring low processing power and memory. The scaling factor may be constrained to be between approximately -7 dB (× 0.2) and 0 dB. If the denominator of the ratio is zero, the division result is undefined and so the scaling factor D(k) is set to 1.0.
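The first-band reduction of Equation 5 and the scaling factor calculation described above may be sketched as follows. The function names are assumptions made for this illustration, and the limits 0.2 and 1.0 correspond to the approximate -7 dB and 0 dB bounds given in the text.

```python
def reduce_first_band(band_power, factor=0.1):
    """Scale band 0 by 0.1 (about -10 dB) and leave the other bands unchanged."""
    return [factor * p if b == 0 else p for b, p in enumerate(band_power)]

def scaling_factor(modified_band_power, floor=0.2):
    """Ratio of the second largest to the largest band power, limited to [floor, 1.0].

    Returns 1.0 when the largest band power is zero, since the ratio is
    then undefined.
    """
    ordered = sorted(modified_band_power, reverse=True)
    largest, second = ordered[0], ordered[1]
    if largest == 0.0:
        return 1.0
    return min(1.0, max(floor, second / largest))
```

A signal whose modified band powers are nearly equal yields a factor near 1.0, while a signal concentrated in a single band is held at the 0.2 floor.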
[0024] For typical audio signals that have a roughly pink-noise-shaped spectrum, the scaling factor D(k) is close to 1.0, and for spectrally localized signals it is close to 0.2.
[0025] Finally, a scaled weighted power measurement P_D(k) may be calculated as the product of the weighted power measure and the scaling factor:
$$P_D(k) = D(k) \cdot P(k) \tag{7}$$
[0026] The scaled weighted power measure, optionally, may then be smoothed in Smoothing 14.
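The scaling of the weighted power measure by D(k) and an optional smoothing step may be sketched as follows. The text does not specify the form of Smoothing 14, so the one-pole smoother, its coefficient `alpha` and the function names are assumptions made purely for illustration.

```python
def scaled_power(power, scale):
    """Scaled weighted power: the product D(k) * P(k) for each block k."""
    return [d * p for d, p in zip(scale, power)]

def smooth(values, alpha=0.9):
    """One-pole smoother s(k) = alpha * s(k-1) + (1 - alpha) * v(k).

    Assumed smoother; the state starts at the first input value.
    """
    out = []
    state = values[0] if values else 0.0
    for v in values:
        state = alpha * state + (1.0 - alpha) * v
        out.append(state)
    return out
```

A larger `alpha` gives a longer effective time constant, so the smoothed measure responds more slowly to block-to-block changes.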
[0027] The calculation in Equation 6 may be further simplified by only considering the first (lowest in frequency) two bands (after reducing band 1). The scaling factor may be calculated as the ratio of the smaller to the larger of the first two band powers. As above, the scaling factor preferably is constrained to a range of between -7 dB and 0 dB. As above, if the denominator of the ratio is zero, the divide result is undefined and so the scaling factor is set to 1.0.
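The simplified calculation that considers only the two lowest bands may be sketched as follows. The function name is an assumption made for this illustration, and the input is taken to be the band powers after the first band has already been reduced.

```python
def two_band_scaling_factor(modified_band_power, floor=0.2):
    """Ratio of the smaller to the larger of the two lowest band powers.

    The result is limited to [floor, 1.0]; it is 1.0 when the larger
    value is zero, since the ratio is then undefined.
    """
    first, second = modified_band_power[0], modified_band_power[1]
    larger, smaller = max(first, second), min(first, second)
    if larger == 0.0:
        return 1.0
    # smaller <= larger, so the ratio cannot exceed 1.0.
    return max(floor, smaller / larger)
```

Because only two values are compared, no sorting of the band powers is needed, and the bands above the second do not influence the result.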
[0028] In addition to being slightly faster to compute, Equation 8 has been found to have sound quality benefits. One problem with Equation 6 is that during vocal singing, there can be instances of "ess" where not only does the scaling factor rise toward 1.0, but the signal power also rises. The net effect is a dramatic increase in the power that can cause the downstream loudness processing to, in the case of dynamics processing, apply more gain reduction to the "ess" than is necessary. De-essing is a common tool of audio mixing, but when overused it can become perceptually annoying. Since Equation 8 only looks at the lower frequency bands, the scaling factor does not rise as quickly during the sibilance "ess" in vocal singing.
Implementation
[0029] The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, algorithms and processes included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
[0030] Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
[0031] Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
[0032] A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus can be performed in an order different from that described.
Claims
1. A method for controlling the loudness of an audio signal, comprising receiving an audio signal, generating an estimate of the spectral localization of the audio signal, generating a broadband measure of loudness of the audio signal, modifying said broadband measure of loudness in response to an estimate of the spectral localization of the audio signal, and modifying the broadband level of the audio signal in response to the modified broadband measure of loudness.
2. The method of claim 1 wherein generating an estimate of the spectral localization of the audio signal includes determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed.
3. The method of claim 2 wherein determining the degree to which a majority of the audio signal's energy is within half of the signal's total audio bandwidth when a perceptual frequency banding scale is employed includes dividing the audio signal into a plurality of frequency bands and generating a scaling factor in response to the relative level in two frequency bands.
4. The method of any one of claims 1-3 further comprising temporally smoothing the modified broadband measure of loudness and wherein the broadband level of the audio signal is modified in response to the smoothed modified broadband measure of loudness.
5. The method of claim 3 or claim 4 as dependent on claim 3 wherein the two frequency bands are the bands having the second largest and largest level power values.
6. The method of any one of claim 3, claim 4 as dependent on claim 3, or claim 5 wherein generating a scaling factor in response to the relative level in two frequency bands uses the two lowest frequency bands.
7. A method according to any one of claims 1-6 wherein generating a broadband measure of loudness of the audio signal determines the broadband measure of loudness of the audio signal after its processing by a weighting filter.
8. A method according to any one of claims 3-7 wherein the broadband measure of loudness and the level in two or more of the frequency bands are each based on short-term levels.
9. A method according to any one of claims 4-8 wherein the temporally smoothing has one or more time constants useful for syllabic speech processing.
10. Apparatus comprising means adapted to perform the method of any one of claims 1 through 9.
11. A computer program, stored on a computer-readable medium, for causing a computer to perform the method of any one of claims 1 through 9.
12. A computer-readable medium storing thereon the computer program performing the method of any one of claims 1 through 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/265,691 US8761415B2 (en) | 2009-04-30 | 2010-04-28 | Controlling the loudness of an audio signal in response to spectral localization |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17446809P | 2009-04-30 | 2009-04-30 | |
US61/174,468 | 2009-04-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2010127024A1 (en) | 2010-11-04 |
Family
ID=42288540
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2010/032807 WO2010127024A1 (en) | 2009-04-30 | 2010-04-28 | Controlling the loudness of an audio signal in response to spectral localization |
Country Status (3)
Country | Link |
---|---|
US (1) | US8761415B2 (en) |
TW (1) | TWI538393B (en) |
WO (1) | WO2010127024A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8891789B2 (en) | 2009-05-06 | 2014-11-18 | Dolby Laboratories Licensing Corporation | Adjusting the loudness of an audio signal with perceived spectral balance preservation |
US8938313B2 (en) | 2009-04-30 | 2015-01-20 | Dolby Laboratories Licensing Corporation | Low complexity auditory event boundary detection |
CN108281148A (en) * | 2016-12-30 | 2018-07-13 | 宏碁股份有限公司 | Speech signal processing device and audio signal processing method |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101597375B1 (en) | 2007-12-21 | 2016-02-24 | 디티에스 엘엘씨 | System for adjusting perceived loudness of audio signals |
US8538042B2 (en) | 2009-08-11 | 2013-09-17 | Dts Llc | System for increasing perceived loudness of speakers |
JP5942463B2 (en) * | 2012-02-17 | 2016-06-29 | 株式会社ソシオネクスト | Audio signal encoding apparatus and audio signal encoding method |
JP5827442B2 (en) | 2012-04-12 | 2015-12-02 | ドルビー ラボラトリーズ ライセンシング コーポレイション | System and method for leveling loudness changes in an audio signal |
US9312829B2 (en) | 2012-04-12 | 2016-04-12 | Dts Llc | System for adjusting loudness of audio signals in real time |
US9685921B2 (en) * | 2012-07-12 | 2017-06-20 | Dts, Inc. | Loudness control with noise detection and loudness drop detection |
CN104080024B (en) | 2013-03-26 | 2019-02-19 | 杜比实验室特许公司 | Volume leveller controller and control method and audio classifiers |
US10142731B2 (en) | 2016-03-30 | 2018-11-27 | Dolby Laboratories Licensing Corporation | Dynamic suppression of non-linear distortion |
TWI590236B (en) * | 2016-12-09 | 2017-07-01 | 宏碁股份有限公司 | Voice signal processing apparatus and voice signal processing method |
US9860644B1 (en) | 2017-04-05 | 2018-01-02 | Sonos, Inc. | Limiter for bass enhancement |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5878391A (en) * | 1993-07-26 | 1999-03-02 | U.S. Philips Corporation | Device for indicating a probability that a received signal is a speech signal |
US20040044525A1 (en) * | 2002-08-30 | 2004-03-04 | Vinton Mark Stuart | Controlling loudness of speech in signals that contain speech and other types of audio material |
US20070092089A1 (en) * | 2003-05-28 | 2007-04-26 | Dolby Laboratories Licensing Corporation | Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8090120B2 (en) | 2004-10-26 | 2012-01-03 | Dolby Laboratories Licensing Corporation | Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal |
TWI517562B (en) | 2006-04-04 | 2016-01-11 | 杜比實驗室特許公司 | Method, apparatus, and computer program for scaling the overall perceived loudness of a multichannel audio signal by a desired amount |
MY141426A (en) | 2006-04-27 | 2010-04-30 | Dolby Lab Licensing Corp | Audio gain control using specific-loudness-based auditory event detection |
US8849433B2 (en) | 2006-10-20 | 2014-09-30 | Dolby Laboratories Licensing Corporation | Audio dynamics processing using a reset |
US8521314B2 (en) | 2006-11-01 | 2013-08-27 | Dolby Laboratories Licensing Corporation | Hierarchical control path with constraints for audio dynamics processing |
WO2008085330A1 (en) | 2007-01-03 | 2008-07-17 | Dolby Laboratories Licensing Corporation | Hybrid digital/analog loudness-compensating volume control |
EP2162879B1 (en) | 2007-06-19 | 2013-06-05 | Dolby Laboratories Licensing Corporation | Loudness measurement with spectral modifications |
WO2009011826A2 (en) | 2007-07-13 | 2009-01-22 | Dolby Laboratories Licensing Corporation | Time-varying audio-signal level using a time-varying estimated probability density of the level |
TWI503816B (en) | 2009-05-06 | 2015-10-11 | Dolby Lab Licensing Corp | Adjusting the loudness of an audio signal with perceived spectral balance preservation |
2010
- 2010-04-28: WO application PCT/US2010/032807, WO2010127024A1, active (Application Filing)
- 2010-04-28: US application US13/265,691, US8761415B2, active
- 2010-04-29: TW application TW099113664A, TWI538393B, active
Non-Patent Citations (1)
Title |
---|
ALAN SEEFELDT ET AL: "A new objective measure of perceived loudness", AUDIO ENGINEERING SOCIETY CONVENTION PAPER, NEW YORK, NY, US, 28 October 2004 (2004-10-28), XP009087934 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8938313B2 (en) | 2009-04-30 | 2015-01-20 | Dolby Laboratories Licensing Corporation | Low complexity auditory event boundary detection |
US8891789B2 (en) | 2009-05-06 | 2014-11-18 | Dolby Laboratories Licensing Corporation | Adjusting the loudness of an audio signal with perceived spectral balance preservation |
CN108281148A (en) * | 2016-12-30 | 2018-07-13 | 宏碁股份有限公司 | Speech signal processing device and audio signal processing method |
CN108281148B (en) * | 2016-12-30 | 2020-12-22 | 宏碁股份有限公司 | Speech signal processing apparatus and speech signal processing method |
Also Published As
Publication number | Publication date |
---|---|
TWI538393B (en) | 2016-06-11 |
US8761415B2 (en) | 2014-06-24 |
TW201106619A (en) | 2011-02-16 |
US20120039490A1 (en) | 2012-02-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8761415B2 (en) | Controlling the loudness of an audio signal in response to spectral localization | |
US20220394380A1 (en) | Audio Control Using Auditory Event Detection | |
US6915264B2 (en) | Cochlear filter bank structure for determining masked thresholds for use in perceptual audio coding | |
EP2381574B1 (en) | Apparatus and method for modifying an input audio signal | |
EP1629463B1 (en) | Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal | |
US9672834B2 (en) | Dynamic range compression with low distortion for use in hearing aids and audio systems | |
AU2011244268A1 (en) | Apparatus and method for modifying an input audio signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10716978; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | Wipo information: entry into national phase | Ref document number: 13265691; Country of ref document: US |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 10716978; Country of ref document: EP; Kind code of ref document: A1 |