WO2009011827A1 - Audio processing using auditory scene analysis and spectral skewness - Google Patents

Info

Publication number
WO2009011827A1
Authority
WO
WIPO (PCT)
Prior art keywords
auditory
skewness
weighting
loudness
audio signal
Prior art date
Application number
PCT/US2008/008592
Other languages
French (fr)
Inventor
Michael John Smithers
Alan Jeffery Seefeldt
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=39776994&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=WO2009011827(A1) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Dolby Laboratories Licensing Corporation
Priority to AT08780174T (ATE535906T1)
Priority to EP08780174A (EP2168122B1)
Priority to ES08780174T (ES2377719T3)
Priority to JP2010517000A (JP5192544B2)
Priority to CN2008800245251A (CN101790758B)
Priority to BRPI0813723A (BRPI0813723B1)
Priority to US12/668,741 (US8396574B2)
Publication of WO2009011827A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03G: CONTROL OF AMPLIFICATION
    • H03G9/00: Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/005: Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03G: CONTROL OF AMPLIFICATION
    • H03G9/00: Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/02: Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers
    • H03G9/12: Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers having semiconductor devices
    • H03G9/18: Combinations of two or more types of control, e.g. gain control and tone control in untuned amplifiers having semiconductor devices for tone control and volume expansion or compression

Definitions

  • the invention relates to audio processing, in general, and to auditory scene analysis and spectral skewness, in particular.
  • Crockett and Seefeldt, International Application under the Patent Cooperation Treaty, S.N. PCT/US2007/008313, entitled, "Controlling Dynamic Gain Parameters of Audio using Auditory Scene Analysis and Specific-Loudness-Based Detection of Auditory Events," naming Brett Graham Crockett and Alan Jeffrey Seefeldt as inventors, filed March 30, 2007, with Attorney Docket DOL186 PCT, and published on November 8, 2007 as WO 2007/127023;
  • Crockett, U.S. Patent Application S.N. 10/474,387, entitled, "High Quality Time-Scaling and Pitch-Scaling of Audio Signals," naming Brett Graham Crockett as the inventor, filed October 10, 2003, with Attorney Docket No. DOL07503, and published on June 24, 2004 as US 2004/0122662 A1;
  • Crockett et al., U.S. Patent Application S.N. 10/478,398, entitled, "Method for Time Aligning Audio Signals Using Characterizations Based on Auditory Events," naming Brett G. Crockett et al. as inventors, filed November 20, 2003, with Attorney Docket No. DOL09201, and published July 29, 2004 as US 2004/0148159 A1;
  • Crockett, U.S. Patent Application S.N. 10/478,538, entitled, "Segmenting Audio Signals Into Auditory Events," naming Brett G. Crockett as the inventor, filed November 20, 2003, with Attorney Docket No. DOL098, and published August 26, 2004 as US 2004/0165730 A1;
  • Crockett et al., U.S. Patent Application S.N. 10/478,397, entitled, "Comparing Audio Using Characterizations Based on Auditory Events," naming Brett G. Crockett et al. as inventors, filed November 20, 2003, with Attorney Docket No. DOL092, and published September 2, 2004 as US 2004/0172240 A1;
  • Audio Based on Auditory Scene Analysis, Audio Engineering Society Convention Paper 6416, 118th Convention, Barcelona, May 28-31, 2005;
  • Crockett, B. "High Quality Multichannel Time Scaling and Pitch-Shifting using Auditory Scene Analysis," Audio Engineering Society Convention Paper 5948, New York, October 2003;
  • ASA auditory scene analysis
  • the segments are sometimes referred to as “auditory events” or “audio events.”
  • Albert S. Bregman, "Auditory Scene Analysis: The Perceptual Organization of Sound," Massachusetts Institute of Technology, 1991, Fourth printing, 2001, Second MIT Press paperback edition
  • Bhadkamkar et al. U.S. Pat. No. 6,002,776 (Dec. 14, 1999) cites publications dating back to 1976 as “prior art work related to sound separation by auditory scene analysis.”
  • Crockett and Crockett et al. in the various patent applications and papers listed above identify auditory events. Those documents teach dividing an audio signal into auditory events (each tending to be perceived as separate and distinct) by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, comparing the spectral content between successive time blocks and identifying an auditory event boundary as the boundary between blocks where the difference in the spectral content exceeds a threshold. Alternatively, changes in amplitude with respect to time may be calculated instead of or in addition to changes in spectral composition with respect to time.
  • the auditory event boundary markers are often arranged into a temporal control signal whereby the range, typically zero to one, indicates the strength of the event boundary. Furthermore, this control signal is often filtered such that event boundary strength remains, and time intervals between the event boundaries are calculated as decaying values of the preceding event boundary. This filtered auditory event strength is then used by other audio processing methods including automatic gain control and dynamic range control.
  • AGC automatic gain control
  • DRC dynamic range control
  • auditory scene analysis improves the performance of AGC and DRC methods by minimizing the change in gain between auditory event boundaries, and confining much of the gain change to the neighborhood of an event boundary. It does this by modifying the dynamics-processing release behavior. In this way, auditory events sound consistent and natural.
  • Notes played on a piano are an example.
  • With conventional AGC or DRC methods, the gain applied to the audio signal increases during the tail of each note, causing each note to swell unnaturally.
  • the AGC or DRC gain is held constant within each note and changes only near the onset of each note where an auditory event boundary is detected.
  • the resulting gain-adjusted audio signal sounds natural as the tail of each note dies away.
  • Typical implementations of auditory scene analysis (as in the references above) are deliberately level invariant. That is, they detect auditory event boundaries regardless of absolute signal level. While level invariance is useful in many applications, some auditory scene analyses benefit from some level dependence.
  • ASA control of AGC and DRC prevents large gain changes between auditory event boundaries.
  • longer-term gain changes can still be undesirable on some types of audio signals.
  • the AGC or DRC gain, constrained to change only near event boundaries, may allow the level of the processed audio signal to rise undesirably and unnaturally during the quiet section. This situation occurs frequently in films where sporadic dialog alternates with quiet background sounds. Because the quiet background audio signal also contains auditory events, the AGC or DRC gain is changed near those event boundaries, and the overall audio signal level rises.
  • perceptually quieter refers not to quieter on an objective loudness measure (as in Seefeldt et al. and Seefeldt) but rather quieter based on the expected loudness of the content. For example, human experience indicates that a whisper is a quiet sound. If a dynamics processing system measures this to be quiet and consequently increases the AGC gain to achieve some nominal output loudness or level, the resulting gain-adjusted whisper would be louder than experience says it should be.
  • the method includes weighting the auditory events (an auditory event having a spectrum and a loudness), using skewness in the spectra and controlling loudness of the auditory events, using the weights.
  • the weighting being proportionate to the measure of skewness in the spectra; the measure of skewness is a measure of smoothed skewness; the weighting is insensitive to amplitude of the audio signal; the weighting is insensitive to power; the weighting is insensitive to loudness; any relationship between signal measure and absolute reproduction level is not known at the time of weighting; the weighting includes weighting auditory-event-boundary importance, using skewness in the spectra; and reducing swelling of AGC or DRC processing level during perceptibly quieter segments of the audio signal as compared to methods not performing the claimed weighting.
  • the invention is a computer-readable memory containing a computer program for performing any one of the above methods.
  • the invention is a computer system including a CPU, one of the above-mentioned memories and a bus communicatively coupling the CPU and the memory.
  • the invention is an audio-signal processor including a spectral-skewness calculator for calculating the spectral skewness in an audio signal, an auditory-events identifier for identifying and weighting auditory events in the audio signal, using the calculated spectral skewness, a parameters modifier for modifying parameters for controlling the loudness of auditory events in the audio signal and a controller for controlling the loudness of auditory events in the audio signal.
  • the invention is a method for controlling the loudness of auditory events in an audio signal, including calculating measures of skewness of spectra of successive auditory events of an audio signal, generating weights for the auditory events based on the measures of skewness, deriving a control signal from the weights and controlling the loudness of the auditory events using the control signal.
  • FIG. 1 illustrates a device for performing two Crockett and Seefeldt methods of analyzing auditory scenes and controlling dynamics-gain parameters.
  • FIG. 2 illustrates an audio processor for identifying auditory events and calculating skewness for modifying the auditory events, which are themselves used for modifying the dynamics-processing parameters, according to an embodiment of the invention.
  • FIG. 3 is a series of graphs illustrating the use of auditory events to control the release time in a digital implementation of a Dynamic Range Controller (DRC), according to one embodiment of the invention.
  • DRC Dynamic Range Controller
  • FIG. 4 is an idealized characteristic response of a linear filter suitable as a transmission filter according to an embodiment of the invention.
  • FIG. 5 shows a set of idealized auditory-filter characteristic responses that approximate critical banding on the ERB scale.
  • FIG. 1 illustrates a device 1 for analyzing auditory scenes and controlling dynamics-gain parameters according to Crockett and Seefeldt.
  • the device includes an auditory-events identifier 10, an optional auditory-events-characteristics identifier 11 and a dynamics-parameters modifier 12.
  • the auditory-events identifier 10 receives audio as input and produces an input for the dynamics-parameters modifier 12 (and an input for the auditory-events-characteristics identifier 11, if present).
  • the dynamics-parameters modifier 12 receives output of the auditory-events identifier 10 (and auditory-events-characteristics identifier 11, if present) and produces an output.
  • the auditory-events identifier 10 analyzes the spectrum and from the results identifies the location of perceptible audio events that are to control the dynamics-gain parameters.
  • the auditory-events identifier 10 transforms the audio into a perceptual-loudness domain (that may provide more psychoacoustically relevant information than the first method) and in the perceptual-loudness domain identifies the location of auditory events that are to control the dynamics-gain parameters.
  • the audio processing is aware of absolute acoustic-reproduction levels.
  • the dynamics-parameters modifier 12 modifies the dynamics parameters based on the output of the auditory-events identifier 10 (and auditory-events-characteristics identifier 11, if present).
  • a digital audio signal x[n] is segmented into blocks, and for each block t, D[t] represents the spectral difference between the current block and the previous block.
  • D[t] is the sum, across all spectral coefficients, of the magnitude of the difference between normalized log spectral coefficients (in dB) for the current block t and the previous block t - 1.
  • D[t] is proportional to absolute differences in spectra (itself in dB).
  • D[t] is the sum, across all specific-loudness coefficients, of the magnitude of the difference between normalized specific-loudness coefficients for the current block t and the previous block t - 1.
  • D[t] is proportional to absolute differences in specific loudness (in sone).
  • if D[t] exceeds a threshold $D_{\min}$, then an event is considered to have occurred.
  • the event may have a strength, between zero and one, based on the ratio of D[t] minus $D_{\min}$ to the difference between $D_{\max}$ and $D_{\min}$.
  • the strength A[t] may be computed as:
  • $$A[t] = \begin{cases} 0, & D[t] \le D_{\min} \\ \dfrac{D[t] - D_{\min}}{D_{\max} - D_{\min}}, & D_{\min} < D[t] < D_{\max} \\ 1, & D[t] \ge D_{\max} \end{cases} \qquad (1)$$
  • the signal A[t] is an impulsive signal with an impulse occurring at the location of an event boundary. For the purposes of controlling the release time, one may further smooth the signal A[t] so that it decays smoothly to zero after the detection of an event boundary.
  • the smoothed event control signal $\bar{A}[t]$ may be computed from A[t] according to:
  • $\alpha_{event}$ controls the decay time of the event control signal.
  • FIG. 3 is a sequence of graphs illustrating the operation and effect of the invention, according to one embodiment. "b)" in FIG. 3 depicts the event control signal A[t] for the corresponding audio signal of "a)" in FIG. 3, with the half-decay time of the smoother set to 250 ms.
  • the audio signal contains three bursts of dialog, interspersed with quiet background campfire crackling sounds.
  • the event control signal shows many auditory events in both the dialog and the background sounds.
  • an embodiment of the invention modifies or weights the auditory strength A[t] using a measure of the asymmetry of the audio signal spectrum.
  • An embodiment of the invention calculates the spectral skewness of the excitation of the audio signal.
  • Skewness is a statistical measure of the asymmetry of a probability distribution.
  • a distribution symmetrical about the mean has zero skew.
  • a distribution with its bulk or mass concentrated above the mean and with a long tail tending lower than the mean has a negative skew.
  • a distribution concentrated below the mean and with a long tail tending higher than the mean has a positive skew.
  • the magnitude or power spectrum of a typical audio signal has positive skew. That is, the bulk of the energy in the spectrum is concentrated lower in the spectrum, and the spectrum has a long tail toward the upper part of the spectrum.
  • FIG. 2 illustrates an audio processor 2 according to an embodiment of the invention.
  • the audio processor 2 includes the dynamics-parameters modifier 12 and the optional auditory-events-characteristics identifier 11 of FIG. 1, as well as an auditory-events identifier 20 and a skewness calculator 21.
  • the skewness calculator 21 and auditory-events identifier 20 both receive the audio signal 13, and the skewness calculator 21 produces input for the auditory-events identifier 20.
  • the auditory-events identifier 20, auditory-events-characteristics identifier 11 and dynamics-parameters modifier 12 are otherwise connected as are their counterparts in FIG. 1. In FIG.
  • the skewness calculator 21 calculates the skewness from a spectral representation of the audio signal 13, and the auditory-events identifier 20 calculates the auditory scene analysis from the same spectral representation.
  • the audio signal 13 may be grouped into 50 percent overlapping blocks of M samples, and the Discrete Fourier Transform may be computed as follows:
  • the block size for the transform is assumed to be the same as that for calculating the auditory event signal. This need not be the case, however. Where different block rates exist, signals on one block rate may be interpolated or rate converted onto the same timescale as signals on the other block rate.
  • excitation signal E[b,t] approximating the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t is computed:
  • T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear
  • $C_b[k]$ represents the frequency response of the basilar membrane at a location corresponding to critical band b.
  • FIG. 4 depicts the frequency response of a suitable transmission filter T[k].
  • FIG. 5 depicts a suitable set of critical band filter responses, corresponding to $C_b[k]$, in which
  • $\sum_b E[b,t]$ (6) and $\sigma$ is the variance of the excitation signal:
  • the skewness signal SK[t] of equation (5) fluctuates considerably and requires smoothing to avoid artifacts when modifying the event control signal and subsequent dynamics processing parameters.
  • One embodiment uses a single pole smoother with a decay constant $\alpha_{SK}$ having a half-decay time of approximately 6.5 ms:
  • a constrained skewness SK"[t] may be computed as:
  • the "e)" graph shows the skewness signal that corresponds to the audio signal in "a)" of FIG. 3. The skewness is high for the louder dialog bursts and low for the background sounds.
  • the skewness signal SK"[t] passes to the auditory-events identifier 20 of FIG. 2 that weights the spectral difference measure D[t] as:
  • $$D_{SK}[t] = SK''[t]\,D[t] \qquad (8)$$
  • the skewness-modified auditory strength signal $A_{SK}[t]$ is computed in the same way as A[t] in equation (1):
  • FIG. 3 depicts the skewness-modified event control signal $A_{SK}[t]$ for the corresponding audio signal in "a)" of FIG. 3. Fewer auditory events appear during the background sounds while events corresponding to the louder dialog remain.
  • FIG. 3 shows the skewness-modified event-controlled DRC signal. With fewer auditory events in the background sounds, the DRC gain stays relatively constant and moves only for the louder dialog sections. "h)" in FIG. 3 shows the resulting DRC-modified audio signal.
  • the DRC-modified audio signal has none of the undesirable swelling in level during the background sounds.
  • the skewness signal SK"[t] goes low sometimes for perceptually louder signals.
  • the value of spectral difference measure D[t] is large enough that even after weighting by the skewness signal SK"[t] in equation 8, the weighted spectral difference measure $D_{SK}[t]$ is typically still large enough to indicate an auditory event boundary.
  • the event control signal $A_{SK}[t]$ is not adversely affected.
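The skewness weighting outlined in the items above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the textbook definition of skewness (third central moment divided by the variance raised to the power 1.5), a single-pole smoother of the form y[t] = αy[t-1] + (1-α)x[t], and a linear clamp to the range zero to one for the constrained skewness. The function names and the `sk_min`/`sk_max` limits are hypothetical and do not come from the source.

```python
def spectral_skewness(excitation):
    """Textbook skewness of the per-band excitation values of one block.

    Assumption: population moments are used; the source's exact equation
    is not legible here.
    """
    n = len(excitation)
    mean = sum(excitation) / n
    variance = sum((e - mean) ** 2 for e in excitation) / n
    if variance == 0.0:
        return 0.0  # a flat spectrum has no asymmetry
    third_moment = sum((e - mean) ** 3 for e in excitation) / n
    return third_moment / variance ** 1.5


def smooth(values, alpha):
    """Single-pole smoother (assumed form), starting from zero state."""
    out, state = [], 0.0
    for v in values:
        state = alpha * state + (1.0 - alpha) * v
        out.append(state)
    return out


def constrain(sk, sk_min, sk_max):
    """Map skewness linearly onto [0, 1]; the limits are hypothetical."""
    if sk <= sk_min:
        return 0.0
    if sk >= sk_max:
        return 1.0
    return (sk - sk_min) / (sk_max - sk_min)


def weight_differences(differences, constrained_skewness):
    """Weight each spectral difference D[t] by the constrained skewness."""
    return [s * d for d, s in zip(differences, constrained_skewness)]
```

Because this skewness is a ratio of moments, scaling every excitation value by the same factor leaves it unchanged, which is consistent with the statement that the weighting is insensitive to amplitude.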

Landscapes

  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Holo Graphy (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A method for controlling the loudness of auditory events in an audio signal. In an embodiment, the method includes weighting the auditory events (an auditory event having a spectrum and a loudness), using skewness in the spectra and controlling loudness of the auditory events, using the weights. Various embodiments of the invention are as follows: The weighting being proportionate to the measure of skewness in the spectra; the measure of skewness is a measure of smoothed skewness; the weighting is insensitive to amplitude of the audio signal; the weighting is insensitive to power; the weighting is insensitive to loudness; and any relationship between signal measure and absolute reproduction level is not known at the time of weighting; the weighting includes weighting auditory-event-boundary importance, using skewness in the spectra.

Description

Audio Processing using Auditory Scene Analysis and Spectral Skewness
Technical Field
The invention relates to audio processing, in general, and to auditory scene analysis and spectral skewness, in particular.
References and Incorporation by Reference
The following documents are hereby incorporated by reference in their entirety:
Crockett and Seefeldt, International Application under the Patent Cooperation Treaty, S.N. PCT/US2007/008313, entitled, "Controlling Dynamic Gain Parameters of Audio using Auditory Scene Analysis and Specific-Loudness-Based Detection of Auditory Events," naming Brett Graham Crockett and Alan Jeffrey Seefeldt as inventors, filed March 30, 2007, with Attorney Docket DOL186 PCT, and published on November 8, 2007 as WO 2007/127023;
Seefeldt et al., International Application under the Patent Cooperation Treaty, S.N. PCT/US2004/016964, entitled, "Method, Apparatus and Computer Program for Calculating and Adjusting the Perceived Loudness of an Audio Signal," naming Alan Jeffrey Seefeldt et al. as inventors, filed May 27, 2004, with Attorney Docket No. DOL119 PCT, and published on December 23, 2004 as WO 2004/111994 A2;
Seefeldt, International Application under the Patent Cooperation Treaty, S.N. PCT/US2005/038579, entitled "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal," naming Alan Jeffrey Seefeldt as the inventor, filed October 25, 2005, with Attorney Docket No. DOL15202 PCT, and published on May 4, 2006 as WO 2006/047600;
Crockett, U.S. Patent Application S.N. 10/474,387, entitled, "High Quality Time-Scaling and Pitch-Scaling of Audio Signals," naming Brett Graham Crockett as the inventor, filed October 10, 2003, with Attorney Docket No. DOL07503, and published on June 24, 2004 as US 2004/0122662 A1;
Crockett et al., U.S. Patent Application S.N. 10/478,398, entitled, "Method for Time Aligning Audio Signals Using Characterizations Based on Auditory Events," naming Brett G. Crockett et al. as inventors, filed November 20, 2003, with Attorney Docket No. DOL09201, and published July 29, 2004 as US 2004/0148159 A1;
Crockett, U.S. Patent Application S.N. 10/478,538, entitled, "Segmenting Audio Signals Into Auditory Events," naming Brett G. Crockett as the inventor, filed November 20, 2003, with Attorney Docket No. DOL098, and published August 26, 2004 as US 2004/0165730 A1;
Crockett et al., U.S. Patent Application S.N. 10/478,397, entitled, "Comparing Audio Using Characterizations Based on Auditory Events," naming Brett G. Crockett et al. as inventors, filed November 20, 2003, with Attorney Docket No. DOL092, and published September 2, 2004 as US 2004/0172240 A1;
Smithers, International Application under the Patent Cooperation Treaty S.N. PCT/US05/24630, entitled, "Method for Combining Audio Signals Using Auditory Scene Analysis," naming Michael John Smithers as the inventor, filed July 13, 2005, with Attorney Docket No. DOL148 PCT, and published March 9, 2006 as WO 2006/026161;
Crockett, B. and Smithers, M., "A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis," Audio Engineering Society Convention Paper 6416, 118th Convention, Barcelona, May 28-31, 2005;
Crockett, B., "High Quality Multichannel Time Scaling and Pitch-Shifting using Auditory Scene Analysis," Audio Engineering Society Convention Paper 5948, New York, October 2003; and
Seefeldt et al., "A New Objective Measure of Perceived Loudness," Audio Engineering Society Convention Paper 6236, San Francisco, October 28, 2004.
Background Art
Auditory Events and Auditory Event Detection
The division of sounds into units or segments perceived as separate and distinct is sometimes referred to as "auditory event analysis" or "auditory scene analysis" ("ASA"). The segments are sometimes referred to as "auditory events" or "audio events." Albert S. Bregman, "Auditory Scene Analysis: The Perceptual Organization of Sound" (Massachusetts Institute of Technology, 1991, Fourth printing, 2001, Second MIT Press paperback edition) extensively discusses auditory scene analysis. In addition, Bhadkamkar et al., U.S. Pat. No. 6,002,776 (Dec. 14, 1999) cites publications dating back to 1976 as "prior art work related to sound separation by auditory scene analysis." However, Bhadkamkar et al. discourages the practical use of auditory scene analysis, concluding that "[t]echniques involving auditory scene analysis, although interesting from a scientific point of view as models of human auditory processing, are currently far too computationally demanding and specialized to be considered practical techniques for sound separation until fundamental progress is made."
Crockett and Crockett et al. in the various patent applications and papers listed above identify auditory events. Those documents teach dividing an audio signal into auditory events (each tending to be perceived as separate and distinct) by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, comparing the spectral content between successive time blocks and identifying an auditory event boundary as the boundary between blocks where the difference in the spectral content exceeds a threshold. Alternatively, changes in amplitude with respect to time may be calculated instead of or in addition to changes in spectral composition with respect to time.
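The block-comparison procedure described above might be sketched as follows. This is an illustrative sketch only: it assumes each block has already been reduced to a list of spectral magnitudes, and it normalizes each log spectrum to its own peak so that the comparison does not depend on absolute level. The normalization choice, the `floor` guard and all function names are assumptions made for illustration.

```python
import math


def normalized_log_spectrum(magnitudes, floor=1e-12):
    """Convert magnitudes to dB and shift so the block's peak sits at 0 dB."""
    db = [20.0 * math.log10(max(m, floor)) for m in magnitudes]
    peak = max(db)
    return [v - peak for v in db]


def spectral_difference(prev_mags, cur_mags):
    """Sum of absolute dB differences between two normalized block spectra."""
    prev_db = normalized_log_spectrum(prev_mags)
    cur_db = normalized_log_spectrum(cur_mags)
    return sum(abs(c - p) for c, p in zip(cur_db, prev_db))


def find_event_boundaries(block_spectra, threshold):
    """Return each block index t whose change from block t-1 exceeds threshold."""
    boundaries = []
    for t in range(1, len(block_spectra)):
        if spectral_difference(block_spectra[t - 1], block_spectra[t]) > threshold:
            boundaries.append(t)
    return boundaries


# Four blocks: the spectral shape changes at block 2; block 3 is only louder.
spectra = [[1.0, 0.1, 0.1], [1.0, 0.1, 0.1], [0.1, 0.1, 1.0], [0.2, 0.2, 2.0]]
boundaries = find_event_boundaries(spectra, threshold=10.0)
```

In this toy input the only boundary found is at block 2, where the spectral shape changes. The louder copy in block 3 does not trigger a boundary, because the per-block normalization removes the level difference.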
The auditory event boundary markers are often arranged into a temporal control signal whereby the range, typically zero to one, indicates the strength of the event boundary. Furthermore, this control signal is often filtered such that event boundary strength remains, and time intervals between the event boundaries are calculated as decaying values of the preceding event boundary. This filtered auditory event strength is then used by other audio processing methods including automatic gain control and dynamic range control.
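One way to picture such a control signal is sketched below. It rests on two assumptions: the strength is obtained by clamping a spectral-difference value linearly between two hypothetical limits `d_min` and `d_max`, and the filter holds each new boundary strength and otherwise lets the previous value decay by a constant per-block factor `alpha`. The names and the exact filter form are illustrative, not taken from the cited documents.

```python
def event_strength(d, d_min, d_max):
    """Map a spectral difference onto a boundary strength in [0, 1]."""
    if d <= d_min:
        return 0.0
    if d >= d_max:
        return 1.0
    return (d - d_min) / (d_max - d_min)


def decay_coefficient(half_decay_s, block_rate_hz):
    """Per-block multiplier that halves the signal after half_decay_s seconds."""
    return 0.5 ** (1.0 / (half_decay_s * block_rate_hz))


def smooth_event_signal(strengths, alpha):
    """Keep each boundary strength, then decay it until the next boundary."""
    smoothed, previous = [], 0.0
    for a in strengths:
        previous = max(a, alpha * previous)
        smoothed.append(previous)
    return smoothed


# A full-strength boundary, two quiet blocks, then a weaker boundary.
control = smooth_event_signal([1.0, 0.0, 0.0, 0.3], alpha=0.5)
```

With `alpha` at 0.5 the control signal falls from 1.0 to 0.5 and 0.25 over the two quiet blocks, then jumps to 0.3 at the weaker boundary because that exceeds the decayed value of 0.125.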
Dynamics Processing of Audio
The techniques of automatic gain control (AGC) and dynamic range control (DRC) are well known and common in many audio signal paths. In an abstract sense, both techniques measure the level of an audio signal and then gain-modify the signal by an amount that is a function of the measured level. In a linear, 1:1 dynamics processing system, the input audio is not processed and the output audio signal ideally matches the input audio signal. Additionally, imagine an audio dynamics processing system that automatically measures the input signal and controls the output signal with that measurement. If the input signal rises in level by 6 dB and the processed output signal rises in level by only 3 dB, then the output signal has been compressed by a ratio of 2:1 with respect to the input signal.
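The 2:1 arithmetic above can be checked with a small static compression curve. This is a generic sketch rather than any particular AGC or DRC design: the hard-knee curve and the `threshold_db` parameter are assumptions, introduced only so that the ratio has a level above which it applies.

```python
def compressed_level_db(input_db, threshold_db, ratio):
    """Static hard-knee curve: above threshold, level rises 1/ratio as fast."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio


def gain_db(input_db, threshold_db, ratio):
    """Gain (in dB) the processor applies to reach the compressed level."""
    return compressed_level_db(input_db, threshold_db, ratio) - input_db


# Input rises 6 dB above a -20 dB threshold; with ratio 2 the output rises 3 dB.
rise_out = compressed_level_db(-14.0, -20.0, 2.0) - compressed_level_db(-20.0, -20.0, 2.0)
```

Setting `ratio` to 1 reproduces the linear 1:1 case, in which the output level equals the input level and the applied gain is 0 dB.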
In Crockett and Seefeldt, auditory scene analysis improves the performance of AGC and DRC methods by minimizing the change in gain between auditory event boundaries, and confining much of the gain change to the neighborhood of an event boundary. It does this by modifying the dynamics-processing release behavior. In this way, auditory events sound consistent and natural.
Notes played on a piano are an example. With conventional AGC or DRC methods, the gain applied to the audio signal increases during the tail of each note, causing each note to swell unnaturally. With auditory scene analysis, the AGC or DRC gain is held constant within each note and changes only near the onset of each note where an auditory event boundary is detected. The resulting gain-adjusted audio signal sounds natural as the tail of each note dies away. Typical implementations of auditory scene analysis (as in the references above) are deliberately level invariant. That is, they detect auditory event boundaries regardless of absolute signal level. While level invariance is useful in many applications, some auditory scene analyses benefit from some level dependence.
One such case is the method described in Crockett and Seefeldt. There, ASA control of AGC and DRC prevents large gain changes between auditory event boundaries. However, longer-term gain changes can still be undesirable on some types of audio signals. When an audio signal goes from a louder to a quieter section, the AGC or DRC gain, constrained to change only near event boundaries, may allow the level of the processed audio signal to rise undesirably and unnaturally during the quiet section. This situation occurs frequently in films where sporadic dialog alternates with quiet background sounds. Because the quiet background audio signal also contains auditory events, the AGC or DRC gain is changed near those event boundaries, and the overall audio signal level rises.
Simply weighting the importance of auditory events by a measure of the audio signal level, power or loudness is undesirable. In many situations the relationship between the signal measure and absolute reproduction level is not known. Ideally, a measure discriminating or detecting perceptually quieter audio signals independent of the absolute level of the audio signal would be useful. Here, "perceptually quieter" refers not to quieter on an objective loudness measure (as in Seefeldt et al. and Seefeldt) but rather quieter based on the expected loudness of the content. For example, human experience indicates that a whisper is a quiet sound. If a dynamics processing system measures this to be quiet and consequently increases the AGC gain to achieve some nominal output loudness or level, the resulting gain-adjusted whisper would be louder than experience says it should be.
Disclosure of the Invention
Herein are taught methods and apparatus for controlling the loudness of auditory events in an audio signal. In an embodiment, the method includes weighting the auditory events (an auditory event having a spectrum and a loudness), using skewness in the spectra and controlling loudness of the auditory events, using the weights. Various embodiments of the invention are as follows: the weighting is proportionate to the measure of skewness in the spectra; the measure of skewness is a measure of smoothed skewness; the weighting is insensitive to amplitude of the audio signal; the weighting is insensitive to power; the weighting is insensitive to loudness; any relationship between signal measure and absolute reproduction level is not known at the time of weighting; the weighting includes weighting auditory-event-boundary importance, using skewness in the spectra; and reducing swelling of AGC or DRC processing level during perceptibly quieter segments of the audio signal as compared to methods not performing the claimed weighting.
In other embodiments, the invention is a computer-readable memory containing a computer program for performing any one of the above methods.
In still other embodiments, the invention is a computer system including a CPU, one of the above-mentioned memories and a bus communicatively coupling the CPU and the memory.
In still another embodiment, the invention is an audio-signal processor including a spectral-skewness calculator for calculating the spectral skewness in an audio signal, an auditory-events identifier for identifying and weighting auditory events in the audio signal, using the calculated spectral skewness, a parameters modifier for modifying parameters for controlling the loudness of auditory events in the audio signal and a controller for controlling the loudness of auditory events in the audio signal.
In still another embodiment, the invention is a method for controlling the loudness of auditory events in an audio signal, including calculating measures of skewness of spectra of successive auditory events of an audio signal, generating weights for the auditory events based on the measures of skewness, deriving a control signal from the weights and controlling the loudness of the auditory events using the control signal.
The various features of the present invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings in which like reference numerals refer to like elements.
Description of the Drawings
FIG. 1 illustrates a device for performing two Crockett and Seefeldt methods of analyzing auditory scenes and controlling dynamics-gain parameters.
FIG. 2 illustrates an audio processor for identifying auditory events and calculating skewness to modify the auditory events, which are in turn used to modify the dynamics-processing parameters, according to an embodiment of the invention.
FIG. 3 is a series of graphs illustrating the use of auditory events to control the release time in a digital implementation of a Dynamic Range Controller (DRC), according to one embodiment of the invention.
FIG. 4 is an idealized characteristic response of a linear filter suitable as a transmission filter according to an embodiment of the invention.
FIG. 5 shows a set of idealized auditory-filter characteristic responses that approximate critical banding on the ERB scale.
Best Mode for Carrying Out the Invention
FIG. 1 illustrates a device 1 for analyzing auditory scenes and controlling dynamics-gain parameters according to Crockett and Seefeldt. The device includes an auditory-events identifier 10, an optional auditory-events-characteristics identifier 11 and a dynamics-parameters modifier 12. The auditory-events identifier 10 receives audio as input and produces an input for the dynamics-parameters modifier 12 (and an input for the auditory-events-characteristics identifier 11, if present). The dynamics-parameters modifier 12 receives output of the auditory-events identifier 10 (and auditory-events-characteristics identifier 11, if present) and produces an output.
The auditory-events identifier 10 analyzes the spectrum and from the results identifies the location of perceptible audio events that are to control the dynamics-gain parameters. Alternatively, the auditory-events identifier 10 transforms the audio into a perceptual-loudness domain (that may provide more psychoacoustically relevant information than the first method) and in the perceptual-loudness domain identifies the location of auditory events that are to control the dynamics-gain parameters. (In this alternative, the audio processing is aware of absolute acoustic-reproduction levels.)
The dynamics-parameters modifier 12 modifies the dynamics parameters based on the output of the auditory-events identifier 10 (and auditory-events-characteristics identifier 11, if present).
In both alternatives, a digital audio signal x[n] is segmented into blocks, and for each block t, D[t] represents the spectral difference between the current block and the previous block.
For the first alternative, D[t] is the sum, across all spectral coefficients, of the magnitude of the difference between normalized log spectral coefficients (in dB) for the current block t and the previous block t - 1. In this alternative, D[t] is proportional to absolute differences in spectra (itself in dB). For the second alternative, D[t] is the sum, across all specific-loudness coefficients, of the magnitude of the difference between normalized specific-loudness coefficients for the current block t and the previous block t - 1. In this alternative, D[t] is proportional to absolute differences in specific loudness (in sone).
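The first alternative can be sketched as follows. This is a minimal illustration, not the referenced implementation: the peak normalization (largest coefficient mapped to 0 dB), the `floor_db` clamp and both function names are assumptions introduced here.

```python
import math

def normalized_log_spectrum(magnitudes, floor_db=-60.0):
    """Convert magnitudes to dB relative to the block's own peak.

    Peak normalization and the floor_db clamp are assumptions of this sketch.
    """
    peak = max(magnitudes)
    if peak <= 0.0:
        return [floor_db] * len(magnitudes)
    out = []
    for m in magnitudes:
        db = 20.0 * math.log10(m / peak) if m > 0.0 else floor_db
        out.append(max(db, floor_db))
    return out

def spectral_difference(prev_mags, curr_mags):
    """Sum over coefficients of |current dB - previous dB| after normalization."""
    prev_db = normalized_log_spectrum(prev_mags)
    curr_db = normalized_log_spectrum(curr_mags)
    return sum(abs(c - p) for c, p in zip(curr_db, prev_db))
```

Because each block is normalized to its own peak, a pure change in level (the same spectral shape at ten times the amplitude) yields a difference of zero, which is the level invariance discussed earlier.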
In both alternatives, if D[t] exceeds a threshold D_min, then an event is considered to have occurred. The event may have a strength, between zero and one, based on the ratio of D[t] minus D_min to the difference between D_max and D_min. The strength A[t] may be computed as:
$$A[t] = \begin{cases} 0, & D[t] \le D_{\min} \\ \dfrac{D[t] - D_{\min}}{D_{\max} - D_{\min}}, & D_{\min} < D[t] < D_{\max} \\ 1, & D[t] \ge D_{\max} \end{cases} \qquad (1)$$
The maximum and minimum limits are different for each alternative, due to their different units. The result, however, from both is an event strength in the range 0 to 1. Other alternatives may calculate an event strength, but the alternative expressed in equation (1) has proved itself in a number of areas, including controlling dynamics processing. Assigning a strength (proportional to the amount of spectral change associated with that event) to the auditory event allows greater control over the dynamics processing, compared to a binary event decision. Larger gain changes are acceptable during stronger events, and the signal in equation (1) allows such variable control.
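The mapping from spectral difference to event strength is a clamped linear ramp. A minimal sketch follows; the function name and the threshold values used in the usage note are hypothetical.

```python
def event_strength(d, d_min, d_max):
    """Map a spectral difference d to an event strength in [0, 1]."""
    if d <= d_min:
        return 0.0  # below the detection threshold: no event
    if d >= d_max:
        return 1.0  # at or above the upper limit: full-strength event
    return (d - d_min) / (d_max - d_min)
```

With illustrative limits of 10 and 20, a difference of 15 gives a strength of 0.5, while differences of 5 and 25 give 0 and 1 respectively.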
The signal A[t] is an impulsive signal with an impulse occurring at the location of an event boundary. For the purposes of controlling the release time, one may further smooth the signal A[t] so that it decays smoothly to zero after the detection of an event boundary. The smoothed event control signal Ā[t] may be computed from A[t] according to:
$$\bar{A}[t] = \begin{cases} A[t], & A[t] > \alpha_{event}\,\bar{A}[t-1] \\ \alpha_{event}\,\bar{A}[t-1], & \text{otherwise} \end{cases} \qquad (2)$$
Here α_event controls the decay time of the event control signal.
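A smoother of this kind, which jumps to each new impulse and otherwise decays exponentially toward zero, might be sketched as below. The peak-hold form, the function names and the conversion from half-decay time to a per-block coefficient are assumptions of this sketch; the block period depends on the sample rate and hop size in use.

```python
def decay_coefficient(half_decay_s, block_period_s):
    """Per-block decay factor so the signal halves every half_decay_s seconds."""
    return 0.5 ** (block_period_s / half_decay_s)

def smooth_event_signal(strengths, alpha):
    """Hold each new impulse, then let the control signal decay by alpha per block."""
    smoothed = []
    prev = 0.0
    for a in strengths:
        decayed = alpha * prev
        prev = a if a > decayed else decayed
        smoothed.append(prev)
    return smoothed
```

For example, with a 250 ms half-decay time and a hypothetical block period of 250 ms the coefficient is 0.5, so an isolated unit impulse produces 1, 0.5, 0.25 and so on until the next sufficiently strong event.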
FIG. 3 is a sequence of graphs illustrating the operation and effect of the invention, according to one embodiment. In FIG. 3, "b)" depicts the smoothed event control signal Ā[t] for the corresponding audio signal of "a)" in FIG. 3, with the half-decay time of the smoother set to 250 ms. The audio signal contains three bursts of dialog, interspersed with quiet background campfire crackling sounds. The event control signal shows many auditory events in both the dialog and the background sounds.
In FIG. 3, "c)" shows the DRC gain signal where the smoothed event control signal Ā[t] is used to vary the release time constant for the DRC gain smoothing. As Crockett and Seefeldt describe, when the control signal is equal to one, the release smoothing coefficient is unaffected, and the smoothed gain changes according to the value of the time constant. When the control signal is equal to zero, the smoothed gain is prevented from changing. When the control signal is between zero and one, the smoothed gain is allowed to change, but at a reduced rate in proportion to the control signal.
In "c)" of FIG. 3, the DRC gain rises during the quiet background sounds due to the number of events detected in the background. The resulting DRC-modified audio signal in "d)" of FIG. 3 has audible and undesirable swelling of the background noise between the bursts of dialog.
To reduce the gain change during quiet background sounds, an embodiment of the invention modifies or weights the auditory strength A[t] using a measure of the asymmetry of the audio signal spectrum. An embodiment of the invention calculates the spectral skewness of the excitation of the audio signal.
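The event-controlled release behavior described above for graph "c)" can be illustrated with a one-pole gain smoother whose release coefficient is scaled by the control signal. This is only a sketch under stated assumptions: the formula for the effective coefficient, the instantaneous attack and the function name are hypothetical and are not taken from Crockett and Seefeldt.

```python
def smooth_release_gain(target_gains, control, alpha_release, initial_gain=1.0):
    """Smooth a rising (release-phase) gain at a rate scaled by the control signal.

    control = 1 leaves the release coefficient unchanged; control = 0 freezes
    the gain. Falling gains (attack) are applied at once, an assumption made
    to keep the sketch short.
    """
    smoothed = []
    prev = initial_gain
    for g, c in zip(target_gains, control):
        if g <= prev:
            prev = g  # attack: follow the target immediately
        else:
            # Effective coefficient moves from alpha_release (c = 1) to 1.0 (c = 0).
            alpha_eff = 1.0 - c * (1.0 - alpha_release)
            prev = alpha_eff * prev + (1.0 - alpha_eff) * g
        smoothed.append(prev)
    return smoothed
```

With a control signal of zero the gain stays where it was, which is the behavior that keeps a note's tail or a quiet passage from swelling; every detected event in the background raises the control signal and lets the gain creep upward.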
Skewness is a statistical measure of the asymmetry of a probability distribution. A distribution symmetrical about the mean has zero skew. A distribution with its bulk or mass concentrated above the mean and with a long tail tending lower than the mean has a negative skew. A distribution concentrated below the mean and with a long tail tending higher than the mean has a positive skew. The magnitude or power spectrum of a typical audio signal has positive skew. That is, the bulk of the energy in the spectrum is concentrated lower in the spectrum, and the spectrum has a long tail toward the upper part of the spectrum.
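For reference, the usual moment-based definition of the skewness of a random variable X with mean μ and standard deviation σ is shown below. It is the textbook form, given here as background rather than as the formula of any particular embodiment:

$$\gamma_1 = \frac{\mathrm{E}\left[(X - \mu)^3\right]}{\sigma^3}$$

The cube preserves the sign of each deviation, so a long upper tail gives a positive value and a long lower tail a negative one. Dividing by σ³ makes the measure independent of the scale of X.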
FIG. 2 illustrates an audio processor 2 according to an embodiment of the invention. The audio processor 2 includes the dynamics-parameters modifier 12 and the optional auditory-events-characteristics identifier 11 of FIG. 1, as well as an auditory-events identifier 20 and a skewness calculator 21. The skewness calculator 21 and auditory-events identifier 20 both receive the audio signal 13, and the skewness calculator 21 produces input for the auditory-events identifier 20. The auditory-events identifier 20, auditory-events-characteristics identifier 11 and dynamics-parameters modifier 12 are otherwise connected as are their counterparts in FIG. 1. In FIG. 2, the skewness calculator 21 calculates the skewness from a spectral representation of the audio signal 13, and the auditory-events identifier 20 calculates the auditory scene analysis from the same spectral representation. The audio signal 13 may be grouped into 50 percent overlapping blocks of M samples, and the Discrete Fourier Transform may be computed as follows:
$$X[k,t] = \sum_{n=0}^{M-1} x[n,t]\, e^{-j\frac{2\pi k n}{M}} \qquad (3)$$
where M = 2N samples and x[n,t] denotes a block of samples.
The block size for the transform is assumed to be the same as that for calculating the auditory event signal. This need not be the case, however. Where different block rates exist, signals on one block rate may be interpolated or rate converted onto the same timescale as signals on the other block rate.
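The block segmentation and transform can be sketched in a few lines. The direct O(M²) sum below mirrors the transform definition for clarity; a practical system would use an FFT. The function names are hypothetical, and no analysis window is applied because none is specified in the text.

```python
import cmath

def overlapping_blocks(x, block_size):
    """Split x into blocks of block_size samples with 50 percent overlap."""
    hop = block_size // 2
    return [x[s:s + block_size]
            for s in range(0, len(x) - block_size + 1, hop)]

def dft(block):
    """Direct Discrete Fourier Transform of one block."""
    m = len(block)
    return [sum(block[n] * cmath.exp(-2j * cmath.pi * k * n / m)
                for n in range(m))
            for k in range(m)]
```

Eight samples with a block size of four yield three blocks starting at samples 0, 2 and 4, each sharing half of its samples with its neighbor.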
The excitation signal E[b,t] approximating the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t is computed:
$$E[b,t] = \sum_{k} \left|T[k]\right|^2 \left|C_b[k]\right|^2 \left|X[k,t]\right|^2 \qquad (4)$$
where T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear and Cb[k] represents the frequency response of the basilar membrane at a location corresponding to critical band b.
FIG. 4 depicts the frequency response of a suitable transmission filter T[k]. FIG. 5 depicts a suitable set of critical band filter responses, corresponding to Cb[k], in which 40 bands are spaced uniformly along the Moore and Glasberg Equivalent Rectangular Bandwidth (ERB) scale, for a sample rate of 48 kHz and transform size of M = 2048. A rounded exponential function describes each filter shape, and 1 ERB separates the bands.
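To give a feel for where 40 bands at 1-ERB spacing fall, the sketch below places band centers using the widely published Glasberg and Moore ERB-number formula. Whether this exact formula and a starting point of ERB number 1 match the filters of FIG. 5 is an assumption; the function names and defaults are hypothetical.

```python
import math

def hz_to_erb_number(freq_hz):
    """ERB-number scale (Glasberg and Moore): 21.4 * log10(0.00437 f + 1)."""
    return 21.4 * math.log10(0.00437 * freq_hz + 1.0)

def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) / 0.00437

def band_centers_hz(num_bands=40, first_erb=1.0, spacing_erb=1.0):
    """Center frequencies of bands spaced uniformly on the ERB-number scale."""
    return [erb_number_to_hz(first_erb + i * spacing_erb)
            for i in range(num_bands)]
```

Under these assumptions all 40 centers lie below the 24 kHz Nyquist frequency of a 48 kHz signal, with the bands packed densely at low frequencies and spread widely at high frequencies.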
If the auditory event boundaries are computed from the specific loudness spectrum, per Crockett and Seefeldt, then the excitation signal E[b,t] already exists as part of the specific-loudness calculation.
Finally the spectral skewness is computed from the excitation signal E[b,t] as:
Figure imgf000012_0002
where μ is the arithmetic mean of the excitation across the B critical bands:
$$\mu = \frac{1}{B} \sum_{b} E[b,t] \qquad (6)$$
and σ is the variance of the excitation signal:
Figure imgf000013_0001
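A spectral skewness of the excitation across bands can be sketched with the standard moment-based definition. The normalization used here (population moments, dividing by the number of bands) and the zero returned for a flat excitation are assumptions of this sketch and may differ from the normalization in the equations above.

```python
def spectral_skewness(excitation):
    """Skewness of the per-band excitation values for one time block.

    Uses population moments: third central moment over the 1.5 power of
    the second central moment. Returns 0.0 for a perfectly flat excitation.
    """
    b = len(excitation)
    mean = sum(excitation) / b
    m2 = sum((e - mean) ** 2 for e in excitation) / b
    m3 = sum((e - mean) ** 3 for e in excitation) / b
    if m2 == 0.0:
        return 0.0
    return m3 / m2 ** 1.5
```

Scaling every band by the same factor leaves the result unchanged, which is why a skewness-based weight does not depend on the absolute level of the signal.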
The skewness signal SK[t] of equation (5) fluctuates considerably and requires smoothing to avoid artifacts when modifying the event control signal and subsequent dynamics processing parameters. One embodiment uses a single pole smoother with a decay constant α_SK having a half-decay time of approximately 6.5 ms:
$$\mathit{SK}'[t] = \alpha_{SK}\, \mathit{SK}'[t-1] + (1 - \alpha_{SK})\, \mathit{SK}[t] \qquad (8)$$
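Limiting the skewness to maximum and minimum values SK_max and SK_min, respectively, may be useful. A constrained skewness SK"[t] may be computed as:
$$\mathit{SK}''[t] = \begin{cases} 0, & \mathit{SK}'[t] \le \mathit{SK}_{\min} \\ \dfrac{\mathit{SK}'[t] - \mathit{SK}_{\min}}{\mathit{SK}_{\max} - \mathit{SK}_{\min}}, & \mathit{SK}_{\min} < \mathit{SK}'[t] < \mathit{SK}_{\max} \\ 1, & \mathit{SK}'[t] \ge \mathit{SK}_{\max} \end{cases} \qquad (7)$$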
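The smoothing and limiting steps can be sketched together. The function names, the initial state of zero and the limit values in the usage note are hypothetical.

```python
def smooth_skewness(raw, alpha, initial=0.0):
    """Single-pole smoother: y[t] = alpha * y[t-1] + (1 - alpha) * x[t]."""
    smoothed = []
    prev = initial
    for sk in raw:
        prev = alpha * prev + (1.0 - alpha) * sk
        smoothed.append(prev)
    return smoothed

def constrain_skewness(sk, sk_min, sk_max):
    """Clamp sk to [sk_min, sk_max] and rescale the result to [0, 1]."""
    if sk <= sk_min:
        return 0.0
    if sk >= sk_max:
        return 1.0
    return (sk - sk_min) / (sk_max - sk_min)
```

With illustrative limits of 0 and 2, a smoothed skewness of 1 maps to a weight of 0.5, and any value at or beyond a limit saturates at 0 or 1.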
Low values (values close to 0.0) of the skewness signal SK"[t] typically correspond to characteristically quieter signals, while high skewness values (values close to 1.0) typically correspond to characteristically louder signals. In FIG. 3, the "e)" graph shows the skewness signal that corresponds to the audio signal in "a)" of FIG. 3. The skewness is high for the louder dialog bursts and low for the background sounds.
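The skewness signal SK"[t] passes to the auditory-events identifier 20 of FIG. 2, which weights the spectral difference measure D[t] as:
$$D_{SK}[t] = \mathit{SK}''[t]\, D[t] \qquad (8)$$
The skewness-modified auditory strength signal A_SK[t] is computed in the same way as A[t] in equation (1):
$$A_{SK}[t] = \begin{cases} 0, & D_{SK}[t] \le D_{\min} \\ \dfrac{D_{SK}[t] - D_{\min}}{D_{\max} - D_{\min}}, & D_{\min} < D_{SK}[t] < D_{\max} \\ 1, & D_{SK}[t] \ge D_{\max} \end{cases} \qquad (9)$$
The skewness-modified auditory strength signal A_SK[t] is smoothed in the same way as A[t] in equation (2):
$$\bar{A}_{SK}[t] = \begin{cases} A_{SK}[t], & A_{SK}[t] > \alpha_{event}\,\bar{A}_{SK}[t-1] \\ \alpha_{event}\,\bar{A}_{SK}[t-1], & \text{otherwise} \end{cases} \qquad (10)$$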
In FIG. 3, "f)" depicts the skewness-modified event control signal Ā_SK[t] for the corresponding audio signal in "a)" of FIG. 3. Fewer auditory events appear during the background sounds while events corresponding to the louder dialog remain. In FIG. 3, "g)" shows the skewness-modified event-controlled DRC signal. With fewer auditory events in the background sounds, the DRC gain stays relatively constant and moves only for the louder dialog sections. In FIG. 3, "h)" shows the resulting DRC-modified audio signal.
The DRC-modified audio signal has none of the undesirable swelling in level during the background sounds.
The skewness signal SK"[t] goes low sometimes for perceptually louder signals. For these loud signals, the value of the spectral difference measure D[t] is large enough that even after weighting by the skewness signal SK"[t] in equation (8), the weighted spectral difference measure D_SK[t] is typically still large enough to indicate an auditory event boundary. The event control signal A_SK[t] is not adversely affected.
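The combined effect of the weighting can be illustrated numerically. The sketch below multiplies the spectral difference by the constrained skewness and then applies the same clamped ramp used for the unweighted event strength. The function name and all numeric values are hypothetical and chosen only to show the two cases discussed above.

```python
def weighted_event_strength(d, sk, d_min, d_max):
    """Event strength after weighting the spectral difference d by skewness sk."""
    d_sk = sk * d  # skewness-weighted spectral difference
    if d_sk <= d_min:
        return 0.0
    if d_sk >= d_max:
        return 1.0
    return (d_sk - d_min) / (d_max - d_min)

# Quiet background: a modest spectral change with low skewness is suppressed.
background = weighted_event_strength(12.0, 0.2, 10.0, 20.0)

# Loud dialog: a large spectral change survives even a fairly low skewness.
dialog = weighted_event_strength(60.0, 0.4, 10.0, 20.0)
```

In this example the background change, which would have registered as a weak event without weighting, falls below the threshold, while the dialog onset still produces a full-strength event.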

Claims
1. A method for controlling the loudness of auditory events in an audio signal, the method comprising: weighting the auditory events (an auditory event having a spectrum and a loudness), using skewness in the spectra; and controlling loudness of the auditory events, using the weights.
2. The method of claim 1 wherein the weighting comprises weighting the auditory events, the weighting proportionate to the measure of skewness in the spectra.
3. The method of claim 2 wherein the measure of skewness is a measure of smoothed skewness.
4. The method of claim 1 wherein the weighting is insensitive to amplitude of the audio signal.
5. The method of claim 1 wherein the weighting is insensitive to power.
6. The method of claim 1 wherein the weighting is insensitive to loudness.
7. The method of claim 1 wherein any relationship between signal measure and absolute reproduction level is not known at the time of weighting.
8. The method of claim 1 wherein the weighting comprises weighting auditory-event-boundary importance, using skewness in the spectra.
9. The method of claim 1 further comprising reducing swelling of AGC or DRC processing level during perceptibly quieter segments of the audio signal as compared to methods not performing the claimed weighting.
10. A computer-readable memory containing a computer program for performing any one of the methods of claims 1-9.
11. A computer system comprising: a CPU; the memory of claim 10; and a bus communicatively coupling the CPU and the memory.
12. An audio-signal processor comprising: a spectral-skewness calculator for calculating the spectral skewness in an audio signal; an auditory-events identifier for identifying and weighting auditory events in the audio signal, using the calculated spectral skewness; a parameters modifier for modifying parameters for controlling the loudness of auditory events in the audio signal; and a controller for controlling the loudness of auditory events in the audio signal.
13. A method for controlling the loudness of auditory events in an audio signal, comprising: calculating measures of skewness of spectra of successive auditory events of an audio signal; generating weights for the auditory events based on the measures of skewness; deriving a control signal from the weights; and controlling the loudness of the auditory events using the control signal.
PCT/US2008/008592 2007-07-13 2008-07-11 Audio processing using auditory scene analysis and spectral skewness WO2009011827A1 (en)

Priority Applications (7)

Application Number Priority Date Filing Date Title
AT08780174T ATE535906T1 (en) 2007-07-13 2008-07-11 SOUND PROCESSING USING AUDITORIAL SCENE ANALYSIS AND SPECTRAL ASYMMETRY
EP08780174A EP2168122B1 (en) 2007-07-13 2008-07-11 Audio processing using auditory scene analysis and spectral skewness
ES08780174T ES2377719T3 (en) 2007-07-13 2008-07-11 Audio processing using an analysis of auditory scenes and spectral obliqueness.
JP2010517000A JP5192544B2 (en) 2007-07-13 2008-07-11 Acoustic processing using auditory scene analysis and spectral distortion
CN2008800245251A CN101790758B (en) 2007-07-13 2008-07-11 Audio processing using auditory scene analysis and spectral skewness
BRPI0813723A BRPI0813723B1 (en) 2007-07-13 2008-07-11 method for controlling the sound intensity level of auditory events, non-transient computer-readable memory, computer system and device
US12/668,741 US8396574B2 (en) 2007-07-13 2008-07-11 Audio processing using auditory scene analysis and spectral skewness

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US95946307P 2007-07-13 2007-07-13
US60/959,463 2007-07-13

Publications (1)

Publication Number Publication Date
WO2009011827A1 true WO2009011827A1 (en) 2009-01-22

Family

ID=39776994

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/008592 WO2009011827A1 (en) 2007-07-13 2008-07-11 Audio processing using auditory scene analysis and spectral skewness

Country Status (10)

Country Link
US (1) US8396574B2 (en)
EP (1) EP2168122B1 (en)
JP (1) JP5192544B2 (en)
CN (1) CN101790758B (en)
AT (1) ATE535906T1 (en)
BR (1) BRPI0813723B1 (en)
ES (1) ES2377719T3 (en)
RU (1) RU2438197C2 (en)
TW (1) TWI464735B (en)
WO (1) WO2009011827A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013150340A1 (en) * 2012-04-05 2013-10-10 Nokia Corporation Adaptive audio signal filtering
WO2014046941A1 (en) * 2012-09-19 2014-03-27 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
WO2014160542A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160548A1 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
WO2014160678A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation 1apparatuses and methods for audio classifying and processing
US10306392B2 (en) 2015-11-03 2019-05-28 Dolby Laboratories Licensing Corporation Content-adaptive surround sound virtualization
WO2020020043A1 (en) * 2018-07-25 2020-01-30 Dolby Laboratories Licensing Corporation Compressor target curve to avoid boosting noise
RU2826268C2 (en) * 2013-03-26 2024-09-09 Долби Лабораторис Лайсэнзин Корпорейшн Loudness equalizer controller and control method

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086174A1 (en) 2007-12-21 2009-07-09 Srs Labs, Inc. System for adjusting perceived loudness of audio signals
WO2010126709A1 (en) 2009-04-30 2010-11-04 Dolby Laboratories Licensing Corporation Low complexity auditory event boundary detection
US8538042B2 (en) 2009-08-11 2013-09-17 Dts Llc System for increasing perceived loudness of speakers
US8510361B2 (en) * 2010-05-28 2013-08-13 George Massenburg Variable exponent averaging detector and dynamic range controller
EP2727383B1 (en) 2011-07-01 2021-04-28 Dolby Laboratories Licensing Corporation System and method for adaptive audio signal generation, coding and rendering
US9312829B2 (en) 2012-04-12 2016-04-12 Dts Llc System for adjusting loudness of audio signals in real time
EP2974253B1 (en) 2013-03-15 2019-05-08 Dolby Laboratories Licensing Corporation Normalization of soundfield orientations based on auditory scene analysis
ES2617314T3 (en) * 2013-04-05 2017-06-16 Dolby Laboratories Licensing Corporation Compression apparatus and method to reduce quantization noise using advanced spectral expansion
EP3111627B1 (en) 2014-02-28 2018-07-04 Dolby Laboratories Licensing Corporation Perceptual continuity using change blindness in conferencing
US9372881B1 (en) 2015-12-29 2016-06-21 International Business Machines Corporation System for identifying a correspondence between a COBOL copybook or PL/1 include file and a VSAM or sequential dataset
WO2017147325A1 (en) 2016-02-25 2017-08-31 Dolby Laboratories Licensing Corporation Multitalker optimised beamforming system and method
CN113015059B (en) * 2021-02-23 2022-10-18 歌尔科技有限公司 Audio optimization method, device, equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040122662A1 (en) * 2002-02-12 2004-06-24 Crockett Brett Greham High quality time-scaling and pitch-scaling of audio signals
US20050071154A1 (en) * 2003-09-30 2005-03-31 Walter Etter Method and apparatus for estimating noise in speech signals
US7058572B1 (en) * 2000-01-28 2006-06-06 Nortel Networks Limited Reducing acoustic noise in wireless and landline based telephony

Family Cites Families (133)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US2808475A (en) 1954-10-05 1957-10-01 Bell Telephone Labor Inc Loudness indicator
US4281218A (en) 1979-10-26 1981-07-28 Bell Telephone Laboratories, Incorporated Speech-nonspeech detector-classifier
US4624009A (en) 1980-05-02 1986-11-18 Figgie International, Inc. Signal pattern encoder and classifier
DE3314570A1 (en) 1983-04-22 1984-10-25 Philips Patentverwaltung Gmbh, 2000 Hamburg METHOD AND ARRANGEMENT FOR ADJUSTING THE REINFORCEMENT
US4594561A (en) * 1984-10-26 1986-06-10 Rg Dynamics, Inc. Audio amplifier with resistive damping for minimizing time displacement distortion
US4739514A (en) 1986-12-22 1988-04-19 Bose Corporation Automatic dynamic equalizing
US4887299A (en) 1987-11-12 1989-12-12 Nicolet Instrument Corporation Adaptive, programmable signal processing hearing aid
US5027410A (en) 1988-11-10 1991-06-25 Wisconsin Alumni Research Foundation Adaptive, programmable signal processing and filtering for hearing aids
JPH02118322U (en) 1989-03-08 1990-09-21
US5097510A (en) 1989-11-07 1992-03-17 Gs Systems, Inc. Artificial intelligence pattern-recognition-based noise reduction system for speech processing
US5369711A (en) 1990-08-31 1994-11-29 Bellsouth Corporation Automatic gain control for a headset
US5632005A (en) 1991-01-08 1997-05-20 Ray Milton Dolby Encoder/decoder for multidimensional sound fields
SG49883A1 (en) 1991-01-08 1998-06-15 Dolby Lab Licensing Corp Encoder/decoder for multidimensional sound fields
DE69214882T2 (en) 1991-06-06 1997-03-20 Matsushita Electric Ind Co Ltd Device for distinguishing between music and speech
US5278912A (en) 1991-06-28 1994-01-11 Resound Corporation Multiband programmable compression system
JPH0566795A (en) * 1991-09-06 1993-03-19 Gijutsu Kenkyu Kumiai Iryo Fukushi Kiki Kenkyusho Noise suppressing device and its adjustment device
US5363147A (en) 1992-06-01 1994-11-08 North American Philips Corporation Automatic volume leveler
DE4335739A1 (en) 1992-11-17 1994-05-19 Rudolf Prof Dr Bisping Automatically controlling signal=to=noise ratio of noisy recordings
US5457769A (en) 1993-03-30 1995-10-10 Earmark, Inc. Method and apparatus for detecting the presence of human voice signals in audio signals
US5706352A (en) 1993-04-07 1998-01-06 K/S Himpp Adaptive gain and filtering circuit for a sound reproduction system
US5434922A (en) 1993-04-08 1995-07-18 Miller; Thomas E. Method and apparatus for dynamic sound optimization
BE1007355A3 (en) 1993-07-26 1995-05-23 Philips Electronics Nv Voice signal circuit discrimination and an audio device with such circuit.
IN184794B (en) 1993-09-14 2000-09-30 British Telecomm
JP2986345B2 (en) 1993-10-18 1999-12-06 インターナショナル・ビジネス・マシーンズ・コーポレイション Voice recording indexing apparatus and method
TW247390B (en) 1994-04-29 1995-05-11 Audio Products Int Corp Apparatus and method for adjusting levels between channels of a sound system
US5500902A (en) 1994-07-08 1996-03-19 Stockham, Jr.; Thomas G. Hearing aid device incorporating signal processing techniques
GB9419388D0 (en) 1994-09-26 1994-11-09 Canon Kk Speech analysis
US5548538A (en) 1994-12-07 1996-08-20 Wiltron Company Internal automatic calibrator for vector network analyzers
US5682463A (en) 1995-02-06 1997-10-28 Lucent Technologies Inc. Perceptual audio compression based on loudness uncertainty
CA2167748A1 (en) 1995-02-09 1996-08-10 Yoav Freund Apparatus and methods for machine learning hypotheses
EP0661905B1 (en) 1995-03-13 2002-12-11 Phonak Ag Method for the fitting of hearing aids, device therefor and hearing aid
DE19509149A1 (en) 1995-03-14 1996-09-19 Donald Dipl Ing Schulz Audio signal coding for data compression factor
US5727119A (en) 1995-03-27 1998-03-10 Dolby Laboratories Licensing Corporation Method and apparatus for efficient implementation of single-sideband filter banks providing accurate measures of spectral magnitude and phase
WO1996032710A1 (en) 1995-04-10 1996-10-17 Corporate Computer Systems, Inc. System for compression and decompression of audio signals for digital transmission
US6301555B2 (en) 1995-04-10 2001-10-09 Corporate Computer Systems Adjustable psycho-acoustic parameters
US5601617A (en) 1995-04-26 1997-02-11 Advanced Bionics Corporation Multichannel cochlear prosthesis with flexible control of stimulus waveforms
JPH08328599A (en) 1995-06-01 1996-12-13 Mitsubishi Electric Corp Mpeg audio decoder
US5663727A (en) 1995-06-23 1997-09-02 Hearing Innovations Incorporated Frequency response analyzer and shaping apparatus and digital hearing enhancement apparatus and method utilizing the same
US5712954A (en) 1995-08-23 1998-01-27 Rockwell International Corp. System and method for monitoring audio power level of agent speech in a telephonic switch
US6002776A (en) * 1995-09-18 1999-12-14 Interval Research Corporation Directional acoustic signal processor and method therefor
US5907622A (en) 1995-09-21 1999-05-25 Dougherty; A. Michael Automatic noise compensation system for audio reproduction equipment
US6327366B1 (en) 1996-05-01 2001-12-04 Phonak Ag Method for the adjustment of a hearing device, apparatus to do it and a hearing device
US6108431A (en) 1996-05-01 2000-08-22 Phonak Ag Loudness limiter
US6430533B1 (en) 1996-05-03 2002-08-06 Lsi Logic Corporation Audio decoder core MPEG-1/MPEG-2/AC-3 functional algorithm partitioning and implementation
JPH09311696A (en) * 1996-05-21 1997-12-02 Nippon Telegr & Teleph Corp <Ntt> Automatic gain control device
JPH09312540A (en) 1996-05-23 1997-12-02 Pioneer Electron Corp Loudness volume controller
JP3765622B2 (en) 1996-07-09 2006-04-12 ユナイテッド・モジュール・コーポレーション Audio encoding / decoding system
EP0820212B1 (en) 1996-07-19 2010-04-21 Bernafon AG Acoustic signal processing based on loudness control
JP2953397B2 (en) 1996-09-13 1999-09-27 日本電気株式会社 Hearing compensation processing method for digital hearing aid and digital hearing aid
JP3367592B2 (en) * 1996-09-24 2003-01-14 日本電信電話株式会社 Automatic gain adjustment device
US6570991B1 (en) 1996-12-18 2003-05-27 Interval Research Corporation Multi-feature speech/music discrimination system
US5862228A (en) 1997-02-21 1999-01-19 Dolby Laboratories Licensing Corporation Audio matrix encoding
US6125343A (en) 1997-05-29 2000-09-26 3Com Corporation System and method for selecting a loudest speaker by comparing average frame gains
US6272360B1 (en) 1997-07-03 2001-08-07 Pan Communications, Inc. Remotely installed transmitter and a hands-free two-way voice terminal device using same
US6185309B1 (en) 1997-07-11 2001-02-06 The Regents Of The University Of California Method and apparatus for blind separation of mixed and convolved sources
KR100261904B1 (en) 1997-08-29 2000-07-15 윤종용 Headphone sound output apparatus
US6088461A (en) 1997-09-26 2000-07-11 Crystal Semiconductor Corporation Dynamic volume control system
US6233554B1 (en) 1997-12-12 2001-05-15 Qualcomm Incorporated Audio CODEC with AGC controlled by a VOCODER
US6298139B1 (en) 1997-12-31 2001-10-02 Transcrypt International, Inc. Apparatus and method for maintaining a constant speech envelope using variable coefficient automatic gain control
US6182033B1 (en) 1998-01-09 2001-01-30 At&T Corp. Modular approach to speech enhancement with an application to speech coding
US6353671B1 (en) 1998-02-05 2002-03-05 Bioinstco Corp. Signal processing circuit and method for increasing speech intelligibility
US6311155B1 (en) 2000-02-04 2001-10-30 Hearing Enhancement Company Llc Use of voice-to-remaining audio (VRA) in consumer applications
US6498855B1 (en) 1998-04-17 2002-12-24 International Business Machines Corporation Method and system for selectively and variably attenuating audio data
AU758242B2 (en) 1998-06-08 2003-03-20 Cochlear Limited Hearing instrument
EP0980064A1 (en) 1998-06-26 2000-02-16 Ascom AG Method for carrying an automatic judgement of the transmission quality of audio signals
GB2340351B (en) 1998-07-29 2004-06-09 British Broadcasting Corp Data transmission
US6351731B1 (en) 1998-08-21 2002-02-26 Polycom, Inc. Adaptive filter featuring spectral gain smoothing and variable noise multiplier for noise reduction, and method therefor
US6823303B1 (en) 1998-08-24 2004-11-23 Conexant Systems, Inc. Speech encoder using voice activity detection in coding noise
US6411927B1 (en) 1998-09-04 2002-06-25 Matsushita Electric Corporation Of America Robust preprocessing signal equalization system and method for normalizing to a target environment
FI113935B (en) 1998-09-25 2004-06-30 Nokia Corp Method for Calibrating the Sound Level in a Multichannel Audio System and a Multichannel Audio System
DE19848491A1 (en) 1998-10-21 2000-04-27 Bosch Gmbh Robert Radio receiver with audio data system has control unit to allocate sound characteristic according to transferred program type identification adjusted in receiving section
US6314396B1 (en) 1998-11-06 2001-11-06 International Business Machines Corporation Automatic gain control in a speech recognition system
EP1089242B1 (en) 1999-04-09 2006-11-08 Texas Instruments Incorporated Supply of digital audio and video products
AU4278300A (en) 1999-04-26 2000-11-10 Dspfactory Ltd. Loudness normalization control for a digital hearing aid
US6263371B1 (en) 1999-06-10 2001-07-17 Cacheflow, Inc. Method and apparatus for seaming of streaming content
US6985594B1 (en) 1999-06-15 2006-01-10 Hearing Enhancement Co., Llc. Voice-to-remaining audio (VRA) interactive hearing aid and auxiliary equipment
US6442278B1 (en) 1999-06-15 2002-08-27 Hearing Enhancement Company, Llc Voice-to-remaining audio (VRA) interactive center channel downmix
US6778966B2 (en) 1999-11-29 2004-08-17 Syfx Segmented mapping converter system and method
FR2802329B1 (en) 1999-12-08 2003-03-28 France Telecom PROCESS FOR PROCESSING AT LEAST ONE AUDIO CODE BINARY FLOW ORGANIZED IN THE FORM OF FRAMES
US6351733B1 (en) 2000-03-02 2002-02-26 Hearing Enhancement Company, Llc Method and apparatus for accommodating primary content audio and secondary content remaining audio capability in the digital audio production process
DE10018666A1 (en) 2000-04-14 2001-10-18 Harman Audio Electronic Sys Dynamic sound optimization in the interior of a motor vehicle or similar noisy environment, a monitoring signal is split into desired-signal and noise-signal components which are used for signal adjustment
US6889186B1 (en) 2000-06-01 2005-05-03 Avaya Technology Corp. Method and apparatus for improving the intelligibility of digitally compressed speech
AUPQ952700A0 (en) 2000-08-21 2000-09-14 University Of Melbourne, The Sound-processing strategy for cochlear implants
JP3448586B2 (en) 2000-08-29 2003-09-22 独立行政法人産業技術総合研究所 Sound measurement method and system considering hearing impairment
US6625433B1 (en) 2000-09-29 2003-09-23 Agere Systems Inc. Constant compression automatic gain control circuit
US6807525B1 (en) 2000-10-31 2004-10-19 Telogy Networks, Inc. SID frame detection with human auditory perception compensation
DE60029453T2 (en) 2000-11-09 2007-04-12 Koninklijke Kpn N.V. Measuring the transmission quality of a telephone connection in a telecommunications network
US7457422B2 (en) 2000-11-29 2008-11-25 Ford Global Technologies, Llc Method and implementation for detecting and characterizing audible transients in noise
FR2820573B1 (en) 2001-02-02 2003-03-28 France Telecom METHOD AND DEVICE FOR PROCESSING A PLURALITY OF AUDIO BIT STREAMS
WO2004019656A2 (en) 2001-02-07 2004-03-04 Dolby Laboratories Licensing Corporation Audio channel spatial translation
DE10107385A1 (en) 2001-02-16 2002-09-05 Harman Audio Electronic Sys Device for adjusting the volume depending on noise
US6915264B2 (en) 2001-02-22 2005-07-05 Lucent Technologies Inc. Cochlear filter bank structure for determining masked thresholds for use in perceptual audio coding
DK1290914T3 (en) 2001-04-10 2004-09-27 Phonak Ag Method of fitting a hearing aid to an individual
US7711123B2 (en) 2001-04-13 2010-05-04 Dolby Laboratories Licensing Corporation Segmenting audio signals into auditory events
US7461002B2 (en) 2001-04-13 2008-12-02 Dolby Laboratories Licensing Corporation Method for time aligning audio signals using characterizations based on auditory events
US7283954B2 (en) 2001-04-13 2007-10-16 Dolby Laboratories Licensing Corporation Comparing audio using characterizations based on auditory events
DE60209161T2 (en) 2001-04-18 2006-10-05 Gennum Corp., Burlington Multi-channel hearing aid with transmission options between the channels
KR100400226B1 (en) * 2001-10-15 2003-10-01 삼성전자주식회사 Apparatus and method for computing speech absence probability, apparatus and method for removing noise using the computation apparatus and method
US7177803B2 (en) 2001-10-22 2007-02-13 Motorola, Inc. Method and apparatus for enhancing loudness of an audio signal
US20040037421A1 (en) 2001-12-17 2004-02-26 Truman Michael Mead Partial encryption of assembled bitstreams
US7068723B2 (en) 2002-02-28 2006-06-27 Fuji Xerox Co., Ltd. Method for automatically producing optimal summaries of linear media
US7155385B2 (en) 2002-05-16 2006-12-26 Comerica Bank, As Administrative Agent Automatic gain control for adjusting gain during non-speech portions
US7447631B2 (en) 2002-06-17 2008-11-04 Dolby Laboratories Licensing Corporation Audio coding system using spectral hole filling
FR2842576B1 (en) 2002-07-17 2004-10-08 Skf Ab FREE WHEEL BEARING DEVICE AND FREE WHEEL PULLEY
JP4257079B2 (en) 2002-07-19 2009-04-22 パイオニア株式会社 Frequency characteristic adjusting device and frequency characteristic adjusting method
JP4321049B2 (en) * 2002-07-29 2009-08-26 パナソニック電工株式会社 Automatic gain controller
DE10236694A1 (en) 2002-08-09 2004-02-26 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Equipment for scalable coding and decoding of spectral values of signal containing audio and/or video information by splitting signal binary spectral values into two partial scaling layers
US7454331B2 (en) 2002-08-30 2008-11-18 Dolby Laboratories Licensing Corporation Controlling loudness of speech in signals that contain speech and other types of audio material
MXPA05008317A (en) 2003-02-06 2005-11-04 Dolby Lab Licensing Corp Continuous backup audio.
DE10308483A1 (en) 2003-02-26 2004-09-09 Siemens Audiologische Technik Gmbh Method for automatic gain adjustment in a hearing aid and hearing aid
US7551745B2 (en) 2003-04-24 2009-06-23 Dolby Laboratories Licensing Corporation Volume and compression control in movie theaters
SG185134A1 (en) 2003-05-28 2012-11-29 Dolby Lab Licensing Corp Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
US6923684B2 (en) 2003-10-10 2005-08-02 O'sullivan Industries, Inc. Power harness having multiple upstream USB ports
ATE527654T1 (en) 2004-03-01 2011-10-15 Dolby Lab Licensing Corp MULTI-CHANNEL AUDIO CODING
GB2413906A (en) 2004-04-21 2005-11-09 Imagination Tech Ltd Radio volume control system
US7617109B2 (en) 2004-07-01 2009-11-10 Dolby Laboratories Licensing Corporation Method for correcting metadata affecting the playback loudness and dynamic range of audio information
US7508947B2 (en) 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
TWI497485B (en) 2004-08-25 2015-08-21 Dolby Lab Licensing Corp Method for reshaping the temporal envelope of synthesized output audio signal to approximate more closely the temporal envelope of input audio signal
CA2580763C (en) * 2004-09-20 2014-07-29 John Gerard Beerends Frequency compensation for perceptual speech analysis
CA2581810C (en) 2004-10-26 2013-12-17 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
TWI397901B (en) * 2004-12-21 2013-06-01 Dolby Lab Licensing Corp Method for controlling a particular loudness characteristic of an audio signal, and apparatus and computer program associated therewith
US8265295B2 (en) 2005-03-11 2012-09-11 Rane Corporation Method and apparatus for identifying feedback in a circuit
TWI397903B (en) 2005-04-13 2013-06-01 Dolby Lab Licensing Corp Economical loudness measurement of coded audio
GB2428168A (en) 2005-07-06 2007-01-17 Motorola Inc A transmitter splits a signal into a plurality of sub-signals, each containing a plurality of sub-carriers, and amplifies each sub-signal separately.
WO2007120452A1 (en) 2006-04-04 2007-10-25 Dolby Laboratories Licensing Corporation Audio signal loudness measurement and modification in the mdct domain
CN101421781A (en) 2006-04-04 2009-04-29 杜比实验室特许公司 Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
TWI517562B (en) 2006-04-04 2016-01-11 杜比實驗室特許公司 Method, apparatus, and computer program for scaling the overall perceived loudness of a multichannel audio signal by a desired amount
CA2648237C (en) 2006-04-27 2013-02-05 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
RU2413357C2 (en) 2006-10-20 2011-02-27 Долби Лэборетериз Лайсенсинг Корпорейшн Processing dynamic properties of audio using retuning
US8521314B2 (en) 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
US8194889B2 (en) 2007-01-03 2012-06-05 Dolby Laboratories Licensing Corporation Hybrid digital/analog loudness-compensating volume control
KR101163411B1 (en) 2007-03-19 2012-07-12 돌비 레버러토리즈 라이쎈싱 코오포레이션 Speech enhancement employing a perceptual model
CN101681618B (en) 2007-06-19 2015-12-16 杜比实验室特许公司 Utilize the loudness measurement of spectral modifications

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7058572B1 (en) * 2000-01-28 2006-06-06 Nortel Networks Limited Reducing acoustic noise in wireless and landline based telephony
US20040122662A1 (en) * 2002-02-12 2004-06-24 Crockett Brett Greham High quality time-scaling and pitch-scaling of audio signals
US20050071154A1 (en) * 2003-09-30 2005-03-31 Walter Etter Method and apparatus for estimating noise in speech signals

Non-Patent Citations (1)

Title
BRETT CROCKETT AND MICHAEL SMITHERS: "A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis", AUDIO ENGINEERING SOCIETY CONVENTION 118 PAPER 6416, May 2005 (2005-05-01), XP002500094 *

Cited By (26)

Publication number Priority date Publication date Assignee Title
WO2013150340A1 (en) * 2012-04-05 2013-10-10 Nokia Corporation Adaptive audio signal filtering
US9633667B2 (en) 2012-04-05 2017-04-25 Nokia Technologies Oy Adaptive audio signal filtering
US9349384B2 (en) 2012-09-19 2016-05-24 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
WO2014046941A1 (en) * 2012-09-19 2014-03-27 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
US9842605B2 (en) 2013-03-26 2017-12-12 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US10803879B2 (en) 2013-03-26 2020-10-13 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US9548713B2 (en) 2013-03-26 2017-01-17 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US9621124B2 (en) 2013-03-26 2017-04-11 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
WO2014160548A1 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
EP3190702A2 (en) 2013-03-26 2017-07-12 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
EP3217545A1 (en) 2013-03-26 2017-09-13 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
EP3232567A1 (en) 2013-03-26 2017-10-18 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
WO2014160542A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US9923536B2 (en) 2013-03-26 2018-03-20 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US10044337B2 (en) 2013-03-26 2018-08-07 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
RU2826268C2 (en) * 2013-03-26 2024-09-09 Долби Лабораторис Лайсэнзин Корпорейшн Loudness equalizer controller and control method
US10411669B2 (en) 2013-03-26 2019-09-10 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP3598448A1 (en) 2013-03-26 2020-01-22 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US11711062B2 (en) 2013-03-26 2023-07-25 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US10707824B2 (en) 2013-03-26 2020-07-07 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160678A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US11218126B2 (en) 2013-03-26 2022-01-04 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP4080763A1 (en) 2013-03-26 2022-10-26 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
US10306392B2 (en) 2015-11-03 2019-05-28 Dolby Laboratories Licensing Corporation Content-adaptive surround sound virtualization
WO2020020043A1 (en) * 2018-07-25 2020-01-30 Dolby Laboratories Licensing Corporation Compressor target curve to avoid boosting noise
US11894006B2 (en) 2018-07-25 2024-02-06 Dolby Laboratories Licensing Corporation Compressor target curve to avoid boosting noise

Also Published As

Publication number Publication date
CN101790758B (en) 2013-01-09
TWI464735B (en) 2014-12-11
EP2168122B1 (en) 2011-11-30
JP2010534030A (en) 2010-10-28
ES2377719T3 (en) 2012-03-30
US8396574B2 (en) 2013-03-12
TW200915301A (en) 2009-04-01
EP2168122A1 (en) 2010-03-31
US20100198378A1 (en) 2010-08-05
BRPI0813723B1 (en) 2020-02-04
ATE535906T1 (en) 2011-12-15
RU2010105052A (en) 2011-08-20
BRPI0813723A2 (en) 2017-07-04
CN101790758A (en) 2010-07-28
JP5192544B2 (en) 2013-05-08
RU2438197C2 (en) 2011-12-27

Similar Documents

Publication Publication Date Title
US8396574B2 (en) Audio processing using auditory scene analysis and spectral skewness
US11711060B2 (en) Audio control using auditory event detection
CN112640301B (en) Method and apparatus for dynamically adjusting threshold of compressor

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 200880024525.1; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 08780174; Country of ref document: EP; Kind code of ref document: A1)
DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase (Ref document number: 4521/KOLNP/2009; Country of ref document: IN)
WWE Wipo information: entry into national phase (Ref document number: 12668741; Country of ref document: US)
WWE Wipo information: entry into national phase (Ref document number: 2010517000; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2008780174; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 2010105052; Country of ref document: RU)
ENP Entry into the national phase (Ref document number: PI0813723; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20100113)