WO2007127023A1 - Audio gain control using specific-loudness-based auditory event detection - Google Patents


Info

Publication number
WO2007127023A1
WO2007127023A1 (PCT/US2007/008313)
Authority
WO
WIPO (PCT)
Prior art keywords
audio
auditory
signal
loudness
event
Prior art date
Application number
PCT/US2007/008313
Other languages
French (fr)
Inventor
Brett Graham Crockett
Alan Jeffrey Seefeldt
Original Assignee
Dolby Laboratories Licensing Corporation
Priority date
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation filed Critical Dolby Laboratories Licensing Corporation


Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • H03G3/3089Control of digital or coded signals
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • H03G3/3005Automatic control in amplifiers having semiconductor devices in amplifiers suitable for low-frequencies, e.g. audio amplifiers
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G3/00Gain control in amplifiers or frequency changers
    • H03G3/20Automatic control
    • H03G3/30Automatic control in amplifiers having semiconductor devices
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G1/00Details of arrangements for controlling amplification
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G7/00Volume compression or expansion in amplifiers
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G7/00Volume compression or expansion in amplifiers
    • H03G7/007Volume compression or expansion in amplifiers of digital or coded signals
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03GCONTROL OF AMPLIFICATION
    • H03G9/00Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/005Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/04Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/03Synergistic effects of band splitting and sub-band processing

Definitions

  • the present invention relates to audio dynamic range control methods and apparatus in which an audio processing device analyzes an audio signal and changes the level, gain or dynamic range of the audio, and all or some of the parameters of the audio gain and dynamics processing are generated as a function of auditory events.
  • the invention also relates to computer programs for practicing such methods or controlling such apparatus.
  • the present invention also relates to methods and apparatus using a specific- loudness-based detection of auditory events.
  • the invention also relates to computer programs for practicing such methods or controlling such apparatus.
  • AGC automatic gain control
  • DRC dynamic range control
  • ASA auditory scene analysis
  • an audio signal is divided into auditory events, each of which tends to be perceived as separate and distinct, by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, calculating the difference in spectral content between successive time blocks of the audio signal, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in the spectral content between such successive time blocks exceeds a threshold.
  • changes in amplitude with respect to time may be calculated instead of or in addition to changes in spectral composition with respect to time.
  • the process divides audio into time segments by analyzing the entire frequency band (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed) and giving the greatest weight to the loudest audio signal components.
  • This approach takes advantage of a psychoacoustic phenomenon in which at smaller time scales (20 milliseconds (ms) and less) the ear may tend to focus on a single auditory event at a given time. This implies that while multiple events may be occurring at the same time, one component tends to be perceptually most prominent and may be processed individually as though it were the only event taking place. Taking advantage of this effect also allows the auditory event detection to scale with the complexity of the audio being processed.
  • the auditory event detection identifies the "most prominent" (i.e., the loudest) audio element at any given moment.
  • the process may also take into consideration changes in spectral composition with respect to time in discrete frequency subbands (fixed or dynamically determined or both fixed and dynamically determined subbands) rather than the full bandwidth.
  • This alternative approach takes into account more than one audio stream in different frequency subbands rather than assuming that only a single stream is perceptible at a particular time.
  • Auditory event detection may be implemented by dividing a time domain audio waveform into time intervals or blocks and then converting the data in each block to the frequency domain, using either a filter bank or a time-frequency transformation, such as the FFT.
  • the amplitude of the spectral content of each block may be normalized in order to eliminate or reduce the effect of amplitude changes.
  • Each resulting frequency domain representation provides an indication of the spectral content of the audio in the particular block.
  • the spectral content of successive blocks is compared and changes greater than a threshold may be taken to indicate the temporal start or temporal end of an auditory event.
  • the frequency domain data is normalized, as is described below.
  • the degree to which the frequency domain data needs to be normalized gives an indication of amplitude. Hence, if a change in this degree exceeds a predetermined threshold that too may be taken to indicate an event boundary. Event start and end points resulting from spectral changes and from amplitude changes may be ORed together so that event boundaries resulting from either type of change are identified.
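The OR-combination of boundary indications described above can be sketched as follows; this is a minimal illustration with made-up detector values and thresholds, not the patent's calibrated parameters.

```python
import numpy as np

def or_event_boundaries(spectral_diff, amplitude_diff,
                        spec_thresh, amp_thresh):
    """Mark a boundary wherever EITHER the spectral-change measure or the
    amplitude-change measure exceeds its threshold (logical OR)."""
    spectral_diff = np.asarray(spectral_diff, dtype=float)
    amplitude_diff = np.asarray(amplitude_diff, dtype=float)
    return (spectral_diff > spec_thresh) | (amplitude_diff > amp_thresh)

# Illustrative example: block 2 changes spectrally, block 4 only in level.
spec = [0.1, 0.2, 3.0, 0.2, 0.1]
amp  = [0.0, 0.1, 0.1, 0.1, 2.5]
boundaries = or_event_boundaries(spec, amp, spec_thresh=1.0, amp_thresh=1.0)
```

Either type of change alone is thus sufficient to mark an event boundary.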
  • Auditory scene analysis identifies perceptually discrete auditory events, with each event occurring between two consecutive auditory event boundaries.
  • the audible impairments caused by a gain change can be greatly reduced by ensuring that within an auditory event the gain is more nearly constant and by confining much of the change to the neighborhood of an event boundary.
  • the response to an increase in audio level (often called the attack) may be rapid, comparable with or shorter than the minimum duration of auditory events, but the response to a decrease (the release or recovery) may be slower so that sounds that ought to appear constant or to decay gradually may be audibly disturbed. Under such circumstances, it is very beneficial to delay the gain recovery until the next boundary or to slow down the rate of change of gain during an event.
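Delaying the gain recovery until the next event boundary, as described above, can be sketched with a one-pole smoother whose release is gated by a boundary flag. The function name, coefficients, and test signal are illustrative assumptions.

```python
import numpy as np

def smooth_gain(desired_gain, event_boundary,
                alpha_attack=0.1, alpha_release=0.99):
    """One-pole gain smoothing in which upward gain recovery (release) is
    held until the next auditory event boundary.
    event_boundary[t] is True where a boundary was detected at block t."""
    g = np.empty(len(desired_gain))
    prev = desired_gain[0]
    for t, d in enumerate(desired_gain):
        if d < prev:                    # gain falling: fast attack
            a = alpha_attack
        elif event_boundary[t]:         # boundary reached: allow release
            a = alpha_release
        else:                           # inside an event: hold the gain
            a = 1.0
        prev = a * prev + (1.0 - a) * d
        g[t] = prev
    return g

# The desired gain dips, then wants to recover; recovery is deferred
# until the boundary flagged at block 5.
desired = [0.0, 0.0, -10.0, -10.0, 0.0, 0.0, 0.0]
bounds = [False, False, False, False, False, True, False]
g = smooth_gain(desired, bounds)
```

Within an event the gain stays constant; the change is confined to the neighborhood of the boundary, as the text prescribes.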
  • an audio processing system receives an audio signal and analyzes and alters the gain and/or dynamic range characteristics of the audio.
  • the dynamic range modification of the audio is often controlled by parameters of a dynamics processing system (attack and release time, compression ratio, etc.) that have significant effects on the perceptual artifacts introduced by the dynamics processing.
  • Changes in signal characteristics with respect to time in the audio signal are detected and identified as auditory event boundaries, such that an audio segment between consecutive boundaries constitutes an auditory event in the audio signal.
  • the characteristics of the auditory events of interest may include characteristics of the events such as perceptual strength or duration.
  • an auditory event is a segment of audio that tends to be perceived as separate and distinct.
  • One usable measure of signal characteristics includes a measure of the spectral content of the audio, for example, as described in the cited Crockett and Crockett et al documents.
  • All or some of the one or more audio dynamics processing parameters may be generated at least partly in response to the presence or absence and characteristics of one or more auditory events.
  • An auditory event boundary may be identified as a change in signal characteristics with respect to time that exceeds a threshold.
  • all or some of the one or more parameters may be generated at least partly in response to a continuing measure of the degree of change in signal characteristics associated with said auditory event boundaries.
  • Although aspects of the invention may be implemented in the analog and/or digital domains, practical implementations are likely to be in the digital domain, in which each audio signal is represented by individual samples or by samples within blocks of data.
  • the signal characteristics may be the spectral content of audio within a block
  • the detection of changes in signal characteristics with respect to time may be the detection of changes in spectral content of audio from block to block
  • auditory event temporal start and stop boundaries each coincide with a boundary of a block of data.
  • the present invention presents two ways of performing auditory scene analysis.
  • the first performs spectral analysis and identifies the location of perceptible audio events that are used to control the dynamic gain parameters by identifying changes in spectral content.
  • the second way transforms the audio into a perceptual loudness domain (that may provide more psychoacoustically relevant information than the first way) and identifies the location of auditory events that are subsequently used to control the dynamic gain parameters. It should be noted that the second way requires that the audio processing be aware of absolute acoustic reproduction levels, which may not be possible in some implementations. Presenting both methods of auditory scene analysis allows implementations of ASA-controlled dynamic gain modification using processes or devices that may or may not be calibrated to take into account absolute reproduction levels.
  • FIG. 1 is a flow chart showing an example of processing steps for performing auditory scene analysis.
  • FIG. 2 shows an example of block processing, windowing and performing the DFT on audio while performing the auditory scene analysis.
  • FIG. 3 is in the nature of a flow chart or functional block diagram, showing parallel processing in which audio is used to identify auditory events and to identify the characteristics of the auditory events such that the events and their characteristics are used to modify dynamics processing parameters.
  • FIG. 4 is in the nature of a flow chart or functional block diagram, showing processing in which audio is used only to identify auditory events and the event characteristics are determined from the audio event detection such that the events and their characteristics are used to modify the dynamics processing parameters.
  • FIG. 5 is in the nature of a flow chart or functional block diagram, showing processing in which audio is used only to identify auditory events and the event characteristics are determined from the audio event detection and such that only the characteristics of the auditory events are used to modify the dynamics processing parameters.
  • FIG. 6 shows a set of idealized auditory filter characteristic responses that approximate critical banding on the ERB scale.
  • the horizontal scale is frequency in Hertz and the vertical scale is level in decibels.
  • FIG. 7 shows the equal loudness contours of ISO 226.
  • the horizontal scale is frequency in Hertz (logarithmic base 10 scale) and the vertical scale is sound pressure level in decibels.
  • FIGS. 8a-c show idealized input/output characteristics and input gain characteristics of an audio dynamic range compressor.
  • FIGS. 9a-f show an example of the use of auditory events to control the release time in a digital implementation of a traditional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal.
  • FIGS. 10a-f show an example of the use of auditory events to control the release time in a digital implementation of a traditional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal for an alternate signal to that used in FIG. 9.
  • FIG. 11 depicts a suitable set of idealized AGC and DRC curves for the application of AGC followed by DRC in a loudness domain dynamics processing system.
  • the goal of the combination is to make all processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio's dynamics.
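A minimal sketch of the AGC-followed-by-DRC idea, assuming illustrative curve parameters (the target loudness, threshold, and ratio below are made up; the patent's actual curves operate in the loudness domain of FIG. 11):

```python
def agc_gain_db(input_loudness_db, target_db=-20.0, strength=0.8):
    """AGC stage: move the long-term level `strength` of the way toward
    a common target, so all material lands near the same loudness."""
    return strength * (target_db - input_loudness_db)

def drc_gain_db(level_db, threshold_db=-10.0, ratio=5.0):
    """DRC stage: above the threshold, compress the residual dynamics
    by `ratio`:1 while leaving quieter material untouched."""
    over = max(0.0, level_db - threshold_db)
    return -over * (1.0 - 1.0 / ratio)
```

Material already at the target passes through the AGC unchanged, while the DRC only trims what still exceeds its threshold, preserving some of the original dynamics.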
  • auditory scene analysis may be composed of four general processing steps as shown in a portion of FIG. 1.
  • the first step 1-1 (“Perform Spectral Analysis”) takes a time-domain audio signal, divides it into blocks and calculates a spectral profile or spectral content for each of the blocks.
  • Spectral analysis transforms the audio signal into the short-term frequency domain. This may be performed using any filterbank, either based on transforms or banks of bandpass filters, and in either linear or warped frequency space (such as the Bark scale or critical band, which better approximates the characteristics of the human ear). With any filterbank there exists a tradeoff between time and frequency: greater time resolution, and hence shorter time intervals, leads to lower frequency resolution, while greater frequency resolution, and hence narrower subbands, leads to longer time intervals.
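The time/frequency tradeoff above can be made concrete for a uniform transform-based filterbank: bin spacing is fs/M and block duration is M/fs, so improving one resolution necessarily worsens the other.

```python
def resolution(fs_hz, block_len):
    """Frequency resolution (FFT bin spacing, Hz) and time resolution
    (block duration, s) of a uniform transform filterbank."""
    return fs_hz / block_len, block_len / fs_hz

df, dt = resolution(44100, 512)     # the text's 512-sample block
df2, dt2 = resolution(44100, 2048)  # finer frequency, coarser time
```

At 44.1 kHz a 512-sample block gives roughly 86 Hz bins and 11.6 ms blocks, matching the values quoted in the text.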
  • the first step illustrated conceptually in FIG. 1 calculates the spectral content of successive time segments of the audio signal.
  • the ASA block size may be any number of samples of the input audio signal, although 512 samples provide a good tradeoff of time and frequency resolution.
  • in the second step 1-2, the differences in spectral content from block to block are determined ("Perform spectral profile difference measurements").
  • the second step calculates the difference in spectral content between successive time segments of the audio signal.
  • a powerful indicator of the beginning or end of a perceived auditory event is believed to be a change in spectral content.
  • in the third step 1-3 ("Identify location of auditory event boundaries"), when the spectral profile difference between successive blocks exceeds a threshold, the block boundary is taken to be an auditory event boundary.
  • the audio segment between consecutive boundaries constitutes an auditory event.
  • the third step sets an auditory event boundary between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold, thus defining auditory events.
  • auditory event boundaries define auditory events having a length that is an integral multiple of spectral profile blocks with a minimum length of one spectral profile block (512 samples in this example).
  • event boundaries need not be so limited.
  • the input block size may vary, for example, so as to be essentially the size of an auditory event.
  • FIG. 2 shows a conceptual representation of non-overlapping N-sample blocks being windowed and transformed into the frequency domain by the Discrete Fourier Transform (DFT), preferably implemented as a Fast Fourier Transform (FFT) for speed.
  • M = number of windowed samples in a block used to compute the spectral profile (here M = 512 samples, or 11.6 ms at 44.1 kHz)
  • P = number of samples of spectral computation overlap (here P = 0 samples, i.e., no overlap)
  • any integer values may be used for the variables above.
  • the above-listed values were determined experimentally and were found generally to identify with sufficient accuracy the location and duration of auditory events. However, setting the value of P to 256 samples (50% overlap) rather than zero samples (no overlap) has been found to be useful in identifying some hard-to-find events. While many different types of windows may be used to minimize spectral artifacts due to windowing, the window used in the spectral profile calculations is an M-point Hanning, Kaiser-Bessel or other suitable, preferably non-rectangular, window. The above-indicated values and a Hanning window type were selected after extensive experimental analysis as they have shown to provide excellent results across a wide range of audio material. Non-rectangular windowing is preferred for the processing of audio signals with predominantly low frequency content.
  • Rectangular windowing produces spectral artifacts that may cause incorrect detection of events.
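The claim that rectangular windowing produces stronger spectral artifacts can be checked numerically by comparing peak sidelobe levels of the two windows; the zero-padding factor and window length here are arbitrary choices for the illustration.

```python
import numpy as np

def peak_sidelobe_db(window, pad=16):
    """Highest FFT sidelobe of a window, in dB relative to its main lobe."""
    n = len(window)
    spectrum = np.abs(np.fft.rfft(window, pad * n))
    spectrum /= spectrum[0]                  # normalize main-lobe peak
    # Walk down the main lobe to its first local minimum, then take the
    # maximum of everything beyond it (the strongest sidelobe).
    i = 1
    while i < len(spectrum) - 1 and spectrum[i + 1] < spectrum[i]:
        i += 1
    return 20 * np.log10(spectrum[i:].max())

m = 512
rect_sidelobe = peak_sidelobe_db(np.ones(m))      # rectangular window
hann_sidelobe = peak_sidelobe_db(np.hanning(m))   # Hanning window
```

The rectangular window's strongest sidelobe sits only about 13 dB below the main lobe, versus roughly 31 dB for the Hanning window, which is why the rectangular case leaks energy that can masquerade as spectral change.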
  • in step 1-1, the spectrum of each M-sample block may be computed by windowing the data with an M-point Hanning, Kaiser-Bessel or other suitable window, converting to the frequency domain using an M-point Fast Fourier Transform, and calculating the magnitude of the complex FFT coefficients.
  • the resultant data is normalized so that the largest magnitude is set to unity, and the normalized array of M numbers is converted to the log domain.
  • the data may also be normalized by some other metric such as the mean magnitude value or mean power value of the data.
  • the array need not be converted to the log domain, but the conversion simplifies the calculation of the difference measure in step 1-2. Furthermore, the log domain more closely matches the nature of the human auditory system.
  • the resulting log domain values have a range of minus infinity to zero.
  • a lower limit may be imposed on the range of values; the limit may be fixed, for example -60 dB, or be frequency-dependent to reflect the lower audibility of quiet sounds at low and very high frequencies. (Note that it would be possible to reduce the size of the array to M/2 in that the FFT represents negative as well as positive frequencies).
  • Step 1-2 calculates a measure of the difference between the spectra of adjacent blocks. For each block, each of the M (log) spectral coefficients from step 1-1 is subtracted from the corresponding coefficient for the preceding block, and the magnitude of the difference calculated (the sign is ignored). These M differences are then summed to one number. This difference measure may also be expressed as an average difference per spectral coefficient by dividing the difference measure by the number of spectral coefficients used in the sum (in this case M coefficients).
  • Step 1-3 identifies the locations of auditory event boundaries by comparing the array of difference measures from step 1-2 against a threshold value.
  • when a difference measure exceeds the threshold, the change in spectrum is deemed sufficient to signal a new event, and the block number of the change is recorded as an event boundary.
  • the threshold may be set equal to 2500 if the whole magnitude FFT (including the mirrored part) is compared or 1250 if half the FFT is compared (as noted above, the FFT represents negative as well as positive frequencies — for the magnitude of the FFT, one is the mirror image of the other). This value was chosen experimentally and it provides good auditory event boundary detection.
  • This parameter value may be changed to reduce (increase the threshold) or increase (decrease the threshold) the detection of events.
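Steps 1-1 through 1-3, with M = 512, a Hanning window, unity-peak normalization, a −60 dB floor, and a threshold of 2500 for the whole magnitude spectrum, can be sketched as follows. The test signal (a tone switching to noise) is an illustrative assumption; note that consecutive noise blocks may themselves register as boundaries, since their spectra differ from block to block.

```python
import numpy as np

def spectral_profile(block, floor_db=-60.0):
    """Step 1-1: windowed magnitude FFT, normalized so the largest
    magnitude is unity, converted to the log domain with a dB floor."""
    mag = np.abs(np.fft.fft(block * np.hanning(len(block))))
    mag /= max(mag.max(), 1e-12)
    return np.maximum(20 * np.log10(np.maximum(mag, 1e-12)), floor_db)

def event_boundaries(x, m=512, threshold=2500.0):
    """Steps 1-2 and 1-3: sum of absolute log-spectral differences
    between successive blocks, compared against the threshold."""
    profiles = [spectral_profile(x[i:i + m])
                for i in range(0, len(x) - m + 1, m)]
    return [t for t in range(1, len(profiles))
            if np.sum(np.abs(profiles[t] - profiles[t - 1])) > threshold]

# Four blocks of a 440 Hz tone followed by four blocks of white noise:
# the spectral change at block 4 should be flagged as an event boundary.
fs, m = 44100, 512
n = np.arange(4 * m)
rng = np.random.default_rng(0)
x = np.concatenate([np.sin(2 * np.pi * 440 * n / fs),
                    rng.standard_normal(4 * m)])
bounds = event_boundaries(x)
```

Steady-tone blocks produce near-identical profiles and fall well under the threshold, while the tone-to-noise transition exceeds it by a wide margin.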
  • the process of FIG. 1 may be represented more generally by the equivalent arrangements of FIGS. 3, 4 and 5.
  • an audio signal is applied in parallel to an "Identify Auditory Events" function or step 3-1 that divides the audio signal into auditory events, each of which tends to be perceived as separate and distinct, and to an optional "Identify Characteristics of Auditory Events" function or step 3-2.
  • the process of FIG. 1 may be employed to divide the audio signal into auditory events and their characteristics identified or some other suitable process may be employed.
  • the auditory event information, which may be an identification of auditory event boundaries, determined by function or step 3-1 is then used to modify the audio dynamics processing parameters (such as attack, release, ratio, etc.), as desired, by a "Modify Dynamics Parameters" function or step 3-3.
  • the optional "Identify Characteristics" function or step 3-2 also receives the auditory event information.
  • the "Identify Characteristics" function or step 3-2 may characterize some or all of the auditory events by one or more characteristics. Such characteristics may include an identification of the dominant subband of the auditory event, as described in connection with the process of FIG. 1.
  • the characteristics may also include one or more audio characteristics, including, for example, a measure of power of the auditory event, a measure of amplitude of the auditory event, a measure of the spectral flatness of the auditory event, and whether the auditory event is substantially silent, or other characteristics that help modify dynamics parameters such that negative audible artifacts of the processing are reduced or removed.
  • the characteristics may also include other characteristics such as whether the auditory event includes a transient.
  • Alternatives to the arrangement of FIG. 3 are shown in FIGS. 4 and 5.
  • in the arrangement of FIG. 4, the audio input signal is not applied directly to the "Identify Characteristics" function or step 4-3; instead, that function receives information from the "Identify Auditory Events" function or step 4-1.
  • the arrangement of FIG. 1 is a specific example of such an arrangement.
  • in the arrangement of FIG. 5, the functions or steps 5-1, 5-2 and 5-3 are arranged in series.
  • an excitation signal E[b,t] is computed that approximates the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t.
  • This excitation may be computed from the Short-time Discrete Fourier Transform (STDFT) of the audio signal as follows:
  • E[b,t] = λb E[b,t−1] + (1 − λb) Σk |T[k]|² |Cb[k]|² |X[k,t]|²   (1), where:
  • X[k,t] represents the STDFT of x[n] at time block t and bin k.
  • t represents time in discrete units of transform blocks as opposed to a continuous measure, such as seconds.
  • T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear
  • C b [k] represents the frequency response of the basilar membrane at a location corresponding to critical band b.
  • FIG. 6 depicts a suitable set of critical band filter responses in which 40 bands are spaced uniformly along the Equivalent Rectangular Bandwidth (ERB) scale, as defined by Moore and Glasberg. Each filter shape is described by a rounded exponential function and the bands are distributed using a spacing of 1 ERB.
  • the smoothing time constant λb in equation 1 may be advantageously chosen proportionate to the integration time of human loudness perception within band b.
  • the excitation at each band is transformed into an excitation level that would generate the same perceived loudness at 1 kHz.
  • Specific loudness, a measure of perceptual loudness distributed across frequency and time, is then computed from the transformed excitation, E1kHz[b,t], through a compressive non-linearity.
  • One such suitable function to compute the specific loudness N[b,t] is given by: N[b,t] = β ((E1kHz[b,t] / TQ1kHz)^α − 1)
  • where TQ1kHz is the threshold in quiet at 1 kHz and the constants β and α are chosen to match growth of loudness data as collected from listening experiments.
  • this transformation from excitation to specific loudness may be represented by the function Ψ{·}, such that N[b,t] = Ψ{E1kHz[b,t]}.
  • the specific loudness N[b,t] is a spectral representation meant to simulate the manner in which a human perceives audio as a function of frequency and time. It captures variations in sensitivity to different frequencies, variations in sensitivity to level, and variations in frequency resolution. As such, it is a spectral representation well matched to the detection of auditory events. Though more computationally complex, comparing the difference of N[b,t] across bands between successive time blocks may in many cases result in more perceptually accurate detection of auditory events in comparison to the direct use of successive FFT spectra described above.
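A hedged sketch of the loudness-domain pipeline: the recursive excitation smoothing of equation (1) followed by a compressive nonlinearity. The band filters, the smoothing constant, and the constants alpha, beta, tq below are illustrative stand-ins, not the patent's calibrated values; the outer/middle-ear response T[k] is folded into the band filters here.

```python
import numpy as np

def excitation(power_spectra, band_filters, lam=0.9):
    """E[b,t] = lam*E[b,t-1] + (1-lam) * sum_k |C_b[k]|^2 |X[k,t]|^2,
    a one-pole smoothing of per-band power over time blocks."""
    e = np.zeros(band_filters.shape[0])
    out = []
    for frame in power_spectra:          # frame holds |X[k,t]|^2 per bin
        e = lam * e + (1 - lam) * (band_filters ** 2 @ np.asarray(frame))
        out.append(e.copy())
    return np.array(out)

def specific_loudness(e, tq=1e-4, beta=1.0, alpha=0.23):
    """Compressive nonlinearity: doubling the excitation less than
    doubles the loudness (illustrative constants)."""
    return beta * ((e + tq) ** alpha - tq ** alpha)

# Two toy bands, one excited bin: the smoothed excitation converges
# toward the instantaneous band power over successive blocks.
bands = np.eye(2)
exc = excitation([[1.0, 0.0]] * 50, bands)
```

The convergence rate of the smoother is what the band-dependent λb controls in the patent's model.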
  • the first embodiment describes the use of auditory events to control the release time in a digital implementation of a Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal.
  • the second embodiment describes the use of auditory events to control certain aspects of a more sophisticated combination of AGC and DRC implemented within the context of the psychoacoustic loudness model described above.
  • the described digital implementation of a DRC segments an audio signal x[n] into windowed, half-overlapping blocks; for each block, a modification gain based on a measure of the signal's local power and a selected compression curve is computed. The gain is smoothed across blocks and then multiplied with each block. The modified blocks are finally overlap-added to generate the modified audio signal y[n].
  • the auditory scene analysis and digital implementation of DRC as described here divides the time-domain audio signal into blocks to perform analysis and processing.
  • the DRC processing need not be performed using block segmentation.
  • the auditory scene analysis could be performed using block segmentation and spectral analysis as described above and the resulting auditory event locations and characteristics could be used to provide control information to a digital implementation of a traditional DRC implementation that typically operates on a sample-by-sample basis.
  • the same blocking structure used for auditory scene analysis is employed for the DRC to simplify the description of their combination.
  • the overlapping blocks of the audio signal may be represented as: x[n,t] = w[n] x[n + tM/2] for 0 ≤ n ≤ M − 1
  • the window w[n] tapers to zero at both ends and sums to unity when half-overlapped with itself; the commonly used sine window meets these criteria, for example.
  • Ḡ[t] = α[t] Ḡ[t−1] + (1 − α[t]) G[t]   (7a)
  • the overlap-add synthesis shown above effectively smooths the gains across samples of the processed signal y[n].
  • the gain control signal receives smoothing in addition to that shown in Equation 7a.
  • gain smoothing more sophisticated than the simple one-pole filter shown in equation 7a might be necessary in order to prevent audible distortion in the processed signal.
  • Figures 9a through 9c depict the result of applying the described DRC processing to an audio signal.
  • a compression curve similar to the one shown in Figure 8b is used: above −20 dB relative to full scale digital the signal is attenuated with a ratio of 5:1, and below a lower threshold the signal is boosted with a ratio of 5:1.
  • the gain is smoothed with an attack coefficient α_attack corresponding to a half-decay time of 10ms and a release coefficient α_release corresponding to a half-decay time of 500ms.
  • the original audio signal depicted in Figure 9a consists of six consecutive piano chords, with the final chord, located around sample 1.75 × 10^5, decaying into silence. Examining a plot of the gain G[t] in Figure 9b, it should be noted that the gain remains close to 0 dB while the six chords are played.
  • Figures 10a through 10c depict the results of applying the exact same DRC system to a different audio signal.
  • the first half of the signal consists of an up-tempo music piece at a high level, and then at approximately sample 10 x 10 4 the signal switches to a second up-tempo music piece, but at a significantly lower level.
  • Examining the gain in Figure 10b one sees that the signal is attenuated by approximately 10 dB during the first half, and then the gain rises back up to 0 dB during the second half when the softer piece is playing. In this case, the gain behaves as desired.
  • the use of auditory events to control the release time of this DRC system provides such a solution.
  • the signal A[t] is an impulsive signal with an impulse occurring at the location of an event boundary.
  • the smoothed event control signal Ā[t] may be computed from A[t] according to:
  • α_event controls the decay time of the event control signal.
  • Figures 9d and 10d depict the event control signal Ā[t] for the two corresponding audio signals, with the half-decay time of the smoother set to 250ms. In the first case, one sees that an event boundary is detected for each of the six piano chords, and that the event control signal decays smoothly towards zero after each event. For the second signal, many events are detected very close to each other in time, and therefore the event control signal never decays fully to zero.
  • when the event control signal is equal to one, the smoothing coefficient a[t] from Equation 7a equals α_release, as before, and when the control signal is equal to zero, the coefficient equals one so that the smoothed gain is prevented from changing.
  • the smoothing coefficient is interpolated between these two extremes using the control signal according to:
  • the release time is reset to a value proportionate to the event strength at the onset of an event and then increases smoothly to infinity after the occurrence of an event.
  • the rate of this increase is dictated by the coefficient α_event used to generate the smoothed event control signal.
  • Figures 9e and 10e show the effect of smoothing the gain with the event-controlled coefficient from Equation 13 as opposed to the non-event-controlled coefficient from Equation 7b.
  • the event control signal falls to zero after the last piano chord, thereby preventing the gain from moving upwards.
  • the corresponding modified audio in Figure 9f does not suffer from an unnatural boost of the chord's decay.
  • the event control signal never approaches zero, and therefore the smoothed gain signal is inhibited very little through the application of the event control.
  • the trajectory of the smoothed gain is nearly identical to the non-event-controlled gain in Figure 10b. This is exactly the desired effect.
Loudness Based AGC and DRC
  • the loudness domain dynamics processing system that is now described consists of AGC followed by DRC.
  • the goal of this combination is to make all processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio's dynamics.
  • Figure 11 depicts a suitable set of AGC and DRC curves for this application. Note that the input and output of both curves are represented in units of sone since processing is performed in the loudness domain.
  • the AGC curve strives to bring the output audio closer to some target level, and, as mentioned earlier, does so with relatively slow time constants.
  • Figure 11 shows such a DRC curve where the AGC target falls within the "null band" of the DRC, the portion of the curve that calls for no modification.
  • the AGC places the long-term loudness of the audio within the null-band of the DRC curve so that minimal fast-acting DRC modifications need be applied. If the short-term loudness still fluctuates outside of the null-band, the DRC then acts to move the loudness of the audio towards this null-band.
  • Auditory events may be utilized to control the attack and release of both the AGC and DRC.
  • for the AGC, both the attack and release times are large in comparison to the temporal resolution of event perception, and therefore event control may be advantageously employed in both cases.
  • for the DRC, the attack is relatively short, and therefore event control may be needed only for the release, as with the traditional DRC described above.
  • a difference signal D[t] similar to the one in Equations 10a and b may be computed from the specific loudness N[b,t], defined in Equation 2, as follows:
  • the difference signal may then be processed in the same way shown in Equations 11 and 12 to generate a smooth event control signal Ā[t] used to control the attack and release times.
  • the AGC curve depicted in Figure 11 may be represented as a function that takes as its input a measure of loudness and generates a desired output loudness:
  • the DRC curve may be similarly represented:
  • the input loudness is a measure of the audio's long-term loudness.
  • AGC: L_AGC[t] = α_AGC[t] L_AGC[t−1] + (1 − α_AGC[t]) L[t]
  • N_AGC[b,t] = α_AGC[t] N_AGC[b,t−1] + (1 − α_AGC[t]) N[b,t]   (16c)
  • the smoothing coefficients are chosen such that the attack time is approximately half that of the release.
  • the loudness modification scaling associated with the AGC as the ratio of the output loudness to input loudness:
  • the DRC modification may now be computed from the loudness after the application of the AGC scaling. Rather than smooth a measure of the loudness prior to the application of the DRC curve, one may alternatively apply the DRC curve to the instantaneous loudness and then subsequently smooth the resulting modification. This is similar to the technique described earlier for smoothing the gain of the traditional DRC.
  • the DRC may be applied in a multi-band fashion, meaning that the DRC modification is a function of the specific loudness N[b,t] in each band b, rather than the overall loudness L[t] .
  • DRC scaling in each band may be computed according to:
  • the AGC and DRC modifications may then be combined to form a total loudness scaling per band:
  • attack and release modes may be determined through the simultaneous smoothing of specific loudness itself:
  • the gains may be applied to each band of the filterbank used to compute the excitation, and the modified audio may then be generated by inverting the filterbank to produce a modified time domain audio signal.
  • the event control signal Ā[t] from Equation 12 may be used to vary the value of the DRC ratio parameter that is used to dynamically adjust the gain of the audio.
  • the Ratio parameter, similarly to the attack and release time parameters, may contribute significantly to the perceptual artifacts introduced by dynamic gain adjustments.
  • the invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
  • Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system.
  • the language may be a compiled or interpreted language.
  • Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein.
  • the inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.
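The event-controlled release behavior described above, in which an impulsive event signal is smoothed into a decaying control signal (Equations 11 and 12) that interpolates the release coefficient of the one-pole gain smoother (Equations 7a and 13), can be sketched as follows. This is a minimal illustrative Python sketch, not the patent's implementation; the function names, block rate, and default parameter values are assumptions.

```python
import numpy as np

def half_decay_coeff(t_half_ms, block_rate_hz):
    """One-pole coefficient whose impulse response halves in t_half_ms."""
    return 0.5 ** (1.0 / (t_half_ms * 1e-3 * block_rate_hz))

def event_controlled_gain(gain_db, event_boundaries, block_rate_hz,
                          attack_ms=10.0, release_ms=500.0, event_ms=250.0):
    """Smooth a per-block gain, with the release slowed (and ultimately
    frozen) as the smoothed event control signal decays toward zero."""
    a_attack = half_decay_coeff(attack_ms, block_rate_hz)
    a_release = half_decay_coeff(release_ms, block_rate_hz)
    a_event = half_decay_coeff(event_ms, block_rate_hz)
    a_bar = 0.0                      # smoothed event control signal in [0, 1]
    g_bar = float(gain_db[0])        # smoothed gain, dB
    out = np.empty(len(gain_db))
    for t in range(len(gain_db)):
        # event boundaries are impulses; smooth them into a decaying control
        a_bar = max(1.0 if event_boundaries[t] else 0.0, a_event * a_bar)
        if gain_db[t] < g_bar:
            alpha = a_attack         # attack: track a falling gain quickly
        else:
            # release: interpolate between a_release (recent event) and 1.0
            # (no recent event, so the gain is held constant)
            alpha = a_bar * a_release + (1.0 - a_bar)
        g_bar = alpha * g_bar + (1.0 - alpha) * gain_db[t]
        out[t] = g_bar
    return out
```

With no event boundaries, the gain never rises during a decay, avoiding the audible boost of a chord's tail; with densely spaced events, the release behaves essentially as in a conventional DRC.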


Abstract

In one disclosed aspect, dynamic gain modifications are applied to an audio signal at least partly in response to auditory events and/or the degree of change in signal characteristics associated with said auditory event boundaries. In another aspect, an audio signal is divided into auditory events by comparing the difference in specific loudness between successive time blocks of the audio signal.

Description

Audio Gain Control Using Specific-Loudness-Based Auditory Event Detection
Technical Field
The present invention relates to audio dynamic range control methods and apparatus in which an audio processing device analyzes an audio signal and changes the level, gain or dynamic range of the audio, and all or some of the parameters of the audio gain and dynamics processing are generated as a function of auditory events. The invention also relates to computer programs for practicing such methods or controlling such apparatus.
The present invention also relates to methods and apparatus using a specific- loudness-based detection of auditory events. The invention also relates to computer programs for practicing such methods or controlling such apparatus.
Background Art
Dynamics Processing of Audio
The techniques of automatic gain control (AGC) and dynamic range control (DRC) are well known and are a common element of many audio signal paths. In an abstract sense, both techniques measure the level of an audio signal in some manner and then gain-modify the signal by an amount that is a function of the measured level. In a linear, 1:1 dynamics processing system, the input audio is not processed and the output audio signal ideally matches the input audio signal. Additionally, if one has an audio dynamics processing system that automatically measures characteristics of the input signal and uses that measurement to control the output signal, if the input signal rises in level by 6 dB and the output signal is processed such that it only rises in level by 3 dB, then the output signal has been compressed by a ratio of 2:1 with respect to the input signal. International Publication Number WO 2006/047600 A1 ("Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal" by Alan Jeffrey Seefeldt) provides a detailed overview of the five basic types of dynamics processing of audio: compression, limiting, automatic gain control (AGC), expansion and gating.
Auditory Events and Auditory Event Detection
The division of sounds into units or segments perceived as separate and distinct is sometimes referred to as "auditory event analysis" or "auditory scene analysis" ("ASA") and the segments are sometimes referred to as "auditory events" or "audio events." An extensive discussion of auditory scene analysis is set forth by Albert S. Bregman in his book Auditory Scene Analysis—The Perceptual Organization of Sound (Massachusetts Institute of Technology, 1991; Fourth printing, 2001; Second MIT Press paperback edition). In addition, U.S. Pat. No. 6,002,776 to Bhadkamkar, et al, Dec. 14, 1999 cites publications dating back to 1976 as "prior art work related to sound separation by auditory scene analysis." However, the Bhadkamkar, et al patent discourages the practical use of auditory scene analysis, concluding that "[t]echniques involving auditory scene analysis, although interesting from a scientific point of view as models of human auditory processing, are currently far too computationally demanding and specialized to be considered practical techniques for sound separation until fundamental progress is made."
A useful way to identify auditory events is set forth by Crockett and Crockett et al in various patent applications and papers listed below under the heading "Incorporation by Reference." According to those documents, an audio signal is divided into auditory events, each of which tends to be perceived as separate and distinct, by detecting changes in spectral composition (amplitude as a function of frequency) with respect to time. This may be done, for example, by calculating the spectral content of successive time blocks of the audio signal, calculating the difference in spectral content between successive time blocks of the audio signal, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in the spectral content between such successive time blocks exceeds a threshold. Alternatively, changes in amplitude with respect to time may be calculated instead of or in addition to changes in spectral composition with respect to time.
In its least computationally demanding implementation, the process divides audio into time segments by analyzing the entire frequency band (full bandwidth audio) or substantially the entire frequency band (in practical implementations, band limiting filtering at the ends of the spectrum is often employed) and giving the greatest weight to the loudest audio signal components. This approach takes advantage of a psychoacoustic phenomenon in which at smaller time scales (20 milliseconds (ms) and less) the ear may tend to focus on a single auditory event at a given time. This implies that while multiple events may be occurring at the same time, one component tends to be perceptually most prominent and may be processed individually as though it were the only event taking place. Taking advantage of this effect also allows the auditory event detection to scale with the complexity of the audio being processed. For example, if the input audio signal being processed is a solo instrument, the audio events that are identified will likely be the individual notes being played. Similarly for an input voice signal, the individual components of speech, the vowels and consonants for example, will likely be identified as individual audio elements. As the complexity of the audio increases, such as music with a drumbeat or multiple instruments and voice, the auditory event detection identifies the "most prominent" (i.e., the loudest) audio element at any given moment.
At the expense of greater computational complexity, the process may also take into consideration changes in spectral composition with respect to time in discrete frequency subbands (fixed or dynamically determined or both fixed and dynamically determined subbands) rather than the full bandwidth. This alternative approach takes into account more than one audio stream in different frequency subbands rather than assuming that only a single stream is perceptible at a particular time.
Auditory event detection may be implemented by dividing a time domain audio waveform into time intervals or blocks and then converting the data in each block to the frequency domain, using either a filter bank or a time-frequency transformation, such as the FFT. The amplitude of the spectral content of each block may be normalized in order to eliminate or reduce the effect of amplitude changes. Each resulting frequency domain representation provides an indication of the spectral content of the audio in the particular block. The spectral content of successive blocks is compared and changes greater than a threshold may be taken to indicate the temporal start or temporal end of an auditory event.
Preferably, the frequency domain data is normalized, as is described below. The degree to which the frequency domain data needs to be normalized gives an indication of amplitude. Hence, if a change in this degree exceeds a predetermined threshold that too may be taken to indicate an event boundary. Event start and end points resulting from spectral changes and from amplitude changes may be ORed together so that event boundaries resulting from either type of change are identified.
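The block-based boundary detection described in the preceding paragraphs might be sketched as follows. The block size, Hanning window, amplitude normalization, and threshold value here are illustrative assumptions, not the document's exact parameters.

```python
import numpy as np

def auditory_event_boundaries(x, block_size=512, threshold=0.1):
    """Mark a block as an event boundary when its normalized magnitude
    spectrum differs from the previous block's by more than a threshold."""
    window = np.hanning(block_size)
    n_blocks = len(x) // block_size
    boundaries = np.zeros(n_blocks, dtype=bool)
    prev = None
    for t in range(n_blocks):
        block = x[t * block_size:(t + 1) * block_size] * window
        mag = np.abs(np.fft.rfft(block))
        total = mag.sum()
        if total > 0:
            mag = mag / total        # normalize away overall level changes
        if prev is not None and np.abs(mag - prev).sum() > threshold:
            boundaries[t] = True
        prev = mag
    return boundaries
```

For example, a 440 Hz tone that abruptly switches to 880 Hz produces a single boundary at the block where the switch occurs, and none within the steady segments.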
Although techniques described in said Crockett and Crockett et al applications and papers are particularly useful in connection with aspects of the present invention, other techniques for identifying auditory events and event boundaries may be employed in aspects of the present invention.
Disclosure of the Invention
Conventional prior-art dynamics processing of audio involves multiplying the audio by a time-varying control signal that adjusts the gain of the audio, producing a desired result. "Gain" is a scaling factor that scales the audio amplitude. This control signal may be generated on a continuous basis or from blocks of audio data, but it is generally derived by some form of measurement of the audio being processed, and its rate of change is determined by smoothing filters, sometimes with fixed characteristics and sometimes with characteristics that vary with the dynamics of the audio. For example, response times may be adjustable in accordance with changes in the magnitude or the power of the audio. Prior art methods such as automatic gain control (AGC) and dynamic range compression (DRC) do not assess in any psychoacoustically-based way the time intervals during which gain changes may be perceived as impairments and when they can be applied without imparting audible artifacts. Therefore, conventional audio dynamics processes can often introduce audible artifacts, i.e., the effects of the dynamics processing can introduce unwanted perceptible changes in the audio.
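A conventional, non-event-aware compressor of the kind just described might look like the following sketch; the compression curve, level tracker, and time constants are illustrative assumptions, not the patent's values.

```python
import numpy as np

def simple_drc(x, fs, threshold_db=-20.0, ratio=5.0,
               attack_ms=10.0, release_ms=500.0):
    """Sample-by-sample compressor: running mean-square level tracking, a
    static compression curve, and fixed one-pole attack/release smoothing."""
    a_attack = 0.5 ** (1.0 / (attack_ms * 1e-3 * fs))
    a_release = 0.5 ** (1.0 / (release_ms * 1e-3 * fs))
    level = 1e-10                    # running mean-square estimate
    g_bar = 0.0                      # smoothed gain, dB
    y = np.empty_like(x)
    for n in range(len(x)):
        level = 0.999 * level + 0.001 * x[n] ** 2
        level_db = 10.0 * np.log10(level + 1e-12)
        over = level_db - threshold_db
        gain_db = -over * (1.0 - 1.0 / ratio) if over > 0 else 0.0
        # fixed time constants, applied regardless of signal content
        alpha = a_attack if gain_db < g_bar else a_release
        g_bar = alpha * g_bar + (1.0 - alpha) * gain_db
        y[n] = x[n] * 10.0 ** (g_bar / 20.0)
    return y
```

Because the release here runs at a fixed 500 ms regardless of what the listener perceives, a sound decaying after a loud passage is pushed back up in level, which is precisely the class of artifact the event-based control described below the compression curve discussion is designed to avoid.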
Auditory scene analysis identifies perceptually discrete auditory events, with each event occurring between two consecutive auditory event boundaries. The audible impairments caused by a gain change can be greatly reduced by ensuring that within an auditory event the gain is more nearly constant and by confining much of the change to the neighborhood of an event boundary. In the context of compressors or expanders, the response to an increase in audio level (often called the attack) may be rapid, comparable with or shorter than the minimum duration of auditory events, but the response to a decrease (the release or recovery) may be slower so that sounds that ought to appear constant or to decay gradually may be audibly disturbed. Under such circumstances, it is very beneficial to delay the gain recovery until the next boundary or to slow down the rate of change of gain during an event. For automatic gain control applications, where the medium- to long-term level or loudness of the audio is normalized and both attack and release times may therefore be long compared with the minimum duration of an auditory event, it is beneficial during events to delay changes or slow down rates of change in gain until the next event boundary for both increasing and decreasing gains. According to one aspect of the present invention, an audio processing system receives an audio signal and analyzes and alters the gain and/or dynamic range characteristics of the audio. The dynamic range modification of the audio is often controlled by parameters of a dynamics processing system (attack and release time, compression ratio, etc.) that have significant effects on the perceptual artifacts introduced by the dynamics processing. Changes in signal characteristics with respect to time in the audio signal are detected and identified as auditory event boundaries, such that an audio segment between consecutive boundaries constitutes an auditory event in the audio signal. 
The characteristics of the auditory events of interest may include characteristics of the events such as perceptual strength or duration. Some of said one or more dynamics processing parameters are generated at least partly in response to auditory events and/or the degree of change in signal characteristics associated with said auditory event boundaries.
Typically, an auditory event is a segment of audio that tends to be perceived as separate and distinct. One usable measure of signal characteristics includes a measure of the spectral content of the audio, for example, as described in the cited Crockett and Crockett et al documents. All or some of the one or more audio dynamics processing parameters may be generated at least partly in response to the presence or absence and characteristics of one or more auditory events. An auditory event boundary may be identified as a change in signal characteristics with respect to time that exceeds a threshold. Alternatively, all or some of the one or more parameters may be generated at least partly in response to a continuing measure of the degree of change in signal characteristics associated with said auditory event boundaries. Although, in principle, aspects of the invention may be implemented in analog and/or digital domains, practical implementations are likely to be implemented in the digital domain in which each of the audio signals are represented by individual samples or samples within blocks of data. In this case, the signal characteristics may be the spectral content of audio within a block, the detection of changes in signal characteristics with respect to time may be the detection of changes in spectral content of audio from block to block, and auditory event temporal start and stop boundaries each coincide with a boundary of a block of data. It should be noted that, for the more traditional case of performing dynamic gain changes on a sample-by-sample basis, the auditory scene analysis described could be performed on a block basis, with the resulting auditory event information used to perform dynamic gain changes that are applied sample-by-sample.
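One common way to realize the block-to-sample bridge mentioned at the end of the paragraph above is to interpolate the block-rate gains across the samples of each block. A small illustrative sketch (the function name and interpolation choice are assumptions):

```python
import numpy as np

def per_sample_gains(block_gains, block_size):
    """Linearly interpolate one gain value per block into a per-sample gain
    ramp, so block-based analysis can drive sample-based gain application."""
    block_gains = np.asarray(block_gains, dtype=float)
    # place each block's gain at the block centre, then interpolate;
    # np.interp clamps to the end values outside the centre range
    centres = (np.arange(len(block_gains)) + 0.5) * block_size
    samples = np.arange(len(block_gains) * block_size)
    return np.interp(samples, centres, block_gains)
```

Linear interpolation avoids the stepwise gain discontinuities that would result from applying one constant gain per block.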
By controlling key audio dynamics processing parameters using the results of auditory scene analysis, a dramatic reduction of audible artifacts introduced by dynamics processing may be achieved.
The present invention presents two ways of performing auditory scene analysis. The first performs spectral analysis and identifies the location of perceptible audio events that are used to control the dynamic gain parameters by identifying changes in spectral content. The second way transforms the audio into a perceptual loudness domain (that may provide more psychoacoustically relevant information than the first way) and identifies the location of auditory events that are subsequently used to control the dynamic gain parameters. It should be noted that the second way requires that the audio processing be aware of absolute acoustic reproduction levels, which may not be possible in some implementations. Presenting both methods of auditory scene analysis allows implementations of ASA-controlled dynamic gain modification using processes or devices that may or may not be calibrated to take into account absolute reproduction levels.
Aspects of the present invention are described herein in an audio dynamics processing environment that includes aspects of other inventions. Such other inventions are described in various pending United States and International Patent Applications of Dolby Laboratories Licensing Corporation, the owner of the present application, which applications are identified herein.
Description of the Drawings
FIG. 1 is a flow chart showing an example of processing steps for performing auditory scene analysis.
FIG. 2 shows an example of block processing, windowing and performing the DFT on audio while performing the auditory scene analysis.
FIG. 3 is in the nature of a flow chart or functional block diagram, showing parallel processing in which audio is used to identify auditory events and to identify the characteristics of the auditory events such that the events and their characteristics are used to modify dynamics processing parameters.
FIG. 4 is in the nature of a flow chart or functional block diagram, showing processing in which audio is used only to identify auditory events and the event characteristics are determined from the audio event detection such that the events and their characteristics are used to modify the dynamics processing parameters.
FIG. 5 is in the nature of a flow chart or functional block diagram, showing processing in which audio is used only to identify auditory events and the event characteristics are determined from the audio event detection and such that only the characteristics of the auditory events are used to modify the dynamics processing parameters.
FIG. 6 shows a set of idealized auditory filter characteristic responses that approximate critical banding on the ERB scale. The horizontal scale is frequency in Hertz and the vertical scale is level in decibels.
FIG. 7 shows the equal loudness contours of ISO 226. The horizontal scale is frequency in Hertz (logarithmic base 10 scale) and the vertical scale is sound pressure level in decibels.
FIGS. 8a-c shows idealized input/output characteristics and input gain characteristics of an audio dynamic range compressor.
FIGS. 9a-f show an example of the use of auditory events to control the release time in a digital implementation of a traditional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal.
FIGS. 10a-f show an example of the use of auditory events to control the release time in a digital implementation of a traditional Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal for an alternate signal to that used in FIG. 9.
FIG. 11 depicts a suitable set of idealized AGC and DRC curves for the application of AGC followed by DRC in a loudness domain dynamics processing system. The goal of the combination is to make all processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio's dynamics.
Best Mode for Carrying Out the Invention
Auditory Scene Analysis (Original, Non-Loudness Domain Method)
In accordance with an embodiment of one aspect of the present invention, auditory scene analysis may be composed of four general processing steps as shown in a portion of FIG. 1. The first step 1-1 ("Perform Spectral Analysis") takes a time-domain audio signal, divides it into blocks and calculates a spectral profile or spectral content for each of the blocks. Spectral analysis transforms the audio signal into the short-term frequency domain. This may be performed using any filterbank, either based on transforms or banks of bandpass filters, and in either linear or warped frequency space (such as the Bark scale or critical band, which better approximate the characteristics of the human ear). With any filterbank there exists a tradeoff between time and frequency. Greater time resolution, and hence shorter time intervals, leads to lower frequency resolution. Greater frequency resolution, and hence narrower subbands, leads to longer time intervals.
The first step, illustrated conceptually in FIG. 1, calculates the spectral content of successive time segments of the audio signal. In a practical embodiment, the ASA block size may be any number of samples of the input audio signal, although 512 samples provide a good tradeoff of time and frequency resolution. In the second step 1-2, the differences in spectral content from block to block are determined ("Perform spectral profile difference measurements"). Thus, the second step calculates the difference in spectral content between successive time segments of the audio signal. As discussed above, a powerful indicator of the beginning or end of a perceived auditory event is believed to be a change in spectral content. In the third step 1-3 ("Identify location of auditory event boundaries"), when the spectral difference between one spectral-profile block and the next is greater than a threshold, the block boundary is taken to be an auditory event boundary. The audio segment between consecutive boundaries constitutes an auditory event. Thus, the third step sets an auditory event boundary between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold, thus defining auditory events. In this embodiment, auditory event boundaries define auditory events having a length that is an integral multiple of spectral profile blocks with a minimum length of one spectral profile block (512 samples in this example). In principle, event boundaries need not be so limited. As an alternative to the practical embodiments discussed herein, the input block size may vary, for example, so as to be essentially the size of an auditory event.
Following the identification of the event boundaries, key characteristics of the auditory event are identified, as shown in step 1-4. Either overlapping or non-overlapping segments of the audio may be windowed and used to compute spectral profiles of the input audio. Overlap results in finer resolution as to the location of auditory events and, also, makes it less likely to miss an event, such as a short transient. However, overlap also increases computational complexity. Thus, overlap may be omitted. FIG. 2 shows a conceptual representation of non-overlapping N sample blocks being windowed and transformed into the frequency domain by the Discrete Fourier Transform (DFT). Each block may be windowed and transformed into the frequency domain, such as by using the DFT, preferably implemented as a Fast Fourier Transform (FFT) for speed.
The following variables may be used to compute the spectral profile of the input block:
M = number of windowed samples in a block used to compute the spectral profile
P = number of samples of spectral computation overlap
In general, any integers may be used for the variables above. However, the implementation will be more efficient if M is set equal to a power of 2 so that standard FFTs may be used for the spectral profile calculations. In a practical embodiment of the auditory scene analysis process, the parameters listed may be set to:
M = 512 samples (or 11.6 ms at 44.1 kHz)
P = 0 samples (no overlap)
The above-listed values were determined experimentally and were found generally to identify with sufficient accuracy the location and duration of auditory events. However, setting the value of P to 256 samples (50% overlap) rather than zero samples (no overlap) has been found to be useful in identifying some hard-to-find events. While many different types of windows may be used to minimize spectral artifacts due to windowing, the window used in the spectral profile calculations is an M-point Hanning, Kaiser-Bessel or other suitable, preferably non-rectangular, window. The above-indicated values and a Hanning window type were selected after extensive experimental analysis as they have shown to provide excellent results across a wide range of audio material. Non-rectangular windowing is preferred for the processing of audio signals with predominantly low frequency content. Rectangular windowing produces spectral artifacts that may cause incorrect detection of events. Unlike certain encoder/decoder (codec) applications where an overall overlap/add process must provide a constant level, such a constraint does not apply here and the window may be chosen for characteristics such as its time/frequency resolution and stop-band rejection.
In step 1-1 (FIG. 1), the spectrum of each M-sample block may be computed by windowing the data with an M-point Hanning, Kaiser-Bessel or other suitable window, converting to the frequency domain using an M-point Fast Fourier Transform, and calculating the magnitude of the complex FFT coefficients. The resultant data is normalized so that the largest magnitude is set to unity, and the normalized array of M numbers is converted to the log domain. The data may also be normalized by some other metric such as the mean magnitude value or mean power value of the data. The array need not be converted to the log domain, but the conversion simplifies the calculation of the difference measure in step 1-2. Furthermore, the log domain more closely matches the nature of the human auditory system. The resulting log domain values have a range of minus infinity to zero. In a practical embodiment, a lower limit may be imposed on the range of values; the limit may be fixed, for example -60 dB, or be frequency-dependent to reflect the lower audibility of quiet sounds at low and very high frequencies. (Note that it would be possible to reduce the size of the array to M/2 in that the FFT represents negative as well as positive frequencies.)
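The spectral-profile computation of step 1-1 can be sketched as follows. This is an illustrative sketch only: it uses a direct DFT for clarity (a practical implementation would use an FFT with M a power of 2), keeps only the M/2 non-mirrored bins, normalizes to the largest magnitude, and imposes a fixed -60 dB floor as suggested above.

```python
import cmath
import math

def spectral_profile(block, floor_db=-60.0):
    """Step 1-1 sketch: Hanning window, DFT magnitude, normalize to unity
    peak, convert to dB, and impose a fixed lower limit."""
    M = len(block)
    # M-point Hanning window
    windowed = [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * n / M))
                for n, x in enumerate(block)]
    # Direct DFT magnitudes; keep only M/2 bins (the rest mirror them)
    mags = []
    for k in range(M // 2):
        acc = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / M)
                  for n in range(M))
        mags.append(abs(acc))
    # Normalize so the largest magnitude is unity, then convert to dB
    peak = max(mags) or 1.0
    return [max(20.0 * math.log10(m / peak), floor_db) if m > 0 else floor_db
            for m in mags]
```

A windowed impulse, for example, yields a flat profile whose peak sits at 0 dB after normalization.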
Step 1-2 calculates a measure of the difference between the spectra of adjacent blocks. For each block, each of the M (log) spectral coefficients from step 1-1 is subtracted from the corresponding coefficient for the preceding block, and the magnitude of the difference is calculated (the sign is ignored). These M differences are then summed to yield one number. This difference measure may also be expressed as an average difference per spectral coefficient by dividing the difference measure by the number of spectral coefficients used in the sum (in this case M coefficients).
Step 1-3 identifies the locations of auditory event boundaries by applying a threshold to the array of difference measures from step 1-2. When a difference measure exceeds the threshold, the change in spectrum is deemed sufficient to signal a new event and the block number of the change is recorded as an event boundary. For the values of M and P given above and for log domain values (in step 1-1) expressed in units of dB, the threshold may be set equal to 2500 if the whole magnitude FFT (including the mirrored part) is compared or 1250 if half the FFT is compared (as noted above, the FFT represents negative as well as positive frequencies — for the magnitude of the FFT, one is the mirror image of the other). This value was chosen experimentally and it provides good auditory event boundary detection. This parameter value may be changed to reduce (increase the threshold) or increase (decrease the threshold) the detection of events. The process of FIG. 1 may be represented more generally by the equivalent arrangements of FIGS. 3, 4 and 5. In FIG. 3, an audio signal is applied in parallel to an "Identify Auditory Events" function or step 3-1 that divides the audio signal into auditory events, each of which tends to be perceived as separate and distinct, and to an optional "Identify Characteristics of Auditory Events" function or step 3-2. The process of FIG. 1 may be employed to divide the audio signal into auditory events and to identify their characteristics, or some other suitable process may be employed. The auditory event information, which may be an identification of auditory event boundaries, determined by function or step 3-1 is then used to modify the audio dynamics processing parameters (such as attack, release, ratio, etc.), as desired, by a "Modify Dynamics Parameters" function or step 3-3. The optional "Identify Characteristics" function or step 3-2 also receives the auditory event information.
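Steps 1-2 and 1-3 can be sketched together as follows. The demonstration threshold below is illustrative only; the values 2500/1250 quoted above apply specifically to the M, P and dB scaling of the described embodiment.

```python
def spectral_difference(profile_a, profile_b):
    """Step 1-2 sketch: sum of absolute differences between the (log)
    spectral coefficients of two successive blocks, signs ignored."""
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))

def event_boundaries(profiles, threshold):
    """Step 1-3 sketch: a block whose difference measure from the preceding
    block exceeds the threshold is recorded as an auditory event boundary.
    `profiles` is a list of equal-length per-block spectral profiles."""
    return [t for t in range(1, len(profiles))
            if spectral_difference(profiles[t], profiles[t - 1]) > threshold]
```

The audio between consecutive returned indices then constitutes one auditory event, with a minimum length of one spectral-profile block.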
The "Identify Characteristics" function or step 3-2 may characterize some or all of the auditory events by one or more characteristics. Such characteristics may include an identification of the dominant subband of the auditory event, as described in connection with the process of FIG. 1. The characteristics may also include one or more audio characteristics, including, for example, a measure of power of the auditory event, a measure of amplitude of the auditory event, a measure of the spectral flatness of the auditory event, and whether the auditory event is substantially silent, or other characteristics that help modify dynamics parameters such that negative audible artifacts of the processing are reduced or removed. The characteristics may also include other characteristics such as whether the auditory event includes a transient.
Alternatives to the arrangement of FIG. 3 are shown in FIGS. 4 and 5. In FIG. 4, the audio input signal is not applied directly to the "Identify Characteristics" function or step 4-3, but it does receive information from the "Identify Auditory Events" function or step 4-1. The arrangement of FIG. 1 is a specific example of such an arrangement. In FIG. 5, the functions or steps 5-1, 5-2 and 5-3 are arranged in series.
The details of this practical embodiment are not critical. Other ways to calculate the spectral content of successive time segments of the audio signal, calculate the differences between successive time segments, and set auditory event boundaries at the respective boundaries between successive time segments when the difference in the spectral profile content between such successive time segments exceeds a threshold may be employed.
Auditory Scene Analysis (New, Loudness Domain Method)
International application under the Patent Cooperation Treaty S.N. PCT/US2005/038579, filed October 25, 2005, published as International Publication Number WO 2006/047600 Al, entitled "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal" by Alan Jeffrey
Seefeldt discloses, among other things, an objective measure of perceived loudness based on a psychoacoustic model. Said application is hereby incorporated by reference in its entirety. As described in said application, from an audio signal,
x[n],
an excitation signal E[b,t] is computed that approximates the distribution of energy along the basilar membrane of the inner ear at critical band b during time block t. This excitation may be computed from the Short-time Discrete Fourier Transform (STDFT) of the audio signal as follows:
E[b,t] = λ_b E[b,t−1] + (1 − λ_b) Σ_k |T[k]|^2 |C_b[k]|^2 |X[k,t]|^2        (1)
where X[k,t] represents the STDFT of x[n] at time block t and bin k. Note that in equation 1 t represents time in discrete units of transform blocks as opposed to a continuous measure, such as seconds. T[k] represents the frequency response of a filter simulating the transmission of audio through the outer and middle ear, and Cb[k] represents the frequency response of the basilar membrane at a location corresponding to critical band b. FIG. 6 depicts a suitable set of critical band filter responses in which 40 bands are spaced uniformly along the Equivalent Rectangular Bandwidth (ERB) scale, as defined by Moore and Glasberg. Each filter shape is described by a rounded exponential function and the bands are distributed using a spacing of 1 ERB. Lastly, the smoothing time constant λb in equation 1 may be advantageously chosen proportionate to the integration time of human loudness perception within band b.
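The recursion in equation 1 is a per-band one-pole smoother. It can be sketched as follows, with the inner sum over bins k of |T[k]|^2 |C_b[k]|^2 |X[k,t]|^2 collapsed into a precomputed per-block energy; the value of λ_b here is illustrative, not the perceptually tuned constant.

```python
def smooth_excitation(filtered_energy, lam):
    """Equation 1 sketch for one critical band b:
    E[b,t] = lam * E[b,t-1] + (1 - lam) * e[t], where e[t] stands in for the
    outer/middle-ear and basilar-membrane filtered spectral energy of block t
    (assumed precomputed here)."""
    E = 0.0
    out = []
    for e in filtered_energy:
        E = lam * E + (1.0 - lam) * e  # one-pole smoothing across blocks
        out.append(E)
    return out
```

With λ_b = 0.5 and a constant unit energy, the excitation converges toward 1 block by block, mimicking the loudness-integration time constant described above.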
Using equal loudness contours, such as those depicted in FIG. 7, the excitation at each band is transformed into an excitation level that would generate the same perceived loudness at 1 kHz. Specific loudness, a measure of perceptual loudness distributed across frequency and time, is then computed from the transformed excitation, E_1kHz[b,t], through a compressive non-linearity. One such suitable function to compute the specific loudness N[b,t] is given by:
N[b,t] = β ( (E_1kHz[b,t] / TQ_1kHz)^α − 1 )        (2)
where TQ_1kHz is the threshold in quiet at 1 kHz and the constants β and α are chosen to match growth-of-loudness data as collected from listening experiments. Abstractly, this transformation from excitation to specific loudness may be represented by the function Ψ{·} such that:
N[b,t] = Ψ{E[b,t]}
Finally, the total loudness, L[t], represented in units of sone, is computed by summing the specific loudness across bands:

L[t] = Σ_b N[b,t]        (3)
The specific loudness N[b,t] is a spectral representation meant to simulate the manner in which a human perceives audio as a function of frequency and time. It captures variations in sensitivity to different frequencies, variations in sensitivity to level, and variations in frequency resolution. As such, it is a spectral representation well matched to the detection of auditory events. Though more computationally complex, comparing the difference of N[b,t] across bands between successive time blocks may in many cases result in more perceptually accurate detection of auditory events in comparison to the direct use of successive FFT spectra described above.
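Equations 2 and 3 can be sketched as follows. The default constants are illustrative placeholders only, not the values fitted to listening data in the referenced application.

```python
def specific_loudness(E_1kHz, TQ_1kHz=4.2e-6, beta=0.23, alpha=0.25):
    """Equation 2 sketch: compressive non-linearity mapping the transformed
    excitation to specific loudness. TQ_1kHz, beta and alpha are hypothetical
    placeholder values."""
    return beta * ((E_1kHz / TQ_1kHz) ** alpha - 1.0)

def total_loudness(specific_loudness_bands):
    """Equation 3: total loudness L[t] in sone is the sum of the specific
    loudness across bands."""
    return sum(specific_loudness_bands)
```

Note that an excitation exactly at the threshold in quiet maps to zero specific loudness, as the compressive form of equation 2 requires.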
In said patent application, several applications for modifying the audio based on this psychoacoustic loudness model are disclosed. Among these are several dynamics processing algorithms, such as AGC and DRC. These disclosed algorithms may benefit from the use of auditory events to control various associated parameters. Because specific loudness is already computed, it is readily available for the purpose of detecting said events. Details of a preferred embodiment are discussed below.
Audio Dynamics Processing Parameter Control with Auditory Events
Two examples of embodiments of the invention are now presented. The first describes the use of auditory events to control the release time in a digital implementation of a Dynamic Range Controller (DRC) in which the gain control is derived from the Root Mean Square (RMS) power of the signal. The second embodiment describes the use of auditory events to control certain aspects of a more sophisticated combination of AGC and DRC implemented within the context of the psychoacoustic loudness model described above. These two embodiments are meant to serve as examples of the invention only, and it should be understood that the use of auditory events to control parameters of a dynamics processing algorithm is not restricted to the specifics described below.
Dynamic Range Control
The described digital implementation of a DRC segments an audio signal x[n] into windowed, half-overlapping blocks, and for each block a modification gain based on a measure of the signal's local power and a selected compression curve is computed. The gain is smoothed across blocks and then multiplied with each block. The modified blocks are finally overlap-added to generate the modified audio signal y[n] .
It should be noted that, while the auditory scene analysis and the digital implementation of DRC described here divide the time-domain audio signal into blocks to perform analysis and processing, the DRC processing need not be performed using block segmentation. For example, the auditory scene analysis could be performed using block segmentation and spectral analysis as described above, and the resulting auditory event locations and characteristics could be used to provide control information to a digital implementation of a traditional DRC that typically operates on a sample-by-sample basis. Here, however, the same blocking structure used for auditory scene analysis is employed for the DRC to simplify the description of their combination.
Proceeding with the description of a block-based DRC implementation, the overlapping blocks of the audio signal may be represented as:

x[n,t] = w[n] x[n + tM/2],  for 0 ≤ n < M        (4)
where M is the block length and the hopsize is M/2, w[n] is the window, n is the sample index within the block, and t is the block index (note that here t is used in the same way as with the STDFT in equation 1; it represents time in discrete units of blocks rather than seconds, for example). Ideally, the window w[n] tapers to zero at both ends and sums to unity when half-overlapped with itself; the commonly used sine window meets these criteria, for example.
For each block, one may then compute the RMS power to generate a power measure P[t] in dB per block:
P[t] = 10 log10( (1/M) Σ_{n=0}^{M−1} x^2[n,t] )        (5)
As mentioned earlier, one could smooth this power measure with a fast attack and slow release prior to processing with a compression curve, but as an alternative the instantaneous power P[t] is processed and the resulting gain is smoothed. This alternate approach has the advantage that a simple compression curve with sharp knee points may be used, but the resulting gains are still smooth as the power travels through the knee-point. Representing a compression curve as shown in Figure 8c as a function F of signal level that generates a gain, the block gain G[t] is given by:
G[t] = F{P[t]}        (6)
Assuming that the compression curve applies greater attenuation as signal level increases, the gain will be decreasing when the signal is in "attack mode" and increasing when in "release mode". Therefore, a smoothed gain Ḡ[t] may be computed according to:

Ḡ[t] = α[t] Ḡ[t−1] + (1 − α[t]) G[t]        (7a)
where

α[t] = α_attack,   if G[t] < Ḡ[t−1]
α[t] = α_release,  if G[t] ≥ Ḡ[t−1]        (7b)

and

α_release >> α_attack        (7c)
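Equations 6 and 7a-7c can be sketched together as follows. The curve's knee points mirror those described for Figure 8b further below, and the smoothing coefficients are illustrative stand-ins for the tuned constants.

```python
def compression_gain_db(power_db):
    """A hypothetical sharp-knee compression curve F (equation 6):
    5:1 attenuation above -20 dB, 5:1 boost below -30 dB, and a region
    in between that calls for no modification."""
    if power_db > -20.0:
        # above threshold: output rises only 1 dB per 5 dB of input
        return -(power_db + 20.0) * (1.0 - 1.0 / 5.0)
    if power_db < -30.0:
        return (-30.0 - power_db) * (1.0 - 1.0 / 5.0)
    return 0.0

def smooth_gains(gains_db, a_attack=0.1, a_release=0.9):
    """One-pole smoothing per equations 7a-7c: a fast coefficient while the
    gain is falling (attack) and a slow one while it is rising (release).
    Coefficient values are illustrative."""
    G_smooth = 0.0
    out = []
    for g in gains_db:
        a = a_attack if g < G_smooth else a_release  # equation 7b
        G_smooth = a * G_smooth + (1.0 - a) * g      # equation 7a
        out.append(G_smooth)
    return out
```

Feeding a sequence of block gains from `compression_gain_db` through `smooth_gains` reproduces the alternative noted above: a sharp-kneed curve whose gains are nonetheless smooth as the power travels through the knee points.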
Finally, the smoothed gain Ḡ[t], which is in dB, is applied to each block of the signal, and the modified blocks are overlap-added to produce the modified audio:

y[n + tM/2] = 10^(Ḡ[t]/20) x[n,t] + 10^(Ḡ[t−1]/20) x[n + M/2, t−1],  for 0 ≤ n < M/2        (8)
Note that because the blocks have been multiplied with a tapered window, as shown in equation 4, the overlap-add synthesis shown above effectively smooths the gains across samples of the processed signal y[n]. Thus, the gain control signal receives smoothing in addition to that shown in equation 7a. In a more traditional implementation of DRC operating sample-by-sample rather than block-by-block, gain smoothing more sophisticated than the simple one-pole filter shown in equation 7a might be necessary in order to prevent audible distortion in the processed signal. Also, the use of block-based processing introduces an inherent delay of M/2 samples into the system, and as long as the decay time associated with α_attack is close to this delay, the signal x[n] does not need to be delayed further before the application of the gains for the purposes of preventing overshoot.
Figures 9a through 9c depict the result of applying the described DRC processing to an audio signal. For this particular implementation, a block length of M = 512 is used at a sampling rate of 44.1 kHz. A compression curve similar to the one shown in Figure 8b is used: above -20 dB relative to full-scale digital the signal is attenuated with a ratio of 5:1, and below -30 dB the signal is boosted with a ratio of 5:1. The gain is smoothed with an attack coefficient α_attack corresponding to a half-decay time of 10 ms and a release coefficient α_release corresponding to a half-decay time of 500 ms. The original audio signal depicted in Figure 9a consists of six consecutive piano chords, with the final chord, located around sample 1.75 × 10^5, decaying into silence. Examining a plot of the gain Ḡ[t] in Figure 9b, it should be noted that the gain remains close to 0 dB while the six chords are played. This is because the signal energy remains, for the most part, between -30 dB and -20 dB, the region within which the DRC curve calls for no modification. However, after the hit of the last chord, the signal energy falls below -30 dB, and the gain begins to rise, eventually beyond 15 dB, as the chord decays. Figure 9c depicts the resulting modified audio signal, and one can see that the tail of the final chord is boosted significantly. Audibly, this boosting of the chord's natural, low-level decay creates an extremely unnatural result. It is the aim of the present invention to prevent problems of this type that are associated with a traditional dynamics processor.
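The attack and release coefficients above are specified via half-decay times. Under the assumption that the smoother updates once per block hop (M/2 = 256 samples at 44.1 kHz in this implementation), the conversion can be sketched as:

```python
def half_decay_coeff(t_half_seconds, fs=44100, hop=256):
    """Convert a half-decay time to a one-pole smoothing coefficient for a
    smoother that updates every `hop` samples: after t_half seconds
    (t_half * fs / hop updates) the smoother state must fall by half, so
    alpha ** (t_half * fs / hop) = 0.5. The function name and parameter
    names are illustrative."""
    updates = t_half_seconds * fs / hop
    return 0.5 ** (1.0 / updates)
```

The 500 ms release half-decay time thus yields a coefficient very close to one, while the 10 ms attack half-decay time yields a much smaller coefficient, consistent with equation 7c.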
Figures 10a through 10c depict the results of applying the exact same DRC system to a different audio signal. In this case the first half of the signal consists of an up-tempo music piece at a high level, and then at approximately sample 10 × 10^4 the signal switches to a second up-tempo music piece, but at a significantly lower level. Examining the gain in Figure 10b, one sees that the signal is attenuated by approximately 10 dB during the first half, and then the gain rises back up to 0 dB during the second half when the softer piece is playing. In this case, the gain behaves as desired. One would like the second piece to be boosted relative to the first, and the gain should increase quickly after the transition to the second piece to be audibly unobtrusive. One sees a gain behavior that is similar to that for the first signal discussed, but here the behavior is desirable. Therefore, one would like to fix the first case without affecting the second. The use of auditory events to control the release time of this DRC system provides such a solution.
In the first signal that was examined in Figure 9, the boosting of the last chord's decay seems unnatural because the chord and its decay are perceived as a single auditory event whose integrity is expected to be maintained. In the second case, however, many auditory events occur while the gain increases, meaning that for any individual event, little change is imparted. Therefore the overall gain change is not as objectionable. One may therefore argue that a gain change should be allowed only in the near temporal vicinity of an auditory event boundary. One could apply this principle to the gain while it is in either attack or release mode, but for most practical implementations of a DRC, the gain moves so quickly in attack mode in comparison to the human temporal resolution of event perception that no control is necessary. One may therefore use events to control smoothing of the DRC gain only when it is in release mode.
A suitable behavior of the release control is now described. In qualitative terms, if an event is detected, the gain is smoothed with the release time constant as specified above in Equation 7a. As time evolves past the detected event, and if no subsequent events are detected, the release time constant continually increases so that eventually the smoothed gain is "frozen" in place. If another event is detected, then the smoothing time constant is reset to the original value and the process repeats. In order to modulate the release time, one may first generate a control signal based on the detected event boundaries. As discussed earlier, event boundaries may be detected by looking for changes in successive spectra of the audio signal. In this particular implementation, the DFT of each overlapping block x[n,t] may be computed to generate the STDFT of the audio signal x[n] :
X[k,t] = Σ_{n=0}^{M−1} x[n,t] e^{−j2πkn/M}        (9)
Next, the difference between the normalized log magnitude spectra of successive blocks may be computed according to:
D[t] = Σ_k | |X_NORM[k,t]| − |X_NORM[k,t−1]| |        (10a)
where

X_NORM[k,t] = X[k,t] / max_k{|X[k,t]|}        (10b)

Here the maximum of |X[k,t]| across bins k is used for normalization, although one might employ other normalization factors; for example, the average of |X[k,t]|
across bins. If the difference D[t] exceeds a threshold D_min, then an event is considered to have occurred. Additionally, one may assign a strength to this event, lying between zero and one, based on the size of D[t] in comparison to a maximum threshold D_max. The resulting auditory event strength signal A[t] may be computed as:
A[t] = 0,                                 if D[t] ≤ D_min
A[t] = (D[t] − D_min) / (D_max − D_min),  if D_min < D[t] < D_max        (11)
A[t] = 1,                                 if D[t] ≥ D_max
By assigning a strength to the auditory event proportional to the amount of spectral change associated with that event, greater control over the dynamics processing is achieved in comparison to a binary event decision. The inventors have found that larger gain changes are acceptable during stronger events, and the signal in equation 11 allows such variable control.
The signal A[t] is an impulsive signal with an impulse occurring at the location of an event boundary. For the purposes of controlling the release time, one may further smooth the signal A[t] so that it decays smoothly to zero after the detection of an event boundary. The smoothed event control signal Ā[t] may be computed from A[t] according to:

Ā[t] = A[t],            if A[t] > α_event Ā[t−1]
Ā[t] = α_event Ā[t−1],  otherwise        (12)

Here α_event controls the decay time of the event control signal. Figures 9d and 10d depict the event control signal Ā[t] for the two corresponding audio signals, with the half-decay time of the smoother set to 250 ms. In the first case, one sees that an event boundary is detected for each of the six piano chords, and that the event control signal decays smoothly toward zero after each event. For the second signal, many events are detected very close to each other in time, and therefore the event control signal never decays fully to zero.
One may now use the event control signal Ā[t] to vary the release time constant used for smoothing the gain. When the control signal is equal to one, the smoothing coefficient α[t] from Equation 7a equals α_release, as before, and when the control signal is equal to zero, the coefficient equals one so that the smoothed gain is prevented from changing. The smoothing coefficient is interpolated between these two extremes using the control signal according to:
α[t] = Ā[t] α_release + (1 − Ā[t])        (13)
By interpolating the smoothing coefficient continuously as a function of the event control signal, the release time is reset to a value proportionate to the event strength at the onset of an event and then increases smoothly to infinity after the occurrence of an event. The rate of this increase is dictated by the coefficient α_event used to generate the smoothed event control signal.
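Equations 11 through 13 can be sketched together as follows; the thresholds and decay coefficient in the usage below are illustrative values, not the tuned constants.

```python
def event_strength(D, D_min, D_max):
    """Equation 11: event strength between zero and one, proportional to the
    amount of spectral change between D_min and D_max."""
    if D <= D_min:
        return 0.0
    if D >= D_max:
        return 1.0
    return (D - D_min) / (D_max - D_min)

def smooth_event_control(strengths, a_event):
    """Equation 12: hold each impulse, then decay it with coefficient
    a_event so the control signal falls smoothly to zero between events."""
    A_bar = 0.0
    out = []
    for A in strengths:
        A_bar = A if A > a_event * A_bar else a_event * A_bar
        out.append(A_bar)
    return out

def release_coeff(A_bar, a_release):
    """Equation 13: interpolate between a_release (control signal one, just
    after an event) and 1.0 (control signal zero, gain frozen)."""
    return A_bar * a_release + (1.0 - A_bar)
```

An isolated event thus yields a control signal of 1, 0.5, 0.25, ... (for a_event = 0.5), and the release coefficient drifts from a_release toward one as the event recedes.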
Figures 9e and 10e show the effect of smoothing the gain with the event-controlled coefficient from Equation 13 as opposed to the non-event-controlled coefficient from Equation 7b. In the first case, the event control signal falls to zero after the last piano chord, thereby preventing the gain from moving upwards. As a result, the corresponding modified audio in Figure 9f does not suffer from an unnatural boost of the chord's decay. In the second case, the event control signal never approaches zero, and therefore the smoothed gain signal is inhibited very little through the application of the event control. The trajectory of the smoothed gain is nearly identical to the non-event-controlled gain in Figure 10b. This is exactly the desired effect.
Loudness Based AGC and DRC
As an alternative to traditional dynamics processing techniques where signal modifications are a direct function of simple signal measurements such as Peak or RMS power, International Patent Application S.N. PCT/US2005/038579 discloses use of the psychoacoustic based loudness model described earlier as a framework within which to perform dynamics processing. Several advantages are cited. First, measurements and modifications are specified in units of sone, which is a more accurate measure of loudness perception than more basic measures such as Peak or RMS power. Secondly, the audio may be modified such that the perceived spectral balance of the original audio is maintained as the overall loudness is changed. This way, changes to the overall loudness become less perceptually apparent in comparison to a dynamics processor that utilizes a wideband gain, for example, to modify the audio. Lastly, the psychoacoustic model is inherently multi-band, and therefore the system is easily configured to perform multi-band dynamics processing in order to alleviate the well-known cross-spectral pumping problems associated with a wideband dynamics processor.
Although performing dynamics processing in this loudness domain already holds several advantages over more traditional dynamics processing, the technique may be further improved through the use of auditory events to control various parameters. Consider the audio segment containing piano chords as depicted in Figure 9a and the associated DRC shown in Figures 9b and 9c. One could perform a similar DRC in the loudness domain, and in this case, when the loudness of the final piano chord's decay is boosted, the boost would be less apparent because the spectral balance of the decaying note would be maintained as the boost is applied. However, a better solution is to not boost the decay at all, and therefore one may advantageously apply the same principle of controlling attack and release times with auditory events in the loudness domain as was previously described for the traditional DRC.
The loudness domain dynamics processing system that is now described consists of AGC followed by DRC. The goal of this combination is to make all processed audio have approximately the same perceived loudness while still maintaining at least some of the original audio's dynamics. Figure 11 depicts a suitable set of AGC and DRC curves for this application. Note that the input and output of both curves is represented in units of sone since processing is performed in the loudness domain. The AGC curve strives to bring the output audio closer to some target level, and, as mentioned earlier, does so with relatively slow time constants. One may think of the AGC as making the long-term loudness of the audio equal to the target, but on a short-term basis, the loudness may fluctuate significantly around this target. Therefore, one may employ faster acting DRC to limit these fluctuations to some range deemed acceptable for the particular application. Figure 11 shows such a DRC curve where the AGC target falls within the "null band" of the DRC, the portion of the curve that calls for no modification. With this combination of curves, the AGC places the long-term loudness of the audio within the null-band of the DRC curve so that minimal fast-acting DRC modifications need be applied. If the short-term loudness still fluctuates outside of the null-band, the DRC then acts to move the loudness of the audio towards this null-band. As a final general note, one may apply the slow acting AGC such that all bands of the loudness model receive the same amount of loudness modification, thereby maintaining the perceived spectral balance, and one may apply the fast acting DRC in a manner that allows the loudness modification to vary across bands in order to alleviate cross-spectral pumping that might otherwise result from fast acting band-independent loudness modification.
Auditory events may be utilized to control the attack and release of both the AGC and DRC. In the case of AGC, both the attack and release times are large in comparison to the temporal resolution of event perception, and therefore event control may be advantageously employed in both cases. With the DRC, the attack is relatively short, and therefore event control may be needed only for the release as with the traditional DRC described above.
As discussed earlier, one may use the specific loudness spectrum associated with the employed loudness model for the purposes of event detection. A difference signal D[t], similar to the one in Equations 10a and 10b, may be computed from the specific loudness N[b,t], defined in Equation 2, as follows:
D[t] = Σ_b | N_NORM[b,t] − N_NORM[b,t−1] |        (14a)

where

N_NORM[b,t] = N[b,t] / max_b{N[b,t]}        (14b)
Here the maximum of |N[b,t]| across frequency bands b is used for normalization, although one might employ other normalization factors; for example, the average of |N[b,t]| across frequency bands. If the difference D[t] exceeds a threshold D_min, then an event is considered to have occurred. The difference signal may then be processed in the same way shown in Equations 11 and 12 to generate a smooth event control signal Ā[t] used to control the attack and release times. The AGC curve depicted in Figure 11 may be represented as a function that takes as its input a measure of loudness and generates a desired output loudness:

L_o = F_AGC{L_i}        (15a)
The DRC curve may be similarly represented:
L_o = F_DRC{L_i}        (15b)
For the AGC, the input loudness is a measure of the audio's long-term loudness. One may compute such a measure by smoothing the instantaneous loudness L[t], defined in Equation 3, using relatively long time constants (on the order of several seconds). It has been shown that in judging an audio segment's long-term loudness, humans weight the louder portions more heavily than the softer, and one may use a faster attack than release in the smoothing to simulate this effect. With the incorporation of event control for both the attack and release, the long-term loudness used for determining the AGC modification may therefore be computed according to:
L_AGC[t] = α_AGC[t] L_AGC[t−1] + (1 − α_AGC[t]) L[t]        (16a)

where

α_AGC[t] = α_AGCattack,                      if L[t] > L_AGC[t−1]
α_AGC[t] = Ā[t] α_AGCrelease + (1 − Ā[t]),   if L[t] ≤ L_AGC[t−1]        (16b)
In addition, one may compute an associated long-term specific loudness spectrum that will later be used for the multi-band DRC:
N_AGC[b,t] = α_AGC[t] N_AGC[b,t−1] + (1 − α_AGC[t]) N[b,t]        (16c)
In practice one may choose the smoothing coefficients such that the attack time is approximately half that of the release. Given the long-term loudness measure, one may then compute the loudness modification scaling associated with the AGC as the ratio of the output loudness to input loudness:
S_AGC[t] = F_AGC{L_AGC[t]} / L_AGC[t]        (17)
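Equations 16a, 16b and 17 can be sketched together as follows. The AGC curve, the coefficient values, and the function name are illustrative; the sketch covers one band-independent long-term loudness path.

```python
def agc_scalings(loudness_seq, event_ctrl_seq, F_agc,
                 a_attack=0.5, a_release=0.99):
    """Smooth the instantaneous loudness L[t] with a fast attack and an
    event-gated release (equations 16a/16b): when the event control signal
    is zero the release coefficient becomes one and the long-term loudness
    freezes. The AGC scaling (equation 17) is then the ratio of the curve
    output to the smoothed input loudness."""
    L_agc = 0.0
    out = []
    for L, A_bar in zip(loudness_seq, event_ctrl_seq):
        if L > L_agc:
            a = a_attack                            # equation 16b, attack
        else:
            a = A_bar * a_release + (1.0 - A_bar)   # event-gated release
        L_agc = a * L_agc + (1.0 - a) * L           # equation 16a
        out.append(F_agc(L_agc) / L_agc)            # equation 17
    return out
```

With a constant-target AGC curve, the scaling tracks the smoothed loudness during attack but stays frozen when the loudness falls and no events are present.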
The DRC modification may now be computed from the loudness after the application of the AGC scaling. Rather than smooth a measure of the loudness prior to the application of the DRC curve, one may alternatively apply the DRC curve to the instantaneous loudness and then subsequently smooth the resulting modification. This is similar to the technique described earlier for smoothing the gain of the traditional DRC. In addition, the DRC may be applied in a multi-band fashion, meaning that the DRC modification is a function of the specific loudness N[b,t] in each band b, rather than the overall loudness L[t] . However, in order to maintain the average spectral balance of the original audio, one may apply DRC to each band such that the resulting modifications have the same average effect as would result from applying DRC to the overall loudness. This may be achieved by scaling each band by the ratio of the long-term overall loudness (after the application of the AGC scaling) to the long-term specific loudness, and using this value as the argument to the DRC function. The result is then rescaled by the inverse of said ratio to produce the output specific loudness. Thus, the DRC scaling in each band may be computed according to:
S_DRC[b,t] = ( (N_AGC[b,t] / L_AGC[t]) · F_DRC{ (L_AGC[t] / N_AGC[b,t]) · S_AGC[t]·N[b,t] } ) / ( S_AGC[t]·N[b,t] )    (18)
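A sketch of the spectral-balance-preserving multi-band DRC scaling described above. The computation follows the prose directly: scale each band's loudness by the ratio of long-term overall loudness to long-term specific loudness before applying the DRC curve, then rescale the result by the inverse ratio. Here `drc` stands in for an arbitrary DRC curve F_DRC, and all argument names are illustrative.

```python
import numpy as np

def multiband_drc_scaling(N, N_agc, L_agc, s_agc, drc):
    """Sketch of Eq. 18. N and N_agc are per-band (specific) loudness arrays,
    L_agc the long-term overall loudness (scalar), s_agc the AGC scaling,
    and drc any function mapping input loudness to desired output loudness."""
    ratio = L_agc / np.maximum(N_agc, 1e-12)       # overall / specific loudness
    arg = ratio * s_agc * N                        # argument to F_DRC
    out = drc(arg) / ratio                         # rescale by inverse ratio
    return out / np.maximum(s_agc * N, 1e-12)      # S_DRC[b, t]
```

The effect of the ratio/inverse-ratio pair is visible with a compressive curve: bands whose specific loudness sits in its long-term proportion to the overall loudness all receive the same scaling, so the average spectral balance is preserved.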
The AGC and DRC modifications may then be combined to form a total loudness scaling per band:
S_TOT[b,t] = S_AGC[t]·S_DRC[b,t]    (19)
This total scaling may then be smoothed across time independently for each band with a fast attack and slow release, with event control applied to the release only. Ideally, smoothing is performed on the logarithm of the scaling, analogous to the gains of the traditional DRC being smoothed in their decibel representation, though this is not essential. To ensure that the smoothed total scaling moves in sync with the specific loudness in each band, attack and release modes may be determined through the simultaneous smoothing of specific loudness itself:
S̄_TOT[b,t] = exp( α_TOT[b,t]·log(S̄_TOT[b,t−1]) + (1 − α_TOT[b,t])·log(S_TOT[b,t]) )    (20a)

N̄[b,t] = α_TOT[b,t]·N̄[b,t−1] + (1 − α_TOT[b,t])·N[b,t]    (20b)

where

α_TOT[b,t] = { α_TOTattack,                     N[b,t] > N̄[b,t−1]
             { Λ[t]·α_TOTrelease + (1 − Λ[t]),  N[b,t] ≤ N̄[b,t−1]    (20c)

Finally, one may compute a target specific loudness based on the smoothed scaling applied to the original specific loudness:

N̂[b,t] = S̄_TOT[b,t]·N[b,t]    (21)

and then solve for gains G[b,t] that, when applied to the original excitation, result in a specific loudness equal to the target:

N̂[b,t] = Ψ{G²[b,t]·E[b,t]}    (22)
The gains may be applied to each band of the filterbank used to compute the excitation, and the modified audio may then be generated by inverting the filterbank to produce a modified time domain audio signal.
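Solving Equation 22 for the gains requires inverting the specific-loudness function Ψ. As an illustration only, the sketch below assumes a simple power-law model Ψ{E} = E^α (with an assumed exponent) so that the inverse is closed-form; the specification's Ψ is more elaborate, and in general the equation would be solved numerically per band.

```python
import numpy as np

def solve_gains(E, S_tot, alpha=0.23):
    """Sketch of Eqs. 21-22: given the smoothed total scaling S_tot[b,t] and
    excitation E[b,t], find gains G[b,t] such that the specific loudness of
    the scaled excitation equals the target. Assumes Psi{E} = E**alpha, an
    illustrative power-law loudness model."""
    N = E ** alpha                  # specific loudness of original excitation
    N_target = S_tot * N            # Eq. 21
    # Solve Psi{G^2 E} = N_target  =>  G^2 E = N_target**(1/alpha)
    G = np.sqrt(N_target ** (1.0 / alpha) / np.maximum(E, 1e-12))
    return G
```

When the total scaling is unity the recovered gains are unity, and applying G²E back through the assumed Ψ reproduces the target specific loudness, which is the defining property of Equation 22.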
Additional Parameter Control
While the discussion above has focused on the control of AGC and DRC attack and release parameters via auditory scene analysis of the audio being processed, other important parameters may also benefit from being controlled via the ASA results. For example, the event control signal Λ[t] from Equation 12 may be used to vary the value of the DRC ratio parameter that is used to dynamically adjust the gain of the audio. Like the attack and release time parameters, the ratio parameter may contribute significantly to the perceptual artifacts introduced by dynamic gain adjustments.
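As a hedged illustration of such ratio control, the event control signal could interpolate between two compression ratios, applying the more aggressive ratio only while an auditory event is in progress. Both ratio values and the linear interpolation are assumptions for illustration, not part of the specification.

```python
def event_controlled_ratio(A, ratio_nominal=4.0, ratio_gentle=2.0):
    """Sketch: vary the DRC compression ratio with the event control signal
    A[t], moving toward the nominal ratio at events and relaxing toward a
    gentler ratio between events. Values are illustrative assumptions."""
    return [a * ratio_nominal + (1.0 - a) * ratio_gentle for a in A]
```

Because Λ[t] decays smoothly after an event boundary, the ratio also relaxes smoothly rather than switching abruptly, which is the same artifact-avoidance rationale used for the attack and release times.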
Implementation
The invention may be implemented in hardware or software, or a combination of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms included as part of the invention are not inherently related to any particular computer or other apparatus. In particular, various general-purpose machines may be used with programs written in accordance with the teachings herein, or it may be more convenient to construct more specialized apparatus (e.g., integrated circuits) to perform the required method steps. Thus, the invention may be implemented in one or more computer programs executing on one or more programmable computer systems each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device or port, and at least one output device or port. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.
Each such program may be implemented in any desired computer language (including machine, assembly, or high level procedural, logical, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.
Each such computer program is preferably stored on or downloaded to a storage media or device (e.g., solid state memory or media, or magnetic or optical media) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage media or device is read by the computer system to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer system to operate in a specific and predefined manner to perform the functions described herein.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. For example, some of the steps described herein may be order independent, and thus may be performed in an order different from that described.
It should be understood that implementation of other variations and modifications of the invention and its various aspects will be apparent to those skilled in the art, and that the invention is not limited by these specific embodiments described. It is therefore contemplated to cover by the present invention any and all modifications, variations, or equivalents that fall within the true spirit and scope of the basic underlying principles disclosed and claimed herein.
Incorporation by Reference
The following patents, patent applications and publications are hereby incorporated by reference, each in its entirety.

Audio Dynamics Processing

Audio Engineer's Reference Book, edited by Michael Talbot-Smith, 2nd edition, Section 2.9 ("Limiters and Compressors" by Alan Tutton), pp. 2.149–2.165, Focal Press, Reed Educational and Professional Publishing, Ltd., 1999.

Detecting and Using Auditory Events
U.S. Patent Application S.N. 10/474,387, "High Quality Time-Scaling and Pitch-Scaling of Audio Signals" of Brett Graham Crockett, published June 24, 2004 as US 2004/0122662 A1.

U.S. Patent Application S.N. 10/478,398, "Method for Time Aligning Audio Signals Using Characterizations Based on Auditory Events" of Brett G. Crockett et al., published July 29, 2004 as US 2004/0148159 A1.

U.S. Patent Application S.N. 10/478,538, "Segmenting Audio Signals Into Auditory Events" of Brett G. Crockett, published August 26, 2004 as US 2004/0165730 A1. Aspects of the present invention provide a way to detect auditory events in addition to those disclosed in said application of Crockett.

U.S. Patent Application S.N. 10/478,397, "Comparing Audio Using Characterizations Based on Auditory Events" of Brett G. Crockett et al., published September 2, 2004 as US 2004/0172240 A1.
International Application under the Patent Cooperation Treaty S.N. PCT/US 05/24630, filed July 13, 2005, entitled "Method for Combining Audio Signals Using Auditory Scene Analysis," of Michael John Smithers, published March 9, 2006 as WO 2006/026161.
International Application under the Patent Cooperation Treaty S.N. PCT/US 2004/016964, filed May 27, 2004, entitled "Method, Apparatus and Computer Program for Calculating and Adjusting the Perceived Loudness of an Audio Signal" of Alan Jeffrey Seefeldt et al., published December 23, 2004 as WO 2004/111994 A2.

International application under the Patent Cooperation Treaty S.N. PCT/US2005/038579, filed October 25, 2005, entitled "Calculating and Adjusting the Perceived Loudness and/or the Perceived Spectral Balance of an Audio Signal" by Alan Jeffrey Seefeldt, published as International Publication Number WO 2006/047600.

"A Method for Characterizing and Identifying Audio Based on Auditory Scene Analysis," by Brett Crockett and Michael Smithers, Audio Engineering Society Convention Paper 6416, 118th Convention, Barcelona, May 28–31, 2005.

"High Quality Multichannel Time Scaling and Pitch-Shifting using Auditory Scene Analysis," by Brett Crockett, Audio Engineering Society Convention Paper 5948, New York, October 2003.
"A New Objective Measure of Perceived Loudness" by Alan Seefeldt et al, Audio Engineering Society Convention Paper 6236, San Francisco, October 28, 2004.
Handbook for Sound Engineers, The New Audio Cyclopedia, edited by Glen M. Ballou, 2nd edition. Dynamics, 850-851. Focal Press an imprint of Butterworth- Heinemann, 1998.

Claims

1. An audio processing method in which a processor receives an input channel and generates an output channel by applying dynamic gain modifications to the input channel, comprising detecting changes in signal characteristics with respect to time in the audio input channel, identifying as auditory event boundaries changes in signal characteristics with respect to time in said input channel, wherein an audio segment between consecutive boundaries constitutes an auditory event in the channel, and generating all or some of one or more parameters of the audio dynamic gain modification method at least partly in response to auditory events and/or the degree of change in signal characteristics associated with said auditory event boundaries.
2. A method according to claim 1 wherein an auditory event is a segment of audio that tends to be perceived as separate and distinct.
3. A method according to claim 1 or claim 2 wherein said signal characteristics include the spectral content of the audio.
4. A method according to claim 1 or claim 2 wherein said signal characteristics include the perceptual loudness of the audio.
5. A method according to any one of claims 1-4 wherein all or some of said one or more parameters are generated at least partly in response to the presence or absence of one or more auditory events.
6. A method according to any one of claims 1-4 wherein said identifying identifies as an auditory event boundary a change in signal characteristics with respect to time that exceeds a threshold.
7. A method according to any one of claims 1-4 wherein said auditory event boundary may be modified by a function to create a control signal that is used to modify the audio dynamic gain modification parameters.
8. A method according to any one of claims 1-4 wherein all or some of said one or more parameters are generated at least partly in response to a continuing measure of the degree of change in signal characteristics associated with said auditory event boundaries.
9. Apparatus adapted to perform the methods of any one of claims 1 through 8.
10. A computer program, stored on a computer-readable medium, for causing a computer to control the apparatus of claim 9.
11. A computer program, stored on a computer-readable medium, for causing a computer to perform the methods of any one of claims 1 through 8.
12. A method for dividing an audio signal into auditory events, each of which tends to be perceived as separate and distinct, comprising calculating the difference in spectral content between successive time blocks of said audio signal, wherein the difference is calculated by comparing the difference in specific loudness between successive time blocks, wherein specific loudness is a measure of perceptual loudness as a function of frequency and time, and identifying an auditory event boundary as the boundary between successive time blocks when the difference in the spectral content between such successive time blocks exceeds a threshold.
13. A method according to claim 12 wherein said audio signal is represented by a discrete time sequence x[n] that has been sampled from an audio source at a sampling frequency fs and the difference is calculated by comparing the difference in specific loudness N[b,t] across frequency bands b between successive time blocks t.
14. A method according to claim 13 wherein the difference in spectral content between successive time blocks of the audio signal is calculated according to

D[t] = Σ_b | N_NORM[b,t] − N_NORM[b,t−1] |

where

N_NORM[b,t] = N[b,t] / max_b{N[b,t]}
15. A method according to claim 13 wherein the difference in spectral content between successive time blocks of the audio signal is calculated according to

D[t] = Σ_b | N_NORM[b,t] − N_NORM[b,t−1] |

where

N_NORM[b,t] = N[b,t] / mean_b{N[b,t]}
16. Apparatus adapted to perform the methods of any one of claims 12 through 15.
17. A computer program, stored on a computer-readable medium, for causing a computer to control the apparatus of claim 16.
18. A computer program, stored on a computer-readable medium, for causing a computer to perform the methods of any one of claims 12 through 15.
PCT/US2007/008313 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection WO2007127023A1 (en)

Priority Applications (53)

Application Number Priority Date Filing Date Title
MX2008013753A MX2008013753A (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection.
AT07754779T ATE493794T1 (en) 2006-04-27 2007-03-30 SOUND GAIN CONTROL WITH CAPTURE OF AUDIENCE EVENTS BASED ON SPECIFIC VOLUME
NO20191310A NO345590B1 (en) 2006-04-27 2007-03-30 Audio amplification control using specific volume-based hearing event detection
KR1020117001302A KR101200615B1 (en) 2006-04-27 2007-03-30 Auto Gain Control Using Specific-Loudness-Based Auditory Event Detection
PL07754779T PL2011234T3 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
AU2007243586A AU2007243586B2 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
DK07754779.2T DK2011234T3 (en) 2006-04-27 2007-03-30 Audio amplification control using specific-volume-based auditory event detection
CN2007800147428A CN101432965B (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
JP2009507694A JP5129806B2 (en) 2006-04-27 2007-03-30 Speech gain control using auditory event detection based on specific loudness
EP07754779A EP2011234B1 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
BRPI0711063-4A BRPI0711063B1 (en) 2006-04-27 2007-03-30 METHOD AND APPARATUS FOR MODIFYING AN AUDIO DYNAMICS PROCESSING PARAMETER
CA2648237A CA2648237C (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
US12/226,698 US8144881B2 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
DE602007011594T DE602007011594D1 (en) 2006-04-27 2007-03-30 SOUND AMPLIFICATION WITH RECORDING OF PUBLIC EVENTS ON THE BASIS OF SPECIFIC VOLUME
IL194430A IL194430A (en) 2006-04-27 2008-09-28 Audio gain control using specific-loudness-based auditory event detection
NO20084336A NO339346B1 (en) 2006-04-27 2008-10-16 Audio gain control using specific volume-based hearing event detection
HK09106026.6A HK1126902A1 (en) 2006-04-27 2009-07-03 Audio gain control using specific-loudness-based auditory event detection
AU2011201348A AU2011201348B2 (en) 2006-04-27 2011-03-24 Audio Gain Control using Specific-Loudness-Based Auditory Event Detection
US13/406,929 US9136810B2 (en) 2006-04-27 2012-02-28 Audio gain control using specific-loudness-based auditory event detection
US13/464,102 US8428270B2 (en) 2006-04-27 2012-05-04 Audio gain control using specific-loudness-based auditory event detection
US13/850,380 US9450551B2 (en) 2006-04-27 2013-03-26 Audio control using auditory event detection
NO20161296A NO342157B1 (en) 2006-04-27 2016-08-12 Audio gain control using specific volume-based hearing event detection
NO20161295A NO342160B1 (en) 2006-04-27 2016-08-12 Audio gain control using specific volume-based hearing event detection
US15/238,820 US9685924B2 (en) 2006-04-27 2016-08-17 Audio control using auditory event detection
NO20161439A NO342164B1 (en) 2006-04-27 2016-09-12 Audio gain control using specific volume-based hearing event detection
US15/447,518 US9780751B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,482 US9742372B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,556 US9787269B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,456 US9698744B1 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,529 US9774309B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,469 US9768749B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,493 US9762196B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,564 US9866191B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,543 US9787268B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/447,503 US9768750B2 (en) 2006-04-27 2017-03-02 Audio control using auditory event detection
US15/809,413 US10103700B2 (en) 2006-04-27 2017-11-10 Audio control using auditory event detection
NO20180271A NO344655B1 (en) 2006-04-27 2018-02-21 Audio amplification control using specific volume-based hearing event detection
NO20180266A NO343877B1 (en) 2006-04-27 2018-02-21 Audio gain control using specific volume-based hearing event detection
NO20180272A NO344658B1 (en) 2006-04-27 2018-02-21 Audio amplification control using specific volume-based hearing event detection
US16/128,642 US10284159B2 (en) 2006-04-27 2018-09-12 Audio control using auditory event detection
NO20190022A NO344013B1 (en) 2006-04-27 2019-01-07 Audio gain control using specific volume-based hearing event detection
NO20190018A NO344363B1 (en) 2006-04-27 2019-01-07 Audio amplification control using specific volume-based hearing event detection
NO20190002A NO344364B1 (en) 2006-04-27 2019-01-07 Audio amplification control using specific volume-based hearing event detection
NO20190025A NO344361B1 (en) 2006-04-27 2019-01-08 Audio amplification control using specific volume-based hearing event detection
NO20190024A NO344362B1 (en) 2006-04-27 2019-01-08 Audio amplification control using specific volume-based hearing event detection
US16/365,947 US10523169B2 (en) 2006-04-27 2019-03-27 Audio control using auditory event detection
US16/729,468 US10833644B2 (en) 2006-04-27 2019-12-29 Audio control using auditory event detection
US17/093,178 US11362631B2 (en) 2006-04-27 2020-11-09 Audio control using auditory event detection
US17/839,099 US11711060B2 (en) 2006-04-27 2022-06-13 Audio control using auditory event detection
US18/327,585 US11962279B2 (en) 2006-04-27 2023-06-01 Audio control using auditory event detection
US18/672,224 US20240313729A1 (en) 2006-04-27 2024-05-23 Audio control using auditory event detection
US18/672,726 US20240313730A1 (en) 2006-04-27 2024-05-23 Audio control using auditory event detection
US18/672,762 US20240313731A1 (en) 2006-04-27 2024-05-23 Audio control using auditory event detection

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79580806P 2006-04-27 2006-04-27
US60/795,808 2006-04-27

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US12/226,698 A-371-Of-International US8144881B2 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection
US13/406,929 Continuation US9136810B2 (en) 2006-04-27 2012-02-28 Audio gain control using specific-loudness-based auditory event detection

Publications (1)

Publication Number Publication Date
WO2007127023A1 true WO2007127023A1 (en) 2007-11-08

Family

ID=38445597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/008313 WO2007127023A1 (en) 2006-04-27 2007-03-30 Audio gain control using specific-loudness-based auditory event detection

Country Status (22)

Country Link
US (26) US8144881B2 (en)
EP (1) EP2011234B1 (en)
JP (2) JP5129806B2 (en)
KR (2) KR101200615B1 (en)
CN (2) CN102684628B (en)
AT (1) ATE493794T1 (en)
AU (2) AU2007243586B2 (en)
BR (1) BRPI0711063B1 (en)
CA (1) CA2648237C (en)
DE (1) DE602007011594D1 (en)
DK (1) DK2011234T3 (en)
ES (1) ES2359799T3 (en)
HK (2) HK1126902A1 (en)
IL (1) IL194430A (en)
MX (1) MX2008013753A (en)
MY (1) MY141426A (en)
NO (13) NO345590B1 (en)
PL (1) PL2011234T3 (en)
RU (1) RU2417514C2 (en)
TW (1) TWI455481B (en)
UA (1) UA93243C2 (en)
WO (1) WO2007127023A1 (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7508947B2 (en) 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
WO2010044439A1 (en) * 2008-10-17 2010-04-22 シャープ株式会社 Audio signal adjustment device and audio signal adjustment method
US8019095B2 (en) 2006-04-04 2011-09-13 Dolby Laboratories Licensing Corporation Loudness modification of multichannel audio signals
CN102195585A (en) * 2010-03-12 2011-09-21 哈曼贝克自动系统股份有限公司 Automatic correction of loudness level in audio signals
US8090120B2 (en) 2004-10-26 2012-01-03 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US8144881B2 (en) 2006-04-27 2012-03-27 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
JP2012509038A (en) * 2008-11-14 2012-04-12 ザット コーポレーション Dynamic volume control and multi-space processing prevention
US8170882B2 (en) 2004-03-01 2012-05-01 Dolby Laboratories Licensing Corporation Multichannel audio coding
US8195472B2 (en) 2001-04-13 2012-06-05 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
US8199933B2 (en) 2004-10-26 2012-06-12 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US8280743B2 (en) 2005-06-03 2012-10-02 Dolby Laboratories Licensing Corporation Channel reconfiguration with side information
US8396574B2 (en) 2007-07-13 2013-03-12 Dolby Laboratories Licensing Corporation Audio processing using auditory scene analysis and spectral skewness
US8437482B2 (en) 2003-05-28 2013-05-07 Dolby Laboratories Licensing Corporation Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
US8488800B2 (en) 2001-04-13 2013-07-16 Dolby Laboratories Licensing Corporation Segmenting audio signals into auditory events
US8504181B2 (en) 2006-04-04 2013-08-06 Dolby Laboratories Licensing Corporation Audio signal loudness measurement and modification in the MDCT domain
US8521314B2 (en) 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
WO2013154868A1 (en) * 2012-04-12 2013-10-17 Dolby Laboratories Licensing Corporation System and method for leveling loudness variation in an audio signal
WO2014046941A1 (en) * 2012-09-19 2014-03-27 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
US8744247B2 (en) 2008-09-19 2014-06-03 Dolby Laboratories Licensing Corporation Upstream quality enhancement signal processing for resource constrained client devices
US8849433B2 (en) 2006-10-20 2014-09-30 Dolby Laboratories Licensing Corporation Audio dynamics processing using a reset
WO2014160542A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160678A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation 1apparatuses and methods for audio classifying and processing
WO2014160548A1 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
US8892426B2 (en) 2008-12-24 2014-11-18 Dolby Laboratories Licensing Corporation Audio signal loudness determination and modification in the frequency domain
US9185507B2 (en) 2007-06-08 2015-11-10 Dolby Laboratories Licensing Corporation Hybrid derivation of surround sound audio channels by controllably combining ambience and matrix-decoded signal components
US9300714B2 (en) 2008-09-19 2016-03-29 Dolby Laboratories Licensing Corporation Upstream signal processing for client devices in a small-cell wireless network
WO2017023601A1 (en) * 2015-07-31 2017-02-09 Apple Inc. Encoded audio extended metadata-based dynamic range control
WO2017142916A1 (en) * 2016-02-19 2017-08-24 Dolby Laboratories Licensing Corporation Diffusivity based sound processing method and apparatus
US10306392B2 (en) 2015-11-03 2019-05-28 Dolby Laboratories Licensing Corporation Content-adaptive surround sound virtualization
US10923132B2 (en) 2016-02-19 2021-02-16 Dolby Laboratories Licensing Corporation Diffusivity based sound processing method and apparatus
US20220165289A1 (en) * 2020-11-23 2022-05-26 Cyber Resonance Corporation Methods and systems for processing recorded audio content to enhance speech
RU2826268C2 (en) * 2013-03-26 2024-09-09 Долби Лабораторис Лайсэнзин Корпорейшн Loudness equalizer controller and control method

Families Citing this family (77)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009086174A1 (en) 2007-12-21 2009-07-09 Srs Labs, Inc. System for adjusting perceived loudness of audio signals
KR101230479B1 (en) * 2008-03-10 2013-02-06 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. Device and method for manipulating an audio signal having a transient event
EP2373067B1 (en) * 2008-04-18 2013-04-17 Dolby Laboratories Licensing Corporation Method and apparatus for maintaining speech audibility in multi-channel audio with minimal impact on surround experience
US9253560B2 (en) * 2008-09-16 2016-02-02 Personics Holdings, Llc Sound library and method
EP2401872A4 (en) * 2009-02-25 2012-05-23 Conexant Systems Inc Speaker distortion reduction system and method
US8422699B2 (en) * 2009-04-17 2013-04-16 Linear Acoustic, Inc. Loudness consistency at program boundaries
WO2010126709A1 (en) 2009-04-30 2010-11-04 Dolby Laboratories Licensing Corporation Low complexity auditory event boundary detection
US8761415B2 (en) 2009-04-30 2014-06-24 Dolby Laboratories Corporation Controlling the loudness of an audio signal in response to spectral localization
TWI503816B (en) 2009-05-06 2015-10-11 Dolby Lab Licensing Corp Adjusting the loudness of an audio signal with perceived spectral balance preservation
US9055374B2 (en) * 2009-06-24 2015-06-09 Arizona Board Of Regents For And On Behalf Of Arizona State University Method and system for determining an auditory pattern of an audio segment
US8249275B1 (en) * 2009-06-26 2012-08-21 Cirrus Logic, Inc. Modulated gain audio control and zipper noise suppression techniques using modulated gain
US8554348B2 (en) * 2009-07-20 2013-10-08 Apple Inc. Transient detection using a digital audio workstation
DE112009005215T8 (en) * 2009-08-04 2013-01-03 Nokia Corp. Method and apparatus for audio signal classification
US8538042B2 (en) 2009-08-11 2013-09-17 Dts Llc System for increasing perceived loudness of speakers
EP2487791A1 (en) * 2009-10-07 2012-08-15 Nec Corporation Multiband compressor and adjustment method of same
CN105847829B (en) 2010-11-23 2019-08-09 Lg电子株式会社 Video coding apparatus and video decoder
US8855322B2 (en) * 2011-01-12 2014-10-07 Qualcomm Incorporated Loudness maximization with constrained loudspeaker excursion
JP5707219B2 (en) * 2011-05-13 2015-04-22 富士通テン株式会社 Acoustic control device
WO2012161717A1 (en) * 2011-05-26 2012-11-29 Advanced Bionics Ag Systems and methods for improving representation by an auditory prosthesis system of audio signals having intermediate sound levels
DE102011085036A1 (en) * 2011-10-21 2013-04-25 Siemens Medical Instruments Pte. Ltd. Method for determining a compression characteristic
TWI575962B (en) * 2012-02-24 2017-03-21 杜比國際公司 Low delay real-to-complex conversion in overlapping filter banks for partially complex processing
US9312829B2 (en) 2012-04-12 2016-04-12 Dts Llc System for adjusting loudness of audio signals in real time
JP5527827B2 (en) * 2012-04-17 2014-06-25 Necエンジニアリング株式会社 Loudness adjusting device, loudness adjusting method, and program
US9685921B2 (en) 2012-07-12 2017-06-20 Dts, Inc. Loudness control with noise detection and loudness drop detection
US9713675B2 (en) 2012-07-17 2017-07-25 Elwha Llc Unmanned device interaction methods and systems
US9044543B2 (en) 2012-07-17 2015-06-02 Elwha Llc Unmanned device utilization methods and systems
US9991861B2 (en) * 2012-08-10 2018-06-05 Bellevue Investments Gmbh & Co. Kgaa System and method for controlled dynamics adaptation for musical content
KR102071860B1 (en) * 2013-01-21 2020-01-31 돌비 레버러토리즈 라이쎈싱 코오포레이션 Optimizing loudness and dynamic range across different playback devices
EP2974253B1 (en) 2013-03-15 2019-05-08 Dolby Laboratories Licensing Corporation Normalization of soundfield orientations based on auditory scene analysis
JP6216553B2 (en) * 2013-06-27 2017-10-18 クラリオン株式会社 Propagation delay correction apparatus and propagation delay correction method
US10095468B2 (en) * 2013-09-12 2018-10-09 Dolby Laboratories Licensing Corporation Dynamic range control for a wide variety of playback environments
CN105531759B (en) * 2013-09-12 2019-11-26 杜比实验室特许公司 Loudness for lower mixed audio content adjusts
US9608588B2 (en) * 2014-01-22 2017-03-28 Apple Inc. Dynamic range control with large look-ahead
US10063207B2 (en) * 2014-02-27 2018-08-28 Dts, Inc. Object-based audio loudness management
EP3111627B1 (en) 2014-02-28 2018-07-04 Dolby Laboratories Licensing Corporation Perceptual continuity using change blindness in conferencing
CN110808723B (en) 2014-05-26 2024-09-17 杜比实验室特许公司 Audio signal loudness control
WO2016007947A1 (en) * 2014-07-11 2016-01-14 Arizona Board Of Regents On Behalf Of Arizona State University Fast computation of excitation pattern, auditory pattern and loudness
WO2016011288A1 (en) 2014-07-16 2016-01-21 Eariq, Inc. System and method for calibration and reproduction of audio signals based on auditory feedback
US10020001B2 (en) 2014-10-01 2018-07-10 Dolby International Ab Efficient DRC profile transmission
CN112185401B (en) 2014-10-10 2024-07-02 杜比实验室特许公司 Program loudness based on transmission-independent representations
JP6228100B2 (en) * 2014-11-17 2017-11-08 Necプラットフォームズ株式会社 Loudness adjustment device, loudness adjustment method, and loudness adjustment program
US20160171987A1 (en) * 2014-12-16 2016-06-16 Psyx Research, Inc. System and method for compressed audio enhancement
US10623854B2 (en) * 2015-03-25 2020-04-14 Dolby Laboratories Licensing Corporation Sub-band mixing of multiple microphones
US9653094B2 (en) * 2015-04-24 2017-05-16 Cyber Resonance Corporation Methods and systems for performing signal analysis to identify content types
US10109288B2 (en) 2015-05-27 2018-10-23 Apple Inc. Dynamic range and peak control in audio using nonlinear filters
GB2581032B (en) 2015-06-22 2020-11-04 Time Machine Capital Ltd System and method for onset detection in a digital signal
EP3341121A1 (en) 2015-08-28 2018-07-04 The Procter and Gamble Company Catalysts for the dehydration of hydroxypropionic acid and its derivatives
US9590580B1 (en) 2015-09-13 2017-03-07 Guoguang Electric Company Limited Loudness-based audio-signal compensation
US10341770B2 (en) * 2015-09-30 2019-07-02 Apple Inc. Encoded audio metadata-based loudness equalization and dynamic equalization during DRC
CN105404654A (en) * 2015-10-30 2016-03-16 魅族科技(中国)有限公司 Audio file playing method and device
US9756281B2 (en) 2016-02-05 2017-09-05 Gopro, Inc. Apparatus and method for audio based video synchronization
CN105845151B (en) * 2016-05-30 2019-05-31 百度在线网络技术(北京)有限公司 Audio gain adjustment method and device applied to speech recognition front-ends
US20170365255A1 (en) * 2016-06-15 2017-12-21 Adam Kupryjanow Far field automatic speech recognition pre-processing
US9697849B1 (en) 2016-07-25 2017-07-04 Gopro, Inc. Systems and methods for audio based synchronization using energy vectors
US9640159B1 (en) * 2016-08-25 2017-05-02 Gopro, Inc. Systems and methods for audio based synchronization using sound harmonics
US9653095B1 (en) 2016-08-30 2017-05-16 Gopro, Inc. Systems and methods for determining a repeatogram in a music composition using audio features
GB201615538D0 (en) * 2016-09-13 2016-10-26 Nokia Technologies Oy A method, apparatus and computer program for processing audio signals
US9916822B1 (en) 2016-10-07 2018-03-13 Gopro, Inc. Systems and methods for audio remixing using repeated segments
TWI590239B (en) * 2016-12-09 2017-07-01 宏碁股份有限公司 Voice signal processing apparatus and voice signal processing method
CN108281148B (en) * 2016-12-30 2020-12-22 宏碁股份有限公司 Speech signal processing apparatus and speech signal processing method
US10374564B2 (en) 2017-04-20 2019-08-06 Dts, Inc. Loudness control with noise detection and loudness drop detection
US10491179B2 (en) 2017-09-25 2019-11-26 Nuvoton Technology Corporation Asymmetric multi-channel audio dynamic range processing
WO2019068915A1 (en) * 2017-10-06 2019-04-11 Sony Europe Limited Audio file envelope based on rms power in sequences of sub-windows
US11011180B2 (en) 2018-06-29 2021-05-18 Guoguang Electric Company Limited Audio signal dynamic range compression
US11894006B2 (en) 2018-07-25 2024-02-06 Dolby Laboratories Licensing Corporation Compressor target curve to avoid boosting noise
KR102584779B1 (en) * 2018-09-07 2023-10-05 그레이스노트, 인코포레이티드 Method and apparatus for dynamic volume control through audio classification
US11775250B2 (en) 2018-09-07 2023-10-03 Gracenote, Inc. Methods and apparatus for dynamic volume adjustment via audio classification
JP7031543B2 (en) * 2018-09-21 2022-03-08 株式会社Jvcケンウッド Processing equipment, processing method, reproduction method, and program
CN112640301B (en) * 2018-09-28 2022-03-29 杜比实验室特许公司 Method and apparatus for dynamically adjusting threshold of compressor
EP4408022A3 (en) 2018-10-24 2024-10-16 Gracenote, Inc. Methods and apparatus to adjust audio playback settings based on analysis of audio characteristics
US11347470B2 (en) * 2018-11-16 2022-05-31 Roku, Inc. Detection of media playback loudness level and corresponding adjustment to audio during media replacement event
CN109889170B (en) * 2019-02-25 2021-06-04 珠海格力电器股份有限公司 Audio signal control method and device
JP7275711B2 (en) * 2019-03-20 2023-05-18 ヤマハ株式会社 Audio signal processing method
US11133787B2 (en) 2019-06-25 2021-09-28 The Nielsen Company (Us), Llc Methods and apparatus to determine automated gain control parameters for an automated gain control protocol
US11019301B2 (en) 2019-06-25 2021-05-25 The Nielsen Company (Us), Llc Methods and apparatus to perform an automated gain control protocol with an amplifier based on historical data corresponding to contextual data
WO2021183916A1 (en) * 2020-03-13 2021-09-16 Immersion Networks, Inc. Loudness equalization system
EP3961624B1 (en) * 2020-08-28 2024-09-25 Sivantos Pte. Ltd. Method for operating a hearing aid depending on a speech signal

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040165730A1 (en) * 2001-04-13 2004-08-26 Crockett Brett G Segmenting audio signals into auditory events

Family Cites Families (167)

US2808475A (en) 1954-10-05 1957-10-01 Bell Telephone Labor Inc Loudness indicator
DE1736966U (en) 1956-09-28 1956-12-27 Heinz Schulze PROPELLER FOR TOY AND MODEL AIRPLANES.
SU720691A1 (en) 1978-04-27 1980-03-05 Предприятие П/Я Р-6609 Automatic gain control device
US4281218A (en) 1979-10-26 1981-07-28 Bell Telephone Laboratories, Incorporated Speech-nonspeech detector-classifier
US4624009A (en) 1980-05-02 1986-11-18 Figgie International, Inc. Signal pattern encoder and classifier
DE3314570A1 (en) 1983-04-22 1984-10-25 Philips Patentverwaltung Gmbh, 2000 Hamburg METHOD AND ARRANGEMENT FOR ADJUSTING THE GAIN
US4739514A (en) 1986-12-22 1988-04-19 Bose Corporation Automatic dynamic equalizing
US4887299A (en) 1987-11-12 1989-12-12 Nicolet Instrument Corporation Adaptive, programmable signal processing hearing aid
US4882762A (en) * 1988-02-23 1989-11-21 Resound Corporation Multi-band programmable compression system
KR940003351B1 (en) 1988-03-31 1994-04-20 주식회사 금성사 Circuit for auto gain control
US4953112A (en) 1988-05-10 1990-08-28 Minnesota Mining And Manufacturing Company Method and apparatus for determining acoustic parameters of an auditory prosthesis using software model
US5027410A (en) 1988-11-10 1991-06-25 Wisconsin Alumni Research Foundation Adaptive, programmable signal processing and filtering for hearing aids
JPH02118322U (en) 1989-03-08 1990-09-21
US5097510A (en) 1989-11-07 1992-03-17 Gs Systems, Inc. Artificial intelligence pattern-recognition-based noise reduction system for speech processing
US5369711A (en) 1990-08-31 1994-11-29 Bellsouth Corporation Automatic gain control for a headset
SG49883A1 (en) 1991-01-08 1998-06-15 Dolby Lab Licensing Corp Encoder/decoder for multidimensional sound fields
US5632005A (en) 1991-01-08 1997-05-20 Ray Milton Dolby Encoder/decoder for multidimensional sound fields
DE69214882T2 (en) 1991-06-06 1997-03-20 Matsushita Electric Ind Co Ltd Device for distinguishing between music and speech
US5278912A (en) 1991-06-28 1994-01-11 Resound Corporation Multiband programmable compression system
US5175769A (en) 1991-07-23 1992-12-29 Rolm Systems Method for time-scale modification of signals
KR940003351Y1 (en) 1991-10-17 1994-05-23 삼성전관 주식회사 Device for attachable polarizer
US5363147A (en) 1992-06-01 1994-11-08 North American Philips Corporation Automatic volume leveler
KR940003351A (en) 1992-07-15 1994-02-21 강진구 On-screen graphic display control device and method
GB2272615A (en) 1992-11-17 1994-05-18 Rudolf Bisping Controlling signal-to-noise ratio in noisy recordings
DE4335739A1 (en) 1992-11-17 1994-05-19 Rudolf Prof Dr Bisping Automatically controlling signal-to-noise ratio of noisy recordings
US5457769A (en) 1993-03-30 1995-10-10 Earmark, Inc. Method and apparatus for detecting the presence of human voice signals in audio signals
US5706352A (en) 1993-04-07 1998-01-06 K/S Himpp Adaptive gain and filtering circuit for a sound reproduction system
US5434922A (en) 1993-04-08 1995-07-18 Miller; Thomas E. Method and apparatus for dynamic sound optimization
BE1007355A3 (en) 1993-07-26 1995-05-23 Philips Electronics Nv Circuit for discriminating voice signals, and an audio device comprising such a circuit.
IN184794B (en) 1993-09-14 2000-09-30 British Telecomm
JP2986345B2 (en) 1993-10-18 1999-12-06 インターナショナル・ビジネス・マシーンズ・コーポレイション Voice recording indexing apparatus and method
TW247390B (en) 1994-04-29 1995-05-11 Audio Products Int Corp Apparatus and method for adjusting levels between channels of a sound system
US5463695A (en) * 1994-06-20 1995-10-31 Aphex Systems, Ltd. Peak accelerated compressor
US5500902A (en) 1994-07-08 1996-03-19 Stockham, Jr.; Thomas G. Hearing aid device incorporating signal processing techniques
GB9419388D0 (en) 1994-09-26 1994-11-09 Canon Kk Speech analysis
US5548538A (en) 1994-12-07 1996-08-20 Wiltron Company Internal automatic calibrator for vector network analyzers
US5682463A (en) 1995-02-06 1997-10-28 Lucent Technologies Inc. Perceptual audio compression based on loudness uncertainty
CA2167748A1 (en) 1995-02-09 1996-08-10 Yoav Freund Apparatus and methods for machine learning hypotheses
EP0661905B1 (en) 1995-03-13 2002-12-11 Phonak Ag Method for the fitting of hearing aids, device therefor and hearing aid
US5727119A (en) 1995-03-27 1998-03-10 Dolby Laboratories Licensing Corporation Method and apparatus for efficient implementation of single-sideband filter banks providing accurate measures of spectral magnitude and phase
WO1996032710A1 (en) 1995-04-10 1996-10-17 Corporate Computer Systems, Inc. System for compression and decompression of audio signals for digital transmission
US6301555B2 (en) 1995-04-10 2001-10-09 Corporate Computer Systems Adjustable psycho-acoustic parameters
US5601617A (en) 1995-04-26 1997-02-11 Advanced Bionics Corporation Multichannel cochlear prosthesis with flexible control of stimulus waveforms
JPH08328599A (en) 1995-06-01 1996-12-13 Mitsubishi Electric Corp Mpeg audio decoder
US5663727A (en) 1995-06-23 1997-09-02 Hearing Innovations Incorporated Frequency response analyzer and shaping apparatus and digital hearing enhancement apparatus and method utilizing the same
US5712954A (en) 1995-08-23 1998-01-27 Rockwell International Corp. System and method for monitoring audio power level of agent speech in a telephonic switch
US6002776A (en) 1995-09-18 1999-12-14 Interval Research Corporation Directional acoustic signal processor and method therefor
US5907622A (en) 1995-09-21 1999-05-25 Dougherty; A. Michael Automatic noise compensation system for audio reproduction equipment
WO1997027578A1 (en) * 1996-01-26 1997-07-31 Motorola Inc. Very low bit rate time domain speech analyzer for voice messaging
US6327366B1 (en) 1996-05-01 2001-12-04 Phonak Ag Method for the adjustment of a hearing device, apparatus to do it and a hearing device
US6108431A (en) 1996-05-01 2000-08-22 Phonak Ag Loudness limiter
US6430533B1 (en) 1996-05-03 2002-08-06 Lsi Logic Corporation Audio decoder core MPEG-1/MPEG-2/AC-3 functional algorithm partitioning and implementation
JPH09312540A (en) 1996-05-23 1997-12-02 Pioneer Electron Corp Loudness volume controller
JP3765622B2 (en) 1996-07-09 2006-04-12 ユナイテッド・モジュール・コーポレーション Audio encoding / decoding system
EP0820212B1 (en) 1996-07-19 2010-04-21 Bernafon AG Acoustic signal processing based on loudness control
JPH1074097A (en) 1996-07-26 1998-03-17 Ind Technol Res Inst Parameter changing method and device for audio signal
JP2953397B2 (en) 1996-09-13 1999-09-27 日本電気株式会社 Hearing compensation processing method for digital hearing aid and digital hearing aid
US6049766A (en) 1996-11-07 2000-04-11 Creative Technology Ltd. Time-domain time/pitch scaling of speech or audio signals with transient handling
JP2991982B2 (en) 1996-11-29 1999-12-20 日本イーライリリー株式会社 Injection practice equipment
US6570991B1 (en) 1996-12-18 2003-05-27 Interval Research Corporation Multi-feature speech/music discrimination system
US5862228A (en) 1997-02-21 1999-01-19 Dolby Laboratories Licensing Corporation Audio matrix encoding
US6125343A (en) 1997-05-29 2000-09-26 3Com Corporation System and method for selecting a loudest speaker by comparing average frame gains
US6272360B1 (en) 1997-07-03 2001-08-07 Pan Communications, Inc. Remotely installed transmitter and a hands-free two-way voice terminal device using same
US6185309B1 (en) 1997-07-11 2001-02-06 The Regents Of The University Of California Method and apparatus for blind separation of mixed and convolved sources
KR100261904B1 (en) 1997-08-29 2000-07-15 윤종용 Headphone sound output apparatus
US6088461A (en) 1997-09-26 2000-07-11 Crystal Semiconductor Corporation Dynamic volume control system
US6330672B1 (en) 1997-12-03 2001-12-11 At&T Corp. Method and apparatus for watermarking digital bitstreams
US6233554B1 (en) 1997-12-12 2001-05-15 Qualcomm Incorporated Audio CODEC with AGC controlled by a VOCODER
US6298139B1 (en) 1997-12-31 2001-10-02 Transcrypt International, Inc. Apparatus and method for maintaining a constant speech envelope using variable coefficient automatic gain control
US6182033B1 (en) 1998-01-09 2001-01-30 At&T Corp. Modular approach to speech enhancement with an application to speech coding
US6353671B1 (en) 1998-02-05 2002-03-05 Bioinstco Corp. Signal processing circuit and method for increasing speech intelligibility
US6311155B1 (en) 2000-02-04 2001-10-30 Hearing Enhancement Company Llc Use of voice-to-remaining audio (VRA) in consumer applications
US6498855B1 (en) 1998-04-17 2002-12-24 International Business Machines Corporation Method and system for selectively and variably attenuating audio data
AU758242B2 (en) 1998-06-08 2003-03-20 Cochlear Limited Hearing instrument
EP0980064A1 (en) 1998-06-26 2000-02-16 Ascom AG Method for carrying out an automatic judgement of the transmission quality of audio signals
GB2340351B (en) 1998-07-29 2004-06-09 British Broadcasting Corp Data transmission
US6351731B1 (en) 1998-08-21 2002-02-26 Polycom, Inc. Adaptive filter featuring spectral gain smoothing and variable noise multiplier for noise reduction, and method therefor
US6823303B1 (en) 1998-08-24 2004-11-23 Conexant Systems, Inc. Speech encoder using voice activity detection in coding noise
US6411927B1 (en) 1998-09-04 2002-06-25 Matsushita Electric Corporation Of America Robust preprocessing signal equalization system and method for normalizing to a target environment
FI113935B (en) 1998-09-25 2004-06-30 Nokia Corp Method for Calibrating the Sound Level in a Multichannel Audio System and a Multichannel Audio System
US6266644B1 (en) 1998-09-26 2001-07-24 Liquid Audio, Inc. Audio encoding apparatus and methods
DE19848491A1 (en) 1998-10-21 2000-04-27 Bosch Gmbh Robert Radio receiver with audio data system has control unit to allocate sound characteristic according to transferred program type identification adjusted in receiving section
US6314396B1 (en) 1998-11-06 2001-11-06 International Business Machines Corporation Automatic gain control in a speech recognition system
GB9824776D0 (en) 1998-11-11 1999-01-06 Kemp Michael J Audio dynamic control effects synthesiser
WO2000047014A1 (en) 1999-02-05 2000-08-10 The University Of Melbourne Adaptive dynamic range optimisation sound processor
EP1089242B1 (en) 1999-04-09 2006-11-08 Texas Instruments Incorporated Supply of digital audio and video products
AU4278300A (en) 1999-04-26 2000-11-10 Dspfactory Ltd. Loudness normalization control for a digital hearing aid
US6263371B1 (en) 1999-06-10 2001-07-17 Cacheflow, Inc. Method and apparatus for seaming of streaming content
US6985594B1 (en) 1999-06-15 2006-01-10 Hearing Enhancement Co., Llc. Voice-to-remaining audio (VRA) interactive hearing aid and auxiliary equipment
US6442278B1 (en) 1999-06-15 2002-08-27 Hearing Enhancement Company, Llc Voice-to-remaining audio (VRA) interactive center channel downmix
US6778966B2 (en) 1999-11-29 2004-08-17 Syfx Segmented mapping converter system and method
FR2802329B1 (en) 1999-12-08 2003-03-28 France Telecom PROCESS FOR PROCESSING AT LEAST ONE AUDIO CODE BINARY FLOW ORGANIZED IN THE FORM OF FRAMES
US6351733B1 (en) 2000-03-02 2002-02-26 Hearing Enhancement Company, Llc Method and apparatus for accommodating primary content audio and secondary content remaining audio capability in the digital audio production process
DE10018666A1 (en) 2000-04-14 2001-10-18 Harman Audio Electronic Sys Dynamic sound optimization in the interior of a motor vehicle or similar noisy environment, in which a monitoring signal is split into desired-signal and noise-signal components which are used for signal adjustment
US6651040B1 (en) 2000-05-31 2003-11-18 International Business Machines Corporation Method for dynamic adjustment of audio input gain in a speech system
US6889186B1 (en) 2000-06-01 2005-05-03 Avaya Technology Corp. Method and apparatus for improving the intelligibility of digitally compressed speech
KR100898879B1 (en) 2000-08-16 2009-05-25 돌비 레버러토리즈 라이쎈싱 코오포레이션 Modulating One or More Parameters of an Audio or Video Perceptual Coding System in Response to Supplemental Information
AUPQ952700A0 (en) 2000-08-21 2000-09-14 University Of Melbourne, The Sound-processing strategy for cochlear implants
JP3448586B2 (en) 2000-08-29 2003-09-22 独立行政法人産業技術総合研究所 Sound measurement method and system considering hearing impairment
US20040013272A1 (en) * 2001-09-07 2004-01-22 Reams Robert W System and method for processing audio data
US6625433B1 (en) 2000-09-29 2003-09-23 Agere Systems Inc. Constant compression automatic gain control circuit
US6947888B1 (en) * 2000-10-17 2005-09-20 Qualcomm Incorporated Method and apparatus for high performance low bit-rate coding of unvoiced speech
US6807525B1 (en) 2000-10-31 2004-10-19 Telogy Networks, Inc. SID frame detection with human auditory perception compensation
DE60029453T2 (en) 2000-11-09 2007-04-12 Koninklijke Kpn N.V. Measuring the transmission quality of a telephone connection in a telecommunications network
US7457422B2 (en) 2000-11-29 2008-11-25 Ford Global Technologies, Llc Method and implementation for detecting and characterizing audible transients in noise
US6958644B2 (en) 2001-01-10 2005-10-25 The Trustees Of Columbia University In The City Of New York Active filter circuit with dynamically modifiable gain
FR2820573B1 (en) 2001-02-02 2003-03-28 France Telecom METHOD AND DEVICE FOR PROCESSING A PLURALITY OF AUDIO BIT STREAMS
WO2004019656A2 (en) 2001-02-07 2004-03-04 Dolby Laboratories Licensing Corporation Audio channel spatial translation
EP1233509A1 (en) * 2001-02-14 2002-08-21 Thomson Licensing S.A. Digital audio processor
DE10107385A1 (en) 2001-02-16 2002-09-05 Harman Audio Electronic Sys Device for adjusting the volume depending on noise
US6915264B2 (en) 2001-02-22 2005-07-05 Lucent Technologies Inc. Cochlear filter bank structure for determining masked thresholds for use in perceptual audio coding
DK1290914T3 (en) 2001-04-10 2004-09-27 Phonak Ag Method of fitting a hearing aid to an individual
EP1377967B1 (en) 2001-04-13 2013-04-10 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
US7610205B2 (en) * 2002-02-12 2009-10-27 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
US7461002B2 (en) 2001-04-13 2008-12-02 Dolby Laboratories Licensing Corporation Method for time aligning audio signals using characterizations based on auditory events
US7283954B2 (en) 2001-04-13 2007-10-16 Dolby Laboratories Licensing Corporation Comparing audio using characterizations based on auditory events
DE60209161T2 (en) 2001-04-18 2006-10-05 Gennum Corp., Burlington Multi-channel hearing aid with transmission options between the channels
US20020173864A1 (en) * 2001-05-17 2002-11-21 Crystal Voice Communications, Inc Automatic volume control for voice over internet
MXPA03010750A (en) * 2001-05-25 2004-07-01 Dolby Lab Licensing Corp High quality time-scaling and pitch-scaling of audio signals.
JP4272050B2 (en) * 2001-05-25 2009-06-03 ドルビー・ラボラトリーズ・ライセンシング・コーポレーション Audio comparison using characterization based on auditory events
US7177803B2 (en) 2001-10-22 2007-02-13 Motorola, Inc. Method and apparatus for enhancing loudness of an audio signal
US20040037421A1 (en) 2001-12-17 2004-02-26 Truman Michael Mead Partial encryption of assembled bitstreams
US7068723B2 (en) 2002-02-28 2006-06-27 Fuji Xerox Co., Ltd. Method for automatically producing optimal summaries of linear media
DE60326782D1 (en) 2002-04-22 2009-04-30 Koninkl Philips Electronics Nv Decoding device with decorrelation unit
US7155385B2 (en) 2002-05-16 2006-12-26 Comerica Bank, As Administrative Agent Automatic gain control for adjusting gain during non-speech portions
US20030223597A1 (en) 2002-05-29 2003-12-04 Sunil Puria Adaptive noise compensation for dynamic signal enhancement
US7447631B2 (en) 2002-06-17 2008-11-04 Dolby Laboratories Licensing Corporation Audio coding system using spectral hole filling
JP4257079B2 (en) 2002-07-19 2009-04-22 パイオニア株式会社 Frequency characteristic adjusting device and frequency characteristic adjusting method
DE10236694A1 (en) 2002-08-09 2004-02-26 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Equipment for scalable coding and decoding of spectral values of signal containing audio and/or video information by splitting signal binary spectral values into two partial scaling layers
US7454331B2 (en) 2002-08-30 2008-11-18 Dolby Laboratories Licensing Corporation Controlling loudness of speech in signals that contain speech and other types of audio material
MXPA05008317A (en) 2003-02-06 2005-11-04 Dolby Lab Licensing Corp Continuous backup audio.
DE10308483A1 (en) 2003-02-26 2004-09-09 Siemens Audiologische Technik Gmbh Method for automatic gain adjustment in a hearing aid and hearing aid
US7551745B2 (en) 2003-04-24 2009-06-23 Dolby Laboratories Licensing Corporation Volume and compression control in movie theaters
SG185134A1 (en) 2003-05-28 2012-11-29 Dolby Lab Licensing Corp Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
JP2004356894A (en) * 2003-05-28 2004-12-16 Mitsubishi Electric Corp Sound quality adjuster
JP4226395B2 (en) 2003-06-16 2009-02-18 アルパイン株式会社 Audio correction device
US8918316B2 (en) 2003-07-29 2014-12-23 Alcatel Lucent Content identification system
US7729497B2 (en) 2004-01-13 2010-06-01 Koninklijke Philips Electronics N.V. Audio signal enhancement
ATE527654T1 (en) 2004-03-01 2011-10-15 Dolby Lab Licensing Corp MULTI-CHANNEL AUDIO CODING
GB2413906A (en) 2004-04-21 2005-11-09 Imagination Tech Ltd Radio volume control system
JP4168976B2 (en) 2004-05-28 2008-10-22 ソニー株式会社 Audio signal encoding apparatus and method
US7574010B2 (en) 2004-05-28 2009-08-11 Research In Motion Limited System and method for adjusting an audio signal
EP1601171B1 (en) 2004-05-28 2008-04-30 Research In Motion Limited System And Method For Adjusting An Audio Signal
US20080095385A1 (en) * 2004-06-30 2008-04-24 Koninklijke Philips Electronics, N.V. Method of and System for Automatically Adjusting the Loudness of an Audio Signal
US7617109B2 (en) 2004-07-01 2009-11-10 Dolby Laboratories Licensing Corporation Method for correcting metadata affecting the playback loudness and dynamic range of audio information
US7508947B2 (en) 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
CA2581810C (en) 2004-10-26 2013-12-17 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
WO2006058361A1 (en) 2004-12-04 2006-06-08 Dynamic Hearing Pty Ltd Method and apparatus for adaptive sound processing parameters
US20060126865A1 (en) 2004-12-13 2006-06-15 Blamey Peter J Method and apparatus for adaptive sound processing parameters
US8265295B2 (en) 2005-03-11 2012-09-11 Rane Corporation Method and apparatus for identifying feedback in a circuit
TW200638335A (en) 2005-04-13 2006-11-01 Dolby Lab Licensing Corp Audio metadata verification
TWI397903B (en) 2005-04-13 2013-06-01 Dolby Lab Licensing Corp Economical loudness measurement of coded audio
TWI396188B (en) 2005-08-02 2013-05-11 Dolby Lab Licensing Corp Controlling spatial audio coding parameters as a function of auditory events
CN101421781A (en) 2006-04-04 2009-04-29 杜比实验室特许公司 Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
WO2007120452A1 (en) 2006-04-04 2007-10-25 Dolby Laboratories Licensing Corporation Audio signal loudness measurement and modification in the mdct domain
TWI517562B (en) 2006-04-04 2016-01-11 杜比實驗室特許公司 Method, apparatus, and computer program for scaling the overall perceived loudness of a multichannel audio signal by a desired amount
CA2648237C (en) * 2006-04-27 2013-02-05 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
US8750538B2 (en) 2006-05-05 2014-06-10 Creative Technology Ltd Method for enhancing audio signals
RU2413357C2 (en) 2006-10-20 2011-02-27 Долби Лэборетериз Лайсенсинг Корпорейшн Processing dynamic properties of audio using retuning
US8521314B2 (en) 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
US8194889B2 (en) 2007-01-03 2012-06-05 Dolby Laboratories Licensing Corporation Hybrid digital/analog loudness-compensating volume control
PL2547031T3 (en) 2007-03-15 2014-07-31 Interdigital Tech Corp Method and apparatus for reordering data in an evolved high speed packet access
KR101163411B1 (en) 2007-03-19 2012-07-12 돌비 레버러토리즈 라이쎈싱 코오포레이션 Speech enhancement employing a perceptual model
CN101681618B (en) 2007-06-19 2015-12-16 杜比实验室特许公司 Utilize the loudness measurement of spectral modifications
US8054948B1 (en) 2007-06-28 2011-11-08 Sprint Communications Company L.P. Audio experience for a communications device user
WO2009086174A1 (en) 2007-12-21 2009-07-09 Srs Labs, Inc. System for adjusting perceived loudness of audio signals
JP4823352B2 (en) 2009-12-24 2011-11-24 株式会社東芝 Information processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BLESSER, BARRY: "An Ultraminiature Console Compression System with Maximum User Flexibility", JOURNAL OF AUDIO ENGINEERING SOCIETY, vol. 20, no. 4, May 1972 (1972-05-01), New York, pages 297 - 302, XP002449773 *
HOEG W ET AL: "DYNAMIC RANGE CONTROL (DRC) AND MUSIC/SPEECH CONTROL (MSC) PROGRAMME-ASSOCIATED DATA SERVICES FOR DAB", EBU REVIEW- TECHNICAL, EUROPEAN BROADCASTING UNION. BRUSSELS, BE, no. 261, 21 September 1994 (1994-09-21), pages 56 - 70, XP000486553, ISSN: 0251-0936 *

Cited By (120)

US8195472B2 (en) 2001-04-13 2012-06-05 Dolby Laboratories Licensing Corporation High quality time-scaling and pitch-scaling of audio signals
US8488800B2 (en) 2001-04-13 2013-07-16 Dolby Laboratories Licensing Corporation Segmenting audio signals into auditory events
US8437482B2 (en) 2003-05-28 2013-05-07 Dolby Laboratories Licensing Corporation Method, apparatus and computer program for calculating and adjusting the perceived loudness of an audio signal
US9672839B1 (en) 2004-03-01 2017-06-06 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US9640188B2 (en) 2004-03-01 2017-05-02 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9691404B2 (en) 2004-03-01 2017-06-27 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9697842B1 (en) 2004-03-01 2017-07-04 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US9311922B2 (en) 2004-03-01 2016-04-12 Dolby Laboratories Licensing Corporation Method, apparatus, and storage medium for decoding encoded audio channels
US10460740B2 (en) 2004-03-01 2019-10-29 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US9715882B2 (en) 2004-03-01 2017-07-25 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US9454969B2 (en) 2004-03-01 2016-09-27 Dolby Laboratories Licensing Corporation Multichannel audio coding
US9779745B2 (en) 2004-03-01 2017-10-03 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US9704499B1 (en) 2004-03-01 2017-07-11 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US10796706B2 (en) 2004-03-01 2020-10-06 Dolby Laboratories Licensing Corporation Methods and apparatus for reconstructing audio signals with decorrelation and differentially coded parameters
US8983834B2 (en) 2004-03-01 2015-03-17 Dolby Laboratories Licensing Corporation Multichannel audio coding
US9691405B1 (en) 2004-03-01 2017-06-27 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques and differentially coded parameters
US11308969B2 (en) 2004-03-01 2022-04-19 Dolby Laboratories Licensing Corporation Methods and apparatus for reconstructing audio signals with decorrelation and differentially coded parameters
US10269364B2 (en) 2004-03-01 2019-04-23 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US10403297B2 (en) 2004-03-01 2019-09-03 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8170882B2 (en) 2004-03-01 2012-05-01 Dolby Laboratories Licensing Corporation Multichannel audio coding
US9520135B2 (en) 2004-03-01 2016-12-13 Dolby Laboratories Licensing Corporation Reconstructing audio signals with multiple decorrelation techniques
US7508947B2 (en) 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
US10454439B2 (en) 2004-10-26 2019-10-22 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10411668B2 (en) 2004-10-26 2019-09-10 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10389321B2 (en) 2004-10-26 2019-08-20 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10389320B2 (en) 2004-10-26 2019-08-20 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10374565B2 (en) 2004-10-26 2019-08-06 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10361671B2 (en) 2004-10-26 2019-07-23 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8488809B2 (en) 2004-10-26 2013-07-16 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US9979366B2 (en) 2004-10-26 2018-05-22 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US9966916B2 (en) 2004-10-26 2018-05-08 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US9960743B2 (en) 2004-10-26 2018-05-01 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US9954506B2 (en) 2004-10-26 2018-04-24 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US10396739B2 (en) 2004-10-26 2019-08-27 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US10396738B2 (en) 2004-10-26 2019-08-27 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US11296668B2 (en) 2004-10-26 2022-04-05 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8199933B2 (en) 2004-10-26 2012-06-12 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US10389319B2 (en) 2004-10-26 2019-08-20 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US9705461B1 (en) 2004-10-26 2017-07-11 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US10476459B2 (en) 2004-10-26 2019-11-12 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8090120B2 (en) 2004-10-26 2012-01-03 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US9350311B2 (en) 2004-10-26 2016-05-24 Dolby Laboratories Licensing Corporation Calculating and adjusting the perceived loudness and/or the perceived spectral balance of an audio signal
US10720898B2 (en) 2004-10-26 2020-07-21 Dolby Laboratories Licensing Corporation Methods and apparatus for adjusting a level of an audio signal
US8280743B2 (en) 2005-06-03 2012-10-02 Dolby Laboratories Licensing Corporation Channel reconfiguration with side information
US8600074B2 (en) 2006-04-04 2013-12-03 Dolby Laboratories Licensing Corporation Loudness modification of multichannel audio signals
US8504181B2 (en) 2006-04-04 2013-08-06 Dolby Laboratories Licensing Corporation Audio signal loudness measurement and modification in the MDCT domain
US8019095B2 (en) 2006-04-04 2011-09-13 Dolby Laboratories Licensing Corporation Loudness modification of multichannel audio signals
US9584083B2 (en) 2006-04-04 2017-02-28 Dolby Laboratories Licensing Corporation Loudness modification of multichannel audio signals
US9762196B2 (en) 2006-04-27 2017-09-12 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US10523169B2 (en) 2006-04-27 2019-12-31 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9685924B2 (en) 2006-04-27 2017-06-20 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US11962279B2 (en) 2006-04-27 2024-04-16 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9450551B2 (en) 2006-04-27 2016-09-20 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9698744B1 (en) 2006-04-27 2017-07-04 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US11711060B2 (en) 2006-04-27 2023-07-25 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US11362631B2 (en) 2006-04-27 2022-06-14 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US10833644B2 (en) 2006-04-27 2020-11-10 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US8144881B2 (en) 2006-04-27 2012-03-27 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
US8428270B2 (en) 2006-04-27 2013-04-23 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
US9742372B2 (en) 2006-04-27 2017-08-22 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US10284159B2 (en) 2006-04-27 2019-05-07 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US10103700B2 (en) 2006-04-27 2018-10-16 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9866191B2 (en) 2006-04-27 2018-01-09 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9768750B2 (en) 2006-04-27 2017-09-19 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9768749B2 (en) 2006-04-27 2017-09-19 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9774309B2 (en) 2006-04-27 2017-09-26 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9136810B2 (en) 2006-04-27 2015-09-15 Dolby Laboratories Licensing Corporation Audio gain control using specific-loudness-based auditory event detection
US9780751B2 (en) 2006-04-27 2017-10-03 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9787269B2 (en) 2006-04-27 2017-10-10 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US9787268B2 (en) 2006-04-27 2017-10-10 Dolby Laboratories Licensing Corporation Audio control using auditory event detection
US8849433B2 (en) 2006-10-20 2014-09-30 Dolby Laboratories Licensing Corporation Audio dynamics processing using a reset
US8521314B2 (en) 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
US9185507B2 (en) 2007-06-08 2015-11-10 Dolby Laboratories Licensing Corporation Hybrid derivation of surround sound audio channels by controllably combining ambience and matrix-decoded signal components
US8396574B2 (en) 2007-07-13 2013-03-12 Dolby Laboratories Licensing Corporation Audio processing using auditory scene analysis and spectral skewness
US9300714B2 (en) 2008-09-19 2016-03-29 Dolby Laboratories Licensing Corporation Upstream signal processing for client devices in a small-cell wireless network
US9251802B2 (en) 2008-09-19 2016-02-02 Dolby Laboratories Licensing Corporation Upstream quality enhancement signal processing for resource constrained client devices
US8744247B2 (en) 2008-09-19 2014-06-03 Dolby Laboratories Licensing Corporation Upstream quality enhancement signal processing for resource constrained client devices
JP5236006B2 (en) * 2008-10-17 2013-07-17 Sharp Kabushiki Kaisha Audio signal adjustment apparatus and audio signal adjustment method
WO2010044439A1 (en) * 2008-10-17 2010-04-22 Sharp Kabushiki Kaisha Audio signal adjustment device and audio signal adjustment method
US8787595B2 (en) 2008-10-17 2014-07-22 Sharp Kabushiki Kaisha Audio signal adjustment device and audio signal adjustment method having long and short term gain adjustment
JP2012509038A (en) * 2008-11-14 2012-04-12 ザット コーポレーション Dynamic volume control and multi-space processing prevention
US8892426B2 (en) 2008-12-24 2014-11-18 Dolby Laboratories Licensing Corporation Audio signal loudness determination and modification in the frequency domain
US9306524B2 (en) 2008-12-24 2016-04-05 Dolby Laboratories Licensing Corporation Audio signal loudness determination and modification in the frequency domain
EP2367286A1 (en) * 2010-03-12 2011-09-21 Harman Becker Automotive Systems GmbH Automatic correction of loudness level in audio signals
US8594345B2 (en) 2010-03-12 2013-11-26 Harman Becker Automotive Systems Gmbh Automatic correction of loudness level in audio signals
CN102195585A (en) * 2010-03-12 2011-09-21 哈曼贝克自动系统股份有限公司 Automatic correction of loudness level in audio signals
EP2367287A3 (en) * 2010-03-12 2011-10-26 Harman Becker Automotive Systems GmbH Automatic correction of loudness level in audio signals
US8498430B2 (en) 2010-03-12 2013-07-30 Harman Becker Automotive Systems Gmbh Automatic correction of loudness level in audio signals
US9960742B2 (en) 2012-04-12 2018-05-01 Dolby Laboratories Licensing Corporation System and method for leveling loudness variation in an audio signal
US9806688B2 (en) 2012-04-12 2017-10-31 Dolby Laboratories Licensing Corporation System and method for leveling loudness variation in an audio signal
WO2013154868A1 (en) * 2012-04-12 2013-10-17 Dolby Laboratories Licensing Corporation System and method for leveling loudness variation in an audio signal
US10090817B2 (en) 2012-04-12 2018-10-02 Dolby Laboratories Licensing Corporation System and method for leveling loudness variation in an audio signal
WO2014046941A1 (en) * 2012-09-19 2014-03-27 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
US9349384B2 (en) 2012-09-19 2016-05-24 Dolby Laboratories Licensing Corporation Method and system for object-dependent adjustment of levels of audio objects
US11218126B2 (en) 2013-03-26 2022-01-04 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US9548713B2 (en) 2013-03-26 2017-01-17 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US10411669B2 (en) 2013-03-26 2019-09-10 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
US9923536B2 (en) 2013-03-26 2018-03-20 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
RU2826268C2 (en) * 2013-03-26 2024-09-09 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP3217545A1 (en) 2013-03-26 2017-09-13 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
EP3190702A2 (en) 2013-03-26 2017-07-12 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
WO2014160548A1 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
EP3598448A1 (en) 2013-03-26 2020-01-22 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
EP4080763A1 (en) 2013-03-26 2022-10-26 Dolby Laboratories Licensing Corp. Volume leveler controller and controlling method
US10707824B2 (en) 2013-03-26 2020-07-07 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160542A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
EP3232567A1 (en) 2013-03-26 2017-10-18 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
US10803879B2 (en) 2013-03-26 2020-10-13 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US9842605B2 (en) 2013-03-26 2017-12-12 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US11711062B2 (en) 2013-03-26 2023-07-25 Dolby Laboratories Licensing Corporation Volume leveler controller and controlling method
WO2014160678A2 (en) 2013-03-26 2014-10-02 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
US10044337B2 (en) 2013-03-26 2018-08-07 Dolby Laboratories Licensing Corporation Equalizer controller and controlling method
WO2017023601A1 (en) * 2015-07-31 2017-02-09 Apple Inc. Encoded audio extended metadata-based dynamic range control
US9837086B2 (en) 2015-07-31 2017-12-05 Apple Inc. Encoded audio extended metadata-based dynamic range control
KR102122137B1 (en) 2015-07-31 2020-06-11 애플 인크. Encoded audio extension metadata-based dynamic range control
KR20180019715A (en) * 2015-07-31 2018-02-26 Apple Inc. Encoded audio extended metadata-based dynamic range control
US10306392B2 (en) 2015-11-03 2019-05-28 Dolby Laboratories Licensing Corporation Content-adaptive surround sound virtualization
US10923132B2 (en) 2016-02-19 2021-02-16 Dolby Laboratories Licensing Corporation Diffusivity based sound processing method and apparatus
WO2017142916A1 (en) * 2016-02-19 2017-08-24 Dolby Laboratories Licensing Corporation Diffusivity based sound processing method and apparatus
US20220165289A1 (en) * 2020-11-23 2022-05-26 Cyber Resonance Corporation Methods and systems for processing recorded audio content to enhance speech

Also Published As

Publication number Publication date
US9742372B2 (en) 2017-08-22
US20240186972A1 (en) 2024-06-06
US20170179904A1 (en) 2017-06-22
US20190013786A1 (en) 2019-01-10
IL194430A (en) 2013-05-30
US9768749B2 (en) 2017-09-19
EP2011234B1 (en) 2010-12-29
TWI455481B (en) 2014-10-01
NO20180272A1 (en) 2008-11-17
US10833644B2 (en) 2020-11-10
US20120155659A1 (en) 2012-06-21
NO20191310A1 (en) 2008-11-17
KR20090005225A (en) 2009-01-12
NO344363B1 (en) 2019-11-18
NO344655B1 (en) 2020-02-24
NO345590B1 (en) 2021-05-03
US20170179903A1 (en) 2017-06-22
US10103700B2 (en) 2018-10-16
US20170179901A1 (en) 2017-06-22
US9685924B2 (en) 2017-06-20
US9136810B2 (en) 2015-09-15
TW200803161A (en) 2008-01-01
US9774309B2 (en) 2017-09-26
NO20180266A1 (en) 2008-11-17
KR101041665B1 (en) 2011-06-15
NO339346B1 (en) 2016-11-28
NO20190025A1 (en) 2008-11-17
US20200144979A1 (en) 2020-05-07
US9787269B2 (en) 2017-10-10
US20180069517A1 (en) 2018-03-08
US20170179905A1 (en) 2017-06-22
US11362631B2 (en) 2022-06-14
RU2008146747A (en) 2010-06-10
KR20110022058A (en) 2011-03-04
US20170179907A1 (en) 2017-06-22
NO20190022A1 (en) 2008-11-17
US20170179906A1 (en) 2017-06-22
US20170179902A1 (en) 2017-06-22
AU2011201348B2 (en) 2013-04-18
JP5255663B2 (en) 2013-08-07
HK1126902A1 (en) 2009-09-11
US10284159B2 (en) 2019-05-07
CN101432965B (en) 2012-07-04
RU2417514C2 (en) 2011-04-27
US9780751B2 (en) 2017-10-03
MX2008013753A (en) 2009-03-06
US20090220109A1 (en) 2009-09-03
US20170179909A1 (en) 2017-06-22
CN102684628A (en) 2012-09-19
NO20180271A1 (en) 2008-11-17
AU2007243586A1 (en) 2007-11-08
NO20161295A1 (en) 2008-11-17
US20130243222A1 (en) 2013-09-19
AU2007243586B2 (en) 2010-12-23
AU2011201348A1 (en) 2011-04-14
US20190222186A1 (en) 2019-07-18
NO344013B1 (en) 2019-08-12
NO344362B1 (en) 2019-11-18
HK1176177A1 (en) 2013-07-19
IL194430A0 (en) 2009-08-03
US20240313729A1 (en) 2024-09-19
EP2011234A1 (en) 2009-01-07
JP2011151811A (en) 2011-08-04
JP2009535897A (en) 2009-10-01
US8428270B2 (en) 2013-04-23
US20120321096A1 (en) 2012-12-20
NO344364B1 (en) 2019-11-18
US9787268B2 (en) 2017-10-10
NO20161439A1 (en) 2008-11-17
US8144881B2 (en) 2012-03-27
NO344658B1 (en) 2020-03-02
NO342160B1 (en) 2018-04-09
US11962279B2 (en) 2024-04-16
NO20161296A1 (en) 2008-11-17
KR101200615B1 (en) 2012-11-12
NO344361B1 (en) 2019-11-18
JP5129806B2 (en) 2013-01-30
US9866191B2 (en) 2018-01-09
US20160359465A1 (en) 2016-12-08
ES2359799T3 (en) 2011-05-27
US20230318555A1 (en) 2023-10-05
US9762196B2 (en) 2017-09-12
NO20190018A1 (en) 2008-11-17
NO343877B1 (en) 2019-06-24
US20240313730A1 (en) 2024-09-19
US20240313731A1 (en) 2024-09-19
ATE493794T1 (en) 2011-01-15
NO20190002A1 (en) 2008-11-17
US20170179900A1 (en) 2017-06-22
CN102684628B (en) 2014-11-26
US9698744B1 (en) 2017-07-04
BRPI0711063A2 (en) 2011-08-23
CA2648237A1 (en) 2007-11-08
NO342157B1 (en) 2018-04-09
UA93243C2 (en) 2011-01-25
MY141426A (en) 2010-04-30
NO20190024A1 (en) 2008-11-17
US20170179908A1 (en) 2017-06-22
DE602007011594D1 (en) 2011-02-10
CA2648237C (en) 2013-02-05
US9768750B2 (en) 2017-09-19
US11711060B2 (en) 2023-07-25
NO342164B1 (en) 2018-04-09
NO20084336L (en) 2008-11-17
US20220394380A1 (en) 2022-12-08
US9450551B2 (en) 2016-09-20
US20210126606A1 (en) 2021-04-29
CN101432965A (en) 2009-05-13
PL2011234T3 (en) 2011-05-31
DK2011234T3 (en) 2011-03-14
BRPI0711063B1 (en) 2023-09-26
US10523169B2 (en) 2019-12-31

Similar Documents

Publication Publication Date Title
US11711060B2 (en) Audio control using auditory event detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07754779

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 194430

Country of ref document: IL

WWE Wipo information: entry into national phase

Ref document number: 2648237

Country of ref document: CA

WWE Wipo information: entry into national phase

Ref document number: 4068/KOLNP/2008

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: PI 20084037

Country of ref document: MY

WWE Wipo information: entry into national phase

Ref document number: 2007243586

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 200780014742.8

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2009507694

Country of ref document: JP

Ref document number: MX/A/2008/013753

Country of ref document: MX

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2007754779

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2007243586

Country of ref document: AU

Date of ref document: 20070330

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2008146747

Country of ref document: RU

WWE Wipo information: entry into national phase

Ref document number: 12226698

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 1020117001302

Country of ref document: KR

ENP Entry into the national phase

Ref document number: PI0711063

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20081028