CN113257273A - Efficient DRC profile transmission - Google Patents


Publication number
CN113257273A
Authority
CN
China
Prior art keywords
drc
audio signal
frames
profiles
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110526962.0A
Other languages
Chinese (zh)
Inventor
H·霍伊里奇
J·科喷斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dolby International AB
Original Assignee
Dolby International AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G7/00 Volume compression or expansion in amplifiers
    • H03G7/002 Volume compression or expansion in amplifiers in untuned or low-frequency amplifiers, e.g. audio amplifiers
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G7/00 Volume compression or expansion in amplifiers
    • H03G7/007 Volume compression or expansion in amplifiers of digital or coded signals
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03G CONTROL OF AMPLIFICATION
    • H03G9/00 Combinations of two or more types of control, e.g. gain control and tone control
    • H03G9/005 Combinations of two or more types of control, e.g. gain control and tone control of digital or coded signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams

Abstract

The present disclosure relates to efficient DRC profile transmission. A method for decoding an encoded audio signal is described. The encoded audio signal includes a sequence of frames and indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. The method comprises the following steps: determining a first rendering mode from the plurality of different rendering modes; determining one or more DRC profiles from a subset of DRC profiles comprised within a current frame of the sequence of frames; determining whether at least one of the one or more DRC profiles is applicable to a first rendering mode; selecting a default DRC profile as a current DRC profile if none of the one or more DRC profiles is applicable to the first rendering mode; wherein the definition data of the default DRC profile is known at the decoder; and decoding the current frame using the current DRC profile.

Description

Efficient DRC profile transmission
This application is a divisional application of invention patent application No. 201580053702.9, filed on September 29, 2015, and entitled "Efficient DRC profile transmission".
Cross Reference to Related Applications
This application claims priority to U.S. Provisional Patent Application No. 62/058,228, filed on October 1, 2014, which is hereby incorporated by reference in its entirety.
Technical Field
This document relates to audio signal processing. In particular, the present document relates to a method and corresponding system for transmitting a Dynamic Range Control (DRC) profile in a bandwidth efficient manner.
Background
The increasing popularity of media consumer devices creates new opportunities and challenges for the creators and distributors of media content for playback on these devices, as well as for the designers and manufacturers of these devices. Many consumer devices are capable of playing back a wide range of media content types and formats, including those typically associated with high quality, wide bandwidth and wide dynamic range audio content for HDTV, Blu-ray or DVD. Media processing devices can be used to play back this type of audio content on their own internal acoustic transducers or on external transducers (such as headphones or high quality home cinema systems); however, all these playback systems and environments place significantly different requirements on the dynamic range of the audio signal due to noise level variations in the environment or due to the limited ability of the playback system to reproduce the required sound pressure level without distortion. Limiting the dynamic range according to the environment is a way to provide high quality and intelligibility over a wide range of different rendering devices with different rendering capabilities and listening environments (i.e. over a wide range of rendering modes).
The present document addresses the following technical problem: providing authors and distributors of media content with bandwidth-efficient means of enabling audio signals to be reproduced with high quality and high intelligibility over a wide range of different rendering devices having different rendering capabilities.
Disclosure of Invention
According to an aspect, a method for generating an encoded audio signal is described. The encoded audio signal comprises a sequence of frames. The encoded audio signal indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. The method includes inserting different subsets of DRC profiles of the plurality of DRC profiles into different frames of a sequence of frames such that two or more frames of the sequence of frames collectively include the plurality of DRC profiles.
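The insertion of different subsets of DRC profiles into different frames can be illustrated with a minimal sketch. The round-robin rotation, the function name `schedule_profile_subsets`, and the profile identifiers are assumptions made for illustration only; the method itself only requires that two or more frames collectively include the plurality of DRC profiles.

```python
from typing import List


def schedule_profile_subsets(profile_ids: List[str], num_frames: int,
                             per_frame: int) -> List[List[str]]:
    """Return, for each frame, the subset of DRC profile identifiers it carries.

    Round-robin rotation is an assumption of this sketch; the text only requires
    that two or more frames collectively include all profiles.
    """
    if not profile_ids or per_frame < 1:
        raise ValueError("need at least one profile and one profile per frame")
    per_frame = min(per_frame, len(profile_ids))
    schedule = []
    cursor = 0
    for _ in range(num_frames):
        # Take the next `per_frame` profiles, wrapping around the list.
        subset = [profile_ids[(cursor + k) % len(profile_ids)]
                  for k in range(per_frame)]
        schedule.append(subset)
        cursor = (cursor + per_frame) % len(profile_ids)
    return schedule


# Hypothetical profile identifiers, one per rendering mode.
PROFILES = ["home_theater", "flat_panel", "portable_speaker", "portable_headset"]
schedule = schedule_profile_subsets(PROFILES, num_frames=4, per_frame=2)
```

With two profiles per frame, any two consecutive frames of this schedule together carry all four profiles, while each individual frame carries only half of the profile data.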
According to a further aspect, a method for decoding an encoded audio signal is described. The encoded audio signal comprises a sequence of frames. Further, the encoded audio signal indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. Different subsets of DRC profiles of the plurality of DRC profiles are included in different frames of the sequence of frames such that two or more frames of the sequence of frames collectively include the plurality of DRC profiles. The method includes determining a first rendering mode from a plurality of different rendering modes, and determining one or more DRC profiles from a subset of DRC profiles included within a current frame of a sequence of frames. Further, the method includes determining whether at least one of the one or more DRC profiles is applicable to the first rendering mode. Additionally, the method includes selecting a default DRC profile as the current DRC profile if none of the one or more DRC profiles is applicable to the first rendering mode; wherein the definition data of the default DRC profile is known at a decoder for decoding the encoded audio signal. Further, the method includes decoding the current frame using the current DRC profile.
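The decoder-side selection described above can be sketched as follows. The class `DrcProfile`, the function `select_current_profile`, and all profile and mode names are hypothetical; the sketch only shows the fallback to a default DRC profile whose definition data is already known at the decoder.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class DrcProfile:
    """A DRC profile and the rendering modes it is applicable to (illustrative)."""
    name: str
    rendering_modes: FrozenSet[str]


# Assumed to be built into the decoder, so it never has to be transmitted.
DEFAULT_PROFILE = DrcProfile("default", frozenset())


def select_current_profile(frame_profiles: Iterable[DrcProfile],
                           rendering_mode: str,
                           default: DrcProfile = DEFAULT_PROFILE) -> DrcProfile:
    """Pick the first profile in the current frame that fits the rendering mode.

    If none of the profiles carried by the current frame is applicable,
    the default profile is selected as the current DRC profile.
    """
    for profile in frame_profiles:
        if rendering_mode in profile.rendering_modes:
            return profile
    return default


# The current frame carries only a subset of all profiles.
current_frame = [DrcProfile("film_light", frozenset({"flat_panel"}))]
current = select_current_profile(current_frame, "portable_speaker")
```

Here `current` is the default profile, because the only profile in the current frame is applicable to a different rendering mode.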
According to a further aspect, a bitstream comprising an encoded audio signal is described. The encoded audio signal comprises a sequence of frames. The encoded audio signal indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. Different subsets of DRC profiles of the plurality of DRC profiles are included in different frames of the sequence of frames such that two or more frames of the sequence of frames collectively include the plurality of DRC profiles.
According to another aspect, an encoder for generating an encoded audio signal is described. The encoded audio signal comprises a sequence of frames. The encoded audio signal indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. The encoder is configured to insert different subsets of DRC profiles of the plurality of DRC profiles into different frames of a sequence of frames such that two or more frames of the sequence of frames collectively comprise the plurality of DRC profiles.
According to a further aspect, a decoder for decoding an encoded audio signal is described. The encoded audio signal comprises a sequence of frames. The encoded audio signal indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. Different subsets of DRC profiles of the plurality of DRC profiles are included in different frames of a frame sequence such that two or more frames of the frame sequence collectively include the plurality of DRC profiles. The decoder is configured to: determining a first rendering mode from the plurality of different rendering modes; determining one or more DRC profiles from a subset of DRC profiles comprised within a current frame of the sequence of frames; determining whether at least one of the one or more DRC profiles is applicable to a first rendering mode; selecting a default DRC profile as a current DRC profile if none of the one or more DRC profiles is applicable to the first rendering mode; wherein the definition data of the default DRC profile is known at the decoder; and decoding the current frame using the current DRC profile.
According to a further aspect, a software program is described. The software program may be adapted for execution on a processor and for performing the method steps outlined herein when carried out on a processor.
According to another aspect, a storage medium is described. The storage medium may comprise a software program adapted for execution on a processor and for performing the method steps outlined herein when carried out on the processor.
According to a further aspect, a computer program product is described. The computer program product may comprise executable instructions for performing the method steps outlined herein when executed on a computer.
It should be noted that the methods and systems including the preferred embodiments thereof as outlined in the present patent application may be used alone or in combination with other methods and systems disclosed herein. Moreover, all aspects of the methods and systems outlined in the present patent application may be combined in any combination. In particular, the features of the claims can be combined with one another in any desired manner.
Drawings
The invention is described below by way of example with reference to the accompanying drawings, in which:
FIGS. 1 and 2 illustrate an example audio decoder and an example audio encoder, respectively;
FIGS. 3 and 4 illustrate example dynamic range compression curves;
FIG. 5 illustrates an example frame sequence; and
FIG. 6 illustrates a flow diagram of an example method for selecting a DRC profile.
Detailed Description
As indicated above, the present document addresses the technical problem of enabling designers and/or distributors of audio content to control the quality and intelligibility of audio content for different types of rendering modes. An example rendering mode is a home cinema rendering mode in which audio content is played back in a quiet environment using transducers that typically allow a very wide dynamic range. Another example rendering mode is a flat panel mode in which audio content is played back using the transducers of a flat panel device, such as a television, which typically allow for a reduced dynamic range compared to a home cinema. A further example rendering mode is a portable speaker mode in which audio content is played back using a loudspeaker of a portable electronic device, such as a smartphone. The dynamic range of this rendering mode is typically small compared to the above-mentioned rendering modes, and the environment tends to be noisy. Another example rendering mode is a portable headset mode in which audio content is played back using a headset connected to the portable electronic device. The dynamic range is limited, but is typically higher than that provided by the loudspeaker of the portable electronic device.
To allow high quality and intelligibility of different rendering modes, different DRC (dynamic range control) profiles for the different rendering modes may be provided along with the audio content. Audio content may be transmitted in a sequence of frames. The frame sequence may include I (i.e., independent) frames that may be decoded independently of previous or subsequent frames. Further, the frame sequence may include other types of frames (e.g., P-frames and/or B-frames) that typically exhibit correlation with respect to a previous frame and/or a subsequent frame. At least some of the frames of the sequence of frames may include a plurality of different DRC profiles for a plurality of different rendering modes. In particular, an I-frame of a sequence of frames may include the plurality of DRC profiles.
By inserting multiple different DRC profiles into the sequence of audio frames, the audio decoder is enabled to select the appropriate DRC profile for a particular rendering mode. As a result, it may be ensured that the rendered audio signal has a high quality (in particular without clipping or distortion introduced by the transducer) and a high intelligibility.
In the following, various aspects of dynamic range control are described. Without customized dynamic range control, input audio information (e.g., PCM samples, time-frequency samples in a QMF matrix, etc.) is typically reproduced at a playback device at a loudness level that is not appropriate for the particular playback environment of the playback device (i.e., including the device's physical and/or mechanical playback limitations), because the particular playback environment of the playback device may be different from the target playback environment for which encoded audio content has been encoded at the encoding device.
Techniques as described herein may be used to support dynamic range control of various audio content that is customized for any of various playback environments, while maintaining the perceived quality of the audio content and maintaining the artist's intent to adapt the content to different listening environments.
Dynamic Range Control (DRC) refers to a time-varying, level-dependent audio processing operation that changes (e.g., compresses, cuts, expands, boosts) a signal in order to convert an input dynamic range of loudness levels in audio content to an output dynamic range that is different from the input dynamic range. For example, in a dynamic range control scenario, soft sounds may be mapped (e.g., boosted, etc.) to a higher loudness level, and loud sounds may be mapped (e.g., cut, etc.) to a lower loudness level. As a result, in the loudness domain, the output range of loudness levels becomes smaller than the input range of loudness levels in this example. In some embodiments, however, the dynamic range control may be reversible such that the original range is restored. For example, an expansion operation may be performed to restore the original range as long as the loudness levels in the output dynamic range, as mapped from the original loudness levels, are at or below the clipping level, each unique original loudness level is mapped to a unique output loudness level, and so on.
DRC techniques as described herein may be used to provide a better listening experience in certain playback environments or situations. For example, soft sounds in a noisy environment may be masked by noise that makes the soft sounds inaudible. Conversely, loud sounds may be undesirable in some situations, e.g., when they would disturb neighbors (as in a "late night" listening mode). Many devices, which typically have loudspeakers with small form factors, are unable to reproduce sound at high output levels, or cannot do so without perceptible distortion. In some cases, lower signal levels may be reproduced below the human hearing threshold. The DRC technique may perform a mapping of the input loudness level to the output loudness level based on DRC gains (e.g., scaling factors that scale audio amplitudes, boost ratios, cut ratios, etc.) looked up from a dynamic range compression curve.
A dynamic range compression curve is a function (e.g., a look-up table, a curve, a piecewise linear function, etc.) that maps individual input loudness levels (e.g., input loudness levels of sounds other than dialogue, etc.), determined from individual frames of audio data, to individual gains for dynamic range control. Each of the individual gains indicates the amount of gain to be applied to the signal in order to map the corresponding input loudness level to the desired output loudness level. The output loudness level after application of the respective gain represents the target loudness level of the audio content in the respective frame of audio data in the particular playback environment.
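A dynamic range compression curve of the piecewise linear kind can be sketched as a table of breakpoints with linear interpolation between them. The breakpoint values in `CURVE` and the function name `lookup_gain_db` are purely illustrative assumptions; they are not taken from any DRC profile described herein.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical breakpoints: (input loudness level in dB, DRC gain in dB).
# Soft sounds are boosted, loud sounds are cut, and levels between
# -31 dB and -26 dB are left unchanged.
CURVE: List[Tuple[float, float]] = [
    (-60.0, 12.0),
    (-40.0, 6.0),
    (-31.0, 0.0),
    (-26.0, 0.0),
    (-10.0, -8.0),
    (0.0, -16.0),
]


def lookup_gain_db(level_db: float, curve: List[Tuple[float, float]] = CURVE) -> float:
    """Return the static DRC gain for an input loudness level.

    Levels outside the table are clamped to the first or last breakpoint;
    levels inside it are linearly interpolated between neighbouring breakpoints.
    """
    levels = [x for x, _ in curve]
    if level_db <= levels[0]:
        return curve[0][1]
    if level_db >= levels[-1]:
        return curve[-1][1]
    i = bisect_right(levels, level_db)
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (level_db - x0) / (x1 - x0)


soft_gain = lookup_gain_db(-50.0)   # boost for a soft sound
loud_gain = lookup_gain_db(-18.0)   # cut for a loud sound
```

The output loudness level is then the input loudness level plus the looked-up gain, for example -50 dB + 9 dB = -41 dB for the soft sound above.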
In addition to specifying a mapping between gains and loudness levels, the dynamic range compression curve may also include, or may be provided with, a specific attack time and a specific release time to be used when applying the gains. Attack refers to an increase in signal energy (or loudness) between consecutive time samples, while release refers to a decrease in energy (or loudness) between consecutive time samples. The attack time (e.g., 10 milliseconds, 20 milliseconds, etc.) refers to a time constant for smoothing the DRC gain when the corresponding signal is in attack mode. The release time (e.g., 80 milliseconds, 100 milliseconds, etc.) refers to a time constant for smoothing the DRC gain when the corresponding signal is in release mode. In some embodiments, additionally, optionally or alternatively, the time constants are used to smooth the signal energy (or loudness) before determining the DRC gain.
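The role of the attack and release time constants can be illustrated with a one-pole smoother applied to the loudness level before the DRC gain is determined. The one-pole filter structure, the block interval of 256 samples at 48 kHz, and the function names are assumptions of this sketch; the text does not prescribe a particular smoothing filter.

```python
import math
from typing import List, Sequence

# Assumed update interval: 256-sample blocks at 48 kHz, about 5.3 ms.
BLOCK_INTERVAL_S = 256 / 48000


def smoothing_coefficient(time_constant_s: float,
                          interval_s: float = BLOCK_INTERVAL_S) -> float:
    """One-pole coefficient for a given time constant and update interval."""
    return math.exp(-interval_s / time_constant_s)


def smooth_levels(levels_db: Sequence[float],
                  attack_s: float = 0.010,
                  release_s: float = 0.100) -> List[float]:
    """Smooth a sequence of loudness levels with separate attack/release times.

    Attack (rising level) uses the short time constant, release (falling
    level) uses the long one.
    """
    if not levels_db:
        return []
    attack = smoothing_coefficient(attack_s)
    release = smoothing_coefficient(release_s)
    state = levels_db[0]
    smoothed = []
    for level in levels_db:
        coeff = attack if level > state else release
        state = coeff * state + (1.0 - coeff) * level
        smoothed.append(state)
    return smoothed


# A 20 dB step up for ten blocks, followed by a step back down for ten blocks.
levels = [-40.0] + [-20.0] * 10 + [-40.0] * 10
smoothed = smooth_levels(levels)
```

In this example the smoothed level has almost reached -20 dB after ten blocks of attack, but has covered only part of the way back to -40 dB after ten blocks of release.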
Different dynamic range compression curves may correspond to different playback environments (i.e., different rendering modes). For example, the dynamic range compression curve for the playback environment of a flat TV may be different from the dynamic range compression curve for the playback environment of a portable device. The playback device may have two or more playback environments. For example, a first dynamic range compression curve for a first playback environment for a portable device having a speaker may be different from a second dynamic range compression curve for a second playback environment for the same portable device having a headset.
Fig. 1 shows a block diagram of example components of an audio decoder 100. The audio decoder 100 includes a data extractor 104, a dynamic range controller 106, and an audio renderer 108. The data extractor 104 is configured to receive the encoded input signal 102. The encoded input signal 102 as described herein may be a bitstream comprising frames of input audio data (in particular a sequence of audio frames) that are encoded (e.g., compressed, etc.) and possibly also metadata. The bitstream may be an AC-4 bitstream. The data extractor 104 is configured to extract/decode input audio data frames and metadata from the encoded input signal 102. Each input audio data frame includes a plurality of encoded audio data blocks, each encoded audio data block representing a plurality of audio samples. Each frame represents a (e.g., constant) time interval that includes a certain number of audio samples. The frame size may vary with the sampling rate and the encoded data rate. The audio samples are representative of quantized audio data elements within one, two or more (audio) frequency bands or ranges (e.g., input PCM samples, input time-frequency samples in a QMF matrix, etc.). The quantized audio data elements in the input audio data frame may represent sound pressure waves in the digital (quantized) domain. The quantized audio data elements may cover a limited range of loudness levels up to or below a maximum possible value (e.g., a clipping level, a maximum loudness level, etc.).
The metadata may be used by the audio decoder 100 to process the input audio data frames. The metadata may include various operational parameters related to one or more operations to be performed by the decoder 100, one or more dynamic range compression curves (i.e., one or more DRC profiles), normalization parameters related to the loudness level of a dialog represented in a frame of input audio data, and so on. Dialog loudness levels may refer to (e.g., psychoacoustic, perceptual, etc.) levels of dialog loudness in an entire program (e.g., a movie, a TV program, a radio broadcast, etc.), a portion of a program, a dialog of a program, etc., program loudness, average dialog loudness, etc.
The operation and functionality of the decoder 100 or some or all of the modules (e.g., data extractor 104, dynamic range controller 106, etc.) may be altered in response to metadata extracted from the encoded input signal 102. For example, metadata, including but not limited to dynamic range compression curves, dialog loudness levels, etc., may be used by the decoder 100 to generate audio data elements in the digital domain (e.g., output PCM samples, output time-frequency samples in a QMF matrix, etc.). The output data elements may then be used to drive audio channels or speakers to achieve a specified loudness or reference reproduction level during playback in a particular playback environment.
The dynamic range controller 106 may be configured to receive some or all of the audio data elements in the input audio data frame and metadata, perform audio processing operations (e.g., dynamic range control operations, gain smoothing operations, gain limiting operations, etc.) on the audio data elements in the input audio data frame based at least in part on the metadata extracted from the encoded audio signal 102, and so forth.
In particular, dynamic range controller 106 may include a selector 110, a loudness calculator 112, and/or a DRC gain unit 114. The selector 110 may be configured to determine a speaker configuration (e.g., home cinema mode, flat panel mode, portable device with speaker mode, portable device with headset mode, 5.1 speaker configuration mode, 7.1 speaker configuration mode, etc.) associated with a particular playback environment at the decoder 100. The speaker configuration may also be referred to as a rendering mode. Further, the selector 110 may be configured to select a particular dynamic range compression curve (i.e., DRC profile) from among the dynamic range compression curves extracted from the metadata of the encoded input signal 102 (i.e., from the plurality of DRC profiles).
The loudness calculator 112 may be configured to calculate one or more types of loudness levels represented by audio data elements in the input audio data frame. Examples of loudness level types include, but are not limited to, any of the following: individual loudness levels across individual frequency bands in individual channels over individual time intervals, broadband (or wideband) loudness levels across a broad (or wide) frequency range in individual channels, loudness levels determined from or smoothed over blocks or frames of audio data, loudness levels determined from or smoothed over more than one block or frame of audio data, loudness levels smoothed over one or more time intervals, and so forth. Zero, one, or more of these loudness levels may be changed for the purpose of dynamic range control of the decoder 100.
To determine the loudness level, the loudness calculator 112 may determine one or more time-dependent physical acoustic wave properties, such as spatial and/or local pressure levels at particular audio frequencies, etc., represented by audio data elements in the input audio data frame. The loudness calculator 112 may use the one or more time-varying physical wave properties to derive one or more types of loudness levels based on one or more psychoacoustic functions that model human loudness perception. The psychoacoustic function may be a non-linear function constructed based on a model of the human auditory system that converts/maps specific spatial pressure levels at specific audio frequencies to specific loudness for those specific audio frequencies.
Loudness levels (e.g., broadband, wideband, etc.) at multiple (audio) frequencies or multiple frequency bands may be derived by integration of specific loudness levels at the multiple (audio) frequencies or multiple frequency bands. The time-averaged, smoothed, etc., loudness level over one or more time intervals (e.g., longer than the time interval represented by the audio data elements in a block or frame of audio data, etc.) may be obtained by using one or more smoothing filters implemented in the decoder 100 as part of the audio processing operations. Another example method for determining a (wideband) loudness level is specified in ITU-R BS.1770. The method specified in ITU-R BS.1770 applies time-domain filtering to the time-domain input audio signal and then calculates the RMS (root mean square) level on each channel of the input audio signal, before integrating over the channels and gating the resulting loudness level.
The specific loudness levels for different frequency bands may be calculated for each block of audio data having a certain number (e.g., 256, etc.) of samples. When integrating specific loudness levels into broadband (or wideband) loudness levels, a pre-filter may be used to apply frequency weighting (e.g., similar to IEC B-weighting, etc.) to the specific loudness levels. Summing of the broadband loudness levels across two or more channels (e.g., front left, front right, center, surround left, surround right, etc.) may be performed to provide an overall loudness level for the two or more channels.
The overall loudness level may refer to a broadband (or wideband) loudness level in a single channel (e.g., center, etc.) of the speaker configuration. The overall loudness level may also refer to a broadband (or wideband) loudness level in multiple channels. The plurality of channels may be all channels in a speaker configuration (i.e., for the rendering mode). Additionally, optionally or alternatively, the plurality of channels may include a subset of channels in a speaker configuration (e.g., a subset of channels including left front, right front and Low Frequency Effects (LFE); a subset of channels including left surround and right surround; and a subset of channels including a center, etc.).
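The per-channel level calculation and the summation over channels can be sketched in a strongly simplified form. This sketch deliberately omits the time-domain pre-filtering, the per-channel weights, the constant offset and the gating of ITU-R BS.1770, so it is not an implementation of that recommendation; the function name and the test tone are assumptions made for illustration.

```python
import math
from typing import Sequence


def simplified_level_db(channels: Sequence[Sequence[float]]) -> float:
    """Sum the mean-square energy of all channels and express it in dB.

    The mean-square energy of a channel is the square of its RMS level.
    Pre-filtering, channel weighting and gating are intentionally left out.
    """
    total = 0.0
    for samples in channels:
        total += sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(total) if total > 0.0 else float("-inf")


# Ten full periods of a full-scale 1 kHz sine wave sampled at 48 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / 48000.0) for n in range(480)]
mono_db = simplified_level_db([tone])          # about -3.01 dB
stereo_db = simplified_level_db([tone, tone])  # about 0 dB
```

Doubling the number of identical channels doubles the summed energy, which raises the level by about 3 dB.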
The loudness level (e.g., wideband, overall, specific, etc.) may be used as an input to find the corresponding DRC gain (e.g., static, pre-smoothed, pre-limited, etc.) from the selected dynamic range compression curve. The loudness level to be used as an input to find the DRC gain may first be adjusted or normalized with respect to a dialog loudness level derived from metadata extracted from the encoded audio signal 102 and/or with respect to an output reference level for the rendering mode. The adjustments and normalization related to adjusting the dialog loudness level/output reference level may be performed in a non-loudness domain (e.g., SPL domain, etc.) on the portion of the audio content in the encoded audio signal 102 before the particular spatial pressure level represented in the portion of the audio content in the encoded audio signal 102 is converted or mapped to the particular loudness level of the portion of the audio content in the encoded audio signal 102.
The DRC gain unit 114 may be configured with a DRC algorithm that generates gains (e.g., gains for dynamic range control, for gain limiting, for gain smoothing, etc.) and applies the gains to one or more types of loudness levels represented by the audio data elements in the input audio data frame to achieve a target loudness level for a particular playback environment. Application of gain (e.g., DRC gain, etc.) as described herein may occur in the loudness domain. For example, the gain may be generated based on a loudness calculation (which may be expressed in sone, or may simply be SPL values compensated for the dialog loudness level, without conversion), smoothed, and applied directly to the input signal. Techniques as described herein may apply gain to a signal in the loudness domain, then convert the signal from the loudness domain back to the (linear) SPL domain, and calculate the corresponding gain to be applied to the signal by evaluating the signal before and after the gain has been applied in the loudness domain. The ratio (or the difference, when expressed in a logarithmic dB representation) then determines the corresponding gain for the signal.
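The relationship between the ratio in the linear domain and the difference in the logarithmic dB representation can be made concrete with a short sketch. The function names are hypothetical, and amplitude (not power) quantities are assumed, hence the factor of 20.

```python
import math


def gain_db_from_amplitudes(before: float, after: float) -> float:
    """Gain in dB from the ratio of linear amplitudes after and before processing."""
    return 20.0 * math.log10(after / before)


def gain_db_from_levels(before_db: float, after_db: float) -> float:
    """Gain in dB from the difference of two levels already expressed in dB."""
    return after_db - before_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB into a linear amplitude scaling factor."""
    return 10.0 ** (gain_db / 20.0)


ratio_gain = gain_db_from_amplitudes(1.0, 0.5)     # about -6.02 dB
difference_gain = gain_db_from_levels(-20.0, -26.0)  # -6 dB
```

Halving the amplitude corresponds to a gain of about -6 dB, whether it is computed from the linear ratio or from the level difference.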
The DRC algorithm may operate with multiple DRC parameters. The DRC parameters include the dialog loudness level that has been calculated by the upstream encoder 150 (as described in the context of FIG. 2) and embedded in the encoded audio signal 102, and that may be obtained by the decoder 100 from metadata in the encoded audio signal 102. The dialog loudness level from the upstream encoder 150 indicates the average dialog loudness level (e.g., energy per program, relative to a full-scale 1 kHz sine wave, relative to a reference square wave, etc.). The dialog loudness level extracted from the encoded audio signal 102 may be used to reduce inter-program loudness level differences. The reference dialog loudness level may be set to the same value between different programs, with the decoder 100 being in the same particular playback environment. Based on the dialog loudness level from the metadata, DRC gain unit 114 may apply a dialog loudness-related gain to each audio data block in the program such that the output dialog loudness level (or output reference level) averaged over multiple audio data blocks of the program is increased/decreased to the reference dialog loudness level of the program (e.g., pre-configured, system default, user-configurable, profile-related, etc.). The dialog loudness level may also be used to calibrate the DRC algorithm, and in particular, the null-band of the DRC algorithm may be adjusted to the dialog loudness level. Alternatively, the desired output reference level may be used to calibrate the DRC algorithm when it is applied to a signal to which the gain has been applied, so that the dialog loudness level becomes equal to the desired output reference level. If speech gating has been used to determine the dialog normalization (dialnorm) parameter, the dialog loudness level may correspond to the so-called dialog normalization parameter.
In some embodiments, the dialog loudness level corresponds to a dialog normalization parameter that is determined not by using speech gating, but by gating based on a loudness level threshold.
The DRC gains may be used to account for loudness level differences within a program by boosting or cutting signal portions in soft and/or loud sounds according to a selected dynamic range compression curve. One or more of these DRC gains may be calculated/determined by the DRC algorithm based on the selected dynamic range compression curve and a determined (e.g., wideband, overall, specific, etc.) loudness level from one or more corresponding blocks of audio data, frames of audio data, etc.
The loudness level for determining (e.g., static, pre-smoothed, pre-gain limited, etc.) DRC gains by looking up the selected dynamic range compression curve may be calculated at short intervals (e.g., approximately 5.3 milliseconds, etc.). The integration time of the human auditory system (e.g., approximately 200 milliseconds, etc.) can be much longer. The DRC gains obtained from the selected dynamic range compression curves may be smoothed with a time constant that takes into account the long integration time of the human auditory system. To achieve a fast rate of change (increase or decrease) of the loudness level, a short time constant may be used to cause the loudness level to change over a short time interval corresponding to the short time constant. Conversely, to achieve a slow rate of change (increase or decrease) of the loudness level, a long time constant may be used to cause the loudness level to change over a long time interval corresponding to the long time constant.
The human auditory system may react to increasing and decreasing loudness levels with different integration times. Different time constants may be used to smooth the static DRC gain looked up from the selected dynamic range compression curve depending on whether the loudness level is to increase or decrease. For example, attack (loudness level increase) may be smoothed with a relatively short time constant (e.g., attack time, etc.), while release (loudness level decrease) may be smoothed with a relatively long time constant (e.g., release time, etc.), corresponding to the characteristics of the human auditory system.
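The direction-dependent smoothing described above can be illustrated with a minimal sketch. It assumes a one-pole (exponential) smoother, a block interval of 5.3 milliseconds as mentioned in the text, and hypothetical time constants `attack_ms` and `release_ms`; it also assumes that a falling gain indicates a rising loudness level. None of these names or values are taken from the described DRC algorithm.

```python
import math

def smooth_gains(static_gains_db, block_ms=5.3, attack_ms=10.0, release_ms=200.0):
    """Smooth static DRC gains (dB) with separate attack/release time constants.

    Assumption: a falling gain means the input got louder (attack, short
    time constant); a rising gain means it got softer (release, long one).
    """
    a_attack = math.exp(-block_ms / attack_ms)
    a_release = math.exp(-block_ms / release_ms)
    smoothed = []
    state = static_gains_db[0] if static_gains_db else 0.0
    for gain in static_gains_db:
        coeff = a_attack if gain < state else a_release
        # One-pole low-pass: move the state a fraction of the way to the target.
        state = coeff * state + (1.0 - coeff) * gain
        smoothed.append(state)
    return smoothed
```

With these assumed values, a gain step downwards is followed much faster than a gain step upwards, which mirrors the short attack and long release behaviour described in the text.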
The DRC gain for a portion of the audio content (e.g., one or more blocks of audio data, frames of audio data, etc.) may be calculated using a loudness level determined from the portion of the audio content. The loudness level to be used for finding in the selected dynamic range compression curve may first be adjusted relative to (e.g., with respect to, etc.) the dialog loudness level (e.g., in a program of which the audio content is a part, etc.) in the metadata extracted from the encoded audio signal 102.
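As a rough illustration of this lookup, the following sketch adjusts a measured loudness level relative to the dialog loudness level and then interpolates a static gain from a piecewise-linear curve. The breakpoints in `CURVE` and the function name are hypothetical and are not taken from the compression curves of fig. 3 and 4.

```python
from bisect import bisect_right

# Hypothetical breakpoints: (loudness relative to dialog level in dB, gain in dB).
# The flat segment between -10 dB and +5 dB plays the role of a zero band.
CURVE = [(-40.0, 12.0), (-10.0, 0.0), (5.0, 0.0), (25.0, -10.0)]

def static_drc_gain(loudness_db, dialog_loudness_db, curve=CURVE):
    """Look up the static DRC gain for a loudness level on a piecewise-linear curve."""
    x = loudness_db - dialog_loudness_db  # adjust relative to the dialog level
    xs = [point[0] for point in curve]
    if x <= xs[0]:
        return curve[0][1]    # clamp below the first breakpoint
    if x >= xs[-1]:
        return curve[-1][1]   # clamp above the last breakpoint
    i = bisect_right(xs, x)
    (x0, g0), (x1, g1) = curve[i - 1], curve[i]
    return g0 + (g1 - g0) * (x - x0) / (x1 - x0)
```

Under these assumed breakpoints, content at the dialog loudness level receives 0 dB of gain, soft content is boosted and loud content is cut.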
A reference dialog loudness level/output reference level (e.g., -31 dBFS in "line" mode, -20 dBFS in "RF" mode, etc.) may be specified or established for a particular playback environment at the decoder 100. Additionally, alternatively, or optionally, in some embodiments, the user may be given control over setting or changing the reference dialog loudness level at the decoder 100.
The DRC gain unit 114 may be configured to determine a dialog loudness related gain for the audio content to change from the input dialog loudness level to a reference dialog loudness level as the output dialog loudness level.
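In a logarithmic representation this gain is simply the difference between the two levels. The following sketch shows the arithmetic; the function name is hypothetical, and the reference levels used in the usage note are the "line" and "RF" example values.

```python
def dialog_loudness_gain_db(input_dialog_db, reference_db):
    """Gain in dB that moves the input dialog loudness level to the reference level."""
    return reference_db - input_dialog_db
```

For example, a program with an input dialog loudness level of -24 dBFS would need a gain of -7 dB to reach a reference level of -31 dBFS, and a gain of +4 dB to reach a reference level of -20 dBFS.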
The audio renderer 108 may be configured to generate channel-specific audio data 116 for a particular speaker configuration (e.g., multi-channel, etc.) after applying gains determined based on DRC, gain limiting, gain smoothing, etc. to the input audio data extracted from the encoded audio signal 102. The channel-specific audio data 116 may be used to drive speakers, headphones, etc. represented in a speaker configuration.
Additionally and/or alternatively, the decoder 100 may be configured to perform one or more other operations related to processing, rendering, downmixing, resampling, etc., associated with the input audio signal.
The techniques as described herein may be used for various speaker configurations corresponding to various different surround sound configurations (e.g., 2.0, 3.0, 4.0, 4.1, 5.1, 6.1, 7.1, 10.2, 10-60 speaker configurations, 60+ speaker configurations, object signals or combinations of object signals, etc.) and various different rendering environment configurations (e.g., cinema, park, opera house, concert hall, bar, home, auditorium, etc.).
Fig. 2 illustrates an example encoder 150. The encoder 150 may include an audio content interface 152, a dialog loudness analyzer 154, a DRC reference library 156, and an audio signal encoder 158. The encoder 150 may be part of a broadcast system, an internet-based content server, an over-the-air network operator system, a film production system, etc.
The audio content interface 152 may be configured to receive audio content 160 and audio content control inputs 162 for generating the encoded audio signal 102 based at least on some or all of the audio content 160 and the audio content control inputs 162. For example, the audio content interface 152 may be used to receive audio content 160 and audio content control inputs 162 from a content creator, a content provider, and the like.
The audio content 160 may constitute some or all of the total media data including audio only, including audiovisual, and the like. The audio content 160 may include one or more of a portion of a program, several programs, one or more commercials, etc.
The dialog loudness analyzer 154 may be configured to determine/establish one or more dialog loudness levels for one or more portions of the audio content 160 (e.g., one or more programs, one or more commercials, etc.). The audio content may be represented by one or more sets of audio tracks. The dialog audio content of the audio content may be in a separate audio track and/or at least a portion of the dialog audio content of the audio content may be in an audio track that includes non-dialog audio content.
The audio content control input 162 may include some or all of the following: user control inputs, control inputs provided by systems/devices external to encoder 150, control inputs from the content creator, control inputs from the content provider, and the like. For example, a user (such as a mixing engineer or the like) may provide/specify one or more dynamic range compression curve identifiers; these identifiers may be used to retrieve, from a data repository such as the DRC reference library 156 or the like, the one or more dynamic range compression curves that are most appropriate for the audio content 160.
The DRC reference library 156 may be configured to store DRC reference parameter sets, and the like. The DRC reference parameter set may comprise definition data for one or more dynamic range compression curves, etc. The encoder 150 may encode (e.g., concurrently) more than one dynamic range compression curve into the encoded audio signal 102. Zero, one, or more of the dynamic range compression curves may be standard-based, proprietary, customized, decoder-modifiable, and so forth. For example, the dynamic range compression curves of fig. 3 and 4 may be encoded (e.g., concurrently) into the encoded audio signal 102.
The audio signal encoder 158 may be configured to: receive audio content from the audio content interface 152, receive dialog loudness levels from the dialog loudness analyzer 154, retrieve one or more DRC reference parameter sets (i.e., DRC profiles) from a DRC reference library 156, format the audio content into audio data blocks/frames, format dialog loudness levels, DRC reference parameter sets, etc. into metadata (e.g., metadata containers, metadata fields, metadata structures, etc.), and encode the audio data blocks/frames and metadata into the encoded audio signal 102.
Audio content to be encoded into the encoded audio signal 102 as described herein may be received in one or more of a variety of ways (such as wirelessly, via a wired connection, through a file, via internet download, etc.), in one or more of a variety of source audio formats.
The encoded audio signal 102 as described herein may be a portion of an entire media data bitstream (e.g., for an audio broadcast, an audio program, an audiovisual broadcast, etc.). The media data bitstream may be accessed from a server, computer, media storage device, media database, media file, or the like. The media data bitstream may be broadcast, transmitted or received over one or more wireless or wired network links. The media data bitstream may also be transmitted over an intermediate medium such as one or more of a network connection, a USB connection, a wide area network, a local area network, a wireless connection, an optical connection, a bus, a crossbar connection, a serial connection, etc.
Any of the components described (e.g., fig. 1, 2) may be implemented as one or more processes and/or one or more IC circuits (e.g., ASICs, FPGAs, etc.), which may be implemented in hardware, software, or a combination of hardware and software.
Fig. 3 and 4 illustrate example dynamic range compression curves that may be used by the DRC gain unit 114 in the decoder 100 to derive DRC gains from input loudness levels. As illustrated, the dynamic range compression curve may be centered on a reference loudness level (e.g., an output reference level) in the program in order to provide an overall gain suitable for a particular playback environment. Example definition data (e.g., in metadata of the encoded audio signal 102) for a dynamic range compression curve (e.g., including, but not limited to, any of a boost ratio, a cut ratio, an attack time, a release time, etc.) is shown in the following table. The definition data may differ between profiles (e.g., film standard, film light, music standard, music light, speech, etc.) for different playback environments (e.g., at the decoder 100):
TABLE 1: Example definition data for the dynamic range compression curves of the different profiles (table images BDA0003065966880000151 and BDA0003065966880000161)
One or more compression curves described in terms of loudness levels in dBSPL or dBFS and of gains in dB relating to dBSPL may be received, whereas the DRC gain calculation may be performed in a different loudness representation (e.g., Sone) that has a non-linear relationship with loudness levels in dBSPL. The compression curve used in the DRC gain calculation may then be transformed so that it is described in terms of the different loudness representation (e.g., Sone).
Fig. 5 illustrates an example encoded audio signal 102 comprising a sequence of frames (numbered n+1 up to n+30, where n is an integer). In the illustrated example, every 5th frame is an I-frame. In the illustrated example, the I-frame (n+1) includes multiple DRC profiles, identified as home theater AVR (audio/video receiver), tablet, portable HP (headphones), and portable SP (speakers). Each DRC profile includes a dynamic range compression curve as shown in fig. 3 and 4.
The multiple DRC profiles may be repeatedly inserted into I-frames of a sequence of frames. This allows the decoder 100 to determine the DRC profile appropriate for the encoded audio signal 102 and the current rendering mode at start-up of the encoded audio signal 102, after tuning into a running audio program, and/or subsequent splicing points. On the other hand, repeated transmission of the full set of DRC profiles results in a relatively high bit stream overhead. In view of this, it is proposed to transmit a varying subset of DRC profiles within an I-frame of the encoded audio signal 102.
Fig. 5 illustrates an example for inserting a DRC profile within a sequence of frames. In the illustrated example, only a single DRC profile from the full set of DRC profiles is inserted into each I-frame. The DRC profile inserted into an I-frame varies between I-frames and, as a result, after N I-frames (in the illustrated example, N = 4), the decoder 100 has received a full set of N DRC profiles. By doing so, the data rate for transmitting the full set of DRC profiles may be reduced while ensuring that the decoder 100 receives the full set of DRC profiles within a reasonable amount of time.
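The rotating insertion can be sketched as a simple round-robin schedule. The sketch below assumes zero-based frame indices, an I-frame at index 0 and a fixed I-frame interval; the function name and the profile labels are hypothetical.

```python
def assign_profiles(num_frames, profiles, i_frame_interval=5):
    """Return {frame_index: profile}, one profile per I-frame, cycling round-robin.

    Assumption: frame 0 is an I-frame and every i_frame_interval-th frame
    after it is an I-frame as well.
    """
    assignment = {}
    next_profile = 0
    for index in range(num_frames):
        if index % i_frame_interval == 0:
            assignment[index] = profiles[next_profile % len(profiles)]
            next_profile += 1
    return assignment
```

For 30 frames, an I-frame interval of 5 and four profiles, this schedule places one profile in each of the six I-frames, so that the full set has been transmitted once after the fourth I-frame and the rotation then starts again.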
Fig. 6a and 6b show a flow diagram of an example method 600 for determining a DRC profile for decoding a frame of an encoded audio signal 102. The method 600 may be performed by the decoder 100, and in particular by the selector 110. When the encoded audio signal 102 begins to be received, the DRC profile used by the decoder 100 may be initialized. The DRC profile used to decode the current frame of the encoded audio signal 102 may be referred to as the current DRC profile. Thus, when starting up, the current DRC profile may be initialized. In particular, a default DRC profile (which is available at the decoder 100) may be set as the current DRC profile for rendering the current frame (method step 601). Thus, the variable "profile" may be set to the default DRC profile. In addition, the decoder 100 may track the previously used profile. The previously used profile may initially be set to undefined.
The method 600 may further comprise a step 602 of retrieving a new frame to be decoded, i.e. the current frame, from the encoded audio signal 102. In step 603, it is verified whether the new frame is an I-frame that may include a DRC profile. If the new frame is not an I-frame, the method 600 continues with step 604 and the new frame is processed using the current DRC profile. Furthermore, in method step 605, the previously used profile (prev_profile) is set to the current DRC profile.
If the new frame is an I-frame, it may be checked in a method step 606 whether the I-frame comprises DRC data. For example, the metadata of the I-frame may include a flag indicating whether the I-frame includes DRC data. If DRC data is not present, the method 600 can continue with steps 604, 605. Otherwise, the method may proceed to method step 607.
In a method step 607, it may be verified whether the new frame is the first frame of the encoded audio signal 102 to be decoded. As can be seen from the flow diagrams of fig. 6a and 6b, this can be verified by checking the prev_profile variable. If the prev_profile variable is undefined, the new frame is the first frame to be decoded. If the new frame is the first frame to be decoded, the decoder 100 may use a predefined DRC profile in addition to the default DRC profile. To this end, the metadata of the new frame may include an identifier (ID) for such a predefined DRC profile. Such a predefined DRC profile may be stored in a database at the decoder 100. The use of a predefined DRC profile may provide a bit rate efficient means for signaling to the decoder 100 that a DRC profile is to be used, since only the ID of the predefined profile needs to be transmitted (method step 608). A predefined DRC profile signaled using an ID may also be referred to as an implicit DRC profile.
It should be noted that in some cases it may be beneficial to use only one predefined DRC profile in addition to the default DRC profile. In such cases, the decoder 100 may be configured to set the profile variable to a predefined (i.e., implicit) DRC profile without receiving any ID within the metadata of the new frame.
The method 600 may further include verifying whether the metadata of the new frame includes one or more explicit DRC profiles (step 609). The explicit DRC profile may include an ID for identifying the explicit DRC profile. In addition, the explicit DRC profile typically includes data defining a dynamic range compression curve as shown in fig. 3 and 4. The dynamic range compression curve may be defined as a piecewise linear function. Further, the explicit DRC profile may indicate a range of output reference levels (ORLs) to which the explicit DRC profile applies. For example, a default DRC profile and/or a predefined (implicit) DRC profile may be applicable for output reference levels ranging from -31 dBFS up to 0 dBFS.
The ORL of the rendering device may indicate the dynamic range capabilities of the rendering device. In general, dynamic range capability decreases as ORL increases. In case ORL is high, a compression curve with a high degree of compression should be used in order to render the audio signal in an intelligible manner without clipping. On the other hand, in case ORL is low, the compression may be reduced in order to render the audio signal with a high dynamic range. Due to the high dynamic range capability of the rendering device, intelligibility of the audio signal can still be guaranteed.
If the metadata for the new frame includes at least one explicit DRC profile, the profile data for the first DRC profile is read (step 610). Further, it is verified whether the range of ORLs of the first DRC profile is applicable to the currently used rendering device (step 611). If this is not the case, method 600 continues to look for another explicit DRC profile within the metadata of the new frame. On the other hand, if the explicit DRC profile is applicable to the rendering device, the explicit DRC profile may be set to the current DRC profile to be used for processing the new frame (step 614).
The method 600 may further include verifying whether the headphone rendering mode is used and whether the explicit DRC profile is applicable to the headphone rendering mode (step 612). Additionally, method 600 may include verifying whether the explicit DRC profile is an updated profile compared to a previously used profile (step 613). To this end, the ID of the explicit DRC profile may be compared with the ID of the currently used profile. By doing so, it can be ensured that the decoder 100 always uses the most recent DRC profile.
Using the method 600, it may be ensured that the decoder 100 always identifies a DRC profile for rendering a frame of the encoded audio signal 102 even if the decoder 100 has not received a DRC profile for the current rendering mode (i.e. for the current rendering device). Furthermore, it is ensured that the decoder 100, upon receiving the corresponding DRC profile, applies the DRC profile for the current rendering mode.
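The selection logic of method 600 can be condensed into the following sketch. The data classes, field names and the `PREDEFINED` table are hypothetical stand-ins for the bitstream metadata and the decoder database, and the checks of steps 612 and 613 (headphone rendering mode, comparison of profile IDs) are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Profile:
    profile_id: int
    orl_min: float = -31.0   # lower end of the applicable ORL range (dBFS)
    orl_max: float = 0.0     # upper end of the applicable ORL range (dBFS)

@dataclass
class Frame:
    is_i_frame: bool = False
    has_drc_data: bool = False          # DRC flag in the frame metadata
    implicit_id: Optional[int] = None   # ID of a predefined (implicit) profile
    explicit_profiles: list = field(default_factory=list)

DEFAULT = Profile(profile_id=0)
PREDEFINED = {1: Profile(1), 2: Profile(2)}  # hypothetical decoder-side database

def select_profiles(frames, device_orl):
    """Return the DRC profile used for each frame, following steps 601 to 614."""
    current, prev = DEFAULT, None                           # step 601
    used = []
    for frame in frames:                                    # step 602
        if frame.is_i_frame and frame.has_drc_data:         # steps 603, 606
            if prev is None and frame.implicit_id in PREDEFINED:
                current = PREDEFINED[frame.implicit_id]     # steps 607, 608
            for profile in frame.explicit_profiles:         # steps 609, 610
                if profile.orl_min <= device_orl <= profile.orl_max:  # step 611
                    current = profile                       # step 614
                    break
        used.append(current)                                # step 604
        prev = current                                      # step 605
    return used
```

In this sketch the decoder always has a profile to work with: it starts from the default profile and switches as soon as a frame carries a profile whose ORL range covers the rendering device.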
Thus, a method 600 for decoding an encoded audio signal 102 is described. The encoded audio signal 102 comprises a sequence of frames. Furthermore, the encoded audio signal 102 indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. Examples of DRC profiles for different rendering modes (or different rendering environments) are a first DRC profile for use in a home theater rendering mode; a second DRC profile for use in a tablet rendering mode; a third DRC profile for use in a portable device loudspeaker rendering mode; and/or a fourth DRC profile for use in a headphone rendering mode. The DRC profile defines a specific DRC behavior. DRC behavior can be described in terms of a compression curve (and time constants) and/or in terms of DRC gains. The DRC gains may be time-equidistant gains that may be applied to the encoded audio signal 102 to apply DRC. The compression curve may be accompanied by time constants, which together configure the DRC algorithm. DRC generally reduces the volume of loud sounds and amplifies quiet sounds, thereby compressing the dynamic range of the audio signal in order to improve the experience in a non-ideal reproduction environment.
The frame sequence typically comprises a plurality of consecutive frames forming an audio signal. An audio program (e.g., a broadcast TV or radio program) may include multiple audio signals concatenated at splice points. For example, the main audio program may be interrupted repeatedly by commercial breaks. The sequence of frames may correspond to an entire audio program. Alternatively, the sequence of frames may correspond to one of a plurality of audio signals forming an entire audio program.
Different subsets of DRC profiles of the plurality of DRC profiles may be included within different frames of the sequence of frames such that two or more frames of the sequence of frames jointly include the plurality of DRC profiles. As indicated above, the distribution of DRC profiles over multiple frames of a sequence of frames results in a reduction of the bitstream overhead for signaling the multiple DRC profiles.
The method 600 may include determining a first rendering mode from a plurality of different rendering modes. In particular, it may be determined which rendering mode is used for rendering the encoded audio signal 102. Further, the method 600 may include determining 609, 610 one or more DRC profiles from a plurality of DRC profiles included within a current frame of the sequence of frames. In other words, one or more DRC profiles from a subset of DRC profiles included within the current frame may be determined. Additionally, a determination 611 can be made as to whether at least one of the one or more DRC profiles is applicable to the first rendering mode. Determining 611 whether at least one of the one or more DRC profiles is applicable to the first rendering mode may comprise: determining a first output reference level for the first rendering mode, determining a range of output reference levels to which a DRC profile of the one or more DRC profiles is applicable, and determining whether the first output reference level falls within the range of output reference levels.
The method 600 may further include: if none of the one or more DRC profiles are applicable to the first rendering mode, a default DRC profile is selected 604 as the current DRC profile. The definition data of the default DRC profile is generally known at the decoder 100 for decoding the encoded audio signal 102. Additionally, the method 600 may include decoding (and/or rendering) the current frame using the current DRC profile. Thus, it can be ensured that the decoder 100 can use the DRC profile (and the dynamic range compression curve) even if the decoder 100 has not received a DRC profile specific to the encoded audio signal 102.
Alternatively or additionally, the method 600 may comprise: if a first DRC profile of the one or more DRC profiles is determined to be applicable to the first rendering mode, the first DRC profile is selected 614 as the current DRC profile. As a result, the decoder 100 is configured to use the first DRC profile that is optimal for the encoded audio signal 102 and for the first rendering mode as soon as the decoder 100 receives the first DRC profile.
The method 600 may further comprise determining 603, 606 whether a current frame of the sequence of frames comprises one or more of the plurality of DRC profiles, i.e. whether the current frame comprises a subset of DRC profiles. As outlined in the context of fig. 5, the subset of DRC profiles is typically included within an I-frame of a sequence of frames. Thus, determining 603, 606 whether the current frame includes one or more of the plurality of DRC profiles or whether the current frame includes a subset of DRC profiles may comprise determining 603 whether the current frame is an I-frame. As indicated above, an I-frame may be a frame that may be decoded independently of any other frame in a sequence of frames. This may be due to the fact that the data comprised in such an I-frame is transmitted in a manner independent of data from a preceding or subsequent frame. In particular, the data included within an I-frame is not differentially encoded with respect to data included within a previous or subsequent frame.
Further, determining 603, 606 whether the current frame includes one or more of the plurality of DRC profiles or whether the current frame includes a subset of DRC profiles may include verifying 606 a DRC profile flag included within the current frame. DRC profile flags within a bitstream of an encoded audio signal provide a bandwidth-efficient and computationally efficient means for identifying frames carrying DRC profiles.
The method 600 can further include determining whether the current frame indicates one of a plurality of implicit DRC profiles. The implicit DRC profile may include a predefined legacy compression curve and time constants that may be used for transcoding to E-AC-3. As indicated above, the definition data of the implicit DRC profile may be known at the decoder 100 for decoding the input audio signal 102. In contrast to the default DRC profile, the implicit DRC profile may be specific to different types of audio signals (as specified in table 1, for example). The current frame of the frame sequence may indicate a particular implicit DRC profile (e.g. by using an identifier, ID). This may provide a bandwidth efficient means for signaling a DRC profile suitable for the encoded audio signal 102. If it is determined that the current frame indicates an implicit DRC profile, then the implicit DRC profile can be selected 608 as the current DRC profile.
The decoding of the current frame may include equalizing a level of the frame sequence to a first output reference level of the first rendering mode. Further, the decoding of the current frame may include adapting a loudness level of the current frame using a dynamic range compression curve specified within the current DRC profile. The adaptation of the loudness level may be performed as outlined in the context of fig. 1.
Depending on the position of the current frame within the sequence of frames, the current DRC profile may correspond to a default DRC profile (which is generally independent of the input audio signal 102), to an implicit DRC profile (which may be adapted in a limited manner to the input audio signal 102), or to a first explicit DRC profile (which may have been designed for the input audio signal 102 and/or the first rendering mode).
Typically, only a subset of frames includes a DRC profile. Once the current DRC profile has been selected, the current DRC profile may be maintained for decoding frames of the frame sequence that do not include any DRC profiles. Furthermore, even when a frame having a DRC profile is received, the current DRC profile may be maintained as long as no DRC profile is received that is more recent than the current DRC profile and/or that has a higher relevance for the encoded audio signal 102 (where the selected first explicit DRC profile has a higher relevance than the selected implicit DRC profile, which has a higher relevance than the default DRC profile). By doing so, continuity and optimality of the DRC profile used can be ensured.
Complementary to the method 600 for decoding the encoded audio signal 102, a method for generating (i.e. encoding) the encoded audio signal 102 is described. The encoded audio signal 102 comprises a sequence of frames. Furthermore, the encoded audio signal 102 indicates a plurality of different Dynamic Range Control (DRC) profiles for a corresponding plurality of different rendering modes. The method may include inserting different subsets of DRC profiles of the plurality of DRC profiles into different frames of a sequence of frames such that two or more frames of the sequence of frames collectively include the plurality of DRC profiles. In other words, subsets of DRC profiles that each contain fewer DRC profiles than the total number of DRC profiles may be provided along with different frames of the sequence of frames. By doing so, the overhead of the encoded audio signal 102 may be reduced while providing the full set of DRC profiles to the corresponding decoder 100. In other words, this method has the advantage that the encoder 150 has more freedom in how it transmits the DRC data. This degree of freedom can be used to reduce the bit rate.
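One possible reading of this retention rule is an ordering by profile kind, with recency as a tie-breaker. The sketch below models a profile as a `(kind, version)` tuple; the `RANK` table and the `version` field are assumptions made for illustration and are not signaled fields of the encoded audio signal 102.

```python
# Assumed ordering: explicit profiles outrank implicit ones, which outrank the default.
RANK = {"default": 0, "implicit": 1, "explicit": 2}

def keep_or_replace(current, candidate):
    """Return the profile to use next; profiles are (kind, version) tuples.

    The current profile is kept unless the candidate is of a higher-ranked
    kind, or of the same kind and more recent (larger version).
    """
    if candidate is None:   # the frame carried no DRC profile
        return current
    current_key = (RANK[current[0]], current[1])
    candidate_key = (RANK[candidate[0]], candidate[1])
    return candidate if candidate_key > current_key else current
```

Under this assumed ordering, an explicit profile is never displaced by an implicit or default profile, and a repeated transmission of the same explicit profile leaves the current profile unchanged.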
The frame sequence may include a subsequence of I frames (e.g., every xth frame of the frame sequence may be an I frame). Different DRC profile subsets may be inserted into different (e.g., consecutive) I-frames of an I-frame subsequence. To further reduce bandwidth, I-frames may be skipped, i.e., some of the I-frames may not include any DRC profile data.
The (e.g., each) subset of DRC profiles may comprise only one DRC profile. In particular, the plurality of DRC profiles may comprise N DRC profiles, where N is an integer, N > 1. The N DRC profiles may be inserted into N different frames in the sequence of frames. By doing so, the bit rate required to transmit the DRC profile can be minimized.
The method may further include inserting all of the plurality of DRC profiles into a first frame of a sequence of frames (e.g., a first frame of a sequence of frames of an audio signal). As a result, the rendering of the encoded audio signal 102 can be started directly with the correct explicit DRC profile. As indicated above, the audio program may be subdivided into a plurality of sub-audio programs, for example, a main audio program interrupted by a commercial break. It may be beneficial to insert all of the multiple DRC profiles into the first frame of each sub-audio program. In other words, it may be beneficial to insert all of the plurality of DRC profiles directly after one or more splice points of an audio program that includes a plurality of sub-audio programs.
Different subsets of DRC profiles of the plurality of DRC profiles may be inserted into different frames of the sequence of frames such that each subsequence of M directly consecutive frames of the sequence of frames jointly includes the plurality of DRC profiles, where M is an integer and M > 1. In other words, the multiple DRC profiles may be repeatedly transmitted within blocks of M frames. As a result, the decoder 100 has to wait at most M frames before obtaining the optimal explicit DRC profile for the encoded audio signal 102.
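Whether a given insertion schedule satisfies this condition can be checked with a sliding window. The sketch below assumes that each frame is represented by the set of profile names it carries (an empty set for frames without DRC profiles); the function name is hypothetical.

```python
def covers_every_window(frame_profiles, all_profiles, m):
    """Check that every run of m consecutive frames jointly carries all profiles.

    frame_profiles: list with one set of profile names per frame.
    """
    needed = set(all_profiles)
    for start in range(len(frame_profiles) - m + 1):
        seen = set().union(*frame_profiles[start:start + m])
        if not needed <= seen:
            return False
    return True
```

For a schedule with four profiles, one profile per I-frame and an I-frame every 5 frames, the condition holds for M = 20 but not for M = 15, since a window of 15 frames contains only three I-frames.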
The method may further include inserting a flag into a frame of the sequence of frames, wherein the flag indicates whether the frame includes a DRC profile. Providing such flags enables the corresponding decoder 100 to efficiently identify frames that include DRC profile data.
A DRC profile of the multiple DRC profiles may be an explicit DRC profile comprising (i.e. carrying) definition data for defining a dynamic range compression curve. As outlined in this document, the dynamic range compression curve provides a mapping between the input loudness and the output loudness and/or the gain to be applied to the audio signal. In particular, the definition data may include one or more of the following: a boost gain for boosting the input loudness; a boost gain range indicating a range of input loudness to which the boost gain is applicable; a zero band range, which indicates the range of input loudness for which a gain of 0dB applies; a cut gain for attenuating the input loudness; a cut gain range indicating a range of input loudness to which the cut gain is applicable; a boost gain ratio indicating a transition between zero gain and the boost gain; and/or a cut gain ratio indicating a transition between zero gain and the cut gain.
The method may further include inserting an indication (e.g., identifier, ID) of an implicit DRC profile, wherein definition data of the implicit DRC profile is generally known to a decoder 100 of the encoded audio signal 102. The indication of the implicit DRC profile may provide a bandwidth efficient means for signaling a DRC profile that is adapted (in a limited manner) to the encoded audio signal 102.
As outlined above, the frames of a frame sequence typically comprise audio data and metadata. The DRC profile subset is typically inserted as metadata.
The DRC profile may comprise definition data for defining a range of output reference levels to which the DRC profile is applicable. The output reference level typically indicates the dynamic range of the rendering mode. In particular, the dynamic range of the rendering mode may be reduced as the output reference level increases, and vice versa. Further, the maximum boost gain and the maximum clipping gain of the dynamic range compression curve of the DRC profile may increase as the output reference level increases, and vice versa. Thus, outputting the reference level provides an efficient means for selecting the appropriate DRC profile (with the appropriate dynamic range compression curve) for a particular rendering mode.
The method may further comprise generating a bitstream comprising the encoded audio signal 102. The bitstream may be an AC4 bitstream, i.e., the bitstream may be compatible with the AC4 bitstream format.
The method may further include inserting explicit DRC gains for the encoded audio signal 102 into frames of the sequence of frames. In particular, the DRC gain applicable to a particular frame of the sequence of frames may be inserted into the particular frame. Thus, each frame of the sequence of frames may include a DRC data component that includes one or more explicit DRC gains to be applied to the respective frame. In particular, each frame may include different explicit DRC gains for different rendering modes. To this end, DRC algorithms for different rendering modes may be applied within the encoder 150, and different DRC gains for different rendering modes may be determined at the encoder 150. The different DRC gains may then be explicitly inserted into the frame sequence. As a result, the corresponding decoder 100 may apply the explicit DRC gains directly, without performing the DRC algorithm using the dynamic range compression curve.
Thus, the frame sequence may include or may indicate a plurality of explicit DRC profiles for signaling dynamic range compression curves for a plurality of corresponding rendering modes. The multiple DRC profiles may be inserted into some (but not all) of the frames (e.g., I-frames) of a sequence of frames. Further, the frame sequence may include or may indicate one or more DRC profiles for the corresponding one or more rendering modes, wherein the one or more DRC profiles indicate that explicit DRC gains for the one or more rendering modes are inserted into frames of the frame sequence. For example, the one or more DRC profiles for signaling the explicit DRC gain may include a flag indicating whether the explicit DRC gain is included in a frame of the sequence of frames. The DRC gain may be inserted into each frame of the sequence of frames. In particular, each frame may include one or more DRC gains to be used for decoding the frame.
The method may include inserting a DRC profile for explicit DRC gains into a subset of frames in the sequence of frames. For example, the DRC profile for transmitted DRC gains may indicate DRC configuration data for the explicit gains. Specifically, the DRC profile for transmitted DRC gains may be included in every subset of DRC profiles. The DRC configuration data (e.g., a flag) may indicate that the sequence of frames includes explicit DRC gains for a particular rendering mode. By doing so, the decoder 100 is informed that, for the particular rendering mode, the explicit DRC gains are to be taken directly from the frames of the frame sequence.
Accordingly, the method may further comprise determining an explicit DRC gain of the encoded audio signal 102 for the particular rendering mode. Additionally, the method may include inserting the explicit DRC gain into a frame of the sequence of frames. The explicit DRC gain may be inserted into the frame of the frame sequence to which the explicit DRC gain applies. Further, a frame in the sequence of frames may include the one or more explicit DRC gains required to decode the frame within a particular rendering mode.
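On the encoder side, the placement of explicit gains and of the accompanying configuration data can be sketched as below. This is an illustration under stated assumptions: the `Frame` class, the `build_frames` function, the fixed I-frame interval, and the representation of the configuration data as one boolean flag per rendering mode are all hypothetical and do not reflect an actual bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class Frame:
    """Hypothetical frame carrying DRC metadata only (audio data omitted)."""
    index: int
    is_i_frame: bool
    # Explicit DRC gain (dB) per rendering mode, present in every frame.
    explicit_gains_db: Dict[str, float] = field(default_factory=dict)
    # Per-mode flag "explicit gains are transmitted", present in I-frames only.
    drc_config: Optional[Dict[str, bool]] = None


def build_frames(per_frame_gains: Sequence[Dict[str, float]],
                 i_frame_interval: int) -> List[Frame]:
    """Attach each frame's own gains; add configuration flags to I-frames."""
    frames = []
    for index, gains in enumerate(per_frame_gains):
        is_i = index % i_frame_interval == 0  # assumption: periodic I-frames
        frame = Frame(index=index, is_i_frame=is_i,
                      explicit_gains_db=dict(gains))
        if is_i:
            frame.drc_config = {mode: True for mode in gains}
        frames.append(frame)
    return frames


gains = [{"portable": 3.0}, {"portable": 2.5},
         {"portable": -1.0}, {"portable": 0.0}]
frames = build_frames(gains, i_frame_interval=2)
print(frames[0].drc_config)  # {'portable': True}
print(frames[1].drc_config)  # None
```

The gains travel with the frame they apply to, while the flag that announces them is repeated only at the points where a decoder may start decoding.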
The method may further include inserting a DRC profile indicating DRC configuration data for a particular rendering mode into a subset of frames (e.g., I-frames) in the sequence of frames. The DRC configuration data (including, for example, a flag) may indicate that, for the particular rendering mode, explicit DRC gains are included in the frames of the sequence of frames. Thus, the decoder 100 can efficiently determine whether to use a compression curve from the DRC profiles that signal dynamic range compression curves, or whether to use the explicit DRC gains.
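The decoder-side decision described here can be sketched as a single function. The function name `frame_gain_db`, its parameters, and the fallback of 0 dB when no DRC information exists for a mode are assumptions for illustration only.

```python
from typing import Callable, Mapping, Optional


def frame_gain_db(mode: str,
                  has_explicit_gains: Mapping[str, bool],
                  frame_gains_db: Mapping[str, float],
                  curve: Optional[Callable[[float], float]],
                  input_loudness_db: float) -> float:
    """Return the DRC gain (dB) for one frame in the given rendering mode.

    has_explicit_gains: per-mode flags taken from the DRC configuration data.
    frame_gains_db:     explicit gains carried by the current frame.
    curve:              compression curve of the selected profile, if any.
    """
    if has_explicit_gains.get(mode, False):
        # The encoder already ran the DRC algorithm: use its gain directly.
        return frame_gains_db[mode]
    if curve is None:
        # Assumption: without any DRC information the signal is unchanged.
        return 0.0
    # Otherwise evaluate the compression curve at the decoder.
    return curve(input_loudness_db)


flags = {"portable": True}
gains = {"portable": -3.0}
print(frame_gain_db("portable", flags, gains, None, -20.0))          # -3.0
print(frame_gain_db("home", flags, gains, lambda l: 1.5, -20.0))     # 1.5
```

Because the flag is read once from the configuration data, the per-frame work for a mode with explicit gains reduces to a lookup.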
The DRC profiles for signaling dynamic range compression curves and the one or more DRC profiles pointing to explicit DRC gains may be included within a dedicated syntax element (referred to as, e.g., a DRC profile syntax element) of an I-frame of the sequence of frames.
The methods and systems described in this document may be implemented as software, firmware, and/or hardware. Some components may be implemented as software running on a digital signal processor or microprocessor, for example. Other components may be implemented as hardware and/or application specific integrated circuits, for example. The signals encountered in the described methods and systems may be stored on a medium such as a random access memory or an optical storage medium. They may be transmitted via a network, such as a radio network, a satellite network, a wireless network, or a wired network (e.g., the internet). Typical devices that use the methods and systems described in this document are portable electronic devices or other consumer devices for storing and/or rendering audio signals.

Claims (5)

1. A method for decoding an encoded audio signal, wherein the encoded audio signal comprises a sequence of frames containing encoded audio data and metadata comprising a plurality of different sets of dynamic range control, DRC, gains, wherein the encoded audio signal further comprises an indication of the loudness of the audio signal and DRC configuration metadata in one or more frames of the sequence of frames, wherein the DRC configuration metadata indicates a plurality of DRC profiles associated with the encoded audio signal and for each DRC profile indicates a range of output reference levels to which the DRC profile applies, wherein each set of DRC gains corresponds to one of the plurality of DRC profiles, the method comprising:
setting a desired output reference level for the decoded audio signal;
identifying one or more DRC profiles for which the applicable range of output reference levels comprises the desired output reference level for the decoded audio signal;
selecting one of the identified DRC profiles;
decoding the encoded audio signal;
adjusting a dynamic range of the decoded audio signal by applying a DRC gain corresponding to the selected DRC profile to the decoded audio signal;
determining a loudness-related gain in response to the indication of the loudness of the audio signal and the desired output reference level of the decoded audio signal; and
applying the loudness-related gain to the adjusted decoded audio signal to obtain a loudness-adjusted decoded audio signal having the desired output reference level.
2. A decoder for decoding an encoded audio signal, wherein the encoded audio signal comprises a sequence of frames containing encoded audio data and metadata comprising a plurality of different sets of dynamic range control, DRC, gains, wherein the encoded audio signal further comprises an indication of the loudness of the audio signal and DRC configuration metadata in one or more frames of the sequence of frames, wherein the DRC configuration metadata indicates a plurality of DRC profiles associated with the encoded audio signal and for each DRC profile indicates a range of output reference levels to which the DRC profile applies, wherein each set of DRC gains corresponds to one of the plurality of DRC profiles, wherein the decoder comprises one or more processors for:
setting a desired output reference level for the decoded audio signal;
identifying one or more DRC profiles for which the applicable range of output reference levels comprises the desired output reference level for the decoded audio signal;
selecting one of the identified DRC profiles;
decoding the encoded audio signal;
adjusting a dynamic range of the decoded audio signal by applying a DRC gain corresponding to the selected DRC profile to the decoded audio signal;
determining a loudness-related gain in response to the indication of the loudness of the audio signal and the desired output reference level of the decoded audio signal; and
applying the loudness-related gain to the adjusted decoded audio signal to obtain a loudness-adjusted decoded audio signal having the desired output reference level.
3. A non-transitory computer-readable storage medium containing a sequence of instructions, wherein the sequence of instructions, when executed by an audio signal processing device, causes the audio signal processing device to perform the method of claim 1.
4. A method for generating an encoded audio signal (102), wherein the encoded audio signal (102) comprises a sequence of frames; wherein the encoded audio signal (102) is indicative of a plurality of different dynamic range control, DRC, profiles for a corresponding plurality of different rendering modes; the method comprises the following steps:
inserting different subsets of DRC profiles of the plurality of different DRC profiles into different frames of the sequence of frames such that two or more frames of the sequence of frames collectively comprise the plurality of different DRC profiles;
wherein the subset of DRC profiles comprises only a single DRC profile and the plurality of different DRC profiles comprises N DRC profiles, where N is an integer and N > 1.
5. An encoder (150) for generating an encoded audio signal (102), wherein the encoded audio signal (102) comprises a sequence of frames; wherein the encoded audio signal (102) is indicative of a plurality of different dynamic range control, DRC, profiles for a corresponding plurality of different rendering modes; wherein the encoder (150) is configured to:
insert different subsets of DRC profiles of the plurality of different DRC profiles into different frames of the sequence of frames such that two or more frames of the sequence of frames collectively comprise the plurality of different DRC profiles;
wherein the subset of DRC profiles comprises only a single DRC profile and the plurality of different DRC profiles comprises N DRC profiles, where N is an integer and N > 1.
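
The insertion scheme of claims 4 and 5, in which each subset holds a single DRC profile and several frames together carry all N profiles, can be illustrated with a round-robin assignment. The round-robin order, the function name `distribute_profiles`, and the idea of a list of "carrier" frame indices (e.g., the I-frames) are assumptions for illustration; the claims only require that different frames carry different subsets which collectively comprise all profiles.

```python
from typing import Dict, List, Sequence


def distribute_profiles(profile_ids: Sequence[int],
                        carrier_frames: Sequence[int]) -> Dict[int, List[int]]:
    """Assign exactly one profile to each carrier frame, cycling through all N.

    profile_ids:    identifiers of the N different DRC profiles (N > 1).
    carrier_frames: indices of the frames that may carry a profile.
    """
    n = len(profile_ids)
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    # Each subset is a one-element list; consecutive carriers get
    # consecutive profiles, wrapping around after the N-th profile.
    return {frame: [profile_ids[k % n]]
            for k, frame in enumerate(carrier_frames)}


plan = distribute_profiles([10, 11, 12], [0, 8, 16, 24])
print(plan)  # {0: [10], 8: [11], 16: [12], 24: [10]}
```

After N carrier frames a decoder has received the complete set of profiles, while each individual frame bears the overhead of only one profile.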

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201462058228P 2014-10-01 2014-10-01
US62/058,228 2014-10-01
CN201580053702.9A CN106796799B (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201580053702.9A Division CN106796799B (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission

Publications (1)

Publication Number Publication Date
CN113257273A true CN113257273A (en) 2021-08-13

Family

ID=54288763

Family Applications (4)

Application Number Title Priority Date Filing Date
CN202110527052.4A Pending CN113257275A (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission
CN202110526962.0A Pending CN113257273A (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission
CN201580053702.9A Active CN106796799B (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission
CN202110526963.5A Pending CN113257274A (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110527052.4A Pending CN113257275A (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN201580053702.9A Active CN106796799B (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission
CN202110526963.5A Pending CN113257274A (en) 2014-10-01 2015-09-29 Efficient DRC profile transmission

Country Status (6)

Country Link
US (6) US10020001B2 (en)
EP (4) EP4044180A1 (en)
JP (5) JP6727194B2 (en)
CN (4) CN113257275A (en)
ES (2) ES2912586T3 (en)
WO (1) WO2016050740A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257274A (en) * 2014-10-01 2021-08-13 杜比国际公司 Efficient DRC profile transmission

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102360613B1 (en) * 2014-11-07 2022-02-09 소니그룹주식회사 Transmission device, transmission method, reception device, and reception method
US9837086B2 (en) * 2015-07-31 2017-12-05 Apple Inc. Encoded audio extended metadata-based dynamic range control
US10999678B2 (en) * 2017-03-24 2021-05-04 Sharp Kabushiki Kaisha Audio signal processing device and audio signal processing system
EP3618463A4 (en) * 2017-04-25 2020-04-29 Sony Corporation Signal processing device, method, and program
EP3506661A1 (en) 2017-12-29 2019-07-03 Nokia Technologies Oy An apparatus, method and computer program for providing notifications
EP3753105B1 (en) * 2018-02-15 2023-01-11 Dolby Laboratories Licensing Corporation Loudness control methods and devices
EP3827429A4 (en) * 2018-07-25 2022-04-20 Dolby Laboratories Licensing Corporation Compressor target curve to avoid boosting noise
KR102253524B1 (en) * 2019-09-02 2021-05-20 네이버 주식회사 Method and system for applying loudness normalization
CN111933173B (en) * 2020-08-03 2022-03-01 南京工程学院 Dynamic range control method and system for gain smooth adjustment
US11907611B2 (en) * 2020-11-10 2024-02-20 Apple Inc. Deferred loudness adjustment for dynamic range control
JP2023551222A (en) * 2020-11-24 2023-12-07 ガウディオ・ラボ・インコーポレイテッド Method and apparatus for normalizing audio signals
EP4309373A1 (en) * 2021-03-10 2024-01-24 Dolby International AB Apparatus and method for leveling main and supplementary audio from a hbbtv service

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
US20020019733A1 (en) * 2000-05-30 2002-02-14 Adoram Erell System and method for enhancing the intelligibility of received speech in a noise environment
JP2007109328A (en) * 2005-10-14 2007-04-26 Kenwood Corp Reproducing device
US20100083344A1 (en) * 2008-09-30 2010-04-01 Dolby Laboratories Licensing Corporation Transcoding of audio metadata
US20120310654A1 (en) * 2010-02-11 2012-12-06 Dolby Laboratories Licensing Corporation System and Method for Non-destructively Normalizing Loudness of Audio Signals Within Portable Devices
CN102986136A (en) * 2010-04-22 2013-03-20 弗兰霍菲尔运输应用研究公司 Apparatus and method for modifying an input audio signal
CN203134365U (en) * 2013-01-21 2013-08-14 杜比实验室特许公司 Audio frequency decoder for audio processing by using loudness processing state metadata
CN105103222A (en) * 2013-03-29 2015-11-25 苹果公司 Metadata for loudness and dynamic range control

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659539A (en) 1995-07-14 1997-08-19 Oracle Corporation Method and apparatus for frame accurate access of digital audio-visual information
US6104441A (en) 1998-04-29 2000-08-15 Hewlett Packard Company System for editing compressed image sequences
KR100583497B1 (en) * 1999-04-02 2006-05-24 마츠시타 덴끼 산교 가부시키가이샤 Optical disc, recording device and reproducing device
KR20050084400A (en) 2002-12-18 2005-08-26 코닌클리케 필립스 일렉트로닉스 엔.브이. Adaptive encoding of digital multimedia information
US20040261111A1 (en) 2003-06-20 2004-12-23 Aboulgasem Abulgasem Hassan Interactive mulitmedia communications at low bit rates
US7398207B2 (en) * 2003-08-25 2008-07-08 Time Warner Interactive Video Group, Inc. Methods and systems for determining audio loudness levels in programming
TWI247546B (en) 2004-04-22 2006-01-11 Newsoft Technology Corp A video encoding method which carries out the encoding of P frame or B frame by utilizing I frame
TW200638335A (en) * 2005-04-13 2006-11-01 Dolby Lab Licensing Corp Audio metadata verification
US8199834B2 (en) 2006-01-04 2012-06-12 University Of Dayton Frame decimation through frame simplification
RU2417514C2 (en) 2006-04-27 2011-04-27 Долби Лэборетериз Лайсенсинг Корпорейшн Sound amplification control based on particular volume of acoustic event detection
US8521314B2 (en) * 2006-11-01 2013-08-27 Dolby Laboratories Licensing Corporation Hierarchical control path with constraints for audio dynamics processing
JP5530720B2 (en) 2007-02-26 2014-06-25 ドルビー ラボラトリーズ ライセンシング コーポレイション Speech enhancement method, apparatus, and computer-readable recording medium for entertainment audio
CN101295504B (en) * 2007-04-28 2013-03-27 诺基亚公司 Entertainment audio only for text application
DE112008000552B4 (en) 2007-05-14 2020-04-23 Samsung Electronics Co., Ltd. Method and device for receiving radio
US8468426B2 (en) 2008-07-02 2013-06-18 Apple Inc. Multimedia-aware quality-of-service and error correction provisioning
WO2010025686A1 (en) 2008-09-05 2010-03-11 The Chinese University Of Hong Kong Methods and devices for live streaming using pre-indexed file formats
US8606009B2 (en) * 2010-02-04 2013-12-10 Microsoft Corporation High dynamic range image generation and rendering
EP2610865B1 (en) 2010-08-23 2014-07-23 Panasonic Corporation Audio signal processing device and audio signal processing method
WO2014124377A2 (en) * 2013-02-11 2014-08-14 Dolby Laboratories Licensing Corporation Audio bitstreams with supplementary data and encoding and decoding of such bitstreams
US9055367B2 (en) * 2011-04-08 2015-06-09 Qualcomm Incorporated Integrated psychoacoustic bass enhancement (PBE) for improved audio
WO2012146757A1 (en) * 2011-04-28 2012-11-01 Dolby International Ab Efficient content classification and loudness estimation
KR101858695B1 (en) 2012-04-09 2018-05-16 엘지전자 주식회사 Method for managing data
JP5885571B2 (en) * 2012-04-16 2016-03-15 アルパイン株式会社 Digital broadcast receiver
KR102473260B1 (en) * 2013-01-21 2022-12-05 돌비 레버러토리즈 라이쎈싱 코오포레이션 Optimizing loudness and dynamic range across different playback devices
CA2898567C (en) 2013-01-28 2018-09-18 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Method and apparatus for normalized audio playback of media with and without embedded loudness metadata on new media devices
US9607624B2 (en) * 2013-03-29 2017-03-28 Apple Inc. Metadata driven dynamic range control
TWM487509U (en) * 2013-06-19 2014-10-01 杜比實驗室特許公司 Audio processing apparatus and electrical device
CN105531759B (en) * 2013-09-12 2019-11-26 杜比实验室特许公司 Loudness for lower mixed audio content adjusts
US10095468B2 (en) * 2013-09-12 2018-10-09 Dolby Laboratories Licensing Corporation Dynamic range control for a wide variety of playback environments
WO2016040906A1 (en) * 2014-09-11 2016-03-17 Grundy Kevin Patrick System and method for controlling dynamic range compression image processing
CN113257275A (en) * 2014-10-01 2021-08-13 杜比国际公司 Efficient DRC profile transmission

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
CN101872618A (en) * 1995-12-01 2010-10-27 Dts(Bvi)有限公司 Multi-channel audio decoder
US20020019733A1 (en) * 2000-05-30 2002-02-14 Adoram Erell System and method for enhancing the intelligibility of received speech in a noise environment
JP2007109328A (en) * 2005-10-14 2007-04-26 Kenwood Corp Reproducing device
US20100083344A1 (en) * 2008-09-30 2010-04-01 Dolby Laboratories Licensing Corporation Transcoding of audio metadata
CN102682780A (en) * 2008-09-30 2012-09-19 杜比国际公司 Transcoding of audio metadata
US20120310654A1 (en) * 2010-02-11 2012-12-06 Dolby Laboratories Licensing Corporation System and Method for Non-destructively Normalizing Loudness of Audio Signals Within Portable Devices
CN103795364A (en) * 2010-02-11 2014-05-14 杜比实验室特许公司 Method and device for decoding encoded input signal
CN102986136A (en) * 2010-04-22 2013-03-20 弗兰霍菲尔运输应用研究公司 Apparatus and method for modifying an input audio signal
CN203134365U (en) * 2013-01-21 2013-08-14 杜比实验室特许公司 Audio frequency decoder for audio processing by using loudness processing state metadata
CN105103222A (en) * 2013-03-29 2015-11-25 苹果公司 Metadata for loudness and dynamic range control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
车振华, 姜晔, 邹采荣, 吴镇扬: "MPEG-2 AAC中的动态范围控制(DRC)技术", 电声技术, no. 03, 17 March 2001 (2001-03-17) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113257274A (en) * 2014-10-01 2021-08-13 杜比国际公司 Efficient DRC profile transmission
CN113257275A (en) * 2014-10-01 2021-08-13 杜比国际公司 Efficient DRC profile transmission

Also Published As

Publication number Publication date
US20190279652A1 (en) 2019-09-12
ES2814900T3 (en) 2021-03-29
US10783897B2 (en) 2020-09-22
CN106796799B (en) 2021-06-04
US20170249950A1 (en) 2017-08-31
JP2021193817A (en) 2021-12-23
US10354670B2 (en) 2019-07-16
EP3736809B1 (en) 2022-03-09
JP2023099123A (en) 2023-07-11
US10020001B2 (en) 2018-07-10
CN113257274A (en) 2021-08-13
JP6727194B2 (en) 2020-07-22
JP7273914B2 (en) 2023-05-15
US20210065728A1 (en) 2021-03-04
US20220254362A1 (en) 2022-08-11
EP3736809A1 (en) 2020-11-11
JP2017534903A (en) 2017-11-24
EP4044180A1 (en) 2022-08-17
US20240029748A1 (en) 2024-01-25
EP3201915A1 (en) 2017-08-09
JP6945092B2 (en) 2021-10-06
JP6834049B2 (en) 2021-02-24
WO2016050740A1 (en) 2016-04-07
EP3467827A1 (en) 2019-04-10
US20190139561A1 (en) 2019-05-09
CN113257275A (en) 2021-08-13
US11727948B2 (en) 2023-08-15
US11250868B2 (en) 2022-02-15
EP3201915B1 (en) 2018-12-12
JP2020171041A (en) 2020-10-15
CN106796799A (en) 2017-05-31
EP3467827B1 (en) 2020-07-29
ES2912586T3 (en) 2022-05-26
JP2021073814A (en) 2021-05-13

Similar Documents

Publication Publication Date Title
US11727948B2 (en) Efficient DRC profile transmission
JP7049503B2 (en) Dynamic range control for a variety of playback environments
JP7038788B2 (en) Loudness adjustments for downmixed audio content
CN105103222B (en) Metadata for loudness and dynamic range control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40057532; Country of ref document: HK)
Country of ref document: HK