WO2023104360A1 - Method and apparatus for processing of audio data - Google Patents
- Publication number
- WO2023104360A1 (PCT/EP2022/073628)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- drc
- dynamic
- loudness
- audio data
- metadata
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/162—Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G7/00—Volume compression or expansion in amplifiers
- H03G7/002—Volume compression or expansion in amplifiers in untuned or low-frequency amplifiers, e.g. audio amplifiers
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03G—CONTROL OF AMPLIFICATION
- H03G7/00—Volume compression or expansion in amplifiers
- H03G7/007—Volume compression or expansion in amplifiers of digital or coded signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/13—Aspects of volume control, not necessarily automatic, in stereophonic sound systems
Definitions
- the present disclosure relates generally to a method of metadata-based dynamic processing of audio data for playback and, in particular, to determining and applying one or more processing parameters to the audio data for dynamic loudness adjustment and/or dynamic range compression.
- the present disclosure further relates to a method of encoding audio data and metadata for dynamic loudness adjustment and/or dynamic range compression into a bitstream.
- the present disclosure yet further relates to a respective decoder and encoder as well as to a respective system and computer program products.
- the present disclosure further relates to a method of processing audio data for playback, a decoder for processing audio data for playback, and respective computer program products.
- loudness is the individual experience of sound pressure.
- the loudness of dialogue in a program has been found to be the most crucial parameter determining the perception of program loudness by a listener.
- the average loudness of a program is typically required for loudness compliance (for example, the CALM Act in the US), and is also used for aligning dynamic range control (DRC) parameters.
- the dynamic range of a program is the difference between its quietest and loudest sounds.
- the dynamic range of a program depends on its content, for example, an action movie may have a different and wider dynamic range than a documentary, and reflects a creator's intent.
- capabilities of devices to play back audio content in the original dynamic range vary strongly.
- dynamic range control is thus a further key factor in providing optimal listening experience.
- the entire audio program or an audio program segment has to be analyzed and the resulting loudness and DRC parameters can be delivered along with audio data or encoded audio data to be applied in a decoder or playback device.
- loudness processing or levelling is used to ensure loudness compliance and, if applicable, potential dynamic range constraints depending on playback requirements. This approach delivers processed audio that is “optimized” for a single playback environment.
- a method of metadata-based dynamic processing of audio data for playback may include receiving, by a decoder, a bitstream including audio data and metadata for dynamic loudness adjustment.
- the method may further include decoding, by the decoder, the audio data and the metadata to obtain decoded audio data and the metadata.
- the method may further include determining, by the decoder, from the metadata, one or more processing parameters for dynamic loudness adjustment based on a playback condition.
- the method may further include applying the determined one or more processing parameters to the decoded audio data to obtain processed audio data.
- the method may include outputting the processed audio data for playback.
- said determining the one or more processing parameters may further include determining one or more processing parameters for dynamic range compression, DRC, based on the playback condition.
- the playback condition information may be indicative of a specific loudspeaker setup.
- the playback condition may include one or more of a device type of the decoder, characteristics of a playback device, characteristics of a loudspeaker, a loudspeaker setup, characteristics of background noise, characteristics of ambient noise and characteristics of the acoustic environment.
- the selected set of metadata may include a set of DRC sequences, DRCSet.
- each of the sets of metadata may include a respective set of DRC sequences, DRCSet.
- said determining the one or more processing parameters may further include selecting, by the decoder, at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and a downmix, corresponding to the playback condition.
- said determining the one or more processing parameters may further include identifying a metadata identifier indicative of the at least one selected DRCSet, EQSet and downmix to determine the one or more processing parameters from the metadata.
- selecting the set of metadata may include identifying a set of metadata corresponding to a specific downmix.
- the specific downmix may be determined based on the loudspeaker setup.
- the metadata may include one or more processing parameters relating to average loudness values and optionally one or more processing parameters relating to dynamic range compression characteristics.
- each set of metadata may include such one or more processing parameters relating to average loudness values and optionally one or more processing parameters relating to dynamic range compression characteristics.
- the bitstream may further include additional metadata for static loudness adjustment to be applied to the decoded audio data.
- the bitstream may be an MPEG-D DRC bitstream and the presence of metadata may be signaled based on MPEG-D DRC bitstream syntax.
- a loudnessInfoSetExtension()-element may be used to carry the metadata as a payload.
- the metadata may comprise one or more metadata payloads, wherein each metadata payload may include a plurality of sets of parameters and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set.
- said determining the one or more processing parameters may involve selecting a set among the plurality of sets in the payload based on the at least one DRCSet, EQSet, and downmix selected by the decoder, wherein the one or more processing parameters determined by the decoder may be the one or more processing parameters relating to the identifiers in the selected set.
- a decoder for metadata-based dynamic processing of audio data for playback.
- the decoder may comprise one or more processors and non-transitory memory configured to perform a method including receiving, by the decoder, a bitstream including audio data and metadata for dynamic loudness adjustment; decoding, by the decoder, the audio data and the metadata to obtain decoded audio data and the metadata; determining, by the decoder, from the metadata, one or more processing parameters for dynamic loudness adjustment based on a playback condition; applying the determined one or more processing parameters to the decoded audio data to obtain processed audio data; and outputting the processed audio data for playback.
- a method of encoding audio data and metadata for dynamic loudness adjustment, into a bitstream may include inputting original audio data into a loudness leveler for loudness processing to obtain, as an output from the loudness leveler, loudness processed audio data.
- the method may further include generating the metadata for dynamic loudness adjustment based on the loudness processed audio data and the original audio data.
- the method may include encoding the original audio data and the metadata into the bitstream.
- the metadata may include a plurality of sets of metadata. Each set of metadata may correspond to a respective (e.g., different) playback condition.
- the method may further include generating additional metadata for static loudness adjustment to be used by a decoder.
- said generating metadata may include comparison of the loudness processed audio data to the original audio data, wherein the metadata may be generated based on a result of said comparison.
- said generating metadata may further include measuring the loudness over one or more pre-defined time periods, wherein the metadata may be generated further based on the measured loudness.
- the measuring may comprise measuring overall loudness of the audio data.
- the measuring may comprise measuring loudness of dialogue in the audio data.
- the bitstream may be an MPEG-D DRC bitstream and the presence of the metadata may be signaled based on MPEG-D DRC bitstream syntax.
- a loudnessInfoSetExtension()-element may be used to carry the metadata as a payload.
- the metadata may comprise one or more metadata payloads, wherein each metadata payload may include a plurality of sets of parameters and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set, and wherein the one or more processing parameters may be parameters for dynamic loudness adjustment by a decoder.
- the at least one of the drcSetId, the eqSetId, and the downmixId may be related to at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and downmix, to be selected by the decoder.
- the encoder may comprise one or more processors and non-transitory memory configured to perform a method including inputting original audio data into a loudness leveler for loudness processing to obtain, as an output from the loudness leveler, loudness processed audio data; generating the metadata for dynamic loudness adjustment based on the loudness processed audio data and the original audio data; and encoding the original audio data and the metadata into the bitstream.
- a system of an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment and a decoder for metadata-based dynamic processing of audio data for playback.
- a computer program product comprising a computer-readable storage medium with instructions adapted to cause a device having processing capability to carry out, when executed by the device, a method of metadata-based dynamic processing of audio data for playback or a method of encoding audio data and metadata for dynamic loudness adjustment into a bitstream.
- a computer-readable storage medium storing the computer program product described herein.
- the method may include receiving, by a decoder, a bitstream including encoded audio data and metadata, wherein the metadata includes one or more dynamic range control (DRC) sets, and for each DRC set, an indication of whether the DRC set is configured for providing a dynamic loudness compensation effect.
- the method may further include parsing the metadata, by the decoder, to identify DRC sets that are configured for providing the dynamic loudness compensation effect.
- the method may further include decoding, by the decoder, the encoded audio data to obtain decoded audio data.
- the method may further include selecting, by the decoder, one of the identified DRC sets configured for providing the dynamic loudness compensation effect.
- the method may further include applying to the decoded audio data, by the decoder, the one or more DRC gains corresponding to the selected DRC set to obtain dynamic loudness compensated audio data. And the method may include outputting the dynamic loudness compensated audio data for playback.
- the metadata may include a plurality of DRC sets configured for providing the dynamic loudness adjustment, wherein each of the plurality of DRC sets may also be associated with one or more playback conditions, and wherein the selecting may be performed in response to an indication of a playback condition provided to the decoder.
- the one or more DRC sets may also be configured to provide dynamic range control.
- the playback condition may include one or more of a device type of the decoder, characteristics of a playback device, characteristics of a loudspeaker, a loudspeaker setup, characteristics of background noise, characteristics of ambient noise and characteristics of the acoustic environment.
- the indication of whether the DRC set is configured for providing the dynamic loudness compensation effect may be provided in a parameter indicating one or more effects provided by the DRC set.
- the parameter indicating one or more effects provided by the DRC set may be a drcSetEffect bitfield of an MPEG-D DRC bitstream, wherein individual bits of the drcSetEffect bitfield correspond to different effects, and one of the bits of the drcSetEffect bitfield corresponds to the dynamic loudness compensation effect.
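The bitfield test described above can be sketched as follows. This is a minimal illustration only: the source says that one bit of drcSetEffect corresponds to the dynamic loudness compensation effect but does not say which one, so `DLC_EFFECT_BIT` is a hypothetical position, and `DrcSet`, `provides_dlc` and `find_dlc_sets` are illustrative names rather than MPEG-D DRC syntax elements.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical bit position. The source states only that one bit of the
# drcSetEffect bitfield signals dynamic loudness compensation, not which one.
DLC_EFFECT_BIT = 1 << 11


@dataclass
class DrcSet:
    drc_set_id: int
    drc_set_effect: int  # bitfield: each bit stands for one effect


def provides_dlc(drc_set: DrcSet) -> bool:
    """Return True if the dynamic loudness compensation bit is set."""
    return bool(drc_set.drc_set_effect & DLC_EFFECT_BIT)


def find_dlc_sets(drc_sets: List[DrcSet]) -> List[DrcSet]:
    """Keep only the DRC sets flagged for dynamic loudness compensation."""
    return [s for s in drc_sets if provides_dlc(s)]
```

A decoder following this sketch would parse every DRC set from the metadata, call `find_dlc_sets` on the result, and then select one of the remaining sets.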
- the indication of whether the DRC set is configured for providing the dynamic loudness compensation effect may be whether the DRC set is specified in a dynamic loudness compensation bitstream payload.
- the dynamic loudness compensation bitstream payload may be included in an extension field of a previously defined bitstream syntax.
- the extension field may be a uniDrcConfigExtension field of an MPEG-D DRC bitstream, and the dynamic loudness compensation bitstream payload may be included only for specific values of a uniDrcConfigExtType parameter.
- a plurality of dynamic loudness compensation payloads specifying a plurality of DRC sets configured for providing the dynamic loudness compensation effect may be included in the extension field of the previously defined bitstream syntax.
- the indication of whether the DRC set is configured for providing the dynamic loudness compensation effect may be a field of a previously existing configuration element of a previously defined bitstream syntax.
- the field may be a dynamicLoudCompDRCSet parameter.
- the previously existing configuration element may be a downmixInstructions element, a drcInstructionsBasic element, or a drcInstructionsUniDrc element of an MPEG-D DRC bitstream.
- the field may be a previously existing field reserved for future use.
- the indication of whether the DRC set is configured for providing the dynamic loudness compensation effect may be a field of an updated version of a previously existing configuration element of a previously defined bitstream syntax.
- the field may be a dynamicLoudCompDRCSet parameter.
- the updated version of the previously existing configuration element may be a downmixInstructionsV2 element or a drcInstructionsUniDrcV2 element.
- an indication that a dynamic loudness compensation effect is desired may be provided to the decoder through an interface, and the DRC set may be selected in response to the indication provided to the decoder through the interface.
- indications of additional desired effects may be provided to the decoder through the interface.
- the metadata may include a plurality of DRC sets configured to provide the dynamic loudness compensation effect, and the selection may depend on the additional desired effects.
- the indication that a dynamic loudness compensation effect is desired may be provided through a drcEffectTypeRequest parameter of a dynamicRangeController Interface payload.
- the metadata may include one or more static loudness values configured for providing static loudness adjustment to the decoded audio data.
- static loudness adjustment may be applied, in response to one or more of the static loudness values, to the decoded audio data or the dynamic loudness compensated audio data.
- a first one of the DRC sets may be configured for providing dynamic range control, and the first DRC set may comprise an indication that the selected DRC set configured for providing the dynamic loudness compensation effect may be configured for application in combination with the first DRC set.
- the selected DRC set may comprise an indication of whether the one or more DRC gains corresponding to the selected DRC set may only be applied in combination with DRC gains corresponding to the first DRC set.
- DRC gains corresponding to the first DRC set may be extracted from the bitstream and may be applied to the decoded audio data.
- the decoder may comprise one or more processors and non-transitory memory configured to perform a method of processing audio data for playback as described above.
- a computer program product comprising a computer-readable storage medium with instructions adapted to cause the device to carry out a method of processing audio data for playback as described above.
- In accordance with an eleventh aspect of the present disclosure, there is provided a computer-readable storage medium storing the computer program product described herein.
- FIG. 1 illustrates an example of a decoder for metadata-based dynamic processing of audio data for playback.
- FIG. 2 illustrates an example of a method of metadata-based dynamic processing of audio data for playback.
- FIG. 3 illustrates an example of an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment.
- FIG. 4 illustrates an example of a method of encoding audio data and metadata for dynamic loudness adjustment, into a bitstream.
- FIG. 5 illustrates an example of a device comprising one or more processors and non-transitory memory configured to perform the methods described herein.
- FIG. 6 illustrates an example of a method of processing audio data for playback.
- the average loudness of a program or dialogue is the main parameter or value used for loudness compliance of broadcast or streaming programs.
- the average loudness is typically set to -24 or -23 LKFS.
- this single loudness value, representing the loudness of the entire program, is carried in the bitstream.
- Using this value in the decoding process allows gain adjustments that result in predictable playback levels, so that programs play back at known consistent levels. Therefore, it is important that this loudness value is set properly and accurately. Since the average loudness is dependent on measuring the entire program prior to encoding, for real-time situations such as dynamic encoding with unknown loudness and dynamic range variation, this is, however, not possible.
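The gain adjustment described above can be illustrated with a short worked sketch. It assumes that the carried loudness value and the playback target are both expressed in LKFS and that the correction is a single broadband gain; the function names are illustrative and do not come from the source.

```python
from typing import List


def normalization_gain_db(program_loudness_lkfs: float,
                          target_loudness_lkfs: float) -> float:
    """Gain in dB that moves the program loudness onto the target loudness."""
    return target_loudness_lkfs - program_loudness_lkfs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_static_gain(samples: List[float],
                      program_loudness_lkfs: float,
                      target_loudness_lkfs: float) -> List[float]:
    """Scale every sample by the single gain derived from the loudness value."""
    gain_db = normalization_gain_db(program_loudness_lkfs, target_loudness_lkfs)
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]
```

For example, a program carried with a loudness value of -24 LKFS and played back at an assumed device target of -16 LKFS would receive a gain of 8 dB. If the carried value is wrong, the playback level is off by the same amount, which is why the value has to be set accurately.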
- a dynamic loudness leveler is often used to modify or contour the audio data prior to encoding so that it meets the required loudness.
- This type of loudness management is often seen as an inferior method to meet compliance, as it often changes the dynamic range intercorrelation in the audio content and may thus change the creative intent. This is especially the case when it is desired to distribute one audio asset for all playback devices, which is one of the benefits of metadata driven codec and delivering systems.
- audio content is mixed with the required target loudness, and the corresponding loudness metadata is set to that value.
- a loudness leveler might still be used in those situations, as it helps steer the audio content to the target loudness, but it will not be that “active” and is only used when the audio content starts to deviate from the required target loudness.
- methods and apparatus described herein aim at making real-time processing situations, also denoted as dynamic processing situations, also metadata driven.
- the metadata allow for dynamic loudness adjustment and dynamic range compression in real-time situations.
- based on the syntax, a decoder can search a given payload for an appropriate set of parameters and identifiers by matching the decoder settings (e.g., DRCSet, EQSet, and downmix) to the identifiers. The parameters included in the set whose identifiers best match the settings can then be selected as the processing parameters for dynamic loudness adjustment to be applied to received original audio data for correction.
- multiple sets of parameters for dynamic processing can be transmitted.
- the metadata-driven dynamic loudness compensation, in addition to correcting the overall loudness, can also be used to “center” the DRC gain calculation and application. This centering may be a result of correcting the loudness of the content, via the dynamic loudness compensation, and of how DRC is typically calculated and applied. In this sense, metadata for dynamic loudness compensation can be said to be used for aligning DRC parameters.
- the decoder 100 may comprise one or more processors and non-transitory memory configured to perform a method including the processes as illustrated in the example of Figure 2 by means of steps S101 to S105.
- the decoder 100 may receive a bitstream including audio data and metadata and may be able to output the unprocessed (original) audio data, the processed audio data after application of dynamic processing parameters determined from the metadata and/or the metadata itself depending on requirements.
- the decoder 100 may receive a bitstream including audio data and metadata for dynamic loudness adjustment and optionally, dynamic range compression (DRC).
- the audio data may be encoded audio data. The audio data may further be unprocessed. That is, the audio data may be said to be original audio data.
- the metadata may include a plurality of sets of metadata. For example, each payload of metadata may include such plurality of sets of metadata. These different sets of metadata may relate to respective playback conditions (e.g., to different playback conditions).
- the bitstream may be an MPEG-D DRC bitstream.
- the presence of metadata for dynamic processing of audio data may then be signaled based on MPEG-D DRC bitstream syntax.
- a loudnessInfoSetExtension()-element may be used to carry the metadata as a payload as detailed further below.
- the audio data and the metadata may then be decoded, by the decoder, to obtain decoded audio data and the metadata.
- the metadata may include one or more processing parameters relating to average loudness values and optionally one or more processing parameters relating to dynamic range compression characteristics. It is understood that each set of metadata may include respective processing parameters.
- the metadata allow dynamic or real-time correction to be applied. For example, when encoding and decoding for live real-time playout, the application of the “real-time” or dynamic loudness metadata is desired to ensure that the live playout audio is properly loudness managed.
- in step S103, the decoder then determines, from the metadata, one or more processing parameters for dynamic loudness adjustment based on a playback condition. This may be done by using the playback condition or information derived from the playback condition (e.g., playback condition information), to identify an appropriate set of metadata among the plurality of sets of metadata.
- a playback condition may include one or more of a device type of the decoder, characteristics of a playback device, characteristics of a loudspeaker, a loudspeaker setup, characteristics of background noise, characteristics of ambient noise and characteristics of the acoustic environment.
- the playback condition information may be indicative of a specific loudspeaker setup. Taking a playback condition into account allows the decoder to make a targeted selection of processing parameters for dynamic loudness adjustment with regard to device and environmental constraints.
- the process of determining the one or more processing parameters in step S103 may further include selecting, by the decoder, at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and a downmix, corresponding to the playback condition.
- the at least one of a DRCSet, EQSet and downmix correlates with or is indicative of the individual device and environmental constraints due to the playback condition.
- step S103 involves selecting a set of DRC sequences, DRCSet.
- the selected set of metadata may include such set of DRC sequences.
- the process of determining in step S103 may further include identifying a metadata identifier indicative of the at least one selected DRCSet, EQSet and DownmixSet to determine the one or more processing parameters from the metadata.
- the metadata identifier thus makes it possible to connect the metadata with a corresponding selected DRCSet, EQSet, and/or downmix, and thus with a respective playback condition.
- the specific loudspeaker setup may be used to determine a downmix, which in turn may be used for identifying and selecting an appropriate one among the plurality of sets of metadata.
- the specific loudspeaker setup and/or the downmix may be indicated by the aforementioned playback condition information.
- the metadata may comprise one or more metadata payloads (e.g., dynLoudComp() payloads, such as shown in Table 5 below), wherein each metadata payload may include a plurality of sets of parameters (e.g., parameters dynLoudCompValue) and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set.
- each payload may comprise an array of entries, each entry including processing parameters and identifiers (e.g., drcSetId, eqSetId, downmixId).
- the array of entries may correspond to the plurality of sets of metadata mentioned above.
- each entry comprises the downmix identifier.
- the determining in step S103 may thus involve selecting a set among the plurality of sets in the payload based on the downmix selected by the decoder (or alternatively, based on the at least one DRCSet, EQSet, and downmix), wherein the one or more processing parameters determined in step S103 may be the one or more processing parameters relating to the identifiers in the selected set. That is, depending on settings (e.g., DRCSet, EQSet, and downmix) present in the decoder, the decoder can search a given payload for an appropriate set of parameters and identifiers, by matching the aforementioned settings to the identifiers. The parameters included in the set whose identifiers best match the settings can then be selected as the processing parameters for dynamic loudness adjustment.
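The payload search described above can be sketched as follows. The source does not define what "best match" means, so this example makes an assumption: it counts how many of the three identifiers agree with the decoder settings and keeps the first entry with the highest count. The class and function names are hypothetical, and treating the parameter as a gain in dB is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DynLoudCompEntry:
    """One set of identifiers plus its processing parameter (hypothetical layout)."""
    drc_set_id: int
    eq_set_id: int
    downmix_id: int
    dyn_loud_comp_value_db: float  # assumed here to be a gain in dB


def match_score(entry: DynLoudCompEntry, drc_set_id: int,
                eq_set_id: int, downmix_id: int) -> int:
    """Count how many identifiers of the entry equal the decoder settings."""
    return (int(entry.drc_set_id == drc_set_id)
            + int(entry.eq_set_id == eq_set_id)
            + int(entry.downmix_id == downmix_id))


def select_entry(entries: Sequence[DynLoudCompEntry], drc_set_id: int,
                 eq_set_id: int, downmix_id: int) -> Optional[DynLoudCompEntry]:
    """Return the first entry with the highest score, or None if nothing matches."""
    best: Optional[DynLoudCompEntry] = None
    best_score = 0
    for entry in entries:
        score = match_score(entry, drc_set_id, eq_set_id, downmix_id)
        if score > best_score:
            best, best_score = entry, score
    return best
```

A real decoder might instead weight the downmix identifier more heavily or require an exact match on it, since the text above mentions selection based on the downmix alone as one option.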
- in step S104, the determined one or more processing parameters may then be applied, by the decoder, to the decoded audio data to obtain processed audio data.
- the processed audio data, for example live real-time audio data, are thus properly loudness managed.
- in step S105, the processed audio data may then be output for playback.
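The application step can be sketched as below, under two assumptions that the source does not state: the determined processing parameters are taken to be one gain in dB per decoded frame, and the gain is applied as a plain broadband scaling. The function name is illustrative.

```python
from typing import List


def apply_dynamic_loudness(frames: List[List[float]],
                           gains_db: List[float]) -> List[List[float]]:
    """Scale each decoded frame by its own gain (assumed: one dB value per frame)."""
    if len(frames) != len(gains_db):
        raise ValueError("expected exactly one gain per frame")
    processed = []
    for frame, gain_db in zip(frames, gains_db):
        factor = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude
        processed.append([sample * factor for sample in frame])
    return processed
```

The decoded frames themselves are left unmodified by this function, so the original audio remains available alongside the processed output.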
- the bitstream may further include additional metadata for static loudness adjustment to be applied to the decoded audio data.
- Static loudness adjustment, in contrast to dynamic processing for real-time situations, refers to processing performed for general loudness normalization. Carrying the metadata for dynamic processing separately from the additional metadata for general loudness normalization makes it possible to leave the “real-time” correction unapplied.
- the application of dynamic processing is desired to ensure that the live playout audio is properly loudness managed. But for a non-real-time playout, or a transcoding where the dynamic correction is not desired or required, the dynamic processing parameters determined from the metadata do not have to be applied.
- the originally unprocessed content can be retained, if desired.
- the original audio is encoded along with the metadata. This allows the playback device to selectively apply the dynamic processing and to further enable playback of original audio content on high-end devices capable of playing back original audio.
- the loudness of the content (or what it should be after the dynamic loudness metadata is applied) would not indicate the actual loudness of the content, as the metadata available would be a composite value. Besides removing this ambiguity of what the content loudness (or program or anchor loudness) is, there are some cases where this would be particularly beneficial:
- Keeping the metadata for dynamic processing separate allows the decoder or playback device to turn off the application of dynamic processing and to apply an implemented real-time loudness leveler instead to avoid cascading leveling. This situation may occur, for example, if the device’s own real-time leveling solution is superior to the one used with the audio codec or, for example, if the device’s own real-time leveling solution cannot be disabled and therefore will always be active, resulting in further processing leading to a compromised playback experience.
- the dynamic processing metadata may be used or stored for archive or on-demand services. Therefore, for the archive or on-demand services, a more accurate, or compliant, loudness measurement based on the entire program can be carried out, and the appropriate metadata reset. For use-cases where a fixed target loudness is used throughout the workflow, for example in an R128 compliant situation where -23 LKFS is recommended, this is also beneficial.
- the addition of the dynamic processing metadata is a “safety” measure, where the content is assumed to be close to the required target and the addition of the dynamic processing metadata is a secondary check. Thus, having the ability to turn it off is desirable.
- an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment and, optionally, dynamic range compression, DRC, is described, which may comprise one or more processors and non-transitory memory configured to perform a method including the processes as illustrated in the steps in the example of Figure 4.
- in step S201, original audio data may be input into a loudness leveler, 201, for loudness processing to obtain, as an output from the loudness leveler, 201, loudness processed audio data.
- in step S202, metadata for dynamic loudness adjustment may then be generated based on the loudness processed audio data and the original audio data.
- Appropriate smoothing and time frames may be used to reduce artifacts.
- step S202 may include comparison of the loudness processed audio data to the original audio data, by an analyzer, 202, wherein the metadata may be generated based on a result of said comparison.
- the metadata thus generated can emulate the effect of the leveler at the decoder site.
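The comparison of loudness processed audio to original audio can be illustrated with a short sketch. It is not the analyzer of the disclosure: frame-wise RMS stands in for a proper loudness measure, the one-pole smoothing and the frame length are arbitrary choices, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame (stand-in for a loudness measure)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def leveler_gains_db(
    original: Sequence[float],
    leveled: Sequence[float],
    frame_len: int = 1024,
    smoothing: float = 0.5,
    floor: float = 1e-9,
) -> List[float]:
    """Per-frame gains in dB that move `original` toward `leveled`.

    `smoothing` in [0, 1) is an assumed one-pole coefficient applied across
    frames to reduce artifacts; 0 disables smoothing. Trailing samples that
    do not fill a whole frame are ignored.
    """
    gains: List[float] = []
    prev = 0.0  # smoothing state starts at 0 dB (no gain change)
    n = min(len(original), len(leveled))
    for start in range(0, n - frame_len + 1, frame_len):
        rms_in = max(frame_rms(original[start:start + frame_len]), floor)
        rms_out = max(frame_rms(leveled[start:start + frame_len]), floor)
        raw = 20.0 * math.log10(rms_out / rms_in)
        prev = smoothing * prev + (1.0 - smoothing) * raw
        gains.append(prev)
    return gains


# A leveler that halves the amplitude corresponds to about -6.02 dB per frame.
unsmoothed = leveler_gains_db([0.5] * 2048, [0.25] * 2048, smoothing=0.0)
smoothed = leveler_gains_db([0.5] * 2048, [0.25] * 2048, smoothing=0.5)
```

With smoothing enabled, the gain approaches the leveler's gain gradually over successive frames instead of jumping to it.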
- the metadata may include:
- Gain (wideband and/or multiband) processing parameters which, when applied to the original audio, will produce loudness compliant audio for playback
- Processing parameters describing the dynamics of the audio, such as peak (sample and true peak), short-term loudness values, and change of short-term loudness values.
- step S202 may further include measuring, by the analyzer, 202, the loudness over one or more pre-defined time periods, wherein the metadata may be generated further based on the measured loudness.
- the measuring may comprise measuring overall loudness of the audio data.
- the measuring may comprise measuring loudness of dialogue in the audio data.
- the original audio data and the metadata may then be encoded into the bitstream.
- the bitstream may be an MPEG-D DRC bitstream and the presence of the metadata may be signaled based on MPEG-D DRC bitstream syntax.
- a loudnessInfoSetExtension()-element may be used to carry the metadata as a payload as detailed further below.
- the metadata may comprise one or more metadata payloads, wherein each metadata payload may include a plurality of sets of parameters and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set, and wherein the one or more processing parameters may be parameters for dynamic loudness adjustment by a decoder.
- the at least one of the drcSetId, the eqSetId, and the downmixId may be related to at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and a downmix, to be selected by the decoder.
- the metadata may be said to include a plurality of sets of metadata, with each set corresponding to a respective playback condition (e.g., to a different playback condition).
- the method may further include generating additional metadata for static loudness adjustment to be used by a decoder. Keeping the metadata for dynamic loudness processing and the additional metadata separate in the bitstream and encoding further the original audio data into the bitstream has several advantages as detailed above.
- the methods described herein may be implemented on a decoder or an encoder, respectively, wherein the decoder and the encoder may comprise one or more processors and non-transitory memory configured to perform said methods.
- An example of a device having such processing capability is illustrated in the example of Figure 5 showing said device, 300, including two processors, 301, and non-transitory memory, 302.
- the methods described herein can further be implemented on a system of an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment and optionally, dynamic range compression, DRC, and a decoder for metadata-based dynamic processing of audio data for playback as described herein.
- the methods may further be implemented as a computer program product comprising a computer-readable storage medium with instructions adapted to cause the device to carry out said methods when executed by a device having processing capability.
- the computer program product may be stored on a computer-readable storage medium.
- the MPEG-D DRC syntax may be extended, e.g., the loudnessInfoSetExtension()-element shown in Table 2 below, in order to also carry the dynamic processing metadata as a frame-based dynLoudComp update.
- another switch-case UNIDRCLOUDEXT_DYNLOUDCOMP may be added in the loudnessInfoSetExtension()-element as shown in Table 1.
- the switch-case UNIDRCLOUDEXT_DYNLOUDCOMP may be used to identify a new element dynLoudComp() as shown in Table 5.
- the loudnessInfoSetExtension()-element may be an extension of the loudnessInfoSet()-element as shown in Table 2.
- the loudnessInfoSet()-element may be part of the uniDrc()-element as shown in Table 3.
- the drcSetId enables dynLoudComp (relating to the metadata) to be applied per DRC set.
- the eqSetId enables dynLoudComp to be applied in combination with different settings for the equalization tool.
- the dynLoudComp() element may also include a methodDefinition parameter (specified by, e.g., 4 bits) specifying a loudness measurement method used for deriving the dynamic program loudness metadata (e.g., anchor loudness, program loudness, short-term loudness, momentary loudness, etc.) and/or a measurementSystem parameter (specified by, e.g., 4 bits) specifying a loudness measurement system used for measuring the dynamic program loudness metadata (e.g., EBU R 128, ITU-R BS.1770 with or without preprocessing, ITU-R BS.1771, etc.).
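As an illustration of two 4-bit fields, the sketch below packs and unpacks a methodDefinition value and a measurementSystem value in a single byte. The field order (methodDefinition in the upper four bits) and the function names are assumptions made for this example only; the actual bit layout is given by the syntax tables.

```python
from typing import Tuple


def pack_measurement_fields(method_definition: int, measurement_system: int) -> int:
    """Pack two 4-bit fields into one byte (assumed order: method first)."""
    for name, value in (
        ("method_definition", method_definition),
        ("measurement_system", measurement_system),
    ):
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} must fit in 4 bits, got {value}")
    return (method_definition << 4) | measurement_system


def unpack_measurement_fields(byte: int) -> Tuple[int, int]:
    """Inverse of pack_measurement_fields."""
    return (byte >> 4) & 0xF, byte & 0xF


packed = pack_measurement_fields(3, 2)
fields = unpack_measurement_fields(packed)
```

A 4-bit width allows up to 16 distinct code points per field, which is why values above 15 are rejected.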
- Such parameters may, e.
- Table 11: UniDrc gain extension types. Semantics of dynLoudCompValue: this field contains the value for dynLoudCompDb. The values are encoded according to the table below. The default value is 0 dB.
- Table 13 Loudness normalization processing Pseudo-Code for selection and processing of dynLoudComp
- in addition to the selection process shown in pseudo-code above (e.g., taking drcSetId, eqSetId and downmixId into consideration for selecting a dynLoudCompValue parameter), it may be beneficial for the selection process to also take a methodDefinition parameter and/or a measurementSystem parameter into consideration for selecting a dynLoudCompValue parameter.
- the loudness normalization processing pseudo-code described above may be replaced by the following alternative loudness normalization processing pseudo-code.
- a default value of dynLoudCompDb, e.g., 0 dB, may be assumed to ensure that the value of dynLoudCompDb is defined, even for cases where dynamic loudness processing metadata is not present in the bitstream.
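The role of the 0 dB default can be shown with a small sketch. The referenced pseudo-code tables are not reproduced in this text, so the formula below is an assumption: the static normalization gain is taken as target loudness minus content loudness, and dynLoudCompDb is added to it in the dB domain. Function and parameter names are hypothetical.

```python
from typing import Optional


def loudness_normalization_gain_db(
    target_loudness: float,
    content_loudness: float,
    dyn_loud_comp_db: Optional[float] = None,
    dyn_loud_enabled: bool = True,
) -> float:
    """Normalization gain in dB, with dynLoudCompDb defaulting to 0 dB.

    Assumed combination: static gain plus dynamic compensation. When the
    metadata is absent (None) or dynamic processing is disabled, the
    compensation term falls back to 0 dB, so the result is always defined.
    """
    comp = dyn_loud_comp_db if (dyn_loud_enabled and dyn_loud_comp_db is not None) else 0.0
    return target_loudness - content_loudness + comp


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


static_only = loudness_normalization_gain_db(-23.0, -18.0)
with_comp = loudness_normalization_gain_db(-23.0, -18.0, dyn_loud_comp_db=1.5)
```

In this example the content is 5 dB above the target, so the static gain is -5 dB, and a 1.5 dB compensation value moves the total to -3.5 dB.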
- Interface Extension Syntax: it may be beneficial to allow control, e.g., by an end user, of whether or not dynamic loudness processing is performed, even when dynamic loudness processing information is present in a received bitstream.
- control may be provided by updating the MPEG-D DRC interface syntax to include an additional interface extension (e.g., UNIDRCINTERFACEEXT_DYNLOUD) which contains a revised loudness normalization control interface payload (e.g., loudnessNormalizationControlInterfaceV1()) as shown in the following tables.
- Additional Methods of Enabling Dynamic Loudness Compensation are possible. For example, instead of including a parameter which indicates a specific dynamic loudness compensation value to be applied to the signal, it is possible to define a specific set of DRC gains which already include the dynamic loudness compensation, such that, when those specific DRC gains are applied by a decoder, dynamic loudness compensation is applied. DRC gain sets suitable for applying such dynamic loudness compensation may be identified through specific bitstream elements, as explained further below. A benefit of such an approach is that it is not necessary to transmit an explicit dynamic loudness compensation gain in addition to other loudness information.
- a user can specify through an interface that the DRC gains including dynamic loudness compensation should be selected and applied, as explained further below.
- bitstream elements may be provided to allow tighter creative control over whether and how dynamic loudness compensation is performed. For instance, a flag may indicate whether or not dynamic loudness compensation may be switched OFF by a user. In such cases, if the content creator allows for dynamic loudness compensation to be disabled, then dynamic loudness compensation will be applied or not as specified by the user.
- DRC effects which are provided by a specific DRC set.
- a DRC set might be appropriate for “Late Night” viewing, for viewing in “Noisy Environments”, for viewing at “Low Playback Levels”, etc.
- a DRC set effect parameter may indicate which specific effects are provided by the DRC set.
- the MPEG-D DRC standard allows a user to specify one or more desired DRC effects, as well as one or more optional fallback DRC effects. Such information from a user may be used to select a most appropriate DRC set from the available DRC sets. For example, if a DRC set exists which matches the desired DRC effects, then that set is selected. If such a set does not exist, but there is a set that matches a fallback DRC effect, that set may be selected.
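The desired-then-fallback selection described above can be sketched as follows. This is a simplified illustration and not the selection process defined in the standard: effect names are plain strings chosen for the example, and requested effects are simply tried in the order given.

```python
from typing import Dict, Optional, Sequence, Set


def select_drc_set(
    available: Dict[int, Set[str]],
    desired: Sequence[str],
    fallbacks: Sequence[str] = (),
) -> Optional[int]:
    """Return the drcSetId of the first set matching a requested effect.

    `available` maps a drcSetId to the effects that set provides. Desired
    effects are checked before fallback effects. None means no match.
    """
    for effect in list(desired) + list(fallbacks):
        for drc_set_id, effects in available.items():
            if effect in effects:
                return drc_set_id
    return None


available_sets = {
    1: {"late_night"},
    2: {"noisy_environment"},
}
chosen = select_drc_set(
    available_sets, desired=["low_playback_level"], fallbacks=["noisy_environment"]
)
```

No set provides the desired effect here, so the set matching the fallback effect is chosen.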
- the list of defined DRC effects may be extended to include a DRC effect which provides dynamic loudness compensation.
- the list of DRC effects which can be specified by a user through an interface may also be extended to include a DRC set effect which provides dynamic loudness compensation.
- dynamic loudness compensation may be provided by including a DRC set indicated as providing a dynamic loudness compensation effect, and indicating, through an interface, a desire for a DRC set which provides dynamic loudness compensation, in which case, the decoder would select and apply the DRC set that corresponds to the dynamic loudness compensation DRC effect.
- a previous version of a table specifying DRC Set Effects that may be signaled in a bitstream may be updated to include an entry for a “Dynamic Loudness Compensation” effect as shown below.
- a row with a particular bit position (e.g., 13) corresponding to a Dynamic Loudness Compensation effect may be added as shown in the table below.
- a bit in a bitfield (e.g., a drcSetEffect bitfield of a drcInstructionsBasic() or a drcInstructionsUniDrc() or a drcInstructionsUniDrcV1() payload) associated with a particular DRC Set may be set (e.g., bit position 13) to indicate that the DRC Set provides the Dynamic Loudness Compensation effect.
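Testing and setting such a bit could look like the sketch below. The mapping from a table bit position to a mask is an assumption: positions are treated as 1-based and counted from the least significant bit, so position 13 corresponds to mask 0x1000. The helper names are hypothetical.

```python
DYN_LOUD_COMP_BIT_POSITION = 13  # example position given in the text


def effect_mask(bit_position: int) -> int:
    """Mask for a 1-based bit position counted from the LSB (assumed)."""
    if bit_position < 1:
        raise ValueError("bit positions are assumed to start at 1")
    return 1 << (bit_position - 1)


def provides_dynamic_loudness_compensation(drc_set_effect: int) -> bool:
    """True if the Dynamic Loudness Compensation bit is set in drcSetEffect."""
    return bool(drc_set_effect & effect_mask(DYN_LOUD_COMP_BIT_POSITION))


def mark_dynamic_loudness_compensation(drc_set_effect: int) -> int:
    """Return drcSetEffect with the Dynamic Loudness Compensation bit set."""
    return drc_set_effect | effect_mask(DYN_LOUD_COMP_BIT_POSITION)


marked = mark_dynamic_loudness_compensation(0x0001)
```

Other bits of the bitfield are left unchanged, so a set can signal this effect alongside its existing effects.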
- a previous version of a table specifying DRC Set Effects that may be specified to a decoder through an interface may be updated to include an entry for a Dynamic Loudness Compensation effect as shown below.
- a row with a particular Index Value (e.g., 9) corresponding to a Dynamic Loudness Compensation effect may be added as shown in the table below.
- Table 24 Examples of DRC Effects that may be specified through an interface
- an interface parameter (e.g., a drcEffectTypeRequest parameter of a dynamicRangeControllerInterface() payload) may be set to the specific value (e.g., 9) that corresponds to the Dynamic Loudness Compensation Effect to instruct the decoder that, if available, a DRC Set which provides the Dynamic Loudness Compensation Effect should be selected and applied.
- a decoder may require that in order to select and apply a DRC set that provides a dynamic loudness compensation effect, a decoder must also perform loudness normalization.
- Such a requirement could be accomplished by only allowing the decoder to select a DRC set that provides a dynamic loudness compensation effect when loudness normalization is also enabled in the decoder.
- alternatively, or in addition, such a requirement could be satisfied by modifying the interface to require that when requesting a DRC set which provides a dynamic loudness compensation effect, loudness normalization must also be enabled (e.g., by setting both a loudness normalization on flag and a target loudness value).
- a new payload (e.g., dynLoudInstructions()) including instructions for dynamic loudness compensation could be defined.
- One or more of such payloads could, e.g., be included in an extension field (e.g., UNIDRCCONFEXT_V2) of an existing bitstream.
- Each of such payloads would be assigned a unique identifier (e.g., drcSetId) which corresponds to a DRC set that provides dynamic loudness compensation, and, to help easily identify such sets, a flag (e.g., a dynamicLoudCompDRCSet flag) could be set to 1 for each such DRC set signaled through the new payload.
- the user interface described above for selecting DRC sets that provide the dynamic loudness compensation effect could be used for DRC sets signaled through this new type of payload (e.g., dynLoudInstructions() payload) as well.
- a user could indicate to the decoder through the interface (e.g., using the drcEffectTypeRequest field) that a DRC set that provides dynamic loudness compensation is desired.
- the decoder would then select, if present, a DRC set signaled through the new payloads (e.g., a DRC set having a DRC Set ID that is identified as corresponding to dynamic loudness compensation, and, for instance, having a dynamicLoudCompDRCSet flag set to 1).
- a decoder may require that in order to select and apply a DRC set that provides a dynamic loudness compensation effect, a decoder must also perform loudness normalization. Such a requirement could be accomplished by only allowing the decoder to select a DRC set that provides a dynamic loudness compensation effect when loudness normalization is also enabled in the decoder. Alternatively, or in addition, such a requirement could be satisfied by modifying the interface to require that when requesting a DRC set which provides a dynamic loudness compensation effect, loudness normalization must also be enabled (e.g., by setting both a loudness normalization on flag and a target loudness value).
- an advantage to using this type of signaling is that, as described above, additional parameters indicating whether and how a user can enable and/or disable dynamic loudness compensation can be included in such new payloads.
- the new payload may also include a parameter (e.g., a dynamicLoudCompSwOffAllowed parameter) which indicates whether or not switching dynamic loudness compensation off through the user interface is allowed for each DRC set that applies dynamic loudness compensation.
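How a decoder might combine the user request, the switch-off permission, and the loudness normalization requirement is sketched below. The precedence of the three conditions is an assumption made for illustration, as are the `DynLoudInstructions` container and its field names, which only mirror the parameter names used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynLoudInstructions:
    """Hypothetical in-memory view of one dynLoudInstructions() payload."""
    drc_set_id: int
    dynamic_loud_comp_drc_set: bool = True
    dynamic_loud_comp_sw_off_allowed: bool = True


def should_apply_dlc(
    instr: DynLoudInstructions,
    user_wants_dlc: bool,
    loudness_normalization_on: bool,
) -> bool:
    """Decide whether dynamic loudness compensation (DLC) is applied."""
    if not instr.dynamic_loud_comp_drc_set:
        return False  # the set does not provide DLC at all
    if not loudness_normalization_on:
        return False  # assumed rule: DLC only together with normalization
    if not instr.dynamic_loud_comp_sw_off_allowed:
        return True  # content creator does not allow switching DLC off
    return user_wants_dlc  # otherwise the user decides


locked = DynLoudInstructions(drc_set_id=2, dynamic_loud_comp_sw_off_allowed=False)
free = DynLoudInstructions(drc_set_id=3)
```

With the `locked` payload the user's preference is ignored, whereas with the `free` payload it decides the outcome.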
- a unique DRC Set ID as well as a flag (e.g., dynamicLoudCompDRCSet) which specifically indicates the DRC Set as a DRC set that provides dynamic loudness compensation, may be associated with the payload.
- the payload may be associated with one or more downmix identification parameters (e.g., downmixId parameters) which indicate that the DRC set is intended for use with one or more specific downmixes / downmix configurations of the audio program.
- such additional payloads may be contained in an extension payload of a configuration payload (e.g., a uniDrcConfigExtension() payload).
- a specific extension type (e.g., a uniDrcConfigExtType of UNIDRCCONFEXT_V2 having a value of 0x2) may be used to indicate the presence of such additional payloads within the extension field of the configuration payload.
- An example of a syntax for an extension field of a configuration payload (e.g., a uniDrcConfigExtension() payload) is shown in the table below.
- the data associated with the specific extension type of the extension field of the configuration payload contains a parameter which indicates whether dynamic loudness compensation instructions are present (e.g., a dynLoudPresent parameter), and if so, a parameter which indicates the number of sets of dynamic loudness compensation instructions in the extension payload (e.g., a dynLoudInstructionsCount parameter), and each set of dynamic loudness compensation instructions (e.g., each dynLoudInstructions() payload), where the syntax for the dynamic loudness compensation instructions may be as shown in the table above.
- a new parameter (e.g., a dynamicLoudCompDRCSet parameter)
- an existing configuration element (e.g., a downmixInstructions() element, a drcInstructionsBasic() element, or a drcInstructionsUniDrc() element).
- the new element could be included in a field which is already ignored by legacy decoders (e.g., in a reserved field).
- the new parameter could be included in an updated version of an existing configuration element (e.g., in a downmixInstructionsV2() element or a drcInstructionsUniDrcV2() element), which would be ignored by legacy decoders.
- DRC Sets that are signaled as providing dynamic loudness compensation in this manner could be selected through an interface to the decoder by indicating that a DRC Set is desired that provides a dynamic loudness compensation effect.
- the decoder will identify and select a DRC Set which is indicated as providing dynamic loudness compensation (e.g., which has a dynamicLoudCompDRCSet parameter that equals 1).
- a decoder may use other rules (e.g., predefined rules, such as those defined in ISO/IEC 23003-4:2020) for selecting the most appropriate DRC set of the multiple DRC sets that provide dynamic loudness compensation.
- a decoder may require that in order to select and apply a DRC set that provides a dynamic loudness compensation effect, a decoder must also perform loudness normalization. Such a requirement could be accomplished by only allowing the decoder to select a DRC set that provides a dynamic loudness compensation effect when loudness normalization is also enabled in the decoder. Alternatively, or in addition, such a requirement could be satisfied by modifying the interface to require that when requesting a DRC set which provides a dynamic loudness compensation effect, loudness normalization must also be enabled (e.g., by setting both a loudness normalization on flag and a target loudness value).
- Figure 6 illustrates an example method of processing audio data for playback as described above.
- the decoder 100 may receive a bitstream including encoded audio data and metadata, wherein the metadata includes one or more dynamic range control (DRC) sets, and for each DRC set, an indication of whether the DRC set is configured for providing a dynamic loudness compensation effect, e.g., as described above.
- the bitstream may be an MPEG-D DRC bitstream.
- the presence of metadata for providing a dynamic loudness compensation effect may then be signaled based on MPEG-D DRC bitstream syntax, e.g., as described above.
- the decoder may then parse the metadata to identify DRC sets that are configured for providing the dynamic loudness compensation effect, e.g., as described above.
- the decoder may then decode the audio data to obtain decoded audio data, e.g., as described above.
- the decoder may then select one of the identified DRC sets configured for providing the dynamic loudness compensation effect, e.g., as described above.
- the decoder may then extract one or more DRC gains corresponding to the selected DRC set from the bitstream, e.g., as described above.
- in step S306, the decoder may then apply the one or more DRC gains corresponding to the selected DRC set to the decoded audio data to obtain dynamic loudness compensated audio data, e.g., as described above.
- in step S307, the dynamic loudness compensated audio data may then be output for playback, e.g., as described above.
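The sequence of Figure 6 can be summarized in a compact sketch that operates on already-parsed data. It is illustrative only: metadata is modeled as a list of dictionaries, gains as per-frame dB values, and the first identified DRC set is selected; none of these choices is prescribed by the text.

```python
from typing import Dict, List, Sequence


def identify_dlc_sets(metadata: Sequence[dict]) -> List[int]:
    """IDs of DRC sets flagged as providing dynamic loudness compensation."""
    return [m["drcSetId"] for m in metadata if m.get("dynamicLoudCompDRCSet") == 1]


def apply_gains(
    samples: Sequence[float], gains_db: Sequence[float], frame_len: int
) -> List[float]:
    """Apply one gain per frame; the last gain is held for extra samples."""
    if not gains_db:
        return list(samples)
    out: List[float] = []
    for i, x in enumerate(samples):
        frame = min(i // frame_len, len(gains_db) - 1)
        out.append(x * 10.0 ** (gains_db[frame] / 20.0))
    return out


def process_for_playback(
    decoded: Sequence[float],
    metadata: Sequence[dict],
    gains_by_set: Dict[int, Sequence[float]],
    frame_len: int,
) -> List[float]:
    """Identify, select, extract gains, and apply them to decoded audio."""
    candidates = identify_dlc_sets(metadata)
    if not candidates:
        return list(decoded)  # nothing to apply
    selected = candidates[0]  # assumed rule: take the first identified set
    return apply_gains(decoded, gains_by_set[selected], frame_len)


meta = [
    {"drcSetId": 1, "dynamicLoudCompDRCSet": 0},
    {"drcSetId": 2, "dynamicLoudCompDRCSet": 1},
]
output = process_for_playback([1.0] * 4, meta, {2: [0.0, -20.0]}, frame_len=2)
```

In this example the second frame is attenuated by 20 dB, i.e., scaled by a factor of about 0.1.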
- dynamic loudness compensation in combination with dynamic range control, may be achieved by transmitting a DRC set that contains gains which are a combination of dynamic range control gains and dynamic loudness compensation gains.
- a DRC set may contain gains for applying dynamic range control to a signal, and it may be important to allow for that DRC gain set to be applied independently of any other gains (e.g., dynamic loudness compensation gains). Therefore, if a DRC set is desired which contains gains for applying a combination of dynamic range control and dynamic loudness compensation, then a second DRC set needs to be specified which represents a combination of the dynamic range control gains of the first set and the desired dynamic loudness compensation gains.
- a more efficient way of transmitting data to accomplish the same goal is to provide only the dynamic range control gains in a first DRC set, and only the dynamic loudness compensation gains in a second DRC set, along with some additional metadata indicating the relationship between the two DRC sets.
- the first DRC set may include a parameter which indicates that there is another DRC set which depends on the first DRC set.
- a decoder may understand that it is possible to apply the gains of the first set independently, e.g., in case only dynamic range control is desired. Additionally, the decoder will understand that it is also possible to combine the gains of the first and second DRC sets in order to obtain a combination of dynamic range control and dynamic loudness compensation.
- Metadata can be included with the second DRC set to indicate whether the gains of that DRC set may only be used in conjunction with the first set, or whether they may be used independently of the gains of the first set.
- DRC Set 1 (e.g., has a drcSetId equal to 1)
- DRC Set 2 (e.g., has a drcSetId equal to 2)
- DRC Set 1 could include a parameter indicating that DRC Set 2 depends on DRC Set 1 (e.g., a dependsOnDrcSet parameter of DRC Set 1 may be equal to 2).
- DRC Set 2 could include a parameter indicating that the gains of DRC Set 2 may not be used independently (e.g., a noIndependentUse flag of DRC Set 2 may be set to a value 1). Conversely, if DRC Set 2 was not intended only for use in combination with dynamic range control, then DRC Set 2 could include a parameter indicating that the gains of DRC Set 2 may be used independently (e.g., a noIndependentUse flag of DRC Set 2 may be set to a value of 0).
- DRC Set 1 could have a drcSetEffect parameter in which the bit corresponding to the dynamic loudness compensation effect is set to 0, while DRC Set 2 could have the bit corresponding to the dynamic loudness compensation effect set to 1, and all other bits of the bitfield may be the same for both DRC Sets.
- an existing effect flag e.g., a bit of a drcSetEffect bitfield
- the following table shows exemplary values of parameters accomplishing the above signaling of efficient transmission of dynamic loudness compensation data.
- Table 27 Exemplary Values of parameters for signaling efficient transmission of dynamic loudness compensation data
- DRC Set 2 would be signaled through the new payload (e.g., the drcSetId parameter of the dynLoudInstructions payload would be set to 2).
- DRC Set 1 would be signaled as described above (e.g., a drcSetId parameter would be set to 1, and a dependsOnDrcSet parameter would be set to 2, indicating that DRC Set 2 depends on DRC Set 1).
- DRC Set 2 should not be used independently (e.g., that it should always be used with DRC Set 1).
- an additional parameter could be added to the new payload (e.g., a noIndependentUse parameter could be added to the dynLoudInstructions payload), such that, when the new parameter is set to 1, DRC Set 2 is only used with DRC Set 1, while when the new parameter is set to 0, DRC Set 2 may be used independently of DRC Set 1.
- a decoder will extract and apply the gains of both DRC Set 1 and DRC Set 2 in order to apply a combination of dynamic range control and dynamic loudness compensation.
- DRC Set 1 would include an indication that DRC Set 2 depends on DRC Set 1 as described above, and DRC Set 2 would include an indication of whether it may be used independently, or only in conjunction with DRC Set 1.
- a decoder will extract and apply the gains of DRC Set 1 and DRC Set 2 in order to apply a combination of dynamic range control and dynamic loudness compensation.
- a benefit of using dependent DRC Sets to enable dynamic loudness compensation is that doing so allows for more efficient transmission of the data required for dynamic loudness compensation, because it eliminates the need to transmit redundant dynamic range control gains in the DRC Set which contains the dynamic loudness compensation gains.
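The dependent-set arrangement can be sketched as follows. The `DrcSet` container and function names are hypothetical, and combining the two gain sequences by adding them in the dB domain (equivalent to multiplying linear gains) is an assumption about how the combination would be realized.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DrcSet:
    """Hypothetical in-memory view of a DRC set and its per-frame gains."""
    drc_set_id: int
    gains_db: List[float]
    depends_on_drc_set: Optional[int] = None  # ID of the set that depends on this one
    no_independent_use: bool = False


def may_use_alone(drc_set: DrcSet) -> bool:
    """True if the set's gains may be applied without its base set."""
    return not drc_set.no_independent_use


def combined_gains_db(sets: Dict[int, DrcSet], base_id: int, with_dlc: bool) -> List[float]:
    """Gains of the base set, optionally combined with its dependent set."""
    base = sets[base_id]
    if not with_dlc or base.depends_on_drc_set is None:
        return list(base.gains_db)  # dynamic range control only
    dependent = sets[base.depends_on_drc_set]
    return [a + b for a, b in zip(base.gains_db, dependent.gains_db)]


drc_sets = {
    1: DrcSet(1, [-3.0, -6.0], depends_on_drc_set=2),     # dynamic range control gains
    2: DrcSet(2, [1.0, 2.0], no_independent_use=True),     # loudness compensation gains
}
drc_only = combined_gains_db(drc_sets, base_id=1, with_dlc=False)
drc_plus_dlc = combined_gains_db(drc_sets, base_id=1, with_dlc=True)
```

Only the compensation gains are carried in the second set, so the dynamic range control gains are transmitted once and reused for both playback modes.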
- processor may refer to any device or portion of a device that processes electronic data, e.g., from registers and/or memory to transform that electronic data into other electronic data that, e.g., may be stored in registers and/or memory.
- a “computer” or a “computing machine” or a “computing platform” may include one or more processors.
- the methodologies described herein are, in one example embodiment, performable by one or more processors that accept computer-readable (also called machine-readable) code containing a set of instructions that when executed by one or more of the processors carry out at least one of the methods described herein. Any processor capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken is included.
- processors may include one or more of a CPU, a graphics processing unit, and a programmable DSP unit.
- the processing system further may include a memory subsystem including main RAM and/or a static RAM, and/or ROM.
- a bus subsystem may be included for communicating between the components.
- the processing system further may be a distributed processing system with processors coupled by a network. If the processing system requires a display, such a display may be included, e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT) display.
- the processing system also includes an input device such as one or more of an alphanumeric input unit such as a keyboard, a pointing control device such as a mouse, and so forth.
- the processing system may also encompass a storage system such as a disk drive unit.
- the processing system in some configurations may include a sound output device, and a network interface device.
- the memory subsystem thus includes a computer-readable carrier medium that carries computer-readable code (e.g., software) including a set of instructions to cause performing, when executed by one or more processors, one or more of the methods described herein. Note that when the method includes several elements, e.g., several steps, no ordering of such elements is implied, unless specifically stated.
- the software may reside in the hard disk, or may also reside, completely or at least partially, within the RAM and/or within the processor during execution thereof by the computer system.
- the memory and the processor also constitute computer-readable carrier medium carrying computer-readable code.
- a computer-readable carrier medium may form, or be included in a computer program product.
- the one or more processors may operate as a standalone device or may be connected (e.g., networked) to other processors. In a networked deployment, the one or more processors may operate in the capacity of a server or a user machine in a server-user network environment, or as a peer machine in a peer-to-peer or distributed network environment.
- the one or more processors may form a personal computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- machine shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
- each of the methods described herein is in the form of a computer-readable carrier medium carrying a set of instructions, e.g., a computer program that is for execution on one or more processors, e.g., one or more processors that are part of a web server arrangement.
- example embodiments of the present disclosure may be embodied as a method, an apparatus such as a special purpose apparatus, an apparatus such as a data processing system, or a computer-readable carrier medium, e.g., a computer program product.
- the computer-readable carrier medium carries computer readable code including a set of instructions that when executed on one or more processors cause the processor or processors to implement a method.
- aspects of the present disclosure may take the form of a method, an entirely hardware example embodiment, an entirely software example embodiment or an example embodiment combining software and hardware aspects.
- the present disclosure may take the form of a carrier medium (e.g., a computer program product on a computer-readable storage medium) carrying computer-readable program code embodied in the medium.
- the software may further be transmitted or received over a network via a network interface device.
- while the carrier medium is in an example embodiment a single medium, the term “carrier medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the term “carrier medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by one or more of the processors and that cause the one or more processors to perform any one or more of the methodologies of the present disclosure.
- a carrier medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
- Non-volatile media includes, for example, optical, magnetic disks, and magneto-optical disks.
- Volatile media includes dynamic memory, such as main memory.
- Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise a bus subsystem. Transmission media may also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
- carrier medium shall accordingly be taken to include, but not be limited to, solid-state memories, a computer product embodied in optical and magnetic media; a medium bearing a propagated signal detectable by at least one processor of the one or more processors and representing a set of instructions that, when executed, implement a method; and a transmission medium in a network bearing a propagated signal detectable by at least one processor of the one or more processors and representing the set of instructions.
- any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others.
- the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter.
- the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B.
- Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
- EEE-A1 A method of metadata-based dynamic processing of audio data for playback, the method including processes of:
- EEE-A2 The method according to EEE-A1, wherein the metadata is indicative of processing parameters for dynamic loudness adjustment for a plurality of playback conditions.
- EEE-A3 The method according to EEE-A1 or EEE-A2, wherein said determining the one or more processing parameters further includes determining one or more processing parameters for dynamic range compression, DRC, based on the playback condition.
- EEE-A4 The method according to any one of EEE-A1 to EEE-A3, wherein the playback condition includes one or more of a device type of the decoder, characteristics of a playback device, characteristics of a loudspeaker, a loudspeaker setup, characteristics of background noise, characteristics of ambient noise and characteristics of the acoustic environment.
- process (c) further includes selecting, by the decoder, at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and a downmix, corresponding to the playback condition.
- EEE-A6 The method according to EEE-A5, wherein process (c) further includes identifying a metadata identifier indicative of the at least one selected DRCSet, EQSet, and downmix to determine the one or more processing parameters from the metadata.
- EEE-A7 The method according to any one of EEE-A1 to EEE-A6, wherein the metadata includes one or more processing parameters relating to average loudness values and optionally one or more processing parameters relating to dynamic range compression characteristics.
- EEE-A8 The method according to any one of EEE-A1 to EEE-A7, wherein the bitstream further includes additional metadata for static loudness adjustment to be applied to the decoded audio data.
- EEE-A9 The method according to any one of EEE-A1 to EEE-A8, wherein the bitstream is an MPEG-D DRC bitstream and the presence of metadata is signaled based on MPEG-D DRC bitstream syntax.
- EEE-A10 The method according to EEE-A9, wherein a loudnessInfoSetExtension()-element is used to carry the metadata as a payload.
- EEE-A11 The method according to any one of EEE-A1 to EEE-A10, wherein the metadata comprises one or more metadata payloads, wherein each metadata payload includes a plurality of sets of parameters and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set.
- EEE-A12 The method according to EEE-A11 when depending on EEE-A5, wherein process (c) involves selecting a set among the plurality of sets in the payload based on the at least one DRCSet, EQSet, and downmix selected by the decoder, and wherein the one or more processing parameters determined at process (c) are the one or more processing parameters relating to the identifiers in the selected set.
- a decoder for metadata-based dynamic processing of audio data for playback comprising one or more processors and non-transitory memory configured to perform a method including processes of:
- EEE-A14 A method of encoding audio data and metadata for dynamic loudness adjustment, into a bitstream, the method including processes of:
- EEE-A15 The method according to EEE-A14, wherein the method further includes generating additional metadata for static loudness adjustment to be used by a decoder.
- EEE-A16 The method according to EEE-A14 or EEE-A15, wherein process (b) includes comparison of the loudness processed audio data to the original audio data, and wherein the metadata is generated based on a result of said comparison.
- EEE-A17 The method according to EEE-A16, wherein process (b) further includes measuring the loudness over one or more pre-defined time periods, and wherein the metadata is generated further based on the measured loudness.
- EEE-A18 The method according to EEE-A17, wherein the measuring comprises measuring overall loudness of the audio data.
- EEE-A19 The method according to EEE-A17, wherein the measuring comprises measuring loudness of dialogue in the audio data.
- EEE-A20 The method according to any one of EEE-A14 to EEE-A19, wherein the bitstream is an MPEG-D DRC bitstream and the presence of the metadata is signaled based on MPEG-D DRC bitstream syntax.
- EEE-A21 The method according to EEE-A20, wherein a loudnessInfoSetExtension()-element is used to carry the metadata as a payload.
- EEE-A22 The method according to any one of EEE-A14 to EEE-A21, wherein the metadata comprises one or more metadata payloads, wherein each metadata payload includes a plurality of sets of parameters and identifiers, with each set including at least one of a DRCSet identifier, drcSetId, an EQSet identifier, eqSetId, and a downmix identifier, downmixId, in combination with one or more processing parameters relating to the identifiers in the set, and wherein the one or more processing parameters are parameters for dynamic loudness adjustment by a decoder.
- EEE-A23 The method according to EEE-A22, wherein the at least one of the drcSetId, the eqSetId, and the downmixId is related to at least one of a set of DRC sequences, DRCSet, a set of equalizer parameters, EQSet, and a downmix, to be selected by the decoder.
- An encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment comprising one or more processors and non-transitory memory configured to perform a method including the processes of:
- EEE-A25 A system of an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment and/or dynamic range compression, DRC, according to EEE-A24 and a decoder for metadata-based dynamic processing of audio data for playback according to EEE-A13.
- EEE-A26 A computer program product comprising a computer-readable storage medium with instructions adapted to cause the device to carry out the method according to any one of EEE-A1 to EEE-A12 or EEE-A14 to EEE-A23 when executed by a device having processing capability.
- EEE-A27 A computer-readable storage medium storing the computer program product of EEE-A26.
- EEE-A28 The method according to any one of EEE-A1 to EEE-A12 further comprising receiving, by the decoder, through an interface, an indication of whether or not to perform the metadata-based dynamic processing of audio data for playback, and when the decoder receives an indication not to perform the metadata-based dynamic processing of audio data for playback, bypassing at least the step of applying the determined one or more processing parameters to the decoded audio data.
- EEE-A29 The method according to EEE-A28, wherein until the decoder receives, through the interface, the indication of whether or not to perform the metadata-based dynamic processing of audio data for playback, the decoder bypasses at least the step of applying the determined one or more processing parameters to the decoded audio data.
- EEE-A30 The method of any one of EEE-A1 to EEE-A12, EEE-A28, or EEE-A29, wherein the metadata is indicative of processing parameters for dynamic loudness adjustment for a plurality of playback conditions, and the metadata further includes a parameter specifying a loudness measurement method used for deriving a processing parameter of the plurality of processing parameters.
- EEE-A31 The method of any one of EEE-A1 to EEE-A12, or EEE-A28 to EEE-A30, wherein the metadata is indicative of processing parameters for dynamic loudness adjustment for a plurality of playback conditions, and the metadata further includes a parameter specifying a loudness measurement system used for measuring a processing parameter of the plurality of processing parameters.
- a method of metadata-based dynamic processing of audio data for playback including: receiving, by a decoder, a bitstream including audio data and metadata for dynamic loudness adjustment, wherein the metadata for dynamic loudness adjustment comprises a plurality of sets of metadata, wherein each set of metadata corresponds to a respective playback condition; decoding, by the decoder, the audio data and the metadata to obtain decoded audio data and the metadata; selecting, in response to playback condition information provided to the decoder, a set of metadata corresponding to a specific playback condition, and extracting, from the selected set of metadata, one or more processing parameters for dynamic loudness adjustment; applying the extracted one or more processing parameters to the decoded audio data to obtain processed audio data; and outputting the processed audio data for playback.
- EEE-B2 The method according to EEE-B1, wherein said extracting the one or more processing parameters further includes extracting one or more processing parameters for dynamic range compression, DRC.
- EEE-B3 The method according to EEE-B1 or EEE-B2, wherein the playback condition information is indicative of a specific loudspeaker setup.
- EEE-B4 The method according to any one of EEE-B1 to EEE-B3, wherein the selected set of metadata includes a set of DRC sequences, DRCSet.
- EEE-B5 The method according to any one of EEE-B1 to EEE-B4, wherein selecting the set of metadata includes identifying a set of metadata corresponding to a specific downmix.
- EEE-B6 The method according to any one of EEE-B1 to EEE-B5, wherein the sets of metadata each include one or more processing parameters relating to average loudness values and optionally one or more processing parameters relating to dynamic range compression characteristics.
- EEE-B7 The method according to any one of EEE-B1 to EEE-B6, wherein the bitstream further includes additional metadata for static loudness adjustment to be applied to the decoded audio data.
- EEE-B8 The method according to any one of EEE-B1 to EEE-B7, wherein the bitstream is an MPEG-D DRC bitstream and the presence of metadata is signaled based on MPEG-D DRC bitstream syntax.
- EEE-B9 The method according to EEE-B8, wherein a loudnessInfoSetExtension()-element is used to carry the metadata as a payload.
- EEE-B10 The method according to any one of EEE-B1 to EEE-B9, wherein the metadata comprises one or more metadata payloads, wherein each metadata payload includes a plurality of sets of parameters and identifiers, with each set including a respective downmix identifier, downmixId, in combination with one or more processing parameters relating to the downmix identifier in the set.
- a decoder for metadata-based dynamic processing of audio data for playback comprising one or more processors and non-transitory memory configured to perform a method including: receiving, by the decoder, a bitstream including audio data and metadata for dynamic loudness adjustment, wherein the metadata for dynamic loudness adjustment comprises a plurality of sets of metadata, wherein each set of metadata corresponds to a respective playback condition; decoding, by the decoder, the audio data and the metadata to obtain decoded audio data and the metadata; selecting, in response to a playback condition provided to the decoder, a set of metadata corresponding to a specific playback condition, and extracting, from the selected set of metadata, one or more processing parameters for dynamic loudness adjustment; applying the extracted one or more processing parameters to the decoded audio data to obtain processed audio data; and outputting the processed audio data for playback.
- a method of encoding audio data and metadata for dynamic loudness adjustment into a bitstream including: inputting original audio data into a loudness leveler for loudness processing to obtain, as an output from the loudness leveler, loudness processed audio data; generating the metadata for dynamic loudness adjustment based on the loudness processed audio data and the original audio data; and encoding the original audio data and the metadata into the bitstream.
- EEE-B13 The method according to EEE-B12, wherein the method further includes generating additional metadata for static loudness adjustment to be used by a decoder.
- EEE-B14 The method according to EEE-B12 or EEE-B13, wherein said generating metadata includes comparison of the loudness processed audio data to the original audio data, and wherein the metadata is generated based on a result of said comparison.
- EEE-B15 The method according to EEE-B14, wherein said generating metadata further includes measuring the loudness over one or more pre-defined time periods, and wherein the metadata is generated further based on the measured loudness.
- EEE-B16 The method according to EEE-B15, wherein the measuring comprises measuring overall loudness of the audio data.
- EEE-B17 The method according to EEE-B15, wherein the measuring comprises measuring loudness of dialogue in the audio data.
- EEE-B18 The method according to any one of EEE-B12 to EEE-B17, wherein the bitstream is an MPEG-D DRC bitstream and the presence of the metadata is signaled based on MPEG-D DRC bitstream syntax.
- EEE-B19 The method according to EEE-B18, wherein a loudnessInfoSetExtension()-element is used to carry the metadata as a payload.
- EEE-B20 The method according to any one of EEE-B12 to EEE-B19, wherein the metadata comprises a plurality of sets of metadata, wherein each set of metadata corresponds to a respective playback condition.
- EEE-B21 The method according to any one of EEE-B12 to EEE-B20, wherein the metadata comprises one or more metadata payloads, wherein each metadata payload includes a plurality of sets of parameters and identifiers, with each set including a respective downmix identifier, downmixId, in combination with one or more processing parameters relating to the downmix identifier in the set, and wherein the one or more processing parameters are parameters for dynamic loudness adjustment by a decoder.
- An encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment comprising one or more processors and non-transitory memory configured to perform a method including: inputting original audio data into a loudness leveler for loudness processing to obtain, as an output from the loudness leveler, loudness processed audio data; generating the metadata for dynamic loudness adjustment based on the loudness processed audio data and the original audio data; and encoding the original audio data and the metadata into the bitstream.
- EEE-B23 A system of an encoder for encoding in a bitstream original audio data and metadata for dynamic loudness adjustment, according to EEE-B22 and a decoder for metadata-based dynamic processing of audio data for playback according to EEE-B11.
- EEE-B24 A computer program product comprising a computer-readable storage medium with instructions adapted to cause the device to carry out the method according to any one of EEE-B1 to EEE-B10 or EEE-B12 to EEE-B21 when executed by a device having processing capability.
- EEE-B25 A computer-readable storage medium storing the computer program product of EEE-B24.
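
The encoder and decoder embodiments enumerated above can be illustrated with a minimal Python sketch. It is not the disclosed implementation and does not model MPEG-D DRC bitstream syntax. The names `MetadataSet`, `encode_metadata`, `select_set` and `apply_dynamic_loudness`, the frame length, the per-frame gain representation, and the use of RMS level as a loudness proxy are all assumptions made for illustration. The encoder side compares a loudness leveler's output with the original audio to derive gains; the decoder side selects the set whose downmix identifier matches the playback condition and applies the gains, or bypasses processing when disabled or when no set matches.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

FRAME = 4  # hypothetical frame length in samples, kept tiny for illustration


def rms_db(frame: Sequence[float]) -> float:
    """RMS level in dB: a crude stand-in for a real loudness measurement."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))


def split_frames(samples: Sequence[float], size: int = FRAME) -> List[Sequence[float]]:
    """Cut the signal into consecutive frames of `size` samples."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


@dataclass
class MetadataSet:
    """Hypothetical set of parameters keyed by a downmix identifier."""
    downmix_id: int        # plays the role of downmixId
    gains_db: List[float]  # assumed form: one dynamic loudness gain per frame


def encode_metadata(original: Sequence[float],
                    leveled: Sequence[float],
                    downmix_id: int) -> MetadataSet:
    """Encoder side: compare the leveler output with the original per frame."""
    gains = [rms_db(lev) - rms_db(org)
             for org, lev in zip(split_frames(original), split_frames(leveled))]
    return MetadataSet(downmix_id, gains)


def select_set(sets: List[MetadataSet], downmix_id: int) -> Optional[MetadataSet]:
    """Decoder side: pick the set that matches the playback condition."""
    for candidate in sets:
        if candidate.downmix_id == downmix_id:
            return candidate
    return None


def apply_dynamic_loudness(decoded: Sequence[float],
                           sets: List[MetadataSet],
                           downmix_id: int,
                           enabled: bool = True) -> List[float]:
    """Apply the selected gains, or bypass if disabled or nothing matches."""
    chosen = select_set(sets, downmix_id)
    if not enabled or chosen is None:
        return list(decoded)
    out: List[float] = []
    # Frames beyond the number of transmitted gains are dropped by zip here;
    # in this sketch the encoder always produces one gain per frame.
    for frame, gain_db in zip(split_frames(decoded), chosen.gains_db):
        scale = 10.0 ** (gain_db / 20.0)
        out.extend(x * scale for x in frame)
    return out


# Toy example: the "leveler" output is supplied directly as test data.
original = [0.1] * 4 + [0.8] * 4
leveled = [0.2] * 4 + [0.4] * 4
metadata = encode_metadata(original, leveled, downmix_id=1)
processed = apply_dynamic_loudness(original, [metadata], downmix_id=1)
```

In this toy case the decoder reproduces the leveled signal from the original audio plus the gains, which is the point of transmitting the original audio together with metadata rather than the processed audio. The `enabled` flag loosely mirrors the interface-controlled bypass described in EEE-A28.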
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
Priority Applications (8)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| MX2024006931A MX2024006931A (es) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing audio data |
| CA3241847A CA3241847A1 (en) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
| CN202280081463.8A CN118451498A (zh) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing audio data |
| EP22769193.8A EP4445365A1 (en) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
| KR1020247022337A KR20240118131A (ko) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
| AU2022405503A AU2022405503A1 (en) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
| JP2024534300A JP2024543726A (ja) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing audio data |
| US18/715,072 US20250046318A1 (en) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202163287029P | 2021-12-07 | 2021-12-07 | |
| US63/287,029 | 2021-12-07 | ||
| US202163290493P | 2021-12-16 | 2021-12-16 | |
| US63/290,493 | 2021-12-16 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023104360A1 (en) | 2023-06-15 |
Family
ID=83283345
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2022/073628 Ceased WO2023104360A1 (en) | 2021-12-07 | 2022-08-24 | Method and apparatus for processing of audio data |
Country Status (8)
| Country | Link |
|---|---|
| US (1) | US20250046318A1 |
| EP (1) | EP4445365A1 |
| JP (1) | JP2024543726A |
| KR (1) | KR20240118131A |
| AU (1) | AU2022405503A1 |
| CA (1) | CA3241847A1 |
| MX (1) | MX2024006931A |
| WO (1) | WO2023104360A1 |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2015038522A1 (en) * | 2013-09-12 | 2015-03-19 | Dolby Laboratories Licensing Corporation | Loudness adjustment for downmixed audio content |
| US20170032793A1 (en) * | 2015-07-31 | 2017-02-02 | Apple Inc. | Encoded audio extended metadata-based dynamic range control |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4472075B1 (en) * | 2014-10-01 | 2026-01-14 | Dolby International AB | Decoding an encoded audio signal using drc profiles |
2022
- 2022-08-24 CA CA3241847A patent/CA3241847A1/en active Pending
- 2022-08-24 WO PCT/EP2022/073628 patent/WO2023104360A1/en not_active Ceased
- 2022-08-24 KR KR1020247022337A patent/KR20240118131A/ko active Pending
- 2022-08-24 AU AU2022405503A patent/AU2022405503A1/en active Pending
- 2022-08-24 US US18/715,072 patent/US20250046318A1/en active Pending
- 2022-08-24 JP JP2024534300A patent/JP2024543726A/ja active Pending
- 2022-08-24 MX MX2024006931A patent/MX2024006931A/es unknown
- 2022-08-24 EP EP22769193.8A patent/EP4445365A1/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| KR20240118131A (ko) | 2024-08-02 |
| EP4445365A1 (en) | 2024-10-16 |
| MX2024006931A (es) | 2024-06-20 |
| CA3241847A1 (en) | 2023-06-15 |
| US20250046318A1 (en) | 2025-02-06 |
| AU2022405503A1 (en) | 2024-06-20 |
| JP2024543726A (ja) | 2024-11-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107820711B (zh) | | Loudness control for user interactivity in audio coding systems |
| JP2021089444A (ja) | | Optimizing loudness and dynamic range across different playback devices |
| US20240355338A1 (en) | | Method and apparatus for metadata-based dynamic processing of audio data |
| US20250046318A1 (en) | | Method and apparatus for processing of audio data |
| US20250342841A1 (en) | | Method and apparatus for processing of audio data |
| JP7816881B2 (ja) | | Method and apparatus for generating and processing a modified audio bitstream |
| HK40130834A (en) | | Method and apparatus for metadata-based dynamic processing of audio data |
| CN118451498A (zh) | | Method and apparatus for processing audio data |
| HK40111646A (zh) | | Method and apparatus for processing audio data |
| RU2858248C2 (ru) | | Method and apparatus for metadata-based dynamic processing of audio data |
| CN117882133A (zh) | | Method and apparatus for metadata-based dynamic processing of audio data |
| HK40116098A (zh) | | Method and apparatus for processing audio data |
| HK40104139A (zh) | | Method and apparatus for metadata-based dynamic processing of audio data |
| JP7631313B2 (ja) | | Method and device for generating and processing a modified bitstream |
| HK1226580B (zh) | | Object-based audio loudness management |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22769193; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| | WWE | Wipo information: entry into national phase | Ref document number: 18715072; Country of ref document: US |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022405503; Country of ref document: AU. Ref document number: AU2022405503; Country of ref document: AU |
| | WWE | Wipo information: entry into national phase | Ref document number: 3241847; Country of ref document: CA |
| | WWE | Wipo information: entry into national phase | Ref document number: MX/A/2024/006931; Country of ref document: MX |
| | ENP | Entry into the national phase | Ref document number: 2024534300; Country of ref document: JP; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 202280081463.8; Country of ref document: CN |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112024011402; Country of ref document: BR |
| | ENP | Entry into the national phase | Ref document number: 2022405503; Country of ref document: AU; Date of ref document: 20220824; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 11202403757P; Country of ref document: SG |
| | ENP | Entry into the national phase | Ref document number: 20247022337; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024118806; Country of ref document: RU. Ref document number: 2022769193; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022769193; Country of ref document: EP; Effective date: 20240708 |
| | WWE | Wipo information: entry into national phase | Ref document number: 202417058630; Country of ref document: IN |
| | ENP | Entry into the national phase | Ref document number: 112024011402; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20240606 |