WO2024076810A1 - Methods, apparatus and systems for performing perceptually motivated gain control - Google Patents

Methods, apparatus and systems for performing perceptually motivated gain control

Info

Publication number
WO2024076810A1
Authority
WO
WIPO (PCT)
Prior art keywords
gain
frame
audio signal
transition function
gain transition
Prior art date
Application number
PCT/US2023/073365
Other languages
English (en)
Inventor
Panji Setiawan
Benjamin Gilbert MCDONALD
Rishabh Tyagi
Original Assignee
Dolby Laboratories Licensing Corporation
Dolby International Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dolby Laboratories Licensing Corporation, Dolby International Ab
Publication of WO2024076810A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • gain transition functions have been proposed to smoothly transition between the different gains applied to consecutive frames. If there is a drastic gain change between consecutive frames, this method may lead to audible artifacts. Further, in some cases the gain change between the determined gains of consecutive frames is too large and/or sudden for applying a smooth transition function. In this case, a hard transition may be used to ensure that the signal is within expected range. For example, a single bit may be used to convey the information that a hard transition is used between the gains of consecutive frames.
  • a speaker may be implemented to include multiple transducers, such as a woofer and a tweeter, which may be driven by a single, common speaker feed or multiple speaker feeds.
  • the speaker feed(s) may undergo different processing in different circuitry branches coupled to the different transducers.
  • system is used in a broad sense to denote a device, system, or subsystem.
  • a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X − M inputs are received from an external source) may also be referred to as a decoder system.
  • processor is used in a broad sense to denote a system or device programmable or otherwise configurable, such as with software or firmware, to perform operations on data, which may include audio, or video or other image data.
  • processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general purpose processor or computer, and a programmable microprocessor chip or chip set.
  • a method of performing gain control on audio signals is provided.
  • the audio signal may be a higher order ambisonics, HOA, audio signal.
  • a downmixed audio signal of an audio signal to be encoded may be obtained.
  • Obtaining the audio signal may include receiving the downmixed audio signal.
  • it may include determining the downmixed audio signal from the audio signal to be encoded.
  • an overload condition may be a condition in which the frame of the downmixed audio signal exceeds a predefined signal range.
  • the predefined signal range may be a signal range expected by the encoder.
  • the encoder may be a core encoder.
  • a gain transition function for the frame may be determined.
  • the gain transition function may be based at least on a gain transition step size.
  • the gain transition function may be applied to the frame to generate a gain adjusted frame of the downmixed audio signal.
  • the gain adjusted frame may be an attenuated frame or an amplified frame.
  • the gain adjusted frame and information indicative of the gain transition function may be provided for encoding by an encoder.
  • By limiting the gain transition function to a gain transition step size, a smooth and not too sudden transition between consecutive gains can be achieved.
  • the gain transition step size may be insufficient to attenuate all samples of a frame to the signal range required by a core encoder.
  • the gain adjusted frame together with the information indicative of the gain transition function may be encoded.
  • the downmixed audio signal may be a spatially encoded downmixed signal.
  • the frame of the downmixed audio signal may be a current frame and the gain transition function is further based on a previous gain transition function applied to a frame preceding the current frame.
  • the gain transition function may further depend on a smoothing function based on the gain transition step size.
  • the gain transition function may include a transitory portion and a steady-state portion. The transitory portion may correspond to a transition from a gain associated with a preceding frame to the gain associated with the preceding frame adjusted by the gain transition step size.
  • the gain associated with the preceding frame adjusted by the gain transition step size may be an attenuation by the gain transition step size or an amplification by the gain transition step size of the gain associated with the preceding frame depending on a gain adjustment target of the current frame.
  • a length of the transitory portion may be limited by a delay introduced by a codec utilized by the encoder and decoder.
  • Thereby, the gain control introduces substantially zero additional delay.
  • In some embodiments, the length of the transitory portion may be equal to or less than the number of samples used for an encoding operation by the encoder.
  • the gain transition function may be defined in terms of a smoothing function, the right-most index for which the smoothing function is defined, and L, the number of samples of one frame [equation not recoverable from the source text].
  • the gain transition step size may be a predefined value or may be determined from a set of predefined values of increasing size. The predefined value or the set of predefined values may be determined based on a perceptive quality listening test or an objective quality measurement test.
  • the perceptive quality listening test may be a Multi-Stimulus Test with Hidden Reference and Anchor, MUSHRA.
  • the perceptive quality listening test may be part of a tuning process of the automatic gain control at the encoder and decoder.
  • the method may further include determining an overload amount caused by the frame of the downmixed audio signal.
  • the gain transition step size may be determined from the set of predefined values of increasing size depending on the overload amount.
  • the gain transition step size can be adapted to the rate of change needed between consecutive frames.
  • applying the gain transition function to the frame for generating a gain adjusted frame of the downmixed signal may include applying the gain transition function to samples of the downmixed audio signal.
  • encoding the gain adjusted frame together with the information indicative of the gain transition function may include determining an encoding scheme based on the gain transition function. In some cases, the encoding scheme may be determined based on the gain transition step size. In some cases, the encoding scheme may be determined based on whether the overload condition has been removed.
  • the encoding scheme may be one of Modified Discrete Cosine Transformation, MDCT, or Algebraic Code Excited Linear Prediction, ACELP.
  • Thereby, the coding scheme can be optimized for the particular audio signal and the required gain transition step size.
  • a method of performing gain control on audio signals is provided.
  • an encoded frame of an audio signal may be received by a decoder.
  • the encoded frame of an audio signal may be decoded to obtain a frame of a downmixed audio signal and information indicative of gain control applied by an encoder.
  • An inverse gain transition function to be applied to the frame of the downmixed audio signal may be determined based at least in part on the information indicative of gain control applied by the encoder.
  • the information indicative of gain control applied by the encoder may include a gain transition step size.
  • the inverse gain transition function may be applied to the frame of the downmixed audio signal.
  • the method may further include upmixing the downmixed audio signal to generate an upmixed audio signal.
  • the upmixed audio signal may be suitable for rendering.
  • the method may further include rendering the upmixed signal to produce rendered audio data.
  • the method may further include playing back the rendered audio data using one or more of a loudspeaker or headphones.
  • the inverse gain transition function may be determined by inverting a gain transition function applied by the encoder.
  • the inverse gain transition function may include a transitory portion and a steady-state portion.
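The relationship between the encoder-side gain transition function and the decoder-side inverse can be illustrated with a minimal sketch. It assumes the gain curve is expressed per sample in dB, so that inversion amounts to negating each value (the reciprocal in the linear domain). The helper names and sample values are illustrative and do not come from the disclosure, and the effect of the lossy core codec between the two steps is ignored.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def apply_gain(frame, gains_db):
    """Apply a per-sample gain curve (in dB) to a frame of samples."""
    return [s * db_to_linear(g) for s, g in zip(frame, gains_db)]

def invert(gains_db):
    """Inverse gain curve: negate each per-sample gain in dB."""
    return [-g for g in gains_db]

frame = [0.9, -1.2, 1.4, -0.3]              # illustrative downmix samples
encoder_gains_db = [0.0, -2.0, -4.0, -6.0]  # illustrative fade toward -6 dB

adjusted = apply_gain(frame, encoder_gains_db)             # encoder side
restored = apply_gain(adjusted, invert(encoder_gains_db))  # decoder side
```

In this idealized round trip, `restored` matches `frame` up to floating-point rounding.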
  • Some or all of the operations, functions and/or methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media.
  • Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
  • At least some aspects of the present disclosure may be implemented via an apparatus.
  • one or more devices may be capable of performing, at least in part, the methods disclosed herein.
  • an apparatus is, or includes, an audio processing system having an interface system and a control system.
  • the control system may include one or more general purpose single- or multi-chip processors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs) or other programmable logic devices, discrete gates or transistor logic, discrete hardware components, or combinations thereof.
  • Figure 1 is an illustrative schematic block diagram of a system for providing gain control of audio signals in the prior art.
  • Figures 2A and 2B are illustrative schematic block diagrams of a system for implementing adaptive gain control in accordance with some embodiments.
  • Figures 3A and 3B show examples of gain transition functions that may be implemented by an encoder and inverse gain transition functions that may be implemented by a decoder, respectively, in accordance with some embodiments.
  • Figure 4 is a flowchart of an example process that may be performed by an encoder for implementing adaptive gain control in accordance with some embodiments.
  • Figure 5 is a flowchart of an example process that may be performed by a decoder for implementing adaptive gain control in accordance with some embodiments.
  • Figure 6 illustrates example use cases for an Immersive Voice and Services (IVAS) system in accordance with some embodiments.
  • Figure 7 shows a block diagram that illustrates examples of components of an apparatus capable of implementing various aspects of this disclosure.
  • Figures 8A and 8B illustrate example embodiments of audio codecs utilizing a perceptually motivated gain control of downmixed signals, where the gain transition step-size is uniform.
  • Figures 9A and 9B illustrate example embodiments of audio codecs utilizing a perceptually motivated gain control of downmixed signals, where the gain transition step-size is non-uniform.
  • Like reference numbers and designations in the various drawings indicate like elements.
  • DETAILED DESCRIPTION OF EMBODIMENTS
  • Some coding techniques for scene-based audio, stereo audio, multi-channel audio, and/or object audio rely on coding multiple component signals after a downmix operation. Downmixing may allow a reduced number of audio components to be coded in a waveform encoded manner that retains the waveform, and the remaining components may be encoded parametrically.
  • the remaining components may be reconstructed using parametric metadata indicative of the parametric encoding. Because only a subset of the components are waveform encoded and the parametric metadata associated with the parametrically encoded components may be encoded efficiently with respect to bit rate, such a coding technique may be relatively bit rate efficient while still allowing high quality audio.
  • One problem that may occur is that downmix channels determined by a spatial encoder may include signals with levels that are not suitable for subsequent processing by a core codec that constructs an audio signal bitstream. For example, in some cases, a downmix signal may have a level that is so high that the core codec is overloaded despite the original input signal not being overloaded in any of its component signals.
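As a hedged illustration of this problem, the sketch below mixes two component signals that are each within range and checks whether the resulting downmix channel leaves the range accepted by the core codec, assumed here to be [-1, 1). The function name `is_overloaded`, the unit mixing gains and the sample values are assumptions made for this example only.

```python
def is_overloaded(frame, lo=-1.0, hi=1.0):
    """Return True if any sample falls outside the half-open range [lo, hi)."""
    return any(s < lo or s >= hi for s in frame)

# Two component signals, each within range on its own (illustrative values).
w = [0.7, -0.6, 0.5]
x = [0.6, -0.5, 0.2]

# Hypothetical downmix: a plain sum with unit mixing gains.
downmix = [a + b for a, b in zip(w, x)]

print(is_overloaded(w), is_overloaded(x), is_overloaded(downmix))
```

Neither input is overloaded, but the first two downmix samples fall outside the range, which is the situation that gain control is meant to handle.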
  • Figure 1 shows a schematic block diagram of a conventional system 100 for performing gain control on encoded higher order Ambisonics (HOA) signals.
  • the schematic diagram shown in Figure 1 may be used for encoding and decoding MPEG-H signals.
  • MPEG-H is a group of international standards under development by the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group (MPEG). MPEG-H has various parts, including Part 3, MPEG-H 3D Audio.
  • the processing may include decomposition, for example, in which downmix channels are generated.
  • the downmix channels may include a set of signals which are bound by [-max, max] for a given frame. Because a core encoder 108 can encode signals within a range of [-1, 1), samples of the signals associated with the downmix channels that exceed the range of core encoder 108 may cause overload.
  • a gain control 106 adjusts the gain of the frame such that the associated signals are within the range of core encoder 108 (e.g., within [-1, 1)).
  • Core encoder 108 may be considered the codec that generates an encoded bitstream.
  • Side information generated by the decomposition/processing block 104 which may include metadata associated with parametrically encoded channels, or the like, may be encoded in a bitstream in connection with the signals produced as an output of core encoder 108.
  • the encoded bitstream is received by a decoder 112. Decoder 112 may extract the side information and a core decoder 116 may extract downmix signals.
  • An inverse gain control block 120 may then reverse the gain applied by the encoder.
  • the inverse gain control block 120 may amplify signals that were attenuated by gain control 106 of encoder 102.
  • the HOA signals may then be reconstructed by an HOA reconstruction block 122.
  • the HOA signals may be rendered and/or played back by rendering/playback block 124.
  • Rendering/playback block 124 may include, for example, various algorithms for rendering the reconstructed HOA output, e.g., as rendered audio data.
  • rendering the reconstructed HOA output may involve distributing the one or more signals of the HOA output across multiple speakers to achieve a particular perceptual impression.
  • rendering/playback block 124 may include one or more loudspeakers, headphones, etc. for presenting the rendered audio data.
  • Gain control 106 may implement gain control using the following techniques.
  • Gain control 106 may first determine an upper bound of the signal values in a frame. For example, for MPEG-H audio signals, the bound may be expressed as a product that is specified in the MPEG-H standard. Given the upper bound, the minimum attenuation required may ensure that the scaled signal samples are bound by the interval [-1, 1). In other words, the scaled samples may be within the range of core encoder 108. This may be determined by applying the gain factor of $2^{e_{min}}$.
  • $e_{min}$ may be a negative number. In some cases, amplification may be limited by a maximum amplification factor $2^{e_{max}}$, where $e_{max}$ is a non-negative integer number. Accordingly, to perform both attenuation and amplification, a gain factor of $2^{e}$ can be defined, with the gain parameter $e$ being a value in the range of $[e_{min}, e_{max}]$. Consequently, the lowest number of bits required to represent the gain parameter $e$ is determined by the size of the range $[e_{min}, e_{max}]$.
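The two quantities discussed above can be sketched as follows. The first function searches for the power-of-two attenuation that brings a frame peak into [-1, 1). The second counts the bits needed to index every integer gain parameter in the allowed range. Both function names are hypothetical, and the bit-count expression is an assumption that follows from counting the integers in the range; it is not taken from the MPEG-H standard.

```python
import math

def min_attenuation_exponent(peak):
    """Largest integer e <= 0 such that peak * 2**e < 1 (0 if no attenuation is needed)."""
    e = 0
    while peak * 2.0 ** e >= 1.0:
        e -= 1
    return e

def bits_for_gain_parameter(e_min, e_max):
    """Bits needed to index every integer in [e_min, e_max] (assumed integer-valued e)."""
    return math.ceil(math.log2(e_max - e_min + 1))

print(min_attenuation_exponent(3.5))   # a peak of 3.5 needs a factor of 2**-2
print(bits_for_gain_parameter(-6, 1))  # 8 possible values of e
```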
  • a gain factor $g_n(j)$, for a particular channel $n$ and frame $j$, may be determined by applying a one frame delay, which corresponds to one HOA block, and utilizing the following recursive operation: $g_n(j-1) = g_n(j-2) \cdot 2^{e_n(j-1)}$
  • $g_n(j-2)$ represents a gain factor applied for the frame $j-2$
  • $2^{e_n(j-1)}$ represents the gain factor adjustment required to calculate the gain factor $g_n(j-1)$ for the frame $j-1$.
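A small sketch of this recursion, assuming that each frame's gain factor is the previous frame's gain factor multiplied by a power-of-two adjustment, as the definitions above suggest. The function name and the sequence of adjustment exponents are illustrative.

```python
def update_gain(prev_gain, e):
    """One recursion step: new gain factor = previous gain factor times 2**e."""
    return prev_gain * 2.0 ** e

gain = 1.0     # assumed initial gain factor (no gain change)
history = []
for e in [-1, -2, 0, 1]:   # illustrative per-frame adjustment exponents
    gain = update_gain(gain, e)
    history.append(gain)

print(history)
```

Negative exponents accumulate attenuation, an exponent of zero holds the gain, and positive exponents undo earlier attenuation.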
  • gain parameters may be determined that do not produce an additional delay, because gain parameters may be determined based on lookahead samples generated for use by a codec.
  • the codec may be used by a perceptual encoder. Determination of gain transition functions is shown in and described below in connection with Figures 2-5.
  • Figures 2A and 2B show a schematic block diagram of an encoder 202 and a decoder 212, respectively, for performing low-delay adaptive gain control in accordance with exemplary embodiments.
  • an input HOA signal or first-order Ambisonics (FOA)
  • spatial analysis block 204 may generate and output a set of M downmix channels 204A.
  • the number of downmix channels in the set of M downmix channels 204A may be in a range 1 ≤ M ≤ N.
  • spatial analysis block 204 may generate and output spatial side information 204B for reversing the downmix operation.
  • the downmix channels may include a primary downmix channel W’, which can be generated by mixing the omnidirectional input signal W with the directional input signals X, Y and Z using a variety of mixing gains, and up to 3 residual channels, X’, Y’, and Z’, each corresponding to signal components in the X, Y, and Z signals that cannot be predicted from the primary downmix signal.
  • spatial analysis block 204 utilizes the Spatial Reconstruction (SPAR) technique. SPAR is further described in D. McGrath, S. Bruhn, H. Purnhagen, M. Eckert, J. Torres, S. Brown, and D.
  • spatial analysis block 204 may utilize any other suitable linear predictive codec or energy compacting transform, such as a Karhunen-Loeve Transform (KLT) or the like.
  • Core encoder 208 may be considered the codec that generates an encoded audio bitstream 208A.
  • the core encoder 208 and a core decoder 216 may introduce some lookahead samples that are to be utilized by an adaptive gain control 206 to determine gain parameters to avoid adding extra delay (zero additional delay) to the whole coding process.
  • the signals associated with the M downmix channels 204A may then be analyzed by an adaptive gain control 206.
  • Adaptive gain control 206 may determine whether signals associated with any of the M downmix channels 204A surpass the audio amplitude range expected by core encoder 208, and therefore, will overload core encoder 208.
  • adaptive gain control 206 may set a flag indicating that no gain control is applied.
  • the flag indication may be performed by setting a value for the flag, for example by setting a value of a single bit.
  • adaptive gain control 206 may not set the flag, thereby preserving one bit (e.g., the bit associated with the flag).
  • a spatial metadata bitstream and/or a core encoder bitstream (which may be a perceptual encoder bitstream) are self-terminating
  • the presence of a gain control flag may be determined by determining whether there are any unread bits in the bitstream.
  • the unread bits may be left over bits in the bitstream.
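The leftover-bit check can be sketched as below. The sketch assumes that the total number of bits in the packet is known to the decoder and that the other payloads have already been read, so any remaining bit can only be the gain control flag. The `BitReader` class and `read_gain_control_flag` function are hypothetical helpers and do not reflect the actual bitstream syntax.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative)."""

    def __init__(self, data, total_bits):
        self.data = data
        self.total_bits = total_bits
        self.pos = 0

    def read(self, n):
        """Read n bits and return them as an unsigned integer."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def bits_left(self):
        return self.total_bits - self.pos

def read_gain_control_flag(reader):
    """Return the flag bit if an unread bit remains, else None (flag absent)."""
    return reader.read(1) if reader.bits_left() > 0 else None

# An 8-bit self-terminating payload followed by one leftover flag bit.
reader = BitReader(bytes([0b10110010, 0b10000000]), total_bits=9)
payload = reader.read(8)
flag = read_gain_control_flag(reader)
```

When the packet holds only the 8 payload bits, `read_gain_control_flag` returns `None`, which corresponds to the case where no flag was written and one bit was saved.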
  • adaptive gain control 206 may output the M downmix channels 206A.
  • the M downmix channels 206A may then be passed to core encoder 208 for encoding in a bitstream 208A.
  • adaptive gain control 206 may determine gain parameters and apply gain(s) to the M downmix channels according to the determined gain parameters.
  • the M downmix channels with gain applied 206A may then be passed to core encoder 208 for encoding in a bitstream.
  • adaptive gain control 206 may output side information on gain control 206B. Information regarding the flag may be comprised in the side information on gain control 206B.
  • Side information encoder 210 may encode spatial side information 204B together with gain parameters 206B as metadata 210A for transmission in a bitstream.
  • the decoder 212 may then extract and use this metadata to upmix the downmixed channels and reverse the gain adjustment.
  • Metadata 210A may later be utilized to reconstruct a representation of the original audio input that was downmixed by spatial analysis unit 204.
  • Side information encoder 210 may additionally provide side information 208B to core encoder 208. Core encoder 208 then may use side information 208B to choose between coding techniques. Both the encoded bitstream 208A and the encoded bitstream with metadata 210A may be multiplexed to form the final bitstream output by encoder 202.
  • adaptive gain control 206 may determine a gain transition function that transitions between a gain parameter e(j-1) associated with a previous frame (e.g., the j-1 th frame) and a gain parameter of the current frame, e(j).
  • the gain transition function may be applied by adaptive gain control 206 on a frame by frame basis, wherein each frame may be a frame of one of the M downmix channels 204A.
  • the gain transition function may smoothly transition the gain parameter across the samples of the j th frame from the value of the gain parameter at the j-1 th frame (e.g., e(j-1)) to the gain parameter of the current frame (e.g., e(j)).
  • the gain transition function may include two portions: 1) a transitory portion in which the gain parameter is transitioning across the samples of the transition portion from the gain parameter of the preceding frame to the gain parameter of the current frame; and 2) a steady-state portion in which the gain parameter has the value of the gain parameter of the current frame for the samples of the steady-state portion.
  • in an instance in which the gain applied to the current frame is less than the gain applied to the previous frame, the transitory portion may be referred to as having a transitory type of “fade,” because the amount of attenuation increases across the samples of the current frame.
  • the case where the gain applied to the current frame is less than the gain applied to the previous frame may be represented as e(j) > e(j-1).
  • in an instance in which the gain applied to the current frame is greater than the gain applied to the previous frame, the transitory portion may be referred to as having a transitory type of “reverse fade,” or “un-fade,” because the amount of attenuation decreases across the samples of the current frame.
  • the case where the gain applied to the current frame is greater than the gain applied to the previous frame may be represented as e(j) < e(j-1).
  • in an instance in which the gain applied to the current frame is the same as the gain applied to the previous frame, the transitory portion may be referred to as having a transitory type of “hold,” in which the transitory portion is not transitory and rather has the same value as the steady-state portion.
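The three transitory types can be summarized in a short sketch that compares the gain parameters of consecutive frames. It takes e(j) > e(j-1) as a fade, the opposite ordering as a reverse fade, and equality as a hold. The function name is hypothetical.

```python
def transitory_type(e_curr, e_prev):
    """Classify the transitory portion from the gain parameters of two consecutive frames."""
    if e_curr > e_prev:
        return "fade"      # attenuation increases across the frame
    if e_curr < e_prev:
        return "un-fade"   # attenuation decreases across the frame
    return "hold"          # gain unchanged; no actual transition

for e_prev, e_curr in [(0, 1), (2, 1), (3, 3)]:
    print(e_prev, e_curr, transitory_type(e_curr, e_prev))
```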
  • the gain transition function depends on a gain transition step size.
  • the gain transition step size may limit the amount of a possible transition from a preceding frame to the current frame.
  • the perceived quality may be measured based on known perceptive quality listening tests like the Multi-Stimulus Test with Hidden Reference and Anchor, MUSHRA.
  • the perceptive quality listening test may be part of a tuning process of the automatic gain control at the encoder and decoder.
  • parameters like the gain transition step size may be tuned for a particular audio scenario and codec until an optimum perceived audio quality is reached.
  • the tuned parameters are then used by the encoding/decoding system.
  • the processed output 206A of the automatic gain control 206 is further coded by a lossy core codec based on Algebraic Code Excited Linear Prediction (ACELP) coding which does not aim to do waveform reconstruction.
  • the automatic gain control 206 may also determine the attenuation amount necessary for the frame to be inside the expected range of the core encoder 208. If there is a large difference in the attenuation needed for consecutive frames, applying a transition function to achieve the range [-1, 1) required by the core encoder 208 may lead to audible artifacts when the audio signal is rendered at the decoder. Instead of applying a transition function to keep each frame inside or at the limit of the required range, the transition function may be limited to a specific gain transition step size.
  • the transition function can only attenuate a single frame by an amount equal to the gain transition step size. Therefore, as an example, if the attenuation of a previous frame is −10 dB, the attenuation applied to the first sample of the current frame will be −10 dB, and the attenuation applied to the last sample of the current frame will be −10 dB adjusted by the gain transition step size.
  • the gain transition will be a constant value, e.g., −10 dB.
  • the gain transition function will transition from the attenuation of the previous frame to the last sample of the current frame by the gain transition step size.
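The step-limited behaviour described above can be sketched as a gain parameter that moves by at most one step per frame toward the attenuation each frame would need. The sketch assumes the gain parameter is an integer count of attenuation steps and that the per-frame target is expressed in the same unit. Both assumptions, along with the function names and the 6 dB step, are illustrative.

```python
def next_gain_parameter(e_prev, e_target):
    """Move the gain parameter by at most one step per frame toward the target."""
    if e_target > e_prev:
        return e_prev + 1
    if e_target < e_prev:
        return e_prev - 1
    return e_prev

def attenuation_db(e, step_db):
    """Attenuation reached at the end of a frame, in dB."""
    return e * step_db

targets = [0, 3, 3, 3, 1, 1]   # attenuation steps each frame would need (illustrative)
e = 0
applied = []
for e_target in targets:
    e = next_gain_parameter(e, e_target)
    applied.append(e)

print(applied)
print([attenuation_db(v, 6.0) for v in applied])
```

In the second and third frames the applied attenuation lags the target, so the overload condition is only reduced rather than removed; this is the trade-off accepted in exchange for a smoother transition.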
  • the gain transition step size may be chosen such that the attenuation amount applied by the automatic gain control 206 is not sufficient for keeping the frame inside the expected signal range of the core encoder 208.
  • the gain transition step size may be a fixed value.
  • the gain transition step size may be a single value, e.g., −1 dB.
  • the gain transition step size may be chosen from a set of increasing fixed values, e.g., −1 dB, −3 dB, −6 dB.
  • the value for the gain transition step size may be chosen depending on the amount of overload caused by the frame without attenuation.
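One possible way to pick the step size from the overload amount is sketched below. The selection rule (the smallest candidate that covers the overload, otherwise the largest candidate), the function names and the use of step magnitudes of 1, 3 and 6 dB are assumptions for illustration; the text only states that the choice depends on the amount of overload.

```python
import math

def overload_db(peak, limit=1.0):
    """Amount by which the frame peak exceeds the limit, in dB (0 if it does not)."""
    return max(0.0, 20.0 * math.log10(peak / limit)) if peak > 0 else 0.0

def choose_step_db(overload, candidates=(1.0, 3.0, 6.0)):
    """Smallest candidate step magnitude covering the overload, else the largest one."""
    for step in candidates:
        if step >= overload:
            return step
    return candidates[-1]

for peak in (1.05, 1.2, 2.5):
    amount = overload_db(peak)
    print(peak, round(amount, 2), choose_step_db(amount))
```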
  • the automatic gain control is configured to have the ability to specify a set of target gain values $G_T$, which may be represented as a table of numbers, e.g., integers, indicating the multiples of DBS attenuation provided at each step. This is motivated by the fact that smaller changes provide perceptual benefits; however, a higher level of possible attenuation may be required for some signals. Specifying these non-uniform absolute steps allows a wider attenuation range to be covered while providing the benefits of smaller steps for many likely cases.
  • One or more of such tables of integers may be specified and the information on the choice of a particular table being used at the encoder side may be signaled/transmitted to the decoder side.
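A table-driven variant might look like the sketch below, in which each table lists integer multiples of a base step and a table identifier stands in for the information signalled to the decoder. The base step value, the table contents and the function names are hypothetical.

```python
STEP_DB = 1.0  # hypothetical base step (DBS) in dB

# Hypothetical tables of integer multiples of the base step.
TABLES = {
    0: [0, 1, 2, 3, 4, 5, 6, 7],     # uniform steps
    1: [0, 1, 2, 3, 6, 9, 12, 18],   # non-uniform: fine near 0 dB, coarser further out
}

def attenuation_db(table_id, index):
    """Absolute attenuation for a table entry, in dB."""
    return TABLES[table_id][index] * STEP_DB

def step_between(table_id, index):
    """Attenuation change when moving from entry index-1 to entry index, in dB."""
    table = TABLES[table_id]
    return (table[index] - table[index - 1]) * STEP_DB

print(attenuation_db(0, 7), attenuation_db(1, 7))
print(step_between(1, 1), step_between(1, 7))
```

With the same number of entries, the non-uniform table in this example reaches 18 dB instead of 7 dB while keeping 1 dB steps for the first few entries.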
  • the application of non-uniform steps results in non-uniform gain transition shapes (level-dependent transition functions).
  • output level and attenuation information from the automatic gain control 206 system may be used in the decision-making process in other systems such as the core encoder 208. While the relaxed requirement can provide perceptual benefits, it can impact the core encoder 208 by either introducing a change in gain or by not meeting the strict requirement and allowing overload conditions to remain.
  • a transitory portion of a gain transition function may be determined using a prototype shape of a transitory part of a gain transition function, where the prototype shape is scaled based on the difference between the gain parameter of the current frame and the gain parameter of the preceding frame.
  • the prototype shape may be scaled based on e(j) – e(j-1).
  • a gain transition function utilizing such a prototype function p may be represented over the samples of one frame [equation not recoverable from the source text].
  • the prototype shape of the transitory part gain can be defined in closed form [equation not recoverable from the source text].
  • L may take a particular value, for example [value not recoverable from the source text].
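Because the exact prototype shape is not legible here, the following sketch substitutes a raised-cosine shape as a stand-in and scales it by the difference between the gain parameters of the current and preceding frames. The raised cosine, the dB-domain interpolation, the function names and the parameter values are all assumptions and should not be read as the disclosed formula.

```python
import math

def prototype(l, l_end):
    """Assumed raised-cosine shape rising from 0 at l = 0 to 1 at l = l_end."""
    return 0.5 * (1.0 - math.cos(math.pi * l / l_end))

def gain_transition_db(e_prev, e_curr, step_db, frame_len, trans_len):
    """Per-sample attenuation in dB: transitory portion followed by steady state."""
    curve = []
    for l in range(frame_len):
        shape = prototype(l, trans_len) if l < trans_len else 1.0
        # The prototype is scaled by the gain parameter difference e(j) - e(j-1).
        curve.append(step_db * (e_prev + shape * (e_curr - e_prev)))
    return curve

# Illustrative frame of 960 samples with a 384-sample transitory portion.
curve = gain_transition_db(e_prev=0, e_curr=1, step_db=6.0, frame_len=960, trans_len=384)
```

The curve starts at the attenuation of the preceding frame, rises smoothly during the transitory portion and stays constant afterwards. When both gain parameters are equal, the scaling factor is zero and the curve is flat.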
  • each gain transition function has a transitory portion that begins at sample 0, which may correspond to the beginning of the current frame, with a gain of 0 dB, where 0 dB is the gain parameter of the preceding frame (e.g., the j-1 th frame).
  • the transitory portion of each gain transition function changes over the course of about 384 samples to the steady-state portion of the gain transition function.
  • the steady-state portion corresponds to a different gain transition step size for the j th frame, with an increase in (negative) gain of 6 dB, 12 dB, and 18 dB, respectively, relative to the gain of the preceding frame.
  • the transitory portion is of the same length (e.g., about 384 samples).
  • the length of the steady-state portion may correspond to an offset related to the delay introduced by the codec, e.g., 12 milliseconds in the example shown in Figure 3A.
  • the length of the transitory portion may be related to the reciprocal of the offset.
  • the length of the transitory portion is the frame length (e.g., 20 milliseconds) minus the codec delay (e.g., 12 milliseconds).
  • the codec delay may be the overall coder algorithmic delay excluding the frame size delay.
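The lengths quoted above can be checked with a short calculation. The 48 kHz sample rate is an assumption chosen because it reproduces the figure of about 384 samples; the function name is hypothetical.

```python
def portion_lengths(sample_rate_hz, frame_ms, codec_delay_ms):
    """Return (transitory, steady_state, frame) lengths in samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    steady_len = sample_rate_hz * codec_delay_ms // 1000
    return frame_len - steady_len, steady_len, frame_len

# 20 ms frame, 12 ms codec delay, assumed 48 kHz sampling.
print(portion_lengths(48000, 20, 12))
```

Under these assumptions the transitory portion spans 8 ms, that is 384 of the 960 samples in a frame, and the steady-state portion covers the remaining 576 samples.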
  • gain transition functions having a transitory portion of a transitory type of “reverse fade” or “un-fade” may be represented as mirror images flipped across a horizontal line of the gain transition functions shown in Figure 3A.
  • the horizontal line may be the x-axis.
  • decoder 212 may receive, as an input, the encoded audio bitstream 208A and the metadata bitstream 210A and can reconstruct the HOA signals, e.g., for rendering, or directly render to a desired output format.
  • a core decoder 216 receives the encoded audio bitstream 208A.
  • core decoder 216 may receive information 214A extracted from metadata bitstream 210A by side information decoder 214.
  • the core decoder 216 may decode the encoded audio bitstream 208A based on information 214A or without any side information knowledge and output M gain adjusted downmixed channels 216A to an inverse gain control 220.
  • Side information decoder 214 further extracts gain parameters and spatial side information and transmits this information 214B to inverse gain control 220 and spatial synthesis/rendering/playback block 222.
  • Inverse gain control 220 then may obtain the gain parameters that were applied by encoder 202 from information 214B.
  • inverse gain control 220 may retrieve the gain transition step size, and/or an indication of an arithmetic factor related to the gain transition step size, applied by encoder 202 from information 214B. Additionally, inverse gain control block 220 may retrieve, e.g., from memory, the shape of the transition function, i.e., a shape of the prototype function p, which is also referred to as the smoothing function. Inverse gain control block 220 may then reverse the gain applied by encoder 202 using the obtained gain parameters and output M downmixed channels 220A.
  • inverse gain control 220 may construct an inverse gain transition function that transitions from the gain parameter of the preceding frame to the gain parameter of the current frame.
  • the inverse gain transition function may be the gain transition function applied by encoder 202 mirrored across a center vertical line and vertically adjusted.
  • the vertical line may be the y-axis.
  • Turning to Figure 3B, an example of an inverse gain transition function that would be applied by a decoder responsive to the gain transition function shown in Figure 3A being applied by an encoder is shown in accordance with some implementations. As illustrated, the inverse gain transition function has a steady-state portion and a transitory portion.
  • the durations of the steady-state portions and the transitory portions of the inverse gain transition function may correspond to, e.g., be the same as, the durations of the corresponding steady- state portions and transitory portions of the gain transition function, as illustrated in Figures 3A and 3B.
  • each inverse gain transition function shown in Figure 3B begins at 0 dB and transitions to the inverse gain for the current frame. That is, each inverse gain transition function begins at 0 dB, corresponding to the inverse gain applied to the preceding frame j-1.
  • the inverse gain applied by the decoder corresponds to an amplification with a gain of greater than 0 dB as shown in the gain transition function of Figure 3B.
  • the inverse gain applied by the decoder corresponds to an attenuation, e.g., with a gain of less than 0 dB.
  • the M downmix channels with inverse gain applied 220A are provided to a spatial synthesis/rendering/playback block 222.
  • Spatial synthesis/rendering/playback block 222 may reconstruct the HOA signals using information 214B.
  • spatial analysis block 204 utilizes SPAR techniques for spatial encoding
  • spatial synthesis/rendering/playback block 222 may utilize SPAR techniques to reconstruct one or more channels which were encoded using metadata 210A.
  • the reconstructed HOA output may then be rendered directly or provided to another entity for rendering.
  • Spatial synthesis/rendering/playback block 222 may include, for example, various algorithms for rendering the reconstructed HOA output, e.g., as rendered audio data.
  • rendering the reconstructed HOA output may involve distributing the one or more signals of the HOA output across multiple speakers to achieve a particular perceptual impression.
  • spatial synthesis/rendering/playback block 222 may include audio playback devices, e.g., one or more loudspeakers, headphones, etc., for presenting the rendered audio data.
  • Figure 4 shows an example of a process 400 for determining gain parameters and applying gain to downmixed signals according to the determined gain parameters in accordance with some implementations. In some implementations, blocks of process 400 may be performed by an encoder device.
  • blocks of process 400 may be performed in an order other than what is shown in Figure 4. In some implementations, two or more blocks of process 400 may be performed substantially in parallel. In some implementations, one or more blocks of process 400 may be omitted.
  • process 400 may obtain downmixed audio signal(s) associated with a frame of an audio signal to be encoded.
  • process 400 may use any suitable spatial encoding technique to determine a set of downmixed channels. Examples of spatial encoding techniques include SPAR, a linear predictive technique, or the like.
  • the set of downmixed channels may include anywhere from one to N channels, where N is the number of input channels, e.g., in the case of FOA signals, N is 4.
  • the downmixed signals may include audio signals corresponding to the downmixed channels for a particular frame of the audio signal.
  • process 400 may determine whether an overload condition exists for a codec, such as for the Enhanced Voice Services (EVS) codec, and/or for any other suitable codec. For example, process 400 may determine that an overload condition exists responsive to determining that signals corresponding to a frame of the downmix audio signal(s) exceed a predetermined range, e.g., [-1, 1), and/or any other suitable range.
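  • The overload check described above can be sketched as a simple range test on the samples of a frame. The function name and default bounds are illustrative; the half-open range [-1, 1) is the example range given above, and a real codec may expect a different range.

```python
from typing import Sequence


def has_overload(frame: Sequence[float], lo: float = -1.0, hi: float = 1.0) -> bool:
    """Return True if any sample falls outside the half-open range [lo, hi)."""
    return any(not (lo <= s < hi) for s in frame)


print(has_overload([0.5, -1.0, 0.99]))  # all samples lie inside [-1, 1)
print(has_overload([0.2, 1.0]))         # 1.0 itself lies outside the half-open range
```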
  • process 400 can proceed to 412 and can encode the downmixed signals. For example, in some implementations, process 400 can generate a bitstream that encodes the downmixed signals in connection with side information, such as metadata, that can be utilized by a decoder to upmix the downmixed signals, e.g., to reconstruct a FOA or HOA output.
  • process 400 can proceed to 406 and can determine a gain transition function for the frame that causes the overload condition to be avoided or, if the change of the overload condition from one frame to the next is larger than the gain transition step size, at least reduced. Further, at 406, the gain transition function may be based on the gain transition step size. Additionally, the gain transition function may be based on a shape of a smoothing function.
  • the gain transition function may have a transitory portion and a steady-state portion, where the steady-state portion corresponds to the gain factor for the current frame, and the transitory portion corresponds to a sequence of intermediate gain factors for a subset of samples of the current frame that transition from the gain factor at the end of the preceding frame to the gain factor of the preceding frame adjusted by the gain transition step size.
  • in instances in which the gain parameter of the current frame corresponds to more attenuation than the gain parameter of the preceding frame, the transitory portion may be referred to as having a transitory type of “fade.” Conversely, in instances in which the gain parameter of the preceding frame corresponds to more attenuation than the gain parameter of the current frame, the transitory portion may be referred to as having a transitory type of “reverse fade” or “un-fade.” In instances in which the gain parameter of the preceding frame is the same as the gain parameter of the current frame, the transitory portion may be referred to as having a transitory type of “hold.”
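  • The three transitory types can be illustrated with a small classifier. This is a sketch under the assumption that gain parameters are expressed in dB, with more negative values meaning more attenuation; the function name is hypothetical.

```python
def transitory_type(prev_gain_db: float, cur_gain_db: float) -> str:
    """Classify the transition between the gains of consecutive frames.

    Assumes gains in dB, where a more negative value means more attenuation.
    """
    if cur_gain_db < prev_gain_db:
        return "fade"      # current frame is attenuated more than the preceding one
    if cur_gain_db > prev_gain_db:
        return "un-fade"   # preceding frame was attenuated more ("reverse fade")
    return "hold"          # same gain for both frames


print(transitory_type(0.0, -1.0))
print(transitory_type(-2.0, -1.0))
print(transitory_type(-1.0, -1.0))
```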
  • process 400 may apply the gain transition function to the downmixed signals associated with the frame. For example, in some implementations, process 400 may scale the samples of the downmixed signals by gain factors indicated by the gain transition function.
  • a first sample of the current frame may be scaled by a gain factor corresponding to the gain parameter of the preceding frame
  • a last sample of the current frame may be scaled by a gain factor corresponding to the gain parameter of the previous frame adjusted by the gain transition step size
  • intervening samples may be scaled by gain factors corresponding to the gain parameters of the transitory or steady-state portions of the gain transition function.
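  • The scaling described above can be sketched as follows. The raised-cosine shape of the transitory portion, the dB-domain interpolation and all function names are assumptions made for illustration; the actual prototype/smoothing function is not specified here. The sketch shows a "fade" by one step, where the first sample keeps the preceding frame's gain and the last sample carries that gain adjusted by the step size.

```python
import math


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def gain_transition(prev_gain_db: float, step_db: float,
                    frame_len: int, trans_len: int) -> list:
    """Per-sample linear gains: a transitory portion followed by a steady state."""
    target_db = prev_gain_db - step_db  # fade: attenuate by one step
    gains = []
    for n in range(frame_len):
        if n < trans_len:
            # Assumed smoothing shape: raised cosine rising from 0 to 1.
            w = 0.5 * (1.0 - math.cos(math.pi * n / trans_len))
        else:
            w = 1.0  # steady-state portion
        gains.append(db_to_linear(prev_gain_db + w * (target_db - prev_gain_db)))
    return gains


def apply_gain(samples, gains):
    """Scale each sample by the gain factor at the same index."""
    return [s * g for s, g in zip(samples, gains)]


gains = gain_transition(prev_gain_db=0.0, step_db=1.0, frame_len=8, trans_len=4)
scaled = apply_gain([1.2] * 8, gains)
```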
  • the gain transition function may be applied to only the downmixed signals of the downmix channels for which the overload condition was detected at block 404.
  • the gain transition function may not be applied to the W’ and Z’ channels.
  • indications of the channels to which gain transition functions are applied, as well as the corresponding gain parameters for each channel may be encoded, e.g., at block 412.
  • the corresponding gain transition function may be applied to all downmix channels.
  • process 400 may provide the attenuated signal and information indicative of the gain transition function to an encoder for encoding.
  • Information indicative of the gain transition function may be the gain transition step size and/or an arithmetic factor related to the gain transition step size. Additionally, the shape of the smoothing function may be provided to the encoder for encoding.
  • process 400 can encode the downmixed signals and, if gain was applied, information indicative of the gain parameter(s) for the frame.
  • the encoded downmixed signals may be the downmixed signals after application of the gain transition function at block 408.
  • the downmixed signals and any information indicative of gain parameters may be encoded by a codec to generate an encoded bitstream, such as the EVS codec, or the like, in connection with any side information, such as metadata, that may be used by a decoder to reconstruct or upmix the downmixed signals.
  • the encoded bitstream together with the metadata may then be stored and/or transmitted to a receiving device with the ability to reverse the processing steps of the encoder.
  • process 400 can encode the gain parameters in a set of bits.
  • the gain transition function may indicate a prototype/smoothing function associated with the transitory portion of the gain transition function.
  • a total number of bits used to transmit gain control information is Ndmx plus x bits for each channel for which gain control has been enabled, where Ndmx represents the number of downmix channels (a single bit is utilized to indicate, for each of the Ndmx channels, whether gain control is enabled).
  • Ndmx bits may be used to indicate that gain control is not enabled, e.g., 1 bit for each of the Ndmx channels.
  • the total number of bits used to transmit gain control information is represented by x multiplied by the number of channels for which gain control has been enabled.
  • the number of bits used is x.
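  • The bit count for per-channel gain control signalling can be sketched as below. The function name is hypothetical, and x is taken here to be the number of bits spent per gain-controlled channel, which is an interpretation of the text above rather than a confirmed definition.

```python
from typing import Sequence


def gain_control_bits(enabled: Sequence[bool], x: int) -> int:
    """One flag bit per downmix channel plus x bits per gain-controlled channel."""
    n_dmx = len(enabled)                          # number of downmix channels
    n_enabled = sum(1 for e in enabled if e)      # channels with gain control enabled
    return n_dmx + x * n_enabled


# Four downmix channels, gain control enabled on two of them, assumed x = 3.
print(gain_control_bits([True, False, False, True], x=3))
```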
  • FIG. 5 shows an example of a process 500 for obtaining gain parameters utilized by an encoder and applying an inverse gain transition function based on the obtained gain parameters in accordance with some implementations.
  • blocks of process 500 may be performed by a decoder device.
  • blocks of process 500 may be performed in an order other than what is shown in Figure 5.
  • two or more blocks of process 500 may be performed substantially in parallel.
  • one or more blocks of process 500 may be omitted.
  • Process 500 may begin at 502 by receiving an encoded frame of an audio signal.
  • the received frame (e.g., the current frame) is generally referred to herein as the j-th frame.
  • the received frame may be immediately after a previously received frame, or may be a frame that is not immediately after a previously received frame.
  • process 500 can decode the encoded frame of the audio signal to obtain downmixed signals, and, if gain control was applied by the encoder, information indicative of gain control applied to the current frame.
  • Information indicative of gain control applied to the current frame may be the gain transition step size applied by an encoder. Additionally, information indicative of gain control applied to the current frame may be a shape of a smoothing function of a gain transition function applied by an encoder.
  • process 500 may additionally identify which downmix channels gain control was applied to.
  • At 506, process 500 may determine an inverse gain transition function based on the gain transition step size. In some implementations, process 500 may further determine the inverse gain transition function based on the shape of the smoothing function. The inverse gain transition function may be calculated based on the gain transition function, or it may be chosen from a number of predefined inverse gain transition functions.
  • In some implementations, process 500 may determine the inverse gain transition function to be the inverse of the gain transition function applied at the encoder. For example, the inverse gain transition function may correspond to the gain transition function mirrored across a horizontal line and adjusted.
  • Mirroring and adjustment may be along the x-axis.
  • An example of such an inverse gain transition function is shown in and described above in connection with Figure 3B.
  • the inverse gain transition function may have a steady-state portion that corresponds to the gain applied to the preceding frame.
  • the inverse gain transition function may then have a transitory portion that is the inverse of the transitory portion of the gain transition function applied at the encoder.
  • the inverse gain transition function may have a transitory portion that transitions from less amplification to more amplification.
  • the inverse gain transition function may have a transitory portion that transitions from more amplification to less amplification.
  • a duration of the transitory portion may relate to the delay introduced by the codec, where the duration of the transitory portion is the frame length (e.g., 20 milliseconds) minus the codec delay (e.g., 12 milliseconds). Note that, in instances in which the delay introduced by the codec is longer than a frame length, the inverse gain transition may be applied with a delay of one frame. In some instances, the delay may be obtained by process 500 (e.g., by the decoder) from the gain control bits.
  • the inverse gain transition function may also serve to attenuate signals that were amplified by the gain control of the encoder.
  • process 500 may apply the inverse gain transition function to the downmixed signals to reverse the gain applied by the encoder.
  • application of the inverse gain transition function may cause downmixed signals that were attenuated by the encoder to be amplified to reverse the attenuation.
  • application of the inverse gain transition function may cause downmixed signals that were amplified by the encoder to be attenuated to reverse the amplification.
  • the output of step 508 may then be M downmix channels with the same gain as the M downmix channels after step 402 of process 400.
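  • The reversal of encoder-side gain at the decoder can be illustrated with a round trip. This sketch assumes linear per-sample gain factors and ignores codec quantization and codec delay; the gain values and function names are hypothetical.

```python
import math


def inverse_gains(encoder_gains):
    """Decoder-side gains: the reciprocal of each encoder gain factor."""
    return [1.0 / g for g in encoder_gains]


def scale(samples, gains):
    """Scale each sample by the gain factor at the same index."""
    return [s * g for s, g in zip(samples, gains)]


original = [0.9, -1.4, 1.2, 0.3]
enc_gains = [1.0, 0.95, 0.9, 0.89]          # hypothetical fade within one frame
attenuated = scale(original, enc_gains)     # what the core codec would see
restored = scale(attenuated, inverse_gains(enc_gains))
```

  • In this idealized setting the restored samples match the original samples up to floating-point rounding, which mirrors the statement that the output of step 508 has the same gain as the downmix channels before gain control.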
  • process 500 can upmix the downmixed signals. Upmixing may be performed by a spatial encoder. In some examples, the spatial encoder may utilize SPAR techniques. The upmixed signals may correspond to a reconstructed FOA or HOA audio signal. In some implementations, process 500 may upmix the signals using side information, e.g., metadata, encoded in the bitstream, where the side information may be utilized to reconstruct parametrically-encoded signals. In some implementations, block 510 may be optional, e.g., when the downmixed signals can be rendered directly.
  • In some implementations, at 512, process 500 may render the upmixed signals to generate rendered audio data.
  • process 500 may utilize any suitable rendering algorithms to render a FOA or HOA audio signal, e.g., to render scene-based audio data.
  • rendered audio data may be stored in any suitable format, e.g., for future presentation or playback.
  • block 512 is optional and therefore may be omitted.
  • process 500 may cause the rendered audio data to be played back.
  • the rendered audio data may be presented via one or more of loudspeakers and/or headphones.
  • multiple loudspeakers may be utilized, and the multiple loudspeakers may be positioned in any suitable positions or orientations relative to each other in three dimensions.
  • block 514 is optional and therefore may be omitted.
  • gain control information e.g., information indicative of gain parameters
  • different gain transition functions may be determined for each downmix channel for which an overload condition is detected.
  • gain control bits are needed to indicate whether or not gain control is being applied to each of the downmix channels, and gain transition function parameters are encoded for each of the downmix channels for which gain control is applied, as described above in connection with Figure 4.
  • a single gain transition function that is determined based on one downmix channel for which an overload condition exists may be applied to all of the downmix channels.
  • a more bitrate-efficient encoding, achieved by applying the same gain transition function to all downmix channels, including downmix channels for which no overload condition exists, may degrade perceptual quality by, for example, attenuating signals that do not overload the codec.
  • utilizing a more targeted gain control, in which gain control is applied individually to each downmix channel, may require more bits to transmit gain control information.
  • gain control information may require re-allocation of bits typically used to waveform encode the downmix channels, which may in some cases reduce perceptual quality. Accordingly, there may be a situation-dependent tradeoff between applying the same gain transition function to all downmix channels and applying channel-specific gain control.
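  • The two strategies discussed above can be contrasted in a short sketch. The function names, the dB representation of the gains and the example values are assumptions for illustration; the sketch only shows which channels end up attenuated under each strategy.

```python
def per_channel_gains_db(overloaded, gain_db):
    """Targeted control: attenuate only channels with a detected overload."""
    return [gain_db if o else 0.0 for o in overloaded]


def shared_gains_db(overloaded, gain_db):
    """Shared control: one gain for all channels if any channel overloads."""
    g = gain_db if any(overloaded) else 0.0
    return [g] * len(overloaded)


overloaded = [False, True, False, False]    # hypothetical 4-channel downmix
targeted = per_channel_gains_db(overloaded, -1.0)
shared = shared_gains_db(overloaded, -1.0)

# Channels attenuated by the shared strategy although they had no overload.
needless = sum(1 for o, g in zip(overloaded, shared) if not o and g != 0.0)
print(targeted, shared, needless)
```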
  • FIG. 6 illustrates example use cases for an IVAS system 600, according to an embodiment.
  • various devices communicate through call server 602 that is configured to receive audio signals from, for example, a public switched telephone network (PSTN) or a public land mobile network device (PLMN) illustrated by PSTN/OTHER PLMN 604.
  • Use cases support legacy devices 606 that render and capture audio in mono only, including but not limited to devices that support enhanced voice services (EVS), adaptive multi-rate wideband (AMR-WB), and adaptive multi-rate narrowband (AMR-NB).
  • Use cases also support user equipment (UE) 608 and/or 614 that captures and renders stereo audio signals, or UE 610 that captures and binaurally renders mono signals into multi-channel signals.
  • Use cases also support immersive and stereo signals captured and rendered by video conference room systems 616 and/or 618, respectively.
  • Use cases also support stereo capture and immersive rendering of stereo audio signals for home theatre systems 620, and computer 612 for mono capture and immersive rendering of audio signals for virtual reality (VR) gear 622 and immersive content ingest 624.
  • Figure 7 is a block diagram that shows examples of components of an apparatus capable of implementing various aspects of this disclosure. As with other figures provided herein, the types and numbers of elements shown in Figure 7 are merely provided by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, the apparatus 700 may be configured for performing at least some of the methods disclosed herein. In some implementations, the apparatus 700 may be, or may include, a television, one or more components of an audio system, a mobile device (such as a cellular telephone), a laptop computer, a tablet device, a smart speaker, or another type of device.
  • According to some alternative implementations the apparatus 700 may be, or may include, a server.
  • the apparatus 700 may be, or may include, an encoder. Accordingly, in some instances the apparatus 700 may be a device that is configured for use within an audio environment, such as a home audio environment, whereas in other instances the apparatus 700 may be a device that is configured for use in “the cloud,” e.g., a server.
  • the apparatus 700 includes an interface system 705 and a control system 710.
  • the interface system 705 may, in some implementations, be configured for communication with one or more other devices of an audio environment.
  • the audio environment may, in some examples, be a home audio environment.
  • the audio environment may be another type of environment, such as an office environment, an automobile environment, a train environment, a street or sidewalk environment, a park environment, etc.
  • the interface system 705 may, in some implementations, be configured for exchanging control information and associated data with audio devices of the audio environment.
  • the control information and associated data may, in some examples, pertain to one or more software applications that the apparatus 700 is executing.
  • the interface system 705 may, in some implementations, be configured for receiving, or for providing, a content stream.
  • the content stream may include audio data.
  • the audio data may include, but may not be limited to, audio signals.
  • the audio data may include spatial data, such as channel data and/or spatial metadata.
  • the content stream may include video data and audio data corresponding to the video data.
  • the interface system 705 may include one or more network interfaces and/or one or more external device interfaces, such as one or more universal serial bus (USB) interfaces. According to some implementations, the interface system 705 may include one or more wireless interfaces.
  • the interface system 705 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system and/or a gesture sensor system.
  • the interface system 705 may include one or more interfaces between the control system 710 and a memory system, such as the optional memory system 715 shown in Figure 7. However, the control system 710 may include a memory system in some instances.
  • the interface system 705 may, in some implementations, be configured for receiving input from one or more microphones in an environment.
  • the control system 710 may, for example, include a general purpose single- or multi- chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
  • the control system 710 may reside in more than one device.
  • a portion of the control system 710 may reside in a device within one of the environments depicted herein and another portion of the control system 710 may reside in a device that is outside the environment, such as a server, a mobile device (e.g., a smartphone or a tablet computer), etc.
  • a portion of the control system 710 may reside in a device within one environment and another portion of the control system 710 may reside in one or more other devices of the environment.
  • control system 710 may reside in a device that is implementing a cloud-based service, such as a server, and another portion of the control system 710 may reside in another device that is implementing the cloud-based service, such as another server, a memory device, etc.
  • the interface system 705 also may, in some examples, reside in more than one device.
  • the control system 710 may be configured for performing, at least in part, the methods disclosed herein.
  • the control system 710 may be configured for implementing methods of determining gain parameters, applying gain transition functions, determining inverse gain transition functions, applying inverse gain transition functions, distributing bits for gain control with respect to a bitstream, or the like.
  • Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media.
  • Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc.
  • the one or more non-transitory media may, for example, reside in the optional memory system 715 shown in Figure 7 and/or in the control system 710. Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in one or more non-transitory media having software stored thereon.
  • the software may, for example, include instructions for determining gain parameters, applying gain transition functions, determining inverse gain transition functions, applying inverse gain transition functions, distributing bits for gain control with respect to a bitstream, etc.
  • the software may, for example, be executable by one or more components of a control system such as the control system 710 of Figure 7.
  • the apparatus 700 may include the optional microphone system 720 shown in Figure 7.
  • the optional microphone system 720 may include one or more microphones.
  • one or more of the microphones may be part of, or associated with, another device, such as a speaker of the speaker system, a smart audio device, etc.
  • the apparatus 700 may not include a microphone system 720.
  • the apparatus 700 may nonetheless be configured to receive microphone data for one or more microphones in an audio environment via the interface system 705.
  • a cloud-based implementation of the apparatus 700 may be configured to receive microphone data, or a noise metric corresponding at least in part to the microphone data, from one or more microphones in an audio environment via the interface system 705.
  • the apparatus 700 may include the optional loudspeaker system 725 shown in Figure 7.
  • the optional loudspeaker system 725 may include one or more loudspeakers, which also may be referred to herein as “speakers” or, more generally, as “audio reproduction transducers.” In some examples, e.g., cloud-based implementations, the apparatus 700 may not include a loudspeaker system 725. In some implementations, the apparatus 700 may include headphones. Headphones may be connected or coupled to the apparatus 700 via a headphone jack or via a wireless connection, e.g., BLUETOOTH.
  • Figures 8A and 8B illustrate example implementations of the perceptually motivated gain control, in which a sample-uniform gain control with a gain transition step size of 1 dB is applied at the encoder side.
  • one frame consists of 1024 samples. Sample amplitudes are represented as dotted lines, while the gain applied per sample is represented by a solid line.
  • the gain function transitions from no attenuation (0 dB) to an attenuation of 0 dB - 1 dB, i.e., one gain transition step.
  • a further attenuation by the gain transition step size is introduced when the input audio signal exceeds 1 dB.
  • the resulting attenuated downmixed audio signal is depicted in Fig. 8B.
  • the amplitudes of the samples are depicted by dotted lines, while the gain function is depicted as a solid line.
  • the automatic gain control can react to overloads caused at the encoder by attenuating the signal with increasing values at each frame.
  • the gain transition step size is not large enough to bring all samples below the required threshold (0 dB). This may lead to distortions when the audio signal is rendered at the decoder, but the distortions caused by the overload at the encoder are less noticeable than the distortions caused by very sudden gain changes.
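  • The step-limited behavior described for Figures 8A and 8B can be sketched as a per-frame simulation. The function name, the use of per-frame peak levels in dB and the example values are assumptions; the sketch only models attenuation by one step per overloaded frame and does not model any later gain recovery.

```python
def step_limited_gains_db(frame_peaks_db, step_db=1.0, threshold_db=0.0):
    """Gain (in dB) in effect at the end of each frame.

    The gain is lowered by at most one step per frame while the peak, after
    applying the gain carried over from the preceding frame, exceeds the threshold.
    """
    gain_db = 0.0
    gains = []
    for peak_db in frame_peaks_db:
        if peak_db + gain_db > threshold_db:
            gain_db -= step_db  # attenuate by one more step in this frame
        gains.append(gain_db)
    return gains


# Hypothetical input: a quiet frame followed by frames peaking 2.5 dB over range.
peaks = [-3.0, 2.5, 2.5, 2.5, 2.5]
gains = step_limited_gains_db(peaks)
residual = [p + g for p, g in zip(peaks, gains)]
print(gains, residual)
```

  • With a 1 dB step, the first overloaded frame still peaks 1.5 dB above the threshold after attenuation, and the overload is only removed after three consecutive steps. This is the residual overload that the text above accepts in exchange for avoiding sudden gain changes.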
  • Some aspects of the present disclosure include a system or device configured, e.g., programmed, to perform one or more examples of the disclosed methods, and a tangible computer readable medium, e.g., a disc, which stores code for implementing one or more examples of the disclosed methods or steps thereof.
  • some disclosed systems can be or include a programmable general purpose processor, digital signal processor, or microprocessor, programmed with software or firmware and/or otherwise configured to perform any of a variety of operations on data, including an embodiment of disclosed methods or steps thereof.
  • Such a general purpose processor may be or include a computer system including an input device, a memory, and a processing subsystem that is programmed (and/or otherwise configured) to perform one or more examples of the disclosed methods (or steps thereof) in response to data asserted thereto.
  • Some embodiments may be implemented as a configurable (e.g., programmable) digital signal processor (DSP) that is configured (e.g., programmed and otherwise configured) to perform required processing on audio signal(s), including performance of one or more examples of the disclosed methods.
  • embodiments of the disclosed systems may be implemented as a general purpose processor, e.g., a personal computer (PC) or other computer system or microprocessor, which may include an input device and a memory, which is programmed with software or firmware and/or otherwise configured to perform any of a variety of operations including one or more examples of the disclosed methods.
  • the other elements may include one or more loudspeakers and/or one or more microphones.
  • a general purpose processor configured to perform one or more examples of the disclosed methods may be coupled to an input device.
  • Examples of input devices include, e.g., a mouse and/or a keyboard.
  • the general purpose processor may be coupled to a memory, a display device, etc.
  • Another aspect of the present disclosure is a computer readable medium, such as a disc or other tangible storage medium, which stores code for performing, e.g., code executable to perform, one or more examples of the disclosed methods or steps thereof.
  • EEE1. A method of performing gain control on audio signals, the method comprising: obtaining a downmixed audio signal of an audio signal to be encoded; determining that an overload condition has occurred for a frame of the downmixed audio signal; responsive to determining that the overload condition has occurred, determining a gain transition function for the frame, wherein the gain transition function is based at least on a gain transition step size; applying the gain transition function to the frame to generate a gain adjusted frame of the downmixed audio signal; and providing the gain adjusted frame and information indicative of the gain transition function for encoding by an encoder.
  • EEE2 The method of claim EEE1, wherein the method further comprises: encoding the gain adjusted frame together with the information indicative of the gain transition function.
  • obtaining a downmixed audio signal of an audio signal to be encoded comprises: receiving the downmixed audio signal; or determining the downmixed audio signal from the audio signal to be encoded.
  • EEE4 The method of any previous claim, wherein the audio signal is a higher order ambisonics, HOA, audio signal.
  • EEE5. The method of any previous claim, wherein the downmixed audio signal is a spatially encoded downmixed signal.
  • EEE6 The method of any previous claim, wherein the overload condition is a condition in which the frame of the downmixed audio signal exceeds a predefined signal range.
  • EEE7 The method of EEE 6, wherein the predefined signal range is a signal range expected by the encoder.
  • EEE8 The method of any previous claim, wherein the frame of the downmixed audio signal is a current frame and the gain transition function is further based on a previous gain transition function applied to a preceding frame of the current frame.
  • EEE9. The method of any previous claim, wherein the gain transition function further depends on a smoothing function based on the gain transition step size.
  • EEE10. The method of EEE 8, wherein the gain transition function comprises a transitory portion and a steady-state portion, and wherein the transitory portion corresponds to a transition from a gain associated with the preceding frame to the gain associated with the preceding frame adjusted by the gain transition step size.
  • EEE11 The method of EEE 10, wherein the gain associated with the preceding frame adjusted by the gain transition step size is an attenuation by the gain transition step size or an amplification by the gain transition step size of the gain associated with the preceding frame depending on a gain adjustment target of the current frame.
  • EEE12 The method of EEEs 10 or 11, wherein a length of the transitory portion is limited by a delay introduced by a codec utilized by the encoder.
  • EEE13 The method of EEE 12, wherein the length of the transitory portion is equal to or less than the number of samples used for an encoding operation by the encoder.
  • EEE14. The method of any one of EEEs 10 to 13, wherein a length of the transitory portion is greater than 1 sample.
  • EEE15 The method of any one of EEEs 10 to 13, wherein a length of the transitory portion is greater than 1 sample.
  • the gain transition function is defined by a formula whose symbols did not survive text extraction; it is evaluated over a sample index (apparently running from 0 to L − 1) and uses a smoothing function together with the right-most index for which that smoothing function is defined, where L is the number of samples of one frame.
  • the gain transition step size is a predefined value.
  • the gain transition step size is determined from a set of predefined values of increasing size.
  • EEE18. The method of EEE 17, wherein the method further comprises: determining an overload amount caused by the frame of the downmixed audio signal; determining the gain transition step size from the set of predefined values of increasing size depending on the overload amount.
  • EEE19. The method of any previous claim, wherein the gain transition step size is determined based on a perceptive quality listening test or an objective quality measurement.
  • applying the gain transition function to the frame for generating a gain adjusted frame of the downmixed signal comprises: applying the gain transition function to samples of the downmixed audio signal, wherein a total number of the samples corresponds to the frame of the downmixed audio signal.
  • the method of EEE 2 or any one of EEEs 3 to 21 when depending on claim 2, wherein encoding the gain adjusted frame together with the information indicative of the gain transition function comprises: determining an encoding scheme based on the gain transition function.
  • determining an encoding scheme based on the gain transition function comprises: determining the encoding scheme based on the gain transition step size.
  • determining an encoding scheme based on the gain transition function comprises: determining the encoding scheme based on whether the gain transition function was able to remove the overload condition.
  • EEE25 The method of any one of EEEs 22 to 24, wherein the encoding scheme is one of Modified Discrete Cosine Transformation, MDCT, or Algebraic Code Excited Linear Prediction, ACELP.
  • EEE26 The method of any previous claim, wherein the gain adjusted frame is an attenuated frame or an amplified frame.
  • EEE27. A method of performing gain control on audio signals comprising: receiving, at a decoder, an encoded frame of an audio signal; decoding the encoded frame of an audio signal to obtain a frame of a downmixed audio signal and information indicative of gain control applied by an encoder; determining an inverse gain transition function to be applied to the frame of the downmixed audio signal based at least in part on the information indicative of gain control applied by the encoder, wherein the information indicative of gain control applied by the encoder comprises a gain transition step size; and applying the inverse gain transition function to the frame of the downmixed audio signal.
  • the method of any one of EEEs 27 to 30, wherein the information indicative of gain control applied by the encoder further comprises information indicative of a smoothing function.
  • EEE33 The method of any one of EEEs 27 to 32, wherein the inverse gain transition function comprises a transitory portion and a steady-state portion.
  • EEE34. The method of EEE 33, wherein a length of the transitory portion is limited by a delay introduced by a codec utilized by the decoder.
  • EEE35. An apparatus configured for implementing the method of any one of EEEs 1-34.
  • EEE36. A program comprising instructions that when executed by a processing device cause the processing device to carry out the method according to any one of EEEs 1-34.
  • EEE37 A storage medium storing the program of EEE 36.
  • a method for performing gain control on audio signals comprising: receiving, by an automatic gain control system, a spatially encoded downmix audio signal; determining that an overload condition occurred for one or more frames of the received signal; responsive to the overload condition, generating an attenuated signal by applying a gain function to the received signal to attenuate the overload, the gain function being dependent on (1) an attenuation level parameter, (2) a gain function shape that specifies a respective attenuation level for each of the one or more frames, or (3) a combination of the attenuation level parameter and the gain function shape; and providing the attenuated signal and a representation of the attenuation level parameter to a core encoder for encoding.
  • the attenuation level parameter includes a table of numbers, each number corresponding to a respective attenuation level to be consecutively applied to the one or more frames.
  • EEE40 The method of EEE 39, wherein each number has a same value, indicating that each step of attenuation attenuates the signal by a same amount.
  • EEE41 The method of EEE 39, wherein the numbers increase in value, indicating that each step of attenuation attenuates the signal by an amount that is higher than a previous step.
  • EEE42 The method of any of EEEs 38-41, comprising steering the core encoder to encode the audio signal using different encoding schemes based on the attenuation level parameter.
  • EEE43 The method of any of EEEs 38-41, comprising steering the core encoder to encode the audio signal using different encoding schemes based on the attenuation level parameter.
  • EEE44 An apparatus configured for implementing the method of any one of EEEs 38-43.
  • EEE45 One or more non-transitory media having software stored thereon, the software including instructions for controlling one or more devices to perform the method of any one of EEEs 38-43.
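
The encoder-side steps (overload detection, step-size selection, gain transition with a transitory and a steady-state portion) and the decoder-side inversion described in the EEEs above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the step-size set `STEP_SIZES_DB`, the raised-cosine smoothing function, the interpolation in the dB domain, the signal range `limit`, the ramp length, and every function name are assumptions chosen only for illustration.

```python
import math

# All names and constants below are hypothetical; none come from the source.
STEP_SIZES_DB = (3.0, 6.0, 12.0)  # assumed set of predefined steps of increasing size


def db_to_linear(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def smoothing(n, ramp_len):
    """Assumed raised-cosine ramp: 0 at n = 0, 1 at n = ramp_len - 1."""
    if ramp_len <= 1:
        return 1.0
    return 0.5 * (1.0 - math.cos(math.pi * n / (ramp_len - 1)))


def pick_step_db(overload_db):
    """Smallest predefined step covering the overload, else the largest step."""
    for step in STEP_SIZES_DB:
        if step >= overload_db:
            return step
    return STEP_SIZES_DB[-1]


def gain_transition(prev_gain_db, step_db, frame_len, ramp_len):
    """Per-sample linear gains: a transitory portion, then a steady-state portion."""
    gains = []
    for n in range(frame_len):
        w = smoothing(n, ramp_len) if n < ramp_len else 1.0
        gains.append(db_to_linear(prev_gain_db - w * step_db))
    return gains


def encoder_gain_control(frame, prev_gain_db=0.0, limit=1.0, ramp_len=4):
    """Detect overload, pick a step, and return (adjusted frame, step in dB)."""
    peak = max(abs(x) for x in frame) * db_to_linear(prev_gain_db)
    step_db = 0.0
    if peak > limit:  # overload: the frame exceeds the expected signal range
        step_db = pick_step_db(20.0 * math.log10(peak / limit))
    gains = gain_transition(prev_gain_db, step_db, len(frame), ramp_len)
    return [x * g for x, g in zip(frame, gains)], step_db


def decoder_inverse_gain(adjusted, step_db, prev_gain_db=0.0, ramp_len=4):
    """Rebuild the same gains from the signalled step and divide them out."""
    gains = gain_transition(prev_gain_db, step_db, len(adjusted), ramp_len)
    return [y / g for y, g in zip(adjusted, gains)]


frame = [0.5, -1.8, 0.9, 1.2, 0.1, -0.4, 0.3, 0.2]
adjusted, step_db = encoder_gain_control(frame)
restored = decoder_inverse_gain(adjusted, step_db)
```

In this sketch only the step size needs to be conveyed to the decoder, which rebuilds the gain curve and divides it out. Because the gain ramps smoothly, samples inside the transitory portion can still exceed the range after adjustment; this is the situation in which an encoder might take into account whether the gain transition function was able to remove the overload condition.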

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Systems, methods, and computer program products for performing gain control on audio signals are disclosed. An automatic gain control system obtains a downmixed audio signal of an audio signal to be encoded. The system determines that an overload condition has occurred for a frame of the downmixed audio signal. Responsive to the overload condition, the system determines a gain transition function for the frame, the gain transition function being based at least on a gain transition step size. The system applies the gain transition function to the frame to generate a gain adjusted frame of the downmixed audio signal. The system provides the gain adjusted frame and information indicative of the gain transition function for encoding by an encoder.
PCT/US2023/073365 2022-10-06 2023-09-01 Methods, apparatus and systems for performing perceptually motivated gain control WO2024076810A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263378678P 2022-10-06 2022-10-06
US63/378,678 2022-10-06
US202363503533P 2023-05-22 2023-05-22
US63/503,533 2023-05-22

Publications (1)

Publication Number Publication Date
WO2024076810A1 (fr) 2024-04-11

Family

ID=88204057

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/073365 WO2024076810A1 (fr) Methods, apparatus and systems for performing perceptually motivated gain control 2022-10-06 2023-09-01

Country Status (1)

Country Link
WO (1) WO2024076810A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022192217A1 (fr) * 2021-03-11 2022-09-15 Dolby Laboratories Licensing Corporation Audio codec with adaptive gain control of downmixed signals

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022192217A1 (fr) * 2021-03-11 2022-09-15 Dolby Laboratories Licensing Corporation Audio codec with adaptive gain control of downmixed signals

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
D. MCGRATH, S. BRUHN, H. PURNHAGEN, M. ECKERT, J. TORRES, S. BROWN, D. DARCY: "Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pages 730 - 734, XP033566263, DOI: 10.1109/ICASSP.2019.8683712

Similar Documents

Publication Publication Date Title
RU2659490C2 (ru) Concept for combined dynamic range compression and guided clipping prevention for audio devices
KR101976757B1 (ko) Multi-object audio encoding apparatus and decoding apparatus supporting a post-downmix signal
JP4809370B2 (ja) Adaptive bit allocation in multichannel audio coding
CN110648677B (zh) Loudness adjustment for downmixed audio content
JP5511136B2 (ja) Apparatus and method for generating a multichannel synthesizer control signal and apparatus and method for multichannel synthesis
EP2169666B1 (fr) Method and apparatus for processing a signal
EP3762923B1 (fr) Audio coding
US20210005211A1 (en) Using metadata to aggregate signal processing operations
US20210319799A1 (en) Spatial parameter signalling
US20240153512A1 (en) Audio codec with adaptive gain control of downmixed signals
WO2024076810A1 (fr) Methods, apparatus and systems for performing perceptually motivated gain control
TW202422318A (zh) Methods, apparatus and systems for performing perceptually motivated gain control
US20240161754A1 (en) Encoding of envelope information of an audio downmix signal
CN116982109A (zh) Audio codec with adaptive gain control of downmixed signals
KR20240047372A (ko) Method and device for limiting output synthesis distortion in a sound codec
CN116982110A (zh) Encoding of envelope information of an audio downmix signal
WO2022216542A1 (fr) Multi-band ducking of audio signals
WO2023172865A1 (fr) Methods, apparatus and systems for directional audio coding-spatial reconstruction audio processing
CN116997960A (zh) Multi-band ducking of audio signals

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23777466

Country of ref document: EP

Kind code of ref document: A1