WO2024115052A1 - Parametric spatial audio coding - Google Patents

Parametric spatial audio coding

Info

Publication number
WO2024115052A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
audio
ratio
quantized
generating
Prior art date
Application number
PCT/EP2023/080907
Other languages
English (en)
Inventor
Adriana Vasilache
Mikko-Ville Laitinen
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2024115052A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L 19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques
    • G10L 19/16 Vocoder architecture
    • G10L 19/18 Vocoders using multiple modes
    • G10L 19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G10L 2019/0001 Codebooks
    • G10L 2019/0004 Design or structure of the codebook
    • G10L 2019/0007 Codebook element generation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/15 Aspects of sound capture and related signal processing for recording or reproduction
    • H04S 2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/03 Application of parametric coding in stereophonic audio systems

Definitions

  • the present application relates to apparatus and methods for spatial audio representation and encoding, but not exclusively for audio representation for an audio encoder.
  • Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters.
  • Example parameters include directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands.
  • These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array.
  • These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
  • the directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
  • a parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can be also utilized as the spatial metadata (which may also include other parameters such as surround coherence, spread coherence, number of directions, distance etc) for an audio codec.
  • these parameters can be estimated from microphone-array captured audio signals, and for example a stereo or mono signal can be generated from the microphone array signals to be conveyed with the spatial metadata.
  • Immersive audio codecs are being implemented supporting a multitude of operating points ranging from a low bit rate operation to transparency.
  • An example of such a codec is the Immersive Voice and Audio Services (IVAS) codec which is being designed to be suitable for use over a communications network such as a 3GPP 4G/5G network including use in such immersive services as for example immersive voice and audio for virtual reality (VR).
  • This audio codec is expected to handle the encoding, decoding and rendering of speech, music and generic audio. It is furthermore expected to support channel-based audio and scene-based audio inputs including spatial information about the sound field and sound sources.
  • the codec is also expected to operate with low latency to enable conversational services as well as support high error robustness under various transmission conditions.
  • the stereo signal could be encoded, for example, with an AAC encoder and the mono signal could be encoded with an EVS encoder.
  • a decoder can decode the audio signals into PCM signals and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
  • the aforementioned immersive audio codecs are particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, stand-alone microphone arrays).
  • an encoder can have other input types, for example, loudspeaker signals, audio object signals, Ambisonic signals.
  • an apparatus for encoding an audio object parameter; the apparatus comprising means for: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • the means for generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may be for: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
  • the means for generating a single number value by appending elements from the vector may be further for transforming the elements from the vector into a base representation based on the first number of bits.
  • the means for transforming the elements from the vector into a base representation based on the first number of bits may be for transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
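  • As a worked example with illustrative values (not taken from the claims): with three-bit quantization a selected-component vector (2, 5) is appended into the base 10 number 25; with four-bit quantization the same elements would form the base 16 number 0x25 (37 in decimal), and with five-bit quantization the base 32 number 2*32 + 5 = 69.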
  • the means for generating the vector from the selection of the quantized ratio parameters may be for generating the vector from the selection of all but one of the quantized ratio parameters.
  • the means for generating the vector from the selection of all but one of the quantized ratio parameters may be for: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
  • the means for quantizing the ratio parameter with respect to the audio object using the first number of bits may be for scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
  • the first number of bits may be three and wherein the integer value may be an integer value in base ten.
  • the valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven.
  • an apparatus for decoding ratio parameters for audio objects, the apparatus comprising means for: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • the means for converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may be for: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
  • the means for regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may be for generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
  • the means for dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may be for scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
  • the first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
  • a method for an apparatus for encoding an audio object parameter; the method comprising: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • Generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may comprise: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
  • Generating a single number value by appending elements from the vector may further comprise transforming the elements from the vector into a base representation based on the first number of bits.
  • Transforming the elements from the vector into a base representation based on the first number of bits may comprise transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
  • Generating the vector from the selection of the quantized ratio parameters may comprise generating the vector from the selection of all but one of the quantized ratio parameters.
  • Generating the vector from the selection of all but one of the quantized ratio parameters may comprise: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
  • Quantizing the ratio parameter with respect to the audio object using the first number of bits may comprise scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
  • the first number of bits may be three and wherein the integer value may be an integer value in base ten.
  • the valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven.
  • a method for an apparatus for decoding ratio parameters for audio objects comprising: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • Converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may comprise: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
  • Regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may comprise generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
  • Dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may comprise scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
  • the first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
  • an apparatus for encoding an audio object parameter, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • the apparatus caused to perform generating the integer value based on the indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects may be further caused to perform: generating a single number value by appending elements from the vector; and generating the index from the single number, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value reached at the end of the iteration loop.
  • the apparatus caused to perform generating a single number value by appending elements from the vector may be further be caused to perform transforming the elements from the vector into a base representation based on the first number of bits.
  • the apparatus caused to perform transforming the elements from the vector into a base representation based on the first number of bits may be caused to perform transforming the elements into one of: a base 10 representation when the first number of bits is three; base 16 representation when the first number of bits is four; or base 32 representation when the first number of bits is five.
  • the apparatus caused to perform generating the vector from the selection of the quantized ratio parameters may be further caused to perform generating the vector from the selection of all but one of the quantized ratio parameters.
  • the apparatus caused to perform generating the vector from the selection of all but one of the quantized ratio parameters may be further caused to perform: generating a full vector from the quantized ratio parameters for the audio objects; and generating the vector from a selection of all but one of the quantized ratio parameters for the audio objects.
  • the apparatus caused to perform quantizing the ratio parameter with respect to the audio object using the first number of bits may be caused to perform scalar quantizing the ratio parameter with respect to the audio object using the first number of bits.
  • the first number of bits may be three and wherein the integer value may be an integer value in base ten.
  • the valid vector may be one in which one of: a sum of vector element values may be less than or equal to seven; or no element of the vector has a value which is greater than seven and the sum of vector element values may be less than or equal to seven.
  • an apparatus for decoding ratio parameters for audio objects, the apparatus comprising at least one processor and at least one memory storing instructions that, when executed by the at least one processor, cause the system at least to perform: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • the apparatus caused to perform converting the integer value to the vector representing the selection of quantized ratio parameters based on the indexing of the vector may be caused to perform: generating a single number from the integer value, by performing an iteration loop from a zeroth iteration up to and including the single number of iterations and sequentially associating index values to iteration loop iteration numbers which have valid vectors, wherein the integer value is the highest index value; and separating the single number into vector component values to generate the vector.
  • the apparatus caused to perform regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters may be further caused to perform generating at least one further quantized ratio parameter based on a value of summed elements of the vector subtracted from an expected sum value.
  • the apparatus caused to perform dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify the distribution of the specific object within the object part of the total audio environment may be caused to further perform scalar dequantizing the ratio parameter with respect to the audio object using a first number of bits.
  • the first number of bits may be three, the expected sum value may be seven and wherein the integer value may be an integer value in base ten.
  • an apparatus for encoding an audio object parameter comprising: means for obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; means for quantizing the ratio parameters with respect to the audio objects using a first number of bits; means for generating a vector from a selection of the quantized ratio parameters; and means for generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • an apparatus for decoding ratio parameters for audio objects comprising: means for obtaining an integer value representing ratio parameters for the audio objects; means for converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; means for regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and means for dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • an apparatus for encoding an audio object parameter comprising: obtaining circuitry configured to obtain a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing circuitry configured to quantize the ratio parameters with respect to the audio objects using a first number of bits; vector generating circuitry for generating a vector from a selection of the quantized ratio parameters; and integer value generating circuitry configured to generate an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • an apparatus for decoding ratio parameters for audio objects comprising: obtaining circuitry configured to obtain an integer value representing ratio parameters for the audio objects; converting circuitry configured to convert the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating circuitry configured to regenerate at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing circuitry configured to dequantize the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus for encoding an audio object parameter, the apparatus caused to perform at least the following: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • a computer program comprising instructions [or a computer readable medium comprising instructions] for causing an apparatus for decoding ratio parameters for audio objects, the apparatus caused to perform at least the following: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus for encoding an audio object parameter, the apparatus caused to perform at least the following: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, wherein the generated integer value represents the ratio parameters for the at least two audio objects.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus for decoding ratio parameters for audio objects, the apparatus caused to perform at least the following: obtaining an integer value representing ratio parameters for the audio objects; converting the integer value to a vector representing a selection of quantized ratio parameters based on an indexing of the vector; regenerating at least one further quantized ratio parameter from the vector selection of the quantized ratio parameters; and dequantizing the quantized ratio parameter to obtain ratio parameters for the audio objects, the ratio parameters configured to identify a distribution of a specific object within the object part of a total audio environment.
  • An apparatus comprising means for performing the actions of the method as described above.
  • An apparatus configured to perform the actions of the method as described above.
  • a computer program comprising program instructions for causing a computer to perform the method as described above.
  • a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
  • An electronic device may comprise apparatus as described herein.
  • a chipset may comprise apparatus as described herein.
  • Embodiments of the present application aim to address problems associated with the state of the art.
  • Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments
  • Figure 2 shows schematically an example metadata extractor and metadata compressor and packer as shown in the system of apparatus as shown in Figure 1 according to some embodiments;
  • Figure 3 shows a flow diagram of the operation of the example metadata extractor and metadata compressor and packer shown in Figure 2 according to some embodiments;
  • Figure 4 shows schematically an example ISM vector index generator as shown in Figure 2 according to some embodiments;
  • Figure 5 shows a flow diagram of the operation of the example ISM vector index generator as shown in Figure 4 according to some embodiments;
  • Figure 6 shows schematically an example metadata decoder as shown in the system of apparatus as shown in Figure 1 according to some embodiments
  • Figure 7 shows a flow diagram of the operation of the example metadata decoder shown in Figure 6 according to some embodiments;
  • Figure 8 shows schematically a flow diagram of the operation of the example ISM vector index to vector generator as shown in Figure 6 according to some embodiments.
  • Figure 9 shows an example device suitable for implementing the apparatus shown in previous figures.
  • the 3GPP IVAS codec is configured to receive a combined input format mode.
  • the combined input format mode will enable simultaneous encoding of two different audio input formats.
  • An example of two different audio input formats being currently considered is the combination of the MASA format with audio objects.
  • the audio object data can also be known as independent streams with metadata (ISM), and the two terms are used interchangeably herein.
  • ISM ratio is used to describe the distribution of the ISM-related audio content with respect to the objects. Specifically, the ISM ratio identifies the distribution of a certain object within the object part of the total audio scene.
  • MASA-to-total energy ratio identifies a portion of MASA stream within the total audio scene (containing the objects and the MASA).
  • (1 - MASA-to-total energy ratio) identifies the portion of all the objects within the total audio scene.
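  • As a worked example with illustrative values (not taken from the specification): if the MASA-to-total energy ratio of a time-frequency tile is 0.6, the objects together account for 1 - 0.6 = 0.4 of the tile energy; ISM ratios of 0.75 and 0.25 for two objects would then correspond to object shares of 0.3 and 0.1 of the total audio scene.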
  • the following concept as discussed in detail herein is the efficient encoding of these ISM ratios.
  • These ISM ratios could be indexed within a pyramidal truncation of a Zn lattice or encoded by a suitable entropy encoder (such as a Golomb-Rice coder) or a context arithmetic encoder.
  • the encoding using pyramidal truncation of a Zn lattice is more efficient in terms of compression, but it needs memory to store index offsets and descriptions of the vectors of the Zn lattice.
  • the arithmetic encoding methods are generally less efficient because there is typically not sufficient data within an audio frame in order to determine the distribution of the index values.
  • the embodiments as discussed herein attempt to provide an indexing method for the lattice Zn vectors which does not need to store the index offsets nor the information relative to layer values.
  • the embodiments which employ such methods are efficient for lower dimensions of the lattice and can be used for encoding the ISM ratio index vectors.
  • Metadata-Assisted Spatial Audio is an example of a parametric spatial audio format and representation suitable as an input format for IVAS.
  • spatial metadata associated with the audio signals may comprise multiple parameters (such as multiple directions and associated with each direction (or directional value) a direct-to-total energy ratio, spread coherence, distance, etc.) per time-frequency tile.
  • the spatial metadata may also comprise other parameters or may be associated with other parameters which are considered to be non-directional (such as surround coherence, diffuse-to-total energy ratio, remainder-to-total energy ratio) but when combined with the directional parameters are able to be used to define the characteristics of the audio scene.
  • a reasonable design choice which is able to produce a good quality output is one where the spatial metadata comprises one or more directions for each time-frequency subframe (and, associated with each direction, direct-to-total ratios, spread coherence, distance values etc).
  • parametric spatial metadata representation can use multiple concurrent spatial directions.
  • MASA the proposed maximum number of concurrent directions is two.
  • parameters such as: Direction index; Direct-to-total ratio; Spread coherence; and Distance.
  • other parameters such as Diffuse-to-total energy ratio; Surround coherence; and Remainder-to-total energy ratio are defined.
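  • for illustration only, the per-tile parameters listed above could be collected in a structure such as the following minimal C sketch; the field names, types and the single-direction layout are hypothetical and do not reproduce the normative MASA format definition:

    #include <stdint.h>

    /* Hypothetical container for the spatial metadata of one time-frequency tile
       (one instance per concurrent direction, of which MASA proposes up to two). */
    typedef struct {
        uint16_t direction_index;          /* quantized direction (azimuth/elevation) */
        float    direct_to_total_ratio;    /* directional parameters */
        float    spread_coherence;
        float    distance;
        float    diffuse_to_total_ratio;   /* non-directional parameters */
        float    surround_coherence;
        float    remainder_to_total_ratio;
    } masa_tile_metadata;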
  • the quantization steps are relatively large.
  • the quantization points are at 0, ±45, ±90, ±135, and 180 degrees of azimuth.
  • the audio objects input format can comprise independent streams with metadata (ISM).
  • the metadata may not be available (in which cases, e.g., some default values may be assumed).
  • a parameter named ISM ratio has been defined which identifies the distribution of a certain object within the object part of the total audio scene. The concept as discussed herein is the efficient encoding and decoding of these ISM ratio parameters.
  • Figure 1 depicts an example apparatus 100 and system for implementing embodiments of the application.
  • the system is shown with an ‘analysis’ part.
  • the ‘analysis’ part is the part from receiving the multi-channel signals up to an encoding of the metadata and downmix signal.
  • the input to the system ‘analysis’ part is the multi-channel audio signals 102.
  • a microphone channel signal input is described, however any suitable input (or synthetic multi-channel) format may be implemented in other embodiments.
  • the spatial analyser and the spatial analysis may be implemented external to the encoder.
  • the spatial (MASA) metadata associated with the audio signals may be provided to an encoder as a separate bit-stream.
  • the spatial (MASA) metadata may be provided as a set of spatial (direction) index values.
  • Figure 1 also depicts multiple audio objects 104 as a further input to the analysis part.
  • these multiple audio objects (or audio object stream) 104 may represent various sound sources within a physical space.
  • Each audio object may be characterized by an audio (object) signal and accompanying metadata comprising directional data (in the form of azimuth and elevation values) which indicate the position or direction of the audio object within a physical space on an audio frame basis.
  • the multi-channel signals 102 are passed to an analyser and encoder 101, and specifically a transport signal generator 105 and to a metadata generator 103.
  • the metadata generator 103 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 104 associated with the multi-channel signals and thus associated with the transport signals 106.
  • the analysis processor 103 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter and an energy ratio parameter and a coherence parameter (and in some embodiments a diffuseness parameter).
  • the direction, energy ratio and coherence parameters may in some embodiments be considered to be MASA spatial audio parameters (or MASA metadata).
  • the spatial audio parameters comprise parameters which aim to characterize the sound-field created/captured by the multi-channel signals (or two or more audio signals in general).
  • the parameters generated may differ from frequency band to frequency band.
  • in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted.
  • the transport signals 106 and the metadata 104 may be passed to a combined encoder core 109.
  • the transport signal generator 105 is configured to receive the multi-channel signals and generate a suitable transport signal comprising a determined number of channels and output the transport signals 106 (MASA transport audio signals).
  • the transport signal generator 105 may be configured to generate a 2-audio channel downmix of the multi-channel signals.
  • the determined number of channels may be any suitable number of channels.
  • the transport signal generator in some embodiments is configured to otherwise select or combine, for example, by beamforming techniques the input audio signals to the determined number of channels and output these as transport signals.
  • the transport signal generator 105 is optional and the multi-channel signals are passed unprocessed to a combined encoder core 109 in the same manner as the transport signals are in this example.
  • the audio objects 104 may be passed to the audio object analyser 107 for processing.
  • the audio object analyser 107 analyses the object audio input stream 104 in order to produce suitable audio object transport signals and audio object metadata.
  • the audio object analyser may be configured to produce the audio object transport signals by downmixing the audio signals of the audio objects into a stereo channel using amplitude panning based on the associated audio object directions.
  • the audio object analyser may also be configured to produce the audio object metadata associated with the audio object input stream 104.
  • the audio object metadata may comprise direction values which are applicable for all sub-bands. So, if there are 4 objects, there are 4 directions.
  • the direction values also apply across all of the subframes of the frame, but in some embodiments the temporal resolution of the direction values can differ and the directions values apply for one or more than one sub-frames of the frame.
  • the audio object metadata may comprise energy ratios (or ISM ratios). In the following examples the energy ratios (or ISM ratios), are for each time-frequency tile for each object.
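  • for illustration only, the audio object (ISM) metadata described above could be organised per audio frame as in the following minimal C sketch; the names and dimensions (for example four objects, five sub-bands, four subframes) are hypothetical:

    /* Hypothetical per-frame ISM metadata layout. */
    #define N_OBJECTS   4   /* e.g. 4 objects -> 4 directions */
    #define N_SUBBANDS  5
    #define N_SUBFRAMES 4

    typedef struct {
        /* one direction per object, applicable to all sub-bands of the frame */
        float azimuth[N_OBJECTS];
        float elevation[N_OBJECTS];
        /* one ISM ratio per object for each time-frequency tile */
        float ism_ratio[N_SUBFRAMES][N_SUBBANDS][N_OBJECTS];
    } ism_frame_metadata;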
  • the audio object analyser 107 may be sited elsewhere and the audio objects 104 input to the analyser and encoder 101 is audio object transport signals and audio object metadata.
  • the analyser and encoder 101 may comprise an audio encoder core 109 which is configured to receive the transport audio (for example downmix) signals 106 and audio object transport signals 128 in order to generate a suitable encoding of these audio signals.
  • the audio encoder core 109 is further configured to receive the output of the metadata generator, the MASA metadata 104 and output an encoded or compressed form of the information as Encoded (MASA) metadata 116.
  • the analyser and encoder 101 may also comprise an audio object metadata encoder 111 which is similarly configured to receive the audio object metadata 108 and output an encoded or compressed form of the input information as encoded audio object metadata 112.
  • the combined encoder core 109 can be configured to implement a stream separation metadata determiner and encoder which can be configured to determine the relative contributory proportions of the multi-channel signals 102 (MASA audio signals) and audio objects 104 to the overall audio scene.
  • This measure of proportionality produced by the stream separation metadata determiner and encoder may be used to determine the proportion of quantizing and encoding “effort” expended for the input multi-channel signals 102 and the audio objects 104.
  • the stream separation metadata determiner and encoder may produce a metric which quantifies proportion of the encoding effort expended on the multichannel audio signals 102 compared to the encoding effort expended on the audio objects 104.
  • This metric may be used to drive the encoding of the audio object metadata 108 and the metadata 104. Furthermore, the metric as determined by the separation metadata determiner and encoder may also be used as an influencing factor in the process of encoding the transport audio signals 106 and audio object transport audio signal 128 performed by the combined encoder core 109.
  • the output metric from the stream separation metadata determiner and encoder can furthermore be represented as encoded stream separation metadata and be combined into the encoded metadata stream from the combined encoder core 109.
  • the analyser and encoder 101 can in some embodiments be a computer or mobile device (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
  • the encoding may be implemented using any suitable scheme.
  • the encoder 107 may further interleave, multiplex to a single data stream or embed the encoded MASA metadata, audio object metadata and stream separation metadata within the encoded (downmixed) transport audio signals before transmission or storage shown in Figure 1 by the dashed line.
  • the multiplexing may be implemented using any suitable scheme.
  • an associated decoder and renderer 119 which is configured to obtain the bitstream 118 comprising encoded metadata 116, encoded transport audio signals 138 and encoded audio object metadata 112 and from these generate suitable spatial audio output signals.
  • the decoding and processing of such audio signals are known in principle and are not discussed in detail hereafter other than the decoding of the encoded ISM ratio metadata.
  • the audio object analyser 107 comprises an ISM ratio generator 201.
  • the ISM ratio generator 201 is configured to generate independent streams with metadata (ISM) ratios associated with the audio object signals 104.
  • the ISM ratios can be obtained as follows.
  • the object audio signals s_obj(t, i) are transformed to the time-frequency domain S_obj(b, n, i), where t is the temporal sample index, b the frequency bin index, n the temporal frame index, and i the object index.
  • the time-frequency domain signals can, e.g., be obtained via short-time Fourier transform (STFT) or complex-modulated quadrature filterbanks (QMF) (or low-delay variants of them).
  • the energies of the objects are computed in frequency bands as E_obj(k, n, i) = sum over b from b_k,low to b_k,high of |S_obj(b, n, i)|^2, where b_k,low is the lowest and b_k,high the highest bin of the frequency band k.
  • the ISM ratios ratio(k, n, i) can be computed as ratio(k, n, i) = E_obj(k, n, i) / (sum over j from 1 to I of E_obj(k, n, j)), where I is the number of objects.
  • the temporal resolution of the ISM ratios may be different than the temporal resolution of the time-frequency domain audio signals S_obj(b, n, i) (i.e., the temporal resolution of the spatial metadata may be different than the temporal resolution of the time-frequency transform).
  • the computation (of the energy and/or the ISM ratios) may include summing over multiple temporal frames of the time-frequency domain audio signals and/or the energy values.
  • the ISM ratios are numbers between 0 and 1 and they correspond to the fraction with which one object is active within the audio scene created by all the objects. For each object there is one ISM ratio per frequency sub-band and time subframe. As discussed above the ISM ratios are passed to the audio object metadata encoder 111.
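  • the band-energy and ISM ratio computation described above can be sketched in C as follows; the function and array names are hypothetical, the signals are assumed to be available as separate real and imaginary parts, and the actual codec implementation may differ:

    #define ISM_MAX_OBJECTS 8   /* hypothetical upper bound on the number of objects */

    /* Compute the ISM ratios of one frequency band k (bins b_low..b_high) of frame n.
       S_re/S_im hold the time-frequency object signals S_obj(b, n, i) for this frame,
       laid out as [object][bin]; ratio receives one value per object. */
    void compute_ism_ratios(const float *S_re, const float *S_im,
                            int num_bins, int num_objects,
                            int b_low, int b_high,
                            float *ratio)
    {
        float energy[ISM_MAX_OBJECTS];
        float total = 0.0f;
        for (int i = 0; i < num_objects; i++) {
            energy[i] = 0.0f;
            for (int b = b_low; b <= b_high; b++) {
                float re = S_re[i * num_bins + b];
                float im = S_im[i * num_bins + b];
                energy[i] += re * re + im * im;          /* |S_obj(b, n, i)|^2 */
            }
            total += energy[i];
        }
        for (int i = 0; i < num_objects; i++) {
            /* ratio(k, n, i) = E_obj(k, n, i) / sum_j E_obj(k, n, j); the ratios sum to 1 */
            ratio[i] = (total > 0.0f) ? energy[i] / total : 1.0f / (float)num_objects;
        }
    }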
  • the audio object metadata encoder 111 is configured to encode the ISM ratios.
  • the directions, i.e., the azimuth and elevation angle per object
  • the audio object metadata encoder 111 comprises an ISM ratio quantizer 203.
  • the ISM ratio quantizer 203 is configured to receive the ISM ratio values 202 and quantize them.
  • the quantization of each of the ratios returns a non-negative integer value in binary from 000 to 111 (or 0 to 7 in decimal or base 10 form).
  • the quantization can be performed using any suitable number of bits.
  • the following examples show a uniform scalar quantizer based on 3 bits for each value. It can also be a non-uniform scalar quantizer. The distribution of the indexes does not influence the indexing.
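  • a minimal C sketch of such a uniform 3-bit scalar quantizer (with its matching dequantizer) is given below; it only illustrates the principle and is not the codec's actual quantizer:

    #include <stdint.h>

    /* Quantize an ISM ratio in [0, 1] to an index 0..(2^bits - 1), e.g. 0..7 for 3 bits. */
    static int16_t quantize_ism_ratio(float ratio, int bits)
    {
        int16_t levels = (int16_t)((1 << bits) - 1);             /* 7 for 3 bits */
        int16_t idx = (int16_t)(ratio * (float)levels + 0.5f);   /* round to nearest level */
        if (idx < 0) idx = 0;
        if (idx > levels) idx = levels;
        return idx;
    }

    /* Map an index back to a reconstructed ratio value. */
    static float dequantize_ism_ratio(int16_t idx, int bits)
    {
        return (float)idx / (float)((1 << bits) - 1);
    }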
  • the quantized ISM ratio index values 204 can then be passed to an ISM ratio vector generator 205.
  • the audio object metadata encoder 111 comprises an ISM ratio vector generator 205 which is configured to receive the quantized ISM ratio values and generate a vector representation of the ISM ratios for the subband.
  • the vectors to be indexed are {x ∈ Z^n : x_i ≥ 0, sum_i x_i = K}. In other words, they are the lattice Zn vectors of the pyramidal layer of norm K.
  • the ISM ratio vector generator 205 therefore in this example can be configured to determine the vectors to be indexed (in a manner as described later).
  • the Vector Quantized ISM ratio index values 206 can then be passed to the ISM vector index generator 207.
  • the audio object metadata encoder 111 can in some embodiments comprise an ISM vector index generator 207.
  • the ISM vector index generator 207 can be configured to generate a suitable index value for the sub-band representing the vector and pass this as an encoded ISM ratio index value 208 which can for example be passed to a bitstream generator 209 to be included within the bitstream 118.
  • With respect to Figure 3 is shown a flow diagram which summarises the operations of the example audio object analyser 107 and example audio object metadata encoder 111 shown in Figure 2.
  • the initial operation is one of receiving/obtaining the independent streams with metadata as shown in Figure 3 by step 301.
  • the following operation is one of generating ISM ratio values from the independent streams with metadata as shown in Figure 3 by step 303.
  • Having determined the ISM ratio values, they can be quantized to generate quantized ISM ratio values as shown in Figure 3 by step 305.
  • the next operation is generating vectors from quantized ISM ratio values as shown in Figure 3 by step 307.
  • an ISM vector index is generated as shown in Figure 3 by step 309.
  • the ISM vector index can then be output for inclusion to the bitstream as shown in Figure 3 by step 311.
  • With respect to Figure 4 the ISM vector index generator 207 is shown in further detail.
  • the ISM vector index generator 207 comprises a vector component selector 401 which is configured to receive the vector quantized ISM ratio index values 206 and select vector components and pass these to a number generator 403.
  • as the sum of the ISM ratios across all objects is 1, the quantization indexes of the objects sum to the norm value K (7 in the 3-bit example), and the last index is therefore redundant.
  • the vector component selector is configured to select and forward the first N-1 components of an N-length vector.
  • for example, for a quantized vector with components 0, 4 and 3, the output selected vector components are 0 and 4 (with 3 not being selected).
  • although in this example the first N-1 components are selected, it would be understood that any suitable criterion for the selection of the N-1 components can be implemented, for example the ‘last N-1’ components could be selected.
  • the selection or ‘dropping’ of a component can be implemented based on any suitable selection method. For example in some embodiments the first N-1 values are selected (or in other words the selection always ‘drops’ the last value because the entire space of permutations is used). In some embodiments a histogram of the resulting indexes can be estimated and the indexes (corresponding to the first components) encoded with a variable bitrate.
  • a complexity reduction can be employed by (in the while loops) favouring the lowest value in the beginning.
  • the ISM vector index generator 207 further comprises a number generator 403 which is configured to receive the output of the vector component selector 401 and generate a number value from the selected vector components.
  • the number generator 403 is configured to generate a base 10 number by concatenating a base 10 representation of each of the selected components. Thus in the example shown above, where the selected components are 0 and 4, the base 10 value 04, i.e. 4, is generated by the number generator 403.
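  • a minimal C sketch of this appending step, assuming the 3-bit case where each selected component fits in one decimal digit, could be:

    #include <stdint.h>

    /* Append the first n-1 quantized components as base 10 digits,
       e.g. components {0, 4} -> 04 -> 4, components {2, 5} -> 25. */
    static int16_t components_to_number(const int16_t *ratio_ism_idx, int16_t n)
    {
        int16_t number = 0;
        for (int16_t i = 0; i < n - 1; i++) {
            number = (int16_t)(number * 10 + ratio_ism_idx[i]);
        }
        return number;
    }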
  • the number generator can then pass the generated number to a number to index generator 405.
  • the ISM vector index generator 207 comprises a number to index generator 405.
  • the indexing function can be defined by the following pseudocode implementation:
  • int16_t index_slice_enum( int16_t *ratio_ism_idx, /* integer array to be indexed */ int16_t n ) /* space dimension */
  • { index = ratio_ism_idx[0];
  • the function valid() verifies whether a given number corresponds to a valid (n-1)-dimensional array of integers, i.e. having a Laplacian norm less than or equal to K.
  • the valid vector may be one in which the sum of the vector element values is less than or equal to seven; the number of iterations for the valid-vector loop is then at most seven for two objects (as only one value is checked), seventy for three objects (as two values are checked) or seven hundred for four objects (as three values are checked) in the 3 bit quantization example.
  • there are eight (0, 1, 2, 3, ... 7) valid vectors for two objects, 36 valid vectors for three objects (such as shown in the table above) and 120 valid vectors for four objects.
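  • these counts follow because the number of (n-1)-dimensional non-negative integer vectors whose components sum to at most K is the binomial coefficient C(K + n - 1, n - 1); for K = 7 this gives C(8, 1) = 8 for two objects, C(9, 2) = 36 for three objects and C(10, 3) = 120 for four objects.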
  • the valid() function is defined as: int16_t valid( int16_t index, int16_t K, int16_t len )
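  • as an illustrative completion of the pseudocode fragments above, the following minimal C sketch implements the described enumeration (the index of the appended decimal number is its rank among all numbers whose digits form a vector of Laplacian norm at most K); the actual IVAS implementation may differ in detail:

    #include <stdint.h>

    /* Return 1 if 'number', read as 'len' base 10 digits, is a valid (n-1)-dimensional
       vector, i.e. its digit sum (Laplacian norm) is less than or equal to K. */
    static int16_t valid(int16_t number, int16_t K, int16_t len)
    {
        int16_t sum = 0;
        for (int16_t i = 0; i < len; i++) {
            int16_t digit = number % 10;
            if (digit > K) return 0;
            sum = (int16_t)(sum + digit);
            number /= 10;
        }
        if (number != 0) return 0;     /* more digits than components */
        return (sum <= K) ? 1 : 0;
    }

    /* Index the first n-1 quantized ISM ratio components (3-bit example, K = 7). */
    static int16_t index_slice_enum(int16_t *ratio_ism_idx, int16_t n)
    {
        const int16_t K = 7;           /* Laplacian norm of the full vector */

        /* append the components into a single base 10 number */
        int16_t number = ratio_ism_idx[0];
        for (int16_t i = 1; i < n - 1; i++) {
            number = (int16_t)(number * 10 + ratio_ism_idx[i]);
        }

        /* loop from the zeroth iteration up to and including 'number', sequentially
           associating index values to iteration numbers that have valid vectors */
        int16_t index = -1;
        for (int16_t i = 0; i <= number; i++) {
            if (valid(i, K, (int16_t)(n - 1))) {
                index++;
            }
        }
        return index;                  /* rank of 'number' among the valid numbers */
    }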
  • the encoded ISM ratio index values 208 can then be output.
  • With respect to Figure 5 is shown a flow diagram which summarises the operations of the example ISM vector index generator 207 shown in Figure 4.
  • the initial operation is one of receiving/obtaining the Vector Quantized ISM ratio index values as shown in Figure 5 by step 501.
  • the selected vector components can be used to generate a single number, for example by appending the selected vector component values into a single decimal (base 10) number, as shown in Figure 5 by step 505. The following operation is then one of generating an index value from the number as shown in Figure 5 by step 507.
  • This operation of generating an index value from the number can be shown as the loop of:
  • the vector index is initialized at 0 as shown in Figure 5 by step 571;
  • the loop index is incremented; if the loop index equals the input number then the loop stops and the index is the vector index value, otherwise a new check is performed on the incremented loop index value, as shown in Figure 5 by step 575.
  • With respect to Figure 6 is shown in further detail the decoder as shown in Figure 1 with respect to the decoding and generating of decoded ISM ratio values. Furthermore Figures 7 and 8 show flow diagrams of the example decoder operations as shown in Figure 6 according to some embodiments.
  • the decoder comprises a bitstream demultiplexer 601 configured to demultiplex the bitstream 118 and extract encoded ISM ratio index values 602. These encoded ISM ratio index values 602 can be passed to a metadata decoder where the decoded ISM ratio values are output and used to generate spatial audio signals.
  • the method comprises receiving/obtaining the bitstream as shown in Figure 7 by step 701 .
  • the metadata decoder 603 comprises an ISM vector index to vector generator 605.
  • the ISM vector index to vector generator 605 is configured to generate decoded ISM vector values 606 from the encoded ISM ratio index values 602.
  • this can be implemented using the opposite index to vector value determination as discussed above.
  • there is then the operation of generating ISM vector values from the ISM ratio index values as shown in Figure 7 by step 705.
  • This operation is further described with respect to Figure 8 and the operations of the example ISM vector index to vector generator as shown in Figure 6 where the encoded Vector Quantized ISM ratio index values are received or obtained by step 801 .
  • the index value is initialized to the received ISM ratio index value and the variable J at 0 as shown in Figure 8 by step 803.
  • the index value is decremented (by 1 ) if the vector corresponding to J is valid as shown in Figure 8 by step 805.
  • a check is made on the index value, to determine if it is zero, as shown in Figure 8 by step 809. If the value is not zero then the loop returns to step 805, else it progresses to step 811.
  • the last component is then generated based on the difference between the sum of the n-1 components and the Laplacian norm value K as shown in Figure 8 by step 813.
  • the deindexing function can furthermore be defined by the following pseudo-code:
  • ratio_idx_ism[n - 1] = K - sum;
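  • an illustrative C sketch of the corresponding deindexing, consistent with the Figure 8 flow and the fragment above, could be as follows (it reuses the valid() sketch given earlier and assumes the 3-bit case with K = 7):

    #include <stdint.h>

    /* Recover the n quantized ISM ratio components from a received index.
       valid() is the digit-sum check sketched for the encoder side. */
    static void deindex_slice_enum(int16_t index, int16_t n, int16_t *ratio_idx_ism)
    {
        const int16_t K = 7;      /* Laplacian norm in the 3-bit example */
        int16_t j = 0;

        /* step through candidate numbers, decrementing the index at each valid one,
           until the number whose rank equals the received index is reached */
        while (index >= 0) {
            if (valid(j, K, (int16_t)(n - 1))) {
                index--;
            }
            j++;
        }
        j--;                      /* j is now the appended base 10 number */

        /* separate the number into the first n-1 vector components (decimal digits) */
        int16_t sum = 0;
        for (int16_t i = (int16_t)(n - 2); i >= 0; i--) {
            ratio_idx_ism[i] = (int16_t)(j % 10);
            sum = (int16_t)(sum + ratio_idx_ism[i]);
            j = (int16_t)(j / 10);
        }
        /* the last component is the difference between the norm K and the partial sum */
        ratio_idx_ism[n - 1] = (int16_t)(K - sum);
    }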
  • the decoded ISM vector values 606 can then be passed to an ISM ratio generator 607.
  • the metadata decoder 603 can in some embodiments comprise an ISM ratio generator 607 which is configured to receive the decoded ISM vector values 606 and generate decoded ISM ratios 608 in a manner employing the opposite methods to those described above.
  • The operation of generating the ISM ratios from the ISM vector values is shown in Figure 7 by step 707.
  • the ISM ratio values can then be output as shown in Figure 7 by step 709.
  • the device may be any suitable electronics device or apparatus.
  • the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
  • the device may for example be configured to implement the encoder/analyser part and/or the decoder part as shown in Figure 1 or any functional block as described above.
  • the device 1400 comprises at least one processor or central processing unit 1407.
  • the processor 1407 can be configured to execute various program codes such as the methods such as described herein.
  • the device 1400 comprises at least one memory 1411.
  • the at least one processor 1407 is coupled to the memory 1411.
  • the memory 1411 can be any suitable storage means.
  • the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407.
  • the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein.
  • the implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
  • the device 1400 comprises a user interface 1405.
  • the user interface 1405 can be coupled in some embodiments to the processor 1407.
  • the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405.
  • the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad.
  • the user interface 1405 can enable the user to obtain information from the device 1400.
  • the user interface 1405 may comprise a display configured to display information from the device 1400 to the user.
  • the user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400.
  • the user interface 1405 may be the user interface for communicating.
  • the device 1400 comprises an input/output port 1409.
  • the input/output port 1409 in some embodiments comprises a transceiver.
  • the transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
  • the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
  • the transceiver can communicate with further apparatus by any suitable known communications protocol.
  • the transceiver can use a suitable radio access architecture based on long term evolution advanced (LTE Advanced, LTE-A) or new radio (NR) (or can be referred to as 5G), universal mobile telecommunications system (UMTS) radio access network (UTRAN or E-UTRAN), long term evolution (LTE, the same as E-UTRA), 2G networks (legacy network technology), wireless local area network (WLAN or Wi-Fi), worldwide interoperability for microwave access (WiMAX), Bluetooth®, personal communications services (PCS), ZigBee®, wideband code division multiple access (WCDMA), systems using ultra-wideband (UWB) technology, sensor networks, mobile ad-hoc networks (MANETs), cellular internet of things (IoT) RAN and Internet Protocol multimedia subsystems (IMS), any other suitable option and/or any combination thereof.
  • the transceiver input/output port 1409 may be configured to receive the signals.
  • the device 1400 may be employed as at least part of the synthesis device.
  • the input/output port 1409 may be coupled to headphones (which may be headtracked or non-tracked headphones) or similar, and to loudspeakers.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as, for example, DVD and the data variants thereof, and CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
  • circuitry may refer to one or more or all of the following:
  • any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions
  • hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
  • circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
  • circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
  • non-transitory is a limitation of the medium itself (i.e., tangible, not a signal) as opposed to a limitation on data storage persistency (e.g., RAM vs. ROM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An apparatus for encoding an audio object parameter, the apparatus comprising means for: obtaining a ratio parameter associated with a respective audio object within an audio environment, the audio environment comprising at least two audio objects and the ratio parameters being configured to identify a distribution of the respective object within the object part of the total audio environment; quantizing the ratio parameters with respect to the audio objects using a first number of bits; generating a vector from a selection of the quantized ratio parameters; and generating an integer value based on an indexing from the vector, the generated integer value representing the ratio parameters for the at least two audio objects.
PCT/EP2023/080907 2022-11-29 2023-11-07 Codage audio spatial paramétrique WO2024115052A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2217884.2A GB2624869A (en) 2022-11-29 2022-11-29 Parametric spatial audio encoding
GB2217884.2 2022-11-29

Publications (1)

Publication Number Publication Date
WO2024115052A1 true WO2024115052A1 (fr) 2024-06-06

Family

ID=84889624

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/080907 WO2024115052A1 (fr) 2022-11-29 2023-11-07 Codage audio spatial paramétrique

Country Status (2)

Country Link
GB (1) GB2624869A (fr)
WO (1) WO2024115052A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220122617A1 (en) * 2019-06-14 2022-04-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Parameter encoding and decoding
WO2022200666A1 (fr) * 2021-03-22 2022-09-29 Nokia Technologies Oy Combinaison de flux audio spatiaux

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2578603A (en) * 2018-10-31 2020-05-20 Nokia Technologies Oy Determination of spatial audio parameter encoding and associated decoding
WO2022223133A1 (fr) * 2021-04-23 2022-10-27 Nokia Technologies Oy Codage de paramètres spatiaux du son et décodage associé

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220122617A1 (en) * 2019-06-14 2022-04-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Parameter encoding and decoding
WO2022200666A1 (fr) * 2021-03-22 2022-09-29 Nokia Technologies Oy Combinaison de flux audio spatiaux

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VASILACHE ADRIANA: "Conditional split lattice vector quantization for spectral encoding of audio signals", 2010 18TH EUROPEAN SIGNAL PROCESSING CONFERENCE, IEEE, 4 September 2006 (2006-09-04), pages 1 - 5, XP032753485, ISSN: 2219-5491, [retrieved on 20150327] *
WU TINGZHAO ET AL: "Audio object coding based on optimal parameter frequency resolution", MULTIMEDIA TOOLS AND APPLICATIONS, KLUWER ACADEMIC PUBLISHERS, BOSTON, US, vol. 78, no. 15, 5 March 2019 (2019-03-05), pages 20723 - 20738, XP036849703, ISSN: 1380-7501, [retrieved on 20190305], DOI: 10.1007/S11042-019-7409-7 *

Also Published As

Publication number Publication date
GB202217884D0 (en) 2023-01-11
GB2624869A (en) 2024-06-05

Similar Documents

Publication Publication Date Title
CN112639966A (zh) 空间音频参数编码和关联解码的确定
EP3874492B1 (fr) Détermination du codage de paramètre audio spatial et décodage associé
US20240185869A1 (en) Combining spatial audio streams
EP4082010A1 (fr) Combinaison de paramètres audio spatiaux
EP4082009A1 (fr) Fusion de paramètres audio spatiaux
CN114365218A (zh) 空间音频参数编码和相关联的解码的确定
WO2022214730A1 (fr) Séparation d'objets audio spatiaux
EP3991170A1 (fr) Détermination de codage de paramètre audio spatial et décodage associé
WO2022129672A1 (fr) Quantification de paramètres audio spatiaux
US20230335143A1 (en) Quantizing spatial audio parameters
WO2022223133A1 (fr) Codage de paramètres spatiaux du son et décodage associé
WO2024115052A1 (fr) Codage audio spatial paramétrique
WO2024115050A1 (fr) Codage audio spatial paramétrique
WO2024175319A1 (fr) Codage audio spatial de format d'entrée combiné
WO2024175320A1 (fr) Valeurs de priorité aux fins d'un codage audio spatial paramétrique
WO2024115051A1 (fr) Codage audio spatial paramétrique
WO2024199801A1 (fr) Codage audio spatial paramétrique à faible vitesse de codage
CA3208666A1 (fr) Transformation de parametres audio spatiaux
WO2020193865A1 (fr) Détermination de l'importance des paramètres audio spatiaux et codage associé
EP4162487A1 (fr) Codage de paramètres audio spatiaux et décodage associé

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23805488

Country of ref document: EP

Kind code of ref document: A1