WO2020260756A1 - Determination of spatial audio parameter encoding and associated decoding - Google Patents


Info

Publication number
WO2020260756A1
Authority
WO
WIPO (PCT)
Prior art keywords
mapping
directional
values
index
indices
Prior art date
Application number
PCT/FI2020/050423
Other languages
French (fr)
Inventor
Adriana Vasilache
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to EP20833011.8A priority Critical patent/EP3991170A4/en
Publication of WO2020260756A1 publication Critical patent/WO2020260756A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002 Dynamic bit allocation
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction using spectral analysis, using subband decomposition
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/22 Mode decision, i.e. based on audio signal content versus external parameters
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the present application relates to apparatus and methods for sound-field related parameter encoding, but not exclusively for time-frequency domain direction related parameter encoding for an audio encoder and decoder.
  • Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters, such as directions of the sound in frequency bands and the ratios between the directional and non-directional parts of the captured sound in frequency bands.
  • These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array.
  • These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
  • the directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
  • a parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can be also utilized as the spatial metadata (which may also include other parameters such as coherence, spread coherence, number of directions, distance etc) for an audio codec.
  • these parameters can be estimated from microphone-array captured audio signals, and for example a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata.
  • the stereo signal could be encoded, for example, with an AAC encoder.
  • a decoder can decode the audio signals into PCM signals and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
  • the aforementioned solution is particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, standalone microphone arrays).
  • a further input for the encoder is also multi-channel loudspeaker input, such as 5.1 or 7.1 channel surround inputs.
  • the directional components of the metadata may comprise an elevation, an azimuth (and an energy ratio, which is 1 − diffuseness) of a resulting direction, for each considered time/frequency subband. Quantization of these directional components is a current research topic.
  • an apparatus comprising means configured to: generate spatial audio signal directional metadata parameters for a time-frequency tile; generate at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encode the indices based on an estimate of the number of bits required to encode the indices.
  • the means configured to encode the indices based on an estimate of the number of bits required to encode the indices may be configured to: estimate the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determine whether the number of bits required is greater than a determined threshold value and, when the number of bits required is greater than the determined threshold value: estimate a further number of bits required to encode the indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces the possible number of index values to be encoded; select one of the at least one further mapping based on the lowest number of bits required; and encode the indices based on the selected one of the at least one further mapping.
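The estimate/threshold/select logic described above can be sketched as follows. All names (`estimate_bits`, `select_mapping`), the simple entropy estimate and the representation of a mapping as a callable are illustrative assumptions for the sketch, not taken from the application itself:

```python
# Sketch of the claimed selection logic: entropy-code the direction
# indices with a first mapping; if the estimated bit count exceeds a
# threshold, try the further mappings and keep the cheapest one.
import math
from typing import Callable, Dict, List, Tuple

def estimate_bits(indices: List[int]) -> float:
    """Rough empirical-entropy estimate (in bits) for a list of indices."""
    counts: Dict[int, int] = {}
    for i in indices:
        counts[i] = counts.get(i, 0) + 1
    n = len(indices)
    return -sum(c * math.log2(c / n) for c in counts.values())

def select_mapping(values: List[int],
                   mappings: List[Callable[[int], int]],
                   threshold_bits: float) -> Tuple[int, List[int]]:
    """Return (mapping id, mapped indices) per the estimate/threshold rule."""
    # First mapping is tried unconditionally.
    first = [mappings[0](v) for v in values]
    first_bits = estimate_bits(first)
    if first_bits <= threshold_bits:
        return 0, first
    # Over threshold: evaluate the further mappings, keep the cheapest.
    best_id, best_idx, best_bits = 0, first, first_bits
    for m_id, m in enumerate(mappings[1:], start=1):
        cand = [m(v) for v in values]
        bits = estimate_bits(cand)
        if bits < best_bits:
            best_id, best_idx, best_bits = m_id, cand, bits
    return best_id, best_idx
```

The mapping id selected would also need to be signalled to the decoder so it can apply the inverse mapping.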
  • the at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; and mapping a mean-removed index value to directional values.
  • the at least one further mapping may comprise at least one of: mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
  • the at least one further mapping may comprise a complementary mapping to the first mapping.
  • the encoding of the indices based on at least one further mapping may comprise at least one of: entropy encoding; and fixed rate encoding.
  • the at least one mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean-removed index value to directional values.
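The first two mappings listed can be illustrated with a simple zigzag scheme between signed quantized direction values (0 for the reference/front direction, with the two sides as negative and positive values) and non-negative indices. This is a minimal sketch; the sign convention and which side is allocated first are arbitrary choices:

```python
def zigzag_index(value: int) -> int:
    """Map signed directional value 0, +1, -1, +2, -2, ... to index
    0, 1, 2, 3, 4, ... so value 0 (the reference/front direction) gets
    index 0 and the two sides alternate with increasing index."""
    return 2 * value - 1 if value > 0 else -2 * value

def zigzag_value(index: int) -> int:
    """Inverse mapping: recover the signed directional value from the index."""
    return (index + 1) // 2 if index % 2 else -(index // 2)
```

Because small indices correspond to directions near the reference direction, sources concentrated around the front produce small index values, which entropy coding can exploit.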
  • a method comprising: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
  • the encoding of the indices based on an estimate of the number of bits required to encode the indices may comprise: estimating the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determining whether the number of bits required is greater than a determined threshold value and, when the number of bits required is greater than the determined threshold value: estimating a further number of bits required to encode the indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces the possible number of index values to be encoded; selecting one of the at least one further mapping based on the lowest number of bits required; and encoding the indices based on the selected one of the at least one further mapping.
  • the at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; and mapping a mean-removed index value to directional values.
  • the at least one further mapping may comprise at least one of: mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
  • the at least one further mapping may comprise a complementary mapping to the first mapping.
  • Encoding of the indices based on at least one further mapping may comprise at least one of: encoding of the indices by entropy encoding; and encoding of the indices by fixed rate encoding.
  • the mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may for example be a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean-removed index value to directional values.
  • an apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: generate spatial audio signal directional metadata parameters for a time-frequency tile; generate at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encode the indices based on an estimate of the number of bits required to encode the indices.
  • the apparatus caused to encode the indices based on an estimate of the number of bits required to encode the indices may be caused to: estimate the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determine whether the number of bits required is greater than a determined threshold value and, when the number of bits required is greater than the determined threshold value: estimate a further number of bits required to encode the indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces the possible number of index values to be encoded; select one of the at least one further mapping based on the lowest number of bits required; and encode the indices based on the selected one of the at least one further mapping.
  • the at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean-removed index value to directional values.
  • the at least one further mapping may comprise a complementary mapping to the first mapping.
  • the apparatus caused to encode the indices based on at least one further mapping may be caused to perform at least one of: entropy encoding; and fixed rate encoding.
  • the at least one mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from the left or right are alternately allocated to increasing index values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean-removed index value to directional values.
  • an apparatus comprising: means for generating spatial audio signal directional metadata parameters for a time-frequency tile; means for generating at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; means for generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and means for encoding the indices based on an estimate of the number of bits required to encode the indices.
  • a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
  • an apparatus comprising: generating circuitry configured to generate spatial audio signal directional metadata parameters for a time-frequency tile; generating circuitry configured to generate at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generating circuitry configured to generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding circuitry configured to encode the indices based on an estimate of the number of bits required to encode the indices.
  • a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping whereby values of the spatial audio signal directional metadata parameter are mapped to an index value; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
  • An apparatus comprising means for performing the actions of the method as described above.
  • An apparatus configured to perform the actions of the method as described above.
  • a computer program comprising program instructions for causing a computer to perform the method as described above.
  • a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
  • An electronic device may comprise apparatus as described herein.
  • a chipset may comprise apparatus as described herein.
  • Embodiments of the present application aim to address problems associated with the state of the art.
  • Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;
  • Figure 2 shows schematically the metadata encoder according to some embodiments;
  • Figure 3 shows a flow diagram of the first operations of the metadata encoder as shown in Figure 2 according to some embodiments;
  • Figure 4 shows a flow diagram of later operations of the metadata encoder as shown in Figure 2 according to some embodiments;
  • Figure 5 shows a flow diagram of the entropy encoding of the direction indices as shown in Figure 4 according to some embodiments;
  • Figure 6 shows a further flow diagram of the entropy encoding of the direction indices as shown in Figure 4 according to some embodiments; and
  • Figure 7 shows schematically an example device suitable for implementing the apparatus shown.
  • the input format may be any suitable input format, such as multi-channel loudspeaker, Ambisonics (FOA/HOA) etc. It is understood that in some embodiments the channel location is based on a location of the microphone or is a virtual location or direction.
  • the output of the example system is a multi-channel loudspeaker arrangement. However, it is understood that the output may be rendered to the user via means other than loudspeakers.
  • the multi-channel loudspeaker signals may be generalised to be two or more playback audio signals.
  • the metadata consists of at least an elevation, an azimuth and the energy ratio of a resulting direction, for each considered time/frequency subband.
  • the direction parameter components, the azimuth and the elevation, are extracted from the audio data and then quantized to a given quantization resolution.
  • the resulting indices must be further compressed for efficient transmission. At high bitrates, high-quality lossless encoding of the metadata is needed.
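As an illustration of the quantize-then-index step, a uniform azimuth quantizer might look like the following sketch. The function names and the fixed resolution are assumptions for illustration; in practice the quantization resolution can vary, for example with the energy ratio of the tile:

```python
def quantize_azimuth(azimuth_deg: float, n_levels: int) -> int:
    """Uniformly quantize an azimuth in [-180, 180) degrees to one of
    n_levels codepoints and return the codepoint index (a sketch only)."""
    step = 360.0 / n_levels
    # Shift to [0, 360), round to the nearest codepoint, wrap around.
    return int(round((azimuth_deg + 180.0) / step)) % n_levels

def dequantize_azimuth(idx: int, n_levels: int) -> float:
    """Reconstruct the azimuth at the centre of the quantization cell."""
    step = 360.0 / n_levels
    return idx * step - 180.0
```

The resulting integer indices are what the subsequent mapping and entropy-coding stages operate on.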
  • the concept as discussed hereafter is to reduce the values of the variables to be encoded.
  • the reduction can be implemented in some embodiments for the case when there is a higher number of symbols.
  • the change can be performed by subtracting the index to be encoded from the number of symbols available and encoding the resulting difference.
  • this corresponds to having audio sources situated with a bias to the rear.
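This subtraction can be sketched as a reflection of the index range. Whether the difference is taken against the symbol count or the symbol count minus one is an implementation detail; this version keeps the index range unchanged:

```python
def reflect_indices(indices, n_symbols):
    """Replace each index by its difference from the symbol count
    (here n_symbols - 1 - index, keeping the range [0, n_symbols - 1]).
    Large indices become small, which reduces the values to be encoded
    when sources are biased toward directions with large indices
    (e.g. the rear)."""
    return [n_symbols - 1 - i for i in indices]
```

Applying the function twice recovers the original indices, so the decoder only needs a one-bit flag to know whether the reflection was used.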
  • the change can also be implemented in some embodiments by checking if all indices are even, or if all indices are odd, and encoding the values divided by two.
  • this corresponds to having the audio sources mainly situated on the upper or the lower side of audio scene.
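The parity check and halving can be sketched as follows; the returned flag (which would be signalled to the decoder) and the function name are illustrative:

```python
def try_halve(indices):
    """If all indices are even, or all are odd, return a parity flag and
    the indices divided by two (halving the alphabet to be encoded).
    Otherwise return (None, indices) unchanged."""
    if all(i % 2 == 0 for i in indices):
        return 'even', [i // 2 for i in indices]
    if all(i % 2 == 1 for i in indices):
        return 'odd', [i // 2 for i in indices]
    return None, indices
```

The decoder reverses the transform by doubling each value and adding one when the 'odd' flag was signalled.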
  • the encoding of the MASA metadata is configured to first estimate the number of bits for the directional data based on the values of the quantized energy ratios for each time-frequency tile. Furthermore, the entropy encoding of the original quantization resolution is tested. If the resulting sum is larger than the amount of available bits, the number of bits can be proportionally reduced for each time-frequency tile such that it fits the available number of bits; however, the quantization resolution is not unnecessarily adjusted when the bitrate allows it (for example at higher bitrates).
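The proportional reduction step can be sketched as below, assuming integer per-tile bit allocations. The rounding policy (giving any leftover bits to the tiles with the largest original allocations) is an illustrative choice, not specified by the application:

```python
def allocate_bits(tile_bits, available_bits):
    """Proportionally scale down per-tile bit allocations so their sum
    fits the available budget; leave them untouched when they already
    fit (so the resolution is not unnecessarily reduced)."""
    total = sum(tile_bits)
    if total <= available_bits:
        return list(tile_bits)
    scaled = [b * available_bits // total for b in tile_bits]
    # Integer division under-allocates; hand leftover bits, one each,
    # to the tiles that originally asked for the most bits.
    leftover = available_bits - sum(scaled)
    order = sorted(range(len(tile_bits)), key=lambda i: -tile_bits[i])
    for i in order[:leftover]:
        scaled[i] += 1
    return scaled
```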
  • the system 100 is shown with an ‘analysis’ part 121 and a ‘synthesis’ part 131.
  • The ‘analysis’ part 121 is the part from receiving the multi-channel loudspeaker signals up to an encoding of the metadata and downmix signal, and the ‘synthesis’ part 131 is the part from a decoding of the encoded metadata and downmix signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).
  • the inputs to the system 100 and the ‘analysis’ part 121 are the multi-channel signals 102.
  • a microphone channel signal input is described, however any suitable input (or synthetic multi-channel) format may be implemented in other embodiments.
  • the spatial analyser and the spatial analysis may be implemented external to the encoder.
  • the spatial metadata associated with the audio signals may be provided to an encoder as a separate bit-stream.
  • the spatial metadata may be provided as a set of spatial (direction) index values.
  • the multi-channel signals are passed to a downmixer 103 and to an analysis processor 105.
  • the downmixer 103 is configured to receive the multi-channel signals, downmix the signals to a determined number of channels, and output the downmix signals 104.
  • the downmixer 103 may be configured to generate a two-channel audio downmix of the multi-channel signals.
  • the determined number of channels may be any suitable number of channels.
  • the downmixer 103 is optional, and the multi-channel signals may be passed unprocessed to an encoder 107 in the same manner as the downmix signals are in this example.
  • the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the downmix signals 104.
  • the analysis processor 105 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108 and an energy ratio parameter 110 (and in some embodiments a coherence parameter and a diffuseness parameter).
  • the direction and energy ratio may in some embodiments be considered to be spatial audio parameters.
  • the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).
  • the parameters generated may differ from frequency band to frequency band.
  • in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted.
  • a practical example of this may be that for some frequency bands such as the highest band some of the parameters are not required for perceptual reasons.
  • the downmix signals 104 and the metadata 106 may be passed to an encoder 107.
  • the encoder 107 may comprise an audio encoder core 109 which is configured to receive the downmix (or otherwise) signals 104 and generate a suitable encoding of these audio signals.
  • the encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
  • the encoding may be implemented using any suitable scheme.
  • the encoder 107 may furthermore comprise a metadata encoder/quantizer 111 which is configured to receive the metadata and output an encoded or compressed form of the information.
  • the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage shown in Figure 1 by the dashed line.
  • the multiplexing may be implemented using any suitable scheme.
  • the received or retrieved data may be received by a decoder/demultiplexer 133.
  • the decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a downmix extractor 135 which is configured to decode the audio signals to obtain the downmix signals.
  • the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata.
  • the decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
  • the decoded metadata and downmix audio signals may be passed to a synthesis processor 139.
  • the system 100 ‘synthesis’ part 131 further shows a synthesis processor 139 configured to receive the downmix and the metadata and re-create, in any suitable format, a synthesized spatial audio in the form of multi-channel signals 110 (these may be in multi-channel loudspeaker format or, in some embodiments, any suitable output format such as binaural or Ambisonics signals, depending on the use case) based on the downmix signals and the metadata.
  • the system is configured to receive multi-channel audio signals. Then the system (analysis part) is configured to generate a downmix or otherwise generate a suitable transport audio signal (for example by selecting some of the audio signal channels). The system is then configured to encode for storage/transmission the downmix (or more generally the transport) signal. After this the system may store/transmit the encoded downmix and metadata. The system may retrieve/receive the encoded downmix and metadata. Then the system is configured to extract the downmix and metadata from encoded downmix and metadata parameters, for example demultiplex and decode the encoded downmix and metadata parameters.
  • the system (synthesis part) is configured to synthesize an output multi-channel audio signal based on the extracted downmix of multi-channel audio signals and metadata.
  • the analysis processor 105 in some embodiments comprises a time-frequency domain transformer 201.
  • the time-frequency domain transformer 201 is configured to receive the multi-channel signals 102 and apply a suitable time to frequency domain transform such as a Short Time Fourier Transform (STFT) in order to convert the input time domain signals into suitable time-frequency signals.
  • These time-frequency signals may be passed to a spatial analyser 203 and to a signal analyser 205.
  • time-frequency signals 202 may be represented in the time-frequency domain representation by
  • n can be considered as a time index with a lower sampling rate than that of the original time-domain signals.
  • Each subband k has a lowest bin b_{k,low} and a highest bin b_{k,high}, and the subband contains all bins from b_{k,low} to b_{k,high}.
  • the widths of the subbands can approximate any suitable distribution, for example the equivalent rectangular bandwidth (ERB) scale or the Bark scale.
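As an illustrative sketch only (assuming the Glasberg and Moore ERB-rate formula; the codec's actual band tables are not reproduced here, and all function names below are hypothetical), ERB-spaced subband edges and their STFT bin ranges [b_{k,low}, b_{k,high}] could be derived as:

```python
import math

def erb_rate(f_hz):
    # Glasberg & Moore ERB-rate ("Cam") scale
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def inverse_erb_rate(cam):
    # Inverse of erb_rate: Cam value back to frequency in Hz
    return (10.0 ** (cam / 21.4) - 1.0) / 0.00437

def erb_band_edges(f_min, f_max, n_bands):
    # n_bands subbands equally spaced on the ERB-rate scale
    lo, hi = erb_rate(f_min), erb_rate(f_max)
    return [inverse_erb_rate(lo + i * (hi - lo) / n_bands)
            for i in range(n_bands + 1)]

def bin_ranges(edges, n_bins, sample_rate):
    # Map band edges (Hz) to contiguous STFT bin index ranges
    hz_per_bin = (sample_rate / 2.0) / n_bins
    ranges, low = [], 0
    for edge in edges[1:]:
        high = max(low, min(n_bins - 1, int(edge / hz_per_bin)))
        ranges.append((low, high))
        low = high + 1
    return ranges
```

A Bark-scale variant would only swap the two scale functions; the bin mapping stays the same.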
  • the analysis processor 105 comprises a spatial analyser 203.
  • the spatial analyser 203 may be configured to receive the time- frequency signals 202 and based on these signals estimate direction parameters 108.
  • the direction parameters may be determined based on any audio based ‘direction’ determination.
  • the spatial analyser 203 is configured to estimate the direction with two or more signal inputs. This represents the simplest configuration to estimate a ‘direction’; more complex processing may be performed with even more signals.
  • the spatial analyser 203 may thus be configured to provide at least one azimuth and elevation for each frequency band and temporal time-frequency block within a frame of an audio signal, denoted as azimuth φ(k,n) and elevation θ(k,n).
  • the direction parameters 108 may also be passed to a direction analyser/index generator 215.
  • the spatial analyser 203 may also be configured to determine an energy ratio parameter 110.
  • the energy ratio may be the energy of the audio signal considered to arrive from a direction.
  • the direct-to-total energy ratio r(k,n) can be estimated, e.g., using a stability measure of the directional estimate, or using any correlation measure, or any other suitable method to obtain a ratio parameter.
  • the energy ratio may be passed to an energy ratio average generator/quantization resolution determiner 211.
  • the analysis processor is configured to receive time domain multichannel or other format such as microphone or Ambisonics audio signals.
  • the analysis processor may apply a time domain to frequency domain transform (e.g. STFT) to generate suitable time-frequency domain signals for analysis and then apply direction analysis to determine direction and energy ratio parameters.
  • a time domain to frequency domain transform e.g. STFT
  • the analysis processor may then be configured to output the determined parameters.
  • the parameters may be combined over several time indices. The same applies to the frequency axis: as has been expressed, the direction of several frequency bins b could be expressed by one direction parameter in band k consisting of those frequency bins. The same applies to all of the spatial parameters discussed herein.
  • the audio spatial metadata consists of azimuth, elevation, and energy ratio data for each subband.
  • the directional data is represented on 16 bits such that the azimuth is approximately represented on 9 bits, and the elevation on 7 bits.
  • the energy ratio is represented on 8 bits.
  • the metadata encoder/quantizer 111 may comprise an energy ratio average generator/quantization resolution determiner 211.
  • the energy ratio average generator/quantization resolution determiner 211 may be configured to receive the energy ratios from the analysis and from these generate a suitable encoding of the ratios. For example to receive the determined energy ratios (for example direct-to-total energy ratios, and furthermore diffuse-to-total energy ratios and remainder-to-total energy ratios) and encode/quantize these. These encoded forms may be passed to the encoder 217.
  • the energy ratio average generator/quantization resolution determiner 211 thus may be configured to apply a scalar non-uniform quantization using 3 bits for each sub-band.
  • the energy ratio average generator/quantization resolution determiner 211 is configured to, rather than controlling the transmitting/storing of all of the energy ratio values for all TF blocks, generate only one weighted average value per subband which is passed to the encoder to be transmitted/stored.
  • this average is computed by taking into account the total energy of each time-frequency block, with the weighting applied such that blocks having more energy contribute more to the average.
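A minimal sketch of this energy-weighted averaging and a subsequent 3-bit scalar quantization follows; the codec's actual non-uniform codebook is not reproduced, so the uniform codebook below and both helper names are purely illustrative:

```python
def subband_ratio_average(ratios, energies):
    # Energy-weighted average of the direct-to-total ratios over the
    # time-frequency blocks of one subband; blocks with more energy
    # contribute more to the average.
    total = sum(energies)
    if total <= 0.0:
        return 0.0
    return sum(r * e for r, e in zip(ratios, energies)) / total

def quantize_ratio(avg, codebook):
    # Nearest-neighbour scalar quantization; a 3-bit codebook has
    # 8 entries.
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - avg))
```

For example, with ratios (1.0, 0.0) and block energies (3.0, 1.0) the weighted average is 0.75, dominated by the higher-energy block.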
  • the energy ratio average generator/quantization resolution determiner 211 is configured to determine the quantization resolution for the direction parameters (in other words a quantization resolution for elevation and azimuth values) for all of the time-frequency blocks in the frame.
  • This bit allocation may for example be defined by bits_dir0[0:N-1][0:M-1] and may be passed to the direction analyser/index generator 215.
  • the actions of the energy ratio average generator/quantization resolution determiner 211 can be summarised.
  • the first step is one of receiving the ratio values as shown in Figure 3 by step 301.
  • the subband loop is started in Figure 3 by step 303.
  • the subband loop comprises a first action of using a determined number of bits (for example 3) to represent the energy ratio value based on the weighted average of the energy ratio values for all of the values within the time block (where the weighting is determined by the energy value of the audio signal) as shown in Figure 3 by step 305.
  • the second action is one of determining the quantization resolution for the azimuth and elevation for all of the time blocks of the current subband based on the value of the energy ratio as shown in Figure 3 by step 307.
  • the loop is closed in Figure 3 by step 309.
  • the metadata encoder/quantizer 111 may comprise a direction analyser/index generator 215.
  • the direction index generator 215 is configured to receive the direction parameters (such as the azimuth φ(k,n) and elevation θ(k,n)) 108 and the quantization bit allocation, and from this generate a quantized output.
  • the quantization is based on an arrangement of spheres forming a spherical grid arranged in rings on a ‘surface’ sphere which are defined by a look up table defined by the determined quantization resolution.
  • the spherical grid uses the idea of covering a sphere with smaller spheres and considering the centres of the smaller spheres as points defining a grid of almost equidistant directions. The smaller spheres therefore define cones or solid angles about the centre point which can be indexed according to any suitable indexing algorithm.
  • although spherical quantization is described here, any suitable quantization, linear or non-linear, may be used.
  • the bits for direction parameters are allocated according to the table bits_direction[]; if the energy ratio has the index i, the number of bits for the direction is bits_direction[i].
  • ‘no_theta’ corresponds to the number of elevation values in the ‘North hemisphere’ of the sphere of directions, including the Equator. ‘no_phi’ corresponds to the number of azimuth values at each elevation for each quantizer.
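One illustrative way such ring tables could be generated is to scale the azimuth count per ring with the ring circumference, i.e. with the cosine of the elevation; note that the actual ‘no_theta’/‘no_phi’ values are codec-defined look-up tables driven by the bit allocation, so the constants below are assumptions only:

```python
import math

def spherical_grid(no_theta):
    # 'no_theta' elevation rings in the northern hemisphere (equator
    # included); the azimuth point count per ring shrinks roughly with
    # the ring circumference, cos(elevation), giving near-equidistant
    # grid points. The pole always keeps a single point.
    n_equator = 4 * (no_theta - 1) if no_theta > 1 else 1
    no_phi = []
    for i in range(no_theta):
        elev = math.radians(90.0 * i / max(no_theta - 1, 1))
        no_phi.append(max(1, round(n_equator * math.cos(elev))))
    return no_phi
```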
  • the direction analyser/index generator 215 can then be configured to entropy encode the azimuth and elevation indices.
  • the entropy coding is implemented for one frequency subband at a time, encoding all the time subframes for that subband. This means that, for instance, the best GR order is determined for the 4 values corresponding to the time subframes of a current subband. Furthermore, as discussed herein, when there are several methods to encode the values for one subband, one of the methods is selected as discussed later.
  • the entropy encoding of the azimuth and the elevation indices in some embodiments may be implemented using a Golomb Rice encoding method with two possible values for the Golomb Rice parameter. In some embodiments the entropy coding may also be implemented using any suitable entropy coding technique (for example Huffman or arithmetic coding).
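As a minimal sketch of Golomb Rice coding (the exact bitstream conventions of the codec are assumed rather than confirmed), a parameter-k code and the order selection between two candidate parameters could look like:

```python
def golomb_rice_encode(value, k):
    # Unary-coded quotient (value >> k) terminated by '0', followed by
    # the k least-significant bits of the remainder.
    q, r = value >> k, value & ((1 << k) - 1)
    rem = format(r, '0{}b'.format(k)) if k > 0 else ''
    return '1' * q + '0' + rem

def gr_bits(value, k):
    # Code length in bits, without building the codeword.
    return (value >> k) + 1 + k

def best_gr_order(values, orders=(2, 3)):
    # Pick the GR parameter that minimises the total bit count over
    # the subframe values of one subband.
    return min(orders, key=lambda k: sum(gr_bits(v, k) for v in values))
```

Small values favour the smaller order; a run of larger values tips the choice to the larger one.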
  • the direction analyser/index generator 215 can then be configured to compare the number of bits used by the entropy coding EC to the number of bits available for encoding, bits_dir1[0:N-1][0:M-1]. This decision may be implemented at a subband level. In other words the EC bits for azimuth + the EC bits for elevation of the current subband, i, are compared to sum(bits_dir1[i][0:M-1]).
  • the entropy coding based on the quantized direction indices is used.
  • the direction analyser/index generator 215 can then be configured to start a loop for each subband up to the penultimate subband N-1.
  • the direction analyser/index generator 215 can be configured to encode the direction indices using the fixed rate encoding method and using bits_dir1[N-1][0:M-1] bits.
  • the first step is one of determining the direction indices based on the quantization as shown in Figure 4 by step 400. Having determined the direction indices, they are entropy encoded as shown in Figure 4 by step 401. The number of bits used for entropy encoding the direction indices is compared against the number of bits allowed as shown in Figure 4 by step 403. Where the number of bits used for entropy encoding the direction indices is less than (or equal to) the number of bits allowed, the entropy encoded direction indices can be used as shown in Figure 4 by step 404 and the method ends for this frame.
  • Where the number of bits used for entropy encoding the direction indices is more than the number of bits allowed, the number of allocated bits is reduced such that the sum of the allocated bits equals the number of available bits left after encoding the energy ratios as shown in Figure 4 by step 405.
  • the number of allowed bits for encoding is determined as shown in Figure 4 by step 409.
  • a fixed rate encoding method is used to encode the indices using the reduced number of bits as shown in Figure 4 by step 411.
  • Either the fixed rate encoding or the entropy encoding is then selected based on which method uses fewer bits and the selection furthermore can be indicated by a single bit as shown in Figure 4 by step 413.
  • The determination of whether there are any remaining bits available, based on the difference between the number of allowed bits and the number of bits used by the selected encoding, and the redistribution of the remaining bits to the later subband allocations, is shown in Figure 4 by step 415.
  • the loop is then completed and may then repeat for the next subband as shown in Figure 4 by step 417.
  • bits_allowed = sum(bits_dir1[i][0:M-1])
  • nb = min(bits_fixed, bits_ec) + 1;
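The per-subband decision and bit redistribution described in steps 403-417 can be sketched as follows; the one extra signalling bit and the carry-over of surplus bits follow the description above, while the function name and argument layout are illustrative assumptions:

```python
def encode_subbands(bits_ec, bits_fixed, bits_allowed):
    # Per-subband choice between entropy coding (EC) and fixed-rate
    # coding: one signalling bit plus the cheaper method, with any
    # surplus bits carried over to later subbands.
    choices, carry = [], 0
    for ec, fixed, allowed in zip(bits_ec, bits_fixed, bits_allowed):
        allowed += carry
        nb = min(ec, fixed) + 1          # +1 for the selection bit
        choices.append('EC' if ec <= fixed else 'FIXED')
        carry = allowed - nb             # redistribute leftover bits
    return choices, carry
```

For example, with EC costs (10, 20), fixed costs (15, 12) and 16 allowed bits per subband, the first subband picks EC (11 bits, 5 left over) and the second, with 21 bits now available, picks fixed-rate coding.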
  • the optimisation of the entropy encoding of the elevation and the azimuth values can be performed separately and is described in further detail hereafter with respect to Figures 5 and 6.
  • the direction indices determination is started as shown in Figure 5 by step 501.
  • the determination of the bits required for entropy encoding the indices is shown here for an elevation index determination.
  • a similar approach may be applied to the azimuth index determination.
  • a mapping is generated such that the elevation (or azimuth) value of 0 has an index of 0 and the increasing index values are assigned to increasing positive and negative elevation (azimuth) values as shown in Figure 5 by step 503.
  • the mapping is applied to the audio sources (for example in the form of generating a codeword output based on a lookup table) as shown in Figure 5 by step 505.
  • the indices having been generated, in some embodiments a check is performed to determine whether all of the indices are located within the same hemisphere as shown in Figure 5 by step 507.
  • the index values can be divided by two (with a rounding up) and an indicator generated indicating which hemisphere the indices were all located within, and then these values are entropy encoded as shown in Figure 5 by step 509.
  • a mean removed entropy encoding may be configured to first remove the average index value for the subframes to be encoded, then remap the indices to positive values, and then encode them with a suitable entropy encoding, such as Golomb Rice encoding, as shown in Figure 5 by step 510.
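A minimal sketch of this mean-removed encoding, assuming a zigzag-style remapping of the signed residuals to non-negative values (the codec's exact remapping rule is not specified here), could be:

```python
def zigzag(d):
    # Map signed residuals to non-negative: 0,-1,1,-2,2 -> 0,1,2,3,4
    return 2 * d if d >= 0 else -2 * d - 1

def mean_removed_gr(indices, k):
    # Remove the (rounded) mean index, remap the signed residuals to
    # non-negative values, and count the Golomb Rice bits needed with
    # parameter k.
    mean = round(sum(indices) / len(indices))
    residuals = [zigzag(i - mean) for i in indices]
    bits = sum((r >> k) + 1 + k for r in residuals)
    return mean, residuals, bits
```

Indices that cluster around their mean produce small residuals and therefore short Golomb Rice codewords.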
  • a check can be applied to determine whether all of the time subframes have the same elevation (azimuth) value or index as shown in Figure 5 by step 511.
  • the next operation is one of providing the number of bits required for the entropy encoded indices and any indicator bits as shown in Figure 5 by step 517.
  • the index of the elevation can be determined from a codebook in the domain [-90; 90] which is formed such that an elevation with a value 0 returns a codeword with index zero and alternately assigns increasing indices to positive and negative codewords moving away from the zero elevation value.
  • the encoder can be configured to check whether all of the audio sources are above (or all of the audio sources are below) the equator and where this is the case for all time subframes for a subband then dividing the indices by 2, in order to generate smaller valued indices which can be more efficiently encoded.
  • the function mean_removed_GR() in the above example is configured to first remove the average index value for the subframes to be encoded, then remap the indices to positive values, and then encode them with Golomb Rice encoding.
  • odd_even_mean_removed_GR() is configured to first check whether all indices are odd or all are even, signal this occurrence and indicate the type (odd or even), after which it encodes the halved indices.
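A self-contained sketch of this odd/even variant follows; the signalling overhead (one flag bit, plus one parity bit when the shortcut applies) and all helper names are assumptions for illustration:

```python
def _gr_bits(v, k):
    # Golomb Rice code length for a non-negative value v, parameter k.
    return (v >> k) + 1 + k

def _mean_removed_bits(indices, k):
    # Bits for mean-removed, zigzag-remapped Golomb Rice coding.
    mean = round(sum(indices) / len(indices))
    return sum(_gr_bits(2 * d if d >= 0 else -2 * d - 1, k)
               for d in (i - mean for i in indices))

def odd_even_mean_removed_gr_bits(indices, k):
    # If every index shares one parity, signal it (flag + parity bit)
    # and code the halved indices; otherwise spend one flag bit and
    # fall back to plain mean-removed coding.
    if len({i % 2 for i in indices}) == 1:
        return _mean_removed_bits([i // 2 for i in indices], k) + 2
    return _mean_removed_bits(indices, k) + 1
```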
  • a series of entropy encoding optimisation operations are performed and then the lowest value is selected. This for example can be shown with respect to the encoding of azimuth values and as shown in Figure 6.
  • the direction indices determination is started as shown in Figure 6 by step 601.
  • a mapping is generated such that the azimuth value of 0 has an index of 0 and the increasing index values are assigned to increasing positive and negative azimuth values as shown in Figure 6 by step 603.
  • mapping is applied to the audio sources (for example in the form of generating a codeword output based on a lookup table) as shown in Figure 6 by step 605.
  • the index of the azimuth can be determined from a further codebook.
  • the zero value for the azimuth corresponds to a reference direction which may be the front direction, and positive values are to the left and negative values to the right.
  • the index of the azimuth value is assigned such that the values (-150, -120, -90, -60, -30, 0, 30, 60, 90, 120, 150, 180) have assigned the following indices (10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9, 11).
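The alternating assignment above can be reproduced with a small helper; the 30-degree step is taken from the example values, and other quantizers would use a different step:

```python
def azimuth_index(az_deg, step=30):
    # Zigzag assignment on a uniform azimuth grid: the front (0) gets
    # index 0, positive (left) azimuths take the odd indices and
    # negative (right) azimuths the even ones, so larger indices land
    # toward the back of the capture environment.
    m = az_deg // step if az_deg >= 0 else -((-az_deg) // step)
    return 2 * m - 1 if m > 0 else -2 * m
```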
  • the odd/even approach can be checked for the azimuth (corresponding to left/right positioning).
  • the higher index values are assigned to values from the back or rear of the ‘capture environment’.
  • (i) Estimate the number of bits needed to encode the indices as if they were in front, using mean removed order selective Golomb Rice coding.
  • the GR order may be 2 or 3.
  • the GR order can also be set to different values, depending on the default range for the number of symbols.
  • (ii) Estimate the number of bits needed to encode the complementary indices using mean removed order selective GR coding.
  • (iii) Use the encoding method that requires fewer bits, and use one bit to signal which method is used.
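Steps (i) to (iii) can be sketched as follows, assuming the same mean-removed zigzag GR costing as earlier and a complementary index n-1-i (where n is the azimuth codeword count); all names are illustrative:

```python
def choose_front_back(indices, n, gr_orders=(2, 3)):
    # Estimate the mean-removed, order-selective GR cost of the
    # indices as they are and of the complementary (front-mirrored)
    # indices n-1-i, then keep the cheaper variant plus one
    # signalling bit.
    def cost(vals):
        mean = round(sum(vals) / len(vals))
        res = [2 * d if d >= 0 else -2 * d - 1
               for d in (v - mean for v in vals)]
        return min(sum((r >> k) + 1 + k for r in res) for k in gr_orders)
    direct = cost(indices)
    mirrored = cost([n - 1 - i for i in indices])
    use_mirror = mirrored < direct
    return use_mirror, min(direct, mirrored) + 1
```

When most sources sit toward the back (large indices), the mirrored variant tends to give smaller residuals and wins the comparison.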
  • the device may be any suitable electronics device or apparatus.
  • the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
  • the device 1400 comprises at least one processor or central processing unit 1407.
  • the processor 1407 can be configured to execute various program codes such as the methods such as described herein.
  • the device 1400 comprises a memory 1411.
  • the at least one processor 1407 is coupled to the memory 1411.
  • the memory 1411 can be any suitable storage means.
  • the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407.
  • the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
  • the device 1400 comprises a user interface 1405.
  • the user interface 1405 can be coupled in some embodiments to the processor 1407.
  • the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405.
  • the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad.
  • the user interface 1405 can enable the user to obtain information from the device 1400.
  • the user interface 1405 may comprise a display configured to display information from the device 1400 to the user.
  • the user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400.
  • the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
  • the device 1400 comprises an input/output port 1409.
  • the input/output port 1409 in some embodiments comprises a transceiver.
  • the transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
  • the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
  • the transceiver can communicate with further apparatus by any suitable known communications protocol.
  • the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA).
  • the transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the invention may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

Abstract

An apparatus comprising means configured to: generate spatial audio signal directional metadata parameters for a time-frequency tile; generate at least one mapping (503) whereby values of the spatial audio signal directional metadata parameters are mapped to an index value (505); generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encode the indices based on an estimate (517) of the number of bits required to encode the indices.

Description

DETERMINATION OF SPATIAL AUDIO PARAMETER ENCODING AND
ASSOCIATED DECODING
Field
The present application relates to apparatus and methods for sound-field related parameter encoding, but not exclusively for time-frequency domain direction related parameter encoding for an audio encoder and decoder.
Background
Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and an effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can be also utilized as the spatial metadata (which may also include other parameters such as coherence, spread coherence, number of directions, distance etc) for an audio codec. For example, these parameters can be estimated from microphone-array captured audio signals, and for example a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata. The stereo signal could be encoded, for example, with an AAC encoder. A decoder can decode the audio signals into PCM signals and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
The aforementioned solution is particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, stand-alone microphone arrays). However, it may be desirable for such an encoder to have also other input types than microphone-array captured signals, for example, loudspeaker signals, audio object signals, or Ambisonics signals.
Analysing first-order Ambisonics (FOA) inputs for spatial metadata extraction has been thoroughly documented in scientific literature related to Directional Audio Coding (DirAC) and Harmonic planewave expansion (Harpex). This is because there exist microphone arrays directly providing a FOA signal (more accurately: its variant, the B-format signal), and analysing such an input has thus been a point of study in the field.
A further input for the encoder is also multi-channel loudspeaker input, such as 5.1 or 7.1 channel surround inputs.
The directional components of the metadata may comprise an elevation, an azimuth, and an energy ratio (which is 1 minus the diffuseness) of a resulting direction, for each considered time/frequency subband. Quantization of these directional components is a current research topic.
Summary
There is provided according to a first aspect an apparatus comprising means configured to: generate spatial audio signal directional metadata parameters for a time-frequency tile; generate at least one mapping whereby values of the spatial audio signal directional metadata parameters are mapped to an index value; generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encode the indices based on an estimate of the number of bits required to encode the indices.
The means configured to encode the indices based on an estimate of the number of bits required to encode the indices may be configured to: estimate the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determine whether the number of bits required is greater than a determined threshold value and, when the number of bits required is greater than the determined threshold value, then: estimate a further number of bits required to encode indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces a possible number of index values to be encoded; select one of the at least one further mapping based on a lowest number of bits required; and encode the indices based on the selected one of the at least one further mapping.
The at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; and mapping a mean removed index value to directional values.
The at least one further mapping may comprise at least one of: mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
The at least one further mapping may comprise a complementary mapping to the first mapping.
The encoding of the indices based on at least one further mapping may comprise at least one of: entropy encoding; and fixed rate encoding.
The at least one mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean removed index value to directional values.
According to a second aspect there is provided a method comprising: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping whereby values of the spatial audio signal directional metadata parameters are mapped to an index value; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
The encoding of the indices based on an estimate of the number of bits required to encode the indices may comprise: estimating the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determining whether the number of bits required is greater than a determined threshold value and, when the number of bits required is greater than the determined threshold value, then: estimating a further number of bits required to encode indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces a possible number of index values to be encoded; selecting one of the at least one further mapping based on a lowest number of bits required; and encoding the indices based on the selected one of the at least one further mapping.
The at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; and mapping a mean removed index value to directional values.
The at least one further mapping may comprise at least one of: mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
The at least one further mapping may comprise a complementary mapping to the first mapping.
Encoding of the indices based on at least one further mapping may comprise at least one of: encoding of the indices by entropy encoding; and encoding of the indices by fixed rate encoding.
The mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may for example be a front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternately allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean removed index value to directional values.
According to a third aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: generate spatial audio signal directional metadata parameters for a time-frequency tile; generate at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encode the indices based on an estimate of the number of bits required to encode the indices.
The apparatus caused to encode the indices based on an estimate of the number of bits required to encode the indices may be caused to: estimate the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile; determine whether the number of bits required is greater than a determined threshold value and when the number of bits required is greater than the determined threshold value then: estimate a further number of bits required to encode indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces a possible number of index values to be encoded; select one of the at least one further mapping based on a lowest number of bits required; and encode the indices based on the selected one of the at least one further mapping.
The at least one further mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternately allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean removed index value to directional values.
The at least one further mapping may comprise a complementary mapping to the first mapping.
The apparatus caused to encode the indices based on at least one further mapping may be caused to perform at least one of: entropy encoding; and fixed rate encoding.

The at least one mapping may comprise at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from left or right are alternately allocated to increasing index values; mapping of indices such that a directional value 0 (corresponding to a reference direction which may be a front direction) is mapped to index 0 and the directional values corresponding to directions from up or down are alternately allocated to increasing index values; mapping index values to directional values for only one hemisphere; mapping a single index value to a range of directional values; and mapping a mean removed index value to directional values.
According to a fourth aspect there is provided an apparatus comprising: means for generating spatial audio signal directional metadata parameters for a time-frequency tile; means for generating at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; means for generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and means for encoding the indices based on an estimate of the number of bits required to encode the indices.
According to a fifth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
According to a seventh aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
According to an eighth aspect there is provided an apparatus comprising: generating circuitry configured to generate spatial audio signal directional metadata parameters for a time-frequency tile; generating circuitry configured to generate at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; generating circuitry configured to generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding circuitry configured to encode the indices based on an estimate of the number of bits required to encode the indices.

According to a ninth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: generating spatial audio signal directional metadata parameters for a time-frequency tile; generating at least one mapping by which values of the spatial audio signal directional metadata parameters are mapped to index values; generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and encoding the indices based on an estimate of the number of bits required to encode the indices.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.

Embodiments of the present application aim to address problems associated with the state of the art.
Summary of the Figures
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;
Figure 2 shows schematically the metadata encoder according to some embodiments;
Figure 3 show a flow diagram of the first operations of the metadata encoder as shown in Figure 2 according to some embodiments;
Figure 4 shows a flow diagram of later operations of the metadata encoder as shown in Figure 2 according to some embodiments;
Figure 5 shows a flow diagram of the entropy encoding of the direction indices as shown in Figure 4 according to some embodiments;
Figure 6 shows a further flow diagram of the entropy encoding of the direction indices as shown in Figure 4 according to some embodiments; and
Figure 7 shows schematically an example device suitable for implementing the apparatus shown.
Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for the provision of effective spatial analysis derived metadata parameters. In the following discussions a multi-channel system is discussed with respect to a multi-channel microphone implementation. However, as discussed above, the input format may be any suitable input format, such as multi-channel loudspeaker, Ambisonics (FOA/HOA) etc. It is understood that in some embodiments the channel location is based on a location of the microphone or is a virtual location or direction. Furthermore the output of the example system is a multi-channel loudspeaker arrangement. However it is understood that the output may be rendered to the user via means other than loudspeakers. Furthermore the multi-channel loudspeaker signals may be generalised to be two or more playback audio signals.
The metadata consists at least of elevation, azimuth and the energy ratio of a resulting direction, for each considered time/frequency subband. The direction parameter components, the azimuth and the elevation, are extracted from the audio data and then quantized to a given quantization resolution. The resulting indices must be further compressed for efficient transmission. For high bitrates, high-quality lossless encoding of the metadata is needed.
The concept as discussed hereafter is to reduce values of the variables to be encoded. The reduction can be implemented in some embodiments for the case when there are a higher number of symbols. The change can be performed by subtracting from the number of symbols available the index to be encoded and encoding the resulting difference. In some embodiments, for an azimuth representation this corresponds to having audio sources situated with a bias to the rear. In addition, the change can also be implemented in some embodiments by checking if all indexes are even or if all indexes are odd and encoding the values divided by two. For an elevation representation, in some embodiments this corresponds to having the audio sources mainly situated on the upper or the lower side of audio scene.
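The two reductions described above can be expressed as reversible transforms applied before entropy coding. The following is an illustrative sketch (the function and variable names are not taken from any codec source, and the complement is assumed to be taken against the largest valid index, n_symbols - 1, so the result stays in range):

```python
def reduce_indices(indices, n_symbols):
    """Try two reversible transforms that can shrink the values
    handed to the entropy coder (a sketch, names are assumptions)."""
    # Transform 1: complement each index against the symbol count.
    # Helpful when indices cluster near the top of the range, e.g.
    # azimuth sources biased towards the rear.
    complemented = [n_symbols - 1 - i for i in indices]

    # Transform 2: if all indices share the same parity, halve them.
    # Helpful when elevation sources sit only above or only below
    # the horizontal plane of the audio scene.
    if all(i % 2 == 0 for i in indices):
        halved = [i // 2 for i in indices]
    elif all(i % 2 == 1 for i in indices):
        halved = [(i - 1) // 2 for i in indices]
    else:
        halved = None  # mixed parities: transform not applicable
    return complemented, halved
```

The decoder reverses whichever transform a signalled flag indicates was used.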
In such embodiments the encoding of the MASA metadata, for example within an IVAS codec, is configured to first estimate the number of bits for the directional data based on the values of the quantized energy ratios for each time-frequency tile. Furthermore the entropy encoding of the original quantization resolution is tested. If the resulting sum is larger than the amount of available bits, the number of bits can be proportionally reduced for each time-frequency tile such that it fits the available number of bits; however the quantization resolution is not unnecessarily adjusted when the bitrate allows it (for example at higher bitrates).
With respect to Figure 1 an example apparatus and system for implementing embodiments of the application are shown. The system 100 is shown with an ‘analysis’ part 121 and a ‘synthesis’ part 131. The ‘analysis’ part 121 is the part from receiving the multi-channel loudspeaker signals up to an encoding of the metadata and downmix signal and the ‘synthesis’ part 131 is the part from a decoding of the encoded metadata and downmix signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).
The input to the system 100 and the ‘analysis’ part 121 is the multi-channel signals 102. In the following examples a microphone channel signal input is described, however any suitable input (or synthetic multi-channel) format may be implemented in other embodiments. For example in some embodiments the spatial analyser and the spatial analysis may be implemented external to the encoder. For example in some embodiments the spatial metadata associated with the audio signals may be provided to an encoder as a separate bit-stream. In some embodiments the spatial metadata may be provided as a set of spatial (direction) index values.
The multi-channel signals are passed to a downmixer 103 and to an analysis processor 105.
In some embodiments the downmixer 103 is configured to receive the multi-channel signals and downmix the signals to a determined number of channels and output the downmix signals 104. For example the downmixer 103 may be configured to generate a 2 audio channel downmix of the multi-channel signals. The determined number of channels may be any suitable number of channels. In some embodiments the downmixer 103 is optional and the multi-channel signals are passed unprocessed to an encoder 107 in the same manner as the downmix signals are in this example.
In some embodiments the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the downmix signals 104. The analysis processor 105 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108 and an energy ratio parameter 110 (and in some embodiments a coherence parameter and a diffuseness parameter). The direction and energy ratio may in some embodiments be considered to be spatial audio parameters. In other words the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).
In some embodiments the parameters generated may differ from frequency band to frequency band. Thus for example in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted. A practical example of this may be that for some frequency bands such as the highest band some of the parameters are not required for perceptual reasons. The downmix signals 104 and the metadata 106 may be passed to an encoder 107.
The encoder 107 may comprise an audio encoder core 109 which is configured to receive the downmix (or otherwise) signals 104 and generate a suitable encoding of these audio signals. The encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs. The encoding may be implemented using any suitable scheme. The encoder 107 may furthermore comprise a metadata encoder/quantizer 111 which is configured to receive the metadata and output an encoded or compressed form of the information. In some embodiments the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage, shown in Figure 1 by the dashed line. The multiplexing may be implemented using any suitable scheme.
In the decoder side, the received or retrieved data (stream) may be received by a decoder/demultiplexer 133. The decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a downmix extractor 135 which is configured to decode the audio signals to obtain the downmix signals. Similarly the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata. The decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
The decoded metadata and downmix audio signals may be passed to a synthesis processor 139.
The system 100 ‘synthesis’ part 131 further shows a synthesis processor 139 configured to receive the downmix and the metadata and re-creates in any suitable format a synthesized spatial audio in the form of multi-channel signals 110 (these may be multichannel loudspeaker format or in some embodiments any suitable output format such as binaural or Ambisonics signals, depending on the use case) based on the downmix signals and the metadata.
Therefore in summary first the system (analysis part) is configured to receive multi-channel audio signals. Then the system (analysis part) is configured to generate a downmix or otherwise generate a suitable transport audio signal (for example by selecting some of the audio signal channels). The system is then configured to encode for storage/transmission the downmix (or more generally the transport) signal. After this the system may store/transmit the encoded downmix and metadata. The system may retrieve/receive the encoded downmix and metadata. Then the system is configured to extract the downmix and metadata from encoded downmix and metadata parameters, for example demultiplex and decode the encoded downmix and metadata parameters.
The system (synthesis part) is configured to synthesize an output multi-channel audio signal based on extracted downmix of multi-channel audio signals and metadata.
With respect to Figure 2 an example analysis processor 105 and metadata encoder/quantizer 111 (as shown in Figure 1) according to some embodiments are described in further detail.
The analysis processor 105 in some embodiments comprises a time-frequency domain transformer 201.
In some embodiments the time-frequency domain transformer 201 is configured to receive the multi-channel signals 102 and apply a suitable time to frequency domain transform such as a Short Time Fourier Transform (STFT) in order to convert the input time domain signals into suitable time-frequency signals. These time-frequency signals may be passed to a spatial analyser 203 and to a signal analyser 205.
Thus for example the time-frequency signals 202 may be represented in the time-frequency domain representation by s_i(b, n), where b is the frequency bin index, n is the time-frequency block (frame) index and i is the channel index. In another expression, n can be considered as a time index with a lower sampling rate than that of the original time-domain signals. These frequency bins can be grouped into subbands that group one or more of the bins into a subband of a band index k = 0, ..., K-1. Each subband k has a lowest bin b_k,low and a highest bin b_k,high, and the subband contains all bins from b_k,low to b_k,high. The widths of the subbands can approximate any suitable distribution, for example the Equivalent Rectangular Bandwidth (ERB) scale or the Bark scale.
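The bin-to-subband grouping above can be sketched as follows; the band edges used in the example are arbitrary illustrative values, not an ERB or Bark table:

```python
def group_bins_into_subbands(band_edges):
    # band_edges[k] = (b_k_low, b_k_high), both inclusive, so subband k
    # contains every bin from b_k_low to b_k_high as defined above.
    return [list(range(lo, hi + 1)) for lo, hi in band_edges]

# Example: K = 2 subbands covering bins 0..4
subbands = group_bins_into_subbands([(0, 1), (2, 4)])
```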
In some embodiments the analysis processor 105 comprises a spatial analyser 203. The spatial analyser 203 may be configured to receive the time-frequency signals 202 and based on these signals estimate direction parameters 108. The direction parameters may be determined based on any suitable audio-based ‘direction’ determination.
For example in some embodiments the spatial analyser 203 is configured to estimate the direction with two or more signal inputs. This represents the simplest configuration to estimate a ‘direction’; more complex processing may be performed with even more signals.
The spatial analyser 203 may thus be configured to provide at least one azimuth and elevation for each frequency band and temporal time-frequency block within a frame of an audio signal, denoted as azimuth φ(k,n) and elevation θ(k,n). The direction parameters 108 may also be passed to a direction analyser/index generator 215.
The spatial analyser 203 may also be configured to determine an energy ratio parameter 110. The energy ratio may be the energy of the audio signal considered to arrive from a direction. The direct-to-total energy ratio r(k,n) can be estimated, e.g., using a stability measure of the directional estimate, or using any correlation measure, or any other suitable method to obtain a ratio parameter. The energy ratio may be passed to an energy ratio average generator/quantization resolution determiner 211.
Therefore in summary the analysis processor is configured to receive time domain multichannel or other format such as microphone or Ambisonics audio signals.
Following this the analysis processor may apply a time domain to frequency domain transform (e.g. STFT) to generate suitable time-frequency domain signals for analysis and then apply direction analysis to determine direction and energy ratio parameters.
The analysis processor may then be configured to output the determined parameters.
Although directions and ratios are here expressed for each time index n, in some embodiments the parameters may be combined over several time indices. The same applies for the frequency axis: as has been expressed, the direction of several frequency bins b could be expressed by one direction parameter in band k consisting of several frequency bins b. The same applies for all of the discussed spatial parameters herein.
As also shown in Figure 2 an example metadata encoder/quantizer 111 is shown according to some embodiments.
As discussed above the audio spatial metadata consists of azimuth, elevation, and energy ratio data for each subband. In the MASA format the directional data is represented on 16 bits such that the azimuth is approximately represented on 9 bits, and the elevation on 7 bits. The energy ratio is represented on 8 bits. For each frame there are N=5 subbands and M=4 time blocks, meaning that (16+8)xMxN bits are needed to store the uncompressed metadata for each frame. In a higher frequency resolution version, there could be 20 or 24 frequency subbands. Although in the following examples the MASA format bit allocations are used it is understood that other embodiments may be implemented with other bit allocations, or subband or time block choices, and these are representative examples only.
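The uncompressed metadata size quoted above follows directly from the stated bit widths:

```python
bits_direction = 16  # azimuth ~9 bits + elevation ~7 bits
bits_ratio = 8       # energy ratio
N, M = 5, 4          # subbands x time blocks per frame

# (16 + 8) x M x N = 24 x 4 x 5 = 480 bits of uncompressed
# metadata per frame in the baseline MASA configuration.
total_bits = (bits_direction + bits_ratio) * M * N
```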
The metadata encoder/quantizer 111 may comprise an energy ratio average generator/quantization resolution determiner 211. The energy ratio average generator/quantization resolution determiner 211 may be configured to receive the energy ratios from the analysis and from these generate a suitable encoding of the ratios. For example to receive the determined energy ratios (for example direct-to-total energy ratios, and furthermore diffuse-to-total energy ratios and remainder-to-total energy ratios) and encode/quantize these. These encoded forms may be passed to the encoder 217.
In some embodiments the energy ratio average generator/quantization resolution determiner 211 is configured to encode each energy ratio value using a determined number of bits. For example in the above case where there are N=5 subbands, 3 bits are used to encode each energy ratio value. The energy ratio average generator/quantization resolution determiner 211 thus may be configured to apply a scalar non-uniform quantization using 3 bits for each sub-band.
Additionally the energy ratio average generator/quantization resolution determiner 211 is configured to, rather than controlling the transmitting/storing of all of the energy ratio values for all TF blocks, generate only one weighted average value per subband which is passed to the encoder to be transmitted/stored.
In some embodiments this average is computed by taking into account the total energy of each time-frequency block, the blocks having more energy being weighted more heavily.
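A minimal sketch of such an energy-weighted average, assuming simple linear weighting by block energy (the exact weighting used in an implementation may differ):

```python
def weighted_ratio_average(ratios, energies):
    # One ratio and one energy per time block of the subband;
    # blocks with more energy contribute more to the average.
    total = sum(energies)
    if total == 0:
        # degenerate silent subband: fall back to a plain mean
        return sum(ratios) / len(ratios)
    return sum(r * e for r, e in zip(ratios, energies)) / total
```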
Additionally the energy ratio average generator/quantization resolution determiner 211 is configured to determine the quantization resolution for the direction parameters (in other words a quantization resolution for elevation and azimuth values) for all of the time-frequency blocks in the frame. This bit allocation may for example be defined by bits_dir0[0:N-1][0:M-1] and may be passed to the direction analyser/index generator 215.
As shown in Figure 3 the actions of the energy ratio average generator/quantization resolution determiner 211 can be summarised. The first step is one of receiving the ratio values as shown in Figure 3 by step 301. Then the subband loop is started in Figure 3 by step 303. The subband loop comprises a first action of using a determined number of bits (for example 3) to represent the energy ratio value based on the weighted average of the energy ratio value for all of the values within the time block (where the weighting is determined by the energy value of the audio signal) as shown in Figure 3 by step 305. The second action is one of determining the quantization resolution for the azimuth and elevation for all of the time blocks of the current subband based on the value of the energy ratio as shown in Figure 3 by step 307. The loop is closed in Figure 3 by step 309.
This can furthermore be represented in pseudocode by the following:

1. For each subband i = 1:N
   a. Use 3 bits to encode the corresponding energy ratio value
   b. Set the quantization resolution for the azimuth and the elevation for all the time blocks of the current subband. The quantization resolution is set by allowing a predefined number of bits given by the value of the energy ratio, bits_dir0[0:N-1][0:M-1]
2. End for
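The loop above can be sketched in runnable form. The bits_direction lookup table from energy-ratio index to direction resolution is assumed here with placeholder values; the actual values are given in the original publication:

```python
def allocate_direction_bits(ratio_indices, bits_direction, M):
    # Step a: 3 bits per subband carry the quantized energy ratio.
    ratio_bits = 3 * len(ratio_indices)
    # Step b: the ratio index selects the direction resolution,
    # applied to all M time blocks of the subband (bits_dir0).
    bits_dir0 = [[bits_direction[idx]] * M for idx in ratio_indices]
    return ratio_bits, bits_dir0

# Example with placeholder table values and N=2 subbands, M=4 blocks
ratio_bits, bits_dir0 = allocate_direction_bits([0, 2], [3, 5, 7, 9, 11], 4)
```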
The metadata encoder/quantizer 111 may comprise a direction analyser/index generator 215. The direction analyser/index generator 215 is configured to receive the direction parameters (such as the azimuth φ(k, n) and elevation θ(k, n)) 108 and the quantization bit allocation and from this generate a quantized output. In some embodiments the quantization is based on an arrangement of spheres forming a spherical grid arranged in rings on a ‘surface’ sphere which are defined by a look up table defined by the determined quantization resolution. In other words the spherical grid uses the idea of covering a sphere with smaller spheres and considering the centres of the smaller spheres as points defining a grid of almost equidistant directions. The smaller spheres therefore define cones or solid angles about the centre point which can be indexed according to any suitable indexing algorithm. Although spherical quantization is described here any suitable quantization, linear or non-linear, may be used.
For example in some embodiments the bits for direction parameters (azimuth and elevation) are allocated according to the table bits_direction[]; if the energy ratio has the index i, the number of bits for the direction is bits_direction[i].
[Table: bits_direction[] values, reproduced as an image in the original publication]
The structure of the direction quantizers for different bit resolutions is given by the following variables:
[Table: ‘no_theta’ and ‘no_phi’ values for each bit resolution, reproduced as an image in the original publication]
‘no_theta’ corresponds to the number of elevation values in the ‘North hemisphere’ of the sphere of directions, including the Equator. ‘no_phi’ corresponds to the number of azimuth values at each elevation for each quantizer.
For instance for 5 bits there are 4 elevation values corresponding to [0, 30, 60, 90] and 4-1=3 negative elevation values [-30, -60, -90]. For the first elevation value, 0, there are 12 equidistant azimuth values, for the elevation values 30 and -30 there are 7 equidistant azimuth values, and so on. All quantization structures with the exception of the structure corresponding to 4 bits have the difference between consecutive elevation values given by 90 degrees divided by the number of elevation values ‘no_theta’. This is an example and any other suitable distribution may be implemented. For example in some embodiments there may be implemented a spherical grid for 4 bits that might have no points under the Equator. Similarly the 3 bits distribution may be spread on the sphere or restricted to the Equator only.
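The elevation grid just described can be sketched as follows, assuming a uniform step between consecutive positive elevations and mirrored negative values (a sketch only; the exact grid construction is defined by the tables above):

```python
def elevation_codebook(step_deg):
    # Positive elevations from the Equator (0) up to the pole (90),
    # followed by the mirrored negative elevations (pole listed once).
    pos = list(range(0, 91, step_deg))
    neg = [-e for e in pos[1:]]
    return pos + neg

# step_deg = 30 reproduces the 5-bit example:
# [0, 30, 60, 90, -30, -60, -90], i.e. 4 + 3 = 7 elevation values
grid_5bit = elevation_codebook(30)
```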
Having determined the direction indices the direction analyser/index generator 215 can then be configured to entropy encode the azimuth and elevation indices. The entropy coding is implemented for one frequency subband at a time, encoding all the time subframes for that subband. This means that for instance the best GR order is determined for the 4 values corresponding to the time subframes of a current subband. Furthermore, when there are several methods to encode the values for one subband, one of the methods is selected as discussed later. The entropy encoding of the azimuth and the elevation indices in some embodiments may be implemented using a Golomb Rice encoding method with two possible values for the Golomb Rice parameter. In some embodiments the entropy coding may also be implemented using any suitable entropy coding technique (for example Huffman or arithmetic coding).
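A minimal Golomb-Rice encoder illustrating the scheme (a sketch: the codec's actual bitstream conventions and parameter selection may differ):

```python
def golomb_rice_encode(value, k):
    # Unary-coded quotient (q ones followed by a terminating zero),
    # then the k least significant bits of the remainder.
    q = value >> k
    bits = "1" * q + "0"
    if k > 0:
        bits += format(value & ((1 << k) - 1), "b").zfill(k)
    return bits

def cost_bits(values, k):
    # Total bits needed to encode all time subframes of a subband with
    # parameter k; the encoder keeps the candidate k with the lower cost.
    return sum(len(golomb_rice_encode(v, k)) for v in values)
```

For example, choosing between the two candidate parameters for one subband amounts to comparing cost_bits(indices, k1) against cost_bits(indices, k2) and signalling the winner.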
Having entropy encoded the direction indices (the elevation and azimuth indices in this example) then the direction analyser/index generator 215 can then be configured to compare the number of bits used by the entropy coding EC to the number of bits available for encoding bits_dir1[0:N-1][0:M-1]. This decision may be implemented at a subband level. In other words the EC bits for azimuth + the EC bits for elevation of the current subband, i, are compared to sum(bits_dir1[i][0:M-1]).
If at a frame level the number of bits EC is equal to or less than the available number of bits, then the entropy coding based on the quantized direction indices is used.
However at a frame level where the number of bits used in entropy coding EC is greater than the number of bits available then the allocation of the number of bits for quantization bits_dir1[0:N-1][0:M-1] is reduced such that the sum of the allocated bits equals the number of available bits left after encoding the energy ratios.
Furthermore the direction analyser/index generator 215 can then be configured to start a loop for each subband up to the penultimate subband N-1. Within this loop an allowed number of bits for the current subband is determined: bits_allowed = sum(bits_dir1[i][0:M-1]). Then having determined the allowed number of bits for the current subband the direction analyser/index generator 215 can be configured to encode the indices by using fixed rate encoding with the reduced allocated number of bits, bits_fixed = bits_allowed.
The direction analyser/index generator 215 can then be configured to select either the fixed rate encoding or the entropy coding based on the method which uses fewer bits, i.e. select the lowest of bits_fixed or bits_ec. Furthermore the direction analyser/index generator 215 can then be configured to use one bit to indicate which of the two encoding methods has been selected. The number of bits used for the subband encoding is therefore nb = min(bits_fixed, bits_ec) + 1.
The direction analyser/index generator 215 can then be configured to determine whether there are bits available with respect to the allowed bits, in other words whether diff = allowed_bits - nb > 0. Where there is a difference between the number of bits available and the number of bits used in the subband, the difference diff can be distributed to the later subbands, for example by updating bits_dir1[i+1:N-1][0:M-1]; otherwise the direction analyser/index generator 215 can be configured to subtract a bit from the next subband allocation bits_dir1[i+1][0].
For the final subband N the direction analyser/index generator 215 can be configured to encode the direction indices using the fixed rate encoding method and using bits_dir1[N-1][0:M-1] bits.
As shown in Figure 4, these actions of the direction analyser/index generator 215 can be summarised. The first step is one of determining the direction indices based on the quantization as shown in Figure 4 by step 400. Having determined the direction indices, they are entropy encoded as shown in Figure 4 by step 401. The number of bits used for entropy encoding the direction indices is compared against the number of bits allowed as shown in Figure 4 by step 403. Where the number of bits used for entropy encoding the direction indices is less than (or equal to) the number of bits allowed, the entropy encoded direction indices can be used as shown in Figure 4 by step 404 and the method ends for this frame.
Where the number of bits used for entropy encoding the direction indices is more than the number of bits allowed, the number of allocated bits is reduced such that the sum of the allocated bits equals the number of available bits left after encoding the energy ratios as shown in Figure 4 by step 405.
Then a loop is started for the subbands from 1 to the penultimate (N-1 ) subband as shown in Figure 4 by step 407.
Within the loop, for the current subband, the number of allowed bits for encoding is determined as shown in Figure 4 by step 409.
Then a fixed rate encoding method is used to encode the indices using the reduced number of bits as shown in Figure 4 by step 411.
Either the fixed rate encoding or the entropy encoding is then selected based on which method uses fewer bits and the selection furthermore can be indicated by a single bit as shown in Figure 4 by step 413.
The determination of whether there are any remaining bits available based on the difference between the number of allowed bits and the number of bits used by the selected encoding and the redistribution of the remaining bits to the later subband allocations is shown in Figure 4 by step 415.
The loop is then completed and may then repeat for the next subband as shown in Figure 4 by step 417.
Finally the last subband is encoded using a fixed rate method using the remaining allocation of bits as shown in Figure 4 by step 419.
This can furthermore be represented in pseudocode by the following
3. Attempt to entropy encode the azimuth and elevation indexes, applying entropy encoding optimisation and resulting in bits_EC being used
4. If bits_EC > bits_available a. Reduce the allocated number of bits, bits_dir1[0:N-1][0:M-1], such that the sum of the allocated bits equals the number of available bits left after encoding the energy ratios
b. For each subband i=1 :N-1
i. Calculate allowed bits for current subband: bits_allowed = sum(bits_dir1[i][0:M-1])
ii. Encode the direction parameter indexes by using fixed rate encoding with the reduced allocated number of bits, bits_fixed = bits_allowed, or using an entropy coding, bits_ec; select the one using fewer bits and use one bit to signal the method: nb = min(bits_fixed, bits_ec) + 1;
iii. If there are bits available with respect to the allowed bits (if diff = allowed_bits - nb > 0)
1. Redistribute the difference, diff, to the following subbands, by updating bits_dir1[i+1:N-1][0:M-1]
iv. Else
1. Subtract one bit from bits_dir1[i+1][0]
v. End if
c. End for
d. Encode the direction parameter indexes for the last subband with the fixed rate approach using bits_dir1[N-1][0:M-1] bits.
5. Else
a. Use the EC encoding
6. End
In some implementations the optimisation of the entropy encoding of the elevation and the azimuth values can be performed separately and is described in further detail hereafter with respect to Figures 5 and 6.
For example, with respect to Figure 5 there is shown an example wherein in some embodiments a series of index checks and optimisations are applied in order to attempt to reduce the number of bits required to entropy encode the direction indices. In some embodiments the direction indices determination is started as shown in Figure 5 by step 501. In this example the index determination shown is an elevation index determination. However, as described later, a similar approach may be applied to the azimuth index determination.
In some embodiments a mapping is generated such that the elevation (or azimuth) value of 0 has an index of 0 and the increasing index values are assigned to increasing positive and negative elevation (azimuth) values as shown in Figure 5 by step 503.
Having generated the mapping then the mapping is applied to the audio sources (for example in the form of generating a codeword output based on a lookup table) as shown in Figure 5 by step 505.
The indices having been generated, in some embodiments there is a check performed to determine whether all of the indices are located within the same hemisphere as shown in Figure 5 by step 507.
Where all of the indices are located within the same hemisphere, the index values can be divided by two (with a rounding up) and an indicator generated indicating which hemisphere the indices were all located within, and these values are then entropy encoded as shown in Figure 5 by step 509.
Where all of the indices are not located within the same hemisphere, a mean removed entropy encoding can be applied to the indices. A mean removed entropy encoding may be configured to first remove the average index value for the subframes to be encoded, then remap the indices to positive values and then encode them with a suitable entropy encoding, such as Golomb-Rice encoding, as shown in Figure 5 by step 510.
After applying entropy encoding, in some embodiments a check can be applied to determine whether all of the time subframes have the same elevation (azimuth) value or index as shown in Figure 5 by step 511.
Where all of the time subframes have the same elevation (azimuth) value or index, an indicator is generated indicating the common elevation (azimuth) value or index as shown in Figure 5 by step 513; otherwise the method passes directly to step 517.
The next operation is one of providing the number of bits required for the entropy encoded indices and any indicator bits as shown in Figure 5 by step 517.
For example, with respect to elevation values, the index of the elevation can be determined from a codebook in the domain [-90; 90] which is formed such that an elevation with a value 0 returns a codeword with index zero and which alternately assigns increasing indexes to positive and negative codewords moving away from the zero elevation value.
Thus, as an example, in some embodiments there is implemented a codebook with the codewords {-90, -60, -30, 0, 30, 60, 90} which produces the indexes {6, 4, 2, 0, 1, 3, 5}. This indexing produces lower valued indexes for directions that are more probable in a general sense (where in practical examples the directions are near the Equator). Another observation is that if the audio sources are further away from the Equator, corresponding to higher valued indexes, they tend to be all above or all under the Equator. In some embodiments the encoder can be configured to check whether all of the audio sources are above (or all of the audio sources are below) the Equator and, where this is the case for all time subframes for a subband, divide the indices by 2 in order to generate smaller valued indices which can be more efficiently encoded.
In some embodiments the estimation of the number of bits for the elevation indices can be implemented in C as follows:
[The C code listing for this estimation appears only as images (imgf000026_0001, imgf000027_0001) in the published application.]
A special case of same elevation values for all the time subframes is also checked and signalled.
The function mean_removed_GR() in the above example is configured to first remove the average index value for the subframes to be encoded, then remap the indices to positive values and then encode them with Golomb-Rice encoding.
This can be implemented, for example in C language, by the following:
[The C code listing for mean_removed_GR() appears only as images (imgf000027_0002, imgf000028_0001) in the published application.]
The function odd_even_mean_removed_GR() is configured to first check whether all indexes are odd or all are even, signal this occurrence and indicate the type (odd or even), after which it encodes the halved indices.
[The C code listing for odd_even_mean_removed_GR() appears only as images (imgf000028_0002, imgf000029_0001, imgf000030_0001) in the published application.]
In some embodiments a series of entropy encoding optimisation operations are performed and then the lowest value is selected. This can be shown, for example, with respect to the encoding of azimuth values as shown in Figure 6. In some embodiments the direction indices determination is started as shown in Figure 6 by step 601.
In some embodiments a mapping is generated such that the azimuth value of 0 has an index of 0 and increasing index values are assigned to increasing positive and negative azimuth values as shown in Figure 6 by step 603.
Having generated the mapping, the mapping is applied to the audio sources (for example in the form of generating a codeword output based on a lookup table) as shown in Figure 6 by step 605.
In this example, the index of the azimuth can be determined from a further codebook. In this example the zero value for the azimuth corresponds to a reference direction which may be the front direction, with positive values to the left and negative values to the right. In this example the index of the azimuth value is assigned such that the values (-150, -120, -90, -60, -30, 0, 30, 60, 90, 120, 150, 180) are assigned the following indices (10, 8, 6, 4, 2, 0, 1, 3, 5, 7, 9, 11). In some embodiments the odd/even approach can be checked for the azimuth (corresponding to left/right positioning).
In this example the higher index values are assigned to values from the back or rear of the 'capture environment'.
The encoding of the azimuth indexes of a subframe can in some embodiments be performed based on the following:
1. Determine the number of azimuth indices to be encoded for the current subband (as shown in Figure 6 by step 607)
2. Find the maximum number of symbols for the tiles of the current subband (as shown in Figure 6 by step 609)
3. If there are more symbols than a threshold (as shown in Figure 6 by step 611)
a. Encode (as shown in Figure 6 by step 613) the azimuth values by checking the encoding of the values given by the complementary values: no_symb - index_azimuth.
i. Estimate the number of bits if encoding the indexes as if they were in front. Use mean removed order selective Golomb-Rice coding. The GR order may be 2 or 3. The GR order can also be set to different values, depending on the default range for the number of symbols.
ii. Estimate the number of bits if encoding the complementary indexes using mean removed order selective GR coding.
iii. Use the encoding method that uses the fewer number of bits and use a bit to signal which method is used
4. Else
a. Encode the azimuth indexes using a mean removed GR coding with order 1 or 2 (as shown in Figure 6 by step 615).
5. End
6. Check whether a minimum removed GR coding produces a better output and, if better, use it (as shown in Figure 6 by step 617)
In the C language the encoding looks like the following:
[The C code listing for the azimuth encoding appears only as images (imgf000031_0001 to imgf000035_0001) in the published application.]
With respect to Figure 7 an example electronic device which may be used as the analysis or synthesis device is shown. The device may be any suitable electronic device or apparatus. For example in some embodiments the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes such as the methods such as described herein.
In some embodiments the device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
In some embodiments the device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IrDA). The transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims

CLAIMS:
1 . An apparatus comprising means configured to:
generate spatial audio signal directional metadata parameters for a time- frequency tile;
generate at least one mapping between the values of the spatial audio signal directional metadata parameter and an index value;
generate indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and
encode the indices based on an estimate of the number of bits required to encode the indices.
2. The apparatus as claimed in claim 1 , wherein the means configured to encode the indices based on an estimate of the number of bits required to encode the indices is configured to:
estimate the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile;
determine whether the number of bits required is greater than a determined threshold value and when the number of bits required is greater than the determined number of bits then:
estimate a further number of bits required to encode indices based on at least one further mapping from the at least one mapping for the time- frequency tile, wherein the at least one further mapping reduces a possible number of index values to be encoded;
select one of the at least one further mapping based on a lowest number of bits required; and
encode the indices based on the selected one of the at least one further mapping.
3. The apparatus as claimed in claim 2, wherein the at least one further mapping comprises at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0, corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; and
mapping a mean removed index value to directional values.
4. The apparatus as claimed in any of claims 2 or 3, wherein the at least one further mapping comprises at least one of:
mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
5. The apparatus as claimed in any of claims 2 to 4, wherein the at least one further mapping comprises a complementary mapping to the first mapping.
6. The apparatus as claimed in any of claims 2 to 5, wherein the encoding of the indices based on at least one further mapping comprises at least one of:
entropy encoding; and
fixed rate encoding.
7. The apparatus as claimed in any of claims 1 to 6, wherein the at least one mapping comprises at least one of:
mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; mapping index values to directional values for only one hemisphere;
mapping a single index value to a range of directional values; and
mapping a mean removed index value to directional values.
8. A method comprising:
generating spatial audio signal directional metadata parameters for a time- frequency tile;
generating at least one mapping between values of the spatial audio signal directional metadata parameter and an index value;
generating indices associated with the spatial audio signal directional metadata parameters based on the at least one mapping; and
encoding the indices based on an estimate of the number of bits required to encode the indices.
9. The method as claimed in claim 8, wherein the encoding the indices based on an estimate of the number of bits required to encode the indices comprises: estimating the number of bits required to entropy encode the indices based on a first mapping from the at least one mapping for the time-frequency tile;
determining whether the number of bits required is greater than a determined threshold value and when the number of bits required is greater than the determined number of bits then:
estimating a further number of bits required to encode indices based on at least one further mapping from the at least one mapping for the time-frequency tile, wherein the at least one further mapping reduces a possible number of index values to be encoded;
selecting one of the at least one further mapping based on a lowest number of bits required; and encoding the indices based on the selected one of the at least one further mapping.
10. The method as claimed in claim 9, wherein the at least one further mapping comprises at least one of:
mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; and
mapping a mean removed index value to directional values.
11. The method as claimed in any of claims 9 or 10, wherein the at least one further mapping comprises at least one of:
mapping index values to directional values for only one hemisphere; and mapping a single index value to a range of directional values.
12. The method as claimed in any of claims 9 to 11 , wherein the at least one further mapping comprises a complementary mapping to the first mapping.
13. The method as claimed in any of claims 9 to 12, wherein encoding of the indices based on at least one further mapping comprises at least one of:
encoding of the indices by entropy encoding; and
encoding of the indices by fixed rate encoding.
14. The method as claimed in any of claims 8 to 13, wherein the mapping comprises at least one of: mapping an index value of 0 to a directional value of 0 and mapping increasing index values to increasing positive and negative directional values; mapping indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from left or right are alternatively allocated to increasing index values; mapping of indices such that a directional value 0 corresponding to a reference direction, is mapped to index 0 and the directional values corresponding to directions from up or down are alternatively allocated to increasing index values; mapping index values to directional values for only one hemisphere;
mapping a single index value to a range of directional values; and
mapping a mean removed index value to directional values.
PCT/FI2020/050423 2019-06-25 2020-06-15 Determination of spatial audio parameter encoding and associated decoding WO2020260756A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20833011.8A EP3991170A4 (en) 2019-06-25 2020-06-15 Determination of spatial audio parameter encoding and associated decoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1909138.8 2019-06-25
GB1909138.8A GB2585187A (en) 2019-06-25 2019-06-25 Determination of spatial audio parameter encoding and associated decoding

Publications (1)

Publication Number Publication Date
WO2020260756A1 true WO2020260756A1 (en) 2020-12-30

Family

ID=67511633

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2020/050423 WO2020260756A1 (en) 2019-06-25 2020-06-15 Determination of spatial audio parameter encoding and associated decoding

Country Status (3)

Country Link
EP (1) EP3991170A4 (en)
GB (1) GB2585187A (en)
WO (1) WO2020260756A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113626550A (en) * 2021-08-05 2021-11-09 生态环境部卫星环境应用中心 Image tile map service method based on triple bidirectional index and optimized cache
WO2022152960A1 (en) * 2021-01-18 2022-07-21 Nokia Technologies Oy Transforming spatial audio parameters

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2472241A1 (en) * 2009-11-27 2012-07-04 ZTE Corporation Audio encoding/decoding method and system of lattice-type vector quantizing
EP3007168A1 (en) * 2013-05-31 2016-04-13 Sony Corporation Encoding device and method, decoding device and method, and program
WO2019091575A1 (en) * 2017-11-10 2019-05-16 Nokia Technologies Oy Determination of spatial audio parameter encoding and associated decoding
WO2019097017A1 (en) * 2017-11-17 2019-05-23 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for encoding or decoding directional audio coding parameters using different time/frequency resolutions

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6758675B2 (en) * 2002-02-04 2004-07-06 James M. Karabaic Base ten primary teaching kit
US7991610B2 (en) * 2005-04-13 2011-08-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Adaptive grouping of parameters for enhanced coding efficiency
US20060235683A1 (en) * 2005-04-13 2006-10-19 Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V. Lossless encoding of information with guaranteed maximum bitrate
CN116665683A (en) * 2013-02-21 2023-08-29 杜比国际公司 Method for parametric multi-channel coding
GB2575305A (en) * 2018-07-05 2020-01-08 Nokia Technologies Oy Determination of spatial audio parameter encoding and associated decoding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3991170A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022152960A1 (en) * 2021-01-18 2022-07-21 Nokia Technologies Oy Transforming spatial audio parameters
CN113626550A (en) * 2021-08-05 2021-11-09 生态环境部卫星环境应用中心 Image tile map service method based on triple bidirectional index and optimized cache
CN113626550B (en) * 2021-08-05 2022-02-25 生态环境部卫星环境应用中心 Image tile map service method based on triple bidirectional index and optimized cache

Also Published As

Publication number Publication date
GB201909138D0 (en) 2019-08-07
GB2585187A (en) 2021-01-06
EP3991170A4 (en) 2023-05-10
EP3991170A1 (en) 2022-05-04

Similar Documents

Publication Publication Date Title
US11676612B2 (en) Determination of spatial audio parameter encoding and associated decoding
US11600281B2 (en) Selection of quantisation schemes for spatial audio parameter encoding
EP3707706B1 (en) Determination of spatial audio parameter encoding and associated decoding
EP3874492B1 (en) Determination of spatial audio parameter encoding and associated decoding
US20220343928A1 (en) Determination of spatial audio parameter encoding and associated decoding
WO2020016479A1 (en) Sparse quantization of spatial audio parameters
WO2020260756A1 (en) Determination of spatial audio parameter encoding and associated decoding
US20230047237A1 (en) Spatial audio parameter encoding and associated decoding
KR20220062621A (en) Spatial audio parameter encoding and related decoding
WO2019197713A1 (en) Quantization of spatial audio parameters
WO2019243670A1 (en) Determination of spatial audio parameter encoding and associated decoding
RU2797457C1 (en) Determining the coding and decoding of the spatial audio parameters
US20240127828A1 (en) Determination of spatial audio parameter encoding and associated decoding
US20230335143A1 (en) Quantizing spatial audio parameters
WO2022223133A1 (en) Spatial audio parameter encoding and associated decoding
EP3948861A1 (en) Determination of the significance of spatial audio parameters and associated encoding
WO2023084145A1 (en) Spatial audio parameter decoding
EP4264603A1 (en) Quantizing spatial audio parameters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20833011

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020833011

Country of ref document: EP

Effective date: 20220125