WO2020016479A1 - Sparse quantization of spatial audio parameters
- Publication number
- WO2020016479A1 (PCT/FI2019/050527)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tracks
- quantization
- spatial audio
- index
- direction parameter
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
- G10L19/038—Vector quantisation, e.g. TwinVQ audio
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
Definitions
- the present application relates to apparatus and methods for sound-field related parameter encoding and in particular, but not exclusively, to time-frequency domain direction related parameter encoding for an audio encoder and decoder.
- Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters.
- Typical parameters are the directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands.
- These parameters are known to describe well the perceptual spatial properties of the captured sound at the position of the microphone array.
- These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
- the directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
- a parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can also be utilized as the spatial metadata for an audio codec.
- these parameters can be estimated from microphone-array captured audio signals, and for example a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata.
- the stereo signal could be encoded, for example, with an AAC encoder or an EVS encoder or any other suitable encoder.
- a corresponding decoder can decode the audio signals into PCM signals, and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
- the aforementioned solution is particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, stand-alone microphone arrays).
- a further input for the encoder is also multi-channel loudspeaker input, such as 5.1 or 7.1 channel surround inputs.
- a quantization and/or encoding which implements uniform granularity along the azimuth and the elevation components separately (when these two parameters are separately added to the metadata) can result in an uneven distribution of quantization and encoding states.
- a uniform approach to both separately results in an encoding scheme with a higher density nearer the ‘poles’ of the direction sphere, in other words directly above or below the locus or reference location.
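- The crowding near the poles can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical grid with the same step in degrees for azimuth and elevation; the function names are illustrative and are not taken from the application. It measures the great-circle angle between two azimuth neighbours on the same elevation ring, which shrinks as the ring approaches a pole.

```python
import math

def unit_vector(az_deg, el_deg):
    """Unit vector for a direction given as azimuth and elevation in degrees."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    dot = sum(x * y for x, y in zip(unit_vector(*a), unit_vector(*b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def neighbour_spacing_deg(elevation_deg, step_deg):
    """Spacing between adjacent azimuth points on one elevation ring of a uniform grid."""
    return angular_distance_deg((0.0, elevation_deg), (step_deg, elevation_deg))

# Spacing of a hypothetical 10-degree grid at the equator and closer to the pole.
for el in (0, 60, 80):
    print(el, round(neighbour_spacing_deg(el, 10), 2))
```

The same number of azimuth points is spent on every ring, so rings close to the poles carry far more quantization states per unit area than the ring at the equator.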
- audio codecs such as the Immersive Voice and Audio Service (IVAS) codec
- a method for spatial audio signal encoding comprising: determining, for two or more audio signals, a spatial audio direction parameter; using a quantization grid of quantization points for indexing the spatial audio direction parameter, wherein the quantization points of quantization grid are divided into at least two sets of tracks; and indexing the spatial audio direction parameter to an index of which a value of the index corresponds to a quantization point on a track from a first set of tracks of the at least two sets of tracks.
- Each of the at least two sets of tracks may comprise a plurality of tracks, wherein each track of the at least two sets of tracks may comprise a plurality of quantization points of the quantization grid.
- Indexing the spatial audio direction parameter to an index of which a value of the index corresponding to a quantization point on a track from a first set of tracks of the at least two sets of tracks may comprise indexing the spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the spatial audio direction parameter.
- the method may further comprise: indexing a further spatial audio direction parameter to a further index of which a value of the further index corresponds to a quantization point on a track from a second set of tracks of the at least two sets of tracks by indexing the further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the second set of tracks which is closest to the further spatial audio direction parameter; and indexing a yet further spatial audio direction parameter to a yet further index of which a value of the yet further index corresponds to a further quantization point on a track from the first set of tracks of the at least two sets of tracks by indexing the yet further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the yet further spatial audio direction parameter.
- the further index may comprise a reserved value for pointing to a value of an index which corresponds to a quantization point on a track from the first set of tracks of the at least two sets of tracks, and the spatial audio direction parameter may precede the further spatial audio direction parameter.
- the method may further comprise: determining a distance measure between the spatial audio direction parameter and the further spatial audio direction parameter; and using the reserved value to index the further spatial audio direction parameter in the case that the determined distance is less than a predetermined threshold value.
- the grid may be a spherical grid generated by covering a sphere with smaller spheres, the centres of the smaller spheres may define quantization points of the spherical grid, the spatial audio direction parameter may comprise an elevation and an azimuth component.
- the plurality of quantization points may be contiguous and the plurality of quantization points may form a ring or partial ring of contiguous points around the spherical grid.
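- The indexing scheme summarised above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the grid layout (rings of constant elevation acting as tracks, with a point count proportional to the cosine of the elevation), the assignment of alternate rings to the two sets, the value of `RESERVED`, the 5 degree threshold and all function names are hypothetical. Successive direction parameters are indexed against alternating sets of tracks, and the reserved value is emitted when a parameter lies close to the preceding one.

```python
import math

RESERVED = -1  # hypothetical reserved index value: "reuse the preceding index"

def unit_vector(az_deg, el_deg):
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def angle_deg(a, b):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    dot = sum(x * y for x, y in zip(unit_vector(*a), unit_vector(*b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def build_track_sets(el_step=15, points_at_equator=24, n_sets=2):
    """Each elevation ring is one track; ring r is assigned to set r % n_sets."""
    sets = [[] for _ in range(n_sets)]
    for r, el in enumerate(range(-90, 91, el_step)):
        n = max(1, round(points_at_equator * math.cos(math.radians(el))))
        ring = [(360.0 * j / n - 180.0, float(el)) for j in range(n)]
        sets[r % n_sets].extend(ring)
    return sets

def quantize(direction, track_set):
    """Index of the closest quantization point within one set of tracks."""
    return min(range(len(track_set)), key=lambda i: angle_deg(direction, track_set[i]))

def encode(directions, sets, threshold_deg=5.0):
    """Return (set_id, index) pairs for a sequence of direction parameters."""
    indices = []
    for t, d in enumerate(directions):
        set_id = t % len(sets)  # alternate between the sets of tracks
        # Only point back to a preceding parameter that was indexed explicitly.
        prev_explicit = t > 0 and indices[-1][1] != RESERVED
        if prev_explicit and angle_deg(d, directions[t - 1]) < threshold_deg:
            indices.append((set_id, RESERVED))
        else:
            indices.append((set_id, quantize(d, sets[set_id])))
    return indices
```

Because each index only needs to address the points of one set, the index alphabet is smaller than that of the full grid, at the cost of a coarser grid for any single parameter.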
- an apparatus for spatial audio signal encoding comprising means for determining, for two or more audio signals, at least one spatial audio direction parameter; means for using a quantization grid of quantization points for indexing the spatial audio direction parameter, wherein the quantization points of quantization grid are divided into at least two sets of tracks; and means for indexing the spatial audio direction parameter to an index of which a value of the index corresponds to a quantization point on a track from a first set of tracks of the at least two sets of tracks.
- Each of the at least two sets of tracks may comprise a plurality of tracks, wherein each track of the at least two sets of tracks may comprise a plurality of quantization points of the quantization grid,
- the means for indexing the spatial audio direction parameter to an index of which a value of the index corresponding to a quantization point on a track from a first set of tracks of the at least two sets of tracks may comprise: means for indexing the spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the spatial audio direction parameter.
- the apparatus may further comprise: means for indexing a further spatial audio direction parameter to a further index of which a value of the further index corresponds to a quantization point on a track from a second set of tracks of the at least two sets of tracks by indexing the further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the second set of tracks which is closest to the further spatial audio direction parameter; and means for indexing a yet further spatial audio direction parameter to a yet further index of which a value of the yet further index corresponds to a further quantization point on a track from the first set of tracks of the at least two sets of tracks by indexing the yet further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the yet further spatial audio direction parameter.
- the further index may comprise a reserved value for pointing to a value of an index which corresponds to a quantization point on a track from the first set of tracks of the at least two sets of tracks, and the spatial audio direction parameter may precede the further spatial audio direction parameter.
- the apparatus may further comprise: means for determining a distance measure between the spatial audio direction parameter and the further spatial audio direction parameter; and means for using the reserved value to index the further spatial audio direction parameter in the case that the determined distance is less than a predetermined threshold value.
- the grid may be a spherical grid generated by covering a sphere with smaller spheres, the centres of the smaller spheres may define quantization points of the spherical grid, the spatial audio direction parameter may comprise an elevation and an azimuth component.
- the plurality of quantization points may be contiguous and the plurality of quantization points may form a ring or partial ring of contiguous points around the spherical grid.
- an apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: determine, for two or more audio signals, at least one spatial audio direction parameter; use a quantization grid of quantization points for indexing the spatial audio direction parameter, wherein the quantization points of quantization grid are divided into at least two sets of tracks; and index the spatial audio direction parameter to an index of which a value of the index corresponds to a quantization point on a track from a first set of tracks of the at least two sets of tracks.
- Each of the at least two sets of tracks may comprise a plurality of tracks, wherein each track of the at least two sets of tracks may comprise a plurality of quantization points of the quantization grid,
- the apparatus caused to index the spatial audio direction parameter to an index of which a value of the index corresponding to a quantization point on a track from a first set of tracks of the at least two sets of tracks may be caused to: index the spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the spatial audio direction parameter.
- the apparatus may be further caused to: index a further spatial audio direction parameter to a further index of which a value of the further index corresponds to a quantization point on a track from a second set of tracks of the at least two sets of tracks by indexing the further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the second set of tracks which is closest to the further spatial audio direction parameter; and index a yet further spatial audio direction parameter to a yet further index of which a value of the yet further index corresponds to a further quantization point on a track from the first set of tracks of the at least two sets of tracks by indexing the yet further spatial audio direction parameter to a quantization point of the plurality of quantization points from the plurality of tracks of the first set of tracks which is closest to the yet further spatial audio direction parameter.
- the further index may comprise a reserved value for pointing to a value of an index which corresponds to a quantization point on a track from the first set of tracks of the at least two sets of tracks, and the spatial audio direction parameter may precede the further spatial audio direction parameter.
- the apparatus may be further caused to: determine a distance measure between the spatial audio direction parameter and the further spatial audio direction parameter; and use the reserved value to index the further spatial audio direction parameter in the case that the determined distance is less than a predetermined threshold value.
- the grid may be a spherical grid generated by covering a sphere with smaller spheres, the centres of the smaller spheres may define quantization points of the spherical grid, the spatial audio direction parameter may comprise an elevation and an azimuth component.
- the plurality of quantization points may be contiguous and the plurality of quantization points may form a ring or partial ring of contiguous points around the spherical grid.
- a computer program comprising program instructions for causing a computer to perform the method as described above.
- a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
- An electronic device may comprise apparatus as described herein.
- a chipset may comprise apparatus as described herein.
- Embodiments of the present application aim to address problems associated with the state of the art.
- Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments
- Figure 2 shows schematically the analysis processor as shown in figure 1 according to some embodiments
- Figure 3a shows schematically the metadata encoder/quantizer as shown in figure 1 according to some embodiments
- Figure 3b shows schematically the metadata extractor as shown in figure 1 according to some embodiments
- Figures 3c to 3e show schematically example sphere location configurations as used in the metadata encoder/quantizer and metadata extractor as shown in figures 3a and 3b according to some embodiments
- Figure 4 shows a flow diagram of the operation of the system as shown in figure 1 according to some embodiments
- Figure 5 shows a flow diagram of the operation of the analysis processor as shown in figure 2 according to some embodiments
- Figure 6 shows a flow diagram of generating a direction index based on an input direction parameter in further detail
- Figure 7(a) shows a depiction of quantisation points on the surface of a spherical grid
- Figure 7(b) shows how the quantisation points on the surface of a spherical grid can be divided into two sets of tracks of quantisation points
- Figure 7(c) shows a partial section of a spherical grid depicting a division into four sets of tracks of quantisation points
- Figure 8 shows a flow diagram of generating a quantized direction parameter based on an input direction index in further detail
- Figure 9 shows a flow diagram of an example operation of converting a quantized direction parameter from a direction index in further detail.
- Figure 10 shows schematically an example device suitable for implementing the apparatus shown.
- the input format for the audio codec may be any suitable input format, such as multi-channel loudspeaker, ambisonic (FOA/HOA) etc. It is understood that in some embodiments the channel location is based on a location of the microphone or is a virtual location or direction.
- the output of the example system is a multi-channel loudspeaker arrangement. However it is understood that the output may be rendered to the user via means other than loudspeakers such as a binauralized headphone output.
- an audio decoder may output a parametric representation of the encoded and transmitted audio, which can be presented to a user via a loudspeaker arrangement or headphones using an external renderer.
- the multi-channel loudspeaker signals may be generalised to be two or more playback audio signals.
- spatial metadata parameters such as direction and direct-to-total energy ratio (or diffuseness-ratio, or any suitable expression indicating the directionality/non-directionality of the sound at the given time-frequency interval) parameters in frequency bands are particularly suitable for expressing the perceptual properties of natural sound fields.
- Synthetic sound scenes such as 5.1 loudspeaker mixes commonly utilize audio effects and amplitude panning methods that provide spatial sound that differs from sounds occurring in natural sound fields.
- a 5.1 or 7.1 mix may be configured such that it contains coherent sounds played back from multiple directions.
- some sounds of a 5.1 mix perceived directly at the front are not produced by a centre (channel) loudspeaker, but for example coherently from left and right front (channels) loudspeakers, and potentially also from the centre (channel) loudspeaker.
- the spatial metadata parameters such as direction(s) and energy ratio(s) do not express such spatially coherent features accurately.
- other metadata parameters such as coherence parameters may be determined from analysis of the audio signals to express the audio signal relationships between the channels.
- an example of the incorporation of the direction information in the metadata is to use determined azimuth and elevation values.
- the proposed metadata index may then be used alongside a downmix signal (‘channels’), to define a parametric immersive format that can be utilized, e.g., for the Immersive Voice and Audio Service (IVAS) codec.
- the concept furthermore addresses the issue of how to index the directional parameters to the spherical grid with different levels of granularity or quantisation resolution, particularly when the distribution of quantisation points on the sphere is fixed.
- This issue may be particularly pertinent for audio codecs, such as IVAS, which are capable of operating at a number of different coding rates. For example during a low coding rate operating mode it can be beneficial to quantize the directional parameters by indexing the spherical grid at a lower granularity in order to reduce the number of bits to represent each set of directional parameters.
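- The bit saving can be made concrete with a small calculation. The sketch below assumes a fixed-length index and a hypothetical grid of 1024 points split evenly into sets of tracks; neither the grid size nor the function name comes from the application.

```python
def index_bits(n_points):
    """Bits needed by a fixed-length index that addresses n_points distinct values."""
    return (n_points - 1).bit_length()

full_grid = 1024                 # hypothetical number of points on the spherical grid
two_sets = full_grid // 2        # points addressable when one of two sets is used
four_sets = full_grid // 4       # points addressable when one of four sets is used

print(index_bits(full_grid), index_bits(two_sets), index_bits(four_sets))
```

Under these assumptions each halving of the addressable points removes one bit per direction index, which is the trade-off a low coding rate mode could exploit.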
- the system 100 is shown with an ‘analysis’ part 121 and a ‘synthesis’ part 131.
- The ‘analysis’ part 121 is the part from receiving the multi-channel loudspeaker signals up to an encoding of the metadata and downmix signal and the ‘synthesis’ part 131 is the part from a decoding of the encoded metadata and downmix signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).
- the input to the system 100 and the ‘analysis’ part 121 is the multi-channel signals 102.
- the multi-channel signals are passed to a downmixer 103 and to an analysis processor 105.
- the downmixer 103 is configured to receive the multi-channel signals and downmix the signals to a determined number of channels and output the downmix signals 104.
- the downmixer 103 may be configured to generate a 2 audio channel downmix of the multi-channel signals.
- the determined number of channels may be any suitable number of channels.
- a two audio channel downmix such as a stereo pair signal can utilize in some embodiments 2 microphone signals corresponding to a left channel and a right channel of the capture device microphone array or the capture scene orientation.
- the selected pair of microphone signals can furthermore exhibit a distance between the channels that can be at least substantially like the distance between human ears, which can have additional perceptual advantages.
- in some embodiments the downmixer 103 is optional and the multi-channel signals are passed unprocessed to an encoder 107 in the same manner as the downmix signals are in this example.
- the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the downmix signals 104.
- the analysis processor 105 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108, an energy ratio parameter 110, a coherence parameter 112, and a diffuseness parameter 114.
- the direction, energy ratio and diffuseness parameters may in some embodiments be considered to be spatial audio parameters.
- the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).
- the coherence parameters may be considered to be signal relationship audio parameters which aim to characterize the relationship between the multi-channel signals.
- more than one direction parameter can be generated.
- the energy ratio parameter 110, the coherence parameter 112, and the diffuseness parameter 114 each relate to a direction parameter 108.
- Some types of coherence parameters in particular may relate only to a time-frequency analysis interval and not a particular direction.
- the diffuseness parameter 114 may be indicated for each individual analysed direction or over all directions.
- the parameters generated may differ from frequency band to frequency band.
- For example, in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z any other number of parameters are generated or transmitted.
- a practical example of this may be that for some frequency bands such as the lowest band or the highest band some of the parameters are not required for perceptual reasons.
- the downmix signals 104 and the metadata 106 may be passed to an encoder 107.
- the encoder 107 may comprise an IVAS stereo core 109 which is configured to receive the downmix (or otherwise) signals 104 and generate a suitable encoding of these audio signals.
- the encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
- the encoding may be implemented using any suitable scheme.
- the encoder 107 may furthermore comprise a metadata encoder or quantizer 109 which is configured to receive the metadata and output an encoded or compressed form of the information. In some embodiments the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage shown in Figure 1 by the dashed line.
- the multiplexing may be implemented using any suitable scheme.
- the received or retrieved data (stream) may be received by a decoder/demultiplexer 133.
- the decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a downmix extractor 135 which is configured to decode the audio signals to obtain the downmix signals.
- the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata.
- the decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
- the decoded metadata and downmix audio signals may be passed to a synthesis processor 139.
- the system 100 ‘synthesis’ part 131 further shows a synthesis processor 139 configured to receive the downmix and the metadata and re-create in any suitable format a synthesized spatial audio in the form of multi-channel signals 110 (these may be multichannel loudspeaker format or in some embodiments any suitable output format such as binaural or Ambisonics signals, depending on the use case) based on the downmix signals and the metadata.
- the synthesis processor 139 or renderer may be part of the decoder/demultiplexer 133.
- First the system (analysis part) is configured to receive multi-channel audio signals as shown in Figure 4 by step 401.
- The system (analysis part) is then configured to generate a downmix of the multi-channel signals as shown in Figure 4 by step 403. Also the system (analysis part) is configured to analyse signals to generate metadata such as direction parameters, energy ratio parameters, diffuseness parameters and coherence parameters as shown in Figure 4 by step 405.
- the system is then configured to encode for storage/transmission the downmix signal and metadata as shown in Figure 4 by step 407.
- the system may store/transmit the encoded downmix and metadata as shown in Figure 4 by step 409.
- the system may retrieve/receive the encoded downmix and metadata as shown in Figure 4 by step 411.
- the system is configured to extract the downmix and metadata from encoded downmix and metadata parameters, for example demultiplex and decode the encoded downmix and metadata parameters, as shown in Figure 4 by step 413.
- the system (synthesis part) is configured to synthesize an output multi-channel audio signal based on extracted downmix of multi-channel audio signals and metadata with coherence parameters as shown in Figure 4 by step 415.
- the analysis processor 105 in some embodiments comprises a time-frequency domain transformer 201.
- the time-frequency domain transformer 201 is configured to receive the multi-channel signals 102 and apply a suitable time to frequency domain transform such as a Short Time Fourier Transform (STFT) in order to convert the input time domain signals into suitable time-frequency signals.
- These time-frequency signals may be passed to a direction analyser 203 and to a signal analyser 205.
- time-frequency signals 202 may be represented in the time-frequency domain by $s_i(b, n)$, where b is the frequency bin index, n is the time index and i is the channel index.
- n can be considered as a time index with a lower sampling rate than that of the original time-domain signals.
- Each subband k has a lowest bin $b_{k,\mathrm{low}}$ and a highest bin $b_{k,\mathrm{high}}$, and the subband contains all bins from $b_{k,\mathrm{low}}$ to $b_{k,\mathrm{high}}$.
- the widths of the subbands can approximate any suitable distribution, for example the equivalent rectangular bandwidth (ERB) scale or the Bark scale.
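- The grouping of frequency bins into subbands can be sketched as below. This is an illustration only: the band edges are hypothetical values that merely widen with frequency and are not an actual ERB or Bark table, and the function names are not taken from the application. Each band is stored as an inclusive pair (lowest bin, highest bin) and the energy of one time index is summed over the bins of each band.

```python
def band_edges_to_bins(edges):
    """Turn K+1 increasing bin edges into K inclusive (b_low, b_high) pairs."""
    return [(edges[k], edges[k + 1] - 1) for k in range(len(edges) - 1)]

def band_energies(frame_bins, bands):
    """Sum |s(b, n)|^2 over the bins of each subband for a single time index n."""
    return [
        sum(abs(frame_bins[b]) ** 2 for b in range(b_low, b_high + 1))
        for b_low, b_high in bands
    ]

# Hypothetical edges: bands get wider towards higher frequencies.
bands = band_edges_to_bins([0, 1, 2, 4, 8])
frame = [1 + 0j] * 8          # one frame of 8 complex frequency bins
print(bands)
print(band_energies(frame, bands))
```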
- the analysis processor 105 comprises a direction analyser 203.
- the direction analyser 203 may be configured to receive the time-frequency signals 202 and based on these signals estimate direction parameters 108.
- the direction parameters may be determined based on any audio based ‘direction’ determination. Typically such method will utilize knowledge of the microphone array or any other information on the arrangement of the multi-channel signals.
- the direction analyser 203 is configured to estimate the direction with two or more signal inputs. This represents the simplest configuration to estimate a 'direction'; more complex processing may be performed with even more signals.
- the direction analyser 203 may thus be configured to provide for each frequency band and temporal frame an azimuth denoted as $\phi(k,n)$ and an elevation denoted as $\theta(k,n)$.
- the direction parameter 108, which may then be represented as an index on a spherical grid, may also be passed to a signal analyser 205.
- the direction analyser 203 is configured to determine an energy ratio parameter 110.
- the energy ratio may be considered to be a determination of the energy of the audio signal which can be considered to arrive from a direction relative to the total energy for the corresponding time-frequency interval.
- the direct-to-total energy ratio r(k,n) can be estimated, e.g., using a correlation measure, or any other suitable method to obtain a ratio parameter.
- the estimated direction parameters 108 may be output (and passed to an encoder).
- the estimated energy ratio parameters 110 may be passed to a signal analyser 205.
- the analysis processor 105 comprises a signal analyser 205.
- the signal analyser 205 is configured to receive parameters (such as the azimuths $\phi(k,n)$ and elevations $\theta(k,n)$, or a spherical grid index, 108, and the direct-to-total energy ratios $r(k,n)$ 110) from the direction analyser 203.
- the signal analyser 205 may be further configured to receive the time-frequency signals ($S_i(b,n)$) 202 from the time-frequency domain transformer 201. All of these are in the time-frequency domain; b is the frequency bin index, k is the frequency band index (each band potentially consists of several bins b), n is the time index, and i is the channel index.
- the parameters may be combined over several time indices. The same applies for the frequency axis: as has been expressed, the direction of several frequency bins b could be expressed by one direction parameter in band k consisting of several frequency bins b. The same applies for all of the spatial parameters discussed herein.
- the signal analyser 205 is configured to produce a number of signal parameters. In the following disclosure there are two parameters: coherence and diffuseness, both analysed in the time-frequency domain. In addition, in some embodiments the signal analyser 205 is configured to modify the estimated energy ratios ($r(k,n)$). The signal analyser 205 is configured to generate the coherence and diffuseness parameters based on any suitable known method.
- Summing the energy ratios and the diffuseness values for a given time-frequency interval can produce a value of one or a value smaller than one. In case the value is smaller than one, the remaining energy can be considered noise.
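The energy bookkeeping for one time-frequency interval can be written out as a short sketch. The function name `residual_noise_share` and the tolerance argument are hypothetical; the sketch only restates that whatever is left after summing the energy ratios and the diffuseness value is treated as noise.

```python
from typing import Sequence


def residual_noise_share(energy_ratios: Sequence[float], diffuseness: float,
                         tol: float = 1e-9) -> float:
    """Return the share of energy not explained by direct or diffuse parts.

    The sum of the ratios and the diffuseness is expected to be at most one.
    """
    total = sum(energy_ratios) + diffuseness
    if total > 1.0 + tol:
        raise ValueError("energy ratios and diffuseness sum to more than one")
    return max(0.0, 1.0 - total)
```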
- the first operation is one of receiving time domain multichannel (loudspeaker) audio signals as shown in Figure 5 by step 501.
- time domain to frequency domain transform e.g. STFT
- applying analysis to determine coherence parameters (such as surrounding and/or spread coherence parameters) and diffuseness parameters is shown in Figure 5 by step 507.
- the energy ratio may also be modified based on the determined coherence parameters in this step.
- the direction metadata encoder 300 in some embodiments comprises a quantization input 302.
- the quantization input which may also be known as an encoding input is configured to define the granularity of spheres arranged around a reference location or position from which the direction parameter is determined.
- the quantization input is a predefined or fixed value.
- the direction metadata encoder 300 in some embodiments comprises a sphere positioner 303.
- the sphere positioner is configured to configure the arrangement of spheres based on the quantization input value.
- the proposed spherical grid uses the idea of covering a sphere with smaller spheres and considering the centres of the smaller spheres as points defining a grid of almost equidistant directions, such that the spherical grid comprises a number of points arranged in a form of a sphere.
- the concept as shown herein is one in which a sphere is defined relative to the reference location.
- the sphere can be visualised as a series of circles (or intersections) and for each circle intersection there are located at the circumference of the circle a defined number of (smaller) spheres. This is shown for example with respect to Figure 3c to 3e.
- Figure 3c shows an example 'equatorial cross-section' or a first main circle 370 which has a radius defined as the 'main sphere radius'.
- each smaller sphere has a circumference which at one point touches the main sphere circumference and at least one further point which touches at least one further smaller sphere circumference.
- the smaller sphere 371 touches main sphere 370 and smaller sphere 373
- smaller sphere 373 touches main sphere 370 and smaller spheres 371 and 375
- smaller sphere 375 touches main sphere 370 and smaller spheres 373 and 377
- smaller sphere 377 touches main sphere 370 and smaller spheres 375 and 379
- smaller sphere 379 touches main sphere
- Figure 3d shows an example 'tropical cross-section' or further main circle 380 and the smaller spheres (shown as circle cross-sections) 381, 383, 385 located such that each smaller sphere has a circumference which at one point touches the main sphere (circle) circumference and at least one further point which touches at least one further smaller sphere circumference.
- the smaller sphere 381 touches main sphere 380 and smaller sphere 383, smaller sphere 383 touches main sphere 380 and smaller spheres 381 and 385, smaller sphere 385 touches main sphere 380 and smaller sphere 383.
- Figure 3e shows an example sphere and the cross sections 370, 380 and smaller spheres (cross-sections) 371 associated with cross-section 370, smaller sphere 381 associated with cross-section 380 and other smaller spheres 392, 393, 394, 395, 397, 398.
- the circles with starting azimuth value at 0 are drawn.
- the sphere positioner 303 may thus in some embodiments be configured to perform the following operations to define the directions corresponding to the covering spheres:
- the elevation resolution is approximately 1 degree.
- the resolution is correspondingly smaller.
- Each direction point on one circle can be indexed in increasing order with respect to the azimuth value.
- the index of the first point in each circle is given by an offset that can be deduced from the number of points on each circle, n(t).
- the offsets are calculated as the cumulated number of points on the circles for the given order, starting with the value 0 as first offset.
- One possible order of the circles could be to start with the Equator followed by the first circle above the Equator, then the first under the Equator, the second one above the Equator, and so on. Another option is to start with the Equator, then the circle above the Equator that is at an approximate elevation of 45 degrees and then the corresponding circle under the Equator, and then the remaining circles in alternating order. This way, for some simpler positionings of loudspeakers, only the first circles are used, reducing the number of bits needed to send the information.
- the spherical grid can also be generated by considering the meridian 0 instead of the Equator, or any other meridian.
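The circle ordering and the cumulative offsets described above can be sketched as follows. The rule used here for the number of points per circle (proportional to the cosine of the elevation, with at least one point) is an assumption for illustration, as are the function names; the sketch uses the first ordering mentioned above (Equator, then alternating above and below).

```python
import math
from typing import List


def circle_elevations(circles_per_hemisphere: int) -> List[float]:
    """Elevations in degrees: Equator first, then alternating above/below."""
    step = 90.0 / circles_per_hemisphere
    elevations = [0.0]
    for j in range(1, circles_per_hemisphere + 1):
        elevations.append(j * step)    # j-th circle above the Equator
        elevations.append(-j * step)   # j-th circle below the Equator
    return elevations


def points_per_circle(elevations: List[float], equator_points: int) -> List[int]:
    """Hypothetical n(i): shrink the point count with cos(elevation)."""
    return [max(1, round(equator_points * math.cos(math.radians(e))))
            for e in elevations]


def circle_offsets(counts: List[int]) -> List[int]:
    """Cumulated number of points, starting with 0 as the first offset.

    The last entry is the total number of grid points.
    """
    offsets = [0]
    for n in counts:
        offsets.append(offsets[-1] + n)
    return offsets
```

With this ordering, a configuration that only uses directions near the Equator only needs the first few offsets, so its indices stay small.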
- Figure 11a depicts a visualisation of quantisation points as a spherical grid, with each black point representing an indexed quantisation point.
- the direction metadata encoder 300 in some embodiments comprises a direction parameter input 108.
- the direction metadata encoder 300 comprises an elevation-azimuth to direction index (EA-DI) converter 305.
- the elevation-azimuth to direction index converter 305 in some embodiments is configured to receive the direction parameter input 108 and the sphere positioner information and convert the elevation-azimuth value from the direction parameter input 108 to a direction index by quantizing the elevation-azimuth value.
- the receiving of the quantization input is shown in Figure 6 by step 601 .
- the method may determine sphere positioning based on the quantization input as shown in Figure 6 by step 603.
- the method may comprise receiving the direction parameter as shown in Figure 6 by step 602.
- Having received the direction parameter and the sphere positioning information, the method may comprise converting the direction parameter to a direction index based on the sphere positioning information as shown in Figure 6 by step 605.
- the method may then output the direction index as shown in Figure 6 by step 607.
- elevation-azimuth to direction index (EA-DI) converter 305 is configured to perform this conversion according to the following algorithm:
- the indexing of the elevation-azimuth value may not use a codebook structure as above for storing discrete elevation and azimuth values.
- linear quantization can be used where the number of circles $N_c$ and the granularity p as provided by the sphere positioner can be used to uniformly divide the range of elevation from $-\pi/2$ to $\pi/2$. The position on the elevation range gives a circle index, and the number of azimuth discrete points and the corresponding offset, off(i), are known.
- the direction index $I_d$ 306 may be output.
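The linear quantization route can be sketched as below. This is a minimal illustration under stated assumptions, not the converter's actual algorithm: circles are stored from south to north (rather than Equator first), the point count per circle follows a hypothetical cosine rule, and the names `build_offsets` and `direction_to_index` are invented for the example.

```python
import math
from typing import List, Tuple


def build_offsets(num_circles: int, equator_points: int) -> Tuple[List[int], List[int]]:
    """Return (points per circle, cumulative offsets) for circles spaced
    uniformly in elevation from -90 to +90 degrees (assumed layout)."""
    step = 180.0 / (num_circles - 1)
    counts = []
    for c in range(num_circles):
        elev = -90.0 + c * step
        counts.append(max(1, round(equator_points * math.cos(math.radians(elev)))))
    offsets = [0]
    for n in counts:
        offsets.append(offsets[-1] + n)
    return counts, offsets


def direction_to_index(elev_deg: float, azim_deg: float,
                       num_circles: int, equator_points: int) -> int:
    """Quantize (elevation, azimuth) in degrees to a single direction index."""
    counts, offsets = build_offsets(num_circles, equator_points)
    step = 180.0 / (num_circles - 1)
    # Position on the uniformly divided elevation range gives the circle index.
    circle = min(num_circles - 1, max(0, round((elev_deg + 90.0) / step)))
    n = counts[circle]
    # Uniform azimuth quantization on that circle, wrapping 360 back to 0.
    az_idx = round((azim_deg % 360.0) / (360.0 / n)) % n
    return offsets[circle] + az_idx
```

No codebook of elevation or azimuth values is stored; only the circle count and the per-circle point counts are needed.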
- the quantisation points of the spherical grid can be determined in accordance with the parameter $N_c$. This parameter determines the density of quantisation points within the grid and ultimately the number of bits required to represent the index of the directional parameters.
- the audio codec may be required to index the directional parameters at a lower accuracy than would be required to index a directional parameter using the spherical grid according to the parameter $N_c$.
- This may depend, for example, on the total bit rate available to encode the frame of audio (and audio metadata) or the specific bit rate allocation for the metadata parameters.
- each quantization point on the spherical grid is considered as a potential quantization value for a directional parameter.
- this can be achieved by dividing the spherical grid or a subset of the spherical grid or an approximation of the spherical grid or an approximation of a part of the spherical grid into a number of tracks of quantisation points.
- Each track can consist of a number of contiguous quantisation points running across the spherical grid in a uniform direction which can resemble a ring or partial ring around the surface of the spherical grid.
- a track of quantization points can be in its most general form a group of following quantization points which forms a path over the quantization grid.
- a first quantization point and a second quantization point in a track can neighbour each other on the grid.
- Figure 7(b) shows how a spherical grid may be divided into two sets of tracks with each set having a number (or run) of individual quantization points over the "surface" of the spherical grid.
- the spherical grid is divided into two sets of tracks distributed in an alternating fashion over the spherical grid.
- the first set of tracks are depicted as the tracks formed by the contiguous “runs” of black coloured quantisation points such that track 701 forms a first track of the first set of tracks and track 703 forms a second track of the first set of tracks and so on.
- the second set of tracks are depicted in Figure 7(b) as the tracks formed by the contiguous "runs" of grey coloured quantisation points such that track 702 forms the first track of the second set of tracks and track 704 forms a second track of the second set of tracks and so on.
- Figure 7(c) shows an instance in which the spherical grid is divided into four sets of tracks: a representative sample of tracks of the first set are depicted as 711 and 712, a representative sample of tracks of the second set are depicted as 721 and 722, a representative sample of tracks of the third set are depicted as 731 and 732, and a representative sample of tracks of the fourth set are depicted as 741 and 742.
- directional parameters can be indexed at a lower bit rate by confining the indexing/quantisation step for a particular input directional parameter to a particular set of tracks of quantization points.
- the step of quantisation/indexing a series of directional parameters may take the following form.
- a first directional parameter may be quantised to a quantisation point on a track of the first set of tracks. This can take the form of finding the elevation (or circle) index and azimuth index according to equations 1 to 3 above when searching all quantisation points belonging to the tracks of the first set of tracks.
- the second directional parameter may be quantised to a quantisation point on a track of the second set of tracks. Again this can take the form of searching all quantisation points belonging to the tracks of the second set of tracks and encoding the circle and azimuth indices according to equations 1 to 3.
- the third directional parameter may be quantised to a quantisation point on a track of the third set of tracks and the fourth directional parameter may be quantised to a quantisation point on a track of the fourth set of tracks.
- the quantisation process scrolls back to the first set of tracks and the fifth in the series of directional parameters is quantised to a quantisation point on a track of the first set of tracks.
- the sixth directional parameter can be quantised to a quantisation point on a track of the second set of tracks.
- the seventh directional parameter can be quantised to a quantisation point on a track of the third set of tracks.
- the eighth directional parameter can be quantised to a quantisation point on a track of the fourth set of tracks.
- the EA-DI converter 305 indexes a directional parameter using a quantization point from one set of tracks. Furthermore, the order in which the sets of tracks are presented to quantize a directional parameter is predetermined, thereby negating the need to transmit the set of tracks index to the decoder. Additionally, the ordering of the sets of tracks is not limited to the uniformly increasing (in terms of the set of tracks index p) mechanism described above. Other patterns may be used, such as an alternating pattern in which the sets of tracks are presented to the incoming directional parameters in a 1, 3, 2, 4 pattern (in terms of the set of tracks index).
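The cycling through sets of tracks can be sketched as follows. Several details are assumptions made for illustration: each circle of the grid is treated as one track, circle c is assigned to set `c % num_sets`, the point count per circle follows a hypothetical cosine rule, the nearest point is found by an exhaustive great-circle search, and the returned index is local to the selected set. The `order` argument is zero-based, so the 1, 3, 2, 4 pattern would be passed as `(0, 2, 1, 3)`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (elevation_deg, azimuth_deg)


def build_track_sets(num_circles: int, equator_points: int,
                     num_sets: int) -> List[List[Point]]:
    """Split grid points into sets of tracks; circle c goes to set c % num_sets."""
    step = 180.0 / (num_circles - 1)
    sets: List[List[Point]] = [[] for _ in range(num_sets)]
    for c in range(num_circles):
        elev = -90.0 + c * step
        n = max(1, round(equator_points * math.cos(math.radians(elev))))
        for a in range(n):
            sets[c % num_sets].append((elev, a * 360.0 / n))
    return sets


def angular_distance(p: Point, q: Point) -> float:
    """Great-circle angle in radians between two directions."""
    e1, a1 = math.radians(p[0]), math.radians(p[1])
    e2, a2 = math.radians(q[0]), math.radians(q[1])
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def quantize_series(directions: Sequence[Point], sets: List[List[Point]],
                    order: Sequence[int]) -> List[Tuple[int, int]]:
    """Quantize the k-th direction to the set order[k % len(order)].

    Returns (set used, index within that set); only the index would be sent,
    since the decoder knows the predetermined order.
    """
    result = []
    for k, d in enumerate(directions):
        s = order[k % len(order)]
        points = sets[s]
        idx = min(range(len(points)), key=lambda j: angular_distance(d, points[j]))
        result.append((s, idx))
    return result
```

Because encoder and decoder share `order`, the set index never has to be transmitted, which is where the bit saving comes from.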
- the spherical grid, a subset of the spherical grid, or an approximation of the spherical grid or a section of it can be divided into a varying number of sets of tracks, each consisting of a varying number of tracks with a varying number of quantization points.
- the quantised direction index $I_d$ simply references a point on the spherical grid for an azimuth and elevation value, and the index of the set of tracks p does not form part of the quantised direction index $I_d$.
- the number of bits required to index the particular directional parameter (the quantised direction index $I_d$) is reduced when compared to quantising/indexing to all possible quantisation points of the spherical grid. For instance in the above example of Figure 7(c) the reduction in the number of bits per directional parameter will be of the order of 2 bits.
- the reduction in the number of bits required to index each directional parameter can be given by $\log_2 N_T$, given that $N_T$ is the total number of sets of tracks and is a power of two. It is understood that the bit rate reduction is similarly relative to a subset or approximation of a spherical grid where the at least one set of tracks is used to quantize directions limited to said subset or approximation of the spherical grid instead of quantizing the directions on the full spherical grid.
- a spherical grid may have a total of 512 quantised directional parameter points, with each direction parameter being a two component vector of azimuth and elevation.
- indexing a quantised directional component on the full grid would take a 9 bit direction index $I_d$.
- with the grid divided into four sets of tracks, a quantised directional component can be represented using a 7 bit direction index $I_d$.
- a 7 bit direction index $I_d$ is required to index a quantised directional parameter point from one of the four sets of tracks.
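The bit counts in this example can be checked with a few lines of arithmetic. The helper name `index_bits` is hypothetical, and the sketch assumes the grid points are split evenly across the sets of tracks.

```python
def index_bits(num_points: int) -> int:
    """Smallest number of bits that can index num_points distinct values."""
    return max(1, (num_points - 1).bit_length())


full_grid_points = 512
num_track_sets = 4  # N_T, assumed to be a power of two

bits_full = index_bits(full_grid_points)                       # 9 bits
bits_per_set = index_bits(full_grid_points // num_track_sets)  # 7 bits
bits_saved = bits_full - bits_per_set                          # log2(N_T) = 2 bits
```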
- the EA-DI converter 305 can be arranged to compare the current directional parameter to, e.g., the immediate previous directional parameter in order to determine whether the relative change in value is below a predefined threshold. If it is determined that the current directional parameter is sufficiently close to an immediate previous directional parameter it can be advantageous to quantise the current directional parameter to the quantisation point of the quantised immediate previous directional parameter rather than taking a quantisation point from the allotted set of tracks that does not have this same quantization point.
- the directional index $I_d$ may be arranged to have at least one reserved value.
- the reserved directional index value of a directional index $I_d$ signifies that a previous directional index value from another set of tracks is to be taken as the directional index value for the current directional parameter instead of any of the quantization points/index values from the allocated set of tracks.
- the reserved directional index value is a pointer to a previous directional index.
- the 7 bit directional index $I_d$ may be arranged to have one index value reserved as a pointer to a previous directional index.
- the index 1111111 may be reserved as the pointer to a previous directional index.
- with one index value reserved, each 7 bit index addresses at most 127 quantization points, so the four sets of tracks cannot cover all 512 points of the example spherical grid. The combination of the four sets of tracks will nevertheless result in a very close approximation of the full grid.
- the directional index $I_d$ can be arranged to have more than one reserved value thereby enabling the directional index $I_d$ to point to more than one previous directional index.
- the index 1111110 may be assigned to point to the last but one directional index.
- EA-DI converter 305 can be arranged to compare the current directional parameter to immediate previous directional parameters in order to determine whether the relative change in value is below a predefined threshold.
- the threshold may be adaptive and based on the bit rate and specifically the highest possible directional resolution allowed for that operating mode.
- the distance measure d can be obtained by taking the L2 norm distance between a previous directional parameter point and a current directional parameter point.
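The reserved-index mechanism can be sketched as follows. The function names, the choice of measuring the L2 distance between unit vectors (rather than directly between azimuth/elevation pairs), and the callables `quantize` and `dequantize` standing in for the track-set quantizer are all assumptions made for illustration.

```python
import math
from typing import Callable, Optional, Tuple

Direction = Tuple[float, float]  # (elevation_deg, azimuth_deg)
RESERVED_PREVIOUS = 0b1111111    # 7 bit value reserved as "reuse previous index"


def unit_vector(d: Direction) -> Tuple[float, float, float]:
    """Convert (elevation, azimuth) in degrees to a Cartesian unit vector."""
    e, a = math.radians(d[0]), math.radians(d[1])
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))


def l2_distance(p: Direction, q: Direction) -> float:
    """L2 norm distance between two directions on the unit sphere."""
    return math.dist(unit_vector(p), unit_vector(q))


def encode_index(current: Direction, previous: Optional[Direction],
                 threshold: float,
                 quantize: Callable[[Direction], int]) -> int:
    """Emit the reserved value when the direction barely changed."""
    if previous is not None and l2_distance(current, previous) < threshold:
        return RESERVED_PREVIOUS
    index = quantize(current)
    if index == RESERVED_PREVIOUS:
        raise ValueError("quantizer must not use the reserved index value")
    return index


def decode_index(index: int, previous_decoded: Direction,
                 dequantize: Callable[[int], Direction]) -> Direction:
    """Resolve the reserved value to the previously decoded direction."""
    if index == RESERVED_PREVIOUS:
        return previous_decoded
    return dequantize(index)
```

A second reserved value such as 1111110 could be handled the same way by keeping the last two decoded directions.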
- the direction metadata extractor 350 in some embodiments comprises a quantization input 352. This in some embodiments is passed from the metadata encoder or is otherwise agreed with the encoder.
- the quantization input 352 can be based on a codec operating mode in some embodiments.
- the quantization input is configured to define the granularity of spheres arranged around a reference location or position.
- the direction metadata extractor 350 in some embodiments comprises a direction index input 351. This may be received from the encoder or retrieved by any suitable means.
- the direction metadata extractor 350 in some embodiments comprises a sphere positioner 353.
- the sphere positioner 353 is configured to receive as an input the quantization input and generate the sphere arrangement in the same manner as generated in the encoder.
- the quantization input and the sphere positioner 353 are optional and the arrangement of spheres information is passed from the encoder rather than being generated in the extractor.
- the direction metadata extractor 350 comprises a direction index to elevation-azimuth (DI-EA) converter 355.
- the direction index to elevation-azimuth converter 355 is configured to receive the direction index and furthermore the sphere position information and generate an approximate or quantized elevation-azimuth output. In some embodiments the conversion is performed according to the following algorithm.
- the direction index $I_d$ can be decoded into an elevation index and azimuth index in order to obtain the quantized elevation value $\theta$ and quantized azimuth value $\phi$ from respective codebooks.
- the receiving of the quantization input is shown in Figure 8 by step 801.
- the method may determine sphere positioning based on the quantization input as shown in Figure 8 by step 803.
- the method may comprise receiving the direction index as shown in Figure 8 by step 802. Having received the direction index and the sphere positioning information, the method may comprise converting the direction index to a direction parameter in the form of a quantized direction parameter based on the sphere positioning information as shown in Figure 8 by step 805. The method may then output the quantized direction parameter as shown in Figure 8 by step 807.
- In Figure 9 an example method for converting the direction index to a quantized elevation-azimuth (DI-EA) parameter, as shown in Figure 8 by step 805, according to some embodiments is shown.
- the method comprises finding the circle index value $i$ such that $\mathrm{off}(i) \le I_d \le \mathrm{off}(i+1)$ as shown in Figure 9 by step 901.
- Having determined the circle index, the next operation is to calculate the circle index in the hemisphere from the sphere positioning information as shown in Figure 9 by step 903.
- a quantized elevation is determined based on the circle index as shown in Figure 9 by step 905.
- the quantized azimuth is determined based on the circle index and elevation information as shown in Figure 9 by step 907.
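The decoding steps above can be sketched as an offset lookup. This is a minimal illustration under stated assumptions: circles are stored from south to north (so the separate hemisphere step is not needed here), the point count per circle follows a hypothetical cosine rule, and the circle is chosen with the half-open test off(i) <= I_d < off(i+1) so that every index maps to exactly one circle. The function names are invented for the example.

```python
import bisect
import math
from typing import List, Tuple


def build_offsets(num_circles: int, equator_points: int) -> Tuple[List[int], List[int]]:
    """Return (points per circle, cumulative offsets); same assumed layout
    must be used by encoder and decoder."""
    step = 180.0 / (num_circles - 1)
    counts = []
    for c in range(num_circles):
        elev = -90.0 + c * step
        counts.append(max(1, round(equator_points * math.cos(math.radians(elev)))))
    offsets = [0]
    for n in counts:
        offsets.append(offsets[-1] + n)
    return counts, offsets


def index_to_direction(index: int, num_circles: int,
                       equator_points: int) -> Tuple[float, float]:
    """Convert a direction index to quantized (elevation, azimuth) in degrees."""
    counts, offsets = build_offsets(num_circles, equator_points)
    if not 0 <= index < offsets[-1]:
        raise ValueError("direction index outside the grid")
    # Find the circle whose offset range contains the index.
    circle = bisect.bisect_right(offsets, index) - 1
    az_idx = index - offsets[circle]
    # Quantized elevation from the circle index, azimuth from the position on it.
    elev = -90.0 + circle * (180.0 / (num_circles - 1))
    azim = az_idx * 360.0 / counts[circle]
    return elev, azim
```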
- spatial audio processing takes place in frequency bands.
- Those bands could be for example, the frequency bins of the time-frequency transform, or frequency bands combining several bins.
- the combination could be such that approximates properties of human hearing, such as the Bark frequency resolution.
- we could measure and process the audio in time-frequency areas combining several of the frequency bins b and/or time indices n.
- these aspects were not expressed by all of the equations above.
- typically one set of parameters such as one direction is estimated for that time-frequency area, and all time-frequency samples within that area are synthesized according to that set of parameters, such as that one direction parameter.
- the usage of a frequency resolution for parameter analysis that is different than the frequency resolution of the applied filter-bank is a typical approach in the spatial audio processing systems.
- the device may be any suitable electronics device or apparatus.
- the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
- the device 1400 comprises at least one processor or central processing unit 1407.
- the processor 1407 can be configured to execute various program codes such as the methods described herein.
- the device 1400 comprises a memory 1411.
- the at least one processor 1407 is coupled to the memory 1411.
- the memory 1411 can be any suitable storage means.
- the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407.
- the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
- the device 1400 comprises a user interface 1405.
- the user interface 1405 can be coupled in some embodiments to the processor 1407.
- the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405.
- the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad.
- the user interface 1405 can enable the user to obtain information from the device 1400.
- the user interface 1405 may comprise a display configured to display information from the device 1400 to the user.
- the user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400.
- the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
- the device 1400 comprises an input/output port 1409.
- the input/output port 1409 in some embodiments comprises a transceiver.
- the transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
- the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
- the transceiver can communicate with further apparatus by any suitable known communications protocol.
- the transceiver or transceiver means can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
- the transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device. In some embodiments the device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones or similar.
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
- any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process.
- Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs can automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Mathematical Physics (AREA)
- Stereophonic System (AREA)
Abstract
The invention relates, among other things, to a method for encoding spatial audio signals comprising: determining a spatial audio direction parameter for at least two audio signals; using a quantization grid of quantization points to index the spatial audio direction parameter, the quantization points of the quantization grid being divided into at least two sets of tracks; and indexing the spatial audio direction parameter to an index whose value corresponds to a quantization point on a track from a first set of tracks of said sets of tracks.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1811601.2A GB2575632A (en) | 2018-07-16 | 2018-07-16 | Sparse quantization of spatial audio parameters |
GB1811601.2 | 2018-07-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020016479A1 true WO2020016479A1 (fr) | 2020-01-23 |
Family
ID=63273079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/FI2019/050527 WO2020016479A1 | 2018-07-16 | 2019-07-05 | Sparse quantization of spatial audio parameters |
Country Status (2)
Country | Link |
---|---|
GB (1) | GB2575632A (fr) |
WO (1) | WO2020016479A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021250311A1 * | 2020-06-09 | 2021-12-16 | Nokia Technologies Oy | Spatial audio parameter encoding and associated decoding |
WO2023152348A1 * | 2022-02-14 | 2023-08-17 | Orange | Encoding and decoding of spherical coordinates using an optimized spherical quantization dictionary |
WO2024197541A1 * | 2023-03-27 | 2024-10-03 | 北京小米移动软件有限公司 | Quantization encoding method, apparatus, device, and storage medium |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2586461A (en) | 2019-08-16 | 2021-02-24 | Nokia Technologies Oy | Quantization of spatial audio direction parameters |
GB2587196A (en) * | 2019-09-13 | 2021-03-24 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007011157A1 (fr) * | 2005-07-19 | 2007-01-25 | Electronics And Telecommunications Research Institute | Method for quantization and dequantization of channel level difference based on virtual source location information |
US7933770B2 (en) * | 2006-07-14 | 2011-04-26 | Siemens Audiologische Technik Gmbh | Method and device for coding audio data based on vector quantisation |
WO2009067741A1 (fr) * | 2007-11-27 | 2009-06-04 | Acouity Pty Ltd | Bandwidth compression of parametric sound field representations for transmission and storage |
- 2018-07-16: GB application GB1811601.2A, patent GB2575632A (en), not active, Withdrawn
- 2019-07-05: WO application PCT/FI2019/050527, patent WO2020016479A1 (fr), active, Application Filing
Non-Patent Citations (4)
Title |
---|
KRUGER, H. ET AL.: "On logarithmic spherical vector quantization", INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY AND ITS APPLICATIONS, 7 December 2008 (2008-12-07), pages 6, XP031451107, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4895481> [retrieved on 20191112] * |
LI, G. ET AL.: "The perceptual lossless quantization of spatial parameter for 3D audio signals", INTERNATIONAL CONFERENCE ON MULTIMEDIA MODELING, 31 December 2016 (2016-12-31), pages 381 - 392, XP047368507, Retrieved from the Internet <URL:https://link.springer.com/chapter/10.1007/978-3-319-51814-5_32> [retrieved on 20191104], DOI: 10.1007/978-3-319-51814-5_32 * |
MATSCHKAL, B. ET AL.: "Spherical logarithmic quantization", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 18, no. 1, 2 June 2009 (2009-06-02), pages 126 - 140, XP011329117, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5062285> [retrieved on 20191104], DOI: 10.1109/TASL.2009.2024383 * |
YANG, C. ET AL.: "3D audio coding approach based on spatial perception features", CHINA COMMUNICATIONS, vol. 14, no. 11, 22 December 2017 (2017-12-22), pages 126 - 140, XP011674724, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8233656> [retrieved on 20191104], DOI: 10.1109/CC.2017.8233656 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021250311A1 (fr) * | 2020-06-09 | 2021-12-16 | Nokia Technologies Oy | Spatial audio parameter encoding and associated decoding |
WO2023152348A1 (fr) * | 2022-02-14 | 2023-08-17 | Orange | Encoding and decoding of spherical coordinates using an optimized spherical quantization dictionary |
FR3132811A1 (fr) * | 2022-02-14 | 2023-08-18 | Orange | Encoding and decoding of spherical coordinates using an optimized spherical quantization dictionary |
WO2024197541A1 (fr) * | 2023-03-27 | 2024-10-03 | 北京小米移动软件有限公司 | Quantization coding method, apparatus, device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
GB201811601D0 (en) | 2018-08-29 |
GB2575632A (en) | 2020-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3707706B1 (fr) | Determination of spatial audio parameter encoding and associated decoding | |
US11062716B2 (en) | Determination of spatial audio parameter encoding and associated decoding | |
GB2575305A (en) | Determination of spatial audio parameter encoding and associated decoding | |
JP7405962B2 (ja) | Determination of spatial audio parameter encoding and associated decoding | |
WO2020016479A1 (fr) | Sparse quantization of spatial audio parameters | |
WO2020070377A1 (fr) | Selection of quantization schemes for spatial audio parameter encoding | |
WO2021032909A1 (fr) | Quantization of spatial audio signal direction parameters | |
EP3776545B1 (fr) | Quantization of spatial audio parameters | |
EP3991170A1 (fr) | Determination of spatial audio parameter encoding and associated decoding | |
KR20220047821A (ko) | Quantization of spatial audio direction parameters | |
US20240127828A1 (en) | Determination of spatial audio parameter encoding and associated decoding | |
CA3237983A1 (fr) | Spatial audio parameter decoding | |
CA3208666A1 (fr) | Transformation of spatial audio parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19837005; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 19837005; Country of ref document: EP; Kind code of ref document: A1 |