GB2578604A - Determination of spatial audio parameter encoding and associated decoding - Google Patents


Info

Publication number
GB2578604A
Authority
GB
United Kingdom
Prior art keywords
value
primary
directional
partition section
partitioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1817809.5A
Other versions
GB201817809D0 (en
Inventor
Vasilache Adriana
Johannes Pihlajakuja Tapani
Juhani Järvinen Kari
Leppanen Jussi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy
Priority to GB1817809.5A (GB2578604A)
Publication of GB201817809D0
Priority to PCT/FI2019/050703 (WO2020089509A1)
Priority to EP19879837.3A (EP3874771A4)
Publication of GB2578604A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

A directional parameter comprising azimuth and elevation values is received for sub-bands of an audio frame. The range of the directional parameter is partitioned, and a primary partition section is then identified depending on the value of the directional parameter. The primary partition section is further sub-divided to yield a secondary partitioning of the range of the directional parameter, and one of the secondary partitions is identified. An embedded index value associated with the directional parameter is then obtained by combining the indices associated with the primary and secondary partitions. The primary partitioning may subdivide the azimuthal and elevational ranges into 8 quarter-hemispheres, which may be enumerated using three bits, which represent spherical pseudo-triangles, and which may be sub-divided using the midpoints of the pseudo-triangle’s edges (fig. 8). This sub-division may be done recursively. The secondary partition may be selected to minimise the distance between its centre and the directional parameter’s value.

Description

DETERMINATION OF SPATIAL AUDIO PARAMETER ENCODING AND
ASSOCIATED DECODING
Field
The present application relates to apparatus and methods for sound-field related parameter encoding, and in particular, but not exclusively, for time-frequency domain direction related parameter encoding for an audio encoder and decoder.
Background
Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to describe well the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can be utilized in synthesis of the spatial sound accordingly: for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the directionality of the sound) can also be utilized as the spatial metadata (which may also include other parameters such as coherence, spread coherence, number of directions, distance, etc.) for an audio codec. For example, these parameters can be estimated from microphone-array captured audio signals, and a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata. The stereo signal could be encoded, for example, with an AAC encoder.
A decoder can decode the audio signals into PCM signals, and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
The aforementioned solution is particularly suitable for encoding captured spatial sound from microphone arrays (e.g., in mobile phones, VR cameras, standalone microphone arrays). However, it may be desirable for such an encoder to also accept input types other than microphone-array captured signals, for example, loudspeaker signals, audio object signals, or Ambisonic signals.
The directional components of the metadata may comprise, for each considered time/frequency sub-band, an elevation and an azimuth (and an energy ratio, which is 1-diffuseness) of a resulting direction. Quantization of these directional components is a current research topic.
Summary
There is provided according to a first aspect an apparatus comprising means for: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
The means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may be further for: partitioning the range of the at least one directional value into quarters of hemispheres; and identifying through which of the quarters of hemispheres the at least one directional value passes.
The means for determining a primary index value associated with the identified primary partition section may be further for determining a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
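By way of a non-limiting illustration, the three bit primary index may be sketched as follows. The sketch assumes azimuth in degrees (any real value, wrapped to 0..360) and elevation in degrees in the range -90..90. The function name `primary_index` and the bit layout (hemisphere in the top bit, azimuth quadrant in the two low bits) are assumptions made for illustration only and are not taken from the text.

```python
def primary_index(azimuth_deg: float, elevation_deg: float) -> int:
    """Return a 3-bit index (0..7) for the quarter-hemisphere holding the direction.

    Hypothetical layout: bit 2 = lower hemisphere flag, bits 1..0 = azimuth quadrant.
    """
    az = azimuth_deg % 360.0            # wrap azimuth into [0, 360]
    quadrant = int(az // 90.0) % 4      # 0..3; the extra % 4 guards float wrap-around
    lower = 1 if elevation_deg < 0.0 else 0
    return (lower << 2) | quadrant


if __name__ == "__main__":
    for az, el in [(10.0, 20.0), (100.0, 20.0), (-45.0, -10.0)]:
        print(az, el, format(primary_index(az, el), "03b"))
```

Directions lying exactly on a boundary (for example elevation 0) are assigned to one neighbouring section by convention in this sketch; other tie-breaking rules are equally possible.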
The means for identifying at least one further partition section based on at least one further partitioning of the primary partition section may be further for: recursively partitioning an identified section through which the at least one directional value passes into four further sections; and identifying which of the four further sections the at least one directional value passes.
The means for recursively partitioning an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, may be further for: calculating angle coordinates of corners of a current pseudo-triangle; calculating angle coordinates of the mid vertex points of the current pseudo-triangle; and calculating angle coordinates of centres of each of the four further sections.
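One possible way to realise a single partitioning step is sketched below. The text describes the calculation in angle coordinates; this sketch instead converts the corners to unit vectors, takes normalised edge midpoints, and converts back, which is a substitution chosen here for numerical simplicity. All function names, the ordering of the four further sections, and the use of the normalised corner mean as the section centre are assumptions.

```python
import math


def to_vec(az_deg, el_deg):
    """Convert (azimuth, elevation) in degrees to a unit vector."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def to_angles(v):
    """Convert a unit vector back to (azimuth, elevation) in degrees."""
    x, y, z = v
    return math.degrees(math.atan2(y, x)), math.degrees(math.asin(max(-1.0, min(1.0, z))))


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def midpoint(a, b):
    """Mid vertex point of an edge, projected back onto the unit sphere."""
    return normalize(tuple(p + q for p, q in zip(a, b)))


def subdivide(tri):
    """Split a pseudo-triangle (a, b, c) into three corner sections and one central section."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def centre(tri):
    """Section centre, taken here as the normalised mean of the three corners."""
    return normalize(tuple(sum(cs) for cs in zip(*tri)))


if __name__ == "__main__":
    octant = (to_vec(0, 0), to_vec(90, 0), to_vec(0, 90))
    for section in subdivide(octant):
        print(to_angles(centre(section)))
```

Because the centres depend only on the partition level and section position, they could equally be precomputed and stored, for example in a look-up table.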
The means for calculating angle coordinates of centres of each of the four further sections may be further for employing a look-up table for calculating angle coordinates of centres of each of the four further sections.
The means for identifying which of the four further sections the at least one directional value passes may be for: determining distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and selecting one of the four further sections based on the smallest absolute distances.
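The selection of the nearest of the four further sections may be sketched as below. The text refers only to distances between angle coordinates; the great-circle angle used here is one assumed choice of distance measure, and the names `angular_distance` and `nearest_section` are hypothetical. The returned position (0..3) can serve directly as a two bit value.

```python
import math


def angular_distance(a, b):
    """Great-circle angle in radians between two (azimuth, elevation) pairs in degrees."""
    az1, el1 = math.radians(a[0]), math.radians(a[1])
    az2, el2 = math.radians(b[0]), math.radians(b[1])
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    return math.acos(max(-1.0, min(1.0, cos_d)))  # clamp against rounding error


def nearest_section(direction, centres):
    """Return the position (0..3) of the centre closest to the direction."""
    distances = [angular_distance(direction, c) for c in centres]
    return min(range(len(centres)), key=distances.__getitem__)


if __name__ == "__main__":
    centres = [(10.0, 10.0), (70.0, 10.0), (40.0, 60.0), (45.0, 25.0)]  # illustrative values
    print(nearest_section((60.0, 12.0), centres))
```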
The means for determining at least one further index value associated with the identified further partition section may be further for determining a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
The means for combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value may be further for juxtaposing the primary index value and at least one further index value bits.
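The juxtaposition of the three bit primary index with one two bit further index per partitioning level may be illustrated as follows. The function name `embed_index` and the choice to place the primary index in the most significant bits are assumptions for this sketch.

```python
def embed_index(primary, further):
    """Juxtapose a 3-bit primary index with a sequence of 2-bit further indices.

    Returns (value, number_of_bits). The primary index occupies the top bits,
    so dropping trailing bit pairs yields the index of a coarser partition.
    """
    value = primary & 0b111
    for idx in further:
        value = (value << 2) | (idx & 0b11)
    return value, 3 + 2 * len(further)


if __name__ == "__main__":
    value, bits = embed_index(0b101, [0b10, 0b01])
    print(format(value, "0{}b".format(bits)))  # prints 1011001
```

With this layout the index is embedded in the sense that discarding the last two bits of the seven bit value above leaves the five bit index of the enclosing section.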
The means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may be further for at least one of: rotating the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and determining a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
According to a second aspect there is provided an apparatus comprising means for: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
The means for identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value may be further for identifying through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
The means for identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value may be further for identifying which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
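A decoder following the second aspect may be sketched as below: the primary index selects a quarter-hemisphere, each two bit further index selects one of four sub-sections, and the decoded direction is taken as the centre of the last section. The bit layout, the octant corner convention, the section ordering and the unit-vector midpoint construction are all assumptions of this sketch, and a real decoder would have to mirror whatever conventions its encoder uses.

```python
import math


def _vec(az_deg, el_deg):
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def _norm(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _mid(a, b):
    return _norm(tuple(p + q for p, q in zip(a, b)))


def decode_direction(index, levels):
    """Map an embedded index (3 + 2*levels bits) to (azimuth, elevation) in degrees."""
    # Split the juxtaposed bits: primary index on top, then one bit pair per level.
    further = [(index >> (2 * k)) & 0b11 for k in reversed(range(levels))]
    primary = (index >> (2 * levels)) & 0b111
    quadrant, lower = primary & 0b11, primary >> 2
    # Corners of the selected quarter-hemisphere (hypothetical convention).
    pole = _vec(0.0, -90.0 if lower else 90.0)
    tri = (_vec(90.0 * quadrant, 0.0), _vec(90.0 * (quadrant + 1), 0.0), pole)
    # Follow the further indices down the recursive partitioning.
    for child in further:
        a, b, c = tri
        ab, bc, ca = _mid(a, b), _mid(b, c), _mid(c, a)
        tri = [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)][child]
    # Reconstruct the direction as the centre of the final section.
    x, y, z = _norm(tuple(sum(cs) for cs in zip(*tri)))
    return math.degrees(math.atan2(y, x)), math.degrees(math.asin(max(-1.0, min(1.0, z))))


if __name__ == "__main__":
    print(decode_direction(0b000, 0))
    print(decode_direction(0b00010, 1))
```

Since the index is embedded, the same function can decode a truncated index by passing a smaller `levels` value, giving a coarser direction estimate.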
According to a third aspect there is provided a method comprising: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
Identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may comprise: partitioning the range of the at least one directional value into quarters of hemispheres; and identifying through which of the quarters of hemispheres the at least one directional value passes.
Determining a primary index value associated with the identified primary partition section may further comprise determining a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
Identifying at least one further partition section based on at least one further partitioning of the primary partition section may further comprise: recursively partitioning an identified section through which the at least one directional value passes into four further sections; and identifying which of the four further sections the at least one directional value passes.
Recursively partitioning an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, may further comprise: calculating angle coordinates of corners of a current pseudo-triangle; calculating angle coordinates of the mid vertex points of the current pseudo-triangle; and calculating angle coordinates of centres of each of the four further sections.
Calculating angle coordinates of centres of each of the four further sections may further comprise employing a look-up table for calculating angle coordinates of centres of each of the four further sections.
Identifying which of the four further sections the at least one directional value passes may further comprise: determining distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and selecting one of the four further sections based on the smallest absolute distances.
Determining at least one further index value associated with the identified further partition section may further comprise determining a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
Combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value may further comprise juxtaposing the primary index value and at least one further index value bits.
Identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may further comprise at least one of: rotating the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and determining a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
According to a fourth aspect there is provided a method comprising: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
Identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value may further comprise identifying through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
Identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value may further comprise identifying which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
According to a fifth aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identify a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determine a primary index value associated with the identified primary partition section; identify at least one further partition section based on at least one further partitioning of the primary partition section; determine at least one further index value associated with the identified further partition section; combine the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
The apparatus caused to identify a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may further be caused to: partition the range of the at least one directional value into quarters of hemispheres; and identify through which of the quarters of hemispheres the at least one directional value passes.
The apparatus caused to determine a primary index value associated with the identified primary partition section may further be caused to determine a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
The apparatus caused to identify at least one further partition section based on at least one further partitioning of the primary partition section may further be caused to: recursively partition an identified section through which the at least one directional value passes into four further sections; and identify which of the four further sections the at least one directional value passes.
The apparatus caused to recursively partition an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, may further be caused to: calculate angle coordinates of corners of a current pseudo-triangle; calculate angle coordinates of the mid vertex points of the current pseudo-triangle; and calculate angle coordinates of centres of each of the four further sections.
The apparatus caused to calculate angle coordinates of centres of each of the four further sections may further be caused to employ a look-up table to calculate angle coordinates of centres of each of the four further sections.
The apparatus caused to identify which of the four further sections the at least one directional value passes may further be caused to: determine distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and select one of the four further sections based on the smallest absolute distances.
The apparatus caused to determine at least one further index value associated with the identified further partition section may further be caused to determine a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
The apparatus caused to combine the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value may further be caused to juxtapose the primary index value and at least one further index value bits.
The apparatus caused to identify a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may further be caused to perform at least one of: rotate the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and determine a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
According to a sixth aspect there is provided an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: receive encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identify a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identify at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determine at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
The apparatus caused to identify a primary partition section partitioning of a range of the at least one directional value based on a primary index value may further be caused to identify through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
The apparatus caused to identify at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value may further be caused to identify which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
According to a seventh aspect there is provided an apparatus comprising: means for receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; means for determining a primary index value associated with the identified primary partition section; means for identifying at least one further partition section based on at least one further partitioning of the primary partition section; means for determining at least one further index value associated with the identified further partition section; means for combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
The means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may comprise: means for partitioning the range of the at least one directional value into quarters of hemispheres; and means for identifying through which of the quarters of hemispheres the at least one directional value passes.
The means for determining a primary index value associated with the identified primary partition section may further comprise means for determining a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
The means for identifying at least one further partition section based on at least one further partitioning of the primary partition section may further comprise: means for recursively partitioning an identified section through which the at least one directional value passes into four further sections; and means for identifying which of the four further sections the at least one directional value passes.
The means for recursively partitioning an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, may further comprise: means for calculating angle coordinates of corners of a current pseudo-triangle; means for calculating angle coordinates of the mid vertex points of the current pseudo-triangle; and means for calculating angle coordinates of centres of each of the four further sections.
The means for calculating angle coordinates of centres of each of the four further sections may further comprise means for employing a look-up table for calculating angle coordinates of centres of each of the four further sections.
The means for identifying which of the four further sections the at least one directional value passes may further comprise: means for determining distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and means for selecting one of the four further sections based on the smallest absolute distances.
The means for determining at least one further index value associated with the identified further partition section may further comprise means for determining a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
The means for combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value may further comprise means for juxtaposing the primary index value and at least one further index value bits.
The means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value may further comprise at least one of: means for rotating the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and means for determining a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
According to an eighth aspect there is provided an apparatus comprising: means for receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; means for identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; means for identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; means for determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
The means for identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value may further comprise means for identifying through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
The means for identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value may further comprise means for identifying which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
According to a ninth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
According to a tenth aspect there is provided a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
According to an eleventh aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
According to a twelfth aspect there is provided a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
According to a thirteenth aspect there is provided an apparatus comprising: receiving circuitry configured to receive values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying circuitry configured to identify a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining circuitry configured to determine a primary index value associated with the identified primary partition section; identifying circuitry configured to identify at least one further partition section based on at least one further partitioning of the primary partition section; determining circuitry configured to determine at least one further index value associated with the identified further partition section; combining circuitry configured to combine the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
According to a fourteenth aspect there is provided an apparatus comprising: receiving circuitry configured to receive encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying circuitry configured to identify a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying circuitry configured to identify at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining circuitry configured to determine at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
According to a fifteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
According to a sixteenth aspect there is provided a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
An apparatus comprising means for performing the actions of the method as described above.
An apparatus configured to perform the actions of the method as described above.
A computer program comprising program instructions for causing a computer to perform the method as described above.
A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
An electronic device may comprise apparatus as described herein.
A chipset may comprise apparatus as described herein.
Embodiments of the present application aim to address problems associated with the state of the art.
Summary of the Figures
For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;
Figure 2 shows schematically the metadata encoder according to some embodiments;
Figure 3 shows schematically the embedded index metadata encoder according to some embodiments;
Figure 4 shows a flow diagram of the operation of the embedded index metadata encoder as shown in Figure 3 according to some embodiments;
Figure 5 shows schematically the embedded index metadata decoder according to some embodiments;
Figure 6 shows a flow diagram of the operation of the embedded index metadata decoder as shown in Figure 5 according to some embodiments;
Figure 7a shows an example sphere and initial (primary) partitioning of a directional sphere;
Figure 7b shows a rotated or transformed further example sphere and initial (primary) partitioning of a directional sphere;
Figure 8 shows an example sphere section OABC and example divisions of an external surface of the sphere section OABC;
Figure 9 shows an example sphere section OABC and example spherical section partitioning;
Figure 10 shows an example Type I pseudo-triangle ABC, followed by subsequent division of the triangles of types II, III and IV;
Figures 11a and 11b show an example quantization index assignment for two pseudo-triangle orientations (basis down, and basis up);
Figure 12 shows schematically an example device suitable for implementing the apparatus shown.
Embodiments of the Application
The following describes in further detail suitable apparatus and possible mechanisms for the provision of effective spatial analysis derived metadata parameters. In the following discussions a multi-channel system is discussed with respect to a multi-channel microphone implementation. However as discussed above the input format may be any suitable input format, such as multi-channel loudspeaker, ambisonic (FOA/HOA) etc. It is understood that in some embodiments the channel location is based on a location of the microphone or is a virtual location or direction. Furthermore the output of the example system is a multi-channel loudspeaker arrangement. However it is understood that the output may be rendered to the user via means other than loudspeakers. Furthermore the multichannel loudspeaker signals may be generalised to be two or more playback audio signals.
The metadata consists at least of elevation, azimuth and the energy ratio of a resulting direction, for each considered time/frequency subband. In addition, other metadata parameters which may be obtained or determined are spread coherence, surround coherence, and distance.
The direction parameter components, the azimuth and the elevation are extracted from the audio data and then quantized to a given quantization resolution.
The resulting indexes may then be further compressed for efficient transmission. The concept as discussed hereafter employs an embedded indexing of the directional information such that by dropping bits from the original indexes, the directional data can be obtained at a desired lower resolution, without having to recompile the quantization process. The implementation of the embedded indexing is enabled by a proposed partitioning of the directional range sphere. The use of the embedded indexing allows reading a reduced number of bits and forming a reconstructed directional vector without requiring a requantization of the angles to a lower resolution quantizer.
With respect to Figure 1 an example apparatus and system for implementing embodiments of the application are shown. The system 100 is shown with an 'analysis' part 121 and a 'synthesis' part 131. The 'analysis' part 121 is the part from receiving the multi-channel loudspeaker signals up to an encoding of the metadata and downmix signal and the 'synthesis' part 131 is the part from a decoding of the encoded metadata and downmix signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).
The input to the system 100 and the 'analysis' part 121 is the multi-channel signals 102. Any suitable input (or synthetic multi-channel) format may be received or obtained. For example the input may be multichannel loudspeaker audio signals, microphone array audio signals, or audio object signals. For example, in the case the core audio is carried as MPEG-H 3D audio specified in the ISO/IEC 23008-3 (MPEG-H Part 3), the input can be audio objects (comprising one or more audio channels) and associated metadata, immersive multichannel signals, or Higher Order Ambisonics (HOA) signals.
The multi-channel signals are passed to a transport signal generator 103 and to an analysis processor 105.
In some embodiments the transport signal generator 103 is configured to downmix or otherwise select or combine, for example by beamforming techniques, the input audio signals to a determined number of channels and output these as transport signals. In some embodiments the analysis processor is configured to generate a 2 audio channel output. The determined number of channels may be two or any suitable number of channels. In some embodiments the analysis processor is configured to create transport signals for each of different types of input audio signals, the created transport signals for each of different types of input audio signals differing in their number of channels.
In some embodiments the analysis part 121 is configured to pass the received input audio signals 102 unprocessed to an encoder in the same manner as the transport signals. In some embodiments the analysis part 121 is configured to select one or more of the input channel audio signals and output the selection as the transport signals 104.
In some embodiments the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the transport signals 104. The analysis processor 105 is configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108 and an energy ratio parameter 110 (and in some embodiments a coherence parameter, and a diffuseness parameter). The direction and energy ratio may in some embodiments be considered to be spatial audio parameters. In other words the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).
In some embodiments the parameters generated may differ from frequency band to frequency band. Thus for example in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and furthermore in band Z no parameters are generated or transmitted. A practical example of this may be that for some frequency bands such as the highest band some of the parameters are not required for perceptual reasons. The downmix signals 104 and the metadata 106 may be passed to an encoder 107.
The encoder 107 may comprise an audio encoder core 109 which is configured to receive the downmix (or otherwise) signals 104 and generate a suitable encoding of these audio signals. The encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs. The encoding may be implemented using any suitable scheme.
The encoder 107 may furthermore comprise a metadata encoder/quantizer 111 which is configured to receive the metadata and output an encoded or compressed form of the information. In some embodiments the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage shown in Figure 1 by the dashed line. The multiplexing may be implemented using any suitable scheme.
In the decoder side, the received or retrieved data (stream) may be received by a decoder/demultiplexer 133. The decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a transport signal extractor 135 which is configured to decode the audio signals to obtain the transport signals. Similarly the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata. The decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
The decoded metadata and transport audio signals may be passed to a synthesis processor 139.
The system 100 'synthesis' part 131 further shows a synthesis processor 139 configured to receive the transport signals and the metadata and re-create in any suitable format a synthesized spatial audio in the form of multi-channel signals 110 (these may be multichannel loudspeaker format or in some embodiments any suitable output format such as binaural or Ambisonics signals, depending on the use case) based on the downmix signals and the metadata.
Therefore in summary first the system (analysis part) is configured to receive multi-channel audio signals.
Then the system (analysis part) is configured to generate a suitable transport audio signal (for example by selecting some of the audio signal channels).
Suitable metadata may then be obtained, for example by analysis of the multi-channel audio signals (or by signal extraction of suitable metadata from the input).
The system is then configured to encode for storage/transmission the transport audio signal and the metadata.
After this the system may store/transmit the encoded transport and metadata.
The system may retrieve/receive the encoded transport audio signals and metadata.
Then the system is configured to extract the transport audio signals and metadata from encoded transport audio signals and metadata parameters, for example demultiplex and decode the encoded transport audio signals and metadata parameters.
The system (synthesis part) is configured to synthesize an output multi-channel audio signal based on extracted transport audio signals of multi-channel audio signals and metadata.
With respect to Figure 2 an example analysis processor 105 and metadata encoder/quantizer 111 (as shown in Figure 1) according to some embodiments are described in further detail.
The analysis processor 105 in some embodiments comprises a time-frequency domain transformer 201.
In some embodiments the time-frequency domain transformer 201 is configured to receive the multi-channel signals 102 and apply a suitable time to frequency domain transform such as a Short Time Fourier Transform (STFT) in order to convert the input time domain signals into suitable time-frequency signals. These time-frequency signals may be passed to a spatial analyser 203.
The time-frequency signals 202 may be represented in the time-frequency domain representation by $s_i(b, n)$, where b is the frequency bin index and n is the time block (frame) index and i is the channel index. In another expression, n can be considered as a time index with a lower sampling rate than that of the original time-domain signals. These frequency bins can be grouped into subbands that group one or more of the bins into a subband of a band index k = 0,..., K-1. Each subband k has a lowest bin $b_{k,low}$ and a highest bin $b_{k,high}$, and the subband contains all bins from $b_{k,low}$ to $b_{k,high}$. The widths of the subbands can approximate any suitable distribution, for example the Equivalent rectangular bandwidth (ERB) scale or the Bark scale.
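The grouping of frequency bins into subbands can be illustrated with a minimal sketch. It assumes that each subband k is described by an inclusive pair (b_k_low, b_k_high) and that some per-bin quantity, here an illustrative energy value, is accumulated over the bins of the subband. The function name, the use of energies and the example limits are hypothetical and are not taken from the embodiments.

```python
from typing import List, Sequence, Tuple


def group_bins_into_subbands(
    bin_energy: Sequence[float],
    band_limits: Sequence[Tuple[int, int]],
) -> List[float]:
    """Sum a per-bin quantity over each subband k = 0..K-1.

    band_limits[k] = (b_k_low, b_k_high), both limits inclusive,
    so subband k contains all bins from b_k_low to b_k_high.
    """
    subband_energy = []
    for b_low, b_high in band_limits:
        if not 0 <= b_low <= b_high < len(bin_energy):
            raise ValueError("subband limits outside the available bins")
        subband_energy.append(sum(bin_energy[b_low:b_high + 1]))
    return subband_energy


# Hypothetical example: 8 bins of one time block n, K = 3 subbands
# whose widths grow with frequency (loosely ERB-like, not an actual ERB scale).
energies = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
limits = [(0, 0), (1, 2), (3, 7)]
print(group_bins_into_subbands(energies, limits))
```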
In some embodiments the analysis processor 105 comprises a spatial analyser 203. The spatial analyser 203 may be configured to receive the time-frequency signals 202 and based on these signals estimate direction parameters 108. The direction parameters may be determined based on any audio based 'direction' determination.
For example in some embodiments the spatial analyser 203 is configured to estimate the direction with two or more signal inputs. This represents the simplest configuration to estimate a 'direction'; more complex processing may be performed with even more signals.
The spatial analyser 203 may thus be configured to provide at least one azimuth and elevation for each frequency band and temporal time block within a frame, denoted as azimuth $\phi(k,n)$ and elevation $\theta(k,n)$. The direction parameters 108 may also be passed to a direction index generator 205.
The spatial analyser 203 may also be configured to determine an energy ratio parameter 110. The energy ratio may be considered to be a determination of the energy of the audio signal which can be considered to arrive from a direction. The direct-to-total energy ratio r(k,n) can be estimated, e.g., using a stability measure of the directional estimate, or using any correlation measure, or any other suitable method to obtain a ratio parameter. The energy ratio may be passed to an energy ratio encoder 227.
As discussed above in some embodiments the spatial analyser 203 may also be configured to determine other parameters such as spread coherence, surround coherence, and distance.
Therefore in summary the analysis processor is configured to receive time domain multichannel or other format such as microphone audio signals.
Although directions and ratios are here expressed for each time index n, in some embodiments the parameters may be combined over several time indices.
Same applies for the frequency axis, as has been expressed, the direction of several frequency bins b could be expressed by one direction parameter in band k consisting of several frequency bins b. The same applies for all of the discussed spatial parameters herein.
As also shown in Figure 2 an example metadata encoder/quantizer 111 is shown according to some embodiments.
The metadata encoder/quantizer 111 may comprise a direction index generator 205 (or directional quantizer). The direction index generator 205 is configured to receive the direction parameters (such as the azimuth $\phi(k, n)$ and elevation $\theta(k, n)$) 108 and from this generate a quantized output in the form of an index 206 to a combiner 207.
Furthermore the metadata encoder/quantizer 111 may comprise an energy ratio encoder 223 which is configured to apply any suitable encoding and/or compression on the energy ratios and pass these to the combiner 207. The combiner 207 is configured to receive the direction index values 206 (and other parameters such as energy ratios) and further encode them. For example in some embodiments the combiner 207 is configured to receive an expected bit allocation and modify the direction index values based on the expected bit allocation.
The combiner 207 is configured to receive the compressed energy ratios and direction indices and combine these to generate the compressed metadata stream for storing/transmitting.
With respect to Figure 3 is shown an example direction index generator 205. In some embodiments the direction parameters 300 are received and passed to a primary index generator 301 and a further index generator 303.
The primary index generator 301 in some embodiments is configured to receive the direction parameters as the azimuth $\phi(k, n)$ and elevation $\theta(k, n)$. The primary index generator 301 is configured to perform a rough partitioning of the sphere. For example the rough partitioning may be a partitioning of the sphere (within which the directional parameter range exists) into 8 equivalent regions. These regions may be shown for example by Figure 7 where the 8 regions are defined by OABC, OACY, OAYZ, OAZB (which define the four 'northern' hemisphere regions where the elevation is >0) and OXBC, OXCY, OXYZ, OXZB (which define the four 'southern' hemisphere regions where the elevation is <0).
Also shown in Figure 7 is the example region illustrated by the sphere section OABC, which has OD as its representative direction.
Each sphere section has a representative direction situated for instance in the centre of the section's exterior surface. Since in this example there are 8 sphere sections, they can be represented by 3 bits.
The primary index generator 301 may therefore be configured to determine the section within which the direction parameter lies and generate (or look up) the first three bits of the direction index (in other words generate a primary index I_0 representing the section within which the direction lies).
This information may then be passed to a further index generator 303.
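A minimal sketch of how a 3-bit primary index might be derived is given here. It assumes elevation in degrees within [-90, 90], azimuth in degrees wrapped to [0, 360), a most significant bit that is 0 for the northern and 1 for the southern hemisphere, and the two remaining bits selecting one of four 90-degree azimuth quadrants. The quadrant ordering, the azimuth range and the function name are assumptions of this sketch rather than details confirmed by the embodiments.

```python
def primary_index(elevation_deg: float, azimuth_deg: float) -> int:
    """Return a 3-bit index I_0 (0..7) identifying one of 8 sphere sections."""
    if not -90.0 <= elevation_deg <= 90.0:
        raise ValueError("elevation must lie in [-90, 90] degrees")
    # First (most significant) bit: 0 for north, 1 for south.
    south = 1 if elevation_deg < 0.0 else 0
    # Assumed layout: two low-order bits select a 90-degree azimuth quadrant.
    # min() guards against floating-point wrap-around returning exactly 360.0.
    quadrant = min(int((azimuth_deg % 360.0) // 90.0), 3)
    return (south << 2) | quadrant


print(primary_index(10.0, 45.0))    # northern hemisphere, first quadrant
print(primary_index(-30.0, 300.0))  # southern hemisphere, last quadrant
```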
The further index generator 303 is configured to receive the direction parameter azimuth $\phi(k, n)$ and elevation $\theta(k, n)$ and furthermore the primary index value. The further index generator 303 may then be configured to perform an iterative process or sub-division of a current section (the second index results from the division of one of the 8 sphere sections, the third index from the division of a sub-section of one of the 8 sphere sections, and so on).
For example in some embodiments each sphere section is further divided in 4 similar parts. For example Figure 8 shows a 'mapped' representation of the external surface of the section OABC. The external surface of the section OABC can be divided by point C' on the vertex midpoint between A and B, B' on the vertex midpoint between A and C, and A' on the vertex midpoint between C and B. In this manner the surface may be divided into four pseudo triangles, AC'B' 800, C'BA' 801, A'CB' 802 and A'B'C' 803 which form 4 small sphere sections given by the centres of the formed "pseudo-triangles".
In this manner the further index generator 303 can use two more bits to represent or identify which of the 4 sub-sections has the direction given by its centre closest to the direction to be encoded.
This may be recursively applied such that each "pseudo-triangle" is further divided into 4 similar parts and two more bits used to define which sub-division has the direction given by its centre closest to the direction to be encoded and so on. For a final resolution in elevation of 1.4 degrees, corresponding to a maximum angle distortion of 0.7 degrees, 15 bits are needed. With 17 bits a resolution of 0.7 degrees is obtained, corresponding to a maximum angle distortion of 0.35 degrees.
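The quoted bit counts are consistent with a 3-bit primary index followed by 2 bits per further stage, with each stage halving the 90-degree elevation extent of a primary section. That halving is an assumption inferred from the midpoint sub-division rather than an explicit statement, and the function names are hypothetical. A short sketch of the arithmetic:

```python
def embedded_bits(stages: int) -> int:
    """Total bits: 3 for the primary index plus 2 for each further stage."""
    if stages < 0:
        raise ValueError("stages must be non-negative")
    return 3 + 2 * stages


def elevation_resolution_deg(stages: int) -> float:
    """Elevation extent of a section after the given number of halvings."""
    if stages < 0:
        raise ValueError("stages must be non-negative")
    return 90.0 / (2 ** stages)


for stages in (6, 7):
    resolution = elevation_resolution_deg(stages)
    print(embedded_bits(stages), "bits:", resolution, "degrees resolution,",
          resolution / 2, "degrees maximum distortion")
```

Six further stages give 15 bits and about 1.4 degrees, and seven stages give 17 bits and about 0.7 degrees, in line with the figures above.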
The partitioning schematically represented in the 2D plane in Figure 8 corresponds to the 3D partitioning of the sphere section as presented in Figure 9. The obtained 4 pseudo-triangles are similar to one another, but they can be divided into 4 types of pseudo-triangles, based on their orientation with respect to the spherical coordinates:
I. Upper corner-right isosceles pseudo-triangle like AC'B' and ABC
II. L-right pseudo-triangle like C'BA'
III. R-right pseudo-triangle like B'A'C
IV. Isosceles pseudo-triangle like C'A'B'
Each of these pseudo-triangle types gives through partitioning the following pseudo-triangle types:
I. Type I gives types I, II, III and IV
II. Type II gives 3 type II and one reversed type II (rII), where reversed means here reflected with respect to a vertical axis and then with respect to a horizontal axis
III. Type III gives 3 type III and one reversed type III (rIII)
IV. Type IV gives 3 type IV and one reversed type IV (rIV).
The reversed types give through partitioning the following pseudo-triangle types:
rII. Reversed II gives 3 reversed type II and one type II
rIII. Reversed III gives 3 reversed type III and one type III
rIV. Reversed IV gives 3 reversed type IV and one type IV.
This for example may be shown in Figure 10 wherein the division of the Type I pseudo-triangle ABC generates the Type I upper corner-right isosceles pseudo-triangle AC'B', L-right pseudo-triangle C'BA', R-right pseudo-triangle B'A'C and isosceles pseudo-triangle C'A'B'.
The sub-division of the Type II L-right pseudo-triangle C'BA' produces 3 type II BDF, A'EF, DEC' and one reversed type II (rII) DEF pseudo-triangles.
The sub-division of the Type III R-right pseudo triangle B'A'C produces 3 type III CJI, IB'H, HA'J and one reversed type III (rIII) JIH.
The sub-division of the Type IV isosceles pseudo-triangle C'A'B' produces 3 type IV C'GE, GB'H, A'EH and one reversed type IV (rIV) GEH.
For each of these pseudo-triangle types as Voronoi regions, the codevectors expressed as elevation and azimuth values are given in the following table:
[Table: for each pseudo-triangle type, the angle coordinates of the triangle corners and the elevation and azimuth of the corresponding codevector]
where $\theta_1$, $\theta_2$ and $\phi_1$, $\phi_2$ are the limits from lower to higher and from left to right of the elevation and azimuth angles of a pseudo-triangle, respectively.
At each 2-bit quantization stage the index ranges from 0 to 3 and corresponds to one of the (Voronoi) regions as drawn in Figure 8. In some embodiments the secondary index generator may implement a look-up table which determines what pseudo-triangle shape is obtained as (Voronoi) region based on the quantization index and the pseudo-triangle shape that is partitioned:
Current pseudo-triangle type    Quantization index
                                0       1       2       3
I                               I       II      III     IV
II                              II      II      II      rII
III                             III     III     III     rIII
IV                              IV      IV      IV      rIV
rII                             rII     rII     rII     II
rIII                            rIII    rIII    rIII    III
rIV                             rIV     rIV     rIV     IV
This is represented for example as shown in Figures 11a and 11b. Figure 11a shows the example shown in Figure 8 where the index values 0 to 3 are assigned to the sub-divisions.
Figure 11b shows the example shown in Figure 11a where the triangle C'B'A' is further divided and the quantization index values 0 to 3 are assigned to these sub-divisions.
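The type look-up can be sketched as a small dictionary that follows the partitioning rules listed above: type I gives types I, II, III and IV, and every other type gives three pseudo-triangles of its own type and one of its reversed counterpart. Placing the differing type at quantization index 3, the string labels and the function name are assumptions of this sketch.

```python
from typing import Dict, Iterable, Tuple

# NEXT_TYPE[current_type][quantization_index] -> type of the selected sub-triangle.
NEXT_TYPE: Dict[str, Tuple[str, str, str, str]] = {
    "I": ("I", "II", "III", "IV"),
    "II": ("II", "II", "II", "rII"),
    "III": ("III", "III", "III", "rIII"),
    "IV": ("IV", "IV", "IV", "rIV"),
    "rII": ("rII", "rII", "rII", "II"),
    "rIII": ("rIII", "rIII", "rIII", "III"),
    "rIV": ("rIV", "rIV", "rIV", "IV"),
}


def type_after(indices: Iterable[int], start: str = "I") -> str:
    """Follow a sequence of 2-bit indices I_1, I_2, ... from the start type."""
    current = start
    for index in indices:
        if not 0 <= index <= 3:
            raise ValueError("each further index must be in 0..3")
        current = NEXT_TYPE[current][index]
    return current


print(type_after([1, 3, 3]))  # I -> II -> rII -> II
```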
[Table: limit angle coordinates of the next stage pseudo-triangle for each current pseudo-triangle type and each quantization index 0 to 3]
This shows the limit angle coordinates for the next stage pseudo-triangle, $\theta_1'$, $\theta_2'$, $\phi_1'$, $\phi_2'$, as a function of the current stage quantization index (from 0 to 3) and the current pseudo-triangle type. $\theta_1$, $\theta_2$, $\phi_1$, $\phi_2$ are the limit angle coordinates of the current pseudo-triangle.
The secondary index generator 303 may then output the index values to an index combiner 305.
The index combiner 305 is configured to receive the primary and further indices and generate a concatenated or otherwise combined index bitstream to the encoder.
Although the primary index generator and further index generator are shown as separate elements in Figure 3 in some embodiments they may be implemented in the same apparatus or process.
The operations of the direction index generator according to some embodiments are shown with respect to Figure 4. The first operation is receiving the input direction defined by ($\theta$, $\phi$) as shown in Figure 4 by step 401.
The following operation is to determine to which of the 8 spherical sections the input direction belongs and save the associated primary (3-bit) index I_0 as shown in Figure 4 by step 403.
The next operation is to determine the absolute value of $\theta$ and consider the corresponding spherical section from the "north hemisphere". The information about the elevation being negative is contained in the index I_0. This is shown in Figure 4 by step 405.
The next operation is determining a current pseudo-triangle type, which for the primary partition is set to Type I as shown in Figure 4 by step 407.
Then a further index loop is started as shown in Figure 4 by step 409 (the further index loop can be iterated for any suitable number of iterations, for example for k=1:6).
Within the loop the angle coordinates of the corners of the current pseudo-triangle are calculated. For k=1 the angle coordinates of the pseudo-triangle depend on the index I_0 as follows:
$\theta_A = \theta_2$; $\phi_A = (\phi_1 + \phi_2)/2$
$\theta_B = \theta_1$; $\phi_B = \phi_1$
$\theta_C = \theta_1$; $\phi_C = \phi_2$
where $\phi_1 = 90 [I_0/4]$, $\phi_2 = 90 ([I_0/4] + 1)$, $\theta_1 = 0$, $\theta_2 = 90$.
The square brackets signify an integer part. The first bit of the index I_0 corresponds to the sign of $\theta$: 0 if North hemisphere, 1 if South hemisphere. Under these assumptions the indexes I_0 from 0 to 3 correspond to the regions in the North hemisphere and the indexes from 4 to 7 to the regions in the South hemisphere.
For the next values of k, the angle coordinates are decided based on which pseudo-triangle from the 4 available at each step has been selected. The calculation of the angle coordinates of the corners is shown in Figure 4 by step 411. Then the angle coordinates of the mid vertex points of the pseudo-triangle are calculated as shown in Figure 4 by step 413. The angle coordinates of the mid vertexes are obtained as averages of the angle coordinates of the extreme points of each vertex.
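The corner and midpoint calculations of steps 411 and 413 can be sketched as follows for a Type I pseudo-triangle. Points are held as (elevation, azimuth) pairs in degrees, and the limits theta1, theta2, phi1, phi2 are passed in directly rather than derived from I_0. The helper names are hypothetical, and the order of the returned sub-triangles follows the reference numerals 800 to 803 rather than any particular quantization index assignment.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (elevation, azimuth) in degrees


def type_one_corners(theta1: float, theta2: float,
                     phi1: float, phi2: float) -> Tuple[Point, Point, Point]:
    """Corners A, B, C of a Type I pseudo-triangle from its limit angles."""
    a = (theta2, (phi1 + phi2) / 2.0)  # apex at the upper elevation limit
    b = (theta1, phi1)                 # lower-left corner
    c = (theta1, phi2)                 # lower-right corner
    return a, b, c


def midpoint(p: Point, q: Point) -> Point:
    """Average of the angle coordinates of the two end points."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def subdivide(a: Point, b: Point, c: Point) -> List[Tuple[Point, Point, Point]]:
    """Split ABC into AC'B', C'BA', A'CB' and A'B'C'."""
    c_mid = midpoint(a, b)  # C' between A and B
    b_mid = midpoint(a, c)  # B' between A and C
    a_mid = midpoint(c, b)  # A' between C and B
    return [(a, c_mid, b_mid), (c_mid, b, a_mid),
            (a_mid, c, b_mid), (a_mid, b_mid, c_mid)]


corner_a, corner_b, corner_c = type_one_corners(0.0, 90.0, 0.0, 90.0)
for triangle in subdivide(corner_a, corner_b, corner_c):
    print(triangle)
```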
After that the angle coordinates of the centres of each of the 4 partitions can be calculated (for example using the table shown above) as shown in Figure 4 by step 415.
Following this, the distance between $(\theta, \phi)$ and each of the 4 centres $(\theta_j, \phi_j)$ is determined, given by $d = -(\sin\theta \sin\theta_j + \cos\theta \cos\theta_j \cos(\phi - \phi_j))$, and the centre nearest to the input (based on the shortest distance) is selected as shown in Figure 4 by step 417.
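The selection in step 417 can be sketched as below, reading the distance as the negative cosine of the great-circle angle between the input direction and a candidate centre. The function names and the candidate centres in the usage note are hypothetical; angles are assumed to be in degrees.

```python
import math
from typing import Sequence, Tuple


def direction_distance(theta: float, phi: float, theta_j: float, phi_j: float) -> float:
    """d = -(sin(t) sin(tj) + cos(t) cos(tj) cos(p - pj)), angles in degrees."""
    t, p, tj, pj = (math.radians(x) for x in (theta, phi, theta_j, phi_j))
    return -(math.sin(t) * math.sin(tj) + math.cos(t) * math.cos(tj) * math.cos(p - pj))


def nearest_centre(theta: float, phi: float, centres: Sequence[Tuple[float, float]]) -> int:
    """Return the index (0..3 for four centres) with the smallest distance."""
    return min(range(len(centres)),
               key=lambda j: direction_distance(theta, phi, *centres[j]))
```

A smaller (more negative) value means a closer direction, so identical directions give d = -1. For an input of (40, 30) and assumed centres (10, 10), (45, 30), (70, 60), (20, 80), the second centre is selected.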
Following this, the index value associated with the nearest centre is determined to generate a 2-bit index I_k, and the type of the current pseudo-triangle is determined based on the index I_k, as shown in Figure 4 by step 419.
This brings the loop to an end, and whether a further loop iteration is required is determined as shown in Figure 4 by step 421. Where a further loop is planned, the operation passes back to step 409.
Where no further loops are planned, an embedded index can be obtained by juxtaposing the bits of I_0, I_1, I_2, I_3, I_4, I_5, I_6 and so on as shown in Figure 4 by step 423.
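The juxtaposition of step 423 can be sketched as simple bit packing. The function name `embed_index` is hypothetical, and placing I_0 in the most significant bits is an assumption about bit ordering that the text does not state explicitly.

```python
from typing import Sequence, Tuple


def embed_index(i0: int, further: Sequence[int]) -> Tuple[int, int]:
    """Pack a 3-bit primary index and 2-bit further indices into one value.

    Returns (embedded value, number of bits used).
    """
    value = i0 & 0b111  # primary index occupies the first 3 bits
    for i_k in further:
        # Append each 2-bit further index after the bits already written.
        value = (value << 2) | (i_k & 0b11)
    return value, 3 + 2 * len(further)
```

For example, I_0 = 5 followed by further indices 2 and 1 gives the 7-bit pattern 1011001. Because each further index only refines the preceding ones, truncating the pattern after any index still leaves a valid, coarser direction.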
In some embodiments the initial or further sectioning of the sphere may differ. For example if the points on the sphere should be on the Equator, the sphere should be rotated first. Thus for example based on the example shown in Figure 7 in some embodiments the direction D in terms of elevation and azimuth is used to define the initial hemisphere sectioning by orienting the sphere such that it is rotated 90 degrees around ZC (from Figure 7) such that B is the top of the sphere (North pole) and then orienting the sphere by a rotation 45 degrees around AX such that B drops to the right. In such a manner the circle that passes through A, X and the intersection point of OD and the sphere is located in the Equator position.
Performing this orientation or rotation ensures that, even at low resolutions, points on the Equator are represented on the Equator.
In some embodiments where rotations or transforms are implemented there may be two ways to perform the encoding. A first implementation is to rotate the input direction according to the described rotations, then perform the search in a manner described above, and then inverse rotate the final representation of the direction to get back in the original space.
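The first implementation (rotate, search, inverse rotate) can be sketched as below. This is only an illustration under stated assumptions: the coordinate z and x axes stand in for the ZC and AX axes of Figure 7, whose actual orientation is not reproduced here, and all function names are hypothetical. The search itself is omitted.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def to_unit(elev_deg: float, azim_deg: float) -> Vec:
    """Convert (elevation, azimuth) in degrees to a unit vector."""
    e, a = math.radians(elev_deg), math.radians(azim_deg)
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))


def from_unit(v: Vec) -> Tuple[float, float]:
    """Convert a unit vector back to (elevation, azimuth) in degrees."""
    x, y, z = v
    elev = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return elev, math.degrees(math.atan2(y, x)) % 360.0


def rotate_z(v: Vec, deg: float) -> Vec:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def rotate_x(v: Vec, deg: float) -> Vec:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    x, y, z = v
    return (x, c * y - s * z, s * y + c * z)


def to_search_space(elev_deg: float, azim_deg: float) -> Tuple[float, float]:
    """Apply the 90-degree then 45-degree rotations (assumed axes z, x)."""
    return from_unit(rotate_x(rotate_z(to_unit(elev_deg, azim_deg), 90.0), 45.0))


def from_search_space(elev_deg: float, azim_deg: float) -> Tuple[float, float]:
    """Undo the rotations in reverse order to return to the original space."""
    return from_unit(rotate_z(rotate_x(to_unit(elev_deg, azim_deg), -45.0), -90.0))
```

In use, the input direction would pass through `to_search_space`, the index search would run on the rotated direction, and the final quantised direction would pass through `from_search_space`.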
A further implementation may be to transform, according to these two rotations, the initial search in the first 8 sub-sections as well as the formulas for the angles and the definitions of the pseudo-triangles. For example, as shown in Figure 7b, an initial search within the 8 initial subsections may be depicted by the thick dashed line shown.
With respect to Figure 5 there is shown an example metadata extractor with respect to the decoding and extracting of directional metadata according to some embodiments.
In this example implementation the decoded directional parameter bitstream 500 is passed to a bit counter/index splitter 501. In the example shown with respect to Figures 3 and 4 the direction parameter bitstream is the juxtaposition of indices I_0, I_1, I_2, I_3, I_4, I_5, I_6, and I_7.
The bit counter/index splitter 501 is configured to determine the number of indices 502 per parameter by counting the number of bits used to store the parameter. Furthermore in some embodiments this information (in the form of a suitable signal 502) is passed to a primary index decoder 503 and to a further index decoder 505. Thus for example the number of bits used according to the example shown in Figures 3 and 4 will be 3 (for only the primary index), 5 (primary index and one further index), 7 (primary index and two further indices), 9 (primary index and three further indices), 11 (primary index and four further indices), 13 (primary index and five further indices), 15 (primary index and six further indices), or 17 (primary index and seven further indices).
In some embodiments the bit counter/index splitter 501 is furthermore configured to split the index values 504 such that the first index bits are passed to the primary index decoder 503 and the further index bits are passed to the further index decoder 505.
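The counting and splitting performed by the bit counter/index splitter 501 can be sketched as follows. The function name `split_index` and the representation of the received bits as a string of '0' and '1' characters are assumptions made for illustration.

```python
from typing import List, Tuple


def split_index(bits: str) -> Tuple[int, List[int]]:
    """Split an embedded index into the primary index and the further indices.

    A valid length is 3 + 2n bits, where n is the number of further indices.
    """
    if len(bits) < 3 or (len(bits) - 3) % 2 != 0:
        raise ValueError("expected 3 + 2n bits")
    i0 = int(bits[:3], 2)  # first three bits: primary index
    # Each following pair of bits is one further index.
    further = [int(bits[k:k + 2], 2) for k in range(3, len(bits), 2)]
    return i0, further
```

For example, the 7-bit pattern 1011001 splits into the primary index 5 and the further indices 2 and 1, so the number of further indices is recovered from the bit count alone.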
The primary index decoder 503 is configured to receive the first index bits and determine in which section of the initial partition the direction lies. For example in some embodiments where the index value is less than 4 (that is, in the range 0 to 3) the directional elevation is greater than 0, and otherwise it is less than 0.
The primary index decoder furthermore may also determine a first set of limit angle coordinates based on the primary index value. For example in some embodiments this may be calculated as: $\phi_1 = 90\,[I_0/4]$, $\phi_2 = 90\,([I_0/4] + 1)$, $\theta_1 = 0$, $\theta_2 = 90$.
Additionally the primary index decoder is configured to set the current triangle type to a Type I triangle.
The current triangle type and the limit angle coordinates may then be passed to the further index decoder 505.
The further index decoder 505 may be configured to receive the current triangle type (CTT) value and the limit angle coordinates based on the primary index decoding from the primary index decoder 503 and furthermore receive the indication of how many further indices are to be decoded and the further index values from the bit counter/index splitter 501.
The further index decoder 505 may then be configured to recursively use the further index values to generate further current triangle type values and limit angle coordinates.
When all of the further indices are decoded the further index decoder 505 may then output the codevector direction 508.
With respect to Figure 6 there is shown an example flow diagram of the operations associated with the decoder shown in Figure 5.
The metadata extractor is configured to receive the embedded index bits N as shown in Figure 6 in step 601.
Having received the bits the number of indices, n, is determined as shown in Figure 6 by step 603.
The first three bits are read to determine the primary index I_0 as shown in Figure 6 by step 605.
Based on the primary index the current limit angle coordinates are determined as shown in Figure 6 by step 607.
Furthermore the current triangle type (CTT) is determined as shown in Figure 6 by step 609. This, in the example shown, is a Type I triangle.
Then a further index loop is started as shown in Figure 6 by step 611.
The next 2 bits are read to determine the first or next further index I_i as shown in Figure 6 by step 613.
Based on the value of I_i and using a suitable determination (for example the look-up table defined above) the value of the current pseudo-triangle type CTT is determined as shown in Figure 6 by step 615.
Additionally the limit angle coordinates are updated as a function of I_i and CTT and set as the current limit angle coordinates as shown in Figure 6 by step 617.
Additional loops are then checked for as shown in Figure 6 by step 619.
Where further loops are required the operation passes back to step 611 and a further loop started.
Where further loops are not required then the direction is output based on the current limit angle coordinates as shown in Figure 6 by step 621.
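The control flow of Figure 6 can be sketched as below. Only the loop structure is taken from the description; the look-up table and the angle update rules are not reproduced here, so they are passed in as hypothetical callables (`initial_limits`, `update`, `centre`), and the bits are assumed to arrive as a string of '0' and '1' characters.

```python
from typing import Any, Callable, Tuple


def decode_direction(
    bits: str,
    initial_limits: Callable[[int], Any],
    update: Callable[[str, Any, int], Tuple[str, Any]],
    centre: Callable[[str, Any], Any],
) -> Any:
    """Walk the embedded index and return the decoded direction."""
    i0 = int(bits[:3], 2)             # step 605: read the primary index
    limits = initial_limits(i0)       # step 607: first limit angle coordinates
    ctt = "I"                         # step 609: primary partition is Type I
    for k in range(3, len(bits), 2):  # steps 611 to 619: further index loop
        i_k = int(bits[k:k + 2], 2)   # step 613: read the next 2 bits
        # steps 615 and 617: new triangle type and updated limit angles
        ctt, limits = update(ctt, limits, i_k)
    return centre(ctt, limits)        # step 621: output the direction
```

Because the loop simply stops when the bits run out, the same routine handles any of the valid lengths from 3 to 17 bits.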
With respect to Figure 12 an example electronic device which may be used as the analysis or synthesis device is shown. The device may be any suitable electronics device or apparatus. For example in some embodiments the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. In some embodiments the device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes such as the methods described herein.
In some embodiments the device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
In some embodiments the device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
In some embodiments the device 1400 comprises an input/output port 1409.
The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
The transceiver can communicate with further apparatus by any suitable known communications protocol. For example in some embodiments the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
The transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.
In some embodiments the device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones or similar.
In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
The foregoing description has provided by way of exemplary and non-limiting examples a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. However, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention as defined in the appended claims.

Claims (26)

  1. 1. An apparatus comprising means for: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
  2. 2. The apparatus as claimed in claim 1, wherein the means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value is further for: partitioning the range of the at least one directional value into quarters of hemispheres; and identifying through which of the quarters of hemispheres the at least one directional value passes.
  3. 3. The apparatus as claimed in claim 2, wherein the means for determining a primary index value associated with the identified primary partition section is further for determining a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
  4. 4. The apparatus as claimed in any of claims 2 and 3, wherein the means for identifying at least one further partition section based on at least one further partitioning of the primary partition section is further for: recursively partitioning an identified section through which the at least one directional value passes into four further sections; and identifying which of the four further sections the at least one directional value passes.
  5. 5. The apparatus as claimed in claim 4, wherein the means for recursively partitioning an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, is further for: calculating angle coordinates of corners of a current pseudo-triangle; calculating angle coordinates of the mid vertex points of the current pseudo-triangle; and calculating angle coordinates of centres of each of the four further sections.
  6. 6. The apparatus as claimed in claim 5, wherein the means for calculating angle coordinates of centres of each of the four further sections is further for employing a look-up table for calculating angle coordinates of centres of each of the four further sections.
  7. 7. The apparatus as claimed in any of claims 4 to 6, wherein the means for identifying which of the four further sections the at least one directional value passes is for: determining distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and selecting one of the four further sections based on the smallest absolute distances.
  8. 8. The apparatus as claimed in any of claims 4 to 7, wherein the means for determining at least one further index value associated with the identified further partition section is further for determining a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
  9. 9. The apparatus as claimed in any of claims 1 to 8, wherein the means for combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value is further for juxtaposing the primary index value and at least one further index value bits.
  10. 10. The apparatus as claimed in claim 2 or any claims dependent on claim 2, wherein the means for identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value is further for at least one of: rotating the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and determining a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
  11. 11. An apparatus comprising means for: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
  12. 12. The apparatus as claimed in claim 11, wherein the means for identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value is further for identifying through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
  13. 13. The apparatus as claimed in claim 12, wherein the means for identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value is further for identifying which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
  14. 14. A method comprising: receiving values for sub-bands of a frame, the values comprising at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value; identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value; determining a primary index value associated with the identified primary partition section; identifying at least one further partition section based on at least one further partitioning of the primary partition section; determining at least one further index value associated with the identified further partition section; combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value.
  15. 15. The method as claimed in claim 14, wherein identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value further comprises: partitioning the range of the at least one directional value into quarters of hemispheres; and identifying through which of the quarters of hemispheres the at least one directional value passes.
  16. 16. The method as claimed in claim 15, wherein determining a primary index value associated with the identified primary partition section further comprises determining a three bit value associated with the identified quarter of the hemisphere through which the at least one directional value passes.
  17. 17. The method as claimed in any of claims 15 and 16, wherein identifying at least one further partition section based on at least one further partitioning of the primary partition section further comprises: recursively partitioning an identified section through which the at least one directional value passes into four further sections; and identifying which of the four further sections the at least one directional value passes.
  18. 18. The method as claimed in claim 17, wherein recursively partitioning an identified section through which the at least one directional value passes into four further sections, wherein the identified section has a pseudo-triangle surface, further comprises: calculating angle coordinates of corners of a current pseudo-triangle; calculating angle coordinates of the mid vertex points of the current pseudo-triangle; and calculating angle coordinates of centres of each of the four further sections.
  19. 19. The method as claimed in claim 18, wherein calculating angle coordinates of centres of each of the four further sections further comprises employing a look-up table for calculating angle coordinates of centres of each of the four further sections.
  20. 20. The method as claimed in any of claims 17 to 19, wherein identifying which of the four further sections the at least one directional value passes further comprises: determining distances between angle coordinates of centres of each of the four further sections and the at least one directional value; and selecting one of the four further sections based on the smallest absolute distances.
  21. 21. The method as claimed in any of claims 17 to 20, wherein determining at least one further index value associated with the identified further partition section further comprises determining a two bit value associated with the identified further partition section from the four further sections for each recursive partitioning.
  22. 22. The method as claimed in any of claims 14 to 21, wherein combining the primary index value and the at least one further index value to generate an embedded index value associated with the at least one directional value further comprises generating by juxtaposing the primary index value and at least one further index value bits.
  23. 23. The method as claimed in claim 15 or any claims dependent on claim 15, wherein identifying a primary partition section based on a primary partitioning of a range of the at least one directional value and the at least one directional value further comprises at least one of: rotating the at least one directional value before identifying a primary partition section based on a primary partitioning of a range of the at least one directional value; and determining a reference orientation for the partitioning the range of the at least one directional value into quarters of hemispheres based on the at least one directional value.
  24. 24. A method comprising: receiving encoded values for sub-bands of a frame, the values comprising a primary index and at least one further index for each sub-band; identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value; identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value; determining at least one directional value for each sub-band, each directional value comprising an azimuth value and an elevation value based on the identified at least one further partition section.
  25. 25. The method as claimed in claim 24, wherein identifying a primary partition section partitioning of a range of the at least one directional value based on a primary index value further comprises identifying through which of partitioning of a sphere into quarters of hemispheres the at least one directional value passes.
  26. 26. The method as claimed in claim 25, wherein identifying at least one further partition section based on at least one further partitioning of the primary partition section based on at least one further index value further comprises identifying which of a recursively partitioned four further sections the at least one directional value passes based on the at least one further index value.
GB1817809.5A 2018-10-31 2018-10-31 Determination of spatial audio parameter encoding and associated decoding Withdrawn GB2578604A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1817809.5A GB2578604A (en) 2018-10-31 2018-10-31 Determination of spatial audio parameter encoding and associated decoding
PCT/FI2019/050703 WO2020089509A1 (en) 2018-10-31 2019-10-01 Determination of spatial audio parameter encoding and associated decoding
EP19879837.3A EP3874771A4 (en) 2018-10-31 2019-10-01 Determination of spatial audio parameter encoding and associated decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1817809.5A GB2578604A (en) 2018-10-31 2018-10-31 Determination of spatial audio parameter encoding and associated decoding

Publications (2)

Publication Number Publication Date
GB201817809D0 GB201817809D0 (en) 2018-12-19
GB2578604A true GB2578604A (en) 2020-05-20

Family

ID=64655560

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1817809.5A Withdrawn GB2578604A (en) 2018-10-31 2018-10-31 Determination of spatial audio parameter encoding and associated decoding

Country Status (3)

Country Link
EP (1) EP3874771A4 (en)
GB (1) GB2578604A (en)
WO (1) WO2020089509A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023088560A1 (en) * 2021-11-18 2023-05-25 Nokia Technologies Oy Metadata processing for first order ambisonics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1603342A1 (en) * 2003-03-07 2005-12-07 Sony Corporation Data encoding device and data encoding method and computer program
EP1879179A1 (en) * 2006-07-14 2008-01-16 Siemens Audiologische Technik GmbH Method and device for coding audio data based on vector quantisation
US20130329922A1 (en) * 2012-05-31 2013-12-12 Dts Llc Object-based audio system using vector base amplitude panning
EP2879409A1 (en) * 2013-11-27 2015-06-03 Akademia Gorniczo-Hutnicza im. Stanislawa Staszica w Krakowie A system and a method for determining approximate set of visible objects in beam tracing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060245601A1 (en) * 2005-04-27 2006-11-02 Francois Michaud Robust localization and tracking of simultaneously moving sound sources using beamforming and particle filtering
EP2860728A1 (en) * 2013-10-09 2015-04-15 Thomson Licensing Method and apparatus for encoding and for decoding directional side information

Also Published As

Publication number Publication date
WO2020089509A1 (en) 2020-05-07
GB201817809D0 (en) 2018-12-19
EP3874771A4 (en) 2022-08-17
EP3874771A1 (en) 2021-09-08

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)