EP2860728A1 - Method and apparatus for encoding and for decoding directional side information - Google Patents

Method and apparatus for encoding and for decoding directional side information

Info

Publication number
EP2860728A1
Authority
EP
European Patent Office
Prior art keywords
directions
audio signal
direction values
frame
quantised
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP20130306391
Other languages
German (de)
English (en)
Inventor
Alexander Krüger
Sven Kordon
Oliver Wuebbolt
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-10-09
Filing date
2013-10-09
Publication date
2015-04-15
Application filed by Thomson Licensing SAS
Priority to EP20130306391
Publication of EP2860728A1
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/11 Application of ambisonics in stereophonic audio systems

Definitions

  • the invention relates to a method and to an apparatus for encoding and for decoding directional side information for a 3D audio signal.
  • HOA: Higher Order Ambisonics
  • WFS: wave field synthesis
  • channel-based approaches such as 22.2.
  • the HOA representation offers the advantage of being independent of a specific loudspeaker set-up. But this flexibility is at the expense of a decoding process which is required for the playback of the HOA representation on a particular loudspeaker set-up.
  • HOA may also be rendered to set-ups consisting of only a few loudspeakers.
  • a further advantage of HOA is that the same representation can also be employed without any modification for binaural rendering to headphones.
  • HOA is based on the representation of the spatial density of complex harmonic plane wave amplitudes by a truncated Spherical Harmonics (SH) expansion.
  • SH: Spherical Harmonics
  • Each expansion coefficient is a function of angular frequency, which can be equivalently represented by a time domain function.
  • O denotes the number of expansion coefficients.
  • The time-domain functions of the expansion coefficients are referred to as HOA coefficient sequences or as HOA channels in the following.
  • An HOA representation can be expressed as a temporal sequence of HOA data frames containing HOA coefficients.
  • The compression of HOA sound field representations is proposed in patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2: these approaches have in common that they perform a sound field analysis and decompose the given HOA representation into a directional component and a residual ambient component.
  • The resulting compressed representation comprises a number of quantised signals, resulting from the perceptual coding of the directional signals and of relevant coefficient sequences of the ambient HOA component.
  • the resulting compressed representation comprises additional side information related to the quantised signals, which side information is necessary for the reconstruction of the HOA representation from its compressed version.
  • a problem to be solved by the invention is to further improve the compression of HOA representations. This problem is solved by the methods disclosed in claims 1 and 10. Apparatuses utilising these methods are disclosed in claims 2 and 11.
  • The invention deals with the coding of the side information related to the directional component, an additional compression that is not addressed in the above-mentioned patent applications EP 12305537.8, EP 12306569.0 and EP 13305558.2.
  • As described in EP 12305537.8, in order to efficiently code or compress a given HOA representation, it is analysed on a frame-by-frame basis and decomposed into a directional component and a residual ambient component. At compressor side, the direction values are estimated based on a pre-defined grid of directions, and these direction values are used in the HOA compressor for extracting the directional signals from the given HOA representation.
  • the resulting indices of directional signals as well as the direction values are encoded in a particular manner.
  • the inventive method is suited for encoding directional side information for a 3D audio signal, and includes the steps:
  • the inventive apparatus is suited for encoding directional side information for a 3D audio signal, said apparatus including:
  • the inventive method is suited for decoding directional side information for a 3D audio signal which directional side information was encoded according to the above encoding method, and includes the steps:
  • the inventive apparatus is suited for decoding directional side information for a 3D audio signal, which directional side information was encoded according to the above encoding method, said apparatus including means being adapted for:
  • Fig. 1 shows the HOA compression processing described in patent application EP 13305558.2, in which, following the estimation of dominant sound source directions, the directional side information is coded.
  • For the HOA representation compression, frame-wise processing with non-overlapping input frames $C(k)$ of HOA coefficient sequences of length $L$ is used, where $k$ denotes the frame index.
  • This long frame overlaps the adjacent long frame by 50% and is subsequently used for the estimation of dominant sound source directions. If step/stage 11/12 is not present, the tilde symbol has no specific meaning.
  • In step or stage 13, dominant sound sources are estimated.
  • The estimation provides a data set of indices of directional signals that have been detected, which is a subset of $\{1,\dots,D\}$, as well as the set of corresponding direction estimates.
  • $D$ denotes the maximum number of directional signals, which has to be set before starting the HOA compression.
  • In step or stage 14, the current frame $\tilde{C}(k)$ of HOA coefficient sequences is decomposed into a number of directional signals $X_{\text{DIR}}(k-2)$ belonging to the directions contained in the set of direction estimates, and a residual ambient HOA component $C_{\text{AMB}}(k-2)$. The delay of two frames results from the overlap-add processing used to obtain smooth signals.
  • $X_{\text{DIR}}(k-2)$ contains a total of $D$ channels, of which only those corresponding to the active directional signals are non-zero.
  • The indices specifying these channels are assumed to be output in the data set $\mathcal{I}_{\text{DIR,ACT}}(k-2)$.
  • The decomposition in step/stage 14 also provides parameters $\zeta(k-2)$, which are used at decompression side for predicting portions of the original HOA representation from the directional signals.
  • In step or stage 15, the number of coefficients of the ambient HOA component $C_{\text{AMB}}(k-2)$ is reduced so as to contain only $O_{\text{RED}} + D - N_{\text{DIR,ACT}}(k-2)$ non-zero HOA coefficient sequences, where $N_{\text{DIR,ACT}}(k-2) = |\mathcal{I}_{\text{DIR,ACT}}(k-2)|$ denotes the number of active directional signals.
  • The final ambient HOA representation with the reduced number of $O_{\text{RED}} + D - N_{\text{DIR,ACT}}(k-2)$ non-zero coefficient sequences is denoted by $C_{\text{AMB,RED}}(k-2)$.
  • The indices of the chosen ambient HOA coefficient sequences are output in the data set $\mathcal{I}_{\text{AMB,ACT}}(k-2)$.
  • The active directional signals contained in $X_{\text{DIR}}(k-2)$ and the HOA coefficient sequences contained in $C_{\text{AMB,RED}}(k-2)$ are assigned to the frame $Y(k-2)$ of $I$ channels for individual perceptual encoding.
  • The data set of indices of directional signals (a subset of $\{1,\dots,D\}$) and the data set of corresponding direction value estimates from the estimation step/stage 13 are fed to a step or stage 18 that encodes the directional side information as described in the following.
  • Step/stage 18 outputs a vector $a(k)$ denoting which directional signals are active in frame $k$, as well as a coded representation of all directions.
  • the values of can be entropy encoded.
  • Step/stage 34 receives the vector $a(k)$ denoting which directional signals are active in frame $k$, and the coded representation of all directions. Step/stage 34 decodes this directional side information as described below and outputs the data set of indices of directional signals (a subset of $\{1,\dots,D\}$) and the set of corresponding direction estimates. In step or stage 31, a perceptual decoding of the $I$ perceptually encoded signals is performed in order to obtain the $I$ decoded signals in $\hat{Y}(k-2)$.
  • The perceptually decoded signals in $\hat{Y}(k-2)$ are redistributed in order to recreate the frame $\hat{X}_{\text{DIR}}(k-2)$ of directional signals and the frame $\hat{C}_{\text{AMB,RED}}(k-2)$ of the ambient HOA component.
  • The information about how to redistribute the signals is obtained by reproducing the assignment operation performed for the HOA compression, using the index data sets $\mathcal{I}_{\text{DIR,ACT}}(k-2)$ and $\mathcal{I}_{\text{AMB,ACT}}(k-2)$.
  • In the composition step or stage 33, a current frame $\hat{C}(k-3)$ of the desired total HOA representation is re-composed using the frame $\hat{X}_{\text{DIR}}(k-2)$ of the directional signals, the set of the active directional signal indices together with the set of the corresponding directions, the parameters $\zeta(k-2)$ for predicting portions of the HOA representation from the directional signals, and the frame $\hat{C}_{\text{AMB,RED}}(k-2)$ of HOA coefficient sequences of the reduced ambient HOA component.
  • I.e., directional signals with respect to uniformly distributed directions are predicted from the directional signals $\hat{X}_{\text{DIR}}(k-2)$ using the received parameters $\zeta(k-2)$, and thereafter the current decompressed frame $\hat{C}(k-3)$ is re-composed from the frame of directional signals $\hat{X}_{\text{DIR}}(k-2)$, the predicted portions and the reduced ambient HOA component $\hat{C}_{\text{AMB,RED}}(k-2)$.
  • The directional HOA component for the $k$-th frame is represented by a number $D_{\text{ACT}}(k)$ of directional signals and additional side information.
  • The side information consists of the set $\{i_{\text{ACT},d}(k) \mid d = 1,\dots,D_{\text{ACT}}(k)\}$ of indices $i_{\text{ACT},d}(k)$ of directional signals that have been detected, and the set $\mathcal{G}_{\Omega,\text{ACT}}(k) := \{\Omega_{\text{ACT},d}(k) \mid d = 1,\dots,D_{\text{ACT}}(k)\}$ of the corresponding directions $\Omega_{\text{ACT},d}(k)$.
  • Fig. 4 illustrates an exemplary result of the direction estimation for the first 7 frames (cf. Fig. 3 and the corresponding description of the representation of a direction in a spherical coordinate system).
  • the dots in Fig. 4 represent a grid of possible directions.
  • the direction estimates related to the directional signal with index 1 are marked by diamonds and the direction estimates related to the directional signal with index 2 are marked by crosses.
  • This activity information is additionally illustrated in Fig. 5, which shows for each frame index $k$ whether the direction with the respective index is active (indicated by white) or not (indicated by black).
  • $a(k)$ will contain, e.g., 4 to 8 bits per frame.
  • A current frame $k$ will contain none, some or all of this set of pre-determined directional signals.
  • The current vector $a(k)$ is transferred to the decoder or decompression side.
  • The coded direction value $\Omega_{\text{ACT},1}(k)$ corresponds to the index indicated by the first non-zero element in $a(k)$,
  • the coded direction value $\Omega_{\text{ACT},2}(k)$ is assumed to correspond to the index indicated by the second non-zero element in $a(k)$, etc.
  • The coded representation of all direction values in the set is transferred to the decoder or decompression side.
  • The disadvantage of such individual quantisation of the direction values is that the quantised values will likely not exactly match the pre-defined grid of directions. This matters because the extraction of directional signals from the given HOA representation in the HOA compressor (step/stage 13 in Fig. 1) is based on that pre-defined grid, so a mismatch introduces direction quantisation errors when the HOA representation of the directional signals is re-synthesised in the HOA decompressor.
  • Patent applications EP 12306569.0 and EP 13305558.2 describe how directional signals can be extracted from an HOA representation.
  • The problem that the quantised direction values do not exactly match the estimated directions can be solved in a first embodiment by exploiting the fact that the splitting into directional and ambient residual components and the direction estimation described in patent applications EP 12306569.0 and EP 13305558.2 are based on a direction search carried out on a fixed grid of directions (cf. patent application EP 13305156.5 for a description of direction estimation as an example).
  • Such a fixed grid represents the above-mentioned re-quantisation.
  • The estimated direction $\Omega_{\text{ACT},d}(k)$ is thus an element of a set $\{\Omega_q \mid q = 1,\dots,Q\}$ of $Q$ predefined directions, so that it can be coded by its grid index $q$.
  • Such coding of the directions offers the further advantage that it is not recursive, meaning that no knowledge of the direction estimates from previous frames is required for the decoding of the directions.
  • A disadvantage of such processing is that, in general, it does not achieve the minimum possible average bit rate.
  • At decoding side, the possibly entropy-encoded representation of all directions is received, whose direction values were quantised according to said pre-defined grid, together with the vector $a(k)$, which encodes which directions from the set of pre-defined directions are present in the current audio signal frame $C(k)$. If necessary, an entropy decoding takes place.
  • The quantised direction values are re-quantised according to said pre-defined grid, and the vector $a(k)$ is decoded.
  • the average bit rate for the coding of the directions for successive frames is further reduced by exploiting the relation between the direction estimates of successive frames.
  • The direction estimation proposed in patent application EP 13305558.2 is based on a sound source movement model, which predicts the direction of a sound source in the $k$-th frame from its movement between the $(k-2)$-th and $(k-1)$-th frames.
  • In the second embodiment, the quantised direction values (e.g. the direction index as proposed in the first embodiment) of the $k$-th frame are coded using entropy coding, e.g. Huffman coding.
  • The individual code words for the direction values have a variable bit size that depends on the frame-adaptively determined probability of the individual directions.
  • Direction values with a high probability are coded using short code words, and direction values with a low probability are coded using long code words.
  • Such a coding strategy requires the probability of the individual directions to be computed during HOA decompression in the same way as during HOA compression.
  • The received entropy-encoded quantised direction values are entropy decoded, wherein the probability of the individual directions is determined frame-adaptively.
  • this requires a high computational complexity in the HOA decompressor for computing the probabilities of the Q possible directions in each frame.
  • the processing is recursive, meaning that the decoding of a direction at decompression side is based on the knowledge of the directions from the previous two frames.
  • the number of possible probabilities is constrained to Q by making the probabilities dependent on the corresponding direction estimate in the last frame.
  • The conditional probabilities are set inversely proportional to the angular distance between a direction estimate in the current frame and the corresponding direction estimate in the last frame.
  • Another possibility is to measure the conditional a-priori probabilities of the direction estimates from some HOA representations instead of setting them.
  • The received entropy-encoded quantised direction values are frame-adaptively entropy decoded depending on a probability of the individual directions, whereby the number of possible probabilities is constrained to the number of directions in the pre-defined grid and the probabilities depend on the corresponding direction estimate in the last frame.
  • For an entropy decoding table, e.g. a Huffman table, the non-conditional probabilities of the quantised directions can alternatively be employed.
  • Such probabilities can be measured from some test HOA representations, or can be assigned according to expectations about typical HOA sound field representation. For example, high probabilities are assigned for directions in the front and low probabilities for directions in the back.
  • Such a processing has the advantage of not being recursive, i.e. the decoding of a direction value is not based on the knowledge of any directions from previous frames.
  • the efficiency of this kind of processing is likely lower than that of the third embodiment.
  • The mode decision can be indicated by a Boolean variable which is prepended to the coded representation of a direction. Such a mode decision will in most cases minimise the number of bits of the corresponding code.
  • The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
  • the invention can be applied in any application where some directional information has to be efficiently coded, e.g. object based 3D audio where directional signals and object based side information have to be coded.
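The definitions above state only that $O$ denotes the number of expansion coefficients. Assuming a full three-dimensional spherical-harmonics expansion truncated at some order (called `order` below; the document does not name this quantity), each degree $n$ contributes $2n+1$ harmonics, so $O = (\text{order}+1)^2$. A minimal sketch of that count, with a hypothetical helper name:

```python
def num_hoa_coefficients(order: int) -> int:
    """Count the expansion coefficients O of a full 3D SH expansion up to `order`.

    Degree n contributes 2n + 1 spherical harmonics, so the total is
    sum_{n=0}^{order} (2n + 1) = (order + 1) ** 2.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return sum(2 * n + 1 for n in range(order + 1))


if __name__ == "__main__":
    for order in range(5):
        print(order, num_hoa_coefficients(order))
```

For example, a fourth-order expansion yields 25 HOA coefficient sequences.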
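The frame-wise processing of the HOA compression (non-overlapping input frames $C(k)$ of length $L$, and long frames that overlap their neighbours by 50%) can be sketched as follows. The text above does not say how a long frame is formed; this sketch assumes it is $C(k-1)$ followed by $C(k)$, which gives exactly 50% overlap between consecutive long frames. The function names and the list-of-rows layout are hypothetical.

```python
from typing import List

# An HOA signal or frame: O rows (HOA coefficient sequences) x samples.
Signal = List[List[float]]


def split_into_frames(hoa: Signal, frame_len: int) -> List[Signal]:
    """Split an O x T HOA signal into non-overlapping frames C(k) of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped in this sketch.
    """
    num_frames = len(hoa[0]) // frame_len
    return [
        [row[k * frame_len:(k + 1) * frame_len] for row in hoa]
        for k in range(num_frames)
    ]


def long_frame(frames: List[Signal], k: int) -> Signal:
    """Hypothetical long frame for index k: C(k-1) followed by C(k), row by row.

    Consecutive long frames then share one frame, i.e. they overlap by 50%.
    """
    if k < 1:
        raise ValueError("a long frame needs a previous frame (k >= 1)")
    return [prev + cur for prev, cur in zip(frames[k - 1], frames[k])]


if __name__ == "__main__":
    # 4 HOA channels (first order), 12 samples, frame length L = 4.
    hoa = [[float(100 * ch + t) for t in range(12)] for ch in range(4)]
    frames = split_into_frames(hoa, 4)
    print(len(frames), len(long_frame(frames, 1)[0]))
```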
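Step/stage 15 keeps $O_{\text{RED}} + D - N_{\text{DIR,ACT}}(k-2)$ ambient coefficient sequences. Adding the $N_{\text{DIR,ACT}}(k-2)$ active directional signals gives $O_{\text{RED}} + D$ channels, regardless of how many directional signals are active. The sketch below checks this arithmetic; it assumes (an inference, not stated in the text) that the $I$ channels of $Y(k-2)$ are exactly these, and the values $O_{\text{RED}} = 4$, $D = 4$ are hypothetical.

```python
from typing import Dict


def channel_budget(o_red: int, d_max: int, n_dir_act: int) -> Dict[str, int]:
    """Split the transport channels of one frame between active directional
    signals and ambient HOA coefficient sequences (step/stage 15)."""
    if not 0 <= n_dir_act <= d_max:
        raise ValueError("the number of active directional signals must lie in [0, D]")
    n_amb = o_red + d_max - n_dir_act  # non-zero ambient sequences kept
    return {"directional": n_dir_act, "ambient": n_amb, "total": n_dir_act + n_amb}


if __name__ == "__main__":
    # Hypothetical values: O_RED = 4, D = 4.
    for n_act in range(5):
        print(channel_budget(o_red=4, d_max=4, n_dir_act=n_act))
```

The design keeps the number of perceptually coded channels constant from frame to frame: channels freed by inactive directional signals are given to additional ambient coefficient sequences.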
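The activity vector $a(k)$ and the ordering convention for the coded directions (the $d$-th coded direction belongs to the index of the $d$-th non-zero element of $a(k)$) can be illustrated with the sketch below. It assumes one bit per directional-signal index $1,\dots,D$ and represents a direction as an (inclination, azimuth) pair in radians; the data layout and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (inclination, azimuth) in radians; an assumed representation of a direction.
Direction = Tuple[float, float]


def encode_activity(active: Dict[int, Direction], d_max: int) -> Tuple[List[int], List[Direction]]:
    """Build a(k) (one bit per index 1..D) and list the directions in the
    order of the non-zero elements of a(k), i.e. by increasing index."""
    a = [1 if i in active else 0 for i in range(1, d_max + 1)]
    ordered = [active[i] for i in sorted(active)]
    return a, ordered


def decode_activity(a: Sequence[int], directions: Sequence[Direction]) -> Dict[int, Direction]:
    """Assign the d-th received direction to the index of the d-th non-zero element of a(k)."""
    indices = [i + 1 for i, bit in enumerate(a) if bit]
    if len(indices) != len(directions):
        raise ValueError("number of directions does not match number of active signals")
    return dict(zip(indices, directions))


if __name__ == "__main__":
    active = {3: (0.8, -1.0), 1: (1.2, 0.3)}  # directional signals 1 and 3 active, D = 4
    a, dirs = encode_activity(active, d_max=4)
    print(a, dirs)
    print(decode_activity(a, dirs))
```

Because the order is implied by $a(k)$, no explicit index needs to be sent with each direction.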
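In the first embodiment each estimated direction is an element of the grid $\{\Omega_q\}$, so it can be transmitted as its grid index $q$, which keeps decoding non-recursive. The sketch below assumes a fixed-length binary code of $\lceil \log_2 Q \rceil$ bits per index (the text only says the representation is possibly entropy encoded) and uses a hypothetical six-direction grid; `quantise_to_grid` is only needed for a direction that does not already lie on the grid.

```python
import math
from typing import Sequence, Tuple

# (inclination theta, azimuth phi) in radians; the angle convention is an assumption.
Direction = Tuple[float, float]

# Hypothetical grid of Q = 6 directions along the +/- x, y, z axes.
GRID = [
    (math.pi / 2, 0.0), (math.pi / 2, math.pi),               # +x, -x
    (math.pi / 2, math.pi / 2), (math.pi / 2, -math.pi / 2),  # +y, -y
    (0.0, 0.0), (math.pi, 0.0),                               # +z, -z
]


def to_unit_vector(d: Direction) -> Tuple[float, float, float]:
    theta, phi = d
    return (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))


def angular_distance(a: Direction, b: Direction) -> float:
    """Great-circle angle between two directions."""
    dot = sum(x * y for x, y in zip(to_unit_vector(a), to_unit_vector(b)))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding errors


def quantise_to_grid(d: Direction, grid: Sequence[Direction]) -> int:
    """Return the 0-based index q of the grid direction closest to d."""
    return min(range(len(grid)), key=lambda q: angular_distance(d, grid[q]))


def index_to_bits(q: int, num_dirs: int) -> str:
    """Fixed-length code word with ceil(log2 Q) bits (at least one bit)."""
    if not 0 <= q < num_dirs:
        raise ValueError("grid index out of range")
    width = max(1, math.ceil(math.log2(num_dirs)))
    return format(q, f"0{width}b")


def bits_to_index(bits: str) -> int:
    return int(bits, 2)


if __name__ == "__main__":
    q = quantise_to_grid((1.4, 0.1), GRID)
    code = index_to_bits(q, len(GRID))
    print(q, code, bits_to_index(code))
```

Every index costs the same number of bits however likely the direction is, which is why this scheme generally misses the minimum average bit rate, as noted for the first embodiment.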
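The second embodiment relies on a sound source movement model that predicts the direction in frame $k$ from the movement between frames $k-2$ and $k-1$. The model is not specified here; as an assumption, the sketch below extrapolates at constant angular velocity along the great circle through the two previous directions, applying Rodrigues' rotation formula to unit vectors. In a frame-adaptive entropy coder, such a predicted direction would be a natural candidate for the highest probability. The function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]  # unit vector pointing to a direction


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def predict_direction(v_prev2: Vec, v_prev1: Vec) -> Vec:
    """Predict the direction in frame k from frames k-2 and k-1, assuming the
    source keeps rotating about the same axis by the same angle per frame."""
    axis = _cross(v_prev2, v_prev1)
    sin_a = math.sqrt(_dot(axis, axis))  # |v2 x v1| = sin(angle) for unit vectors
    cos_a = _dot(v_prev2, v_prev1)
    if sin_a < 1e-12:
        # No movement (or an antipodal jump with undefined axis): keep the last direction.
        return v_prev1
    n = (axis[0] / sin_a, axis[1] / sin_a, axis[2] / sin_a)
    n_x_v = _cross(n, v_prev1)
    # Rodrigues' formula; the n (n . v)(1 - cos) term vanishes because n is orthogonal to v_prev1.
    return tuple(v * cos_a + w * sin_a for v, w in zip(v_prev1, n_x_v))


if __name__ == "__main__":
    v0 = (1.0, 0.0, 0.0)
    v1 = (math.cos(0.2), math.sin(0.2), 0.0)  # moved by 0.2 rad in azimuth
    print(predict_direction(v0, v1))          # approximately (cos 0.4, sin 0.4, 0)
```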
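The third embodiment sets the conditional probabilities inversely proportional to the angular distance between a candidate direction and the corresponding direction estimate in the last frame, and derives an entropy code from them. The sketch below does this on a hypothetical grid of unit vectors and computes binary Huffman code-word lengths with Python's `heapq`. The offset `eps`, which keeps the probability of an unchanged direction finite, is an assumption.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical grid of Q = 6 unit vectors: +x, -x, +y, -y, +z, -z.
GRID: List[Vec] = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                   (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]


def angle(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def conditional_probs(grid: Sequence[Vec], last_q: int, eps: float = 0.05) -> List[float]:
    """p(q | last_q) proportional to 1 / (angular distance to grid[last_q] + eps)."""
    weights = [1.0 / (angle(g, grid[last_q]) + eps) for g in grid]
    total = sum(weights)
    return [w / total for w in weights]


def huffman_code_lengths(probs: Sequence[float]) -> List[int]:
    """Code-word length per symbol of a binary Huffman code."""
    if len(probs) == 1:
        return [1]
    lengths = [0] * len(probs)
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


if __name__ == "__main__":
    probs = conditional_probs(GRID, last_q=0)  # last frame's estimate was +x
    print(huffman_code_lengths(probs))
```

Since the table depends only on the last frame's grid direction, there are at most $Q$ distinct tables, matching the constraint stated for this embodiment; a decoder could precompute them.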
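For the non-recursive variant based on non-conditional probabilities, the text suggests assigning high probabilities to directions in the front and low probabilities to directions in the back. The weighting below (linear in the cosine of the angle to the front, with the front assumed to be the +x axis) is a hypothetical choice. To keep the sketch short, Shannon code lengths $\lceil -\log_2 p \rceil$ stand in for a Huffman table; they satisfy the Kraft inequality, so a prefix code with these lengths exists.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# Hypothetical grid of Q = 6 unit vectors: +x (front), -x (back), +y, -y, +z, -z.
GRID: List[Vec] = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                   (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]


def front_weighted_probs(grid: Sequence[Vec], front_gain: float = 0.8) -> List[float]:
    """Static probabilities: directions towards the front (+x assumed) are more likely."""
    if not 0.0 <= front_gain < 1.0:
        raise ValueError("front_gain must lie in [0, 1)")
    weights = [1.0 + front_gain * g[0] for g in grid]  # g[0] = cosine of angle to front
    total = sum(weights)
    return [w / total for w in weights]


def shannon_code_lengths(probs: Sequence[float]) -> List[int]:
    """ceil(-log2 p) code-word lengths (at least one bit each)."""
    return [max(1, math.ceil(-math.log2(p))) for p in probs]


if __name__ == "__main__":
    probs = front_weighted_probs(GRID)
    print(shannon_code_lengths(probs))
```

Because the table is fixed, decoding a direction needs no knowledge of previous frames, at the cost of a likely lower efficiency than the conditional scheme.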
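The mode decision prepends a Boolean flag to the coded representation of a direction. The text does not name the two modes; assuming they are two alternative code tables (for instance a conditional and a non-conditional one), an encoder can pick, per direction, whichever table yields the shorter code word, at the cost of one extra bit for the flag. The function name and the code-word lengths in the example are hypothetical.

```python
from typing import Sequence, Tuple


def choose_mode(q: int, lengths_mode0: Sequence[int], lengths_mode1: Sequence[int]) -> Tuple[bool, int]:
    """Return the Boolean mode flag for direction index q and the total number
    of bits spent, including the 1-bit flag. Ties go to mode 0."""
    use_mode1 = lengths_mode1[q] < lengths_mode0[q]
    bits = 1 + (lengths_mode1[q] if use_mode1 else lengths_mode0[q])
    return use_mode1, bits


if __name__ == "__main__":
    mode0 = [2, 5, 3, 3, 3, 3]  # hypothetical code-word lengths of table 0
    mode1 = [1, 4, 4, 3, 3, 3]  # hypothetical code-word lengths of table 1
    for q in range(6):
        print(q, choose_mode(q, mode0, mode1))
```

The flag bit makes this choice pay off only when the two tables differ by more than one bit for a direction, which is consistent with the text's wording that the mode decision minimises the bit count "in most cases".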

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)
EP20130306391 2013-10-09 2013-10-09 Method and apparatus for encoding and for decoding directional side information Withdrawn EP2860728A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20130306391 EP2860728A1 (fr) 2013-10-09 2013-10-09 Method and apparatus for encoding and for decoding directional side information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP20130306391 EP2860728A1 (fr) 2013-10-09 2013-10-09 Method and apparatus for encoding and for decoding directional side information

Publications (1)

Publication Number Publication Date
EP2860728A1 (fr) 2015-04-15

Family

ID=49448078

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20130306391 Withdrawn EP2860728A1 (fr) 2013-10-09 2013-10-09 Method and apparatus for encoding and for decoding directional side information

Country Status (1)

Country Link
EP (1) EP2860728A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070269063A1 (en) * 2006-05-17 2007-11-22 Creative Technology Ltd Spatial audio coding based on universal spatial cues
US20090083044A1 (en) * 2006-03-15 2009-03-26 France Telecom Device and Method for Encoding by Principal Component Analysis a Multichannel Audio Signal
US20110249821A1 (en) * 2008-12-15 2011-10-13 France Telecom encoding of multichannel digital audio signals

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HIRVONEN TONI ET AL: "Perceptual Compression Methods for Metadata in Directional Audio Coding Applied to Audiovisual Teleconference", AES CONVENTION 126; MAY 2009, AES, 60 EAST 42ND STREET, ROOM 2520 NEW YORK 10165-2520, USA, 7 May 2009 (2009-05-07), XP040508988 *
JING WANG ET AL: "Context-based adaptive arithmetic coding in time and frequency domain for the lossless compression of audio coding parameters at variable rate", EURASIP JOURNAL ON AUDIO, SPEECH, AND MUSIC PROCESSING, 21 May 2013 (2013-05-21), pages 1, XP055104567, Retrieved from the Internet <URL:http://asmp.eurasipjournals.com/content/pdf/1687-4722-2013-9.pdf> [retrieved on 20140226] *
MANUEL BRIAND: "Codage paramétrique basé sur l'Analyse en Composante Principale" [Parametric coding based on principal component analysis], 20 March 2007 (2007-03-20), pages 133 - 147, XP002661237, Retrieved from the Internet <URL:http://tel.archives-ouvertes.fr/docs/00/14/18/62/PDF/2007_Briand_ParametricAudioCoding.pdf> [retrieved on 20101013] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019197713A1 (fr) * 2018-04-09 2019-10-17 Nokia Technologies Oy Quantization of spatial audio parameters
KR20200140874A (ko) * 2018-04-09 2020-12-16 Nokia Technologies Oy Quantization of spatial audio parameters
EP3776545A4 (fr) * 2018-04-09 2022-01-05 Nokia Technologies Oy Quantization of spatial audio parameters
US11475904B2 (en) 2018-04-09 2022-10-18 Nokia Technologies Oy Quantization of spatial audio parameters
WO2020089509A1 (fr) * 2018-10-31 2020-05-07 Nokia Technologies Oy Determination of spatial audio parameter encoding and associated decoding

Similar Documents

Publication Publication Date Title
US11284210B2 (en) Methods and apparatus for compressing and decompressing a higher order ambisonics representation
US11184730B2 (en) Method and apparatus for compressing and decompressing a higher order ambisonics representation for a sound field
US11869523B2 (en) Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations
EP2860728A1 (fr) Method and apparatus for encoding and for decoding directional side information
US20040230423A1 (en) Multiple channel mode decisions and encoding

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20131009

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20151016