EP2451196A1 - Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three


Info

Publication number
EP2451196A1
Authority
EP
European Patent Office
Prior art keywords
data
ambisonics
order
value
coefficients
Prior art date
Legal status
Withdrawn
Application number
EP10306212A
Other languages
German (de)
French (fr)
Inventor
Holger Kropp
Florian Keiler
Johann-Markus Batke
Stefan Abeling
Johannes Boehm
Sven Kordon
Current Assignee
Thomson Licensing SAS
Original Assignee
Thomson Licensing SAS
Application filed by Thomson Licensing SAS
Priority to EP10306212A
Publication of EP2451196A1
Status: Withdrawn

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30: Control circuits for electronic adaptation of the sound field
    • H04S 2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/03: Application of parametric coding in stereophonic audio systems
    • H04S 2420/11: Application of ambisonics in stereophonic audio systems
    • H04S 3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S 3/008: Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels

Definitions

  • Wave-based audio format descriptions are used in different applications.
  • An environment which is very important today, and will become even more important in the future, is that of internet applications based on Ethernet transmission protocols.
  • a data structure for Ambisonics transmission that is able to use the above-mentioned B-format, as well as additional features like the Ambisonics order and the coefficients' bit lengths, in an efficient manner is not yet known to the applicant.
  • RTP: Real-time Transport Protocol
  • This payload header extends the RTP header of Fig. 1 by a 2-octet extended sequence number and a 2-octet extended time stamp. Furthermore, one octet for flags and a reserved field, followed by a 3-octet SMPTE time stamp and a 4-octet offset value is proposed therein.
  • the 32-bit aligned payload data follows the header data.
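The payload header extension described above can be sketched as a fixed 12-byte layout. The field order, names and big-endian packing below are assumptions inferred from the description, not taken from the patent figures:

```python
import struct

def pack_payload_header_extension(ext_seq, ext_ts, flags, smpte_tc, offset):
    """Pack the 12-byte payload header extension sketched in the text:
    2-octet extended sequence number, 2-octet extended time stamp,
    1 octet flags/reserved, 3-octet SMPTE time stamp, 4-octet offset.
    Field order and names are illustrative assumptions."""
    assert 0 <= smpte_tc < 1 << 24      # must fit into 3 octets
    return struct.pack(">HHB3sI",
                       ext_seq & 0xFFFF,
                       ext_ts & 0xFFFF,
                       flags & 0xFF,
                       smpte_tc.to_bytes(3, "big"),
                       offset & 0xFFFFFFFF)
```

With explicit big-endian packing (`>`), `struct` inserts no alignment padding, so the result is exactly the 12 octets the text enumerates.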
  • a problem to be solved by the invention is to provide a data structure (i.e. a protocol layer) for 3D higher-order Ambisonics sound field description formats, which can be used for real-time transmission over Ethernet.
  • This problem is solved by the encoding method disclosed in claim 1 and the decoding method disclosed in claim 3. Apparatuses which utilise these methods are disclosed in claims 2 and 4, respectively.
  • the data structures described below facilitate real-time transmission of 3D sound field descriptions over Ethernet. From the content of additional metadata the transmitted 3D sound field can be adapted at receiver side to the available headphones or the number and positions of loudspeakers, for regular as well as for irregular set-ups. No regular loudspeaker set-ups including a large number of loudspeakers are required like in WFS.
  • the sound quality level can be adapted to the available sound reproduction system, e.g. by mapping a 3D Ambisonics sound field description onto a 2D loudspeaker set-up.
  • the inventive data structure considers single microphones or microphone arrays as well as virtual acoustical sources with different accuracies and sample rates.
  • moving sources, i.e. sources with time-dependent spatial positions, are supported by Ambisonics descriptions inherently.
  • the Ambisonics header information level is adaptable between a simple and an encoder related mode.
  • the latter one enables fast decoder modifications. This is useful especially for real-time applications.
  • the proposed data structure is extendable for classical audio scene descriptions, i.e. sound sources and their positions.
  • the inventive Ambisonics processing is based on linear operators, i.e. the Ambisonics channels data can be packed and transmitted singly or in an assembled manner as a matrix.
  • the inventive encoding method is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:
  • the inventive encoder apparatus is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:
  • the inventive decoding method is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said method including the steps:
  • the inventive decoder apparatus is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said apparatus including:
  • in a first step or multiplier 33, all S source signals x ( k ) at each sample time kT , i.e. virtual single sources as well as microphone array sources, are multiplied with a mode matrix defined in Eq.(1).
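The multiplication in multiplier 33 can be illustrated for the simple 2D (circular-harmonic) case. This is only a hedged sketch with unnormalised basis functions; the patent's actual matrix from Eq.(1) and its normalisation are not reproduced here:

```python
from math import cos, sin

def mode_vector_2d(azimuth, order):
    """Unnormalised 2D (circular-harmonic) mode vector of length 2*order+1:
    [1, cos(phi), sin(phi), ..., cos(N*phi), sin(N*phi)]."""
    v = [1.0]
    for n in range(1, order + 1):
        v += [cos(n * azimuth), sin(n * azimuth)]
    return v

def encode_2d(sources, order):
    """d(k): sum over all sources of mode vector times sample value.
    sources: list of (azimuth_radians, sample) pairs for one sample time."""
    num_ch = 2 * order + 1
    d = [0.0] * num_ch
    for az, x in sources:
        mv = mode_vector_2d(az, order)
        for i in range(num_ch):
            d[i] += mv[i] * x
    return d
```

For example, a single unit-amplitude source at azimuth 0 and order 1 yields the coefficient vector [1, 1, 0].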
  • Fig. 3 shows a block diagram of an Ambisonics encoder for these four cases at production side. The required functions are represented by corresponding steps or stages in front of the transmission. All processing steps are clocked by a frequency that is made in stage 38 synchronous with the sample frequency 1/T.
  • a controller 37 receives a mode selection signal and the value of order N , and controls an optional multiplexer 36 that receives the filter responses and the output signal of multiplier 33, and outputs the inventive data structure frames 39.
  • Multiplier 33 represents a directional encoder providing corresponding coefficients and outputs the unfiltered vector data d ( k ), the order N value, and parameter Norm .
  • An array response filter 42 ('Filter 1' in Fig. 4 ), applied only to the microphone sources data, can be arranged at decoder side.
  • the unfiltered vector data d ( k ), the order N value, and parameter Norm are assembled in a combiner 340 with radii data R S ( t ), and are fed to an optional multiplexer 36.
  • Radii data R S ( t ) represent the distances of the audio sources of the S input signals x ( k ), and refer to microphones as well as to artificially generated virtual sound sources.
  • the coefficients vector data d ( k ) pass through an array response filter 341 for the microphone sources (filter 2).
  • the filtering compensates the microphone-array response and is based on Bessel or Hankel functions. Basically, the signals from the output vectors d ( k ) are filtered.
  • the other inputs serve as parameters for the filter, e.g. parameter R is used for the term k * r .
  • the filtering is relevant only for microphones that have the individual radius R m . Such radii are taken into consideration in the term k * r of the Bessel or Hankel functions. Normally, the amplitude response of the filter starts with a lowpass characteristic but increases for higher frequencies.
  • the filtering is performed in dependence on the Ambisonics order N , the order n and the radii R m values, so as to compensate for non-linear frequency dependency.
  • a subsequent normalisation step or stage 351 for spherical waves data provides filtered coefficients A ( k ). It is assumed that there is also a corresponding filter at reproduction side (filter 431 in Fig. 4 ).
  • the filtered and normalised coefficients A ( k ), parameter Norm and the order N value are fed to multiplexer 36.
  • the coefficients vector data d ( k ) pass through an array response filter 342 for the microphone sources (filter 3).
  • the filtering is performed in dependence on said Ambisonics order N , said order n , the radii R m values and a radius R ref value representing the average radius R ref of the loudspeakers at decoder side as described in the below section "Radius R ref (RREF)", so as to compensate for non-linear frequency dependency.
  • a filter for spherical waves data is also arranged at reproduction side. Then the average radius R ref of the loudspeakers has to be considered already in filter 342.
  • a subsequent normalisation step or stage 352 for spherical waves data provides filtered coefficients A ( k ).
  • Step/stage 352 can include a distance coding like that described in connection with Fig. 4 .
  • the filtered coefficients A ( k ) from step/stage 352, parameter Norm , the order N value and radius value R ref are fed to multiplexer 36.
  • the coefficients vector data d ( k ) pass through an array response filter 343 for the microphone sources (filter 4).
  • the filtering is performed in dependence on the Ambisonics order N , the radii R m values and a Plane Wave parameter.
  • a subsequent normalisation step or stage 353 for plane waves data provides parameter Norm , the order N value and a flag for Plane Wave to multiplexer 36.
  • the Ambisonics encoder can code the output signals 361 in any one of these paths, in any two of these paths, or in more than two of these paths.
  • the normalisation steps or stages 351 to 353 can use a normalisation or scaling as described below in section "Ambisonics Normalisation/Scaling Format (ANSF)".
  • the Ambisonics decoder depicted in Fig. 4 parses the incoming data structures in a parser 41 in order to detect the case type and to provide the data for performing the appropriate functions.
  • An example for such parser is disclosed in WO 2009/106637 A1 .
  • Unfiltered vector data d ( k ), order value N , parameter Norm and the radii data R S ( t ) are parsed. These values pass through an array response filter 42 (Filter 1) for filtering (a filtering as described in connection with Fig. 3 ) the received d ( k ) data under consideration of all radii R S ( t ).
  • the resulting filtered coefficients A ( k ) are distance coded (DC) in a distance coding step or stage 431 for all loudspeaker radii R LS and order N , and pass thereafter, together with loudspeaker direction values Ω l (representing the directions of the LS loudspeakers 46), value N and parameter Norm , through an optional multiplexer 44 to a panning or pseudo inverse step or stage 45.
  • Distance coding means taking into account Bessel or Hankel functions with radii parameter in term k * r for plane or spherical waves.
  • Filtered coefficients A ( k ), parameter Norm and order value N are parsed.
  • the filtered coefficients A ( k ) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii R LS and order N , and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45.
  • Spherical waves on AE and AD sides are assumed.
  • Filtered coefficients A ( k ), order value N , parameter Norm and radius value R ref are parsed.
  • the filtered coefficients A ( k ) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii R LS and order N under consideration of radius R ref , and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45.
  • Spherical waves on AE and AD sides are assumed.
  • Filtered coefficients A ( k ), order value N , parameter Norm and a flag for Plane Waves are parsed.
  • the filtered coefficients A ( k ) together with loudspeaker direction values Ω l , value N and parameter Norm pass through multiplexer 44 to the panning or pseudo inverse step or stage 45. Plane waves on AE and AD sides are assumed.
  • a mode selector 47 selects in multiplexer 44 the path or paths a) to d) that were used at encoder side.
  • Decoder 45, which represents a panning or a mode matching operation including a pseudo inverse, inverts the mode matrix operation of the Ambisonics encoder in Fig. 3 , and applies this operation to the filtered coefficients A ( k ) or the filtered and distance coded coefficients A '( k ), respectively, in dependence on the parameter Norm , order value N and the loudspeaker direction values Ω l , and provides the l loudspeaker signals for a loudspeaker array 46.
  • Parser 41 also provides synchronisation information that is used for re-synchronisation of a clock 48.
  • the invention specifies a packet-based streaming format for encapsulating spatial sound field descriptions based on Ambisonics into an extended real-time transport protocol, in particular RTP, for real-time streaming of spatial audio scenes.
  • the focus is on a standalone spatial (2D/3D) audio real-time application, e.g. a transmission of a live concert or a live sport event via IP. This requires a specific spatial audio layer including time stamps and possibly synchronisation information.
  • the Ambisonics real-time stream can be used together with an RTP layer.
  • alternative RTP layers with or without extended headers are described below.
  • EASF: Extended Ambisonics Streaming Format
  • Ethernet transmissions are performed in data packets with a typical maximum packet length, called the 'path MTU', of up to 1500 or 9000 bytes.
  • transmission is organised in 'frames'. Such a frame represents a dedicated time interval within which a typical number of packets is transmitted.
  • For example, in video applications in 1080p video mode, a frame contains 1080 data packets, each of which describes one line of a complete video frame.
  • Accordingly, the transmission should be frame based.
  • Case 1 requires a transmission of the time-dependent radii R S ( t ). This is an option if filter processing is to be performed in the decoder. However, in the following section the focus is on Cases 2-4, in which the filtered coefficients A ( k ) are transmitted. This allows a higher bandwidth because the transmission remains independent of all source positions, i.e. it is better suited for Ambisonics.
  • For standalone audio transmission, the protocol contains the following header data structure.
  • Payload Type 7 bits
  • the payload type for an audio standalone transmission is defined as EASF.
  • the film format is chosen, e.g. DPX.
  • Sequence Number 16 bits The LSBs of the sequence number. It is incremented by one for each RTP data packet sent, and may be used by the receiver for detecting packet loss and for restoring the packet sequence. The initial value of the sequence number is random (i.e. unpredictable) in order to make known-plaintext attacks on encryption more difficult.
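The 16-bit sequence number semantics described above (random start, increment by one, loss detection under wraparound) can be sketched as follows; the helper names are illustrative only:

```python
import random

def initial_sequence_number():
    # random (unpredictable) start value, as the text recommends
    return random.getrandbits(16)

def next_sequence_number(seq):
    # the 16-bit counter wraps around modulo 2**16
    return (seq + 1) & 0xFFFF

def lost_packets(prev_seq, cur_seq):
    # number of packets missing between two consecutively received
    # packets, robust against the 16-bit wraparound
    return (cur_seq - prev_seq - 1) & 0xFFFF
```

For example, receiving sequence number 1 directly after 0xFFFE indicates two lost packets (0xFFFF and 0).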
  • Timestamp 32 bits
  • the timestamp denotes the sampling instant of the frame to which the RTP packet belongs. Packets belonging to the same frame must have the same timestamp.
  • RTP payload header extension: According to the invention, the fields of the known RTP header keep their usual meaning, but that header is amended as follows:
    RTP Payload Frame Status (PLFS) - 2 bit
    The frame status describes which type of data will follow the extended RTP header in the payload block:
    PLFS code  Payload type
    00         Ambisonics coefficients
    01         Frame end (+ Ambisonics coefficients)
    10         Frame begin (+ Metadata)
    11         Metadata
    I.e., in the first packet of a frame, instead of audio data, additional metadata can be transmitted. In case of Ambisonics transmission, the metadata contains source and Ambisonics encoder related information (production side information) required for the decoding process.
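The 2-bit PLFS codes map directly to payload types; a small illustrative mapping (the Python names are assumptions, the codes are from the table):

```python
# PLFS (RTP Payload Frame Status) codes, per the table in the text
PLFS_TYPES = {
    0b00: "Ambisonics coefficients",
    0b01: "Frame end (+ Ambisonics coefficients)",
    0b10: "Frame begin (+ Metadata)",
    0b11: "Metadata",
}

def marks_frame_boundary(plfs):
    # codes 01 (frame end) and 10 (frame begin) delimit a frame
    return plfs in (0b01, 0b10)
```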
  • Time Code/Sync Frequency (TCSF) - 30 bit unsigned integer
  • the following SMPTE time code or the synchronisation is based on a specific clock frequency, the Time Code/Sync Frequency TCSF.
  • the TCSF is defined as a 30 bit integer field. The value is represented in Hz and leads to a frequency range from 0 Hz to 1 073 741 823 Hz (about 1073.74 MHz), wherein a value of 0 Hz signals that no time code is available.
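Assuming the 2-bit PLFS and the 30-bit TCSF share one 32-bit word (the exact bit layout of Fig. 5 is not reproduced here, so this packing is an assumption), the fields can be packed and recovered like this:

```python
def pack_plfs_tcsf(plfs, tcsf_hz):
    """Pack the 2-bit PLFS code and the 30-bit TCSF value (in Hz) into
    one 32-bit word; bit layout (PLFS in the two MSBs) is assumed."""
    assert 0 <= plfs < 4
    assert 0 <= tcsf_hz < (1 << 30)   # 30-bit range; 0 means 'no time code'
    return (plfs << 30) | tcsf_hz

def unpack_plfs_tcsf(word):
    # recover the 2-bit code and the 30-bit frequency value
    return word >> 30, word & ((1 << 30) - 1)
```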
  • the selection in data field AST facilitates not only a separation within Ambisonics (cf. the example provided below in connection with Fig. 9 ) but also the parallel transmission of differently encoded audio source signals (Ambisonics and/or PCM data + position data), i.e. the inventive protocol can be complemented e.g. for PCM data.
  • the below-described SMPTE Time Code/Clock Sync Info (STCSI) facilitates the temporally correct assignment of the audio signal sources.
  • the general Ambisonics header is transmitted only in the first data packet of a frame and the individual Ambisonics header is transmitted in all other data packets.
  • the general Ambisonics header shall also be available in every data packet in front of the individual Ambisonics header. This mode enables a modification of the parameters in each data packet, i.e. in real-time. It can be useful for real-time applications where no or only small buffers are available. However, this mode decreases the available bandwidth.
  • Different sources can generate audio signals at the same time.
  • Known protocols are based on a separate transmission of the sound sources, i.e. every data frame refers to a single temporal section in which, depending on the sampling frequency, several samples can be contained. Therefore, in known protocols, different source signals occurring at the same time instant will use the same time stamp and the same frame number. This poses no problem for an offline processing, i.e. no real-time processing.
  • the transmitted data are buffered and assembled later on. However, this does not work for real-time processing in which a small latency is demanded.
  • the data field XAH facilitates carrying the header along continuously, and the parser 41 in Fig. 4 can switch back and forth block-by-block (or Ethernet packet-by-packet, or frame-by-frame) between different audio source types.
  • Distinguishing between general header and individual header facilitates a real-time adaptation.
  • if the STS flag is cleared, the value in the 24 bit field STCSI (see below) represents the SMPTE time code. If STS is set, field STCSI contains user-specific synchronisation information.
  • the packet offset describes the distance in bytes from the first payload octet of the first data packet in a frame to the first payload octet of the current data packet.
  • PAO(HIGH) represents the 32 MSBs and PAO(LOW) represents the 32 LSBs.
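Splitting the 64-bit packet offset into the PAO(HIGH) and PAO(LOW) fields described above is a plain 32-bit split; the helper names are illustrative:

```python
def split_pao(offset_bytes):
    """Split a 64-bit packet offset into (PAO_HIGH, PAO_LOW):
    PAO(HIGH) carries the 32 MSBs, PAO(LOW) the 32 LSBs."""
    assert 0 <= offset_bytes < (1 << 64)
    return offset_bytes >> 32, offset_bytes & 0xFFFFFFFF

def join_pao(high, low):
    # reassemble the 64-bit offset at receiver side
    return (high << 32) | low
```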
  • Ambisonics payload data and Ambisonics header data shall be fragmented such that the resulting RTP data packet is smaller than the 'path MTU' mentioned above.
  • the path MTU is a 'jumbo frame' of e.g. 9000 bytes.
  • a small individual Ambisonics header is sent in front of each data packet.
  • a general header contains source and encoder related information that can be useful for the Ambisonics decoder. It contains information that is valid for all data packets within a frame, and for small frames and/or data packets it can be sent once at the beginning of a frame. Especially for real-time applications where the packet information is changing frequently, it can be advantageous to send the general header with each data packet.
  • Table 1
    AFT code  Format
    00        B-Format order
    01        Numerical upward
    10        Numerical downward
    11        Reserved
    B-Format channel assignment (up to order three):
    Degree n  Order m  Channel
    0          0       W
    1          1       X
    1         -1       Y
    1          0       Z
    2          0       R
    2          1       S
    2         -1       T
    2          2       U
    2         -2       V
    3          0       K
    3          1       L
    3         -1       M
    3          2       N
    3         -2       O
    3          3       P
    3         -3       Q
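The numerical ordering types can be generated programmatically. The (degree, order) sequence below follows the description, and the FuMa channel letters are included for reference; the function names are illustrative:

```python
# FuMa channel letters up to order three, indexed by (degree n, order m)
FUMA = {(0, 0): "W",
        (1, 1): "X", (1, -1): "Y", (1, 0): "Z",
        (2, 0): "R", (2, 1): "S", (2, -1): "T", (2, 2): "U", (2, -2): "V",
        (3, 0): "K", (3, 1): "L", (3, -1): "M", (3, 2): "N",
        (3, -2): "O", (3, 3): "P", (3, -3): "Q"}

def numerical_order(N, downward=False):
    """(n, m) pairs for Ambisonics order N: degree n runs 0..N, and for
    each degree the order m runs -n..+n (upward) or +n..-n (downward)."""
    seq = []
    for n in range(N + 1):
        ms = list(range(-n, n + 1))
        seq.extend((n, m) for m in (reversed(ms) if downward else ms))
    return seq
```

For N = 3 this yields (3+1)^2 = 16 channels, matching the 16-channel limit of the extended B-format.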
  • the sequence of each matrix column in Eq.(1) from top to bottom represents a numerical upward order type.
  • a degree value always starts with 0 and runs up to Ambisonics Order N .
  • the sequence starts with lowest order - N and runs up to order + N .
  • the downward type uses for each degree the reversed order.
  • the Ambisonics order describes the quality of the Ambisonics encoding and decoding via the mode matrix.
  • An order up to 255 should be sufficient.
  • the order is distinguished in horizontal and vertical direction. In case of 2D, only AHO has a value greater than '0'.
  • a mixed order can have different AHO and AVO values.
  • Ambisonics Normalisation/Scaling Format (ANSF) - 3 bit Identifies different normalisation formats, typically used for Ambisonics.
  • the normalisation corresponds to the orthogonality relationship between Y_n^m and Y_n'^m'*.
  • additional normalisation principles exist, e.g. Furse-Malham.
  • the Furse-Malham formulation facilitates a normalisation of the coefficients to get maximum values of ⁇ 1, which yields an optimal dynamic range.
  • the scaling factors are fixed over one frame. The scaling factors will be transmitted only once in front of the Ambisonics coefficients.
  • ANF code  Format
    000       Orthonormal
    001       Schmidt semi-normalised
    010       4π normalised
    011       Unnormalised
    100       Furse-Malham
    101       Dedicated scaling
    11x       Reserved
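As one concrete relation between such conventions: a Schmidt semi-normalised (SN3D) coefficient of degree n differs from its fully normalised (N3D) counterpart by a factor of sqrt(2n+1). The mapping of these conventions onto the table's codes is an assumption; the conversion factor itself is standard:

```python
from math import sqrt

def sn3d_to_n3d(coeff, n):
    """Convert a Schmidt semi-normalised (SN3D) coefficient of degree n
    to the fully normalised (N3D) convention; per-degree factor sqrt(2n+1)."""
    return coeff * sqrt(2 * n + 1)
```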
  • the reference radius R ref value of the loudspeakers in mm is required in case of spherical waves.
  • for audible frequencies f, the speed of sound c is about 340 m/s.
  • This code defines the word length as well as the format (integer/floating point) of the transmitted Ambisonics coefficients A ( k ).
  • the sample format enables an adaptation to different value ranges.
  • nine sample formats are predefined:
    ASF code   Format
    0000       Unsigned integer 8 bit
    0001       Signed integer 8 bit
    0010       Signed integer 16 bit
    0011       Signed integer 24 bit
    0100       Signed integer 32 bit
    0101       Signed integer 64 bit
    0110       Float 32 bit (binary single prec.)
    0111       Float 64 bit (binary double prec.)
    1000       Float 128 bit (binary quad prec.)
    1001-1111  Reserved
  • AIB: If ASF is specified as an integer format, the number AIB of invalid bits can mask the lowest bits within the ASF integer. AIB is coded as a 5 bit unsigned integer value, so that up to 31 bits can be marked as invalid. Valid bits start at the MSB. Note that the value of AIB shall be less than the ASF integer word length.
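The AIB masking can be sketched as follows (helper name illustrative): the AIB lowest bits of an integer sample are zeroed, leaving the valid bits that start at the MSB untouched:

```python
def mask_invalid_bits(sample, word_length, aib):
    """Zero the 'aib' lowest (invalid) bits of an integer sample of the
    given word length; valid bits start at the MSB."""
    assert 0 <= aib < word_length <= 64   # AIB must be below the word length
    mask = ((1 << word_length) - 1) ^ ((1 << aib) - 1)
    return sample & mask
```

For example, with an 8-bit sample and AIB = 4, the value 0xFF is reduced to 0xF0.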
  • the rate at which the input data x i ( k ) are sampled is coded as an unsigned integer.
  • FSM: If FSM is cleared, the following 31 bits for FS represent the frame size in bytes. If FSM is set, FS represents the total number of data packets in the actual frame.
  • the frame size number FS is to be interpreted in view of the FSM flag's value. Depending on the application, the frame size can vary from frame to frame.
  • a 'frame' can contain several equal-length packets, wherein the last packet can have a different length that is described in the individual Ambisonics header. Every packet may use such a header for describing length values that differ from prior packet lengths.
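The FSM/FS semantics can be sketched as a 1-bit flag plus a 31-bit value in one 32-bit word; the bit placement (FSM in the MSB) is an assumption:

```python
def pack_fsm_fs(fs_value, packets_mode):
    """Pack the 1-bit FSM flag and the 31-bit FS value into 32 bits.
    packets_mode=False: FS is the frame size in bytes;
    packets_mode=True:  FS is the number of data packets in the frame."""
    assert 0 <= fs_value < (1 << 31)
    return (int(packets_mode) << 31) | fs_value

def unpack_fsm_fs(word):
    # recover (FSM flag, 31-bit FS value)
    return bool(word >> 31), word & 0x7FFFFFFF
```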
  • the bits in front of APL are reserved. This enables an extension of the individual header, e.g. by packet related flags, and a 32 bit alignment for the following Ambisonics coefficients.
  • the maximum length is 65535.
  • the payload data type is defined in the data field PLFS (RTP Payload Frame Status), cf. Fig. 5 .
  • 'pure' Ambisonics data or 'pure' metadata can be arranged.
  • the transmission processing operates in a sequential manner, i.e. at each transmission clock step (which is totally different from the sampling rate) only 32 or 64 bits of a data packet can be dealt with.
  • the number of considered Ambisonics samples in one data packet is related to one concatenated sample time or to a group of concatenated sample times.
  • the following examples of payload data show different dimensions, orders, and Ambisonics coefficients based on the encoder/decoder cases 2 to 4 of Fig. 3 .
  • the first index x of A( x , y ) describes the sequence number for a specific order, whereas the second index y stands for the sample time k in a data packet.
  • SMPTE MXF and XML are pre-defined.
  • AMT code   Format
    0x00       SMPTE MXF
    0x80       XML
    0x01-0x7F  Reserved
    0x81-0xFF  Reserved
  • This data field is followed by specific metadata. If possible, the metadata descriptions should be kept simple in order to get only one metadata packet in the 'begin packet' of a frame. However, the packet length in bytes is the same as for Ambisonics coefficients. If the amount of metadata exceeds this packet length, the metadata has to be fragmented into several packets which shall be inserted between packets with Ambisonics coefficients. If the metadata amount in bytes in one packet is less than the regular packet length, the remaining packet bytes are to be padded with '0' or stuffing bits.
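The fragmentation-and-padding rule above can be sketched as follows (zero padding assumed for the '0' or stuffing bits; the function name is illustrative):

```python
def fragment_metadata(metadata: bytes, packet_len: int):
    """Split metadata into packet_len-sized chunks; the last chunk is
    padded with zero bytes up to the regular packet length."""
    chunks = [metadata[i:i + packet_len]
              for i in range(0, max(len(metadata), 1), packet_len)]
    chunks[-1] = chunks[-1].ljust(packet_len, b"\x00")
    return chunks
```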
  • the encapsulated CRC word at the end of each Ethernet packet should be used.
  • the content addressable memories CAM detect all protocol data, which leads to a decision about how the received data are to be processed in the following steps or stages, and the registers REG store information about the length of the payload data.
  • the parser evaluates the header data in a hierarchical manner and can be implemented in hardware or software, according to any real-time requirements.
  • spherical waves SPW or plane waves PW can occur together, e.g. in the worldwide live broadcast of a concert in 3D format, wherein all receiving units are arranged in cinemas.
  • the individual signals are to be transmitted separately so that a correct presentation can be facilitated.
  • the parser can distinguish this and supply two separate 'distance coding' units with the corresponding data items.
  • the inventive Ambisonics decoder depicted in Fig. 4 can process all these signals, whereas in the prior art several decoders would be required. I.e., considering the Ambisonics wave type facilitates the advantages described above.


Abstract

Audio signal datastreams for 2D presentation are channel oriented. Due to 3D video in cinema and broadcasting, spatial or 3D audio becomes attractive. Ambisonics coding/decoding provides a sound field description that is independent from any specific loudspeaker set-up. The inventive system facilitates real-time transmission of Ambisonics of an order higher than '3' as well as single microphone signals. The transmitted 3D sound field can be adapted at receiver side to the available positions of loudspeakers. The Ambisonics header information level is adaptable between a simple and an encoder related mode enabling fast decoder modifications. The Ambisonics processing is based on linear operators, i.e. the Ambisonics channels data can be packed and transmitted singly or in an assembled manner as a matrix.

Description

  • The invention relates to a method and to an apparatus for generating and for decoding sound field data including Ambisonics sound field data of an order higher than three, wherein for encoding and for decoding different processing paths can be used.
  • Background
  • Traditional audio data signal transport streams for 2D presentation are channel oriented. 2D presentations include formats like stereo or surround sound, and are based on audio container formats like WAV and BWF (Broadcast Wave Format). The wave format WAV is described in Microsoft, "Multiple Channel Audio Data and WAVE Files", updated March 7, 2007, http://www.microsoft.com/whdc/device/audio/multichaud.mspx , and in http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html, last update 19 June 2006.
  • Improved surround systems require an increasing number of loudspeakers or audio channels, which leads to an extension of these audio container formats.
  • Due to the upcoming 3D video activities in cinema and broadcasting, spatial or 3D audio becomes more and more attractive. Nevertheless, descriptions of spatial audio scenes are significantly more complex than in existing 2D surround sound systems. Well-known descriptions are based on Wave Field Synthesis (WFS, cf. WO2004/047485 A1 ) as well as on Ambisonics, which was already developed in the early 1970s: http://en.wikipedia.org/wiki/Ambisonics .
  • WFS combines a high number of spherical sound sources for emulating plane waves from different directions. Therefore, a lot of loudspeakers or audio channels are required. A description contains a number of source signals as well as their specific positions.
  • Ambisonics, however, uses specific coefficients based on spherical harmonics for providing a sound field description that is independent from any specific loudspeaker set-up. This leads to a description which does not require information about loudspeaker positions during sound field recording or generation of synthetic scenes. The reproduction accuracy in an Ambisonics system can be modified by its order N. The 'higher-order Ambisonics' (HOA) description considers an order of more than one, and the focus in this application is on HOA.
  • From the order N the number of required audio information channels for a 2D or a 3D system can be determined, because it corresponds to the number of spherical harmonic basis functions. The number O of channels is O=2·N+1 for 2D and O=(N+1)² for 3D. Besides true 2D or 3D cases, 'mixed orders' have different orders in 2D (x-y plane only) and 3D (additionally z axis).
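  • These channel-count relations can be illustrated with a small helper (Python is used purely for illustration; the function names are ours, the relations O=2·N+1 and O=(N+1)² are from the text):

```python
def num_channels_2d(n: int) -> int:
    """Number of Ambisonics channels for a 2D (horizontal-only) system of order n."""
    return 2 * n + 1

def num_channels_3d(n: int) -> int:
    """Number of Ambisonics channels for a full-sphere 3D system of order n."""
    return (n + 1) ** 2

# First order: 3 channels in 2D, 4 in 3D (the classic B-format).
# Third order: 7 in 2D, 16 in 3D -- the maximum the extended B-format covers.
print(num_channels_2d(1), num_channels_3d(1))  # 3 4
print(num_channels_2d(3), num_channels_3d(3))  # 7 16
```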
  • The first-order B-Format uses three channels for 2D and four channels for 3D. The first-order B-Format is extended to the higher-order B-format. Depending on O a horizontal (2D), a full-sphere (3D), or a mixture sound field type description can be generated. By ignoring appropriate channels, this B-format is backward compatible, i.e. a 2D Ambisonics receiver is able to decode the 2D components from a 3D Ambisonics sound field. The extended B-format for HOA considers only orders up to three, which corresponds to 16 channels maximum.
  • The older UHJ-format was introduced to enable mono and stereo compatibility. The G-format was introduced to reproduce sound scenarios in 5.1 environments.
  • However, none of these existing formats considers orders of more than three.
  • The Wave FORMAT_EXTENSIBLE format is an extension of the above-mentioned WAV format. One application is the use of Ambisonics B-format in the WAVEX description: "Wave Format Extensible and the .amb suffix or WAVEX and Ambisonics", http://mchapman.com/amb/wavex .
  • Invention
  • As mentioned above, known Ambisonics formats do not consider orders of more than three.
  • Wave-based audio format descriptions are used in different applications. One environment that is very important today and will become even more important in the future is internet applications based on Ethernet transmission protocols. However, a data structure for Ambisonics transmission that is able to use the above-mentioned B-format as well as additional features like the Ambisonics order and the bit lengths of its coefficients in an efficient manner is not yet known to the applicant.
  • Another aspect is that in the case of the B-format, plane waves are always assumed for the sound sources. For a higher quality of acoustic wave field reproduction, a more realistic model should emulate the sound sources as spherical waves. However, spherical waves introduce more complex frequency dependencies than plane waves.
  • Furthermore, a transmission of video content is in many cases combined with audio content transmission. Existing streaming data structures, e.g. for cinema applications, consider 2D surround sound only, for example WAV or AIFF (Audio Interchange File Format).
  • A format for combined real-time transmission of video and audio that is based on an extended 'Real-Time Protocol' (RTP) has been published in H. Schulzrinne et al., "RFC 3550 - RTP: A Transport Protocol for Real-Time Applications", Columbia University, http://www.faqs.org/rfcs/rfc3550.html, July 2003, in particular sections 5.1 and 5.3.1. The standard RTP header uses 12 data octets (8-bit data fields) in every RTP packet as depicted in Fig. 1. In EP 1936908 A1 an extension of such an RTP header is proposed for additionally encapsulating an extended RTP header and DPX (Digital Moving-Picture Exchange) data, AIFF/BWF audio data, or metadata, as depicted in Fig. 2.
  • This payload header extends the RTP header of Fig. 1 by a 2-octet extended sequence number and a 2-octet extended time stamp. Furthermore, one octet for flags and a reserved field, followed by a 3-octet SMPTE time stamp and a 4-octet offset value, is proposed therein. The 32-bit aligned payload data follows the header data.
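  • For illustration only, the 12 octets of this payload header extension could be packed as follows (field order as summarised above; the function name and example values are hypothetical, and the exact bit semantics are those of EP 1936908 A1, not reproduced here):

```python
import struct

def pack_extended_payload_header(ext_seq: int, ext_ts: int, flags: int,
                                 smpte_ts: bytes, offset: int) -> bytes:
    """Pack the 12-octet payload header extension: 2-octet extended sequence
    number, 2-octet extended time stamp, 1 octet for flags and reserved
    field, 3-octet SMPTE time stamp, 4-octet offset (network byte order)."""
    assert len(smpte_ts) == 3, "SMPTE time stamp occupies exactly 3 octets"
    return (struct.pack("!HHB", ext_seq, ext_ts, flags)
            + smpte_ts
            + struct.pack("!I", offset))

hdr = pack_extended_payload_header(0x0001, 0x0002, 0x00, b"\x01\x02\x03", 4096)
print(len(hdr))  # 12
```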
  • A problem to be solved by the invention is to provide a data structure (i.e. a protocol layer) for 3D higher-order Ambisonics sound field description formats, which can be used for real-time transmission over Ethernet. This problem is solved by the encoding method disclosed in claim 1 and the decoding method disclosed in claim 3. Apparatuses which utilise these methods are disclosed in claims 2 and 4, respectively.
  • The data structures described below facilitate real-time transmission of 3D sound field descriptions over Ethernet. Based on the content of additional metadata, the transmitted 3D sound field can be adapted at receiver side to the available headphones or to the number and positions of loudspeakers, for regular as well as irregular set-ups. Unlike in WFS, no regular loudspeaker set-ups with a large number of loudspeakers are required.
  • Advantageously, in the inventive transmission data structure the sound quality level can be adapted to the available sound reproduction system, e.g. by mapping a 3D Ambisonics sound field description onto a 2D loudspeaker set-up. Advantageously, the inventive format enables Ambisonics orders up to N =255, whereas known Ambisonics formats allow orders up to N =3 only.
  • Further, the inventive data structure considers single microphones or microphone arrays as well as virtual acoustical sources with different accuracies and sample rates. Advantageously, moving sources (i.e. sources with time-dependent spatial positions) are considered in the Ambisonics descriptions inherently.
  • The Ambisonics header information level is adaptable between a simple and an encoder related mode. The latter one enables fast decoder modifications. This is useful especially for real-time applications.
  • The proposed data structure is extendable for classical audio scene descriptions, i.e. sound sources and their positions.
  • Generally, the inventive Ambisonics processing is based on linear operators, i.e. the Ambisonics channels data can be packed and transmitted singly or in an assembled manner as a matrix.
  • In principle, the inventive encoding method is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:
    • receiving S input signals x (k) from a microphone array including M microphones, and/or from one or more virtual sound sources;
    • multiplying said input signals x(k) with a matrix Ψ,
      Ψ = \begin{pmatrix} Y_0^0(Ω_0) & Y_0^0(Ω_1) & \cdots & Y_0^0(Ω_{S-1}) \\ Y_1^{-1}(Ω_0) & Y_1^{-1}(Ω_1) & \cdots & Y_1^{-1}(Ω_{S-1}) \\ Y_1^0(Ω_0) & Y_1^0(Ω_1) & \cdots & Y_1^0(Ω_{S-1}) \\ \vdots & & & \vdots \\ Y_N^{+N}(Ω_0) & \cdots & \cdots & Y_N^{+N}(Ω_{S-1}) \end{pmatrix} ,
      wherein the matrix elements Y_n^m(Ω_s) represent the spherical harmonics of all currently used directions Ω_0,...,Ω_{S-1}, index m denotes the order, index n denotes the degree of a spherical harmonic, N represents the Ambisonics order, n = 0,...,N and m = -n,...,+n, so as to get coefficients vector data d(k) representing coded directional information of N Ambisonics signals for every sample time instant k;
    • processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
      1. a) combining said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);
      2. b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A(k), said parameter Norm and said order N value;
      3. c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
      4. d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
    • in case a processing took place in two or more of said paths, multiplexing the corresponding data;
    • output of data frames including said provided data and values.
  • In principle the inventive encoder apparatus is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:
    • means being adapted for multiplying S input signals x(k), which are received from a microphone array including M microphones and/or from one or more virtual sound sources, with a matrix Ψ,
      Ψ = \begin{pmatrix} Y_0^0(Ω_0) & Y_0^0(Ω_1) & \cdots & Y_0^0(Ω_{S-1}) \\ Y_1^{-1}(Ω_0) & Y_1^{-1}(Ω_1) & \cdots & Y_1^{-1}(Ω_{S-1}) \\ Y_1^0(Ω_0) & Y_1^0(Ω_1) & \cdots & Y_1^0(Ω_{S-1}) \\ \vdots & & & \vdots \\ Y_N^{+N}(Ω_0) & \cdots & \cdots & Y_N^{+N}(Ω_{S-1}) \end{pmatrix} ,
      wherein the matrix elements Y_n^m(Ω_s) represent the spherical harmonics of all currently used directions Ω_0,...,Ω_{S-1}, index m denotes the order, index n denotes the degree of a spherical harmonic, N represents the Ambisonics order, n = 0,...,N and m = -n,...,+n, so as to get coefficients vector data d(k) representing coded directional information of N Ambisonics signals for every sample time instant k;
    • means being adapted for processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
      1. a) combining said coefficients vector data d(k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);
      2. b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;
      3. c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
      4. d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
    • a multiplexer means for multiplexing the corresponding data in case a processing took place in two or more of said paths, which multiplexer means provide data frames including said provided data and values.
  • In principle, the inventive decoding method is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said method including the steps:
    • parsing the incoming encoded data, determining the type or types a) to d) of said paths used for said encoding and providing the further data required for a decoding according to the encoding path type or types;
    • performing a corresponding decoding processing for one or two or more of the paths a) to d):
      1. a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
        and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      2. b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      3. c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      4. d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;
    • in case a processing took place in two or more of said paths, multiplexing the corresponding data, wherein the selected path or paths are determined based on parameter Norm, order value N and said Plane Waves flag;
    • decoding said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array.
  • In principle the inventive decoder apparatus is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said apparatus including:
    • means being adapted for parsing the incoming encoded data, and for determining the type or types a) to d) of said paths used for said encoding and for providing the further data required for a decoding according to the encoding path type or types;
    • means being adapted for performing a corresponding decoding processing for one or two or more of the paths a) to d):
      1. a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
        and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A'(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      2. b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      3. c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
        and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
      4. d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;
    • multiplexing means which, in case a processing took place in two or more of said paths, select the corresponding data to be combined, based on parameter Norm, order value N and said Plane Waves flag;
    • decoding means which decode said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array.
  • Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
  • Drawings
  • Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
  • Fig. 1
    Known RTP header format;
    Fig. 2
    Known extended RTP header format encapsulating DPX data, audio data or metadata;
    Fig. 3
    Ambisonics encoder facilitating different applications at production side before Ambisonics coefficients and metadata are transmitted;
    Fig. 4
    Ambisonics decoder facilitating different applications at reproduction side following reception of Ambisonics coefficients and metadata;
    Fig. 5
    RTP payload header extension for Ambisonics data according to the invention;
    Fig. 6
    General Ambisonics data header;
    Fig. 7
    Individual Ambisonics data header;
    Fig. 8
    Ambisonics metadata;
    Fig. 9
    Ambisonics receiver parser.
    Exemplary embodiments
  • At first, different scenarios for sound recording or production as well as for reproduction are considered in order to derive the inventive Ethernet/IP based streaming data format. The description of these scenarios is based at production side on an Ambisonics encoder (AE) and at reproduction side on an Ambisonics decoder (AD).
  • In an Ambisonics encoder as shown in Fig. 3 there are two different kinds of possible input signals:
    • a microphone array 31 including m microphones, i.e. real sound sources;
    • v virtual sources 32, i.e. synthetic sounds.
  • For an HOA description of a source not only the time-dependent source signal s(t) is required but also its position, which may move around and is therefore time-dependent, too. The source position can be described by its spherical coordinates, i.e. the radius rS from the origin to the source and the angles (ΘS, ΦS) = ΩS, where ΘS denotes the inclination and ΦS the azimuth angle in the x,y plane.
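  • Such spherical source coordinates can be obtained from a Cartesian position as in this small sketch (illustrative only; the function name is ours):

```python
import math

def to_spherical(x: float, y: float, z: float):
    """Convert a Cartesian source position to the spherical coordinates used
    in the text: radius r_S, inclination Theta_S (angle measured from the
    z axis) and azimuth Phi_S (angle in the x,y plane)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0  # inclination
    phi = math.atan2(y, x)                      # azimuth
    return r, theta, phi

# A source 2 m in front of the listener on the x axis lies in the
# horizontal plane: inclination ~pi/2, azimuth 0.
r, theta, phi = to_spherical(2.0, 0.0, 0.0)
print(r, theta, phi)
```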
  • In a first step or multiplier 33, all S source signals x(k) at each sample time kT, i.e. virtual single sources as well as microphone array sources, are multiplied with the matrix Ψ defined in Eq.(1).
  • Matrix Ψ with O rows and S columns performs a direction coding, because Ψ contains the spherical harmonics Y_n^m(Ω_s) of all currently used directions Ω_s, wherein the superscript index m denotes the order and the subscript index n denotes the degree of a spherical harmonic (note: in connection with microphones the index m refers to the running number of a microphone). If N represents the Ambisonics order, the index n has values in the range 0,...,N, and for each n the index m runs from -n to +n, i.e. over all rows m is running from -N to +N.
    Ψ = \begin{pmatrix} Y_0^0(Ω_0) & Y_0^0(Ω_1) & \cdots & Y_0^0(Ω_{S-1}) \\ Y_1^{-1}(Ω_0) & Y_1^{-1}(Ω_1) & \cdots & Y_1^{-1}(Ω_{S-1}) \\ Y_1^0(Ω_0) & Y_1^0(Ω_1) & \cdots & Y_1^0(Ω_{S-1}) \\ \vdots & & & \vdots \\ Y_N^{+N}(Ω_0) & \cdots & \cdots & Y_N^{+N}(Ω_{S-1}) \end{pmatrix}   (1)
    More details regarding indices n and m (m for order) are explained below in connection with Table 1. Instead of this specific format of matrix Ψ, any other equivalent representation of that matrix can be used.
  • Matrix Ψ is used to output a vector d(k) of N Ambisonics signals for every sample time instant k, as defined in Eq. (2) and Eq. (3):
    \begin{pmatrix} d_0^0(k) \\ d_1^{-1}(k) \\ d_1^0(k) \\ \vdots \\ d_N^{+N}(k) \end{pmatrix} = \begin{pmatrix} Y_0^0(Ω_0) & Y_0^0(Ω_1) & \cdots & Y_0^0(Ω_{S-1}) \\ Y_1^{-1}(Ω_0) & Y_1^{-1}(Ω_1) & \cdots & Y_1^{-1}(Ω_{S-1}) \\ Y_1^0(Ω_0) & Y_1^0(Ω_1) & \cdots & Y_1^0(Ω_{S-1}) \\ \vdots & & & \vdots \\ Y_N^{+N}(Ω_0) & \cdots & \cdots & Y_N^{+N}(Ω_{S-1}) \end{pmatrix} \begin{pmatrix} x_0(k) \\ x_1(k) \\ \vdots \\ x_{S-1}(k) \end{pmatrix}   (2)
    d(k) = Ψ x(k)   (3)
    These signals represent the complete sound field description that has to be transmitted to the reproduction side. Vector d(k) contains the directional information only. However, the distances of all sources over a specific frequency range are to be considered, too, and this frequency behaviour or dependency is non-linear. Therefore additional filters 341, 342 and 343 are required, which can be implemented at encoder or at decoder side. Especially for HOA, plane wave processing is sometimes not sufficient because it does not consider frequency dependencies. Therefore, a more general processing will consider sources and sinks not only with plane waves but also with spherical waves. Both wave forms require additional steps or stages that use different factors depending on the radius r and the wave number k_ω, where
    k_ω = ω/c = 2π/(T·c) .
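    The direction coding of Eqs. (1)-(3) can be sketched for the smallest HOA case N=1 (O=4 channels). The real-valued, SN3D-normalised spherical harmonics convention used below is an assumption made for illustration; the actual normalisation is carried by the parameter Norm and is not fixed here:

```python
import math

def sh_first_order(theta: float, phi: float):
    """Real spherical harmonics up to order N=1 for direction (theta, phi),
    in one common SN3D-normalised convention (an assumption; the patent
    leaves the normalisation to the parameter 'Norm')."""
    st = math.sin(theta)
    return [1.0,                    # Y_0^0
            st * math.sin(phi),     # Y_1^-1
            math.cos(theta),        # Y_1^0
            st * math.cos(phi)]     # Y_1^+1

def encode(directions, x):
    """d(k) = Psi * x(k): Psi holds one column of spherical harmonics per
    source direction; the result is the O = (N+1)^2 = 4 Ambisonics signals
    for one sample time instant k."""
    psi_cols = [sh_first_order(t, p) for (t, p) in directions]
    return [sum(col[row] * xs for col, xs in zip(psi_cols, x))
            for row in range(4)]

# One unit-amplitude source straight ahead on the x axis
# (inclination pi/2, azimuth 0): d[0] and d[3] are 1,
# d[1] and d[2] are (numerically) zero.
d = encode([(math.pi / 2, 0.0)], [1.0])
print(d)
```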
  • The pressure of a sound field p(r,Θ,Φ,k_ω) can be calculated as follows:
    p(r,Θ,Φ,k_ω) = \sum_{n=0}^{\infty} \sum_{m=-n}^{+n} A_n^m(k_ω) j_n(k_ω r) Y_n^m(Θ,Φ) ,
    where j_n(k_ω r) denotes the spherical Bessel function of the first kind, which depends on the product of wave number k_ω and radius r. In the case of plane waves from any direction Ω_s, the coefficient A_n^m(k_ω), i.e. the Ambisonics signal, can be calculated independently of the frequency:
    A_n^m(k_ω) = A_n^m = 4π i^n d_n^m .
    This is not the case for spherical waves. Here, the coefficients A_n^m(k_ω) depend on the frequency:
    A_n^m(k_ω) = -i k_ω h_n(k_ω r) d_n^m ,
    where h_n(k_ω r) denotes the spherical Hankel function of the first kind.
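  • The different frequency behaviour of the two coefficient types can be sketched numerically for the lowest order n=0 (illustrative Python; the function names are ours and only order 0 is implemented):

```python
import math

def spherical_j0(x): return math.sin(x) / x       # spherical Bessel, 1st kind
def spherical_y0(x): return -math.cos(x) / x      # spherical Neumann

def spherical_h0(x):
    """Spherical Hankel function of the first kind, order 0: h0 = j0 + i*y0."""
    return complex(spherical_j0(x), spherical_y0(x))

def coeff_plane(n, d_nm):
    """Plane-wave coefficient A_n^m = 4*pi*i^n*d_n^m: frequency independent."""
    return 4.0 * math.pi * (1j ** n) * d_nm

def coeff_spherical(k_w, r, d_nm):
    """Spherical-wave coefficient for n = 0:
    A_0^0(k_w) = -i * k_w * h_0(k_w * r) * d_0^0, which depends on k_w,
    i.e. on frequency -- the reason the array response filters are needed."""
    return -1j * k_w * spherical_h0(k_w * r) * d_nm

k_w = 2 * math.pi * 1000 / 343.0  # wave number at 1 kHz, c = 343 m/s assumed
print(abs(coeff_plane(0, 1.0)))             # 4*pi at every frequency
print(abs(coeff_spherical(k_w, 2.0, 1.0)))  # depends on k_w and r
```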
  • All these dependencies lead to the following four cases that are to be considered for an extended transmission of Ambisonics coefficients based on RTP. Fig. 3 shows a block diagram of an Ambisonics encoder for these four cases at production side. The required functions are represented by corresponding steps or stages in front of the transmission. All processing steps are clocked by a frequency that is made in stage 38 synchronous with the sample frequency 1/T. A controller 37 receives a mode selection signal and the value of order N, and controls an optional multiplexer 36 that receives the filter responses and the output signal of multiplier 33, and outputs the inventive data structure frames 39. Multiplier 33 represents a directional encoder providing corresponding coefficients and outputs the unfiltered vector data d (k), the order N value, and parameter Norm.
  • Case 1:
  • An array response filter 42 ('Filter 1' in Fig. 4) only for the microphone sources data can be arranged at decoder side. The unfiltered vector data d (k), the order N value, and parameter Norm are assembled in a combiner 340 with radii data RS (t), and are fed to an optional multiplexer 36. Radii data RS (t) represent the distances of the audio sources of the S input signals x(k), and refer to microphones as well as to artificially generated virtual sound sources.
  • Case 2:
  • The coefficients vector data d (k) pass through an array response filter 341 for the microphone sources (filter 2). The filtering compensates for the microphone-array response and is based on Bessel or Hankel functions. Basically, the signals of the output vector d (k) are filtered; the other inputs serve as parameters for the filter, e.g. parameter R is used in the term k*r. The filtering is relevant only for microphones, which have the individual radius Rm. Such radii are taken into consideration in the term k*r of the Bessel or Hankel functions. Normally, the amplitude response of the filter starts with a lowpass characteristic but increases for higher frequencies. The filtering is performed in dependency from the Ambisonics order N, the degree n and the radii Rm values, so as to compensate for non-linear frequency dependency. A subsequent normalisation step or stage 351 for spherical waves data provides filtered coefficients A (k). It is assumed that there is also a corresponding filter at reproduction side (filter 431 in Fig. 4). The filtered and normalised coefficients A (k), parameter Norm and the order N value are fed to multiplexer 36.
  • Case 3:
  • The coefficients vector data d (k) pass through an array response filter 342 for the microphone sources (filter 3). The filtering is performed in dependency from said Ambisonics order N, said order n, the radii Rm values and a radius Rref value representing the average radius Rref of the loudspeakers at decoder side as described in the below section "Radius Rref (RREF)", so as to compensate for non-linear frequency dependency. In case microphone signals are used, a filter for spherical waves data is also arranged at reproduction side. Then the average radius Rref of the loudspeakers has to be considered already in filter 342. A subsequent normalisation step or stage 352 for spherical waves data provides filtered coefficients A (k). Step/stage 352 can include a distance coding like that described in connection with Fig. 4. The filtered coefficients A (k) from step/stage 352, parameter Norm, the order N value and radius value Rref are fed to multiplexer 36.
  • Case 4:
  • The coefficients vector data d (k) pass through an array response filter 343 for the microphone sources (filter 4). The filtering is performed in dependency from the Ambisonics order N, the radii Rm values and a Plane Wave parameter. A subsequent normalisation step or stage 353 for plane waves data provides parameter Norm, the order N value and a flag for Plane Wave to multiplexer 36.
  • The Ambisonics encoder can code the output signals 361 in any one of these paths, in any two of these paths, or in more than two of these paths.
    The normalisation steps or stages 351 to 353 can use a normalisation or scaling as described below in section "Ambisonics Normalisation/Scaling Format (ANSF)".
  • Following transmission of the values mentioned above, e.g. via an Ethernet connection, at reproduction side the Ambisonics decoder depicted in Fig. 4 parses the incoming data structures in a parser 41 in order to detect the case type and to provide the data for performing the appropriate functions. An example of such a parser is disclosed in WO 2009/106637 A1 .
  • Case 1:
  • Unfiltered vector data d (k), order value N, parameter Norm and the radii data RS (t) are parsed. These values pass through an array response filter 42 (Filter 1) for filtering (a filtering as described in connection with Fig. 3) the received d (k) data under consideration of all radii RS (t). The resulting filtered coefficients A (k) are distance coded (DC) in a distance coding step or stage 431 for all loudspeaker radii RLS and order N, and pass thereafter together with loudspeaker direction values Ωl (representing the directions of the LS loudspeakers 46), value N and parameter Norm through an optional multiplexer 44 to a panning or pseudo inverse step or stage 45. Distance coding means taking into account Bessel or Hankel functions with a radius parameter in the term k*r for plane or spherical waves. Examples of distance coding are published in M.A. Poletti, "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics", J. Audio Eng. Soc., vol.53, no.11, November 2005, e.g. in equations (31) and (32), and in J. Daniel, "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", 23rd AES Intl. Conf., Copenhagen, Denmark, 23-25 May 2003.
  • Case 2:
  • Filtered coefficients A (k), parameter Norm and order value N are parsed. The filtered coefficients A (k) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii RLS and order N, and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45. Spherical waves on AE and AD sides are assumed.
  • Case 3:
  • Filtered coefficients A(k), order value N, parameter Norm and radius value Rref are parsed. The filtered coefficients A (k) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii RLS and order N under consideration of radius Rref , and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45. Spherical waves on AE and AD sides are assumed.
  • Case 4:
  • Filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves are parsed. The filtered coefficients A (k) together with loudspeaker direction values Ω l , value N and parameter Norm pass through multiplexer 44 to the panning or pseudo inverse step or stage 45. Plane waves on AE and AD sides are assumed.
  • Based on parameter Norm, order value N and the Plane Waves flag, a mode selector 47 selects in multiplexer 44 the corresponding path or paths a) to d) which was or were used at encoder side. Decoder 45, which represents a panning or a mode matching operation including the pseudo inverse, inverts the matrix Ψ operation of the Ambisonics encoder in Fig. 3 and applies this operation to the filtered coefficients A(k) or the filtered and distance coded coefficients A'(k), respectively, in dependency from the parameter Norm, order value N and the loudspeaker direction values Ωl, and provides the L loudspeaker signals for a loudspeaker array 46. The matrix Ψ operation is inverted for cases 1-3 by w_l(k) = D·A'(k), and for case 4 by w_l(k) = D·A(k). Parser 41 also provides synchronisation information that is used for re-synchronisation of a clock 48.
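  • A much simplified sketch of such a pseudo-inverse (mode matching) decode follows, using 2D first-order circular harmonics instead of the full spherical harmonics and only two loudspeakers, so that D = (Psi^T Psi)^-1 Psi^T can be inverted by hand; all names are ours and this is not the patent's full decoder pipeline:

```python
import math

def circ_harmonics(phi):
    """2D (circular) harmonics up to order 1 for azimuth phi -- a simplified
    stand-in for the spherical harmonics columns of matrix Psi."""
    return [1.0, math.cos(phi), math.sin(phi)]

def decode_pinv(ls_phis, A):
    """Mode matching with the pseudo inverse: w = D*A with
    D = (Psi^T Psi)^-1 Psi^T, where Psi holds one harmonics column per
    loudspeaker direction. With 2 loudspeakers the 2x2 normal matrix
    Psi^T Psi can be inverted explicitly."""
    cols = [circ_harmonics(p) for p in ls_phis]            # columns of Psi
    g = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]
    pta = [sum(c * a for c, a in zip(col, A)) for col in cols]  # Psi^T * A
    return [ginv[i][0] * pta[0] + ginv[i][1] * pta[1] for i in range(2)]

# A source encoded exactly at loudspeaker 0's direction is reproduced by
# that loudspeaker alone: w ~ [1, 0].
phis = [math.pi / 4, -math.pi / 4]
w = decode_pinv(phis, circ_harmonics(phis[0]))
print(w)
```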
  • The invention specifies a packet-based streaming format for encapsulating spatial sound field descriptions based on Ambisonics into an extended real-time transport protocol, in particular RTP, for real-time streaming of spatial audio scenes. The focus is on a standalone spatial (2D/3D) audio real-time application, e.g. a transmission of a live concert or a live sport event via IP. This requires a specific spatial audio layer including time stamps and possibly synchronisation information. The Ambisonics real-time stream can be used together with an RTP layer. In addition, alternative RTP layers with or without extended headers are described below.
  • In general, for a spatial audio transmission a sound field description in Ambisonics can be used in which possible sound source positions are inherently encoded. An alternative is the transmission of the source signals together with their time-dependent or time-independent positions. A switching possibility between these two alternatives is provided, too, but the directly following section will focus on Ambisonics.
  • Extended Ambisonics streaming format (EASF)
  • Ethernet transmissions (e.g. via internet) are performed in data packets with a typical packet length called 'path MTU' of up to 1500 or 9000 bytes. In case Ambisonics sound fields are to be transmitted via Ethernet, such relatively small data packets are not large enough. Therefore, several packets can be combined into larger containers named 'frames'. Such a frame represents a dedicated time interval within which a typical number of packets is transmitted. For example, in 1080p video mode a frame contains 1080 data packets, each of which describes one line of a complete video frame. Especially for real-time applications, even for audio (where low latency and low packet loss are important), a transmission should be frame based.
  • Because Ambisonics supports a sound field description that is independent of positions but has an adaptable quality, different amounts of data per packet or frame are possible. However, the number of octets in a data packet shall always be the same within a frame, except for the last packet. In principle, the RTP sequence number is to be incremented with each packet.
  • With regard to Fig. 3 and Fig. 4, Case 1 requires a transmission of the time-dependent radii RS(t). This is an option if filter processing is to be performed in the decoder. However, the following section focuses on Cases 2-4, in which the filtered coefficients A(k) are transmitted. This allows a higher bandwidth because the transmission remains independent of all source positions, i.e. it is better suited for Ambisonics.
  • For standalone audio transmission, the protocol contains the following header data structure.
    A standard RTP header (cf. Fig. 1) containing the following bit fields:
    Version (V) - 2 bit
    RTP Version (default is V=2)
    Padding (P) - 1 bit
    If set, a data packet will contain several additional padding bytes. These are always located at the end following the payload. The last padding byte contains a count of how many padding bytes are to be ignored.
    Extension (X) - 1 bit
    If set, the fixed header is followed by exactly one header extension.
    CSRC count (CC) - 4 bit
    The number of contributing source identifiers, following the fixed header.
    Marker (M) - 1 bit
    In general, the marker bit can be defined by a profile. Here, it signalises the end of a frame, i.e. it is set for the last data packet. For other packets it must be cleared.
    Payload Type (PT) - 7 bits
    The payload type is defined for an Audio standalone transmission as EASF. For a combined transmission with uncompressed video the film format is chosen, e.g. DPX.
    Sequence Number - 16 bits
    The low-order 16 bits of the sequence number. It increments by one for each RTP data packet sent, and may be used by the receiver to detect packet loss and to restore the packet sequence. The initial value of the sequence number is random (i.e. unpredictable) in order to make known-plaintext attacks on encryption more difficult. The standard 16-bit sequence number is augmented by another 16 bits in the payload header in order to avoid wrap-around problems when operating at high data rates.
    Timestamp - 32 bits
    The timestamp denotes the sampling instant of the frame to which the RTP packet belongs. Packets belonging to the same frame must have the same timestamp.
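For illustration, the fixed-header fields listed above can be extracted with a few shift-and-mask operations. This is a minimal Python sketch, not part of the specification: the function and key names are chosen here, and the SSRC field (part of the standard RTP header of Fig. 1, although not listed above) is included for completeness.

```python
import struct

def parse_rtp_fixed_header(data: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (V/P/X/CC, M/PT, sequence, timestamp, SSRC)."""
    if len(data) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version":    (b0 >> 6) & 0x03,  # V - 2 bit (default 2)
        "padding":    (b0 >> 5) & 0x01,  # P - 1 bit
        "extension":  (b0 >> 4) & 0x01,  # X - 1 bit
        "csrc_count":  b0 & 0x0F,        # CC - 4 bit
        "marker":     (b1 >> 7) & 0x01,  # M - 1 bit, set for the last packet of a frame
        "payload_type": b1 & 0x7F,       # PT - 7 bit
        "sequence":   seq,               # 16 bit low-order sequence number
        "timestamp":  ts,                # 32 bit, identical for all packets of a frame
        "ssrc":       ssrc,              # synchronisation source identifier
    }
```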
    RTP payload header extension
    According to the invention, the fields of the known RTP header keep their usual meaning, but that header is amended as follows:
    RTP Payload Frame Status (PLFS) - 2 bit
    The frame status describes which type of data will follow the extended RTP header in the payload block:
    PLFS code Payload type
    00 Ambisonics coefficients
    01 Frame end (+ Ambisonics coefficients)
    10 Frame begin (+ Metadata)
    11 Metadata
    I.e., in the first packet of a frame, instead of audio data, additional metadata can be transmitted. In case of Ambisonics transmission, the metadata contains source and Ambisonics encoder related information (production side information) required for the decoding process.
  • Time Code/Sync Frequency (TCSF) - 30 bit unsigned integer
    The following SMPTE time code or the synchronisation is based on a specific clock frequency, the Time Code/Sync Frequency TCSF. In order to support a large range of frequencies, TCSF is defined as a 30 bit integer field. The value is given in Hz and covers a frequency range from 0 Hz to 2^30-1 Hz (about 1073.7 MHz), wherein a value of 0 Hz signals that no time code is available.
  • Audio Source Type (AST) - 2 bit
  • The transmission of audio content is possible in different modes: in the form of Ambisonics sound field descriptions, or as sampled audio sources including their positions. The following table shows the AST values and their meaning.
    AST code Possible sources
    00 Sound field
    01 Sound sources + fixed positions
    10 Sound sources + time dependent positions
    11 Reserved
  • The selection in data field AST facilitates not only a separation within Ambisonics (cf. the example provided below in connection with Fig. 9) but also the parallel transmission of differently encoded audio source signals (Ambisonics and/or PCM data + position data), i.e. the inventive protocol can be complemented e.g. for PCM data. The below-described SMPTE Time Code/Clock Sync Info (STCSI) facilitates the temporally correct assignment of the audio signal sources.
  • Audio Dimension (ADIM) - 1 bit
  • The dimension in case of existing and extendable formats is described as follows:
    ADIM code Dimension
    0 2D
    1 3D
  • Extended Ambisonics Header (XAH) - 1 bit
  • If XAH is cleared, the general Ambisonics header is transmitted only in the first data packet of a frame and the individual Ambisonics header is transmitted in all other data packets.
  • If XAH is set, the general Ambisonics header shall also be available in every data packet in front of the individual Ambisonics header. This mode enables a modification of the parameters in each data packet, i.e. in real-time. It can be useful for real-time applications where no or only small buffers are available. However, this mode decreases the available bandwidth.
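The XAH rule described above can be summarised in a small decision helper. This is an illustrative sketch only; the function and the returned labels are hypothetical names, not part of the specification.

```python
def headers_in_packet(xah: int, is_first_packet: bool) -> list:
    """Which Ambisonics headers precede the payload of a data packet,
    per the XAH rule: with XAH set, the general header is repeated in
    every packet in front of the individual header; with XAH cleared,
    the general header appears only in the frame's first packet."""
    if xah:
        return ["general", "individual"]
    return ["general"] if is_first_packet else ["individual"]
```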
  • Different sources can generate audio signals at the same time. Known protocols are based on a separate transmission of the sound sources, i.e. every data frame refers to a single temporal section in which, depending on the sampling frequency, several samples can be contained. Therefore, in known protocols, different source signals occurring at the same time instant will use the same time stamp and the same frame number. This poses no problem for offline processing, i.e. non-real-time processing: the transmitted data are buffered and assembled later on. However, this does not work for real-time processing, in which a small latency is demanded. In the inventive protocol, the data field XAH allows the header to be carried along continuously, and the parser 41 in Fig. 4 can switch back and forth block-by-block (or Ethernet packet-by-packet or frame-by-frame) between different audio source types.
  • Distinguishing between general header and individual header facilitates a real-time adaptation.
  • Selector Time Code or Sync (STS) - 1 bit
  • If STS is cleared, the value in the 24 bit field STCSI (see below) represents the SMPTE time code. If STS is set, field STCSI contains user-specific synchronisation information.
  • Rsvrd - 3 bit
  • Reserved bits for future applications concerning the SMPTE time code or clock synchronisation.
  • SMPTE Time Code/Clock Sync Info (STCSI) - 24 bit
  • Identifies the SMPTE time code (hh:mm:ss:frfr = 6:6:6:6 bit), or synchronisation information for the local clocks of each source and sink. That synchronisation information format is user-dependent. It appears that this kind of synchronisation has not been used before for Ambisonics and video synchronisation.
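The 6:6:6:6 bit packing of the STCSI time code can be sketched as follows. The helper names are chosen here for illustration; only the bit layout (hours, minutes, seconds, frames at 6 bits each) is taken from the text.

```python
def pack_stcsi(hh: int, mm: int, ss: int, fr: int) -> int:
    """Pack an SMPTE time code into the 24-bit STCSI field (6 bits per component)."""
    for v in (hh, mm, ss, fr):
        if not 0 <= v < 64:
            raise ValueError("each component must fit in 6 bits")
    return (hh << 18) | (mm << 12) | (ss << 6) | fr

def unpack_stcsi(word: int) -> tuple:
    """Recover (hh, mm, ss, fr) from a 24-bit STCSI word."""
    return ((word >> 18) & 0x3F, (word >> 12) & 0x3F, (word >> 6) & 0x3F, word & 0x3F)
```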
  • Packet Offset (PAO) - 64 bit
  • In a current frame the packet offset describes the distance in bytes from the first payload octet of the first data packet in the frame to the first payload octet of the current data packet. PAO(HIGH) represents the 32 MSBs and PAO(LOW) represents the 32 LSBs.
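Reassembling the 64 bit offset from its two transmitted 32 bit halves is a single shift-and-or; a minimal sketch (function name chosen here for illustration):

```python
def packet_offset(pao_high: int, pao_low: int) -> int:
    """Combine PAO(HIGH) (32 MSBs) and PAO(LOW) (32 LSBs) into the 64-bit offset."""
    return (pao_high << 32) | pao_low
```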
  • The above known and extended RTP header data are depicted in Fig. 5. PAO(LOW) is followed by the Ambisonics payload data.
  • Ambisonics payload layer
  • Ambisonics payload data and Ambisonics header data shall be fragmented such that the resulting RTP data packet is smaller than the 'path MTU' mentioned above. In case of 10GE transmission the path MTU is a 'jumbo frame' of e.g. 9000 bytes. There are two types of Ambisonics headers. A small individual Ambisonics header is sent in front of each data packet. A general header contains source and encoder related information that can be useful for the Ambisonics decoder. It contains information that is valid for all data packets within a frame, and for small frames and/or data packets it can be sent once at the beginning of a frame. Especially for real-time applications where the packet information is changing frequently, it can be advantageous to send the general header with each data packet.
  • General Ambisonics header (only in the first data packet if XAH=0)
    Ambisonics Endianness (AEN) - 1 bit
  • The endianness used for the transmitted Ambisonics data.
    AEN code Endianness
    0 Big Endian
    1 Little Endian
  • Ambisonics Header Length (AHL) - 8 bit
  • Identifies the length of the complete header in byte.
  • Ambisonics Wave Type (AWT) - 1 bit
  • Traditionally, Ambisonics assumes that all audio sources and loudspeakers provide plane waves for modelling the sound field. A typical example is the B-format. However, an extended Ambisonics sound field description with higher quality requires also a modelling with spherical waves. Therefore, the AWT field considers both possibilities.
    AWT code Wave type
    0 Plane wave
    1 Spherical wave
  • Ambisonics Order Type (AOT) - 2 bit
  • Identifies the sequence in which the Ambisonics coefficients are transmitted. Up to 4 order types can be addressed. The different formats depend on the order and indexing in Eq. (1), i.e. on how the spherical harmonics are ordered in a column of W. The existing Ambisonics B-format uses a specific sequence of Ambisonics coefficients according to Table 1, wherein W to Q denote the known B-Format channels. In case of 3D the coefficients are transmitted from top to bottom of Table 1.
    E.g. up to degree n=2, the sequence will be WXYZRSTUV.
    AOT code Format
    00 B-Format order
    01 numerical upward
    10 numerical downward
    11 Reserved
    Table 1
    Degree n Order m Channel
    0 0 W
    1 1 X
    1 -1 Y
    1 0 Z
    2 0 R
    2 1 S
    2 -1 T
    2 2 U
    2 -2 V
    3 0 K
    3 1 L
    3 -1 M
    3 2 N
    3 -2 O
    3 3 P
    3 -3 Q
  • As an alternative, the sequence of each matrix column in Eq. (1) from top to bottom represents the numerical upward order type. The degree value always starts with 0 and runs up to Ambisonics order N. For each degree n, the sequence starts with the lowest order -n and runs up to order +n. The downward type uses the reversed order within each degree.
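The three defined order types can be sketched as index generators. This is an illustrative Python sketch; the helper name and the (degree, order) tuple representation are choices made here, not part of the specification.

```python
B_FORMAT_CHANNELS = {  # Table 1: (degree n, order m) -> B-Format channel
    (0, 0): 'W', (1, 1): 'X', (1, -1): 'Y', (1, 0): 'Z',
    (2, 0): 'R', (2, 1): 'S', (2, -1): 'T', (2, 2): 'U', (2, -2): 'V',
    (3, 0): 'K', (3, 1): 'L', (3, -1): 'M', (3, 2): 'N',
    (3, -2): 'O', (3, 3): 'P', (3, -3): 'Q',
}

def coefficient_sequence(aot_code: int, ambisonics_order: int) -> list:
    """(degree, order) transmission sequence for the defined AOT codes."""
    if aot_code == 0:   # B-Format order, per Table 1 (defined up to degree 3)
        return [nm for nm in B_FORMAT_CHANNELS if nm[0] <= ambisonics_order]
    if aot_code == 1:   # numerical upward: m runs from -n to +n per degree
        return [(n, m) for n in range(ambisonics_order + 1)
                       for m in range(-n, n + 1)]
    if aot_code == 2:   # numerical downward: m runs from +n to -n per degree
        return [(n, m) for n in range(ambisonics_order + 1)
                       for m in range(n, -n - 1, -1)]
    raise ValueError("AOT code 3 is reserved")
```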
  • Ambisonics Horizontal Order (AHO) - 8 bit
    Ambisonics Vertical Order (AVO) - 8 bit
  • The Ambisonics order describes the quality of the Ambisonics encoding and decoding via Ψ. An order up to 255 should be sufficient. According to the audio dimension, the order is distinguished in horizontal and vertical direction.
    In case of 2D, only AHO has a value greater than '0'. A mixed order can have different AHO and AVO values.
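As background, under the usual Ambisonics conventions (an assumption based on standard practice, not stated explicitly in the text), the coefficient count follows directly from the order and the dimension: (N+1)^2 channels for a full 3D representation of order N, and 2N+1 for a 2D (horizontal-only) one.

```python
def num_coefficients(adim: int, order: int) -> int:
    """Channel count for a full-order representation (assumption from
    standard Ambisonics conventions): (N+1)^2 for 3D (ADIM=1),
    2N+1 for 2D (ADIM=0)."""
    return (order + 1) ** 2 if adim else 2 * order + 1
```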
  • RSVRD (Order) - 2x2 bit
  • For possible extension of order related issues, these reserved bits are considered in front of AHO and AVO.
  • Ambisonics Normalisation/Scaling Format (ANSF) - 3 bit
    Identifies different normalisation formats typically used for Ambisonics. The normalisation corresponds to the orthogonality relationship between Y_n^m and its complex conjugate Y_n^m*. Furthermore there are additional normalisation principles, e.g. Furse-Malham. The Furse-Malham formulation facilitates a normalisation of the coefficients to get maximum values of ±1, which yields an optimal dynamic range.
    In case of dedicated scaling the scaling factors are fixed over one frame. The scaling factors will be transmitted only once in front of the Ambisonics coefficients.
    ANSF code Format
    000 Orthonormal
    001 Schmidt semi-normalised
    010 4π normalised
    011 Unnormalised
    100 Furse-Malham
    101 Dedicated scaling
    11x Reserved
  • Radius Rref (RREF) - 16 bit
  • The reference radius Rref value of the loudspeakers in mm is required in case of spherical waves. The maximum radius depends on the acoustic wave length λ, which can be calculated from the audible frequencies f (f_LOW = 20 Hz to f_HIGH = 20 kHz) and the speed of sound c = 340 m/s. Thus for the radius Rref, values from 17000 mm down to 17 mm are required, and a word length of 16 bit is sufficient for that.
  • Ambisonics Sample Format (ASF) - 4 bit
  • This code defines the word length as well as the format (integer/floating point) of the transmitted Ambisonics coefficients A (k). The sample format enables an adaptation to different value ranges. In the following table nine sample formats are predefined:
    ASF code Format
    0000 Unsigned integer 8 bit
    0001 Signed integer 8 bit
    0010 Signed integer 16 bit
    0011 Signed integer 24 bit
    0100 Signed integer 32 bit
    0101 Signed integer 64 bit
    0110 Float 32 bit (binary single prec.)
    0111 Float 64 bit (binary double prec.)
    1000 Float 128 bit (binary quad prec.)
    1001-1111 Reserved
  • Ambisonics Invalid Bits (AIB) - 5 bit
  • If ASF is specified as an integer format, the number AIB of invalid bits can mask the lowest bits within the ASF integer. AIB is coded as a 5 bit unsigned integer value, so that up to 31 bits can be marked as invalid. Valid bits start at the MSB. Note that the AIB value must be less than the ASF integer word length.
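Masking the AIB lowest (invalid) bits of an integer sample can be sketched as follows (hypothetical helper name; the rule that valid bits start at the MSB and AIB must be less than the word length is taken from the text):

```python
def mask_invalid_bits(sample: int, word_length: int, aib: int) -> int:
    """Clear the AIB lowest bits of an unsigned integer sample;
    valid bits start at the MSB."""
    if not 0 <= aib < word_length:
        raise ValueError("AIB must be less than the integer word length")
    # keep the top (word_length - aib) bits, zero the rest
    mask = ((1 << word_length) - 1) ^ ((1 << aib) - 1)
    return sample & mask
```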
  • Sample Rate (SR) - 32 bit
  • The rate at which the input data xi (k) are sampled. The value in Hz is coded as an unsigned integer.
  • Frame Size Mode (FSM) - 1 bit
  • If FSM is cleared, the following 31 bits for FS represent the frame size in bytes. If FSM is set, FS represents the total number of data packets in the actual frame.
  • Frame Size (FS) - 31 bit
  • The frame size number FS is to be interpreted in view of the FSM flag's value. Depending on the application, the frame size can vary from frame to frame.
  • As mentioned above, a frame represents a unit of several data packets. It is assumed that for uncompressed data all packets except the last one have the same length. Then the frame size in bytes can be calculated as: #bytes per frame = (FS-1)*packet size + last packet size.
  • Basic Ethernet applications normally use MTU sizes of 1500 bytes. Modern 10 Gigabit Ethernet applications consider larger MTUs (e.g. 'jumbo frames' with 9000 to 16000 bytes). To enable data sets larger than 2^32 bytes (4 GB), the frame size should be specified as a number of data packets. I.e., if a data packet contains 9000 bytes, the maximum frame size would be about 19 Tbyte.
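The FSM/FS interpretation and the byte-count formula above can be sketched as follows (hypothetical helper name):

```python
def frame_size_bytes(fsm: int, fs: int, packet_size: int, last_packet_size: int) -> int:
    """Frame size in bytes from the FSM/FS fields: with FSM cleared, FS
    already is the byte count; with FSM set, FS counts packets and the
    byte total follows (FS-1)*packet size + last packet size."""
    if fsm == 0:
        return fs
    return (fs - 1) * packet_size + last_packet_size
```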
  • The general Ambisonics data header in the Ambisonics payload data is depicted in Fig. 6. A 'frame' can contain several equal-length packets, wherein the last packet can have a different length that is described in the individual Ambisonics header. Every packet may use such a header for describing length values that differ from prior packet lengths.
  • Individual Ambisonics header
    Reserved (RSRVD) - 16 bit
  • The bits in front of APL are reserved. This enables an extension of the individual header, e.g. by packet related flags, and a 32 bit alignment for the following Ambisonics coefficients.
  • Ambisonics Packet Length (APL) - 16 bit
  • Defines the length in bytes of each individual data packet (bounded by the MTU). The maximum length is 65535.
  • This individual Ambisonics header is depicted in Fig. 7. If applied, the two data fields RSRVD and APL will follow data field FS in Fig. 6. APL contains the length of the following Ethernet packet which contains payload data (Ambisonics components).
  • Ambisonics payload data
  • As mentioned above, the payload data type is defined in the data field PLFS (RTP Payload Frame Status), cf. Fig. 5. Following the general Ambisonics header, and possibly the individual Ambisonics header, 'pure' Ambisonics data or 'pure' metadata can be arranged.
  • Ambisonics coefficients
  • Due to the time dependency of the input samples x(kT)=x(k) and of the directions and radii RS(t), it is important to perform the Ambisonics encoding and decoding with regard to the specific sample time kT, or even simpler at k.
  • However, when considering a protocol based transmission, the transmission processing operates in a sequential manner, i.e. at each transmission clock step (which is totally different from the sampling rate) only 32 or 64 bits of a data packet can be dealt with. The number of considered Ambisonics samples in one data packet is related to one concatenated sample time or to a group of concatenated sample times.
  • Normally, all Ambisonics coefficients have the same length across all data packets in a frame. However, if the general Ambisonics header is inserted in a normal data packet, the data parameters can be modified within a frame.
  • The following examples of payload data show different dimensions, orders, and Ambisonics coefficients based on the encoder/decoder cases 2 to 4 of Fig. 3. The first index x of A(x,y) describes the sequence number for a specific order, whereas the second index y stands for the sample time k in a data packet.
    Example 1: ADIM=1, AHO=AVO=3, ASF=2
    Figure imgb0017
    Figure imgb0018
    Example 2: ADIM=1, AHO=AVO=2, ASF=3, AIB=2
    Figure imgb0019
    Figure imgb0020
    Example 3: ADIM=1, AHO=AVO=2, ASF=4, AIB=7
    Figure imgb0021
    Figure imgb0022
    Example 4: ADIM=1, AHO=AVO=1, ASF=4
    Figure imgb0023
    Example 5: ADIM=1, AHO=AVO=1, ASF=7
    Figure imgb0024
    Figure imgb0025
  • Ambisonics metadata
  • If PLFS is set to '10' (binary), i.e. 2 (decimal), metadata are transmitted instead of Ambisonics coefficients. Different formats exist for metadata, of which some are considered below. Thus, in front of the concrete metadata content, a metadata type field defines the specific format, as depicted in Fig. 8. The first two data fields RSRVD and APL are like in Fig. 7.
  • Ambisonics Metadata Type (AMT) - 16 bit
  • The types SMPTE MXF and XML are pre-defined.
    AMT code Format
    0x00 SMPTE MXF
    0x80 XML
    0x01-0x7F Rsrvd
    0x81-0xFF Rsrvd
  • Rsrvd - 16 bit
  • Reserved bits for future applications concerning metadata.
  • This data field is followed by the specific metadata. If possible, the metadata descriptions should be kept simple in order to fit into a single metadata packet in the 'begin packet' of a frame. However, the packet length in bytes is the same as for Ambisonics coefficients. If the amount of metadata exceeds this packet length, the metadata have to be fragmented into several packets which shall be inserted between packets with Ambisonics coefficients. If the metadata amount in bytes in one packet is less than the regular packet length, the remaining packet bytes are to be padded with '0' or stuffing bits.
  • For channel coding purposes the encapsulated CRC word at the end of each Ethernet packet should be used.
  • At the production side as shown in Fig. 3, four different Cases exist, of which the above-mentioned data structure covers the three cases where A(k) data are transmitted (as opposed to the one case where d(k) data are transmitted). The question is how to detect the Ambisonics encoding/decoding mode at reproduction or receiver side. The Case chosen at production side can be derived in parser 41 in Fig. 4 from the bit fields RREF and AWT. The following table shows the values for RREF and AWT and their meaning:
    Mode Payload data
    2 filtered A(k), RREF=0, AWT=Spherical Wave
    3 filtered A(k), RREF≠0, AWT=Spherical Wave
    4 filtered A(k), AWT=B-format/Plane Wave (RREF is not used)
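The mode detection described by the table can be sketched as follows (a Python sketch; it assumes the wave-type values in the table refer to the AWT field defined earlier, and the function name is chosen here for illustration):

```python
def detect_mode(awt_spherical: bool, rref: int) -> int:
    """Derive the encoder Case (2-4) from the wave-type and RREF fields."""
    if not awt_spherical:   # B-format / plane wave: RREF is not used
        return 4
    return 3 if rref != 0 else 2
```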
    With regard to the specific structure in figures 3 and 4, the parser 41 of the Ambisonics decoder in Fig. 4 is shown in more detail in Fig. 9. For collecting corresponding data items from an Ambisonics data stream ADSTR, the parser can use registers REG and content addressable memories CAM. The content addressable memories CAM detect all protocol data which will lead to a decision about how the received data are to be processed in the following steps or stages, and the registers REG store information about the length of the payload data. The parser evaluates the header data in a hierarchical manner and can be implemented in hardware or software, according to any real-time requirements.
  • Example:
  • Several audio signals are generated and transmitted as spherical waves SPW or plane waves PW, e.g. the worldwide live broadcast of a concert in 3D format, wherein all receiving units are arranged in cinemas. In such a case the individual signals are to be transmitted separately so that a correct presentation can be facilitated. By a corresponding arrangement of the protocol (Ambisonics Wave Type AWT, described above) the parser can distinguish these and supply two separate 'distance coding' units with the corresponding data items. The inventive Ambisonics decoder depicted in Fig. 4 can process all these signals, whereas in the prior art several decoders would be required. I.e., considering the Ambisonics wave type facilitates the advantages described above.

Claims (8)

  1. Method for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:
    - receiving S input signals x (k) from a microphone array (31) including m microphones, and/or from one or more virtual sound sources (32);
    - multiplying (33) said input signals x (k) with a matrix Ψ,
      Ψ = [ Y_0^0(Ω_0)    Y_0^0(Ω_1)    ...  Y_0^0(Ω_{S-1})
            Y_1^{-1}(Ω_0) Y_1^{-1}(Ω_1) ...
            Y_1^0(Ω_0)    Y_1^0(Ω_1)    ...
            ...
            Y_N^{+N}(Ω_0) ...           Y_N^{+N}(Ω_{S-1}) ],
    wherein the matrix elements Y_n^m(Ω_s) represent the spherical harmonics of all currently used directions Ω_0,...,Ω_{S-1}, index m denotes the order, index n denotes the degree of a spherical harmonic, N represents the Ambisonics order, n = 0,...,N, and m = -n,...,+n,
    so as to get coefficients vector data d (k) representing coded directional information of N Ambisonics signals for every sample time instant k;
    - processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
    a) combining (340) said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said input signals x(k);
    b) based on spherical waves, array response filtering (341) said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising (351) for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;
    c) based on spherical waves, array response filtering (342) said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising (352) for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
    d) based on plane waves, array response filtering (343) said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising (353) for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
    - in case a processing took place in two or more of said paths, multiplexing (36) the corresponding data;
    - output (361) of data frames (39) including said provided data and values.
  2. Apparatus for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:
    - means (33) being adapted for multiplying S input signals x (k), which are received from a microphone array (31) including m microphones and/or from one or more virtual sound sources (32), with a matrix Ψ,
      Ψ = [ Y_0^0(Ω_0)    Y_0^0(Ω_1)    ...  Y_0^0(Ω_{S-1})
            Y_1^{-1}(Ω_0) Y_1^{-1}(Ω_1) ...
            Y_1^0(Ω_0)    Y_1^0(Ω_1)    ...
            ...
            Y_N^{+N}(Ω_0) ...           Y_N^{+N}(Ω_{S-1}) ],
    wherein the matrix elements Y_n^m(Ω_s) represent the spherical harmonics of all currently used directions Ω_0,...,Ω_{S-1}, index m denotes the order, index n denotes the degree of a spherical harmonic, N represents the Ambisonics order, n = 0,...,N, and m = -n,...,+n, so as to get coefficients vector data d (k) representing coded directional information of N Ambisonics signals for every sample time instant k;
    - means (340,341,351,342,352,343,353) being adapted for processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
    a) combining said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);
    b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;
    c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
    d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
    - a multiplexer means (36) for multiplexing the corresponding data in case a processing took place in two or more of said paths, which multiplexer means provide data frames (39) including said provided data and values.
  3. Method for decoding sound field data that were encoded according to claim 1 using one or two or more of said paths, said method including the steps:
    - parsing (41) the incoming encoded data, determining the type or types a) to d) of said paths used for said encoding and providing the further data required for a decoding according to the encoding path type or types;
    - performing a corresponding decoding processing for one or two or more of the paths a) to d):
    a) based on spherical waves, filtering (42) the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
    and distance coding (431) said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
    and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    b) based on spherical waves, distance coding (432) said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
    and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    c) based on spherical waves, distance coding (433) said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
    and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;
    - in case a processing took place in two or more of said paths, multiplexing (44) the corresponding data, wherein the selected (47) path or paths are determined based on parameter Norm, order value N and said Plane Waves flag;
    - decoding (45) said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array (46).
  4. Apparatus for decoding sound field data that were encoded according to claim 1 using one or two or more of said paths, said apparatus including:
    - means (41) being adapted for parsing the incoming encoded data, and for determining the type or types a) to d) of said paths used for said encoding and for providing the further data required for a decoding according to the encoding path type or types;
    - means (42,431,432,433) being adapted for performing a corresponding decoding processing for one or two or more of the paths a) to d) :
    a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k), and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
    and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
    and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals, and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;
    d) based on plane waves, providing said filtered coefficients A(k), order value N, parameter Norm and a flag for Plane Waves;
    - multiplexing means (44) which, in case a processing took place in two or more of said paths, select the corresponding data to be combined, based on parameter Norm, order value N and said Plane Waves flag;
    - decoding means (45) which decode said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array (46).
  5. Method according to claim 3, or apparatus according to claim 4, wherein said parser (41) includes registers (REG) and content addressable memories (CAM) for collecting data items from the decoder input data by evaluating header data in a hierarchical manner, and wherein said content addressable memories (CAM) detect all protocol data which will lead to a decision about how the received data are to be processed in the decoding, and said registers (REG) store data item length information and/or information about payload data.
  6. Method according to claim 5, or apparatus according to claim 5, wherein said parser (41) provides data for two or more individual audio signals by distinguishing Ambisonics plane wave and spherical wave types (AWT).
  7. Method according to one of claims 1, 3 and 5, or apparatus according to one of claims 2, 4 and 5, wherein said Ambisonics sound field data are transferred using Ethernet or Internet or a network protocol.
  8. Data structure for Ambisonics audio signal data which can be encoded according to claim 1, said data structure including:
    - a data field determining plane wave and spherical wave Ambisonics;
    - a data field determining the Ambisonics order type among B-Format order, numerical upward order and numerical downward order;
    - a data field determining the channel in dependency from the degree n and the order m;
    - a data field determining horizontal or vertical order of the coefficients in the Ambisonics matrix.
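The data structure of claim 8 can be illustrated with a short sketch. All field names and enum encodings below are assumptions for illustration only (the claim does not prescribe concrete values or field widths), and the channel-index formula shown is the common ACN convention n² + n + m, which is one possible realization of "determining the channel in dependency from the degree n and the order m":

```python
from dataclasses import dataclass
from enum import Enum

class WaveType(Enum):
    """Data field distinguishing plane wave and spherical wave Ambisonics."""
    PLANE = 0
    SPHERICAL = 1

class OrderType(Enum):
    """Data field for the Ambisonics order type."""
    B_FORMAT = 0
    NUMERICAL_UPWARD = 1
    NUMERICAL_DOWNWARD = 2

class CoeffOrientation(Enum):
    """Data field for horizontal or vertical ordering of the coefficients."""
    HORIZONTAL = 0
    VERTICAL = 1

def channel_index(n: int, m: int) -> int:
    """Channel from degree n and order m (ACN convention, one possible choice)."""
    return n * n + n + m

@dataclass
class HoaHeader:
    wave_type: WaveType
    order_type: OrderType
    ambisonics_order: int          # N; may be higher than three (HOA)
    coeff_orientation: CoeffOrientation

# Example: a 4th-order spherical-wave stream carries (N+1)^2 = 25 coefficient channels.
hdr = HoaHeader(WaveType.SPHERICAL, OrderType.NUMERICAL_UPWARD, 4,
                CoeffOrientation.HORIZONTAL)
num_channels = (hdr.ambisonics_order + 1) ** 2
```

With such a header, a decoder's parser can branch on the wave type and order type before interpreting the coefficient payload, in the manner described for the parser of claims 5 and 6.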
EP10306212A 2010-11-05 2010-11-05 Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three Withdrawn EP2451196A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP10306212A EP2451196A1 (en) 2010-11-05 2010-11-05 Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three

Publications (1)

Publication Number Publication Date
EP2451196A1 true EP2451196A1 (en) 2012-05-09

Family ID=43585582

Family Applications (1)

Application Number Title Priority Date Filing Date
EP10306212A Withdrawn EP2451196A1 (en) 2010-11-05 2010-11-05 Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three

Country Status (1)

Country Link
EP (1) EP2451196A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004047485A1 (en) 2002-11-21 2004-06-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio playback system and method for playing back an audio signal
EP1936908A1 (en) 2006-12-19 2008-06-25 Deutsche Thomson OHG Method, apparatus and data container for transferring high resolution audio/video data in a high speed IP network
WO2009106637A1 (en) 2008-02-28 2009-09-03 Thomson Licensing Hardware-based parser for packet-oriented protocols

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Spherical harmonics", 28 June 2011 (2011-06-28), XP002646194, Retrieved from the Internet <URL:http://en.wikipedia.org/wiki/Spherical_harmonics> [retrieved on 20110628] *
J.DANIEL: "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", AES 23RD INTERNATIONAL CONFERENCE, vol. 23, 23 May 2003 (2003-05-23), XP002647040 *
J.DANIEL: "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", AES 23RD INTL.CONF., vol. 23, 25 May 2003 (2003-05-25)
M.A.POLETTI: "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics", J.AUDIO ENG.SOC., vol. 53, no. 11, November 2005 (2005-11-01)
MICROSOFT, MULTIPLE CHANNEL AUDIO DATA AND WAVE FILES, 7 March 2007 (2007-03-07)

Cited By (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013545391A (en) * 2010-11-05 2013-12-19 トムソン ライセンシング Data structure for higher-order ambisonics audio data
US9241216B2 (en) 2010-11-05 2016-01-19 Thomson Licensing Data structure for higher order ambisonics audio data
TWI823073B (en) * 2012-05-14 2023-11-21 瑞典商杜比國際公司 Method and apparatus for compressing and decompressing a higher order ambisonics signal representation and non-transitory computer readable medium
US11234091B2 (en) 2012-05-14 2022-01-25 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation
US11792591B2 (en) 2012-05-14 2023-10-17 Dolby Laboratories Licensing Corporation Method and apparatus for compressing and decompressing a higher order Ambisonics signal representation
CN107071685A (en) * 2012-07-16 2017-08-18 杜比国际公司 The method and apparatus for audio playback is represented for rendering audio sound field
CN106658342A (en) * 2012-07-16 2017-05-10 杜比国际公司 Method and device for rendering an audio soundfield representation for audio playback
EP4284026A3 (en) * 2012-07-16 2024-02-21 Dolby International AB Method and device for rendering an audio soundfield representation
WO2014012945A1 (en) * 2012-07-16 2014-01-23 Thomson Licensing Method and device for rendering an audio soundfield representation for audio playback
US11743669B2 (en) 2012-07-16 2023-08-29 Dolby Laboratories Licensing Corporation Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield
US11451920B2 (en) 2012-07-16 2022-09-20 Dolby Laboratories Licensing Corporation Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield
CN106658343B (en) * 2012-07-16 2018-10-19 杜比国际公司 Method and apparatus for rendering the expression of audio sound field for audio playback
EP4013072A1 (en) * 2012-07-16 2022-06-15 Dolby International AB Method and device for rendering an audio soundfield representation
US10075799B2 (en) 2012-07-16 2018-09-11 Dolby Laboratories Licensing Corporation Method and device for rendering an audio soundfield representation
US10939220B2 (en) 2012-07-16 2021-03-02 Dolby Laboratories Licensing Corporation Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield
US10595145B2 (en) 2012-07-16 2020-03-17 Dolby Laboratories Licensing Corporation Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield
CN104584588B (en) * 2012-07-16 2017-03-29 杜比国际公司 The method and apparatus for audio playback is represented for rendering audio sound field
CN106658343A (en) * 2012-07-16 2017-05-10 杜比国际公司 Method and device for rendering an audio sound field representation for audio playback
CN104584588A (en) * 2012-07-16 2015-04-29 汤姆逊许可公司 Method and device for rendering an audio soundfield representation for audio playback
CN107071685B (en) * 2012-07-16 2020-02-14 杜比国际公司 Method and apparatus for rendering an audio soundfield representation for audio playback
US9712938B2 (en) 2012-07-16 2017-07-18 Dolby Laboratories Licensing Corporation Method and device rendering an audio soundfield representation for audio playback
US10306393B2 (en) 2012-07-16 2019-05-28 Dolby Laboratories Licensing Corporation Method and device for rendering an audio soundfield representation
CN107071686A (en) * 2012-07-16 2017-08-18 杜比国际公司 The method and apparatus for audio playback is represented for rendering audio sound field
CN107071687A (en) * 2012-07-16 2017-08-18 杜比国际公司 The method and apparatus for audio playback is represented for rendering audio sound field
CN106658342B (en) * 2012-07-16 2020-02-14 杜比国际公司 Method and apparatus for rendering an audio soundfield representation for audio playback
CN107071687B (en) * 2012-07-16 2020-02-14 杜比国际公司 Method and apparatus for rendering an audio soundfield representation for audio playback
CN107071686B (en) * 2012-07-16 2020-02-14 杜比国际公司 Method and apparatus for rendering an audio soundfield representation for audio playback
US9961470B2 (en) 2012-07-16 2018-05-01 Dolby Laboratories Licensing Corporation Method and device for rendering an audio soundfield representation
US9723424B2 (en) 2012-11-14 2017-08-01 Dolby Laboratories Licensing Corporation Making available a sound signal for higher order ambisonics signals
EP2733963A1 (en) 2012-11-14 2014-05-21 Thomson Licensing Method and apparatus for facilitating listening to a sound signal for matrixed sound signals
WO2014075934A1 (en) 2012-11-14 2014-05-22 Thomson Licensing Making available a sound signal for higher order ambisonics signals
CN109448742A (en) * 2012-12-12 2019-03-08 杜比国际公司 The method and apparatus that the high-order ambiophony of sound field is indicated to carry out compression and decompression
CN109448742B (en) * 2012-12-12 2023-09-01 杜比国际公司 Method and apparatus for compressing and decompressing higher order ambisonic representations of a sound field
CN104981869B (en) * 2013-02-08 2019-04-26 高通股份有限公司 Audio spatial cue is indicated with signal in bit stream
US9883310B2 (en) 2013-02-08 2018-01-30 Qualcomm Incorporated Obtaining symmetry information for higher order ambisonic audio renderers
US9609452B2 (en) 2013-02-08 2017-03-28 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
RU2661775C2 (en) * 2013-02-08 2018-07-19 Квэлкомм Инкорпорейтед Transmission of audio rendering signal in bitstream
WO2014124261A1 (en) * 2013-02-08 2014-08-14 Qualcomm Incorporated Signaling audio rendering information in a bitstream
CN104981869A (en) * 2013-02-08 2015-10-14 高通股份有限公司 Signaling audio rendering information in a bitstream
US9870778B2 (en) 2013-02-08 2018-01-16 Qualcomm Incorporated Obtaining sparseness information for higher order ambisonic audio renderers
US10178489B2 (en) 2013-02-08 2019-01-08 Qualcomm Incorporated Signaling audio rendering information in a bitstream
TWI666931B (en) * 2013-03-15 2019-07-21 三星電子股份有限公司 Data transmitting apparatus, data receiving apparatus and data transceiving system
US10356484B2 (en) 2013-03-15 2019-07-16 Samsung Electronics Co., Ltd. Data transmitting apparatus, data receiving apparatus, data transceiving system, method for transmitting data, and method for receiving data
US11962990B2 (en) 2013-05-29 2024-04-16 Qualcomm Incorporated Reordering of foreground audio objects in the ambisonics domain
US9483228B2 (en) 2013-08-26 2016-11-01 Dolby Laboratories Licensing Corporation Live engine
WO2015071148A1 (en) 2013-11-14 2015-05-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for compressing and decompressing sound field data of an area
DE102013223201B3 (en) * 2013-11-14 2015-05-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and device for compressing and decompressing sound field data of a region
EP4089675A1 (en) * 2014-01-08 2022-11-16 Dolby International AB Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field
US11211078B2 (en) 2014-01-08 2021-12-28 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations
CN111179951A (en) * 2014-01-08 2020-05-19 杜比国际公司 Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium
CN111182443A (en) * 2014-01-08 2020-05-19 杜比国际公司 Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium
CN111179955A (en) * 2014-01-08 2020-05-19 杜比国际公司 Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium
US10714112B2 (en) 2014-01-08 2020-07-14 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded higher order Ambisonics representations
US10147437B2 (en) 2014-01-08 2018-12-04 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoding higher order ambisonics representations
EP3648102A1 (en) * 2014-01-08 2020-05-06 Dolby International AB Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field
CN111028849A (en) * 2014-01-08 2020-04-17 杜比国际公司 Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium
CN111179955B (en) * 2014-01-08 2024-04-09 杜比国际公司 Decoding method and apparatus comprising a bitstream encoding an HOA representation, and medium
CN111182443B (en) * 2014-01-08 2021-10-22 杜比国际公司 Method and apparatus for decoding a bitstream comprising an encoded HOA representation
US10424312B2 (en) 2014-01-08 2019-09-24 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations
CN105981100A (en) * 2014-01-08 2016-09-28 杜比国际公司 Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field
CN111028849B (en) * 2014-01-08 2024-03-01 杜比国际公司 Decoding method and apparatus comprising a bitstream encoding an HOA representation, and medium
CN111179951B (en) * 2014-01-08 2024-03-01 杜比国际公司 Decoding method and apparatus comprising a bitstream encoding an HOA representation, and medium
US11869523B2 (en) 2014-01-08 2024-01-09 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations
US11488614B2 (en) 2014-01-08 2022-11-01 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded Higher Order Ambisonics representations
US9990934B2 (en) 2014-01-08 2018-06-05 Dolby Laboratories Licensing Corporation Method and apparatus for improving the coding of side information required for coding a Higher Order Ambisonics representation of a sound field
US10553233B2 (en) 2014-01-08 2020-02-04 Dolby Laboratories Licensing Corporation Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations
WO2015104166A1 (en) * 2014-01-08 2015-07-16 Thomson Licensing Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field
WO2015130765A1 (en) * 2014-02-25 2015-09-03 Qualcomm Incorporated Order format signaling for higher-order ambisonic audio data
CN112216292A (en) * 2014-06-27 2021-01-12 杜比国际公司 Method and apparatus for decoding a compressed HOA sound representation of a sound or sound field
CN112908349A (en) * 2014-06-27 2021-06-04 杜比国际公司 Method and apparatus for determining a minimum number of integer bits required to represent non-differential gain values for compression of a representation of a HOA data frame
EP3002960A1 (en) * 2014-10-04 2016-04-06 Patents Factory Ltd. Sp. z o.o. System and method for generating surround sound
CN106796794A (en) * 2014-10-07 2017-05-31 高通股份有限公司 The normalization of environment high-order ambiophony voice data
US10334387B2 (en) 2015-06-25 2019-06-25 Dolby Laboratories Licensing Corporation Audio panning transformation system and method
CN111460883B (en) * 2020-01-22 2022-05-03 电子科技大学 Video behavior automatic description method based on deep reinforcement learning
CN111460883A (en) * 2020-01-22 2020-07-28 电子科技大学 Video behavior automatic description method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
EP2451196A1 (en) Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three
EP3175446B1 (en) Audio processing systems and methods
TWI476761B (en) Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols
EP3800898B1 (en) Data processor and transport of user control data to audio decoders and renderers
JP4787442B2 (en) System and method for providing interactive audio in a multi-channel audio environment
CN111837182B (en) Method and apparatus for generating or decoding a bitstream comprising an immersive audio signal
EP1949693B1 (en) Method and apparatus for processing/transmitting bit-stream, and method and apparatus for receiving/processing bit-stream
JP7207447B2 (en) Receiving device, receiving method, transmitting device and transmitting method
JP6908168B2 (en) Receiver, receiver, transmitter and transmit method
JP7310849B2 (en) Receiving device and receiving method
WO2020152394A1 (en) Audio representation and associated rendering
CN106375778B (en) Method for transmitting three-dimensional audio program code stream conforming to digital movie specification
JP6699564B2 (en) Transmission device, transmission method, reception device, and reception method
KR101531510B1 (en) Receiving system and method of processing audio data
CN114448955B (en) Digital audio network transmission method, device, equipment and storage medium
WO2021255327A1 (en) Managing network jitter for multiple audio streams

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20121110